Unpackme I am Famous


Since I obviously have not posted on this blog for a while, here is a post about Safedisc v3.
Last week I was studying this protection in depth, each component under IDA, but I accidentally broke my external hard drive by hitting it. I lost a lot of .idb files from different games, software and malware, along with my personal toolz, unpackers, ...
So, to smile again, I decided to write about how to unpack this protection.
For those familiar with Safedisc, the only interesting part will be the Nanomites; restoring imports or emulated opcodes is a joke when you know how older versions work.

Extra data

In the introduction I talked about different components; they are stored at the end of the file.
The size of the target game is 1 830 912 bytes, but let's look closely at the IMAGE_SECTION_HEADER entries :

 Name      VirtSize   VirtAddr   SizeRaw    PtrRaw     Flags      Pointing Directories
 .text     00131148h  00401000h  00132000h  00001000h  60000020h  
 .rdata    0002F497h  00533000h  00030000h  00133000h  40000040h  Debug Data
 .data     012CDCE8h  00563000h  0000D000h  00163000h  C0000040h  
 .rsrc     0003289Eh  01831000h  00033000h  00170000h  40000040h  Resource Table
 stxt774   00002059h  01864000h  00003000h  001A4000h  E0000020h  
 stxt371   00003358h  01867000h  00004000h  001A7000h  E0000020h  Import Table, Import Address Table

If we sum the raw offset (PtrRaw) and the raw size (SizeRaw) of the last section, stxt371 :

>>> 0x1A7000 + 0x4000
1748992
>>> hex(0x1A7000 + 0x4000)
'0x1ab000'

1 748 992 bytes != 1 830 912 bytes.
Clearly there is some extra data at the end of the file.
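The same check can be sketched in C for any section table. This is a minimal sketch, not code from the protection: the `raw_section` struct and the `overlay_offset` name are hypothetical, and only the two raw fields from the table above are modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* Only the two fields needed for the check (hypothetical helper struct). */
struct raw_section {
    uint32_t ptr_raw;   /* PtrRaw  : file offset of the section data */
    uint32_t size_raw;  /* SizeRaw : size of the section data on disk */
};

/* Highest file offset covered by a section; anything after it is extra data. */
static uint32_t overlay_offset(const struct raw_section *s, size_t n)
{
    uint32_t end = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t section_end = s[i].ptr_raw + s[i].size_raw;
        if (section_end > end)
            end = section_end;
    }
    return end;
}
```

With the six sections above, the function returns 0x1AB000, which leaves 1 830 912 - 1 748 992 = 81 920 bytes of appended data.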
By looking at the main executable under IDA, I was able to find an interesting sub that retrieves and extracts this data.
First, here is the structure used for the extra data :

struct extra_data
{
	DWORD sig_1;
	DWORD sig_2;
	DWORD num_file;
	DWORD offset_1;
	DWORD offset_2;
	DWORD unknow_1;
	DWORD unknow_2;
	BYTE  name[0xD];
};

sig_1 must always be set to 0xA8726B03 and sig_2 to 0xEF01996C.
After removing all their (weak?) obfuscation, we get the following "pseudo code" that extracts the additional data.

do {
	SetFilePointer(hFile, actual_pos, NULL, FILE_BEGIN);
	ReadFile(hFile, buff, 0x121, &bread, 0);
	/* The key stream is seeded with the file offset of the header */
	key = actual_pos;

	for (i = 0; i < bread; i++) {
		key = key * 0x13C6A5;
		key += 0x0D8430DED;
		/* XOR each byte with the four bytes of the key folded together */
		buff[i] ^= (((((key >> 0x10) ^ (key >> 0x8)) ^ (key >> 0x18)) ^ (key & 0xFF)) & 0xFF);
	}
	memcpy(&data, buff, sizeof(struct extra_data));

	/* Skip to the next header */
	actual_pos += data.offset_1 + data.offset_2;

} while (data.sig_1 == 0xA8726B03 && data.sig_2 == 0xEF01996C);
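To play with the header decoding outside of Windows, the inner loop can be isolated as a pure function. This is a minimal sketch of the same key stream; the `xor_header` name and its signature are mine, not something found in the binary.

```c
#include <stddef.h>
#include <stdint.h>

/* XOR a header in place. The operation is its own inverse, so the same
 * function encodes and decodes. seed is the file offset of the header
 * (actual_pos in the pseudo code). */
static void xor_header(uint8_t *buf, size_t len, uint32_t seed)
{
    uint32_t key = seed;
    for (size_t i = 0; i < len; i++) {
        /* Linear congruential step, wraps modulo 2^32 */
        key = key * 0x13C6A5u + 0xD8430DEDu;
        /* Fold the four key bytes into one and XOR it with the data */
        buf[i] ^= (uint8_t)((key >> 16) ^ (key >> 8) ^ (key >> 24) ^ key);
    }
}
```

Calling it twice with the same seed gives back the original buffer, which is a quick way to check an implementation.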

Result :

Name : ~def549.tmp
Num : 1
Name : clcd32.dll
Num : 1100
Name : clcd16.dll
Num : 1100
Name : mcp.dll
Num : 1101
Num : 2
Name : DrvMgt.dll
Num : 2
Name : SecDrv04.VxD
Num : 11
Name : ~e5.0001
Num : 0
Name : PfdRun.pfd
Num : 0
Name : ~df394b.tmp
Num : 0

As you can see, we can extract a lot of files. Here is the algorithm to decipher them :

ptr = (BYTE*)GlobalAlloc(GPTR, data.offset_1);
SetFilePointer(hFile, actual_pos - data.offset_1, NULL, FILE_BEGIN);
ReadFile(hFile, ptr, data.offset_1, &bread, NULL);
if (bread != data.offset_1)
	printf("[-] ReadFile() failed\n");
key = 0x8142FEA1;
/* Unsigned, so that the shift below wraps cleanly */
DWORD init_key;
init_key = 0x8142FEA1;
for (i = 0; i < bread; i++) {
	key = init_key ^ 0x7F6D09ED;
	ptr[i] = (((((key >> 0x18) ^ (key >> 0x10)) ^ (key >> 0x8))) & 0xFF) ^ ptr[i];
	ptr[i] ^= key & 0xFF;
	/* The decrypted byte is fed back into the key */
	init_key = init_key << 0x8;
	init_key += ptr[i];
}
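The same loop as a portable sketch, together with its inverse so the round trip can be checked. The function names are hypothetical, and the encrypt side is only derived from the decrypt loop above; I did not recover it from the binary.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold the four bytes of a 32-bit value into one byte. */
static uint8_t fold_key(uint32_t k)
{
    return (uint8_t)((k >> 24) ^ (k >> 16) ^ (k >> 8) ^ k);
}

/* Decrypt in place: the feedback uses the decrypted byte. */
static void payload_decrypt(uint8_t *p, size_t len)
{
    uint32_t state = 0x8142FEA1u;
    for (size_t i = 0; i < len; i++) {
        p[i] ^= fold_key(state ^ 0x7F6D09EDu);
        state = (state << 8) + p[i];
    }
}

/* Inverse operation: the feedback must use the plain byte as well. */
static void payload_encrypt(uint8_t *p, size_t len)
{
    uint32_t state = 0x8142FEA1u;
    for (size_t i = 0; i < len; i++) {
        uint8_t plain = p[i];
        p[i] ^= fold_key(state ^ 0x7F6D09EDu);
        state = (state << 8) + plain;
    }
}
```

Because the state only depends on the previous plain bytes, a single wrong byte corrupts at most the next four bytes of output before the state shifts it out.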

Each component is extracted into the %temp% path. Each has its own purpose; we will not study all of them, there is no interest.

I will not go further into all this stuff: having lost all my idb, I am bored of reversing (renaming subs) again and again through all this shitty C++ stuff. You can find some fun crypto where they decipher the pfd file or the code section (modified Rijndael, different xor operations). Anyway, let's continue !

Find OEP

This is the easiest part :

stxt371:018670A2                 mov     ebx, offset start
stxt371:018670A7                 xor     ecx, ecx
stxt371:018670A9                 mov     cl, ds:byte_186703D
stxt371:018670AF                 test    ecx, ecx
stxt371:018670B1                 jz      short loc_18670BF
stxt371:018670B3                 mov     eax, offset loc_1867113
stxt371:018670B8                 sub     eax, ebx
stxt371:018670BA                 sub     eax, 5
stxt371:018670BD                 jmp     short loc_18670CD
stxt371:018670BF ; ---------------------------------------------------------------------------
stxt371:018670BF loc_18670BF:                            ; CODE XREF: start+13j
stxt371:018670BF                 push    ecx
stxt371:018670C0                 mov     ecx, offset loc_1867159
stxt371:018670C5                 mov     eax, ecx
stxt371:018670C7                 sub     eax, ebx
stxt371:018670C9                 add     eax, [ecx+1]
stxt371:018670CC                 pop     ecx
stxt371:018670CD loc_18670CD:                            ; CODE XREF: start+1Fj
stxt371:018670CD                 mov     byte ptr [ebx], 0E9h
stxt371:018670D0                 mov     [ebx+1], eax

This code replaces the module entry point with a jump to the real OEP, so if you like using OllyDbg, execute the first instructions and put a breakpoint on that jump.
But you will run into a "dead lock" problem: before jumping to the real OEP, it deciphers sections, loads dlls AND calls CreateProcess on "~e5.0001", giving the pid of the game process as argument.
This process loads ~df394b.tmp, aka SecServ.dll. All strings inside this dll are encrypted, but we can decrypt all of them :

int decrypt_func_01(char *mem_alloc, char *addr_to_decrypt)
{
    DWORD count;
    DWORD key;
    char  actual;

    if (mem_alloc && addr_to_decrypt)
    {
        count = 0;
        key = 0x522CFDD0;
        while (1)
        {
            actual = *addr_to_decrypt++;
            actual = actual ^ (char)key;
            *mem_alloc++ = actual;
            key = 0xA065432A - 0x22BC897F * key;
            /* Stop on the decrypted NUL terminator */
            if (!actual)
                break;
            /* Loop bound reconstructed: give up after 127 characters */
            ++count;
            if (count != 127)
                continue;
            return 0;
        }
        return 1;
    }
    return 0;
}
Here are all the decrypted strings :

Addr = 667A9240 : drvmgt.dll
Addr = 667A9264 : secdrv.sys
Addr = 667A9298 : SecDrv04.VxD
Addr = 667A92BC : ALT_
Addr = 667A9C78 : Kernel32
Addr = 667AA71C : \\.\NTICE
Addr = 667AA73C : \\.\SICE
Addr = 667AA75C : \\.\SIWVID
Addr = 667AAB80 : .text
Addr = 667A9928 : Ntdll.dll
Addr = 667A9948 : Kernel32
Addr = 667AA3F0 : GetVersionExA
Addr = 667AA6BC : ZwQuerySystemInformation
Addr = 667AA6EC : NtQueryInformationProcess
Addr = 667AA780 : IsDebuggerPresent
Addr = 667AAB50 : ZwQuerySystemInformation
Addr = 667AADBC : ExitProcess
Addr = 667A99F8 : DeviceIoControl
Addr = 667A9A40 : CreateFileA
Addr = 667A9A64 : ReadProcessMemory
Addr = 667A9A8C : WriteProcessMemory
Addr = 667A9AB8 : VirtualProtect
Addr = 667A9AE0 : CreateProcessA
Addr = 667A9B08 : CreateProcessW
Addr = 667A9B30 : GetStartupInfoA
Addr = 667A9B58 : GetStartupInfoW
Addr = 667A9B80 : GetSystemTime
Addr = 667A9BA4 : GetSystemTimeAsFileTime
Addr = 667A9BD4 : TerminateProcess
Addr = 667A9BFC : Sleep
Addr = 667AB8C0 : WriteProcessMemory
Addr = 667AB8EC : FlushInstructionCache
Addr = 667AB918 : VirtualProtect
Addr = 667ABB90 : SetThreadContext
Addr = 667ABBB8 : GetThreadContext
Addr = 667ABBE0 : SuspendThread
Addr = 667ABB64 : FlushInstructionCache
Addr = 667ABB38 : WriteProcessMemory
Addr = 667ABC84 : ContinueDebugEvent
Addr = 667ABB0C : DebugActiveProcess
Addr = 667ABAE4 : WaitForDebugEvent
Addr = 667A99F8 : DeviceIoControl
Addr = 667ACF00 : System\CurrentControlSet\Services\VxD
Addr = 667ACF5C : cmapieng.vxd
Addr = 667ACF3C : StaticVxD

The most interesting entries are DebugActiveProcess, ContinueDebugEvent, WriteProcessMemory, FlushInstructionCache and SetThreadContext.
As I said earlier, this dll is in charge of debugging the game process, which prevents debugging it with Olly or any other Ring3 debugger.
After calling CreateProcess, the game process waits (WaitForSingleObject) for a signal: the temp executable attaches to it, signals it, and keeps debugging it. But if you are already debugging the game process, WaitForSingleObject never receives this signal.
All the code below can be found inside ~df394b.tmp aka SecServ.dll :

.text:667250C1 loc_667250C1:                           ; CODE XREF: sub_66724FDE+D5j
.text:667250C1                 push    0FFFFFFFFh      ; dwMilliseconds
.text:667250C3                 push    edi             ; hHandle
.text:667250C4                 call    ds:WaitForSingleObject
.text:667250CA                 push    [ebp+hObject]   ; hObject
.text:667250CD                 mov     [ebp+return_value], eax
.text:667250D0                 call    esi ; CloseHandle
.text:667250D2                 push    edi             ; hObject
.text:667250D3                 call    esi ; CloseHandle
.text:667250D5                 cmp     [ebp+return_value], 0 ; WAIT_OBJECT_0
.text:667250D9                 pop     edi
.text:667250DA                 pop     esi
.text:667250DB                 jz      short exit_func
.text:667250DD                 call    ebx ; GetLastError
.text:667250DF                 call    Exit_Process
.text:667250E4 exit_func:                              ; CODE XREF: sub_66724FDE+11j
.text:667250E4                                         ; sub_66724FDE+20j ...
.text:667250E4                 pop     ebx
.text:667250E5                 leave
.text:667250E6                 retn
.text:667250E6 sub_66724FDE    endp

If you want to use OllyDbg, put a breakpoint on the WaitForSingleObject call, change the dwMilliseconds argument to something other than INFINITE, and flip the ZF flag when the return value is tested.


Now the fun stuff can start. If you followed what I said, you can continue to debug the game process, but at some point you will run into a problem like the following :

.text:004519FC                 call    ds:dword_5331B0 ; kernel32.IsBadReadPtr
.text:004519FC ; ---------------------------------------------------------------------------
.text:00451A02                 db 0CCh
.text:00451A03                 db 0CCh ;
.text:00451A04                 db 0CCh ;
.text:00451A05                 db 0CCh ;

What are these 0xCC (int 3, aka Trap to Debugger or software breakpoint) doing after a call to a kernel32 API ?
It's a well-known technique: instructions are replaced by this opcode and information about the removed opcodes is stored in a table. (Remember the pfd file ?)
Then, through self-debugging, when one of these breakpoints is hit, the debugger process handles the debug exception and looks up the information about that breakpoint.

But the problem is that if a Nanomite is hit many times, it costs some performance, right ? (Not so much today.) So Safedisc counts how many times each Nanomite is executed, and if a Nanomite is executed too many times, it restores the replaced opcodes by writing them into the debugged process.
So if we want to fix these Nanomites, we just have to patch the branch instruction that says "this Nanomite has been executed too many times, restore the opcode !", scan the .text section of the game process to find all the Nanomites and call them; the debugger process will then restore all the removed opcodes :).

How To

When unpacking a (real?) protection you need to write cool toolz. Here are all the steps that I did :

Need a diagram ?

I ran into a small problem during those operations: if we create a thread at an address containing 0xCC followed by a nop (0x90), the Safedisc debugger crashes or emulates shit...
Visual Studio uses 0xCC, 0x90 and 0x00 as padding; don't ask me why they don't just use 0x00 only, I don't know.
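The scan with that padding filter can be sketched as below. This is only an illustration under simple assumptions: the function name is hypothetical, it works on a raw byte buffer, and it treats any 0xCC byte as a candidate, so a 0xCC that is really an operand byte inside an instruction will show up as a false positive. A real tool would want a disassembler on top.

```c
#include <stddef.h>
#include <stdint.h>

/* Collect the start offset of every run of 0xCC bytes, skipping runs that
 * are immediately followed by 0x90 (the padding case described above).
 * Returns the number of offsets written to out (at most max_out). */
static size_t find_int3_runs(const uint8_t *code, size_t len,
                             size_t *out, size_t max_out)
{
    size_t found = 0;
    size_t i = 0;

    while (i < len) {
        if (code[i] != 0xCC) {
            i++;
            continue;
        }
        size_t start = i;
        while (i < len && code[i] == 0xCC)   /* consume the whole run */
            i++;
        int followed_by_nop = (i < len && code[i] == 0x90);
        if (!followed_by_nop && found < max_out)
            out[found++] = start;
    }
    return found;
}
```

Each returned offset is a place where a thread could be started so that the debugger process restores the original opcodes.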
Just so you know, if you don't provide the full path of these dlls when injecting them, the first dll must be placed in the folder of the game process and the second one in the %temp% path, because the debugger process is extracted and executed there.

You can find the branch instruction inside ~df394b.tmp (SecServ.dll) at address 0x6678F562 :

.txt5:6678F562                 cmp     ax, 1
.txt5:6678F566                 jnz     not_write_process_memory


Just some debug information :

Process id : 894
EventCode : 1
Exception Code : 80000003
Exception Addr : 40170F
[+] GetThreadContext(0xB8, 0x635080); return_addr = 66733C55
lpContext->EIP = 7C91120F
[+] WriteProcessMemory(0x5C, 0x40170F, 0x61F58C, 0x2, 0x61F0F0); return_addr = 6672BA45
85 C0 
[+] SetThreadContext(0xB8, 0x635080); return_addr = 66733C23
lpContext->EIP = 40170F

As you can see, at address 0x40170F an event 0x1 (EXCEPTION_DEBUG_EVENT) occurred with code 0x80000003 (EXCEPTION_BREAKPOINT). The debugger process replaces the 0xCC 0xCC with 0x85 0xC0 ("test eax, eax") and then tries to call SetThreadContext, but we hooked it to terminate the thread.

Restoring Imports

As in the previous version, imports point to some virtual address where the code calls a routine to find the correct import.
By using the algorithm against itself, we can resolve the correct address of every import.
Inside the .text section we can find different types of calls to imports :

The idea is simple: scan the .text section for call dword ptr, jmp dword ptr or jmp into section stxt774, hook the function that resolves the api, get the result and save it into a linked list.
The function in question is in ~df394b.tmp :

.txt:6678D644                 call    resolve_api
.txt:6678D649                 pop     ecx
.txt:6678D64A                 pop     ecx

Just replace the pop ecx with "add esp, X; ret" and read the result from register eax.

BUT ! Sometimes calling the same virtual_addr from another location does not resolve to the same API address.

API (0x7E3AC17E) has rdata.0x53327C (txt.0x51A656)  rdata.0x53327C (txt.0x454509)  rdata.0x53327C (txt.0x454149)  rdata.0x533260 (txt.0x453773)  rdata.0x53327C (txt.0x4535BD)
API (0x7E39869D) has rdata.0x53329C (txt.0x51A686)  rdata.0x53329C (txt.0x50B64E)  rdata.0x53329C (txt.0x50B1E4)  rdata.0x53327C (txt.0x4FDC5E)  rdata.0x53329C (txt.0x4FD7CA)  rdata.0x533284 (txt.0x4FD718)  

As you can see, the rdata address 0x53327C can resolve to different APIs when it is called from different locations (txt address).
The fix is very simple: we reorder the linked list by api address, choose one rdata slot for each api, and rewrite the operand of the call or jmp dword ptr at each txt address of that api.
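One way to pick the slots can be sketched as follows. The selection rule here (first slot seen for an api that is not already taken by another api) is an assumption of mine, and an array stands in for the linked list; the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct import_ref {
    uint32_t api;    /* resolved API address */
    uint32_t rdata;  /* rdata slot used by this call site */
    uint32_t txt;    /* address of the call/jmp in the code section */
};

/* Is slot already the chosen slot of a different api among r[0..upto) ? */
static int claimed_by_other(const struct import_ref *r, size_t upto,
                            uint32_t slot, uint32_t api)
{
    for (size_t j = 0; j < upto; j++)
        if (r[j].rdata == slot && r[j].api != api)
            return 1;
    return 0;
}

/* Rewrite r[i].rdata so that every api uses exactly one slot.
 * Returns 0 on success, -1 if an api has no free slot left. */
static int unify_slots(struct import_ref *r, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t chosen = 0;   /* 0 means "nothing chosen yet" */

        /* Reuse the slot picked for an earlier reference to the same api */
        for (size_t j = 0; j < i && !chosen; j++)
            if (r[j].api == r[i].api)
                chosen = r[j].rdata;

        /* Otherwise take the first slot of this api that is still free */
        for (size_t k = i; k < n && !chosen; k++)
            if (r[k].api == r[i].api &&
                !claimed_by_other(r, i, r[k].rdata, r[i].api))
                chosen = r[k].rdata;

        if (!chosen)
            return -1;
        r[i].rdata = chosen;
    }
    return 0;
}
```

After this pass, each `txt` entry tells where to patch the operand and each `rdata` entry tells which slot receives the api address.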

After reorder

Output after reordering :

API (0x7E3AC17E) has rdata.0x53327C (txt.0x51A656)  rdata.0x53327C (txt.0x454509)  rdata.0x53327C (txt.0x454149)  rdata.0x53327C (txt.0x453773)  rdata.0x53327C (txt.0x4535BD)  
API (0x7E39869D) has rdata.0x53329C (txt.0x51A686)  rdata.0x53329C (txt.0x50B64E)  rdata.0x53329C (txt.0x50B1E4)  rdata.0x53327C (txt.0x4FDC5E)  rdata.0x53329C (txt.0x4FD7CA)  rdata.0x533284 (txt.0x4FD718)

We can now write the real address of each api back into its rdata slot and fix the call or jmp at each address in the .text section so that it points to the right rdata address.
Now you can look with ImportRec and see that all imports are restored correctly :)

To fix a jmp into section stxt774, we just have to replace the jmp with a call dword ptr [rdata]. But wait, jmp stxt774 is 5 bytes and call dword ptr needs 6 bytes. Don't worry: after resolving the api and returning to it, the api returns to jmp stxt774 + 6, so there is enough room.

And Import Reconstructor is happy (Invalid imports 0) :

Emulated opcodes

After fixing the Nanomites and restoring the imports, I ran into one last problem.

.text:00404909                 push    ecx
.text:0040490A                 push    eax
.text:0040490B                 call    sub_4013F3
.text:0040490B sub_404909      endp ; sp-analysis failed
.text:004013F3                 mov     eax, 1E1Bh
.text:004013F8                 pop     ecx
.text:004013F9                 lea     eax, [eax+ecx]
.text:004013FC                 mov     eax, [eax]
.text:004013FE                 jmp     eax

This code just computes an address in the .text section, reads the value stored at this address and jumps to it. The jump destination is an address in ~df394b.tmp.

The goal of sub 0x6673E090 is simply to check where it has been called from, look the caller up in a table of emulated opcodes and restore them.
Here only one emulation is performed, then it writes the original opcodes back.
As for restoring imports, we find each reference to sub 0x00404909, set up a hook at the end of sub 0x6673E090, call each reference, and the emulated opcodes are restored automatically :)


Safedisc v3 is really not difficult. You can find the source of all my code at the end of this post.
I will go back to my school project, hopefully graduating this year :)



First dll

Second dll


Binary Auditing Training Package unpackme_03


Today I was bored, so I decided to have some fun with http://www.binary-auditing.com/.
And I found a very fun challenge among the unpacking exercises :



Packer starts here :

seg009:00411800 start           proc near
seg009:00411800                 call    $+5
seg009:00411805                 xor     ebp, ebp
seg009:00411807                 pop     ebp
seg009:00411808                 sub     ebp, offset word_401A1E
seg009:0041180E                 xor     ebx, ebx
seg009:00411810                 lea     eax, byte_401A4D[ebp]
seg009:00411816 loc_411816:                             ; CODE XREF: start+1Fj
seg009:00411816                 inc     bl
seg009:00411818                 mov     cl, [eax]
seg009:0041181A                 xor     cl, bl
seg009:0041181C                 cmp     cl, 55h
seg009:0041181F                 jnz     short loc_411816
seg009:00411821                 mov     ecx, 55Ah
seg009:00411826                 lea     esi, byte_401A4D[ebp]
seg009:0041182C                 mov     edi, esi
seg009:0041182E loc_41182E:                             ; CODE XREF: start+32j
seg009:0041182E                 lodsb
seg009:0041182F                 xor     al, bl
seg009:00411831                 stosb
seg009:00411832                 loop    loc_41182E

The first loop computes the key for the XOR operation; ebx will be equal to 0x77.
The second loop decrypts the first stage of the packer with the key stored in ebx.
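The two loops translate almost line for line into C. This is a minimal sketch of the listing above; the function names are mine.

```c
#include <stddef.h>
#include <stdint.h>

/* First loop: try keys 1, 2, 3, ... until first_byte ^ key == 0x55. */
static uint8_t find_xor_key(uint8_t first_byte)
{
    uint8_t key = 0;
    do {
        key++;                                   /* inc bl (wraps at 256) */
    } while ((uint8_t)(first_byte ^ key) != 0x55);
    return key;
}

/* Second loop: lodsb / xor al, bl / stosb over len bytes, in place. */
static void xor_first_stage(uint8_t *buf, size_t len, uint8_t key)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= key;
}
```

A key of 0x77 means that the first encrypted byte is 0x22, since 0x22 ^ 0x77 = 0x55; the second loop then runs over 0x55A bytes.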
Next, the packer resolves the base address of kernel32.dll: it reads the current structured exception handling (SEH) frame from fs:[0], takes the address inside kernel32 stored after the SEH handler, and walks backwards in memory from this address until it finds the 'MZ' and 'PE' signatures.
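The backward walk can be simulated on a plain buffer. This is a sketch under assumptions: the stride used by the packer is not shown in this post, so `step` is a parameter, and the function name is hypothetical. Offsets 0x00 ('MZ') and 0x3C (e_lfanew) are the standard DOS header fields.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Walk down from start, step bytes at a time (step must be non-zero), until
 * an 'MZ' header is found whose e_lfanew field points at "PE\0\0".
 * Returns the offset of the image base inside mem, or -1 if none is found. */
static long find_image_base(const uint8_t *mem, size_t size,
                            size_t start, size_t step)
{
    size_t off = start - (start % step);          /* align down */

    for (;;) {
        if (off + 0x40 <= size && mem[off] == 'M' && mem[off + 1] == 'Z') {
            /* e_lfanew is a little-endian dword at offset 0x3C */
            uint32_t e_lfanew = (uint32_t)mem[off + 0x3C]
                              | (uint32_t)mem[off + 0x3D] << 8
                              | (uint32_t)mem[off + 0x3E] << 16
                              | (uint32_t)mem[off + 0x3F] << 24;
            if (e_lfanew <= size - off - 4 &&
                memcmp(mem + off + e_lfanew, "PE\0\0", 4) == 0)
                return (long)off;
        }
        if (off < step)
            return -1;
        off -= step;
    }
}
```

Checking both signatures avoids stopping on a stray 'MZ' pair in the middle of the module.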

At this point it has the base address of kernel32.dll.
Then it parses the PE header of this dll, gets the export name table and searches for GlobalAlloc().
It allocates some space and copies different portions of code into it. We will come back to the analysis of this code later (some of it is there for api resolution during the main execution).
To avoid losing time analysing all these copies, we set a memory breakpoint on access on the code section and run our debugger.

We land here :

00157BFC    AD                         LODS DWORD PTR DS:[ESI]
00157BFD    35 DEC0ADDE                XOR EAX,DEADC0DE
00157C02    AB                         STOS DWORD PTR ES:[EDI]
00157C03  ^ E2 F7                      LOOPD SHORT 00157BFC
00157C05    C3                         RET

At this point ecx is equal to 0x1E00 and the raw size of the code section is 0x7800 (0x1E00 dwords), so it's actually deciphering the whole code section with 0xDEADC0DE as XOR key.
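In C, the loop looks like the sketch below. The function name is mine, and it works byte by byte so that it gives the same result as the x86 dword XOR on any host.

```c
#include <stddef.h>
#include <stdint.h>

/* XOR dword_count little-endian dwords of buf with key, in place
 * (lodsd / xor eax, key / stosd / loop). */
static void xor_code_section(uint8_t *buf, size_t dword_count, uint32_t key)
{
    for (size_t i = 0; i < dword_count; i++)
        for (int b = 0; b < 4; b++)
            buf[4 * i + b] ^= (uint8_t)(key >> (8 * b));
}
```

For this target the call would be `xor_code_section(code, 0x1E00, 0xDEADC0DE)`, which covers 0x1E00 * 4 = 0x7800 bytes.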
Disable the memory breakpoint on access and go to the ret, then repeat the operation (set the memory breakpoint on access again), and we land here :

00157BEB    8DBD BF6F4000   	       LEA EDI,DWORD PTR SS:[EBP+406FBF]
00157BF1    8B85 A71F4000              MOV EAX,DWORD PTR SS:[EBP+401FA7]
00157BF7    8B0F                       MOV ECX,DWORD PTR DS:[EDI]
00157BF9    81C1 00004000              ADD ECX,400000                           ; ASCII "MZP"
00157BFF    C601 E8                    MOV BYTE PTR DS:[ECX],0E8
00157C02    83C1 05                    ADD ECX,5
00157C05    50                         PUSH EAX
00157C06    2BC1                       SUB EAX,ECX
00157C08    8941 FC                    MOV DWORD PTR DS:[ECX-4],EAX
00157C0B    58                         POP EAX
00157C0C    81C7 88000000              ADD EDI,88
00157C12    837F 04 00                 CMP DWORD PTR DS:[EDI+4],0
00157C16  ^ 75 DF                      JNZ SHORT 00157BF7
00157C18    C3                         RET

Do you recognize this operation ?
Opcode 0xE8, add 5 ? It is building a call.
The destination of the call (eax) is the first part of the allocated code I talked about; we will call it "api address solving".
The packer sets up a call redirection for each API.
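The arithmetic behind "0xE8, add 5" can be written down as a small sketch (function names are mine): the operand of a near call is the distance from the end of the 5-byte instruction to the target.

```c
#include <stdint.h>

/* Encode "call target" for an instruction located at call_addr. */
static void write_call_rel32(uint8_t out[5], uint32_t call_addr, uint32_t target)
{
    uint32_t rel = target - (call_addr + 5);   /* wraps modulo 2^32 */
    out[0] = 0xE8;
    out[1] = (uint8_t)rel;
    out[2] = (uint8_t)(rel >> 8);
    out[3] = (uint8_t)(rel >> 16);
    out[4] = (uint8_t)(rel >> 24);
}

/* Decode the destination of a call rel32 located at call_addr. */
static uint32_t call_rel32_target(const uint8_t insn[5], uint32_t call_addr)
{
    uint32_t rel = (uint32_t)insn[1]
                 | (uint32_t)insn[2] << 8
                 | (uint32_t)insn[3] << 16
                 | (uint32_t)insn[4] << 24;
    return call_addr + 5 + rel;
}
```

For a call placed at 0x004085D4 that targets 0x00158C70, the operand is 0x00158C70 - 0x004085D9 = 0xFFD50697, so the bytes are E8 97 06 D5 FF.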
The next memory breakpoint on access lands us here :

004085D4    E8 9706D5FF                CALL 00158C70

A call crafted just before; let's follow it.

00158C70    9C                         PUSHFD
00158C71    60                         PUSHAD
00158C72    E8 00000000                CALL 00158C77
00158C77    5D                         POP EBP
00158C78    81ED 1C1C4000              SUB EBP,401C1C
00158C7E    8BBD B01C4000              MOV EDI,DWORD PTR SS:[EBP+401CB0]
00158C84    8B7424 24                  MOV ESI,DWORD PTR SS:[ESP+24]
00158C88    83EE 05                    SUB ESI,5
00158C8B    81EE 00004000              SUB ESI,400000                                        ; ASCII "MZP"
00158C91    81EF 88000000              SUB EDI,88
00158C97    81C7 88000000              ADD EDI,88
00158C9D    3B37                       CMP ESI,DWORD PTR DS:[EDI]
00158C9F  ^ 75 F6                      JNZ SHORT 00158C97
00158CA1    68 2680ACC8                PUSH C8AC8026
00158CA6    FFB5 B41C4000              PUSH DWORD PTR SS:[EBP+401CB4]
00158CAC    E8 A4000000                CALL 00158D55
00158CB1    8D4F 48                    LEA ECX,DWORD PTR DS:[EDI+48]
00158CB4    8BD1                       MOV EDX,ECX
00158CB6    E8 3B000000                CALL 00158CF6
00158CBB    51                         PUSH ECX
00158CBC    FFD0                       CALL EAX
00158CBE    93                         XCHG EAX,EBX
00158CBF    68 EEEAC01F                PUSH 1FC0EAEE
00158CC4    FFB5 B41C4000              PUSH DWORD PTR SS:[EBP+401CB4]
00158CCA    E8 86000000                CALL 00158D55
00158CCF    8D4F 08                    LEA ECX,DWORD PTR DS:[EDI+8]
00158CD2    E8 1F000000                CALL 00158CF6
00158CD7    51                         PUSH ECX
00158CD8    53                         PUSH EBX
00158CD9    FFD0                       CALL EAX
00158CDB    8D4F 08                    LEA ECX,DWORD PTR DS:[EDI+8]
00158CDE    E8 13000000                CALL 00158CF6
00158CE3    8D4F 48                    LEA ECX,DWORD PTR DS:[EDI+48]
00158CE6    E8 0B000000                CALL 00158CF6
00158CEB    894424 1C                  MOV DWORD PTR SS:[ESP+1C],EAX
00158CEF    61                         POPAD
00158CF0    9D                         POPFD
00158CF1    83C4 04                    ADD ESP,4
00158CF4    FFE0                       JMP EAX

Maybe it will be clearer with a diagram; here is an example of a typical call to an api :

So, if you followed me, here is the structure for each api :

struct api
{
	DWORD 	offset;			/* +0x00 */
	DWORD 	unknow;			/* +0x04 */
	char	api_name[0x40];		/* +0x08 */
	char	dll_name[0x40];		/* +0x48 */
};
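The lookup done by the stub can be sketched in portable C. The names below are hypothetical; the table length is passed explicitly here, which is an assumption, because the stub itself simply walks the table in 0x88-byte steps until the offset matches. Names are stored bitwise-inverted in the table.

```c
#include <stddef.h>
#include <stdint.h>

struct api_entry {
    uint32_t offset;          /* +0x00 : RVA of the redirected call */
    uint32_t unknown;         /* +0x04 */
    char     api_name[0x40];  /* +0x08 : stored bitwise-inverted */
    char     dll_name[0x40];  /* +0x48 : stored bitwise-inverted */
};

/* ret_addr is the return address pushed by the redirected call. */
static const struct api_entry *find_api_entry(const struct api_entry *table,
                                              size_t count, uint32_t ret_addr,
                                              uint32_t image_base)
{
    uint32_t rva = ret_addr - 5 - image_base;   /* sub esi, 5 / sub esi, 400000 */
    for (size_t i = 0; i < count; i++)
        if (table[i].offset == rva)
            return &table[i];
    return NULL;
}

/* Invert an encoded name into out (cap bytes), stopping at the first byte
 * that decodes to NUL. The result is always NUL-terminated. */
static void decode_name(const char *enc, char *out, size_t cap)
{
    size_t i;
    if (cap == 0)
        return;
    for (i = 0; i + 1 < cap; i++) {
        out[i] = (char)(~(unsigned char)enc[i] & 0xFF);
        if (out[i] == '\0')
            return;
    }
    out[i] = '\0';
}
```

The struct is 4 + 4 + 0x40 + 0x40 = 0x88 bytes, which matches the ADD EDI,88 stride in the listings.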

Here is one example :

Writing a test program :

> cat test.c
#include <stdio.h>

int main(void)
{
  char api_name[] = "\xB8\x9A\x8B\xB2\x90\x9B\x8A\x93\x9A\xB7\x9E\x91\x9B\x93\x9A\xBE\xFF\xFF";
  char dll_name[] = "\xB4\xBA\xAD\xB1\xBA\xB3\xCC\xCD\xD1\xBB\xB3\xB3\xFF\xFF";
  int i;

  /* Names are stored bitwise-inverted; the 0xFF padding becomes the NUL terminator */
  for (i = 0; i < (int)sizeof(api_name) - 1; i++)
    api_name[i] = ~api_name[i];
  for (i = 0; i < (int)sizeof(dll_name) - 1; i++)
    dll_name[i] = ~dll_name[i];
  printf("%s\n", api_name);
  printf("%s\n", dll_name);
  return 0;
}
> ./test
GetModuleHandleA
KERNEL32.DLL

This entry in the table was for resolving the call to GetModuleHandleA() from kernel32.dll.
So to dump our program, we have to reconstruct all those redirections; we will write a dll and inject it into the process.
What will the injected code do ?

But do we have enough room to replace the call with jmp dword ptr [idata_section] ? The answer is yes ! The packer has replaced them and left 1 byte between each call.

Opcode for jmp dword ptr [0x42424242] = FF 25 42 42 42 42, size : 6, so we have enough room.
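The patch itself is just these six bytes. A small sketch of the encoder (the function name is mine):

```c
#include <stdint.h>

/* Encode "jmp dword ptr [slot_addr]" : FF 25 followed by the absolute
 * address of the slot, little-endian. */
static void write_jmp_indirect(uint8_t out[6], uint32_t slot_addr)
{
    out[0] = 0xFF;
    out[1] = 0x25;
    out[2] = (uint8_t)slot_addr;
    out[3] = (uint8_t)(slot_addr >> 8);
    out[4] = (uint8_t)(slot_addr >> 16);
    out[5] = (uint8_t)(slot_addr >> 24);
}
```

Unlike the 5-byte relative call it replaces, this form stores an absolute address, so the same bytes work wherever the thunk is located.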

Code for the dll :

#include <stdio.h>
#include <Windows.h>

#define LDE_X86 0

#ifdef __cplusplus
extern "C"
#endif
int __stdcall LDE(void* address , DWORD type);

void	fix_call(void)
{
	DWORD imagebase;
	DWORD imageend;
	FILE *fp = NULL;
	BYTE	*ptr;
	int val;
	DWORD addr_call;
	DWORD nb_api = 0;
	DWORD i;
	IMAGE_DOS_HEADER *idh;
	IMAGE_NT_HEADERS *inh;
	IMAGE_SECTION_HEADER *ish;
	IMAGE_SECTION_HEADER *ish_import;

	fp = fopen("debug_msg.txt", "w");
	if (!fp)
	{
		MessageBoxA(NULL, "fopen failed :(", "failed", 0);
		return;
	}
	imagebase = (DWORD)GetModuleHandle(NULL);
	idh = (IMAGE_DOS_HEADER *)imagebase;
	inh = (IMAGE_NT_HEADERS *)((BYTE*)imagebase + idh->e_lfanew);
	imageend = imagebase + inh->OptionalHeader.SizeOfImage;
	fprintf(fp, "Image Base : %08X\nImage End : %08X\n", imagebase, imageend);
	/* Assumption : the first section header describes the code section */
	ish = (IMAGE_SECTION_HEADER*)((BYTE*)inh + sizeof(IMAGE_NT_HEADERS));
	/* Fifth section header : import section */
	ish_import = (IMAGE_SECTION_HEADER*)((BYTE*)inh + sizeof(IMAGE_NT_HEADERS) + sizeof(IMAGE_SECTION_HEADER) * 4);
	fprintf(fp, "Ish Import : %08X\n", imagebase + ish_import->VirtualAddress);
	for (i = 0; i < ish->Misc.VirtualSize; i++)
	{
		ptr = (BYTE*)(imagebase + ish->VirtualAddress + i);
		/* Look for a call */
		if (*(ptr) == 0xE8)
		{
			val = *(ptr + 4) << 0x18 | *(ptr + 3) << 0x10 | *(ptr + 2) << 0x8  | *(ptr + 1);
			val += imagebase + ish->VirtualAddress + i + 5;
			/* Is destination inside code section ? */
			if (val > imagebase + ish->VirtualAddress && val < imagebase + ish->VirtualAddress + ish->Misc.VirtualSize)
			{
				ptr = (BYTE*)val;
				addr_call = val;
				/* Look for a call */
				if (*(ptr) == 0xE8)
				{
					val = *(ptr + 4) << 0x18 | *(ptr + 3) << 0x10 | *(ptr + 2) << 0x8  | *(ptr + 1);
					val += (int)ptr + 5;
					/* Is destination outside of the code section ? */
					if (val < imagebase + ish->VirtualAddress || val > imagebase + ish->VirtualAddress + ish->Misc.VirtualSize)
					{
						fprintf(fp, "Call Redirect found at %08X to %08X, ", imagebase + ish->VirtualAddress + i, val);
						ptr = (BYTE*)val;
						/* Change JMP EAX to JMP EBX at the end of the resolver stub */
						val = imagebase + ish->VirtualAddress + i;
						*(ptr + 0x85) = 0xE3;
						__asm
						{
							mov ebx, end_api
							mov eax, val
							call eax
						/* The patched stub jumps back here with the API address in eax */
						end_api:
							add esp, 8
							mov val, eax
						}
						fprintf(fp, "Addr Api : %08X\n", val);
						/* Put Api Addr into idata section */
						ptr = (BYTE*)(imagebase + ish_import->VirtualAddress + 0x58 + nb_api * 8);
						memcpy(ptr, &val, 4);
						val = (imagebase + ish_import->VirtualAddress + 0x58 + nb_api * 8);
						/* Replace call by jmp dword ptr [idata_section] */
						ptr = (BYTE*)addr_call;
						*ptr = 0xFF;
						*(ptr + 1) = 0x25;
						memcpy(ptr + 2, &val, 4);
						/* Next api gets the next idata slot */
						nb_api++;
					}
				}
			}
		}
	}
	fclose(fp);
}

BOOL (__stdcall *Resume_GetVersionExA)(LPOSVERSIONINFO lpVersionInfo) = NULL;

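BOOL __stdcall Hook_GetVersionExA(LPOSVERSIONINFO lpVersionInfo)
{
	DWORD	return_addr;

	/* Read our own return address from the stack frame */
	__asm
	{
		mov eax, [ebp + 4]
		mov return_addr, eax
	}
	if (return_addr == 0x00405ABF)
	{
		MessageBoxA(NULL, "ready ?", "Ready ?", 0);
		/* Everything is unpacked : rebuild the redirections, then spin */
		fix_call();
		__asm jmp $
	}
	return Resume_GetVersionExA(lpVersionInfo);
}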

void	setup_hook(char *module, char *name_export, void *Hook_func, void *trampo)
{
	DWORD	OldProtect;
	DWORD	len;
	void	*Proc;

	Proc = (void*)GetProcAddress(GetModuleHandleA(module), name_export);
	if (!Proc)
	{
		MessageBoxA(NULL, name_export, module, 0);
		return;
	}
	/* Copy whole instructions until at least 5 bytes are covered */
	len = 0;
	while (len < 5)
		len += LDE((BYTE*)Proc + len , LDE_X86);
	memcpy(trampo, Proc, len);
	/* Trampoline : saved instructions followed by a jmp back to Proc + len */
	*(BYTE *)((BYTE*)trampo + len) = 0xE9;
	*(DWORD *)((BYTE*)trampo + len + 1) = (BYTE*)Proc - (BYTE*)trampo - 5;
	/* Overwrite the start of the api with a jmp to the hook */
	VirtualProtect(Proc, len, PAGE_EXECUTE_READWRITE, &OldProtect);
	*(BYTE*)Proc = 0xE9;
	*(DWORD*)((char*)Proc + 1) = (BYTE*)Hook_func - (BYTE*)Proc - 5;
	VirtualProtect(Proc, len, OldProtect, &OldProtect);
}

BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpReserved)
{
	/* Install the hook only once, when the dll is loaded */
	if (fdwReason == DLL_PROCESS_ATTACH)
	{
		Resume_GetVersionExA = (BOOL(__stdcall *)(LPOSVERSIONINFO))VirtualAlloc(0, 0x1000, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
		memset(Resume_GetVersionExA, 0x90, 0x1000);
		setup_hook("kernel32.dll", "GetVersionExA", &Hook_GetVersionExA, Resume_GetVersionExA);
	}
	return (1);
}
My code uses the Length Disassembly Engine from BeatriX.
Now we write a small injector in MASM that loads the dll into the target :
.386
.model flat,stdcall
option casemap:none

include \masm32\include\windows.inc 
include \masm32\include\kernel32.inc 
includelib \masm32\lib\kernel32.lib

.data
Kernel			db		"kernel32.dll", 0
LLib			db		"LoadLibraryA", 0
exe_name		db 		"03_unpackme.exe", 0
dll_name		db		"inject_unpackme03.dll", 0
Addr_name		dd		0

.data?
SInfo			STARTUPINFO <>
PInfo			PROCESS_INFORMATION <>

.code
start:
	invoke GetStartupInfo, addr SInfo 
	invoke CreateProcess, addr exe_name, NULL, NULL, NULL, FALSE,
						CREATE_SUSPENDED, NULL, NULL, addr SInfo, addr PInfo
	invoke VirtualAllocEx, PInfo.hProcess, 0, 100h, MEM_COMMIT, PAGE_READWRITE
	mov [Addr_name], eax
	invoke WriteProcessMemory, PInfo.hProcess, Addr_name, addr dll_name, LENGTHOF dll_name, NULL
	invoke GetModuleHandleA, addr Kernel
	invoke GetProcAddress, eax, addr LLib
	invoke CreateRemoteThread, PInfo.hProcess, NULL, 0, eax, [Addr_name], 0, NULL
	invoke WaitForSingleObject, eax, INFINITE
	invoke ResumeThread, PInfo.hThread
	invoke ExitProcess, 0
end start

So, as you can see, I used a little trick to wait until the whole executable is unpacked :
I set up a hook on GetVersionExA(), and if the call comes from one interesting address (near OEP), I call the "fix_call" function and enter an infinite loop.
Thanks to this infinite loop we can attach Olly to our process and watch the result :

That's cool, but wait, I forgot to talk about one thing: finding the real OEP !
Restart OllyDbg, let the loop xor the whole first stage, and set a breakpoint on :
0041196F    E8 C2010000     CALL 03_unpac.00411B36
Step into it and add a breakpoint on :
00411B61    FF95 4B1D4000   CALL DWORD PTR SS:[EBP+401D4B]
You should land here (the address can change due to allocated memory) :
00157C44  ^\FFA5 BB1F4000   JMP DWORD PTR SS:[EBP+401FBB]
Trace the code until you get something like this :
00153BC4    9D              POPFD
00153BC5    61              POPAD
00153BC6    5A              POP EDX
00153BC7    58              POP EAX
00153BC8    E8 D35F0000     CALL 00159BA0
If we step into, we will find something very interesting :
00159BA0    60              PUSHAD
00159BA1    E8 00000000     CALL 00159BA6
00159BA6    5D              POP EBP
00159BA7    81ED E21B4000   SUB EBP,401BE2
00159BAD    8B7424 20       MOV ESI,DWORD PTR SS:[ESP+20]
00159BB1    83EE 05         SUB ESI,5
00159BB4    8B9D 111C4000   MOV EBX,DWORD PTR SS:[EBP+401C11]
00159BBA    83EB 28         SUB EBX,28
00159BBD    83C3 28         ADD EBX,28
00159BC0    3973 10         CMP DWORD PTR DS:[EBX+10],ESI
00159BC3  ^ 75 F8           JNZ SHORT 00159BBD
00159BC5    8B73 08         MOV ESI,DWORD PTR DS:[EBX+8]
00159BC8    89B5 0C1C4000   MOV DWORD PTR SS:[EBP+401C0C],ESI
00159BCE    61              POPAD
00159BCF    68 DEC0ADDE     PUSH DEADC0DE
00159BD4    C3              RET
This is not api resolution, but call resolution !
This sub is quite simple: like the api resolution, it looks up the offset of the call in a table and replaces 0xDEADC0DE with the address of the (stolen ?) call.
I think (I am not sure) the packer has stolen some calls from the original file and rebuilds them with a push addr / ret.

Let's put a conditional log on the ret address ( Expression = "[esp]" ).
We run the program, exit it and look at the log.
00159BD4   COND: 004085D4
00159BD4   COND: 00402128
00159BD4   COND: 004014C4
00159BD4   COND: 0040212C
00159BD4   COND: 00402D44
00159BD4   COND: 004085D4
00159BD4   COND: 00403D20
00159BD4   COND: 00403FA4
00159BD4   COND: 00403394
00159BD4   COND: 004033A4
00159BD4   COND: 00402F2C
00159BD4   COND: 004085B6
00159BD4   COND: 004085AA
00159BD4   COND: 00405BD4
00159BD4   COND: 00405C18
00159BD4   COND: 004066AC
00159BD4   COND: 004066B4
00159BD4   COND: 00406978
00159BD4   COND: 004085D4
7E390000   Module C:\WINDOWS\system32\USER32.DLL
00159BD4   COND: 00405B84
77EF0000   Module C:\WINDOWS\system32\GDI32.dll
           Process terminated, exit code 0
If you remember the beginning of the article, the third memory breakpoint on access on the code section landed us on the first entry of this log. Is it the OEP ?
I don't think so, it's a call to GetModuleHandleA(), ... strange, ... strange.
If you look closely, there is another strange thing: just before the log entry for the loading of module "USER32.dll" we can see a call to 0x004085D4, but this call is just a redirection to GetModuleHandleA, so what happens in between ?
We restart our debugger, put a breakpoint on the ret of the call redirection function and wait until it reaches the last 0x004085D4.

We trace the code, enter "api address solving", put a breakpoint on the JMP EAX, trace into GetModuleHandleA() and execute till return.
We are back in the allocated memory code, and we trace it until we get :
00154119    9D              POPFD
0015411A    61              POPAD
0015411B    5A              POP EDX
0015411C    58              POP EAX
0015411D    FF56 18         CALL DWORD PTR DS:[ESI+18]
[ESI + 18] is equal to 0x00401000. Is it the real OEP ?
00401000    55              PUSH EBP
00401001    8BEC            MOV EBP,ESP
00401003    6A 00           PUSH 0
00401005    68 20104000     PUSH 03_unpac.00401020
0040100A    6A 00           PUSH 0
0040100C    68 E7030000     PUSH 3E7
00401011    8B45 08         MOV EAX,DWORD PTR SS:[EBP+8]
00401014    50              PUSH EAX
00401015    E8 56760000     CALL 03_unpac.00408670
0040101A    33C0            XOR EAX,EAX
0040101C    5D              POP EBP
0040101D    C2 1000         RET 10
CALL 03_unpac.00408670 goes through the api resolution and calls DialogBoxParamA().
But wait, the first argument is loaded into eax from [ebp+8], so this function needs an argument.
If we look at the MSDN documentation, the first parameter of DialogBoxParamA() is a handle to the module whose executable file contains the dialog box template.
So the parameter of this function should be the result of GetModuleHandleA(NULL) (this will be the first stolen-code fix).
A second problem: when we return from DialogBoxParamA and then from sub_00401000, we should return to a function which calls ExitProcess().
Launch the injector, attach Olly to the process, search for references to kernel32.ExitProcess, and we find this sub :
0040669C    55              PUSH EBP
0040669D    8BEC            MOV EBP,ESP
0040669F    8B45 08         MOV EAX,DWORD PTR SS:[EBP+8]
004066A2    50              PUSH EAX
004066A3    E8 F01E0000     CALL 03_unpac.00408598                   ; JMP to kernel32.ExitProcess
004066A8    5D              POP EBP
004066A9    C3              RET

Now we just have to find some room for the stolen bytes; right after all the jmp dword ptr [idata_section] is fine.

Now we can dump our process with our favorite dumper, fix the IAT with ImportRec and set the new OEP.

We test the dumped file and it works :].


As deroko said, this unpackme is really not difficult, but I enjoyed solving it.
