Debugging - WinDBG(X) Automation & Scripting - Part 1

Introduction

Welcome back!

I am a big fan of automation. In a way, automation is humanity’s most polite way of admitting:

“I understand this problem deeply enough to never want to think about it again.”

In my previous article on debugging, we covered the fundamentals of using WinDBG and WinDBGX.

That gave us the baseline needed to actually use the debugger.

Today, we take it a step further.

In addition to manually driving the debugger, we’re going to explore how to make it work for us — through automation, instrumentation, and scripting.

To keep the content digestible, I'm going to split the topic into 2 parts.

In part 1, we're going to look at a first set of possibilities:

  • WinDBG(X) 'first break' automation using the -c startup flag
  • Events & exceptions
  • Event-driven automation & instrumentation (breakpoints that execute commands)
  • Built-in scripting & automation (.if/.for/.foreach/aliases, command programs / script files)
  • Expression evaluators (MASM & C++)
  • PyKD (Python automation)

In a second part, we'll look at a few other topics, including:

  • Data Model, NatVis, JsProvider
  • WinDBG Extensions

I'm using a fully up-to-date x64 Windows 11 machine, and unless specified otherwise, all techniques discussed in this article will work on both WinDBG Classic and WinDBGX.
Please check out the post on WinDBG Fundamentals for information on how to install WinDBG Classic & WinDBGX.

First things first. Let's see what we can do with WinDBG's -c startup flag.

windbg(x).exe -c

Our journey begins right when windbg starts.

WinDBG usually gets control at an initial break before the target application really runs. Or, when attaching to an already running process, it will pause after attaching itself.
In any case, when you connect a debugger to a process, it initially ends up in a stopped state, due to a "break" event.
This is a normal part of the debugger lifecycle.

WinDBG Session initialization

WinDBG's -c command-line argument is the first and simplest step into automation. It allows you to execute debugger commands automatically when WinDBG hits that initial break.

As explained, by default this is when the Debugger runs and either launches a process, or gets attached to a process. In both cases, the debugger will end up paused, and that's the moment when the commands specified after -c get executed.

This allows you to deterministically prepare and configure your debugging environment right from the start. For example, you can:

  • load extensions
  • define aliases
  • set breakpoints
  • execute command files or scripts

In short, -c lets you bring the debugger into a known, ready-to-use state without manual interaction.

If you want the debugger to continue execution after running these commands, you can simply end your command sequence with ;g.

Example:

windbg.exe -c ".load pykd;g"

This will load the PyKD extension and immediately resume execution after the initial break.

WinDBG break-oriented automation

A powerful variation is to combine -c with the -g flag:

-c "windbg cmd1;windbg cmd2" : sets up the command(s) you would like to run.
-g tells WinDBG to ignore initial breakpoint exception and continue execution. It'll look like WinDBG 'skips' the initial break.

That changes the behavior significantly. Instead of stopping at startup, the application runs immediately. When the debugger does break later, it's usually for a meaningful reason - such as an access violation, memory corruption, a second-chance exception, etc.

This combination allows you to prepare the debugger up front and then automatically run analysis when a real break occurs during execution.

In practice, this means:

  • use -c to configure the debugger and define what should happen on break
  • use -g to let the application run until something interesting/meaningful happens

I typically combine this with the -xd sov and -xi eh startup flags:

  • SOV: Stack Overflow
  • EH: C++ Exceptions

These flags modify how the debugger reacts to specific exceptions (first-chance vs second-chance), effectively reducing noise from expected exceptions. We'll talk more about events & exceptions in the next chapter.

Applications sometimes throw exceptions (SOV = Stack Overflow, EH = C++ EH exceptions), that may be handled perfectly fine by the OS/application.
In other words, although the exception causes the debugger to break, these are often expected or handled exceptions, not indicative of a real issue.
If, after "ignoring" those exceptions, something breaks after all, you're still going to see it.

With break-oriented automation, the commands you execute are no longer about setup—they’re about analysis. Example actions may include:

  • running !exploitable to classify the crash
  • dumping the call stack
  • inspecting registers
  • examining heap/stack state
  • reviewing exception records
  • optionally quitting the debugging session

If you combine this with logging (e.g. using -logo path/to/logfile), you effectively automate crash triage. Every meaningful break results in a structured diagnostic output written to a log file, ready for later analysis.

Finally, if you're fully automating and scripting the end-to-end execution of an application (for instance during a fuzzing initiative), you also may want to use the following 2 extra startup flags to streamline everything and avoid any user interaction.

  • -Q (uppercase Q) : WinDBG Classic only : Suppresses the prompt to save the Workspace when WinDBG closes.
  • -G (uppercase G) : Ignores the final breakpoint at process termination, so that WinDBG exits when the process that is being debugged terminates.
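
To make this a bit more concrete, here's a minimal Python sketch that assembles such a command line. It only uses the startup flags discussed above. The function name, the paths and the selection of analysis commands (.lastevent, r, kb, .exr -1, q) are my own picks for illustration, so treat them as assumptions and adapt them to your setup.

```python
def build_triage_argv(target, log_path, debugger="windbg.exe", classic=True):
    """Assemble a WinDBG command line for unattended crash triage."""
    # Commands that run at the first real break (illustrative selection):
    # last event, registers, call stack, exception record, then quit.
    on_break = [".lastevent", "r", "kb", ".exr -1", "q"]

    argv = [debugger, "-g", "-G"]        # skip initial break, end with the process
    if classic:
        argv.append("-Q")                # WinDBG Classic only: no workspace prompt
    argv += ["-xd", "sov", "-xi", "eh"]  # reduce noise from expected exceptions
    argv += ["-logo", log_path]          # write the session output to a log file
    argv += ["-c", ";".join(on_break)]   # commands executed at the first break
    argv.append(target)                  # the application to launch
    return argv


if __name__ == "__main__":
    print(build_triage_argv(r"C:\tmp\target.exe", r"C:\tmp\crash.log"))
```

A fuzzing harness could pass the resulting list to subprocess.run() for every test case and collect the log files afterwards. If you have !exploitable installed, you could add it to the on_break list.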

WinDBGX -c and Startup Settings

The same -c startup flag is available in WinDBGX as well.
Additionally, WinDBGX has a "Startup" setting, allowing you to run commands each time you start a debugging session.
Open WinDBGX, click "Settings" and open "Debugging settings". Scroll down to the bottom, you'll find the "Startup" section:

[Screenshot: the "Startup" section in the WinDBGX debugging settings]

If you have specified commands with -c as well, you'll notice that those will be executed first (before the 'Startup commands').

Events & Exceptions

Definitions

In the context of WinDBG, events and exceptions are both notifications from the debuggee (the process being debugged) to the debugger, but they represent different categories of situations and are handled differently.

An event is any noteworthy occurrence during the execution of a process that the debugger is informed about. This includes things like process creation and exit, thread creation and exit, module (DLL) load and unload, and breakpoint hits. Events are part of the normal lifecycle and behavior of a program. They are expected, structured, and generally not indicative of something going wrong.

An exception is a specific type of event that indicates an abnormal condition or disruption in normal execution flow.
Exceptions are typically generated by the CPU or the operating system when something unusual happens, such as accessing invalid memory (access violation), executing an illegal instruction, dividing by zero, or triggering a breakpoint instruction.
Exceptions may or may not be handled by the application itself. If they are not handled, they can lead to program termination.

So the key difference is that all exceptions are events, but not all events are exceptions. Events describe what is happening, while exceptions describe something going wrong or out of the ordinary in execution.

Where do breakpoints belong?

As mentioned above, a “breakpoint” can show up in two different ways, and they map to different underlying mechanisms.

When I listed breakpoint hits under events, I was referring to debugger-managed breakpoints. These are breakpoints you explicitly set with commands like bp, bu, or ba. When one of those triggers, the debugger gets a debug event (specifically an EXCEPTION_DEBUG_EVENT internally, but treated as a controlled/debugger event). From your perspective, this is a normal, expected event that you asked for.

When I mentioned the breakpoint instruction under exceptions, I was referring to the CPU instruction int 3 (opcode 0xCC). When the CPU executes this instruction, it raises a breakpoint exception (STATUS_BREAKPOINT). This is a real exception generated by the processor, just like an access violation or divide-by-zero.

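Of course, and that's perhaps what makes this a bit confusing: when setting software breakpoints (bp/bu), the debugger will use the int 3 instruction to do so... (Hardware breakpoints set with ba rely on the CPU's debug registers instead, and surface as single-step exceptions.)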

So the distinction is this:

A debugger breakpoint (bp/bu/ba) is something the debugger sets up. It may be implemented by temporarily patching the code with an int 3 instruction, but conceptually it is a controlled debugger event. You asked for it, and WinDBG treats it as part of normal debugging flow.

A breakpoint exception (int 3) is something the program executes. It may come from:

  • actual code containing int 3 (for example, anti-debugging tricks or debug builds)
  • a breakpoint inserted by the debugger
  • other mechanisms like DebugBreak()

From the OS point of view, both cases generate an exception (STATUS_BREAKPOINT). The difference is intent and control:

  • If the debugger inserted it, WinDBG knows it is “its own” breakpoint and handles it as a breakpoint event.
  • If the program triggers it itself, it is just another exception, and WinDBG treats it according to your exception settings (sxe, sxd, sxi).

This is why you can do things like:

sxd bpe

and suddenly your manually set breakpoints appear to be “ignored” or behave differently, because under the hood they are still breakpoint exceptions.

So there is no real contradiction, just two layers:

  • Low level (OS/CPU): everything is an exception, including breakpoints (STATUS_BREAKPOINT)
  • Debugger abstraction (WinDBG): some of those exceptions are elevated to “events you asked for” (your breakpoints), while others remain “exceptions you need to decide how to handle”

That distinction is important for automation, because you can choose whether to hook the high-level debugger behavior (breakpoints you set) or the low-level exception mechanism (all breakpoint exceptions, regardless of origin).

Handling events & exceptions

WinDBG allows you to control how it reacts to both events and exceptions. This is where automation becomes powerful. Instead of manually responding to each situation, you can configure the debugger to break, ignore, log, or execute commands automatically when specific events or exceptions occur.

For exceptions, WinDBG uses the concept of first chance and second chance.
A first chance exception is the initial notification that an exception has occurred. The application is given a chance to handle it.
A second chance exception occurs if the application does not handle the exception, and at that point the debugger typically breaks because the program is about to crash.

You can configure how WinDBG responds using commands like sxe (break on first chance), sxd (break on second chance only), sxn (notify, but don't break), and sxi (ignore).

In WinDBG, you can access the settings via "Debug" - "Event Filters". (You'll need to be connected to a process to access the options)

[Screenshot: the WinDBG "Event Filters" dialog]

For each event/exception, you can define how WinDBG needs to respond and if you'd like to execute commands when something happens.
The GUI isn't great, and the events & exceptions are all grouped together.

WinDBGX has an improved GUI, accessible via "Settings" - "Events & exceptions"

[Screenshot: the WinDBGX "Events & exceptions" settings]

While Events & Exceptions are now listed separately, the GUI no longer seems to offer an easy way to link a command to a certain event or exception.

Of course, we can control behavior from the command line, which is probably what you're after if you're trying to automate the automation anyway 🙂

These are various commands to manage how events & exceptions are handled by WinDBG.
(You'll need to combine them with an event or exception type, I'll list those in a table later in this post)

Command | Meaning | Explanation
sxe | Break (first chance) | Configures WinDBG to break immediately when the specified event or exception occurs (first chance), before the application's own handlers run.
sxd | Break on second chance only | Configures WinDBG not to break on the first chance occurrence (a message is still printed) and to break only if the exception remains unhandled (second chance).
sxn | Notify only | Displays a message when the event or exception occurs, but does not break execution.
sxi | Ignore | Does not break and does not display a message when the event or exception occurs.
sxe -c "cmd" | Break + command | Executes the specified debugger command when the event/exception occurs (first chance), then breaks.
sxd -c "cmd" | Second chance break + command | Executes the command on the first chance occurrence without breaking; WinDBG still breaks on second chance. Use -c2 "cmd" to attach a command to the second chance.
sxn -c "cmd" | Notify + command | Executes the command and prints the notification without breaking execution.
sxi -c "cmd" | Ignore + command | Executes the command without a notification and without breaking execution.
sx | List current settings | Displays the current configuration for all event and exception handling rules.

For example, if you want to break immediately on access violations, you can use:

sxe av

If you want to skip first chance access violations and only break if they are unhandled, you can use:

sxd av

We can attach commands to events so that when they occur, WinDBG executes predefined debugger commands automatically.
This allows you to build event-driven automation.

For example, suppose you want to log register state every time an access violation occurs. You could do something like:

sxe -c ".printf \"Access violation at %p\\n\", @$ip; r" av

Now, whenever an access violation happens, WinDBG will automatically print the instruction pointer and dump the registers.

Events such as module loads can also be hooked. For example, to run commands whenever a DLL is loaded, you can use:

sxe -c ".printf \"Loaded module\\n\"; lm" ld

This tells WinDBG to execute the command string whenever a load DLL event occurs.
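
The nested quoting in these commands is easy to get wrong when you generate them from a script. Here's a small Python sketch that builds an sx command and escapes the inner command the same way the examples above do (a backslash before every double quote, and doubled backslashes). The helper names are hypothetical, and I'm assuming this escaping style is sufficient for the commands you want to attach.

```python
def quote_for_windbg(command):
    """Escape a command so it can be embedded inside a "..." argument."""
    # Backslashes first, otherwise the ones added for the quotes get doubled.
    return command.replace("\\", "\\\\").replace('"', '\\"')


def sx_command(status, event, command):
    """Build e.g. sxe -c "<command>" av for a given break status and event code."""
    if status not in ("sxe", "sxd", "sxn", "sxi"):
        raise ValueError("unknown break status: " + status)
    return '{} -c "{}" {}'.format(status, quote_for_windbg(command), event)


if __name__ == "__main__":
    # The inner command is written the way you would type it interactively.
    print(sx_command("sxe", "av", r'.printf "Access violation at %p\n", @$ip; r'))
    print(sx_command("sxe", "ld", r'.printf "Loaded module\n"; lm'))
```

Running it prints the same two commands as shown above, which makes it a convenient building block when the list of events comes from a configuration file.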

In practice, this means you can “steer” the debugger.
You can decide which situations matter, which ones should be ignored, and what actions should be taken automatically.
Instead of passively observing execution, you turn the debugger into an active instrument that reacts to the behavior of the target process in real time.

Common event & exception codes

Events

Code | Meaning | Explanation
ld | Load module | Triggered when a module (DLL or EXE) is loaded into the process.
ud | Unload module | Triggered when a module is unloaded from the process.
ct | Create thread | Occurs when a new thread is created in the debuggee.
et | Exit thread | Occurs when a thread exits.
cpr | Create process | Triggered when a process is created or attached to the debugger.
epr | Exit process | Occurs when the debugged process terminates.
out | Debug output | Triggered when the application calls OutputDebugString().

Exceptions

Code | Meaning | Explanation
av | Access violation | Triggered when the process attempts to read, write, or execute invalid memory. Most relevant exception for exploit development.
bpe | Breakpoint | Raised when a breakpoint instruction (int 3) is executed or when a debugger breakpoint is hit.
gp | Guard page violation | Occurs when accessing a guard page. Often used by the OS for stack growth or heap protection mechanisms.
sse | Single step | Generated when the CPU trap flag is set or when hardware breakpoints trigger.
ibp | Initial breakpoint | The breakpoint hit automatically when a process starts under the debugger.
eh | C++ EH exception | Raised when C++ code throws an exception. Often handled perfectly fine by the application itself.
ii | Illegal instruction | Raised when the CPU encounters an invalid or unsupported instruction.
dz | Divide by zero | Occurs when an integer division by zero is attempted.
iov | Integer overflow | Integer overflow exception. Rarely used in practice.
sov | Stack overflow | Raised when the thread stack exceeds its allocated limits.
dm | Data misalignment | Occurs when accessing unaligned memory. Mostly irrelevant on x86/x64.
ip | In-page error | Memory access failed due to paging or I/O issues.
c0000374 | Heap corruption | Raised by the Windows heap manager when corruption is detected. There is no named code for it; use the numeric exception code (STATUS_HEAP_CORRUPTION).
cce | Control-C | Debugger interrupt initiated manually by the user.

Event-driven automation & instrumentation

In the previous post, I introduced the mechanics of using a breakpoint to execute WinDBG commands.
As explained at that time, event-driven debugging becomes powerful when you stop using a breakpoint to pause, and start using it to observe, annotate, classify, log, and steer execution.

In the previous chapter, we had a closer look at the system of events & exceptions, and we learned how to link WinDBG commands to certain events & exceptions.
I would like to take the opportunity to dive a little deeper into the use of breakpoints, for various purposes.

Telemetry

You could use breakpoints to gather telemetry, statistics and dynamic insights on execution, for instance:

  • log every call to a specified API with selected arguments
  • count how often a code path executes
  • record call-site addresses to build a frequency map
  • observe allocator behavior by logging heap alloc/free routines
  • document code paths by systematically logging CALLs and their arguments
  • record the saved return pointer when CALLs are made to certain APIs, which tells you where those calls are made from
  • fill/overwrite the contents of a heap allocation with the f command, so you can see where it's being used, or whether it's being initialized at all. (Imagine not having access to page heap and wanting to find uninitialized memory access bugs)
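
As a small illustration of the first few items, here's a Python sketch that generates logging breakpoints for a list of functions. Each breakpoint prints the function that was hit and the saved return address (both resolved to symbols via %y), then resumes execution. I'm using the architecture-aware pseudo-registers @$ip and @$ra; the helper names and the example symbols are just assumptions for the sake of the example.

```python
def call_logger(symbol):
    """Return a bp command that logs the function hit and its caller, then resumes."""
    # @$ip = current instruction pointer, @$ra = return address on the stack.
    # At function entry, @$ra tells us where the call came from.
    action = r'.printf \"%y called from %y\\n\", @$ip, @$ra; g'
    return 'bp {} "{}"'.format(symbol, action)


def call_loggers(symbols):
    """One logging breakpoint per symbol."""
    return [call_logger(symbol) for symbol in symbols]


if __name__ == "__main__":
    for command in call_loggers(["kernel32!CreateFileW", "ntdll!RtlAllocateHeap"]):
        print(command)
```

The generated lines can be pasted into the debugger. Combined with a log file, the "called from" part gives you the raw data for a call-site frequency map.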

Conditional context harvesting

Taking it one step further, the use of conditions could help reduce noise and be more specific, for example:

  • log only allocations within a certain range
  • log when one of the arguments is a pointer to your input
  • log when the contents of a to-be-freed heap chunk contain a specific fill pattern
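
Here's a minimal sketch of the first idea, generated from Python. It builds a conditional breakpoint using the .if / .else construct: when the condition matches, the breakpoint logs some context and continues; otherwise it continues silently. I'm interpreting "range" as a size range, and I'm assuming an x64 target where the size argument of ntdll!RtlAllocateHeap arrives in r8 (third argument). The function name and the logged commands are hypothetical choices.

```python
def size_range_logger(symbol, size_reg, low, high):
    """Build a bp command that only logs when low <= size register <= high."""
    # MASM expression: both comparisons must hold (explicit parentheses).
    condition = "(@{0} >= 0x{1:x}) & (@{0} <= 0x{2:x})".format(size_reg, low, high)
    # What to log on a match: a marker, the size register, a short call stack.
    # gc resumes execution in the same way the breakpoint was reached.
    on_match = ".echo allocation in range; r @{}; kb 3; gc".format(size_reg)
    return 'bp {} ".if ({}) {{ {} }} .else {{ gc }}"'.format(symbol, condition, on_match)


if __name__ == "__main__":
    print(size_range_logger("ntdll!RtlAllocateHeap", "r8", 0x100, 0x200))
```

The same pattern works for the other two bullets: only the condition changes (for instance comparing an argument register against the address of your input, or comparing poi() of the chunk against a fill pattern).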

Debugger instrumentation

Theory

Debugger instrumentation is about making the debugger react to events triggered from within the application itself. Instead of passively observing execution, you actively use the application’s behavior to control what the debugger does—and when.

A simple and effective technique is to use breakpoints as control points.

You deliberately trigger a known function or code path in the application, and place a breakpoint on it in the debugger. When that breakpoint is hit, it doesn’t just pause execution—it performs actions inside the debugger, such as enabling or disabling other breakpoints.

This gives you precise control over when certain things happen.

For example:

  • start logging heap dynamics from a specific point in the application
  • stop the logging again when all information has been gathered

Using scripting to drive instrumentation

While not strictly required, having access to a scripting language (JavaScript in a browser, scripting inside a PDF, etc.) makes this approach much easier to implement. Scripting gives you control over timing and usually makes it easier to define the control breakpoints we're going to trigger.

You can:

  • execute a specific function at an exact moment
  • trigger a predictable code path
  • reliably hit a breakpoint tied to that function

That breakpoint then acts as a bridge between the application and the debugger:

  • the script triggers the breakpoint
  • the breakpoint triggers debugger actions

In effect, you are instrumenting the debugger from inside the application.

Implementing debugger instrumentation: 2 sets of breakpoints

The approach typically relies on two distinct sets of breakpoints.

Set 1: Trigger breakpoints

These are hit as a direct result of application behavior:

  • normal code flow, and/or
  • explicit function calls (e.g. from a scripting engine)

Their role is simple: act as triggers.
We typically need a trigger to enable, and a trigger to disable. Additional scenarios might involve passing a string as an argument in the scripting language, and picking it up/printing it in the debugger session. Finally, we could also make the debugger simply break.

Set 2: Action Breakpoints

These breakpoints perform the actual work, for example:

  • logging
  • memory inspection
  • analysis commands

These usually start out disabled, and have predictable breakpoint IDs.

Workflow

This is how it works:

  1. A Set 1 breakpoint is triggered (e.g. via a scripted function call)
  2. That breakpoint executes commands in the debugger
  3. Those commands enable or disable one or more Set 2 breakpoints
  4. Execution continues

You may use:

  • one trigger to enable instrumentation
  • another trigger to disable it again
  • additional triggers as needed (break the debugger, pick up a string and print it on the screen, etc)

This gives you fine-grained, runtime control with minimal overhead.
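
To make the workflow tangible, here's a Python sketch that generates the WinDBG commands for both sets. Action breakpoints get explicit, predictable IDs (bp10, bp11, ...) and are disabled right away with bd. The two trigger breakpoints then enable (be) or disable (bd) the whole set and continue. The symbol names in the usage example are assumptions: Builtins_MathCos is the trigger we'll locate further down, Builtins_MathSin is a hypothetical second trigger, and the heap routines with "r @r8; g" are just placeholder actions.

```python
def instrumentation_commands(action_bps, enable_trigger, disable_trigger, first_id=10):
    """Generate commands for trigger (Set 1) and action (Set 2) breakpoints.

    action_bps is a list of (symbol, command) pairs. The commands must not
    contain double quotes, because they are embedded as-is.
    """
    lines = []
    ids = []
    for offset, (symbol, command) in enumerate(action_bps):
        bp_id = first_id + offset
        ids.append(str(bp_id))
        lines.append('bp{} {} "{}"'.format(bp_id, symbol, command))  # Set 2
        lines.append("bd {}".format(bp_id))                          # starts disabled
    id_list = " ".join(ids)
    # Set 1: triggers that switch the action breakpoints on and off.
    lines.append('bp {} "be {}; g"'.format(enable_trigger, id_list))
    lines.append('bp {} "bd {}; g"'.format(disable_trigger, id_list))
    return lines


if __name__ == "__main__":
    actions = [("ntdll!RtlAllocateHeap", "r @r8; g"),
               ("ntdll!RtlFreeHeap", "r @r8; g")]
    for line in instrumentation_commands(actions,
                                         "msedge!Builtins_MathCos",
                                         "msedge!Builtins_MathSin"):
        print(line)
```

With these in place, calling Math.cos() from the script would start the heap logging and Math.sin() would stop it again, assuming both symbols resolve in your target.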

The main challenge is to find the triggers, to identify the "Set 1" breakpoints.

They should meet two key criteria:

  1. Controlled: They should not trigger unless you explicitly invoke them
  2. Reachable: They must be easy to trigger when needed.

Step by step:

  • Identify a callable function in the scripting language or application, ideally something that has no side effects on your exploit
  • Locate the corresponding native function in the debugger. (With symbols, we may be able to search for names or keywords. Without symbols, we may have to trace execution or identify the functions based on behavior.)
  • Set a breakpoint on that function.
  • Attach commands to that breakpoint. (enable, disable, print, stop, etc)
  • Trigger the function from the scripting engine when needed

With symbols (and if the symbols follow reasonably sensible naming conventions), this may be relatively straightforward.
You could search for certain keywords and set breakpoints directly. Using a use case, we can then see which ones get hit when you run the function statement in your scripting language.

Without symbols, you'd have to either trace what happens when you execute a certain function call in the scripting language, or you could try to "find" the function based on what it does.

Let's look at both scenarios.

Finding application functions through symbols

Historically, a popular approach in applications that have a scripting environment has been to use Math functions as triggers.
Calling a cos(), sin() or tan() function usually plays no active role in triggering a vulnerability. Its impact on heap layouts may be limited as well (you still have to check!!).
Of course, we still need to find their position in the application binaries. That may be relatively easy if the application has symbols and if the symbols (naming conventions) make sense.

Let's take Microsoft Edge as an example. Let's attach WinDBGX to the msedge.exe process that corresponds with a browser tab.

I could now consider doing some searches. Let's say I'd like to find the math.cos() function in one of the Edge binaries.
(I'm aware that, in the example below, I am assuming that the module I need contains the word "edge". In reality, if you're not sure, you may have to perform a search in ALL loaded modules and simply put breakpoints on everything. For instance: x *!*math*cos*)

Anyway, in order to save some time (and to avoid the download of the symbols for all DLLs in your process), I'll begin the search by looking at module names that contain the word edge. I may be right, I may be wrong. We'll see.

0:018> x *edge*!*math*cos*
00000226`c727a880 msedge!v8::internal::maglev::MaglevGraphBuilder::TryReduceMathAcosh (void)
00000226`c727a6e0 msedge!v8::internal::maglev::MaglevGraphBuilder::TryReduceMathAcos (void)
00000226`d055668e msedge!libm::math::k_cos::k_cos (void)
00000226`cc6173d0 msedge!std::__Cr::__math::cos (void)
00000226`c30d7d10 msedge!v8::internal::maglev::MaglevGraphBuilder::TryReduceMathCos (void)
00000226`c727b0a0 msedge!v8::internal::maglev::MaglevGraphBuilder::TryReduceMathCosh (void)
00000226`c1e13240 msedge!Builtins_MathAcos (Builtins_MathAcos)
00000226`d05559f1 msedge!RNvNtNtCsgdwlvZkgXt4_4libm4math3cos3cos (_RNvNtNtCsgdwlvZkgXt4_4libm4math3cos3cos)
00000226`c1e13380 msedge!Builtins_MathAcosh (Builtins_MathAcosh)
00000226`d0555e37 msedge!RNvNtNtCsgdwlvZkgXt4_4libm4math4acos4acos (_RNvNtNtCsgdwlvZkgXt4_4libm4math4acos4acos)
00000226`c1e13dc0 msedge!Builtins_MathCos (Builtins_MathCos)
00000226`c1e13f40 msedge!Builtins_MathCosh (Builtins_MathCosh)

That looks promising. We could very easily set mass-breakpoints and turn this entire list into a simple logging mechanism. We'll do that in a moment.
The idea is to have the application open a use case, which triggers the Math function that I'm trying to find, and to attach WinDBG to the right process, so we can activate the breakpoints in that process. Sounds logical, but requires a bit of attention with applications like modern browsers.

Let's begin by making the use case, which is just a small html file with a bit of javascript.

Create a file test.html, for example inside folder c:\tmp

<html>
<script>
Math.cos(0);
</script>
</html>

I usually run a small python webserver in the folder that contains the html file.
Open a command prompt, go to the folder where you placed the html file and run this python oneliner

If you're using python2:

python -m SimpleHTTPServer 8080

If you're using python3:

python3 -m http.server 8080

or if you want to invoke a specific Python(3) version, installed via Python Install Manager:

py -3.9-64 -m http.server 8080

Open a new instance of Microsoft Edge and in one of the tabs, enter http://127.0.0.1:8080.
You should see the contents of the folder where your use case html file is located. Don't click or open it yet.

Now open Task Manager, select "Processes" on the left, look at the "Apps" and open the section for "Microsoft Edge"

[Screenshot: Task Manager, the "Microsoft Edge" section under "Apps"]

Find the line that corresponds with the Tab that is accessing http://127.0.0.1:8080.
Right-click on that line, select "Go to details"

[Screenshot: Task Manager, "Go to details" for the Edge tab]

That should give you the PID of that tab.

Alternatively, you can also look for msedge.exe processes that are marked as "renderer".

Sometimes however you'll see more than one, even with just one tab open.

The following powershell one-liner will at least list the msedge.exe processes that have a reference to "renderer":

powershell -command "Get-CimInstance Win32_Process -Filter \"Name='msedge.exe'\" | ? { $_.CommandLine -match '--type=renderer' } | select ProcessId,CommandLine"

Anyway, I'll assume you know how to get the PID of the tab.

Now launch WinDBGX and attach it to that pid:

windbgx -p PID

You can now set the mass breakpoints:

bm *edge*!*math*cos* ".printf \"%y called\\n\", @$ip;g"
0: 00000226`c727a880 @!"msedge!v8::internal::maglev::MaglevGraphBuilder::TryReduceMathAcosh"
1: 00000226`c727a6e0 @!"msedge!v8::internal::maglev::MaglevGraphBuilder::TryReduceMathAcos"
2: 00000226`d055668e @!"msedge!libm::math::k_cos::k_cos"
3: 00000226`cc6173d0 @!"msedge!std::__Cr::__math::cos"
4: 00000226`c30d7d10 @!"msedge!v8::internal::maglev::MaglevGraphBuilder::TryReduceMathCos"
5: 00000226`c727b0a0 @!"msedge!v8::internal::maglev::MaglevGraphBuilder::TryReduceMathCosh"
6: 00000226`c1e13240 @!"msedge!Builtins_MathAcos"
7: 00000226`d05559f1 @!"msedge!RNvNtNtCsgdwlvZkgXt4_4libm4math3cos3cos"
8: 00000226`c1e13380 @!"msedge!Builtins_MathAcosh"
9: 00000226`d0555e37 @!"msedge!RNvNtNtCsgdwlvZkgXt4_4libm4math4acos4acos"
10: 00000226`c1e13dc0 @!"msedge!Builtins_MathCos"
11: 00000226`c1e13f40 @!"msedge!Builtins_MathCosh"

The %y format specifier will perform a symbol lookup and print the address as well as the symbol name, so you get to see what gets called.
I like to use $ip as opposed to eip or rip. $ip is a pseudo-register that is architecture-aware.

Now let the process run:

0:000> g

Go back to the browser, use the already open tab and click on the use-case html file.
Provided that you're attached to the right process, you should now see:

msedge!Builtins_MathCos (00000233`00e13dc0) called

Cool! You can now set a breakpoint just at msedge!Builtins_MathCos and it will get hit when the Math.cos(0); gets executed.
That gives you a lot of control.
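
If you log this output to a file, a few lines of Python are enough to turn it into a hit count per function. This is a rough sketch that assumes the exact "symbol (address) called" format produced by the .printf statement above; the regular expression and function name are my own, and lines that don't match (module loads, prompts, etc.) are simply skipped.

```python
import re
from collections import Counter

# Matches lines such as: msedge!Builtins_MathCos (00000233`00e13dc0) called
HIT = re.compile(r"^(?P<symbol>\S.*?) \((?P<address>[0-9a-fA-F`]+)\) called$")


def count_hits(log_lines):
    """Count how often each symbol shows up in the breakpoint log."""
    counts = Counter()
    for line in log_lines:
        match = HIT.match(line.strip())
        if match:
            counts[match.group("symbol")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "msedge!Builtins_MathCos (00000233`00e13dc0) called",
        "ModLoad: some unrelated debugger output",
        "msedge!Builtins_MathCos (00000233`00e13dc0) called",
    ]
    for symbol, hits in count_hits(sample).most_common():
        print(hits, symbol)
```

The same idea scales to the mass-breakpoint scenario: set breakpoints on everything that looks relevant, run the use case, and let the counter tell you which functions actually got called.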

Quick note before we proceed. Always double-check the full module name.
In Microsoft Edge, there may be a msedge.dll as well as a msedge.exe file loaded in the process. The output from the x and bm commands above does not show the file extension.

In fact, in this case, the Builtins_MathCos function is inside msedge.dll, not msedge.exe:

0:053> lm a msedge!Builtins_MathCos
Browse full module list
start             end                 module name
000001c4`d83a0000 000001c4`eb180000   msedge     (pdb symbols)          C:\ProgramData\Dbg\sym\msedge.dll.pdb\6640F030371CFBB74C4C44205044422E1\msedge.dll.pdb

0:053> !address msedge!Builtins_MathCos

Usage:                  Image
Base Address:           000001c4`d83a1000
End Address:            000001c4`e7905000
Region Size:            00000000`0f564000 ( 245.391 MB)
State:                  00001000          MEM_COMMIT
Protect:                00000020          PAGE_EXECUTE_READ
Type:                   01000000          MEM_IMAGE
Allocation Base:        000001c4`d83a0000
Allocation Protect:     00000080          PAGE_EXECUTE_WRITECOPY
Image Path:             C:\Program Files (x86)\Microsoft\Edge\Application\146.0.3856.62\msedge.dll
Module Name:            msedge
Loaded Image Name:      C:\Program Files (x86)\Microsoft\Edge\Application\146.0.3856.62\msedge.dll
Mapped Image Name:      
More info:              lmv m msedge
More info:              !lmi msedge
More info:              ln 0x1c4d91b3dc0
More info:              !dh 0x1c4d83a0000

Content source: 1 (target), length: e751240

On my system, the Builtins_MathCos function sits at offset 00e13dc0 from the start of msedge.dll:


0:053> ? msedge!Builtins_MathCos - msedge
Evaluate expression: 14761408 = 00000000`00e13dc0

(Please take a moment to calculate the correct offset on your machine, we'll need it later on)
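
The arithmetic is simple, but since module bases change between sessions, it helps to see it spelled out. The Python sketch below recomputes the offset from the addresses in the output above, and then applies it to a different base. The second base (00000233`00000000) is an assumption on my side, chosen because it is consistent with the address that showed up in the earlier breakpoint output; the helper names are hypothetical.

```python
def parse_addr(text):
    """Convert a WinDBG-style address (with optional backtick) to an int."""
    return int(text.replace("`", ""), 16)


def format_addr(value):
    """Format an int as a 64-bit WinDBG-style address with a backtick."""
    digits = "{:016x}".format(value)
    return digits[:8] + "`" + digits[8:]


def offset_in_module(symbol_addr, module_base):
    """Offset of a symbol from the start of its module."""
    return parse_addr(symbol_addr) - parse_addr(module_base)


def rebase(module_base, offset):
    """Address of the same function when the module is loaded elsewhere."""
    return format_addr(parse_addr(module_base) + offset)


if __name__ == "__main__":
    # Values taken from the lm / ln output above.
    offset = offset_in_module("000001c4`d91b3dc0", "000001c4`d83a0000")
    print(hex(offset), offset)
    # Assumed base for another session of the same msedge.dll build.
    print(rebase("00000233`00000000", offset))
```

Keep in mind that the offset is only valid for this exact build of msedge.dll. After a browser update, you'll have to determine it again.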

We have found a first trigger. Of course, you can keep searching for others (sin(), tan(), etc). I have provided a list of common Math functions later in this chapter.

Finding Math functions without symbols

Without symbols, it's a bit more challenging. After all, the Math functions you're trying to use in a particular application may be based on some sort of custom implementation.
Trying to find or identify the corresponding function in memory by looking for a specific byte sequence that would act as some sort of "signature" may not be an option.

Often tasked with this challenge, especially in applications that don't have symbols, I decided to build a Frida script called corelan_trigscan.py. It attempts to find functions that are math-heavy (functions that contain a certain density of instructions that might indicate some sort of cos, sin or tan function), and prints out WinDBG-compatible breakpoint statements so we can see which one(s) are used.

Of course, it's all going to be based on heuristics, density and variables that make the script hit or miss. That said, I have been quite successful in finding certain math-related functions in binaries that had zero symbols.
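
To give you a feel for what such a density heuristic could look like, here's a toy Python sketch. To be clear: this is not the code from corelan_trigscan.py. It simply takes a list of instruction mnemonics for one function (which you'd get from a disassembler) and checks what fraction of them are floating-point arithmetic instructions. The mnemonic set, the minimum size and the threshold are values I made up for illustration.

```python
# Floating-point arithmetic mnemonics (SSE scalar and x87), illustrative subset.
FP_MNEMONICS = {
    "addsd", "subsd", "mulsd", "divsd", "sqrtsd",
    "addss", "subss", "mulss", "divss", "sqrtss",
    "fadd", "fsub", "fmul", "fdiv", "fsin", "fcos", "fptan", "fsincos",
}


def fp_density(mnemonics):
    """Fraction of instructions that are floating-point arithmetic."""
    if not mnemonics:
        return 0.0
    hits = sum(1 for m in mnemonics if m.lower() in FP_MNEMONICS)
    return hits / len(mnemonics)


def looks_math_heavy(mnemonics, min_instructions=8, threshold=0.4):
    """Flag functions that are big enough and dense enough to be interesting."""
    return len(mnemonics) >= min_instructions and fp_density(mnemonics) >= threshold


if __name__ == "__main__":
    candidate = ["push", "movsd", "mulsd", "addsd", "mulsd",
                 "subsd", "mulsd", "addsd", "movsd", "ret"]
    print(fp_density(candidate), looks_math_heavy(candidate))
```

A polynomial approximation of cos() or sin() tends to consist of long runs of multiply/add instructions, which is why a density check like this has a chance of working. The minimum size avoids flagging tiny helper functions.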

Let's see if the script would find the Builtins_MathCos function in msedge.dll, and possibly/ideally other Math functions as well.

Installing Python3 and Frida Python Bindings

First of all, we'll need to install Python3, and Frida Python bindings to make the script work.
(I found it a bit easier to use Python to create the Frida script that gets injected, and then process the output, than to try to do everything in frida directly)

This step requires installing Python3. If you are using Immunity Debugger or WinDBG Classic with a working mona.py installation, then your default python version is probably Python 2.7.x 32bit. If you change the default Python version to Python3 (by putting its folder in the PATH environment variable before the Python2 folder), mona.py and other similar scripts will stop working.

There are 2 main ways to install Python3: through the good-ol'-trusted standalone installer, or by using the Python Install Manager.
I am currently working on making mona.py run with Python3. The most recent version of PyKD is compatible with Python 3.9. When we get to the chapter on PyKD, I'll explain how to install that version specifically. For now, and for the sake of running Frida, you can just take the most recent version of Python if you'd like. At the time of writing, it's Python 3.14.3.

In any case, if you do care about running mona.py, then please do NOT install Python through the Python Install Manager. Remove the Python Install Manager if you already have it, and install the required Python versions via the standalone installers.

Installing Python3 - Standalone installer

Download the package from the Python website and run a default installation. Again, pick whatever recent version you'd like.
Leave "Install Launcher for all users" enabled
Do NOT check the "Add Python 3.x to PATH" option. Leave it unchecked.
(We're going to use the Python Launcher anyway.)

By default, the Python version will be installed as a folder inside your %LOCALAPPDATA%\Programs\Python folder.
After installing Python3, open an admin command prompt and check if the installation was successful.
The Python Launcher should have installed the py.exe binary inside the c:\windows folder. It should show up as the first one when you run where py:

C:\>where py
C:\Windows\py.exe

The Python launcher (installed through the Python3 standalone installer) should be able to find the Python version(s) that you have installed on your system.
I have 3 versions on mine, the output may be different on yours:

py --list
Installed Pythons found by py Launcher for Windows
 -3.9-64 *
 -3.9-32
 -2.7-32

Consequently winget should only show those versions. If you ever want to get a recent version of PyKD to work, you'll need to remove any python versions and Python Launcher that were installed via the MS Store.

C:\>winget list python
Name                   Id                Version   Available Source
-------------------------------------------------------------------
Python Launcher        Python.Launcher   < 3.9.8   3.13.5    winget
Python 2.7.18          Python.Python.2   2.7.18150           winget
Python 3.9.13 (32-bit) Python.Python.3.9 3.9.13              winget
Python 3.9.13 (64-bit) Python.Python.3.9 3.9.13              winget

Good.

Test if the versions work:

C:\>py -3.9-32
Python 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:24:45) [MSC v.1929 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>quit()

and

C:\>py -3.9-64
Python 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()

Let's update pip for both Python 3.9.13 versions:

Run

py -3.9-32 -m pip install --upgrade pip

and

py -3.9-64 -m pip install --upgrade pip

We're going to install the Frida Tools and Frida Python Bindings.
Let's say we plan on using Python 3.9.13 64bit, so we'll run this command:

C:\>py -3.9-64 -m pip install frida-tools
Collecting frida-tools
  Downloading frida_tools-14.8.0.tar.gz (4.7 MB)
...

It's a good idea to check for updates to frida-tools from time to time:

C:\>py -3.9-64 -m pip install frida-tools --upgrade

Running corelan_trigscan.py

I'll obviously have to run the Python version that has the frida bindings installed, so I'll be running py -3.9-64
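Before launching the scan, it can be worth confirming that the interpreter you picked is the one you expect, and that it can actually see the Frida bindings. The snippet below is a minimal sketch that uses only the Python standard library; the function name is hypothetical and it is not part of corelan_trigscan.py.

```python
import importlib.util
import struct
import sys


def describe_interpreter():
    """Return (version string, pointer size in bits, frida importable?)."""
    version = "{}.{}.{}".format(*sys.version_info[:3])
    # Size of a native pointer: 4 bytes on 32-bit Python, 8 bytes on 64-bit.
    bits = struct.calcsize("P") * 8
    # find_spec() only locates the package, it does not import it.
    has_frida = importlib.util.find_spec("frida") is not None
    return version, bits, has_frida


version, bits, has_frida = describe_interpreter()
print(f"Python {version} ({bits}-bit), frida bindings importable: {has_frida}")
```

Save it under any name you like and run it through the launcher with the same selector you plan to use for the scan, for instance py -3.9-64 followed by the file name.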

Let's run the script.

Students of recent Corelan Heap classes may already have seen a previous version of this corelan_trigscan.py script inside their HeapMgmt / Scripts folder.
In preparation for this blogpost, I have updated it quite a bit.
Today, I'm happy to share the latest version of this (previously private) script with the world.

You can download a copy of the script from the Github repository that complements this series of blogposts on debugging.

Check out the "debugging" folder, and then look inside the "scripts", "frida" folder.

The script takes a number of arguments:

-h / --help         Show help message
-p / --process      Process name or PID (e.g. MyApp.exe or 1234)
-m / --module       Module name to scan (e.g. MyApp.exe or a DLL).
                    If omitted, the process main image module will be used.
--min-density       Minimum relevant-insn density (default: 0.6)
--min-relevant      Minimum number of relevant instructions (default: 40)
--min-total         Minimum total instructions in function+helpers (default: 40)
--min-trig          Minimum trig-count (trig column) (default: 0)
--min-ical          Minimum indirect call count (default: 0)
--max-helper-insns  Max instructions to scan in each helper block (default: 256)
--max-func-insns    Max instructions to scan in main function (default: 4096)
--limit-bp          Maximum number of printed/emitted breakpoints (default: 1000)
--check-offset      Optional sanity-check offset within module (hex or dec).
--splitsize         Number of breakpoints per numbered .bps file (default: 500)

In human language, this is what the arguments mean:

--min-density : How 'math' heavy the function needs to be. It takes the number of relevant math instructions and divides it by the total number of instructions
--min-relevant : Minimum number of math-like instructions a function must contain
--min-total : Minimum total number of instructions the function must have
--min-trig : Minimum number of "strong trig signals". (fsin, fcos, sqrtsd, psrlq, pinsrw, etc)
--min-ical : Minimum number of indirect calls. When you lower the density and relevance arguments (i.e. include functions with a low number of math-heavy instructions), it's logical to expect that these small functions call other functions. The code reached through direct calls is included as "part of the function". Indirect calls aren't (there is no way to follow them). Setting --min-ical to 1 in combination with lower density & relevance arguments will help reduce a lot of noise.
--max-helper-insns : How many instructions the script is allowed to inspect in a directly called helper block
--max-func-insns : How many instructions the script is allowed to inspect in the main function
--limit-bp : How many breakpoints get printed and written to the log. It does not affect scanning or analysis, it only limits the number of breakpoint statements
--splitsize : Breakpoints are written in chunks of 'splitsize' lines to numbered .bps files. 500 is the default
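To make the interaction between these thresholds a bit more tangible, here is a minimal sketch of how such a filter could be expressed. It is an illustration, not the actual code of corelan_trigscan.py: the Candidate class and the passes() function are hypothetical, and I'm assuming that every threshold is an inclusive "at least" check.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    relevant: int        # number of math-like instructions
    total: int           # all instructions in the function + direct helpers
    trig: int            # number of "strong trig signals"
    indirect_calls: int  # number of indirect calls

    @property
    def density(self) -> float:
        # density = relevant instructions / total instructions
        return self.relevant / self.total if self.total else 0.0


def passes(c, min_density=0.6, min_relevant=40, min_total=40,
           min_trig=0, min_ical=0):
    """True if the candidate meets every minimum (assumed inclusive)."""
    return (c.density >= min_density
            and c.relevant >= min_relevant
            and c.total >= min_total
            and c.trig >= min_trig
            and c.indirect_calls >= min_ical)


# Two illustrative candidates: a math-dense routine and a thin wrapper.
dense = Candidate(relevant=146, total=167, trig=2, indirect_calls=0)
wrapper = Candidate(relevant=8, total=50, trig=0, indirect_calls=1)

for name, cand in (("dense", dense), ("wrapper", wrapper)):
    print(name, round(cand.density, 3), passes(cand, 0.5, 50, 20, 1))
```

The wrapper-style candidate only survives when density and relevance are lowered, which is exactly when filtering on indirect calls starts to pay off.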

The arguments that matter the most are:

-p processName or pid
-m modulename.dll/exe
--min-density 0.5
--min-relevant 40
--min-total 40

You can tweak the density, relevance, total and trig values, but the values above are a good starting point for applications where the Math functions are very obvious.
In my experience, in browsers, we often have to lower the density & relevance values quite a bit (which means we'll get a much longer list of breakpoints to work with).
In other words, you could definitely run the script with values as low as --min-density 0.1 --min-relevant 5 --min-total 20 --limit-bp * and simply activate all breakpoints in WinDBG. WinDBG might start to heat up a little though, and you may need to take some time off to process the results 😉

The challenge is always: how can I reduce the results and make them more meaningful without removing the actual function I'm looking for? What are the criteria that allow the script to determine if a function is a good candidate or not?

You can initially add --min-trig 1 to further reduce the volume of results. This limits the candidates to functions that are very obviously Math heavy (a trig value larger than 0). That doesn't mean the list will include the ones you need, but it's a good starting point. If it doesn't, you may have to lower the density and relevance values to widen the scope of what is considered an interesting function.

When the script runs, it will create a folder in the current working folder, that has the name of the module you're scraping. It will then write analysis into a corelan_trigscan.log file, and all WinDBG compatible breakpoint statements to corelan_trigscan.bps. Finally, it will break the list of breakpoints into chunks of 500, and make individual numbered files, making it easier for you to process them in smaller batches.
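The chunking itself is simple. The sketch below shows one way to split a list of breakpoint statements into numbered batches; the helper names are hypothetical, the breakpoint lines are placeholders, and the file name pattern is an assumption based on the numbered file names the script reports.

```python
def chunk(items, size=500):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def chunk_filename(index, stem="corelan_trigscan"):
    """Numbered file name, e.g. corelan_trigscan_0001.bps (assumed pattern)."""
    return f"{stem}_{index:04d}.bps"


# Placeholder breakpoint statements; the count is arbitrary.
breakpoints = [f"placeholder breakpoint {n}" for n in range(4265)]

batches = list(chunk(breakpoints, 500))
names = [chunk_filename(i) for i in range(1, len(batches) + 1)]

print(len(batches), "files, from", names[0], "to", names[-1])
print("breakpoints in the last file:", len(batches[-1]))
```

Only the last file can hold fewer than 'splitsize' breakpoints; all the others are full.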

In order to organize the log & bps files, I have decided to create a folder "logs" on the C: drive of my machine.
I'll be running the frida script from a command prompt, in that folder.

When the script has completed, you will be able to consult the log file for detailed info on density, relevance, totals and trig for each of the routines found. As explained, high density (of math instructions) and trig > 0 is a really good indicator. (Trig = presence of very specific instructions)

Of course, ultimately, the breakpoints are what we need. You can copy them from the file and paste them in the debugger, or you can just tell WinDBG to read the corelan_trigscan.bps file or the individually numbered (smaller) files using a WinDBG command that loads and runs a Command File.

(I'm assuming here that you're searching msedge.dll)

$$><C:\logs\msedge.dll\corelan_trigscan.bps

(of course, you'll need to provide the full path to the bps file on your system)
We'll talk about WinDBG Command Files in more detail later.

With the breakpoints in place, you can make the application run a use-case that triggers the use of Math function(s), so you can see if one of the breakpoints gets hit, and if so, which one it is.

Let's try it on Microsoft Edge.

We're already aware of the fact that there is a msedge!Builtins_MathCos function, so ideally the script should be able to find it as well.

Let's start with some standard density & relevance values and we'll see what happens.
Of course, success heavily depends on the actual implementation of the Math functionality. In the case of msedge!Builtins_MathCos, the function may be a wrapper around other code, so maybe the script doesn't even recognise it as Math heavy. We'll see what happens.

If needed, you may have to run Edge with the --no-sandbox flag.

(Make a copy of the existing shortcut to Microsoft Edge, and edit the Target: field to add the flag.)

Open the "No Sandbox" version of Edge.

Just like other modern browsers, when Edge runs, you'll get many msedge.exe processes.

In order to do the analysis of a dll, it's not that important what process we're going to attach to.
We just need a process that has the module we want to scrape and investigate, in this case msedge.dll

In fact, unless you provide a PID, the corelan_trigscan.py script will iterate over all processes that have the provided name, look for the first one that has the module we want to analyse and it will perform the search in the process that meets those requirements.

Let's see. Let's try with some average values and see what happens.
Do not attach a debugger to any of the msedge.exe processes at this time.

From an administrator prompt:

C:\logs>py -3.9-64 g:\blogposts\debugging\scripts\frida\corelan_trigscan.py -p msedge.exe -m msedge.dll --min-density 0.5 --min-relevant 50 --min-total 20 --min-trig 1
[+] Configuration:
    Process (name)      : msedge.exe
    Module              : msedge.dll
    Min density         : 0.5
    Min relevant        : 50
    Min total           : 20
    Min trig (min-trig) : 1
    Min icall           : 0
    Max helper insns    : 256
    Max function insns  : 4096
    Limit breakpoints   : 1000
    Split size          : 500
    Check offset        : (none)

[+] Found 8 process(es) named 'msedge.exe'. Looking for one with module 'msedge.dll' loaded...
[+] Trying PID 9344 (msedge.exe)...
[+] Successfully attached to PID 9344 (msedge.exe).
[+] Attached to process 'msedge.exe' via PID 9344 (arch=x64).
[+] Scanning module 'msedge.dll' for FP/SSE/AVX-heavy routines...
[AGENT] Module msedge.dll size=0x12de0000 arch=x64
[AGENT] x64 prolog scan modes: fp64, shadow64
[AGENT] Scanning range msedge.dll+0x00001000 - msedge.dll+0x0f565000 (approx 0.00%)
[AGENT] Progress 0.28% — analyzing function @ msedge.dll+0x000ae257 [fp64]
[AGENT] FOUND msedge.dll+0x000b1aa8 dens=0.874 rel=146 total=167 trig=2 icalls=0 mode=fp64 (candidates=1)
...
[+] Found 58 candidate functions. 


[+] Output directory         : 'msedge.dll'
[+] Full analysis written to : 'msedge.dll\corelan_trigscan.log'
[+] Breakpoints written to   : 'msedge.dll\corelan_trigscan.bps'
[+] Wrote 1 numbered breakpoint file(s):
    first: msedge.dll\corelan_trigscan_0001.bps
    last : msedge.dll\corelan_trigscan_0001.bps
[+] Done.

Open the msedge.dll folder and look for the log and bps files

  • corelan_trigscan.log : log file with statistics/results for each candidate found, sorted by trig / density / relevance / icalls, descending
  • corelan_trigscan.bps : this contains all breakpoints
  • corelan_trigscan_XXXX.bps files : up to 500 breakpoints per file. If there are too many breakpoints, you have the ability to activate max 500 breakpoints at a time

Of course, the corelan_trigscan.py script will remove all bps files with every run.

Our log file looks like this:

corelan_trigscan log
=====================
Timestamp          : 2026-03-21T08:52:11
Process            : msedge.exe
Module             : msedge.dll
Arch               : x64
Min density        : 0.5
Min relevant       : 50
Min total          : 20
Min trig (min-trig): 1
Min icall          : 0
Max helper insns   : 256
Max function insns : 4096
Breakpoint limit   : 1000
Split size         : 500
Output directory   : msedge.dll
Breakpoint file    : msedge.dll\corelan_trigscan.bps
Total candidates   : 58
Sort order         : trig desc, density desc, relevant desc, indirect_calls desc

# Candidate list (see sort order above)
# idx  location                  density   relevant   total   trig   icalls   prolog
   1  msedge.dll+0x051553e0       0.821       3363    4096     42        0   shadow64
   2  msedge.dll+0x05154990       0.795       3258    4096     42        0   shadow64
   3  msedge.dll+0x05156600       0.881       3610    4096     39        0   shadow64
   4  msedge.dll+0x05155f70       0.868       3555    4096     39        0   shadow64
   5  msedge.dll+0x051d9380       0.851       2733    3211     27        1   shadow64
   6  msedge.dll+0x051d8d50       0.833       2978    3576     27        1   shadow64
   7  msedge.dll+0x051d8bb0       0.825       3040    3684     27        1   shadow64
...

Open a new MS Edge browser process, launch WinDBGX, attach it to the right pid (the one that corresponds with the tab),
paste in all the breakpoints (or tell WinDBG to load your bps file with the $$>< command),
and then open the use-case that tries to call one or more Math functions.

For instance:

<html>
<script>
Math.cos(0);
Math.sin(0);
Math.tan(0);
alert("done");
</script>
</html>

The corelan_trigscan.py parameters specified resulted in 58 breakpoints. I activated them, ran the usecase html file... but I got no results. None of the 58 breakpoints got hit by any of the 3 Math functions that I used.

That means I have to lower the values, which will hopefully get me more candidate functions, and thus more breakpoints. In theory this poses no problem for WinDBG... but it's not exactly great from a performance perspective. The python script will run longer (more stuff to investigate, but that's ok). But when you're going to set a large volume of breakpoints in WinDBG, you'll see it slow down significantly.
Of course, you could also set breakpoints in smaller batches. That's what the chunked breakpoint files are for.

Anyway, before we do that, allow me to introduce another feature in the script that may help.

As explained before, we are already aware of the presence of the Builtins_MathCos function. As calculated earlier, on my machine it sits at offset 00e13dc0 from the start of msedge.dll

The python script has a --check-offset argument, which takes an offset. That offset will be added to the base address of the module you've specified with the -m argument, and that position will be considered a function that you'd like to examine.

When the script is finished doing the full analysis of the module (and it may or may not have found the function you need), it will look at that function specifically, do the analysis and give you the corresponding density, relevance, and other statistics.
In other words, if the script was not able to find that function by itself, you should be able to tell why it was excluded and where you possibly need to lower certain criteria to find this (and other similar) functions.
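The address arithmetic behind --check-offset is straightforward: module base plus offset, followed by a check that the result still falls inside the module. A minimal sketch, in which resolve_offset() is a hypothetical helper and the base address and module size are example values for one particular load of msedge.dll (they will differ on your machine and between sessions):

```python
def resolve_offset(module_base, module_size, offset):
    """Return (absolute address, True if the offset lies inside the module)."""
    address = module_base + offset
    in_range = 0 <= offset < module_size
    return address, in_range


# Example values only; the base changes every time the module gets loaded.
base = 0x25780000000
size = 0x12DE0000
offset = 0x00E13DC0

address, in_range = resolve_offset(base, size, offset)
print(f"module end    : {base + size:#x}")
print(f"absolute addr : {address:#x}")
print(f"in range      : {in_range}")
```

The offset stays valid across sessions as long as the binary itself doesn't change, which is why it makes sense to pass the offset rather than an absolute address.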

Let's do a second run, adding the --check-offset argument. We're just trying to make the script provide us with statistics about that function specifically, so in order to make the script finish faster I'll even raise the thresholds.

Close Edge. Open a new instance. Again, make sure there are no debuggers attached.

C:\logs>py -3.9-64 g:\blogposts\debugging\scripts\frida\corelan_trigscan.py -p msedge.exe -m msedge.dll --min-density 0.8 --min-relevant 100 --min-total 100 --min-trig 1 --check-offset 00e13dc0

The output shows the analysis of the function at offset 0x00e13dc0:

[+] Direct analysis of requested offset:
    location        : msedge.dll+0x00e13dc0
    module base     : 0x25780000000
    module end      : 0x25792de0000
    module size     : 0x12de0000
    absolute addr   : 0x25780e13dc0
    in range        : True
    first insn      : 0x25780e13dc0  push rbp
    second insn     : 0x25780e13dc1  mov rbp, rsp
    prolog match    : True
    prolog mode     : fp64
    density         : 0.160
    relevant        : 8
    total           : 50
    trig            : 0
    indirect calls  : 1

It looks like it has a rather low density (0.16) and relevance (8), and it has an indirect call. The combination of these 3 elements is a possible indicator that our Builtins_MathCos is a wrapper around the actual function, and/or uses other function(s) to do the actual Math. That other function may or may not be in the list already. Maybe we should lower the criteria even further.
When corelan_trigscan.py runs, it's able to follow direct calls and it considers the code in the child functions to be part of the parent function. But with indirect calls, it's difficult to determine where the call will go.

Functions with low density, low relevance, and without indirect calls are - very likely - not going to be that interesting. So if we lower the density & relevance parameters, we could initially try to focus on the ones with an indirect call.

For instance, we could lower the density to 0.15 and relevance to 8. We'll also have to raise the --limit-bp argument, because by default the script only produces breakpoints for the first 1000 functions. You can set it to 0 or * to make it emit ALL the breakpoints.

Be careful with filtering on indirect calls. You'll essentially skip over legit Math functions that don't use indirect calls. It's a useful technique if the Math functions are indeed wrappers that use indirect calls. Based on the analysis of the Builtins_MathCos function, let's set --min-ical 1 as well.

(Of course, in reality you may not have an example such as Builtins_MathCos yet. I usually don't filter on indirect calls unless I have to start reducing relevance and density parameters a lot. I'll share my personal step by step workflow later.)

Let's try this:

C:\logs>py -3.9-64 g:\blogposts\debugging\scripts\frida\corelan_trigscan.py -p msedge.exe -m msedge.dll --min-density 0.15 --min-relevant 8 --min-total 45 --check-offset 00e13dc0 --limit-bp * --min-ical 1

I have asked the script to perform the analysis of the function again, and I can see this promising information:

[+] Sanity check for offset msedge.dll+0x00e13dc0: FOUND among candidates.
    density=0.160, relevant=8, total=50, trig=0, indirect_calls=1, prolog=fp64
[+] Direct analysis of requested offset:
    location        : msedge.dll+0x00e13dc0
    module base     : 0x25780000000
    module end      : 0x25792de0000
    module size     : 0x12de0000
    absolute addr   : 0x25780e13dc0
    in range        : True
    first insn      : 0x25780e13dc0  push rbp
    second insn     : 0x25780e13dc1  mov rbp, rsp
    prolog match    : True
    prolog mode     : fp64
    density         : 0.160
    relevant        : 8
    total           : 50
    trig            : 0
    indirect calls  : 1

[+] Output directory         : 'msedge.dll'
[+] Full analysis written to : 'msedge.dll\corelan_trigscan.log'
[+] Breakpoints written to   : 'msedge.dll\corelan_trigscan.bps'
[+] Wrote 9 numbered breakpoint file(s):
    first: msedge.dll\corelan_trigscan_0001.bps
    last : msedge.dll\corelan_trigscan_0009.bps
[+] Function at offset written to  : 'msedge.dll\corelan_trigscan_0009.bps'
[+] Done.

The function at the provided offset (Builtins_MathCos) was found and labeled as a viable candidate. Perhaps it means that the script was able to find other Math functions as well. We'll try to find out.

This time, the script gave me 4265 candidate functions.
Activating all of them at the same time in WinDBG might take a little while, and WinDBG won't exactly run super smooth.

Additionally, as a side note, with that many breakpoints, there is the obvious risk that the application will use some of these functions just by itself. That's inevitable, but that's ok. From a timing perspective, we can most likely see the difference between breakpoints that are just getting hit, versus the ones that get hit by our code.

As explained earlier, the script will not just write all breakpoints into corelan_trigscan.bps, but it will also create numbered 'chunked' files, with a more "manageable" amount of up to 500 breakpoints per file. In this case, I have 9 individual files. I'd have to run the use case 9 times, that's absolutely doable and manageable.
Note that the breakpoint for the function with the provided offset was written to file number 0009. It's not uncommon to find similar functions relatively close to each other in a binary, so perhaps it's an idea to process that file first and work our way backwards to number 0001.

Routine:

  • Make WinDBG load the contents of a single numbered file (I feel like beginning at number 9),
  • run the use-case,
  • pause the debugging session,
  • clear all breakpoints,
  • load the next numbered file (number 8).
  • and so on

For example:

$$><C:\logs\msedge.dll\corelan_trigscan_0009.bps
g

Open the use case

If breakpoints get hit, document them

Pause the debugger

Clear the breakpoints and load the next file:

bc *
$$><C:\logs\msedge.dll\corelan_trigscan_0008.bps
g

and so on...
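If you'd rather not type these commands for every batch, you can let Python print them for you. The sketch below is a convenience helper and not part of corelan_trigscan.py: the function names are hypothetical, and the folder is the example location used in this article.

```python
import os
import re

# Matches the numbered chunk files only, not the full corelan_trigscan.bps.
BATCH_NAME = re.compile(r"^corelan_trigscan_(\d{4})\.bps$")


def order_batches(names, highest_first=True):
    """Return the numbered .bps file names, sorted by their number."""
    numbered = []
    for name in names:
        match = BATCH_NAME.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    numbered.sort(reverse=highest_first)
    return [name for _, name in numbered]


def commands_for(folder, name):
    """WinDBG commands for one batch: clear, load the file, continue."""
    return ["bc *", f"$$><{folder}\\{name}", "g"]


folder = r"C:\logs\msedge.dll"  # example location, adjust to your system
if os.path.isdir(folder):
    for name in order_batches(os.listdir(folder)):
        print("\n".join(commands_for(folder, name)))
        print()
```

The bc * in front of the very first batch is harmless; it simply makes every block of commands identical, so you can paste them without thinking.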

Patience is a virtue.

Sometimes you'll notice that you have to close the debugger and open a new session. Fortunately, the chunked files spare you from having to start all over again.

Sometimes you'll see certain breakpoints getting hit over and over again, forcing you to intervene (pause the debugger session, remove that breakpoint, and continue doing the analysis).
When enabling a large list of breakpoints, we obviously don't know what the ID is going to be. If you need to disable or remove a specific breakpoint, you'll have to run bl first to get all breakpoints, find (in the long list) the one you want to disable/delete, and then disable/delete it.

That's why the breakpoint statements provided by corelan_trigscan.py are not only numbered (you'll see the ID when the breakpoint gets hit): the printf statements in the breakpoints also use DML (Debugger Markup Language) to show a clickable link on the screen. That way, when a certain breakpoint gets hit over and over again, you can simply pause the debugger, click on the [disable] link and continue running the session.

The ID numbers assigned by the script start at 1000, so if you wish to set some other breakpoints as well, keep their ID below 1000.

Anyway, let's go back to our use case. I decided to load the breakpoints from file 0009 first.
I let the process run in the debugger (g), and opened the usecase test.html, which calls 3 Math functions.
This is the result:

0:016> g
----- msedge.dll+0x00e13dc0 bp5263 hit dens=0.160 rel=8 tot=50 trig=0 icalls=1 prolog=fp64 ----- [disable]
----- msedge.dll+0x00e14d40 bp5264 hit dens=0.160 rel=8 tot=50 trig=0 icalls=1 prolog=fp64 ----- [disable]
----- msedge.dll+0x01b1cfd2 bp5058 hit dens=0.163 rel=53 tot=325 trig=0 icalls=1 prolog=shadow64 ----- [disable]


I can now investigate if there is a link between a Math statement and one of the breakpoints that got hit.
After doing a bit of testing, I got these results:

Math.cos() = msedge.dll+0x00e13dc0 = msedge!Builtins_MathCos (000001bd`11153dc0)
Math.sin() = msedge.dll+0x00e14d40 = msedge!Builtins_MathSin (000001bd`11154d40)

Continue working through the other files as well, take your time.
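When you work through several batches, documenting the hits by hand gets tedious. If you save the debugger output to a text file, a few lines of Python can tally the hits for you. This is a minimal sketch: the regular expression assumes the hit lines look exactly like the ones shown above, and count_hits() is a hypothetical helper.

```python
import re
from collections import Counter

# Expected shape: ----- module+0xOFFSET bpID hit dens=... ----- [disable]
HIT = re.compile(
    r"-----\s+(?P<loc>\S+\+0x[0-9a-fA-F]+)\s+bp(?P<id>\d+)\s+hit\b"
)


def count_hits(lines):
    """Count how often each (location, breakpoint ID) pair got hit."""
    hits = Counter()
    for line in lines:
        match = HIT.search(line)
        if match:
            hits[(match.group("loc"), int(match.group("id")))] += 1
    return hits


sample = [
    "0:016> g",
    "----- msedge.dll+0x00e13dc0 bp5263 hit dens=0.160 rel=8 tot=50 trig=0 icalls=1 prolog=fp64 ----- [disable]",
    "----- msedge.dll+0x00e14d40 bp5264 hit dens=0.160 rel=8 tot=50 trig=0 icalls=1 prolog=fp64 ----- [disable]",
    "----- msedge.dll+0x01b1cfd2 bp5058 hit dens=0.163 rel=53 tot=325 trig=0 icalls=1 prolog=shadow64 ----- [disable]",
]

for (location, bp_id), count in count_hits(sample).most_common():
    print(f"{location}  bp{bp_id}  hits={count}")
```

Breakpoints with a very high hit count are usually the noisy ones that the application triggers by itself; the ones that show up once per use-case run are the interesting ones.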

After processing file 0007, I was able to find the third one as well:

Math.tan() = msedge.dll+0x00e15200 = msedge!Builtins_MathTan (000001bd`11155200)

Mission accomplished!

You can now set breakpoints at those functions and make them do other WinDBG things, for instance activate the logging of heap allocations, etc.

My typical workflow

To conclude this chapter, this is the approach I usually follow when I don't have symbols:

Start with relatively high values:

--min-density 0.5
--min-relevant 30
--min-total 50
--min-trig 1

Next run, reduce them a little:

--min-density 0.35
--min-relevant 20
--min-total 30

(Pay attention to how many candidates you get. If it's more than 1000, you'll have to set the --limit-bp argument.)

If that doesn't get you the functions you're looking for, I usually drop the numbers, but filter on indirect calls (icalls):

--min-density 0.15
--min-relevant 8
--min-total 20
--min-ical 1
--limit-bp *
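The three stages above can be captured in a small helper that prints the corresponding command lines, so you don't have to retype them for every target. This is only a sketch: STAGES and build_command() are hypothetical names, the launcher selector and script path are placeholders for your own setup, and the flag values are simply the ones listed above.

```python
# One dictionary per stage, from strict to permissive.
STAGES = [
    {"--min-density": "0.5", "--min-relevant": "30",
     "--min-total": "50", "--min-trig": "1"},
    {"--min-density": "0.35", "--min-relevant": "20",
     "--min-total": "30"},
    {"--min-density": "0.15", "--min-relevant": "8",
     "--min-total": "20", "--min-ical": "1", "--limit-bp": "*"},
]


def build_command(stage, process="msedge.exe", module="msedge.dll",
                  script="corelan_trigscan.py", launcher=("py", "-3.9-64")):
    """Return the argument list for one run of the scan."""
    args = list(launcher) + [script, "-p", process, "-m", module]
    for flag, value in stage.items():
        args += [flag, value]
    return args


for number, stage in enumerate(STAGES, start=1):
    print(f"Stage {number}: " + " ".join(build_command(stage)))
```

Move to the next stage only when the previous one didn't produce a breakpoint that gets hit by your use-case.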

With regard to building use-cases: it's obviously very important to provide syntactically correct code, so you'll have to figure out what Math statements exist in the language that you're exploring.
Most of these applications have some sort of Developer tools or Scripting console that allows you to just type commands and execute them. That might make it easier to pinpoint which one exactly triggers a certain breakpoint.

Overall, this is a starting point for JavaScript engines in browsers:

Math.abs(1)
Math.ceil(1.2)
Math.floor(1.8)
Math.round(1.5)
Math.trunc(1.8)

Math.min(1, 2)
Math.max(1, 2)

Math.sqrt(4)
Math.pow(2, 3)
Math.exp(1)
Math.log(10)
Math.log10(10)
Math.log2(8)

Math.sin(1)
Math.cos(1)
Math.tan(1)

Math.asin(0.5)
Math.acos(0.5)
Math.atan(1)
Math.atan2(1, 1)

Math.cbrt(8)
Math.hypot(3, 4)

Math.random()

Math.PI
Math.E

A somewhat safer subset for PDF readers etc, might look like this:

Math.abs(1)
Math.ceil(1.2)
Math.floor(1.8)
Math.round(1.5)

Math.min(1, 2)
Math.max(1, 2)

Math.sqrt(4)
Math.pow(2, 3)
Math.exp(1)
Math.log(10)

Math.sin(1)
Math.cos(1)
Math.tan(1)

Math.asin(0.5)
Math.acos(0.5)
Math.atan(1)
Math.atan2(1, 1)

Math.random()

Math.PI
Math.E
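To pinpoint which statement triggers which breakpoint, it can help to generate one small test page per statement instead of one big page. A minimal sketch, assuming the same page layout as the HTML use-case shown earlier; the function names and the file naming scheme are hypothetical.

```python
from pathlib import Path

STATEMENTS = ["Math.sin(1)", "Math.cos(1)", "Math.tan(1)", "Math.atan2(1, 1)"]


def build_page(statement):
    """Return a tiny HTML page that runs one statement and then alerts."""
    return (
        "<html>\n<script>\n"
        f"{statement};\n"
        f'alert("{statement} done");\n'
        "</script>\n</html>\n"
    )


def page_name(statement):
    """Math.sin(1) -> math_sin.html"""
    return statement.split("(")[0].replace(".", "_").lower() + ".html"


def write_pages(statements, folder):
    """Write one page per statement into `folder` and return the paths."""
    target = Path(folder)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for statement in statements:
        path = target / page_name(statement)
        path.write_text(build_page(statement), encoding="utf-8")
        paths.append(path)
    return paths


print(page_name(STATEMENTS[0]))
print(build_page(STATEMENTS[0]))
```

Call write_pages(STATEMENTS, "usecases") to create the files, then open them one at a time while the debugger is running.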

Good luck!

Built-in automation

WinDBG has had a bit of automation for a long time, even before the modern Data Model and the JavaScript API became a thing. We've played with action breakpoints before, which is already a form of automation. We're attaching commands to an event.

A step up from that is adding control flow logic. The options are a bit limited though.
We have .if to make decisions, and .foreach and .for to perform iterations.

Although not strictly "automation", I'd like to mention that we can create aliases to make our code a bit more readable.

In modern WinDBG versions, the dx command provides access to the Data Model. We'll talk about that in part 2 of the Automation & Scripting series.

Conditions (.if)

.if is WinDBG's conditional control-flow token. Conceptually, it behaves like if in C. It evaluates an expression, and if the result is nonzero (true) it executes the command block. You have the ability to use .else and .elsif as well to make the decision process more complete.

Basic syntax:

.if (Condition) { Commands }
.if (Condition) { Commands } .else { Commands }
.if (Condition) { Commands } .elsif (Condition) { Commands }
.if (Condition) { Commands } .elsif (Condition) { Commands } .else { Commands }

You can specify multiple commands (separated by semi-colon ;). Commands (even if it's just one) have to be placed inside the braces.

I mostly use .if statements in breakpoints. That said, I only do it when I'm sure about the condition. Allow me to clarify what I mean. Sometimes conditions are based on assumptions. Like the size of something. If you filter out information based on an assumption, you may end up (partially) blind. In my humble opinion, it may be better to log and document everything, and do grep-style filtering on the output.
After all, you can just write the output of WinDBG's command window to a file with the .logopen path/to/logfile statement.

But if you're 100% sure about the condition, then an .if statement may be what you need.

Iterations

If .if gives you decision power, then .foreach and .for will give you even more the feeling of programming in WinDBG.
Don't get too excited though. It misses a lot of options and flexibility.

In general:
.foreach gives you control over data, and
.for gives you control over execution.

Let's begin with .foreach

.foreach

In short, .foreach:

  • runs a command (or reads a string/file),
  • tokenizes the output
  • executes a command block once per token

Basic syntax looks like this:

.foreach (var { command-producing-output }) { commands-using-var }
.foreach /s (var "string") { ... }
.foreach /f (var "file.txt") { ... }

The .foreach command is pretty useful if you already have a list or a command that produces a list, and if you want to pipe that list (the elements in that list) to another command.
Because it tokenizes everything, .foreach loops can get messy very easily, especially if the output contains more text than you need, and certainly if that extra text makes it hard to predict where the tokens you want to iterate over will appear.

I'll explain:

Let's say you want to do something with the list of loaded modules.
lm provides that list. You'll get start address, end address and module name, an indication if you have symbols, and if so, the path to the symbol file. (the latter is optional, it depends on whether you have symbols or not).

For example:

0:014> lm
start             end                 module name
000001ff`81000000 000001ff`93de0000   msedge     (pdb symbols)          C:\ProgramData\Dbg\sym\msedge.dll.pdb\6640F030371CFBB74C4C44205044422E1\msedge.dll.pdb
000001ff`97510000 000001ff`975e7000   OLEAUT32   (deferred)             
00007ff6`ed5b0000 00007ff6`edaaa000   msedge_exe   (deferred)             
00007ffd`f42a0000 00007ffd`f4711000   ffmpeg     (deferred)             
00007ffe`0b120000 00007ffe`0b5e6000   msedge_elf   (deferred)             
00007ffe`34160000 00007ffe`343a2000   dbghelp    (deferred)             
00007ffe`36d10000 00007ffe`36d45000   WINMM      (deferred)             
00007ffe`36d50000 00007ffe`36d5b000   VERSION    (deferred)             
00007ffe`38610000 00007ffe`38877000   dwrite     (deferred)             
00007ffe`3be30000 00007ffe`3be3a000   DPAPI      (deferred)             
00007ffe`3c240000 00007ffe`3c267000   win32u     (deferred)             
00007ffe`3c3e0000 00007ffe`3c52b000   ucrtbase   (deferred)        

Again, when we look at the output, as humans, we can see 5 columns:

  • Start
  • End
  • Module name
  • Symbols
  • Symbol path, which may be empty

If there were a way to tell foreach to do something with the third column, then we'd get what we want.
If we had awk in WinDBG, it would be as simple as doing something like this:

awk 'NR>1 {print $3}'

But that's not how it works. In fact, there are a few important limitations.

  • It's purely text based
  • It really depends on clean output formatting
  • It doesn't have the notion of columns
  • It doesn't have sed or awk like mechanics

When .foreach parses output, it basically flattens everything into a stream of whitespace-separated tokens and runs the loop body once per token.
Let me show you what that looks like with the output of the lm command:

0:014> .foreach (x { lm }) { .echo ${x} }
start
end
module
name
000001ff`81000000
000001ff`93de0000
msedge
(pdb
symbols)
C:\ProgramData\Dbg\sym\msedge.dll.pdb\6640F030371CFBB74C4C44205044422E1\msedge.dll.pdb
000001ff`97510000
000001ff`975e7000
OLEAUT32
(deferred)
00007ff6`ed5b0000
00007ff6`edaaa000
msedge_exe
(deferred)
00007ffd`f42a0000
00007ffd`f4711000
ffmpeg
(deferred)

Empty "columns" are skipped, which makes it even messier to handle.
The lm command has a 1m option, which makes it return just the module names. But that's not a generic solution.

If you're working with a clean list of items, for instance pointers or symbol names, then .foreach can be a very powerful tool.
But as soon as things get a bit more sophisticated, you may have to look at other scripting capabilities (Data Model, PyKD, Extensions, etc).
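To make the "flattening" problem a bit more tangible, here is a minimal Python sketch that mimics what happens to the lm output. It is only a model: I'm assuming that plain whitespace splitting is a close enough approximation of how .foreach produces its tokens. The names flatten, MODULE_NAMES and name_positions are mine, purely for illustration.

```python
# Three rows of lm output, copied from the listing above.
LM_OUTPUT = r"""start             end                 module name
000001ff`81000000 000001ff`93de0000   msedge     (pdb symbols)          C:\ProgramData\Dbg\sym\msedge.dll.pdb\6640F030371CFBB74C4C44205044422E1\msedge.dll.pdb
000001ff`97510000 000001ff`975e7000   OLEAUT32   (deferred)
00007ff6`ed5b0000 00007ff6`edaaa000   msedge_exe   (deferred)
"""


def flatten(text):
    """Model .foreach tokenization: one flat stream of whitespace-separated tokens."""
    return text.split()


tokens = flatten(LM_OUTPUT)

# Where do the module names end up in the flat stream?
MODULE_NAMES = ("msedge", "OLEAUT32", "msedge_exe")
name_positions = [i for i, token in enumerate(tokens) if token in MODULE_NAMES]

print(tokens[:4])        # the header words are tokens too
print(name_positions)    # the distance between two names is not constant
```

The module names land at positions 6, 12 and 16. Because the number of tokens per row varies, there is no fixed "every Nth token" rule you could rely on inside the loop.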

That said. What if we install awk on our Windows machine and use .shell?
Let's see if that works.

First of all, let's get ourselves a working version of awk.

There are a few ways to do so. Git Bash includes some of these tools, and we can very easily get a copy of Git Bash through winget.
Open an admin prompt and type the following command:

winget install Git.Git
Found Git [Git.Git] Version 2.53.0.2
This application is licensed to you by its owner.
Microsoft is not responsible for, nor does it grant any licenses to, third-party packages.
Downloading https://github.com/git-for-windows/git/releases/download/v2.53.0.windows.2/Git-2.53.0.2-64-bit.exe
  ██████████████████████████████  61.5 MB / 61.5 MB
Successfully verified installer hash
Starting package install...
Successfully installed

This will install the Git tools, as well as some unix-like tools. These tools are stored in C:\Program Files\Git\usr\bin

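The goal is to make awk.exe callable from anywhere (and certainly from inside WinDBG). This means we have to add this folder to the PATH, ideally at the end of the PATH (to avoid collisions with other OS tools that happen to have the same name).

From your admin prompt, run the command below. Be aware that setx truncates values longer than 1024 characters and that %PATH% expands to the combined user and system PATH, so if your PATH is long, add the folder through the Environment Variables dialog instead.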

setx PATH "%PATH%;C:\Program Files\Git\usr\bin" /M

Close the prompt, close WinDBG. Open a new prompt and type awk to see if it works:

awk -V
GNU Awk 5.3.2, API 4.0, PMA Avon 8-g1, (GNU MPFR 4.2.2, GNU MP 6.3.0)
Copyright (C) 1989, 1991-2025 Free Software Foundation.
...

Good!

In WinDBG(X), attached to MS Edge browser, I ran the following command
(I truncated the output to save space)

0:019> .shell -ci "lm"  awk "NR>1 {print $3}"
msedge
OLEAUT32
msedge_exe
ffmpeg
msedge_elf
dbghelp
WINMM
VERSION
dwrite
DPAPI
...
USER32
ADVAPI32
ole32
IMM32
ntdll


shcore.dll
wldp.dll
.shell: Process exited

That opens up possibilities, doesn't it? The question is: can we make .foreach take the output of the .shell command?

0:019> .foreach (x { .shell -ci "lm"  awk "NR>1 {print $3}" } ) { .echo ${x} }
msedge
OLEAUT32
msedge_exe
ffmpeg
msedge_elf
dbghelp
WINMM
VERSION
dwrite
DPAPI
...
USER32
ADVAPI32
ole32
IMM32
ntdll
shcore.dll
wldp.dll
.shell:
Process
exited

That looks great! We just have to get rid of the last 3 lines. .foreach took the closing message (.shell: Process exited) and tokenized it as well.

An easy way to avoid that is to tell .shell to write the output to a file using the -o flag, and then tell .foreach to read the file.

0:019> .shell -ci "lm" -o modules.txt awk "NR>1 {print $3}"
.shell: Process exited
0:019> .foreach /f (x "modules.txt") { .echo ${x} }
msedge
OLEAUT32
msedge_exe
ffmpeg
msedge_elf
dbghelp
WINMM
VERSION
dwrite
DPAPI
...
USER32
ADVAPI32
ole32
IMM32
ntdll
shcore.dll
wldp.dll
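If you'd rather not depend on an external awk binary, the same column logic is easy to express in Python (which becomes relevant once we get to PyKD later in this article). The sketch below is a rough equivalent of awk 'NR>1 {print $3}'. The sample text includes an "Unloaded modules:" section with a made-up address range; that is my assumption about where the two empty lines in the .shell output above come from (a blank line and a line with fewer than three fields).

```python
LM_SAMPLE = """start             end                 module name
00007ffe`3c240000 00007ffe`3c267000   win32u     (deferred)
00007ffe`3c3e0000 00007ffe`3c52b000   ucrtbase   (deferred)

Unloaded modules:
00007ffe`30000000 00007ffe`30010000   shcore.dll
"""


def third_column(text):
    """Rough Python equivalent of awk 'NR>1 {print $3}'."""
    result = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if line_number == 1:      # NR>1: skip the header line
            continue
        fields = line.split()     # split on whitespace, like awk's default
        # awk prints an empty string when $3 does not exist
        result.append(fields[2] if len(fields) >= 3 else "")
    return result


column = third_column(LM_SAMPLE)
module_names = [name for name in column if name]   # drop the empty entries

print(column)
print(module_names)
```

Unlike the token stream, this keeps the notion of "line" and "column", so the empty entries can be filtered out explicitly.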

.for

If .foreach allows you to iterate over what the debugger already shows you, then .for lets you explore what the debugger does not show you yet.
It's modeled after a classic C-style for loop:

  • Initialize a state
  • Evaluate a condition
  • Execute a block
  • Update state
  • Repeat until the condition fails

Basic syntax:

.for ( init ; condition ; increment ) { commands }

Simple example:

r $t0 = 0
.for ( ; @$t0 < 5 ; r $t0 = @$t0 + 1 ) {
    .printf "i = %d\n", @$t0
}

A .for loop is really useful if you want to walk memory, traverse structures, or build lists that you can't produce with another command.
You can use it to walk Linked Lists, chains of pointers, or loop over memory ranges and look for things.
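To illustrate the "walk a linked list" idea without needing a debugger, here is a small Python model of that loop. It follows the same init / condition / update pattern as .for, but runs against a simulated memory image (a bytearray). Everything in it is hypothetical: the addresses, the helper names read_ptr, write_ptr and walk_list, and the assumption of 8-byte little-endian pointers.

```python
import struct


def read_ptr(memory, address):
    """Read one 8-byte little-endian pointer (assumption: 64-bit target)."""
    return struct.unpack_from("<Q", memory, address)[0]


def write_ptr(memory, address, value):
    """Store one 8-byte little-endian pointer in the simulated memory."""
    struct.pack_into("<Q", memory, address, value)


def walk_list(memory, list_head, limit=1000):
    """Follow forward links from list_head until the chain points back to it."""
    entries = []
    node = read_ptr(memory, list_head)                    # init
    while node != list_head and len(entries) < limit:     # condition
        entries.append(node)                              # body
        node = read_ptr(memory, node)                     # update
    return entries


# Build a tiny circular list: head -> 0x40 -> 0x80 -> 0xC0 -> head
memory = bytearray(0x100)
HEAD = 0x10
chain = [HEAD, 0x40, 0x80, 0xC0]
for current, following in zip(chain, chain[1:] + [HEAD]):
    write_ptr(memory, current, following)

print([hex(address) for address in walk_list(memory, HEAD)])
```

The limit argument is a safety net: if a list is corrupted and never returns to its head, the loop still terminates. It's worth building the same kind of guard into a .for loop condition.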

Aliases

Aliases in WinDBG are named text substitutions.

Think of them as:

  • macros
  • variables that expand into text
  • shortcuts for commands or expressions

When WinDBG sees an alias, it replaces it with its value before executing the command.

Creating, managing, using aliases

The basic syntax to create an alias looks like this:

as aliasname value

For example:

as mycmd r eax

When you now run mycmd, it will expand to (and execute):

r eax

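You can see all configured aliases with al.
You can delete an alias with ad name.
Running as again with an existing name redefines that alias.
The /x switch (as /x name expression) sets the alias to the 64-bit value of an expression, rather than to literal text.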

You can use aliases in other commands:

as myaddr 00401000
db ${myaddr}

${alias} is the safe/explicit form. Bare alias also works in many cases, but wrapping the alias in ${} avoids ambiguity.

Processing order => risk of overriding built-in commands

Aliases take precedence over built-in commands!
That means you can actually override existing WinDBG commands using aliases (and thus break things).
When WinDBG processes a line, it performs alias expansion first and then interprets the resulting text. If you were trying to run a command, WinDBG will run the expanded text as a command.

If you broke something, simply delete the alias by running ad against the alias name.

Obviously, it's good practice to avoid alias names that collide with important built-in commands, such as x, r, bp, dt, dp, u, k, etc.

Also, aliases are token-based, not substring-based.
If you accidentally create an alias named d, it only breaks the d command, not the variations that begin with d, such as dp, db, etc.
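The token-based behavior can be captured in a toy model. The Python sketch below is not how WinDBG implements alias expansion internally; it only reproduces the two rules described above: ${name} is replaced wherever it appears, while a bare alias name is only replaced when it is a complete token. The function name expand_aliases and the sample aliases are mine.

```python
import re


def expand_aliases(command, aliases):
    """Toy model: replace ${name} anywhere, bare names only as whole tokens."""

    def explicit(match):
        # Leave unknown ${...} references untouched.
        return aliases.get(match.group(1), match.group(0))

    command = re.sub(r"\$\{(\w+)\}", explicit, command)
    tokens = command.split(" ")
    return " ".join(aliases.get(token, token) for token in tokens)


aliases = {"d": ".echo oops", "myaddr": "00401000"}

print(expand_aliases("d 00401000", aliases))     # the d command is hijacked
print(expand_aliases("dp 00401000", aliases))    # dp is a different token
print(expand_aliases("db ${myaddr}", aliases))   # explicit form
```

In this model, the d alias leaves dp and db alone because neither of them is the token d.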

Alias expansion

Aliases get expanded:

  • before execution
  • as raw text substitution

Alias examples

In general, be careful with quotes:

as test ".printf \"corelan\""
test

expands to

.printf "corelan"

(You get the text, not the command)

While

as test .printf "corelan"
test

expands to

corelan

(Now it becomes the command and executes)

You can also specify multiple commands to be executed:

as showinfo .echo ANALYSIS; u $ip L 1; kb; dps @esp L 8

Practical tip:
Define a set of aliases, store them in a script and run the script when the debugger launches (using -c).
Maybe you have a set of scripts you'd like to run on a regular basis. You could create aliases for them, making your life a lot easier.

Using arguments with aliases

We can't really pass arguments to aliases.
But if the "variable" part is simply whatever needs to be appended to a command, you can create an alias for the static part. Anything you type after the alias is then passed along to the expanded command. After all, an alias just substitutes text.

Expression evaluators: MASM and C++

WinDBG has 2 expression evaluators:

  1. MASM, the default, is typically used to evaluate expressions related to addresses, values, registers, symbol names and memory.
  2. C++ is type aware, and is commonly used to access structs, fields, symbols.

MASM vs C++

MASM

When you type something like poi(@$t0) or @$t0 + 0x150, or using ? to do some quick math, you're using the MASM evaluator.
It's mostly address and register focused. It allows you to dereference pointers via poi(), and is arithmetic friendly...
On the flipside, it does not have understanding of C structures and is loosely typed.

MASM is good for quick math, pointer chasing, low level memory work.

MASM is the default, but you can also explicitly force MASM by wrapping your expression in @@masm(...)

C++

You can invoke the C++ evaluator using @@c++(...)

It understands types, supports casts, ->, &.
It uses symbol/type information, and is a safe and clean way to access structures and their fields.

C++ is great for structure access, offsets, readability of your code.

Examples

Let's look at a few examples that, technically, combine both of WinDBG's expression evaluators.

Example 1: Get Heap Encoding keys from NT Heap Headers

In this first example, I'll create a script that iterates through a list and accesses elements in various structures in such a way that the code is readable, short and doesn't make assumptions about offsets, positions or architecture.

Let's say we want to make a list of all heaps, print whether they are NT style or Segment style, and, for the NT heaps, print the encoding key.
The high-level approach would look like this:

  • The list of heaps can be found in a field called ProcessHeaps in the PEB.
  • We can determine if a heap is NT or Segment by checking the Signature field in the heap header (0xeeffeeff for NT, 0xddeeddee for Segment)
  • We can get the encoding key from the Encoding field in the heap header

It's worth noting that the ProcessHeaps field primarily lists NT heaps. Segment heaps may not all appear here, depending on the process and OS version.

Instead of hardcoding positions and offsets in PEB, Heap Headers etc, we're going to use corresponding structs and field names, provided by the symbols in ntdll.
For instance, the Signature field in the NT Heap (Windows 11) sits at offset 0x60, and for a Segment heap it's at offset 0x8.
(In fact, at offset 0x8 in the NT Heap, we find a SegmentSignature field, which is not the same thing as the Signature field and does not hold the same value.)
What I'm trying to say is that hardcoding offsets is not the most reliable technique going forward.

The 2 major datastructures we're going to access are:

  • PEB: _PEB
  • Heap Header: _HEAP (for NT style heaps)

Plan of attack:

  • We can get the address of the PEB using the @$peb pseudo-register, that's where the journey begins.
  • The C++ expression to obtain the pointer to the list with process Heaps would be something like @@c++((void*)(@$peb->ProcessHeaps)). We can store that in a pseudo-register @$t0.
  • We can also get the number of heaps by accessing @@c++((unsigned int)(@$peb->NumberOfHeaps)). That allows us to access the members of ProcessHeaps as if it were an array. To be more specific, we can use a pseudo-register that acts as an "index" and use it to access positions in the list. We'll calculate positions using start_of_the_list + (indexcounter x pointersize)
  • To determine the architecture/pointer size, we can simply use the MASM $ptrsize pseudo-register. Technically, we can also determine the pointer size dynamically via the C++ evaluator: @@c++(sizeof(void*))
  • The Heap signature is found in the Signature field of the corresponding heap. @@c++(((ntdll!_HEAP*)XXXXXXX)->Signature) (XXXXXXX has to be the address of the heap you're accessing)
  • We'll print the EncodeFlagMask and dump the raw Encoding bytes to the screen. If @$t3 holds the address of the corresponding heap (XXXXXXX above), then we can access those 2 fields via @@c++(((ntdll!_HEAP*)@$t3)->EncodeFlagMask) and @@c++(((ntdll!_HEAP*)@$t3)->Encoding)
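The indexing arithmetic in the plan above (start of the list + index x pointer size) can be sketched in a few lines of Python. This is a model, not the actual script: it works on a raw byte buffer that stands in for the memory behind ProcessHeaps, and the helper names heap_addresses and heap_type are hypothetical. The heap addresses are the ones from the example output further down in this article, and the signature values are the ones listed earlier.

```python
NT_SIGNATURE = 0xEEFFEEFF        # Signature value for NT heaps
SEGMENT_SIGNATURE = 0xDDEEDDEE   # Signature value for Segment heaps


def heap_addresses(raw, number_of_heaps, ptr_size):
    """Read number_of_heaps pointers from the raw ProcessHeaps array."""
    heaps = []
    for index in range(number_of_heaps):
        offset = index * ptr_size                  # index x pointer size
        chunk = raw[offset:offset + ptr_size]
        heaps.append(int.from_bytes(chunk, "little"))
    return heaps


def heap_type(signature):
    """Translate a Signature value into a readable heap type."""
    if signature == NT_SIGNATURE:
        return "NT"
    if signature == SEGMENT_SIGNATURE:
        return "Segment"
    return "Unknown"


# Simulated contents of the ProcessHeaps array in a 64-bit process.
sample_heaps = [0x0000025C92010000, 0x0000025C91F30000,
                0x0000025C91F40000, 0x0000025C921C0000]
raw_array = b"".join(address.to_bytes(8, "little") for address in sample_heaps)

for index, address in enumerate(heap_addresses(raw_array, 4, 8)):
    print(f"{index:3d}  {address:016x}")
```

Passing ptr_size as a parameter mirrors the use of $ptrsize in the script: the same loop then works for 32-bit and 64-bit targets.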

Check out the full script corelan_heap_encoding.txt from the Github repository
Open the "debugging", "scripts", "windbg" folder.

Example (against MS Edge):

0:013>$$>< g:\blogposts\debugging\scripts\windbg\corelan_heap_encoding.txt
Idx  HeapAddress        Type      EncEnabled  EncodeFlagMask     EncodingRaw
---  -----------------  --------  ----------  -----------------  ----------------------------------
  0  0000025c92010000  Segment   n/a         n/a                n/a
  1  0000025c91f30000  Segment   n/a         n/a                n/a
  2  0000025c91f40000  NT        yes         0x00100000         0000000000000000 000087ec435545a8
  3  0000025c921c0000  Segment   n/a         n/a                n/a

Example 2: Enumerating NT Heap Segments & VirtualAllocdBlocks

Let's look at a second example.

Let's enumerate all NT heaps in the process, determine the number of segments for each heap, and print all segments (start & end address).
We'll also enumerate the VirtualAllocdBlocks, print how many there are, and then print each VA Block: address, commit size and reserve size.

This script needs to access a few components:

From PEB:

  • The address of the ProcessHeaps: @@c++((void**)(@$peb->ProcessHeaps))
  • The number of heaps: @@c++((unsigned int)(@$peb->NumberOfHeaps))

Note: you can run dt _PEB to see the structure layout, showing the type of each field (the output below was taken from a 32bit process):

   +0x090 ProcessHeaps     : Ptr32 Ptr32 Void
   +0x088 NumberOfHeaps    : Uint4B

We can consult the heap header structure with dt _HEAP to find the 3 fields we need:

0:002> dt _HEAP
ntdll!_HEAP
	...
   +0x060 Signature        : Uint4B
   ...
   +0x09c VirtualAllocdBlocks : _LIST_ENTRY
   +0x0a4 SegmentList      : _LIST_ENTRY
  ...

For each heap in the ProcessHeaps list:

  • We'll determine the signature (to see if it's NT Heap or Segment heap): @@c++(((ntdll!_HEAP*)XXXXX)->Signature) (XXXXX = address of the heap)
  • The SegmentList: @@c++(&((ntdll!_HEAP*)XXXXX)->SegmentList) (XXXXX = address of the heap)
  • The VirtualAllocdBlocks list: @@c++(&((ntdll!_HEAP*)XXXXX)->VirtualAllocdBlocks) (XXXXX = address of the heap)

For each Segment:

We can get the Base and the number of pages by reading that from the Segment header.

0:002> dt _HEAP_SEGMENT
ntdll!_HEAP_SEGMENT
   +0x000 Entry            : _HEAP_ENTRY
   +0x008 SegmentSignature : Uint4B
   +0x00c SegmentFlags     : Uint4B
   +0x010 SegmentListEntry : _LIST_ENTRY
   +0x018 Heap             : Ptr32 _HEAP
   +0x01c BaseAddress      : Ptr32 Void
   +0x020 NumberOfPages    : Uint4B
   +0x024 FirstEntry       : Ptr32 _HEAP_ENTRY
   +0x028 LastValidEntry   : Ptr32 _HEAP_ENTRY
   +0x02c NumberOfUnCommittedPages : Uint4B
   +0x030 NumberOfUnCommittedRanges : Uint4B
   +0x034 SegmentAllocatorBackTraceIndex : Uint2B
   +0x036 Reserved         : Uint2B
   +0x038 UCRSegmentList   : _LIST_ENTRY

Suppose XXXXX is the address of the Segment. We can then get the needed info by accessing the following structure fields:

@@c++(((ntdll!_HEAP_SEGMENT*)XXXXX)->BaseAddress) and
@@c++(((ntdll!_HEAP_SEGMENT*)XXXXX)->NumberOfPages)

The end address is simply BaseAddress + (NumberOfPages x 0x1000).
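As a quick worked example of that calculation, here is the arithmetic in Python. The base address and page count are made-up values, and segment_range is just an illustrative helper name.

```python
PAGE_SIZE = 0x1000  # page size used in the formula above


def segment_range(base_address, number_of_pages):
    """Return (start, end) of a heap segment from its header fields."""
    return base_address, base_address + number_of_pages * PAGE_SIZE


start, end = segment_range(0x00A40000, 0xFF)   # hypothetical segment
print(f"{start:08x} - {end:08x}")
```

With a base of 0x00a40000 and 0xff pages, the segment ends at 0x00b3f000.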

For each VirtualAllocdBlock:

If you have a bit of experience with older Windows systems, then perhaps you remember that there used to be problems with the _HEAP_VIRTUAL_ALLOC_ENTRY symbol/structure.

So while we can find the ListHead of the VirtualAllocdBlocks list in the Heap Header, we can't use the _HEAP_VIRTUAL_ALLOC_ENTRY structure on older Windows versions. Fortunately, the VirtualAllocdBlocks list is just a simple doubly-linked list. From the ListHead, we can walk through the entire list. The CommitSize and ReserveSize are not encoded and sit at offsets 0x10 and 0x14 respectively from the start of the VirtualAllocdBlock header in 32bit processes, right before its regular Chunk header. (These offsets are correct for many versions, but may vary. When possible, rely on symbols instead of hardcoding.)

On newer Windows versions, you can see the offsets:

32bit:

0:002> dt _HEAP_VIRTUAL_ALLOC_ENTRY
ntdll!_HEAP_VIRTUAL_ALLOC_ENTRY
   +0x000 Entry            : _LIST_ENTRY
   +0x008 ExtraStuff       : _HEAP_ENTRY_EXTRA
   +0x010 CommitSize       : Uint4B
   +0x014 ReserveSize      : Uint4B
   +0x018 BusyBlock        : _HEAP_ENTRY

64bit:

0:019> dt _HEAP_VIRTUAL_ALLOC_ENTRY
ntdll!_HEAP_VIRTUAL_ALLOC_ENTRY
   +0x000 Entry            : _LIST_ENTRY
   +0x010 ExtraStuff       : _HEAP_ENTRY_EXTRA
   +0x020 CommitSize       : Uint8B
   +0x028 ReserveSize      : Uint8B
   +0x030 BusyBlock        : _HEAP_ENTRY
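To show why symbol-based access beats hardcoded offsets, the two layouts above can be modeled with Python's ctypes module. This is only an illustration: the class names are mine, ExtraStuff is modeled as an opaque byte array (8 bytes on 32bit, 16 bytes on 64bit, as implied by the offsets in the dt output), BusyBlock is left out, and the field values in the sample buffer are invented.

```python
import ctypes


class ListEntry32(ctypes.LittleEndianStructure):
    _fields_ = [("Flink", ctypes.c_uint32), ("Blink", ctypes.c_uint32)]


class ListEntry64(ctypes.LittleEndianStructure):
    _fields_ = [("Flink", ctypes.c_uint64), ("Blink", ctypes.c_uint64)]


class VirtualAllocEntry32(ctypes.LittleEndianStructure):
    _fields_ = [
        ("Entry", ListEntry32),
        ("ExtraStuff", ctypes.c_ubyte * 8),    # opaque in this sketch
        ("CommitSize", ctypes.c_uint32),
        ("ReserveSize", ctypes.c_uint32),
    ]


class VirtualAllocEntry64(ctypes.LittleEndianStructure):
    _fields_ = [
        ("Entry", ListEntry64),
        ("ExtraStuff", ctypes.c_ubyte * 16),   # opaque in this sketch
        ("CommitSize", ctypes.c_uint64),
        ("ReserveSize", ctypes.c_uint64),
    ]


# The field names stay the same, the offsets do not.
for layout in (VirtualAllocEntry32, VirtualAllocEntry64):
    print(layout.__name__,
          hex(layout.CommitSize.offset),
          hex(layout.ReserveSize.offset))

# Parse a simulated 32bit entry by field name instead of by offset.
raw32 = (
    (0x00A50000).to_bytes(4, "little")     # Entry.Flink (hypothetical)
    + (0x00A400A0).to_bytes(4, "little")   # Entry.Blink (hypothetical)
    + bytes(8)                             # ExtraStuff
    + (0x21000).to_bytes(4, "little")      # CommitSize (hypothetical)
    + (0x100000).to_bytes(4, "little")     # ReserveSize (hypothetical)
)
entry = VirtualAllocEntry32.from_buffer_copy(raw32)
print(hex(entry.CommitSize), hex(entry.ReserveSize))
```

Code that asks for CommitSize by name keeps working on both architectures; code that reads "offset 0x10" silently reads the wrong field on 64bit.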

It might all look a bit complicated at first, but once you understand what datastructures you need to access and what fields are at your disposal, it's actually not that difficult.

You can find the script in the Github repository, as corelan_heap_seg_va.txt.

Example 3: List all modules, addresses and some properties

I'll let you take a look at the corelan_modules.txt script by yourself.
Try to figure out which data structures it uses and how it accesses them.

Hints:
You can access the list of loaded modules by accessing data structures in the PEB.
For each module, we can access its PE Header at specific offsets.

Enjoy!

Running WinDBG Command Files

In the previous chapter, I started using script files, also known as Command Files.

There are a few ways to tell WinDBG to open a file from disk and run the commands inside:

Command: Description

  • < Filename: Reads commands from the file and executes them as if typed line-by-line. Each line is a separate command. Filename parsing is strict (wrap the filename in quotes if it contains a space). Supports $$ comments at the start of a line to annotate your code. This is good for simple scripts, for example lists of breakpoints. Multiline constructs may be fragile and often need to be written on one line.
  • $< Filename: Same as < Filename, but allows more flexible filename parsing. Still executes line-by-line. You can use $$ at the start of a line to annotate your code.
  • $>< Filename: Reads the file and replaces line breaks with semicolons (;). Everything becomes one long line of commands. You can put $$ at the start of a line to add comments. They are terminated by ;, so they behave as inline comments rather than full-line comments.
  • $$< Filename: Same as $<, but prevents unwanted alias/macro expansion. A bit safer in complex scripting environments. Still line-based and supports $$ comments normally.
  • $$>< Filename: Combination of $$< and $><. Prevents alias expansion and converts newlines to semicolons. You can use $$ to add comments in the file. They behave as inline comments (they get terminated by a ;), not as true line-based comments.
  • $$>a< Filename [args]: Executes the script with arguments. Inside the script, refer to them as ${$arg1}, ${$arg2}, and so on. Uses the same parsing behavior as $$><.

I would recommend using $$>< Filename.
It provides predictable execution and avoids alias expansion issues.

You can still use $$ to document your code, but keep in mind that comments are terminated by ;, so they behave as inline comments rather than true line-based comments.
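The "newlines become semicolons" behavior is easy to model, and the model shows why a semicolon inside a comment is dangerous. The Python sketch below is a simplification: condense is a hypothetical helper, and dropping empty lines is my own shortcut, not something I'm claiming WinDBG does.

```python
def condense(script_text):
    """Model of $>< / $$><: join all lines into one ;-separated command line."""
    lines = [line.strip() for line in script_text.splitlines()]
    return ";".join(line for line in lines if line)


SCRIPT = """$$ show registers
r
$$ show the stack
k
"""

print(condense(SCRIPT))

# A semicolon inside a comment ends the comment early.
RISKY = """$$ step 1; dump registers
r
"""
print(condense(RISKY).split(";"))
```

In the second case, the text after the semicolon ("dump registers") is no longer part of the comment, so the debugger would try to interpret it as a command.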

Multiline constructs may be a bit fragile with WinDBG scripts, especially .for, .if, alias handling, etc.
Try, as much as possible, to keep things on one line.

If you'd like to add comments, you can also use * at the start of a line. This is often more reliable for full-line comments. Test and see what mode works best for you and your script.

PyKD

PyKD Basics

If you're a bit familiar with running mona.py on WinDBG, you most likely know that I have been using the PyKD extension and a library called windbglib to make mona.py work.

PyKD is a module for the CPython interpreter. PyKD itself is written in C++ and uses Boost.Python to export functions and classes to Python.
It is also a WinDBG extension that provides the ability to run Python scripts and interact with the process and the debugger through an API. We can load the pykd library in our Python script, and that combination allows me to interact with the process that is being debugged.

The original version of PyKD is no longer maintained. pip still has versions, but only up to (and including) Python 3.9.
There are some repo forks on Github as well. The latest version I could find was pykd 0.3.4.15. It's not perfect, but I can live with that.

If you figure out how to build pykd against newer Python versions, let me know!

My original installation procedure to make mona.py work inside WinDBG was based on the use of a relatively old version of PyKD (0.2.0.29), a 32bit version of Python 2.7.14 or higher, the windbglib.py library and a 32bit debugging environment.
The pre-compiled PyKD version that I used, is called pykd.pyd. (Don't be fooled by the file extension, it's really just a .dll.) The windbglib github repo contains a copy of that binary (v0.2.0.29). That's the version I have been using for years.

Going forward, and in line with my ambition to make mona.py compatible with Python3 and do more useful things in 64bit processes as well (stay tuned - things are brewing), I'll explain today how we can use a more up-to-date approach to using PyKD in modern WinDBG versions.

The goal is to set up an environment that allows us to run Python3 code, and the most recent version of PyKD, in both a 32Bit and 64bit debugger environment.

Clean up (if needed)

First things first.

If you have been using mona.py inside WinDBG and your system is running an old version of pykd.pyd, it's a good idea to clean up first.

Remove all copies of pykd.pyd from your system, more specifically from the following folders and subfolders:

  • Classic WinDBG Program Folder (e.g. C:\Program Files (x86)\Windows Kits\10\Debuggers): check the x86/winext and x64/winext folders
  • %LOCALAPPDATA%\DBG\EngineExtensions
  • %LOCALAPPDATA%\DBG\EngineExtensions32
  • %USERPROFILE%\AppData\dbg\UserExtensions

You don't need to delete mona.py or windbglib.py. And if you still use Immunity Debugger, feel free to keep its copy of mona.py in place as well.
Just make sure all pykd.pyd files are gone. Please check your entire hard drive if needed, just to avoid that something gets picked up from somewhere later on.

Finally, make sure you do NOT have any Python versions installed via Microsoft Store, nor via the Python Install Manager.

On recent Windows systems, we can use winget to see if everything looks good.
Open a command prompt and run winget list python

The output should only list Python versions that have winget as their source.

Good. Let's build up a new environment from scratch now.

Python3 and PyKD

What follows is a detailed step-by-step procedure on how to set up your system to run a modern version of PyKD in WinDBG/WinDBGX, using the pykd-ext bootstrapper.

Automated installer: CorelanPyKDInstall.ps1

If you prefer to use an automated installer, feel free to use the CorelanPyKDInstall.ps1 script.
You can grab a copy of the script from my CorelanTraining Github repository.
In order to keep your system clean, the script will remove existing pykd.pyd files inside WinDBG folders, before installing the new pykd components.

Get yourself an administrator powershell prompt. Run Set-ExecutionPolicy RemoteSigned and press "Y" when prompted.
Then, run ./CorelanPyKDInstall.ps1. If powershell still refuses to run the script, try Set-ExecutionPolicy Unrestricted and then try again.

The script requires winget, so make sure you're using a recent / up-to-date version of Windows.

If everything went well, you can now skip straight to the section on using pykd.
(Don't worry if you're seeing warnings about the VC 2010 Runtime - the script will report that the installation has failed if the packages were already installed)

Manual install

If you prefer to get your hands dirty and do the heavy lifting all by yourself (or if you are using a system without winget), these are the steps:

  1. Install Python 3.9 (32bit and 64bit)
  2. Install PyKD for each Python version
  3. Install PyKD-ext

Manual install: Python 3.9.13

We'll need Python 3.9.13 specifically, and we're going to install both 32bit and 64bit versions.
If you have not installed those versions yet, download the standalone installer from the Python.org website:

Python 3.9.13 32bit

Launch the 32Bit installer
Click "Install NOW"
Leave "Install Launcher for all users" enabled
Choose the default installation. Do NOT tick "Add Python 3.x to PATH"

Python 3.9.13 64bit

Launch the 64bit installer
Click "Install NOW"
Again, choose the default installation. Do NOT tick "Add Python 3.x to PATH"

Open an admin command prompt and run py --list. You should see both 3.9 versions in the list. If you already had other versions installed (like I did), you might see them in the list as well.

C:\>py --list
Installed Pythons found by py Launcher for Windows
 -3.9-64 *
 -3.9-32
 -2.7-32

Also, please verify once again that you do not have any Python versions installed other than the ones with source winget.
You'll see a Python Launcher as well. That's what we need.

C:\>winget list python
Name                   Id                Version   Available Source
-------------------------------------------------------------------
Python Launcher        Python.Launcher   < 3.9.8   3.13.5    winget
Python 2.7.18          Python.Python.2   2.7.18150           winget
Python 3.9.13 (32-bit) Python.Python.3.9 3.9.13              winget
Python 3.9.13 (64-bit) Python.Python.3.9 3.9.13              winget

If you don't have winget, check the installed Apps and confirm that the Python versions are the ones you have installed manually.
When in doubt, remove Python versions from your apps and reinstall the ones you need by running the standalone installers again.

Next, check for updates to pip for both Python 3.9 versions.
As we have the ability to invoke a specific python version using the Python launcher py, we can run the following 2 commands to update the corresponding pip versions:

py -3.9-32 -m pip install --upgrade pip
py -3.9-64 -m pip install --upgrade pip

The Py launcher will default to the most recent version of Python that is installed.
As explained, you can use py -<version> (for example py -3.9-32) to run a specific version of Python.
You can also overrule the automatic default version selection by either creating an .ini file or using an environment variable.
You can find more information on customizing the python launcher here.

Manual Install: PyKD

We can now install the PyKD library for both Python 3.9.13 (32 and 64) versions.

32bit:
From an administrator command prompt:

py -3.9-32 -m pip install pykd

This will install pykd inside %LOCALAPPDATA%\Programs\Python\Python39-32\Lib\site-packages\pykd
(on my system, that path becomes C:\Users\corel\AppData\Local\Programs\Python\Python39-32\Lib\site-packages\pykd)

From that folder, copy msdia140.dll into C:\Program Files (x86)\Common Files\Microsoft Shared\VC (create that folder first if needed)
Then, register the dll:

C:\>cd "C:\Program Files (x86)\Common Files\Microsoft Shared\VC"
C:\Program Files (x86)\Common Files\Microsoft Shared\VC>regsvr32 msdia140.dll

64bit:
From an administrator command prompt:

py -3.9-64 -m pip install pykd

In this case, we need to register msdia120.dll.
You'll find a copy of the file already inside C:\Program Files (x86)\Windows Kits\10\App Certification Kit\

From the admin command prompt, simply run

regsvr32 "C:\Program Files (x86)\Windows Kits\10\App Certification Kit\msdia120.dll"

If, at any time, you get errors about msdia100.dll - you should be able to get a copy by installing the MS VC++ Runtime 2010.
(download from here)

Good. At this point, you should already be able to open a python prompt and import the pykd library:

C:\>py -3.9
Python 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pykd
>>> pykd.dprintln("Hello world")
Hello world
>>> quit()

(You should be able to do the same thing from a 32bit Python prompt as well, via py -3.9-32.)

If the pip3 installations went well earlier on, we should already have a pykd.pyd file inside the Lib\site-packages\pykd folder(s) of both Python3.9 versions.
That's all we need.
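If you want to double-check that without browsing folders, a few lines of Python will tell you which bitness the interpreter has and whether it can see a pykd.pyd file. This is a sketch under a few assumptions: that pykd is installed as a package folder containing pykd.pyd (as described above), and that the helper names interpreter_bits and find_pykd_binary are mine. Save it under a name of your choice and run it with both py -3.9-32 and py -3.9-64.

```python
import importlib.util
import os
import struct
import sys


def interpreter_bits():
    """Return 32 or 64, based on the pointer size of this interpreter."""
    return struct.calcsize("P") * 8


def find_pykd_binary():
    """Return the path to pykd.pyd as seen by this interpreter, or None."""
    spec = importlib.util.find_spec("pykd")   # looks it up without importing it
    if spec is None:
        return None
    folders = list(spec.submodule_search_locations or [])
    if spec.origin and os.path.isfile(spec.origin):
        folders.append(os.path.dirname(spec.origin))
    for folder in folders:
        candidate = os.path.join(folder, "pykd.pyd")
        if os.path.isfile(candidate):
            return candidate
    return None


version = f"{sys.version_info.major}.{sys.version_info.minor}"
print(f"Python {version} ({interpreter_bits()}-bit): pykd.pyd = {find_pykd_binary()}")
```

If the script prints None for one of the two interpreters, repeat the pip install step for that specific version.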

We'll now go ahead and install the PyKD-ext bootstrapper.

Manual install: PyKD-Ext

The pykd-ext bootstrapper is the WinDBG extension that will run a specific (and selectable) Python version, and allows us to load the corresponding pykd.pyd version. It glues all components together. You can find the original repo here (last updated 5 years ago)

Strictly speaking, you may still be able to load pykd.pyd directly, just like what we did with mona v2. With recent versions of pykd, you may notice however that the arguments you're passing on the command line, may not get passed on correctly to your python script.

The solution is to use the pykd-ext bootstrapper.

You can find pre-compiled versions of pykd-ext, for both x86 and x64, in this github repository. (thank you @apl3b)

Both archives have a pykd.dll file inside the "Release" folder.
The idea is to put the x86 pykd.dll inside %LOCALAPPDATA%\DBG\EngineExtensions32,
and the x64 pykd.dll inside %LOCALAPPDATA%\DBG\EngineExtensions

Believe it or not, that should do the trick.

Using pykd (via pykd-ext) : WinDBG Classic

In WinDBG Classic, I can now run .load pykd or !load pykd.

0:000> .load pykd

(Note: unlike what we used to do with mona.py v2, I am not telling WinDBG to load pykd.pyd directly. If you have removed the old pykd.pyd file from your WinDBG Program Folder, running the .load pykd.pyd command should actually fail.
In fact, the only versions of pykd.pyd we should have are the ones stored inside the Python Lib\site-packages\pykd folders.)

We're actually going to invoke the pykd-ext bootstrapper instead.
(I.e. the pykd.dll file that we have placed inside the %LOCALAPPDATA%\DBG\EngineExtensions and %LOCALAPPDATA%\DBG\EngineExtensions32 folders).
As it is a dll, we don't have to specify the .dll extension.

We can now run a few interesting !pykd commands:

0:000> !pykd.help

usage:

!help
	print this text

!info
	list installed python interpreters

!select version
	change default version of a python interpreter

!py [version] [options] [file]
	run python script or REPL

	Version:
	-2           : use Python2
	-2.x         : use Python2.x
	-3           : use Python3
	-3.x         : use Python3.x

	Options:
	-g --global  : run code in the common namespace
	-l --local   : run code in the isolated namespace
	-m --module  : run module as the __main__ module ( see the python command line option -m )

	command samples:
	"!py"                          : run REPL
	"!py --local"                  : run REPL in the isolated namespace
	"!py -g script.py 10 "string"" : run a script file with an argument in the commom namespace
	"!py -m module_name" : run a named module as the __main__

!pip [version] [args]
	run pip package manager

	Version:
	-2           : use Python2
	-2.x         : use Python2.x
	-3           : use Python3
	-3.x         : use Python3.x

	pip command samples:
	"pip list"                   : show all installed packagies
	"pip install pykd"           : install pykd
	"pip install --upgrade pykd" : upgrade pykd to the latest version
	"pip show pykd"              : show info about pykd package

0:000> !pykd.info

pykd bootstrapper version: 2.0.0.24

Installed python:

Version:        Status:     Image:
------------------------------------------------------------------------------
  2.7 x86-32    Unloaded    C:\Python27\python27.dll
* 3.9 x86-32    Loaded      C:\Users\corel\AppData\Local\Programs\Python\Python39-32\python39.dll

As you can see in the output above, we now have the option to change python version, install packages, etc.
You can also see in the !pykd.info output that it is using the Python3 version we intended to use.

You can now run !py to get an interactive shell

0:000> !py
Python 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:24:45) [MSC v.1929 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> 

The WinDBG status field (on the left side, before the command line input) now says Input>.

Similar to what we've done previously at the Operating System command prompt, we can now enter python commands:

>>> print("hello world\n")
hello world
>>> 

We can now import the pykd library and use its API:

>>> import pykd
>>> print(pykd.dbgCommand("r"))
eax=00000000 ebx=00000000 ecx=41760000 edx=00000000 esi=008967a0 edi=0022b000
eip=77498218 esp=0064fa54 ebp=0064fa80 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
ntdll!LdrpDoDebuggerBreak+0x2b:
77498218 cc              int     3
>>> 

Type quit() to exit the interactive mode.

And of course, the goal is to run full-blown Python scripts (such as mona.py).

Let's create a basic script mini.py:

import pykd
print("hello world\n")

Save it inside the WinDBG application folder (C:\Program Files (x86)\Windows Kits\10\Debuggers\x86).
Open WinDBG, attach it to a process (or open an executable) and run the following commands at the WinDBG Command Prompt:

!load pykd
!py mini
hello world

As indicated above, you have the option to select a specific Python version. If you have Python2 installed and you insist on running that version, you could add the -2 switch:

0:000> !py -2
Python 2.7.18 (v2.7.18:8d21aa21f2, Apr 20 2020, 13:19:08) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> 

This works with scripts too, of course. In other words, if you'd like to run Mona 2 using pykd-ext, you'll have to !load pykd first, and then you can run !py -2 mona.py.
You may still have to install the pykd library first. From an admin command prompt:

c:
cd\Python27
python -m pip install --upgrade pip
python -m pip install pykd

Please note that !py will default to using the most recent Python version installed.
For instance, on one of my lab machines, I am running multiple versions of Python 2 and 3:

0:000> !pykd.info

pykd bootstrapper version: 2.0.0.24

Installed python:

Version:        Status:     Image:
------------------------------------------------------------------------------
  2.7 x86-32    Unloaded    C:\Python27\python27.dll
  3.9 x86-32    Unloaded    C:\Users\corelan\AppData\Local\Programs\Python\Python39-32\python39.dll
  3.11 x86-32   Unloaded    C:\Users\corelan\AppData\Local\Programs\Python\Python311-32\python311.dll
* 3.13 x86-32   Loaded      C:\Users\corelan\AppData\Local\Programs\Python\Python313-32\python313.dll

As PyKD is only compatible up to 3.9, I either have to run !py -3.9 every single time, or I can create an alias: as py !py -3.9
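
The version choice can also be derived from the !pykd.info table itself. The sketch below is a hypothetical helper (pick_python and list_pythons are not part of pykd): it parses the table text and returns the newest interpreter that does not exceed a given ceiling, 3.9 here, based on the compatibility limit mentioned above. Versions are compared as integer tuples, because a plain string comparison would rank "3.9" above "3.13".

```python
import re

# Sample taken from the !pykd.info output shown above.
PYKD_INFO = r"""
Version:        Status:     Image:
------------------------------------------------------------------------------
  2.7 x86-32    Unloaded    C:\Python27\python27.dll
  3.9 x86-32    Unloaded    C:\Users\corelan\AppData\Local\Programs\Python\Python39-32\python39.dll
  3.11 x86-32   Unloaded    C:\Users\corelan\AppData\Local\Programs\Python\Python311-32\python311.dll
* 3.13 x86-32   Loaded      C:\Users\corelan\AppData\Local\Programs\Python\Python313-32\python313.dll
"""

# One table row: optional '*', "major.minor", architecture, status, image path.
ROW = re.compile(r"^\*?\s*(\d+)\.(\d+)\s+(\S+)\s+(Loaded|Unloaded)\s+(.+)$")

def list_pythons(info_text):
    """Return [((major, minor), arch, image), ...] for each table row."""
    rows = []
    for line in info_text.splitlines():
        m = ROW.match(line.strip())
        if m:
            version = (int(m.group(1)), int(m.group(2)))
            rows.append((version, m.group(3), m.group(5).strip()))
    return rows

def pick_python(info_text, ceiling=(3, 9)):
    """Return the newest installed version that is <= ceiling, or None."""
    candidates = [row for row in list_pythons(info_text) if row[0] <= ceiling]
    return max(candidates, default=None)

best = pick_python(PYKD_INFO)
if best:
    # Print the !py invocation that selects this interpreter.
    print("!py -%d.%d" % best[0])
```

The printed command is the one you would then wrap in an alias or a launcher script.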

A windbg launcher script (for instance w.bat) may look like this:

set "WINDBG_CMD=windbg.exe -hd -c '!load pykd; as py !py -3.9' "
%WINDBG_CMD% %*

Multiple Python versions (on the same system)

A quick note on having multiple python versions installed on the same system.
In my experience, it might be a good idea to consider removing all references to python-related folders from your system / user PATH environment variable and to always run Python scripts using py instead of python or python3.
If you do need python to work, then add the path to the specific Python version you'd like to invoke to the path, but remove all of the others.

If you ever encounter the scenario where you run !load pykd in WinDBG, and WinDBG dies without any warning or error when you try to run the !py command, this may be caused by a mismatch between the python version you're running, and the place where it picks up its libraries.

You could create simple WinDBG launcher batch files for each Python/WinDBG combination.
For example, if I want to run Python2.7.18 in WinDBG x86, the script looks like this:

@echo off
REM ==========================================
REM Run WinDBG with optional arguments
REM Corelan Stack / Heap Training
REM www.corelan-training.com
REM ==========================================

set ORIGPATH=%PATH%
set PATH=C:\Python27;%PATH%
set PYTHONHOME=C:\Python27
set PYTHONPATH=C:\Python27\Lib

set "WINDBG_CMD=windbg.exe -hd -c '!load pykd; as !mona !py -2 mona.py'"

%WINDBG_CMD% %*

set PATH=%ORIGPATH%
SET PYTHONHOME=
SET PYTHONPATH=

Launching WinDBG x86 with, for example, Python3.8:

@echo off
REM ==========================================
REM Run WinDBG with optional arguments
REM Corelan Stack / Heap Training
REM www.corelan-training.com
REM ==========================================

set ORIGPATH=%PATH%
set PATH=%LOCALAPPDATA%\Programs\Python\Python38-32;%PATH%
set PYTHONHOME=%LOCALAPPDATA%\Programs\Python\Python38-32
set PYTHONPATH=%LOCALAPPDATA%\Programs\Python\Python38-32\Lib

REM Define base command (adjust path to wew file as needed)
set "WINDBG_CMD=windbg.exe -hd -c '!load pykd; as !mona !py -3 mona.py' "

%WINDBG_CMD% %*

set PATH=%ORIGPATH%
set PYTHONHOME=
set PYTHONPATH=

We're basically setting up an environment with the right things in the right places.
Of course you can now do this for any python version.
Just make sure you're running a python version that has the same architecture as the debugger, and that you're loading the corresponding pykd.dll file as well.

With this setup, you can simply run !mona at the WinDBG command prompt.

Still got an old Windows 7 machine?

You can get pykd-ext / pykd to work on Windows 7. Make sure it has at least SP1 (ideally fully up to date).
Begin by performing a default installation of Python 2.7.18.
Then, download a copy of this installer script and run it from an administrator command prompt.

This will install 32bit and 64 bit Python versions (2.7.18 and 3.9), pykd and pykd-ext. It will also put mona.py and windbglib.py in place.
The script will also install .Net Framework 4.8 and WinDBG, and it will create .bat files inside your windbg x86 and x64 folders:

  • wpy2.bat: runs WinDBG, activating the Python2 environment. Use !py -2 to run scripts
  • wpy3.bat: runs WinDBG, activating the Python3 environment. Use !py -3 to run scripts

The .bat files let you run WinDBG with the -c switch, so that pykd is already loaded for you and an alias to run mona is created.

Using pykd (via pykd-ext) : WinDBGX

The installation above allows us to simply load pykd in WinDBGX as well.
If you prefer to have only one copy of your scripts, you can store them in the WinDBG Classic program folder and then run windbgx.exe from a Command Prompt that is inside that folder. That allows you to run the exact same commands, without having to specify a path:

!load pykd
!py mini

Of course, you can always specify a path, put your files in a central location, and perhaps even create an alias.
For example:

as myscript !py c:\scripts\myscript.py

Writing PyKD scripts

Although the PyKD project is no longer maintained by its original author, that doesn't mean it's no longer useful.
Fortunately, the Internet Archive's Wayback Machine has a copy of the original documentation (user manual and API reference). It's in Russian, but you can always translate the content if needed.

The goal of this post is not to provide a detailed manual on how to write code that uses pykd. I just want to provide some ideas and examples that will hopefully inspire you to get started.

For starters, there's obviously mona.py and windbglib.py, but you can find some other resources as well, including:

Additionally, I have included some basic scripts in the debugging / scripts / pykd folder of my blogposts Github repository.

Some basic examples:

PEB & Modules

We can find the list of loaded modules in the PEB.
In WinDBG, we have the ability to run a "dump type" command to get the contents of the peb: dt _PEB @$peb. (If you're new to this, please check my previous post on WinDBG for more info on typed dump/display).
For instance (in a 64bit process):

0:000> dt _PEB @$peb
ntdll!_PEB
   +0x000 InheritedAddressSpace : 0 ''
   +0x001 ReadImageFileExecOptions : 0 ''
   +0x002 BeingDebugged    : 0x1 ''
   +0x003 BitField         : 0x4 ''
   +0x003 ImageUsesLargePages : 0y0
   +0x003 IsProtectedProcess : 0y0
   +0x003 IsImageDynamicallyRelocated : 0y1
   +0x003 SkipPatchingUser32Forwarders : 0y0
   +0x003 IsPackagedProcess : 0y0
   +0x003 IsAppContainer   : 0y0
   +0x003 IsProtectedProcessLight : 0y0
   +0x003 IsLongPathAwareProcess : 0y0
   +0x004 Padding0         : [4]  ""
   +0x008 Mutant           : 0xffffffff`ffffffff Void
   +0x010 ImageBaseAddress : 0x00007ff6`031a0000 Void
   +0x018 Ldr              : 0x00007ffe`fd3b2920 _PEB_LDR_DATA

In PyKD, we're going to do something similar, using the pykd.typedVar() function.
pykd has a function getCurrentProcess(), which returns the address of the PEB.

There is a function getCurrentProcessId() as well, but unfortunately that one does not seem to return the PID of the debuggee.
In fact, maybe I was missing something, but it turns out that it takes a bit of effort to get the PID. Anyway, I included a small routine to get the current PID in the script, in case you're curious.

Back to the use case.

This pykd statement provides access to the PEB:

peb = pykd.typedVar("ntdll!_PEB", pykd.getCurrentProcess())

Comparing dt with typedVar(), we can clearly see similarities. They both take a symbol name and an address.

This statement allows us to access the peb object and its fields/lists.

Let's say we're interested in listing the loaded modules and their start addresses.

The peb has a Ldr field, which contains the address of a _PEB_LDR_DATA structure. (It is the last field shown in the output above.)
That loader data structure contains several doubly linked list heads used to track loaded modules, more specifically:

  • InLoadOrderModuleList : List of modules in the order they were loaded in the process
  • InMemoryOrderModuleList : List of modules in the order they are placed in memory
  • InInitializationOrderModuleList : List of modules in the order in which the Windows loader calls the corresponding module’s entry point (DllMain)

More info here: http://undocumented.ntinternals.net/index.html?page=UserMode%2FStructures%2FPEB_LDR_DATA.html

Each list contains the same module entries, but linked through different LIST_ENTRY members inside each _LDR_DATA_TABLE_ENTRY, so the order differs depending on which list you walk.

PEB
 └──> PEB->Ldr
        └──> PEB_LDR_DATA
               └──> One of the LIST_ENTRY heads:
                      - InLoadOrderModuleList
                      - InMemoryOrderModuleList
                      - InInitializationOrderModuleList
                         └──> walk doubly linked list
                                └──> each node = LDR_DATA_TABLE_ENTRY

The idea is to start from a list head and follow the Flink pointers from one entry to the next until you reach the list head again.

With PyKD, that's as easy as doing this:

moduleLst = pykd.typedVarList(
    peb.Ldr.deref().InLoadOrderModuleList,
    "ntdll!_LDR_DATA_TABLE_ENTRY",
    "InLoadOrderLinks.Flink"
)

This dereferences PEB.Ldr to obtain the PEB_LDR_DATA structure, takes its InLoadOrderModuleList list head, and then asks PyKD to walk that list by treating each node as an ntdll!_LDR_DATA_TABLE_ENTRY linked through its InLoadOrderLinks field.
(This assumes the linked list is intact; corrupted lists may cause incomplete or invalid traversal.)
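
For readers who want to see what such a traversal amounts to, here is a minimal sketch of a manual Flink walk. It is not how PyKD implements typedVarList; walk_flinks and its read_ptr parameter are hypothetical names for this illustration. The reader function is injected so the logic can be exercised against a fake memory map. Inside the debugger you could presumably pass pykd.ptrPtr (assumption: it reads one pointer-sized value). The loop stops on a revisited node or after max_nodes entries, which covers the corrupted-list caveat.

```python
def walk_flinks(read_ptr, head, max_nodes=1024):
    """Follow Flink pointers from a LIST_ENTRY head until we are back at the head.

    read_ptr: callable(address) -> pointer-sized integer stored at address.
    head:     address of the list head (Flink is the first field of LIST_ENTRY).
    Returns the addresses of the LIST_ENTRY nodes, excluding the head itself.
    """
    nodes = []
    seen = {head}
    current = read_ptr(head)          # head.Flink -> first entry
    while current != head:
        if current in seen or len(nodes) >= max_nodes:
            # A cycle that never returns to the head, or an implausibly long
            # list: treat it as corruption and return what we have so far.
            break
        seen.add(current)
        nodes.append(current)
        current = read_ptr(current)   # entry.Flink -> next entry
    return nodes

# Fake memory for a dry run: head -> 0x2000 -> 0x3000 -> head
fake_memory = {0x1000: 0x2000, 0x2000: 0x3000, 0x3000: 0x1000}
entries = walk_flinks(fake_memory.__getitem__, 0x1000)
print([hex(e) for e in entries])
```

For InLoadOrderModuleList, each returned address is also the start of the _LDR_DATA_TABLE_ENTRY, because InLoadOrderLinks is its first field. For the other two lists, the offset of the corresponding links field has to be subtracted first.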

The pykd-modules.py script enumerates all 3 lists and prints the output. In my experience, walking InInitializationOrderLinks may not return all loaded modules, so use it with caution.

Of course, you can obtain all module properties by parsing header information and reading values from memory. I'll talk about how to read from memory in a moment.
PyKD, however, has a module class as well, which already does a lot of the heavy lifting for you.
Likewise, there is already functionality in pykd that will enumerate through the modules for you. (pykd.getModulesList())

Let's look at script pykd-module-obj.py to see what that looks like.

pykd.getModulesList() returns a list of module objects.
If you would like to get a module object for a certain file, you can create an instance of the module class using the module's name (which is not the same thing as the filename), or an already existing object (such as one that was returned via pykd.getModulesList()).

You may notice that pykd does not seem to always return the full path for a certain file. That's why I usually get the list of modules from the PEB myself (including all of its properties), and use my own module-type classes.

(The likely evolution for mona.py is to no longer rely on pykd.module)

Registers

The second use-case covers access to registers.

pykd offers a simple and straightforward way to read registers: pykd.reg(regname).
(Please note that pykd expects you to specify the register name in lowercase.)

Changing a register value can be done with the setReg function: pykd.setReg(registername, newvalue)

The pykd-regs.py script shows how to use both of these mechanisms.

Reading & writing memory

In its purest form, reading and writing bytes can be done via

  • pykd.loadBytes(location, size)
  • pykd.writeBytes(location, list_of_bytes_to_write)

The pykd-memory.py script shows how to use these 2 commands.
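
A small formatting helper makes the result of a byte read easier to inspect. The sketch below assumes the read returns a sequence of integers in the range 0-255 (which is how I understand pykd.loadBytes to behave). hexdump is a hypothetical helper, not a pykd function, and the sample bytes are made up.

```python
def hexdump(data, base=0, width=16):
    """Format a sequence of byte values as 'address  hex bytes  ascii' lines."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = list(data[offset:offset + width])
        hex_part = " ".join("%02x" % b for b in chunk)
        # Printable ASCII stays as-is, everything else becomes a dot.
        ascii_part = "".join(chr(b) if 0x20 <= b < 0x7f else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on short rows.
        lines.append("%08x  %-*s  %s" % (base + offset, width * 3 - 1, hex_part, ascii_part))
    return "\n".join(lines)

# Made-up sample: "MZ" followed by a few bytes, as if read from a module base.
sample = [0x4d, 0x5a, 0x90, 0x00, 0x03, 0x00, 0x00, 0x00]
print(hexdump(sample, base=0x00400000))
```

Inside the debugger, the same helper could be fed directly, for example hexdump(pykd.loadBytes(addr, 0x40), base=addr).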

PyKD has a few variations as well. Reading strings (ansi or wide), for instance, can be done using the following functions:

  • loadCStr(location) : read a string from the location
  • loadWStr(location) : read a wide string from the location

Those 2 will read memory until they reach the corresponding terminator. (single null byte for a string, double null byte for a wide string).
If you're not really accessing a string that is properly terminated, you may be causing an uncontrolled read, leading to some sort of read access violation.

You can always use loadChars() and loadWChars() as well. These 2 functions take an address and the number of characters to read. That way you can avoid reading more than what you intended to.

Combining a few concepts, we could build a little routine that reads a string from memory:

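import pykd

def readString(location):
	# Only attempt a read if the address is mapped in the target
	if not pykd.isValid(location):
		return ""
	try:
		# Read up to the first null terminator
		return pykd.loadCStr(location)
	except pykd.MemoryException:
		# The terminator was not reached; fall back to a bounded read
		try:
			return pykd.loadChars(location, 0x100)
		except Exception:
			return ""
	except Exception:
		return ""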

Executing WinDBG commands

The next technique I would like to demonstrate today, is executing a WinDBG command and parsing the output.

It might feel a bit like cheating - after all, PyKD has a lot of features.
But why reinvent the wheel if you can just run a command and parse the output, right ? 🙂
It comes with a performance hit: you're causing some I/O that wouldn't be there if you accessed memory directly. Additionally, you're relying on the output format of WinDBG commands never changing.
But ok, it's a convenient way to blend the best of both worlds.

This is how it's done:

cmd2run = "u eip L 0x20"
output = pykd.dbgCommand(cmd2run)

You can now split the output on newline '\n' and access the output of the WinDBG command for parsing or display.
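
As an illustration of the parsing step, the sketch below turns the text of the r command into a dictionary. It runs on a captured sample (the 32-bit output shown earlier), so it can be tried outside the debugger. parse_registers is a hypothetical helper. Inside WinDBG, the text would come from pykd.dbgCommand("r").

```python
import re

# Captured output of the "r" command (32-bit target), as shown earlier.
R_OUTPUT = """eax=00000000 ebx=00000000 ecx=41760000 edx=00000000 esi=008967a0 edi=0022b000
eip=77498218 esp=0064fa54 ebp=0064fa80 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
ntdll!LdrpDoDebuggerBreak+0x2b:
77498218 cc              int     3
"""

# name=hexvalue pairs; 64-bit values may contain a backtick separator.
PAIR = re.compile(r"\b([a-z][a-z0-9]*)=([0-9a-fA-F`]+)")

def parse_registers(text):
    """Return {register_name: integer_value} for every name=value pair found."""
    regs = {}
    for line in text.split("\n"):
        for name, value in PAIR.findall(line):
            regs[name] = int(value.replace("`", ""), 16)
    return regs

regs = parse_registers(R_OUTPUT)
print("eip = 0x%08x, esp = 0x%08x" % (regs["eip"], regs["esp"]))
```

For plain register access, pykd.reg() is the more direct route; the point here is only to show the run-and-parse pattern on familiar output.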

Assembling / Disassembling

Assembling instructions to bytecode, and disassembling bytecode to instructions is possible with pykd... but you'll see it's a bit cumbersome, as it comes with a bit of collateral damage.

To assemble (i.e. convert an assembly instruction into the corresponding opcode), you'll have to pick a writeable "anchor" address first.
PyKD will 'assemble' the instruction to that location.

d = pykd.disasm(address)
asm_result = d.asm(instr)

Unfortunately, that means pykd has now overwritten a few bytes of memory at the anchor address.
Be careful when using this in a live target, as modifying instructions may affect execution if not restored correctly.
That's why I'll have my script read 20 bytes from the anchor location first, then let pykd do the assembling, and then we have to restore the original bytes.
(20 bytes may be more than needed, but I want to be sure I'm reading enough bytes to accommodate any instruction length; a single x86/x64 instruction is at most 15 bytes.)

Additionally, and interestingly enough, the .asm() call does not return the opcode. It just positions itself to the next instruction.
In other words, the output of .asm() is not that relevant if you're just trying to get the opcode for the instruction you're trying to assemble.
In order to get the opcode, you have to access the memory directly at the anchor address.

Of course, as you don't know how long the opcode actually is, it's going to be challenging to decide how many bytes to read. We can't really rely on "what has changed" either compared to the original bytes, because if the new instruction matches with the original one, we still won't have the length of the opcode.

Luckily, the .disasm() routine allows us to get the instruction at a given address.
The output contains the opcode, allowing us to parse and extract it.
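
A sketch of that parsing step, assuming the instruction text uses the usual WinDBG layout of address, opcode bytes, then mnemonic and operands (as in the last line of the r output shown earlier). parse_disasm_line is a hypothetical helper, not part of pykd.

```python
import re

# Address, hex opcode bytes, then the instruction text.
LINE = re.compile(r"^\s*([0-9a-fA-F`]+)\s+([0-9a-fA-F]+)\s+(.+?)\s*$")

def parse_disasm_line(line):
    """Split one disassembly line into (address, opcode_bytes, instruction)."""
    m = LINE.match(line)
    if not m:
        raise ValueError("unrecognized disassembly line: %r" % line)
    # 64-bit addresses are printed with a backtick in the middle.
    address = int(m.group(1).replace("`", ""), 16)
    opcode = bytes.fromhex(m.group(2))
    # Collapse the column padding between mnemonic and operands.
    instruction = " ".join(m.group(3).split())
    return address, opcode, instruction

addr, opcode, instr = parse_disasm_line("77498218 cc              int     3")
print(hex(addr), opcode.hex(), instr, "length:", len(opcode))
```

The length of the returned bytes object answers the "how many bytes to read" question without comparing memory before and after.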

If we want to disassemble, we have to do something similar:

  • Preserve the original contents
  • Write the opcode we want to disassemble to an anchor address
  • Ask pykd to return the instruction
  • Restore the original bytes
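
The preserve/write/restore steps lend themselves to a context manager, so the original bytes are put back even if something fails in between. This is a sketch with hypothetical names (patched_bytes, read_bytes, write_bytes); the memory accessors are injected so that it can be tested against a bytearray. In the debugger, they would presumably be pykd.loadBytes and pykd.writeBytes.

```python
from contextlib import contextmanager

@contextmanager
def patched_bytes(read_bytes, write_bytes, address, new_bytes):
    """Temporarily write new_bytes at address, restoring the original bytes on exit.

    read_bytes(address, size) -> list of ints
    write_bytes(address, list_of_ints)
    """
    new_bytes = list(new_bytes)
    original = list(read_bytes(address, len(new_bytes)))   # 1. preserve
    write_bytes(address, new_bytes)                         # 2. write
    try:
        yield original                                      # 3. caller does its work
    finally:
        write_bytes(address, original)                      # 4. restore, even on error

# Dry run against a fake 16-byte memory region starting at address 0.
memory = bytearray(b"\x90" * 16)

def fake_read(address, size):
    return list(memory[address:address + size])

def fake_write(address, values):
    memory[address:address + len(values)] = bytes(values)

with patched_bytes(fake_read, fake_write, 4, [0xcc, 0xc3]):
    during = bytes(memory[4:6])     # the patch is visible inside the block

print(during.hex(), bytes(memory[4:6]).hex())
```

This version saves exactly as many bytes as it writes, which fits the disassembly case where the opcode length is known up front. For assembling, where the length is not known in advance, the saved range would need to be the larger fixed size discussed earlier.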

Take a look at the pykd-asm-disam.py script, which implements both concepts.

Outro

That’s it for part 1.

We’ve only scratched the surface of what’s possible when you stop treating WinDBG as a passive tool and start using it as something you can shape, script, and control.
From startup automation to event-driven breakpoints and Python integration, you now have the building blocks to create your own debugging workflows.

In part 2, we’ll go further down the rabbit hole:
expect a closer look at the Data Model, NatVis, JavaScript providers, and extensions, and how they can take automation and introspection to a whole new level.

Also... if you’ve been following along closely, you probably noticed a few hints already:
there’s something cooking around mona.py 👀
Stay tuned — some long-awaited updates are on the horizon.

If you got value from this post, consider subscribing so you don’t miss what’s coming next.
We plan on dropping new content regularly, and subscribers are always the first to know.

And of course — feel free to follow me on social media to stay in the loop and see what I’m working on behind the scenes.

Thanks for reading 🙏


I hope you found this useful 🙏🏻 🤗

© Corelan Consulting BV. All rights reserved. The contents of this page may not be reproduced, redistributed, or republished, in whole or in part, for commercial or non-commercial purposes without prior written permission from Corelan Consulting BV. See our Terms of Use & Privacy Policy (https://www.corelan.be/index.php/legal) for more details.


