Welcome back!
I am a big fan of automation. In a way, automation is humanity’s most polite way of admitting:
“I understand this problem deeply enough to never want to think about it again.”
In my previous article on debugging, we covered the fundamentals of using WinDBG and WinDBGX.
That gave us the baseline needed to actually use the debugger.
Today, we take it a step further.
In addition to manually driving the debugger, we’re going to explore how to make it work for us — through automation, instrumentation, and scripting.
To keep the content digestible, I'm going to split the topic into 2 parts.
In part 1, we're going to look at a first set of possibilities:
In a second part, we'll look at a few other topics, including:
I'm using a fully up-to-date x64 Windows 11 machine, and unless specified otherwise, all techniques discussed in this article will work on both WinDBG Classic and WinDBGX. Please check out the post on WinDBG Fundamentals for information on how to install WinDBG Classic & WinDBGX.
First things first. Let's see what we can do with WinDBG's -c startup flag.
Our journey begins right when WinDBG starts.
WinDBG usually gets control at an initial break before the target application really runs. Or, when attaching to an already running process, it will pause after attaching itself. In any case, when you connect a debugger to a process, it initially ends up in a stopped state, due to a "break" event. This is a normal part of the debugger lifecycle.
WinDBG's -c command-line argument is the first and simplest step into automation. It allows you to execute debugger commands automatically when WinDBG hits that initial break.
As explained, by default this is when the debugger either launches a process or attaches to one. In both cases, the debugger will end up paused, and that's the moment when the commands specified after -c get executed.
This allows you to deterministically prepare and configure your debugging environment right from the start. For example, you can:
In short, -c lets you bring the debugger into a known, ready-to-use state without manual interaction.
If you want the debugger to continue execution after running these commands, you can simply end your command sequence with ;g.
Example:
windbg.exe -c ".load pykd;g"
This will load the PyKD extension and immediately resume execution after the initial break.
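If you launch the debugger from a script, the same command line can be assembled programmatically. The snippet below is a minimal sketch: the helper name build_windbg_args and the target path are hypothetical, and it assumes windbg.exe can be found in your PATH. Only the -c flag and the trailing g come from the text above.

```python
import subprocess


def build_windbg_args(target, commands, resume=True, debugger="windbg.exe"):
    """Build an argument list that runs `commands` at the initial break."""
    steps = list(commands)
    if resume:
        # Ending the sequence with g resumes execution after the setup commands.
        steps.append("g")
    return [debugger, "-c", ";".join(steps), target]


# Hypothetical target path, only used to show the resulting command line.
example = build_windbg_args(r"C:\tmp\target.exe", [".load pykd"])
print(subprocess.list2cmdline(example))
```

Passing the list to subprocess.Popen would start the debugger. Keeping the commands in a Python list makes it easy to add or remove setup steps per target.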
A powerful variation is to combine -c with the -g flag:
-c "windbg cmd1;windbg cmd2" : sets up the command(s) you would like to run.
-g : tells WinDBG to ignore the initial breakpoint exception and continue execution. It will look like WinDBG 'skips' the initial break.
That changes the behavior significantly. Instead of stopping at startup, the application runs immediately. When the debugger does break later, it's usually for a meaningful reason - such as an access violation, memory corruption, a second-chance exception, etc.
This combination allows you to prepare the debugger up front and then automatically run analysis when a real break occurs during execution.
In practice, this means:
I typically combine this with the -xd sov and -xi eh startup flags:
These flags modify how the debugger reacts to specific exceptions (first-chance vs second-chance), effectively reducing noise from expected exceptions. We'll talk more about events & exceptions in the next chapter.
Applications sometimes throw exceptions (SOV = Stack Overflow, EH = C++ EH exceptions) that may be handled perfectly fine by the OS/application. In other words, although the exception causes the debugger to break, these are often expected or handled exceptions, not indicative of a real issue. If, after "ignoring" those exceptions, something breaks after all, you're still going to see it.
With break-oriented automation, the commands you execute are no longer about setup—they’re about analysis. Example actions may include:
If you combine this with logging (e.g. using -logo path/to/logfile), you effectively automate crash triage. Every meaningful break results in a structured diagnostic output written to a log file, ready for later analysis.
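To make this concrete, here is a rough sketch of how such an unattended triage run could be put together from Python. The function name, the log path and the choice of analysis commands (.lastevent, r, k) are my own assumptions for illustration. The flags themselves (-g, -xd, -xi, -logo, -c) are the ones discussed above.

```python
def build_triage_args(target, log_path, on_break, debugger="windbg.exe"):
    """Argument list for a run that logs diagnostics at the first real break."""
    return [
        debugger,
        "-g",               # skip the initial breakpoint
        "-xd", "sov",       # no first-chance break on stack overflows
        "-xi", "eh",        # ignore C++ EH exceptions
        "-logo", log_path,  # write the debugger output to a log file
        "-c", ";".join(on_break),
        target,
    ]


# Illustrative analysis commands; pick whatever fits your triage needs.
args = build_triage_args(
    r"C:\tmp\target.exe",
    r"C:\logs\target_run.log",
    [".lastevent", "r", "k"],
)
print(args)
```

Because -g skips the initial break, the commands passed via -c only run when the debugger stops for another reason, which is what turns them into triage commands rather than setup commands.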
Finally, if you're fully automating and scripting the end-to-end execution of an application (for instance during a fuzzing initiative), you also may want to use the following 2 extra startup flags to streamline everything and avoid any user interaction.
The same -c startup flag is available in WinDBGX as well. Additionally, WinDBGX has a "Startup" setting, allowing you to run commands each time you start a debugging session. Open WinDBGX, click "Settings" and open "Debugging settings". Scroll down to the bottom, you'll find the "Startup" section:
If you have specified commands with -c as well, you'll notice that those will be executed first (before the 'Startup commands').
In the context of WinDBG, events and exceptions are both notifications from the debuggee (the process being debugged) to the debugger, but they represent different categories of situations and are handled differently.
An event is any noteworthy occurrence during the execution of a process that the debugger is informed about. This includes things like process creation and exit, thread creation and exit, module (DLL) load and unload, and breakpoint hits. Events are part of the normal lifecycle and behavior of a program. They are expected, structured, and generally not indicative of something going wrong.
An exception is a specific type of event that indicates an abnormal condition or disruption in normal execution flow. Exceptions are typically generated by the CPU or the operating system when something unusual happens, such as accessing invalid memory (access violation), executing an illegal instruction, dividing by zero, or triggering a breakpoint instruction. Exceptions may or may not be handled by the application itself. If they are not handled, they can lead to program termination.
So the key difference is that all exceptions are events, but not all events are exceptions. Events describe what is happening, while exceptions describe something going wrong or out of the ordinary in execution.
As mentioned above, a “breakpoint” can show up in two different ways, and they map to different underlying mechanisms.
When I listed breakpoint hits under events, I was referring to debugger-managed breakpoints. These are breakpoints you explicitly set with commands like bp, bu, or ba. When one of those triggers, the debugger gets a debug event (specifically an EXCEPTION_DEBUG_EVENT internally, but treated as a controlled/debugger event). From your perspective, this is a normal, expected event that you asked for.
When I mentioned the breakpoint instruction under exceptions, I was referring to the CPU instruction int 3 (opcode 0xCC). When the CPU executes this instruction, it raises a breakpoint exception (STATUS_BREAKPOINT). This is a real exception generated by the processor, just like an access violation or divide-by-zero.
Of course, and that's perhaps what makes this a bit confusing: when setting breakpoints, the debugger will use the int 3 instruction to do so...
So the distinction is this:
A debugger breakpoint (bp/bu/ba) is something the debugger sets up. It may be implemented by temporarily patching the code with an int 3 instruction, but conceptually it is a controlled debugger event. You asked for it, and WinDBG treats it as part of normal debugging flow.
A breakpoint exception (int 3) is something the program executes. It may come from:
From the OS point of view, both cases generate an exception (STATUS_BREAKPOINT). The difference is intent and control:
This is why you can do things like:
sxd bpe
and suddenly your manually set breakpoints appear to be “ignored” or behave differently, because under the hood they are still breakpoint exceptions.
So there is no real contradiction, just two layers:
That distinction is important for automation, because you can choose whether to hook the high-level debugger behavior (breakpoints you set) or the low-level exception mechanism (all breakpoint exceptions, regardless of origin).
WinDBG allows you to control how it reacts to both events and exceptions. This is where automation becomes powerful. Instead of manually responding to each situation, you can configure the debugger to break, ignore, log, or execute commands automatically when specific events or exceptions occur.
For exceptions, WinDBG uses the concept of first chance and second chance. A first chance exception is the initial notification that an exception has occurred. The application is given a chance to handle it. A second chance exception occurs if the application does not handle the exception, and at that point the debugger typically breaks because the program is about to crash.
You can configure how WinDBG responds using commands like sxe (break on first chance), sxd (do not break on first chance, but break on second chance), sxn (print a notification without breaking), and sxi (ignore the event silently).
In WinDBG Classic, you can access the settings via "Debug" - "Event Filters". (You'll need to be connected to a process to access the options.)
For each event/exception, you can define how WinDBG needs to respond and if you'd like to execute commands when something happens. The GUI isn't great, and the events & exceptions are all grouped together.
WinDBGX has an improved GUI, accessible via "Settings" - "Events & exceptions"
While Events & Exceptions are now listed separately, the GUI no longer seems to offer an easy way to link a command to a certain event or exception.
Of course, we can control behavior from the command line, which is probably what you're after if you're trying to automate the automation anyway 🙂
These are various commands to manage how events & exceptions are handled by WinDBG. (You'll need to combine them with an event or exception type, I'll list those in a table later in this post)
For example, if you want to break immediately on access violations, you can use:
sxe av
If you want to skip first chance access violations and only break if they are unhandled (second chance), you can use:
sxd av
We can attach commands to events so that when they occur, WinDBG executes predefined debugger commands automatically. This allows you to build event-driven automation.
For example, suppose you want to log register state every time an access violation occurs. You could do something like:
sxe -c ".printf \"Access violation at %p\\n\", @$ip; r" av
Now, whenever an access violation happens, WinDBG will automatically print the instruction pointer and dump the registers.
Events such as module loads can also be hooked. For example, to run commands whenever a DLL is loaded, you can use:
sxe -c ".printf \"Loaded module\\n\"; lm" ld
This tells WinDBG to execute the command string whenever a load DLL event occurs.
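The escaping in those command strings (\" for quotes, \\n for a newline) is easy to get wrong when you write many of them. Below is a small sketch of a generator that takes care of it. The helper names are hypothetical, and the list of accepted commands is limited to the sx variants I'm sure about.

```python
VALID_SX = {"sxe", "sxd", "sxn", "sxi"}


def escape_for_sx(command):
    """Escape a debugger command so it can sit inside the quoted -c string."""
    # Backslashes first, then quotes, so the added backslashes are not doubled again.
    return command.replace("\\", "\\\\").replace('"', '\\"')


def event_filter(mode, event_code, command=None):
    """Return an sx* statement, optionally with a command attached via -c."""
    if mode not in VALID_SX:
        raise ValueError(f"unknown filter command: {mode}")
    if command is None:
        return f"{mode} {event_code}"
    return f'{mode} -c "{escape_for_sx(command)}" {event_code}'


# Write the command the way you would type it at the prompt.
print(event_filter("sxe", "av", r'.printf "Access violation at %p\n", @$ip; r'))
print(event_filter("sxe", "ld", r'.printf "Loaded module\n"; lm'))
```

The two print statements reproduce the sxe statements shown above, so you can paste the output straight into a debugger session or a command file.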
In practice, this means you can “steer” the debugger. You can decide which situations matter, which ones should be ignored, and what actions should be taken automatically. Instead of passively observing execution, you turn the debugger into an active instrument that reacts to the behavior of the target process in real time.
In the previous post, I introduced the mechanics of using a breakpoint to execute WinDBG commands. As explained at that time, event-driven debugging becomes powerful when you stop using a breakpoint to pause, and start using it to observe, annotate, classify, log, and steer execution.
In the previous chapter, we had a closer look at the system of events & exceptions, and we learned how to link WinDBG commands to certain events & exceptions. I would like to take the opportunity to dive a little deeper into the use of breakpoints, for various purposes.
You could use breakpoints to gather telemetry, statistics and dynamic insights on execution, for instance:
Taking it one step further, the use of conditions could help reduce noise and be more specific, for example
Debugger instrumentation is about making the debugger react to events triggered from within the application itself. Instead of passively observing execution, you actively use the application’s behavior to control what the debugger does—and when.
A simple and effective technique is to use breakpoints as control points.
You deliberately trigger a known function or code path in the application, and place a breakpoint on it in the debugger. When that breakpoint is hit, it doesn’t just pause execution—it performs actions inside the debugger, such as enabling or disabling other breakpoints.
This gives you precise control over when certain things happen.
For example:
While not strictly required, having access to a scripting language (JavaScript in a browser, scripting inside a PDF, etc.) makes this approach much easier to implement. Scripting gives you control over timing and usually makes it easier to define the control breakpoints we're going to trigger.
You can:
That breakpoint then acts as a bridge between the application and the debugger:
In effect, you are instrumenting the debugger from inside the application.
The approach typically relies on two distinct sets of breakpoints.
These are hit as a direct result of application behavior:
Their role is simple: act as triggers. We typically need a trigger to enable, and a trigger to disable. Additional scenarios might involve passing a string as an argument in the scripting language, and picking it up/printing it in the debugger session. Finally, we could also make the debugger simply break.
These breakpoints perform the actual work, for example:
These are usually disabled, and have predictable breakpoint IDs.
This is how it works:
You may use:
This gives you fine-grained, runtime control with minimal overhead.
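As a rough illustration of the two sets, the sketch below generates the WinDBG statements for one worker breakpoint (fixed ID, created and immediately disabled) and two control breakpoints that switch it on and off. The helper names are hypothetical. msedge!Builtins_MathCos is the trigger we'll locate later in this chapter, while msedge!Builtins_MathSin is an assumed name by analogy that you would have to verify. The worker on ntdll!RtlAllocateHeap simply prints r8 (the requested size on x64).

```python
def worker_breakpoint(bp_id, location, command):
    """A breakpoint with a fixed ID that does the work; created, then disabled."""
    return [f'bp{bp_id} {location} "{command}"', f"bd {bp_id}"]


def trigger_breakpoint(location, action, worker_ids):
    """A control breakpoint that enables (be) or disables (bd) workers, then resumes."""
    if action not in ("be", "bd"):
        raise ValueError("action must be 'be' or 'bd'")
    ids = " ".join(str(i) for i in worker_ids)
    return f'bp {location} "{action} {ids};g"'


statements = []
# Set 2: the worker, with predictable ID 10, disabled until a trigger fires.
statements += worker_breakpoint(10, "ntdll!RtlAllocateHeap", "r r8;g")
# Set 1: the triggers, hit from the scripting side (e.g. Math.cos / Math.sin).
statements.append(trigger_breakpoint("msedge!Builtins_MathCos", "be", [10]))
statements.append(trigger_breakpoint("msedge!Builtins_MathSin", "bd", [10]))  # assumed symbol
print("\n".join(statements))
```

Using fixed IDs for the workers is what lets the triggers refer to them without knowing in which order the breakpoints were created.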
The main challenge is to find the triggers, to identify the "Set 1" breakpoints.
They should meet two key criteria:
Step by step:
With symbols (and if the symbols follow reasonably sensible naming conventions), this may be relatively straightforward. You could search for certain keywords and set breakpoints directly. With a use case, we can then see which ones get hit when you run the function statement in your scripting language.
Without symbols, you'd have to either trace what happens when you execute a certain function call in the scripting language, or you could try to "find" the function based on what it does.
Let's look at both scenarios.
Some popular historical implementations in applications that have a scripting environment were/are based on the use of Math functions. Calling a cos(), sin(), tan() function usually plays no active role in triggering a vulnerability. Their impact on heap layouts may be limited as well (you still have to check!!). Of course, we still need to find their position in the application binaries. That may be relatively easy if the application has symbols and if the symbols (naming conventions) make sense.
Let's take Microsoft Edge as an example. Let's attach WinDBGX to the msedge.exe process that corresponds with a browser tab.
I could now consider doing some searches. Let's say I'd like to find the math.cos() function in one of the Edge binaries. (I'm aware, in the example below, that I am assuming that the module I need contains the word "edge". In reality, if you're not sure, you may have to perform a search in ALL loaded modules and simply put breakpoints on everything. For instance: x *!*math*cos*)
Anyway, in order to save some time (and to avoid the download of the symbols for all DLLs in your process), I'll begin the search by looking at module names that contain the word edge. I may be right, I may be wrong. We'll see.
0:018> x *edge*!*math*cos*
00000226`c727a880 msedge!v8::internal::maglev::MaglevGraphBuilder::TryReduceMathAcosh (void)
00000226`c727a6e0 msedge!v8::internal::maglev::MaglevGraphBuilder::TryReduceMathAcos (void)
00000226`d055668e msedge!libm::math::k_cos::k_cos (void)
00000226`cc6173d0 msedge!std::__Cr::__math::cos (void)
00000226`c30d7d10 msedge!v8::internal::maglev::MaglevGraphBuilder::TryReduceMathCos (void)
00000226`c727b0a0 msedge!v8::internal::maglev::MaglevGraphBuilder::TryReduceMathCosh (void)
00000226`c1e13240 msedge!Builtins_MathAcos (Builtins_MathAcos)
00000226`d05559f1 msedge!RNvNtNtCsgdwlvZkgXt4_4libm4math3cos3cos (_RNvNtNtCsgdwlvZkgXt4_4libm4math3cos3cos)
00000226`c1e13380 msedge!Builtins_MathAcosh (Builtins_MathAcosh)
00000226`d0555e37 msedge!RNvNtNtCsgdwlvZkgXt4_4libm4math4acos4acos (_RNvNtNtCsgdwlvZkgXt4_4libm4math4acos4acos)
00000226`c1e13dc0 msedge!Builtins_MathCos (Builtins_MathCos)
00000226`c1e13f40 msedge!Builtins_MathCosh (Builtins_MathCosh)
That looks promising. We could very easily set mass-breakpoints and turn this entire list into a simple logging mechanism. We'll do that in a moment. The idea is to have the application open a use case, which triggers the Math function that I'm trying to find, and to attach WinDBG to the right process, so we can activate the breakpoints in that process. Sounds logical, but requires a bit of attention with applications like modern browsers.
Let's begin by making the use case, which is just a small html file with a bit of javascript.
Create a file test.html, for example inside folder c:\tmp
<html>
<script>
Math.cos(0);
</script>
</html>
I usually run a small python webserver in the folder that contains the html file. Open a command prompt, go to the folder where you placed the html file and run this python oneliner
If you're using python2:
python -m SimpleHTTPServer 8080
If you're using python3:
python3 -m http.server 8080
or if you want to invoke a specific Python(3) version, installed via Python Install Manager:
py -3.9-64 -m http.server 8080
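If you prefer to script both steps (creating the use case and serving it), something along these lines should do. It is just a sketch that uses the standard library only. The function names are hypothetical, and the folder and port are the ones used in this example.

```python
import functools
import http.server
from pathlib import Path

USE_CASE = "<html>\n<script>\nMath.cos(0);\n</script>\n</html>\n"


def write_use_case(folder, name="test.html"):
    """Create the folder if needed and write the use-case page into it."""
    path = Path(folder) / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(USE_CASE, encoding="utf-8")
    return path


def serve(folder, port=8080):
    """Serve `folder` on 127.0.0.1 until interrupted (this call blocks)."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(folder)
    )
    with http.server.ThreadingHTTPServer(("127.0.0.1", port), handler) as httpd:
        httpd.serve_forever()
```

Calling write_use_case(r"c:\tmp") followed by serve(r"c:\tmp") gives you the same result as the one-liners above. The server binds to 127.0.0.1 only, so the folder is not exposed to the rest of the network.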
Open a new instance of Microsoft Edge and in one of the tabs, enter http://127.0.0.1:8080. You should see the contents of the folder where your use case html file is located. Don't click or open it yet.
Now open Task Manager, select "Processes" on the left, look at the "Apps" and open the section for "Microsoft Edge"
Find the line that corresponds with the Tab that is accessing http://127.0.0.1:8080. Right-click on that line, select "Go to details"
That should give you the PID of that tab.
Alternatively, you can also look for msedge.exe processes that are marked as "renderer".
Sometimes however you'll see more than one, even with just one tab open.
The following powershell one-liner will at least list the msedge.exe processes that have a reference to "renderer":
powershell -command "Get-CimInstance Win32_Process -Filter \"Name='msedge.exe'\" | ? { $_.CommandLine -match '--type=renderer' } | select ProcessId,CommandLine"
Anyway, I'll assume you know how to get the PID of the tab.
Now launch WinDBGX and attach it to that pid:
windbgx -p PID
You can now set the mass breakpoints:
bm *edge*!*math*cos* ".printf \"%y called\\n\", @$ip;g"
 0: 00000226`c727a880 @!"msedge!v8::internal::maglev::MaglevGraphBuilder::TryReduceMathAcosh"
 1: 00000226`c727a6e0 @!"msedge!v8::internal::maglev::MaglevGraphBuilder::TryReduceMathAcos"
 2: 00000226`d055668e @!"msedge!libm::math::k_cos::k_cos"
 3: 00000226`cc6173d0 @!"msedge!std::__Cr::__math::cos"
 4: 00000226`c30d7d10 @!"msedge!v8::internal::maglev::MaglevGraphBuilder::TryReduceMathCos"
 5: 00000226`c727b0a0 @!"msedge!v8::internal::maglev::MaglevGraphBuilder::TryReduceMathCosh"
 6: 00000226`c1e13240 @!"msedge!Builtins_MathAcos"
 7: 00000226`d05559f1 @!"msedge!RNvNtNtCsgdwlvZkgXt4_4libm4math3cos3cos"
 8: 00000226`c1e13380 @!"msedge!Builtins_MathAcosh"
 9: 00000226`d0555e37 @!"msedge!RNvNtNtCsgdwlvZkgXt4_4libm4math4acos4acos"
10: 00000226`c1e13dc0 @!"msedge!Builtins_MathCos"
11: 00000226`c1e13f40 @!"msedge!Builtins_MathCosh"
The %y format specifier will perform a symbol lookup and print the address as well as the symbol name, so you get to see what gets called. I like to use $ip as opposed to eip or rip. $ip is a pseudo-register that is architecture-aware.
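If the wildcard matches more than you want, you could also save the output of the x command and turn a filtered subset into individual logging breakpoints. The sketch below assumes the x output format shown earlier (address, then module!symbol). The function name and the keyword filter are hypothetical, and symbols that contain spaces are not handled.

```python
import re

# Matches lines such as: 00000226`c1e13dc0 msedge!Builtins_MathCos (Builtins_MathCos)
X_LINE = re.compile(r"^(?P<addr>[0-9a-fA-F`]+)\s+(?P<symbol>\S+!\S+)")

# Same logging command as used with bm, already escaped for use inside quotes.
LOG_CMD = r'.printf \"%y called\\n\", @$ip;g'


def breakpoints_from_x(x_output, keyword=""):
    """Turn `x` output into one logging bp statement per matching symbol."""
    statements = []
    for line in x_output.splitlines():
        match = X_LINE.match(line.strip())
        if match and keyword.lower() in match["symbol"].lower():
            statements.append(f'bp {match["symbol"]} "{LOG_CMD}"')
    return statements


sample = (
    "00000226`c1e13dc0 msedge!Builtins_MathCos (Builtins_MathCos)\n"
    "00000226`cc6173d0 msedge!std::__Cr::__math::cos (void)\n"
)
for statement in breakpoints_from_x(sample, keyword="Builtins_"):
    print(statement)
```

Filtering on a keyword keeps the number of active breakpoints low, which matters in a busy process like a browser renderer.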
Now let the process run:
0:000> g
Go back to the browser, use the already open tab and click on the use-case html file. Provided that you're attached to the right process, you should now see:
msedge!Builtins_MathCos (00000233`00e13dc0) called
Cool! You can now set a breakpoint just at msedge!Builtins_MathCos and it will get hit when the Math.cos(0); gets executed. That gives you a lot of control.
Quick note before we proceed. Always double-check the full module name. In Microsoft Edge, there may be a msedge.dll as well as a msedge.exe file loaded in the process. The output from the x and bm commands above does not show the file extension.
In fact, in this case, the Builtins_MathCos function is inside msedge.dll, not msedge.exe:
0:053> lm a msedge!Builtins_MathCos
Browse full module list
start             end                 module name
000001c4`d83a0000 000001c4`eb180000   msedge     (pdb symbols)          C:\ProgramData\Dbg\sym\msedge.dll.pdb\6640F030371CFBB74C4C44205044422E1\msedge.dll.pdb

0:053> !address msedge!Builtins_MathCos
Usage:                  Image
Base Address:           000001c4`d83a1000
End Address:            000001c4`e7905000
Region Size:            00000000`0f564000 ( 245.391 MB)
State:                  00001000          MEM_COMMIT
Protect:                00000020          PAGE_EXECUTE_READ
Type:                   01000000          MEM_IMAGE
Allocation Base:        000001c4`d83a0000
Allocation Protect:     00000080          PAGE_EXECUTE_WRITECOPY
Image Path:             C:\Program Files (x86)\Microsoft\Edge\Application\146.0.3856.62\msedge.dll
Module Name:            msedge
Loaded Image Name:      C:\Program Files (x86)\Microsoft\Edge\Application\146.0.3856.62\msedge.dll
Mapped Image Name:
More info:              lmv m msedge
More info:              !lmi msedge
More info:              ln 0x1c4d91b3dc0
More info:              !dh 0x1c4d83a0000

Content source: 1 (target), length: e751240
On my system, the Builtins_MathCos function sits at offset 00e13dc0 from the start of msedge.dll:
0:053> ? msedge!Builtins_MathCos - msedge
Evaluate expression: 14761408 = 00000000`00e13dc0
(Please take a moment to calculate the correct offset on your machine, we'll need it later on)
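The arithmetic behind that offset is simple, but it is worth spelling out because the module base changes between sessions while the offset stays the same for a given build. A minimal sketch, using the addresses from the output above (the helper names are hypothetical, and the second module base is an assumed value for illustration):

```python
def parse_addr(text):
    """Parse a WinDBG-style hex address such as 000001c4`d83a0000."""
    return int(text.replace("`", ""), 16)


def module_offset(symbol_addr, module_base):
    """Offset of a symbol from the start of its module."""
    return parse_addr(symbol_addr) - parse_addr(module_base)


def rebase(offset, new_module_base):
    """Address of the same symbol when the module is loaded at another base."""
    return parse_addr(new_module_base) + offset


# Values taken from the lm / !address output above.
offset = module_offset("0x1c4d91b3dc0", "000001c4`d83a0000")
print(hex(offset))

# Assumed module base in another session, for illustration only.
print(hex(rebase(offset, "00000233`00000000")))
```

In the debugger, the same idea lets you refer to the function as the module name plus the offset, which keeps working in a session where you have no symbols loaded.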
We have found a first trigger. Of course, you can keep searching for others (sin(), tan(), etc). I have provided a list of common Math functions later in this chapter.
Without symbols, it's a bit more challenging. After all, the Math functions you're trying to use in a particular application may be based on some sort of custom implementation. Trying to find or identify the corresponding function in memory by looking for a specific byte sequence that would act as some sort of "signature" may not be an option.
Often tasked with this challenge, especially in applications that don't have symbols, I decided to build a Frida script called corelan_trigscan.py that attempts to find functions that are math-heavy (functions that contain a certain density of instructions that might indicate some sort of cos, sin, tan function), and prints out WinDBG-compatible breakpoint statements so we can see which one(s) are used.
Of course, it's all going to be based on heuristics, density and variables that make the script hit or miss. That said, I have been quite successful in finding certain math-related functions in binaries that had zero symbols.
Let's see if the script would find the Builtins_MathCos function in msedge.dll, and possibly/ideally other Math functions as well.
First of all, we'll need to install Python3, and Frida Python bindings to make the script work. (I found it a bit easier to use Python to create the Frida script that gets injected, and then process the output, than to try to do everything in frida directly)
This step requires installing Python3. If you are using Immunity Debugger or WinDBG Classic with a working mona.py installation, then your default Python version is probably Python 2.7.x 32bit. If you change the default Python version to Python3 (by putting its folder in the PATH environment variable before the Python2 folder), mona.py and other similar scripts will stop working.
There are 2 main ways to install Python3: through the good-ol'-trusted standalone installer, or by using the Python Install Manager. I am currently working on making mona.py run with Python3. The most recent version of PyKD is compatible with Python 3.9. When we get to the chapter on PyKD, I'll explain how to install that version specifically. For now, and for the sake of running Frida, you can just take the most recent version of Python if you'd like. At the time of writing, it's Python 3.14.3.
In any case, if you do care about running mona.py, then please do NOT install Python through the Python Install Manager. Remove the Python Install Manager if you already have it, and install the required Python versions via the standalone installers.
Download the package from the Python website and run a default installation. Again, pick whatever recent version you'd like. Leave "Install Launcher for all users" enabled. Do NOT check the "Add Python 3.x to PATH" option. Leave it unchecked. (We're going to use the Python Launcher anyway.)
By default, the Python version will be installed as a folder inside your %LOCALAPPDATA%\Programs\Python folder. After installing Python3, open an admin command prompt and check if the installation was successful. The Python Launcher should have installed the py.exe binary inside the c:\windows folder. It should show up as the first one when you run where py:
C:\>where py
C:\Windows\py.exe
The Python launcher (installed through the Python3 standalone installer) should be able to find the Python version(s) that you have installed on your system. I have 3 versions on mine, the output may be different on yours:
py --list
Installed Pythons found by py Launcher for Windows
 -3.9-64 *
 -3.9-32
 -2.7-32
Consequently winget should only show those versions. If you ever want to get a recent version of PyKD to work, you'll need to remove any python versions and Python Launcher that were installed via the MS Store.
C:\>winget list python
Name                   Id                Version   Available Source
-------------------------------------------------------------------
Python Launcher        Python.Launcher   < 3.9.8   3.13.5    winget
Python 2.7.18          Python.Python.2   2.7.18150           winget
Python 3.9.13 (32-bit) Python.Python.3.9 3.9.13              winget
Python 3.9.13 (64-bit) Python.Python.3.9 3.9.13              winget
Good.
Test if the versions work:
C:\>py -3.9-32
Python 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:24:45) [MSC v.1929 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()
and
C:\>py -3.9-64
Python 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()
Let's update pip for both Python 3.9.13 versions:
Run
py -3.9-32 -m pip install --upgrade pip
py -3.9-64 -m pip install --upgrade pip
We're going to install the Frida Tools and Frida Python Bindings. Let's say we plan on using Python 3.9.13 64bit, so we'll run this command:
C:\>py -3.9-64 -m pip install frida-tools
Collecting frida-tools
  Downloading frida_tools-14.8.0.tar.gz (4.7 MB)
...
It's a good idea to check for updates to frida-tools from time to time:
C:\>py -3.9-64 -m pip install frida-tools --upgrade
I'll obviously have to run the Python version that has the frida bindings installed, so I'll be running py -3.9-64
Let's run the script.
Students of recent Corelan Heap classes may already have seen a previous version of this corelan_trigscan.py script inside their HeapMgmt / Scripts folder. In preparation for this blogpost, I have updated it quite a bit. Today, I'm happy to share the latest version of this (previously private) script with the world.
You can download a copy of the script from the Github repository that complements this series of blogposts on debugging.
Check out the "debugging" folder, and then look inside the "scripts", "frida" folder.
The script takes a number of arguments:
-h / --help          Show help message
-p / --process       Process name or PID (e.g. MyApp.exe or 1234)
-m / --module        Module name to scan (e.g. MyApp.exe or a DLL). If omitted, the process main image module will be used.
--min-density        Minimum relevant-insn density (default: 0.6)
--min-relevant       Minimum number of relevant instructions (default: 40)
--min-total          Minimum total instructions in function+helpers (default: 40)
--min-trig           Minimum trig-count (trig column) (default: 0)
--min-ical           Minimum indirect call count (default: 0)
--max-helper-insns   Max instructions to scan in each helper block (default: 256)
--max-func-insns     Max instructions to scan in main function (default: 4096)
--limit-bp           Maximum number of printed/emitted breakpoints (default: 1000)
--check-offset       Optional sanity-check offset within module (hex or dec).
--splitsize          Number of breakpoints per numbered .bps file (default: 500)
In human language, this is what the arguments mean:
--min-density : How 'math' heavy the function needs to be. It takes the number of relevant math instructions and divides it by the total number of instructions.
--min-relevant : Minimum number of math-like instructions a function must contain.
--min-total : Minimum total number of instructions the function must have.
--min-trig : Minimum number of "strong trig signals" (fsin, fcos, sqrtsd, psrlq, pinsrw, etc).
--min-ical : When lowering the density and relevance arguments (i.e. including functions with a low number of math-heavy instructions), it is logical that these small functions call other functions. The code from direct calls is included as "part of the function". Indirect calls aren't (there is no way to follow them). Setting min-ical to 1 in combination with lower density & relevance arguments will help reduce a lot of noise.
--max-helper-insns : How many instructions the script is allowed to inspect in a directly called helper block.
--max-func-insns : How many instructions the script is allowed to inspect in the main function.
--limit-bp : How many breakpoints get printed and written to the log. It does not affect scanning or analysis, it only limits the number of breakpoint statements.
--splitsize : Breakpoints are written in chunks of 'splitsize' lines to numbered .bps files. 500 is the default.
The arguments that matter the most are:
-p processName or pid
-m modulename.dll/exe
--min-density 0.5
--min-relevant 40
--min-total 40
You can tweak density, relevance and total trig values, but the values above are a good starting point for applications where the Math functions are very obvious. In my experience, in browsers, we often have to lower the density & relevance values quite a bit (which means we'll get a much longer list of breakpoints to work with). In other words, you could definitely run the script with values as low as --min-density 0.1 --min-relevant 5 --min-total 20 --limit-bp * and simply activate all breakpoints in WinDBG. WinDBG might start to heat up a little though, and you may need to take some time off to process the results 😉
The challenge is always: how can I reduce the results and make them more meaningful without removing the actual function I'm looking for. What are criteria that may allow the script to determine if a function is a good candidate or not.
You can initially add --min-trig 1 to further reduce the volume of results. This limits the candidates to functions that are very obviously math-heavy. That doesn't mean the list will include the ones you need, but it's a good starting point. Ideally, you're trying to find results with a trig value larger than 0, and you may have to lower the density and relevance values to widen the scope of what is considered an interesting function.
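To show how these thresholds interact, here is a simplified model of the filtering logic as described above. This is not the actual code from corelan_trigscan.py. The class and function names are hypothetical, and the real script takes more signals into account (indirect calls, helper blocks, etc).

```python
from dataclasses import dataclass


@dataclass
class FunctionStats:
    relevant: int  # math-like instructions
    total: int     # all instructions in the function plus directly called helpers
    trig: int      # strong trig signals (fsin, fcos, sqrtsd, ...)


def density(stats):
    """Relevant instructions divided by total instructions."""
    return stats.relevant / stats.total if stats.total else 0.0


def is_candidate(stats, min_density=0.6, min_relevant=40, min_total=40, min_trig=0):
    """Apply the four thresholds; a function must pass all of them."""
    return (
        stats.total >= min_total
        and stats.relevant >= min_relevant
        and stats.trig >= min_trig
        and density(stats) >= min_density
    )


obvious = FunctionStats(relevant=60, total=80, trig=2)    # density 0.75
wrapper = FunctionStats(relevant=45, total=100, trig=0)   # density 0.45
print(is_candidate(obvious), is_candidate(wrapper))
print(is_candidate(wrapper, min_density=0.1, min_relevant=5, min_total=20))
```

The second function only shows up once the thresholds are lowered, which is the trade-off described above: a wider scope finds more wrappers, at the cost of a longer list of breakpoints.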
When the script runs, it will create a folder in the current working folder, that has the name of the module you're scraping. It will then write analysis into a corelan_trigscan.log file, and all WinDBG compatible breakpoint statements to corelan_trigscan.bps. Finally, it will break the list of breakpoints into chunks of 500, and make individual numbered files, making it easier for you to process them in smaller batches.
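The chunking itself is straightforward. The sketch below shows the idea, assuming a list of breakpoint statements is already available. The file naming scheme and the function names are my own assumptions for this example. The actual script may name its numbered files differently.

```python
from pathlib import Path


def split_breakpoints(lines, splitsize=500):
    """Yield consecutive chunks of at most `splitsize` breakpoint statements."""
    for start in range(0, len(lines), splitsize):
        yield lines[start:start + splitsize]


def write_chunks(lines, folder, stem="corelan_trigscan", splitsize=500):
    """Write each chunk to a numbered .bps file and return the paths."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, chunk in enumerate(split_breakpoints(lines, splitsize), start=1):
        # Assumed naming scheme: <stem>_<number>.bps
        path = folder / f"{stem}_{index}.bps"
        path.write_text("\n".join(chunk) + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

Smaller files make it practical to activate a batch, run the use case, clear the breakpoints and move on to the next batch if nothing was hit.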
In order to organize the log & bps files, I have decided to create a folder "logs" on the C: drive of my machine. I'll be running the frida script from a command prompt, in that folder.
When the script has completed, you will be able to consult the log file for detailed info on density, relevance, totals and trig for each of the routines found. As explained, high density (of math instructions) and trig > 0 is a really good indicator. (Trig = presence of very specific instructions)
Of course, ultimately, the breakpoints are what we need. You can copy them from the file and paste them in the debugger, or you can just tell WinDBG to read the corelan_trigscan.bps file or the individually numbered (smaller) files using a WinDBG command that loads and runs a Command File.
(I'm assuming here that you're searching msedge.dll)
$$><C:\logs\msedge.dll\corelan_trigscan.bps
(Of course, you'll need to provide the full path to the bps file on your system.) We'll talk about WinDBG Command Files in more detail later.
With the breakpoints in place, you can make the application run a use-case that triggers the use of Math function(s), so you can see if one of the breakpoints gets hit, and if so, which one it is.
Let's try it on Microsoft Edge.
We're already aware of the fact that there is a msedge!Builtins_MathCos function, so ideally the script should be able to find it as well.
Let's start with some standard density & relevance values and we'll see what happens. Of course, success heavily depends on the actual implementation of the Math functionality. In the case of msedge!Builtins_MathCos, the function may be a wrapper around other code, so maybe the script doesn't even recognise it as Math heavy. We'll see what happens.
If needed, you may have to run Edge with the --no-sandbox flag.
(Make a copy of the existing shortcut to Microsoft Edge, and edit the Target: field.)
Open the "No Sandbox" version of Edge.
Just like other modern browsers, when Edge runs, you'll get many msedge.exe processes.
In order to do the analysis of a dll, it's not that important what process we're going to attach to. We just need a process that has the module we want to scrape and investigate, in this case msedge.dll
In fact, unless you provide a PID, the corelan_trigscan.py script will iterate over all processes that have the provided name, look for the first one that has the module we want to analyse and it will perform the search in the process that meets those requirements.
Let's see. Let's try with some average values and see what happens. Do not attach a debugger to any of the msedge.exe processes at this time.
From an administrator prompt:
C:\logs>py -3.9-64 g:\blogposts\debugging\scripts\frida\corelan_trigscan.py -p msedge.exe -m msedge.dll --min-density 0.5 --min-relevant 50 --min-total 20 --min-trig 1
[+] Configuration:
    Process (name)       : msedge.exe
    Module               : msedge.dll
    Min density          : 0.5
    Min relevant         : 50
    Min total            : 20
    Min trig (min-trig)  : 1
    Min icall            : 0
    Max helper insns     : 256
    Max function insns   : 4096
    Limit breakpoints    : 1000
    Split size           : 500
    Check offset         : (none)
[+] Found 8 process(es) named 'msedge.exe'. Looking for one with module 'msedge.dll' loaded...
[+] Trying PID 9344 (msedge.exe)...
[+] Successfully attached to PID 9344 (msedge.exe).
[+] Attached to process 'msedge.exe' via PID 9344 (arch=x64).
[+] Scanning module 'msedge.dll' for FP/SSE/AVX-heavy routines...
[AGENT] Module msedge.dll size=0x12de0000 arch=x64
[AGENT] x64 prolog scan modes: fp64, shadow64
[AGENT] Scanning range msedge.dll+0x00001000 - msedge.dll+0x0f565000 (approx 0.00%)
[AGENT] Progress 0.28% — analyzing function @ msedge.dll+0x000ae257 [fp64]
[AGENT] FOUND msedge.dll+0x000b1aa8 dens=0.874 rel=146 total=167 trig=2 icalls=0 mode=fp64 (candidates=1)
...
[+] Found 58 candidate functions.
[+] Output directory : 'msedge.dll'
[+] Full analysis written to : 'msedge.dll\corelan_trigscan.log'
[+] Breakpoints written to : 'msedge.dll\corelan_trigscan.bps'
[+] Wrote 1 numbered breakpoint file(s):
    first: msedge.dll\corelan_trigscan_0001.bps
    last : msedge.dll\corelan_trigscan_0001.bps
[+] Done.
Open the msedge.dll folder and look for the log and bps files
Of course, the corelan_trigscan.py script will remove all bps files with every run.
Our log file looks like this:
corelan_trigscan log
=====================
Timestamp           : 2026-03-21T08:52:11
Process             : msedge.exe
Module              : msedge.dll
Arch                : x64
Min density         : 0.5
Min relevant        : 50
Min total           : 20
Min trig (min-trig) : 1
Min icall           : 0
Max helper insns    : 256
Max function insns  : 4096
Breakpoint limit    : 1000
Split size          : 500
Output directory    : msedge.dll
Breakpoint file     : msedge.dll\corelan_trigscan.bps
Total candidates    : 58
Sort order          : trig desc, density desc, relevant desc, indirect_calls desc

# Candidate list (see sort order above)
# idx  location               density  relevant  total  trig  icalls  prolog
  1    msedge.dll+0x051553e0  0.821    3363      4096   42    0       shadow64
  2    msedge.dll+0x05154990  0.795    3258      4096   42    0       shadow64
  3    msedge.dll+0x05156600  0.881    3610      4096   39    0       shadow64
  4    msedge.dll+0x05155f70  0.868    3555      4096   39    0       shadow64
  5    msedge.dll+0x051d9380  0.851    2733      3211   27    1       shadow64
  6    msedge.dll+0x051d8d50  0.833    2978      3576   27    1       shadow64
  7    msedge.dll+0x051d8bb0  0.825    3040      3684   27    1       shadow64
  ...
Open a new MS Edge browser process, launch WinDBGX, attach it to the right pid (the one that corresponds with the tab), paste in all the breakpoints (or tell WinDBG to load your bps file with the $< command), and then open the use-case that tries to call one or more Math functions.
For instance:
<html> <script> Math.cos(0); Math.sin(0); Math.tan(0); alert("done"); </script> </html>
The corelan_trigscan.py parameters specified resulted in 58 breakpoints. I activated them, ran the usecase html file... but I got no results. None of the 58 breakpoints got hit by any of the 3 Math functions that I used.
That means I have to lower the values, which will hopefully get me more candidate functions, and thus more breakpoints. In theory this poses no problem for WinDBG... but it's not exactly great from a performance perspective. The python script will run longer (more stuff to investigate, but that's ok). But when you're going to set a large volume of breakpoints in WinDBG, you'll see it slow down significantly. Of course, you could also set breakpoints in smaller batches. That's what the chunked breakpoint files are for.
Anyway, before we do that, allow me to introduce another feature in the script that may help.
As explained before, we are already aware of the presence of the Builtins_MathCos function. As calculated earlier, on my machine it sits at offset 00e13dc0 from the start of msedge.dll
The python script has a --check-offset argument, which takes an offset. That offset will be added to the base address of the module you've specified with the -m argument, and that position will be considered a function that you'd like to examine.
When the script is finished doing the full analysis of the module (and it may or may not have found the function you need), it will look at that function specifically, do the analysis and give you the corresponding density, relevance, and other statistics. In other words, if the script was not able to find that function by itself, you should be able to tell why it was excluded and where you possibly need to lower certain criteria to find this (and other similar) functions.
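The address arithmetic behind --check-offset is straightforward. The Python sketch below shows the idea under a few assumptions: resolve_offset is a hypothetical helper (not a function from the script), and the base and size are the values reported for msedge.dll in the sample run of this section.

```python
def resolve_offset(module_base, module_size, offset_hex):
    """Turn a module-relative offset (hex string) into an absolute address.

    Returns (absolute_address, in_range). in_range tells us whether the
    offset actually falls inside the module.
    """
    offset = int(offset_hex, 16)          # accepts "00e13dc0" as well as "0xe13dc0"
    absolute = module_base + offset
    in_range = 0 <= offset < module_size
    return absolute, in_range


addr, ok = resolve_offset(0x25780000000, 0x12de0000, "00e13dc0")
print(hex(addr), ok)
```

Keep in mind that the base address changes between runs, so only the offset is worth writing down.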
Let's do a second run, adding the --check-offset variable. We're just trying to make the script provide us with statistics about that function specifically, so in order to make the script finish faster I'll even increase the variables.
Close Edge. Open a new instance. Again, make sure there are no debuggers attached.
C:\logs>py -3.9-64 g:\blogposts\debugging\scripts\frida\corelan_trigscan.py -p msedge.exe -m msedge.dll --min-density 0.8 --min-relevant 100 --min-total 100 --min-trig 1 --check-offset 00e13dc0
The output shows the analysis of the function at offset 0x00e13dc0:
[+] Direct analysis of requested offset:
    location       : msedge.dll+0x00e13dc0
    module base    : 0x25780000000
    module end     : 0x25792de0000
    module size    : 0x12de0000
    absolute addr  : 0x25780e13dc0
    in range       : True
    first insn     : 0x25780e13dc0 push rbp
    second insn    : 0x25780e13dc1 mov rbp, rsp
    prolog match   : True
    prolog mode    : fp64
    density        : 0.160
    relevant       : 8
    total          : 50
    trig           : 0
    indirect calls : 1
It looks like it has a rather low density (0.16) and relevance (8), and it has an indirect call. The combination of these 3 elements is a possible indicator that our Builtins_MathCos is a wrapper around the actual function, and/or uses other function(s) to do the actual Math. That other function may or may not be in the list already. Maybe we should lower the criteria even further. When corelan_trigscan.py runs, it's able to follow direct calls and it considers the code in the child functions to be part of the parent function. But with indirect calls, it's difficult to determine where the call will go.
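A quick way to reason about "why was my function excluded" is to compare its statistics against the thresholds of the run. The helper below is a hypothetical Python sketch (failed_criteria is not part of the script). It uses the statistics reported for Builtins_MathCos and the thresholds of the second run, plus a relaxed set of values as an example.

```python
def failed_criteria(stats, thresholds):
    """Return the names of the thresholds that the function does not meet."""
    return [name for name, minimum in thresholds.items()
            if stats.get(name, 0) < minimum]


cos_stats = {"density": 0.160, "relevant": 8, "total": 50, "trig": 0, "icalls": 1}

second_run = {"density": 0.8, "relevant": 100, "total": 100, "trig": 1}
relaxed = {"density": 0.15, "relevant": 8, "total": 45, "icalls": 1}   # example values

print("second run:", failed_criteria(cos_stats, second_run))
print("relaxed   :", failed_criteria(cos_stats, relaxed))
```

An empty list means the function would make it into the candidate list with those settings.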
Functions with low density, low relevance, and without indirect calls are - very likely - not going to be that interesting. So if we lower the density & relevance parameters, we could initially try to focus on the ones with an indirect call.
For instance, we could lower the density to 0.15 and relevance to 8. We'll also have to increase the --limit-bp argument, because by default the script only produces breakpoints for the first 1000 functions. You could set it to 0 or * to make it consider ALL the breakpoints.
Be careful with filtering on indirect calls. You'll essentially skip over legit Math functions that don't use indirect calls. It's a useful technique if the Math functions are indeed wrappers that use indirect calls. Based on the analysis of the Builtins_MathCos function, let's set --min-ical 1 as well.
(Of course, in reality you may not have an example such as Builtins_MathCos yet. I usually don't filter on indirect calls unless I have to start reducing the relevance and density parameters a lot. I'll share my personal step by step workflow later.)
Let's try this:
C:\logs>py -3.9-64 g:\blogposts\debugging\scripts\frida\corelan_trigscan.py -p msedge.exe -m msedge.dll --min-density 0.15 --min-relevant 8 --min-total 45 --check-offset 00e13dc0 --limit-bp * --min-ical 1
I have asked the script to perform the analysis of the function again, and I can see this promising information:
[+] Sanity check for offset msedge.dll+0x00e13dc0: FOUND among candidates.
    density=0.160, relevant=8, total=50, trig=0, indirect_calls=1, prolog=fp64
[+] Direct analysis of requested offset:
    location       : msedge.dll+0x00e13dc0
    module base    : 0x25780000000
    module end     : 0x25792de0000
    module size    : 0x12de0000
    absolute addr  : 0x25780e13dc0
    in range       : True
    first insn     : 0x25780e13dc0 push rbp
    second insn    : 0x25780e13dc1 mov rbp, rsp
    prolog match   : True
    prolog mode    : fp64
    density        : 0.160
    relevant       : 8
    total          : 50
    trig           : 0
    indirect calls : 1
[+] Output directory : 'msedge.dll'
[+] Full analysis written to : 'msedge.dll\corelan_trigscan.log'
[+] Breakpoints written to : 'msedge.dll\corelan_trigscan.bps'
[+] Wrote 9 numbered breakpoint file(s):
    first: msedge.dll\corelan_trigscan_0001.bps
    last : msedge.dll\corelan_trigscan_0009.bps
[+] Function at offset written to : 'msedge.dll\corelan_trigscan_0009.bps'
[+] Done.
The function at the provided offset (Builtins_MathCos) was found and labeled as a viable candidate. Perhaps it means that the script was able to find other Math functions as well. We'll try to find out.
This time, the script gave me 4265 candidate functions. Activating all of them at the same time in WinDBG might take a little while, and WinDBG won't exactly run super smooth.
Additionally, as a side note, with that many breakpoints, there is the obvious risk that the application will use some of these functions just by itself. That's inevitable, but that's ok. From a timing perspective, we can most likely see the difference between breakpoints that are just getting hit, versus the ones that get hit by our code.
As explained earlier, the script will not just write all breakpoints into corelan_trigscan.bps, but it will also create numbered 'chunked' files, with a more "manageable" amount of up to 500 breakpoints per file. In this case, I have 9 individual files. I'd have to run the use case 9 times, that's absolutely doable and manageable. Note that the breakpoint for the function with the provided offset was written to file number 0009. It's not uncommon to find similar functions relatively close to each other in a binary, so perhaps it's an idea to start processing that file first and work our way up to number 0001 backwards.
Routine:
$$><C:\logs\msedge.dll\corelan_trigscan_0009.bps
g
Open the use case
If breakpoints get hit, document them
Pause the debugger
Clear the breakpoints and load the next file:
bc *
$$><C:\logs\msedge.dll\corelan_trigscan_0008.bps
g
and so on...
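If you'd rather not type the paths by hand for every batch, you can generate the command sequence up front. This is a small Python sketch with a hypothetical helper name; the folder and file naming follow the examples above. It starts at the last file and works backwards, and it clears the breakpoints before every batch (harmless for the first one).

```python
def routine_commands(folder, file_count, stem="corelan_trigscan"):
    """Return, per chunk file, the WinDBG commands to type (last file first)."""
    steps = []
    for index in range(file_count, 0, -1):
        path = f"{folder}\\{stem}_{index:04d}.bps"
        # bc * clears the previous batch, $$>< loads the next one, g resumes
        steps.append(["bc *", f"$$><{path}", "g"])
    return steps


for batch in routine_commands(r"C:\logs\msedge.dll", 9):
    print("\n".join(batch))
    print()
```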
Patience is a virtue.
Sometimes you'll notice that you have to close the debugger and open a new session. Fortunately, the chunked files mean you don't have to do everything all over again: you can simply continue with the next file.
Sometimes you'll see certain breakpoints getting hit over and over again, forcing you to intervene (pause the debugger session, remove that breakpoint, and continue doing the analysis). When enabling a large list of breakpoints, we obviously don't know what the ID is going to be. If you need to disable or remove a specific breakpoint, you'll have to run bl first to get all breakpoints, find (in the long list) the one you want to disable/delete, and then disable/delete it.
That's why the breakpoint statements provided by corelan_trigscan.py are not only numbered (you'll see the ID when the breakpoint gets hit), but also print a clickable link: the printf statements in the breakpoints use DML (Debugger Markup Language) to show that link on the screen. That way, when a certain breakpoint gets hit over and over again, you can simply pause the debugger, click on the [disable] link and continue running the session.
The ID numbers assigned by the script start at 1000, so if you wish to set some other breakpoints as well, keep their ID below 1000.
Anyway, let's go back to our use case. I decided to load the breakpoint from file 0009 first. I let the process run in the debugger (g), and opened the usecase test.html, which calls 3 Math functions. This is the result:
0:016> g
----- msedge.dll+0x00e13dc0 bp5263 hit dens=0.160 rel=8 tot=50 trig=0 icalls=1 prolog=fp64 ----- [disable]
----- msedge.dll+0x00e14d40 bp5264 hit dens=0.160 rel=8 tot=50 trig=0 icalls=1 prolog=fp64 ----- [disable]
----- msedge.dll+0x01b1cfd2 bp5058 hit dens=0.163 rel=53 tot=325 trig=0 icalls=1 prolog=shadow64 ----- [disable]
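When a session produces a lot of hits, it helps to post-process the debugger log (for instance one written with .logopen) instead of scrolling. The Python sketch below counts hits per breakpoint. It only assumes the hit-line format shown above; the function name is made up, and the sample text repeats one hit to mimic a noisy breakpoint.

```python
import re
from collections import Counter

# Matches e.g. "----- msedge.dll+0x00e13dc0 bp5263 hit ..."
HIT = re.compile(r"-----\s+(\S+?)\+0x([0-9a-fA-F]+)\s+bp(\d+)\s+hit")


def count_hits(log_text):
    """Count hits per (module, offset, breakpoint id)."""
    hits = Counter()
    for module, offset, bp_id in HIT.findall(log_text):
        hits[(module, int(offset, 16), int(bp_id))] += 1
    return hits


sample_log = """\
----- msedge.dll+0x00e13dc0 bp5263 hit dens=0.160 rel=8 tot=50 trig=0 icalls=1 prolog=fp64 ----- [disable]
----- msedge.dll+0x01b1cfd2 bp5058 hit dens=0.163 rel=53 tot=325 trig=0 icalls=1 prolog=shadow64 ----- [disable]
----- msedge.dll+0x01b1cfd2 bp5058 hit dens=0.163 rel=53 tot=325 trig=0 icalls=1 prolog=shadow64 ----- [disable]
"""
for (module, offset, bp_id), n in count_hits(sample_log).most_common():
    print(f"bp{bp_id} {module}+0x{offset:08x} hit {n}x")
```

Breakpoints at the top of that list are the first candidates to disable.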
I can now investigate if there is a link between a Math statement and one of the breakpoints that got hit. After doing a bit of testing, I got these results:
Math.cos() = msedge.dll+0x00e13dc0 = msedge!Builtins_MathCos (000001bd`11153dc0)
Math.sin() = msedge.dll+0x00e14d40 = msedge!Builtins_MathSin (000001bd`11154d40)
Continue working through the other files as well, take your time.
After processing file 0007, I was able to find the third one as well:
Math.tan() = msedge.dll+0x00e15200 = msedge!Builtins_MathTan (000001bd`11155200)
Mission accomplished!
You can now set breakpoints at those functions and make them do other WinDBG things, for instance activate the logging of heap allocations, etc.
To conclude this chapter, this is the approach I usually follow when I don't have symbols:
Start with relatively high values:
--min-density 0.5 --min-relevant 30 --min-total 50 --min-trig 1
Next run, reduce them a little
--min-density 0.35 --min-relevant 20 --min-total 30
(Pay attention to how many candidates you get. If it's more than 1000, you'll have to set the --limit-bp argument.)
If that doesn't get you the functions you're looking for, I usually drop the numbers, but filter on indirect calls:
--min-density 0.15 --min-relevant 8 --min-total 20 --min-ical 1 --limit-bp *
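The three stages above can be captured in a small driver, so you don't have to remember the values. This is a Python sketch that only builds the argument lists; it uses the flags shown in this chapter, while STAGES and build_args are hypothetical names. Launching the script (and checking the results in the debugger between stages) is left to you.

```python
STAGES = [
    {"min-density": 0.5, "min-relevant": 30, "min-total": 50, "min-trig": 1},
    {"min-density": 0.35, "min-relevant": 20, "min-total": 30},
    {"min-density": 0.15, "min-relevant": 8, "min-total": 20,
     "min-ical": 1, "limit-bp": "*"},
]


def build_args(stage, process="msedge.exe", module="msedge.dll"):
    """Build the argument list for one stage of the workflow."""
    args = ["-p", process, "-m", module]
    for flag, value in stage.items():
        args += [f"--{flag}", str(value)]
    return args


for number, stage in enumerate(STAGES, start=1):
    print(f"stage {number}:", " ".join(build_args(stage)))
```

Passing the arguments as a list (for example to subprocess.run) also avoids the shell expanding the * of --limit-bp.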
With regards to building use-cases: it's obviously very important to provide syntactically correct code, so you'll have to figure out what Math statements exist in the language that you're exploring. Most of these applications have some sort of Developer tools or Scripting console that allows you to just type commands and execute them. That might make it easier to pinpoint which one exactly triggers a certain breakpoint.
Overall, this is a starting point for javascript engines in browsers:
Math.abs(1) Math.ceil(1.2) Math.floor(1.8) Math.round(1.5) Math.trunc(1.8) Math.min(1, 2) Math.max(1, 2) Math.sqrt(4) Math.pow(2, 3) Math.exp(1) Math.log(10) Math.log10(10) Math.log2(8) Math.sin(1) Math.cos(1) Math.tan(1) Math.asin(0.5) Math.acos(0.5) Math.atan(1) Math.atan2(1, 1) Math.cbrt(8) Math.hypot(3, 4) Math.random() Math.PI Math.E
A somewhat safer subset for PDF readers etc, might look like this:
Math.abs(1) Math.ceil(1.2) Math.floor(1.8) Math.round(1.5) Math.min(1, 2) Math.max(1, 2) Math.sqrt(4) Math.pow(2, 3) Math.exp(1) Math.log(10) Math.sin(1) Math.cos(1) Math.tan(1) Math.asin(0.5) Math.acos(0.5) Math.atan(1) Math.atan2(1, 1) Math.random() Math.PI Math.E
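For browsers, you can generate the use-case page from such a list instead of editing HTML by hand. Here's a minimal Python sketch, modelled on the test page used earlier in this chapter. build_use_case is a hypothetical helper, and writing the result to a file is up to you.

```python
def build_use_case(statements):
    """Wrap a list of JavaScript statements in a minimal HTML test page."""
    body = "\n".join(f"  {statement};" for statement in statements)
    return f"<html>\n<script>\n{body}\n  alert(\"done\");\n</script>\n</html>\n"


page = build_use_case(["Math.cos(1)", "Math.sin(1)", "Math.tan(1)"])
print(page)
```

Generating one page per statement makes it easier to link a breakpoint hit to a specific Math function.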
Good luck!
WinDBG has had a bit of automation for a long time, even before the modern Data Model and the JavaScript API became a thing. We've played with action breakpoints before, which is already a form of automation. We're attaching commands to an event.
A step up from that is adding control flow logic. The options are a bit limited though. We have .if to make decisions, and .foreach and .for to perform iterations.
Although not strictly "automation", I'd like to mention that we can create aliases to make our code a bit more readable.
In modern WinDBG versions, the dx command provides access to the Data Model. We'll talk about that in part 2 of the Automation & Scripting series.
.if is WinDBG's conditional control-flow token. Conceptually, it behaves like if in C. It evaluates an expression, and if the result is true (non-zero), it executes the command block. You can use .elsif and .else as well to make the decision process more complete.
Basic syntax:
.if (Condition) { Commands }

.if (Condition) { Commands } .else { Commands }

.if (Condition) { Commands } .elsif (Condition) { Commands }

.if (Condition) { Commands } .elsif (Condition) { Commands } .else { Commands }
You can specify multiple commands (separated by a semicolon ;). Commands (even if it's just one) have to be placed inside the braces.
I mostly use .if statements in breakpoints. That said, I only do it when I'm sure about the condition. Allow me to clarify what I mean. Sometimes conditions are based on assumptions, like the size of something. If you filter out information based on an assumption, you may end up (partially) blind. In my humble opinion, it may be better to log and document everything, and do grep-style filtering on the output. After all, you can just write the output of WinDBG's command window to a file with the .logopen path/to/logfile statement.
But if you're 100% sure about the condition, then an .if statement may be what you need.
If .if gives you decision power, then .foreach and .for make working in WinDBG feel even more like programming. Don't get too excited though: they lack a lot of options and flexibility.
In general: .foreach gives you control over data, and .for gives you control over execution.
Let's begin with .foreach
In short, .foreach:
Basic syntax looks like this:
.foreach (var { command-producing-output }) { commands-using-var }
.foreach /s (var "string") { ... }
.foreach /f (var "file.txt") { ... }
The .foreach command is pretty useful if you already have a list, or a command that produces a list, and you want to pass the elements of that list to another command. Because it tokenizes everything, .foreach loops can get messy very easily, especially if the output contains more text than you need, and certainly if that extra text makes it unpredictable where the tokens you want to iterate over will appear.
I'll explain:
Let's say you want to do something with the list of loaded modules. lm provides that list. You'll get start address, end address and module name, an indication if you have symbols, and if so, the path to the symbol file. (the latter is optional, it depends on whether you have symbols or not).
0:014> lm
start             end                 module name
000001ff`81000000 000001ff`93de0000   msedge     (pdb symbols)          C:\ProgramData\Dbg\sym\msedge.dll.pdb\6640F030371CFBB74C4C44205044422E1\msedge.dll.pdb
000001ff`97510000 000001ff`975e7000   OLEAUT32   (deferred)
00007ff6`ed5b0000 00007ff6`edaaa000   msedge_exe (deferred)
00007ffd`f42a0000 00007ffd`f4711000   ffmpeg     (deferred)
00007ffe`0b120000 00007ffe`0b5e6000   msedge_elf (deferred)
00007ffe`34160000 00007ffe`343a2000   dbghelp    (deferred)
00007ffe`36d10000 00007ffe`36d45000   WINMM      (deferred)
00007ffe`36d50000 00007ffe`36d5b000   VERSION    (deferred)
00007ffe`38610000 00007ffe`38877000   dwrite     (deferred)
00007ffe`3be30000 00007ffe`3be3a000   DPAPI      (deferred)
00007ffe`3c240000 00007ffe`3c267000   win32u     (deferred)
00007ffe`3c3e0000 00007ffe`3c52b000   ucrtbase   (deferred)
Again, when we look at the output, as humans, we can see 5 columns:
If there were a way to tell .foreach to do something with the third column, we'd get what we want. If we had awk in WinDBG, it would be as simple as doing something like this:
awk 'NR>1 {print $3}'
But that's not how it works. In fact, there are a few important limitations.
When .foreach parses output, it basically splits everything into individual whitespace-separated strings (tokens). Let me show you what that looks like with the output of the lm command, echoing every token on its own line:
0:014> .foreach (x { lm }) { .echo ${x} }
start
end
module
name
000001ff`81000000
000001ff`93de0000
msedge
(pdb
symbols)
C:\ProgramData\Dbg\sym\msedge.dll.pdb\6640F030371CFBB74C4C44205044422E1\msedge.dll.pdb
000001ff`97510000
000001ff`975e7000
OLEAUT32
(deferred)
00007ff6`ed5b0000
00007ff6`edaaa000
msedge_exe
(deferred)
00007ffd`f42a0000
00007ffd`f4711000
ffmpeg
(deferred)
Empty "columns" are skipped, which makes it even messier to handle. The lm command has a 1m option, which makes it return just the module names. But that's not a generic solution.
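To see why column positions are unreliable, you can mimic the tokenization outside the debugger. The Python sketch below approximates .foreach by splitting on whitespace (an assumption that matches the behaviour shown above) and counts the tokens per lm line.

```python
def tokenize(debugger_output):
    """Approximate .foreach tokenization: split on any run of whitespace."""
    return debugger_output.split()


header = "start             end                 module name"
with_symbols = (r"000001ff`81000000 000001ff`93de0000 msedge (pdb symbols) "
                r"C:\ProgramData\Dbg\sym\msedge.dll.pdb\6640F030371CFBB74C4C44205044422E1\msedge.dll.pdb")
deferred = "000001ff`97510000 000001ff`975e7000 OLEAUT32 (deferred)"

for line in (header, with_symbols, deferred):
    tokens = tokenize(line)
    print(len(tokens), tokens[:4])
```

The header and the "(pdb symbols)" entry each contain a space, so lines don't produce the same number of tokens. Once the lines are flattened into a single token stream, "every third token" no longer means "the module name".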
If you're working with a clean list of items, for instance pointers or symbol names, then .foreach can be a very powerful tool. But as soon as things get a bit more sophisticated, you may have to look at other scripting capabilities (Data Model, PyKD, Extensions, etc).
That said. What if we install awk on our Windows machine and use .shell? Let's see if that works.
First of all, let's get ourselves a working version of awk.
There are a few ways to do so. Git Bash includes some of these tools, and we can very easily get a copy of Git Bash through winget. Open an admin prompt and type the following command:
winget install Git.Git Found Git [Git.Git] Version 2.53.0.2 This application is licensed to you by its owner. Microsoft is not responsible for, nor does it grant any licenses to, third-party packages. Downloading https://github.com/git-for-windows/git/releases/download/v2.53.0.windows.2/Git-2.53.0.2-64-bit.exe ██████████████████████████████ 61.5 MB / 61.5 MB Successfully verified installer hash Starting package install... Successfully installed
This will install the Git tools, as well as some unix-like tools. These tools are stored in C:\Program Files\Git\usr\bin
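The goal is to make awk.exe callable from anywhere (certainly from inside WinDBG). This means we'll have to add this folder to the PATH, ideally to the end of the PATH (to avoid collisions with other OS tools that may happen to have the same name).
From your admin prompt, run this (be aware that setx truncates values longer than 1024 characters, so check the length of your PATH first; if it's long, add the folder via the Environment Variables dialog instead):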
setx PATH "%PATH%;C:\Program Files\Git\usr\bin" /M
Close the prompt, close WinDBG. Open a new prompt and type awk to see if it works:
awk -V GNU Awk 5.3.2, API 4.0, PMA Avon 8-g1, (GNU MPFR 4.2.2, GNU MP 6.3.0) Copyright (C) 1989, 1991-2025 Free Software Foundation. ...
Good!
In WinDBG(X), attached to MS Edge browser, I ran the following command (I truncated the output to save space)
0:019> .shell -ci "lm" awk "NR>1 {print $3}"
msedge
OLEAUT32
msedge_exe
ffmpeg
msedge_elf
dbghelp
WINMM
VERSION
dwrite
DPAPI
...
USER32
ADVAPI32
ole32
IMM32
ntdll
shcore.dll
wldp.dll
.shell: Process exited
That opens up possibilities, doesn't it? The question is, can we make .foreach take the output of the .shell command?
0:019> .foreach (x { .shell -ci "lm" awk "NR>1 {print $3}" } ) { .echo ${x} }
msedge
OLEAUT32
msedge_exe
ffmpeg
msedge_elf
dbghelp
WINMM
VERSION
dwrite
DPAPI
...
USER32
ADVAPI32
ole32
IMM32
ntdll
shcore.dll
wldp.dll
.shell:
Process
exited
That looks great! We'd just have to get rid of the last 3 lines. .foreach took the closing message .shell: Process exited and tokenized it as well.
An easy way to avoid it, is to tell .shell to write the output to a file using the -o flag, and tell .foreach to read the file.
0:019> .shell -ci "lm" -o modules.txt awk "NR>1 {print $3}"
.shell: Process exited
0:019> .foreach /f (x "modules.txt") { .echo ${x} }
msedge
OLEAUT32
msedge_exe
ffmpeg
msedge_elf
dbghelp
WINMM
VERSION
dwrite
DPAPI
...
USER32
ADVAPI32
ole32
IMM32
ntdll
shcore.dll
wldp.dll
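If you don't want to install awk, the same column extraction is a few lines of Python. This sketch mirrors awk 'NR>1 {print $3}' and is meant to run on saved lm output (for instance from a log file). The function name and the sample text are just for illustration.

```python
def third_column(lm_output):
    """Python equivalent of awk 'NR>1 {print $3}'."""
    names = []
    for line in lm_output.splitlines()[1:]:      # NR>1: skip the header line
        fields = line.split()                    # awk's default field splitting
        names.append(fields[2] if len(fields) >= 3 else "")
    return names


LM_OUTPUT = """start             end                 module name
000001ff`81000000 000001ff`93de0000   msedge     (pdb symbols)
000001ff`97510000 000001ff`975e7000   OLEAUT32   (deferred)
00007ff6`ed5b0000 00007ff6`edaaa000   msedge_exe (deferred)"""

print("\n".join(third_column(LM_OUTPUT)))
```

Write the result to a text file and .foreach /f can consume it, just like the awk version.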
If .foreach allows you to iterate over what the debugger already shows you, then .for lets you explore what the debugger does not show you yet. It's modeled after a classic C-style for loop:
.for ( init ; condition ; increment ) { commands }
Simple example:
r $t0 = 0
.for ( ; @$t0 < 5 ; r $t0 = @$t0 + 1 ) { .printf "i = %d\n", @$t0 }
A .for loop is really useful if you like to walk memory or traverse structures or create lists that you can't create using another command. You can use it to walk Linked Lists, chains of pointers, or loop over memory ranges and look for things.
Aliases in WinDBG are named text substitutions.
Think of them as:
When WinDBG sees an alias, it replaces it with its value before executing the command.
The basic syntax to create an alias looks like this:
as aliasname value
For example
as mycmd r eax
When you now run mycmd, it will expand to (and execute) r eax
You can see all configured aliases with al. You can delete an alias with ad name. And you can force-overwrite an alias with as /x name value
You can use aliases in other commands:
as myaddr 00401000
db ${myaddr}
${alias} is the safe/explicit form. Bare alias also works in many cases, but wrapping the alias in ${} avoids ambiguity.
Aliases take precedence over built-in commands! That means you can actually override existing WinDBG commands using aliases (and thus break things). When WinDBG resolves a command, it performs alias expansion first, and then interprets the resulting text. If you were trying to run a command, WinDBG will run the resulting text as a command.
If you broke something, simply delete the alias by running ad against the alias name.
Obviously it's a good practice to avoid using alias names that collide with important built-in commands, including x, r, bp, dt, dp, u, k, etc
Also, aliases are token-based, not substring based. If you accidentally create an alias and override the d, it only breaks the d command, but not the variations that begin with d, such as dp, db, etc.
Aliases get expanded:
In general, be careful with quotes:
as test ".printf \"corelan\""
test
expands to
.printf "corelan"
(You get the text, not the command)
While
as test .printf "corelan"
test
corelan
(Now it becomes the command and executes)
You can also specify multiple commands to be executed:
as showinfo .echo ANALYSIS; u $ip L 1; kb; dps @esp L 8
Practical tip: Define a set of aliases, store them in a script and run the script when the debugger launches (-c) Maybe you have a set of scripts you'd like to run on a regular basis. You could create aliases for them, making your life a lot easier.
We can't really pass arguments to aliases. But if the "variable" component is whatever needs to be "added" to a command, then you can create the alias for the static part of the command, and then anything that you add to it, will be passed to the command. After all, an alias is just substituting stuff.
WinDBG has 2 expression evaluators:
When you type something like poi(@$t0) or @$t0 + 0x150, or using ? to do some quick math, you're using the MASM evaluator. It's mostly address and register focused. It allows you to dereference pointers via poi(), and is arithmetic friendly... On the flipside, it does not have understanding of C structures and is loosely typed.
MASM is good for quick math, pointer chasing, low level memory work.
MASM is the default, but you can also explicitly force MASM by starting your code with @@masm(...)
You can invoke the C++ evaluator using @@c++(...)
It understands types, supports casts, ->, &. It uses symbol/type information, and is a safe and clean way to access structures and their fields.
C++ is great for structure access, offsets, readability of your code.
Let's look at a few examples that - technically - combine both of WinDBG's expression evaluators.
In this first example, I'll create a script that iterates through a list and accesses elements in various structures in such a way that the code is readable, short and doesn't make assumptions about offsets, positions or architecture.
Let's say we want to make a list of all heaps, print if they are NT style or Segment style, and - for the NT heaps - print the encoding key. The high-level approach would look like this:
It's worth noting that the ProcessHeaps field primarily lists NT heaps. Segment heaps may not all appear here, depending on the process and OS version.
Instead of hardcoding positions and offsets in PEB, Heap Headers etc, we're going to use corresponding structs and field names, provided by the symbols in ntdll. For instance, in a 32-bit process the Signature field in the NT Heap (Windows 11) sits at offset 0x60, and for a Segment heap it's at offset 0x8. (In fact, at offset 0x8 in the NT Heap, we find a SegmentSignature field, which is not the same thing as the Signature field. It does not have the same value as the Signature field.) What I'm trying to say is that hardcoding offsets may not be the most reliable technique going forward.
The 2 major datastructures we're going to access are:
Plan of attack:
Check out the full script corelan_heap_encoding.txt from the Github repository. Open the "debugging", "scripts", "windbg" folder.
Example (against MS Edge):
0:013> $$>< g:\blogposts\debugging\scripts\windbg\corelan_heap_encoding.txt
Idx HeapAddress       Type     EncEnabled EncodeFlagMask    EncodingRaw
--- ----------------- -------- ---------- ----------------- ----------------------------------
0   0000025c92010000  Segment  n/a        n/a               n/a
1   0000025c91f30000  Segment  n/a        n/a               n/a
2   0000025c91f40000  NT       yes        0x00100000        0000000000000000 000087ec435545a8
3   0000025c921c0000  Segment  n/a        n/a               n/a
Let's look at a second example.
Let's enumerate all NT heaps in the process, determine the number of segments for each heap, and print all segments (start & end address). We'll also enumerate the VirtualAllocdBlocks, print the number and then print each VA Block: addresses, commit size and reserve size
This script needs to access a few components:
From PEB:
Note: you can run dt _PEB to see the structure prototype, showing the type for each field:
+0x090 ProcessHeaps : Ptr32 Ptr32 Void
+0x088 NumberOfHeaps : Uint4B
We can consult the heap header structure with dt _HEAP to find the 3 fields we need:
0:002> dt _HEAP
ntdll!_HEAP
   ...
   +0x060 Signature : Uint4B
   ...
   +0x09c VirtualAllocdBlocks : _LIST_ENTRY
   +0x0a4 SegmentList : _LIST_ENTRY
   ...
For each heap in the ProcessHeaps list:
For each Segment:
We can get the Base and the number of pages by reading that from the Segment header.
0:002> dt _HEAP_SEGMENT
ntdll!_HEAP_SEGMENT
   +0x000 Entry : _HEAP_ENTRY
   +0x008 SegmentSignature : Uint4B
   +0x00c SegmentFlags : Uint4B
   +0x010 SegmentListEntry : _LIST_ENTRY
   +0x018 Heap : Ptr32 _HEAP
   +0x01c BaseAddress : Ptr32 Void
   +0x020 NumberOfPages : Uint4B
   +0x024 FirstEntry : Ptr32 _HEAP_ENTRY
   +0x028 LastValidEntry : Ptr32 _HEAP_ENTRY
   +0x02c NumberOfUnCommittedPages : Uint4B
   +0x030 NumberOfUnCommittedRanges : Uint4B
   +0x034 SegmentAllocatorBackTraceIndex : Uint2B
   +0x036 Reserved : Uint2B
   +0x038 UCRSegmentList : _LIST_ENTRY
Suppose XXXXX is the address of the Segment, then we can get the needed info by accessing the following structure fields:
@@c++(((ntdll!_HEAP_SEGMENT*)XXXXX)->BaseAddress)
and
@@c++(((ntdll!_HEAP_SEGMENT*)XXXXX)->NumberOfPages)
The end address is just the BaseAddress + (NumberOfPages x 0x1000)
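The end-address calculation, written out in Python as a quick sanity check. The base address and page count below are hypothetical values, not taken from a real session, and segment_range is a made-up helper name.

```python
PAGE_SIZE = 0x1000   # size of one page, as used in the formula above


def segment_range(base_address, number_of_pages):
    """Return (start, end) of a heap segment from its BaseAddress and NumberOfPages."""
    return base_address, base_address + number_of_pages * PAGE_SIZE


start, end = segment_range(0x00A50000, 0xFF)   # hypothetical segment
print(f"{start:08x} - {end:08x}")
```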
For each VirtualAllocdBlock:
If you have a bit of experience with older Windows systems, then perhaps you remember that there used to be problems with the _HEAP_VIRTUAL_ALLOC_ENTRY symbol/structure.
So while we can find the ListHead of the VirtualAllocdBlocks list in the heap header, we can't use the _HEAP_VIRTUAL_ALLOC_ENTRY structure on older Windows versions. Fortunately, the VirtualAllocdBlocks list is just a simple doubly-linked list. From the ListHead, we can walk through the entire list. The CommitSize and ReserveSize are not encoded and sit at offsets 0x10 and 0x14 respectively from the start of the VirtualAllocdBlock header in 32bit processes, right before its regular chunk header. (These offsets are correct for many versions, but may vary. When possible, rely on symbols instead of hardcoding.)
On newer Windows versions, you can see the offsets:
32bit:
0:002> dt _HEAP_VIRTUAL_ALLOC_ENTRY
ntdll!_HEAP_VIRTUAL_ALLOC_ENTRY
   +0x000 Entry : _LIST_ENTRY
   +0x008 ExtraStuff : _HEAP_ENTRY_EXTRA
   +0x010 CommitSize : Uint4B
   +0x014 ReserveSize : Uint4B
   +0x018 BusyBlock : _HEAP_ENTRY
64bit:
0:019> dt _HEAP_VIRTUAL_ALLOC_ENTRY
ntdll!_HEAP_VIRTUAL_ALLOC_ENTRY
   +0x000 Entry : _LIST_ENTRY
   +0x010 ExtraStuff : _HEAP_ENTRY_EXTRA
   +0x020 CommitSize : Uint8B
   +0x028 ReserveSize : Uint8B
   +0x030 BusyBlock : _HEAP_ENTRY
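To illustrate how the hardcoded-offset approach could look, here is a small Python sketch that decodes Flink, CommitSize and ReserveSize from the raw bytes of one entry. The offsets come from the two dt outputs above; the function name parse_va_entry, the LAYOUTS table and the sample bytes are made up for illustration. In a live session, the raw bytes would come from a memory read in the debugger.

```python
import struct

# Offsets taken from the dt _HEAP_VIRTUAL_ALLOC_ENTRY outputs above.
# They may differ on other Windows builds; prefer symbols when available.
LAYOUTS = {
    4: {"commit": 0x10, "reserve": 0x14, "fmt": "<I"},  # 32bit process
    8: {"commit": 0x20, "reserve": 0x28, "fmt": "<Q"},  # 64bit process
}


def parse_va_entry(raw, ptr_size):
    """Decode Flink, CommitSize and ReserveSize from the raw bytes of one entry."""
    layout = LAYOUTS[ptr_size]
    fmt = layout["fmt"]
    flink = struct.unpack_from(fmt, raw, 0)[0]  # Entry.Flink is the first field
    commit = struct.unpack_from(fmt, raw, layout["commit"])[0]
    reserve = struct.unpack_from(fmt, raw, layout["reserve"])[0]
    return {"flink": flink, "commit": commit, "reserve": reserve}


# Fabricated 32bit header bytes: Flink, Blink, 8 bytes ExtraStuff, CommitSize, ReserveSize
sample = struct.pack("<6I", 0x00A5009C, 0x00A5009C, 0, 0, 0x7F000, 0x80000)
entry = parse_va_entry(sample, 4)
print("CommitSize=0x%x ReserveSize=0x%x" % (entry["commit"], entry["reserve"]))
```

Keeping the offsets in one table makes it easy to adjust them if a particular Windows build uses a different layout.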
It might all look a bit complicated at first, but once you understand what data structures you need to access and what fields are at your disposal, it's actually not that difficult.
You can find the script in the Github repository, as corelan_heap_seg_va.txt.
I'll let you take a look at the corelan_modules.txt script by yourself. Try to figure out what data structures it uses and how it accesses them.
Hints: You can access the list of loaded modules through data structures in the PEB. For each module, we can access its PE header at specific offsets.
Enjoy!
In the previous chapter, I started using script files, aka Command Files.
There are a few ways to tell WinDBG to open a file from disk and run the commands inside:
I would recommend using $$>< Filename. It provides predictable execution and avoids alias expansion issues.
You can still use $$ to document your code, but keep in mind that comments are terminated by ;, so they behave as inline comments rather than true line-based comments.
Multiline constructs may be a bit fragile with WinDBG scripts, especially .for, .if, alias handling, etc. Try, as much as possible, to keep things on one line.
If you'd like to add comments, you can also use * at the start of a line. This is often more reliable for full-line comments. Test and see what mode works best for you and your script.
If you're a bit familiar with running mona.py on WinDBG, you most likely know that I have been using the PyKD extension and a library called windbglib to make mona.py work.
PyKD is a module for the CPython interpreter. It is written in C++ and uses Boost.Python to export functions and classes to Python. PyKD is also a WinDBG extension that provides the ability to run Python scripts and to interact with the process and the debugger through an API. We can import the pykd library in our Python script, and that combination allows me to interact with the process that is being debugged.
The original version of PyKD is no longer maintained. pip still has versions, but only up to (and including) Python 3.9. There are some repo forks on Github as well. The latest version I could find was pykd 0.3.4.15. It's not perfect, but I can live with that.
If you figure out how to build pykd against newer Python versions, let me know!
My original installation procedure to make mona.py work inside WinDBG was based on the use of a relatively old version of PyKD (0.2.0.29), a 32bit version of Python 2.7.14 or higher, the windbglib.py library and a 32bit debugging environment. The pre-compiled PyKD version that I used is called pykd.pyd. (Don't be fooled by the file extension, it's really just a .dll.) The windbglib github repo contains a copy of that binary (v0.2.0.29). That's the version I have been using for years.
Going forward, and in line with my ambition to make mona.py compatible with Python3 and do more useful things in 64bit processes as well (stay tuned - things are brewing), I'll explain today how we can use a more up-to-date approach to using PyKD in modern WinDBG versions.
The goal is to set up an environment that allows us to run Python3 code, and the most recent version of PyKD, in both a 32Bit and 64bit debugger environment.
First things first.
If you have been using mona.py inside WinDBG and your system is running an old version of pykd.pyd, it's a good idea to clean up first.
Remove all copies of pykd.pyd from your system, more specifically from the following folders and subfolders:
You don't need to delete mona.py or windbglib.py. And if you still use Immunity Debugger, feel free to keep its copy of mona.py in place as well. Just make sure all pykd.pyd files are gone. Please check your entire hard drive if needed, just to avoid that something gets picked up from somewhere later on.
Finally, make sure you do NOT have any Python versions installed via Microsoft Store, nor via the Python Install Manager.
On recent Windows systems, we can use winget to see if everything looks good. Open a command prompt and run winget list python.
The output should only list Python versions that have source winget.
Good. Let's build up a new environment from scratch now.
What follows is a detailed step-by-step procedure on how to set up your system to run a modern version of PyKD in WinDBG/WinDBGX, using the pykd-ext bootstrapper.
If you prefer to use an automated installer, feel free to use the CorelanPyKDInstall.ps1 script. You can grab a copy of the script from my CorelanTraining Github repository. In order to keep your system clean, the script will remove existing pykd.pyd files inside WinDBG folders, before installing the new pykd components.
Get yourself an administrator powershell prompt. Run Set-ExecutionPolicy RemoteSigned and press "Y" when prompted. Then, run ./CorelanPyKDInstall.ps1. If powershell still refuses to run the script, try Set-ExecutionPolicy Unrestricted and then try again.
The script requires winget, so make sure you're using a recent / up-to-date version of Windows.
If everything went well, you can now skip straight to the section on using pykd. (Don't worry if you're seeing warnings about the VC 2010 Runtime - the script will report that the installation has failed if the packages were already installed)
If you prefer to get your hands dirty and do the heavy lifting all by yourself (or if you are using a system without winget), these are the steps:
We'll need Python 3.9.13 specifically, and we're going to install both 32bit and 64bit versions. If you have not installed those versions yet, download the standalone installer from the Python.org website:
Python 3.9.13 32bit
Launch the 32bit installer.
Click "Install NOW".
Leave "Install Launcher for all users" enabled.
Choose a Default installation. Do NOT click the "Add Python 3.x to PATH" checkbox.
Python 3.9.13 64bit
Launch the 64bit installer.
Click "Install NOW".
Again, Default installation. Do NOT click the "Add Python 3.x to PATH" checkbox.
Open an admin command prompt and run py --list. You should see both 3.9 versions in the list. If you already had other versions installed (like I did), you might see them in the list as well.
C:\>py --list
Installed Pythons found by py Launcher for Windows
 -3.9-64 *
 -3.9-32
 -2.7-32
Also, please verify once again that you do not have any Python versions installed other than the ones with source winget. You'll see a Python Launcher as well. That's what we need.
If you don't have winget, check the installed Apps and confirm that the Python versions are the ones you have installed manually. When in doubt, remove Python versions from your apps and reinstall the ones you need by running the standalone installers again.
Next, check for updates to pip for both Python 3.9 versions. As we have the ability to invoke a specific python version using the Python launcher py, we can run the following 2 commands to update the corresponding pip versions:
py -3.9-32 -m pip install --upgrade pip
py -3.9-64 -m pip install --upgrade pip
The Py launcher will default to the most recent version of Python that is installed. As shown above, you can use py -<version> (for example py -3.9-32) to run a specific version of Python. You can also overrule the automatic default version selection by either creating an .ini file or using an environment variable. You can find more information on customizing the python launcher here.
We can now install the PyKD library for both Python 3.9.13 (32 and 64) versions.
32bit: From an administrator command prompt:
py -3.9-32 -m pip install pykd
This will install pykd inside %LOCALAPPDATA%\Programs\Python\Python39-32\Lib\site-packages\pykd (on my system, that path becomes C:\Users\corel\AppData\Local\Programs\Python\Python39-32\Lib\site-packages\pykd).
From that folder, copy msdia140.dll into C:\Program Files (x86)\Common Files\Microsoft Shared\VC (create that folder first if needed). Then, register the dll:
C:\>cd "C:\Program Files (x86)\Common Files\Microsoft Shared\VC"
C:\Program Files (x86)\Common Files\Microsoft Shared\VC>regsvr32 msdia140.dll
64bit: From an administrator command prompt:
py -3.9-64 -m pip install pykd
In this case, we need to register msdia120.dll. You'll find a copy of the file already inside C:\Program Files (x86)\Windows Kits\10\App Certification Kit\
From the admin command prompt, simply run
regsvr32 "C:\Program Files (x86)\Windows Kits\10\App Certification Kit\msdia120.dll"
If, at any time, you get errors about msdia100.dll - you should be able to get a copy by installing the MS VC++ Runtime 2010. (download from here)
Good. At this point, you should already be able to open a python prompt and import the pykd library:
C:\>py -3.9
Python 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pykd
>>> pykd.dprintln("Hello world")
Hello world
>>> quit()
(You should be able to do the same thing for a specific architecture by opening the Python prompt via py -3.9-32 or py -3.9-64.)
If the pip3 installations went well earlier on, we should already have a pykd.pyd file inside the Lib\site-packages\pykd folder(s) of both Python3.9 versions. That's all we need.
We'll now go ahead and install the PyKD-ext bootstrapper.
The pykd-ext bootstrapper is the WinDBG extension that will run a specific (and selectable) Python version, and allows us to load the corresponding pykd.pyd version. It glues all components together. You can find the original repo here (last updated 5 years ago)
Strictly speaking, you may still be able to load pykd.pyd directly, just like what we did with mona v2. With recent versions of pykd, you may notice however that the arguments you're passing on the command line, may not get passed on correctly to your python script.
The solution is to use the pykd-ext bootstrapper.
You can find pre-compiled versions of pykd-ext, for both x86 and x64, from this github repository. (thank you @apl3b)
Both archives have a pykd.dll file inside the "Release" folder. The idea is to put the x86 pykd.dll inside %LOCALAPPDATA%\DBG\EngineExtensions32, and the x64 pykd.dll inside %LOCALAPPDATA%\DBG\EngineExtensions
Believe it or not, that should do the trick.
In WinDBG Classic, I can now run .load pykd or !load pykd.
0:000> .load pykd
(Note: unlike what we used to do with mona.py v2, I am not telling WinDBG to load pykd.pyd directly. If you have removed the old pykd.pyd file from your WinDBG program folder, running the .load pykd.pyd command should actually fail. In fact, the only versions of pykd.pyd we should have are the ones stored inside the Python Lib\site-packages\pykd folders.)
We're actually going to invoke the pykd-ext bootstrapper instead. (I.e. the pykd.dll file that we have placed inside the %LOCALAPPDATA%\DBG\EngineExtensions and %LOCALAPPDATA%\DBG\EngineExtensions32 folders). As it is a dll, we don't have to specify the .dll extension.
We can now run a few interesting !pykd commands:
0:000> !pykd.help

usage:

!help
    print this text

!info
    list installed python interpreters

!select version
    change default version of a python interpreter

!py [version] [options] [file]
    run python script or REPL

    Version:
    -2           : use Python2
    -2.x         : use Python2.x
    -3           : use Python3
    -3.x         : use Python3.x

    Options:
    -g --global  : run code in the common namespace
    -l --local   : run code in the isolated namespace
    -m --module  : run module as the __main__ module ( see the python command line option -m )

    command samples:
    "!py"                          : run REPL
    "!py --local"                  : run REPL in the isolated namespace
    "!py -g script.py 10 "string"" : run a script file with an argument in the commom namespace
    "!py -m module_name"           : run a named module as the __main__

!pip [version] [args]
    run pip package manager

    Version:
    -2           : use Python2
    -2.x         : use Python2.x
    -3           : use Python3
    -3.x         : use Python3.x

    pip command samples:
    "pip list"                   : show all installed packagies
    "pip install pykd"           : install pykd
    "pip install --upgrade pykd" : upgrade pykd to the latest version
    "pip show pykd"              : show info about pykd package
0:000> !pykd.info

pykd bootstrapper version: 2.0.0.24

Installed python:

Version:        Status:     Image:
------------------------------------------------------------------------------
  2.7 x86-32    Unloaded    C:\Python27\python27.dll
* 3.9 x86-32    Loaded      C:\Users\corel\AppData\Local\Programs\Python\Python39-32\python39.dll
As you can see in the output above, we now have the option to change python version, install packages, etc. You can also see in the !pykd.info output that it is using the Python3 version we intended to use.
You can now run !py to get an interactive shell
0:000> !py
Python 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:24:45) [MSC v.1929 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>>
The WinDBG status field (on the left side, before the command line input) says Input>:
Similar to what we've done previously at the Operating System command prompt, we can now enter python commands:
>>> print("hello world\n")
hello world

>>>
We could try to import the pykd library and use its API:
>>> import pykd
>>> print(pykd.dbgCommand("r"))
eax=00000000 ebx=00000000 ecx=41760000 edx=00000000 esi=008967a0 edi=0022b000
eip=77498218 esp=0064fa54 ebp=0064fa80 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
ntdll!LdrpDoDebuggerBreak+0x2b:
77498218 cc              int     3
>>>
Type quit() to exit the interactive mode.
And of course, the goal is to run full-blown python scripts (such as mona.py)
Let's create a basic script mini.py:
import pykd
print("hello world\n")
Save it inside the WinDBG application folder (C:\Program Files (x86)\Windows Kits\10\Debuggers\x86). Open WinDBG, attach it to a process (or open an executable) and run the following commands at the WinDBG Command Prompt:
!load pykd
!py mini
hello world
As indicated above, you have the option to select a specific Python version. If you have Python2 installed and you insist on running that version, you could add the -2 switch:
0:000> !py -2 Python 2.7.18 (v2.7.18:8d21aa21f2, Apr 20 2020, 13:19:08) [MSC v.1500 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. (InteractiveConsole) >>>
(This works with scripts too, of course. In other words, if you'd like to run Mona 2 using pykd-ext, you'll have to !load pykd first, and then you can run !py -2 mona.py.) You may still have to install the pykd library first. From an admin command prompt:
c:
cd\Python27
python -m pip install --upgrade pip
python -m pip install pykd
Please note that !py will default to using the most recent Python version installed. For instance, on one of my lab machines, I am running multiple versions of Python 2 and 3:
0:000> !pykd.info

pykd bootstrapper version: 2.0.0.24

Installed python:

Version:        Status:     Image:
------------------------------------------------------------------------------
  2.7 x86-32    Unloaded    C:\Python27\python27.dll
  3.9 x86-32    Unloaded    C:\Users\corelan\AppData\Local\Programs\Python\Python39-32\python39.dll
  3.11 x86-32   Unloaded    C:\Users\corelan\AppData\Local\Programs\Python\Python311-32\python311.dll
* 3.13 x86-32   Loaded      C:\Users\corelan\AppData\Local\Programs\Python\Python313-32\python313.dll
As PyKD is only compatible up to 3.9, I either have to run !py -3.9 every single time, or I could also create an alias as py !py -3.9
A windbg launcher script (for instance w.bat) may look like this:
set "WINDBG_CMD=windbg.exe -hd -c '!load pykd; as py !py -3.9' "
%WINDBG_CMD% %*
A quick note on having multiple python versions installed on the same system. In my experience, it might be a good idea to consider removing all references to python-related folders from your system / user PATH environment variable and to always run Python scripts using py instead of python or python3. If you do need python to work, then add the path to the specific Python version you'd like to invoke to the path, but remove all of the others.
If you ever encounter the scenario where you run !load pykd in WinDBG, and WinDBG dies without any warning or error when you try to run the !py command, this may be caused by a mismatch between the python version you're running, and the place where it picks up its libraries.
You could create simple Windbg launcher batch files for each Python/windbg combination. For example, if I want to run Python2.7.18 in WinDBG x86, the script looks like this:
@echo off
REM ==========================================
REM Run WinDBG with optional arguments
REM Corelan Stack / Heap Training
REM www.corelan-training.com
REM ==========================================
set ORIGPATH=%PATH%
set PATH=C:\Python27;%PATH%
set PYTHONHOME=C:\Python27
set PYTHONPATH=C:\Python27\Lib
set "WINDBG_CMD=windbg.exe -hd -c '!load pykd; as !mona !py -2 mona.py'"
%WINDBG_CMD% %*
set PATH=%ORIGPATH%
SET PYTHONHOME=
SET PYTHONPATH=
Launching WinDBG x86 with, for example, Python3.8:
@echo off
REM ==========================================
REM Run WinDBG with optional arguments
REM Corelan Stack / Heap Training
REM www.corelan-training.com
REM ==========================================
set ORIGPATH=%PATH%
set PATH=%LOCALAPPDATA%\Programs\Python\Python38-32;%PATH%
set PYTHONHOME=%LOCALAPPDATA%\Programs\Python\Python38-32
set PYTHONPATH=%LOCALAPPDATA%\Programs\Python\Python38-32\Lib
REM Define base command (adjust path to wew file as needed)
set "WINDBG_CMD=windbg.exe -hd -c '!load pykd; as !mona !py -3 mona.py' "
%WINDBG_CMD% %*
set PATH=%ORIGPATH%
set PYTHONHOME=
set PYTHONPATH=
We're basically setting up an environment with the right things in the right places. Of course you can now do this for any python version. Just make sure you're running a python version that has the same architecture as the debugger, and that you're loading the corresponding pykd.dll file as well.
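If you want to double-check which interpreter actually got picked up, a tiny script can report the architecture of the Python that is running it. This is just a sketch using the standard library; the function name interpreter_bits is my own, and you could save the script under any name (for instance a hypothetical whichpy.py) and run it with !py.

```python
import struct
import sys


def interpreter_bits():
    """Return 32 or 64, based on the pointer size of the running interpreter."""
    return struct.calcsize("P") * 8


# Print version and architecture of the interpreter that executes this script
print("Python %d.%d, %d-bit" % (sys.version_info[0], sys.version_info[1], interpreter_bits()))
```

The reported architecture should match the debugger you launched (x86 or x64). If it doesn't, revisit the PATH, PYTHONHOME and PYTHONPATH values in the launcher script.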
With this setup, you can simply run !mona at the WinDBG command prompt.
You can get pykd-ext / pykd to work on Windows 7. Make sure it has at least SP1 (ideally full up to date). Begin by performing a default installation of Python 2.7.18. Then, download a copy of this installer script and run it from an administrator command prompt.
This will install 32bit and 64 bit Python versions (2.7.18 and 3.9), pykd and pykd-ext. It will also put mona.py and windbglib.py in place. The script will also install .Net Framework 4.8 and WinDBG, and it will create .bat files inside your windbg x86 and x64 folders:
The .bat files allow you to run windbg. Using the -c switch, they already load pykd for you and create an alias to run mona.
The installation above allows us to simply load pykd in WinDBGX as well. If you prefer to have only one copy of your scripts, you can store them in the WinDBG Classic program folder and then simply run windbgx.exe from a Command Prompt that is inside that folder. That will allow you to run the exact same commands, without having to specify a path.
!load pykd
!py mini
Of course, you can always specify a path, put your files in a central location, and perhaps even create an alias. For example:
as myscript !py c:\scripts\myscript.py
Although the PyKD project is no longer maintained by its original author, that doesn't mean it's no longer useful. Fortunately, the Internet Archive's Wayback Machine has a copy of the original documentation (user manual and API reference). It's in Russian, but you can always translate the content if needed.
The goal of this post is not to provide a detailed manual on how to write code that uses pykd. I just want to provide some ideas and examples that will hopefully inspire you to get started.
For starters, there's obviously mona.py and windbglib.py, but you can find some other resources as well, including:
Additionally, I have included some basic scripts in the debugging / scripts / pykd folder of my blogposts Github repository.
Some basic examples:
We can find the list of loaded modules in the PEB. In WinDBG, we have the ability to run a "dump type" command to get the contents of the peb: dt _PEB @$peb. (If you're new to this, please check my previous post on WinDBG for more info on typed dump/display). For instance (in a 64bit process):
0:000> dt _PEB @$peb
ntdll!_PEB
   +0x000 InheritedAddressSpace : 0 ''
   +0x001 ReadImageFileExecOptions : 0 ''
   +0x002 BeingDebugged : 0x1 ''
   +0x003 BitField : 0x4 ''
   +0x003 ImageUsesLargePages : 0y0
   +0x003 IsProtectedProcess : 0y0
   +0x003 IsImageDynamicallyRelocated : 0y1
   +0x003 SkipPatchingUser32Forwarders : 0y0
   +0x003 IsPackagedProcess : 0y0
   +0x003 IsAppContainer : 0y0
   +0x003 IsProtectedProcessLight : 0y0
   +0x003 IsLongPathAwareProcess : 0y0
   +0x004 Padding0 : [4] ""
   +0x008 Mutant : 0xffffffff`ffffffff Void
   +0x010 ImageBaseAddress : 0x00007ff6`031a0000 Void
   +0x018 Ldr : 0x00007ffe`fd3b2920 _PEB_LDR_DATA
In PyKD, we're going to do something similar, using the pykd.typedVar() function. pykd has a function getCurrentProcess(), which returns the address of the PEB.
There is a function getCurrentProcessId() as well, but unfortunately that one does not seem to return the PID of the debuggee. In fact, maybe I was missing something, but it turns out it takes a bit of effort to get the PID. Anyway, I included a small routine to get the current PID in the script, in case you're curious.
Back to the use case.
This pykd statement provides access to the PEB:
peb = pykd.typedVar("ntdll!_PEB", pykd.getCurrentProcess())
Comparing dt with typedVar(), we can clearly see similarities. They both take a symbol name and an address.
This statement allows us to access the peb object and its fields/lists.
Let's say we're interested in listing the loaded modules and their start addresses.
The peb has a Ldr field, which contains the address of a _PEB_LDR_DATA structure. (I marked it in a different color in the output above) That loader data structure contains several doubly linked list heads used to track loaded modules, more specifically:
More info here: http://undocumented.ntinternals.net/index.html?page=UserMode%2FStructures%2FPEB_LDR_DATA.html
Each list contains the same module entries, but linked through different LIST_ENTRY members inside each _LDR_DATA_TABLE_ENTRY, so the order differs depending on which list you walk.
PEB
 └──> PEB->Ldr
       └──> PEB_LDR_DATA
             └──> One of the LIST_ENTRY heads:
                   - InLoadOrderModuleList
                   - InMemoryOrderModuleList
                   - InInitializationOrderModuleList
                         └──> walk doubly linked list
                               └──> each node = LDR_DATA_TABLE_ENTRY
The idea is to start from a list head and follow the Flink pointers from one entry to the next until you reach the list head again.
With PyKD, that's as easy as doing this:
moduleLst = pykd.typedVarList(
    peb.Ldr.deref().InLoadOrderModuleList,
    "ntdll!_LDR_DATA_TABLE_ENTRY",
    "InLoadOrderLinks.Flink"
)
This dereferences PEB.Ldr to obtain the PEB_LDR_DATA structure, takes its InLoadOrderModuleList list head, and then asks PyKD to walk that list by treating each node as an ntdll!_LDR_DATA_TABLE_ENTRY linked through its InLoadOrderLinks field. (This assumes the linked list is intact; corrupted lists may cause incomplete or invalid traversal.)
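To make the "follow Flink until you're back at the head" idea concrete, here is a debugger-independent Python sketch. It is not how PyKD implements the walk; the function walk_list, its read_flink callback and the fake memory dictionary are all made up for illustration. It also shows one possible guard against corrupted lists.

```python
def walk_list(list_head, read_flink, max_entries=4096):
    """Follow Flink pointers from list_head until we are back at the head.

    read_flink(address) must return the Flink value stored at 'address'.
    The walk stops early if an entry repeats or max_entries is reached,
    which suggests a corrupted list.
    """
    entries = []
    seen = set()
    current = read_flink(list_head)
    while current != list_head:
        if current in seen or len(entries) >= max_entries:
            break  # loop that never returns to the head, or list too long
        seen.add(current)
        entries.append(current)
        current = read_flink(current)
    return entries


# Fake memory: address -> Flink value (list head at 0x1000, three entries)
fake_memory = {0x1000: 0x2000, 0x2000: 0x3000, 0x3000: 0x4000, 0x4000: 0x1000}
print([hex(e) for e in walk_list(0x1000, fake_memory.__getitem__)])
```

Each returned address is the address of a LIST_ENTRY inside a node. To get to the start of the containing structure, subtract the offset of the link field you walked (for InLoadOrderLinks that offset is 0, for the other two lists it is not).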
The pykd-modules.py script enumerates all 3 lists and prints the output. Based on my experience, the InInitializationOrderLinks technique may not return all loaded modules. Use with caution.
Of course, you can obtain all module properties by parsing header information and reading values from memory. I'll talk about how to read from memory in a moment. PyKD, however, has a module class as well, which already does a lot of the heavy lifting for you. Likewise, there is already functionality in pykd that will enumerate through the modules for you. (pykd.getModulesList())
Let's look at script pykd-module-obj.py to see what that looks like.
pykd.getModulesList() returns a list of module objects. If you would like to get a module object for a certain file, you can create an instance of the module class using the module's name (which is not the same thing as the filename), or an already existing object (such as one that was returned via pykd.getModulesList()).
You may notice that pykd does not seem to always return the full path for a certain file. That's why I usually get the list of modules from the PEB myself (including all of its properties), and use my own module-type classes.
(The likely evolution for mona.py is to no longer rely on pykd.module)
The second use-case covers access to registers.
pykd offers a simple and straightforward way to read registers: pykd.reg(regname). (Please note that pykd expects you to specify the register name in lowercase.)
Changing a register value can be done with the setReg function: pykd.setReg(registername, newvalue)
The pykd-regs.py script shows how to use both of these mechanisms.
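Since the lowercase requirement is easy to forget, a small wrapper can take care of it. The sketch below is not taken from pykd-regs.py; normalize_reg_name and read_registers are hypothetical helpers, and the reader function is passed in so the example also runs outside the debugger. In a live session you would pass pykd.reg as the reader.

```python
def normalize_reg_name(name):
    """pykd expects lowercase register names, so 'EAX' becomes 'eax'."""
    return name.strip().lower()


def read_registers(names, reg_reader):
    """Read several registers at once; reg_reader would be pykd.reg in a live session."""
    result = {}
    for name in names:
        key = normalize_reg_name(name)
        result[key] = reg_reader(key)
    return result


# Stand-in for pykd.reg, so the sketch runs without a debugger attached
fake_regs = {"eax": 0x0, "esp": 0x0064FA54, "eip": 0x77498218}
for reg, value in read_registers(["EAX", "Esp", "eip"], fake_regs.__getitem__).items():
    print("%s = 0x%08x" % (reg, value))
```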
In its purest form, reading and writing bytes can be done via
The pykd-memory.py script shows how to use these 2 commands.
PyKD has a few variations as well. Reading strings (ansi or wide), for instance, can be done using the following functions:
Those 2 will read memory until they reach the corresponding terminator. (single null byte for a string, double null byte for a wide string). If you're not really accessing a string that is properly terminated, you may be causing an uncontrolled read, leading to some sort of read access violation.
You can always use loadChars() and loadWChars() as well. These 2 functions take an address and the number of characters to read. That way you can avoid reading more than what you intended to.
Combining a few concepts, we could build a little routine that reads a string from memory:
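import pykd

def readString(self, location):
    # Method of a helper class: return the ANSI string at 'location', or "" on failure
    if not pykd.isValid(location):
        return ""
    try:
        # Preferred: read up to the null terminator
        return pykd.loadCStr(location)
    except pykd.MemoryException:
        # Fall back to a bounded read; this can fail too, so guard it separately
        try:
            return pykd.loadChars(location, 0x100)
        except Exception:
            return ""
    except Exception:
        return ""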
The next technique I would like to demonstrate today, is executing a WinDBG command and parsing the output.
It might feel a bit like cheating - after all, PyKD has a lot of features. But why reinvent the wheel if you can just run a command and parse the output, right ? 🙂 It comes with a performance hit - you're causing some I/O that wouldn't be there if you're just accessing memory directly. Additionally, you're relying on WinDBG commands to never change. But ok, it's a convenient way to blend the best of both worlds.
This is how it's done:
cmd2run = "u eip L 0x20"
output = pykd.dbgCommand(cmd2run)
You can now split the output on newline '\n' and access the output of the WinDBG command for parsing or display.
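As an example of what that parsing could look like, here is a sketch that turns the text of the r command into a dictionary. The function name parse_registers and the regular expression are my own assumptions about the output layout (name=hexvalue pairs); the sample text is the r output shown earlier in this post. In a live session, the input would be the string returned by pykd.dbgCommand("r").

```python
import re


def parse_registers(output):
    """Extract name=value pairs (values treated as hex) from 'r' command output."""
    regs = {}
    for line in output.split("\n"):
        for name, value in re.findall(r"\b([a-z][a-z0-9]*)=([0-9a-fA-F]+)\b", line):
            regs[name] = int(value, 16)
    return regs


# Sample output, copied from the interactive session shown earlier
sample = """eax=00000000 ebx=00000000 ecx=41760000 edx=00000000 esi=008967a0 edi=0022b000
eip=77498218 esp=0064fa54 ebp=0064fa80 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
ntdll!LdrpDoDebuggerBreak+0x2b:
77498218 cc              int     3"""

regs = parse_registers(sample)
print("eip = 0x%08x, esp = 0x%08x" % (regs["eip"], regs["esp"]))
```

This is exactly the kind of code that breaks when the command output format changes, so keep parsers like this small and easy to adjust.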
Assembling instructions to bytecode, and disassembling bytecode to instructions is possible with pykd... but you'll see it's a bit cumbersome, as it comes with a bit of collateral damage.
To assemble (i.e. convert an assembly instruction into the corresponding opcode), you'll have to pick a writeable "anchor" address first. PyKD will 'assemble' the instruction to that location.
d = pykd.disasm(address)
asm_result = d.asm(instr)
Unfortunately, that means pykd has now overwritten a few bytes of memory at the anchor address. Be careful when using this in a live target, as modifying instructions may affect execution if not restored correctly. That's why I'll have my script read 20 bytes from the anchor location first, then let pykd do the assembling, and then restore the original bytes. (20 bytes may be more than needed, but I want to be sure I'm reading enough bytes to accommodate any instruction length; a single x86/x64 instruction is at most 15 bytes.)
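The save/assemble/restore sequence lends itself to a context manager, so the original bytes are put back even if something goes wrong in between. The sketch below is not the code from my script: preserved_bytes, read_bytes and write_bytes are hypothetical names, and a bytearray stands in for the debuggee's memory so the example runs on its own. In a live session, the two callbacks would wrap the debugger's memory read and write calls.

```python
from contextlib import contextmanager


@contextmanager
def preserved_bytes(address, size, read_bytes, write_bytes):
    """Save 'size' bytes at 'address' and write them back afterwards, even on error."""
    original = read_bytes(address, size)
    try:
        yield original
    finally:
        write_bytes(address, original)


# Stand-ins for the debugger's memory read/write calls
memory = bytearray(b"\x90" * 32)


def read_bytes(address, size):
    return bytes(memory[address:address + size])


def write_bytes(address, data):
    memory[address:address + len(data)] = data


with preserved_bytes(0, 20, read_bytes, write_bytes):
    write_bytes(0, b"\xcc")       # simulates the assembler overwriting the anchor
    assembled = read_bytes(0, 1)  # grab the result before it gets restored

print(assembled, bytes(memory[:1]))
```

The important detail is the order: read the assembled bytes inside the with block, because the original content is written back as soon as the block ends.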
Additionally, and interestingly enough, the .asm() call does not return the opcode. It just positions itself at the next instruction. In other words, the output of .asm() is not that relevant if you're just trying to get the opcode for the instruction you're trying to assemble. In order to get the opcode, you have to access the memory directly at the anchor address.
Of course, as you don't know how long the opcode actually is, it's going to be challenging to decide how many bytes to read. We can't really rely on "what has changed" either compared to the original bytes, because if the new instruction matches with the original one, we still won't have the length of the opcode.
Luckily, the .disasm() routine allows us to get the instruction at a given address. The output contains the opcode, allowing us to parse & extract it.
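Extracting the opcode from such a disassembly line could look like the sketch below. It assumes the usual WinDBG layout of address, opcode bytes, mnemonic and operands, separated by whitespace; the function name opcode_from_disasm is hypothetical, and the first sample line is the int 3 instruction shown earlier in this post.

```python
def opcode_from_disasm(text):
    """Return the opcode bytes from text shaped like '<address> <hex bytes> <mnemonic> ...'.

    If the text has several lines (for instance a symbol line followed by the
    instruction), the last non-empty line is used.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty disassembly text")
    fields = lines[-1].split()
    if len(fields) < 2:
        raise ValueError("unexpected disassembly line: %r" % lines[-1])
    # The second field holds the opcode bytes as a hex string
    return bytes.fromhex(fields[1])


line = "77498218 cc              int     3"
opcode = opcode_from_disasm(line)
print("opcode: %s (%d byte(s))" % (opcode.hex(), len(opcode)))
```

Because the length of the result is known once the hex field is parsed, this also answers the "how many bytes do I need to read" question from the previous paragraph.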
If we want to disassemble, we have to do something similar:
Take a look at the pykd-asm-disam.py script, which implements both concepts.
That’s it for part 1.
We’ve only scratched the surface of what’s possible when you stop treating WinDBG as a passive tool and start using it as something you can shape, script, and control. From startup automation to event-driven breakpoints and Python integration, you now have the building blocks to create your own debugging workflows.
In part 2, we’ll go further down the rabbit hole: expect a closer look at the Data Model, NatVis, JavaScript providers, and extensions, and how they can take automation and introspection to a whole new level.
Also... if you’ve been following along closely, you probably noticed a few hints already: there’s something cooking around mona.py 👀 Stay tuned — some long-awaited updates are on the horizon.
If you got value from this post, consider subscribing so you don’t miss what’s coming next. We plan on dropping new content regularly, and subscribers are always the first to know.
And of course — feel free to follow me on social media to stay in the loop and see what I’m working on behind the scenes.
Thanks for reading 🙏
© Corelan Consulting BV. All rights reserved. The contents of this page may not be reproduced, redistributed, or republished, in whole or in part, for commercial or non-commercial purposes without prior written permission from Corelan Consulting bv. See our Terms of Use & Privacy Policy (https://www.corelan.be/index.php/legal) for more details.
Peter Van Eeckhoutte is the founder of Corelan and a globally recognized expert in exploit development and vulnerability research. With over two decades in IT security, he built Corelan into a respected platform for deep technical research, hands-on training, and knowledge sharing. Known for his influential exploit development tutorials, tools, and real-world training, Peter combines a strong research mindset with a passion for education—helping security professionals understand not just how exploits work, but why.