HITB2014AMS – Day 1 – Harder, Better, Faster Fuzzer: Advances in BlackBox Evolutionary Fuzzing | Corelan Cybersecurity Research

Published May 29, 2014

Vulnerability Hunting

Active security testing, Fabien explains, is the process of generating inputs that travel through the application, hit a sink and violate a property. It applies to all kinds of vulnerabilities, not just buffer overflows or memory corruption bugs. Blackbox and whitebox/greybox testing (both static and dynamic) are ways to perform security testing.

Fuzzing is an active testing technique that automates the creation and evaluation of large numbers of malicious inputs. Input generation is based on some form of “knowledge” (random, grammar, model), and in the past fuzzing was mostly used to find memory corruption bugs. In the approach Fabien presents, a genetic algorithm guides the fuzzing.
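
To give an idea of what "a genetic algorithm guiding the fuzzing" can look like, here is a minimal sketch of an evolutionary loop. It is not taken from the talk: the function names (`evolve`, `flip_random_byte`), the "keep the fitter half" selection rule and the bit-flip mutation are assumptions chosen for illustration. In a real blackbox setup the fitness value would come from observing the target application.

```python
import random
from typing import Callable, List


def evolve(
    population: List[bytes],
    fitness: Callable[[bytes], float],
    mutate: Callable[[bytes, random.Random], bytes],
    generations: int,
    rng: random.Random,
) -> List[bytes]:
    """Keep the fitter half each generation and refill with mutated survivors."""
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[: max(1, len(ranked) // 2)]
        children = [
            mutate(rng.choice(survivors), rng)
            for _ in range(len(population) - len(survivors))
        ]
        population = survivors + children
    return population


def flip_random_byte(data: bytes, rng: random.Random) -> bytes:
    """Hypothetical mutation operator: flip one random bit of one random byte."""
    if not data:
        return data
    i = rng.randrange(len(data))
    flipped = data[i] ^ (1 << rng.randrange(8))
    return data[:i] + bytes([flipped]) + data[i + 1:]
```

Because the survivors are carried over unchanged, the best individual found so far is never lost between generations.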

Evolutionary PDF Fuzzing

Fabien introduced the people who joined his effort to look at blackbox PDF fuzzing, and then explained the initial approach: take an initial PDF file (downloaded from the internet) and apply an ordered list of seeds, anomaly operators and parameters to create mutated versions of that file. He explained that crash triage and classification can be improved by using Dr. Memory or AddressSanitizer. To determine how close an “individual” is to finding a vulnerability, he used a score based on the depth of the production tree, the grammar production rules used, the distinct anomaly operators applied, interpreter warnings, whether the file was rejected by the application, the time needed to load the file, and singularity.
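
A score like this can be pictured as a weighted sum over the observed dimensions. The sketch below is only an illustration: the field names, the weights and the idea of subtracting a penalty for rejected files are assumptions of mine, since the talk did not give concrete values.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """What was observed for one mutated PDF (all field names are hypothetical)."""
    tree_depth: int            # depth of the production tree
    grammar_rules_used: int    # grammar production rules exercised
    distinct_operators: int    # distinct anomaly operators applied
    interpreter_warnings: int  # warnings emitted by the interpreter
    rejected: bool             # True if the application rejected the file
    load_seconds: float        # time the application needed to load the file
    singularity: float         # 0..1, how different the file is from earlier ones


# Illustrative weights only.
WEIGHTS = {
    "tree_depth": 1.0,
    "grammar_rules_used": 0.5,
    "distinct_operators": 0.5,
    "interpreter_warnings": 2.0,
    "load_seconds": 1.0,
    "singularity": 5.0,
}
# Assumption: a rejected file never reached the deeper parsing code, so it scores lower.
REJECTION_PENALTY = 10.0


def fitness(obs: Observation) -> float:
    """Weighted sum of the observed dimensions, minus a penalty for rejected files."""
    score = sum(weight * getattr(obs, name) for name, weight in WEIGHTS.items())
    if obs.rejected:
        score -= REJECTION_PENALTY
    return score
```

Tuning the weights changes what the fuzzer "cares" about, for example raising the `singularity` weight pushes it toward files that look unlike anything tested before.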

The fuzzing operators applied are based on anti-random testing, where the source file, the seed, the 1st anomaly operator and its parameter, etc. are used as dimensions. The idea is to create an individual (file) that is as different as possible from the other/previous files. Additionally, they used a combination of mutation and crossover fuzzing operators.
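
As a rough sketch of the anti-random idea, the snippet below treats each individual as a tuple of dimensions and picks the candidate that differs most from everything tested so far, using a simple Hamming distance. The tuple layout, the distance measure and the operator names in the usage note are assumptions for illustration, not the way ShiftMonkey does it.

```python
import random
from typing import Sequence, Tuple

# (source file, seed, anomaly operator, operator parameter); a simplified layout.
Config = Tuple[str, int, str, int]


def hamming(a: Config, b: Config) -> int:
    """Number of dimensions in which two configurations differ."""
    return sum(x != y for x, y in zip(a, b))


def pick_most_different(candidates: Sequence[Config], history: Sequence[Config]) -> Config:
    """Anti-random choice: the candidate with the largest total distance to the history."""
    return max(candidates, key=lambda c: sum(hamming(c, h) for h in history))


def crossover(a: Config, b: Config, rng: random.Random) -> Config:
    """Single-point crossover: the first dimensions come from a, the rest from b."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:]


def mutate(c: Config, operators: Sequence[str], rng: random.Random) -> Config:
    """Replace the anomaly operator with a randomly chosen one (may pick the same one)."""
    source, seed, _, param = c
    return (source, seed, rng.choice(operators), param)
```

For example, with a history of `[("a.pdf", 1, "bitflip", 0)]`, the candidate `("b.pdf", 2, "repeat", 7)` wins over `("a.pdf", 1, "bitflip", 5)` because it differs in all four dimensions instead of one.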

This approach is implemented in “ShiftMonkey” (Ruby + Python), which uses the Origami (Ruby) framework to create PDF files. The anomaly generator used is “Radamsa” (which has 26 operators). Compared with other tools, ShiftMonkey's focus on singularity (one of the fitness dimensions) found a higher number of distinct vulnerabilities.

Another experiment was based on basic block coverage. They instrumented the tested executable to see which basic blocks were executed, and fed that information back into ShiftMonkey so it would produce test cases that cover more basic blocks.
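
One simple way to use such coverage data is to keep only the test cases that reach basic blocks nothing else has reached yet. The sketch below assumes the instrumentation gives you a set of basic block identifiers per test case; the function name and the greedy strategy are my own illustration, not a description of ShiftMonkey.

```python
from typing import Dict, List, Set


def select_by_new_coverage(runs: Dict[str, Set[int]]) -> List[str]:
    """Greedy selection: keep a test case only if it reaches a not-yet-covered block."""
    covered: Set[int] = set()
    kept: List[str] = []
    # Visit the cases with the largest coverage first; ties are broken by name.
    for name, blocks in sorted(runs.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        if blocks - covered:
            kept.append(name)
            covered |= blocks
    return kept
```

With `{"a.pdf": {1, 2, 3}, "b.pdf": {2, 3}, "c.pdf": {3, 4}}`, `a.pdf` and `c.pdf` are kept, while `b.pdf` is dropped because it adds no new blocks.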

Some of the remaining questions are:

Can we improve our ability to find “complex” vulnerabilities (where the metric is based on the depth of the stack trace, weighted by the rarity of the targeted basic blocks)?
Can we find a more efficient combination of anomaly operators when fuzzing a given format? If you apply too many anomaly operators, you often violate too many constraints, which means the files will be rejected by the application.

Fabien continues by listing some related work on evolutionary fuzzing by Jared DeMott, Budynek et al., B. Nagy, and Noreen et al.

Evolutionary XSS Fuzzing

The idea is to find Type-1 (Reflected) and Type-2 (Stored) XSS vulnerabilities in websites, in a blackbox approach. BlackBox web scanners use fuzzing to find XSS vulnerabilities, but there are some problems:

Where to fuzz? There is no model of the application, which leads to low-quality results. Fabien explains that they attempted to overcome this issue using model inference and control + taint flows.
How to generate input? The set of malicious inputs may be limited. A solution might be to define an attack grammar.
How to find/discover the XSS? There may not be a precise test verdict (it is hard to tell whether something is an XSS or not), and detection is sensitive to sanitizers (code that filters your input). A possible solution would be to use precise taint inference and a genetic algorithm.
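
To illustrate what an attack grammar could look like, here is a toy example that expands a handful of production rules into XSS candidate strings. The grammar itself and the names `GRAMMAR` and `generate` are hypothetical; a real attack grammar would be far larger and would not necessarily be expanded at random.

```python
import random
from typing import Dict, List

# Toy grammar: keys are nonterminals, anything else is emitted as-is.
GRAMMAR: Dict[str, List[List[str]]] = {
    "ATTACK": [["BREAKOUT", "PAYLOAD"]],
    "BREAKOUT": [['"'], ["'"], ['">'], [""]],
    "PAYLOAD": [
        ["<script>", "JS", "</script>"],
        ["<img src=x onerror=", "JS", ">"],
    ],
    "JS": [["alert(1)"], ["prompt(1)"]],
}


def generate(symbol: str, rng: random.Random) -> str:
    """Expand a symbol by picking one production at random, recursively."""
    if symbol not in GRAMMAR:
        return symbol  # terminal
    production = rng.choice(GRAMMAR[symbol])
    return "".join(generate(part, rng) for part in production)
```

Calling `generate("ATTACK", random.Random(1))` returns one candidate input; using a seeded `random.Random` makes the generated inputs reproducible.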

Fabien continues by explaining the concept of XSS Control & Taint flows (and transitions), which allowed them to define where to start fuzzing.

The approach to detect XSS vulnerabilities consists of 2 steps:

Reverse Engineering (LigRE) to detect where an attacker can obtain reflection. This “RE” phase is basically based on crawling the website and analysing the outputs, creating a “model” of the application.
Use KameleonFuzz to check if the attacker can use that reflection in a malicious way.

The better you control the model, the better the fuzzing process will be, Fabien continues. Keeping track of state changes between 2 pages (GET and POST) and identifying state differences (2 GETs of the same page, but with different output) is important too, as is identifying exactly what caused the state to change. The idea is to keep track of requests versus state changes and to indicate changes by giving nodes (pages) a different color. Next, the LigRE tool can identify where state changes occur and which parameter was used or can be used, and then save the reflections. When all interesting points of attack have been identified, KameleonFuzz can be used to create and mutate fuzzed values, using an attack grammar, in order to find XSS bugs.
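
The "find the reflections" part can be pictured with a much simpler technique than LigRE's model inference: submit a unique marker in each parameter and look for it in the outputs. The sketch below does only that. The `submit` and `fetch` callables, the `Reflection` tuple and the "reflected"/"stored" labels are assumptions so the example runs without a network; it does not model application state the way LigRE does.

```python
import secrets
from typing import Callable, Dict, List, Tuple

Reflection = Tuple[str, str, str, str]  # (source page, parameter, sink page, kind)


def find_reflections(
    submit: Callable[[str, Dict[str, str]], str],
    fetch: Callable[[str], str],
    inputs: Dict[str, List[str]],
    pages: List[str],
) -> List[Reflection]:
    """Inject a unique marker per parameter and record every output that echoes it."""
    found: List[Reflection] = []
    for source, params in inputs.items():
        for param in params:
            marker = "r" + secrets.token_hex(8)  # unique per parameter
            # Marker in the immediate response: candidate reflected injection point.
            if marker in submit(source, {param: marker}):
                found.append((source, param, source, "reflected"))
            # Marker seen when fetching a page afterwards: labeled "stored" here.
            for sink in pages:
                if marker in fetch(sink):
                    found.append((source, param, sink, "stored"))
    return found
```

Each recorded tuple is a candidate point of attack, that is, a place where a fuzzer could later try values generated from an attack grammar.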

It’s important to detect XSS without causing too many false positives. To improve the quality of the test verdict and to determine how “close” a finding is to a bug, they used a combination of taint-aware parse trees and tree patterns. In other words, they break the page down into a DOM tree and nodes, and search for the inputs (tree patterns). If a match is found, an XSS is detected. To determine how “fit” a result is or, in other words, how likely it is that a finding is indeed a vulnerability, they used a couple of dimensions and corresponding weights. The number of tainted nodes and the number of injected character classes play a big role in this process. The higher the score, the higher the fitness and thus the higher the likelihood that a finding is a true XSS vulnerability. Fabien explains that, even if the tool is not perfect, as a human you can review the potentially interesting findings and focus on the ones that “look” promising. Additionally, you can edit the grammar set and find even more bugs (not only XSS, but also SQL injection, shell command injection, etc).
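
A very reduced version of this idea fits in a few lines of Python. The sketch below parses the response with the standard `html.parser` module, marks nodes that contain an injected marker as tainted, flags the ones in an executable context (script text or an `on...` attribute) and combines that with the character classes that come back unescaped around the marker. The class names, the weights and the character-by-character alignment heuristic are assumptions of mine; this is not KameleonFuzz's implementation.

```python
from html.parser import HTMLParser
from typing import List, Set, Tuple

# Hypothetical character classes and weights; the talk gave no concrete values.
CHAR_CLASSES = {"angle": "<>", "quote": "\"'", "paren": "()", "slash": "/"}
W_NODE, W_CLASS, W_EXEC = 1.0, 2.0, 10.0


class TaintedNodes(HTMLParser):
    """Records parse-tree nodes whose content contains the injected marker."""

    def __init__(self, marker: str) -> None:
        super().__init__()
        self.marker = marker
        self.in_script = False
        self.nodes: List[Tuple[str, bool]] = []  # (node kind, executable context?)

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
        for name, value in attrs:
            if value and self.marker in value:
                # Event-handler attributes (onclick, onerror, ...) run script.
                self.nodes.append(("attr:" + name, name.startswith("on")))

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.marker in data:
            self.nodes.append(("text", self.in_script))


def surviving_classes(payload: str, marker: str, raw: str) -> Set[str]:
    """Align the payload on each marker occurrence and compare character by character."""
    offset = payload.index(marker)
    found: Set[str] = set()
    pos = raw.find(marker)
    while pos != -1:
        start = pos - offset
        for i, ch in enumerate(payload):
            j = start + i
            if 0 <= j < len(raw) and raw[j] == ch:
                found.update(n for n, chars in CHAR_CLASSES.items() if ch in chars)
        pos = raw.find(marker, pos + 1)
    return found


def xss_fitness(payload: str, marker: str, raw: str) -> float:
    """Weighted score: tainted nodes, surviving character classes, executable contexts."""
    parser = TaintedNodes(marker)
    parser.feed(raw)
    parser.close()
    executable = sum(1 for _, is_exec in parser.nodes if is_exec)
    classes = surviving_classes(payload, marker, raw)
    return W_NODE * len(parser.nodes) + W_CLASS * len(classes) + W_EXEC * executable
```

With the payload `"><script>zq9x()</script>` and marker `zq9x`, a page that echoes the payload unescaped inside `<input value="...">` scores much higher than a page that HTML-escapes it, because the marker ends up inside a script node and all four character classes survive.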

A comparison against other XSS fuzzing frameworks (wapiti, w3af, skipfish) demonstrates that this approach appears to find more XSS bugs.

He also compared the number of HTTP requests needed to find XSS bugs. Below 800 requests, w3af appears to find the largest number of bugs, but beyond that point, the combination of LigRE and KameleonFuzz is a lot more effective. Testing against real targets (the HITB CFP registration page, the admin page of a French DSL box, SFR webmail, mega.co.nz, etc) shows that the techniques and tools work well.

Fabien explains that control & taint flow models could also be used as input to create Content Security Policies (CSP).

About the speaker

Fabien Duchene is a (soon-to-be-over) PhD candidate at LIG Lab-IMAG, University of Grenoble, France. His current research focuses on evolutionary fuzzing to improve vulnerability detection in a black-box (not grey-box!) harness. He created the GreHack hardcore security conference. Previously, he worked at Microsoft and Sogeti-ESEC. He holds an MSc in Computer Science from the “Grande Ecole” Ensimag, France, where he created the SecurIMAG CTF team, and is now lecturing on the basics of fuzzing, memory corruption exploit writing, pen-testing, web security, and network security. He has also studied at the University of Queensland, Australia and Universidad Politecnica de Madrid, Spain. Fabien has spoken at prestigious hacking and academic conferences: Black-Hat, IEEE WCRE, ACM Codaspy (double-blinded, 16% acceptance rate)…
