BlackHat Europe 2011 / Day 02

Published March 18, 2011

Fuzzing and Debugging Cisco IOS / Sebastian Muñiz, Alfredo Ortega

Having missed the IOActive party last night, I woke up fresh and sharp and ready for some kick-ass debugger stuff, so I decided to start my second day at BlackHat Europe 2011 by attending the Cisco IOS fuzzing & debugging talk.

At the start of the presentation, Sebastian and Alfredo (both from Groundwork Technologies) provided a high-level description of what the IOS architecture looks like, and explained that Cisco IOS is a single binary image (a huge file). In fact, the image is a compressed file that decompresses at runtime.
When it runs, all processes share the same address space, without any boundaries between them, which is an interesting fact for attackers.
The scheduler in IOS is cooperative and not preemptive (unlike most modern OSs). To compensate, some kind of watchdog can kill processes that are running for too long.
Basically, the most significant difference of IOS with modern OSs is that any process can access the memory of all other processes on the device.

When it comes down to debugging, the presenters explain that IOS has its own internal gdb server. It’s used by Cisco developers/support engineers and is accessible via ssh/telnet/console. Although it looks like gdb, it uses a slightly different GDB protocol. In theory, you would be able to use a regular gdb client to connect to it, but you would need to make some small modifications to make it work (a manual patch & recompile), which is a bit painful.
In examine mode, the debugger does not allow write operations.
In debug mode, it allows you to write memory/modify registers. (still works over telnet/ssh and can be accessed remotely). All you need is a processID to debug.
In kernel mode, all features are available but this mode can’t be accessed remotely (only serial, because it freezes the OS). It does not allow you to debug a specific process. This debugging mode is a bit tricky because you may find yourself rebooting the device a lot of times.
There are no debugging symbols available, but if you use IDA pro to debug, you can use idapython and other IDA features to find interesting functions (libc etc).

Having said that, it’s clear that there are various options to perform debugging on a cisco IOS device, but it might be a bit painful.

Luckily, there is a good alternative. Sebastian and Alfredo introduced the Dynamips emulator, written by Christophe Fillot, which runs on windows/linux/mac OSX.
It is equivalent to QEMU/Bochs and implements MIPS/PowerPC architecture and Cisco hardware.
It supports 7200, 36xx, 2691, 3825, 3745, 26xx and 17xx devices, so you don’t really have to use a physical device for fuzzing/debugging. (You will obviously need a physical device in order to be sure a given crash can actually be reproduced in real life)
Anyways, the nice thing about the emulator is that you can plant a gdb server inside Dynamips, so IOS would not even know it’s being debugged.
This technique allows any standard gdb client to connect to it, have r/w access to memory and registers, and you can set breakpoints too.
Furthermore the emulator can bridge with the network interfaces on the physical machine so you can actually perform tests over the network. The pros and cons of the Dynamips emulator look pretty much like this :

Pros	Cons
Almost complete isolation	Not 100% exact emulation
Cost effective	Not compatible with all hw models
Controlled Debugging environment	Findings still need to be verified on physical device
Bug-Hunter friendly :)	You need to get the IOS image binary file

Another advantage of debugging in VM/isolated environment is that the IOS malware won’t be able to detect that it’s being debugged.

It’s also important to understand, the presenters explain, that Cisco clearly states that you, the admin, are responsible for verifying the IOS image (and you are responsible for maintaining the chain of trust). If you decide to use a backdoored IOS image, you got to live with the consequences (and you can’t blame cisco for that :) ).
So, before uploading an IOS image to a device, you should verify it in an isolated environment.
Cisco also states that, when the image runs, you could use the “verify” command to check the md5 file validation, but the reality is that the malware may have tweaked the verify routine, so don’t rely on this feature alone.
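One way to act on that advice is to compute the hash on a separate, trusted machine, so you don't depend on the verify routine of the image you are checking. Below is a minimal Python sketch of that idea. The function names are my own, and the reference hash is assumed to be the MD5 value published by the vendor for that image.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_matches(path, published_md5):
    """Compare the locally computed digest with the published reference value."""
    return md5_of_file(path) == published_md5.strip().lower()
```

A matching hash only tells you the file is the one the vendor published; it says nothing about what happens to the image after it has been loaded on the device.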

Obviously, the fuzzing component of bug hunting is not different than fuzzing other devices or applications. The key element is that you will have to be able to debug the crash conditions, get the context (registers, stack, etc), in order to determine exploitability.

In order to demonstrate the debugging capabilities, Sebastian and Alfredo used a manually backdoored IOS image, loaded it in Dynamips, activated the gdb server and connected to it using IDA Pro.

Complex malware will not be easy to analyze, they say. You have to become familiar with the processor instruction set, and there are no debugging symbols available. Other than that, you can use the IDA features to search for strings, locate function calls, find detours, etc.

Although IOS backdoors are still a bit uncommon (because they are hard to write and require a specific skillset, such as PowerPC knowledge), the presenters expect to see more IOS malware in the future.

When it comes down to rommon debugging (which is the cisco bootloader), we can still use Dynamips. The presenters refer to a talk given by FX (on Cisco IOS Attacks & Defense). Definitely worth while checking out !

Next, Sebastian and Alfredo move onto the fuzzing part of the talk.

Typically, to set up and operate a fuzzing environment, you need to build something that will
– properly deal with exception handling
– facilitate reproducible test-cases
– provide logging
– and, ideally, allow for fuzzing against an environment that is being debugged already (which makes post-crash analysis a lot easier)

The way they have set up their environment, looks pretty much like this :
Fuzzer will attack gdb’ed Dynamips instance (with some modifications to get proper stacktrace information)
Start fuzzer and attack image running inside Dynamips
Any exception inside Dynamips is caught by gdb, which will get regs and will write the data to a log.
Finally, the image is restarted (and the fuzzer is told to wait for a minute or two before continuing).
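To make that loop a bit more concrete, here is a rough Python sketch of the same cycle. It is not the presenters' harness: `send` and `restart` are hypothetical callables standing in for "deliver a packet to the image running in Dynamips" and "reboot the image", and a Python exception stands in for gdb catching a crash. The mutation is deliberately naive.

```python
import random


def mutate(seed, rng):
    """Overwrite one random byte of the seed (a deliberately naive mutation)."""
    data = bytearray(seed)
    pos = rng.randrange(len(data))
    data[pos] = rng.randrange(256)
    return bytes(data)


def fuzz(send, restart, seed, iterations, rng_seed=0):
    """Send mutated test cases and log every input that makes `send` raise.

    `send` and `restart` are hypothetical stand-ins for talking to, and
    rebooting, the emulated router.
    """
    rng = random.Random(rng_seed)  # fixed seed keeps test cases reproducible
    crashes = []
    for index in range(iterations):
        case = mutate(seed, rng)
        try:
            send(case)
        except Exception as exc:  # stands in for gdb catching an exception
            crashes.append({"index": index, "input": case, "error": repr(exc)})
            restart()  # bring the target back before continuing
    return crashes
```

Seeding the random generator covers the "reproducible test-cases" requirement: running the harness twice with the same `rng_seed` replays exactly the same inputs.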

As far as standard protocols are concerned, you *can* use existing protocol fuzzers (telnet, ftp (now removed from IOS), etc), so there might be no need to write new fuzzers for those protocols. Of course, that doesn’t stop you from writing your own if you want to.

Sebastian and Alfredo finish their really interesting presentation by providing a link to the modified gdb server code (which you can compile yourself) in order to be able to properly extract registers/trace information when a crash occurs : http://www.groundworkstech.com/projects/dynamips-gdb-mod

I really enjoyed this talk, it was well prepared, documented and presented. Overall thumbs up guys ! I am definitely going to play with Dynamips some time in the near future ! This was definitely one of the best talks I have seen at BlackHat so far.

Stuxnet Redux: Malware Attribution & Lessons Learned / Tom Parker

Although there’s not a lot of new things that can be said about Stuxnet, I still decided to attend this talk because I was curious about the “lessons learned”. At the start of his talk, Tom Parker explains that he has been working on techniques and methodologies for automatic malware analysis. He started the research a while ago, right when stuxnet hit the surface. Nevertheless, since almost everything about stuxnet has been said and done already, he emphasizes that this won’t be a stuxnet specific talk, but rather a more generic talk about the process of analysing malware in a way that might help discover who is behind a given specimen.

“If all else fails, it must be cyberwar”. Cyberwar seems to be the new buzzword, used to explain everything that has no clear origin or motive. It is hard to properly analyse/find the origin of threats, Tom continues. There is no clear method for analysing or describing the threats… and there is no clearly defined lexicon either. That way, it’s easy to attribute anything to “cyberwar”. A lot of assumptions are being made, while the term “war” itself is a clearly defined term. As Bruce Schneier explained in his keynote talk at the end of the first day at BlackHat, in a typical war, the military wears a uniform. In cyber war, uniforms are less obvious.

Even the term “APT” needs to be put in perspective. Stuxnet is advanced, it’s persistent and it’s a threat. But a cat with a sniper gun, that sits there for these, would fit that definition as well. Anyways, the main stuxnet-related question that is left unanswered probably is “who did it”.

In traditional forensics, science has the capabilities to properly find/detect/classify/compare evidence and build the case based on hard facts.
In the cyber domain, this type of capability is hard to find. At the same time, attribution is really important. And that is what this talk is about.

When looking at malware, from a forensic and attribution point of view, it’s important to be able to figure out who did it, so that person can be caught. At the same time, it’s not “as easy” as in traditional forensics cases.
So, what are some of the key elements when gathering information that might help finding that out ?

profile technical capabilities
insight into state sponsored programs
create linkage between actor groups

By differentiating between actors, Tom continues, it might be possible to determine if for example it was state sponsored or not.

So, what kind of info are we looking for ? Obvious facts : address, name, social networking page, website
But… we often don’t care about this because it doesn’t help in the mitigation process.

How: We use conventional analysis

static & runtime
memory forensics (tool “Memorize” – free for download, or Volatility)
vulnerability exploitation & payload analysis
command & control
post exploitation forensics

Doing all of that manually takes a lot of time. That is why Tom feels it’s important to have solid ways to perform some kind of automated analysis as well.

Automated Analysis today :

Antivirus
- Known signatures
- Virus-Like characteristics
- Behavioural based AV is getting better, but it usually only reports 0 or 1 (bad or not bad), without looking at the grey areas
Sandboxing / Runtime Analysis, find out what the code does

The goals of the analysis should include :

what happened ?
how did they get in ?
what did they exploit ?
what was done once they get access ?
are they still there ?
how can this be prevented ?

(Current toolset / checklist usually does not contain “who did it”)

Tom explains a methodology to gather key attack meta data
– sources
– other relevant packet data
– tools & their origin
and
– planning of the attack
– execution of the attack
etc

(looking for email addresses, etc might be helpful to properly attribute)

Gathering all that info will help giving some insight about the origin of the attack. Was it public or private ? The quality of the code might often help you determine this.
Then, set up a scoring matrix to allow people to look at an attack & derive different scores to an attack (similar to what an IDS/IPS does). It will show the exploit itself, but also tries to identify the required skillset to pull off the attack.

Although toolsets / methodologies to properly respond exist, not a lot of people are actually using them.
A lot more methodology needs to be put in place to increase the value of the results that are derived from the response process.

What follows are some things researchers should be aware of, or should look at, because they are key to the attribution question :

Exploits are often reworked for malware purposes :
improved reliability (no crash, no suspicion, try to avoid leaving data behind, rework so an IPS wouldn’t catch it). This might mean malware writers have access to commercial grade IPS devices to QA their work.
specific host type/os level targeting
possible to automate correlation with a knowledge base of public exploits (offsets, heap spray size, pointers, protocol, etc etc)

Looking at the exploit reliability might reveal a lot of info as well :

crashes & loose lips sink ships
improved performance
advanced/improved shellcode
re-patching memory
repairing corrupted heaps
less overhead
no large heap sprays
less excessive CPU overhead
continue target process execution
(most exploits in stuxnet were specifically chosen to be reliable across OS versions, avoid crashes, etc)
and if exploit still fails, failure may be silent
How/what gets cleaned up :
log files
event log
core dumps

Documenting the reconnaissance may be important too !

scripted attack (running apache exploit against IIS box ?)
etc

Exploit selection : what do malware writers use ?

lots of attention to 0days
1+Day != Low End Adversary
old attacks often reworked ! (not everybody patches boxes – think Aviation, manufacturing, etc)
bypass IDS/IPS
improved payloads demonstrate capability
(old attacks are still effective in targeted attacks, when attacker has some inside information about applications and patch levels)

Another way to look at malware is to compare functions with other malware, find code/functions that might have been re-used, which might allow researchers to link people together.
If you can automate this (VxClass), you can start correlating malware with a database of other malware samples and see if it’s just a mutated form, derived from another specimen, etc.
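To illustrate the basic idea of correlating samples by shared functions, here is a very simplified Python sketch. It is not how VxClass works internally; it just hashes each function body and measures the overlap between two samples with a Jaccard index. The assumption that functions have already been extracted as byte strings (by a disassembler, for example) is mine.

```python
import hashlib


def function_fingerprints(functions):
    """Hash each function body (bytes) so samples can be compared as sets."""
    return {hashlib.sha256(body).hexdigest() for body in functions}


def jaccard(sample_a, sample_b):
    """Share of distinct functions that two samples have in common (0.0 to 1.0)."""
    a = function_fingerprints(sample_a)
    b = function_fingerprints(sample_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

Exact hashes break as soon as a single byte changes, so anything serious would need a fuzzier comparison. Still, even this crude score is enough to flag a sample that is mostly a repackaged version of something already in the database.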

It’s important though, to understand that most malware nowadays is based off “kits”, which (in most cases) doesn’t tell us much about the authors. You can link a malware to the kit, but that doesn’t tell us anything useful.

What else should we look for ? Quality of the code
nested statements (a bunch of conditional statements for example). Compilers might structure things in a different way due to optimizations etc, so this might be an indication of manual work vs a normal app. This technique could produce some false alerts, because a lot depends on what the malware author did
unclosed file handles
memory leaks
unused variables
function redundancy
debug strings present

debug symbols can indicate developer knowledge. Tool markings can make associations with compilers.
PDB locations may also disclose usernames, operating systems, library versions, etc. It would not be the first time, Tom says, that a malware author has left his public handle in a debug file and a simple google query would reveal his real identity.

So, automating a lot of the analysis is really vital for scaling purposes.
There is too much badness, but not enough analysts.
Manual time is better spent on edge cases
Automating analysis might also help identify the “needle in haystack” cases, where for example 0day bugs were used.

Tom introduces BlackAxon, a PoC tool that will score code
uses int3 debugger breakpoints (yes, malware can detect it)
The code uses known patterns of API calls that might indicate malicious behaviour

url downloadtofile – read & xor – drop & run
create process in suspended state – virtualalloc – writeprocessmemory
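Since the tool wasn't demonstrated, here is a small Python sketch of what matching "known patterns of API calls" could look like. To be clear, this is my own illustration and not BlackAxon code. The pattern names and the exact API names in each list are assumptions based on the two sequences listed above, and the trace is assumed to be an ordered list of API names recorded by a debugger.

```python
# Illustrative patterns only: names and call lists are assumptions.
PATTERNS = {
    "download, decode, drop and run": [
        "URLDownloadToFile", "ReadFile", "WriteFile", "CreateProcess",
    ],
    "process injection": [
        "CreateProcess", "VirtualAlloc", "WriteProcessMemory",
    ],
}


def contains_in_order(trace, pattern):
    """True if every call in `pattern` occurs in `trace`, in the same order."""
    calls = iter(trace)
    # `step in calls` consumes the iterator, so each search resumes after
    # the previous hit; unrelated calls in between are allowed.
    return all(step in calls for step in pattern)


def matched_patterns(trace, patterns=PATTERNS):
    """Return the sorted names of all patterns found in the trace."""
    return sorted(
        name for name, pattern in patterns.items()
        if contains_in_order(trace, pattern)
    )
```

A scoring tool could then assign a weight to each matched pattern and add them up, which would be one way to get past the plain "bad or not bad" verdict.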

Future development :

detours hook
kernel hooks

Too bad Tom didn’t actually demonstrate the tool.

With regards to stuxnet, there is still a lot of speculation about its origins and possible targeting.
Some great analysis has been done by Symantec (probably the most comprehensive whitepaper), the Langner Communications blog, DHS ICS-CERT and ISIS.

Applying the checklists above to the analysis, Tom states that one of the most interesting things discovered in stuxnet is the fact that it had a weak C&C mechanism (only 2 domains, http traffic, no mitm cert issue handling). This might be an indication of crime-ware

Amount of collateral damage, some weak points in the code, etc might be another indication that this was not written by a state.
Building a profile based on what we know, would result in something like this :

small(er), technically astute nation state
basic IO capabilities
full time staff of operators
reliant on external assistance
compartmented approach to operations
good human intelligence capabilities to find out info about the targets
access to centrifuges/frequency converters

So – Who did it ?

China (J-Micron & Realtek Taiwan) ? Espionage vs Siemens (to disrupt deal with Rosatom, Suspect : Areva ) ? Maybe it was Greenpeace Tom says (Disrupt NPP / Enrichment activities)

Ok, seriously now : Based on the intelligence gathered so far, there are some other theories :
Broken arrow theory
modules written to be generic (driver, signed rootkit, etc)
but targeted attack ?
-> discrepancy

Joint Effort
Private or Public contract
end user c&c + repackaging
…

Tom finishes his talk by providing a small list of countermeasures and a quick summary on Stuxnet. Countermeasures include :

disable guest accounts
no usb devices
disable unrequired services on PLC dev systems
host based fw
change default pw in Siemens SQL db (although that might be illegal in certain countries)

Could stuxnet have been worse ? YES

better C&C
greater propagation discipline
possible supply chain influence
other improvements

=> stuxnet is a wake up call !

A lot of things are still unconfirmed (un-confirmable ?) The extent of its success is unknown and we may have only seen the tip of the iceberg.

The overall take-away is that control systems are vulnerable, and investments are being made to attack them.

Among the blind, the squinter rules : Security visualization in the field / Wim Remes

Being able to extract information from data is often a difficult challenge.

Even if you succeed, you need to have a good way to represent the data, visualize it in a way it’s well understood.

Simply listing the outcome of a grep on a log file may work for some people, but it’s clear that management / colleagues will be able to interpret the outcome far better if the information is displayed in a visual way, Wim explains.

There are 2 key elements in this challenge.

You will need a good tool, carefully picked for the purpose / information / audience combination, and you need to apply good visualization principles.

Of course, this does not only apply to information related with security or log files, but to any data set that needs to be interpreted and represented so management can draw conclusions and take a decision if necessary.

A lot of people still use MS Excel in their presentations, Wim continues, but there might be far better (and open-source) tools that are fit for the job as well. Wim lists a number of tools that might be a better alternative, including :

Tableau
Gapminder
Davix (a live cd from secviz.org)

When it comes down to security products, Wim shows that some vendors actually do a good job representing logs/other info, and others don’t. Sometimes you will need to extract the data from their products and use that to correlate & include into a new report, and in some cases a simple data export doesn’t appear to work well.

In order to demonstrate why it is important to pick the right tool and apply good visualization, Wim shows a list of examples that fail to pass on the message. This was very funny and this talk is actually the only one that made me laugh a couple of times.

Wim says that you also need to please your audience. Management often needs historical, comparative info that supports the decision making process & business objectives. Information should be clear, concise and actionable. If you have to visualize something for a technical audience, information often needs to be (near) real time and can be more complex, as long as it facilitates the job. But in the end it still needs to be actionable, which makes a lot of sense.

A good amount of research has been done by a number of people (non IT related), in the area of data visualization. Edward Tufte (the Zen Master of data visualization) says that “data can be beautiful, data should be beautiful”.

This research results in a great list of tips & tricks :

People often want dashboards. A lot of vendors fail to present a clear and concise dashboard, so Wim explains how to improve dashboards with some really good examples. Stephen Few is a dashboard design guru who has done a lot of research in that area.

Sparklines (a.k.a. datawords) might help emphasizing changes and can even be put inline (in a text).

Infographs are often useful, but (similar to dashboards) can often be simplified a lot.

Using text and a font size (+ optional color) might already be enough to show a distribution of a given fact.

Use pie charts wisely, Wim says. Most of the time, pie charts contain too much information, are not based on sorted data, etc.

Using quality and relevant data from external sources might be helpful as well to put things in perspective. Wim mentions that it is important though to check the value of the external source (is the report influenced by marketing ? Is the report influenced by a given vendor ?). Context creates clarity.
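As an aside on the sparklines tip: you don't even need a charting library to try them out. The Python sketch below is my own quick illustration (not something shown in the talk). It maps a series of numbers onto Unicode block characters, so the result can be dropped inline in a line of text.

```python
BARS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Render a sequence of numbers as a one-line text sparkline."""
    values = list(values)
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        # A flat series has no variation to show; draw the lowest bar.
        return BARS[0] * len(values)
    top = len(BARS) - 1
    return "".join(BARS[int((v - low) * top // span)] for v in values)
```

For example, `"failed logins this week: " + sparkline([3, 4, 2, 30, 5, 3, 4])` makes the spike on day four visible without a single axis or legend.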

Then, Wim shows a couple of examples on how to make things better. One of the key elements is to reduce the “non-data ink”, he says. Removing grid lines, legend, even colors will often improve the graph and make it easier to interpret.

After all, it is important to be able to understand a graph in a blink of the eye.

Wim finishes his talk by showing a couple of demo’s of tools that might help you visualize things in a better way.

Davix – gltail (ruby, real time, logs) : http://www.fudgie.org
Davix – afterglow
Chart director (perl)
Google charts API (which appears to be very powerful and easy to use. Just paste in an array with your data, set the columns, done)
jquery libraries (http://omnipotent.net/jquery.sparkline/ and http://www.jqplot.com )

Unfortunately, Wim’s talk was scheduled at the same time as FX’s talk on writing custom disassemblers. That’s a pity, because I really enjoyed it and it definitely deserved a bigger audience. It was a really practical talk, with good examples, and well prepared. Well done Wim !

Building floodgates : Cutting-Edge Denial of Service mitigation / Yuri Gushin & Alex Behar

In the fourth talk of the day, Yuri and Alex (Radware employees and founders of the ECL Labs technology thinktank) share their experiences and research they have done around DoS attacks.

They explain that the goal of a DoS attack is to exhaust the target resources up to a point where service is interrupted.

Motives range from hacktivism over extortion to rivalry. Fact is, most attacks succeed because current controls and mechanisms are inadequate. So, if you derive money from being online, you are a target. If you are a cloud user (where you pay per bandwidth usage), you may suffer from this too.

There are a few types of attack, Yuri and Alex explain :

Layer3 muscle attacks :

Flood of TCP/UDP/ICMP/IGMP packets, overloading the infrastructure with a packet rate so high that packet queues fill up, pipes saturate and packets are discarded.
Introduce a packet workload most gear isn’t designed for.
UDP Flood to a non listening port. Just high packet rate might introduce saturation (on any device along the path)
Fail open or fail close based solutions won’t work, because we don’t want either of those 2 scenarios.

Layer4 : slightly more sophisticated

DoS attacks consuming extra memory, CPU cycles, and triggering responses
TCP SYN Flood. If Syn queue is full, new connections are dropped.
TCP new connections flood (check fw/IPS/router/… specsheet)
TCP concurrent connections exhaustion (check fw/IPS/router/… specsheet)
TCP/UDP garbage data flood to listening services

Layer 7 : culmination of evil

abusing application-server memory and performance limitations, masquerading as legitimate transactions (for example, use search functions to increase load on CPU, disk I/O, etc)
HTTP page flood
HTTP bandwidth consumption
DNS query flood
SIP invite flood
Low rate, high impact attacks (Slowloris, HTTP POST DoS)

Looking at DoS protection mechanisms, the presenters distinguish a few operation modes.

There’s static protection, which is mainly based on pre-defined thresholds. The user is in control (which is good and bad, because the user will need to take control, even if it’s 3am). It requires a lot of tuning, which decreases accuracy and increases operational expenses. When setting up a static protection mechanism, a detection phase is required.

Adaptive systems will learn and adjust thresholds dynamically, based on real traffic characteristics vs DoS attack traffic. This results in improved accuracy, less tuning and the behaviour can be learned. Of course, this technique is more expensive.

Before a system can mitigate a DoS, it needs to be able to detect it. The detection relies on data from operational mode, they say, which can be rate based (single dimension) or behavioral based (multi-dimensional).

Rate based detection is often based on metrics such as nr of SYNs / sec, HTTP requests / sec, requests per source/second. This is obviously prone to false positives (where legitimate traffic is also identified as attack). If a news site, for example, all of a sudden attracts a lot of new visitors (because of breaking news for example), things can go wrong.
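A rate based detector really is as simple as it sounds. The Python sketch below is my own minimal version, with a sliding time window and a hypothetical threshold. It counts events (SYNs, HTTP requests, ...) in the last window and raises a flag as soon as the count exceeds the threshold.

```python
from collections import deque


class RateDetector:
    """Single-dimension detector: too many events inside a sliding window."""

    def __init__(self, threshold, window=1.0):
        self.threshold = threshold  # max events tolerated per window
        self.window = window        # window length in seconds
        self.events = deque()

    def observe(self, timestamp):
        """Record one event; return True if the rate now exceeds the threshold."""
        self.events.append(timestamp)
        # Forget events that have fallen out of the window.
        while self.events and self.events[0] <= timestamp - self.window:
            self.events.popleft()
        return len(self.events) > self.threshold
```

Note that the detector has no idea why the rate went up, which is exactly why the breaking news scenario triggers it just as well as a real flood.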

Behavioral based detection will actually correlate certain parameters/dimensions. It will still look at packets per second, rate of packets, throughput, and so forth, but it will also look at TCP Flag distribution %, HTTP content %, L4 Protocol % and so on. Combine all of that, and you will be able to detect a DoS in a more accurate way.
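To give an idea of what one such extra dimension could look like, here is a Python sketch that compares the observed TCP flag distribution against a learned baseline. The distance measure (total variation distance) and the idea of using it as an anomaly score are my own assumptions; the presenters did not go into the math.

```python
from collections import Counter


def distribution(flags):
    """Turn a list of observed TCP flags into relative frequencies."""
    counts = Counter(flags)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {flag: n / total for flag, n in counts.items()}


def deviation(baseline, observed):
    """Total variation distance: 0.0 means identical, 1.0 means disjoint."""
    keys = set(baseline) | set(observed)
    return 0.5 * sum(
        abs(baseline.get(k, 0.0) - observed.get(k, 0.0)) for k in keys
    )
```

During a breaking news peak the packet rate goes up but the mix of SYN/ACK/FIN packets stays roughly the same, so the deviation stays low. During a SYN flood the mix shifts heavily towards SYN and the deviation jumps, even if the rate threshold was never reached.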

In terms of mitigation, it boils down to this : Analyze & Get rid of it, the presenters state.

The analysis is based on real time signatures of the ongoing DoS attack, using the highest anomaly values from L3-L7 headers. Most systems rely on manual intervention once a DoS has been detected (so basically you can set filters / ACLs, …), which is a bit painful (again – think… 3am). Some flooders will have a static signature, which is good for protection… It will help creating the filters, but you still need to take care of it yourself.

Of course, there are better ways to do this. Passive mitigation will rate-limit packets according to the threshold (no analysis needed). All traffic above the threshold will be dropped. Of course, traffic up to the threshold will still be allowed, so if the threshold is not wisely chosen, there might still be a big performance impact. A second passive mitigation technique is to drop packets based on real time signatures derived from the analysis phase.
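The first passive technique (rate-limit and drop everything above the threshold) is commonly implemented as a token bucket. The Python sketch below is a generic textbook version, not the implementation of any particular product; `rate` and `burst` are the knobs you would have to choose wisely.

```python
class TokenBucket:
    """Allow up to `rate` packets per second, with bursts of up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate            # tokens added per second
        self.burst = burst          # maximum number of stored tokens
        self.tokens = float(burst)
        self.last = None

    def allow(self, now):
        """Return True if a packet arriving at time `now` may pass."""
        if self.last is not None:
            refill = (now - self.last) * self.rate
            self.tokens = min(self.burst, self.tokens + refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # above the threshold: drop
```

The limiter is blind to who sent the packet: once the bucket is empty, legitimate traffic is dropped along with attack traffic, which is the weakness of this approach.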

Active mitigation may be what we really need. One way of doing this is by issuing some kind of challenge/response that will identify if the other side is a genuine browser or not. You can use some javascript & see if it got processed (by setting a cookie for example). Note that some web vulnerability scanners can actually deal with this.

Session disruption, the second active mitigation technique, is effective with stateful attacks. It will drop malicious packets and in the meantime it resets the session with the target server. That server can clean up the session, while the TCP/IP stack of the attacker eventually might get exhausted by forcing retransmits.

Finally, tarpitting (stalling the malicious tcp session of the attacker) might frustrate the attacker up to a point where he decides the DoS attack does not have any value anymore. When using tarpitting, the DoS protection system will lower the TCP window size (down to 0), so the attacker will be left with tcp sessions that can’t actually be used to transmit payload.

From a mitigation performance point of view, Yuri and Alex state that DoS cannot be fixed on the internet today, as it’s likely some device along the path may not be able to sustain the load. Furthermore, most of the current x86 hardware deals poorly with bigger workloads. Maintaining connection states for the good guys is a must while blocking the bad guys which is even more performance intensive. It’s probably fair to state that resilient mitigation of high-rate attacks is currently only possible with ASIC-based architectures.
That means that you have to use dedicated devices that are optimized for the job / amount of traffic / amount of sessions / ….

The presenters then introduce Roboo, which is a DoS mitigation tool they’ve developed. Roboo is an opensource HTTP Robot mitigator, which uses an advanced non-interactive HTTP challenge/response mechanism to detect & mitigate HTTP robots. It should provide benefits similar to what a captcha offers against spam bots, but without any end-user burden.

In essence, Roboo weeds out a larger percentage of HTTP robots which do not use real browsers :

HTTP Denial of service tools (LOIC)
Vulnerability scanners (Acunetix, Metasploit Pro, Nessus)
Web exploits
Spam bots
Spiders / crawlers

Roboo will respond to each GET / POST request from an unverified source with a challenge. This challenge can be javascript or flash based, and optionally gzip compressed. A real browser with full http/html/javascript and flash player will re-issue the original request after setting a special http cookie that marks the host as “verified”. (verification cookie)

Of course, as various people in the audience point out, the system has some drawbacks too. Browsers without flash, javascript or cookie support won’t be able to get verified. In a future version of the tool, this issue will be solved by introducing a threshold. As long as the threshold is not met, no verification takes place. But as soon as traffic rises, valid sessions may get dropped as well.

Roboo uses a positive security model. If you want to allow a robot, you have to whitelist it.

Yuri and Alex explain that the verification cookie is calculated as follows :

SHA1(client_ip, timebased_rand, secret) – 160 bits

The secret is created every time you start roboo – 512 bit. The timebased_rand value changes every X seconds (cookie validity window)
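The cookie calculation above could look roughly like this in Python. Roboo itself is a perl module, so this is only a sketch of the scheme and not its actual code. The "|" separator, the way the time based value is derived from the clock (a simple time bucket number instead of a real random value) and the 30 second validity window are all assumptions of mine.

```python
import hashlib
import hmac
import os

SECRET = os.urandom(64)  # 512-bit secret, regenerated on every start


def time_bucket(now, validity=30):
    """Value that changes every `validity` seconds (stand-in for timebased_rand)."""
    return int(now // validity)


def verification_cookie(client_ip, now, secret=SECRET, validity=30):
    """SHA1 over client IP, time based value and secret: 160 bits, 40 hex chars."""
    material = b"|".join([
        client_ip.encode(),
        str(time_bucket(now, validity)).encode(),
        secret,
    ])
    return hashlib.sha1(material).hexdigest()


def is_verified(cookie, client_ip, now, secret=SECRET, validity=30):
    """Recompute the expected cookie and compare it in constant time."""
    expected = verification_cookie(client_ip, now, secret, validity)
    return hmac.compare_digest(cookie, expected)
```

Because the client IP is part of the hash, a cookie harvested by one real browser can't simply be replayed by bots on other addresses, and because of the time based value it expires on its own without the server having to keep any state.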

Roboo currently integrates with nginx web server and reverse proxy as an embedded perl module. You can download the module from http://www.ecl-labs.org

The presenters finish the talk by showing a demo of how Roboo would be effective against a LOIC based attack, and providing an overall summary :

DoS business is booming
cloud subscribers become targets
anti-DoS techniques have greatly evolved : goodbye rate limits, hello adaptive/behavioral/… system.

You are doing it wrong – Failures in Virtualization systems / Claudio Criscione

Claudio starts his presentation by explaining that we all want a virtualization infrastructure that is secure, reliable, manageable and scalable.

Most (if not all) commercial virtualization platforms broke down their solution into smaller parts, allowing us to build manageable, reliable and scalable solutions. And yes, it will be secure right ?

In fact, when designing a virtualization infrastructure, we have to realize it will end up being the most important infrastructure in the environment.

What we don’t want is

large attack surface (but we have tons of different web services)
huge OS footprint (what about windows ?)
legacy stuff (such as embedded web servers)
managed by the same IT guys

So it looks like we are doing it wrong.

About a year ago, Claudio downloaded all available commercial virtualization platforms and performed some basic security audits on their components. In 5 man-days, he found no less than 18 0day vulnerabilities (some of them just being “lame” XSS vulnerabilities, he says). As part of that audit project, VASTO (Virtualization Assessment Toolkit), a set of metasploit modules, was born.

So it looks like security is not as good as we think it is. In fact, the driving forces behind virtualization are

it has to be as easy as possible
it needs to use as little additional infrastructure as possible
it needs to be as powerful as possible
it needs to be as flexible as possible

In short : money saving is the key driver.

Of course, management will ask “hey, it’s going to be secure, right ?” and the IT guys will say “yeah, the hypervisor is secure, it does CPU isolation, VM segregation etc.”

So people are worried, but it’s not a driving factor.

The reality is that any virtualization solution is going to be complex. Because it’s a fundamental layer in the infrastructure, it needs to be able to tie into a lot of stuff: AD integration, kerberos, ldap, soap, jboss, flex, ssl, stunnel, mysql, ajax, tomcat, etc.

Security is not keeping up with these requirements. Functionality often still does not take security into account. You can keep the hypervisor small and secure, but you can still get owned via vCenter for example, Claudio explains. Everyone is securing their VM’s… but that’s not enough.

If you ask me, Claudio states, even a “lame XSS” bug is dangerous in your core enterprise infrastructure layer, and moves on by demonstrating the impact of 3 different vulnerabilities.

1. XSS

Claudio wrote a metasploit module, part of vasto, that will exploit the “lame” XSS bug. The module will inject administrative commands into vCenter. All you need to do is trick an admin into visiting your fake webserver. As soon as the admin enters his vCenter credentials, you take control and insert administrative commands.

2. Log file

During an audit, he gained regular user access to a vCenter server. Not being able to attack the underlying vmware infrastructure directly, he started browsing various folders on the file system, and discovered that he could read the vmware vCenter log/debug files. One of those files (vpxd-profile) exposes SoapSessionIDs. On top of that, every 5 minutes, a hidden user “virtualcenter” logs in to vCenter and writes an entry to the log (so you don’t have to wait for an admin to log on). Using that SoapSessionID, he managed to log on to vCenter with administrative privileges, without using a valid username and password. He wrote a metasploit module to demonstrate it. Basically, the module will accept a vCenter client connection on a given port, and use the SoapSessionID to authenticate to VMWare. You can use any username/password, the SessionID is all you need.

This bug has been fixed in the last version of VMWare, he mentions.

3. Shell escape (BONSAI-2010-0109)

This particular vulnerability will allow you to escape from a command and execute another command. This vulnerability targets Oracle VM 2.1.5 – “Unbreakable”.

You need to have VM administrator rights… and it will turn that into remote root access. So if you have delegated VM admin access to “the VM guys”, then they might own the underlying infrastructure as well. Bye bye accountability.

The vasto module demonstrates that it’s trivial to exploit this, and run any command you want, as root.
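The details of the Oracle VM bug were not shown in enough depth to reproduce here, but the general bug class (escaping from one command into another) is easy to illustrate. The Python sketch below is a generic, hypothetical example and not the actual vulnerable code: `run_unsafe` and `run_safe` are made-up names, and a harmless `echo` on a POSIX system stands in for whatever management command the real product runs.

```python
import subprocess


def run_unsafe(vm_name):
    # Vulnerable pattern: user input is pasted into a shell command line,
    # so shell metacharacters such as ';' start a second command.
    cmd = "echo status of " + vm_name
    return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout


def run_safe(vm_name):
    # Safer pattern: no shell involved, the input stays a single argument.
    args = ["echo", "status of", vm_name]
    return subprocess.run(args, capture_output=True, text=True).stdout


if __name__ == "__main__":
    payload = "vm1; echo INJECTED"
    print(run_unsafe(payload))  # the part after ';' runs as its own command
    print(run_safe(payload))    # the whole payload is printed as plain text
```

If the process that builds the command line runs as root, the injected command runs as root too, which is how VM administrator rights can turn into control over the host.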

In the second part of his talk, he explains how to fix it. Obviously patching bugs is not enough. On top of that, admins/users don’t even feel like it’s broken or perceive their VM infrastructure as broken. In fact, they might even be hesitant to patch because it is often the most critical component in their infrastructure, and if the patch breaks something, it breaks everything.

So, in order to fix things, Claudio feels we need to harden the virtualization infrastructure, by inserting a new layer. He introduces the concept of vCells, atomic management units, which define a service or group of services, technologically uniform and logically self contained.

Next, a second component is added to the hardening concept : the vGatekeeper, and that component should

make sure that if something gets compromised, then you don’t lose every single component in the infrastructure
be agnostic
be central to the infrastructure, using an enforced path : you need to go through the vGatekeeper to access the back-end VM management interfaces
be able to enforce rules.

Part 3 of his talk explains how to implement the fix. If we look at the current implementations, most of the VM management interfaces run on top of web services, soap calls, API’s, etc. So if you secure those, you rule out most of the issues (except for the proprietary stuff).

Claudio wrote a Proof of Concept application, called VASTOKEEPER, to accomplish this.

It is based on mod_security/apache and will act as enforced proxy between the vSphere clients and the back-end Management interfaces. (You, as admin, will need to make sure the access path is enforced, of course).

In his demo, he built the following topology :

user – vSphere client – vCenter server – VASTOKEEPER – ESXi

After launching the Vastokeeper machine (ubuntu), he uses a basic rules generator website which will produce the necessary iptables forwarding rules, and the mod_security rules as well. Copy/paste those into their respective config files and you should be all set.

He then demonstrates that, even as an admin, you are no longer able to shut down a VM, for example. (Of course, rules are customizable, it all comes down to catching the Soap calls and blocking the ones you want to block.)
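The idea of catching Soap calls and blocking selected ones can be sketched in a few lines of Python. This is only an illustration of the concept: the real proof of concept relies on mod_security rules in apache, not on code like this. The function names, the blocked-operation list and the simplified request below are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Example policy: operations the gatekeeper should refuse to forward.
BLOCKED_OPERATIONS = {"PowerOffVM_Task", "ShutdownGuest"}


def soap_operation(xml_text):
    # The operation is the first child element of the SOAP Body.
    root = ET.fromstring(xml_text)
    body = root.find(f"{{{SOAP_ENV}}}Body")
    if body is None or len(body) == 0:
        return None
    # Strip the "{namespace}" prefix that ElementTree puts in front of tags.
    return body[0].tag.split("}", 1)[-1]


def is_allowed(xml_text):
    # Fail closed: anything that cannot be parsed is rejected.
    try:
        operation = soap_operation(xml_text)
    except ET.ParseError:
        return False
    return operation is not None and operation not in BLOCKED_OPERATIONS


# Simplified, hypothetical request used only to exercise the filter.
POWER_OFF = (
    '<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">'
    '<soapenv:Body><PowerOffVM_Task xmlns="urn:vim25">'
    '<_this type="VirtualMachine">vm-42</_this>'
    "</PowerOffVM_Task></soapenv:Body></soapenv:Envelope>"
)

if __name__ == "__main__":
    print(soap_operation(POWER_OFF), is_allowed(POWER_OFF))
```

Because the check sits on the enforced path, it applies to every client, including a legitimate administrator or an attacker who has already taken over the management server.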

In essence, ESXi is now safely behind a firewall and you need to go through the vasto vgatekeeper to access it.
Attack surface got smaller, rules can be enforced, even if the infrastructure has been compromised.

This presentation had a good build-up, it was practical, realistic and fun.

I hope the vGatekeeper will evolve and turn into an open-source community project we can all take advantage of.

That’s all folks

Well, to be honest, I also attended the Monoculture talk, by Gaus (Cisco PSIRT), but I disagreed with most of his reasoning. He basically wanted to prove that buying equipment from multiple vendors to make your environment more secure doesn’t make any sense… I expected more from this talk.

Maybe because it was the end of the day, and I don’t want to disrespect the work he has put into his research… so I’ll finish here.

BlackHat Europe 2011 was fun, I met a lot of nice people. Some people told me they liked the concept of the workshops. Some workshop presenters told me they would have liked more cooperation… so maybe that model still needs a tweak or two.

Quick message to the BlackHat folks : please consider providing some power outlets for the front few rows. I’m sure a lot of people will be very thankful for that !

You can find a nice wrap-up on the second day of BlackHat written by Xavier Mertens here : http://blog.rootshell.be/2011/03/19/blackhateu-day-2-wrap-up/

Waiting in the airport, starting to write this writeup, together with @xme
