Towards efficient security code audits

Conducting a code review is often a daunting task, especially when the goal is to find security flaws. Flaws can be, and usually are, hidden in all parts and levels of the application: from the lowest-level coding errors, through unsafe coding constructs and misuse of APIs, to the overall architecture of the application. The size and quality of the codebase, the quality of the (hopefully) existing documentation and time restrictions are the main complications of the review. It is therefore useful to have a plan beforehand: know what to look for, how to find the flaws efficiently and how to prioritize.

Code review should start by collecting and reviewing existing documentation about the application. The goal is to get a decent overall picture of the application: what the expected functionality is, what requirements can be expected from the security standpoint, and where the trust boundaries are. Not all flaws with security implications are relevant in all contexts. For example, an effective denial of service against a server certainly has security implications, whereas a coding error in a command-line application which causes excessive CPU load will probably have low impact. At the end of this phase it should be clear what the security requirements are and which flaws could have the highest impact.

Armed with this knowledge, the next step is to define the scope of the audit. A thorough review almost always requires far more resources than are available, so defining which parts will be audited and which vulnerabilities will be searched for increases the efficiency of the audit. It is, however, necessary to state all the assumptions explicitly in the report, which makes it possible for others to review them or revisit them in future audits.

In general there are two approaches to conducting a code review. For lack of better terminology we shall call them bottom up and top down. Of course, real audits always combine techniques from both, so this classification is merely useful when we want to put them in context.

The top down approach starts with the overall picture of the application and its security requirements and drills down towards lower levels of abstraction. We often start by identifying the components of the application and their relationships, and by mapping the flow of data. Drilling further down, we can choose to inspect potentially sensitive interfaces which components provide, how data is handled at rest and in motion, how access to sensitive parts of the application is restricted, and so on. From this point the audit quickly becomes very targeted: since we have a good picture of which components, interfaces and channels might be vulnerable to which classes of attacks, we can focus our search and ignore the other parts. Sometimes this will bring us down to the level of line-by-line code inspection, but this is fine. It usually means that, architecturally, some part of the security of the application depends on the correctness of the code in question.

The top down approach is invaluable, as it makes it possible to find flaws in the overall architecture that would otherwise go unnoticed. However, it is also very demanding: it requires broad knowledge of all classes of weaknesses and threat models, and the ability to switch between abstraction levels quickly. The cost of such an audit can be reduced by reviewing the application very early in the design phase. Unfortunately, most of the time this is not possible because of the development model chosen or the phase in which the audit was requested. Another way to reduce the effort is to invest in documentation and reuse it in future audits.

In the bottom up approach we usually look for indications of vulnerabilities in the code itself and investigate whether they can lead to exploitation. These indications range from outright dangerous code, misuse of APIs, dangerous coding constructs and bad practices to poor code quality; all of these may indicate the presence of a weakness in the code. The search is usually automated, as there is an abundance of tools to simplify this task, including static analyzers, code quality metric tools and the most versatile one: grep. All of these reduce the cost of finding potentially weak spots, so the remaining cost lies in separating the wheat from the chaff. The bane of this approach is the receiver operating characteristic curve: it is difficult to improve substantially, so we are usually left with tradeoffs between false positives and false negatives.
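To make the grep-style search concrete, here is a minimal sketch in Python. The list of risky C functions and the sample input are illustrative assumptions, not a vetted checklist; a real audit would tailor the patterns to the codebase.

```python
import re

# Hypothetical starter list of C calls that often deserve a closer look.
RISKY_CALLS = ("strcpy", "strcat", "sprintf", "gets", "system")
PATTERN = re.compile(r"\b(" + "|".join(RISKY_CALLS) + r")\s*\(")


def find_candidates(source):
    """Return (line_number, function_name, line_text) for every match."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            hits.append((number, match.group(1), line.strip()))
    return hits


# Made-up input: one real finding and one match inside a comment.
SAMPLE = """\
char buf[16];
strcpy(buf, argv[1]);
snprintf(buf, sizeof buf, "%s", name);
/* gets() is never used here */
"""

for number, name, text in find_candidates(SAMPLE):
    print(f"line {number}: {name}: {text}")
```

The match inside the comment on line 4 is a false positive, which is exactly the wheat-from-chaff cost described above: the pattern is cheap to run, but every hit still needs a human decision.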

The advantages of the bottom up approach are its relatively low resource requirements and its reusability. This means it is often easy and desirable to run such analyses as early and as often as possible. It also depends much less on the skill of the reviewer, since the patterns can be collected into a knowledge base, aided by freely available resources on the internet. It is a good idea to create checklists to make sure all common types of weaknesses are audited for and to make this kind of review more scalable. On the other hand, the biggest disadvantage is that certain classes of weaknesses can never be found with this approach. These usually include architectural flaws, which lead to the vulnerabilities with the biggest impact.

The last step in any audit is writing a report. Even though this is usually perceived as the least productive time spent, it is an important one. A good report enables other interested parties to further scrutinize weak points, provides the information necessary to make potentially hard decisions, and is a good way to share and reuse knowledge that might otherwise stay private.

It’s all a question of time – AES timing attacks on OpenSSL

This blog post is co-authored with Andy Polyakov from the OpenSSL core team.

Advanced Encryption Standard (AES) is the most widely used symmetric block cipher today. Its use is mandatory in several US government and industry applications. Among the commercial standards, AES is a part of SSL/TLS, IPSec, 802.11i, SSH and numerous other security products used throughout the world.

Ever since the inclusion of AES as a federal standard via FIPS PUB 197, and even before that when it was known as Rijndael, there have been several attempts to cryptanalyze it. However, most of these attacks have not gone beyond the academic papers they were written in. One worth mentioning at this point is the key recovery attack on AES-192/AES-256. A second angle is attacks on AES implementations via side channels. A side-channel attack exploits information which is leaked through physical channels such as power consumption, noise or timing behaviour. In order to observe such behaviour, the attacker usually needs to have some kind of direct or semi-direct control over the implementation.

There has been some interest in side-channel attacks on the way OpenSSL implements AES. I suppose OpenSSL is chosen mainly because it’s the most popular cross-platform cryptographic library used on the internet. Most Linux/Unix web servers use it, along with tons of closed source products on all platforms. The earliest attack dates back to 2005, and the recent ones are cross-VM cache-timing attacks on the OpenSSL AES implementation, described here and here. These are more alarming, mainly because with applications and data moving into the cloud, recovering AES keys from a cloud-based virtual machine via a side-channel attack could mean complete failure for the code.

After doing some research on how AES is implemented in OpenSSL there are several interesting facts which have emerged, so stay tuned.

What are cache-timing attacks?

Cache memory is random access memory (RAM) that a microprocessor can access more quickly than it can access regular RAM. As the microprocessor processes data, it looks first in the cache memory, and if it finds the data there (from a previous read), it does not have to do the more time-consuming read from the larger memory. Just like all other resources, the cache is shared among running processes for efficiency and economy. This may be dangerous from a cryptographic point of view, as it opens up a covert channel which allows a malicious process to monitor the use of these caches and possibly indirectly recover information about the input data, by carefully noting timing information about its own cache accesses.

A particular kind of attack called the flush+reload attack works by forcing data in the victim process out of the cache, waiting a bit, then measuring the time it takes to access the data. If the victim process accesses the data while the spy process is waiting, it will get put back into the cache, and the spy process’s access to the data will be fast. If the victim process doesn’t access the data, it will stay out of the cache, and the spy process’s access will be slow. So, by measuring the access time, the spy can tell whether or not the victim accessed the data during the wait interval. All of this works only under the premise that the data is shared between victim and adversary.

Note that we are not talking about the secret key being shared, but about effectively public data, specifically the lookup tables discussed in the next section.
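The flush, wait, reload cycle can be illustrated with a toy model. This is a simulation under stated assumptions, not an attack: the cache is a plain Python set, and the latencies and threshold (`HIT_CYCLES`, `MISS_CYCLES`, `THRESHOLD`) are made-up numbers standing in for real timer measurements.

```python
HIT_CYCLES = 40      # assumed latency of a cached access (illustrative)
MISS_CYCLES = 200    # assumed latency of a main-memory access (illustrative)
THRESHOLD = 100      # the spy's cut-off between "fast" and "slow"


class ToyCache:
    """Tracks which shared memory lines are currently cached."""

    def __init__(self):
        self.lines = set()

    def flush(self, line):
        self.lines.discard(line)

    def access(self, line):
        """Return the simulated access time and leave the line cached."""
        cycles = HIT_CYCLES if line in self.lines else MISS_CYCLES
        self.lines.add(line)
        return cycles


def spy_round(cache, line, victim_step):
    """One flush+reload round; True means the victim touched `line`."""
    cache.flush(line)                      # 1. flush the monitored line
    victim_step(cache)                     # 2. wait while the victim runs
    return cache.access(line) < THRESHOLD  # 3. reload and time the access


def victim(cache):
    cache.access(3)  # stands in for a secret-dependent table lookup


cache = ToyCache()
observed = [line for line in range(8) if spy_round(cache, line, victim)]
print(observed)  # prints [3]
```

The spy never reads the victim's data; it only learns which shared line was touched, which is why the sharing premise matters.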

Is AES implementation in OpenSSL vulnerable to cache-timing attacks?

Any cipher relying heavily on S-boxes may be vulnerable to cache-timing attacks. The processor optimizes execution by loading these S-boxes into the cache so that subsequent accesses/lookups do not need to load them from main memory. Textbook implementations of these ciphers do not use constant-time lookups when accessing the data from the S-boxes, and worse, each lookup depends on a portion of the secret encryption key. AES-128, as per the standard, requires 10 rounds, and each round involves 16 S-box lookups.

The Rijndael designers proposed a method which results in fast software implementations. The core idea is to merge the S-box lookup with another AES operation by switching to larger pre-computed tables. There are still 16 table lookups per round. These 16 lookups are customarily split across 4 tables, so that there are 4 lookups per table and round. Each table consists of 256 32-bit entries. These are referred to as T-tables, and in the case of the current research, the way these are loaded into the cache leads to timing leakage. The leakage, as described in the paper, is quantified by the probability of a cache line not being accessed as a result of a block operation. As each lookup table, be it an S-box or a pre-computed T-table, consists of 256 entries, the probability is (1-n/256)^m, where n is the number of table elements accommodated in a single cache line, and m is the number of references to the given table per block operation. The smaller the probability, the harder it is to mount the attack.
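As a worked example of the formula, the sketch below evaluates (1-n/256)^m for two cases. It assumes a 64-byte cache line, AES-128 with 10 uniform rounds, and independent, uniformly distributed lookups; real implementations often treat the last round differently, so the figures are only indicative.

```python
def p_line_untouched(entries_per_line, lookups, table_entries=256):
    """Probability that one given cache line of a table is never accessed."""
    return (1 - entries_per_line / table_entries) ** lookups


CACHE_LINE_BYTES = 64  # assumption: a typical x86 cache line
ROUNDS = 10            # AES-128

# T-table: 4-byte entries, 4 lookups per table and round.
t_table = p_line_untouched(CACHE_LINE_BYTES // 4, 4 * ROUNDS)

# Single 256-byte S-box: 1-byte entries, 16 lookups per round.
s_box = p_line_untouched(CACHE_LINE_BYTES // 1, 16 * ROUNDS)

print(f"T-table: {t_table:.3f}")
print(f"S-box:   {s_box:.1e}")
```

Under these assumptions the T-table case comes out at roughly 7.6%, which an attacker can observe over many encryptions, while the single S-box case is vanishingly small.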

Aren’t cache-timing attacks local, how is virtualized environment affected?

Enter KSM (Kernel SamePage Merging). KSM enables the kernel to examine two or more already running programs and compare their memory. If any memory regions or pages are identical, KSM reduces the multiple identical memory pages to a single page. This page is then marked copy-on-write. If the contents of the page are modified by a guest virtual machine, a new page is created for that guest virtual machine. Because identical pages, such as lookup tables, end up physically shared between guests, cross-VM cache-timing attacks become possible. You can stop KSM or modify its behaviour. Some details are available here.
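On Linux, the state of KSM is exposed through sysfs in `/sys/kernel/mm/ksm/run`. The following is a minimal sketch that reads it; the helper name `ksm_state` and the wording of the descriptions are my own, and the file is simply absent on kernels built without KSM.

```python
from pathlib import Path

KSM_RUN = Path("/sys/kernel/mm/ksm/run")

# Values accepted by the kernel's ksm/run control file.
MEANING = {
    "0": "stopped (already merged pages stay merged)",
    "1": "running",
    "2": "stopped, all merged pages unmerged",
}


def ksm_state(path=KSM_RUN):
    """Describe the KSM state, or report that it cannot be read."""
    try:
        value = Path(path).read_text().strip()
    except OSError:
        return "unavailable"
    return MEANING.get(value, f"unknown value {value!r}")


print("KSM is", ksm_state())
```

Writing `2` to the same file as root stops KSM and unmerges the pages that were already shared, which is the stricter of the two ways to stop it.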

You did not answer my original question, is AES in OpenSSL affected?

In short, no. But not to settle for easy answers, let’s have a close look at how AES in OpenSSL operates. In fact there are several implementations of AES in the OpenSSL codebase, and each one of them may or may not be chosen based on specific run-time conditions. Note: all of the discussion here is about OpenSSL version 1.0.1.

  • Intel Advanced Encryption Standard New Instructions, or AES-NI, is an extension to the x86 instruction set for Intel and AMD machines, used since 2008. Intel processors from Westmere onwards and AMD processors from Bulldozer onwards support it. The purpose of AES-NI is to allow AES to be performed by dedicated circuitry. No cache is involved here, and hence it’s immune to cache-timing attacks. OpenSSL uses AES-NI by default, unless it’s disabled on purpose. Some hypervisors mask the AES-NI capability bit, which is customarily done to make sure that guests can be freely migrated within a heterogeneous cluster/farm. In those cases OpenSSL will resort to other implementations in its codebase.
  • If AES-NI is not available, OpenSSL will use either Vector Permutation AES (VPAES) or Bit-sliced AES (BSAES), provided the SSSE3 instruction set extension is available. SSSE3 was first introduced in 2006, so there is a fair chance that it will be available in most computers in use. Both of these techniques avoid data- and key-dependent branches and memory references, and therefore are immune to known timing attacks. VPAES is used for CBC encrypt, ECB and “obscure” modes like OFB and CFB, while BSAES is used for CBC decrypt, CTR and XTS.
  • In the end, if your processor does not support AES-NI or SSSE3, OpenSSL falls back to integer-only assembly code. Unlike the widely used T-table implementations, this code path uses a single 256-byte S-box. This means that the probability of a cache line not being accessed as a result of a block operation would be (1-64/256)^160 ≈ 1e-20. “Would be” means that the actual probability is even less, in fact zero, because the S-box is fully prefetched, and in every round at that.
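The selection order described in the list above can be modelled in a few lines. This is a sketch of the prose, not OpenSSL's actual dispatch code: the function names and the mode labels are hypothetical, while `aes` and `ssse3` are the flag names Linux reports in `/proc/cpuinfo` on x86.

```python
BSAES_MODES = {"cbc-decrypt", "ctr", "xts"}  # hypothetical mode labels


def cpu_flags_from_cpuinfo(text):
    """Extract the CPU feature flags from /proc/cpuinfo-style text."""
    for line in text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def pick_aes_path(cpu_flags, mode):
    """Mirror the fallback order described above for OpenSSL 1.0.1."""
    if "aes" in cpu_flags:
        return "AES-NI"
    if "ssse3" in cpu_flags:
        return "BSAES" if mode in BSAES_MODES else "VPAES"
    return "integer-only assembly (single S-box)"


# Made-up cpuinfo excerpt: a guest whose hypervisor masks the AES-NI bit.
SAMPLE_CPUINFO = "processor\t: 0\nflags\t\t: fpu sse2 ssse3\n"
flags = cpu_flags_from_cpuinfo(SAMPLE_CPUINFO)
print(pick_aes_path(flags, "ctr"))          # prints BSAES
print(pick_aes_path(flags, "cbc-encrypt"))  # prints VPAES
```

Checking the flags inside a guest this way shows what the hypervisor exposes, which may differ from what the physical processor supports.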

For completeness’ sake it should be noted that OpenSSL does include a reference C implementation which has no mitigations against cache-timing attacks. This is platform-independent fallback code that is used on platforms with no assembly modules, as well as in cases when the assembler fails for some reason. As a side note, OpenSSL maintains a really minimal assembler requirement for AES-NI and SSSE3. In fact, the code can be assembled on Fedora 1, even though support for these instructions was added later.

The bottom line is that if you are using a Linux distribution which comes with OpenSSL binaries, there is a very good chance that the packagers have taken pains to ensure that the reference C implementation is not compiled in. (The same thing would happen if you downloaded the OpenSSL source code and compiled it.)

It’s not clear from the research paper how the researchers were able to conduct the side-channel attack. All evidence suggests that they ended up using the standard reference C implementation of AES instead of the assembly modules which have mitigations in place. The researchers were contacted but did not respond to this point. Anyone using an OpenSSL binary they built themselves using the defaults, or one precompiled as part of a Linux distribution, should not be vulnerable to these attacks.

OpenSSL Privilege Separation Analysis

As part of the security response process, Red Hat Product Security looks at the information that we obtain in order to align future endeavors, such as source code auditing, to where problems occur in order to attempt to prevent repeats of previous issues.

Private key isolation

When Heartbleed was first announced, a patch was proposed to store private keys in isolated memory, surrounded by an unreadable page. The idea was that the process would crash due to a segmentation violation before the private key memory was read.

However, it was quickly pointed out that the proposed patch was flawed. It did not store the private keys in the isolated memory space, and the contents of memory accessible by Heartbleed could still contain information that can be used to quickly reconstruct the private key.

The lesson learned here was that an audit of how and where private keys can be accessed, and where useful information is stored, should be undertaken to identify any potential weaknesses in the approach. Additionally, testing and verifying results would have identified that the private keys were not located in memory surrounded by unreadable memory pages.

Private key privilege separation

The idea behind private key privilege separation is to reduce the risk of an equivalent Heartbleed-style memory leak vulnerability. This can be implemented by using an application in front of the end service being protected or be implemented in the target application itself.

One example of using an application in front of the service being protected is Titus.  This application runs a separate process per TLS connection and stores the private key in another process. This helps prevent Heartbleed-style bugs from leaking private keys and other information about application state. The per-connection process model also protects against information from other connections being leaked or affected.

One drawback of the current implementation in Titus is that it fork()s and doesn’t execve() itself. A forked child keeps the same memory layout as its parent, so if there are any memory corruption vulnerabilities present in Titus or OpenSSL, writing an exploit against the target is far easier than it could have been. The forked process also potentially retains useful information in memory that can be obtained later on.

Additionally, depending on how the chroot directories are set up, there may not be devices such as /dev/urandom available, which reduces the possible entropy sources available to OpenSSL.

Another approach is to implement the private key privilege separation in the process itself, which is what some of the OpenBSD software has started to do. The aim is that, while it won’t protect against OpenSSL vulnerabilities in and of itself, it will help restrict private keys from being leaked.

Privilege-separated OpenSSL

Sebastian Krahmer wrote an OpenSSL Privilege Separation (sslps) proof of concept which uses the Linux Kernel Secure Computing (seccomp) interface to isolate OpenSSL from the lophttpd process. This effectively reduces the set of system calls that OpenSSL itself can make.

This has the advantage that if there is a memory corruption or arbitrary code execution vulnerability present in OpenSSL, an attacker requires a further vulnerability, either in the kernel’s handling of the allowed system calls or in the lophttpd IPC mechanism, to gain access.

Another possibility is that the attacker is happy to sit in the restricted OpenSSL process and monitor the SSL_read and SSL_write traffic, potentially gaining access to the private keys in memory.

While the current version of sslps doesn’t mitigate Heartbleed-style memory leaks of the private key, it helps make an attacker’s job harder when there is a memory corruption or arbitrary code execution vulnerability in OpenSSL.

It will be interesting to see if the OpenSSL or LibreSSL developers investigate using privilege separation or sandboxing in the future and what approaches are taken to implement them.

Hardware

One approach to help restrict compromises from software is to store the private keys elsewhere to prevent key compromise. One such approach is using a Hardware Security Module (HSM) to handle key generation, encryption, and signing. We may discuss using HSMs in the future.

It is also possible to use a Trusted Platform Module (TPM) to provide key generation, storage, encryption, and signing with OpenSSL, but this approach may be too slow for anything other than client-side use.

Designing a new approach

Having laid out what’s available, a rough draft of an idealized approach for hardening SSL processing can now be made.

First, the various private keys should be isolated from the main processing of SSL traffic. This will help reduce the impact of Heartbleed-style memory leaks, which makes the attacker’s job of getting the private keys harder.

Second, the SSL traffic processing should be isolated from the application itself. This helps restrict the impact of bugs in OpenSSL from affecting the rest of the application and system to the maximum possible extent.

Lastly, use existing kernel features, such as executing a new process to have address space randomization and stack cookie values reapplied, as this helps reduce the amount of information available to attack other processes. Additionally, features such as seccomp could be used to restrict what the private key process and the SSL traffic process can do, which in turn helps restrict the attack surface available to a process. Furthermore, it may be possible to utilize mandatory access control (MAC) systems, such as SELinux, to further contain and restrict the processes involved.
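The first and last points can be sketched together: a helper process is started as a fresh program (so it is exec'd rather than being a fork() copy of its parent), loads the key itself, and only ever hands back results over a pipe. This is an illustration under loud assumptions: HMAC-SHA256 stands in for the real private key operation, the `KeyHolder` class and the line-based hex protocol are hypothetical, and no seccomp or SELinux confinement is applied.

```python
import subprocess
import sys

# Source of the key-holding helper. It runs in a freshly started
# interpreter and is the only process that ever reads the key file.
KEY_HOLDER_SOURCE = r"""
import hashlib, hmac, sys
with open(sys.argv[1], "rb") as key_file:
    key = key_file.read()
for line in iter(sys.stdin.readline, ""):
    message = bytes.fromhex(line.strip())
    # HMAC is a stand-in for an RSA/ECDSA private key operation.
    print(hmac.new(key, message, hashlib.sha256).hexdigest(), flush=True)
"""


class KeyHolder:
    """Client side of the IPC channel; it never sees the key material."""

    def __init__(self, key_path):
        self.proc = subprocess.Popen(
            [sys.executable, "-c", KEY_HOLDER_SOURCE, key_path],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def sign(self, message):
        """Send one hex-encoded request and read back one hex result."""
        self.proc.stdin.write(message.hex() + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().strip()

    def close(self):
        self.proc.stdin.close()  # EOF ends the helper's request loop
        self.proc.wait()
```

A caller would create `KeyHolder("/path/to/key")`, call `sign(b"...")` for each operation, and `close()` it on shutdown. A memory leak in the calling process can then disclose messages and results, but not the key itself.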

Potential Pitfalls

Implementing all of the above may introduce some backwards compatibility issues. An example to consider is applications which use chroot() and can then no longer access the executables required to implement the idealized approach. Perhaps it might be feasible to implement a fallback to a fork() based mechanism.

Other functionality may be adversely affected by such restrictions and would require proper in-depth analysis, such as looking up server and client certificate validity. Some API incompatibilities could also get in the way.

It’s possible that the IPC mechanisms would introduce some performance impact, but the overhead would likely be dwarfed by the cryptographic processing, and may not even be measurable. It may be possible to reduce the overhead, with some compromise of security, by using shared pages or page migration between processes to avoid the data copying aspect of IPC, and using the IPC mechanism only for message passing.

Conclusion

We’ve covered the currently existing approaches and drawn up a rough list of idealized features that would help reduce the current attack surface of OpenSSL. These features would make an attacker’s job harder when trying to compromise private keys and applications that use OpenSSL. A follow-up post may look at using an OpenSSL engine to move the private key out of the application itself and into another process, to prevent Heartbleed-style memory leaks from disclosing the private keys.

OpenSSL MITM CCS injection attack (CVE-2014-0224)

In the last few years, several serious security issues have been discovered in various cryptographic libraries. Though very few of them were actually exploited in the wild before details were made public and patches were shipped, important issues like Heartbleed have led developers, researchers, and users to take code sanity of these products seriously.

Among the recent issues fixed by the OpenSSL project in version 1.0.1h, the main one that will have everyone talking is the “Man-in-the-middle” (MITM) attack, documented by CVE-2014-0224, affecting the Secure Socket Layer (SSL) and Transport Layer Security (TLS) protocols.

What is CVE-2014-0224 and should I really be worried about it?

The short answer is: it depends. But as with any security flaw, it’s always safer to patch than to defer and worry.

In order for an attacker to exploit this flaw, the following conditions need to be present.

  • Both the client and the server must be vulnerable. All versions of OpenSSL are vulnerable on the client side. Only 1.0.1 and above are currently known to be vulnerable on the server side. If either the client or the server is fixed, it is not feasible to perform this attack.
  • A Man-In-The-Middle (MITM) attacker: An attacker capable of intercepting and modifying packets off the wire. A decade back, this attack vector seemed almost impossible for anyone but Internet Service Providers as they had access to all the network devices through which most of the traffic on the internet passed.

However with the prevalence of various public wireless access points, easily available at cafes, restaurants, and even free internet access provided by some cities, MITM is now possible. Additionally, there is a variety of software available that provides the capability of faking Access Points. Once clients connect to the fake AP, an attacker could then act as a MITM for the client’s traffic. A successful MITM attack may disclose authentication credentials, sensitive information, or give the attacker the ability to impersonate the victim.

How does this attack work?

SSL/TLS sessions are initiated with the ClientHello and ServerHello handshake messages sent from the respective sides. This part of the protocol is used to negotiate the attributes of the session, such as the protocol version used, encryption protocol, encryption keys, Message Authentication Code (MAC) secrets and Initialization Vectors (IV), as well as the extensions supported.

For various reasons, the client or the server may decide to modify the ciphering strategies of the connection during the handshake stage (don’t confuse this with the handshake protocol). This can be achieved by using the ChangeCipherSpec (CCS) request. The CCS consists of a single packet which is sent by both the client and the server to notify that the subsequent records will be protected under the newly negotiated CipherSpec and keys.

As per the standards (RFC 2246, RFC 5246), “The ChangeCipherSpec message is sent during the handshake after the security parameters have been agreed upon, but before the verifying Finished message is sent.” OpenSSL did not enforce this, and accepted a CCS even before the security parameters were agreed upon. Accepting a CCS out of order is expected to leave the state of the two sides desynchronized. Usually this should result in both sides effectively terminating the connection, unless another flaw is present.

In order to exploit this issue, a MITM attacker would effectively do the following:

  • Wait for a new TLS connection, followed by the ClientHello / ServerHello handshake messages.
  • Issue a CCS packet in both the directions, which causes the OpenSSL code to use a zero length pre master secret key. The packet is sent to both ends of the connection. Session Keys are derived using a zero length pre master secret key, and future session keys also share this weakness.
  • Renegotiate the handshake parameters.
  • The attacker is now able to decrypt or even modify the packets in transit.

OpenSSL patched this vulnerability by changing how it handles received CCS packets and how it handles zero length pre master secret values. The OpenSSL patch ensures that it is no longer possible to use master keys with zero length. It also ensures that CCS packets cannot be received before the master key has been set.
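The two checks described above can be expressed as a toy state machine. This is an illustration of the idea only, not OpenSSL's code: the class, method and exception names are hypothetical, and a real TLS stack tracks far more state.

```python
class HandshakeError(Exception):
    """Raised when a message arrives in a state where it is not allowed."""


class ToyHandshake:
    def __init__(self):
        self.master_secret = None
        self.cipher_changed = False

    def set_master_secret(self, secret):
        # Check 1: never derive session keys from an empty secret.
        if len(secret) == 0:
            raise HandshakeError("zero length master secret")
        self.master_secret = secret

    def receive_ccs(self):
        # Check 2: a CCS is only valid once the master secret is set.
        if self.master_secret is None:
            raise HandshakeError("ChangeCipherSpec before key agreement")
        self.cipher_changed = True


session = ToyHandshake()
try:
    session.receive_ccs()  # the early CCS an attacker would inject
except HandshakeError as error:
    print("rejected:", error)
```

With both checks in place, the injected early CCS aborts the handshake instead of silently switching to keys derived from an empty secret.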

What is the remedy?

The easiest solution is to ensure you are using the latest version of OpenSSL your distribution provides. Red Hat has issued security advisories for all of its affected products, and Fedora users should also be able to update their openssl packages to a patched version.

You will need to restart any services using OpenSSL that are not restarted automatically.
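One way to spot services that still run the old code is to look for processes that map a library file which has since been replaced on disk; on Linux such mappings show up in `/proc/<pid>/maps` with a trailing `(deleted)`. The sketch below assumes a Linux `/proc` layout and sufficient privileges to read other processes' maps; the function names are my own.

```python
import os
import re

# Matches a mapped libssl/libcrypto path that no longer exists on disk.
STALE_LIB = re.compile(r"(\S*lib(?:ssl|crypto)\S*)\s+\(deleted\)$")


def stale_libs(maps_text):
    """Return the deleted OpenSSL libraries named in one maps file."""
    found = set()
    for line in maps_text.splitlines():
        match = STALE_LIB.search(line)
        if match:
            found.add(match.group(1))
    return sorted(found)


def processes_needing_restart(proc_root="/proc"):
    """Map PID -> stale libraries for every process we may inspect."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "maps")) as maps:
                libs = stale_libs(maps.read())
        except OSError:
            continue  # process exited or access was denied
        if libs:
            result[int(entry)] = libs
    return result
```

Any PID returned by `processes_needing_restart()` belongs to a process that loaded OpenSSL before the update and keeps using the old copy until it is restarted.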

If you are a Red Hat customer, there is a tool available located at https://access.redhat.com/labs/ccsinjectiontest/ which you can use to remotely verify the latest patches have been applied and your TLS server is responding correctly.

We have additional information regarding specific Red Hat products affected by this issue that can be found at https://access.redhat.com/site/articles/904433