Fedora Security Team

Vulnerabilities in software happen. When they get fixed, it’s up to the packager to make those fixes available to the systems using the software. Duplicating much of the response effort that Red Hat Product Security performs for Red Hat products, the Fedora Security Team (FST) has recently been created to help packagers get vulnerability fixes downstream in a timely manner.

At the beginning of July, there were over 500 vulnerability tickets open* against Fedora and EPEL. Many, but not all, of these vulnerabilities already had patches or releases available to remedy the problems. The Team has already found several cases where upstream did not know that a vulnerability existed and, once informed, was able to fix the issue quickly. This is one of the reasons a dedicated team working these issues is so important.

In the few short weeks since the Team was created, we’ve already closed 14 vulnerability tickets and are working on another 150. We hope to be able to work in a more real-time fashion once the backlog decreases. Staying in front of the vulnerabilities will not be easy, however. During the week of August 3rd, 27 new tickets were opened for packages in Fedora and EPEL. While we haven’t figured out a way to get ahead of the problem, we are trying to deal with the aftermath and get fixes pushed to users as quickly as possible.

Additional information on the mission and the Team can be found on our wiki page. If you’d like to get involved, please join us for one of our meetings and subscribe to our listserv.

* A separate vulnerability ticket is sometimes opened for each affected version of Fedora and EPEL, resulting in multiple tickets for a single vulnerability. This makes informing the packager easier but also inflates the numbers significantly.

Controlling access to smart cards

Smart cards are increasingly used in workstations as an authentication method. They are mainly used to provide public key operations (e.g., digital signatures) using keys that cannot be exported from the card. They also serve as data storage, e.g., for the certificate that corresponds to the key. In RHEL and Fedora systems, low-level access to smart cards is provided by the pcsc-lite daemon, an implementation of the PC/SC protocol defined by the PC/SC industry consortium. In brief, the PC/SC protocol allows the system to execute certain pre-defined commands on the card and obtain the result. The pcsc-lite implementation uses a privileged process that handles direct communication with the card (e.g., using the CCID USB protocol), while applications communicate with the daemon using the SCard API. That API hides the underlying communication between the application and the pcsc-lite daemon, which is based on Unix domain sockets.

However, there is a catch. As you may have noticed, there is no mention of access control in the communication between applications and the pcsc-lite daemon. That is because it is assumed that the access control included in smart cards, such as PINs, pinpads, and biometrics, would be sufficient to counter most threats. That isn’t always the case. Smart cards typically contain embedded software in the form of firmware, so there will be bugs that a malicious application can exploit, and such bugs, even when known, are neither easy nor practical to fix. Furthermore, smart cards often hold public files (i.e., files not protected by a PIN) that were intended for the smart card user, and it is not always desirable for them to be accessible to all system users. Even worse, certain smart cards allow any user of a system to erase all smart card data by re-initializing the card.

All of this led us to introduce additional access control for smart cards, on par with the access control used for external hard disks. The main idea is to provide fine-grained access control on the system and to specify policies such as “the user on the console should be able to fully access the smart card, but not any other user”. For that we used polkit, a framework used by applications to grant access to privileged operations. We chose polkit mainly because it has already been used successfully to grant access to external hard disks and, unsurprisingly, the access control requirements for smart cards share many similarities with those for removable devices such as hard disks.

The pcsc-lite access control framework is now part of pcsc-lite 1.8.11 and will be enabled by default in Fedora 21. Its advantage is that it can prevent unauthorized users from issuing commands to smart cards, and from reading, writing or (in some cases) erasing any public data on a smart card. The access control is imposed during session initialization, which keeps any potential overhead to a minimum. The default policy in Fedora 21 will treat any user on the console as authorized, as physical access to the console implies physical access to the card, but remote users (e.g., via ssh) and system daemons will be treated as unauthorized unless they have administrative rights.

Let’s now see how the smart card access control can be administered. The system-wide policy for the pcsc-lite daemon is available at /usr/share/polkit-1/actions/org.debian.pcsc-lite.policy. That file is a polkit XML file that contains the default rules needed to access the daemon. The default policy that will be shipped in Fedora 21 consists of the following.

  <action id="org.debian.pcsc-lite.access_pcsc">
    <description>Access to the PC/SC daemon</description>
    <message>Authentication is required to access the PC/SC daemon</message>
    <defaults>
      <allow_any>auth_admin</allow_any>
      <allow_inactive>auth_admin</allow_inactive>
      <allow_active>yes</allow_active>
    </defaults>
  </action>

  <action id="org.debian.pcsc-lite.access_card">
    <description>Access to the smart card</description>
    <message>Authentication is required to access the smart card</message>
    <defaults>
      <allow_any>auth_admin</allow_any>
      <allow_inactive>auth_admin</allow_inactive>
      <allow_active>yes</allow_active>
    </defaults>
  </action>

The syntax is explained in more detail in the polkit manual page. The parts relevant to pcsc-lite are the action IDs. The action with ID “org.debian.pcsc-lite.access_pcsc” contains the policy for accessing the pcsc-lite daemon and issuing commands to it, i.e., for accessing the Unix domain socket. The second action, with ID “org.debian.pcsc-lite.access_card”, contains the policy for issuing commands to the smart cards available to the pcsc-lite daemon. That distinction allows, for example, programs to query the number of readers and cards present without being able to issue any commands to them. Under both policies only active (console) processes are allowed to access the pcsc-lite daemon and smart cards, unless they are privileged processes.

Polkit is considerably more flexible, though. With it we can provide even more fine-grained access control, e.g., to specific card readers. For example, if we have a web server that uses a smart card, we can restrict it to use only the smart cards in a given reader. These rules are expressed in JavaScript and can be added in a separate file in /usr/share/polkit-1/rules.d/ (or in /etc/polkit-1/rules.d/ for local administrator rules). Let’s now see what the rules for our example would look like.

polkit.addRule(function(action, subject) {
    if (action.id == "org.debian.pcsc-lite.access_pcsc" &&
        subject.user == "apache") {
            return polkit.Result.YES;
    }
});

polkit.addRule(function(action, subject) {
    if (action.id == "org.debian.pcsc-lite.access_card" &&
        action.lookup("reader") == 'name_of_reader' &&
        subject.user == "apache") {
            return polkit.Result.YES;
    }
});

Here we add two rules. The first one allows the user “apache”, which is the user the web server runs under, to access the pcsc-lite daemon. That rule explicitly allows access to the daemon because in our default policy only administrators and console users can access it. The second rule allows the same user to access the smart card reader identified by “name_of_reader”. The name of the reader can be obtained using the commands pcsc_scan or opensc-tool -l.
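
Before deploying rules like these, it can help to dry-run them outside polkitd. The following is only a sketch, assuming Node.js is available. The polkit object in it is a hand-written stand-in that models just addRule, Result.YES, action.lookup() and subject.user, and evaluate() is a hypothetical helper, not part of polkit.

```javascript
// Stand-in for the object polkitd exposes to rule files. Only the parts
// used by the two rules are modelled; this is NOT the real polkit object.
const polkit = {
    Result: { YES: "yes" },
    rules: [],
    addRule(fn) { this.rules.push(fn); },
};

// The two rules from the example above.
polkit.addRule(function(action, subject) {
    if (action.id == "org.debian.pcsc-lite.access_pcsc" &&
        subject.user == "apache") {
            return polkit.Result.YES;
    }
});

polkit.addRule(function(action, subject) {
    if (action.id == "org.debian.pcsc-lite.access_card" &&
        action.lookup("reader") == "name_of_reader" &&
        subject.user == "apache") {
            return polkit.Result.YES;
    }
});

// Hypothetical helper: run the rules in order and return the first decision.
// A rule that returns nothing leaves the decision to the .policy defaults.
function evaluate(actionId, details, user) {
    const action = { id: actionId, lookup: (key) => details[key] };
    const subject = { user: user };
    for (const rule of polkit.rules) {
        const result = rule(action, subject);
        if (result !== undefined && result !== null) {
            return result;
        }
    }
    return "policy-default";
}

console.log(evaluate("org.debian.pcsc-lite.access_card",
                     { reader: "name_of_reader" }, "apache")); // yes
console.log(evaluate("org.debian.pcsc-lite.access_card",
                     { reader: "other_reader" }, "apache"));   // policy-default
```

A result of “policy-default” here simply means that no rule matched, so the defaults from the .policy file shown earlier would apply.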

With these changes to pcsc-lite we provide reasonable default settings for smart card users that apply to most, if not all, typical uses. These default settings increase the overall security of the system by denying unauthorized users access to the smart card firmware, as well as to data and operations on the card.

Towards efficient security code audits

Conducting a code review is often a daunting task, especially when the goal is to find security flaws. Flaws can be, and usually are, hidden in all parts and levels of the application: from the lowest-level coding errors, through unsafe coding constructs and misuse of APIs, to the overall architecture of the application. The size and quality of the codebase, the quality of the (hopefully) existing documentation, and time restrictions are the main complications of the review. It is therefore useful to have a plan beforehand: know what to look for, how to find the flaws efficiently and how to prioritize.

A code review should start by collecting and reviewing existing documentation about the application. The goal is to get a decent overall picture of the application: what the expected functionality is, what requirements can be expected from the security standpoint, and where the trust boundaries are. Not all flaws with security implications are relevant in all contexts. For example, an effective denial of service against a server certainly has security implications, whereas a coding error in a command line application that causes excessive CPU load will probably have low impact. At the end of this phase it should be clear what the security requirements are and which flaws could have the highest impact.

Armed with this knowledge, the next step is to define the scope of the audit. It is almost always the case that a thorough review would require far more resources than are available, so defining which parts will be audited and which vulnerabilities will be searched for increases the efficiency of the audit. It is, however, necessary to state all the assumptions made explicitly in the report, which makes it possible for others to review them or revisit them in future audits.

In general there are two approaches to conducting a code review. For lack of better terminology we shall call them bottom up and top down. Of course, real audits always combine techniques from both, so this classification is merely useful when we want to put the techniques in context.

The top down approach starts with the overall picture of the application and its security requirements and drills down towards lower levels of abstraction. We often start by identifying the components of the application and their relationships, and by mapping the flow of data. Drilling further down, we can choose to inspect potentially sensitive interfaces that components provide, how data is handled at rest and in motion, how access to sensitive parts of the application is restricted, etc. From this point the audit quickly becomes very targeted: since we have a good picture of which components, interfaces and channels might be vulnerable to which classes of attacks, we can focus our search and ignore the other parts. Sometimes this will bring us down to the level of line-by-line code inspection, but this is fine: it usually means that, architecturally, some part of the application’s security depends on the correctness of the code in question.

The top down approach is invaluable, as it makes it possible to find flaws in the overall architecture that would otherwise go unnoticed. However, it is also very demanding: it requires broad knowledge of all classes of weaknesses and threat models, and the ability to switch between abstraction levels quickly. The cost of such an audit can be reduced by reviewing the application very early in the design phase. Unfortunately, most of the time this is not possible because of the development model chosen or the phase in which the audit was requested. Another way to reduce the effort is to invest in documentation and reuse it in future audits.

In the bottom up approach we usually look for indications of vulnerabilities in the code itself and investigate whether they can possibly lead to exploitation. These indications range from outright dangerous code, misuse of APIs, dangerous coding constructs and bad practices to poor code quality, all of which may indicate the presence of a weakness in the code. The search is usually automated, as there is an abundance of tools to simplify this task, including static analyzers, code quality metric tools and the most versatile one: grep. All of these reduce the cost of finding potentially weak spots, so the real cost lies in separating the wheat from the chaff. The bane of this approach is the receiver operating characteristic curve: it is difficult to improve it substantially, so we are usually left with trade-offs between false positives and false negatives.
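
As an illustration of the grep-style search described above, here is a minimal sketch that flags calls to a few C functions commonly regarded as risky. The function name findRiskyCalls and the short list of calls are assumptions made for this example; a real checklist would be longer and tuned to the codebase.

```javascript
// Illustrative, deliberately short list of C calls often flagged in audits.
const RISKY_CALLS = ["gets", "strcpy", "strcat", "sprintf", "system"];

// Return one hit per risky call found, with its 1-based line number.
function findRiskyCalls(source) {
    // \b keeps "fgets(" or "vsprintf(" from matching "gets(" or "sprintf(".
    const pattern = new RegExp("\\b(" + RISKY_CALLS.join("|") + ")\\s*\\(", "g");
    const hits = [];
    source.split("\n").forEach((text, index) => {
        for (const match of text.matchAll(pattern)) {
            hits.push({ line: index + 1, call: match[1], text: text.trim() });
        }
    });
    return hits;
}

const sample = [
    'char buf[16];',
    'strcpy(buf, argv[1]);',
    'fgets(line, sizeof line, stdin);',
    'sprintf(msg, "%s", buf);',
].join("\n");

for (const hit of findRiskyCalls(sample)) {
    console.log(hit.line + ": " + hit.call + "  " + hit.text);
}
```

A scanner like this flags a call whether or not it is actually exploitable, and it misses anything not on the list, which is exactly the false positive versus false negative trade-off mentioned above.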

The advantages of the bottom up approach are relatively low resource requirements and reusability. This means it is often easy and desirable to run such analyses as early and as often as possible. It also depends much less on the skill of the reviewer, since the patterns can be collected into a knowledge base, aided by freely available resources on the internet. It is a good idea to create checklists to make sure all common types of weaknesses are audited for and to make this kind of review more scalable. On the other hand, the biggest disadvantage is that certain classes of weaknesses can never be found with this approach. These usually include architectural flaws, which lead to the vulnerabilities with the biggest impact.

The last step in any audit is writing a report. Even though this is usually perceived as the least productive time spent, it is an important one. A good report enables other interested parties to further scrutinize weak points, provides the information needed to make potentially hard decisions and is a good way to share and reuse knowledge that might otherwise stay private.

It’s all a question of time – AES timing attacks on OpenSSL

This blog post is co-authored with Andy Polyakov from the OpenSSL core team.

Advanced Encryption Standard (AES) is the most widely used symmetric block cipher today. Its use is mandatory in several US government and industry applications. Among the commercial standards, AES is a part of SSL/TLS, IPSec, 802.11i, SSH and numerous other security products used throughout the world.

Ever since the inclusion of AES as a federal standard via FIPS PUB 197, and even before that when it was known as Rijndael, there have been several attempts to cryptanalyze it. However, most of these attacks have not gone beyond the academic papers they were written in. One worth mentioning at this point is the key recovery attack on AES-192/AES-256. A second angle is attacks on AES implementations via side channels. A side-channel attack exploits information that is leaked through physical channels such as power consumption, noise or timing behaviour. In order to observe such behaviour the attacker usually needs to have some kind of direct or semi-direct control over the implementation.

There has been some interest in side-channel attacks on the way OpenSSL implements AES. I suppose OpenSSL is chosen mainly because it’s the most popular cross-platform cryptographic library used on the internet. Most Linux/Unix web servers use it, along with tons of closed source products on all platforms. The earliest attack dates back to 2005, and the more recent ones concern cross-VM cache-timing attacks on the OpenSSL AES implementation, described here and here. These are more alarming, mainly because with applications and data moving into the cloud, recovering AES keys from a cloud-based virtual machine via a side-channel attack could mean complete failure for the code.

Some research into how AES is implemented in OpenSSL turned up several interesting facts, so stay tuned.

What are cache-timing attacks?

Cache memory is random access memory (RAM) that a microprocessor can access more quickly than it can access regular RAM. As the microprocessor processes data, it looks first in the cache memory, and if it finds the data there (from a previous read), it does not have to do the more time-consuming read from larger memory. Just like all other resources, cache is shared among running processes for efficiency and economy. This may be dangerous from a cryptographic point of view, as it opens up a side channel: a malicious process can monitor the use of these caches and possibly indirectly recover information about the input data, by carefully noting timing information about its own cache accesses.

A particular kind of attack called the flush+reload attack works by forcing data in the victim process out of the cache, waiting a bit, then measuring the time it takes to access the data. If the victim process accesses the data while the spy process is waiting, the data will be put back into the cache, and the spy process’s access to the data will be fast. If the victim process doesn’t access the data, it will stay out of the cache, and the spy process’s access will be slow. So, by measuring the access time, the spy can tell whether or not the victim accessed the data during the wait interval. All of this assumes that the data is shared between victim and adversary.

Note that we are not talking about the secret key being shared, but about effectively public data, specifically the lookup tables discussed in the next section.
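
The flush, wait and reload steps can be illustrated with a toy model. This is only a simulation under stated assumptions: the cache is a plain set of line numbers, the latencies are made-up constants, and the names SharedCache and flushReloadRound are invented for this sketch. A real attack relies on cache-flush and cycle-counter instructions that plain JavaScript does not expose.

```javascript
// Made-up latencies in arbitrary units: a cache hit is fast, a miss is slow.
const FAST = 1;
const SLOW = 100;

// Toy cache shared by spy and victim: the set of line numbers currently cached.
class SharedCache {
    constructor() { this.lines = new Set(); }
    flush(line) { this.lines.delete(line); }
    // Access a line, return the simulated latency and leave the line cached.
    access(line) {
        const latency = this.lines.has(line) ? FAST : SLOW;
        this.lines.add(line);
        return latency;
    }
}

// One round of the attack; returns true if the victim touched monitoredLine.
function flushReloadRound(cache, monitoredLine, victim) {
    cache.flush(monitoredLine);                   // 1. flush the line
    victim(cache);                                // 2. wait while the victim runs
    const latency = cache.access(monitoredLine);  // 3. reload and time the access
    return latency < (FAST + SLOW) / 2;           // fast reload => victim used it
}

const cache = new SharedCache();
const touched = flushReloadRound(cache, 7, (c) => c.access(7));
const untouched = flushReloadRound(cache, 7, (c) => c.access(3));
console.log(touched, untouched); // true false
```

In the AES setting the monitored line would hold part of a lookup table, so learning which lines were touched reveals something about the key-dependent table indices.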

Is the AES implementation in OpenSSL vulnerable to cache-timing attacks?

Any cipher relying heavily on S-boxes may be vulnerable to cache-timing attacks. The processor optimizes execution by loading these S-boxes into the cache so that subsequent lookups do not need to load them from main memory. Textbook implementations of these ciphers do not use constant-time lookups when accessing data in the S-boxes and, worse, each lookup depends on a portion of the secret encryption key. AES-128, as per the standard, requires 10 rounds, and each round involves 16 S-box lookups.

The Rijndael designers proposed a method which results in fast software implementations. The core idea is to merge the S-box lookup with another AES operation by switching to larger pre-computed tables. There are still 16 table lookups per round. These 16 lookups are customarily split across 4 tables, so that there are 4 lookups per table and round. Each table consists of 256 32-bit entries. These are referred to as T-tables, and in the case of the current research, the way they are loaded into the cache leads to timing leakage. The leakage as described in the paper is quantified by the probability of a cache line not being accessed as a result of a block operation. As each lookup table, be it an S-box or a pre-computed T-table, consists of 256 entries, the probability is (1-n/256)^m, where n is the number of table elements that fit in a single cache line, and m is the number of references to the given table per block operation. The smaller the probability, the harder it is to mount the attack.
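
To get a feel for the numbers, the formula can be evaluated directly. The sketch below assumes a 64-byte cache line (typical for x86, but an assumption here) and 10 rounds as in AES-128; the function name probLineUntouched is made up for this example.

```javascript
// Probability that a given cache line of a 256-entry table is never touched
// during one block operation: (1 - n/256)^m.
function probLineUntouched(entriesPerLine, lookups) {
    return Math.pow(1 - entriesPerLine / 256, lookups);
}

const CACHE_LINE_BYTES = 64; // assumption: typical x86 cache line size
const ROUNDS = 10;           // AES-128

// T-table: 4-byte entries, 4 lookups per table and round.
const tTable = probLineUntouched(CACHE_LINE_BYTES / 4, 4 * ROUNDS);

// Single S-box: 1-byte entries, 16 lookups per round.
const sBox = probLineUntouched(CACHE_LINE_BYTES / 1, 16 * ROUNDS);

console.log("T-table: " + tTable.toFixed(3));      // about 0.076
console.log("S-box:   " + sBox.toExponential(1));  // about 1.0e-20
```

Under these assumptions a given T-table cache line goes untouched in roughly 1 of 13 block operations, which is often enough to be observable, while for a compact 256-byte S-box the same event is practically never seen.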

Aren’t cache-timing attacks local? How is a virtualized environment affected?

Enter KSM (Kernel Samepage Merging). KSM enables the kernel to examine two or more already running programs and compare their memory. If any memory regions or pages are identical, KSM reduces multiple identical memory pages to a single page. This page is then marked copy on write. If the contents of the page are modified by a guest virtual machine, a new page is created for that guest virtual machine. Because identical pages end up shared between guests, cross-VM cache-timing attacks become possible. You can stop KSM or modify its behaviour. Some details are available here.
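
On Linux, KSM is controlled through sysfs, and the file /sys/kernel/mm/ksm/run holds 0, 1 or 2. As a small sketch, assuming Node.js on the host, the current state can be read like this; the helper names describeKsmRun and readKsmState are made up for the example.

```javascript
const fs = require("fs");

// Standard sysfs location of the KSM run switch on Linux.
const KSM_RUN_PATH = "/sys/kernel/mm/ksm/run";

// Translate the content of the run file into a human-readable state.
function describeKsmRun(raw) {
    switch (String(raw).trim()) {
        case "0": return "stopped (already merged pages stay merged)";
        case "1": return "running";
        case "2": return "stopped and all merged pages unmerged";
        default:  return "unknown";
    }
}

// Read the state; report "not available" where the file cannot be read.
function readKsmState(path = KSM_RUN_PATH) {
    try {
        return describeKsmRun(fs.readFileSync(path, "utf8"));
    } catch (err) {
        return "not available";
    }
}

console.log("KSM state: " + readKsmState());
```

Changing the state requires root: writing 2 to the same file stops KSM and unmerges the pages that were already shared.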

You did not answer my original question: is AES in OpenSSL affected?

In short, no. But not to settle for easy answers, let’s have a close look at how AES in OpenSSL operates. In fact there are several implementations of AES in the OpenSSL codebase, and each one of them may or may not be chosen based on specific run-time conditions. Note: all of this discussion refers to OpenSSL version 1.0.1.

  • Intel Advanced Encryption Standard New Instructions, or AES-NI, is an extension to the x86 instruction set for Intel and AMD processors, introduced in 2008. Intel processors from Westmere onwards and AMD processors from Bulldozer onwards support it. The purpose of AES-NI is to allow AES to be performed by dedicated circuitry. No cache is involved, and hence it’s immune to cache-timing attacks. OpenSSL uses AES-NI by default, unless it’s disabled on purpose. Some hypervisors mask the AES-NI capability bit, which is customarily done to make sure that guests can be freely migrated within a heterogeneous cluster/farm. In those cases OpenSSL will resort to other implementations in its codebase.
  • If AES-NI is not available, OpenSSL will use either Vector Permutation AES (VPAES) or Bit-sliced AES (BSAES), provided the SSSE3 instruction set extension is available. SSSE3 was first introduced in 2006, so there is a fair chance that it will be available in most computers in use. Both of these techniques avoid data- and key-dependent branches and memory references, and are therefore immune to known timing attacks. VPAES is used for CBC encrypt, ECB and “obscure” modes like OFB and CFB, while BSAES is used for CBC decrypt, CTR and XTS.
  • Finally, if your processor supports neither AES-NI nor SSSE3, OpenSSL falls back to integer-only assembly code. Unlike widely used T-table implementations, this code path uses a single 256-byte S-box. This means that the probability of a cache line not being accessed as a result of a block operation would be (1-64/256)^160 ≈ 1e-20. We say “would be” because the actual probability is even lower, in fact zero: the S-box is fully prefetched, and this is done in every round.
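
The selection order in the list above can be summarized in a few lines. This is a sketch of the decision as described in this post, not OpenSSL’s actual dispatch code. The flag names “aes” and “ssse3” follow /proc/cpuinfo on Linux, and the function name and mode labels are assumptions made for the example.

```javascript
// Modes for which BSAES is used, per the list above (labels are hypothetical).
const BSAES_MODES = new Set(["cbc-decrypt", "ctr", "xts"]);

// Pick the AES code path from the CPU flags and the requested mode.
function selectAesImplementation(cpuFlags, mode) {
    const flags = new Set(cpuFlags);
    if (flags.has("aes")) {
        return "AES-NI";                        // dedicated circuitry, no tables
    }
    if (flags.has("ssse3")) {
        return BSAES_MODES.has(mode) ? "BSAES" : "VPAES";
    }
    return "integer-only assembly";             // single prefetched S-box
}

console.log(selectAesImplementation(["sse2", "ssse3", "aes"], "ctr"));  // AES-NI
console.log(selectAesImplementation(["sse2", "ssse3"], "ctr"));         // BSAES
console.log(selectAesImplementation(["sse2", "ssse3"], "cbc-encrypt")); // VPAES
console.log(selectAesImplementation(["sse2"], "ctr"));  // integer-only assembly
```

On a guest where the hypervisor masks the AES-NI bit, the “aes” flag is simply absent, so the second or third branch applies even though the physical CPU supports AES-NI.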

For completeness’ sake it should be noted that OpenSSL does include a reference C implementation which has no mitigations against cache-timing attacks. This is platform-independent fall-back code that is used on platforms with no assembly modules, as well as in cases when the assembler fails for some reason. On a side note, OpenSSL keeps the assembler requirements for AES-NI and SSSE3 really minimal. In fact the code can be assembled on Fedora 1, even though assembler support for these instructions was added later.

The bottom line is that if you are using a Linux distribution which comes with OpenSSL binaries, there is a very good chance that the packagers have taken pains to ensure that the reference C implementation is not compiled in. (The same holds if you download the OpenSSL source code and compile it yourself.)

It’s not clear from the research paper how the researchers were able to conduct the side-channel attack. All evidence suggests that they ended up using the standard reference C implementation of AES instead of the assembly modules, which have mitigations in place. The researchers were contacted but did not respond on this point. Anyone using an OpenSSL binary they built themselves using the defaults, or one precompiled as part of a Linux distribution, should not be vulnerable to these attacks.