Regular expressions and recommended practices

Red Hat published on 2015-04-22T13:30:52+00:00, last updated 2015-04-22T13:30:52+00:00

Whenever a security person crosses a vulnerability report, one of the the first steps is to ensure that the reported problem is actually a vulnerability. Usually, the issue falls into well known and studied categories and this step is done rather quickly. Occasionally, however, one can come across bugs where this initial triage is a bit more problematic. This blog post is about such an issue, which will ultimately lead us to the concept of “recommended practice”.

What happened?

On July 31st 2014, Maksymilian Arciemowicz of cxsecurity reported that “C++11 [is] insecure by default.”, with upstream GCC bugs 61601 and 61582. LLVM/Clang's libc++ didn't dodge the bullet either, more details are available in LLVM bug 20291.

Not everybody can be bothered to go through so many links, so here is a quick summary: C++11, a new C++ standard approved in 2011, introduced support for regular expressions. Regular expressions (regexes from here on) are an amazingly powerful processing tool – but one that can become extremely complex to handle correctly. Not only can the regex itself become hideous and hard to understand, but also the way how the regex engine deals with it can lead to all sorts of problems. If certain complex regexes are passed to a regex engine, the engine can quickly out-grow the available CPU and memory constraints while trying to process the expression, possibly leading to a catastrophic event, which some call ReDoS, a “regular expression denial of service”.

This is exactly what Maksymilian Arciemowicz exploits: he passes specially crafted regexes to the regex engines provided by the C++11 implementations of GCC and Clang, causing them to use a huge amount of CPU resources or even crash (e.g. due to extreme recursion, which will exhaust all the available stack space, leading to a stack-overflow).

Is it a vulnerability?

CPU exhaustion and crashes are often good indicators for a vulnerability. Additionally, the C++11 standard even suggests error return codes for the exact problems triggered, but the implementations at hand fail to catch these situations. So, this must be a vulnerability, right? Well, this is the point where opinions differ. In order to understand why, it's necessary to introduce a new concept:

The “recommended practice” concept

“Recommended practice” is essentially a mix of common sense and dos and don'ts. A huge problem is that they are informal, so there's no ultimate guide on the subject, which leaves best practices open to personal experiences and opinion. Nevertheless, the vast majority of the programming community should know about the dangers of regular expressions; dangers just like the issues Maksymilian Arciemowicz reported in GCC/Clang. That said, passing arbitrary, unfiltered regexes from an untrusted source to the regex engine should be considered as a recommended practice case of “don't do this; it'll blow up in your face big time”.

To further clear this up: if an application uses a perfectly reasonable, well defined regex and the application crashes because the regex engine chocked when processing certain specially crafted input, it's (most likely) a vulnerability in the regex engine. However, if the application uses a regex thought to be well defined, efficient and trusted, but turns out to e.g. take overly long to process certain specially crafted input, while other, more efficient regexes will do the job just fine, it's (probably) a vulnerability in the application. But if untrusted regexes are passed to the regex engine without somehow filtering them for sanity first (which is incredibly hard to do for anything but the simplest of regexes, so better to avoid it), it is violating what a lot of people believe to be recommended practice, and thus it is often not considered to be a strict vulnerability in the regex engine.

So, next time you feel inclined to pass regexes verbatim to the engine, you'll hopefully remember that it's not a good idea and refrain from doing so. If you have done so in the past, you should probably go ahead and fix it.

Language English

About The Author

RH Red Hat

Red Hat

This user is used for automation in Pantheon as part of the Docs publishing toolchain.

Select Your Language

Warning message

Regular expressions and recommended practices

About The Author

Red Hat

Comments

Quick Links

Help

Site Info

Related Sites

About

Red Hat legal and privacy links

Red Hat legal and privacy links

Warning message

About The Author

Red Hat

Comments

Quick Links

Help

Site Info

Related Sites

Systems Status

About

Red Hat legal and privacy links

Red Hat legal and privacy links