SSL/TLS Everywhere – visions of a secure OpenStack

As most people familiar with OpenStack are already aware, it is made up of many software components that are typically deployed in a distributed manner.  The more scalable an OpenStack deployment is, the more distributed the underlying components are as the infrastructure is usually scaled out horizontally on commodity hardware.  As a consequence of this distributed architecture, there are many communication channels used between all of the software components.  We have users communicating with the services via REST APIs and Dashboard, services communicating with each other via REST APIs and the message queue, services accessing databases, and so on.  One only needs to look at the following simplified diagram to get an idea of the number of communication channels that there are.

[Figure: OpenStack Havana logical architecture]

Knowing about all of this communication taking place in an OpenStack deployment should raise a few questions.  What communication channels need to be secured, and how can it be done?  The OpenStack Security Guide attempts to address these questions at a high-level.  The guidance can be summarized as “use SSL/TLS on both public facing and internal networks”.  If you talk to those deploying OpenStack though, you will find that there are many different opinions on where and how SSL/TLS should be used.  For example, some deployments will use SSL/TLS on public facing proxies only, leaving traffic on their internal networks in the clear.  I don’t think that anyone really thinks that having unencrypted traffic on internal networks is more secure than encrypting it, but there are some with the opinion that it is unnecessary due to network security being “good enough”.  I also think that technical difficulties in setting up SSL/TLS to protect all of these communication channels is a factor, especially when you start adding in complexities with load balancing and highly-available deployments.  If actually deploying with SSL/TLS everywhere is too difficult, it makes it easier to accept the compromise of relying on network security alone internally.  This is far from ideal.

The first thing one should do when evaluating their OpenStack SSL/TLS needs is to identify the threats.  You can divide these threats into external and internal attacker categories, but the lines tend to get blurred since certain components of OpenStack operate on both the public and management networks.

For publicly facing services, the threats are pretty straight-forward.  Users will be authenticating against Horizon and Keystone with their username and password.  Users will also be accessing the API endpoints for other services using their Keystone tokens.  If this network traffic is unencrypted, passwords and tokens can be intercepted by an attacker using a man-in-the-middle attack.  The attacker can then use these valid credentials to perform malicious operations.  All real deployments should be using SSL/TLS to protect publicly facing services.
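On the client side, the defense against this interception is to verify the server’s certificate on every connection.  A minimal stdlib sketch (the Keystone endpoint name in the comment is hypothetical):

```python
import ssl

# A TLS context that verifies the server's certificate chain and hostname.
# If a man-in-the-middle presents a forged certificate, the handshake fails
# before any password or token is ever sent.
context = ssl.create_default_context()

# create_default_context() enables both checks by default:
assert context.verify_mode == ssl.CERT_REQUIRED
assert context.check_hostname

# An HTTPS request to a (hypothetical) Keystone endpoint would then use it:
# http.client.HTTPSConnection("keystone.example.com", 5000, context=context)
```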

For services that are deployed on internal networks, the threats aren’t so clear due to the bridging of security domains previously mentioned.  There is always the chance that an administrator with access to the management network decides to do something malicious.  SSL/TLS isn’t going to help in this situation if the attacker is allowed to access the private key.  Not everyone on the management network would be allowed to access the private key of course, so there is still a lot of value in using SSL/TLS to protect yourself from internal attackers.  Even if everyone that is allowed to access your management network is 100% trusted, there is still a threat that an unauthorized user gains access to your internal network by exploiting a misconfiguration or software vulnerability.  One must keep in mind that you have users running their own code on instances in the OpenStack Compute nodes, which are deployed on the management network.  If a vulnerability allows them to break out of the hypervisor, they will have access to your management network.  Using SSL/TLS on the management network can minimize the damage that an attacker can cause.

It is generally accepted that it is best to encrypt sensitive data as early as possible and decrypt it as late as possible.  Despite this best practice, it seems to be common to use an SSL/TLS proxy in front of the OpenStack services and leave the traffic in the clear afterwards:

[Figure: SSL/TLS proxy in front of the OpenStack services]

Let’s look at some of the reasons for the use of SSL/TLS proxies as pictured above:

  • Native SSL/TLS in OpenStack services does not perform/scale as well as SSL proxies (particularly for Python implementations like Eventlet).
  • Native SSL/TLS in OpenStack services is not as well scrutinized/audited as more proven solutions.
  • Native SSL/TLS configuration is difficult (not well documented, tested, or consistent across services).
  • Privilege separation (OpenStack service processes should not have direct access to private keys used for SSL/TLS).
  • Traffic inspection needs for load balancing.

All of the above are valid concerns, but none of them prevent SSL/TLS from being used on the management network.  Let’s consider the following deployment model:

[Figure: SSL/TLS proxy co-located with the API endpoint]

This is very similar to the previous diagram, but the SSL/TLS proxy is on the same physical system as the API endpoint.  The API endpoint would be configured to only listen on the local network interface.  All remote communication with the API endpoint would go through the SSL/TLS proxy.  With this deployment model, we address a number of the bullet points above.  A proven SSL implementation that performs well would be used.  The same SSL proxy software would be used for all services, so SSL configuration for the API endpoints would be consistent.  The OpenStack service processes would not have direct access to the private keys used for SSL/TLS, as you would run the SSL proxies as a different user and restrict access using permissions (and additionally mandatory access controls using something like SELinux).  We would ideally have the API endpoints listen on a Unix socket such that we could restrict access to it using permissions and mandatory access controls as well.  Unfortunately, this doesn’t seem to work currently in Eventlet from my testing.  It is a good future development goal.
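The “listen on the local network interface only” part of this model can be sketched in a few lines of stdlib Python; any service framework that lets you choose the bind address works the same way:

```python
import socket

# Sketch of the co-located model: the API endpoint binds only to the
# loopback interface, so every remote request must traverse the SSL/TLS
# proxy running on the same host.
endpoint = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
endpoint.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
endpoint.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
endpoint.listen(5)

host, port = endpoint.getsockname()
# The proxy forwards decrypted traffic to 127.0.0.1:<port>; the service
# is simply not reachable on any external interface.
```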

What about high availability or load balanced deployments that need to inspect traffic?  The previous deployment model wouldn’t allow for deep packet inspection since the traffic is encrypted.  If the traffic only needs to be inspected for basic routing purposes, it might not be necessary for the load balancer to have access to the unencrypted traffic.  HAProxy has the ability to extract the SSL/TLS session ID during the handshake, which can then be used to achieve session affinity.  HAProxy can also use the TLS Server Name Indication (SNI) extension to determine where traffic should be routed to.  These features likely cover some of the most common load balancer needs.  HAProxy would be able to just pass the HTTPS traffic straight through to the API endpoint systems in this case:

[Figure: HA deployment with HAProxy passing HTTPS traffic through to the API endpoints]

What if you want cryptographic separation of your external and internal environments?  A public cloud provider would likely want their public facing services (or proxies) to use certificates that are issued by a CA that chains up to a trusted Root CA that is distributed in popular web browser software for SSL/TLS.  For the internal services, one might want to instead use their own PKI to issue certificates for SSL/TLS.  This cryptographic separation can be accomplished by terminating SSL at the network boundary, then re-encrypting using the internally issued certificates.  The traffic will be unencrypted for a brief period on the public facing SSL/TLS proxy, but it will never be transmitted over the network in the clear.  The same re-encryption approach that is used to achieve cryptographic separation can also be used if deep packet inspection is really needed on a load balancer.  Here is what this deployment model would look like:

[Figure: HA deployment with SSL/TLS termination and re-encryption]

As with most things, there are trade-offs.  The main trade-off is going to be between security and performance.  Encryption has a cost, but so does being hacked.  The security and performance requirements are going to be different for every deployment, so how SSL/TLS is used will ultimately be an individual decision.
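The cryptographic separation described above boils down to using two distinct trust stores.  A sketch with Python’s ssl module (the internal CA file path is hypothetical):

```python
import ssl

# Cryptographic separation sketch: external clients verify the proxy against
# the system's bundle of public root CAs, while internal hops trust only a
# private, internally operated CA.

# Context used by external clients talking to the public-facing proxy.
external_ctx = ssl.create_default_context()

# Context used by the proxy when re-encrypting toward internal endpoints.
# PROTOCOL_TLS_CLIENT enables certificate and hostname verification by
# default, so the internal hop is authenticated as well.
internal_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
# internal_ctx.load_verify_locations("/etc/pki/internal-ca.pem")  # hypothetical path

# A certificate issued by the internal CA would fail validation in
# external_ctx, and a publicly issued one would fail in internal_ctx once
# it is loaded with only the internal CA: the two domains stay separate.
```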

What can be done in the OpenStack community to ensure that a secure deployment is as friendly as possible?  After all, many of the deployment models described above don’t even use components of OpenStack to implement SSL/TLS.

On the documentation side of things, we can improve the OpenStack Security Guide to go into more detail about secure reference architectures.  There’s no coverage of load balancers and highly available deployments with SSL/TLS, which would be a nice topic to cover.  Nearly everything in the deployment models described above should work today.

On the development side of things, there are a number of areas where improvements can be made.  I’ve focused on the server side SSL/TLS implementation of the API endpoints, but the OpenStack services all have client-side SSL/TLS implementations that are used when communicating with each other.  Many of the improvements we can make are on the SSL/TLS client side of things:

  • SSL/TLS client support in the OpenStack services isn’t well tested currently, as Devstack doesn’t have the ability to automatically configure the services for SSL/TLS.
  • Tempest should perform SSL/TLS testing to ensure that everything remains working for secure deployments.
  • The HTTP client implementations and configuration steps for SSL/TLS vary between OpenStack services.  We should standardize in these areas for feature parity and ease of configuration.
  • OpenStack services should support listening on Unix sockets instead of network interfaces.  This would allow them to be locked down more securely when co-located with a SSL/TLS proxy.
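The last bullet is worth illustrating.  A service bound to a Unix socket can be locked down with ordinary file permissions so that only the co-located proxy’s user can reach it.  A sketch with the stdlib http.server (the handler and socket path are made up for the example):

```python
import os
import socket
import socketserver
import tempfile
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Sketch: an API endpoint listening on a filesystem socket locked down to
# its owner, instead of a TCP port. A co-located SSL/TLS proxy (running as
# the same or a permitted user) would be the only peer able to connect.

class UnixHTTPServer(HTTPServer):
    address_family = socket.AF_UNIX

    def server_bind(self):
        socketserver.TCPServer.server_bind(self)
        os.chmod(self.server_address, 0o600)  # owner-only access

    def get_request(self):
        request, _ = self.socket.accept()
        return request, ("local", 0)  # AF_UNIX peers have no host/port

class PingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"pong")

    def log_message(self, *args):  # keep the example quiet
        pass

sock_path = os.path.join(tempfile.mkdtemp(), "api.sock")
server = UnixHTTPServer(sock_path, PingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Talk to the endpoint over the socket, as the proxy would:
client = socket.socket(socket.AF_UNIX)
client.connect(sock_path)
client.sendall(b"GET /ping HTTP/1.1\r\nHost: localhost\r\n\r\n")
reply = b""
while True:
    chunk = client.recv(4096)
    if not chunk:
        break
    reply += chunk
client.close()
server.shutdown()
```

SELinux can further constrain which domains may connect to the socket, which is the “locked down more securely” part of the bullet above.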

It would be great if we could get some cross-project coordination on working towards these development goals in the Juno cycle, as I really think it would give us a more polished security story around the API endpoints.  I’m hoping to get a chance to discuss this with other interested Stackers at the Summit in Atlanta.

New SELinux Feature: File Name Transitions

In Red Hat Enterprise Linux 7, we have fixed one of the biggest issues with SELinux where initial creation of content by users and administrators can sometimes get the wrong label.

The new feature makes labeling files easier for users and administrators. The goal is to prevent the accidental mislabeling of file objects.

Accidental Mislabeling

Users and administrators often create files or directories that do not have the same label as the parent directory, and then they forget to fix the label. One example of this would be an administrator going into the /root directory and creating the .ssh directory. In previous versions of Red Hat Enterprise Linux, the directory would get created with a label of admin_home_t, even though the policy requires it to be labeled ssh_home_t. Later when the admin tries to use the content of the .ssh directory to log in without a password, sshd (sshd_t) fails to read the directory’s contents because sshd is not allowed to read files labeled admin_home_t. The administrator would need to run restorecon -R -v /root/.ssh to fix the labels, and often they forget to do so.

Another example would be a user creating the public_html directory in his home directory. The default label for content in the home directory is user_home_t, but SELinux requires the public_html directory to be labeled http_user_content_t, which allows the Apache process (httpd_t) to read the content. We block the Apache process from reading user_home_t as valuable information like user secrets and credit-card data could be in the user’s home directory.

File Transitions Policy

Policy writers have always been able to write a file transition rule that includes the type of the process creating the file object (NetworkManager_t), the type of the directory that will contain the file object (etc_t), and the class of the file object (file). They can also specify the type of the created object (net_conf_t):

filetrans_pattern(NetworkManager_t, etc_t, file, net_conf_t)

This policy line says that a process running as NetworkManager_t creating any file in a directory labeled etc_t will create it with the label net_conf_t.

Named File Transitions Policy

Eric Paris added a cool feature to the kernel that allows the kernel to label a file based on four characteristics instead of just three. He added the base file name (not the path).

Now policy writers can write policy rules that state:

  • If the unconfined_t user process creates the .ssh directory in a directory labeled admin_home_t, then it will get created with the label ssh_home_t: filetrans_pattern(unconfined_t, admin_home_t, dir, ssh_home_t, ".ssh")
  • If the staff_t user process creates a directory named public_html in a directory labeled user_home_dir_t, it will get labeled http_user_content_t: filetrans_pattern(staff_t, user_home_dir_t, dir, http_user_content_t, "public_html")

Additionally, we have added rules to make sure that if the kernel creates content in /dev, it will label it correctly rather than waiting for udev to fix the label.

filetrans_pattern(kernel_t, device_t, chr_file, wireless_device_t, "rfkill")

Better Security

This can also be considered a security enhancement, since in Red Hat Enterprise Linux 6, policy writers could only write rules based on the destination directory label. Consider the example above using NetworkManager_t. In Red Hat Enterprise Linux 6, a policy writer would write filetrans_pattern(NetworkManager_t, etc_t, file, net_conf_t), which means the NetworkManager process could create any file that did not already exist in an etc_t directory (/etc). If for some reason the /etc/passwd file did not exist, SELinux policy would not block NetworkManager_t from creating /etc/passwd. In Red Hat Enterprise Linux 7, we can write a tighter policy like this:

filetrans_pattern(NetworkManager_t, etc_t, file, net_conf_t, "resolv.conf")

This states that NetworkManager can only create files named resolv.conf in directories labeled etc_t. If it tries to create the passwd file in an etc_t directory, the named rule does not match, so the policy falls back to checking whether NetworkManager_t is allowed to create generic etc_t files, which it is not.
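The lookup order can be illustrated with a toy model (plain Python, not the actual kernel logic): a rule keyed by file name wins; otherwise a generic rule, if any, applies; otherwise the new file simply inherits the parent directory’s type:

```python
# Toy model of SELinux file-transition lookup: named rules take priority
# over the generic (unnamed) rule for the same (source, target, class).
named_rules = {
    ("NetworkManager_t", "etc_t", "file", "resolv.conf"): "net_conf_t",
}
generic_rules = {
    # In the tighter RHEL 7 policy, the generic NetworkManager_t rule is
    # gone; only the named rule above remains.
}

def transition_type(source, target, klass, name, parent_type):
    named = named_rules.get((source, target, klass, name))
    if named is not None:
        return named
    return generic_rules.get((source, target, klass), parent_type)

# resolv.conf matches the named rule and gets net_conf_t...
assert transition_type("NetworkManager_t", "etc_t", "file",
                       "resolv.conf", "etc_t") == "net_conf_t"
# ...while passwd falls through and would be labeled etc_t, a creation
# the policy then denies.
assert transition_type("NetworkManager_t", "etc_t", "file",
                       "passwd", "etc_t") == "etc_t"
```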

Bottom Line

This feature should result in fewer occurrences of accidental mislabeling by users and hopefully a more secure and better-running SELinux system.

New Red Hat Enterprise Linux 7 Security Feature: systemd-journald

A lot has already been written about systemd-journald. For example, this article describes the security benefits of the journal.

I would argue that systemd-journald is not a full replacement for syslog. The syslog format is ubiquitous, and I don’t see it going away. On all Red Hat Enterprise Linux 7 machines, syslog will still be on by default, because it’s still the de facto mechanism for centralizing your logging data, and most tools that analyze log data read syslog. journald actually makes syslog better: syslog gathers its data from the journal, and because the journal runs from bootup to shutdown, it can feed syslog more data, saving it until the syslog process starts.

When journald was first being created, many people who were working on Structured Logging got all up in arms over it because Lennart Poettering and Kay Sievers did not work with them. Despite that problem, I still like it.

When it comes to launching system apps, systemd has become the central point. It can be thought of as the system’s process manager. It knows more about what is going on in the system than any other process, save for the kernel.

Years ago when the audit system was being built, Karl MacMillan of Tresys believed that some of the problems the audit system was trying to fix could be handled by extending syslog to record more information about the sending process. You see, syslog records very little metadata about who sent a message. The audit subsystem was created to record all of the critical identity data, such as all of the UIDs associated with a process as well as the SELinux context; journald now collects all of this data.

Let me give an example of where systemd-journal could be used to increase security.

SELinux controls what a process is allowed to do based on what it was designed to do, and sometimes even less, depending on the security goals of the policy writer. This means SELinux would prevent a hacked ntpd process from doing anything other than handling Network Time. SELinux would prevent the hacked ntpd from reading MySQL databases or credit-card data from a user’s home directory, even if the ntpd process was running as root. However, since the ntpd process sends syslog messages, SELinux would allow the hacked process to continue to send syslog messages.

The hacked ntpd could format syslog messages to match other daemons and potentially trick an administrator or (even better) a tool that reads the syslog file (like intrusion detection tools) into doing something bad. If all messages were verified with the systemd-journal, then the administrator or syslog analysis tool could see that ntpd_t was sending messages forged as if they were coming from the sshd daemon. The intrusion detection tools, realizing the ntpd daemon had been hacked, could then be coded to recognize those bad messages. Here is what such a forged message looks like in the journal: the SYSLOG_IDENTIFIER field claims sshd, while the trusted _COMM, _EXE, and _SELINUX_CONTEXT fields expose the real sender.

.cursor=s=f328cc4b2615417189ab76b00c7ae041;i=2;b=4c3d0faf6b774fb7930972c1a4a5f87
.realtime=1329940273078467
...skipping...
SYSLOG_IDENTIFIER=sshd
SYSLOG_PID=2302
MESSAGE=sshd Fake message from sshd.
_PID=2302
_UID=0
_GID=0
_COMM=ntpd
_EXE=/usr/sbin/ntpd
_CMDLINE=/usr/sbin/ntpd -n -u ntp:ntp -g
_SYSTEMD_CGROUP=/system/ntpd.service
_SYSTEMD_UNIT=ntpd.service
_SELINUX_CONTEXT=system_u:system_r:ntpd_t:s0
_SOURCE_REALTIME_TIMESTAMP=1330527027590337
_BOOT_ID=4c3d0faf6b774fb7930972c1a4a5f870
_MACHINE_ID=432d8198a8fc421caf2dca48ccde1cf2
_HOSTNAME=x.example.com
