Position Independent Executables (PIE)

The Fedora Engineering Steering Committee maintains a conservative list of packages that must be built using the security features of GCC. Packages not on this list have these security features enabled at the packagers’ discretion. There is currently no consensus in the community as to when security-hardened binaries are necessary. As a result, the use of security-hardened binaries can be a controversial topic. Most arguments reduce to whether the security benefit outweighs the performance overhead of using the feature.

Position Independent Executables (PIE) are an output of the hardened package build process. A PIE binary and all of its dependencies are loaded into random locations within virtual memory each time the application is executed. This makes Return Oriented Programming (ROP) attacks much more difficult to execute reliably. These blog posts showcase the results of a study I did recently that looked at the effect of building applications as PIE. In the study I investigated the overhead incurred in the loader during program startup, with the aim of helping distributions make better security decisions based on technical analysis. I focused on program startup chiefly because that is where PIE has the largest performance impact. Once the process is running, performance is largely comparable to that of standard Dynamic Shared Objects (DSOs) on x86_64 machines, depending on how well the program and its shared libraries have been designed. As this is a security blog, I am biased towards functionality that increases security. However, in the tests that I performed, the start times of a PIE application and a regular application were comparable.
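To see this randomization directly, a sketch along the following lines can be built twice, once as a regular executable and once as a PIE (for example with GCC's `-no-pie` and `-fpie -pie` flags), and each build run a few times. The names `struct layout`, `sample_layout()` and `print_layout()` are hypothetical. Assuming address space layout randomization is enabled (the Linux default), the stack address should change between runs in both builds, while the code and data addresses of the executable itself should only move in the PIE build.

```c
#include <stdio.h>

/* An initialized global, placed in the executable's .data section. */
char message[] = "Hello World";

/* Hypothetical helper type: where a few kinds of objects ended up. */
struct layout {
    void *code;   /* a function in the executable's .text */
    void *data;   /* a global in the executable's .data */
    void *stack;  /* a local variable supplied by the caller */
};

struct layout sample_layout(void *stack_probe)
{
    struct layout l;
    /* Converting a function pointer to void * is not ISO C, but it is
     * supported by GCC and Clang on POSIX platforms such as x86_64 Linux. */
    l.code = (void *)&sample_layout;
    l.data = (void *)message;
    l.stack = stack_probe;
    return l;
}

/* Print the three addresses; call this from main() in each build. */
void print_layout(void)
{
    int probe = 0;
    struct layout l = sample_layout(&probe);
    printf("code  %p\ndata  %p\nstack %p\n", l.code, l.data, l.stack);
}
```

The stack address is taken from a caller's local variable rather than returned from inside `sample_layout()`, since returning the address of a function's own local is undefined behaviour once that function returns.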

One of the more interesting things for me personally whilst doing this work was looking at how compiling with PIE enabled affects the resultant binary. Consider the following “Hello World” program:

#include "not/stdio.h"

char message[] = "Hello World";

int main(int argc, char *argv[], char *envp[])
{
    puts(message);   /* puts() comes from the author's own library, not libc */
    return 0;
}

To reduce other influences, I used my own implementation of the standard library functions during compilation:

$ cc -nostdlib -nodefaultlibs -I. \
                -o static-example

$ size --format=sysv static-example
static-example  :
section     size      addr
.text        420   4194536 
.rodata        2   4194956
.eh_frame    280   4194960
.data         12   6292392
.comment      44         0
Total        758
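The replacement library itself is not shown in the post. As a hypothetical sketch of what such a routine can look like, the `puts()`-style function below avoids libc entirely by issuing the write(2) system call directly. It assumes x86_64 Linux, where `write` is system call number 1 and its arguments are passed in `%rdi`, `%rsi` and `%rdx`; the names `notc_write()` and `notc_puts()` are illustrative. A fully libc-free executable would also have to supply its own `_start` entry point, which is omitted here.

```c
#include <stddef.h>

/* Raw write(2) via the x86_64 Linux syscall instruction (number 1).
 * The kernel clobbers %rcx and %r11, so they are listed as clobbers. */
static long notc_write(int fd, const void *buf, size_t len)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(1L), "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");
    return ret;
}

/* Hypothetical puts() replacement: write s and a newline to stdout (fd 1).
 * Returns the number of bytes written, or -1 on a short or failed write. */
int notc_puts(const char *s)
{
    size_t len = 0;
    while (s[len] != '\0')   /* strlen() without libc */
        len++;
    if (notc_write(1, s, len) != (long)len)
        return -1;
    if (notc_write(1, "\n", 1) != 1)
        return -1;
    return (int)(len + 1);
}
```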

The ELF binary produced by this build has no dependencies on libc or the loader in order to run. This means that it can be loaded into memory and run without relying on the dynamic linker to find and bind its dependencies. However, this makes sharing and reusing routines difficult. The common solution to this problem is to create a shared library:

$ cc -fpic -shared -I. -nostdlib \
              -o libnotc.so

The next step is to recompile the main binary indicating that some symbol definitions exist within an external shared library:

$ cc -nostdlib -nodefaultlibs -I. \
              -o dynamic-example \
              main.c -L. -lnotc

The resulting binary has a smaller .text section, because that code now lives in the shared library libnotc.so. There are some other significant differences:

$ size --format=sysv dynamic-example
dynamic-example  :
section              size      addr
.interp                28   4194816
.note.gnu.build-id     36   4194844
.gnu.hash              48   4194880
.dynsym               144   4194928
.dynstr                46   4195072
.rela.plt              48   4195120
.plt                   48   4195168
.text                  56   4195216
.eh_frame_hdr          28   4195272
.eh_frame              96   4195304
.dynamic              272   6292552
.got.plt               40   6292824
.data                  12   6292864
.comment               44         0
Total                 946
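One of the new sections, .interp, holds the path of the program interpreter, i.e. the dynamic loader that the kernel maps in to run the program. On x86_64 glibc systems this is typically /lib64/ld-linux-x86-64.so.2, whose 27 characters plus a terminating NUL match the 28 bytes listed above. The section is described by a PT_INTERP program header, and the hypothetical `program_interpreter()` below, a sketch assuming Linux with glibc, uses `dl_iterate_phdr()` to find that header in the running program.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>

/* dl_iterate_phdr() callback. Per dl_iterate_phdr(3), the first object
 * visited is the main program; record its PT_INTERP string, if any. */
static int find_interp(struct dl_phdr_info *info, size_t size, void *data)
{
    const char **out = data;
    (void)size;
    for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++) {
        if (info->dlpi_phdr[i].p_type == PT_INTERP) {
            /* p_vaddr is a link-time address; add the load bias. */
            *out = (const char *)(info->dlpi_addr + info->dlpi_phdr[i].p_vaddr);
            break;
        }
    }
    return 1;   /* non-zero stops the iteration after the main program */
}

/* Return the interpreter path, or NULL (e.g. for a static binary). */
const char *program_interpreter(void)
{
    const char *interp = NULL;
    dl_iterate_phdr(find_interp, &interp);
    return interp;
}
```

A statically linked binary such as static-example has no PT_INTERP header, so the function returns NULL there.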

For the program to execute correctly, the ELF binary needs to be constructed in such a way that the loader can resolve symbols at runtime. Because the address of puts() in memory is not known when the main binary is linked, the linker adds a level of indirection through the procedure linkage table (the .plt section). Instead of calling puts() directly, the program calls a small stub in the .plt section which, on the first call, transfers control to the loader. The loader resolves the actual address of the function and stores it in an entry in the Global Offset Table (GOT, here the .got.plt section). Subsequent calls go through the same stub, which now jumps straight to the address stored in that GOT entry.
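The lazy-binding sequence can be modelled in plain C, with a function pointer standing in for the GOT entry. This is only an analogy under simplifying assumptions: the real PLT stubs are a few machine instructions generated by the linker and the real resolver lives inside the dynamic loader, whereas `plt_puts()`, `got_puts`, `resolve_puts()` and the one-entry symbol table here are hypothetical stand-ins.

```c
#include <stdio.h>
#include <string.h>

typedef int (*puts_fn)(const char *);

/* Stand-in for the real puts() exported by a shared library. */
static int lib_puts(const char *s)
{
    return printf("%s\n", s);
}

/* Stand-in for the shared library's dynamic symbol table. */
struct symbol {
    const char *name;
    puts_fn addr;
};
static const struct symbol lib_symbols[] = { { "puts", lib_puts } };

static puts_fn lookup(const char *name)
{
    for (size_t i = 0; i < sizeof lib_symbols / sizeof lib_symbols[0]; i++)
        if (strcmp(lib_symbols[i].name, name) == 0)
            return lib_symbols[i].addr;
    return NULL;
}

static int resolve_puts(const char *s);

/* Model GOT entry: initially points at the resolver, not at puts(). */
static puts_fn got_puts = resolve_puts;
static int resolver_calls = 0;

/* Model resolver: look the symbol up once, patch the GOT entry,
 * then forward the original call. */
static int resolve_puts(const char *s)
{
    resolver_calls++;
    puts_fn target = lookup("puts");
    if (target == NULL)
        return -1;   /* a real loader would report an unresolved symbol */
    got_puts = target;
    return got_puts(s);
}

/* Model PLT stub: every call goes indirectly through the GOT entry. */
int plt_puts(const char *s)
{
    return got_puts(s);
}
```

Calling `plt_puts()` twice runs the resolver only once; the second call costs just the indirect call through `got_puts`, which is the trade-off the GOT is designed around.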

A standard (non-PIE) ELF executable is typically loaded at the same base address in virtual memory each time it is executed. The linker takes advantage of this in non-relocatable code by using the absolute addresses of symbols. This turns out to have a slight performance benefit, as it is quicker to jump to an absolute address than to use relative addressing. This is especially true for i386 applications: i386 has no instruction-pointer-relative data addressing, so position-independent code must tie up an extra general-purpose register (typically %ebx) to hold the address of the GOT.
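One way to see this is to compile a function that touches a global variable to assembly under both models and compare, for example with `cc -m32 -O2 -S -fno-pie` versus `cc -m32 -O2 -S -fpie` (the -m32 builds need 32-bit multilib support installed). The sketch below is deliberately tiny; the comments describe the typical shape of GCC's output, which can vary between compiler versions.

```c
/* A zero-initialized global; how it is addressed depends on the code model. */
int counter;

int bump(void)
{
    /* -m32 -fno-pie: the instruction can name counter's absolute address
     *   directly, e.g. "movl counter, %eax".
     * -m32 -fpie: the compiler first obtains the current address in a
     *   register (typically via a __x86.get_pc_thunk.* helper), derives the
     *   GOT address from it, and reaches counter relative to that register.
     * x86_64: both variants can use %rip-relative addressing, so no extra
     *   register is tied up. */
    return ++counter;
}
```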

To see the difference between the dynamic and PIE applications we need to recompile the example program as a PIE. This simply requires the addition of the -fpic -pie flags to what we had previously:

$ cc -fpic -pie -nostdlib -nodefaultlibs -I. \
              -o pie-example \
              main.c -L. -lnotc
$ size --format=sysv pie-example 
pie-example  :
section              size      addr
.interp                28       512
.note.gnu.build-id     36       540
.gnu.hash              52       576
.dynsym               192       632
.dynstr                54       824
.rela.dyn              24       880
.rela.plt              48       904
.plt                   48       960
.text                  61      1008
.eh_frame_hdr          28      1072
.eh_frame              96      1104
.dynamic              320   2098352
.got                    8   2098672
.got.plt               40   2098680
.data                  12   2098720
.comment               44         0
Total                1091
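The commands above compile with -fpic and link with -pie. GCC documents -fpie as the variant of -fpic for code that will only be linked into executables, which permits some extra optimizations such as binding locally defined symbols directly. A quick, hypothetical way to check which model a translation unit was compiled under is to test the predefined macros: GCC and Clang define `__PIE__` for -fpie/-fPIE, and `__PIC__` for both -fpic/-fPIC and -fpie/-fPIE.

```c
/* Report the code model this translation unit was compiled with.
 * These macros reflect compile-time flags only; whether the final binary
 * is a PIE is decided at link time by -pie or -no-pie. */
const char *compile_mode(void)
{
#if defined(__PIE__)
    return "position-independent executable code (-fpie/-fPIE)";
#elif defined(__PIC__)
    return "position-independent code (-fpic/-fPIC)";
#else
    return "position-dependent code";
#endif
}
```

For pie-example as built above, this would report -fpic code even though the linked result is a PIE.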

Note that the addresses listed by size for each of the PIE's ELF sections are relative to the load base (they start near zero), whereas the addresses listed for dynamic-example are absolute locations. This is necessary because the program and all of its dependencies will be loaded at random locations in virtual memory upon execution. This includes prelinked libraries, so PIE serves as an effective exploit mitigation against attacks that rely on returning to known addresses in standard system libraries. The overhead incurred by this defense mechanism, and ways in which the number of relative relocations can be reduced, will be covered in the next post of this series.
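The relationship between these link-time addresses and the run-time ones can be checked from inside a program. In this hypothetical sketch, assuming Linux with glibc and a file whose byte order matches the host, `elf_type()` reads the `e_type` field of an ELF header (a PIE is `ET_DYN`, like a shared library, while a conventional executable is `ET_EXEC`), and `main_load_bias()` uses `dl_iterate_phdr()` to fetch the offset the loader added to the main program's link-time addresses.

```c
#define _GNU_SOURCE
#include <elf.h>
#include <link.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Return the ELF e_type of the file at path (ET_EXEC, ET_DYN, ...), or -1.
 * e_type immediately follows the 16-byte e_ident in both 32- and 64-bit
 * headers, so the same read works for either ELF class. */
int elf_type(const char *path)
{
    unsigned char ident[EI_NIDENT];
    uint16_t type;
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    int ok = fread(ident, 1, sizeof ident, f) == sizeof ident
          && fread(&type, sizeof type, 1, f) == 1
          && memcmp(ident, ELFMAG, SELFMAG) == 0;
    fclose(f);
    return ok ? type : -1;
}

/* dl_iterate_phdr() callback: the first object visited is the main program. */
static int grab_bias(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    *(uintptr_t *)data = (uintptr_t)info->dlpi_addr;
    return 1;   /* stop after the main program */
}

/* Offset added to the main program's link-time addresses at load time:
 * 0 for an ET_EXEC binary, non-zero (and randomized under ASLR) for a PIE. */
uintptr_t main_load_bias(void)
{
    uintptr_t bias = 0;
    dl_iterate_phdr(grab_bias, &bias);
    return bias;
}
```

For a PIE, adding `main_load_bias()` to an address from the size listing, such as the .data address 2098720 shown for pie-example, gives where that section actually lives in the running process.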
