Debugging stack protector failures

Co-contributors: Dhiru Kholia and Florian Weimer

GCC upstream and Fedora 19 recently improved the stack smashing protector. Each time we add more security instrumentation, we also uncover some previously hidden bugs. This post shows how to debug stack protector failures.

Our example debugging session is based on a GNOME bug report for Evolution. Vadim Rutkovsky reported that Evolution 3.9.4 in Fedora rawhide crashed during the initial setup when built with the -fstack-protector-strong flag.

The crash in question looked like this:

$ evolution

*** stack smashing detected ***: evolution terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x3cb230c9d7]
/lib64/libc.so.6(__fortify_fail+0x0)[0x3cb230c9a0]
/usr/lib64/evolution/3.10/modules/module-mail-config.so(+0x6bd5)⤵
⤷[0x7f1cb9223bd5]
/usr/lib64/evolution/3.10/libevolution-mail.so⤵
⤷(e_mail_config_service_page_add_scratch_source+0x344)[0x7f1cc2d4cf64]
/usr/lib64/evolution/3.10/libevolution-mail.so(+0x46394)⤵
⤷[0x7f1cc2d3f394]
/usr/lib64/evolution/3.10/modules/module-startup-wizard.so(+0x486b)⤵
⤷[0x7f1cb8c0686b]
/lib64/libgobject-2.0.so.0(g_object_newv+0x6d5)[0x3cb5215f95]
/lib64/libgobject-2.0.so.0(g_object_new_valist+0x1b6)[0x3cb52162e6]
/lib64/libgobject-2.0.so.0(g_object_new+0xd4)[0x3cb5216654]
/usr/lib64/evolution/3.10/modules/module-startup-wizard.so(+0x43ca)⤵
⤷[0x7f1cb8c063ca]
/lib64/libgobject-2.0.so.0(g_closure_invoke+0x138)[0x3cb520fa28]
/lib64/libgobject-2.0.so.0[0x3cb5220a3d]
/lib64/libgobject-2.0.so.0(g_signal_emit_valist+0xef9)[0x3cb5228829]
/lib64/libgobject-2.0.so.0(g_signal_emit+0x82)[0x3cb5228a72]
evolution(main+0x527)[0x404197]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x3cb2221b75]
evolution[0x4046a9]

Notice the message “stack smashing detected”, which implies that GCC’s stack protector feature is being used.

The stack smashing protector in GCC

The GCC flags -fstack-protector and -fstack-protector-all activate the Stack Smashing Protector (SSP). When either of these flags is used, GCC instruments the function return instruction with a probabilistic check that the stack frame is not corrupted. This happens before the jump to the return address popped off the stack, and is intended to make exploitation of stack-based buffer overflows for arbitrary code execution more difficult.

The stack protector uses a canary value in the stack frame. The inserted check fails if the canary differs from the expected value loaded from a global variable.
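The check can be imitated by hand to make the idea concrete. The following is only a simulation, not what GCC emits: the names fake_frame, EXPECTED_CANARY, frame_enter, frame_write, and frame_is_intact are invented for this sketch, and the real canary is a per-process random value rather than a constant.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed guard value; the real one is random per process. */
static const uint64_t EXPECTED_CANARY = 0x00f1e2d3c4b5a697ULL;

/* Simulated stack frame: a local buffer directly followed by the canary. */
struct fake_frame {
    char buf[8];
    uint64_t canary;
};

/* Simulated prologue: clear the buffer and store the canary in the frame. */
void frame_enter(struct fake_frame *f)
{
    memset(f->buf, 0, sizeof f->buf);
    f->canary = EXPECTED_CANARY;
}

/* Simulated write of n bytes starting at buf. Anything beyond
   sizeof buf spills into the canary, like an overflowing local. */
void frame_write(struct fake_frame *f, const void *src, size_t n)
{
    assert(n <= sizeof *f);   /* stay inside the simulated frame */
    memcpy(f, src, n);        /* buf is the first member of the frame */
}

/* Simulated epilogue: returns 1 if the canary is unchanged, 0 where
   instrumented code would call __stack_chk_fail(). */
int frame_is_intact(const struct fake_frame *f)
{
    return f->canary == EXPECTED_CANARY;
}
```

Writing 8 bytes with frame_write leaves the frame intact, while writing 10 bytes changes the canary and makes frame_is_intact return 0.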

Previously, GCC offered only two stack protector modes, -fstack-protector and -fstack-protector-all. GCC 4.9 and the GCC version in Fedora 19 added another mode, -fstack-protector-strong, bringing the number of stack protector modes up to three. These modes differ in the set of functions they consider eligible for instrumentation. -fstack-protector-all instruments all functions (including leaf functions that do not use pointers). For performance reasons, -fstack-protector instruments only a small subset of all functions, which leaves out some functions that should have instrumentation.

The new -fstack-protector-strong mode adds instrumentation to almost all functions which can theoretically corrupt the stack frame (exceptions are large allocas and variable-length arrays and the named return value optimization in C++). The design is described in New stack protector option for gcc.
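To illustrate how the three modes differ, here is a small sketch with one function per category. The function names are made up, and the comments describe what we would expect from GCC's documented heuristics; the actual outcome can vary with the optimization level (inlining may remove the address-taken local, for example). Compiling with gcc -S and looking for calls to __stack_chk_fail is a way to check.

```c
#include <assert.h>
#include <string.h>

/* Expected to be instrumented in all three modes: the character array
   is well above the default 8-byte threshold of -fstack-protector. */
size_t copy_name(const char *src)
{
    char buf[64];
    assert(src != NULL);
    strncpy(buf, src, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    return strlen(buf);
}

/* Helper that writes through a pointer supplied by the caller. */
static void fill(int *out)
{
    *out = 42;
}

/* Expected to be instrumented by -fstack-protector-strong and
   -fstack-protector-all only: no array, but the address of a local
   variable is passed to another function. */
int answer(void)
{
    int value = 0;
    fill(&value);
    return value;
}

/* Expected to be instrumented by -fstack-protector-all only:
   no arrays and no local address taken. */
int add(int a, int b)
{
    return a + b;
}
```

The second function is the interesting one: it has the same shape as the Evolution code discussed later, where a callee writes through a pointer to a scalar local.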

Currently, all Fedora 19 packages are built with -fstack-protector, and all packages in Fedora 20 and later are built with the new -fstack-protector-strong flag turned on.

In some cases, this exposes latent bugs, and we return to our debugging session in the next section.

A sample debugging session

After installing the corresponding -debuginfo package (evolution-debuginfo-3.9.4-1 in this case) and running evolution under GDB, we got the following resolved backtrace.

(gdb) bt
#0  0x0000003cb2235a19 in __GI_raise (sig=sig@entry=6)⤵
⤷ at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x0000003cb2237128 in __GI_abort () at abort.c:90
#2  0x0000003cb2275d47 in __libc_message (do_abort=do_abort@entry=2,⤵
⤷ fmt=fmt@entry=0x3cb237c85a "*** %s ***: %s terminated\n")⤵
⤷ at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x0000003cb230c9d7 in __GI___fortify_fail⤵
⤷ (msg=msg@entry=0x3cb237c842 "stack smashing detected")⤵
⤷ at fortify_fail.c:31
#4  0x0000003cb230c9a0 in __stack_chk_fail () at stack_chk_fail.c:28
#5  0x00007fffd764abd5 in mail_config_smtp_backend_insert_widgets⤵
⤷ (backend=0x126b2c0, parent=0x12711c0)⤵
⤷ at e-mail-config-smtp-backend.c:286
#6  0x00007fffe92d9f64 in⤵
⤷ e_mail_config_service_page_add_scratch_source⤵
⤷ (page=0x118f1f0, scratch_source=scratch_source@entry=0x126f0f0,⤵
⤷    opt_collection=opt_collection@entry=0x0)⤵
⤷ at e-mail-config-service-page.c:831
#7  0x00007fffe92cc394 in mail_config_assistant_constructed⤵
⤷ (object=<optimized out>) at e-mail-config-assistant.c:752
#8  0x00007fffd702d86b in startup_assistant_constructed⤵
⤷ (object=0x9d80e0) at e-startup-assistant.c:118
#9  0x0000003cb5215f95 in g_object_newv ()⤵
⤷ from /lib64/libgobject-2.0.so.0
#10 0x0000003cb52162e6 in g_object_new_valist ()⤵
⤷ from /lib64/libgobject-2.0.so.0
#11 0x0000003cb5216654 in g_object_new ()⤵
⤷ from /lib64/libgobject-2.0.so.0
#12 0x00007fffd702dbac in e_startup_assistant_new⤵
⤷ (session=<optimized out>) at e-startup-assistant.c:240
#13 0x00007fffd702d3ca in startup_wizard_new_assistant⤵
⤷ (extension=0xd93090) at evolution-startup-wizard.c:98
#14 startup_wizard_run (extension=0xd93090)⤵
⤷ at evolution-startup-wizard.c:173
#15 startup_wizard_load_accounts (extension=0xd93090)⤵
⤷ at evolution-startup-wizard.c:274
#16 0x0000003cb520fa28 in g_closure_invoke ()⤵
⤷ from /lib64/libgobject-2.0.so.0
#17 0x0000003cb5220a3d in signal_emit_unlocked_R ()⤵
⤷ from /lib64/libgobject-2.0.so.0
#18 0x0000003cb5228829 in g_signal_emit_valist ()⤵
⤷ from /lib64/libgobject-2.0.so.0
#19 0x0000003cb5228a72 in g_signal_emit ()⤵
⤷ from /lib64/libgobject-2.0.so.0
#20 0x0000000000404197 in main (argc=1, argv=0x7fffffffd758)⤵
⤷ at main.c:682

The offending code seems to be in the mail_config_smtp_backend_insert_widgets function (see frame 5). We were not familiar with the code base and needed to track down the code responsible for overwriting the stack canary. The location of the stack protector failure is only a very rough indicator because the actual write to the canary happened earlier, perhaps in a nested function call. Ideally, we want to halt execution as soon as the write happens.

This is possible with a GDB feature called watchpoints. Setting the watchpoint is a bit tricky here: we can only arm it once the canary has been written to the stack frame of mail_config_smtp_backend_insert_widgets, the function in which the crash happens.

Let's use GDB to take a look at the function prologue.

$ gdb evolution
(gdb) break mail_config_smtp_backend_insert_widgets
(gdb) run
…
(gdb) disassemble

Dump of assembler code for function⤵
⤷ mail_config_smtp_backend_insert_widgets:
=> 0x00007fffd764a120 <+0>:     push   %r15
   0x00007fffd764a122 <+2>:     mov    %rdi,%r15
   0x00007fffd764a125 <+5>:     push   %r14
   0x00007fffd764a127 <+7>:     mov    %rsi,%r14
   0x00007fffd764a12a <+10>:    push   %r13
   0x00007fffd764a12c <+12>:    push   %r12
   0x00007fffd764a12e <+14>:    push   %rbp
   0x00007fffd764a12f <+15>:    push   %rbx
   0x00007fffd764a130 <+16>:    sub    $0x78,%rsp
   0x00007fffd764a134 <+20>:    mov    %rdi,0x50(%rsp)
   0x00007fffd764a139 <+25>:    mov    %rsi,0x30(%rsp)
   0x00007fffd764a13e <+30>:    mov    %fs:0x28,%rax
   0x00007fffd764a147 <+39>:    mov    %rax,0x68(%rsp)
   0x00007fffd764a14c <+44>:    xor    %eax,%eax
   0x00007fffd764a14e <+46>:    callq  0x7fffd76476a0⤵
⤷ <e_mail_config_smtp_backend_get_type@plt>

The stack canary value is copied to the stack by the following lines:

0x00007fffd764a13e <+30>:    mov    %fs:0x28,%rax
0x00007fffd764a147 <+39>:    mov    %rax,0x68(%rsp)

On 32-bit platforms, you will see mov %gs:0x14,%eax
instead of mov %fs:0x28,%rax.

So we need to activate our stack canary watch after hitting the instruction at address 0x00007fffd764a147. We also print the address of that canary.

(gdb) run
…
(gdb) break *0x00007fffd764a147
(gdb) continue
Breakpoint 2, 0x00007fffd764a147⤵
⤷ in mail_config_smtp_backend_insert_widgets⤵
⤷ (backend=0x126a280, parent=0x12701a0)⤵
⤷ at e-mail-config-smtp-backend.c:5
(gdb) print $rsp + 0x68
$1 = (void *) 0x7fffffffcd58

Thus the canary value we are interested in is stored at address 0x7fffffffcd58. We can set the required watchpoint and continue running the program.

(gdb) watch *$1
Hardware watchpoint 3: *0x7fffffffcd58
(gdb) continue
Continuing.
…
Hardware watchpoint 3: *0x7fffffffcd58

Old value = 1851354880
New value = 1851326464
0x0000003cb523616d in value_lcopy_int ()⤵
⤷ from /lib64/libgobject-2.0.so.0

(gdb) bt
#0  0x0000003cb523616d in value_lcopy_int ()⤵
⤷ from /lib64/libgobject-2.0.so.0
#1  0x0000003cb5216e58 in g_object_get_valist ()⤵
⤷ from /lib64/libgobject-2.0.so.0
#2  0x0000003cb5217267 in g_object_get ()⤵
⤷ from /lib64/libgobject-2.0.so.0
#3  0x00007fffd764aa6b in mail_config_smtp_backend_insert_widgets⤵
⤷ (backend=0x126a280, parent=0x12701a0)⤵
⤷ at e-mail-config-smtp-backend.c:242
#4  0x00007fffe92d9f64⤵
⤷ in e_mail_config_service_page_add_scratch_source⤵
⤷ (page=0x118e230, scratch_source=scratch_source@entry=0x126e0a0,⤵
⤷ opt_collection=opt_collection@entry=0x0)
 at e-mail-config-service-page.c:831
…

Aha! The offending line seems to be e-mail-config-smtp-backend.c:242:

g_object_get (G_OBJECT (settings), "port", &port, NULL);

Note that port is defined as guint16 earlier. The GObject attribute, however, is a 32-bit unsigned integer, so g_object_get writes four bytes instead of the expected two.
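One way to avoid this class of bug is to receive the value in a variable of the width the getter actually writes, and narrow it afterwards. The sketch below does not use GLib and is not necessarily how Evolution was fixed: get_port_property is a hypothetical stand-in for a getter that stores a 32-bit value through its argument, and read_port is an invented wrapper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a getter such as g_object_get() on a
   32-bit unsigned property: it always stores sizeof(uint32_t) bytes. */
void get_port_property(uint32_t *out)
{
    *out = 587;
}

/* Buggy pattern, shown only as a comment because it writes out of bounds:
 *
 *     uint16_t port;
 *     get_port_property((uint32_t *) &port);   // 4 bytes into a 2-byte object
 */

/* Safer pattern: receive the full 32 bits, then narrow explicitly. */
uint16_t read_port(void)
{
    uint32_t wide = 0;
    get_port_property(&wide);
    assert(wide <= UINT16_MAX);   /* port numbers fit in 16 bits */
    return (uint16_t) wide;
}
```

With a variadic interface such as g_object_get, the compiler cannot check the pointer type for us, so the width of the local variable has to be matched to the property type by hand.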

Little endian to the rescue

Why did this work before? On a little-endian architecture such as x86_64, the port variable looks like this:

+-------------------------------+
| LSB (bit 0)  ... (bit 15) MSB |
+-------------------------------+

When a 32-bit value is written at this location, the effect is like this:

+-------------------------------+·······························.
| LSB (bit 0)    ...    (bit 15) (bit 16)          (bit 31) MSB :
+-------------------------------+·······························'

Crucially, the first two bytes containing the lower 16 bits receive the expected value. Since port numbers are less than 65,536, the immediately following two bytes are overwritten with zeros. (We believe that this is just an ordinary bug without security impact.) Depending on the stack layout, this can be without consequence, but -fstack-protector-strong puts the canary right next to the port variable, so the out-of-bounds write is suddenly detected.

This has always been a real bug on big-endian architectures such as ppc64. In that case, the memory layout looks like this:

+-------------------------------+
| MSB (bit 15)  ... (bit 0) LSB |
+-------------------------------+

+-------------------------------+·······························.
| MSB (bit 31)    ...   (bit 16) (bit 15)           (bit 0) LSB :
+-------------------------------+·······························'

Therefore, the 16-bit port variable was always zero because it received the upper half of the written value, not the lower half, independently of the actual setting.
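The effect of the byte order can be reproduced without any out-of-bounds write by copying the first two bytes of a 32-bit value into a 16-bit variable. This is a minimal sketch; the helper names first_two_bytes and is_little_endian are made up for the illustration.

```c
#include <stdint.h>
#include <string.h>

/* Returns what a 16-bit variable would hold if a 32-bit store of
   `written` started at its address: the first two bytes in memory. */
uint16_t first_two_bytes(uint32_t written)
{
    uint16_t port;
    memcpy(&port, &written, sizeof port);
    return port;
}

/* Runtime byte-order probe: 1 on little-endian, 0 otherwise. */
int is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

On a little-endian machine, first_two_bytes(587) yields 587, because the low half of the value comes first in memory. On a big-endian machine it yields 0, which matches the always-zero port described above.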

Another failure mode

The stack protector instrumentation changes the register contents when the function is left, compared to the uninstrumented version of the function. With correct code, this is not visible. But if you call a function returning void through a function pointer which has a non-void result type and examine the result, you can see the difference. Similarly, a non-void function which does not explicitly return a value can expose register contents, but GCC will warn about a missing return statement in this case.

The former issue, mismatched function pointers, turned out to be the cause of this GNOME bug in Rhythmbox. The fix for this type of bug is to adjust the function to return the correct type, and add a return statement with an appropriate value. Again, this bug had no known security implications.
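The corrected shape of such a callback might look like the sketch below. It is not the Rhythmbox code: the filter_fn type and the keep_even and count_kept functions are hypothetical and only show a callback whose definition matches the function pointer type its caller uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: callers examine the returned value. */
typedef int (*filter_fn)(int item);

/* Buggy pattern, shown only as a comment: a function defined as
 *
 *     void keep_even(int item) { ... }
 *
 * and cast to filter_fn leaves the "result" to whatever happens to be
 * in the return register, which stack protector code can change.
 */

/* Fixed callback: the declared type matches filter_fn and every path
   returns an explicit value. */
int keep_even(int item)
{
    return item % 2 == 0;
}

/* Caller that relies on the callback's return value. */
size_t count_kept(const int *items, size_t n, filter_fn keep)
{
    size_t kept = 0;
    assert(keep != NULL);
    for (size_t i = 0; i < n; i++) {
        if (keep(items[i]))
            kept++;
    }
    return kept;
}
```

Passing the callback without a cast, as count_kept(items, n, keep_even) does, lets the compiler diagnose a mismatched signature instead of hiding it.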

Conclusion

As we have seen, the increased stack protector coverage in recent GCC and Fedora versions not only increases security, but it also uncovers real bugs which previously only affected a subset of the architectures on which Fedora runs. Fortunately, these bugs are not too difficult to isolate and address. In this way, the stack protector instrumentation indirectly improves the overall quality of the software shipped in Fedora.