Given an ELF binary, you can use tools like Radare2 (open-source equivalent of IDA Pro) to try and understand what the binary consists of and what could exactly happen when you execute it. This is commonly known as Reversing.
Understanding what these tools do for you under the hood is a complex task, and writing a tool equivalent to radare2 would take a long time. But thanks to amazing python libraries, we could write a simple script to parse out the relevant sections and get enough information to solve a very simple crackme I created.
For those who don’t know crackmes are binaries (executables) that are written for hackers, who try and understand them, without having access to the source code. For a more information visit crackmes.de.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
char *buffer = malloc(9);
buffer[0] = 'p';
buffer[1] = 'a';
buffer[2] = 's';
buffer[3] = 's';
buffer[4] = 'w';
buffer[5] = 'o';
buffer[6] = 'r';
buffer[7] = 'd';
buffer[8] = '\0';
char *input = malloc(9);
printf("Enter the 8 letter password: ");
fgets(input, 9, stdin);
if (strncmp(input, buffer, 8) == 0) {
printf("You read my memory\n");
} else {
printf("You can't read my memory!\n");
}
return 0;
}
This one is probably a “Hello, World” equivalent of a crackme. But it’ll suffice for now.
Reversing
Let’s start by passing our binary to the file command.
aneesh@aneesh-ubuntu:~/pycon_talk$ file crackme_1
crackme_1: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=8d5e6525579c6c97ae69b5e62dabfc2ec13cc0fd, not stripped
Seems standard. Now, we’re gonna start digging into the binary trying to look for information. Let’s start by getting a list of sections the binary has. To do this, I have written a python script using pyelftools.
import sys
from elftools.elf.elffile import ELFFile
def process_file(filename):
print('Processing file:', filename)
with open(filename, 'rb') as f:
elffile = ELFFile(f)
code_section = '.text'
for section in elffile.iter_sections():
print (section.name)
if __name__ == '__main__':
if len(sys.argv) == 2:
process_file(sys.argv[1])
aneesh@aneesh-ubuntu:~/pycon_talk$ python solve.py crackme_1
Processing file: crackme_1
.interp
.note.ABI-tag
.note.gnu.build-id
.gnu.hash
.dynsym
.dynstr
.gnu.version
.gnu.version_r
.rela.dyn
.rela.plt
.init
.plt
.plt.got
.text
.fini
.rodata
.eh_frame_hdr
.eh_frame
.init_array
.fini_array
.jcr
.dynamic
.got
.got.plt
.data
.bss
.comment
.shstrtab
.symtab
.strtab
There are a few important sections to note here. The text section mostly contains the opcodes that are executed by the processor. The bss section contain the uninitialized variables. The data section is going to have all the strings and other constants used by the program. Let’s now check the .text segment
A couple of minutes of work and we have a script to do that:
import sys
from elftools.elf.elffile import ELFFile
from capstone import *
def process_file(filename):
with open(filename, 'rb') as f:
elffile = ELFFile(f)
code = elffile.get_section_by_name('.text')
opcodes = code.data()
addr = code['sh_addr']
print 'Entry Point:', hex(elffile.header['e_entry'])
md = Cs(CS_ARCH_X86, CS_MODE_64)
for i in md.disasm(opcodes, addr):
print("0x%x:\t%s\t%s" %(i.address, i.mnemonic, i.op_str))
if __name__ == '__main__':
if len(sys.argv) == 2:
process_file(sys.argv[1])
Credits: I am using the capstone engine to find the mapping of opcodes to readable assembly instructions. Its a great project, read more about it here.
aneesh@aneesh-ubuntu:~/pycon_talk$ python solve.py crackme_1 | head -12
Entry Point: 0x400590
0x400590: xor ebp, ebp
0x400592: mov r9, rdx
0x400595: pop rsi
0x400596: mov rdx, rsp
0x400599: and rsp, 0xfffffffffffffff0
0x40059d: push rax
0x40059e: push rsp
0x40059f: mov r8, 0x4007e0
0x4005a6: mov rcx, 0x400770
0x4005ad: mov rdi, 0x400686
0x4005b4: call 0x400550
We now have the disassembly of the program. To decode the addresses before and after the .text segment, we’d need to find where all the sections are loaded. Here’s a script to do that:
import sys
from elftools.elf.elffile import ELFFile
from capstone import *
def process_file(filename):
print('Processing file:', filename)
with open(filename, 'rb') as f:
elffile = ELFFile(f)
for section in elffile.iter_sections():
print hex(section['sh_addr']), section.name, section['sh_size']
if __name__ == '__main__':
if len(sys.argv) == 2:
process_file(sys.argv[1])
aneesh@aneesh-ubuntu:~/pycon_talk$ python list_section_addr_size.py crackme_1
0x400238 .interp 28
0x400254 .note.ABI-tag 32
0x400274 .note.gnu.build-id 36
0x400298 .gnu.hash 36
0x4002c0 .dynsym 216
0x400398 .dynstr 95
0x4003f8 .gnu.version 18
0x400410 .gnu.version_r 32
0x400430 .rela.dyn 48
0x400460 .rela.plt 144
0x4004f0 .init 26
0x400510 .plt 112
0x400580 .plt.got 8
0x400590 .text 594
0x4007e4 .fini 9
0x4007f0 .rodata 79
0x400840 .eh_frame_hdr 52
0x400878 .eh_frame 244
0x600e10 .init_array 8
0x600e18 .fini_array 8
0x600e20 .jcr 8
0x600e28 .dynamic 464
0x600ff8 .got 8
0x601000 .got.plt 72
0x601048 .data 16
0x601060 .bss 16
0x0 .comment 47
0x0 .shstrtab 268
0x0 .symtab 1728
0x0 .strtab 635
Let’s now try to decode our first call instruction
0x4005b4: call 0x400550
The instruction refers the address 0x400550, which lies in the .plt section. Let’s try to disassembly the .plt section and see what we get. I can use one of the previous scripts to do the same.
0x400510: push qword ptr [rip + 0x200af2]
0x400516: jmp qword ptr [rip + 0x200af4]
0x40051c: nop dword ptr [rax]
0x400520: jmp qword ptr [rip + 0x200af2]
0x400526: push 0
0x40052b: jmp 0x400510
0x400530: jmp qword ptr [rip + 0x200aea]
0x400536: push 1
0x40053b: jmp 0x400510
0x400540: jmp qword ptr [rip + 0x200ae2]
0x400546: push 2
0x40054b: jmp 0x400510
0x400550: jmp qword ptr [rip + 0x200ada]
0x400556: push 3
0x40055b: jmp 0x400510
0x400560: jmp qword ptr [rip + 0x200ad2]
0x400566: push 4
0x40056b: jmp 0x400510
0x400570: jmp qword ptr [rip + 0x200aca]
0x400576: push 5
0x40057b: jmp 0x400510
It jumps to 0x400556 + 0x200ada = 0x601030, which lies in the .got.plt section. Note that rip always holds the value of the next instruction to execute, hence rip = 0x400556 when running the jump instruction. The .got.plt is filled up by the linker when running running the program, by using the information in the relocations section. Let’s dig into relocations next.
import sys
from elftools.elf.elffile import ELFFile
from elftools.elf.relocation import RelocationSection
from elftools.elf.descriptions import describe_reloc_type
def process_file(filename):
print('Processing file:', filename)
with open(filename, 'rb') as f:
elffile = ELFFile(f)
for section in elffile.iter_sections():
if not isinstance(section, RelocationSection):
continue
symtable = elffile.get_section(section['sh_link'])
print(' %s section with %s relocations' % (
section.name, section.num_relocations()))
for reloc in section.iter_relocations():
symbol = symtable.get_symbol(reloc['r_info_sym'])
print ' Relocation (%s)' % 'RELA' if reloc.is_RELA() else 'REL'
# Relocation entry attributes are available through item lookup
print ' offset = %s' % hex(reloc['r_offset'])
print symbol.name, 'type:', describe_reloc_type(reloc['r_info_type'], elffile), 'load at: ', hex(reloc['r_offset'])
if __name__ == '__main__':
process_file(sys.argv[1])
That gives us:
.rela.dyn section with 2 relocations
__gmon_start__ type: R_X86_64_GLOB_DAT load at: 0x600ff8
stdin type: R_X86_64_COPY load at: 0x601060
.rela.plt section with 6 relocations
strncmp type: R_X86_64_JUMP_SLOT load at: 0x601018
puts type: R_X86_64_JUMP_SLOT load at: 0x601020
printf type: R_X86_64_JUMP_SLOT load at: 0x601028
__libc_start_main type: R_X86_64_JUMP_SLOT load at: 0x601030
fgets type: R_X86_64_JUMP_SLOT load at: 0x601038
malloc type: R_X86_64_JUMP_SLOT load at: 0x601040
Phew! Finally, we have arrived at the function name the call instruction executes. It calls __libc_start_main. __libc_start_main initializes the process and calls main with appropriate attributes.
int __libc_start_main(int *(main) (int, char * *, char * *), int argc, char * * ubp_av, void (*init) (void), void (*fini) (void), void (*rtld_fini) (void), void (* stack_end));
Let’s look at the disassembly again:
0x400590: xor ebp, ebp
0x400592: mov r9, rdx
0x400595: pop rsi
0x400596: mov rdx, rsp
0x400599: and rsp, 0xfffffffffffffff0
0x40059d: push rax
0x40059e: push rsp
0x40059f: mov r8, 0x4007e0
0x4005a6: mov rcx, 0x400770
0x4005ad: mov rdi, 0x400686
0x4005b4: call 0x400550
As the first argument will be pushed last on the stack, 0x400686 is the address of the main function. Let’s see the disassembly at 0x400686.
0x400686: push rbp
0x400687: mov rbp, rsp
0x40068a: sub rsp, 0x10
0x40068e: mov edi, 9
0x400693: call 0x400570
0x400698: mov qword ptr [rbp - 0x10], rax
0x40069c: mov rax, qword ptr [rbp - 0x10]
0x4006a0: mov byte ptr [rax], 0x70
0x4006a3: mov rax, qword ptr [rbp - 0x10]
0x4006a7: add rax, 1
0x4006ab: mov byte ptr [rax], 0x61
0x4006ae: mov rax, qword ptr [rbp - 0x10]
0x4006b2: add rax, 2
0x4006b6: mov byte ptr [rax], 0x73
0x4006b9: mov rax, qword ptr [rbp - 0x10]
0x4006bd: add rax, 3
0x4006c1: mov byte ptr [rax], 0x73
0x4006c4: mov rax, qword ptr [rbp - 0x10]
0x4006c8: add rax, 4
0x4006cc: mov byte ptr [rax], 0x77
0x4006cf: mov rax, qword ptr [rbp - 0x10]
0x4006d3: add rax, 5
0x4006d7: mov byte ptr [rax], 0x6f
0x4006da: mov rax, qword ptr [rbp - 0x10]
0x4006de: add rax, 6
0x4006e2: mov byte ptr [rax], 0x72
0x4006e5: mov rax, qword ptr [rbp - 0x10]
0x4006e9: add rax, 7
0x4006ed: mov byte ptr [rax], 0x64
0x4006f0: mov rax, qword ptr [rbp - 0x10]
0x4006f4: add rax, 8
0x4006f8: mov byte ptr [rax], 0
0x4006fb: mov edi, 9
0x400700: call 0x400570
0x400705: mov qword ptr [rbp - 8], rax
0x400709: mov edi, 0x4007f4
0x40070e: mov eax, 0
0x400713: call 0x400540
0x400718: mov rdx, qword ptr [rip + 0x200941]
0x40071f: mov rax, qword ptr [rbp - 8]
0x400723: mov esi, 9
0x400728: mov rdi, rax
0x40072b: call 0x400560
0x400730: mov rcx, qword ptr [rbp - 0x10]
0x400734: mov rax, qword ptr [rbp - 8]
0x400738: mov edx, 8
0x40073d: mov rsi, rcx
0x400740: mov rdi, rax
0x400743: call 0x400520
0x400748: test eax, eax
0x40074a: jne 0x400758
0x40074c: mov edi, 0x400812
0x400751: call 0x400530
0x400756: jmp 0x400762
0x400758: mov edi, 0x400825
0x40075d: call 0x400530
0x400762: mov eax, 0
0x400767: leave
0x400768: ret
We can go about decoding the call instructions in the same way we did earlier.
0x400570: malloc
0x400540: printf
0x400560: fgets
0x400520: strncmp
0x400530: puts
Nice. We finally have enough information to solve this crackme. The disassembly looks pretty self-explanatory now.
0x400698 – 0x400700: Set bytes into the memory returned by malloc. array = [0x70, 0x61, 0x73 … ] = [‘p’, ‘a’, ‘s’ …].
0x400718 – 0x40072b: fgets into the buffer returned by malloc from stdin.
0x400730 – 0x400743: strncmp.
That’s all for now. I am applying to give a pycon talk on Binary Analysis using Python. Please upvote my proposal here, if you’d like to see me there. Thanks!
The post Reversing an ELF from the ground up appeared first on Aneesh Dogra's Blog.