Assembly Reading

Some notes on reading x86 assembly

Introduction

These pages cover the basics of reading and interpreting x86 assembly. They contain several code samples, starting from a simple “Hello world” binary and moving on to more complex binaries that use flow control, arithmetic operations, and so on. The goal is to introduce x86 assembly and gradually build up enough understanding to read and interpret more complex assembly code.

Background Information

Terms & Definitions

  1. Stack - A region of memory where data is stored and retrieved in a LIFO (last in, first out) manner. It’s commonly used to store return addresses, local variables, and other state information.

  2. Stack pointer - A CPU register that points to the top of the stack.

  3. Base pointer - A CPU register that points to a fixed location within a stack frame and is used to access local variables and function parameters.

  4. Stack frame - The range of memory locations on the stack that a single function invocation uses to store its local variables and other information.

  5. Instruction - A single operation that the CPU can decode and execute, such as MOV.

  6. Registers - Small, fast storage locations inside the CPU itself (not in memory). Several of them have conventional or dedicated uses:

    1. RAX (accumulator register): Historically the implicit operand of many arithmetic and I/O instructions. In modern usage, it typically holds the return value of a function.

    2. RBX (base register): Often used as a base pointer to an array in memory, hence the name "base" register.

    3. RCX (count register): Used in loop and shift instructions as a counter, hence the name "count" register.

    4. RDX (data register): Historically used in I/O operations (it holds the port number) and as a companion to RAX in multiplication and division. In modern usage, it often holds extra data that can't fit in RAX, such as the high-order half of a 128-bit integer.

    5. RSI (source index): Often used as a pointer to the source in string and memory operations.

    6. RDI (destination index): Often used as a pointer to the destination in string and memory operations.

    7. RBP (base pointer): Typically used as a base pointer for the stack frame.

    8. RSP (stack pointer): Always points to the top of the stack.

    9. R8 - R15: Additional general-purpose registers that were added in the x86-64 architecture. They can be used for a variety of purposes.
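To make the stack and stack pointer definitions concrete, here is a minimal C sketch of a toy stack. It is a model, not how a real CPU is implemented: the names `ToyStack`, `toy_init`, `toy_push`, and `toy_pop` and the 64-byte size are hypothetical, and `rsp` is an offset into a byte array rather than a real address. It does mirror one real property of x86: the stack grows toward lower addresses, so a push decrements the stack pointer and a pop increments it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64  /* hypothetical size, tiny on purpose */

/* Toy model: a byte array stands in for stack memory, and rsp is an
   offset into that array rather than a real address. */
typedef struct {
    uint8_t  mem[STACK_BYTES];
    uint64_t rsp;  /* offset of the current top of the stack */
} ToyStack;

static void toy_init(ToyStack *s) {
    memset(s->mem, 0, sizeof s->mem);
    s->rsp = STACK_BYTES;  /* empty stack: the top sits at the high end */
}

/* push: move the top down by 8 bytes, then store the value there. */
static void toy_push(ToyStack *s, uint64_t value) {
    assert(s->rsp >= 8);  /* running out of room would be a stack overflow */
    s->rsp -= 8;
    memcpy(&s->mem[s->rsp], &value, 8);
}

/* pop: load the value at the top, then move the top back up by 8 bytes. */
static uint64_t toy_pop(ToyStack *s) {
    uint64_t value;
    assert(s->rsp + 8 <= STACK_BYTES);  /* popping an empty stack is an error */
    memcpy(&value, &s->mem[s->rsp], 8);
    s->rsp += 8;
    return value;
}
```

Pushing 1 and then 2 and popping twice returns 2 first and then 1, which is the LIFO behavior described above. After both pops, `rsp` is back where it started.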

Function Prologue

Within the context of assembly, the function prologue is the set of instructions at the beginning of a function that prepares the stack and registers so the function body can execute properly. This usually involves the following steps:

  1. Push base pointer onto the stack (push %rbp): This is typically the first instruction executed when a function is called. A function usually has its own local variables, so it needs its own stack frame, even though the call may happen in the middle of the caller’s execution. For example, main is called, main then calls a helper function, and once the helper returns, execution continues in main until the program finishes. The called function (the helper) must not corrupt the caller’s (main’s) stack frame, because main still needs its local variables after the call returns. To handle this, the called function first saves the caller’s base pointer (%rbp) by pushing it onto the stack.

  2. Move the stack pointer into the base pointer register (mov %rsp, %rbp): The prologue then usually copies the current stack pointer (%rsp in x86-64) into the base pointer register, establishing a new base pointer for the called function. The stack pointer points to the top of the stack and changes as data is pushed or popped. Copying it into the base pointer gives the function a fixed reference to where the top of the stack was on entry, just after the old base pointer was saved.

  3. Decrement the stack pointer (sub $0x10, %rsp): Because the stack grows toward lower addresses, subtracting from the stack pointer makes room for the function’s local variables. In this example, sub $0x10, %rsp reserves 16 bytes (0x10 hex = 16 decimal). Other functions will often reserve more; the value here is just for illustration.
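The three prologue steps can be traced with a small C model. This is a sketch under stated assumptions, not real machine behavior: `ToyCpu`, `cpu_push`, `cpu_pop`, `prologue`, and `epilogue` are hypothetical names, the memory is a 128-byte array, and `rsp` and `rbp` are offsets into it rather than real addresses. The `epilogue` function goes slightly beyond the steps above; it is included only to show that the base pointer saved in step 1 can be restored when the function finishes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_BYTES 128  /* hypothetical size of the toy stack region */

/* Toy model: rsp and rbp are offsets into mem, not real addresses. */
typedef struct {
    uint8_t  mem[MEM_BYTES];
    uint64_t rsp;  /* stack pointer */
    uint64_t rbp;  /* base pointer  */
} ToyCpu;

static void cpu_push(ToyCpu *c, uint64_t v) {
    assert(c->rsp >= 8);
    c->rsp -= 8;                     /* the stack grows toward lower offsets */
    memcpy(&c->mem[c->rsp], &v, 8);
}

static uint64_t cpu_pop(ToyCpu *c) {
    uint64_t v;
    assert(c->rsp + 8 <= MEM_BYTES);
    memcpy(&v, &c->mem[c->rsp], 8);
    c->rsp += 8;
    return v;
}

/* The three prologue steps, in order. */
static void prologue(ToyCpu *c, uint64_t local_bytes) {
    cpu_push(c, c->rbp);             /* push %rbp       : save the caller's base pointer */
    c->rbp = c->rsp;                 /* mov  %rsp, %rbp : anchor the new stack frame     */
    assert(c->rsp >= local_bytes);
    c->rsp -= local_bytes;           /* sub  $N, %rsp   : reserve room for locals        */
}

/* Matching teardown, shown only to check that the saved value comes back. */
static void epilogue(ToyCpu *c) {
    c->rsp = c->rbp;                 /* mov %rbp, %rsp : discard the locals              */
    c->rbp = cpu_pop(c);             /* pop %rbp       : restore the caller's base ptr   */
}
```

For instance, starting from a made-up caller state of `rsp = 96` and `rbp = 112`, calling `prologue(&c, 0x10)` leaves `rbp = 88` and `rsp = 72`: the old base pointer is stored at offset 88, and the 16 bytes between the two registers are the space reserved for locals. Calling `epilogue(&c)` afterwards returns both registers to their starting values.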
