ASM 1 Tutorial: Absolute Basics

July 14, 2020

In this series we’ll be working through these tutorials on the basics of assembly. We’ll be using the Netwide Assembler (NASM) (https://www.nasm.us/) to write assembly programs on a 32-bit i686 Ubuntu Linux virtual machine.

The goal of this series is to document my journey as I learn how to program in assembly. I’m eager to improve my command over assembly language because I want to learn more about exploit development and low-level programming. Therefore, there may be asides or references to matters related to cyber-security / exploitation throughout the series.

Finally, this series is not completely original and is based directly off the tutorial created by https://asmtutor.com/. Most of the assembly code examples are pulled from that tutorial series.

Now, let’s jump in.

Every assembly program gets broken into three main sections:

  1. .data section
  2. .bss section
  3. .text section

The data section is used for “declaring initialized data and constants. The data does not change at runtime,” (tutorialspoint.com). This is significant because the amount of space allocated in the data section cannot be increased at runtime.

The bss section is used for declaring variables. More specifically, it reserves space for uninitialized data that the program fills in at runtime.

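The text section is where the actual instructions for the program go. This section must begin with the directive “global _start which tells the kernel where program execution begins,” (tutorialspoint.com). More precisely, global exports the _start label so that the linker can record its address as the program’s entry point.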
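To make the three sections concrete, here is a minimal sketch of how they might be laid out in a NASM source file. The file name skeleton.asm and the names msg and buffer are placeholders of my own, not taken from the referenced tutorials. The three instructions under _start simply exit cleanly using a system call, which is explained further down.

```nasm
; skeleton.asm (hypothetical file name): layout sketch only
; Build on 32-bit Linux:
;   nasm -f elf skeleton.asm
;   ld -m elf_i386 skeleton.o -o skeleton

SECTION .data                   ; initialized data, size fixed at assemble time
msg     db      'Hi', 0Ah       ; three bytes: 'H', 'i', line feed

SECTION .bss                    ; uninitialized data, space is only reserved
buffer  resb    64              ; reserve 64 bytes (placeholder scratch buffer)

SECTION .text                   ; executable instructions
global  _start                  ; export _start so the linker can find it

_start:
    mov     ebx, 0              ; exit status 0
    mov     eax, 1              ; system call number 1 = sys_exit
    int     80h                 ; ask the kernel to run the system call
```

Nothing here reads msg or buffer. They are only there to show where each kind of declaration lives.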

The only interface the programmer has above the actual hardware is the kernel. In order to build useful programs in assembly we have to use the “Linux system calls provided by the kernel,” (asmtutor.com). System calls are implemented as assembly language instructions and are “usually made when a process in user mode requires access to a resource,” (tutorialspoint.com). When you execute a system call a few things happen:

  1. The user application sets up the arguments for the system call. Depending on the system call, this usually means loading values into general purpose registers.
  2. Once the arguments are staged, an interrupt instruction, “INT,” is run, which signals that the user program needs a resource from the OS.
  3. The interrupt instruction takes a number which tells the kernel what the user program needs. On Linux, 0x80 is the interrupt number that says the user program wants to make a system call: “INT 0x80.”
  4. Before the interrupt is requested, the system call number must already be loaded into the EAX register. This way, when the interrupt is triggered, the kernel knows which system call in the system call table the user is trying to execute. For example, “requesting an interrupt when EAX = 1 will call sys_exit and requesting an interrupt when EAX = 4 will call sys_write,” (asmtutor.com).
  5. Control jumps to the instructions at the address specified in the system call table. The user program state is preserved, the system call executes, and control is transferred back to the user program.
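The steps above can be sketched with the simplest system call mentioned so far, sys_exit. This is only an illustration under my own assumptions: the file name exit_demo.asm and the exit status 7 are arbitrary choices. Note that sys_exit is a special case for step 5, since it terminates the process and so never hands control back.

```nasm
; exit_demo.asm (hypothetical file name)
; Build on 32-bit Linux:
;   nasm -f elf exit_demo.asm
;   ld -m elf_i386 exit_demo.o -o exit_demo

SECTION .text
global  _start

_start:
    mov     ebx, 7      ; step 1: stage the argument (exit status) in EBX
    mov     eax, 1      ; step 4: system call number 1 = sys_exit in EAX
    int     80h         ; steps 2 and 3: software interrupt 0x80 enters the kernel
```

After running ./exit_demo, the shell command echo $? should report the status that was placed in EBX.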

Here’s a visual explanation of the system call process:

“””

There are four general classes of system calls (tutorialspoint.com):

  1. Process control: deal with process creation, termination, etc
  2. File management: creating, deleting, reading, writing, etc
  3. Device management: reading from device buffers into other device buffers
  4. Information maintenance: information transfer between user program and operating system

“””

Building on this, there are a few common scenarios that require system calls (asmtutor.com):

“””

  1. Files system requires the creation or deletion of files. Reading or writing from files also requires a system call.
  2. Creation and management of new processes.
  3. Network connections also require system calls. This includes sending and receiving of packets.
  4. Access to hardware devices like printers and scanners.

“””

Now that we have a better understanding about what system calls are and how they’re implemented, let’s get to actually writing our assembly program:

Here is a visual layout of an assembly program:

First we need to create a variable ‘msg’ in the .data section of our program and “assign to it the string we need to output,” (asmtutor.com). In our .text section we tell the kernel where to begin execution by “providing it with a global label _start: to denote the programs entry point,” (asmtutor.com).

We’re using sys_write to write our output to the console window. This function takes three arguments which are “sequentially loaded into EDX, ECX, and EBX before requesting a software interrupt which will perform the task,” (asmtutor.com):

  1. EDX will be loaded with the length of the string
  2. ECX will be loaded with the address of our variable created in the .data section
  3. EBX will be loaded with the file descriptor of the file we want to write to (1 for standard output)

In the end our final assembly program will look like:
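The listing itself is not reproduced here, so what follows is a sketch reconstructed from the description above rather than a verbatim copy. The file name helloworld.asm is my assumption. The program stops right after the write and does nothing else, which is consistent with the segmentation fault mentioned at the end of this tutorial.

```nasm
; helloworld.asm (assumed file name)
; Build on 32-bit Linux:
;   nasm -f elf helloworld.asm
;   ld -m elf_i386 helloworld.o -o helloworld
; Run:
;   ./helloworld

SECTION .data
msg     db      'Hello World!', 0Ah     ; the string plus a line feed byte

SECTION .text
global  _start

_start:
    mov     edx, 13     ; length in bytes: 12 characters + 1 line feed
    mov     ecx, msg    ; address of the string in the .data section
    mov     ebx, 1      ; file descriptor 1 = standard output
    mov     eax, 4      ; system call number 4 = sys_write
    int     80h         ; request the system call
```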

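And when we assemble and link it, then run it, the program prints the “Hello World!” string and then ends with a segmentation fault.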

There are TONS of assembly instructions so asmtutor.com recommends that you at least familiarize yourself with the following common routines:

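The “db” pseudo-instruction (“define byte”) initializes memory with the bytes of the “Hello World!” string (keil.com). “0Ah” is not a length. It is the hexadecimal value 10, the ASCII line feed character, and it is appended to the string so that the output ends with a newline.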
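One way to see what db actually emits is to let the assembler count the bytes for us. The sketch below is my own illustration, not part of the asmtutor.com lesson: the name len and the file name db_demo.asm are hypothetical. It uses NASM’s equ directive together with $ (the current position) to measure the string, and then returns that count as the exit status.

```nasm
; db_demo.asm (hypothetical file name)
; Build on 32-bit Linux:
;   nasm -f elf db_demo.asm
;   ld -m elf_i386 db_demo.o -o db_demo

SECTION .data
msg     db      'Hello World!', 0Ah     ; 12 character bytes, then one byte 0Ah
len     equ     $ - msg                 ; assemble-time constant: bytes emitted by db

SECTION .text
global  _start

_start:
    mov     ebx, len    ; use the byte count as the exit status
    mov     eax, 1      ; system call number 1 = sys_exit
    int     80h         ; request the system call
```

Running ./db_demo followed by echo $? should show 13: twelve characters plus the single line feed byte that 0Ah contributes.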

Hopefully the reader now has a better sense of the general structure of an assembly program, of why system calls matter, and of how they are invoked.

Note: After writing one of my first assembly programs and building it with NASM, I was confused as to why we had to compile our assembly code. After all, in C programming, the compilation stage of the build process translates source code into assembly. Since we’re already in assembly, I wasn’t sure what compilation was actually doing. It turns out that “compilation” in the context of NASM really means assembling: translating assembly instructions into binary machine code stored in an object file (en.wikipedia.org). Once we have that object file, the program can be linked and run.

The reader is probably also wondering why we received a segmentation fault when we ran our program. We will address that very question in our next tutorial.
