In the previous article we learned about Control Transfer Instructions, and we have seen how they are the cornerstone of control flow in assembly. We looked at their simplest, jump, and learned how to implement conditionals.

Today we will look at reproducing functions. Starting with no arguments, void functions, all the way up to functions returning multiple values.

A friendly introduction to assembly for high-level programmers

  1. Hello
  2. Conditionals
  3. Functions & Loops

Nullary Void Functions

The definition of function1 we will be using in this article is:

A named group of instructions that is reusable, can take inputs and can return outputs

From what we have seen in the previous article, jumps seem like a good approximation of this concept: they provide a mechanism to execute code from elsewhere, and they have a human-readable name. What could one desire more?

In high-level languages, functions do their job and hand back control to the caller once done, typically with the return keyword. Jumps, conversely, are one-way control transfer: once you jump to a location there is no way back unless the callee knows where to jump to.

The premise that we need to control callee and caller hinders reusability, one property we expect from functions. Imagine authoring a library; functions would have to know details about calling code’s structure to jump back, making it impossible to write modular or reusable code.

How do we make this execution model caller-independent?

Call and return

At any moment, we know the address of the current instruction because it’s stored in rip, the instruction pointer. The caller can save this address before jumping to another location, and the callee can later jump back to the next instruction using that saved address. Problem solved, right?

Not quite. Doing all this manually would be tedious and prone to mistakes. Luckily, assembly provides two instructions, call (call) and ret (return), to do the math for us. Let’s look at them.

The first, call looks like a jump:

call label

It jumps to the specified label, but before doing so, it saves the current address (the return location) to a special memory region called the stack.

Note

We will look into the details of the stack in the next article. For now, all you need to know is that it’s a memory location where you can store values and retrieve them later.

Much like in high-level languages, at the end of a function we return to hand the control back to the caller. In assembly, we do that using the ret (return) instruction

label:
 ; code of the function
 ret

Just like a Scooby Doo mask off moment, what we see is once again a jump: ret fetches the value stored on the stack by call and jumps back there, giving back control without having to know anything about the caller. Victory!

Let’s look at an example.

We’ll create a little dungeon crawler game where our explorer navigates a maze to find a treasure. We will learn a few new instructions and extensively use functions to reduce boilerplate. Incidentally, we will realize we already know how to implement loops in assembly, our second high-level programming concept for this article.

Arity and Return Values

Alright, that was a lot to take in! But the good news is we’ve already covered most of the heavy lifting for today. Great job!

As you know, there is very little one can do with functions without passing parameters or returning values. Let’s go ahead and fix that.

In the examples above, we got comfortable with passing data between functions using registers.

For example, before calling print, we set the values of rsi and rdx so that the function had the right data in the correct registers. That’s how parameters are typically2 passed to functions in assembly.

Returning values works the same, but in reverse: when a function needs to return a result, it places the value in a designated register, allowing the caller to retrieve it from there.

This system feels a bit brittle, doesn’t it? How can we make sure that functions don’t overwrite registers we’re relying on? How do we decide which registers to use for what? And how do we preserve values across function calls?

Calling Conventions

Since the language can’t inherently enforce rules, assembly relies on calling conventions: guidelines for passing parameters, returning results, and safely using registers to ensure data integrity.

These conventions are part of the Application Binary Interface (ABI), specific to each operating system3. Our ABI of reference will be x86-64 System V ABI, commonly used in most Unix-like systems.

To prevent unintended data overwriting, the ABI introduces two register categories: callee-saved and caller-saved.

  • callee-saved means the callee is responsible for preserving the value in the register. Practically, it means the callee must either not use the register, or save its original value before use and restore it before returning control to the caller.

  • caller-saved means the caller is responsible for preserving the value in the register. In other words, the caller needs to store the register in the stack before invoking a function, if it cares about it.

Here’s the condensed version of the calling convention we will use in this course.

Register Usage Saved by?
rax 1st return register Caller
rdi 1st function argument Caller
rsi 2nd function argument Caller
rdx 3rd function argument, 2nd return register Caller
rcx 4th function argument Caller
r10-r11 for temporary data Caller
r12-r15 for temporary data Callee

Sticking with the game theme, we will code a die roller. We won’t introduce new instructions, but we will see calling conventions and we will quickly touch on addressing: a new way of referencing memory that we will explore more in depth later on.

Conclusion

We explored how functions work using call and ret, and learned about calling conventions to safely pass parameters, return values, and manage registers without breaking things.

Next up, we’ll dive into the stack; a new tool for handling data beyond registers and fundamental to implement our next high-level construct: scope.


  1. Finding an agreed-upon definition of function is not that easy, hence why I came up with a new one. Imagine how different functions are between functional programming, lambda calculus, and mathematics, for example. Speaking of wide definitions, here’s a fun personal anecdote: during my studies, I had two harmonic functions classes in the same semester, one in math school and the other in jazz school. 

  2. You might wonder what happens when you have more arguments than registers or when data is larger than a register’s size. In the next lesson, we will learn more about the stack and answer all these questions. 

  3. More precisely, the ABI specifies how different binaries (such as a program and the operative system) interact at the binary level. Besides calling conventions, it specifies how binaries are formatted, how data is laid out in memory, and the system call interface. You can imagine it as a low-level equivalent of an API (Application Programming Interface) for more programs.