In the last post of the assembly x86-64 series, we have seen how to manipulate the stack in order to store local variables. The stack is also essential for calling functions : it is used to store information such as the return instruction address. In this post, we will see how to write and call functions, and also how to interact with C functions in our program! We will apply these new notions to write a recursive function that computes the power of a number.
More about the stack : pushing and popping values
Before diving into the mechanism of function calls, we should see two important instructions related to the stack that I did not mention in the previous post : push
and pop
.
The push
instruction allows to move a value to the stack in an automated way.
This instruction will concretely perform two operations :
- subtracting from the stack address to allocate some bytes in the stack
- copying the value at the resulting address in the stack (
mov
instruction)
pop
is the reversed instruction : it copies the value from the stack to a register and increases the stack pointer accordingly.
Letās create a simple example :
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
.global _start, debug, debug2
.intel_syntax noprefix
_start:
mov rbp, rsp
; moving the value 42 into 8 bytes in the stack
sub rsp, 8
mov [rsp], qword ptr 42
mov rsp, rbp
; exit
mov rax, 60
mov rdi, 0
syscall
Nothing new here, we allocate 8 bytes on the stack and we store the value 42.
We will check that the value is properly stored : run GDB on the program after compiling in debug mode (as we did in a previous post), type b my_prog.s:10
and run
to stop just after the mov
instruction (replace āmy_progā with your programās name).
After that, execute the commands print $rbp
and print $rsp
.
This gives me 0x7fffffffdd90 and 0x7fffffffdd88.
0x90-0x88=0x8 in hexadecimal, the 8 bits have been allocated (0x here simply means hexadecimal).
Now we would check the value that has been moved to the stack, directly from its memory address this time : print *0x7fffffffdd88
(in this step, you should replace the specified address by the output you obtained with the print $rsp
command).
The star character (*) here is used to specify an address.
We end up with the value 42, perfect!
Note that the alternative command x/dg 0x7fffffffdd88
could also be used to print the value, where x stands for āexaminingā memory.
This page shows the different options such as the output format and the number of bytes to read.
Now letās modify this code with the push instruction and letās add a pop instruction afterward :
1
2
3
4
5
6
7
8
9
10
11
_start:
mov rbp, rsp
; moving the value 42 into 8 bytes in the stack
push qword ptr 42
pop rax
mov rsp, rbp
; exit ...
Letās verify that the push
instruction did exactly the same as a sub
followed by a mov
.
We run gdb in debug mode as previously and verify the rbp and rsp addresses after the push
instruction at line 5 (to be replaced with your correct line number if different).
We end up with the same values as previously.
Great the push
instruction did what we expected!
Now letās inspect the action of pop
by adding a breakpoint at line 8.
gdb indicates that rbp = 0x7fffffffdd90
and rsp = 0x7fffffffdd90
.
This is it.
After the pop
instruction, the rsp
register is now equal to the rbp
register, which means the stack pointer took back its value before our allocation.
We can see the second effect of the pop
instruction : the popped value is now stored in the rax
register.
In this specific case, our last line mov rsp, rbp
is not even necessary anymore since the de-allocation is now handled by pop
.
Writing basic functions
We have discovered in the previous posts of the series the jmp
instruction which allows to go to a certain location in the programās code.
You may guess that this is the same mechanism at the base of function call : a jump is performed to the first instruction of the called function.
However, when the execution of the function is terminated, the program must go back to the instruction that follows this jmp
.
To do so, we will see the call
and ret
instructions that take care of this.
When defining new functions, our labels will not have the ā_L.
ā prefix anymore as they will not be local symbols.
Indeed, the functions can be called in other portions of the code or even from external programs (in libraries for instance).
Writing and calling our first function
Letās start with a simple function that prints āhello worldā :
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
.global _start
.intel_syntax noprefix
_print_hello_world:
mov rax, 1
mov rdi, 1
lea rsi, [hello_world]
mov rdx, 14
syscall
ret
_start:
call _print_hello_world
; exit
mov rax, 60
mov rdi, 69
syscall
hello_world:
.asciz "Hello, World!\n"
Compared to our hello_world example, this code simply encapsulates the printing instructions within a function with the use of a new symbol : ā_print_hello_world
ā.
The new function is executed with the call
instruction and, at its end, the program returns to the execution of the _start
function thanks to the ret
instruction.
The ability to perform function calls brings a new dimension in our code : we are now working with the programās instructions addresses at runtime.
Indeed, when a function is being executed, the program must remember the address of the instruction that called the function (in the programās memory, where all instructions are loaded).
By doing so, the programs knows where to āreturnā in the programās memory at the ret
instruction.
There exists a register dedicated to store the address of the next instruction to be executed : the rip
register (Relative Instruction Pointer).
We will look at this register to understand our program.
How function calls work āļø
Letās use gdb on the previous code to see what is happening.
We add breakpoints at lines 13 and 15 and another one at the _print_hello_world
symbol (with b _print_hello_world
) to see how the stack is manipulated.
After executing the run
command in gdb, the program will start and pause just before the call
instruction.
To illustrate the rip
register mentioned before, we can already print the pointed instruction as follows :
1
2
(gdb) x/i $rip
=> 0x401020 <_start>: call 0x401000 <_print_hello_world>
We can see the instructionās address in memory, 0x401020 and the decoded instruction which is our call
.
Letās also print our stack pointer address : print $rsp
, which gives 0x7fffffffdd70 in my case.
Then, the continue
command in gdb makes the program stop at the beginning of the _print_hello_world
function, just after the execution of the call
instruction.
We will see what changed on the stack :
1
2
3
4
5
6
7
8
Breakpoint 2, _print_hello_world () at hello_world_func.s:9
9 mov rax, 1
(gdb) print $rsp
$2 = (void *) 0x7fffffffdd68
(gdb) x *0x7fffffffdd68
0x401025 <_start+5>: mov $0x3c,%rax
(gdb) print 0x7fffffffdd68 - 0x7fffffffdd70
$3 = -8
We can see that the rsp
register has changed from 0x7fffffffdd70 to 0x7fffffffdd68.
8 bytes have been allocated : the program has stored the return address of the function call.
Indeed, by printing the instruction at the corresponding location, we see that it corresponds to the instruction that follows call
in the _start
function : mov $0x3c,%rax
(0x3c corresponds to the value 60 in decimal).
We continue the execution to see what happens when returning from the function :
1
2
3
4
5
6
7
8
9
Continuing.
Hello, World!
Breakpoint 3, _start () at hello_world_func.s:22
22 mov rax, 60
(gdb) print $rsp
$4 = (void *) 0x7fffffffdd70
(gdb) x/i $rip
=> 0x401025 <_start+5>: mov $0x3c,%rax
The continue
command in gdb now brings us right after the call
instruction.
We can see that the rsp
register took back its value from before the call, the return address is no longer necessary and has been popped out from the stack by the ret
instruction.
We additionally print the value of the rip
register : the value is the same as the one previously stored on the stack, that is the return address of our function call.
Writing a recursive pow function
We will now study more complex functions to gain a deeper understanding of how the stack is manipulated when functions are called. A good exercise for this is recursion : a recursive function (a function that calls itself) must pay attention to preserve the stack in a coherent state, in order to avoid unpredicted behaviors. This is in fact an obligation for every function call but errors become easily fatal for our programs in the recursive case š.
In this chapter, we will write a function that computes the exponentiation of a number recursively, by performing successive multiplications. Such an operation would easily be done iteratively with a simple loop but the recursive way is going to be more instructive.
Passing parameters by registers
We have previously seen how to write a simple function that takes no parameters and that returns nothing. In practice, parameters and return values are essential to our programs. There are actually different ways to pass information between the functions : you may pass parameters into registers for instance or store them into the stack, it depends on the convention that is adopted.
We will start writing our pow function by passing the two parameters (the base and the exponent) to our function.
In this example, we pass the two values (the base and the exponent) by using the two registers rdi
and rsi
:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
_pow_rec:
mov rbp, rsp
; ...
mov rsp, rbp
ret
_start:
; pushing the two parameters to the stack
mov rdi, 5 ; base
mov rsi, 4 ; exponent
call _pow_rec
; exit
mov rax, 60
mov rdi, 0
syscall
In this code, the _start
function starts by placing the two parameters in their (temporarily) respective registers.
The first parameter to be pushed is the base and the second one is the exponent ; hence we will expect our program to compute 5^4 here.
The _start
function then calls the _pow_rec
function and eventually exits normally.
For now, our _pow_rec
function simply takes care of saving and restoring the stack pointer, as we saw in the previous post of the series on the local variables allocation.
These precautions will be useful as we will use the stack in our function.
You can run the program in gdb and verify that the two registers rdi
and rsi
received the correct values when the function _pow_rec
is being called.
Letās now store the two parameters on the stack by allocating two 8-bytes local variables :
1
2
3
4
5
6
7
8
9
10
11
12
13
_pow_rec:
mov rbp, rsp
sub rsp, 16
; base : offset=rbp-8, size=8
; exponent : offset=rbp-16, size=8
; storing the base and exponent parameters
mov [rbp-8], rdi
mov [rbp-16], rsi
mov rsp, rbp
ret
The recursive pow algorithm
We can now write the recursion, namely the central part of our algorithm. Our program will first accumulate the function calls on the stack until the value 0 is reached for the exponent parameter. Once it is done, the function can return 1 (x^0 = 1 for all x) without any additional recursive call. After that, the function calls would be unstacked and, at each time, the results of the previous call would be multiplied by the base.
To implement this, we separate the base case, when the exponent is 0, from the general case :
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
_pow_rec:
; saving the stack register and allocating memory
; ...
; storing the parameters on the stack
; ...
; compare the exponent to 0
mov rbx, [rbp-16]
test rbx, rbx
jnz else_exponent_not_0
.L_if_exponent_0:
mov rax, 1
jmp .L_endif
.L_else_exponent_not_0:
dec qword ptr [rbp-16]
; store the base and the exponent parameters in their associated registers
mov rdi, [rbp-8]
mov rsi, [rbp-16]
call _pow_rec ; the recursive call
imul rax, [rbp-8] ; compute the result
.L_endif:
; restoring the stack pointer and returning
; ...
The first part of the algorithm is the comparison of the exponent value with 0, indirectly done through the rbx
register (an instruction cannot be given two memory addresses).
Here this comparison is performed with the test
instruction that performs a logical āandā operation and sets an internal flag depending on the result.
The jnz
instruction (jump if not zero) will be triggered if the tested value (rbx
here) is not equal to zero, as the instruction says.
The following instructions emulate an āif..else..ā statement, where the first condition corresponds to the base being equal to 0.
In this example, we choose to put the return value of the function in the rax
register, which is handled in these two different conditions without using additional memory.
In the first case, the value 1 is simply moved to the register and the program jumps š¦ to the end of the function, hence skipping the second caseās instructions.
In the second case, the base variable is first decremented and the recursive call is then performed after moving the parameters to their registers.
Preserving the stack pointers across function calls
If you test the complete function by adding the remaining instructions from the previous code snippet, you would observe an infinite loop! ā¾ļø There is one crucial missing step in our code that handles the stack pointers. For now we had only performed function calls from the main function.
We have seen how to save the stack pointer and restore it at the end of the function but we did not take care of the base pointer.
That is the issue here : the base pointer is modified in each sub call.
Hence, when returning from the function, the base pointer then written in rsp
is no longer valid, causing the function to return in the wrong place.
In this situation, since the sub-calls are recursive, the function actually returns to itself, causing this infinite loop.
We will now change the functionās base code to :
1
2
3
4
5
6
7
8
9
10
11
12
13
_my_function_base:
push rbp
mov rbp, rsp
; allocate memory
; ...
; compute stuff
; ...
mov rsp, rbp
pop rbp
ret
The two additional instructions push
and pop
here make sure that the rbp
register is saved and restored at each function call.
This is handled directly by the called function, which is convenient for readability as these instructions would always be placed at the beginning and at the end of our functions.
We can now write the complete code of our recursive pow function : ā
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
_pow_rec:
; saving the stack and the base registers
push rbp
mov rbp, rsp
sub rsp, 16
; base : offset=rbp-8, size=8
; exponent : offset=rbp-16, size=8
; storing the base and exponent parameters
mov [rbp-8], rdi
mov [rbp-16], rsi
; compare the exponent to 0
mov rbx, [rbp-16]
test rbx, rbx
jnz else_exponent_not_0
.L_if_exponent_0:
mov rax, 1
jmp .L_endif
.L_else_exponent_not_0:
dec qword ptr [rbp-16]
; store the base and the exponent parameters in their associated registers
mov rdi, [rbp-8]
mov rsi, [rbp-16]
call _pow_rec ; the recursive call
imul rax, [rbp-8] ; compute the result
.L_endif:
; restoring the stack and the base registers and returning
mov rsp, rbp
pop rbp
ret
Testing and memory usage
We can now test our complete program with the help of gdb.
As we did previously, the program can be compiled in debug mode in order to add a breakpoint in gdb just after the call to _pow_rec
in the _start
function.
You should be able to see that the value returned by the function in the rax
register is indeed the value 5^5 = 125!
Now it is also interesting to take a look at the memory usage of this function. As explained previously, the recursive calls in our program are accumulated on the stack until the base case is reached, namely when the base parameter equals 0. This means that all information required for the function calls are multiplied on the stack, leading to a consequent use of memory.
To verify this, we can set a breakpoint at the base case of the recursion, just after the .L_if_exponent_0
symbol.
If we now run the program in gdb, the calls will be accumulated until this base case is reached.
We can then use the backtrace
(or bt
) command in gdb to see the different calls present on the stack, namely the stack frames :
1
2
3
4
5
6
7
8
Breakpoint 1, _pow_rec () at pow_rec.s:31
31 mov rax, 1
(gdb) backtrace
#0 _pow_rec () at pow_rec.s:31
#1 0x0000000000401033 in _pow_rec () at pow_rec.s:41
#2 0x0000000000401033 in _pow_rec () at pow_rec.s:41
#3 0x0000000000401033 in _pow_rec () at pow_rec.s:41
#4 0x0000000000401054 in _start () at pow_rec.s:61
This list shows all the functions that are being called by our program and that are not terminated.
We can see that there are 4 different calls to the _pow_rec
function that are pending in the stack.
Indeed, at each call the base parameter is decreased.
Since the execution starts with base=3, four calls are required to reach the value 0.
gdb allows us to navigate to each of these stack frames and explore the memory and registers there.
The command frame 0
for instance will load the first frame.
This helps us to analyze the evolution the the stack pointer register rsp
:
1
2
3
4
5
6
7
8
9
10
11
12
(gdb) frame 4
#4 0x0000000000401054 in _start () at pow_rec.s:61
61 call _pow_rec
(gdb) print $rsp
$1 = (void *) 0x7fffffffdd88
(gdb) frame 0
#0 _pow_rec () at pow_rec.s:31
31 mov rax, 1
(gdb) print $rsp
$2 = (void *) 0x7fffffffdd08
(gdb) print 0x7fffffffdd88-0x7fffffffdd08
$3 = 128
We can see the 128 bytes are used on the stack between the oldest function call (_start
) and the most recent one.
Letās see if we can understand each of this use :
Our _pow_rec
function stores 2 8-bytes registers on the stack for each call.
Additionally, the rbp
register is saved.
Moreover, for each function call, the program must store its return address as an additional 8-bytes value.
This means that we have 4 8-bytes values stored for each function call.
Since our program is currently executing 4 times the _pow_rec
function, this results in 4 * 4 * 8 = 128 bytes of memory being used !
Our analysis helps to realize how much memory is used for our small _pow_rec
function.
We would directly see that the value stored by the intermediate calls are actually useless since an iterative version of our program can be made : each intermediate computation step in our function only depends on the previous result.
Recursion can be very handy to use but sometimes its memory usage is not beneficial.
An alternative version : passing parameters through the stack
In our last example, we have passed parameters to our function by the use of two registers. It is however possible to do it differently, for instance by using the stack.
We will now write a second version of our algorithm following this new convention.
We will start from the initial code backbone and modify the _start
function to pass the parameters on the stack :
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
_pow_rec:
push rbp
mov rbp, rsp
; recursive pow computation
mov rsp, rbp
pop rbp
ret
_start:
; pushing the two parameters to the stack
push qword ptr 5 ; base
push qword ptr 4 ; exponent
call _pow_rec
; popping the parameters stored on the stack
add rsp, 16
; exit
mov rax, 60
mov rdi, 0
syscall
Compared to our previous version, the _start
function now contains two push
instructions, one for each parameter.
Since the rsp
pointer is modified at this point, it is necessary to restore it after the call, hence the add
instruction.
You may realize that in the configuration, we did not bother to save the stack and the frame pointers contrarily to what we saw previously.
Indeed, the _start
function remains simple and our program exits right after the power computation.
It is now necessary to modify the code in charge of storing the parameters in the stack in our _pow_rec
function.
I agree that this step seems counterintuitive as the parameters are now already present in the stack when the function is called.
However, it is important to place the local variables of our function in their own stack frame and not interfere with the stack frame of the calling function.
The beginning of _pow_rec
becomes :
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
_pow_rec:
push rbp
mov rbp, rsp
sub rsp, 16
; base : offset=rbp-8, size=8
; exponent : offset=rbp-16, size=8
; storing the base parameter
mov rax, [rbp+24]
mov [rbp-8], rax
; storing the exponent parameter
mov rax, [rbp+16]
mov [rbp-16], rax
; instructions that follow do not change
; ...
The only difference with the previous version is where the parametersā values are taken from.
Counterintuitively, we get access to the values by navigating backward in the stack, adding a positive offset to the rbp
register instead of a negative one.
This is because our parameters were stored in the previous stack frame (the stack frame of the calling function).
When the function starts, the rsp
register points toward the return address as we saw previously.
Since the rbp
register is also pushed to the stack, this means that two registers are already present between the start of our stack frame and our function parameters.
For this reason, the parameters are being picked from above in the stack addresses : the exponent parameter being that last to be pushed, it is accessed at offset 16 from the rbp
register.
On the other hand, the base parameter is accessed from above, at offset 24.
I find it useful to take some time to apprehend how the stack is being used here. The best way to organize our code for the function calls is to carefully respect the stack frame from each function call and refrain from interfering with the frame of another function.
Whatās next ?
This one was pretty dense! Correctly manipulating the stack with function calls is certainly an important step toward the ability to program in assembly. I hope this post was not too technical. The good news is : this is I think the most technical content to digest before advancing to more practical exercises š„³. The next chapter will be about interfacing our code with C programs : calling functions from the C standard library for instance. This will allow us to perform higher level operations without requiring any single line of C.
Before that, I invite you to continue practicing š§āš» from the different codes realized here. For instance, you may create a program that computes the multiplication of two numbers recursively, similarly to what we saw with the pow function. You may also try more complex recursions such as writing a function that computes the Fibonacci series. As previously, I put the codes from this post as well as bonus ones (the Fibonacci function for instance) at this following link.