System calls, FFI, and cross-platform abstractions
We’ll implement a very basic syscall for three platforms: BSD/macOS, Linux, and Windows. We’ll also see how this is implemented at three levels of abstraction.
The syscall we’ll implement is the one used when we write something to the standard output (stdout) since that is such a common operation and it’s interesting to see how it really works.
We’ll start off by looking at the lowest level of abstraction we can use to make system calls and build our understanding of them from the ground up.
The lowest level of abstraction
The lowest level of abstraction is to write what is often referred to as a “raw” syscall. A raw syscall is one that bypasses the OS-provided library for making syscalls and instead relies on the OS having a stable syscall ABI. A stable syscall ABI means the OS guarantees that if you put the right data in certain registers and execute a specific CPU instruction that passes control to the OS, it will always do the same thing.
To make a raw syscall, we need to write a little inline assembly, but don’t worry. Even though we introduce it abruptly here, we’ll go through it line by line, and in Chapter 5, we’ll introduce inline assembly in more detail so you become familiar with it.
At this level of abstraction, we need to write different code for BSD/macOS, Linux, and Windows. We also need to write different code if the OS is running on different CPU architectures.
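As a rough illustration of what “different code per OS and CPU architecture” means, the sketch below uses Rust’s cfg! macro to report which raw write variant a given build target would need. The function name raw_write_variant is hypothetical, and only two Linux targets are spelled out. The strings describe the x86-64 and AArch64 Linux conventions.

```rust
/// Hypothetical helper: reports, based on the compile target, which raw
/// `write` variant this build would need. Only two Linux targets are covered.
fn raw_write_variant() -> &'static str {
    if cfg!(all(target_os = "linux", target_arch = "x86_64")) {
        // Syscall number in rax, arguments in rdi/rsi/rdx, `syscall` instruction
        "linux/x86_64: write is number 1 in rax, entered with `syscall`"
    } else if cfg!(all(target_os = "linux", target_arch = "aarch64")) {
        // Syscall number in x8, arguments in x0/x1/x2, `svc #0` instruction
        "linux/aarch64: write is number 64 in x8, entered with `svc #0`"
    } else {
        // Every other OS/architecture pair needs its own implementation
        "other target: needs its own implementation"
    }
}

fn main() {
    println!("{}", raw_write_variant());
}
```

In real code you would put #[cfg(...)] attributes on separate functions instead. The cfg! macro only yields a boolean, so every branch must still compile, and inline assembly written for one architecture will not compile on another.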
Raw syscall on Linux
On Linux and macOS, the syscall we want to invoke is called write. Both systems operate based on the concept of file descriptors, and stdout (file descriptor 1) is already open when you start a process.
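To see that stdout really is just file descriptor 1, already open in the process, the sketch below writes to it through std::fs::File without calling std::io::stdout(). It is Unix-only, and the helper name write_to_fd1 is hypothetical. It assumes fd 1 is open, which is normally the case when a process starts. ManuallyDrop keeps the File from closing fd 1 when it goes out of scope.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::mem::ManuallyDrop;
use std::os::fd::FromRawFd;

/// Hypothetical helper: writes `bytes` to file descriptor 1 (stdout)
/// without going through `std::io::stdout()`. Unix-only.
fn write_to_fd1(bytes: &[u8]) -> io::Result<()> {
    // SAFETY (assumption): fd 1 was opened for us when the process started.
    // ManuallyDrop prevents the File from closing fd 1 when it is dropped.
    let mut stdout = ManuallyDrop::new(unsafe { File::from_raw_fd(1) });
    stdout.write_all(bytes)
}

fn main() -> io::Result<()> {
    write_to_fd1(b"Hello from file descriptor 1!\n")
}
```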
If you don’t run Linux on your machine, you have some options to run this example. You can copy and paste the code into the Rust Playground or you can run it using WSL in Windows.
As mentioned in the introduction, I’ll name the example folder you need at the start of each example, and you can run the example from there by writing cargo run. The source code itself is always located in the example folder at src/main.rs.
The first thing we do is to pull in the standard library module that gives us access to the asm! macro.
Repository reference: ch03/a-raw-syscall
use std::arch::asm;
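If asm! is completely new to you, a much smaller example than a syscall may help first. The sketch below adds 1 to a value inside inline assembly. It runs on x86-64 only, and add_one is a hypothetical name. The compiler picks the register through the {0} placeholder instead of us naming one explicitly.

```rust
use std::arch::asm;

/// Hypothetical example (x86-64 only): adds 1 to `x` inside inline assembly.
fn add_one(x: u64) -> u64 {
    let mut value = x;
    unsafe {
        // `{0}` is replaced by whichever general-purpose register the compiler
        // picks for `value`; `inout` means it is read before and written after.
        asm!("add {0}, 1", inout(reg) value);
    }
    value
}

fn main() {
    println!("{}", add_one(41));
}
```

Calling add_one(41) returns 42. A syscall can’t let the compiler choose registers like this, because the kernel expects each argument in one specific register.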
The next step is to write our syscall function:
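#[inline(never)]
fn syscall(message: String) {
    let msg_ptr = message.as_ptr();
    let len = message.len();
    unsafe {
        asm!(
            "mov rax, 1",      // syscall number 1 is write on x86-64 Linux
            "mov rdi, 1",      // first argument: file descriptor 1 (stdout)
            "syscall",         // pass control to the kernel
            in("rsi") msg_ptr, // second argument: address of the text
            in("rdx") len,     // third argument: length in bytes
            out("rax") _,
            out("rdi") _,
            lateout("rsi") _,
            lateout("rdx") _,
            // the syscall instruction itself overwrites rcx and r11
            out("rcx") _,
            out("r11") _
        );
    }
}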
We’ll go through this first example line by line. The next ones will be pretty similar, so we only need to cover this in great detail once.
First, we have an attribute named #[inline(never)] that tells the compiler that we never want this function to be inlined during optimization. Inlining is when the compiler omits the function call and simply copies the body of the function instead of calling it. In this case, we don’t want that to ever happen.
Next comes the function body. Its first two lines simply get the raw pointer to the memory location where our text is stored and the length of the text buffer in bytes.
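The length matters because the kernel counts bytes, not characters. The sketch below mirrors those two lines in a hypothetical helper, buffer_parts. It shows that a string containing a non-ASCII character occupies more bytes than it has characters.

```rust
/// Hypothetical helper mirroring the first two lines of `syscall`:
/// returns the buffer's start address and its length in bytes.
fn buffer_parts(message: &str) -> (*const u8, usize) {
    (message.as_ptr(), message.len())
}

fn main() {
    // \u{e9} is 'é', which UTF-8 encodes as two bytes
    let message = String::from("h\u{e9}llo\n");
    let (ptr, len) = buffer_parts(&message);
    println!("{} chars, {} bytes at {:p}", message.chars().count(), len, ptr);
}
```

For this string, the program reports 6 chars and 7 bytes. The value passed to the kernel must be the byte count.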
The rest of the body is an unsafe block, since Rust has no way to verify that inline assembly like this is safe.
Rust’s inline assembly syntax will look a little intimidating at first, but bear with me. We’ll cover it in detail later in this book so that you get comfortable with it. For now, I’ll just briefly explain what it does.
The first line of assembly puts the value 1 in the rax register. When the CPU later passes control to the OS, the kernel reads rax to find out which syscall we want, and on x86-64 Linux, a value of 1 means write.
The second line puts the value 1 in the rdi register. This is the file descriptor we want to write to, and file descriptor 1 is stdout.
The third line executes the syscall instruction, which switches the CPU into kernel mode and passes control to the OS.
The in("rsi") msg_ptr operand tells the compiler to place the address of the buffer where our text is stored in the rsi register before the assembly runs.
Similarly, in("rdx") len places the length (in bytes) of our text buffer in the rdx register.
The remaining out and lateout operands are not instructions to the CPU. They tell the compiler that these registers may hold different values when we exit the inline assembly block, so it can’t store anything in them and assume the data is untouched. We do that by telling the compiler that some unspecified data (indicated by the underscore) will be written to these registers.
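The function above ignores the kernel’s answer. On x86-64 Linux, write returns its result in rax: the number of bytes written, or a negative errno value on failure. The sketch below is a hypothetical variant, raw_write, that takes the file descriptor and a byte slice as parameters and returns that result. It also declares rcx and r11 as clobbered, because the syscall instruction stores the return address in rcx and the saved flags in r11.

```rust
use std::arch::asm;

/// Hypothetical variant (x86-64 Linux only): raw `write` that returns the
/// value the kernel leaves in rax (bytes written, or -errno on failure).
fn raw_write(fd: usize, buf: &[u8]) -> isize {
    let ret: isize;
    unsafe {
        asm!(
            "syscall",
            // rax carries the syscall number (1 = write) in and the result out
            inlateout("rax") 1isize => ret,
            in("rdi") fd,           // first argument: file descriptor
            in("rsi") buf.as_ptr(), // second argument: buffer address
            in("rdx") buf.len(),    // third argument: length in bytes
            // the syscall instruction overwrites rcx (return address) and r11 (flags)
            out("rcx") _,
            out("r11") _,
            options(nostack)
        );
    }
    ret
}

fn main() {
    let written = raw_write(1, b"Hello world from raw syscall!\n");
    println!("write returned {written}");
}
```

Passing 2 instead of 1 as fd would send the text to stderr. Passing a descriptor that isn’t open returns -9, which is -EBADF.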
Finally, it’s time to call our raw syscall:
fn main() {
    let message = "Hello world from raw syscall!\n";
    let message = String::from(message);
    syscall(message);
}
This function simply creates a String and calls our syscall function, passing it in as an argument.
If you run this on Linux on an x86-64 CPU, you should now see the following message in your console:
Hello world from raw syscall!