Post #2: In Which I Actually Write the Terrible Compiler for the Useless Language

This is the third and penultimate post in the Terrible Compiler for a Useless Language series, in which I walk you through implementing an x86-64 Linux compiler for Brainfuck. So far, we have already learned the convention for making Linux system calls in x86 assembly (necessary for any kind of I/O) and written a parser that detects syntax errors in our source code. Previous posts in the series:

Picking up where we left off

Now that we have a parser, we can write the compiler proper. Because our lib.rs file would grow to ludicrous sizes if we kept adding more to it, let's move the Parser code out to a sub-module, parser.rs. We'll create a new one for the compiler module, too. Rust's module system is strictly hierarchical; modules can freely use items used or declared by their parents, but their children must make their own items explicitly visible from "upstairs" with pub. The compiler will need to be able to see the BrainfuckInstr type, and our compiler executable, main.rs, should have some idea what a SyntaxError is, so let's leave those where they are for now and just add a pub to SyntaxError. As for tests.rs, our tests are all parser-related, so we can make tests.rs a submodule of parser.rs by moving it into a new directory named parser/ and moving the #[cfg(test)] mod tests; module declaration into parser.rs. We'll do the same for compiler.rs and the compiler's tests. Our lib.rs now looks like this:

/// Module containing the parser code.
mod parser;
/// Module containing the compiler code.
mod compiler;

// trait derivations...
enum BrainfuckInstr {/* ... */}
// trait derivations...
pub enum SyntaxError {/* ... */}

If we wanted a really clean lib.rs, we could move BrainfuckInstr and SyntaxError into some sort of common module for code we expect multiple other modules to need, and put a use common::* where they used to be, but I think that would be art for art's sake when we're dealing with just two blocks of code.

We also make these additions to parser.rs:

#[cfg(test)]
mod tests;

use super::{BrainfuckInstr, SyntaxError};

The output of cargo test is now:

test parser::tests::test_newline_program ... ok
test parser::tests::test_unmatched_program ... ok
test parser::tests::test_premature_program ... ok

Nothing seems to have been broken by our reorganization, so we can move on to implementing the compiler. The compiler's job will be to take a pre-validated list of Brainfuck instructions and produce a string of NASM code.

Compiler functions

Obviously, we'll need some sort of function that goes over a collection of BrainfuckInstrs and returns a String we can write to an .asm file. We'll make it public and thus accessible from upstairs so we can eventually pull it into main.rs.

#[cfg(test)]
mod tests;
use super::BrainfuckInstr;
/// Transforms abstract Brainfuck instructions into assembly.
pub fn compile(code: &[BrainfuckInstr]) -> String {
    unimplemented!()
}

This will give us a compiler error, because a public function can only use types that are themselves public - otherwise we'd be violating the "parents can't grab children's private parts" rule.

That was a new low, even for you.

Ouch.

In any case, having made BrainfuckInstr public, we can move on to writing tests for our compiler function. What's something we expect all Brainfuck programs to do? We certainly expect that they should eventually exit (though they might loop forever, but we can't reliably predict that, smarter people have tried and called it the halting problem), so our expected output for our first test will definitely contain a sys_exit syscall. The other thing we expect Brainfuck programs to do is allocate 30000 bytes of memory at the start.

How did syscalls work, again?

We've already gone over the obsolete, 32-bit, plain old x86 way of making syscalls with a software interrupt. I've also told you that this is not how it's done anymore. What we are supposed to do on x86-64 Linux is to use the syscall instruction, which is faster for reasons beyond my ability to explain and interest in learning. However, the conversion from the old format to the new isn't as straightforward as we'd think - syscalls have different numbers in the new and old format. Luckily, there's a searchable table out there, thanks to a Mr. Filippo Valsorda. We can learn from it that to specify the number of the syscall you want, you put it in the 64-bit rax register. exit's is 60; its one argument, our status code, must be in rdi, also 64-bit, as the "R" indicates, and we'll always be putting a 0 there, since Brainfuck doesn't let the user exit early.

Below is a minimal program that statically allocates 30 kilobytes of memory (ie. they're baked into the executable) and exits:

section .data
memory times 30000 db 0
section .text
global _start
_start:
mov rdi,0
mov rax,60
syscall

I think the only really unfamiliar syntax here is times, which tells NASM that "declare byte with value 0" happens, well, 30000 times. If we play around with the contents of rdi and execute the code - you know the drill - we can check our return value with bash's special $? variable to see if it's what we wrote.

Wait, those 30 kilobytes of memory are just baked into the .o file and the executable? A simple exit call weighs how many kilobytes after linking? 38 kB? Preposterous!

The alternative would be asking the OS for memory from the heap through some mechanism like C's malloc. Afterwards, we'd have to hand it back with something equivalent to free, lest we leak memory. This would require syscalls, and even proper, x64 syscall syscalls are slower than ordinary code, since you have to wait for the OS to handle them, and the processor switches into kernel mode, and a bunch of other details happen.

Besides, it's 2021. If, at the time you're reading this, 30 kilobytes is considered a large amount of memory for a desktop application, Collapse OS might be more interesting and useful to you than this stuff.

Those aren't real kilobytes! A kilobyte has 1024 bytes! Let me yell and sound unreasonable while you calmly explain how I'm wrong!

Blame Microsoft for choosing a confusing standard for their units of size. Indeed, Windows UIs claim it's that way when they show you file sizes, but a kiloanything is still exactly a thousand anythings. A thousand and twenty-four anythings is a kibianything. Wikipedia can tell you more.

Anyway, let's save our assembly program, perhaps as compiler/alloc_exit.asm, and write our first compiler test:

/// What an empty program should look like.
const ALLOCATE_AND_EXIT: &str = include_str!("alloc_exit.asm");

use super::compile;
#[test]
fn test_empty_program() {
    assert_eq!(compile(&[]), ALLOCATE_AND_EXIT)
}
running 4 tests
test compiler::tests::test_empty_program ... FAILED
test parser::tests::test_unmatched_program ... ok
test parser::tests::test_premature_program ... ok
test parser::tests::test_newline_program ... ok

Unsurprising. Let's implement our compile() function.

Implementing compile()

We can divide any program we'll be producing into three portions:

  • memory allocation, global _start declaration, _start label
  • the program proper
  • the exit syscall

Only one of those is subject to change. We can simply create a file named alloc_start.asm for the start...

section .data
memory times 30000 db 0
section .text
global _start
_start:

Notice the trailing newline at the end - we need it so we can just add the contents of exit.asm and any code in between:

mov rdi,0
mov rax,60
syscall

Our logic from now on should be straightforward:

/// Instructions and directives found at the beginning of every program.
const ALLOCATE_AND_START: &str = include_str!("compiler/alloc_start.asm");
/// Ditto for the end.
const EXIT_SYSCALL: &str = include_str!("compiler/exit.asm");

/// Transforms abstract Brainfuck code into assembly.
pub fn compile(code: &[BrainfuckInstr]) -> String {
    let mut output = String::new();
    output.push_str(ALLOCATE_AND_START);
    for instruction in code {
        // Generate the rest of the damn code.
    }
    output.push_str(EXIT_SYSCALL);
    output
}

Done! Well, except for the "rest of the damn code" thing. We should probably write some sort of function that translates our instructions from our own internal representation. Let's write a stub:

/// Transforms abstract Brainfuck code into assembly.
pub fn compile(code: &[BrainfuckInstr]) -> String {
    let mut output = String::new();
    output.push_str(ALLOCATE_AND_START);
    for instruction in code {
        // Generate the rest of the damn code.
        output.push_str(translate_instruction(instruction));
        output.push('\n');
    }
    output.push_str(EXIT_SYSCALL);
    output
}
/// Transforms an individual Brainfuck instruction into an x86 one.
fn translate_instruction(instruction: &BrainfuckInstr) -> &str {
    unimplemented!()
}

Of course, we're not going to be implementing all the instructions just yet. Let's manually translate and write a test for a simple program, the empty newline hello world:

++++++++++ Add 10 to the current byte to make a newline or 0x0A byte
. Print it

What's happening here? Ten times, we increment the byte at our data pointer, then, one time, we call sys_write, syscall number 1, with the following arguments, right-to-left:

  • count - how many bytes we want to write, stored in rdx; this will be a constant 1
  • buf - the pointer to where we want to write from, stored in rsi; this is incidentally going to always be the same place in memory as our Brainfuck data pointer, they're the same thing as far as we're concerned.
  • fd - the file descriptor, stored in rdi; a constant 1, since we're going to be writing to the standard output

And we are in luck, because the arguments for sys_read, which we'll need to implement the Brainfuck command for reading a byte(,) later on, are exactly the same, except the syscall number is 0 and so is fd. These are all the syscalls we need to implement.

So, our assembly should look like this:

section .data
memory times 30000 db 0
section .text
global _start
_start:
mov rsi,memory ; this is our data pointer

inc byte [rsi] ; increment the value found at the memory address pointed to by our pointer
; The "byte" is necessary, as we don't have a type system in assembly
; so the assembler doesn't know what's supposed to be at some memory address, that's our job
; This is why smart people invented higher-level programming languages.
inc byte [rsi] ; All these are our "+" commands.
inc byte [rsi]
inc byte [rsi]
inc byte [rsi]
inc byte [rsi]
inc byte [rsi]
inc byte [rsi]
inc byte [rsi]
inc byte [rsi]

mov rdx,1
mov rdi,1
mov rax,1
syscall ; that was our "."

mov rdi,0
mov rax,60
syscall ; exit

If we compile and run it, it will indeed print a newline. I've also saved a version of it without the comments and empty lines as newline.asm for testing in the following test:

const NEWLINE: &str = include_str!("newline.asm");
use super::compile;
use super::BrainfuckInstr;
use BrainfuckInstr::*;
#[test]
fn test_newline_program() {
    let code = vec![
        DataInc, DataInc, DataInc, DataInc, DataInc,
        DataInc, DataInc, DataInc, DataInc, DataInc,
        PutByte
    ];
    assert_eq!(compile(&code), NEWLINE)
}

However, we haven't actually implemented any instruction translation yet! Also, mov rsi,memory should be in our alloc_start.asm file, which it now is. Let's implement DataInc and PutByte then:

/// Transforms an individual Brainfuck instruction into an x86 one.
fn translate_instruction(instruction: &BrainfuckInstr) -> &str {
    use BrainfuckInstr::*;
    match instruction {
        DataInc => "inc byte [rsi]",
        PutByte => "mov rdx,1\nmov rdi,1\nmov rax,1\nsyscall",
        _ => unimplemented!()
    }
}

Now our tests will pass again.

Wait, you're using rsi as your data pointer? Shouldn't we be using a 16-bit register, since we've only got 30k memory cells to keep track of?

One of my lecturers would have loved you. In between his rants on how military conscription should be brought back and his totally believable "anecdotes" about a friend of his who had robbed the bank he worked at by explointing rounding errors (as, by an amazing coincidence, did a friend of every other CS professor in the western world), he'd be demanding you learn the capacities of Java's primitive types by heart, because every Real Programmerâ„¢ can recite them in order if woken up at 3am.

The idea has merit, but it would be more trouble than it's worth - our syscalls for reading and writing expect 64-bit integers as their arguments anyway, so keeping another register around would just be pointless data duplication.

Translating all the Brainfuck instructions

Let's go over all the Brainfuck instructions and see which ones we have implemented:

  • [ ] move program pointer backward
  • [ ] move program pointer forward
  • [ ] decrement byte at pointer
  • [X] increment byte at pointer
  • [ ] get byte
  • [X] put byte
  • [ ] begin while loop
  • [ ] end while loop

The first six, of which two we already have, are simple one-liners (or rather canned four-liners in the case of the I/O instructions) which require no context to translate and should pose no challenge to us:

/// Transforms an individual Brainfuck instruction into an x86 one.
fn translate_instruction(instruction: &BrainfuckInstr) -> &str {
    use BrainfuckInstr::*;
    match instruction {
        PointerDec => "dec rsi",
        PointerInc => "inc rsi",
        DataDec => "dec byte [rsi]",
        DataInc => "inc byte [rsi]",
        GetByte => "mov rdx,1\nmov rdi,0\nmov rax,0\nsyscall",
        PutByte => "mov rdx,1\nmov rdi,1\nmov rax,1\nsyscall",
        _ => unimplemented!()
    }
}

You said something about context?

Oh, yes. Remember when we were writing the parser and we were counting how many levels of while loops we were on? The current function signature for translate_instruction doesn't really let us do that. First of all, it has no access to any mutable state, like the bracket counter we need to generate labels for our while loops to jump to. Second, it returns a &str, a reference to a slice of bytes guaranteed to be UTF-8 text, not an owned and runtime-heap-allocated String, as all we've been returning so far are totally static string constants that have no reason to change. If we created new Strings every time we ran into these very common commands, we'd be probably be slower than we could be (not that we'd notice with our short programs). The ugly, quick, hacky solution we'll use will be to embrace the mutable pointer parameters of C:

/// Transforms abstract Brainfuck code into assembly.
pub fn compile(code: &[BrainfuckInstr]) -> String {
    let mut output = String::new();
    output.push_str(ALLOCATE_AND_START);
    let mut bracket_counter: u64 = 0; // we don't need to worry about negative values since the parser has already errored out on them
    for instruction in code {
        // Generate the rest of the damn code.
        translate_instruction(instruction, &mut bracket_counter, &mut output);
    }
    output.push_str(EXIT_SYSCALL);
    output
}
/// Transforms an individual Brainfuck instruction into an x86 one and stores it in the output buffer.
fn translate_instruction(instruction: &BrainfuckInstr, bracket_counter: &mut u32, output: &mut String) {
    use BrainfuckInstr::*;
    output.push_str(match instruction {
        PointerDec => "dec rsi",
        PointerInc => "inc rsi",
        DataDec => "dec byte [rsi]",
        DataInc => "inc byte [rsi]",
        GetByte => "mov rdx,1\nmov rdi,0\nmov rax,0\nsyscall",
        PutByte => "mov rdx,1\nmov rdi,1\nmov rax,1\nsyscall",
        _ => unimplemented!()
    });
    output.push('\n');
}

Hey, we are writing systems software. It's supposed to be ugly so the user-facing code isn't.

But Brainfuck is ugly.

You have no taste. Moving on, here's what two nested Brainfuck while loops should look like in assembly:

cmp [rsi],0 ; is the current number 0?
je .end_0 ; Jump if Equal - jump to the end of while loop number 0 if it is
.while_0: ; define the (local) label for the beginning of the while loop

cmp [rsi],0
je .end_1
.while_1:
cmp [rsi],0
jne .while_1
.end_1:

cmp [rsi],0
jne .while_0 ; Jump if Not Equal - if our number still isn't 0, jump back to the beginning of the loop
.end_0: ; define label for end of loop

Anyway, here's how I ended up implementing the remaining two cases of translate_instruction:

WhileNonzero => {
    let asm = format!(
        "cmp [rsi],0\nje end_{x}\nwhile_{x}:",
        x = bracket_counter
    );
    // increment bracket counter
    *bracket_counter += 1;
    output.push_str(&asm);
    return // early return so this block doesn't have to evaluate to an expression
},
EndWhile => {
    let x = *bracket_counter - 1;
    let asm = format!(
        "cmp [rsi],0\njne while_{x}\nend_{x}:",
        x = x
    );
    // decrement bracket counter
    *bracket_counter -= 1;
    output.push_str(&asm);
    return
}

Groovy. How do we test this? Please tell me we're not going to write another program or test manually. Anything but that!

Hell no, that would just be cruel! We're going to actually export the Parser and compile and use them in main.rs to show us the code for a program I "borrowed" from Wikipedia, a Hello World.

To make Parser and compile available to main.rs, all we have to do is add these lines to lib.rs:

pub use parser::Parser;
pub use compiler::compile;

Except we'll get compiler errors, because Parser isn't actually public. Let's make it public and do the same for its methods, new and parse. Finally, let's write our main.rs:

use bfc_rs::{compile, Parser}; // if you have both lib.rs and main.rs in the same folder, Rust treats main.rs as a standalone program that needs to use lib.rs like a foreign "crate" (library)
const HELLO_WORLD: &str = include_str!("../hello_world.bf");
fn main() {
    let mut parser = Parser::new();
    let code = match parser.parse(HELLO_WORLD) {
        Ok(code) => code,
        Err(e) => {
            eprintln!("Syntax error: {:?}", e);
            std::process::exit(1);
        }
    };
    let asm = compile(&code);
    println!("{}", asm);
}

And pipe its output into a file:

$ cargo run > hello.asm
   Compiling bfc-rs v0.1.0 (/home/jan/Kod/bfc-rs)
    Finished dev [unoptimized + debuginfo] target(s) in 0.40s
     Running `target/debug/bfc-rs`

See? No warnings! I knew better than the compiler all along! Let's now invoke nasm on the file, invoke ld on the result and then launch the result of that:

$ nasm -f elf64 hello.asm -o hello.o && ld -o hello.out hello.o && ./hello.out
hello.asm:58: error: label `_start.while_1' inconsistently redefined
hello.asm:25: info: label `_start.while_1' originally defined here
hello.asm:62: error: label `_start.end_1' inconsistently redefined
hello.asm:46: info: label `_start.end_1' originally defined here

Wait, what?

Ha-ha! You suck! Not so smug and condescending now, are you?

Just wait five minutes.

What happened here is that the bracket counter doesn't remember all the loops in the program, only how many levels we're currently on, resulting in label names being inappropriately reused. This can easily be fixed with a stack data structure (aka. the push and pop methods of a bog-standard Rust Vec), making translate_instruction even messier:

pub fn compile(code: &[BrainfuckInstr]) -> String {
    // ...
    let mut total_loops = 0;
    let mut loop_no_stack = Vec::new();
    for instruction in code {
        // Generate the rest of the damn code.
        translate_instruction(
            instruction,
            &mut total_loops,
            &mut loop_no_stack,
            &mut output
        );
    }
    // ...
}
fn translate_instruction(instruction: &BrainfuckInstr, total_loops: &mut u32, loop_no_stack: &mut Vec<u32>, output: &mut String) {
    // ...
    WhileNonzero => {
        *total_loops += 1;
        loop_no_stack.push(*total_loops);
        let asm = format!(
            "cmp [rsi],0\nje .end_{x}\n.while_{x}:\n",
            x = total_loops
        );
        output.push_str(&asm);
        return // early return so this block doesn't have to evaluate to an expression
    },
    EndWhile => {
        let asm = format!(
            "cmp [rsi],0\njne .while_{x}\n.end_{x}:\n",
            x = loop_no_stack.pop().expect("There should be a loop number on the stack at this point.")
        );
        output.push_str(&asm);
        return // ditto
    }
    // ...
}

I'm not even done with my bachelor's degree yet and my classical CS education is already paying off. Ha! Let's try again:

hello.asm:15: error: operation size not specified
hello.asm:23: error: operation size not specified
hello.asm:44: error: operation size not specified
hello.asm:56: error: operation size not specified
hello.asm:60: error: operation size not specified
hello.asm:65: error: operation size not specified

What the...?

*amused snickering*

Alright, alright, I admit it, I'm a moron, but I compensate with knowledge and vocabulary so I seem smarter than I really am. Satisfied? When we look at the line numbers listed, we see that I forgot to specify the size of the number the cmp instruction operates on. You remember I've said I don't actually know assembly, right? Anyway, let's correct that mistake in translate_instruction by Ctrl+H-ing all two instances of cmp [rsi] to cmp byte [rsi]...

$ nasm -f elf64 hello.asm -o hello.o && ld -o hello.out hello.o && ./hello.out
Hello World!

Well, well. Time to make this thing accept any file, not just one hardcoded one. We'll use the argh argument parser like we did in hwdpc.

use bfc_rs::{compile, Parser, SyntaxError};
use argh::FromArgs;
#[derive(FromArgs)]
/// BFC-RS v0.1
struct App {
    #[argh(positional, description = "source filename")]
    source_filename: String,
    #[argh(option, short = 'o', description = "output filename (must be provided unless --dump-nasm is explicitly passed)")]
    output_filename: Option<String>,
    #[argh(switch, description = "instead of compiling, print raw NASM output to stdout for debugging")]
    dump_nasm: bool,
    #[argh(switch, description = "do not clean up build directory after successful build")]
    no_cleanup: bool
}

fn main() {
    use std::process::{exit, Command};
    let opts: App = argh::from_env();
    let source_code: String = match std::fs::read_to_string(&opts.source_filename) {
        Ok(code) => code,
        Err(_) => {
            eprintln!("Error: Could not read the file {}.", &opts.source_filename);
            exit(1);
        }
    };
    let mut parser = Parser::new();
    let code = match parser.parse(&source_code) {
        Ok(code) => code,
        Err(e) => {
            match e {
                SyntaxError::UnclosedWhile(l, i) => {
                    eprintln!("Syntax error: unclosed while loop; [ without corresponding ] at {}:{}:{}.", &opts.source_filename, l, i);
                },
                SyntaxError::PrematureEndWhile(l, i) => {
                    eprintln!("Syntax error: closure of nonexistent while loop; ] without corresponding [ at {}:{}:{}.", &opts.source_filename, l, i);
                }
            }
            exit(1);
        }
    };
    let asm = compile(&code);
    if opts.dump_nasm {
        println!("{}", asm);
        return
    } else {
        if opts.output_filename.is_none() {
            eprintln!("Error: no output filename was provided.");
            exit(1);
        }
    }
    if std::fs::create_dir_all("build").is_err() {
        eprintln!("Error: Could not create build directory. Are you sure you have write permissions on this directory?");
        exit(1);
    }
    if std::fs::write("build/build.asm", &asm).is_err() {
        eprintln!("Error: Could not write NASM assembly. Are you sure you have write permissions on the build directory?");
        exit(1);
    }

    let assembler = match Command::new("nasm")
    .args(&["-f", "elf64", "build/build.asm", "-o", "build/build.o"])
    .status() {
        Ok(o) => o,
        Err(_) => {
            eprintln!("Error: Could not launch NASM assembler. Are you sure it's installed?");
            exit(1);
        }
    };
    if !assembler.success() {
        eprintln!("Error: NASM assembler exited with non-0 status code. This is likely an error in the generated code. Inspect build/build.asm to find out more.");
        exit(1);
    }
    let linker = match Command::new("ld")
    .args(&["-s", "-o", &opts.output_filename.expect("Output filename should have been checked by now"), "build/build.o"])
    .status() {
        Ok(o) => o,
        Err(_) => {
            eprintln!("Error: Could not launch `ld` linker. Are you sure it's installed?");
            exit(1);
        }
    };
    if !linker.success() {
        eprintln!("Error: `ld` linker exited with non-0 status code.");
        exit(1);
    }
    if !opts.no_cleanup {
        std::fs::remove_dir_all("build").expect("Expected to be able to remove build directory; was able to create it");
    }
}

The moment of truth

Up until now, we've run programs with every Brainfuck instruction in them but ,. Let's try to build, among others, a short echo program :,[.,]

$ target/debug/bfc-rs unmatched.bf --dump-nasm
Syntax error: unclosed while loop; [ without corresponding ] at unmatched.bf:1:8.
$ target/debug/bfc-rs premature.bf --dump-nasm
Syntax error: closure of nonexistent while loop; ] without corresponding [ at premature.bf:2:1.
$ target/debug/bfc-rs hello_world.bf -o hello_world
$ ./hello_world
Hello World!
$ target/debug/bfc-rs echo.bf -o echo
$ ./echo
adda
adda
ayy
ayy
^C

Oh, yeah! We now officially have a feature-complete Brainfuck compiler for x86 Linux. This merits a version number bump in Cargo.toml and our argh doc-comment:

[package]
name = "bfc-rs"
version = "1.0.0"
$ target/debug/bfc-rs --help
Usage: bfc-rs <source_filename> [-o <output-filename>] [--dump-nasm] [--no-cleanup]

BFC-RS v1.0.0

Options:
  -o, --output-filename
                    output filename (must be provided unless --dump-nasm is
                    explicitly passed)
  --dump-nasm       instead of compiling, print raw NASM output to stdout for
                    debugging
  --no-cleanup      do not clean up build directory after successful build
  --help            display usage information

Conclusion

We can now say that we have written a compiler for a Turing-complete language and learned the x86 assembly instructions syscall, dec, inc, cmp, je and jne.

This post was really long.

No shit. Implementing a compiler from scratch and describing every step of the way is a lot of work.

But it isn't really all that impressive. All we do in this language is increment and decrement numbers, and the generated code looks ugly and repetitive even for assembly. Can't we do anything more exciting with this?

Indeed we can, and we will. In the next post we'll extend BFC-RS to do some things that serious compilers for big boys do which make programs take up less space and run faster. But I'm tired.