December of Rust Project, Part 2: The Assembler Macro

Welcome back! This is the second post of a series which turned out to be more occasional than I thought it would be. You might remember that I originally called it December of Rust 2021. Look how that worked out! Not only is it not December 2021 anymore, but also it is not December 2022 anymore.

In the previous installment, I wrote about writing a virtual machine for the LC-3 architecture in Rust, that was inspired by Andrei Ciobanu’s blog post Writing a simple 16-bit VM in less than 125 lines of C. The result was a small Rust VM with a main loop based on the bitmatch crate.

By the end of the previous post, the VM seemed to be working well. I had tested it with two example programs from Andrei Ciobanu’s blog post, and I wanted to write more programs to see if they would work as well. Unfortunately, the two test programs were tedious to create; as you might remember, I had to write a program that listed the hex code of each instruction and wrote the machine code to a file. And that was even considering that Andrei had already done the most tedious part, of assembling the instructions by hand into hex codes! Not to mention that when I wanted to add one instruction to one of the test programs, I had to adjust the offset in a neighbouring LEA instruction, which is a nightmare for debugging. This was just not going to be good enough to write any more complicated programs, because (1) I don’t have time for that, (2) I don’t have time for that, and (3) computers are better at this sort of work anyway.

Close-up photo of a vintage-looking Commodore computer
The real question is, can the LC-3 running on modern hardware emulate this badboy in all its Eurostile Bold Extended glory?? (Image by Viktorya Sergeeva, Pexels)

In this post, I will tell the story of how I wrote an assembler for the LC-3. The twist is, instead of making a standalone program that processes a text file of LC-3 assembly language into machine code1, I wanted to write it as a Rust macro, so that I could write code like this, and end up with an array of u16 machine code instructions:

asm! {
            .ORIG x3000
            LEA R0, hello  ; load address of string
            PUTS           ; output string to console
            HALT
    hello:  .STRINGZ "Hello World!\n"
            .END
}

(This example is taken from the LC-3 specification, and I’ll be using it throughout the post as a sample program.)

The sample already illustrated some features I wanted to have in the assembly language: instructions, of course; the assembler directives like .ORIG and .STRINGZ that don’t translate one-to-one to instructions; and most importantly, labels, so that I don’t have to compute offsets by hand.

Learning about Rust macros

This could be a foolish plan. I’ve written countless programs over the years that process text files, but I have never once written a Rust macro and have no idea how they work. But I have heard they are powerful and can be used to create domain-specific languages. That sounds like it could fit this purpose.

Armed with delusional it’s-all-gonna-work-out overconfidence, I searched the web for “rust macros tutorial” and landed on The Little Book of Rust Macros originally by Daniel Keep and updated by Lukas Wirth. After browsing this I understood a few facts about Rust macros:

They consist of rules, which match many types of single or repeated Rust tokens against patterns.2 So, I should be able to define rules that match the tokens that form the LC-3 assembly language.

They can pick out their inputs from among any other tokens. You provide these other tokens in the input matching rule, so you could do for example:

macro_rules! longhand_add {
    ($a:literal plus $b:literal) => { $a + $b };
}

let two = longhand_add!{ 1 plus 1 };

This is apparently how you can create domain-specific languages with Rust macros, because the tokens you match don’t have to be legal Rust code; they just have to be legal tokens. In other words, plus is fine and doesn’t have to be the name of anything in the program; but foo[% is not.

They substitute their inputs into the Rust code that is in the body of the rule. So really, in the end, macros are a way of writing repetitive code without repeating yourself.

A tangent about C macros

This last fact actually corresponds with one of the uses of C macros. C macros are useful for several purposes, for which the C preprocessor is a drastically overpowered and unsafe tool full of evil traps. Most of these purposes have alternative, less overpowered, techniques for achieving them in languages like Rust or even C++. First, compile-time constants:

#define PI 3.1416

for which Rust has constant expressions:

const PI: f64 = 3.1416;

Second, polymorphic “functions”:

#define MIN(a, b) (a) <= (b) ? (a) : (b)

for which Rust has… well, actual polymorphic functions:

fn min<T: std::cmp::PartialOrd>(a: T, b: T) -> T {
    if a <= b { a } else { b }
}

Third, conditional compilation:

#ifdef WIN32
int read_key(void) {
    // ...
}
#endif  // WIN32

for which Rust has… also conditional compilation:

#[cfg(windows)]
fn read_key() -> i32 {
    // ...
}

Fourth, redefining syntax, as mentioned above:

#define plus +
int two = 1 plus 1;

which in C you should probably never do except as a joke. But in Rust (as in the longhand_add example from earlier) at least you get a clue about what is going on because of the the longhand_add!{...} macro name surrounding the custom syntax; and the plus identifier doesn’t leak out into the rest of your program.

Lastly, code generation, which is what we want to do here in the assembler. In C it’s often complicated and this tangent is already long enough, but if you’re curious, here and here is an example of using a C preprocessor technique called X Macros to generate code that would otherwise be repetitive to write. In C, code generation using macros is a way of trading off less readability (because macros are complicated) for better maintainability (because in repeated blocks of very similar code it’s easy to make mistakes without noticing.) I imagine in Rust the tradeoff is much the same.

Designing an LC-3 assembler macro

You may remember in the previous post, in order to run a program with the VM, I had to write a small, separate Rust program to write the hand-assembled words to a file consisting of LC-3 bytecode. I could then load and run the file with the VM’s ld_img() method.

I would like to be able to write a file with the assembler macro, but I would also like to be able to write assembly language directly inline and execute it with the VM, without having to write it to a file. Something like this:

fn run_program() -> Result<()> {
    let mut vm = VM::new();
    vm.ld_asm(&asm! {
                .ORIG x3000
                LEA R0, hello  ; load address of string
                PUTS           ; output string to console
                HALT
        hello:  .STRINGZ "Hello World!\n"
                .END
    }?);
    vm.start()
}

My first thought was that I could have the asm macro expand to an array of LC-3 bytecodes. However, writing out a possible implementation for the VM.ld_asm() method shows that the asm macro needs to give two pieces of data: the origin address as well as the bytecodes.

pub fn ld_asm(&mut self, ???) {
    let mut addr = ???origin???;
    for inst in ???bytecodes??? {
        self.mem[addr] = Wrapping(*inst);
        addr += 1;
    }
}

So, it seemed better to have the asm macro expand to an expression that creates a struct with these two pieces of data in it. I started an assembler.rs submodule and called this object assembler::Program.

#[derive(Debug)]
pub struct Program {
    origin: u16,
    bytecode: Vec<u16>,
}

impl Program {
    pub fn origin(&self) -> usize {
        self.origin as usize
    }

    pub fn bytecode(&self) -> &[u16] {
        &self.bytecode
    }
}

Next, I needed to figure out how to get from LC-3 assembly language to the data model of Program. Obviously I needed the address to load the program into (origin), which is set by the .ORIG directive. But I also needed to turn the assembly language text into bytecodes somehow. Maybe the macro could do this … but at this point, my hunch from reading about Rust macros was that the macro should focus on transforming the assembly language into valid Rust code, and not so much on processing. Processing can be done in a method of Program using regular Rust code, not macro code. So the macro should just extract the information from the assembly language: a list of instructions and their operands, and a map of labels to their addresses (“symbol table”).3

#[derive(Clone, Copy, Debug)]
pub enum Reg { R0, R1, R2, R3, R4, R5, R6, R7 }

#[derive(Debug)]
pub enum Inst {
    Add1(/* dst: */ Reg, /* src1: */ Reg, /* src2: */ Reg),
    Add2(/* dst: */ Reg, /* src: */ Reg, /* imm: */ i8),
    And1(/* dst: */ Reg, /* src1: */ Reg, /* src2: */ Reg),
    And2(/* dst: */ Reg, /* src: */ Reg, /* imm: */ i8),
    // ...etc
    Trap(u8),
}

pub type SymbolTable = MultiMap<&'static str, u16>;

With all this, here’s effectively what I want the macro to extract out of the sample program I listed near the beginning of the post:

let origin: u16 = 0x3000;
let instructions = vec![
    Inst::Lea(Reg::R0, "hello"),
    Inst::Trap(0x22),
    Inst::Trap(0x25),
    Inst::Stringz("Hello world!\n"),
];
let symtab: SymbolTable = multimap!(
    "hello" => 0x3003,
);

(Remember from Part 1 that PUTS and HALT are system subroutines, called with the TRAP instruction.)

As the last step of the macro, I’ll then pass these three pieces of data to a static method of Program which will create an instance of the Program struct with the origin and bytecode in it:

Program::assemble(origin, &instructions, &symtab)

You may be surprised that I picked a multimap for the symbol table instead of just a map. In fact I originally used a map. But it’s possible for the assembly language code to include the same label twice, which is an error. I found that handling duplicate labels inside the macro made it much more complicated, whereas it was easier to handle errors in the assemble() method. But for that, we have to store two copies of the label in the symbol table so that we can determine later on that it is a duplicate.

Demystifying the magic

At this point I still hadn’t sat down to actually write a Rust macro. Now that I knew what I want the macro to achieve, I could start.4

The easy part was that the assembly language code should start with the .ORIG directive, to set the address at which to load the assembled bytecode; and end with the .END directive. Here’s a macro rule that does that:

(
    .ORIG $orig:literal
    $(/* magic happens here: recognize at least one asm instruction */)+
    .END
) => {{
    use $crate::assembler::{Inst::*, Program, Reg::*, SymbolTable};
    let mut instructions = Vec::new();
    let mut symtab: SymbolTable = Default::default();
    let origin: u16 = $orig;
    $(
        // more magic happens here: push a value into `instructions` for
        // each instruction recognized by the macro, and add a label to
        // the symbol table if there is one
    )*
    Program::assemble(origin, &instructions, &symtab)
}};

Easy, right? The hard part is what happens in the “magic”!

You might notice that the original LC-3 assembly language’s .ORIG directive looks like .ORIG x3000, and x3000 is decidedly not a Rust numeric literal that can be assigned to a u16.

At this point I had to decide what tradeoffs I wanted to make in the macro. Did I want to support the LC-3 assembly language from the specification exactly? It looked like I might be able to do that, x3000-formatted hex literals and all, if I scrapped what I had so far and instead wrote a procedural macro5, which operates directly on a stream of tokens from the lexer. But instead, I decided that my goal would be to support a DSL that looks approximately like the LC-3 assembly language, without making the macro too complicated.

In this case, “not making the macro too complicated” means that hex literals are Rust hex literals (0x3000 instead of x3000) and decimal literals are Rust decimal literals (15 instead of #15). That was good enough for me.

Next I had to write a matcher that would match each instruction. A line of LC-3 assembly language looks like this:6

instruction := [ label : ] opcode [ operand [ , operand ]* ] [ ; comment ] \n

So I first tried a matcher like this:

$($label:ident:)? $opcode:ident $($operands:expr),* $(; $comment:tt)

There are a few problems with this. The most pressing one is that “consume tokens until newline” is just not a thing in Rust macro matchers, so it’s not possible to ignore comments like this. Newlines are just treated like any other whitespace. There’s also no fragment specifier7 for “any token”; the closest is tt but that matches a token tree, which is not actually what I want here — I think it would mean the comment has to be valid Rust code, for one thing!

Keeping my tradeoff philosophy in mind, I gave up quickly on including semicolon-delimited comments in the macro. Regular // and /* comments would work just fine without even having to match them in the macro. Instead, I decided that each instruction would end with a semicolon, and that way I’d also avoid the problem of not being able to match newlines.

$($label:ident:)? $opcode:ident $($operands:expr),*;

The next problem is that macro matchers cannot look ahead or backtrack, so $label and $opcode are ambiguous here. If we write an identifier, it could be either a label or an opcode and we won’t know until we read the next token to see if it’s a colon or not; which is not allowed. So I made another change to the DSL, to make the colon come before the label.

With this matcher expression, I could write more of the body of the macro rule:8

(
    .ORIG $orig:literal;
    $($(:$label:ident)? $opcode:ident $($operands:expr),*;)+
    .END;
) => {{
    use $crate::assembler::{Inst::*, Program, Reg::*, SymbolTable};
    let mut instructions = Vec::new();
    let mut symtab: SymbolTable = Default::default();
    let origin: u16 = $orig;
    $(
        $(symtab.insert(stringify!($label), origin + instructions.len() as u16);)*
        // still some magic happening here...
    )*
    Program::assemble(origin, &instructions, &symtab)
}};

For each instruction matched by the macro, we insert its label (if there is one) into the symbol table to record that it should point to the current instruction. Then at the remaining “magic”, we have to insert an instruction into the code vector. I originally thought that I could do something like instructions.push($opcode($($argument),*));, in other words constructing a value of Inst directly. But that turned out to be impractical because the ADD and AND opcodes actually have two forms, one to do the operation with a value from a register, and one with a literal value. This means we actually need two different arms of the Inst enum for each of these instructions, as I listed above:

Add1(Reg, Reg, Reg),
Add2(Reg, Reg, i8),

I could have changed it so that we have to write ADD1 and ADD2 inside the asm! macro, but that seemed to me too much of a tradeoff in the wrong direction; it meant that if you wanted to copy an LC-3 assembly language listing into the asm! macro, you’d need to go over every ADD instruction and rename it to either ADD1 or ADD2, and same for AND. This would be a bigger cognitive burden than just mechanically getting the numeric literals in the right format.

Not requiring a 1-to-1 correspondence between assembly language opcodes and the Inst enum also meant I could easily define aliases for the trap routines. For example, HALT could translate to Inst::Trap(0x25) without having to define a separate Inst::Halt.

But then what to put in the “magic” part of the macro body? It seemed to me that another macro expansion could transform LEA R0, hello into Inst::Lea(Reg::R0, "hello")! I read about internal rules in the Little Book, and they seemed like a good fit for this.

So, I replaced the magic with this call to an internal rule @inst:

instructions.push(asm! {@inst $opcode $($operands),*});

And I wrote wrote a series of @inst internal rules, each of which constructs an arm of the Inst enum, such as:

(@inst ADD $dst:expr, $src:expr, $imm:literal) => { Add2($dst, $src, $imm) };
(@inst ADD $dst:expr, $src1:expr, $src2:expr)  => { Add1($dst, $src1, $src2) };
// ...
(@inst HALT)                                   => { Trap(0x25) };
// ...
(@inst LEA $dst:expr, $lbl:ident)              => { Lea($dst, stringify!($lbl)) };

Macro rules have to be written from most specific to least specific, so the rules for ADD first try to match against a literal in the third operand (e.g. ADD R0, R1, -1) and construct an Inst::Add2, and otherwise fall back to an Inst::Add1.

But unfortunately I ran into another problem here. The $lbl:ident in the LEA rule is not recognized. I’m still not 100% sure why this is, but the Little Book’s section on fragment specifiers says,

Capturing with anything but the ident, lifetime and tt fragments will render the captured AST opaque, making it impossible to further match it with other fragment specifiers in future macro invocations.

So I suppose this is because we capture the operands with $($operands:expr),*. I tried capturing them as token trees (tt) but then the macro becomes ambiguous because token trees can include the commas and semicolons that I’m using for delimitation. So, I had to rewrite the rules for opcodes that take a label as an operand, like this:

(@inst LEA $dst:expr, $lbl:literal) => { Lea($dst, $lbl) };

and now we have to write them like LEA R0, "hello" (with quotes). This is the one thing I wasn’t able to puzzle out to my satisfaction, that I wish I had been.

Finally, after writing all the @inst rules I realized I had a bug. When adding the address of a label into the symbol table, I calculated the current value of the program counter (PC) with origin + code.len(). But some instructions will translate into more than one word of bytecode: BLKW and STRINGZ.9 BLKW 8, for example, reserves a block of 8 words. This would give an incorrect address for a label occurring after any BLKW or STRINGZ instruction.

To fix this, I wrote a method for Inst to calculate the instruction’s word length:

impl Inst {
    pub fn word_len(&self) -> u16 {
        match *self {
            Inst::Blkw(len) => len,
            Inst::Stringz(s) => u16::try_from(s.len()).unwrap(),
            _ => 1,
        }
    }
}

and I changed the macro to insert the label into the symbol table pointing to the correct PC:10

$(
    symtab.insert(
        stringify!($lbl),
        origin + instructions.iter().map(|i| i.word_len()).sum::<u16>(),
    );
)*

At this point I had something that looked and worked quite a lot like how I originally envisioned the inline assembler. For comparison, the original idea was to put the LC-3 assembly language directly inside the macro:

asm! {
            .ORIG x3000
            LEA R0, hello  ; load address of string
            PUTS           ; output string to console
            HALT
    hello:  .STRINGZ "Hello World!\n"
            .END
}

Along the way I needed a few tweaks to avoid making the macro too complicated, now I had this:

asm! {
            .ORIG 0x3000;
            LEA R0, "hello";  // load address of string
            PUTS;             // output string to console
            HALT;
    :hello  STRINGZ "Hello World!\n";
            .END;
}

Just out of curiosity, I used cargo-expand (as I did in Part One) to expand the above use of the asm! macro, and I found it was quite readable:

A retro-looking cassette tape labeled "COMPUTAPE"
Now we’re ready to save the bytecode to tape. (Image by Bruno /Germany from Pixabay)
{
    use crate::assembler::{Inst::*, Program, Reg::*, SymbolTable};
    let mut instructions = Vec::new();
    let mut symtab: SymbolTable = Default::default();
    let origin: u16 = 0x3000;
    instructions.push(Lea(R0, "hello"));
    instructions.push(Trap(0x22));
    instructions.push(Trap(0x25));
    symtab.insert("hello", origin + instructions.iter().map(|i| i.word_len()).sum::<u16>());
    code.push(Stringz("Hello World!\n"));
    Program::assemble(origin, &instructions, &symtab)
}

Assembling the bytecode

I felt like the hard part was over and done with! Now all I needed was to write the Program::assemble() method. I knew already that the core of it would work like the inner loop of the VM in Part One of the series, only in reverse. Instead of using bitmatch to unpack the instruction words, I matched on Inst and used bitpack to pack the data into instruction words. Most of them were straightforward:

match inst {
    Add1(dst, src1, src2) => {
        let (d, s, a) = (*dst as u16, *src1 as u16, *src2 as u16);
        words.push(bitpack!("0001_dddsss000aaa"));
    }
    // ...etc

The instructions that take a label operand needed a bit of extra work. I had to look up the label in the symbol table, compute an offset relative to the PC, and pack that into the instruction word. This process may produce an error: the label might not exist, or might have been given more than once, or the offset might be too large to pack into the available bits (in other words, the instruction is trying to reference a label that’s too far away.)

fn integer_fits(integer: i32, bits: usize) -> Result<u16, String> {
    let shift = 32 - bits;
    if integer << shift >> shift != integer {
        Err(format!(
            "Value x{:04x} is too large to fit in {} bits",
            integer, bits
        ))
    } else {
        Ok(integer as u16)
    }
}

fn calc_offset(
    origin: u16,
    symtab: &SymbolTable,
    pc: u16,
    label: &'static str,
    bits: usize,
) -> Result<u16, String> {
    if let Some(v) = symtab.get_vec(label) {
        if v.len() != 1 {
            return Err(format!("Duplicate label \"{}\"", label));
        }
        let addr = v[0];
        let offset = addr as i32 - origin as i32 - pc as i32 - 1;
        Self::integer_fits(offset, bits).map_err(|_| {
            format!(
                "Label \"{}\" is too far away from instruction ({} words)",
                label, offset
            )
        })
    } else {
        Err(format!("Undefined label \"{}\"", label))
    }
}

The arm of the core match expression for such an instruction, for example LEA, looks like this:

Lea(dst, label) => {
    let d = *dst as u16;
    let o = Self::calc_offset(origin, symtab, pc, label, 9)
        .unwrap_or_else(append_error);
    words.push(bitpack!("1110_dddooooooooo"));
}

Here, append_error is a closure that pushes the error message returned by calc_offset() into an array: |e| { errors.push((origin + pc, e)); 0 }

Lastly, a couple of arms for the oddball instructions that define data words, not code:

Blkw(len) => words.extend(vec![0; *len as usize]),
Fill(v) => words.push(*v),
Stringz(s) => {
    words.extend(s.bytes().map(|b| b as u16));
    words.push(0);
}

At the end of the method, if there weren’t any errors, then we successfully assembled the program:

if words.len() > (0xffff - origin).into() {
    errors.push((0xffff, "Program is too large to fit in memory".to_string()));
}
if errors.is_empty() {
    Ok(Self {
        origin,
        bytecode: words,
    })
} else {
    Err(AssemblerError { errors })
}

Bells and whistles

Next thing was to add a few extensions to the assembly language to make writing programs easier. (Writing programs is what I’m going to cover in Part 3 of the series.)

While researching the LC-3 for Part 1, I found a whole lot of lab manuals and other university course material. No surprise, since the LC-3 is originally from a textbook. One document I stumbled upon was “LC3 Language Extensions” from Richard Squier’s course material at Georgetown. In it are a few handy aliases for opcodes:

  • MOV R3, R5 – copy R3 into R5; can be implemented as ADD R3, R5, 0, i.e. R5 = R3 + 0
  • ZERO R2 – clear (store zero into) R2; can be implemented as AND R2, R2, 0
  • INC R4 – increment (add one to) R4; can be implemented as ADD R4, R4, 1
  • DEC R4 – decrement (subtract one from) R4; can be implemented as ADD R4, R4, -1

Besides these, the LC-3 specification itself names RET as an alias for JMP R7. Finally, an all-zero instruction word is a no-op so I defined an alias NOP for it.11

These are straightforward to define in the macro:

(@inst MOV $dst:expr, $src:expr) => { Add2($dst, $src, 0) };
(@inst ZERO $dst:expr)           => { And2($dst, $dst, 0) };
(@inst INC $dst:expr)            => { Add2($dst, $dst, 1) };
(@inst DEC $dst:expr)            => { Add2($dst, $dst, -1) };
(@inst RET)                      => { Jmp(R7) };
(@inst NOP)                      => { Fill(0) };

I wrote the last one as Fill(0) and not as Br(false, false, false, 0) which might have been more instructive, because Br takes a &' static str for its last parameter, not an address. So I would have had to make a dummy label in the symbol table pointing to address 0. Filling a zero word seemed simpler and easier.

The final improvement I wanted was to have AssemblerError print nice error messages. I kind of glossed over AssemblerError earlier, but it is an implementation of Error that contains an array of error messages with their associated PC value:

#[derive(Debug, Clone)]
pub struct AssemblerError {
    errors: Vec<(u16, String)>,
}
impl error::Error for AssemblerError {}

I implemented Display such that it would display each error message alongside a nice hex representation of the PC where the instruction failed to assemble:

impl fmt::Display for AssemblerError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        writeln!(f, "Error assembling program")?;
        for (pc, message) in &self.errors {
            writeln!(f, "  x{:04x}: {}", pc, message)?;
        }
        Ok(())
    }
}

This still left me with a somewhat unsatisfying mix of kinds of errors. Ideally, the macro would catch all possible errors at compile time!

At compile time we can catch several kinds of errors:

// Unsyntactical assembly language program
asm!{ !!! 23 skidoo !!! }
// error: no rules expected the token `!`

HCF;  // Nonexistent instruction mnemonic
// error: no rules expected the token `HCF`

ADD R3, R2;  // Wrong number of arguments
// error: unexpected end of macro invocation
// note: while trying to match `,`
//     (@inst ADD $dst:expr, $src:expr, $imm:literal)  => { Add2($dst, $sr...
//                                    ^

ADD R3, "R2", 5;  // Wrong type of argument
// error[E0308]: mismatched types
//     ADD R3, "R2", 5;
//             ^^^^ expected `Reg`, found `&str`
//     ...al)  => { Add2($dst, $src, $imm) };
//                  ---- arguments to this enum variant are incorrect
// note: tuple variant defined here
//     Add2(/* dst: */ Reg, /* src: */ Reg, /* imm: */ i8),
//     ^^^^

BR R7;  // Wrong type of argument again
// error: no rules expected the token `R7`
// note: while trying to match meta-variable `$lbl:literal`
//     (@inst BR $lbl:literal)                         => { Br(true, true,...
//               ^^^^^^^^^^^^

These error messages from the compiler are not ideal — if I had written a dedicated assembler, I’d have made it output better error messages — but they are not terrible either.

Then there are some errors that could be caught at compile time, but not with this particular design of the macro. Although note that saying an error is caught at runtime is ambiguous here. Even if the Rust compiler doesn’t flag the error while processing the macro, we can still flag it at the time of the execution of assemble() — this is at runtime for the Rust program, but at compile time for the assembly language. It’s different from a LC-3 runtime error where the VM encounters an illegal opcode such as 0xDEAD during execution.

Anyway, this sample program contains one of each such error and shows the nice output of AssemblerError:

asm! {
            .ORIG 0x3000;
            LEA R0, "hello";   // label is a duplicate
            PUTS;
            LEA R0, "greet";   // label doesn't exist
            PUTS;
            LEA R0, "salute";  // label too far away to fit in offset
            ADD R3, R2, 0x7f;  // immediate value is out of range
            HALT;
    :hello  STRINGZ "Hello World!\n";
    :hello  STRINGZ "Good morning, planet!\n";
            BLKW 1024;
    :salute STRINGZ "Regards, globe!\n";
            BLKW 0xffff;  // extra space makes the program too big
            .END;
}?
// Error: Error assembling program
//   x3000: Duplicate label "hello"
//   x3002: Undefined label "greet"
//   x3004: Label "salute" is too far away from instruction (1061 words)
//   x3005: Value x007f is too large to fit in 5 bits
//   xffff: Program is too large to fit in memory

To check that all the labels are present only once, you need to do two passes on the input. In fact, the macro effectively does do two passes: one in the macro rules where it populates the symbol table, and one in assemble() where it reads the values back out again. But I don’t believe it’d be possible to do two passes in the macro rules themselves, to get compile time checking for this.

The out-of-range value in ADD R3, R2, 0x7f is an interesting case though! This could be caught at compile time if Rust had arbitrary bit-width integers.12 After all, TRAP -1 and TRAP 0x100 are caught at compile time because the definition of Inst::Trap(u8) does not allow you to construct one with those literal values.

I tried using types from the ux crate for this, e.g. Add2(Reg, Reg, ux::i5). But there is no support for constructing custom integer types from literals, so I would have had to use try_from() in the macro — in which case I wouldn’t get compile time errors anyway, so I didn’t bother.

My colleague Andreu Botella suggested that I could make out-of-range immediate values a compile time error by using a constant expression — something I didn’t know existed in Rust.

(@int $bits:literal $num:literal) => {{
    const _: () = {
        let converted = $num as i8;
        let shift = 8 - $bits;
        if converted << shift >> shift != converted {
            panic!("Value is too large to fit in bitfield");
        }
    };
    $num as i8
}};

(@inst ADD $dst:expr, $src:expr, $imm:literal)  => { Add2($dst, $src, asm!(@int 5 $imm)) };

I found this really clever! But on the other hand, it made having a good error message quite difficult. panic! in a constant expression cannot format values, it can only print a string literal. So you get a compile time error like this:

error[E0080]: evaluation of constant value failed
    asm! { .ORIG 0x3000; ADD R0, R0, 127; .END; }.unwrap_err();
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the evaluated program panicked at 'Value is too large to fit in bitfield'

The whole assembly language program is highlighted as the source of the error, and the message doesn’t give any clue which value is problematic. This would make it almost impossible to locate the error in a large program with many immediate values. For this reason, I decided not to adopt this technique. I found the runtime error preferable because it gives you the address of the instruction, as well as the offending value. But I did learn something!

Conclusion

At this point I was quite happy with the inline assembler macro! I was able to use it to write programs for the LC-3 without having to calculate label offsets by hand, which is all I really wanted to do in the first place. Part 3 of the series will be about some programs that I wrote to test the VM (and test my understanding of it!)

I felt like I had successfully demystified Rust macros for myself now, and would be able to write another macro if I needed to. I appreciated having the chance to gain that understanding while working on a personal project that caught my interest.13 This is — for my learning style, at least — my ideal way to learn. I hope that sharing it helps you too.

Finally, you should know that this writeup presents an idealized version of the process. In 2020 I wrote a series of posts where I journalled writing short Rust programs, including mistakes and all, and those got some attention. This post is not that. I tried many different dead ends while writing this, and if I’d chronicled all of them this post would be even longer. Here, I’ve tried to convey approximately the journey of understanding I went through, while smoothing it over so that it makes — hopefully — a good read.

Many thanks to Andreu Botella, Angelos Oikonomopoulos, and Federico Mena Quintero who read a draft of this and with their comments made it a better read than it was before.


[1] As might be the smarter, and certainly the more conventional, thing to do

[2] Tokens as in what a lexer produces: identifiers, literals, …

[3] A symbol table usually means something different, a table that contains information about a program’s variables and other identifiers. We don’t have variables in this assembly language; labels are the only symbols there are

[4] What actually happened, I just sat down and started writing and deleting and rewriting macros until I got a feel for what I needed. But this blog post is supposed to be a coherent story, not a stream-of-consciousness log, so let’s pretend that I thought about it like this first

[5] Conspicuously, still under construction in the Little Book

[6] This isn’t a real grammar for the assembly language, just an approximation

[7] Fragment specifiers, according to the Little Book, are what the expr part of $e:expr is called

[8] Notice that .ORIG and .END now have semicolons too, but no labels; they aren’t instructions, so they can’t have an address

[9] Eagle-eyed readers may note that in the LC-3 manual these are called .BLKW and .STRINGZ, with leading dots. I elided the dots, again to make the macro less complicated

[10] This now seems wasteful rather than keeping the PC in a variable. On the other hand, it seems complicated to add a bunch of local variables to the repeated part of the macro

[11] In fact, any instruction word with a value between 0x0000-0x01ff is a no-op. This is a consequence of BR‘s opcode being 0b0000, which I can only assume was chosen for exactly this reason. BR, with the three status register bits also being zero, never branches and always falls through to the next instruction

[12] Or even just i5 and i6 types

[13] Even if it took 1.5 years to complete it

Advertisement

December of Rust 2021, Part 1: A Little Computer

In the beginning of December I read Andrei Ciobanu’s Writing a simple 16-bit VM in less than 125 lines of C. Now, I’ve been interested in virtual machines and emulators for a long time, and I work tangential to VMs as part of my day job as a JavaScript engine developer for Igalia. I found this post really interesting because, well, it does what it says on the tin: A simple VM in less than 125 lines of C.

Readers of this blog, if I have any left at this point, might remember that in December 2020 I did a series of posts solving that year’s Advent of Code puzzles in order to try to teach myself how to write programs in the Rust programming language. I did say last year that if I were to do these puzzles again, I would do them at a slower pace and wouldn’t blog about them. Indeed, I started again this year, but it just wasn’t as interesting, having already gone through the progression from easy to hard puzzles and now having some idea already of the kinds of twists that they like to do in between the first and second halves of each puzzle.

So, instead of puzzles, I decided to see if I could write a similarly simple VM in Rust this December, as a continuation of my Rust learning exercise last year1.

Andrei Ciobanu’s article, as well as some other articles he cites, write VMs that simulate the LC-3 architecture2. What I liked about this one is that it was really concise and no-nonsense, and did really well at illustrating how a VM works. There are already plenty of other blog posts and GitHub repositories that implement an LC-3 VM in Rust, and I can’t say I didn’t incorporate any ideas from them, but I found many of them to be a bit verbose. I wanted to see if I could create something in the same concise spirit as Andrei’s, but still idiomatic Rust.

Over the course of a couple of weeks during my end-of-year break from work, I think I succeeded somewhat at that, and so I’m going to write a few blog posts about it.

About the virtual machine

This post is not a tutorial about virtual machines. There are already plenty of those, and Andrei’s article is already a great one so it doesn’t make sense for me to duplicate it. Instead, In this section I’ll note some things about the LC-3 architecture before we start.

First of all, it has a very spartan instruction set. Each instruction is 16 bits, and there are no variable length instructions. The opcode is packed in the topmost 4 bits of each word. That means there are at most 16 instructions. And one opcode (1101) is not even used!

Only three instructions are arithmetic-type ones: addition, bitwise AND, and bitwise NOT. If you’re used to x86 assembly language you’ll notice that other operations like subtraction, multiplication, bitwise OR, are missing. We only need these three to do all the other operations in 2’s-complement arithmetic, although it is somewhat tedious! As I started writing some LC-3 assembly language to test the VM, I learned how to implement some other arithmetic operations in terms of ADD, AND, and NOT.3 I’ll go into this in a following post.

The LC-3 does not have a stack. All the operations take place in registers. If you are used to thinking in terms of a stack machine (for example, SpiderMonkey is one), this takes some getting used to.

First steps

I started out by trying to port Andrei’s C code to Rust code in the most straightforward way possible, not worrying about whether it was idiomatic or not.

The first thing I noticed is that whereas in C it’s customary to use a mutable global, such as reserving storage for the VM’s memory and registers at the global level with declarations such as uint16_t mem[UINT16_MAX] = {0};, the Rust compiler makes this very difficult. You can use a mutable static variable, but accessing it needs to be marked as unsafe. In this way, the Rust compiler nudges you to encapsulate the storage inside a class:

struct VM {
    mem: [u16; MEM_SIZE],
    reg: [u16; NREGS],
    running: bool,
}

Next we write functions to access the memory. In the C code these are:

static inline uint16_t mr(uint16_t address) { return mem[address];  }
static inline void mw(uint16_t address, uint16_t val) { mem[address] = val; }

In Rust, we have to cast the address to a usize, since usize is the type that we index arrays with:

#[inline]
fn ld(&mut self, addr: u16) -> u16 {
    self.mem[addr as usize]
}

#[inline]
fn st(&mut self, addr: u16, val: u16) {
    self.mem[addr as usize] = val;
}

(I decide to name them ld and st for “load” and “store” instead of mr and mw, because the next thing I do is write similar functions for reading and writing the VM’s registers, which I’ll call r and rw for “register” and “register write”. These names look less similar, so I find that makes the code more readable yet still really concise.)

The next thing in the C code is a bunch of macros that do bit-manipulation operations to unpack the instructions. I decide to turn these into #[inline] functions in Rust. For example,

#define OPC(i) ((i)>>12)
#define FIMM(i) ((i>>5)&1)

from the C code, become, in Rust,

#[inline] #[rustfmt::skip] fn opc(i: u16) -> u16 { i >> 12 }
#[inline] #[rustfmt::skip] fn fimm(i: u16) -> bool { (i >> 5) & 1 != 0 }

I put #[rustfmt::skip] because I think it would be nicer if the rustfmt tool would allow you to put super-trivial functions on one line, so that they don’t take up more visual space than they deserve.

You might think that the return type of opc should be an enum. I originally tried making it that way, but Rust doesn’t make it very easy to convert between enums and integers. The num_enum crate provides a way to do this, but I ended up not using it, as you will read below.

We also need a way to load and run programs in LC-3 machine code. I made two methods of VM patterned after the ld_img() and start() functions from the C code.

First I’ll talk about ld_img(). What I really wanted to do is read the bytes of the file directly into the self.mem array, without copying, as the C code does. This is not easy to do in Rust. Whereas in C all pointers are essentially pointers to byte arrays in the end, this is not the case in Rust. It’s surprisingly difficult to express that I want to read u16s into an array of u16s! I finally found a concise solution, using both the byteorder and bytemuck crates. For this to work, you have to import the byteorder::ReadBytesExt trait into scope.

pub fn ld_img(&mut self, fname: &str, offset: u16) -> io::Result<()> {
    let mut file = fs::File::open(fname)?;
    let nwords = file.metadata()?.len() as usize / 2;
    let start = (PC_START + offset) as usize;
    file.read_u16_into::<byteorder::NetworkEndian>(bytemuck::cast_slice_mut(
        &mut self.mem[start..(start + nwords)],
    ))
}

What this does is read u16s, minding the correct byte order, into an array of u8. But we have an array of u16 that we want to store it in, not u8. So bytemuck::cast_slice_mut() treats the &mut [u16] slice as a &mut [u8] slice, essentially equivalent to casting it as (uint8_t*) in C. It does seem like this ought to be part of the Rust standard library, but the only similar facility is std::mem::transmute(), which does the same thing. But it also much more powerful things as well, and is therefore marked unsafe. (I’m trying to avoid having any code that needs to be marked unsafe in this project.)

For running the loaded machine code, I wrote this method:

pub fn start(&mut self, offset: u16) {
    self.running = true;
    self.rw(RPC, PC_START + offset);
    while self.running {
        let i = self.ld(self.r(RPC));
        self.rw(RPC, self.r(RPC) + 1);
        self.exec(i);
    }
}

I’ll talk more about what happens in self.exec() in the next section.

The basic execute loop

In the C code, Andrei cleverly builds a table of function pointers and indexes it with the opcode, in order to execute each instruction:

typedef void (*op_ex_f)(uint16_t instruction);
op_ex_f op_ex[NOPS] = { 
    br, add, ld, st, jsr, and, ldr, str, rti, not, ldi, sti, jmp, res, lea, trap 
};

// ...in main loop:
op_ex[OPC(i)](i);

Each function, such as add(), takes the instruction as a parameter, decodes it, and mutates the global state of the VM. In the main loop, at the point where I have self.exec(i) in my code, we have op_ex[OPC(i)](i) which decodes the opcode out of the instruction, indexes the table, and calls the function with the instruction as a parameter. A similar technique is used to execute the trap routines.

This approach of storing function pointers in an array and indexing it by opcode or trap vector is great in C, but is slightly cumbersome in Rust. You would have to do something like this in order to be equivalent to the C code:

type OpExF = fn(&mut VM, u16) -> ();

// in VM:
const OP_EX: [OpExF; NOPS] = [
    VM::br, VM::add, ..., VM::trap,
];

// ...in main loop:
OP_EX[opc(i) as usize](self, i);

Incidentally, this is why I decided above not to use an enum for the opcodes. Not only would you have to create it from a u16 when you unpack it from the instruction, you would also have to convert it to a usize in order to index the opcode table.

In Rust, a match expression is a much more natural fit:

match opc(i) {
    BR => {
        if (self.r(RCND) & fcnd(i) != 0) {
            self.rw(RPC, self.r(RPC) + poff9(i));
        }
    }
    ADD => {
        self.rw(dr(i), self.r(sr(i)) +
            if fimm(i) {
                sextimm(i)
            } else {
                self.r(sr2(i))
            });
        self.uf(dr(i));
    }
    // ...etc.
}

However, there is an even better alternative that makes the main loop much more concise, like the one in the C code! We can use the bitmatch crate to simultaneously match against bit patterns and decode parts out of them.

#[bitmatch]
fn exec(&mut self, i: u16) {
    #[bitmatch]
    match i {
        "0000fffooooooooo" /* BR */ => {
            if (self.r(RCND) & f != 0) {
                self.rw(RPC, self.r(RPC) + sext(o, 9));
            }
        }
        "0001dddsss0??aaa" /* ADD register */ => {
            self.rw(d, self.r(s) + self.r(a));
            self.uf(d);
        }
        "0001dddsss1mmmmm" /* ADD immediate */ => {
            self.rw(d, self.r(s) + sext(m, 5));
            self.uf(d);
        }
        // ...etc.
    }
}

This actually gets rid of the need for all the bit-manipulation functions that I wrote in the beginning, based on the C macros, such as opc(), fimm(), and poff9(), because bitmatch automatically does all the unpacking. The only bit-manipulation we still need to do is sign-extension when we unpack immediate values and offset values from the instructions, as we do above with sext(o, 9) and sext(m, 5).

I was curious what kind of code the bitmatch macros generate under the hood and whether it’s as performant as writing out all the bit-manipulations by hand. For that, I wrote a test program that matches against the same bit patterns as the main VM loop, but with the statements in the match arms just replaced by constants, in order to avoid cluttering the output:

#[bitmatch]
pub fn test(i: u16) -> u16 {
    #[bitmatch]
    match i {
        "0000fffooooooooo" => 0,
        "0001dddsss0??aaa" => 1,
        "0001dddsss1mmmmm" => 2,
        // ...etc.
    }
}

There is a handy utility for viewing expanded macros that you can install with cargo install cargo-expand, and then run with cargo expand --lib test (I put the test function in a dummy lib.rs file.)

Here’s what we get!

pub fn test(i: u16) -> u16 {
    match i {
        bits if bits & 0b1111000000000000 == 0b0000000000000000 => {
            let f = (bits & 0b111000000000) >> 9usize;
            let o = (bits & 0b111111111) >> 0usize;
            0
        }
        bits if bits & 0b1111000000100000 == 0b0001000000000000 => {
            let a = (bits & 0b111) >> 0usize;
            let d = (bits & 0b111000000000) >> 9usize;
            let s = (bits & 0b111000000) >> 6usize;
            1
        }
        bits if bits & 0b1111000000100000 == 0b0001000000100000 => {
            let d = (bits & 0b111000000000) >> 9usize;
            let m = (bits & 0b11111) >> 0usize;
            let s = (bits & 0b111000000) >> 6usize;
            2
        }
        // ...etc.
        _ => // ...some panicking code
    }
}

It’s looking a lot like what I’d written anyway, but with all the bit-manipulation functions inlined. The main disadvantage is that you have to AND the value with a bitmask at each arm of the match expression. But maybe that isn’t such a problem? Let’s look at the generated assembly to see what the computer actually executes. There is another Cargo tool for this, which you can install with cargo install cargo-asm and run with cargo asm --lib lc3::test4. In the result, there are actually only three AND instructions, because there are only three unique bitmasks tested among all the arms of the match expression (0b1111000000000000, 0b1111100000000000, and 0b1111000000100000). So it seems like the compiler is quite able to optimize this into something good.

First test runs

By the time I had implemented all the instructions except for TRAP, at this point I wanted to actually run a program on the LC-3 VM! Andrei has one program directly in his blog post, and another one in his GitHub repository, so those seemed easiest to start with.

Just like in the blog post, I wrote a program (in my examples/ directory so that it could be run with cargo r --example) to output the LC-3 machine code. It looked something like this:

let program: [u16; 7] = [
    0xf026,  // TRAP 0x26
    0x1220,  // ADD R1, R0, 0
    0xf026,  // TRAP 0x26
    0x1240,  // ADD R1, R1, R0
    0x1060,  // ADD R0, R1, 0
    0xf027,  // TRAP 0x27
    0xf025,  // TRAP 0x25
];
let mut file = fs::File::create(fname)?;
for inst in program {
    file.write_u16::<byteorder::NetworkEndian>(inst)?;
}
Ok(())

In order for this to work, I still needed to implement some of the TRAP routines. I had left those for last, and at that point my match expression for TRAP instructions looked like "1111????tttttttt" => self.trap(t), and my trap() method looked like this:

fn trap(&mut self, t: u8) {
    match t {
        _ => self.crash(&format!("Invalid TRAP vector {:#02x}", t)),
    }
}

For this program, we can see that three traps need to be implemented: 0x25 (HALT), 0x26 (INU16), and 0x27 (OUTU16). So I was able to add just three arms to my match expression:

0x25 => self.running = false,
0x26 => {
    let mut input = String::new();
    io::stdin().read_line(&mut input).unwrap_or(0);
    self.rw(0, input.trim().parse().unwrap_or(0));
}
0x27 => println!("{}", self.r(0)),

With this, I could run the sample program, type in two numbers, and print out their sum.

The second program sums an array of numbers. In this program, I added a TRAP 0x27 instruction right before the HALT in order to print out the answer, otherwise I couldn’t see if it was working! This also required changing R1 to R0 so that the sum is in R0 when we call the trap routine, and adjusting the offset in the LEA instruction.5

When I tried running this program, it crashed the VM! This is due to the instruction ADD R4, R4, x-1 which decrements R4 by adding -1 to it. R4 is the counter for how many array elements we have left to process, so initially it holds 10, and when we get to that instruction for the first time, we decrement it to 9. But if you look at the implementation of the ADD instruction that I wrote above, we are actually doing an unsigned addition of the lowest 5 bits of the instruction, sign-extended to 16 bits, so we are not literally decrementing it. We are adding 0xffff to 0x000a and expecting it to wrap around to 0x0009 like it does in C. But integer arithmetic doesn’t wrap in Rust!

Unless you specifically tell it to, that is. So we could use u16::wrapping_add() instead of the + operator to do the addition. But I got what I thought was a better idea, to use std::num::Wrapping! I rewrote the definition of VM at the top of the file:

type Word = Wrapping<u16>;

struct VM {
    mem: [Word; MEM_SIZE],
    reg: [Word; NREGS],
    running: bool,
}

This did require adding Wrapping() around some integer literals and adding .0 to unwrap to the bare u16 in some places, but on the whole it made the code more concise and readable. As an added bonus, this way, we are using the type system to express that the LC-3 processor does wrapping unsigned arithmetic. (I do wish that there were a nicer way to express literals of the Wrapping type though.)

And with that, the second example program works. It outputs 16, as expected. In a following post I’ll go on to explain some of the other test programs that I wrote.

Other niceties

At this point I decided to do some refactoring to make the Rust code more readable and hopefully more idiomatic as well. Inspired by the type alias for Word, I added several more, one for addresses and one for instructions, as well as a function to convert a word to an address:

type Addr = usize;
type Inst = u16;
type Reg = u16;
type Flag = Word;
type TrapVect = u8;

#[inline] fn adr(w: Word) -> Addr { w.0.into() }

Addr is convenient to alias to usize because that’s the type that we use to index the memory array. And Inst is convenient to alias to u16 because that’s what bitmatch works with.

In fact, using types in this way actually allowed me to catch two bugs in Andrei’s original C code, where the index of the register was used instead of the value contained in the register.

I also added a few convenience methods to VM for manipulating the program counter:

#[inline] fn pc(&self) -> Word { self.r(RPC) }
#[inline] fn jmp(&mut self, pc: Word) { self.rw(RPC, pc); }
#[inline] fn jrel(&mut self, offset: Word) { self.jmp(self.pc() + offset); }
#[inline] fn inc_pc(&mut self) { self.jrel(Wrapping(1)); }

With these, I could write a bunch of things to be more expressive and concise. For example, the start() method now looked like this:

pub fn start(&mut self, offset: u16) {
    self.running = true;
    self.jmp(PC_START + Wrapping(offset));
    while self.running {
        let i = self.ld(adr(self.pc())).0;
        self.inc_pc();
        self.exec(i);
    }
    Ok(())
}

I also added an iaddr() method to load an address indirectly from another address given as an offset relative to the program counter, to simplify the implementation of the LDI and STI instructions.

While I was at it, I noticed that the uf() (update flags) method always followed a store into a destination register, and I decided to rewrite it as one method dst(), which stores a value into a destination register and updates the flags register based on that value:

#[inline]
fn dst(&mut self, r: Reg, val: Word) {
    self.rw(r, val);
    self.rw(
        RCND,
        match val.0 {
            0 => FZ,
            1..=0x7fff => FP,
            0x8000..=0xffff => FN,
        },
    );
}

At this point, the VM’s main loop looked just about as concise and simple as the original C code did! The original:

static inline void br(uint16_t i)   { if (reg[RCND] & FCND(i)) { reg[RPC] += POFF9(i); } }
static inline void add(uint16_t i)  { reg[DR(i)] = reg[SR1(i)] + (FIMM(i) ? SEXTIMM(i) : reg[SR2(i)]); uf(DR(i)); }
static inline void ld(uint16_t i)   { reg[DR(i)] = mr(reg[RPC] + POFF9(i)); uf(DR(i)); }
static inline void st(uint16_t i)   { mw(reg[RPC] + POFF9(i), reg[DR(i)]); }
static inline void jsr(uint16_t i)  { reg[R7] = reg[RPC]; reg[RPC] = (FL(i)) ? reg[RPC] + POFF11(i) : reg[BR(i)]; }
static inline void and(uint16_t i)  { reg[DR(i)] = reg[SR1(i)] & (FIMM(i) ? SEXTIMM(i) : reg[SR2(i)]); uf(DR(i)); }
static inline void ldr(uint16_t i)  { reg[DR(i)] = mr(reg[SR1(i)] + POFF(i)); uf(DR(i)); }
static inline void str(uint16_t i)  { mw(reg[SR1(i)] + POFF(i), reg[DR(i)]); }
static inline void res(uint16_t i) {} // unused
static inline void not(uint16_t i)  { reg[DR(i)]=~reg[SR1(i)]; uf(DR(i)); }
static inline void ldi(uint16_t i)  { reg[DR(i)] = mr(mr(reg[RPC]+POFF9(i))); uf(DR(i)); }
static inline void sti(uint16_t i)  { mw(mr(reg[RPC] + POFF9(i)), reg[DR(i)]); }
static inline void jmp(uint16_t i)  { reg[RPC] = reg[BR(i)]; }
static inline void rti(uint16_t i) {} // unused
static inline void lea(uint16_t i)  { reg[DR(i)] =reg[RPC] + POFF9(i); uf(DR(i)); }
static inline void trap(uint16_t i) { trp_ex[TRP(i)-trp_offset](); }

My implementation:

#[bitmatch]
match i {
    "0000fffooooooooo" /* BR   */ => if (self.r(RCND).0 & f) != 0 { self.jrel(sext(o, 9)); },
    "0001dddsss0??aaa" /* ADD1 */ => self.dst(d, self.r(s) + self.r(a)),
    "0001dddsss1mmmmm" /* ADD2 */ => self.dst(d, self.r(s) + sext(m, 5)),
    "0010dddooooooooo" /* LD   */ => self.dst(d, self.ld(adr(self.pc() + sext(o, 9)))),
    "0011sssooooooooo" /* ST   */ => self.st(adr(self.pc() + sext(o, 9)), self.r(s)),
    "01000??bbb??????" /* JSRR */ => { self.rw(R7, self.pc()); self.jmp(self.r(b)); }
    "01001ooooooooooo" /* JSR  */ => { self.rw(R7, self.pc()); self.jrel(sext(o, 11)); }
    "0101dddsss0??aaa" /* AND1 */ => self.dst(d, self.r(s) & self.r(a)),
    "0101dddsss1mmmmm" /* AND2 */ => self.dst(d, self.r(s) & sext(m, 5)),
    "0110dddbbboooooo" /* LDR  */ => self.dst(d, self.ld(adr(self.r(b) + sext(o, 6)))),
    "0111sssbbboooooo" /* STR  */ => self.st(adr(self.r(b) + sext(o, 6)), self.r(s)),
    "1000????????????" /* n/a  */ => self.crash(&format!("Illegal instruction {:#04x}", i)),
    "1001dddsss??????" /* NOT  */ => self.dst(d, !self.r(s)),
    "1010dddooooooooo" /* LDI  */ => self.dst(d, self.ld(self.iaddr(o))),
    "1011sssooooooooo" /* STI  */ => self.st(self.iaddr(o), self.r(s)),
    "1100???bbb??????" /* JMP  */ => self.jmp(self.r(b)),
    "1101????????????" /* RTI  */ => self.crash("RTI not available in user mode"),
    "1110dddooooooooo" /* LEA  */ => self.dst(d, self.pc() + sext(o, 9)),
    "1111????tttttttt" /* TRAP */ => self.trap(t as u8),
}

Of course, you could legitimately complain that both are horrible soups of one- and two-letter identifiers.6 But I think in both of them, if you have the abbreviations close to hand (and you do, since the program is so small!) it’s actually easier to follow because everything fits well within one vertical screenful of text. The Rust version has the added bonus of the bitmatch patterns being very visual, and the reader not having to think about bit shifting in their head.

Here’s the key for abbreviations:

  • iInstruction
  • rRegister
  • dDestination register (“DR” in the LC-3 specification)
  • sSource register (“SR1”)
  • aAdditional source register (“SR2”)
  • bBase register (“BASER”)
  • m — iMmediate value
  • oOffset (6, 9, or 11 bits)
  • fFlags
  • tTrap vector7
  • pcProgram Counter
  • stSTore in memory
  • ldLoaD from memory
  • rwRegister Write
  • adr — convert machine word to memory ADdRess
  • jmpJuMP
  • dstDestination register STore and update flags
  • jrelJump RELative to the PC
  • sextSign EXTend
  • iaddr — load Indirect ADDRess

At this point the only thing left to do, to get the program to the equivalent level of functionality as the one in Andrei’s blog post, was to implement the rest of the trap routines.

Two of the trap routines involve waiting for a key press. This is actually surprisingly difficult in Rust, as far as I can tell, and definitely not as straightforward as the reg[R0] = getchar(); which you can do in C. You can use the libc crate, but libc::getchar() is marked unsafe.8 Instead, I ended up pulling in another dependency, the console crate, and adding a term: console::Term member to VM. With that, I could implement a getc() method that reads a character, stores its ASCII code in the R0 register, and returns the character itself:

fn getc(&mut self) -> char {
    let ch = self.term.read_char().unwrap_or('\0');
    let res = Wrapping(if ch.is_ascii() { ch as u16 & 0xff } else { 0 });
    self.rw(R0, res);
    ch
}

This by itself was enough to implement the GETC trap routine (which waits for a key press and stores its ASCII code in the lower 8 bits of R0) and the IN trap routine (which does the same thing but first prints a prompt, and echoes the character back to stdout) was not much more complicated:

0x20 => {
    self.getc();
}
0x23 => {
    print!("> ");
    let ch = self.getc();
    print!("{}", ch);
}

Next I wrote the OUT trap routine, which prints the lower 8 bits of R0 as an ASCII character. I wrote an ascii() function that converts the lower 8 bits of a machine word into a char:

#[inline]
fn ascii(val: Word) -> char {
    char::from_u32(val.0 as u32 & 0xff).unwrap_or('?')
}

// In the TRAP match expression:
0x21 => print!("{}", ascii(self.r(R0))),

Now the two remaining traps were PUTS (print a zero-terminated string starting at the address in R0) and PUTSP (same, but the string is packed two bytes per machine word). These two routines are very similar in that they both access a variable-length area of memory, starting at the address in R0 and ending with the next memory location that contains zero. I found a nice solution that feels very Rust-like to me, a strz_words() method that returns an iterator over exactly this region of memory:

fn strz_words(&self) -> impl Iterator<Item = &Word> {
    self.mem[adr(self.r(R0))..].iter().take_while(|&v| v.0 != 0)
}

The two trap routines differ in what they do with the items coming out of this iterator. For PUTS we convert each machine word to a char with ascii():

0x22 => print!("{}", self.strz_words().map(|&v| ascii(v)).collect::<String>()),

(It’s too bad that we have a borrowed value in the closure, otherwise we could just do map(ascii). On the positive side, collect::<String>() is really nice.)

PUTSP is a bit more complicated. It’s a neat trick to use flat_map() to convert our iterator over machine words into a twice-as-long iterator over bytes. However, we still have to collect the bytes into an intermediate vector so we can check if the last byte is zero, because the string might have an odd number of bytes. In that case we’d still have a final zero byte which we have to pop off the end, because the iterator doesn’t finish until we get a whole memory location that is zero.

0x24 => {
    let mut bytes = self
        .strz_words()
        .flat_map(|&v| v.0.to_ne_bytes())
        .collect::<Vec<_>>();
    if bytes[bytes.len() - 1] == 0 {
        bytes.pop();
    };
    print!("{}", String::from_utf8_lossy(&bytes));
}

Conclusion

At this point, what I had was mostly equivalent to what you have if you follow along with Andrei’s blog post until the end, so I’ll end this post here. Unlike the C version, it is not under 125 lines long, but it does clock in at just under 200 lines.9

After I had gotten this far, I spent some time improving what I had, making the VM a bit fancier and writing some tools to use with it. I intend to make this article into a series, and I’ll cover these improvements in following posts, starting with an assembler.

You can find the code in a GitHub repo. I have kept this first version apart, in an examples/first.rs file, but you can browse the other files in that repository if you want a sneak preview of some of the other things I’ll write about.

Many thanks to Federico Mena Quintero who gave some feedback on a draft of this post.


[1] In the intervening time, I didn’t write any Rust code at all

[2] “LC” stands for “Little Computer”. It’s a computer architecture described in a popular textbook, and as such shows up in many university courses on low-level programming. I hadn’t heard of it before, but after all, I didn’t study computer science in university

[3] I won’t lie, mostly by googling terms like “lc3 division” on the lazyweb

[4] This tool is actually really bad at finding functions unless they’re in particular locations that it expects, such as lib.rs, which is the real reason why I stuck the test function in a dummy lib.rs file

[5] Note that the offset is calculated relative to the following instruction. I assume this is so that branching to an offset of 0 takes you to the next instruction instead of looping back to the current one

[6] Especially since bitmatch makes you use one-letter variable names when unpacking

[7] i.e., memory address of the trap routine. I have no idea why the LC-3 specification calls it a vector, since it is in fact a scalar

[8] I’m not sure why. Federico tells me it is possibly because it doesn’t acquire a lock on stdin.

[9] Not counting comments, and provided we cheat a bit by sticking a few trivial functions onto one line with #[rustfmt::skip]