December of Rust Project, Part 2: The Assembler Macro

Welcome back! This is the second post of a series which turned out to be more occasional than I thought it would be. You might remember that I originally called it December of Rust 2021. Look how that worked out! Not only is it not December 2021 anymore, but also it is not December 2022 anymore.

In the previous installment, I wrote about writing a virtual machine for the LC-3 architecture in Rust, that was inspired by Andrei Ciobanu’s blog post Writing a simple 16-bit VM in less than 125 lines of C. The result was a small Rust VM with a main loop based on the bitmatch crate.

By the end of the previous post, the VM seemed to be working well. I had tested it with two example programs from Andrei Ciobanu’s blog post, and I wanted to write more programs to see if they would work as well. Unfortunately, the two test programs were tedious to create; as you might remember, I had to write a program that listed the hex code of each instruction and wrote the machine code to a file. And that was even considering that Andrei had already done the most tedious part, of assembling the instructions by hand into hex codes! Not to mention that when I wanted to add one instruction to one of the test programs, I had to adjust the offset in a neighbouring LEA instruction, which is a nightmare for debugging. This was just not going to be good enough to write any more complicated programs, because (1) I don’t have time for that, (2) I don’t have time for that, and (3) computers are better at this sort of work anyway.

Close-up photo of a vintage-looking Commodore computer
The real question is, can the LC-3 running on modern hardware emulate this badboy in all its Eurostile Bold Extended glory?? (Image by Viktorya Sergeeva, Pexels)

In this post, I will tell the story of how I wrote an assembler for the LC-3. The twist is, instead of making a standalone program that processes a text file of LC-3 assembly language into machine code1, I wanted to write it as a Rust macro, so that I could write code like this, and end up with an array of u16 machine code instructions:

asm! {
            .ORIG x3000
            LEA R0, hello  ; load address of string
            PUTS           ; output string to console
            HALT
    hello:  .STRINGZ "Hello World!\n"
            .END
}

(This example is taken from the LC-3 specification, and I’ll be using it throughout the post as a sample program.)

The sample already illustrated some features I wanted to have in the assembly language: instructions, of course; the assembler directives like .ORIG and .STRINGZ that don’t translate one-to-one to instructions; and most importantly, labels, so that I don’t have to compute offsets by hand.

Learning about Rust macros

This could be a foolish plan. I’ve written countless programs over the years that process text files, but I have never once written a Rust macro and have no idea how they work. But I have heard they are powerful and can be used to create domain-specific languages. That sounds like it could fit this purpose.

Armed with delusional it’s-all-gonna-work-out overconfidence, I searched the web for “rust macros tutorial” and landed on The Little Book of Rust Macros originally by Daniel Keep and updated by Lukas Wirth. After browsing this I understood a few facts about Rust macros:

They consist of rules, which match many types of single or repeated Rust tokens against patterns.2 So, I should be able to define rules that match the tokens that form the LC-3 assembly language.

They can pick out their inputs from among any other tokens. You provide these other tokens in the input matching rule, so you could do for example:

macro_rules! longhand_add {
    ($a:literal plus $b:literal) => { $a + $b };
}

let two = longhand_add!{ 1 plus 1 };

This is apparently how you can create domain-specific languages with Rust macros, because the tokens you match don’t have to be legal Rust code; they just have to be legal tokens. In other words, plus is fine and doesn’t have to be the name of anything in the program; but foo[% is not.

They substitute their inputs into the Rust code that is in the body of the rule. So really, in the end, macros are a way of writing repetitive code without repeating yourself.

A tangent about C macros

This last fact actually corresponds with one of the uses of C macros. C macros are useful for several purposes, for which the C preprocessor is a drastically overpowered and unsafe tool full of evil traps. Most of these purposes have alternative, less overpowered, techniques for achieving them in languages like Rust or even C++. First, compile-time constants:

#define PI 3.1416

for which Rust has constant expressions:

const PI: f64 = 3.1416;

Second, polymorphic “functions”:

#define MIN(a, b) (a) <= (b) ? (a) : (b)

for which Rust has… well, actual polymorphic functions:

fn min<T: std::cmp::PartialOrd>(a: T, b: T) -> T {
    if a <= b { a } else { b }
}

Third, conditional compilation:

#ifdef WIN32
int read_key(void) {
    // ...
}
#endif  // WIN32

for which Rust has… also conditional compilation:

#[cfg(windows)]
fn read_key() -> i32 {
    // ...
}

Fourth, redefining syntax, as mentioned above:

#define plus +
int two = 1 plus 1;

which in C you should probably never do except as a joke. But in Rust (as in the longhand_add example from earlier) at least you get a clue about what is going on because of the the longhand_add!{...} macro name surrounding the custom syntax; and the plus identifier doesn’t leak out into the rest of your program.

Lastly, code generation, which is what we want to do here in the assembler. In C it’s often complicated and this tangent is already long enough, but if you’re curious, here and here is an example of using a C preprocessor technique called X Macros to generate code that would otherwise be repetitive to write. In C, code generation using macros is a way of trading off less readability (because macros are complicated) for better maintainability (because in repeated blocks of very similar code it’s easy to make mistakes without noticing.) I imagine in Rust the tradeoff is much the same.

Designing an LC-3 assembler macro

You may remember in the previous post, in order to run a program with the VM, I had to write a small, separate Rust program to write the hand-assembled words to a file consisting of LC-3 bytecode. I could then load and run the file with the VM’s ld_img() method.

I would like to be able to write a file with the assembler macro, but I would also like to be able to write assembly language directly inline and execute it with the VM, without having to write it to a file. Something like this:

fn run_program() -> Result<()> {
    let mut vm = VM::new();
    vm.ld_asm(&asm! {
                .ORIG x3000
                LEA R0, hello  ; load address of string
                PUTS           ; output string to console
                HALT
        hello:  .STRINGZ "Hello World!\n"
                .END
    }?);
    vm.start()
}

My first thought was that I could have the asm macro expand to an array of LC-3 bytecodes. However, writing out a possible implementation for the VM.ld_asm() method shows that the asm macro needs to give two pieces of data: the origin address as well as the bytecodes.

pub fn ld_asm(&mut self, ???) {
    let mut addr = ???origin???;
    for inst in ???bytecodes??? {
        self.mem[addr] = Wrapping(*inst);
        addr += 1;
    }
}

So, it seemed better to have the asm macro expand to an expression that creates a struct with these two pieces of data in it. I started an assembler.rs submodule and called this object assembler::Program.

#[derive(Debug)]
pub struct Program {
    origin: u16,
    bytecode: Vec<u16>,
}

impl Program {
    pub fn origin(&self) -> usize {
        self.origin as usize
    }

    pub fn bytecode(&self) -> &[u16] {
        &self.bytecode
    }
}

Next, I needed to figure out how to get from LC-3 assembly language to the data model of Program. Obviously I needed the address to load the program into (origin), which is set by the .ORIG directive. But I also needed to turn the assembly language text into bytecodes somehow. Maybe the macro could do this … but at this point, my hunch from reading about Rust macros was that the macro should focus on transforming the assembly language into valid Rust code, and not so much on processing. Processing can be done in a method of Program using regular Rust code, not macro code. So the macro should just extract the information from the assembly language: a list of instructions and their operands, and a map of labels to their addresses (“symbol table”).3

#[derive(Clone, Copy, Debug)]
pub enum Reg { R0, R1, R2, R3, R4, R5, R6, R7 }

#[derive(Debug)]
pub enum Inst {
    Add1(/* dst: */ Reg, /* src1: */ Reg, /* src2: */ Reg),
    Add2(/* dst: */ Reg, /* src: */ Reg, /* imm: */ i8),
    And1(/* dst: */ Reg, /* src1: */ Reg, /* src2: */ Reg),
    And2(/* dst: */ Reg, /* src: */ Reg, /* imm: */ i8),
    // ...etc
    Trap(u8),
}

pub type SymbolTable = MultiMap<&'static str, u16>;

With all this, here’s effectively what I want the macro to extract out of the sample program I listed near the beginning of the post:

let origin: u16 = 0x3000;
let instructions = vec![
    Inst::Lea(Reg::R0, "hello"),
    Inst::Trap(0x22),
    Inst::Trap(0x25),
    Inst::Stringz("Hello world!\n"),
];
let symtab: SymbolTable = multimap!(
    "hello" => 0x3003,
);

(Remember from Part 1 that PUTS and HALT are system subroutines, called with the TRAP instruction.)

As the last step of the macro, I’ll then pass these three pieces of data to a static method of Program which will create an instance of the Program struct with the origin and bytecode in it:

Program::assemble(origin, &instructions, &symtab)

You may be surprised that I picked a multimap for the symbol table instead of just a map. In fact I originally used a map. But it’s possible for the assembly language code to include the same label twice, which is an error. I found that handling duplicate labels inside the macro made it much more complicated, whereas it was easier to handle errors in the assemble() method. But for that, we have to store two copies of the label in the symbol table so that we can determine later on that it is a duplicate.

Demystifying the magic

At this point I still hadn’t sat down to actually write a Rust macro. Now that I knew what I want the macro to achieve, I could start.4

The easy part was that the assembly language code should start with the .ORIG directive, to set the address at which to load the assembled bytecode; and end with the .END directive. Here’s a macro rule that does that:

(
    .ORIG $orig:literal
    $(/* magic happens here: recognize at least one asm instruction */)+
    .END
) => {{
    use $crate::assembler::{Inst::*, Program, Reg::*, SymbolTable};
    let mut instructions = Vec::new();
    let mut symtab: SymbolTable = Default::default();
    let origin: u16 = $orig;
    $(
        // more magic happens here: push a value into `instructions` for
        // each instruction recognized by the macro, and add a label to
        // the symbol table if there is one
    )*
    Program::assemble(origin, &instructions, &symtab)
}};

Easy, right? The hard part is what happens in the “magic”!

You might notice that the original LC-3 assembly language’s .ORIG directive looks like .ORIG x3000, and x3000 is decidedly not a Rust numeric literal that can be assigned to a u16.

At this point I had to decide what tradeoffs I wanted to make in the macro. Did I want to support the LC-3 assembly language from the specification exactly? It looked like I might be able to do that, x3000-formatted hex literals and all, if I scrapped what I had so far and instead wrote a procedural macro5, which operates directly on a stream of tokens from the lexer. But instead, I decided that my goal would be to support a DSL that looks approximately like the LC-3 assembly language, without making the macro too complicated.

In this case, “not making the macro too complicated” means that hex literals are Rust hex literals (0x3000 instead of x3000) and decimal literals are Rust decimal literals (15 instead of #15). That was good enough for me.

Next I had to write a matcher that would match each instruction. A line of LC-3 assembly language looks like this:6

instruction := [ label : ] opcode [ operand [ , operand ]* ] [ ; comment ] \n

So I first tried a matcher like this:

$($label:ident:)? $opcode:ident $($operands:expr),* $(; $comment:tt)

There are a few problems with this. The most pressing one is that “consume tokens until newline” is just not a thing in Rust macro matchers, so it’s not possible to ignore comments like this. Newlines are just treated like any other whitespace. There’s also no fragment specifier7 for “any token”; the closest is tt but that matches a token tree, which is not actually what I want here β€” I think it would mean the comment has to be valid Rust code, for one thing!

Keeping my tradeoff philosophy in mind, I gave up quickly on including semicolon-delimited comments in the macro. Regular // and /* comments would work just fine without even having to match them in the macro. Instead, I decided that each instruction would end with a semicolon, and that way I’d also avoid the problem of not being able to match newlines.

$($label:ident:)? $opcode:ident $($operands:expr),*;

The next problem is that macro matchers cannot look ahead or backtrack, so $label and $opcode are ambiguous here. If we write an identifier, it could be either a label or an opcode and we won’t know until we read the next token to see if it’s a colon or not; which is not allowed. So I made another change to the DSL, to make the colon come before the label.

With this matcher expression, I could write more of the body of the macro rule:8

(
    .ORIG $orig:literal;
    $($(:$label:ident)? $opcode:ident $($operands:expr),*;)+
    .END;
) => {{
    use $crate::assembler::{Inst::*, Program, Reg::*, SymbolTable};
    let mut instructions = Vec::new();
    let mut symtab: SymbolTable = Default::default();
    let origin: u16 = $orig;
    $(
        $(symtab.insert(stringify!($label), origin + instructions.len() as u16);)*
        // still some magic happening here...
    )*
    Program::assemble(origin, &instructions, &symtab)
}};

For each instruction matched by the macro, we insert its label (if there is one) into the symbol table to record that it should point to the current instruction. Then at the remaining “magic”, we have to insert an instruction into the code vector. I originally thought that I could do something like instructions.push($opcode($($argument),*));, in other words constructing a value of Inst directly. But that turned out to be impractical because the ADD and AND opcodes actually have two forms, one to do the operation with a value from a register, and one with a literal value. This means we actually need two different arms of the Inst enum for each of these instructions, as I listed above:

Add1(Reg, Reg, Reg),
Add2(Reg, Reg, i8),

I could have changed it so that we have to write ADD1 and ADD2 inside the asm! macro, but that seemed to me too much of a tradeoff in the wrong direction; it meant that if you wanted to copy an LC-3 assembly language listing into the asm! macro, you’d need to go over every ADD instruction and rename it to either ADD1 or ADD2, and same for AND. This would be a bigger cognitive burden than just mechanically getting the numeric literals in the right format.

Not requiring a 1-to-1 correspondence between assembly language opcodes and the Inst enum also meant I could easily define aliases for the trap routines. For example, HALT could translate to Inst::Trap(0x25) without having to define a separate Inst::Halt.

But then what to put in the “magic” part of the macro body? It seemed to me that another macro expansion could transform LEA R0, hello into Inst::Lea(Reg::R0, "hello")! I read about internal rules in the Little Book, and they seemed like a good fit for this.

So, I replaced the magic with this call to an internal rule @inst:

instructions.push(asm! {@inst $opcode $($operands),*});

And I wrote wrote a series of @inst internal rules, each of which constructs an arm of the Inst enum, such as:

(@inst ADD $dst:expr, $src:expr, $imm:literal) => { Add2($dst, $src, $imm) };
(@inst ADD $dst:expr, $src1:expr, $src2:expr)  => { Add1($dst, $src1, $src2) };
// ...
(@inst HALT)                                   => { Trap(0x25) };
// ...
(@inst LEA $dst:expr, $lbl:ident)              => { Lea($dst, stringify!($lbl)) };

Macro rules have to be written from most specific to least specific, so the rules for ADD first try to match against a literal in the third operand (e.g. ADD R0, R1, -1) and construct an Inst::Add2, and otherwise fall back to an Inst::Add1.

But unfortunately I ran into another problem here. The $lbl:ident in the LEA rule is not recognized. I’m still not 100% sure why this is, but the Little Book’s section on fragment specifiers says,

Capturing with anything but the ident, lifetime and tt fragments will render the captured AST opaque, making it impossible to further match it with other fragment specifiers in future macro invocations.

So I suppose this is because we capture the operands with $($operands:expr),*. I tried capturing them as token trees (tt) but then the macro becomes ambiguous because token trees can include the commas and semicolons that I’m using for delimitation. So, I had to rewrite the rules for opcodes that take a label as an operand, like this:

(@inst LEA $dst:expr, $lbl:literal) => { Lea($dst, $lbl) };

and now we have to write them like LEA R0, "hello" (with quotes). This is the one thing I wasn’t able to puzzle out to my satisfaction, that I wish I had been.

Finally, after writing all the @inst rules I realized I had a bug. When adding the address of a label into the symbol table, I calculated the current value of the program counter (PC) with origin + code.len(). But some instructions will translate into more than one word of bytecode: BLKW and STRINGZ.9 BLKW 8, for example, reserves a block of 8 words. This would give an incorrect address for a label occurring after any BLKW or STRINGZ instruction.

To fix this, I wrote a method for Inst to calculate the instruction’s word length:

impl Inst {
    pub fn word_len(&self) -> u16 {
        match *self {
            Inst::Blkw(len) => len,
            Inst::Stringz(s) => u16::try_from(s.len()).unwrap(),
            _ => 1,
        }
    }
}

and I changed the macro to insert the label into the symbol table pointing to the correct PC:10

$(
    symtab.insert(
        stringify!($lbl),
        origin + instructions.iter().map(|i| i.word_len()).sum::<u16>(),
    );
)*

At this point I had something that looked and worked quite a lot like how I originally envisioned the inline assembler. For comparison, the original idea was to put the LC-3 assembly language directly inside the macro:

asm! {
            .ORIG x3000
            LEA R0, hello  ; load address of string
            PUTS           ; output string to console
            HALT
    hello:  .STRINGZ "Hello World!\n"
            .END
}

Along the way I needed a few tweaks to avoid making the macro too complicated, now I had this:

asm! {
            .ORIG 0x3000;
            LEA R0, "hello";  // load address of string
            PUTS;             // output string to console
            HALT;
    :hello  STRINGZ "Hello World!\n";
            .END;
}

Just out of curiosity, I used cargo-expand (as I did in Part One) to expand the above use of the asm! macro, and I found it was quite readable:

A retro-looking cassette tape labeled "COMPUTAPE"
Now we’re ready to save the bytecode to tape. (Image by Bruno /Germany from Pixabay)
{
    use crate::assembler::{Inst::*, Program, Reg::*, SymbolTable};
    let mut instructions = Vec::new();
    let mut symtab: SymbolTable = Default::default();
    let origin: u16 = 0x3000;
    instructions.push(Lea(R0, "hello"));
    instructions.push(Trap(0x22));
    instructions.push(Trap(0x25));
    symtab.insert("hello", origin + instructions.iter().map(|i| i.word_len()).sum::<u16>());
    code.push(Stringz("Hello World!\n"));
    Program::assemble(origin, &instructions, &symtab)
}

Assembling the bytecode

I felt like the hard part was over and done with! Now all I needed was to write the Program::assemble() method. I knew already that the core of it would work like the inner loop of the VM in Part One of the series, only in reverse. Instead of using bitmatch to unpack the instruction words, I matched on Inst and used bitpack to pack the data into instruction words. Most of them were straightforward:

match inst {
    Add1(dst, src1, src2) => {
        let (d, s, a) = (*dst as u16, *src1 as u16, *src2 as u16);
        words.push(bitpack!("0001_dddsss000aaa"));
    }
    // ...etc

The instructions that take a label operand needed a bit of extra work. I had to look up the label in the symbol table, compute an offset relative to the PC, and pack that into the instruction word. This process may produce an error: the label might not exist, or might have been given more than once, or the offset might be too large to pack into the available bits (in other words, the instruction is trying to reference a label that’s too far away.)

fn integer_fits(integer: i32, bits: usize) -> Result<u16, String> {
    let shift = 32 - bits;
    if integer << shift >> shift != integer {
        Err(format!(
            "Value x{:04x} is too large to fit in {} bits",
            integer, bits
        ))
    } else {
        Ok(integer as u16)
    }
}

fn calc_offset(
    origin: u16,
    symtab: &SymbolTable,
    pc: u16,
    label: &'static str,
    bits: usize,
) -> Result<u16, String> {
    if let Some(v) = symtab.get_vec(label) {
        if v.len() != 1 {
            return Err(format!("Duplicate label \"{}\"", label));
        }
        let addr = v[0];
        let offset = addr as i32 - origin as i32 - pc as i32 - 1;
        Self::integer_fits(offset, bits).map_err(|_| {
            format!(
                "Label \"{}\" is too far away from instruction ({} words)",
                label, offset
            )
        })
    } else {
        Err(format!("Undefined label \"{}\"", label))
    }
}

The arm of the core match expression for such an instruction, for example LEA, looks like this:

Lea(dst, label) => {
    let d = *dst as u16;
    let o = Self::calc_offset(origin, symtab, pc, label, 9)
        .unwrap_or_else(append_error);
    words.push(bitpack!("1110_dddooooooooo"));
}

Here, append_error is a closure that pushes the error message returned by calc_offset() into an array: |e| { errors.push((origin + pc, e)); 0 }

Lastly, a couple of arms for the oddball instructions that define data words, not code:

Blkw(len) => words.extend(vec![0; *len as usize]),
Fill(v) => words.push(*v),
Stringz(s) => {
    words.extend(s.bytes().map(|b| b as u16));
    words.push(0);
}

At the end of the method, if there weren’t any errors, then we successfully assembled the program:

if words.len() > (0xffff - origin).into() {
    errors.push((0xffff, "Program is too large to fit in memory".to_string()));
}
if errors.is_empty() {
    Ok(Self {
        origin,
        bytecode: words,
    })
} else {
    Err(AssemblerError { errors })
}

Bells and whistles

Next thing was to add a few extensions to the assembly language to make writing programs easier. (Writing programs is what I’m going to cover in Part 3 of the series.)

While researching the LC-3 for Part 1, I found a whole lot of lab manuals and other university course material. No surprise, since the LC-3 is originally from a textbook. One document I stumbled upon was “LC3 Language Extensions” from Richard Squier’s course material at Georgetown. In it are a few handy aliases for opcodes:

  • MOV R3, R5 – copy R3 into R5; can be implemented as ADD R3, R5, 0, i.e. R5 = R3 + 0
  • ZERO R2 – clear (store zero into) R2; can be implemented as AND R2, R2, 0
  • INC R4 – increment (add one to) R4; can be implemented as ADD R4, R4, 1
  • DEC R4 – decrement (subtract one from) R4; can be implemented as ADD R4, R4, -1

Besides these, the LC-3 specification itself names RET as an alias for JMP R7. Finally, an all-zero instruction word is a no-op so I defined an alias NOP for it.11

These are straightforward to define in the macro:

(@inst MOV $dst:expr, $src:expr) => { Add2($dst, $src, 0) };
(@inst ZERO $dst:expr)           => { And2($dst, $dst, 0) };
(@inst INC $dst:expr)            => { Add2($dst, $dst, 1) };
(@inst DEC $dst:expr)            => { Add2($dst, $dst, -1) };
(@inst RET)                      => { Jmp(R7) };
(@inst NOP)                      => { Fill(0) };

I wrote the last one as Fill(0) and not as Br(false, false, false, 0) which might have been more instructive, because Br takes a &' static str for its last parameter, not an address. So I would have had to make a dummy label in the symbol table pointing to address 0. Filling a zero word seemed simpler and easier.

The final improvement I wanted was to have AssemblerError print nice error messages. I kind of glossed over AssemblerError earlier, but it is an implementation of Error that contains an array of error messages with their associated PC value:

#[derive(Debug, Clone)]
pub struct AssemblerError {
    errors: Vec<(u16, String)>,
}
impl error::Error for AssemblerError {}

I implemented Display such that it would display each error message alongside a nice hex representation of the PC where the instruction failed to assemble:

impl fmt::Display for AssemblerError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        writeln!(f, "Error assembling program")?;
        for (pc, message) in &self.errors {
            writeln!(f, "  x{:04x}: {}", pc, message)?;
        }
        Ok(())
    }
}

This still left me with a somewhat unsatisfying mix of kinds of errors. Ideally, the macro would catch all possible errors at compile time!

At compile time we can catch several kinds of errors:

// Unsyntactical assembly language program
asm!{ !!! 23 skidoo !!! }
// error: no rules expected the token `!`

HCF;  // Nonexistent instruction mnemonic
// error: no rules expected the token `HCF`

ADD R3, R2;  // Wrong number of arguments
// error: unexpected end of macro invocation
// note: while trying to match `,`
//     (@inst ADD $dst:expr, $src:expr, $imm:literal)  => { Add2($dst, $sr...
//                                    ^

ADD R3, "R2", 5;  // Wrong type of argument
// error[E0308]: mismatched types
//     ADD R3, "R2", 5;
//             ^^^^ expected `Reg`, found `&str`
//     ...al)  => { Add2($dst, $src, $imm) };
//                  ---- arguments to this enum variant are incorrect
// note: tuple variant defined here
//     Add2(/* dst: */ Reg, /* src: */ Reg, /* imm: */ i8),
//     ^^^^

BR R7;  // Wrong type of argument again
// error: no rules expected the token `R7`
// note: while trying to match meta-variable `$lbl:literal`
//     (@inst BR $lbl:literal)                         => { Br(true, true,...
//               ^^^^^^^^^^^^

These error messages from the compiler are not ideal β€” if I had written a dedicated assembler, I’d have made it output better error messages β€” but they are not terrible either.

Then there are some errors that could be caught at compile time, but not with this particular design of the macro. Although note that saying an error is caught at runtime is ambiguous here. Even if the Rust compiler doesn’t flag the error while processing the macro, we can still flag it at the time of the execution of assemble() β€” this is at runtime for the Rust program, but at compile time for the assembly language. It’s different from a LC-3 runtime error where the VM encounters an illegal opcode such as 0xDEAD during execution.

Anyway, this sample program contains one of each such error and shows the nice output of AssemblerError:

asm! {
            .ORIG 0x3000;
            LEA R0, "hello";   // label is a duplicate
            PUTS;
            LEA R0, "greet";   // label doesn't exist
            PUTS;
            LEA R0, "salute";  // label too far away to fit in offset
            ADD R3, R2, 0x7f;  // immediate value is out of range
            HALT;
    :hello  STRINGZ "Hello World!\n";
    :hello  STRINGZ "Good morning, planet!\n";
            BLKW 1024;
    :salute STRINGZ "Regards, globe!\n";
            BLKW 0xffff;  // extra space makes the program too big
            .END;
}?
// Error: Error assembling program
//   x3000: Duplicate label "hello"
//   x3002: Undefined label "greet"
//   x3004: Label "salute" is too far away from instruction (1061 words)
//   x3005: Value x007f is too large to fit in 5 bits
//   xffff: Program is too large to fit in memory

To check that all the labels are present only once, you need to do two passes on the input. In fact, the macro effectively does do two passes: one in the macro rules where it populates the symbol table, and one in assemble() where it reads the values back out again. But I don’t believe it’d be possible to do two passes in the macro rules themselves, to get compile time checking for this.

The out-of-range value in ADD R3, R2, 0x7f is an interesting case though! This could be caught at compile time if Rust had arbitrary bit-width integers.12 After all, TRAP -1 and TRAP 0x100 are caught at compile time because the definition of Inst::Trap(u8) does not allow you to construct one with those literal values.

I tried using types from the ux crate for this, e.g. Add2(Reg, Reg, ux::i5). But there is no support for constructing custom integer types from literals, so I would have had to use try_from() in the macro β€” in which case I wouldn’t get compile time errors anyway, so I didn’t bother.

My colleague Andreu Botella suggested that I could make out-of-range immediate values a compile time error by using a constant expression β€” something I didn’t know existed in Rust.

(@int $bits:literal $num:literal) => {{
    const _: () = {
        let converted = $num as i8;
        let shift = 8 - $bits;
        if converted << shift >> shift != converted {
            panic!("Value is too large to fit in bitfield");
        }
    };
    $num as i8
}};

(@inst ADD $dst:expr, $src:expr, $imm:literal)  => { Add2($dst, $src, asm!(@int 5 $imm)) };

I found this really clever! But on the other hand, it made having a good error message quite difficult. panic! in a constant expression cannot format values, it can only print a string literal. So you get a compile time error like this:

error[E0080]: evaluation of constant value failed
    asm! { .ORIG 0x3000; ADD R0, R0, 127; .END; }.unwrap_err();
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the evaluated program panicked at 'Value is too large to fit in bitfield'

The whole assembly language program is highlighted as the source of the error, and the message doesn’t give any clue which value is problematic. This would make it almost impossible to locate the error in a large program with many immediate values. For this reason, I decided not to adopt this technique. I found the runtime error preferable because it gives you the address of the instruction, as well as the offending value. But I did learn something!

Conclusion

At this point I was quite happy with the inline assembler macro! I was able to use it to write programs for the LC-3 without having to calculate label offsets by hand, which is all I really wanted to do in the first place. Part 3 of the series will be about some programs that I wrote to test the VM (and test my understanding of it!)

I felt like I had successfully demystified Rust macros for myself now, and would be able to write another macro if I needed to. I appreciated having the chance to gain that understanding while working on a personal project that caught my interest.13 This is β€” for my learning style, at least β€” my ideal way to learn. I hope that sharing it helps you too.

Finally, you should know that this writeup presents an idealized version of the process. In 2020 I wrote a series of posts where I journalled writing short Rust programs, including mistakes and all, and those got some attention. This post is not that. I tried many different dead ends while writing this, and if I’d chronicled all of them this post would be even longer. Here, I’ve tried to convey approximately the journey of understanding I went through, while smoothing it over so that it makes β€” hopefully β€” a good read.

Many thanks to Andreu Botella, Angelos Oikonomopoulos, and Federico Mena Quintero who read a draft of this and with their comments made it a better read than it was before.


[1] As might be the smarter, and certainly the more conventional, thing to do ↩

[2] Tokens as in what a lexer produces: identifiers, literals, … ↩

[3] A symbol table usually means something different, a table that contains information about a program’s variables and other identifiers. We don’t have variables in this assembly language; labels are the only symbols there are ↩

[4] What actually happened, I just sat down and started writing and deleting and rewriting macros until I got a feel for what I needed. But this blog post is supposed to be a coherent story, not a stream-of-consciousness log, so let’s pretend that I thought about it like this first ↩

[5] Conspicuously, still under construction in the Little Book ↩

[6] This isn’t a real grammar for the assembly language, just an approximation ↩

[7] Fragment specifiers, according to the Little Book, are what the expr part of $e:expr is called ↩

[8] Notice that .ORIG and .END now have semicolons too, but no labels; they aren’t instructions, so they can’t have an address ↩

[9] Eagle-eyed readers may note that in the LC-3 manual these are called .BLKW and .STRINGZ, with leading dots. I elided the dots, again to make the macro less complicated ↩

[10] This now seems wasteful rather than keeping the PC in a variable. On the other hand, it seems complicated to add a bunch of local variables to the repeated part of the macro ↩

[11] In fact, any instruction word with a value between 0x0000-0x01ff is a no-op. This is a consequence of BR‘s opcode being 0b0000, which I can only assume was chosen for exactly this reason. BR, with the three status register bits also being zero, never branches and always falls through to the next instruction ↩

[12] Or even just i5 and i6 types ↩

[13] Even if it took 1.5 years to complete it ↩

Advertisement

Comparing Apples and AppleOranges

Via Zach Holman’s blog post I found an interesting Twitter discussion that kicked off with these questions:

A couple of tough questions for all of you:
1. Is the date 2022-06-01 equal to the time 2022-06-01 12:00:00?
2. Is the date 2022-06-01 between the time 2022-06-01 12:00:00 and the time 2022-12-31 12:00:00?
3. Is the time 2022-06-01 12:00:00 after the date 2022-06-01?

I’ve been involved for two years and counting1 in the design of Temporal, an enhancement for the JavaScript language that adds modern facilities for handling dates and times. One of the principles of Temporal that was established long before I got involved, is that we should use different objects to represent different concepts. For example, if you want to represent a calendar date that’s not associated with any specific time of day, you use a class that doesn’t require you to make up a bogus time of day.2 Each class has a definition for equality, comparison, and other operations that are appropriate to the concept it represents, and you get to specify which one is appropriate for your use case by your choice of which one you use. In other, more jargony, words, Temporal offers different data types with different semantics.3

For me these questions all boil down to, when we consider a textual representation like 2022-06-01, what concept does it represent? I would say that each of these strings can represent more than one concept, and to get a good answer, you need to specify which concept you are talking about.

So, my answers to the three questions are “it depends”, “no but maybe yes”, and “it depends.” I’ll walk through why I think this, and how I would solve it with Temporal, for each question.

You can follow along or try out your own answers by going the Temporal documentation page, and opening your browser console. That will give you an environment where you can try these examples and experiment for yourself.

Question 1

Is the date 2022-06-01 equal to the time 2022-06-01 12:00:00?

As I mentioned above, Temporal has different data types with different semantics. In the case of this question, what the question refers to as a “time” we call a “date-time” in Temporal4, and the “date” is still a date. The specific types we’d use are PlainDateTime and PlainDate, respectively. PlainDate is a calendar date that doesn’t have a time associated with it: a single square on a wall calendar. PlainDateTime is a calendar date with a wall-clock time. In both cases, “plain” refers to not having a time zone attached, so we know we’re not dealing with any 23-hour or 25-hour or even more unusual day lengths.

The reason I say that the answer depends, is that you simply can’t say whether a date is equal to a date-time. They are two different concepts, so the answer is not well-defined. If you want to do that, you have to convert one to the other so that you either compare two dates, or two date-times, each with their accompanying definition of equality.

You do this in Temporal by choosing the type of object to create, PlainDate or PlainDateTime, and the resulting object’s equals() method will do the right thing:

> Temporal.PlainDate.from('2022-06-01').equals('2022-06-01 12:00:00')
true
> Temporal.PlainDateTime.from('2022-06-01').equals('2022-06-01 12:00:00')
false

I think either of PlainDate or PlainDateTime semantics could be valid based on your application, so it seems important that both are within reach of the programmer. I will say that I don’t expect PlainDateTime will get used very often in practice.5 But I can think of a use case for either one of these:

  • If you have a list of PlainDateTime events to present to a user, and you want to filter them by date. Let’s say we have data from a pedometer, where we care about what local time it was in the user’s time zone when they got their exercise, and the user has asked to see all the exercise they got yesterday. In this case I’d use date semantics: convert the PlainDateTime data to PlainDate data.
  • On the other hand, if the 2022-06-01 input comes from a date picker widget where the user could have input a time but didn’t, then we might decide that it makes sense to default the time of day to midnight, and therefore use date-time semantics.

Question 2

Is the date 2022-06-01 between the time 2022-06-01 12:00:00 and the time 2022-12-31 12:00:00?

I think the answer to this one is more unambiguously a no. If we use date-time semantics (in Temporal, PlainDateTime.compare()) the date implicitly converts to midnight on that day, so it comes before both of the date-times. If we use date semantics (PlainDate.compare()), 2022-06-01 and 2022-06-01 12:00:00 are equal as we determined in Question 1, so I wouldn’t say it’s “between” the two date-times.

> Temporal.PlainDateTime.compare('2022-06-01', '2022-06-01 12:00:00')
-1
> Temporal.PlainDateTime.compare('2022-06-01', '2022-12-31 12:00:00')
-1
> Temporal.PlainDate.compare('2022-06-01', '2022-06-01 12:00:00')
0
> Temporal.PlainDate.compare('2022-06-01', '2022-12-31 12:00:00')
-1

(Why these numbers?6 The compare methods return βˆ’1, 0, or 1, according to the convention used by Array.prototype.sort, so that you can do things like arr.sort(Temporal.PlainDate.compare). 0 means the arguments are equal and βˆ’1 means the first comes before the second.)

But maybe the answer still depends a little bit on what your definition of “between” is. If it means the date-times form a closed interval instead of an open interval, and we are using date semantics, then the answer is yes.7

Question 3

Is the time 2022-06-01 12:00:00 after the date 2022-06-01?

After thinking about the previous two questions, this should be clear. If we’re using date semantics, the two are equal, so no. If we’re using date-time semantics, and we choose to convert a date to a date-time by assuming midnight as the time, then yes.

Other people’s answers

I saw a lot of answers saying that you need more context to be able to compare the two, so I estimate that the way Temporal requires that you give that context, instead of assuming one or the other, does fit with the way that many people think. However, that wasn’t the only kind of reply I saw. (Otherwise the discussion wouldn’t have been that interesting!) I’ll discuss some of the other common replies that I saw in the Twitter thread.

“Yes, no, no: truncate to just the dates and compare those, since that’s the data you have in common.” People who said this seem like they might naturally gravitate towards date semantics. I’d estimate that date semantics are probably correct for more use cases. But maybe not your use case!

“No, no, yes: a date with no time means midnight is implicit.” People who said this seem like they might naturally gravitate towards date-time semantics. It makes sense to me that programmers think this way; if you’re missing a piece of data, just fill in 0 and keep going. I’d estimate that this isn’t how a lot of nontechnical users think of dates, though.

In this whole post I’ve assumed we assume the time is midnight when we convert a date to a date-time, but in the messy world of dates and times, it can make sense to assume other times than midnight, as well. This comes up especially if time zones are involved. For example, you might assume noon, or start-of-day, instead. Start-of-day is often, but not always midnight:

Temporal.PlainDateTime.from('2018-11-04T12:00')
  .toZonedDateTime('America/Sao_Paulo')
  .startOfDay()
  .toPlainTime()  // -> 01:00

“These need to have time zones attached for the question to make sense.” If this is your first reaction when you see a question like this, great! If you write JavaScript code, you probably make fewer bugs just by being aware that JavaScript’s Date object makes it really easy to confuse time zones.

I estimate that Temporal’s ZonedDateTime type is going to fit more use cases in practice than either PlainDate or PlainDateTime. In that sense, if you find yourself with this data and these questions in your code, it makes perfect sense to ask yourself whether you should be using a time-zone-aware type instead. But, I think I’ve given some evidence above that sometimes the answer to that is no: for example, the pedometer data that I mentioned above.

“Dates without times are 24-hour intervals.” Also mentioned as “all-day events”. I can sort of see where this comes from, but I’m not sure I agree with it. In the world where JavaScript Date is the only tool you have, it probably makes sense to think of a date as an interval. But I’d estimate that a lot of non-programmers don’t think of dates this way: instead, it’s a square on your calendar!

It’s also worth noting that in some calendar software, you can create an all-day event that lasts from 00:00 until 00:00 the following day, and you can also create an event for just the calendar date, and these are separate things.

A screenshot of calendar software showing a visual difference between one calendar event spanning 24 hours, and a second all-day event the next day.
A 24-hour interval and a calendar date. Although notably, Google Calendar collapses the 24-hour event into a calendar-date event if you do this.

“Doesn’t matter, just pick one convention and stick with it.” I hope after reading this post you’re convinced that it does matter, depending on your use case.

“Ugh!” That’s how I feel too and why I wrote a whole blog post about it!

How do I feel about the choices we made in Temporal?

I’m happy with how Temporal encourages the programmer to handle these cases. When I went to try out the comparisons that were suggested in the original tweet, I found it was natural to pick either PlainDate or PlainDateTime to represent the data.

One thing that Temporal could have done instead (and in fact, we went back and forth on this a few times before the proposal reached its currently frozen stage in the JS standardization process) would be to make the choice of data type, and therefore of comparison semantics, more explicit.

For example, one might make a case that it’s potentially confusing that the 12:00:00 part of the string in Temporal.PlainDate.from('2022-06-01').equals('2022-06-01 12:00:00') is ignored when the string is converted to a PlainDate. We could have chosen, for example, to throw if the argument to PlainDate.prototype.equals() was a string with a time in it, or if it was a PlainDateTime. That would make the code for answering question 1 look like this:

> Temporal.PlainDate.from('2022-06-01').equals(
... Temporal.PlainDateTime.from('2022-06-01 12:00:00')
... .toPlainDate())
true

This approach seems like it’s better at forcing the programmer to make a choice consciously by throwing exceptions when there is any doubt, but at the cost of writing such long-winded code that I find it difficult to follow. In the end, I prefer the more balanced approach we took.

Conclusion

This was a really interesting problem to dig into. I always find it good to be reminded that no matter what I think is correct about date-time handling, someone else is going to have a different opinion, and they won’t necessarily be wrong.

I said in the beginning of the post: “to get a good answer, you need to specify which concept you are talking about.” Something we’ve tried hard to achieve in Temporal is to make it easy and natural, but not too obtrusive, to specify this. When I went to answer the questions using Temporal code, I found it pretty straightforward, and I think that validates some of the design choices we made in Temporal.

I’d like to acknowledge my employer Igalia for letting me spend work time writing this post, as well as Bloomberg for sponsoring Igalia’s work on Temporal. Many thanks to my colleagues Tim Chevalier, Jesse Alama, and Sarah Groff Hennigh-Palermo for giving feedback on a draft of this post.


[1] 777 days at the time of writing, according to Temporal.PlainDate.from('2020-01-13').until(Temporal.Now.plainDateISO()) ↩

[2] A common source of bugs with JavaScript’s legacy Date when the made-up time of day doesn’t exist due to DST ↩

[3] “Semantics” is, unfortunately, a word I’m going to use a lot in this post ↩

[4] “Time” in Temporal refers to a time on a clock face, with no date associated with it ↩

[5] We even say this on the PlainDateTime documentation page ↩

[6] We don’t have methods like isBefore()/isAfter() in Temporal, but this is a place where they’d be useful. These methods seem like good contenders for a follow-up proposal in the future ↩

[7] Intervals bring all sorts of tricky questions too! Some other date-time libraries have interval objects. We also don’t have these in Temporal, but are likewise open to a follow-up proposal in the future ↩

December of Rust 2021, Part 1: A Little Computer

In the beginning of December I read Andrei Ciobanu’s Writing a simple 16-bit VM in less than 125 lines of C. Now, I’ve been interested in virtual machines and emulators for a long time, and I work tangential to VMs as part of my day job as a JavaScript engine developer for Igalia. I found this post really interesting because, well, it does what it says on the tin: A simple VM in less than 125 lines of C.

Readers of this blog, if I have any left at this point, might remember that in December 2020 I did a series of posts solving that year’s Advent of Code puzzles in order to try to teach myself how to write programs in the Rust programming language. I did say last year that if I were to do these puzzles again, I would do them at a slower pace and wouldn’t blog about them. Indeed, I started again this year, but it just wasn’t as interesting, having already gone through the progression from easy to hard puzzles and now having some idea already of the kinds of twists that they like to do in between the first and second halves of each puzzle.

So, instead of puzzles, I decided to see if I could write a similarly simple VM in Rust this December, as a continuation of my Rust learning exercise last year1.

Andrei Ciobanu’s article, as well as some other articles he cites, write VMs that simulate the LC-3 architecture2. What I liked about this one is that it was really concise and no-nonsense, and did really well at illustrating how a VM works. There are already plenty of other blog posts and GitHub repositories that implement an LC-3 VM in Rust, and I can’t say I didn’t incorporate any ideas from them, but I found many of them to be a bit verbose. I wanted to see if I could create something in the same concise spirit as Andrei’s, but still idiomatic Rust.

Over the course of a couple of weeks during my end-of-year break from work, I think I succeeded somewhat at that, and so I’m going to write a few blog posts about it.

About the virtual machine

This post is not a tutorial about virtual machines. There are already plenty of those, and Andrei’s article is already a great one so it doesn’t make sense for me to duplicate it. Instead, In this section I’ll note some things about the LC-3 architecture before we start.

First of all, it has a very spartan instruction set. Each instruction is 16 bits, and there are no variable length instructions. The opcode is packed in the topmost 4 bits of each word. That means there are at most 16 instructions. And one opcode (1101) is not even used!

Only three instructions are arithmetic-type ones: addition, bitwise AND, and bitwise NOT. If you’re used to x86 assembly language you’ll notice that other operations like subtraction, multiplication, bitwise OR, are missing. We only need these three to do all the other operations in 2’s-complement arithmetic, although it is somewhat tedious! As I started writing some LC-3 assembly language to test the VM, I learned how to implement some other arithmetic operations in terms of ADD, AND, and NOT.3 I’ll go into this in a following post.

The LC-3 does not have a stack. All the operations take place in registers. If you are used to thinking in terms of a stack machine (for example, SpiderMonkey is one), this takes some getting used to.

First steps

I started out by trying to port Andrei’s C code to Rust code in the most straightforward way possible, not worrying about whether it was idiomatic or not.

The first thing I noticed is that whereas in C it’s customary to use a mutable global, such as reserving storage for the VM’s memory and registers at the global level with declarations such as uint16_t mem[UINT16_MAX] = {0};, the Rust compiler makes this very difficult. You can use a mutable static variable, but accessing it needs to be marked as unsafe. In this way, the Rust compiler nudges you to encapsulate the storage inside a class:

struct VM {
    mem: [u16; MEM_SIZE],
    reg: [u16; NREGS],
    running: bool,
}

Next we write functions to access the memory. In the C code these are:

static inline uint16_t mr(uint16_t address) { return mem[address];  }
static inline void mw(uint16_t address, uint16_t val) { mem[address] = val; }

In Rust, we have to cast the address to a usize, since usize is the type that we index arrays with:

#[inline]
fn ld(&mut self, addr: u16) -> u16 {
    self.mem[addr as usize]
}

#[inline]
fn st(&mut self, addr: u16, val: u16) {
    self.mem[addr as usize] = val;
}

(I decide to name them ld and st for “load” and “store” instead of mr and mw, because the next thing I do is write similar functions for reading and writing the VM’s registers, which I’ll call r and rw for “register” and “register write”. These names look less similar, so I find that makes the code more readable yet still really concise.)

The next thing in the C code is a bunch of macros that do bit-manipulation operations to unpack the instructions. I decide to turn these into #[inline] functions in Rust. For example,

#define OPC(i) ((i)>>12)
#define FIMM(i) ((i>>5)&1)

from the C code, become, in Rust,

#[inline] #[rustfmt::skip] fn opc(i: u16) -> u16 { i >> 12 }
#[inline] #[rustfmt::skip] fn fimm(i: u16) -> bool { (i >> 5) & 1 != 0 }

I put #[rustfmt::skip] because I think it would be nicer if the rustfmt tool would allow you to put super-trivial functions on one line, so that they don’t take up more visual space than they deserve.

You might think that the return type of opc should be an enum. I originally tried making it that way, but Rust doesn’t make it very easy to convert between enums and integers. The num_enum crate provides a way to do this, but I ended up not using it, as you will read below.

We also need a way to load and run programs in LC-3 machine code. I made two methods of VM patterned after the ld_img() and start() functions from the C code.

First I’ll talk about ld_img(). What I really wanted to do is read the bytes of the file directly into the self.mem array, without copying, as the C code does. This is not easy to do in Rust. Whereas in C all pointers are essentially pointers to byte arrays in the end, this is not the case in Rust. It’s surprisingly difficult to express that I want to read u16s into an array of u16s! I finally found a concise solution, using both the byteorder and bytemuck crates. For this to work, you have to import the byteorder::ReadBytesExt trait into scope.

pub fn ld_img(&mut self, fname: &str, offset: u16) -> io::Result<()> {
    let mut file = fs::File::open(fname)?;
    let nwords = file.metadata()?.len() as usize / 2;
    let start = (PC_START + offset) as usize;
    file.read_u16_into::<byteorder::NetworkEndian>(bytemuck::cast_slice_mut(
        &mut self.mem[start..(start + nwords)],
    ))
}

What this does is read u16s, minding the correct byte order, into an array of u8. But we have an array of u16 that we want to store it in, not u8. So bytemuck::cast_slice_mut() treats the &mut [u16] slice as a &mut [u8] slice, essentially equivalent to casting it as (uint8_t*) in C. It does seem like this ought to be part of the Rust standard library, but the only similar facility is std::mem::transmute(), which does the same thing. But it also much more powerful things as well, and is therefore marked unsafe. (I’m trying to avoid having any code that needs to be marked unsafe in this project.)

For running the loaded machine code, I wrote this method:

pub fn start(&mut self, offset: u16) {
    self.running = true;
    self.rw(RPC, PC_START + offset);
    while self.running {
        let i = self.ld(self.r(RPC));
        self.rw(RPC, self.r(RPC) + 1);
        self.exec(i);
    }
}

I’ll talk more about what happens in self.exec() in the next section.

The basic execute loop

In the C code, Andrei cleverly builds a table of function pointers and indexes it with the opcode, in order to execute each instruction:

typedef void (*op_ex_f)(uint16_t instruction);
op_ex_f op_ex[NOPS] = { 
    br, add, ld, st, jsr, and, ldr, str, rti, not, ldi, sti, jmp, res, lea, trap 
};

// ...in main loop:
op_ex[OPC(i)](i);

Each function, such as add(), takes the instruction as a parameter, decodes it, and mutates the global state of the VM. In the main loop, at the point where I have self.exec(i) in my code, we have op_ex[OPC(i)](i) which decodes the opcode out of the instruction, indexes the table, and calls the function with the instruction as a parameter. A similar technique is used to execute the trap routines.

This approach of storing function pointers in an array and indexing it by opcode or trap vector is great in C, but is slightly cumbersome in Rust. You would have to do something like this in order to be equivalent to the C code:

type OpExF = fn(&mut VM, u16) -> ();

// in VM:
const OP_EX: [OpExF; NOPS] = [
    VM::br, VM::add, ..., VM::trap,
];

// ...in main loop:
OP_EX[opc(i) as usize](self, i);

Incidentally, this is why I decided above not to use an enum for the opcodes. Not only would you have to create it from a u16 when you unpack it from the instruction, you would also have to convert it to a usize in order to index the opcode table.

In Rust, a match expression is a much more natural fit:

match opc(i) {
    BR => {
        if (self.r(RCND) & fcnd(i) != 0) {
            self.rw(RPC, self.r(RPC) + poff9(i));
        }
    }
    ADD => {
        self.rw(dr(i), self.r(sr(i)) +
            if fimm(i) {
                sextimm(i)
            } else {
                self.r(sr2(i))
            });
        self.uf(dr(i));
    }
    // ...etc.
}

However, there is an even better alternative that makes the main loop much more concise, like the one in the C code! We can use the bitmatch crate to simultaneously match against bit patterns and decode parts out of them.

#[bitmatch]
fn exec(&mut self, i: u16) {
    #[bitmatch]
    match i {
        "0000fffooooooooo" /* BR */ => {
            if (self.r(RCND) & f != 0) {
                self.rw(RPC, self.r(RPC) + sext(o, 9));
            }
        }
        "0001dddsss0??aaa" /* ADD register */ => {
            self.rw(d, self.r(s) + self.r(a));
            self.uf(d);
        }
        "0001dddsss1mmmmm" /* ADD immediate */ => {
            self.rw(d, self.r(s) + sext(m, 5));
            self.uf(d);
        }
        // ...etc.
    }
}

This actually gets rid of the need for all the bit-manipulation functions that I wrote in the beginning, based on the C macros, such as opc(), fimm(), and poff9(), because bitmatch automatically does all the unpacking. The only bit-manipulation we still need to do is sign-extension when we unpack immediate values and offset values from the instructions, as we do above with sext(o, 9) and sext(m, 5).

I was curious what kind of code the bitmatch macros generate under the hood and whether it’s as performant as writing out all the bit-manipulations by hand. For that, I wrote a test program that matches against the same bit patterns as the main VM loop, but with the statements in the match arms just replaced by constants, in order to avoid cluttering the output:

#[bitmatch]
pub fn test(i: u16) -> u16 {
    #[bitmatch]
    match i {
        "0000fffooooooooo" => 0,
        "0001dddsss0??aaa" => 1,
        "0001dddsss1mmmmm" => 2,
        // ...etc.
    }
}

There is a handy utility for viewing expanded macros that you can install with cargo install cargo-expand, and then run with cargo expand --lib test (I put the test function in a dummy lib.rs file.)

Here’s what we get!

pub fn test(i: u16) -> u16 {
    match i {
        bits if bits & 0b1111000000000000 == 0b0000000000000000 => {
            let f = (bits & 0b111000000000) >> 9usize;
            let o = (bits & 0b111111111) >> 0usize;
            0
        }
        bits if bits & 0b1111000000100000 == 0b0001000000000000 => {
            let a = (bits & 0b111) >> 0usize;
            let d = (bits & 0b111000000000) >> 9usize;
            let s = (bits & 0b111000000) >> 6usize;
            1
        }
        bits if bits & 0b1111000000100000 == 0b0001000000100000 => {
            let d = (bits & 0b111000000000) >> 9usize;
            let m = (bits & 0b11111) >> 0usize;
            let s = (bits & 0b111000000) >> 6usize;
            2
        }
        // ...etc.
        _ => // ...some panicking code
    }
}

It’s looking a lot like what I’d written anyway, but with all the bit-manipulation functions inlined. The main disadvantage is that you have to AND the value with a bitmask at each arm of the match expression. But maybe that isn’t such a problem? Let’s look at the generated assembly to see what the computer actually executes. There is another Cargo tool for this, which you can install with cargo install cargo-asm and run with cargo asm --lib lc3::test4. In the result, there are actually only three AND instructions, because there are only three unique bitmasks tested among all the arms of the match expression (0b1111000000000000, 0b1111100000000000, and 0b1111000000100000). So it seems like the compiler is quite able to optimize this into something good.

First test runs

By the time I had implemented all the instructions except for TRAP, at this point I wanted to actually run a program on the LC-3 VM! Andrei has one program directly in his blog post, and another one in his GitHub repository, so those seemed easiest to start with.

Just like in the blog post, I wrote a program (in my examples/ directory so that it could be run with cargo r --example) to output the LC-3 machine code. It looked something like this:

let program: [u16; 7] = [
    0xf026,  // TRAP 0x26
    0x1220,  // ADD R1, R0, 0
    0xf026,  // TRAP 0x26
    0x1240,  // ADD R1, R1, R0
    0x1060,  // ADD R0, R1, 0
    0xf027,  // TRAP 0x27
    0xf025,  // TRAP 0x25
];
let mut file = fs::File::create(fname)?;
for inst in program {
    file.write_u16::<byteorder::NetworkEndian>(inst)?;
}
Ok(())

In order for this to work, I still needed to implement some of the TRAP routines. I had left those for last, and at that point my match expression for TRAP instructions looked like "1111????tttttttt" => self.trap(t), and my trap() method looked like this:

fn trap(&mut self, t: u8) {
    match t {
        _ => self.crash(&format!("Invalid TRAP vector {:#02x}", t)),
    }
}

For this program, we can see that three traps need to be implemented: 0x25 (HALT), 0x26 (INU16), and 0x27 (OUTU16). So I was able to add just three arms to my match expression:

0x25 => self.running = false,
0x26 => {
    let mut input = String::new();
    io::stdin().read_line(&mut input).unwrap_or(0);
    self.rw(0, input.trim().parse().unwrap_or(0));
}
0x27 => println!("{}", self.r(0)),

With this, I could run the sample program, type in two numbers, and print out their sum.

The second program sums an array of numbers. In this program, I added a TRAP 0x27 instruction right before the HALT in order to print out the answer, otherwise I couldn’t see if it was working! This also required changing R1 to R0 so that the sum is in R0 when we call the trap routine, and adjusting the offset in the LEA instruction.5

When I tried running this program, it crashed the VM! This is due to the instruction ADD R4, R4, x-1 which decrements R4 by adding -1 to it. R4 is the counter for how many array elements we have left to process, so initially it holds 10, and when we get to that instruction for the first time, we decrement it to 9. But if you look at the implementation of the ADD instruction that I wrote above, we are actually doing an unsigned addition of the lowest 5 bits of the instruction, sign-extended to 16 bits, so we are not literally decrementing it. We are adding 0xffff to 0x000a and expecting it to wrap around to 0x0009 like it does in C. But integer arithmetic doesn’t wrap in Rust!

Unless you specifically tell it to, that is. So we could use u16::wrapping_add() instead of the + operator to do the addition. But I got what I thought was a better idea, to use std::num::Wrapping! I rewrote the definition of VM at the top of the file:

type Word = Wrapping<u16>;

struct VM {
    mem: [Word; MEM_SIZE],
    reg: [Word; NREGS],
    running: bool,
}

This did require adding Wrapping() around some integer literals and adding .0 to unwrap to the bare u16 in some places, but on the whole it made the code more concise and readable. As an added bonus, this way, we are using the type system to express that the LC-3 processor does wrapping unsigned arithmetic. (I do wish that there were a nicer way to express literals of the Wrapping type though.)

And with that, the second example program works. It outputs 16, as expected. In a following post I’ll go on to explain some of the other test programs that I wrote.

Other niceties

At this point I decided to do some refactoring to make the Rust code more readable and hopefully more idiomatic as well. Inspired by the type alias for Word, I added several more, one for addresses and one for instructions, as well as a function to convert a word to an address:

type Addr = usize;
type Inst = u16;
type Reg = u16;
type Flag = Word;
type TrapVect = u8;

#[inline] fn adr(w: Word) -> Addr { w.0.into() }

Addr is convenient to alias to usize because that’s the type that we use to index the memory array. And Inst is convenient to alias to u16 because that’s what bitmatch works with.

In fact, using types in this way actually allowed me to catch two bugs in Andrei’s original C code, where the index of the register was used instead of the value contained in the register.

I also added a few convenience methods to VM for manipulating the program counter:

#[inline] fn pc(&self) -> Word { self.r(RPC) }
#[inline] fn jmp(&mut self, pc: Word) { self.rw(RPC, pc); }
#[inline] fn jrel(&mut self, offset: Word) { self.jmp(self.pc() + offset); }
#[inline] fn inc_pc(&mut self) { self.jrel(Wrapping(1)); }

With these, I could write a bunch of things to be more expressive and concise. For example, the start() method now looked like this:

pub fn start(&mut self, offset: u16) {
    self.running = true;
    self.jmp(PC_START + Wrapping(offset));
    while self.running {
        let i = self.ld(adr(self.pc())).0;
        self.inc_pc();
        self.exec(i);
    }
    Ok(())
}

I also added an iaddr() method to load an address indirectly from another address given as an offset relative to the program counter, to simplify the implementation of the LDI and STI instructions.

While I was at it, I noticed that the uf() (update flags) method always followed a store into a destination register, and I decided to rewrite it as one method dst(), which stores a value into a destination register and updates the flags register based on that value:

#[inline]
fn dst(&mut self, r: Reg, val: Word) {
    self.rw(r, val);
    self.rw(
        RCND,
        match val.0 {
            0 => FZ,
            1..=0x7fff => FP,
            0x8000..=0xffff => FN,
        },
    );
}

At this point, the VM’s main loop looked just about as concise and simple as the original C code did! The original:

static inline void br(uint16_t i)   { if (reg[RCND] & FCND(i)) { reg[RPC] += POFF9(i); } }
static inline void add(uint16_t i)  { reg[DR(i)] = reg[SR1(i)] + (FIMM(i) ? SEXTIMM(i) : reg[SR2(i)]); uf(DR(i)); }
static inline void ld(uint16_t i)   { reg[DR(i)] = mr(reg[RPC] + POFF9(i)); uf(DR(i)); }
static inline void st(uint16_t i)   { mw(reg[RPC] + POFF9(i), reg[DR(i)]); }
static inline void jsr(uint16_t i)  { reg[R7] = reg[RPC]; reg[RPC] = (FL(i)) ? reg[RPC] + POFF11(i) : reg[BR(i)]; }
static inline void and(uint16_t i)  { reg[DR(i)] = reg[SR1(i)] & (FIMM(i) ? SEXTIMM(i) : reg[SR2(i)]); uf(DR(i)); }
static inline void ldr(uint16_t i)  { reg[DR(i)] = mr(reg[SR1(i)] + POFF(i)); uf(DR(i)); }
static inline void str(uint16_t i)  { mw(reg[SR1(i)] + POFF(i), reg[DR(i)]); }
static inline void res(uint16_t i) {} // unused
static inline void not(uint16_t i)  { reg[DR(i)]=~reg[SR1(i)]; uf(DR(i)); }
static inline void ldi(uint16_t i)  { reg[DR(i)] = mr(mr(reg[RPC]+POFF9(i))); uf(DR(i)); }
static inline void sti(uint16_t i)  { mw(mr(reg[RPC] + POFF9(i)), reg[DR(i)]); }
static inline void jmp(uint16_t i)  { reg[RPC] = reg[BR(i)]; }
static inline void rti(uint16_t i) {} // unused
static inline void lea(uint16_t i)  { reg[DR(i)] =reg[RPC] + POFF9(i); uf(DR(i)); }
static inline void trap(uint16_t i) { trp_ex[TRP(i)-trp_offset](); }

My implementation:

#[bitmatch]
match i {
    "0000fffooooooooo" /* BR   */ => if (self.r(RCND).0 & f) != 0 { self.jrel(sext(o, 9)); },
    "0001dddsss0??aaa" /* ADD1 */ => self.dst(d, self.r(s) + self.r(a)),
    "0001dddsss1mmmmm" /* ADD2 */ => self.dst(d, self.r(s) + sext(m, 5)),
    "0010dddooooooooo" /* LD   */ => self.dst(d, self.ld(adr(self.pc() + sext(o, 9)))),
    "0011sssooooooooo" /* ST   */ => self.st(adr(self.pc() + sext(o, 9)), self.r(s)),
    "01000??bbb??????" /* JSRR */ => { self.rw(R7, self.pc()); self.jmp(self.r(b)); }
    "01001ooooooooooo" /* JSR  */ => { self.rw(R7, self.pc()); self.jrel(sext(o, 11)); }
    "0101dddsss0??aaa" /* AND1 */ => self.dst(d, self.r(s) & self.r(a)),
    "0101dddsss1mmmmm" /* AND2 */ => self.dst(d, self.r(s) & sext(m, 5)),
    "0110dddbbboooooo" /* LDR  */ => self.dst(d, self.ld(adr(self.r(b) + sext(o, 6)))),
    "0111sssbbboooooo" /* STR  */ => self.st(adr(self.r(b) + sext(o, 6)), self.r(s)),
    "1000????????????" /* n/a  */ => self.crash(&format!("Illegal instruction {:#04x}", i)),
    "1001dddsss??????" /* NOT  */ => self.dst(d, !self.r(s)),
    "1010dddooooooooo" /* LDI  */ => self.dst(d, self.ld(self.iaddr(o))),
    "1011sssooooooooo" /* STI  */ => self.st(self.iaddr(o), self.r(s)),
    "1100???bbb??????" /* JMP  */ => self.jmp(self.r(b)),
    "1101????????????" /* RTI  */ => self.crash("RTI not available in user mode"),
    "1110dddooooooooo" /* LEA  */ => self.dst(d, self.pc() + sext(o, 9)),
    "1111????tttttttt" /* TRAP */ => self.trap(t as u8),
}

Of course, you could legitimately complain that both are horrible soups of one- and two-letter identifiers.6 But I think in both of them, if you have the abbreviations close to hand (and you do, since the program is so small!) it’s actually easier to follow because everything fits well within one vertical screenful of text. The Rust version has the added bonus of the bitmatch patterns being very visual, and the reader not having to think about bit shifting in their head.

Here’s the key for abbreviations:

  • i β€” Instruction
  • r β€” Register
  • d β€” Destination register (“DR” in the LC-3 specification)
  • s β€” Source register (“SR1”)
  • a β€” Additional source register (“SR2”)
  • b β€” Base register (“BASER”)
  • m β€” iMmediate value
  • o β€” Offset (6, 9, or 11 bits)
  • f β€” Flags
  • t β€” Trap vector7
  • pc β€” Program Counter
  • st β€” STore in memory
  • ld β€” LoaD from memory
  • rw β€” Register Write
  • adr β€” convert machine word to memory ADdRess
  • jmp β€” JuMP
  • dst β€” Destination register STore and update flags
  • jrel β€” Jump RELative to the PC
  • sext β€” Sign EXTend
  • iaddr β€” load Indirect ADDRess

At this point the only thing left to do, to get the program to the equivalent level of functionality as the one in Andrei’s blog post, was to implement the rest of the trap routines.

Two of the trap routines involve waiting for a key press. This is actually surprisingly difficult in Rust, as far as I can tell, and definitely not as straightforward as the reg[R0] = getchar(); which you can do in C. You can use the libc crate, but libc::getchar() is marked unsafe.8 Instead, I ended up pulling in another dependency, the console crate, and adding a term: console::Term member to VM. With that, I could implement a getc() method that reads a character, stores its ASCII code in the R0 register, and returns the character itself:

fn getc(&mut self) -> char {
    let ch = self.term.read_char().unwrap_or('\0');
    let res = Wrapping(if ch.is_ascii() { ch as u16 & 0xff } else { 0 });
    self.rw(R0, res);
    ch
}

This by itself was enough to implement the GETC trap routine (which waits for a key press and stores its ASCII code in the lower 8 bits of R0) and the IN trap routine (which does the same thing but first prints a prompt, and echoes the character back to stdout) was not much more complicated:

0x20 => {
    self.getc();
}
0x23 => {
    print!("> ");
    let ch = self.getc();
    print!("{}", ch);
}

Next I wrote the OUT trap routine, which prints the lower 8 bits of R0 as an ASCII character. I wrote an ascii() function that converts the lower 8 bits of a machine word into a char:

#[inline]
fn ascii(val: Word) -> char {
    char::from_u32(val.0 as u32 & 0xff).unwrap_or('?')
}

// In the TRAP match expression:
0x21 => print!("{}", ascii(self.r(R0))),

Now the two remaining traps were PUTS (print a zero-terminated string starting at the address in R0) and PUTSP (same, but the string is packed two bytes per machine word). These two routines are very similar in that they both access a variable-length area of memory, starting at the address in R0 and ending with the next memory location that contains zero. I found a nice solution that feels very Rust-like to me, a strz_words() method that returns an iterator over exactly this region of memory:

fn strz_words(&self) -> impl Iterator<Item = &Word> {
    self.mem[adr(self.r(R0))..].iter().take_while(|&v| v.0 != 0)
}

The two trap routines differ in what they do with the items coming out of this iterator. For PUTS we convert each machine word to a char with ascii():

0x22 => print!("{}", self.strz_words().map(|&v| ascii(v)).collect::<String>()),

(It’s too bad that we have a borrowed value in the closure, otherwise we could just do map(ascii). On the positive side, collect::<String>() is really nice.)

PUTSP is a bit more complicated. It’s a neat trick to use flat_map() to convert our iterator over machine words into a twice-as-long iterator over bytes. However, we still have to collect the bytes into an intermediate vector so we can check if the last byte is zero, because the string might have an odd number of bytes. In that case we’d still have a final zero byte which we have to pop off the end, because the iterator doesn’t finish until we get a whole memory location that is zero.

0x24 => {
    let mut bytes = self
        .strz_words()
        .flat_map(|&v| v.0.to_ne_bytes())
        .collect::<Vec<_>>();
    if bytes[bytes.len() - 1] == 0 {
        bytes.pop();
    };
    print!("{}", String::from_utf8_lossy(&bytes));
}

Conclusion

At this point, what I had was mostly equivalent to what you have if you follow along with Andrei’s blog post until the end, so I’ll end this post here. Unlike the C version, it is not under 125 lines long, but it does clock in at just under 200 lines.9

After I had gotten this far, I spent some time improving what I had, making the VM a bit fancier and writing some tools to use with it. I intend to make this article into a series, and I’ll cover these improvements in following posts, starting with an assembler.

You can find the code in a GitHub repo. I have kept this first version apart, in an examples/first.rs file, but you can browse the other files in that repository if you want a sneak preview of some of the other things I’ll write about.

Many thanks to Federico Mena Quintero who gave some feedback on a draft of this post.


[1] In the intervening time, I didn’t write any Rust code at all ↩

[2] “LC” stands for “Little Computer”. It’s a computer architecture described in a popular textbook, and as such shows up in many university courses on low-level programming. I hadn’t heard of it before, but after all, I didn’t study computer science in university ↩

[3] I won’t lie, mostly by googling terms like “lc3 division” on the lazyweb ↩

[4] This tool is actually really bad at finding functions unless they’re in particular locations that it expects, such as lib.rs, which is the real reason why I stuck the test function in a dummy lib.rs file ↩

[5] Note that the offset is calculated relative to the following instruction. I assume this is so that branching to an offset of 0 takes you to the next instruction instead of looping back to the current one ↩

[6] Especially since bitmatch makes you use one-letter variable names when unpacking ↩

[7] i.e., memory address of the trap routine. I have no idea why the LC-3 specification calls it a vector, since it is in fact a scalar ↩

[8] I’m not sure why. Federico tells me it is possibly because it doesn’t acquire a lock on stdin. ↩

[9] Not counting comments, and provided we cheat a bit by sticking a few trivial functions onto one line with #[rustfmt::skip] ↩

Why Don’t You Just

If there was one phrase I could snap my fingers to make unutterable forevermore, it would be “why don’t you just …?”1

Someone is talking about a problem, or an achievement, or an idea, and the inevitable dismissive smartass pipes up and asks “Why don’t you just (move to a different apartment / adopt / rewrite it in Rust / eat more vegetables / mount your FTP account locally with curlftpfs, and then use SVN on the mounted filesystem)?”

“Why don’t you just?” is so second nature in my old field, physics, that XKCD made a comic about it: “So why does your field need a whole journal, anyway?”

“Why don’t you just?” means “I’ve thought about what you said for five seconds and I’m so smart that I really believe what I came up with in those five seconds is more valuable than whatever YOU thought of.”

“Why don’t you just?” means “I don’t trust you to do your own thinking.”

Sometimes “Why don’t you just?” even means “let me show you how smart I am by asking you why you didn’t just do something that I know won’t even work, and make YOU explain it to test if you’re as smart as me.”

The only possible answer to “Why don’t you just?” should be “If I could just, then I would just.”

…most of the time.

The tricky part is that sometimes, “Why don’t you just?” is the right question to ask, with the wrong wording. When you’re guiding people who are inexperienced at something, sometimes they do miss the obvious, easy solution. Or even experienced people miss it sometimes.

In those cases, how do you help them out without insulting their intelligence? Because “Why don’t you just?” is all about “me smart, you dumb”. The key is to put aside your ego and accept that this time, you may not get a chance to show that you are smart. First of all, check whether it is really the right time and place to give criticism! It might not be. If it is, then it’s a great rule of thumb to assume that they already tried the obvious thing you are about to suggest. That way, if they didn’t think of it themselves, they feel flattered and can save face. And if they did think of it or try it already, then you allow them to shine by explaining their thought process. Try something like “The way that I might have solved that problem is X. Would that also work in your situation?” Or “Stop me if you tried this already, but the way I’d have tackled it would be X.”

So, why don’t you just stop saying “why don’t you just?”


[1] Well, actually, I’d probably pick well, actually ↩

Taking Out the Garbage

From the title, you might think this post is about household chores. Instead, I’m happy to announce that we may have a path to solving GJS’s “Tardy Sweep Problem”.

For more information about the problem, read The Infamous GNOME Shell Memory Leak by Georges Stavracas. This is going to be a more technical post than my previous post on the topic, which was more about the social effects of writing blog posts about memory leaks. So first I’ll recap what the problem is.

Garbage, garbage, everywhere

Buzz Lightyear meme with the caption "Garbage, Garbage Everywhere"At the root of the GNOME desktop is an object-oriented technology called GObject. GObjects are reference counted, but not garbage collected. As long as their reference count is nonzero, they are “alive”, and when their reference count drops to zero, they are deleted from memory.

GObject reference counting

Graphical user interfaces (such as a large part of GNOME Shell) typically involve lots of GObjects which all increase each other’s reference count. A diagram for a simple GUI window made with GTK might look like this:

A typical GUI would involve many more objects than this, but this is just for illustrating the problem.

Here, each box is an object in the C program.

Note that these references are all non-directional, meaning that they aren’t really implemented as arrows. In reality it looks more like a list of numbers; Window (1); Box (1); etc. Each object “knows” that it has one reference, but it knows nothing about which other objects own those references. This will become important later.

When the app closes, it drops its reference to the window. The window’s reference count becomes zero, so it is erased. As part of that, it drops all the references it owns, so the reference count of the upper box becomes zero as well, and so on down the tree. Everything is erased and all the memory is reclaimed. This all happens immediately. So far, so good.

Javascript objects

To write the same GUI in a Javascript program, we want each GObject in the underlying C code to have a corresponding Javascript object so that we can interact with the GUI from our Javascript code.

Javascript objects are garbage collected, and the garbage collector in the SpiderMonkey JS engine is a “tracing” garbage collector, meaning that on every garbage collection pass it starts out with objects in a “root set” that it knows are not garbage. It “traces” each of those objects, asking it which other objects it refers to, and keeps tracing each new object until it hits a dead end. Any objects that weren’t traced are considered garbage, and are deleted. (For more information, the Wikipedia article is informative: https://en.wikipedia.org/wiki/Tracing_garbage_collection)

We need to integrate the JS objects and their garbage collection scheme with the GObjects and their reference counting scheme. That looks like this:

The associations between the Javascript objects (“JS”) and the GObjects are bidirectional. That means, the JS object owns a reference to the GObject, meaning the reference count of every GObject in this diagram is 2. The GObject also “roots” the JS object (marks it as unable to be garbage collected) because the JS object may have some state set on it (for example, by writing button._alreadyClicked = false; in JS) that should not be lost while the object is still alive.

The JS objects can also refer to each other. For example, see the rightmost arrow from the window’s JS object to the button’s JS object. The JS code that created this GUI probably contained something like win._button = button;. These references are directional, because the JS engine needs to know which objects refer to which other objects, in order to implement the garbage collector.

Speaking of the garbage collector! The JS objects, unlike the GObjects, are cleaned up by garbage collection. So as long as a JS object is not “rooted” and no other JS object refers to it, the garbage collector will clean it up. None of the JS objects in the above graph can be garbage collected, because they are all rooted by the GObjects.

Toggle references and tardy sweeps

Two objects (G and JS) keeping each other alive equals a reference cycle, you might think. That’s right; as I described it above, neither object could ever get deleted, so that’s a memory leak right there. We prevent this with a feature called toggle references: when a GObject’s reference count drops to 1 we assume that the owner of the one remaining reference is the JS object, and so the GObject drops its reference to the JS object (“toggles down“). The JS object is then eligible for garbage collection if no other JS object refers to it.

(If this doesn’t make much sense, don’t worry. Toggle references are among the most difficult to comprehend code in the GJS codebase. It took me about two years after I became the maintainer of GJS to fully understand them. I hope that writing about them will demystify them for others a bit.)

When we close the window of this GUI, here is approximately what happens. The app drops its references to the GObjects and JS objects that comprise the window. The window’s reference count drops to 1, so it toggles down, dropping one direction of the association between GObject and JS object.

Unlike the GObject-only case where everything was destroyed immediately, that’s all that can happen for now! Everything remains in place until the next garbage collection, because at the top of the object tree is the window’s JS object. It is eligible to be collected because it’s not rooted and no other JS object refers to it.

Normally the JS garbage collector can collect a whole tree of objects at once. That’s why the JS engine needs to have all the information about the directionality of the references.

However, it won’t do that for this tree. The JS garbage collector doesn’t know about the GObjects. So unfortunately, it takes several passes of the garbage collector to get everything. After one garbage collection only the window is gone, and the situation looks like this:

Now, the outermost box’s JS object has nothing referring to it, so it will be collected on the next pass of the garbage collector:

And then it takes one more pass for the last objects to be collected:

The objects were not leaked, as such, but it took four garbage collection passes to get all of them. The problem we previously had, that Georges blogged about, was that the garbage collector didn’t realize that this was happening. In normal use of a Javascript engine, there are no GObjects that behave differently, so trees of objects don’t deconstruct layer by layer like this. So, there might be hours or days in between garbage collector passes, making it seem like that memory was leaked. (And often, other trees would build up in the intervening time between passes.)

Avoiding toggle references

To mitigate the problem Georges implemented two optimizations. First, the “avoid toggle references” patch, which was actually written by Giovanni Campagna several years ago but never finished, made it so that objects don’t start out using the toggle reference system. Instead, only the JS objects hold references to the GObjects. The JS object can get garbage collected whenever nothing else refers to it, and it will drop its reference to the GObject.

A problem then occurs when that wasn’t the last reference to the GObject, i.e. it’s being kept alive by some C code somewhere, and the GObject resurfaces again in JS, for example by being returned by a C function. In this case we recreate the JS object, assuming that it will be identical to the one that was already garbage collected. The only case where that assumption doesn’t hold, is when the JS code sets some state on one of the JS objects. For example, you execute something like myButton._tag = 'foo';. If myButton gets deleted and recreated, it won’t have a _tag property. So in the case where any custom state is set on a JS object, we switch it over to the toggle reference system once again.

In theory this should help, because toggle references cause the tardy sweep problem, so if fewer objects use toggle references, there should be fewer objects collected tardily. However, this didn’t solve the problem, because especially in GNOME Shell, most JS objects have some state on them. And, sadly, it made the toggle reference code even more complicated than it already was.

The Big Hammer

The second optimization Georges implemented was the affectionately nicknamed “Big Hammer”. It checks if any GObjects toggled down during a garbage collector pass, and if so, restart the garbage collector a few seconds after. This made CPU performance worse, but would at least make sure that all unused objects were deleted from memory within a reasonable time frame (under a minute, rather than a day.)

Combined with some other memory optimizations, this made GNOME 3.30 quite a lot less memory hungry than its predecessors.

An afternoon at Mozilla

Earlier this year, I had been talking on IRC to Ted Campbell and Steve Fink on the SpiderMonkey team at Mozilla for a while, about various ins and outs of being an external (i.e. not Firefox) user of SpiderMonkey’s JS engine API. Early September I found myself in Toronto, where Ted Campbell is based out of, and I paid a visit to the Mozilla office one afternoon.

I had lunch with Ted and Kannan Vijayan of the SpiderMonkey team where we discussed the current status of external SpiderMonkey API users. Afterwards, we made the plans which eventually became this GitHub repository of examples and best practices for using the SpiderMonkey JS engine outside of Firefox. We have both documentation and code examples there, and more on the way. This is still in progress, but it should be the beginning of a good resource for embedding the JS engine, and the end of all those out-of-date pages on MDN!

I also learned some good practices that I can put to use in GJS. For example, we should avoid using JS::PersistentRooted except as a last resort, because it roots objects by putting them in a giant linked list, which is then traced during garbage collection. It’s often possible to store the objects more efficiently than that, and trace them from some other object, or the context.

Ending the tardy sweeps

In the second half of the afternoon we talked about some of the problems that I had with SpiderMonkey that were specific to GJS. Of course, the tardy sweep problem was one of them.

For advice on that, Ted introduced me to Nika Layzell, an engineer on the Gecko team. We looked at the XPCOM cycle collector and I was surprised to learn that Gecko uses a scheme similar to toggle references for some things. However, rather than GJS sticking with toggle references, she suggested a solution that had the advantage of being much simpler.

In “Avoiding toggle references” above, I mentioned that the only thing standing in the way of removing toggle references, is custom state on the JS objects. If there is custom state, the objects can’t be destroyed and recreated as needed. In Gecko, custom state properties on DOM objects are called “expandos” or “expando properties” and are troublesome in a similar way that they are in GJS’s toggle references.

Nika’s solution is to separate the JS object from the expandos, putting the expandos on a separate JS object which has a different lifetime from the JS object that represents the GObject in the JS code. We can then make the outer JS objects into JS Proxies so that when you get or set an expando property on the JS object, it delegates transparently to the expando object.

Kind of like this:

In the “before” diagram, there is a reference cycle which we have to solve with toggle references, and in the “after” diagram, there is no reference cycle, so everything can simply be taken care of by the garbage collector.

In cases where an object doesn’t have any expando properties set on it, we don’t even need to have an expando object at all. It can be created on demand, just like the JS object. It’s also important to note that the expando objects can never be accessed directly from JS code; the GObject is the sole conduit by which they can be accessed.

Recasting our GUI from the beginning of the post with a tree of GUI elements where the top-level window has an expando property pointing to the bottom-level button, and where the window was just closed, gives us this:

Most of these GObjects don’t even need to have expando objects, or JS objects!

At first glance this might seem to be garbage-collectable all at once, but we have to remember that GObjects aren’t integrated with the garbage collector, because they can’t be traced, they can only have their reference counts decremented. And the JS engine doesn’t allow you to make new garbage in the middle of a garbage collector sweep. So a naive implementation would have to collect this in two passes, leaving the window’s expando object and the button for the second pass:

This would require an extra garbage collector pass for every expando property that referred to another GObject via its JS object. Still a lot better than the previous situation, but it would be nice if we could collect the whole thing at once.

We can’t walk the whole tree of GObjects in the garbage collector’s marking phase; remember, GObject references are nondirectional, so there’s no generic way to ask a GObject which other GObjects it references. What we can do is partially integrate with the marking phase so that when a GObject has only one reference left, we make it so that the JS object traces the expando object directly, instead of the GObject rooting the expando object. Think of it as a “toggle reference lite”. This would solve the above case, but there are still some more corner cases that would require more than one garbage collection pass. I’m still thinking about how best to solve this.

What’s next

All together, this should make the horrible toggle reference code in GJS a lot simpler, and improve performance as well.

I started writing the code for this last weekend. If you would like to help, please get in touch with me. You can help by writing code, figuring out the corner cases, or testing the code by running GNOME Shell with the branch of GJS where this is being implemented. Follow along at issue #217.

Additionally, since I am in Toronto again, I’ll be visiting the Mozilla office again this week, and hopefully more good things will come out of that!

Acknowledgements

Thanks to Ted Campbell, Nika Layzell, and Kannan Vijayan of Mozilla for making me feel welcome at Mozilla Toronto, and taking some time out of their workday to talk to me; and thanks to my employer Endless for letting me take some time out of my workday to go there.

Thank you to Ted Campbell and Georges Stavracas for reading and commenting on a draft version of this post.

The diagrams in this post were made with svgbob, a nifty tool; hat tip to Federico Mena Quintero.

GUADEC 2018 Reminiscences

This year’s GUADEC in AlmerΓ­a, Spain, was over two months ago, and so here is a long overdue post about it. It was so long ago that I might as well call it a reminiscence! This will be a different kind of post than the ones I’ve done in past years, as plenty of other bloggers have already posted summaries about the talks.

Board of Directors

I didn’t even get to see that many talks anyway, as this was my first GUADEC after being elected to the GNOME Foundation board of directors and I found myself doing a lot of running around to complete things. The board has to prepare for a number of meetings including the GNOME Foundation’s Annual General Meeting that’s always held at GUADEC, and so there was plenty of preparation to be done.

So, except for a few sessions, I mainly followed the “hallway track” this year.

It’s an exciting time to be on the board; it’s been in the news recently that the GNOME Foundation has received two substantial donations, and is hiring some new roles. If you want more information and background, Rosanna Yuen, director of operations, explains all about it in this GUADEC talk.

Interns

Somehow I found myself mentoring two interns this summer, Avi Zajac and Evan Welsh, and both of them were able to attend GUADEC. (I co-mentored Evan with my coworker Manuel QuiΓ±ones, who unfortunately could not be there.) I had not done a good job introducing them to each other, but they connected with each other and realized that they were both working with me! If you haven’t already, make sure to read Avi’s blog post and Evan’s blog post for their perspectives on how it went. I was glad to have both of them there and really enjoyed meeting up in person.

Both internships have finished up in the meantime. Evan’s website is viewable here, as well as some improvements to DevDocs which I hope to deploy soon. Avi’s project was released as a technical preview in GNOME 3.30 and will be covered in my next blog post.

JavaScript Talk

I gave one scheduled talk, on GJS and JavaScript, and one unscheduled talk, on Endless Code.

I will cover the material from the JavaScript talk in my next post about the new features in GJS, but for now I wanted to post the slides for everyone’s reference. The video of the talk is here.

Endless Hack Talk

I was voted into one of the conference’s open talk slots with a proposal to talk about Endless Code (since then, renamed to Endless Hack). This is a new (well, it was new at the time of the conference) project at Endless. It’s a continuation of this feature which (I didn’t work on, but) my coworkers demoed about two years ago:

The Endless Hack product generalizes this idea to the whole desktop. The idea is that you should be able to tinker with everything, and there’s a narrative that guides you along the way and teaches you programming concepts. It’s aimed at children and young teenagers. Although this product hasn’t been released yet, and although some of the source code is currently open it’s not in a finished or usable state yet, I did want to talk about it at GUADEC because I think the ability to learn by tinkering is an important part of the free software experience and a direct consequence of one of the Four Software Freedoms, and it’s something the GNOME community should be aware of.

We also made a survey asking people about their experience learning how to program, or not learning how to program, and it’s still open, because I did not do a very good job in the talk of publicizing the link! You can fill it out here.

I haven’t dared to watch the video of me talking completely unrehearsed, but you can watch it here if you want.

Unconference

I had high hopes for organizing a GJS unconference session like last year, but after a certain point I was just completely tired out. We did eventually have a GJS session that consisted of people hacking on their favourite thing. Happily, I was able to convince Georges Stavracas to fix a regression that was preventing GNOME Shell from starting. I got a chance to work with Meg Ford on testing with JavaScript, and I also got some work done on the GJS debugger, a new feature in GNOME 3.30. I will talk about all this and more in my next post!

We also used some of the unconference time for a kickoff session for the GNOME Developer Center. Bastian IlsΓΈ is leading this initiative and has a lot of material for you to read on what’s happened in the meantime.

Acknowledgements

I’d like to thank the GNOME Foundation for making it possible for me to attend the conference and the board meetings.

Thank you to my coworker Lisette Silva for convincing me to submit the open talk and giving some last-minute feedback beforehand.

Indonesian recipes

In late February and early March I attended the GNOME Recipes hackfest in Yogyakarta, Indonesia. It was my second time visiting Indonesia, and food was a bit of a theme. The hackfest was about GNOME Recipes, so food, but also I love Indonesian food and I was eager to taste some more so I can improve how I cook it at home.

I haven’t contributed to GNOME Recipes. (Shamefully, not even a recipe yet!) So why was I going to a GNOME Recipes hackfest? It’s because on Endless OS we have a Cooking app, which in many ways is not as good as GNOME Recipes. It’s certainly less lovingly curated, and less community-oriented, than GNOME Recipes, and it allows recipe submissions by users while the Endless app is read-only.

However, there are a few things Endless’s Cooking app does better than GNOME Recipes: it is visually more appealing, it’s available in several languages (Arabic, Bengali, English, Portuguese, Spanish, and also a Spanish version customized for Guatemala), and it uses a better database backend (which also makes it fully offline.) It does these things using Endless’s “modular framework,” which if you want to know more about, I gave a talk two years ago at GUADEC. This modular framework is the product that I primarily work on at Endless, so a few of my team joined in the GNOME Recipes hackfest to see whether the two apps could share some technology.

It turns out that Matthias was eager to have somebody come along and make a database backend for GNOME Recipes, so the answer was yes, we could very well share some technology.

As an experiment, we made a recipes “lookalike” app using the modular framework technologies of which you can see some nice screenshots in Martin’s blog post.

We worked out some goals that we wanted to achieve by GUADEC in order to present our work, which you can see in the hackfest notes.

Outreach

There were also some goings on besides the hackfest. On the day before the hackfest started we did an outreach event for the students of AMIKOM University Yogyakarta, where the hackfest was held. We gave some talks on our work, and GNOME contributor and Endless Ambassador Siska closed the morning out with a very successful talk on how to get involved in GNOME.

View this post on Instagram

Universitas AMIKOM Yogyakarta menjadi tuan rumah Recipes Hackfest 2018 (28 Februari – 2 Maret 2018). Recipes Hackfest sendiri merupakan agenda yang diadakan oleh GNOME Recipes team and Endless, yang mempertemukan developer, kontributor, maupun komunitas GNOME untuk membahas berbagai hal, terutama dalam mengembangkan open source, terutama mengenai GNOME dan Endless OS. . . Agenda ini merupakan kerjasama antara Program Studi D3-Teknik informatika dan Amikom Business Park dengan Gnome dan Endless OS dan diadakan selama 3 hari di ruang Inkubator Universitas Amikom Yogyakarta. Dalam agenda ini, Tim pengembang bertemu dengan developer, kontributor, komunitas, maupun pengguna Endless OS, untuk mencari dan mengeksplorasi berbagai resep/cara yang tepat dalam mengembangkan aplikasi yang sesuai dengan kebutuhan pengguna Endless OS dan komunitas GNOME. . . Agenda Recipes Hackfest 2018 ini sendiri dibuka dengan kegiatan outreaching/workshop tentang GNOME dan Endless OS di Ruang Cinema Amikom (27/2). agenda ini diikuti oleh sekitar 100 orang Mahasiswa Universitas AMIKOM Yogyakarta dan para pegiat GNOME community. Terdapat 4 Materi yang disampaikan dalah agenda Outreaching tersebut, yaitu : . . 1. Introduction to GNOME From technology to users by Jonathan Blandford (GNOME Contributor) 2. Introduction to Endless by Cosimo Cecchi (Endless, organizer) 3. Getting involved in GNOME/GSoc/Outreach by Umang Jain (Core apps contributor, GNOME Contributor) 4. Introduction to Flatpak by Philip Chimento (Endless, engineer)

A post shared by Universitas AMIKOM Yogyakarta (@amikomjogja) on

After that I gave a live demo of how to make a GNOME app, the result of which you can find on GitHub here!

GNOME hackers and students seated around a table, watching a programming demo on a projector

This is me doing the live-coding demo of a GNOME app. Some of the students said I looked like Tony Stark.

 

Translation

One of the most interesting discussions we had was about how to internationalize GNOME Recipes. In different countries people cook very differently, so translating a recipe from one language into another is not enough. You also have to adapt the recipe to the ingredients that you can get in the country, and sometimes it’s not possible to get the same taste. For example, if I wanted to adapt my beloved pesto recipe from Marcella Hazan’s Classic Italian Cookbook, to Indonesia, first of all I’d probably have to substitute Thai basil which would change the taste entirely. Or to adapt Indonesian recipes to Canada, you have to go to some lengths to find ingredients like terasi (shrimp paste) and kemiri (candlenuts), and we just can’t get some of the same vegetables.

It can also be that when one language is used in two countries, the same recipe still won’t work for both. For example, in the UK, baking measurements are given by weight, and in Canada and the US they are given by volume. The metric system (ΒΊC, kg, ml) is used in the UK and the imperial system (ΒΊF, pounds, quarts, ounces, bushels, specks, caltrops, and jeroboams) in the US. To make matters worse, Canada uses the metric system for weight and volume measurements (kg, ml) but oven temperatures are given in Fahrenheit as in the US. All three countries cook with teaspoons and tablespoons, but teaspoons and tablespoons are metric in Canada and the UK (5 ml and 15 ml) but imperial in the US (4.93 ml and 14.79 ml).

We also discussed that many translation tools assume that the source language is always English since that’s the lingua franca of programmers, but it’s definitely not the lingua franca of cooking!

I would go so far as to say that all the existing translation infrastructure that we have for internationalizing GNOME is not going to be good enough to translate the recipes in GNOME Recipes.

Progress since then

In the time since the hackfest, I was able to make a little bit of progress on our goals. I worked on splitting out the code that handles data modelling into DModel, a separate library, so that GNOME Recipes could use it.

Food

I did get a chance to learn the flavors of Indonesian food more. When I lived in the Netherlands I already became familiar with some Indonesian food, but the Indonesian food in Indonesia is really much more delicious. In Vancouver we have only one Indonesian restaurant, which is kind of far away. And I found only one Indonesian store where I can buy ingredients like shrimp paste and candlenuts, which is even farther away.

Siska brought in packets of rendang spice paste for everyone to take home, for which I was especially grateful. Here’s a picture of my rendang that I made when I got back to Vancouver:

Beef rendang, still cooking down, next to a pot of rice cooking

Rendang and rice

I also tried to make the spice paste myself (because soon I will be out of the spice paste packets) but I haven’t got it figured out yet.

Some of the other dishes that I’ve made at home:

Yellow coconut curry in a bowl with kale and rice

Gulai curry (substituting kale for the cassava greens)
(The recipe is from Daily Cooking Quest which is a cooking blog from an Indonesian blogger who emigrated to the United States, and I’ve had good luck with those recipes because she uses ingredients that are possible for me to get in Vancouver, and she also gives the Indonesian names of the ingredients)

Fried noodles on a plate, with a fried egg and chili paste

Mi goreng

I am going to try making gudeg this week, which is a jackfruit curry, a specialty of Yogyakarta.

Acknowledgements

Sponsored by GNOME FoundationI’d like to thank AMIKOM University Yogyakarta for hosting the hackfest and giving us the opportunity to get some students interested in open source development, and the GNOME Foundation for sponsoring my travel and accommodations during the hackfest. Thanks also to Cosimo, Ekta, Elvin, Emmanuele, Haris, Jonathan, Kukuh, Martin, Matthias, Rama, Siska, and Umang, and also Kiki from Mozilla who joined on the last day, and Angky from Endless who helped arrange the hosting and logistics, for making the event a success!

More Memory, More Problems

In GJS we recently committed a patch that has been making waves. Thanks to GJS contributor Georges Basile “Feaneron” Stavracas Neto, some infamous memory problems with GNOME Shell 3.28 have been mitigated. (What’s the link between GNOME Shell and GJS? GNOME Shell uses GJS as its internal Javascript engine, in which some of the UI and all of the extensions are implemented.)

There is a technical explanation, having to do with toggle-refs, a GObject concept which we use to interface the JS engine’s garbage collector with GObject’s reference counting system. Georges has already provided a fantastic introduction to the technical details so I will not do another one here. This post will be more about social issues, future plans, and answers to some myths I’ve seen in various comments recently. To read this post, you only need to know that the problem has to do with toggle-refs and that toggle-refs are difficult to reason about.

Not a Memory Leak

I really don’t want to call this a memory leak, much less “the GNOME memory leak” that’s become common in the press coverage lately. I find that that sets the wrong expectations for users suffering from these memory problems. You might say that for the end user it makes no difference, their computer’s memory is being occupied by GNOME Shell, so what’s the point in not calling it a memory leak? And you would be partially right. The effect is no different. The expectations are different though, especially for users who have some technical knowledge. A memory leak is a simple problem to fix. When you have one, you run your software under Valgrind or ASAN, you get a backtrace that shows where the memory was allocated that you didn’t free, and you free it. Problem solved. You can even run Valgrind in your automatic tests to prevent new leaks. That’s not the case here, and if we refer to it as a memory leak then it can only cause frustration on the part of users who are aware how simple it is to fix a memory leak.

This problem is different. It’s not a leak in the traditional sense. The memory does eventually get freed, but GNOME Shell holds onto it for too long; long enough to cause problems on some systems. As GJS contributor Andy Holmes put it, it’s a “tardy GC sweep.” I think that has a catchy ring to it, so I’ll call it the “tardy sweep problem” from now on.

Scruffy the janitor from Futurama, relaxing in a chair, captioned

Meme by Andy Holmes, used with kind permission

To be honest, I found that the OMG!Ubuntu article about “the memory leak” attracted a lot of comments that don’t sit well with me, and I think that the wrong expectations set by calling it a “memory leak” are partly to blame. With this post, I hope to give a better idea of what GNOME users can expect.

On the bright side, due to the recent publicity and especially the OMG!Ubuntu article, more GNOME developers are talking about the memory problems and suggesting things, which is causing an exciting confluence of ideas that I couldn’t have come up with on my own.

Edit: I want to be absolutely clear that with the above I’m not blaming bug reporters for not knowing whether something is a “proper” memory leak or not. This is intended to bring some attention to the wrong expectations that arise, especially among technically savvy users, when GNOME developers and the tech press use the term “memory leak,” and illustrate why we ourselves should not use the term here.

 

The Big Hammer

Hammer smashing a peanut

CC0 licensed image, by stevepb

I’ve been calling this patch the “Big Hammer” because it’s a drastic measure: starting a whole new garbage collection cycle in order to clean up some objects that we already know should be cleaned up.

The tardy sweep problem has now been mitigated with the Big Hammer, but reducing GNOME Shell’s memory usage has been a battle for years, and it has very little to do with memory leaks.

There are many other causes of high memory usage in GNOME Shell. Some are real memory leaks, that generally get fixed before too long. GNOME Shell developers have had their suspicions about NVidia drivers for years. Another cause is JS memory leaks in GNOME Shell. (Contrary to popular belief, it is possible to leak memory in pure Javascript code. Andy’s new heapgraph tool is useful when tracking these down, but throughout most of the life of GNOME Shell this tool didn’t exist.) There’s also memory fragmentation, which can look like a memory leak in a resource monitor.1 In addition, when diagnosing reports from users, configurations vary wildly. Memory usage simply differs from system to system. Finally, people have different configurations of Shell extensions, some of which leak memory as well.

The Sad Lifecycle of a GNOME Shell Memory Leak Bug Report

  1. User reports “I have a memory leak”
  2. Developer runs Valgrind, sifts through Valgrind trace, finds a small leak and fixes it
  3. Problem isn’t fixed
  4. Repeat 2 and 3 until no leaks shown in the Valgrind trace
  5. Problem still isn’t fixed
  6. User eagerly awaits each point release hoping for relief, and is disappointed each time
  7. Bug report gains popularity, accretes followers like a katamari, some of whom vent about unrelated bugs, hound the developer, or become abusive, until the original point of the bug report is lost
  8. Developer can’t do anything productive with the bug report at this point. They know there’s still a memory problem and it’s not a traditional leak, but the bug report is not helping them find it
  9. Users don’t accept that answer
  10. Developer closes the bug report and makes users even angrier. (Or, developer ignores the bug report until it fizzles out, and makes users sad.)

Here are some examples of long-running bug reports where you can see this dynamic in action. It’s quite sad to observe, because everybody involved is doing what makes perfect sense from their perspective (except for a few people behaving badly), yet the result is a mess.

I hope this illustrates why it’s important to assume that people are acting in good faith.

I know some people will argue that the developer mustn’t close the bug report until the bug is “fixed”, meaning that there is no more unnecessary memory usage. But in my opinion that’s just not a useful way to think of bug reports. GNOME Shell developers know (and so do I, from the GJS side) that GNOME Shell uses a lot of memory. I agree it’s nice to keep a bug report open so that users know that we’re aware of it and it’s on our to-do lists somewhere, but very soon the time we spend dealing with the noise on the bug report eclipses whatever benefit it might bring to the community.

I hope GitLab will improve things a bit here, since if you feel strongly about an issue in the bugtracker, you can upvote it or add an emoji reaction to it. This is a good way for users to show that an issue is important to them, and if enough people use it then it’s a good indicator for me to see which issues are prioritized highest by users and contributors.

5 Myths About GNOME Shell’s Memory Problems (Paraphrased)

“GNOME developers never cared about the tardy sweep problem until OMG!Ubuntu reported on it. They don’t care about users until someone makes them.”

Carlos Garnacho, of GNOME Shell fame, pointed out to me that in a more global perspective there has been a very active hunt for actual memory leaks across many of GNOME Shell’s dependencies for quite some time, and he has personally patched leaks in IBus, AccountsService, libgweather, gnome-desktop, and more.

It seems the tardy sweep problem has gotten worse in recent versions of GNOME, although it’s hard to measure between different systems with different configurations. I don’t know why it’s gotten worse.2

It seems to have been known for a long time, though: for history buffs, it was alluded to in a comment in commit ae34ec49, back in 2011. That knowledge was apparently lost when GJS was without a maintainer for a couple of years. To be honest, it has taken me well over a year to get familiar enough with the toggle-ref code (which integrates the JS garbage collector with GObject’s refcounting system), that I feel even remotely comfortable making or reviewing changes to it. I only fully realized the implications of the tardy sweep problem after talking to Georges and seeing his memory graph.

It seems that a previous GJS maintainer, Giovanni Campagna, was trying to mitigate the tardy sweep problem already five years ago, with a patch that allowed objects to escape the tardy sweep by opting out of the whole toggle-ref system in some cases. Unfortunately, as far as I can tell from my bug tracker archaeology, his patch went through a few reviews and the answer was always “Wow, this is really complicated, I need to study it some more in order to understand it.” Then it fell by the wayside when he stepped down from GJS maintainership.

I picked the patch up again late last year and fixed up most of the bit-rot. It still had a few problems with it. I never found the time to fix it up completely, which Georges kindly took over for me. I initially preferred Giovanni’s patch above the Big Hammer, but unfortunately for me, Georges proved that it wasn’t as effective as we thought it would be, only clearing up about 5% of the tardy sweep memory.3

“GNOME fixed the problem with a shoddy solution. This will decrease performance but they won’t notice because they all have top-of-the-line machines.”

I don’t call it the Big Hammer for nothing. We were concerned about performance regressions too, so that’s reasonable. However, as you can read in the bug tracker, we did actually do some testing on lower-end hardware before merging the Big Hammer, and it was not as bad as I expected. Carlos has been doing some measurements and found that garbage collection accounts for about 2–3% of the time that GNOME Shell occupies the CPU.

However, it’s exactly because we want to be cautious that the Big Hammer has only been committed to master, which will first be released in the unstable GNOME 3.29.2 snapshot. I don’t plan to release it on a stable branch until we’ve run it some more.

Ubuntu has already put the Big Hammer in their LTS version. That’s more of a risk than I would have recommended, but it’s not my decision to make, and I am grateful that we will be getting some testing through that avenue. Endless is also considering putting the Big Hammer in their stable version.

(And alas, I don’t have a top-of-the-line machine. Feel free to donate me one if that’s what’s required to make me conform to some stereotype of GNOME developers. πŸ˜‡)

“The problem should be fixed now. GNOME Shell will run smooth from now on.”

No, it’s not. GNOME Shell still isn’t that great with memory.

Carlos is working on some merge requests which are approaching being ready to merge, which should make things a bit more memory-efficient. He’s also had some success with experiments trying to reduce memory fragmentation, taking better advantage of SpiderMonkey’s compacting garbage collector.

We are also bouncing around some ideas for making the Big Hammer into a smaller hammer. In particular, we’re trying to see if the extra garbage collections can be restricted to only the JS objects that represent GObjects, since those are the only objects that are affected by the tardy sweep problem. We’re also trying to see if there’s a way to return black-marked (reachable) objects to their original white-marked (eligible for collection) state when a GObject is toggled down in the middle of a garbage collection.

Another approach to investigate is to make better use of incremental garbage collection. SpiderMonkey offers this facility but we don’t use it yet. The idea is, instead of pausing and doing a big garbage collection, we do a slice of a few milliseconds whenever we have time. I don’t know yet whether this will have a large or small effect, or even render the Big Hammer unnecessary.

We’re also going to update to SpiderMonkey 60 in GNOME 3.30 which will hopefully bring in another year’s worth of Mozilla’s garbage collector research and optimization.

Finally, I’m gradually working on another unfinished merge request left over from Giovanni’s tenure as GJS maintainer, that should drastically increase the performance of GNOME Shell’s animations (though not necessarily help with memory.)

“GNOME has no business releasing any new versions until this problem is fixed.”

GNOME has a fixed release schedule, so they release new versions on the release dates, with whatever is aboard the train at that time. That’s not going to change.

“This version of GNOME is going into Ubuntu LTS! GNOME needs to work harder to fix this.”

Of course, I want whatever version of GNOME ships with any Linux distribution to be as good as possible. But as the upstream GJS maintainer, I have no say over what a downstream Linux distribution chooses to ship. The best way for a Linux distro to make sure their release is shipshape, is to contribute resources towards fixing whatever they consider a blocker.

That sounds a bit callous, as if I refuse to fix any bugs that Ubuntu wants fixed; that’s not what I mean at all. But my free time is limited. I’m paid for a part of my GJS maintainer work, but only for specific features. I can’t work to anyone’s external deadlines in my free time, because otherwise I’ll burn out and that’s not good for anyone with any interest in GJS either. Sometimes I have other priorities besides sitting at the computer; sometimes I do have time but no ideas about a particular problem; sometimes my brain isn’t up to fixing a difficult memory problem and I choose to work on something easier.4 Bugfixing work isn’t fungible.

I picked Ubuntu to illustrate this example, because contributing is exactly what the Ubuntu team has done; Ubuntu contributors fixed stability bugs in GJS, as well as GNOME Shell and Mutter, for GNOME 3.28. To say nothing of contributors from other downstreams, as well. That’s great and I’m looking forward to more of it! Some commenters seem to see downstreams fixing bugs as something that GNOME developers should be ashamed of, but I believe everyone is better off for it when that happens!

Acknowledgements

Thanks to Carlos Garnacho and Andy Holmes, who commented on a draft version of this blog post. Thanks in addition to Andy who coined the term “tardy sweep” and provided Scruffy as the mascot; Heartbleed has branding, why shouldn’t we? And of course, thanks to Georges who kicked off the whole research in the first place!


[1] and has often made people angry in bug reports when told it’s not a memory leak ↩

[2] I have a hunch, though. When I updated SpiderMonkey to version 38 in GNOME 3.24, we went from a conservative collector to an exact-rooted, moving one β€” see this Wikipedia article for definitions of those terms. It may be that the old garbage collector, though generally considered inferior, did actually mitigate the tardy sweeps a little, because I think back then it would have been possible for more objects to make their way into an ongoing sweep. It’s also possible that it was made worse earlier than that, by some adjustments in GNOME Shell that adjusted how often the garbage collector was called. ↩

[3] Technical explanation: Tweener, which is the animation framework used by GNOME Shell, renders many objects ineligible to opt out of the toggle-ref system. I would like to see Tweener replaced with Clutter implicit animations in GNOME Shell, which would make Giovanni’s patch much more effective, but that’s a big project. ↩

[4] Like writing a blog post about a difficult memory problem. Joke’s on me, that’s actually really hard ↩

Weekend Website Experiment

As you may know if you read this blog via Planet GNOME, the GNOME project is busy switching to GitLab for its code hosting and bug tracking. I like GitLab! It’s a large step up from Bugzilla, which was what GNOME used for the last 20 years. Compared to GitHub, GitLab is about equal, with a few nicer things and a few less nice things.

The one thing that I miss from Bugzilla is a dashboard showing the overall status of the bugs for your project. I thought it would not be too hard to use the GitLab API to do some simple queries and plop them on a web page. So, last weekend I gave it a try. The final result is here. Click the button to log into GitLab, and you’ll be redirected back to the page where you’ll get the results of the queries.

I’d like to write up what I did because I learned a new thing, and I think more writeups illustrating the trial and error of learning a new thing are always good.

My goals were to write something without being too meticulous, and write something that was not intended to scale. (Both are things that I normally disapprove of. It’s good to try the other side once in a while.) So, I was just going to mix the HTML, CSS, and JS all in one file. I used the base CSS and overall page structure from my personal website.

I decided to use GitHub Pages for hosting. My personal website is already hosted there. GitLab also offers GitLab Pages, which is very similar, but GNOME hasn’t enabled it yet on their GitLab instance. So, I created a fork of GNOME/gjs on GitHub, and created a gh-pages branch. Whatever you commit to that branch shows up as your project’s GitHub Pages site on the web.

The first thing I figured out is that you have to be authenticated to query issues in the API, even if the same information is publicly available on the web. That’s too bad! My little project suddenly went from “easy” to “figuring out something I’ve never done before.” But that’s also exciting.

First, I got the queries running on a local webpage, using a temporary personal GitLab access token. Each item looked kind of like this:

Number of crashers: 16
Number of bugs: 25
etc.

Next, I decided to tackle the authentication problem. I did some searches on variations of “gitlab authentication plugin”, “add gitlab authentication to webpage”, etc., to see if there was something ready-made I could drop in. No such luck. I did find NodeJS modules that I could have used, had I been writing the site using NodeJS. I weighed the unknown cost of implementing the authentication in plain old browser JS against the unknown cost of setting up tools that I was unfamiliar with in order to use the ready-made module (I wasn’t even sure what tools I would have to use β€” Webpack, I think?) and decided to keep looking.

I next searched for things like “gitlab oauth2 in browser”, “gitlab oauth2 example”, since I knew that the login used the OAuth2 protocol. Eventually I landed on this page and figured out that the magic words I wanted were “implicit flow” or “implicit grant”:

Implicit Flow – This flow is designed for user-agent only apps (e.g. single page web application running on GitLab Pages).

That sounded like exactly my use case, so I read further. You have to send the user to a certain page on the GitLab site, and send a redirect URL which the user will be sent back to with the authorization token in the URL hash. I managed to keep everything on the same webpage. In pseudocode, the flow looks like this:

if we have the authorization token:
    fetch the numbers with the API calls
else if there is a hash in the URL:
    token = get the token from the hash
    store the token
    fetch the numbers with the API calls
else:
    show a button that links to the authorization URL on GitLab

For storing the token so that you don’t have to log in every time, I used localStorage. I have no idea if that’s good practice or not, but from what I could read online it seems that it’s at least not bad practice. It’s quite easy to retrieve the token, but only if you have access to the browser where it is stored. I don’t think localStorage can leak out over the web, but with the recently discovered vulnerabilities who knows…

Last, I made it look nicer. I had a pretty good idea of what I wanted it to look like: I wanted the numbers to be large, in colored boxes with rounded corners and thick borders. I tried a few things with floating <div>s before giving up and using a CSS flex layout. This makes the page probably unviewable on older browsers, but I was seriously done with CSS positioning.

The code is here, or just “View Source” while you’re on the page.

What I would do differently

Writing HTML, CSS, and JS directly for the web is tedious and repetitive. I wish my younger self had used some sort of framework like Bootstrap to make my personal site. Failing that, I wish I had decided not to reuse components from my personal site to do this, and instead started with a fresh site using a framework. Bootstrap or Semantic UI are two that I know of, and maybe should have tried out. The code ended up being 263 lines of HTML, CSS, and JS, much of it just repeated items in order to do the boxes in different colors.

Reuse of this code

You may notice I did not release this code under an open source license. That’s because it’s probably full of bad practices, so I don’t want people to copy it. If you can convince me that it’s done right or tell me what I did wrong, then I’ll (fix it and) open-source it, and other GitLab maintainers might find it useful.

Geek tip: g_object_new and constructors

tl;dr Don’t put any code in your foo_label_new() function other than g_object_new(), and watch out with Vala.

From this GJS bug report I realized there’s a trap that GObject library writers can fall into,

Avoid code at your construction site.

Avoid code at your construction site.

that I don’t think is documented anywhere. So I’m writing a blog post about it. I hope readers from Planet GNOME can help figure out where it needs to be documented.

For an object (let’s call it FooLabel) that’s part of the public API of a library (let’s call it libfoo), creating the object via its foo_label_new() constructor function should be equivalent to creating it via g_object_new().

If foo_label_new() takes no arguments then it should literally be only this:

FooLabel *
foo_label_new(void)
{
    return g_object_new(FOO_TYPE_LABEL, NULL);
}

If it does take arguments, then they should correspond to construct properties, and they should get set in the g_object_new() call. (It’s customary to at least put all construct-only properties as arguments to the constructor function.) For example:

FooLabel *
foo_label_new(const char *text)
{
    return g_object_new(FOO_TYPE_LABEL,
        "text", text,
        NULL);
}

Do not put any other code in foo_label_new(). That is, don’t do this:

FooLabel *
foo_label_new(void)
{
    FooLabel *retval = g_object_new(FOO_TYPE_LABEL, NULL);
    retval->priv->some_variable = 5;  /* Don't do this! */
    return retval;
}

The reason for that is because callers of your library will expect to be able to create FooLabels using g_object_new() in many situations. This is done when creating a FooLabel in JS and Python, but also when creating one from a Glade file, and also in plain old C when you need to set construct properties. In all those situations, the private field some_variable will not get initialized to 5!

Instead, put the code in foo_label_init(). That way, it will be executed regardless of how the object is constructed. And if you need to write code in the constructor that depends on construct properties that have been set, use the constructed virtual function. There’s a code example here.

If you want more details about what function is called when, Allison Lortie has a really useful blog post.

This trap can be easy to fall into in Vala. Using a construct block is the right way to do it:

namespace Foo {
public class Label : GLib.Object {
    private int some_variable;

    construct {
        some_variable = 5;
    }
}
}

This is the wrong way to do it:

namespace Foo {
public class Label : GLib.Object {
    private int some_variable;

    public Label() {
        some_variable = 5;  // Don't do this!
    }
}
}

This is tricky because the wrong way seems like the most obvious way to me!

This has been a public service announcement for the GNOME community, but here’s where you come in! Please help figure out where this should be documented, and whether it’s possible to enforce it through automated tools.

For example, the Writing Bindable APIs page seems like a good place to warn about it, and I’ve already added it there. But this should probably go into Vala documentation in the appropriate place. I have no idea if this is a problem with Rust’s gobject_gen! macro, but if it is then it should be documented as well.

Documented pitfalls are better than undocumented pitfalls, but removing the pitfall altogether is better. Is there a way we can check this automatically?