Advent of Rust 18 and 19: Parsers and New Math

It’s another two days of the ongoing chronicle of me trying to teach myself the Rust programming language by doing programming puzzles from Advent of Code 2020. I’m well over halfway, in fact today’s post includes the three-quarters mark.

I mentioned it last time, but I’ve found this series from an experienced Rust programmer also doing these puzzles in Rust to be very enlightening (hat tip, Federico Mena Quintero.)

Day 18, Part 1

Today we have to solve arithmetic homework problems, from a bizarro parallel universe where there is no precedence of operators. We get an input file with homework problems that look like 1 + 2 * 3 + 4 * 5 + 6 to which the answer is not 33 as you might expect, but 71 because in the absence of parentheses you just do all the operations from left to right.

I have in the past had to build a parser for (normal) arithmetic expressions, so in theory I should know how to do this. However, reading the Faster than Lime posts, I have seen that the author is often using a Rust parser generator package called PEG for the problems where I used scan_fmt!(), and it looks quite easy to use. So I decide to use PEG as well to generate this parser.

Luckily there is literally the exact thing I need, a parser for arithmetic expressions with precedence of operators, in one of the examples in the PEG documentation! I just need to tweak it so that the precedence of addition and multiplication is equal:

extern crate peg;

peg::parser! {
    grammar bizarro_arithmetic() for str {
        rule number() -> u64 = n:$(['0'..='9']) { n.parse().unwrap() }
        pub rule expr() -> u64 = precedence!{
            x:(@) " + " y:@ { x + y }
            x:(@) " * " y:@ { x * y }
            --
            n:number() { n }
            "(" e:expr() ")" { e }
        }
    }
}

It’s slightly weird, and trips me up for a little while, that in order to accommodate the spaces in the strings I have to explicitly use the token " + ", and can’t just tell it to tokenize on whitespace. But eventually I figure it out.

Another thing that Faster than Lime has been doing is writing inline tests! This is pretty cool, and I decide to do it here with the examples in the puzzle description:

#[test]
fn part1_examples() {
    assert_eq!(bizarro_arithmetic::expr("1 + 2 * 3 + 4 * 5 + 6"), Ok(71));
    assert_eq!(bizarro_arithmetic::expr("1 + (2 * 3) + (4 * (5 + 6))"), Ok(51));
    assert_eq!(bizarro_arithmetic::expr("2 * 3 + (4 * 5)"), Ok(26));
    assert_eq!(bizarro_arithmetic::expr("5 + (8 * 3 + 9 + 3 * 4 * 3)"), Ok(437));
    assert_eq!(bizarro_arithmetic::expr("5 * 9 * (7 * 3 * 3 + 9 * 3 + (8 + 6 * 4))"), Ok(12240));
    assert_eq!(bizarro_arithmetic::expr("((2 + 4 * 9) * (6 + 9 * 8 + 6) + 6) + 2 + 4 * 2"), Ok(13632));
}

Running cargo test will execute these tests, and that helps me fix the bug that I mentioned above where I had "+" instead of " + ".

Once that is working, I’m pleasantly surprised that I’ve gotten to the point of confidence that the main() function practically writes itself:

fn main() {
    let input = include_str!("input");
    let answer: u64 = input
        .lines()
        .map(bizarro_arithmetic::expr)
        .map(|n| n.unwrap())
        .sum();
    println!("{}", answer);
}

Thanks to PEG, this has been one of the easiest puzzles so far, unless you consider using a parser generator “cheating” (which I most certainly don’t.)

Day 18, Part 2

Part 2 of the puzzle is just the same as Part 1, only with addition given a higher precedence than multiplication. According to the PEG documentation this can be achieved just by swapping the order and adding a -- to separate the two. I add a rule to the parser generator:

pub rule expr2() -> u64 = precedence!{
    x:(@) " * " y:@ { x * y }
    --
    x:(@) " + " y:@ { x + y }
    --
    n:number() { n }
    "(" e:expr2() ")" { e }
}

I also write inline tests for this one using the examples in the puzzle description. This inline testing thing is brilliant!

Then I change the map() call in main() to use the correct parser depending on whether we are running Part 1 or Part 2:

.map(if is_part2() {
    bizarro_arithmetic::expr2
} else {
    bizarro_arithmetic::expr
})

On to the next puzzle!

Day 19, Part 1

Today’s puzzle is validating a bunch of messages against rules given in a grammar-like form. The grammar has a certain syntax, which makes me think I can use what I learned yesterday about PEG to parse it. But instead of using a PEG parser that gives a number as the parsed result, we have to build a parser that gives… another, dynamically-generated, parser.

This sounds really daunting, and I almost say “oh no!” out loud when I read the puzzle description.

I take a long break and by the time I come back to it, it has occurred to me that maybe instead of another parser, the PEG parser can give a regular expression that we can use to match the messages? Ultimately, a regular expression is really a sort of dynamically-generated parser. I feel good about this idea.

After thinking about it some more after that, I realize I might have to have one intermediate step, because rules can reference other rules that haven’t been defined yet. So I will have to read them all in, parse them into an intermediate form such as an enum, and then generate a regular expression against which I match all the messages.

The rules are in a form like this:

0: 1 2
1: "a"
2: 1 3 | 3 1
3: "b"

This means that rule 0 consists of rule 1 followed by rule 2, rule 1 is the 'a' character, rule 3 is the 'b' character, and rule 2 consists of either 1 followed by 3, or 3 followed by 1, in other words "ab" or "ba". The rules are guaranteed to have no loops. Ideally this would yield the regular expression ^a(ab|ba)$ which I could then match against the list of messages.

I write a Rule enum and a RuleSet type, during which I learn that enum variants may not reference other enum variants. The compiler suggests using Box, so that’s what I do:

#[derive(Debug)]
enum Rule {
    Literal(char),
    Ref(usize),
    Seq(Box<Rule>, Box<Rule>),
    Choice(Box<Rule>, Box<Rule>),
}

type RuleSet = HashMap<usize, Rule>;

Using what I learned from yesterday’s puzzle, the parser follows:

peg::parser! {
    grammar rule_set() for str {
        rule index() -> usize = n:$(['0'..='9']+) ":" { n.parse().unwrap() }
        rule number() -> Rule = n:$(['0'..='9']+) { Rule::Ref(n.parse().unwrap()) }
        rule literal() -> Rule = "\"" c:$(['a'..='z' | 'A'..='Z']) "\"" {
            Rule::Literal(c.chars().next().unwrap())
        }
        rule seq() -> Rule = a:number() " " b:number() { Rule::Seq(Box::new(a), Box::new(b)) }
        rule choice_side() -> Rule = seq() / number()
        rule choice() -> Rule = a:choice_side() " | " b:choice_side() {
            Rule::Choice(Box::new(a), Box::new(b))
        }
        pub rule parse_line() -> (usize, Rule) =
            ix:index() " " expr:(choice() / seq() / number() / literal()) { (ix, expr) }
    }
}

This doesn’t compile:

error[E0446]: private type `Rule` in public interface
  --> puzzle19.rs:15:2
   |
6  |   enum Rule {
   |   - `Rule` declared as private
...
15 |   peg::parser! {
   |  __^
16 | |     grammar rule_set() for str {
17 | |         rule index() -> usize = n:$(['0'..='9']+) ":" { n.parse().unwrap() }
18 | |         rule number() -> Rule = n:$(['0'..='9']+) { Rule::Ref(n.parse().unwrap()) }
...  |
24 | |     }
25 | | }
   | |_^ can't leak private type
   |
   = note: this error originates in a macro (in Nightly builds, run with -Z macro-backtrace for more info)

I’m not sure why this is happening, because enum Rule is at the top level and so is the PEG parser, so they ought to just have access to each other? In any case, I change it to pub enum Rule and it compiles

I then write a test for this parser:

#[test]
fn example1() {
    let mut rule_set = RuleSet::new();
    let example1 = ["0: 1 2", "1: \"a\"", "2: 1 3 | 3 1", "3: \"b\""];
    for line in example1.iter() {
        let (ix, rule) = rules_grammar::parse_line(line).unwrap();
        rule_set.insert(ix, rule);
    }

    use Rule::*;
    assert_eq!(rule_set.get(0), Some(Seq(Ref(1), Ref(2))));
    assert_eq!(rule_set.get(1), Some(Literal('a')));
    assert_eq!(
        rule_set.get(2),
        Some(Choice(Seq(Ref(1), Ref(3)), Seq(Ref(3), Ref(1))))
    );
    assert_eq!(rule_set.get(3), Some(Literal('b')));
}

It’s very wrong according to the compiler!

error[E0308]: mismatched types
  --> puzzle19.rs:46:29
   |
46 |     assert_eq!(rule_set.get(0), Some(Seq(Ref(1), Ref(2))));
   |                             ^
   |                             |
   |                             expected `&usize`, found integer
   |                             help: consider borrowing here: `&0`

error[E0308]: mismatched types
  --> puzzle19.rs:46:42
   |
46 |     assert_eq!(rule_set.get(0), Some(Seq(Ref(1), Ref(2))));
   |                                          ^^^^^^
   |                                          |
   |                                          expected struct `Box`, found enum `Rule`
   |                                          help: store this in the heap by calling `Box::new`: `Box::new(Ref(1))`
   |
   = note: expected struct `Box<Rule>`
                found enum `Rule`
   = note: for more on the distinction between the stack and the heap, read https://doc.rust-lang.org/book/ch15-01-box.html, https://doc.rust-lang.org/rust-by-example/std/box.html, and https://doc.rust-lang.org/std/boxed/index.html

error[E0308]: mismatched types
  --> puzzle19.rs:46:50
   |
46 |     assert_eq!(rule_set.get(0), Some(Seq(Ref(1), Ref(2))));
   |                                                  ^^^^^^
   |                                                  |
   |                                                  expected struct `Box`, found enum `Rule`
   |                                                  help: store this in the heap by calling `Box::new`: `Box::new(Ref(2))`
   |
   = note: expected struct `Box<Rule>`
                found enum `Rule`
   = note: for more on the distinction between the stack and the heap, read https://doc.rust-lang.org/book/ch15-01-box.html, https://doc.rust-lang.org/rust-by-example/std/box.html, and https://doc.rust-lang.org/std/boxed/index.html

error[E0369]: binary operation `==` cannot be applied to type `Option<&Rule>`
  --> puzzle19.rs:46:5
   |
46 |     assert_eq!(rule_set.get(0), Some(Seq(Ref(1), Ref(2))));
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |     |
   |     Option<&Rule>
   |     Option<Rule>
   |
   = note: this error originates in a macro (in Nightly builds, run with -Z macro-backtrace for more info)

First of all, I’m not sure why the integer has to be borrowed to index the hash map! The second and third errors I do understand, because the rules need a Box::new() around them, though I’m not sure how that will work with asserting equality! Will it compare the addresses of the Boxes? And I don’t understand the fourth error, can’t assert_eq!() just check that the rules contain all the same variants?

Even if I do this to reference the right side:

assert_eq!(rule_set.get(&0), Some(&Seq(Box::new(Ref(1)), Box::new(Ref(2)))));

I still get an error such as this:

error[E0369]: binary operation `==` cannot be applied to type `Rule`
  --> puzzle19.rs:51:5
   |
51 |       assert_eq!(
   |  _____^
   | |_____|
   | |
52 | |         *rule_set.get(&0).unwrap(),
53 | |         Seq(Box::new(Ref(1)), Box::new(Ref(2)))
54 | |     );
   | |      ^
   | |______|
   | |______Rule
   |        Rule
   |
   = note: an implementation of `std::cmp::PartialEq` might be missing for `Rule`

(This hint doesn’t show up in the editor inline messages, but it does on the command line. Strange.) I guess PartialEq is exactly what’s missing, because adding it makes the tests pass, so apparently my worry about Box comparing by address was unfounded.

I also write the second example from the puzzle description as a test, but the first line 0: 4 1 5 already poses a problem, since I’ve written Seq so that it takes only a number, not another Seq. I have to rewrite the Rule::Seq member to be defined as Seq(Vec(Rule))1 and then rewrite the grammar to correspond to that. PEG’s ** operator with an optional minimum number of repetitions is very handy here:

rule seq() -> Rule = refs:(number() **<2,> " ") { Rule::Seq(refs) }

While debugging I also notice that order is important in /, which I’d originally gotten wrong.

I then write a trio of functions that convert rule 0 of a rule set into regex code for matching the input:

fn rule_to_regex(rule_set: &RuleSet, rule: &Rule) -> String {
    use Rule::*;
    match rule {
        Literal(c) => c.to_string(),
        Ref(ref_ix) => rule_index_to_regex(rule_set, *ref_ix),
        Seq(refs) => refs
            .iter()
            .map(|r| rule_to_regex(rule_set, r))
            .collect::<Vec<String>>()
            .join(""),
        Choice(l, r) => format!(
            "({}|{})",
            rule_to_regex(rule_set, &*l),
            rule_to_regex(rule_set, &*r)
        ),
    }
}

fn rule_index_to_regex(rule_set: &RuleSet, ix: usize) -> String {
    let rule = rule_set.get(&ix).unwrap();
    rule_to_regex(rule_set, rule)
}

fn rule_set_to_regex(rule_set: &RuleSet) -> String {
    format!("^{}$", rule_index_to_regex(rule_set, 0))
}

I add code to my tests such as this, in order to debug it:

let regex = rule_set_to_regex(&rule_set);
assert_eq!(regex, "^a(ab|ba)$");

Once I’ve got it working then I also test the second example against the example messages given in the puzzle description, with code like this:

let matcher = Regex::new(®ex).unwrap();
assert!(matcher.is_match("ababbb"));
assert!(!matcher.is_match("bababa"));
assert!(matcher.is_match("abbbab"));
assert!(!matcher.is_match("aaabbb"));
assert!(!matcher.is_match("aaaabbb"));

I’ve already done most of the work for the main function at this point, I just need to copy it from the tests:

fn main() {
    let input = include_str!("input");
    let mut blocks = input.split("\n\n");

    let rules_block = blocks.next().unwrap();
    let mut rule_set = RuleSet::new();
    for line in rules_block.lines() {
        let (ix, rule) = rules_grammar::parse_line(line).unwrap();
        rule_set.insert(ix, rule);
    }
    let matcher = Regex::new(&rule_set_to_regex(&rule_set)).unwrap();

    let messages_block = blocks.next().unwrap();
    let matches = messages_block
        .lines()
        .map(|line| matcher.is_match(line))
        .count();

    println!("{}", matches);
}

I get an answer and it’s too high. Oh, no! I am probably generating incorrect regex code.

Before I start digging into that, I decide to do a quick reality check, and print out the number of rules in the RuleSet, as well as the number of total messages, to make sure I have gotten all of the ones in the file. The number of rules is correct, but the number of messages is equal to the answer I just got! And then I see it … I used map() instead of filter() to count the number of valid messages. Now I get the correct answer.

Day 19, Part 2

Part 2 of the puzzle asks you to change rule 8 from 8: 42 to 8: 42 | 42 8, and rule 11 from 11: 42 31 | 42 11 31, and then count the number of valid messages again. The puzzle description says that this means the rules do now contain loops, but they give you a very big hint to hard-code this and not make your code generally handle loops.

I look at the two rule modifications. In regex-speak, 8: 42 | 42 8 is really just saying 8: 42+, that is, one or more applications of rule 42. So we could modify rule_index_to_regex() to hard-code an exception when ix == 8, return "(...)+" where ... is the regex code for rule 42. This seems like it should be easy to do.

For the modification to rule 11, on the other hand, I cannot come up with the change that I would need to make to the regex. After some browsing I find that someone else has asked the same thing on Stack Overflow, and it is not possible to do it in a regular expression because such a match is not regular! (The Stack Overflow shows how it can be done in Perl with backreferences, but the documentation tells me that the regular expression engine used in Rust’s regex module does not support this because it can blow up exponentially.2)

I am briefly afraid that I will have to rewrite my entire program to not use regular expressions because the problem is no longer regular.

I realize that I should listen to the hint from the puzzle description, and only deal with the input that I have, instead of trying to solve the puzzle generally. I can get around this limitation by allowing up to a certain number of loops, instead of infinite loops. For example, let’s say that rule 42 is "a" and rule 31 is "b", then rule 11 ought to match ab, aabb, aaabbb, etc. But we know that none of our messages are infinitely long, so there’s a maximum number of as and bs that we have to match. We can write the regex (ab|a{2}b{2}|a{3}b{3}|a{4}b{4}), for example, if that maximum number is 4. We just have to figure out, or more likely, estimate,3 what that maximum number is. The upper bound for that maximum number is the length of our longest message divided by 2, which is 44, but in practice it should be much less, because rules 42 and 31 are not single letters, and rule 11 is not the start symbol.

I start with trying a maximum of 4 repetitions. This is now my rule_index_to_regex() function:

fn rule_index_to_regex(rule_set: &RuleSet, ix: usize, is_part2: bool) -> String {
    let rule = rule_set.get(&ix).unwrap();
    match ix {
        8 if is_part2 => format!("({})+", rule_to_regex(rule_set, rule, is_part2)),
        11 if is_part2 => {
            if let Rule::Seq(refs) = rule {
                format!(
                    "({0}{1}|{0}{{2}}{1}{{2}}|{0}{{3}}{1}{{3}}|{0}{{4}}{1}{{4}})",
                    rule_to_regex(rule_set, &refs[0], is_part2),
                    rule_to_regex(rule_set, &refs[1], is_part2)
                )
            } else {
                panic!("Wrong type for rule 11");
            }
        }
        _ => rule_to_regex(rule_set, rule, is_part2),
    }
}

(I’ve changed the signatures of rule_to_regex() and rule_set_to_regex() accordingly.)

This gets me a correct test result on the test that I’ve written for the example given in Part 2 of the puzzle description. So I try it on the real input, and that gives me the correct answer. Apparently four is enough.

Afterword

Day 18 was only not tricky because of a few circumstances that coincided:

  • I had seen similar problems before, and knew what sort of tools I needed to solve it, before I tried to write any code.
  • I had been primed with the idea of using the PEG package by the blog posts that I’d read.
  • There was an example in the PEG documentation consisting of almost exactly what I needed to solve the puzzle.

If any of these three had not applied, I’m sure it would have taken me much longer to do, as Day 19 did.

Part 2 of Day 19, with the a{n}b{n} regex, is the first time during this series where I’ve been held back by not having a computer science degree — it seems likely to me that this is something I’d have learned in school. Nonetheless, I did manage to compensate the gap in my knowledge enough to solve the puzzle, just by reading Stack Overflow.4 This is why Stack Overflow is such a great resource.

It doesn’t sit well with me that there are a few instances of the borrow operator here that I just don’t understand why they are needed.

Lastly, now that I’ve gotten a bit more awareness of what I know and don’t know about Rust, going back and reading how an experienced Rust programmer solved the first few days’ puzzles was incredibly helpful. Perhaps the most helpful thing of all that I learned there, is the workflow of writing a function, writing tests for that function right below it, and running cargo test until they pass.


[1] My hunch was that the Box wouldn’t be necessary because the Vec‘s size is known without it, and indeed the compiler accepted it

[2] As we are indeed trying to achieve here?

[3] Or even more likely, try increasing numbers until one works

[4] I literally googled “regex match ab aabb aaabbb” and landed on that Stack Overflow post as the first result, isn’t it wild that this is possible?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.