Mahmoud Mazouz

Back in this post I introduced Absolut, a Rust library to facilitate the implementation of bytewise SIMD lookup tables. Version 0.1.1 of Absolut heavily relied on the Z3 Theorem Prover to compute lookup tables from declarative map definitions. This resulted in a few issues:

The z3 crate either uses a system-installed copy of Z3 or builds a locally bundled copy of it. Needless to say, this is too much software bloat for what Absolut essentially is: a set of procedural macros.
An SMT solver is a relatively opaque program. When Absolut inevitably fails to generate a lookup table because Z3 deemed it unsatisfiable, it's hard to determine the reason for the failure.
The execution time of Z3 is unpredictable. While it's possible to set an upper limit to it, such a change would result in less chances of finding a solution.

Putting aside technical issues around the use of Z3, the project suffered from a lack of documentation: except for the aforementioned post, I provided no README and no rustdoc documentation. At the time, I was so focused on getting a proof of concept off the ground that I neglected everything I didn't believe to be not strictly essential.

Moving away from SMT

In version 0.1.1, Absolut formulated the problem of generating a lookup table (referred to from now on as "The Problem") in first order logic. This allowed for expressing The Problem in terms of Z3's supported theories (e.g. fixed-size bit-vectors) at a relatively high level. In other words, using an SMT solver provided greater expressive power to implement and maintain Absolut.

On the downside, I was locked into the ecosystem of SMT solvers, which are complex and hard to embed into a small Rust library. So I reformulated The Problem in terms of propositional logic to use an SAT solver instead. A SAT solver is relatively easier to implement and there even exist two notable CDCL-based implementations for Rust: Varisat and Splr.

My goal at this point was to implement an embedded SAT solver for Absolut. I started out by implementing Donald Knuth's SAT0 algorithm in Rust, but I wasn't satisfied with its performance (or rather lack thereof). It often failed to terminate after multiple hours when used to generate a lookup table.

Before putting enormous effort into implementing a more sophisticated SAT solver, I asked on StackOverflow if I would benefit from a specialized SAT solver in the first place. The answer was the following:

[...] That points to the main weakness of the SAT formulation: it’s highly symmetric, and basic branching strategies will struggle. Sophisticated SAT solvers can detect symmetry and prune redundant branches, but the logic is complicated, and you really don’t want to implement it yourself for one application.

— David Eisenstat, stackoverflow.com

At this point, my goal of implementing a specialized SAT solver specifically optimized for usage inside Absolut grew more daunting. So I set out to evaluate the existing Rust-based SAT solvers to see if they work well enough for my use case.

I proceeded to contact Jannis Harder aka jix, the author of Varisat, to inquire about the state of the project. They told me that they would write Varisat very differently had they started it today, but there are no outstanding issues with the code nonetheless. They added that Splr on the other hand is closer to the state of art than Varisat and is being actively developed.

I didn't put too much thought into choosing between Varisat and Splr. In any case, the SAT solver is well encapsulated in the code of Absolut and could be readily swapped if needed. Ultimately, I was more familiar with Varisat as I had played with it in the past, So I went with it as I didn't encounter any issues with it while developing Absolut 0.2.0.

Moving away from SAT

David Eisenstat also pointed out in the StackOverflow answer that I can do without a SAT solver in some particular cases. To understand why, we need a quick recap of what The Problem is. Consider the following lookup table, described using Absolut:

#[absolut::one_hot]
enum Brainfuck {
    #[matches(b'<', b'>')]
    IncDec,
    #[matches(b'+', b'-')]
    IncDecPtr,
    #[matches(b',', b'.')]
    ReadWrite,
    // My syntax highlighter is struggling (；⌣̀_⌣́)
    #[matches(b'[', b']')]
    Jump,
    #[wildcard]
    Ignored,
}

The above Brainfuck lookup table maps < and > to Brainfuck::IncDec, maps + and - to Brainfuck::IncDecPtr, maps , and . to Brainfuck::ReadWrite, maps [ and ] to Brainfuck::Jump and maps everything else to Brainfuck::Ignored.

Such a lookup table could be useful for quickly identifying Brainfuck commands in source code, while quickly ignoring comments¹. The following lookup function illustrates, using non-vectorized code for demonstration purposes, how to use the Brainfuck lookup table:

fn lookup(input: &[u8; 16], output: &mut [u8; 16]) {
    for (index, byte) in input.iter().copied().enumerate() {
        let (lo, hi) = (byte & 0b1111, byte >> 4);
        output[index] = Brainfuck::TABLE_LOW_NIBBLES[lo as usize]
                      & Brainfuck::TABLE_HIGH_NIBBLES[hi as usize];
    }
}

Absolut solves for the Brainfuck::TABLE_LOW_NIBBLES and Brainfuck::TABLE_HIGH_NIBBLES tables, as well as all of the enum variants of Brainfuck. To do this, it suffices to encode the variants using the One-hot binary encoding. Such that every variant is an 8-bit integer with exactly one set bit, except for the #[wildcard]-annotated variant which is always 0.

This is because the set of bytes matched by each variant can be expressed as a Cartesian product, e.g. IncDec matches {0x3} ⨯ {0xE, 0xC} == {(0x3, 0xE), (0x3, 0xC)} where (0x3, 0xE) encodes 0x3E == b'<' and (0x3, 0xC) encodes 0x3C == b'>'. When this property is satisfied, one can simply set the exact same bit position in Brainfuck::TABLE_LOW_NIBBLES[0x3], Brainfuck::TABLE_HIGH_NIBBLES[0xE], Brainfuck::TABLE_LOW_NIBBLES[0xC] and Brainfuck::IncDec.

Brainfuck::IncDec as u8 == Brainfuck::TABLE_LOW_NIBBLES[0x3] 
                         & Brainfuck::TABLE_HIGH_NIBBLES[0xE];

Brainfuck::IncDec as u8 == Brainfuck::TABLE_LOW_NIBBLES[0x3] 
                         & Brainfuck::TABLE_HIGH_NIBBLES[0xC];

This algorithm is implemented in the absolut::one_hot procedural macro. The downside of one_hot is that it only supports up to 9 variants. On the upside, it's much simpler to implement and reason about.

Absolut 0.2.1 introduces yet another lookup table generation algorithm, called absolut::composite, which allows for arbitrary byte-to-byte maps using NEON on AArch64. If we change the above Brainfuck lookup table to use #[absolut::composite] instead of #[absolut::one_hot], then the lookup function becomes:

fn lookup(input: &[u8; 16], output: &mut [u8; 16]) {
    for (index, byte) in input.iter().copied().enumerate() {
        output[index] = match byte {
            0..=63    => Brainfuck::TABLE_QUARTERS[0][byte as usize],
            64..=127  => Brainfuck::TABLE_QUARTERS[1][byte as usize - 64],
            128..=191 => Brainfuck::TABLE_QUARTERS[2][byte as usize - 128],
            192..=255 => Brainfuck::TABLE_QUARTERS[3][byte as usize - 192],
        };
    }
}

The idea here is to use four distinct lookup tables, one for each quarter of the range of all bytes. I recommend reading Arbitrary byte-to-byte maps using ARM NEON? by Daniel Lemire for more information. I also came across an application of this idea in Beating the Fastest Lexer Generator in Rust by Adrian Alic.

`rustdoc` shenanigans

When writing rustdoc documentation for Absolut 0.2.1, I couldn't figure out how to get the "Available on crate feature feature only." notices. rustdoc by default only builds documentation for default crate features. However, if you enable a non-default feature in the rustdoc build, rustdoc doesn't mark the featured-gated items as such. To achieve that, you need to enable the doc_auto_cfg unstable language feature.

But that doesn't work in Stable Rust! Thankfully, the docs.rs build uses Nightly Rust. So it's possible to enable it there only:

[package.metadata.docs.rs]
all-features = true
rustdoc-args = ["--cfg", "doc_auto_cfg"]

What took me the longest time to realize was how to test the build with doc_auto_cfg locally without building with Nightly Rust and adding #![feature(doc_auto_cfg)] in the crate root. For that something like #![cfg_attr(absolut_docs, feature(doc_auto_cfg))] is needed, which means that if and only if the crate is configured with --cfg absolut_docs then the attribute should expand to #![feature(doc_auto_cfg)], otherwise no feature is enabled.

If you manage to implement such a thing, please email me! Bonus points if it's faster than scalar code!

Moving away from SMT

Moving away from SAT

rustdoc shenanigans

`rustdoc` shenanigans