Background Link to heading

I’m currently in the process of writing a post about things I like and things I dislike about Rust. One of the major things I like about Rust is their implementation of sum types and Inductive types. I like Rust’s sum types so much infact, that the section talking about them in that future post became large enough to become its own post (this post).

What are Sum Types and Inductive Types Link to heading

Simply, a sum type is a type that takes on one of many possible values. A simple example is an enum, such as DayOfTheWeek. A variable of type DayOfTheWeek may take on the value of Monday, Tuesday, Thursday, Friday, etc. This simple concept is easily extended to the notion of an inductive type where the type may have an object of the same type as itself as one of the possible members of the structure. Simple recursive grammars are a good example of this, we will demonsteate a simple propositional formula as an inductive type in our examples.

Note
This post does not pretend to be giving a good beginer overview of sum types nor inductive types, you can get the basic idea from the later examples, but check out some of the links for a more concreate and well written introduction to these concepts.

Rant Link to heading

It was really depressing to go from Lean to C/C++ and realize my best option for a basic sum type was a tagged union with a bunch of custom machinery to ensure I don’t forget to set the tag. It was even more depressing to see how the best option for matching a sum type in C is just a switch statement, but even that is better than the abomination that is C++’s std::variant and its abysmal attempt at matching with std::visit (See why std::visit is everything wrong with modern C++).

Lets now turn to a hands on example of why I like Rust’s enums, with a proper comparison to a language with first class sum type support like Lean4, and to languages with bad to no sum type implementation like C and modern C++20. For the sake of showing off the full power of sum types, we’ll look at an inductuve sum type (a simple recursive formula), and for the sake of showcasing patern matching matching we’ll implement a formula depth function.

Lean Link to heading

Lean provides proper sum types via the inductive keyword. Lean by far has the best support for sum types out of everything we look at.

inductive Formula where
| atom : String -> Formula
| not : Formula -> Formula
| and : Formula -> Formula -> Formula

def Formula.depth : Formula -> Nat
| atom _ => 1
| not f => depth f + 1
| and l r => max (depth l) (depth r) + 1   

open Formula
#eval depth (and (atom "A") (not (atom "B")))

Run it

Lets now turn to a C tagged union abomination of this exact same code.

#include <stdio.h>

struct Formula{
    enum Type {ATOM, NOT, AND} type;
    union{
        char atom;
        struct Formula* not;
        struct {
            struct Formula* l;
            struct Formula* r;
        } and;
    };
};

int max(int a, int b){
    return a > b ? a : b;
}

int depth(struct Formula* formula){
    switch(formula->type){
        case ATOM: return 1;
        case NOT: return depth(formula->not) + 1;
        case AND: return max(depth(formula->and.l), depth(formula->and.r)) + 1;
    }
}

int main() {
    struct Formula a = {ATOM, atom: 'A'};
    struct Formula b = {ATOM, atom: 'B'};
    struct Formula notB = {NOT, not: &b};
    struct Formula AandNotB = {AND, and:{&a, &notB}};
    printf("%d", depth(&AandNotB));
    return 0;
}

Run it

We could do something similar to C in C++, with a custom tagged union, but I would like to show off the state of modern C++20 and what the standard library designers would prefer we do using std::visit.

#include<variant>
#include<string>
#include<utility>
#include<algorithm>
#include<iostream>

//Forced to wrap variant in a struct to allow recursive reference 
struct Formula{
    std::variant<
        std::string,
        Formula*,
        std::pair<Formula*, Formula*>
    > data;
};

int depth(Formula* formula){
    //We could use lambdas if this wasn't a recursive struct
    struct Visitors{
        int operator()(std::string s){
            return 1;
        }
        int operator()(Formula* f){
            return std::visit(Visitors(), f->data) + 1;
        }
        int operator()(std::pair<Formula*, Formula*> p){
            return std::max(std::visit(Visitors(), p.first->data), 
                            std::visit(Visitors(), p.second->data)) + 1;
        }
    };
    return std::visit(Visitors(), formula->data);
}

int main(){
    Formula a = {"A"};
    Formula b = {"B"};
    Formula notb = {&b};
    Formula AandNotB = {std::make_pair(&a, &notb)};
    std::cout<<depth(&AandNotB)<<std::endl;
    return 0;
}

Run it

Finally lets get a breath of fresh air and look at how Rust does it…

enum Formula{
    Atom(char),
    Not(Box<Formula>),
    And(Box<Formula>,Box<Formula>)
}

fn depth(f : Formula)->u32{
    match f{
        Formula::Atom(_) => 1,
        Formula::Not(f) => depth(*f)+1,
        Formula::And(l,r) => std::cmp::max(depth(*l), depth(*r))+1
    }
}

fn main() {
    let f : Formula = Formula::And(
        Box::new(Formula::Atom('A')),
        Box::new(Formula::Not(Box::new(Formula::Atom('B')))) 
    );
    println!("{}", depth(f));
}

Run it

Closing Remarks Link to heading

Rust appears to be doing quite a nice job. Its syntax for the type itself is much more similar to the Lean version than the C or C++ version. Its syntax for matching is far more intuitive, allowing you to avoid any extra machinary that comes with switch statements, or better yet avoiding the awful synatax and runtime calling overhead of std::visit.