Background Link to heading
I’m currently in the process of writing a post about things I like and things I dislike about Rust. One of the major things I like about Rust is their implementation of sum types and Inductive types. I like Rust’s sum types so much infact, that the section talking about them in that future post became large enough to become its own post (this post).
What are Sum Types and Inductive Types Link to heading
Simply, a sum type is a type that takes on one of many possible values. A simple example is an enum, such as DayOfTheWeek
.
A variable of type DayOfTheWeek
may take on the value of Monday
, Tuesday
, Thursday
, Friday
, etc. This simple
concept is easily extended to the notion of an inductive type where the type may have an object of the same type as itself as one of the
possible members of the structure. Simple recursive grammars are
a good example of this, we will demonsteate a simple propositional formula as an inductive type in our examples.
Rant Link to heading
It was really depressing to go from Lean to C/C++ and realize my best option for a basic sum type was a tagged union
with a bunch of custom machinery to ensure I don’t forget to set the tag. It was even more depressing to see
how the best option for matching a sum type in C is just a switch statement, but even that is better than the
abomination that is C++’s std::variant
and its abysmal attempt at matching with std::visit
(See why
std::visit
is everything wrong with modern C++).
Lets now turn to a hands on example of why I like Rust’s enums, with a proper comparison to a language with first class sum type support like Lean4, and to languages with bad to no sum type implementation like C and modern C++20. For the sake of showing off the full power of sum types, we’ll look at an inductuve sum type (a simple recursive formula), and for the sake of showcasing patern matching matching we’ll implement a formula depth function.
Lean Link to heading
Lean provides proper sum types via the inductive keyword. Lean by far has the best support for sum types out of everything we look at.
inductive Formula where
| atom : String -> Formula
| not : Formula -> Formula
| and : Formula -> Formula -> Formula
def Formula.depth : Formula -> Nat
| atom _ => 1
| not f => depth f + 1
| and l r => max (depth l) (depth r) + 1
open Formula
#eval depth (and (atom "A") (not (atom "B")))
Lets now turn to a C tagged union abomination of this exact same code.
#include <stdio.h>
struct Formula{
enum Type {ATOM, NOT, AND} type;
union{
char atom;
struct Formula* not;
struct {
struct Formula* l;
struct Formula* r;
} and;
};
};
int max(int a, int b){
return a > b ? a : b;
}
int depth(struct Formula* formula){
switch(formula->type){
case ATOM: return 1;
case NOT: return depth(formula->not) + 1;
case AND: return max(depth(formula->and.l), depth(formula->and.r)) + 1;
}
}
int main() {
struct Formula a = {ATOM, atom: 'A'};
struct Formula b = {ATOM, atom: 'B'};
struct Formula notB = {NOT, not: &b};
struct Formula AandNotB = {AND, and:{&a, ¬B}};
printf("%d", depth(&AandNotB));
return 0;
}
We could do something similar to C in C++, with a custom tagged union, but I would like to show off the state of modern C++20 and
what the standard library designers would prefer we do using std::visit
.
#include<variant>
#include<string>
#include<utility>
#include<algorithm>
#include<iostream>
//Forced to wrap variant in a struct to allow recursive reference
struct Formula{
std::variant<
std::string,
Formula*,
std::pair<Formula*, Formula*>
> data;
};
int depth(Formula* formula){
//We could use lambdas if this wasn't a recursive struct
struct Visitors{
int operator()(std::string s){
return 1;
}
int operator()(Formula* f){
return std::visit(Visitors(), f->data) + 1;
}
int operator()(std::pair<Formula*, Formula*> p){
return std::max(std::visit(Visitors(), p.first->data),
std::visit(Visitors(), p.second->data)) + 1;
}
};
return std::visit(Visitors(), formula->data);
}
int main(){
Formula a = {"A"};
Formula b = {"B"};
Formula notb = {&b};
Formula AandNotB = {std::make_pair(&a, ¬b)};
std::cout<<depth(&AandNotB)<<std::endl;
return 0;
}
Finally lets get a breath of fresh air and look at how Rust does it…
enum Formula{
Atom(char),
Not(Box<Formula>),
And(Box<Formula>,Box<Formula>)
}
fn depth(f : Formula)->u32{
match f{
Formula::Atom(_) => 1,
Formula::Not(f) => depth(*f)+1,
Formula::And(l,r) => std::cmp::max(depth(*l), depth(*r))+1
}
}
fn main() {
let f : Formula = Formula::And(
Box::new(Formula::Atom('A')),
Box::new(Formula::Not(Box::new(Formula::Atom('B'))))
);
println!("{}", depth(f));
}
Closing Remarks Link to heading
Rust appears to be doing quite a nice job. Its syntax for the type itself is much more
similar to the Lean version than the C or C++ version. Its syntax for matching
is far more intuitive, allowing you to avoid any extra machinary that comes with switch statements,
or better yet avoiding the awful synatax and runtime calling overhead of std::visit
.