A Simple Typeclass for Logic Formulae in Lean4

Note

This post presents a basic (and most likely naive) typeclass for logic formulae. If you want more serious formulae typeclasses with a proven track record of being used to prove big results, I suggest looking at the recent incredible logic formalization work going on in the Foundation project which aims to formalize core mathematical logic results in lean, and has already completed formalizations of Gödel’s First and Second Incompleteness Theorems.

Last year, I made an attempt to formalize a wide class of logics ranging from propositional logic, first order logic, basic modal logics (K, KD45), Description Logics (ALC, fALC), relevance logic (B), and even situation calculus in Lean4. While the project itself made some progress, I inevitably got stuck on two issues:

Lean does not seem to have a nice natural way to extend inductive types. See this post of mine for a discussion on this issue.
Proving metatheorems related to the languages of logics requires a more general notion of language than what a single inductive type can provide. In particular, certain objects like consequence relations and properties of consequence relations like being Tarskian are language agnostic, we want to parameterize out the notion of language.

While working on another project, an attempt at a Lean formalization of Suszko’s reduction, I came across a structure for languages which I quickly realized could be turned into a typeclass for inductive languages. Lets take a look at the final version:

A Typeclass for Formulae in Lean4 Link to heading

/-
Kadlecikova 2025 page 11:
"A language L ... is the set of formulae built on the
non-empty set of atoms (Atom) and the propositional
connectives (C), with the set of sentences of L the
smallest set including Atom and closed under the
connectives from C."
-/
class Language (α : Type) where
  -- Equality of formulae is decidable
  α_deq : DecidableEq α
  -- The set of atoms and connectives in the language, indexed by arity
  -- Note: Atoms are treated as 0-place connectives
  connectives (n : Nat) : Set ((Fin n -> α) -> α)
  -- Condition: The set of atoms is non-empty
  atoms_ne : (cons 0).Nonempty
  -- The language is closed under the connectives
  -- IE, every element of the language has parse tree
  cons_parsable : ∀ (a : α), Nonempty (ParseTree cons a)

This class Language α says that a logic language type α can provide the following:

An algorithm to compare formulae (which we need if we want to have lean automatically derive operations on finite sets of formulae)
An indexed family of connectives, indexed by arity, where each connective is represented by a (Fin n -> α) -> α, this is essentially equivalent to the more intuitive Vector α n -> α, which constructs a formula out of a finite list of subformula, we merely use (Fin n -> α) -> α as the more type-theoretic way of doing things.

Note that the 0-arity connective plays the special role of storing “atoms” the base formulae which are not built of other formulae.

A condition that: the set of atoms is non-empty, otherwise the language empty, no formulae can be constructed.
A condition that: α is closed under the set of connectives, which we take to be a weak condition that every formula has at least one parse tree under the connectives.
Warning
Really this condition should probably be strengthened to say that every formulae has a unique parse tree under the connectives, but the current condition will serve our purpose.

To represent the type of parse trees of a formula a under an indexed family of connectives c, we use the following inductive type:

inductive ParseTree
  {α : Type}
  (c : (n : Nat) -> (Set ((Fin n -> α) → α))) :
  (a : α) -> Type
where
| node {a} {n : Nat} (f : (Fin n -> α) → α) (args : (Fin n -> α)) :
  c n f -> (∀ (i : Fin n), ParseTree c (args i)) ->
  f args = a -> ParseTree c a

The only constructor, node, takes:

a connective f representing the type of the node,
args a map from positions to subformulae (child nodes)
c n f a proof that f is actually an n-ary connective.
(∀ (i : Fin n), ParseTree c (args i)) A parse tree for each subformulae (i.e, a proof that each subformula has a parse tree)
f args = a a proof that the connective when given the subformulae actually produces the formulae we are constructing.

Thats it! Lets give an example by showing that a basic inductive propositional formula type is a language:

inductive PropFormula : Type
| atom : String -> PropFormula
| not : PropFormula -> PropFormula
| and : PropFormula -> PropFormula -> PropFormula
deriving Inhabited, DecidableEq, Repr

To keep things simple we only have negation and conjunction, perfectly suitable for a propositional language due to functional extensionally (and with the added benefit of keeping the upcoming proofs shorter…).

To start we’ll use some helpers that bundle our n-ary inductive constructors a -> a -> ... -> a into (Fin n -> a) -> a constructors.

def Language.bundleZero {α} (f : α) : (Fin 0 → α) -> α :=
  λ _ => f

instance {α}: Coe α ((Fin 0 → α) -> α) where
  coe := Language.bundleZero

def Language.bundleOne {α} (f : α -> α) : (Fin 1 → α) -> α :=
  λ f1 => f (f1 0)

instance {α}: Coe (α -> α) ((Fin 1 → α) -> α) where
  coe := Language.bundleOne

def Language.bundleTwo {α} (f : α -> α -> α) : (Fin 2 → α) -> α :=
  λ f2 => f (f2 0) (f2 1)

instance {α}: Coe (α -> α -> α) ((Fin 2 → α) -> α) where
  coe := Language.bundleTwo

Now lets start with the things we need to construct our language instance, first the indexed family of connectives:

def PropFormula.connectives (n : Nat) : Set ((Fin n -> PropFormula) -> PropFormula) :=
  match n with
    | 0 => {↑(PropFormula.atom s) | s : String}
    | 1 => {↑PropFormula.not}
    | 2 => {↑PropFormula.and}
    | _ => ∅

Note that the ↑ is just calling the respective bundling conversion to a (Fin n -> a) -> a defined above.

Now a proof that the set of atoms is not empty, which we prove by just saying “a” is an atom.

lemma PropFormula.atoms_ne : (PropFormula.connectives 0).Nonempty := by
  simp only [Set.Nonempty, Set.mem_setOf_eq];
  exists (λ _ => PropFormula.atom "a")
  exists "a"

And now finally the hardest part, proving that every formula has a parse tree under the connectives. While this is hard in the sense the proof is long, its actually a quite straightforward induction proof.

lemma PropFormula.cons_parsable : ∀ (a : PropFormula), Nonempty (ParseTree PropFormula.connectives a) := 
by
    intro f
    induction f
    . case atom s =>
      apply Nonempty.intro
      apply ParseTree.node (Language.bundleZero (PropFormula.atom s)) (λ _ => PropFormula.atom s)
      . simp only [PropFormula.connectives, setOf, exists_apply_eq_apply]
      . apply Fin.elim0
      . rfl
    . case not f ih =>
      apply Nonempty.intro
      have ih := Classical.choice ih
      apply ParseTree.node (Language.bundleOne PropFormula.not) (λa => f)
      . rfl
      . intro i
        exact ih
      . simp only [Language.bundleOne]
    . case and f1 f2 ih1 ih2 =>
      apply Nonempty.intro
      have ih1 := Classical.choice ih1
      have ih2 := Classical.choice ih2
      apply ParseTree.node (Language.bundleTwo PropFormula.and) (λa => match a with | 0 => f1| 1 => f2)
      . rfl
      . intro i
        match i with
        | 0 => exact ih1
        | 1 => exact ih2
      . simp only [Language.bundleTwo]

This is now everything we need to show that PropFormula is indeed a Language instance and we have:

instance : Language PropFormula where
  α_deq := inferInstance
  cons := PropFormula.connectives
  atoms_ne := PropFormula.atoms_ne
  cons_parsable := PropFormula.cons_parsable

You can run this code live on Lean 4 Web here!