3 files changed, 231 insertions, 51 deletions
diff --git a/.gitignore b/.gitignore
index d56598b..e8a31b4 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,2 +1,10 @@
+# Coq
 *.v#
+
+# TeX
+*.aux
+*.fdb_latexmk
+*.fls
+*.log
 *.pdf
+_minted*/
diff --git a/expr.tex b/expr.tex
new file mode 100644
index 0000000..df2800a
--- /dev/null
+++ b/expr.tex
@@ -0,0 +1,164 @@
+\documentclass[a4paper]{article}
+
+\usepackage{geometry}
+\usepackage{parskip}
+\usepackage{minted}
+\setminted{fontsize=\footnotesize,xleftmargin=2em,linenos=true,style=perldoc}
+\newcommand{\coq}[2]{\inputminted[firstline=#1,lastline=#2]{coq}{expr.v}}
+\usepackage{csquotes}
+
+\title{Proving an expression compiler correct}
+\author{Camil Staps}
+
+\begin{document}
+
+\maketitle
+
+\begin{abstract}
+	We define a simple language \textsf{Exp} for arithmetic expressions with variables.
+	The semantics are executable as a simple interpreter.
+	We discuss the implementation of an \textsf{Exp} compiler outputting reverse Polish notation.
+	The compiler is proven correct with respect to the semantics in Coq.
+\end{abstract}
+
+\section{Introduction}
+
+The \textsf{Exp} compiler, its semantics, the compiler and the proof
+	are formalised in Coq 8.6.
+The Coq file is attached to this report.
+The report walks through the file, but occasionally discusses things out of order when this aids understandability.
+
+The proof uses well-founded induction, which requires a library.
+We also use the \textsf{Omega} library for basic arithmetic,
+	and the \textsf{ListNotations} module for pretty list notation.
+
+\coq{1}{2}
+
+\section{The language}
+
+Expressions in \textsf{Exp} are literals (\textsf{elit}), variables (\textsf{evar}) and additions of other expressions (\textsf{eadd}).
+The \textsf{Exp} language is polymorphic in the type of variables:
+	for example, an expression of type \textsf{Exp nat} is one in which natural numbers operate as variable names.
+For readability, this type variable is often omitted in this report.
+Thus, we will discuss \textsf{elit 37} rather than \textsf{elit \_ 37}.
+
+\coq{4}{7}
+
+The semantics of the language are defined by \textsf{eval}, which is a straightforward recursive interpreter.
+It operates with an environment \textsf{env} which holds the values of variables.
+
+\coq{9}{14}
+
+\section{Compilation}
+
+The compiler is a function from \textsf{Exp v} to \textsf{list (RPNOp v)}; \textsf{RPN} stands for reverse Polish notation operator.
+The implicit execution model is a simple stack machine.
+Execution of a list of RPN instructions succeeds if after execution with an empty stack exactly one element is on the stack.
+This element is the result.
+
+A literal pushes to the stack, as does a variable (after looking up the value in the environment).
+The addition operator pops two values from the stack and pushes their sum.
+This leads to a straightforward compiler.
+
+\coq{16}{19}
+\coq{21}{26}
+
+\section{Evaluation of reverse Polish notation programs}
+
+To evaluate a \textsf{list (RPNOp v)} under a given environment,
+	we first define a helper function \textsf{pn\_eval},
+	which evaluates instructions in (plain) Polish notation and gives the full stack after execution.
+Evaluation of arbitrary lists of (R)PN operators may fail,%
+	\footnote{An example of a failing (R)PN program is \textsf{[rpnadd]}; addition cannot be applied when the stack is empty.}
+	so the result type is \textsf{option (list nat)}.
+
+\coq{42}{51}
+
+In this definition, we make use of two functions from category theory.
+First, \textsf{\textless\$\textgreater} lifts a morphism of type \textsf{a $\rightarrow$ b} to type \textsf{option a $\rightarrow$ option b},
+	which is possible because \textsf{option} is a functor.
+Functors are well-established in functional programming languages.
+Second, \textsf{\textgreater\textgreater=} is the monadic bind, again well-known from functional programming.
+
+\coq{28}{33}
+\coq{35}{40}
+
+Save for some administration, the correctness proof presented below does not change
+	when using \textsf{\textless\$\textgreater} and \textsf{\textgreater\textgreater=}
+	instead of handling \textsf{Some} and \textsf{None} explicitly in \textsf{pn\_eval} with pattern matching.
+It is merely an enhancement for readability and modularity.
+
+The final \textsf{rpn\_eval} function is a simple wrapper around \textsf{pn\_eval}.
+
+\coq{53}{57}
+
+\section{Proving correctness}
+
+We now formally prove that the compiler in combination with the RPN evaluator are correct with respect to the semantics of \textsf{Exp}.
+In other words, for all variables types \textsf{v}, expressions \textsf{e} and environments \textsf{env},
+	the result of \textsf{rpn\_eval v env (rpn v e)} should be \textsf{Some (eval v env e)}.
+
+\coq{87}{88}
+
+To prove this we first prove a lemma, operating on \textsf{pn\_eval} rather than \textsf{rpn\_eval}.
+Thus, we will show that the result of \textsf{pn\_eval v env (rev (rpn v e))} is \textsf{Some [eval v env e]}.
+However, we will prove something more powerful: that this evaluation does not depend on any further RPN instructions.
+Thus, if an arbitrary RPN program \textsf{rest} evaluates to the stack \textsf{results} when starting with an empty stack,
+	\textsf{pn\_eval v env (rev (rpn v e) ++ rest)} should be \textsf{Some (eval v env e :: results)}.
+
+\coq{59}{63}
+
+The proof is by well-founded induction on expressions.
+For this to work, we first provide a mapping \textsf{Exp v $\rightarrow$ nat} based on the size of expressions.
+Thus, there is some \enquote{ever decreasing score} that can be attributed to expressions, making the induction sound.
+
+\coq{65}{72}
+
+After introducing the hypotheses, we make a case distinction over \textsf{e}.
+The cases for literals and variables are very similar and can be handled by applying the induction hypothesis.
+After this, only the case for addition is left.
+
+\coq{74}{75}
+
+We now have an RPN program with \textsf{rpnadd} as the last instruction.
+Since the program is reversed for use in \textsf{pn\_eval}, we can rewrite the program to get \textsf{rpnadd} at its head.
+
+\coq{76}{78}
+
+We now consider the rest of the program.
+Suppose the original expression was \textsf{e = eadd e1 e2},
+	we now consider \textsf{pn\_eval v env ((rev (rpn v e1) ++ rev (rpn v e2)) ++ rest)}
+	in the evaluation of \textsf{rpnadd}.
+
+Due to associativity of \textsf{++}, we can reanalyse this as
+	\textsf{pn\_eval v env (rev (rpn v e1) ++ (rev (rpn v e2) ++ rest))},
+	which allows for application of the induction hypothesis with \textsf{rev (rpn v e2) ++ rest} as the \enquote{rest of the program}, i.e., the \textsf{rest}.
+
+\coq{79}{81}
+
+To be able to use the induction hypothesis on \textsf{e1} we have to show that it is strictly smaller than \textsf{eadd e1 e2}.
+This is easily done with \textsf{simpl} and \textsf{omega}.
+We then apply the induction hypothesis again for \textsf{e2} with \textsf{rest}.
+Here too, \textsf{simpl} and \textsf{omega} make the induction sound.
+We carry over the fact that \textsf{rest} results in \textsf{results} from the induction hypothesis using \textsf{assumption}.
+
+\coq{82}{82}
+
+The proof of the lemma now follows by applying the assertion of the induction step.
+
+\coq{83}{85}
+
+We now return to the proof of the correctness theorem, repeated here:
+
+\coq{87}{88}
+
+We outsource this proof to our proof on \textsf{pn\_eval} using \textsf{rest = result = []}.
+
+\coq{89}{93}
+
+The result of the call to \textsf{pn\_eval} from \textsf{rpn\_eval} is now known,
+	and we can simplify the \textsf{rpn\_eval} call further to the point of reflexivity.
+
+\coq{94}{97}
+
+\end{document}
diff --git a/expr.v b/expr.v
index 8530764..2288b3a 100644
--- a/expr.v
+++ b/expr.v
@@ -1,86 +1,94 @@
-Require Import Omega.
-Require Import Wellfounded.
+Require Import Omega Wellfounded Coq.Lists.List.
+Import ListNotations.
 
-Inductive Exp : Set :=
-  | elit : nat -> Exp
-  | evar : nat -> Exp
-  | eadd : Exp -> Exp -> Exp.
+Inductive Exp (v : Type) :=
+  | elit : nat -> Exp v
+  | evar : v   -> Exp v
+  | eadd : Exp v -> Exp v -> Exp v.
 
-Fixpoint eval (env : nat -> nat) (e : Exp) :=
+Fixpoint eval (v : Type) (env : v -> nat) (e : Exp v) : nat :=
   match e with
-  | elit n => n
-  | evar n => env n
-  | eadd a b => eval env a + eval env b
+  | elit _ n   => n
+  | evar _ n   => env n
+  | eadd _ a b => eval v env a + eval v env b
   end.
 
-Inductive RPNOp :=
-  | rpnlit : nat -> RPNOp
-  | rpnvar : nat -> RPNOp
-  | rpnadd : RPNOp.
+Inductive RPNOp (v : Type) :=
+  | rpnlit : nat -> RPNOp v
+  | rpnvar : v   -> RPNOp v
+  | rpnadd : RPNOp v.
 
-Fixpoint rpn (e : Exp) :=
+Fixpoint rpn (v : Type) (e : Exp v) : list (RPNOp v) :=
   match e with
-  | elit n => cons (rpnlit n) nil
-  | evar n => cons (rpnvar n) nil
-  | eadd a b => app (rpn b) (app (rpn a) (cons rpnadd nil))
+  | elit _ n => [rpnlit v n]
+  | evar _ n => [rpnvar v n]
+  | eadd _ a b => rpn v b ++ rpn v a ++ [rpnadd v]
   end.
 
-Definition _fmap A B (f : A -> B) (o : option A) :=
+Definition fmap (a : Type) (b : Type) (f : a -> b) (o : option a) : option b :=
   match o with
   | Some x => Some (f x)
   | None   => None
   end.
-Notation fmap := (_fmap _ _).
+Notation "x <$> y" := (fmap _ _ x y) (at level 65, left associativity).
 
-Fixpoint rpn_eval' (env : nat -> nat) (rpn : list RPNOp) :=
+Definition bind (a : Type) (b : Type) (f : a -> option b) (o : option a) : option b :=
+  match o with
+  | Some x => f x
+  | None   => None
+  end.
+Notation "x >>= f" := (bind _ _ f x) (at level 55, left associativity).
+
+Fixpoint pn_eval (v : Type) (env : v -> nat) (rpn : list (RPNOp v)) : option (list nat) :=
   match rpn with
-  | nil => Some nil
-  | cons (rpnlit n) a => fmap (cons n) (rpn_eval' env a)
-  | cons (rpnvar n) a => fmap (cons (env n)) (rpn_eval' env a)
-  | cons (rpnadd) a => match rpn_eval' env a with
-    | Some (cons x (cons y stack)) => Some (cons (x+y) stack)
-    | _                            => None
+  | []              => Some []
+  | rpnlit _ n :: a => cons n <$> pn_eval v env a
+  | rpnvar _ n :: a => cons (env n) <$> pn_eval v env a
+  | rpnadd _   :: a => pn_eval v env a >>= fun stack => match stack with
+    | x :: y :: stack => Some (x+y :: stack)
+    | _               => None
     end
   end.
 
-Function rpn_eval (env : nat -> nat) (rpn : list RPNOp) :=
-  match rpn_eval' env (List.rev rpn) with
-  | Some (cons x nil) => Some x
-  | _                 => None
+Function rpn_eval (v : Type) (env : v -> nat) (rpn : list (RPNOp v)) : option nat :=
+  match pn_eval v env (rev rpn) with
+  | Some [x] => Some x
+  | _        => None
   end.
 
-Fixpoint exprsize e :=
+Lemma correctness_partial_compilation (v : Type) (e : Exp v) :
+  forall (env : v -> nat) (rest : list (RPNOp v)) (results : list nat),
+  Some results = pn_eval v env rest ->
+  Some (eval v env e :: results) = pn_eval v env (rev (rpn v e) ++ rest).
+Proof.
+
+Fixpoint exprsize (v : Type) (e : Exp v) : nat :=
   match e with
-  | elit _ => 1
-  | evar _ => 1
-  | eadd a b => 1 + exprsize a + exprsize b
+  | elit _ _ => 1
+  | evar _ _ => 1
+  | eadd _ a b => 1 + exprsize v a + exprsize v b
   end.
-
-Lemma correctness_partial_compilation e :
-  forall env rest results,
-  Some results = rpn_eval' env rest ->
-  Some (eval env e :: results)%list = rpn_eval' env (app (List.rev (rpn e)) rest).
-Proof.
 induction e using (well_founded_induction
-  (wf_inverse_image _ nat _ exprsize PeanoNat.Nat.lt_wf_0)).
+  (wf_inverse_image _ nat _ (exprsize v) PeanoNat.Nat.lt_wf_0)).
+
 intros.
 destruct e; simpl; try rewrite <- H0; try reflexivity.
-rewrite (List.rev_app_distr (rpn e2)).
-rewrite (List.rev_app_distr (rpn e1)).
+rewrite (rev_app_distr (rpn v e2)).
+rewrite (rev_app_distr (rpn v e1)).
 simpl.
-(* Only eadd is left now *)
-rewrite (List.app_assoc_reverse (List.rev (rpn e1)) (List.rev (rpn e2)) rest).
-assert (Some (eval env e1 :: eval env e2 :: results)%list
-    = rpn_eval' env (app (List.rev (rpn e1)) (app (List.rev (rpn e2)) rest))) as recstep.
+rewrite (app_assoc_reverse (rev (rpn v e1)) (rev (rpn v e2)) rest).
+assert (Some (eval v env e1 :: eval v env e2 :: results)
+    = pn_eval v env (rev (rpn v e1) ++ rev (rpn v e2) ++ rest)) as recstep.
   apply H; try apply H; simpl; try assumption; omega.
 rewrite <- recstep.
 reflexivity.
 Qed.
 
-Theorem correctness e env : Some (eval env e) = rpn_eval env (rpn e).
+Theorem correctness (v : Type) (e : Exp v) (env : v -> nat) :
+  Some (eval v env e) = rpn_eval v env (rpn v e).
 Proof.
-assert (Some (eval env e :: nil)%list = rpn_eval' env (List.rev (rpn e))) as outsourcing.
-rewrite <- (List.app_nil_r (List.rev (rpn e))).
+assert (Some [eval v env e] = pn_eval v env (rev (rpn v e))) as outsourcing.
+rewrite <- (app_nil_r (rev (rpn v e))).
 apply correctness_partial_compilation.
 simpl. reflexivity.
 unfold rpn_eval.