diff options
-rw-r--r-- | .gitignore | 8 | ||||
-rw-r--r-- | expr.tex | 164 | ||||
-rw-r--r-- | expr.v | 110 |
3 files changed, 231 insertions, 51 deletions
@@ -1,2 +1,10 @@ +# Coq *.v# + +# TeX +*.aux +*.fdb_latexmk +*.fls +*.log *.pdf +_minted*/ diff --git a/expr.tex b/expr.tex new file mode 100644 index 0000000..df2800a --- /dev/null +++ b/expr.tex @@ -0,0 +1,164 @@ +\documentclass[a4paper]{article} + +\usepackage{geometry} +\usepackage{parskip} +\usepackage{minted} +\setminted{fontsize=\footnotesize,xleftmargin=2em,linenos=true,style=perldoc} +\newcommand{\coq}[2]{\inputminted[firstline=#1,lastline=#2]{coq}{expr.v}} +\usepackage{csquotes} + +\title{Proving an expression compiler correct} +\author{Camil Staps} + +\begin{document} + +\maketitle + +\begin{abstract} + We define a simple language \textsf{Exp} for arithmetic expressions with variables. + The semantics are executable as a simple interpreter. + We discuss the implementation of an \textsf{Exp} compiler outputting reverse Polish notation. + The compiler is proven correct with respect to the semantics in Coq. +\end{abstract} + +\section{Introduction} + +The \textsf{Exp} compiler, its semantics, the compiler and the proof + are formalised in Coq 8.6. +The Coq file is attached to this report. +The report walks through the file, but occasionally discusses things out of order when this aids understandability. + +The proof uses well-founded induction, which requires a library. +We also use the \textsf{Omega} library for basic arithmetic, + and the \textsf{ListNotations} module for pretty list notation. + +\coq{1}{2} + +\section{The language} + +Expressions in \textsf{Exp} are literals (\textsf{elit}), variables (\textsf{evar}) and additions of other expressions (\textsf{eadd}). +The \textsf{Exp} language is polymorphic in the type of variables: + for example, an expression of type \textsf{Exp nat} is one in which natural numbers operate as variable names. +For readability, this type variable is often omitted in this report. +Thus, we will discuss \textsf{elit 37} rather than \textsf{elit \_ 37}. + +\coq{4}{7} + +The semantics of the language are defined by \textsf{eval}, which is a straightforward recursive interpreter. +It operates with an environment \textsf{env} which holds the values of variables. + +\coq{9}{14} + +\section{Compilation} + +The compiler is a function from \textsf{Exp v} to \textsf{list (RPNOp v)}; \textsf{RPN} stands for reverse Polish notation operator. +The implicit execution model is a simple stack machine. +Execution of a list of RPN instructions succeeds if after execution with an empty stack exactly one element is on the stack. +This element is the result. + +A literal pushes to the stack, as does a variable (after looking up the value in the environment). +The addition operator pops two values from the stack and pushes their sum. +This leads to a straightforward compiler. + +\coq{16}{19} +\coq{21}{26} + +\section{Evaluation of reverse Polish notation programs} + +To evaluate a \textsf{list (RPNOp v)} under a given environment, + we first define a helper function \textsf{pn\_eval}, + which evaluates instructions in (plain) Polish notation and gives the full stack after execution. +Evaluation of arbitrary lists of (R)PN operators may fail,% + \footnote{An example of a failing (R)PN program is \textsf{[rpnadd]}; addition cannot be applied when the stack is empty.} + so the result type is \textsf{option (list nat)}. + +\coq{42}{51} + +In this definition, we make use of two functions from category theory. +First, \textsf{\textless\$\textgreater} lifts a morphism of type \textsf{a $\rightarrow$ b} to type \textsf{option a $\rightarrow$ option b}, + which is possible because \textsf{option} is a functor. +Functors are well-established in functional programming languages. +Second, \textsf{\textgreater\textgreater=} is the monadic bind, again well-known from functional programming. + +\coq{28}{33} +\coq{35}{40} + +Save for some administration, the correctness proof presented below does not change + when using \textsf{\textless\$\textgreater} and \textsf{\textgreater\textgreater=} + instead of handling \textsf{Some} and \textsf{None} explicitly in \textsf{pn\_eval} with pattern matching. +It is merely an enhancement for readability and modularity. + +The final \textsf{rpn\_eval} function is a simple wrapper around \textsf{pn\_eval}. + +\coq{53}{57} + +\section{Proving correctness} + +We now formally prove that the compiler in combination with the RPN evaluator are correct with respect to the semantics of \textsf{Exp}. +In other words, for all variables types \textsf{v}, expressions \textsf{e} and environments \textsf{env}, + the result of \textsf{rpn\_eval v env (rpn v e)} should be \textsf{Some (eval v env e)}. + +\coq{87}{88} + +To prove this we first prove a lemma, operating on \textsf{pn\_eval} rather than \textsf{rpn\_eval}. +Thus, we will show that the result of \textsf{pn\_eval v env (rev (rpn v e))} is \textsf{Some [eval v env e]}. +However, we will prove something more powerful: that this evaluation does not depend on any further RPN instructions. +Thus, if an arbitrary RPN program \textsf{rest} evaluates to the stack \textsf{results} when starting with an empty stack, + \textsf{pn\_eval v env (rev (rpn v e) ++ rest)} should be \textsf{Some (eval v env e :: results)}. + +\coq{59}{63} + +The proof is by well-founded induction on expressions. +For this to work, we first provide a mapping \textsf{Exp v $\rightarrow$ nat} based on the size of expressions. +Thus, there is some \enquote{ever decreasing score} that can be attributed to expressions, making the induction sound. + +\coq{65}{72} + +After introducing the hypotheses, we make a case distinction over \textsf{e}. +The cases for literals and variables are very similar and can be handled by applying the induction hypothesis. +After this, only the case for addition is left. + +\coq{74}{75} + +We now have an RPN program with \textsf{rpnadd} as the last instruction. +Since the program is reversed for use in \textsf{pn\_eval}, we can rewrite the program to get \textsf{rpnadd} at its head. + +\coq{76}{78} + +We now consider the rest of the program. +Suppose the original expression was \textsf{e = eadd e1 e2}, + we now consider \textsf{pn\_eval v env ((rev (rpn v e1) ++ rev (rpn v e2)) ++ rest)} + in the evaluation of \textsf{rpnadd}. + +Due to associativity of \textsf{++}, we can reanalyse this as + \textsf{pn\_eval v env (rev (rpn v e1) ++ (rev (rpn v e2) ++ rest))}, + which allows for application of the induction hypothesis with \textsf{rev (rpn v e2) ++ rest} as the \enquote{rest of the program}, i.e., the \textsf{rest}. + +\coq{79}{81} + +To be able to use the induction hypothesis on \textsf{e1} we have to show that it is strictly smaller than \textsf{eadd e1 e2}. +This is easily done with \textsf{simpl} and \textsf{omega}. +We then apply the induction hypothesis again for \textsf{e2} with \textsf{rest}. +Here too, \textsf{simpl} and \textsf{omega} make the induction sound. +We carry over the fact that \textsf{rest} results in \textsf{results} from the induction hypothesis using \textsf{assumption}. + +\coq{82}{82} + +The proof of the lemma now follows by applying the assertion of the induction step. + +\coq{83}{85} + +We now return to the proof of the correctness theorem, repeated here: + +\coq{87}{88} + +We outsource this proof to our proof on \textsf{pn\_eval} using \textsf{rest = result = []}. + +\coq{89}{93} + +The result of the call to \textsf{pn\_eval} from \textsf{rpn\_eval} is now known, + and we can simplify the \textsf{rpn\_eval} call further to the point of reflexivity. + +\coq{94}{97} + +\end{document} @@ -1,86 +1,94 @@ -Require Import Omega. -Require Import Wellfounded. +Require Import Omega Wellfounded Coq.Lists.List. +Import ListNotations. -Inductive Exp : Set := - | elit : nat -> Exp - | evar : nat -> Exp - | eadd : Exp -> Exp -> Exp. +Inductive Exp (v : Type) := + | elit : nat -> Exp v + | evar : v -> Exp v + | eadd : Exp v -> Exp v -> Exp v. -Fixpoint eval (env : nat -> nat) (e : Exp) := +Fixpoint eval (v : Type) (env : v -> nat) (e : Exp v) : nat := match e with - | elit n => n - | evar n => env n - | eadd a b => eval env a + eval env b + | elit _ n => n + | evar _ n => env n + | eadd _ a b => eval v env a + eval v env b end. -Inductive RPNOp := - | rpnlit : nat -> RPNOp - | rpnvar : nat -> RPNOp - | rpnadd : RPNOp. +Inductive RPNOp (v : Type) := + | rpnlit : nat -> RPNOp v + | rpnvar : v -> RPNOp v + | rpnadd : RPNOp v. -Fixpoint rpn (e : Exp) := +Fixpoint rpn (v : Type) (e : Exp v) : list (RPNOp v) := match e with - | elit n => cons (rpnlit n) nil - | evar n => cons (rpnvar n) nil - | eadd a b => app (rpn b) (app (rpn a) (cons rpnadd nil)) + | elit _ n => [rpnlit v n] + | evar _ n => [rpnvar v n] + | eadd _ a b => rpn v b ++ rpn v a ++ [rpnadd v] end. -Definition _fmap A B (f : A -> B) (o : option A) := +Definition fmap (a : Type) (b : Type) (f : a -> b) (o : option a) : option b := match o with | Some x => Some (f x) | None => None end. -Notation fmap := (_fmap _ _). +Notation "x <$> y" := (fmap _ _ x y) (at level 65, left associativity). -Fixpoint rpn_eval' (env : nat -> nat) (rpn : list RPNOp) := +Definition bind (a : Type) (b : Type) (f : a -> option b) (o : option a) : option b := + match o with + | Some x => f x + | None => None + end. +Notation "x >>= f" := (bind _ _ f x) (at level 55, left associativity). + +Fixpoint pn_eval (v : Type) (env : v -> nat) (rpn : list (RPNOp v)) : option (list nat) := match rpn with - | nil => Some nil - | cons (rpnlit n) a => fmap (cons n) (rpn_eval' env a) - | cons (rpnvar n) a => fmap (cons (env n)) (rpn_eval' env a) - | cons (rpnadd) a => match rpn_eval' env a with - | Some (cons x (cons y stack)) => Some (cons (x+y) stack) - | _ => None + | [] => Some [] + | rpnlit _ n :: a => cons n <$> pn_eval v env a + | rpnvar _ n :: a => cons (env n) <$> pn_eval v env a + | rpnadd _ :: a => pn_eval v env a >>= fun stack => match stack with + | x :: y :: stack => Some (x+y :: stack) + | _ => None end end. -Function rpn_eval (env : nat -> nat) (rpn : list RPNOp) := - match rpn_eval' env (List.rev rpn) with - | Some (cons x nil) => Some x - | _ => None +Function rpn_eval (v : Type) (env : v -> nat) (rpn : list (RPNOp v)) : option nat := + match pn_eval v env (rev rpn) with + | Some [x] => Some x + | _ => None end. -Fixpoint exprsize e := +Lemma correctness_partial_compilation (v : Type) (e : Exp v) : + forall (env : v -> nat) (rest : list (RPNOp v)) (results : list nat), + Some results = pn_eval v env rest -> + Some (eval v env e :: results) = pn_eval v env (rev (rpn v e) ++ rest). +Proof. + +Fixpoint exprsize (v : Type) (e : Exp v) : nat := match e with - | elit _ => 1 - | evar _ => 1 - | eadd a b => 1 + exprsize a + exprsize b + | elit _ _ => 1 + | evar _ _ => 1 + | eadd _ a b => 1 + exprsize v a + exprsize v b end. - -Lemma correctness_partial_compilation e : - forall env rest results, - Some results = rpn_eval' env rest -> - Some (eval env e :: results)%list = rpn_eval' env (app (List.rev (rpn e)) rest). -Proof. induction e using (well_founded_induction - (wf_inverse_image _ nat _ exprsize PeanoNat.Nat.lt_wf_0)). + (wf_inverse_image _ nat _ (exprsize v) PeanoNat.Nat.lt_wf_0)). + intros. destruct e; simpl; try rewrite <- H0; try reflexivity. -rewrite (List.rev_app_distr (rpn e2)). -rewrite (List.rev_app_distr (rpn e1)). +rewrite (rev_app_distr (rpn v e2)). +rewrite (rev_app_distr (rpn v e1)). simpl. -(* Only eadd is left now *) -rewrite (List.app_assoc_reverse (List.rev (rpn e1)) (List.rev (rpn e2)) rest). -assert (Some (eval env e1 :: eval env e2 :: results)%list - = rpn_eval' env (app (List.rev (rpn e1)) (app (List.rev (rpn e2)) rest))) as recstep. +rewrite (app_assoc_reverse (rev (rpn v e1)) (rev (rpn v e2)) rest). +assert (Some (eval v env e1 :: eval v env e2 :: results) + = pn_eval v env (rev (rpn v e1) ++ rev (rpn v e2) ++ rest)) as recstep. apply H; try apply H; simpl; try assumption; omega. rewrite <- recstep. reflexivity. Qed. -Theorem correctness e env : Some (eval env e) = rpn_eval env (rpn e). +Theorem correctness (v : Type) (e : Exp v) (env : v -> nat) : + Some (eval v env e) = rpn_eval v env (rpn v e). Proof. -assert (Some (eval env e :: nil)%list = rpn_eval' env (List.rev (rpn e))) as outsourcing. -rewrite <- (List.app_nil_r (List.rev (rpn e))). +assert (Some [eval v env e] = pn_eval v env (rev (rpn v e))) as outsourcing. +rewrite <- (app_nil_r (rev (rpn v e))). apply correctness_partial_compilation. simpl. reflexivity. unfold rpn_eval. |