Cleanup expression compiler; write basic report

author: Camil Staps 2018-05-13 01:03:33 +0200
committer: Camil Staps 2018-05-13 01:03:33 +0200
commit: f3a256dfa5a327a2760849338f05673b9c9e70ba (patch)
tree: f2507cecc903dd3a3c82857691175f392817e31a /expr.tex
parent: Fake true *reverse* polish notation (diff)
1 files changed, 164 insertions, 0 deletions
diff --git a/expr.tex b/expr.tex
new file mode 100644
index 0000000..df2800a
--- /dev/null
+++ b/expr.tex
@@ -0,0 +1,164 @@
+\documentclass[a4paper]{article}
+
+\usepackage{geometry}
+\usepackage{parskip}
+\usepackage{minted}
+\setminted{fontsize=\footnotesize,xleftmargin=2em,linenos=true,style=perldoc}
+\newcommand{\coq}[2]{\inputminted[firstline=#1,lastline=#2]{coq}{expr.v}}
+\usepackage{csquotes}
+
+\title{Proving an expression compiler correct}
+\author{Camil Staps}
+
+\begin{document}
+
+\maketitle
+
+\begin{abstract}
+	We define a simple language \textsf{Exp} for arithmetic expressions with variables.
+	The semantics are executable as a simple interpreter.
+	We discuss the implementation of an \textsf{Exp} compiler outputting reverse Polish notation.
+	The compiler is proven correct with respect to the semantics in Coq.
+\end{abstract}
+
+\section{Introduction}
+
+The \textsf{Exp} compiler, its semantics, the compiler and the proof
+	are formalised in Coq 8.6.
+The Coq file is attached to this report.
+The report walks through the file, but occasionally discusses things out of order when this aids understandability.
+
+The proof uses well-founded induction, which requires a library.
+We also use the \textsf{Omega} library for basic arithmetic,
+	and the \textsf{ListNotations} module for pretty list notation.
+
+\coq{1}{2}
+
+\section{The language}
+
+Expressions in \textsf{Exp} are literals (\textsf{elit}), variables (\textsf{evar}) and additions of other expressions (\textsf{eadd}).
+The \textsf{Exp} language is polymorphic in the type of variables:
+	for example, an expression of type \textsf{Exp nat} is one in which natural numbers operate as variable names.
+For readability, this type variable is often omitted in this report.
+Thus, we will discuss \textsf{elit 37} rather than \textsf{elit \_ 37}.
+
+\coq{4}{7}
+
+The semantics of the language are defined by \textsf{eval}, which is a straightforward recursive interpreter.
+It operates with an environment \textsf{env} which holds the values of variables.
+
+\coq{9}{14}
+
+\section{Compilation}
+
+The compiler is a function from \textsf{Exp v} to \textsf{list (RPNOp v)}; \textsf{RPN} stands for reverse Polish notation operator.
+The implicit execution model is a simple stack machine.
+Execution of a list of RPN instructions succeeds if after execution with an empty stack exactly one element is on the stack.
+This element is the result.
+
+A literal pushes to the stack, as does a variable (after looking up the value in the environment).
+The addition operator pops two values from the stack and pushes their sum.
+This leads to a straightforward compiler.
+
+\coq{16}{19}
+\coq{21}{26}
+
+\section{Evaluation of reverse Polish notation programs}
+
+To evaluate a \textsf{list (RPNOp v)} under a given environment,
+	we first define a helper function \textsf{pn\_eval},
+	which evaluates instructions in (plain) Polish notation and gives the full stack after execution.
+Evaluation of arbitrary lists of (R)PN operators may fail,%
+	\footnote{An example of a failing (R)PN program is \textsf{[rpnadd]}; addition cannot be applied when the stack is empty.}
+	so the result type is \textsf{option (list nat)}.
+
+\coq{42}{51}
+
+In this definition, we make use of two functions from category theory.
+First, \textsf{\textless\$\textgreater} lifts a morphism of type \textsf{a $\rightarrow$ b} to type \textsf{option a $\rightarrow$ option b},
+	which is possible because \textsf{option} is a functor.
+Functors are well-established in functional programming languages.
+Second, \textsf{\textgreater\textgreater=} is the monadic bind, again well-known from functional programming.
+
+\coq{28}{33}
+\coq{35}{40}
+
+Save for some administration, the correctness proof presented below does not change
+	when using \textsf{\textless\$\textgreater} and \textsf{\textgreater\textgreater=}
+	instead of handling \textsf{Some} and \textsf{None} explicitly in \textsf{pn\_eval} with pattern matching.
+It is merely an enhancement for readability and modularity.
+
+The final \textsf{rpn\_eval} function is a simple wrapper around \textsf{pn\_eval}.
+
+\coq{53}{57}
+
+\section{Proving correctness}
+
+We now formally prove that the compiler in combination with the RPN evaluator are correct with respect to the semantics of \textsf{Exp}.
+In other words, for all variables types \textsf{v}, expressions \textsf{e} and environments \textsf{env},
+	the result of \textsf{rpn\_eval v env (rpn v e)} should be \textsf{Some (eval v env e)}.
+
+\coq{87}{88}
+
+To prove this we first prove a lemma, operating on \textsf{pn\_eval} rather than \textsf{rpn\_eval}.
+Thus, we will show that the result of \textsf{pn\_eval v env (rev (rpn v e))} is \textsf{Some [eval v env e]}.
+However, we will prove something more powerful: that this evaluation does not depend on any further RPN instructions.
+Thus, if an arbitrary RPN program \textsf{rest} evaluates to the stack \textsf{results} when starting with an empty stack,
+	\textsf{pn\_eval v env (rev (rpn v e) ++ rest)} should be \textsf{Some (eval v env e :: results)}.
+
+\coq{59}{63}
+
+The proof is by well-founded induction on expressions.
+For this to work, we first provide a mapping \textsf{Exp v $\rightarrow$ nat} based on the size of expressions.
+Thus, there is some \enquote{ever decreasing score} that can be attributed to expressions, making the induction sound.
+
+\coq{65}{72}
+
+After introducing the hypotheses, we make a case distinction over \textsf{e}.
+The cases for literals and variables are very similar and can be handled by applying the induction hypothesis.
+After this, only the case for addition is left.
+
+\coq{74}{75}
+
+We now have an RPN program with \textsf{rpnadd} as the last instruction.
+Since the program is reversed for use in \textsf{pn\_eval}, we can rewrite the program to get \textsf{rpnadd} at its head.
+
+\coq{76}{78}
+
+We now consider the rest of the program.
+Suppose the original expression was \textsf{e = eadd e1 e2},
+	we now consider \textsf{pn\_eval v env ((rev (rpn v e1) ++ rev (rpn v e2)) ++ rest)}
+	in the evaluation of \textsf{rpnadd}.
+
+Due to associativity of \textsf{++}, we can reanalyse this as
+	\textsf{pn\_eval v env (rev (rpn v e1) ++ (rev (rpn v e2) ++ rest))},
+	which allows for application of the induction hypothesis with \textsf{rev (rpn v e2) ++ rest} as the \enquote{rest of the program}, i.e., the \textsf{rest}.
+
+\coq{79}{81}
+
+To be able to use the induction hypothesis on \textsf{e1} we have to show that it is strictly smaller than \textsf{eadd e1 e2}.
+This is easily done with \textsf{simpl} and \textsf{omega}.
+We then apply the induction hypothesis again for \textsf{e2} with \textsf{rest}.
+Here too, \textsf{simpl} and \textsf{omega} make the induction sound.
+We carry over the fact that \textsf{rest} results in \textsf{results} from the induction hypothesis using \textsf{assumption}.
+
+\coq{82}{82}
+
+The proof of the lemma now follows by applying the assertion of the induction step.
+
+\coq{83}{85}
+
+We now return to the proof of the correctness theorem, repeated here:
+
+\coq{87}{88}
+
+We outsource this proof to our proof on \textsf{pn\_eval} using \textsf{rest = result = []}.
+
+\coq{89}{93}
+
+The result of the call to \textsf{pn\_eval} from \textsf{rpn\_eval} is now known,
+	and we can simplify the \textsf{rpn\_eval} call further to the point of reflexivity.
+
+\coq{94}{97}
+
+\end{document}
author	Camil Staps	2018-05-13 01:03:33 +0200
committer	Camil Staps	2018-05-13 01:03:33 +0200
commit	f3a256dfa5a327a2760849338f05673b9c9e70ba (patch)
tree	f2507cecc903dd3a3c82857691175f392817e31a /expr.tex
parent	Fake true reverse polish notation (diff)