From f3a256dfa5a327a2760849338f05673b9c9e70ba Mon Sep 17 00:00:00 2001 From: Camil Staps Date: Sun, 13 May 2018 01:03:33 +0200 Subject: Cleanup expression compiler; write basic report --- expr.tex | 164 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 164 insertions(+) create mode 100644 expr.tex (limited to 'expr.tex') diff --git a/expr.tex b/expr.tex new file mode 100644 index 0000000..df2800a --- /dev/null +++ b/expr.tex @@ -0,0 +1,164 @@ +\documentclass[a4paper]{article} + +\usepackage{geometry} +\usepackage{parskip} +\usepackage{minted} +\setminted{fontsize=\footnotesize,xleftmargin=2em,linenos=true,style=perldoc} +\newcommand{\coq}[2]{\inputminted[firstline=#1,lastline=#2]{coq}{expr.v}} +\usepackage{csquotes} + +\title{Proving an expression compiler correct} +\author{Camil Staps} + +\begin{document} + +\maketitle + +\begin{abstract} + We define a simple language \textsf{Exp} for arithmetic expressions with variables. + The semantics are executable as a simple interpreter. + We discuss the implementation of an \textsf{Exp} compiler outputting reverse Polish notation. + The compiler is proven correct with respect to the semantics in Coq. +\end{abstract} + +\section{Introduction} + +The \textsf{Exp} compiler, its semantics, the compiler and the proof + are formalised in Coq 8.6. +The Coq file is attached to this report. +The report walks through the file, but occasionally discusses things out of order when this aids understandability. + +The proof uses well-founded induction, which requires a library. +We also use the \textsf{Omega} library for basic arithmetic, + and the \textsf{ListNotations} module for pretty list notation. + +\coq{1}{2} + +\section{The language} + +Expressions in \textsf{Exp} are literals (\textsf{elit}), variables (\textsf{evar}) and additions of other expressions (\textsf{eadd}). +The \textsf{Exp} language is polymorphic in the type of variables: + for example, an expression of type \textsf{Exp nat} is one in which natural numbers operate as variable names. +For readability, this type variable is often omitted in this report. +Thus, we will discuss \textsf{elit 37} rather than \textsf{elit \_ 37}. + +\coq{4}{7} + +The semantics of the language are defined by \textsf{eval}, which is a straightforward recursive interpreter. +It operates with an environment \textsf{env} which holds the values of variables. + +\coq{9}{14} + +\section{Compilation} + +The compiler is a function from \textsf{Exp v} to \textsf{list (RPNOp v)}; \textsf{RPN} stands for reverse Polish notation operator. +The implicit execution model is a simple stack machine. +Execution of a list of RPN instructions succeeds if after execution with an empty stack exactly one element is on the stack. +This element is the result. + +A literal pushes to the stack, as does a variable (after looking up the value in the environment). +The addition operator pops two values from the stack and pushes their sum. +This leads to a straightforward compiler. + +\coq{16}{19} +\coq{21}{26} + +\section{Evaluation of reverse Polish notation programs} + +To evaluate a \textsf{list (RPNOp v)} under a given environment, + we first define a helper function \textsf{pn\_eval}, + which evaluates instructions in (plain) Polish notation and gives the full stack after execution. +Evaluation of arbitrary lists of (R)PN operators may fail,% + \footnote{An example of a failing (R)PN program is \textsf{[rpnadd]}; addition cannot be applied when the stack is empty.} + so the result type is \textsf{option (list nat)}. + +\coq{42}{51} + +In this definition, we make use of two functions from category theory. +First, \textsf{\textless\$\textgreater} lifts a morphism of type \textsf{a $\rightarrow$ b} to type \textsf{option a $\rightarrow$ option b}, + which is possible because \textsf{option} is a functor. +Functors are well-established in functional programming languages. +Second, \textsf{\textgreater\textgreater=} is the monadic bind, again well-known from functional programming. + +\coq{28}{33} +\coq{35}{40} + +Save for some administration, the correctness proof presented below does not change + when using \textsf{\textless\$\textgreater} and \textsf{\textgreater\textgreater=} + instead of handling \textsf{Some} and \textsf{None} explicitly in \textsf{pn\_eval} with pattern matching. +It is merely an enhancement for readability and modularity. + +The final \textsf{rpn\_eval} function is a simple wrapper around \textsf{pn\_eval}. + +\coq{53}{57} + +\section{Proving correctness} + +We now formally prove that the compiler in combination with the RPN evaluator are correct with respect to the semantics of \textsf{Exp}. +In other words, for all variables types \textsf{v}, expressions \textsf{e} and environments \textsf{env}, + the result of \textsf{rpn\_eval v env (rpn v e)} should be \textsf{Some (eval v env e)}. + +\coq{87}{88} + +To prove this we first prove a lemma, operating on \textsf{pn\_eval} rather than \textsf{rpn\_eval}. +Thus, we will show that the result of \textsf{pn\_eval v env (rev (rpn v e))} is \textsf{Some [eval v env e]}. +However, we will prove something more powerful: that this evaluation does not depend on any further RPN instructions. +Thus, if an arbitrary RPN program \textsf{rest} evaluates to the stack \textsf{results} when starting with an empty stack, + \textsf{pn\_eval v env (rev (rpn v e) ++ rest)} should be \textsf{Some (eval v env e :: results)}. + +\coq{59}{63} + +The proof is by well-founded induction on expressions. +For this to work, we first provide a mapping \textsf{Exp v $\rightarrow$ nat} based on the size of expressions. +Thus, there is some \enquote{ever decreasing score} that can be attributed to expressions, making the induction sound. + +\coq{65}{72} + +After introducing the hypotheses, we make a case distinction over \textsf{e}. +The cases for literals and variables are very similar and can be handled by applying the induction hypothesis. +After this, only the case for addition is left. + +\coq{74}{75} + +We now have an RPN program with \textsf{rpnadd} as the last instruction. +Since the program is reversed for use in \textsf{pn\_eval}, we can rewrite the program to get \textsf{rpnadd} at its head. + +\coq{76}{78} + +We now consider the rest of the program. +Suppose the original expression was \textsf{e = eadd e1 e2}, + we now consider \textsf{pn\_eval v env ((rev (rpn v e1) ++ rev (rpn v e2)) ++ rest)} + in the evaluation of \textsf{rpnadd}. + +Due to associativity of \textsf{++}, we can reanalyse this as + \textsf{pn\_eval v env (rev (rpn v e1) ++ (rev (rpn v e2) ++ rest))}, + which allows for application of the induction hypothesis with \textsf{rev (rpn v e2) ++ rest} as the \enquote{rest of the program}, i.e., the \textsf{rest}. + +\coq{79}{81} + +To be able to use the induction hypothesis on \textsf{e1} we have to show that it is strictly smaller than \textsf{eadd e1 e2}. +This is easily done with \textsf{simpl} and \textsf{omega}. +We then apply the induction hypothesis again for \textsf{e2} with \textsf{rest}. +Here too, \textsf{simpl} and \textsf{omega} make the induction sound. +We carry over the fact that \textsf{rest} results in \textsf{results} from the induction hypothesis using \textsf{assumption}. + +\coq{82}{82} + +The proof of the lemma now follows by applying the assertion of the induction step. + +\coq{83}{85} + +We now return to the proof of the correctness theorem, repeated here: + +\coq{87}{88} + +We outsource this proof to our proof on \textsf{pn\_eval} using \textsf{rest = result = []}. + +\coq{89}{93} + +The result of the call to \textsf{pn\_eval} from \textsf{rpn\_eval} is now known, + and we can simplify the \textsf{rpn\_eval} call further to the point of reflexivity. + +\coq{94}{97} + +\end{document} -- cgit v1.2.3