8 files changed, 218 insertions, 55 deletions
diff --git a/thesis/Makefile b/thesis/Makefile
index f8cec89..3dc4522 100644
--- a/thesis/Makefile
+++ b/thesis/Makefile
@@ -26,5 +26,7 @@ all: $(DOC).pdf
 	$(RM) $(basename $<).mlog
 
 clean:
-	$(RM) $(addprefix $(DOC).,aux bbl bcf blg fmt log mlog run.xml out pdf toc)\
-		$(DOC)-blx.bib
+	$(RM) -r\
+		$(addprefix $(DOC).,aux bbl bcf blg fmt log mlog run.xml out pdf toc)\
+		$(DOC)-blx.bib\
+		_minted-$(DOC)
diff --git a/thesis/intro.tex b/thesis/intro.tex
index 9299305..85f13b8 100644
--- a/thesis/intro.tex
+++ b/thesis/intro.tex
@@ -116,23 +116,167 @@ A Clean program consists of a list of rewrite rules, for example:
 
 \begin{minted}{clean}
 	fac 0 = 1
-	fac n = n * (fac (n - 1))
+	fac n = n * fac (n - 1)
 	Start = fac 4
 \end{minted}
 
 Executing a Clean program means rewriting the \clean{Start} rule until no rewriting is possible any more.
 The first rewriting steps of the above program are shown in \cref{fig:intro:rewriting}.
 
-A Clean program is compiled to ABC-code, a platform-independent bytecode for an abstract graph rewriting machine~\parencite{execspecs}.
+\subsubsection{The ABC-machine}
+\label{sec:intro:clean:abc}
+A Clean program is compiled to ABC-code, a platform-independent language for the ABC-machine, an abstract graph rewriting machine~\parencite{execspecs}.
 A code generator is used to generate machine code equivalent to the ABC-code for concrete machines.
 This, again, is done in several steps, so that only a relatively small part of the compilation process is target-dependent.
-For adapting the ARM code generator to work for Thumb-2 one needs to have only very basic knowledge of the ABC-machine.
-For this reason we will not discuss its internals here ---
+
+The ABC-machine is \enquote{an imperative abstract machine architecture for graph rewriting}~\parencite{execspecs}.
+It consists of three stacks: for arguments (A), basic values (B) and control information like return addresses (C).
+It also has a program store containing the graph rewriting rules that make up the program,
+	a graph store that contains the graph to be rewritten,
+	and several other elements that we need not mention here.
+
+We do not discuss the internals of the ABC-machine in much depth here ---
 	they have been described in \cite{execspecs} and \cite{m680x0}.
+However, to get a better idea of the working of the ABC-machine,
+	we consider the ABC-code for the factorial example above briefly.
+
+The two rules for \clean{fac 0} and \clean{fac n} are translated to the following ABC-code%
+	\footnote{The actual code generated by the compiler is slightly larger, since it uses some unnecessary instructions.
+		They are removed automatically by the code generator and are not considered here for brevity.}:
+
+\begin{minted}[tabsize=4]{abc}
+	sfac.1
+		eqI_b 0 0       | Is argument on B-stack 0?
+		jmp_true case.1 | Jump to first alternative
+		jmp case.2      | Jump to second alternative
+	case.1              | First alternative (fac 0)
+		pop_b 1         | Pop argument from B-stack
+		pushI 1         | Push result to B-stack
+		rtn             | Return
+	case.2              | Second alternative (fac 1)
+		pushI 1         | Push 1 to B-stack
+		push_b 1        | Copy argument to B-stack
+		subI            | Subtract on B-stack
+		jsr sfac.1      | Call factorial
+		mulI            | Multiply two stack elements
+		rtn             | Return
+\end{minted}
+
+The code under \abc{sfac.1} is for the pattern match on the first argument.
+In \abc{case.1}, we handle the case that it is zero.
+We only need to remove it from the stack and push the return value 1.
+In \abc{case.2}, we copy the argument, decrement it and recursively call \abc{sfac.1}.
+After the \abc{jsr} instruction, we will have two elements on the stack:
+	the result of the recursive call and the original argument.
+We can the multiply them and return.
+
+This function uses the B-stack, for basic values, for its arguments and return values, because it uses only integers.
+More complex types would be passed on the A-stack.
+
+The \clean{Start} rule compiles to%
+	\footnote{Again, the code has been shorted insignificantly for brevity.}:
+
+\begin{minted}[tabsize=4]{abc}
+	__fac_Start
+		build _ 0 nStart.2 | Build the Start node
+		jmp _driver        | Jump to the RTS
+	nStart.2               | The Start node entry
+		jsr eaStart.2      | Jump to actual code
+		fillI_b 0 0        | Copy result to A-stack
+		pop_b 1            | Pop B-stack
+		rtn                | Return
+	eaStart.2              | Actual code
+		pushI 4            | Push 4 to B-stack
+		jmp sfac.1         | Call factorial
+\end{minted}
+
+When the program starts and \abc{__fac_Start} is executed, a node for \clean{Start} is created in the graph store.
+We then jump to the run-time system (discussed in \cref{sec:intro:clean:rts}), where the \abc{_driver} subroutine will make sure that this node is reduced and printed.
+In \abc{nStart.2}, we call \abc{eaStart.2}.
+This subroutine will execute \clean{fac 4} and return the result on the B-stack.
+Since \abc{nStart.2} is expected to return its result on the A-stack, we need to copy it and clean up the B-stack.
+In \abc{eaStart.2}, we only need to set the argument for the factorial, 4, on the stack, and we jump to \abc{sfac.1} to let it do the rest.
+
+\subsubsection{The code generator}
+\label{sec:intro:clean:cg}
+The code generator translates the abstract ABC-code into concrete machine code.
+While the ABC-machine can be implemented in a straightforward manner using macro expansion,
+	Clean's code generator does more than that:
+	it introduces several optimisations, some of which are target-dependent.
+
+The ARM code for the factorial example is as follows%
+	\footnote{Some irrelevant peculiarities have been removed for brevity and clarity.}:
+
+\begin{minted}{ual}
+	sfac_P1:
+	  cmp  r4,#0        @ Is argument 0?
+	  bne  case_P2
+	case_P1:
+	  mov  r4,#1        @ Return 1
+	  ldr  pc,[sp],#4
+	case_P2:
+	  str  r4,[sp,#-4]! @ Copy argument
+	  add  r4,r4,#-1    @ Decrement argument
+	  str  pc,[sp,#-4]!
+	  bl   sfac_P1      @ Call factorial
+	  ldr  r12,[sp],#4  @ Multiply with own argument
+	  mul  r4,r12,r4
+	  ldr  pc,[sp],#4
+\end{minted}
+
+We see that the argument is passed in \ual{r4}, and that register is also used for the result.
+In ARM, \ual{r4} through \ual{r0} are used for the top of the B-stack.
+The machine stack pointer \ual{sp} is used as the B-stack pointer.
+For intermediate results, \ual{r12} is used as a scratch register.
+When a reduction has finished, control is passed back to the callee by loading the PC from the stack (\ual{ldr pc,[sp],#4}).
+
+The \clean{Start} rule translates roughly to:
+
+\begin{minted}{ual}
+	____fac__Start:
+	  ldr  r12,=nStart_P2 @ Store Start node on the heap
+	  str  r12,[r10]
+	  mov  r6,r10         @ Save node pointer on A-stack
+	  add  r10,r10,#12    @ Reserve heap space
+	  b    __driver       @ Jump to RTS
+	nStart_P2:
+	  str  r6,[r9],#4     @ Save node entry
+	  str  pc,[sp,#-4]!   @ Save return address
+	  bl   eaStart_P2     @ Jump to actual code
+	  ldr  r6,[r9,#-4]!   @ Restore node entry
+	  ldr  r12,=INT+2     @ Make an Int node on A-stack
+	  str  r12,[r6]
+	  str  r4,[r6,#4]     @ Give that node value r4
+	  ldr  pc,[sp],#4     @ Return to callee
+	eaStart_P2:
+	  mov  r4,#4          @ Push 4 to B-stack
+	sfac_P1               @ A jump to sfac_P1 is not
+	  @ ...               @ needed as it is right below
+\end{minted}
+
+For the A-stack, \ual{r6} through \ual{r8} and \ual{r11} are used, and \ual{r9} is the A-stack pointer.
+Nodes (the graph store of the ABC-machine) are stored on a heap which uses \ual{r10} as a pointer.
+To store the \clean{Start} node on the heap, we need to write its address and then increase \ual{r10} to reserve the space.
+The \ual{__driver} subroutine will reduce the top of the A-stack, so saving the pointer to the \clean{Start} node in \ual{r6}, the top of the A-stack, makes sure that \ual{__driver} jumps to \ual{nStart_P2}.
+There, we do some administration and jump to \ual{eaStart_P2}.
+In that routine, the literal \ual{4} is pushed to the B-stack and we \enquote{jump} to \ual{sfac_P1} ---
+	the code generator optimises this by placing \ual{sfac_P1} right below and removing the branch instruction.
+When \ual{sfac_P1} returns, we continue in \ual{nStart_P2}.
+There we need to copy the result from the B-stack to the A-stack.
+Since the B-stack is untyped (the compiler makes sure that it is safe), while the A-stack is typed,
+	we need to create a node on the heap with the \ual{INT+2} entry address.
+
+Note that ABC instructions like \abc{pushI 4} (push 4 to the B-stack) are translated to ARM code like \ual{mov r4,#4}:
+	the content of the B-stack is not moved down, as one might expect.
+This is possible because the code generator knows that the B-stack will be empty at this point.
+Had it not been empty, but had one element, for example, a \ual{mov r3,r4} instruction would have preceded to move it down.
+The actual ABC-code contains annotations that tell the code generator how many elements the stacks hold, so that these optimisations can be made.
 
+\subsubsection{The run-time system}
+\label{sec:intro:clean:rts}
 After compilation, a Clean program is linked together with the Clean run-time system.
 The RTS ensures that that the \clean{Start} node is reduced and printed and takes care of garbage collecting.
-This system is written in C and assembly, so making Clean available on Thumb-2 inherently means adapting the platform-dependent parts of the RTS as well.
+This system is written in Clean, C and assembly, so making Clean available on Thumb-2 inherently means adapting the platform-dependent parts of the RTS as well.
 
 \subsection{A Thumb backend for Clean}
 \label{sec:intro:clean-thumb}
diff --git a/thesis/preamble.tex b/thesis/preamble.tex
index 636d2ad..ae16a89 100644
--- a/thesis/preamble.tex
+++ b/thesis/preamble.tex
@@ -1,6 +1,13 @@
 \documentclass[a4paper,twoside]{article}
 
-\usepackage[hmarginratio={3:2}]{geometry}
+\newif\ifdraft
+\drafttrue
+
+\ifdraft
+	\usepackage[inner=2cm,outer=2cm,top=2cm,bottom=25mm]{geometry}
+\else
+	\usepackage[hmarginratio={3:2}]{geometry}
+\fi
 \usepackage{multicol}
 
 \usepackage[T1]{fontenc}
@@ -30,27 +37,16 @@
 \usepackage{latexgit}
 
 \usepackage{minted}
-\usepackage{mdframed}
-% Need to patch mdframed, see http://tex.stackexchange.com/a/292090/23992
-\usepackage{xpatch}
-\makeatletter
-\xpatchcmd{\endmdframed}
-  {\aftergroup\endmdf@trivlist\color@endgroup}
-  {\endmdf@trivlist\color@endgroup\@doendpe}
-  {}{}
-\makeatother
 \definecolor{mintedbg}{rgb}{0.95,0.95,0.95}
-\surroundwithmdframed[%
-	backgroundcolor=mintedbg,
-	linewidth=0pt,
-	skipabove=.5em,skipbelow=.5em]{minted}
 \setminted{%
 	autogobble,
 	tabsize=4,
 	fontsize=\scriptsize,
-	style=lovelace
-	%bgcolor=mintedbg
+	style=lovelace,
+	bgcolor=mintedbg
 }
+\BeforeBeginEnvironment{minted}{\noindent\begin{minipage}{\linewidth}}
+\AfterEndEnvironment{minted}{\end{minipage}}
 \setmintedinline{fontsize=\footnotesize,bgcolor={}}
 \usepackage{ual}
 \usepackage{clean}
@@ -74,7 +70,7 @@
 \makeatother
 
 \let\oldsection\section
-\renewcommand{\section}{\cleardoublepage\oldsection}
+\renewcommand{\section}{\ifdraft\else\clearpage\fi\oldsection}
 
 \raggedcolumns
 \renewcommand{\baselinestretch}{1.1} 
diff --git a/thesis/results.tex b/thesis/results.tex
index 8ffb2a7..cfd043d 100644
--- a/thesis/results.tex
+++ b/thesis/results.tex
@@ -5,8 +5,10 @@
 
 \subsection{Code size}
 \label{sec:results:size}
+\todo{\dots}
 
 \subsection{Running time}
 \label{sec:results:time}
+\todo{\dots}
 
 \end{multicols}
diff --git a/thesis/storing-pc.tex b/thesis/storing-pc.tex
index b642ee9..e08fb4d 100644
--- a/thesis/storing-pc.tex
+++ b/thesis/storing-pc.tex
@@ -65,19 +65,19 @@ Since \clean{isEmpty} pattern matches on its first argument, it needs to be eval
 This is done with \abc{jsr_eval 0}.
 Only after that can we check if its constructor is \abc{_Nil} (the empty list), which is done with \abc{eq_desc _Nil 0 0}.
 
-\begin{minted}[tabsize=4]{text}
+\begin{minted}[tabsize=4]{abc}
 	sisEmpty.1
-		jsr_eval 0        || Evaluate argument
-		eq_desc _Nil 0 0  || If it equals []
-		jmp_true case.1   || .. jump to case.1
-		jmp case.2        || [else] to case.2
+		jsr_eval 0        | Evaluate argument
+		eq_desc _Nil 0 0  | If it equals []
+		jmp_true case.1   | .. jump to case.1
+		jmp case.2        | [else] to case.2
 	case.1
-		pop_a 1           || Pop argument
-		pushB TRUE        || Return True
+		pop_a 1           | Pop argument
+		pushB TRUE        | Return True
 		rtn
 	case.2
-		pop_a 1           || Pop argument
-		pushB FALSE       || Return False
+		pop_a 1           | Pop argument
+		pushB FALSE       | Return False
 		rtn
 \end{minted}
 
diff --git a/thesis/thesis.bib b/thesis/thesis.bib
index f7e5134..30f945c 100644
--- a/thesis/thesis.bib
+++ b/thesis/thesis.bib
@@ -1,24 +1,21 @@
 @manual{armv7ar,
 	label="ARM Ltd.",
-	key="ARM Ltd.",
+	key="ARM Ltd. 1996",
 	title="ARM Architecture Reference Manual. ARMv7-A and ARMv7-R edition",
-	organization="ARM Ltd.",
 	year=1996
 }
 
 @manual{armv7m,
 	label="ARM Ltd.",
-	key="ARM Ltd.",
+	key="ARM Ltd. 2006",
 	title="ARMv7-M Architecture Reference Manual",
-	organization="ARM Ltd.",
 	year=2006
 }
 
 @manual{armv8a,
 	label="ARM Ltd.",
-	key="ARM Ltd.",
+	key="ARM Ltd. 2013",
 	title="ARM Architecture Reference Manual. ARMv8, for ARMv8-A architecture profile (Beta)",
-	organization="ARM Ltd.",
 	year=2013
 }
 
diff --git a/thesis/thesis.tex b/thesis/thesis.tex
index d2f1e06..8760159 100644
--- a/thesis/thesis.tex
+++ b/thesis/thesis.tex
@@ -1,19 +1,25 @@
 %&thesis
 \begin{document}
 
-\maketitleru[
-	course={Bachelor Thesis\\[.4em]Computer Science},
-	authorstext={Author:},
-	authors={Camil Staps},
-	righttextheader={First supervisor:},
-	righttext={prof.~dr.~dr.h.c.~ir.~M.J.~Plasmeijer},
-	righttextBheader={Second supervisor:},
-	righttextB={drs.~J.H.G.~van~Groningen},
-	pagenr=1]
-
-\setcounter{page}{2}
+\ifdraft
+	\begin{center}
+		{\Large Bachelor Thesis}
 
-\cleardoublepage
+		\textcolor{red}{\emph{This is a draft version. The layout has been changed to save paper.}}
+	\end{center}
+\else
+	\maketitleru[
+		course={Bachelor Thesis\\[.4em]Computer Science},
+		authorstext={Author:},
+		authors={Camil Staps},
+		righttextheader={First supervisor:},
+		righttext={prof.~dr.~dr.h.c.~ir.~M.J.~Plasmeijer},
+		righttextBheader={Second supervisor:},
+		righttextB={drs.~J.H.G.~van~Groningen},
+		pagenr=1]
+	\setcounter{page}{2}
+	\cleardoublepage
+\fi
 
 \begin{abstract}
 The Thumb-2 instruction set combines the best features of the ARM and Thumb instruction sets (speed and small code size, respectively).
@@ -23,26 +29,41 @@ Specifically, we look at code generation for the purely functional programming l
 \todo{results, efficiency, etc.}
 \end{abstract}
 
+\ifdraft\else
 \cleardoublepage
+\fi
 
-\thispagestyle{empty}
-\tableofcontents
+\ifdraft
+	\begin{multicols}{2}
+		\tableofcontents
+	\end{multicols}
+\else
+	\thispagestyle{empty}
+	\tableofcontents
+\fi
 
+\ifdraft\else
 \cleardoublepage
+\fi
 
 \input{intro}
 \input{storing-pc}
 \input{two-bits}
 \input{results}
 
+\ifdraft\else
 \cleardoublepage
+\fi
 \appendix
 \input{abbr}
 
+\ifdraft\else
 \cleardoublepage
+\fi
 
 \let\oldurl\url
 \renewcommand{\url}[1]{{\small\oldurl{#1}}}
+\phantomsection\addcontentsline{toc}{section}{\bibname}
 \printbibliography
 
 \end{document}
diff --git a/thesis/two-bits.tex b/thesis/two-bits.tex
index fcd4046..56e7c45 100644
--- a/thesis/two-bits.tex
+++ b/thesis/two-bits.tex
@@ -1,4 +1,4 @@
-\section{Two bits}
+\section{Code addresses}
 \label{sec:two-bits}
 
 \begin{multicols}{2}
@@ -44,7 +44,8 @@ Flipping the meaning of the LSB in the copying collector fixes this issue.
 
 \subsection{Comparison}
 \label{sec:two-bits:comparison}
-Flipping the meaning of the LSB in the garbage collector amounts to swapping \ual{bne} and \ual{beq} and similar changes that do not effect the program's efficiency.
+Flipping the meaning of the LSB in the garbage collector amounts to swapping \ual{bne} and \ual{beq} and similar changes that do not effect the program's efficiency or size.
+
 By word-aligning all node entry addresses we lose one alignment byte per node entry address on average
 	(assuming that half of the node entry points are word-aligned already).
 This increases code size slightly, but since many instructions that were 32-bit in ARM are now 16-bit, the overall code size is still smaller.