summaryrefslogtreecommitdiff
path: root/thesis/intro.tex
diff options
context:
space:
mode:
Diffstat (limited to 'thesis/intro.tex')
-rw-r--r--thesis/intro.tex154
1 files changed, 149 insertions, 5 deletions
diff --git a/thesis/intro.tex b/thesis/intro.tex
index 9299305..85f13b8 100644
--- a/thesis/intro.tex
+++ b/thesis/intro.tex
@@ -116,23 +116,167 @@ A Clean program consists of a list of rewrite rules, for example:
\begin{minted}{clean}
fac 0 = 1
- fac n = n * (fac (n - 1))
+ fac n = n * fac (n - 1)
Start = fac 4
\end{minted}
Executing a Clean program means rewriting the \clean{Start} rule until no rewriting is possible any more.
The first rewriting steps of the above program are shown in \cref{fig:intro:rewriting}.
-A Clean program is compiled to ABC-code, a platform-independent bytecode for an abstract graph rewriting machine~\parencite{execspecs}.
+\subsubsection{The ABC-machine}
+\label{sec:intro:clean:abc}
+A Clean program is compiled to ABC-code, a platform-independent language for the ABC-machine, an abstract graph rewriting machine~\parencite{execspecs}.
A code generator is used to generate machine code equivalent to the ABC-code for concrete machines.
This, again, is done in several steps, so that only a relatively small part of the compilation process is target-dependent.
-For adapting the ARM code generator to work for Thumb-2 one needs to have only very basic knowledge of the ABC-machine.
-For this reason we will not discuss its internals here ---
+
+The ABC-machine is \enquote{an imperative abstract machine architecture for graph rewriting}~\parencite{execspecs}.
+It consists of three stacks: for arguments (A), basic values (B) and control information like return addresses (C).
+It also has a program store containing the graph rewriting rules that make up the program,
+ a graph store that contains the graph to be rewritten,
+ and several other elements that we need not mention here.
+
+We do not discuss the internals of the ABC-machine in much depth here ---
they have been described in \cite{execspecs} and \cite{m680x0}.
+However, to get a better idea of the working of the ABC-machine,
+ we consider the ABC-code for the factorial example above briefly.
+
+The two rules for \clean{fac 0} and \clean{fac n} are translated to the following ABC-code%
+ \footnote{The actual code generated by the compiler is slightly larger, since it uses some unnecessary instructions.
+ They are removed automatically by the code generator and are not considered here for brevity.}:
+
+\begin{minted}[tabsize=4]{abc}
+ sfac.1
+ eqI_b 0 0 | Is argument on B-stack 0?
+ jmp_true case.1 | Jump to first alternative
+ jmp case.2 | Jump to second alternative
+ case.1 | First alternative (fac 0)
+ pop_b 1 | Pop argument from B-stack
+ pushI 1 | Push result to B-stack
+ rtn | Return
+ case.2 | Second alternative (fac 1)
+ pushI 1 | Push 1 to B-stack
+ push_b 1 | Copy argument to B-stack
+ subI | Subtract on B-stack
+ jsr sfac.1 | Call factorial
+ mulI | Multiply two stack elements
+ rtn | Return
+\end{minted}
+
+The code under \abc{sfac.1} is for the pattern match on the first argument.
+In \abc{case.1}, we handle the case that it is zero.
+We only need to remove it from the stack and push the return value 1.
+In \abc{case.2}, we copy the argument, decrement it and recursively call \abc{sfac.1}.
+After the \abc{jsr} instruction, we will have two elements on the stack:
+ the result of the recursive call and the original argument.
+We can the multiply them and return.
+
+This function uses the B-stack, for basic values, for its arguments and return values, because it uses only integers.
+More complex types would be passed on the A-stack.
+
+The \clean{Start} rule compiles to%
+ \footnote{Again, the code has been shorted insignificantly for brevity.}:
+
+\begin{minted}[tabsize=4]{abc}
+ __fac_Start
+ build _ 0 nStart.2 | Build the Start node
+ jmp _driver | Jump to the RTS
+ nStart.2 | The Start node entry
+ jsr eaStart.2 | Jump to actual code
+ fillI_b 0 0 | Copy result to A-stack
+ pop_b 1 | Pop B-stack
+ rtn | Return
+ eaStart.2 | Actual code
+ pushI 4 | Push 4 to B-stack
+ jmp sfac.1 | Call factorial
+\end{minted}
+
+When the program starts and \abc{__fac_Start} is executed, a node for \clean{Start} is created in the graph store.
+We then jump to the run-time system (discussed in \cref{sec:intro:clean:rts}), where the \abc{_driver} subroutine will make sure that this node is reduced and printed.
+In \abc{nStart.2}, we call \abc{eaStart.2}.
+This subroutine will execute \clean{fac 4} and return the result on the B-stack.
+Since \abc{nStart.2} is expected to return its result on the A-stack, we need to copy it and clean up the B-stack.
+In \abc{eaStart.2}, we only need to set the argument for the factorial, 4, on the stack, and we jump to \abc{sfac.1} to let it do the rest.
+
+\subsubsection{The code generator}
+\label{sec:intro:clean:cg}
+The code generator translates the abstract ABC-code into concrete machine code.
+While the ABC-machine can be implemented in a straightforward manner using macro expansion,
+ Clean's code generator does more than that:
+ it introduces several optimisations, some of which are target-dependent.
+
+The ARM code for the factorial example is as follows%
+ \footnote{Some irrelevant peculiarities have been removed for brevity and clarity.}:
+
+\begin{minted}{ual}
+ sfac_P1:
+ cmp r4,#0 @ Is argument 0?
+ bne case_P2
+ case_P1:
+ mov r4,#1 @ Return 1
+ ldr pc,[sp],#4
+ case_P2:
+ str r4,[sp,#-4]! @ Copy argument
+ add r4,r4,#-1 @ Decrement argument
+ str pc,[sp,#-4]!
+ bl sfac_P1 @ Call factorial
+ ldr r12,[sp],#4 @ Multiply with own argument
+ mul r4,r12,r4
+ ldr pc,[sp],#4
+\end{minted}
+
+We see that the argument is passed in \ual{r4}, and that register is also used for the result.
+In ARM, \ual{r4} through \ual{r0} are used for the top of the B-stack.
+The machine stack pointer \ual{sp} is used as the B-stack pointer.
+For intermediate results, \ual{r12} is used as a scratch register.
+When a reduction has finished, control is passed back to the callee by loading the PC from the stack (\ual{ldr pc,[sp],#4}).
+
+The \clean{Start} rule translates roughly to:
+
+\begin{minted}{ual}
+ ____fac__Start:
+ ldr r12,=nStart_P2 @ Store Start node on the heap
+ str r12,[r10]
+ mov r6,r10 @ Save node pointer on A-stack
+ add r10,r10,#12 @ Reserve heap space
+ b __driver @ Jump to RTS
+ nStart_P2:
+ str r6,[r9],#4 @ Save node entry
+ str pc,[sp,#-4]! @ Save return address
+ bl eaStart_P2 @ Jump to actual code
+ ldr r6,[r9,#-4]! @ Restore node entry
+ ldr r12,=INT+2 @ Make an Int node on A-stack
+ str r12,[r6]
+ str r4,[r6,#4] @ Give that node value r4
+ ldr pc,[sp],#4 @ Return to callee
+ eaStart_P2:
+ mov r4,#4 @ Push 4 to B-stack
+ sfac_P1 @ A jump to sfac_P1 is not
+ @ ... @ needed as it is right below
+\end{minted}
+
+For the A-stack, \ual{r6} through \ual{r8} and \ual{r11} are used, and \ual{r9} is the A-stack pointer.
+Nodes (the graph store of the ABC-machine) are stored on a heap which uses \ual{r10} as a pointer.
+To store the \clean{Start} node on the heap, we need to write its address and then increase \ual{r10} to reserve the space.
+The \ual{__driver} subroutine will reduce the top of the A-stack, so saving the pointer to the \clean{Start} node in \ual{r6}, the top of the A-stack, makes sure that \ual{__driver} jumps to \ual{nStart_P2}.
+There, we do some administration and jump to \ual{eaStart_P2}.
+In that routine, the literal \ual{4} is pushed to the B-stack and we \enquote{jump} to \ual{sfac_P1} ---
+ the code generator optimises this by placing \ual{sfac_P1} right below and removing the branch instruction.
+When \ual{sfac_P1} returns, we continue in \ual{nStart_P2}.
+There we need to copy the result from the B-stack to the A-stack.
+Since the B-stack is untyped (the compiler makes sure that it is safe), while the A-stack is typed,
+ we need to create a node on the heap with the \ual{INT+2} entry address.
+
+Note that ABC instructions like \abc{pushI 4} (push 4 to the B-stack) are translated to ARM code like \ual{mov r4,#4}:
+ the content of the B-stack is not moved down, as one might expect.
+This is possible because the code generator knows that the B-stack will be empty at this point.
+Had it not been empty, but had one element, for example, a \ual{mov r3,r4} instruction would have preceded to move it down.
+The actual ABC-code contains annotations that tell the code generator how many elements the stacks hold, so that these optimisations can be made.
+\subsubsection{The run-time system}
+\label{sec:intro:clean:rts}
After compilation, a Clean program is linked together with the Clean run-time system.
The RTS ensures that that the \clean{Start} node is reduced and printed and takes care of garbage collecting.
-This system is written in C and assembly, so making Clean available on Thumb-2 inherently means adapting the platform-dependent parts of the RTS as well.
+This system is written in Clean, C and assembly, so making Clean available on Thumb-2 inherently means adapting the platform-dependent parts of the RTS as well.
\subsection{A Thumb backend for Clean}
\label{sec:intro:clean-thumb}