summaryrefslogtreecommitdiff
path: root/thesis/storing-pc.tex
diff options
context:
space:
mode:
Diffstat (limited to 'thesis/storing-pc.tex')
-rw-r--r--thesis/storing-pc.tex150
1 files changed, 150 insertions, 0 deletions
diff --git a/thesis/storing-pc.tex b/thesis/storing-pc.tex
new file mode 100644
index 0000000..98b03ae
--- /dev/null
+++ b/thesis/storing-pc.tex
@@ -0,0 +1,150 @@
+\section{Storing the program counter}
+\label{sec:storing-pc}
+
+\begin{multicols}{2}
+
+\subsection{Introduction}
+\label{sec:storing-pc:intro}
+Storing the program counter on the stack is something done commonly in many languages during a function call.
+Usually, an offset to the actual program counter at the moment of the store instruction is stored,
+ to allow for a branch after that instruction, and have the callee return to the address after that branch.
+The ARM architecture accommodates for this: an ARM instruction that reads the program counter, actually reads
+ \enquote{the address of the current instruction plus 8}~\parencite[A2.3]{armv7ar}.
+The following ARM assembly example illustrates this (\texttt{armstartup.s:540-2},~\cite{armrts}):
+
+\begin{minted}{ual}
+ str pc,[sp,#-4]! @ 0x20
+ bl init_clean @ 0x24
+ tst r4,r4 @ 0x28
+\end{minted}
+
+Dummy addresses have been indicated in comments.
+When execution arrives at \ual{0x20}, the program counter is set to \ual{0x20}.
+Per the above documentation, \ual{str} stores \ual{0x20+8} on the stack
+ (to be precise: to the address indicated by the stack pointer with pre-indexed offset \ual{-4}).
+The processor branches to \ual{init_clean},
+ which loads the value from the stack back into the program counter to return to the caller.
+The program counter is then \ual{0x28}.
+For the next instruction cycle, the \ual{tst} command is executed.
+
+There are two reasons why the above cannot be used in Thumb-2 code.
+First, \ual{pc} is not allowed as the first operand of a \ual{str} instruction.
+
+The second problem we meet is that the instruction to store the program counter may be halfword-aligned rather than word-aligned.
+We saw above that a read of the program counter in ARM mode reads as PC+8.
+In Thumb mode this is more complicated.
+In this case, we
+ \enquote{[r]ead the word-aligned PC value, that is,
+ the address of the current instruction + 4,
+ with bits [1:0] forced to zero}~\parencite[A5.1.2]{armv7m}.
+This means that when the \ual{add} instruction above is at \ual{0x22},
+ we will still store \ual{0x2d} on the stack,
+ since the word-aligned program counter is \ual{0x20} as before.
+However, in this case \ual{bl} is located at \ual{0x2a}, and since this is a 32 bits instruction we point to the middle of that instruction.
+
+\subsection{Usage in Clean}
+\label{sec:storing-pc:clean}
+Storing the PC is required for jumping to subroutines.
+We see a clear example of this when a Clean function definition uses pattern matching.
+Consider the following example:
+
+\begin{minted}{clean}
+ isEmpty :: [a] -> Bool
+ isEmpty [] = True
+ isEmpty _ = False
+\end{minted}
+
+The generated ABC-code looks as below%
+ \footnote{The current Clean compiler misses the \abc{jsr_eval 0} line:
+ the strictness analyser recognises that \clean{isEmpty} is strict in its first argument,
+ so evaluating the argument to head normal form is not needed any more.
+ In cases where the strictness analyser cannot derive strictness,
+ code similar to this example is generated.
+ The code given here can be reproduced with \bash{clm -nsa}.}.
+Since \clean{isEmpty} pattern matches on its first argument, it needs to be evaluated to head normal form.
+This is done with \abc{jsr_eval 0}.
+Only after that can we check if its constructor is \abc{_Nil} (the empty list), which is done with \abc{eq_desc _Nil 0 0}.
+
+\begin{minted}[tabsize=4]{text}
+ sisEmpty.1
+ jsr_eval 0 || Evaluate argument
+ eq_desc _Nil 0 0 || If it equals []
+ jmp_true case.1 || .. jump to case.1
+ jmp case.2 || [else] to case.2
+ case.1
+ pop_a 1 || Pop argument
+ pushB TRUE || Return True
+ rtn
+ case.2
+ pop_a 1 || Pop argument
+ pushB FALSE || Return False
+ rtn
+\end{minted}
+
+Evaluating the argument is done by jumping to the subroutine indicated by the node entry of that argument.
+Hence, we store the PC, jump to that address, and continue with \abc{eq_desc _Nil 0 0} when the node has been evaluated to head normal form.
+The ARM code for pattern matching is:
+
+\begin{minted}{ual}
+ sisEmpty_P1:
+ ldr r12,[r6] @ Load node entry point
+ tst r12,#2 @ If in HNF already
+ bne e_0 @ Skip evaluation
+ str pc,[sp,#-4]! @ Store PC
+ blx r12 @ Evaluate argument
+ e_0:
+ ldr r12,[r6] @ If it does not equal []
+ ldr r14,=__Nil+2
+ cmp r12,r14
+ bne case_P2
+\end{minted}
+
+We can see here
+ (a) that evaluating a node requires a \abc{jsr_eval} ABC instruction, and
+ (b) that \abc{jsr} ABC instructions require storing the PC in ARM assembly.
+
+Of course, evaluating nodes is something that happens throughout the source code and has to be done all the time during the execution of a Clean program.
+We therefore need a fast, small Thumb alternative for the ARM code.
+
+\subsection{Solution}
+\label{sec:storing-pc:solution}
+To solve the first problem, we need to first move \ual{pc} to an auxiliary register, and then push that on the stack.
+We then get, for the example from \texttt{armstartup.s:540-2}:
+
+\begin{minted}{ual}
+ add lr,pc,#9 @ 0x20
+ str lr,[sp,#-4]! @ 0x24
+ bl init_clean @ 0x28
+ tst r4,r4 @ 0x2c
+\end{minted}
+
+We store the value \ual{0x2d}.
+This address points to the \ual{tst} instruction as before, with the LSB set to 1 to indicate Thumb mode.
+
+In assembly code, we can solve this by adding labels for the addresses we want to store on the stack.
+In object code, we need to keep track of the current alignment and add either 9 or 11 to the read program counter.
+
+The offset, 9, is calculated as the number of bytes to the instruction after the branch plus one for Thumb mode.
+For \ual{b} and \ual{bl} instructions, this means an offset of 9, since these instructions are 32-bit.
+The \ual{bx} and \ual{blx} instructions are 16-bit, and require an offset of 7.
+
+\subsection{Other solutions}
+\label{sec:storing-pc:other-solutions}
+Another solution than the one we present makes use of the link register.
+Some branch instructions, like \ual{bl}, store the address of the next instruction in the link register.
+We could therefore imagine a setup where the callee gets the return address from that register rather than from the stack.
+This is the approach taken by GCC.
+The code of a typical C subroutine starts with \ual{push {...,lr}} and ends with \ual{pop {...,pc}}.
+
+When generating code for a functional language, it is not straightforward to do this, due to tail recursion.
+It is an easier solution to have the caller responsible for storing the return address,
+ which is why this approach is taken in Clean's ARM code generator~\parencite{armcg}
+ and why we continue along these lines for the Thumb backend.
+
+\subsection{Comparison}
+To be done. %TODO
+
+For every occurrence: +1 word code size;
+for every function call: +1 instruction.
+
+\end{multicols}