diff options
Diffstat (limited to 'thesis/storing-pc.tex')
-rw-r--r-- | thesis/storing-pc.tex | 150 |
1 files changed, 150 insertions, 0 deletions
diff --git a/thesis/storing-pc.tex b/thesis/storing-pc.tex new file mode 100644 index 0000000..98b03ae --- /dev/null +++ b/thesis/storing-pc.tex @@ -0,0 +1,150 @@ +\section{Storing the program counter} +\label{sec:storing-pc} + +\begin{multicols}{2} + +\subsection{Introduction} +\label{sec:storing-pc:intro} +Storing the program counter on the stack is something done commonly in many languages during a function call. +Usually, an offset to the actual program counter at the moment of the store instruction is stored, + to allow for a branch after that instruction, and have the callee return to the address after that branch. +The ARM architecture accommodates for this: an ARM instruction that reads the program counter, actually reads + \enquote{the address of the current instruction plus 8}~\parencite[A2.3]{armv7ar}. +The following ARM assembly example illustrates this (\texttt{armstartup.s:540-2},~\cite{armrts}): + +\begin{minted}{ual} + str pc,[sp,#-4]! @ 0x20 + bl init_clean @ 0x24 + tst r4,r4 @ 0x28 +\end{minted} + +Dummy addresses have been indicated in comments. +When execution arrives at \ual{0x20}, the program counter is set to \ual{0x20}. +Per the above documentation, \ual{str} stores \ual{0x20+8} on the stack + (to be precise: to the address indicated by the stack pointer with pre-indexed offset \ual{-4}). +The processor branches to \ual{init_clean}, + which loads the value from the stack back into the program counter to return to the caller. +The program counter is then \ual{0x28}. +For the next instruction cycle, the \ual{tst} command is executed. + +There are two reasons why the above cannot be used in Thumb-2 code. +First, \ual{pc} is not allowed as the first operand of a \ual{str} instruction. + +The second problem we meet is that the instruction to store the program counter may be halfword-aligned rather than word-aligned. +We saw above that a read of the program counter in ARM mode reads as PC+8. +In Thumb mode this is more complicated. +In this case, we + \enquote{[r]ead the word-aligned PC value, that is, + the address of the current instruction + 4, + with bits [1:0] forced to zero}~\parencite[A5.1.2]{armv7m}. +This means that when the \ual{add} instruction above is at \ual{0x22}, + we will still store \ual{0x2d} on the stack, + since the word-aligned program counter is \ual{0x20} as before. +However, in this case \ual{bl} is located at \ual{0x2a}, and since this is a 32 bits instruction we point to the middle of that instruction. + +\subsection{Usage in Clean} +\label{sec:storing-pc:clean} +Storing the PC is required for jumping to subroutines. +We see a clear example of this when a Clean function definition uses pattern matching. +Consider the following example: + +\begin{minted}{clean} + isEmpty :: [a] -> Bool + isEmpty [] = True + isEmpty _ = False +\end{minted} + +The generated ABC-code looks as below% + \footnote{The current Clean compiler misses the \abc{jsr_eval 0} line: + the strictness analyser recognises that \clean{isEmpty} is strict in its first argument, + so evaluating the argument to head normal form is not needed any more. + In cases where the strictness analyser cannot derive strictness, + code similar to this example is generated. + The code given here can be reproduced with \bash{clm -nsa}.}. +Since \clean{isEmpty} pattern matches on its first argument, it needs to be evaluated to head normal form. +This is done with \abc{jsr_eval 0}. +Only after that can we check if its constructor is \abc{_Nil} (the empty list), which is done with \abc{eq_desc _Nil 0 0}. + +\begin{minted}[tabsize=4]{text} + sisEmpty.1 + jsr_eval 0 || Evaluate argument + eq_desc _Nil 0 0 || If it equals [] + jmp_true case.1 || .. jump to case.1 + jmp case.2 || [else] to case.2 + case.1 + pop_a 1 || Pop argument + pushB TRUE || Return True + rtn + case.2 + pop_a 1 || Pop argument + pushB FALSE || Return False + rtn +\end{minted} + +Evaluating the argument is done by jumping to the subroutine indicated by the node entry of that argument. +Hence, we store the PC, jump to that address, and continue with \abc{eq_desc _Nil 0 0} when the node has been evaluated to head normal form. +The ARM code for pattern matching is: + +\begin{minted}{ual} + sisEmpty_P1: + ldr r12,[r6] @ Load node entry point + tst r12,#2 @ If in HNF already + bne e_0 @ Skip evaluation + str pc,[sp,#-4]! @ Store PC + blx r12 @ Evaluate argument + e_0: + ldr r12,[r6] @ If it does not equal [] + ldr r14,=__Nil+2 + cmp r12,r14 + bne case_P2 +\end{minted} + +We can see here + (a) that evaluating a node requires a \abc{jsr_eval} ABC instruction, and + (b) that \abc{jsr} ABC instructions require storing the PC in ARM assembly. + +Of course, evaluating nodes is something that happens throughout the source code and has to be done all the time during the execution of a Clean program. +We therefore need a fast, small Thumb alternative for the ARM code. + +\subsection{Solution} +\label{sec:storing-pc:solution} +To solve the first problem, we need to first move \ual{pc} to an auxiliary register, and then push that on the stack. +We then get, for the example from \texttt{armstartup.s:540-2}: + +\begin{minted}{ual} + add lr,pc,#9 @ 0x20 + str lr,[sp,#-4]! @ 0x24 + bl init_clean @ 0x28 + tst r4,r4 @ 0x2c +\end{minted} + +We store the value \ual{0x2d}. +This address points to the \ual{tst} instruction as before, with the LSB set to 1 to indicate Thumb mode. + +In assembly code, we can solve this by adding labels for the addresses we want to store on the stack. +In object code, we need to keep track of the current alignment and add either 9 or 11 to the read program counter. + +The offset, 9, is calculated as the number of bytes to the instruction after the branch plus one for Thumb mode. +For \ual{b} and \ual{bl} instructions, this means an offset of 9, since these instructions are 32-bit. +The \ual{bx} and \ual{blx} instructions are 16-bit, and require an offset of 7. + +\subsection{Other solutions} +\label{sec:storing-pc:other-solutions} +Another solution than the one we present makes use of the link register. +Some branch instructions, like \ual{bl}, store the address of the next instruction in the link register. +We could therefore imagine a setup where the callee gets the return address from that register rather than from the stack. +This is the approach taken by GCC. +The code of a typical C subroutine starts with \ual{push {...,lr}} and ends with \ual{pop {...,pc}}. + +When generating code for a functional language, it is not straightforward to do this, due to tail recursion. +It is an easier solution to have the caller responsible for storing the return address, + which is why this approach is taken in Clean's ARM code generator~\parencite{armcg} + and why we continue along these lines for the Thumb backend. + +\subsection{Comparison} +To be done. %TODO + +For every occurrence: +1 word code size; +for every function call: +1 instruction. + +\end{multicols} |