-rw-r--r-- | thesis/storing-pc.tex | 27
-rw-r--r-- | thesis/two-bits.tex | 17
2 files changed, 31 insertions, 13 deletions
diff --git a/thesis/storing-pc.tex b/thesis/storing-pc.tex
index e08fb4d..81a688b 100644
--- a/thesis/storing-pc.tex
+++ b/thesis/storing-pc.tex
@@ -128,20 +128,8 @@ The offset, 9, is calculated as the number of bytes to the instruction after the
 For \ual{b} and \ual{bl} instructions, this means an offset of 9, since these instructions are 32-bit.
 The \ual{bx} and \ual{blx} instructions are 16-bit, and require an offset of 7.
 
-\subsection{Other solutions}
-\label{sec:storing-pc:other-solutions}
-Another solution than the one we present makes use of the link register.
-Some branch instructions, like \ual{bl}, store the address of the next instruction in the link register.
-We could therefore imagine a setup where the callee gets the return address from that register rather than from the stack.
-This is the approach taken by GCC.
-The code of a typical C subroutine starts with \ual{push {...,lr}} and ends with \ual{pop {...,pc}}.
-
-When generating code for a functional language, it is not straightforward to do this, due to tail recursion.
-It is an easier solution to have the caller responsible for storing the return address,
-	which is why this approach is taken in Clean's ARM code generator~\parencite{armcg}
-	and why we continue along these lines for the Thumb backend.
-
 \subsection{Comparison}
+\label{sec:storing-pc:comparison}
 Assuming the worst case, that all instructions in the jump block are wide, we need four more bytes in Thumb than in ARM.
 As a benchmark, the Clean compiler has 41,006 jumps of this kind in 1,253,978 instructions, a rough 3.27\%.
 The four extra bytes in Thumb mean a size increase of $41006\cdot4\approx160$KiB on the 5.3MiB file, an increase of 3.00\%.
@@ -157,4 +145,17 @@ A general comparison of running time under ARM and Thumb is made in \cref{sec:re
 % pi@rasppi:~/clean/exe$ objdump -d cocl | grep -E '^\s+[0123456789abcdef]{5,8}:\s+[0123456789abcdef]{8}' | wc -l
 % 1253978
 
+\subsection{Other solutions}
+\label{sec:storing-pc:other-solutions}
+Another solution than the one we present makes use of the link register.
+Some branch instructions, like \ual{bl}, store the address of the next instruction in the link register.
+We could therefore imagine a setup where the callee gets the return address from that register rather than from the stack.
+This is the approach taken by GCC.
+The code of a typical C subroutine starts with \ual{push {...,lr}} and ends with \ual{pop {...,pc}}.
+
+When generating code for a functional language, it is not straightforward to do this, due to tail recursion.
+It is an easier solution to have the caller responsible for storing the return address,
+	which is why this approach is taken in Clean's ARM code generator~\parencite{armcg}
+	and why we continue along these lines for the Thumb backend.
+
 \end{multicols}
diff --git a/thesis/two-bits.tex b/thesis/two-bits.tex
index 56e7c45..ac8419b 100644
--- a/thesis/two-bits.tex
+++ b/thesis/two-bits.tex
@@ -51,4 +51,21 @@ By word-aligning all node entry addresses we lose one alignment byte per node en
 This increases code size slightly, but since many instructions that were 32-bit in ARM are now 16-bit, the overall code size is still smaller.
 Aligning node entries has no effect on the program's efficiency, since the \ual{nop} instruction that is inserted above it will never be executed.
 
+\subsection{Other solutions}
+\label{sec:two-bits:other-solutions}
+The solution described above exploits the fact that the LSB of a code address is only used inside the garbage collector,
+	and has a fixed value everywhere else.
+The solution for bit 1, however, is not specific to the Clean RTS.
+Therefore, a general solution to the problem that the two LSBs of a code address cannot be used to store information in Thumb mode would be to align all addresses that we need to store info of on double-words,
+	that is, ensuring the three LSBs are always zero.
+That way, the LSB can be used for ARM and Thumb interworking, and bit 1 and 2 can be used to store information.
+
+Of course, whether this is a viable solution depends on the density of code addresses that should be aligned.
+If every second instruction needs to be aligned, it would introduce so many \ual{nop} instructions
+	that code size will increase dramatically (even compared to ARM) and
+	that performance is degraded significantly.
+
+Then again, in many programs the issue we have explored in this section will not be a problem at all,
+	because the two LSBs of code addresses are not commonly used.
+
 \end{multicols}
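
Note on the double-word alignment idea added to two-bits.tex: the bit layout it describes can be illustrated with a minimal Python sketch. This is not code from the thesis or from the Clean code generator. The names `align_up`, `pack` and `unpack`, and the choice to model addresses as plain integers, are assumptions made only for illustration. The sketch assumes addresses are aligned on 8 bytes, so that bit 0 is free for ARM/Thumb interworking and bits 1 and 2 can carry a two-bit tag.

```python
# Hypothetical illustration of the layout described in the added subsection;
# all names here are invented for this sketch.
ALIGNMENT = 8             # double-word alignment: three LSBs of an aligned address are zero
LOW_MASK = ALIGNMENT - 1  # 0b111
THUMB_BIT = 0b001         # bit 0: ARM/Thumb interworking
TAG_MASK = 0b110          # bits 1 and 2: free to store information


def align_up(address):
    """Round an address up to the next double-word boundary."""
    return (address + LOW_MASK) & ~LOW_MASK


def pack(address, tag, thumb=True):
    """Combine an aligned address, a 2-bit tag and the interworking bit."""
    if address & LOW_MASK:
        raise ValueError("address is not double-word aligned")
    if not 0 <= tag <= 3:
        raise ValueError("tag must fit in two bits")
    return address | (tag << 1) | (THUMB_BIT if thumb else 0)


def unpack(word):
    """Recover (address, tag, thumb) from a packed word."""
    return word & ~LOW_MASK, (word & TAG_MASK) >> 1, bool(word & THUMB_BIT)
```

The padding cost the subsection warns about shows up as `align_up(a) - a`, which can be up to 7 bytes for each address that has to be aligned.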