Diffstat (limited to 'thesis')
-rw-r--r-- thesis/storing-pc.tex 27
-rw-r--r-- thesis/two-bits.tex 17
2 files changed, 31 insertions, 13 deletions
diff --git a/thesis/storing-pc.tex b/thesis/storing-pc.tex
index e08fb4d..81a688b 100644
--- a/thesis/storing-pc.tex
+++ b/thesis/storing-pc.tex
@@ -128,20 +128,8 @@ The offset, 9, is calculated as the number of bytes to the instruction after the
For \ual{b} and \ual{bl} instructions, this means an offset of 9, since these instructions are 32-bit.
The \ual{bx} and \ual{blx} instructions are 16-bit, and require an offset of 7.
-\subsection{Other solutions}
-\label{sec:storing-pc:other-solutions}
-Another solution than the one we present makes use of the link register.
-Some branch instructions, like \ual{bl}, store the address of the next instruction in the link register.
-We could therefore imagine a setup where the callee gets the return address from that register rather than from the stack.
-This is the approach taken by GCC.
-The code of a typical C subroutine starts with \ual{push {...,lr}} and ends with \ual{pop {...,pc}}.
-
-When generating code for a functional language, it is not straightforward to do this, due to tail recursion.
-It is an easier solution to have the caller responsible for storing the return address,
- which is why this approach is taken in Clean's ARM code generator~\parencite{armcg}
- and why we continue along these lines for the Thumb backend.
-
\subsection{Comparison}
+\label{sec:storing-pc:comparison}
Assuming the worst case, that all instructions in the jump block are wide, we need four more bytes in Thumb than in ARM.
As a benchmark, the Clean compiler has 41,006 jumps of this kind in 1,253,978 instructions, a rough 3.27\%.
The four extra bytes in Thumb mean a size increase of $41006\cdot4\approx160$KiB on the 5.3MiB file, an increase of 3.00\%.
@@ -157,4 +145,17 @@ A general comparison of running time under ARM and Thumb is made in \cref{sec:re
% pi@rasppi:~/clean/exe$ objdump -d cocl | grep -E '^\s+[0123456789abcdef]{5,8}:\s+[0123456789abcdef]{8}' | wc -l
% 1253978
+\subsection{Other solutions}
+\label{sec:storing-pc:other-solutions}
+An alternative to the solution we present makes use of the link register.
+Some branch instructions, like \ual{bl}, store the address of the next instruction in the link register.
+We could therefore imagine a setup where the callee gets the return address from that register rather than from the stack.
+This is the approach taken by GCC.
+The code of a typical C subroutine starts with \ual{push {...,lr}} and ends with \ual{pop {...,pc}}.
+
+When generating code for a functional language, this is not straightforward, due to tail recursion.
+It is easier to make the caller responsible for storing the return address,
+ which is why this approach is taken in Clean's ARM code generator~\parencite{armcg}
+ and why we continue along these lines for the Thumb backend.
+
\end{multicols}
diff --git a/thesis/two-bits.tex b/thesis/two-bits.tex
index 56e7c45..ac8419b 100644
--- a/thesis/two-bits.tex
+++ b/thesis/two-bits.tex
@@ -51,4 +51,21 @@ By word-aligning all node entry addresses we lose one alignment byte per node en
This increases code size slightly, but since many instructions that were 32-bit in ARM are now 16-bit, the overall code size is still smaller.
Aligning node entries has no effect on the program's efficiency, since the \ual{nop} instruction that is inserted above it will never be executed.
+\subsection{Other solutions}
+\label{sec:two-bits:other-solutions}
+The solution described above exploits the fact that the LSB of a code address is only used inside the garbage collector,
+ and has a fixed value everywhere else.
+The solution for bit 1, however, is not specific to the Clean RTS.
+Therefore, a general solution to the problem that the two LSBs of a code address cannot be used to store information in Thumb mode would be to align every address in which we need to store information on a double-word boundary,
+ that is, to ensure that its three LSBs are always zero.
+That way, the LSB can be used for ARM and Thumb interworking, and bits 1 and 2 can be used to store information.
+
+Of course, whether this is a viable solution depends on the density of code addresses that should be aligned.
+If every second instruction needed to be aligned, this would introduce so many \ual{nop} instructions
+ that code size would increase dramatically (even compared to ARM) and
+ that performance would degrade significantly.
+
+Then again, in many programs the issue we have explored in this section will not be a problem at all,
+ because the two LSBs of code addresses are not commonly used to store information.
+
\end{multicols}
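
The double-word alignment idea added to two-bits.tex can be illustrated with a minimal sketch. It is not taken from the Clean code generator or run-time system: the names `align_up`, `pack` and `unpack` and the constants are hypothetical, and the sketch only assumes what the text states, namely that aligned addresses have three zero LSBs, that bit 0 is used for ARM and Thumb interworking, and that bits 1 and 2 are free to hold information.

```python
# Hypothetical illustration; none of these names come from the Clean sources.
ALIGNMENT = 8        # double-word alignment: the three LSBs of an aligned address are zero
THUMB_BIT = 0b001    # bit 0: ARM/Thumb interworking bit
INFO_MASK = 0b110    # bits 1 and 2: available for two bits of information


def align_up(address, alignment=ALIGNMENT):
    """Round a non-negative address up to the next multiple of alignment (a power of two)."""
    return (address + alignment - 1) & ~(alignment - 1)


def pack(address, info, thumb=True):
    """Store a 2-bit info value and the interworking bit in the low bits of an aligned address."""
    if address % ALIGNMENT != 0:
        raise ValueError("address is not double-word aligned")
    if not 0 <= info <= 3:
        raise ValueError("info must fit in two bits")
    return address | (info << 1) | (THUMB_BIT if thumb else 0)


def unpack(word):
    """Recover (address, info, thumb) from a packed word."""
    address = word & ~(ALIGNMENT - 1)
    info = (word & INFO_MASK) >> 1
    thumb = bool(word & THUMB_BIT)
    return address, info, thumb
```

For instance, `pack(align_up(0x8001), 2)` first rounds the address up to `0x8008` and then yields `0x800d`; `unpack(0x800d)` returns the address, the info value 2 and the Thumb flag again. The padding introduced by `align_up` corresponds to the extra `nop` instructions whose cost the text discusses.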