Diffstat (limited to 'thesis')
2 files changed, 31 insertions, 13 deletions
diff --git a/thesis/storing-pc.tex b/thesis/storing-pc.tex
index e08fb4d..81a688b 100644
--- a/thesis/storing-pc.tex
+++ b/thesis/storing-pc.tex
@@ -128,20 +128,8 @@ The offset, 9, is calculated as the number of bytes to the instruction after the
For \ual{b} and \ual{bl} instructions, this means an offset of 9, since these instructions are 32-bit.
The \ual{bx} and \ual{blx} instructions are 16-bit, and require an offset of 7.
-\subsection{Other solutions}
-Another solution than the one we present makes use of the link register.
-Some branch instructions, like \ual{bl}, store the address of the next instruction in the link register.
-We could therefore imagine a setup where the callee gets the return address from that register rather than from the stack.
-This is the approach taken by GCC.
-The code of a typical C subroutine starts with \ual{push {...,lr}} and ends with \ual{pop {...,pc}}.
-When generating code for a functional language, it is not straightforward to do this, due to tail recursion.
-It is an easier solution to have the caller responsible for storing the return address,
- which is why this approach is taken in Clean's ARM code generator~\parencite{armcg}
- and why we continue along these lines for the Thumb backend.
Assuming the worst case, that all instructions in the jump block are wide, we need four more bytes in Thumb than in ARM.
As a benchmark, the Clean compiler has 41,006 jumps of this kind in 1,253,978 instructions, a rough 3.27\%.
The four extra bytes in Thumb mean a size increase of $41006\cdot4\approx160$KiB on the 5.3MiB file, an increase of 3.00\%.
@@ -157,4 +145,17 @@ A general comparison of running time under ARM and Thumb is made in \cref{sec:re
% pi@rasppi:~/clean/exe$ objdump -d cocl | grep -E '^\s+[0123456789abcdef]{5,8}:\s+[0123456789abcdef]{8}' | wc -l
% 1253978
+\subsection{Other solutions}
+Another solution than the one we present makes use of the link register.
+Some branch instructions, like \ual{bl}, store the address of the next instruction in the link register.
+We could therefore imagine a setup where the callee gets the return address from that register rather than from the stack.
+This is the approach taken by GCC.
+The code of a typical C subroutine starts with \ual{push {...,lr}} and ends with \ual{pop {...,pc}}.
+When generating code for a functional language, it is not straightforward to do this, due to tail recursion.
+It is an easier solution to have the caller responsible for storing the return address,
+ which is why this approach is taken in Clean's ARM code generator~\parencite{armcg}
+ and why we continue along these lines for the Thumb backend.
diff --git a/thesis/two-bits.tex b/thesis/two-bits.tex
index 56e7c45..ac8419b 100644
--- a/thesis/two-bits.tex
+++ b/thesis/two-bits.tex
@@ -51,4 +51,21 @@ By word-aligning all node entry addresses we lose one alignment byte per node en
This increases code size slightly, but since many instructions that were 32-bit in ARM are now 16-bit, the overall code size is still smaller.
Aligning node entries has no effect on the program's efficiency, since the \ual{nop} instruction that is inserted above it will never be executed.
+\subsection{Other solutions}
+The solution described above exploits the fact that the LSB of a code address is only used inside the garbage collector,
+ and has a fixed value everywhere else.
+The solution for bit 1, however, is not specific to the Clean RTS.
+Therefore, a general solution to the problem that the two LSBs of a code address cannot be used to store information in Thumb mode would be to align all code addresses that need to carry such information on double-words,
+ that is, ensuring that the three LSBs are always zero.
+That way, the LSB can be used for ARM and Thumb interworking, and bits 1 and 2 can be used to store information.
+Of course, whether this is a viable solution depends on the density of code addresses that should be aligned.
+If every second instruction needed to be aligned, this would introduce so many \ual{nop} instructions
+ that code size would increase dramatically (even compared to ARM) and
+ that performance would degrade significantly.
+Then again, in many programs the issue we have explored in this section will not be a problem at all,
+ because the two LSBs of code addresses are not commonly used.
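The double-word alignment scheme described in this subsection can be illustrated with a minimal C sketch. It is not taken from the Clean RTS or the thesis code; the names `tag_address`, `get_info` and `get_branch_target` are hypothetical, and the sketch assumes only that a code address is 8-byte aligned, so that bit 0 can carry the Thumb interworking flag while bits 1 and 2 carry two bits of information.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not part of the Clean RTS. */
#define THUMB_BIT ((uintptr_t)0x1)   /* bit 0: ARM/Thumb interworking flag */
#define INFO_MASK ((uintptr_t)0x6)   /* bits 1 and 2: free for information */
#define ADDR_MASK (~(uintptr_t)0x7)  /* clears the three LSBs */

/* Pack a double-word-aligned code address, the Thumb bit and two info bits. */
static uintptr_t tag_address(uintptr_t addr, unsigned info)
{
    assert((addr & ~ADDR_MASK) == 0); /* address must be 8-byte aligned */
    assert(info < 4);                 /* only two bits are available */
    return addr | ((uintptr_t)info << 1) | THUMB_BIT;
}

/* Recover the two information bits from a tagged address. */
static unsigned get_info(uintptr_t tagged)
{
    return (unsigned)((tagged & INFO_MASK) >> 1);
}

/* Recover a value suitable for an interworking branch: address plus Thumb bit. */
static uintptr_t get_branch_target(uintptr_t tagged)
{
    return (tagged & ADDR_MASK) | THUMB_BIT;
}
```

The cost of this layout is the padding needed to reach 8-byte alignment, which is why its viability depends on how densely the aligned addresses occur in the code.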