2 files changed, 31 insertions, 13 deletions
diff --git a/thesis/storing-pc.tex b/thesis/storing-pc.tex
index e08fb4d..81a688b 100644
--- a/thesis/storing-pc.tex
+++ b/thesis/storing-pc.tex
@@ -128,20 +128,8 @@ The offset, 9, is calculated as the number of bytes to the instruction after the
 For \ual{b} and \ual{bl} instructions, this means an offset of 9, since these instructions are 32-bit.
 The \ual{bx} and \ual{blx} instructions are 16-bit, and require an offset of 7.
 
-\subsection{Other solutions}
-\label{sec:storing-pc:other-solutions}
-Another solution than the one we present makes use of the link register.
-Some branch instructions, like \ual{bl}, store the address of the next instruction in the link register.
-We could therefore imagine a setup where the callee gets the return address from that register rather than from the stack.
-This is the approach taken by GCC.
-The code of a typical C subroutine starts with \ual{push {...,lr}} and ends with \ual{pop {...,pc}}.
-
-When generating code for a functional language, it is not straightforward to do this, due to tail recursion.
-It is an easier solution to have the caller responsible for storing the return address,
-	which is why this approach is taken in Clean's ARM code generator~\parencite{armcg}
-	and why we continue along these lines for the Thumb backend.
-
 \subsection{Comparison}
+\label{sec:storing-pc:comparison}
 Assuming the worst case, that all instructions in the jump block are wide, we need four more bytes in Thumb than in ARM.
 As a benchmark, the Clean compiler has 41,006 jumps of this kind in 1,253,978 instructions, a rough 3.27\%.
 The four extra bytes in Thumb mean a size increase of $41006\cdot4\approx160$KiB on the 5.3MiB file, an increase of 3.00\%.
@@ -157,4 +145,17 @@ A general comparison of running time under ARM and Thumb is made in \cref{sec:re
 % pi@rasppi:~/clean/exe$ objdump -d cocl | grep -E '^\s+[0123456789abcdef]{5,8}:\s+[0123456789abcdef]{8}' | wc -l
 % 1253978
 
+\subsection{Other solutions}
+\label{sec:storing-pc:other-solutions}
+An alternative to the solution we present makes use of the link register.
+Some branch instructions, like \ual{bl}, store the address of the next instruction in the link register.
+We could therefore imagine a setup where the callee gets the return address from that register rather than from the stack.
+This is the approach taken by GCC.
+The code of a typical C subroutine starts with \ual{push {...,lr}} and ends with \ual{pop {...,pc}}.
+
+When generating code for a functional language, it is not straightforward to do this, due to tail recursion.
+It is easier to make the caller responsible for storing the return address,
+	which is why this approach is taken in Clean's ARM code generator~\parencite{armcg}
+	and why we continue along these lines for the Thumb backend.
+
 \end{multicols}
diff --git a/thesis/two-bits.tex b/thesis/two-bits.tex
index 56e7c45..ac8419b 100644
--- a/thesis/two-bits.tex
+++ b/thesis/two-bits.tex
@@ -51,4 +51,21 @@ By word-aligning all node entry addresses we lose one alignment byte per node en
 This increases code size slightly, but since many instructions that were 32-bit in ARM are now 16-bit, the overall code size is still smaller.
 Aligning node entries has no effect on the program's efficiency, since the \ual{nop} instruction that is inserted above it will never be executed.
 
+\subsection{Other solutions}
+\label{sec:two-bits:other-solutions}
+The solution described above exploits the fact that the LSB of a code address is only used inside the garbage collector,
+	and has a fixed value everywhere else.
+The solution for bit 1, however, is not specific to the Clean RTS.
+Therefore, a general solution to the problem that the two LSBs of a code address cannot be used to store information in Thumb mode would be to align every address that must carry such information on a double-word boundary,
+	that is, ensuring the three LSBs are always zero.
+That way, the LSB can be used for ARM and Thumb interworking, and bits 1 and 2 can be used to store information.
+
+Of course, whether this is a viable solution depends on the density of code addresses that should be aligned.
+If every second instruction needed to be aligned, this would introduce so many \ual{nop} instructions
+	that code size would increase dramatically (even compared to ARM) and
+	that performance would degrade significantly.
+
+Then again, in many programs the issue we have explored in this section will not be a problem at all,
+	because the two LSBs of code addresses are not commonly used.
+
 \end{multicols}