summaryrefslogtreecommitdiff
path: root/thesis/two-bits.tex
diff options
context:
space:
mode:
authorCamil Staps2016-11-14 23:29:54 +0100
committerCamil Staps2016-11-14 23:29:54 +0100
commit22e576fa306f1681252bc6351f113d1ae7fa8b31 (patch)
tree7b5c833777e14845baee674bd507055d3ad0b242 /thesis/two-bits.tex
parentCorrect log (diff)
Thesis: introduction, two LSBs
Diffstat (limited to 'thesis/two-bits.tex')
-rw-r--r--thesis/two-bits.tex53
1 files changed, 53 insertions, 0 deletions
diff --git a/thesis/two-bits.tex b/thesis/two-bits.tex
new file mode 100644
index 0000000..4b08d89
--- /dev/null
+++ b/thesis/two-bits.tex
@@ -0,0 +1,53 @@
+\section{Two bits}
+\label{sec:two-bits}
+
+\begin{multicols}{2}
+
+\subsection{Introduction}
+\label{sec:two-bits:intro}
+The ARM ISA only has 32-bit instructions, meaning that the two least significant bits of a code address are always zero.
+When a branch is attempted to an address which has one or both of these bits set, the processor will automatically align the address by clearing these bits.
+
+Thumb mixes 32 and 16-bit instructions.
+Instructions are halfword-aligned, so bit 1 is part of the address.
+Additionally, bit 0 is used to facilitate ARM and Thumb interworking.
+When the PC is written to with a value where the LSB is cleared,
+ the processor jumps and switches to ARM mode.
+When the LSB is set, it switches to Thumb mode~\citep[A2.3.2]{armv7ar}.
+
+\subsection{Usage in Clean}
+\label{sec:two-bits:clean}
+The fact that in ARM mode the two lowest bits of a value written to the PC are cleared automatically has been exploited in the ARM code generator:
+ these two bits are used to store information.
+Bit 1 is used to mark a node as having reached head normal form%
+ \footnote{A node is in \emph{head} or \emph{root normal form} when it cannot be rewritten itself (though its children may)}.
+Bit 0 is used in Clean's default garbage collector, the copying collector, to record whether a node has been copied to the other semispace already:
+ when cleared (the normal state of an address as stored on the heap), it has not been copied yet.
+
+Porting to Thumb, we need to find a way to store this information without jumping to the middle of an instruction or accidentally switching to ARM mode.
+
+\subsection{Solution}
+\label{sec:two-bits:solution}
+The solution for bit 1 is straightforward.
+By aligning all code addresses that we will ever want to jump to (the node entry addresses) on words rather than half-words, we ensure that bit 1 is always cleared.
+When a node has reached head normal form, setting bit 1 will give a corrupt address.
+Jumping to it would either jump after the first instruction (if it is 16-bit) or to the middle of the first instruction (if it is 32-bit).
+However, when a node is in HNF, there is no pointing in jumping to its entry address,
+ so the corrupted address is not a problem.
+
+The Thumb backend for Clean proposed in this thesis does not use interworking.
+However, we do need to make sure that whenever the PC is written to, the LSB is set.
+To that end, we store node entry addresses with the LSB on heap and stack.
+This only introduces problems in the copying collector,
+ which will then not copy any node (since having bit 0 set means having been copied already).
+Flipping the meaning of the LSB in the copying collector fixes this issue.
+
+\subsection{Comparison}
+\label{sec:two-bits:comparison}
+Flipping the meaning of the LSB in the garbage collector amounts to swapping \ual{bne} and \ual{beq} and similar changes that do not effect the program's efficiency.
+By word-aligning all node entry addresses we lose one alignment byte per node entry address on average
+ (assuming that half of the node entry points are word-aligned already).
+This increases code size slightly, but since many instructions that were 32-bit in ARM are now 16-bit, the overall code size is still smaller.
+Aligning node entries has no effect on the program's efficiency, since the \ual{nop} instruction that is inserted above it will never be executed.
+
+\end{multicols}