diff options
author | Camil Staps | 2016-11-14 23:29:54 +0100 |
---|---|---|
committer | Camil Staps | 2016-11-14 23:29:54 +0100 |
commit | 22e576fa306f1681252bc6351f113d1ae7fa8b31 (patch) | |
tree | 7b5c833777e14845baee674bd507055d3ad0b242 /thesis/two-bits.tex | |
parent | Correct log (diff) |
Thesis: introduction, two LSBs
Diffstat (limited to 'thesis/two-bits.tex')
-rw-r--r-- | thesis/two-bits.tex | 53 |
1 files changed, 53 insertions, 0 deletions
diff --git a/thesis/two-bits.tex b/thesis/two-bits.tex new file mode 100644 index 0000000..4b08d89 --- /dev/null +++ b/thesis/two-bits.tex @@ -0,0 +1,53 @@ +\section{Two bits} +\label{sec:two-bits} + +\begin{multicols}{2} + +\subsection{Introduction} +\label{sec:two-bits:intro} +The ARM ISA only has 32-bit instructions, meaning that the two least significant bits of a code address are always zero. +When a branch is attempted to an address which has one or both of these bits set, the processor will automatically align the address by clearing these bits. + +Thumb mixes 32 and 16-bit instructions. +Instructions are halfword-aligned, so bit 1 is part of the address. +Additionally, bit 0 is used to facilitate ARM and Thumb interworking. +When the PC is written to with a value where the LSB is cleared, + the processor jumps and switches to ARM mode. +When the LSB is set, it switches to Thumb mode~\citep[A2.3.2]{armv7ar}. + +\subsection{Usage in Clean} +\label{sec:two-bits:clean} +The fact that in ARM mode the two lowest bits of a value written to the PC are cleared automatically has been exploited in the ARM code generator: + these two bits are used to store information. +Bit 1 is used to mark a node as having reached head normal form% + \footnote{A node is in \emph{head} or \emph{root normal form} when it cannot be rewritten itself (though its children may)}. +Bit 0 is used in Clean's default garbage collector, the copying collector, to record whether a node has been copied to the other semispace already: + when cleared (the normal state of an address as stored on the heap), it has not been copied yet. + +Porting to Thumb, we need to find a way to store this information without jumping to the middle of an instruction or accidentally switching to ARM mode. + +\subsection{Solution} +\label{sec:two-bits:solution} +The solution for bit 1 is straightforward. +By aligning all code addresses that we will ever want to jump to (the node entry addresses) on words rather than half-words, we ensure that bit 1 is always cleared. +When a node has reached head normal form, setting bit 1 will give a corrupt address. +Jumping to it would either jump after the first instruction (if it is 16-bit) or to the middle of the first instruction (if it is 32-bit). +However, when a node is in HNF, there is no pointing in jumping to its entry address, + so the corrupted address is not a problem. + +The Thumb backend for Clean proposed in this thesis does not use interworking. +However, we do need to make sure that whenever the PC is written to, the LSB is set. +To that end, we store node entry addresses with the LSB on heap and stack. +This only introduces problems in the copying collector, + which will then not copy any node (since having bit 0 set means having been copied already). +Flipping the meaning of the LSB in the copying collector fixes this issue. + +\subsection{Comparison} +\label{sec:two-bits:comparison} +Flipping the meaning of the LSB in the garbage collector amounts to swapping \ual{bne} and \ual{beq} and similar changes that do not effect the program's efficiency. +By word-aligning all node entry addresses we lose one alignment byte per node entry address on average + (assuming that half of the node entry points are word-aligned already). +This increases code size slightly, but since many instructions that were 32-bit in ARM are now 16-bit, the overall code size is still smaller. +Aligning node entries has no effect on the program's efficiency, since the \ual{nop} instruction that is inserted above it will never be executed. + +\end{multicols} |