From 9d983139338eae79dd375f70378579233da30340 Mon Sep 17 00:00:00 2001 From: Camil Staps Date: Thu, 24 Nov 2016 11:44:58 +0100 Subject: two-bits -> code-addresses --- thesis/code-addresses.tex | 71 +++++++++++++++++++++++++++++++++++++++++++++++ thesis/intro.tex | 6 ++-- thesis/thesis.tex | 2 +- thesis/two-bits.tex | 71 ----------------------------------------------- 4 files changed, 75 insertions(+), 75 deletions(-) create mode 100644 thesis/code-addresses.tex delete mode 100644 thesis/two-bits.tex diff --git a/thesis/code-addresses.tex b/thesis/code-addresses.tex new file mode 100644 index 0000000..2c3a043 --- /dev/null +++ b/thesis/code-addresses.tex @@ -0,0 +1,71 @@ +\section{Code addresses} +\label{sec:code-addresses} + +\begin{multicols}{2} + +\subsection{Introduction} +\label{sec:code-addresses:intro} +The ARM ISA only has 32-bit instructions, meaning that the two least significant bits of a code address are always zero. +When a branch is attempted to an address which has one or both of these bits set, the processor will automatically align the address by clearing these bits. + +Thumb mixes 32 and 16-bit instructions. +Instructions are halfword-aligned, so bit 1 is part of the address. +Additionally, bit 0 is used to facilitate ARM and Thumb interworking. +When the PC is written to with a value where the LSB is cleared, + the processor jumps and switches to ARM mode. +When the LSB is set, it switches to Thumb mode~\parencite[A2.3.2]{armv7ar}. + +\subsection{Usage in Clean} +\label{sec:code-addresses:clean} +The fact that in ARM mode the two lowest bits of a value written to the PC are cleared automatically has been exploited in the ARM code generator: + these two bits are used to store information. +Bit 1 is used to mark a node as having reached head normal form% + \footnote{A node is in \emph{head} or \emph{root normal form} when it cannot be rewritten itself (though its children may)}. +Bit 0 is used in Clean's default garbage collector, the copying collector, to record whether a node has been copied to the other semispace already: + when cleared (the normal state of an address as stored on the heap), it has not been copied yet. + +Porting to Thumb, we need to find a way to store this information without jumping to the middle of an instruction or accidentally switching to ARM mode. + +\subsection{Solution} +\label{sec:code-addresses:solution} +The solution for bit 1 is straightforward. +By aligning all code addresses that we will ever want to jump to (the node entry addresses) on words rather than half-words, we ensure that bit 1 is always cleared. +When a node has reached head normal form, setting bit 1 will give a corrupt address. +Jumping to it would either jump after the first instruction (if it is 16-bit) or to the middle of the first instruction (if it is 32-bit). +However, when a node is in HNF, there is no pointing in jumping to its entry address, + so the corrupted address is not a problem. + +The Thumb backend for Clean proposed in this thesis does not use interworking. +However, we do need to make sure that whenever the PC is written to, the LSB is set. +To that end, we store node entry addresses with the LSB on heap and stack. +This only introduces problems in the copying collector, + which will then not copy any node (since having bit 0 set means having been copied already). +Flipping the meaning of the LSB in the copying collector fixes this issue. + +\subsection{Comparison} +\label{sec:code-addresses:comparison} +Flipping the meaning of the LSB in the garbage collector amounts to swapping \ual{bne} and \ual{beq} and similar changes that do not effect the program's efficiency or size. + +By word-aligning all node entry addresses we lose one alignment byte per node entry address on average + (assuming that half of the node entry points are word-aligned already). +This increases code size slightly, but since many instructions that were 32-bit in ARM are now 16-bit, the overall code size is still smaller. +Aligning node entries has no effect on the program's efficiency, since the \ual{nop} instruction that is inserted above it will never be executed. + +\subsection{Other solutions} +\label{sec:code-addresses:other-solutions} +The solution described above exploits the fact that the LSB of a code address is only used inside the garbage collector, + and has a fixed value everywhere else. +The solution for bit 1, however, is not specific to the Clean RTS. +Therefore, a general solution to the problem that the two LSBs of a code address cannot be used to store information in Thumb mode would be to align all addresses that we need to store info of on double-words, + that is, ensuring the three LSBs are always zero. +That way, the LSB can be used for ARM and Thumb interworking, and bit 1 and 2 can be used to store information. + +Of course, whether this is a viable solution depends on the density of code addresses that should be aligned. +If every second instruction needs to be aligned, it would introduce so many \ual{nop} instructions + that code size will increase dramatically (even compared to ARM) and + that performance is degraded significantly. + +Then again, in many programs the issue we have explored in this section will not be a problem at all, + because the two LSBs of code addresses are not commonly used. + +\end{multicols} diff --git a/thesis/intro.tex b/thesis/intro.tex index 85f13b8..1056d9f 100644 --- a/thesis/intro.tex +++ b/thesis/intro.tex @@ -106,7 +106,7 @@ The ARM and Thumb instruction sets are designed to \emph{interwork}: The Thumb-2 code generator proposed in this thesis does not produce ARM code, though the existence of the interworking facility has effects on the techniques that can be used in it. -This will be covered in \cref{sec:two-bits}. +This will be covered in \cref{sec:code-addresses}. \subsection{Clean} \label{sec:intro:clean} @@ -292,7 +292,7 @@ Second, we doubt that using interworking can be done efficiently. In the run-time system, only minimal time overhead is introduced by using Thumb instructions. For generated code it would be complicated to detect if the ARM or Thumb instruction set would give better results, and this would give significantly better results only in specific cases. -Third, the problem discussed in \cref{sec:two-bits} could be solved efficiently only without interworking. +Third, the problem discussed in \cref{sec:code-addresses} could be solved efficiently only without interworking. Using interworking would introduce overhead at every branch instruction, since the solution to this problem would have to be adapted. @@ -311,7 +311,7 @@ In much of the rest of this thesis we discuss differences between ARM and Thumb, their influences on code generation, and the way they were dealt with in the Thumb backend for Clean proposed in this thesis. In \cref{sec:storing-pc}, we consider an issue arising from halfword-aligned instructions and the way a read of PC is interpreted under Thumb. -\Cref{sec:two-bits} discusses problems related to the fact that Thumb instruction addresses use bit 1 and should have bit 0 set for interworking, +\Cref{sec:code-addresses} discusses problems related to the fact that Thumb instruction addresses use bit 1 and should have bit 0 set for interworking, while under ARM a branch will automatically clear both these bits. The Thumb backend for Clean has been benchmarked. diff --git a/thesis/thesis.tex b/thesis/thesis.tex index 8760159..9bcd318 100644 --- a/thesis/thesis.tex +++ b/thesis/thesis.tex @@ -48,7 +48,7 @@ Specifically, we look at code generation for the purely functional programming l \input{intro} \input{storing-pc} -\input{two-bits} +\input{code-addresses} \input{results} \ifdraft\else diff --git a/thesis/two-bits.tex b/thesis/two-bits.tex deleted file mode 100644 index ac8419b..0000000 --- a/thesis/two-bits.tex +++ /dev/null @@ -1,71 +0,0 @@ -\section{Code addresses} -\label{sec:two-bits} - -\begin{multicols}{2} - -\subsection{Introduction} -\label{sec:two-bits:intro} -The ARM ISA only has 32-bit instructions, meaning that the two least significant bits of a code address are always zero. -When a branch is attempted to an address which has one or both of these bits set, the processor will automatically align the address by clearing these bits. - -Thumb mixes 32 and 16-bit instructions. -Instructions are halfword-aligned, so bit 1 is part of the address. -Additionally, bit 0 is used to facilitate ARM and Thumb interworking. -When the PC is written to with a value where the LSB is cleared, - the processor jumps and switches to ARM mode. -When the LSB is set, it switches to Thumb mode~\parencite[A2.3.2]{armv7ar}. - -\subsection{Usage in Clean} -\label{sec:two-bits:clean} -The fact that in ARM mode the two lowest bits of a value written to the PC are cleared automatically has been exploited in the ARM code generator: - these two bits are used to store information. -Bit 1 is used to mark a node as having reached head normal form% - \footnote{A node is in \emph{head} or \emph{root normal form} when it cannot be rewritten itself (though its children may)}. -Bit 0 is used in Clean's default garbage collector, the copying collector, to record whether a node has been copied to the other semispace already: - when cleared (the normal state of an address as stored on the heap), it has not been copied yet. - -Porting to Thumb, we need to find a way to store this information without jumping to the middle of an instruction or accidentally switching to ARM mode. - -\subsection{Solution} -\label{sec:two-bits:solution} -The solution for bit 1 is straightforward. -By aligning all code addresses that we will ever want to jump to (the node entry addresses) on words rather than half-words, we ensure that bit 1 is always cleared. -When a node has reached head normal form, setting bit 1 will give a corrupt address. -Jumping to it would either jump after the first instruction (if it is 16-bit) or to the middle of the first instruction (if it is 32-bit). -However, when a node is in HNF, there is no pointing in jumping to its entry address, - so the corrupted address is not a problem. - -The Thumb backend for Clean proposed in this thesis does not use interworking. -However, we do need to make sure that whenever the PC is written to, the LSB is set. -To that end, we store node entry addresses with the LSB on heap and stack. -This only introduces problems in the copying collector, - which will then not copy any node (since having bit 0 set means having been copied already). -Flipping the meaning of the LSB in the copying collector fixes this issue. - -\subsection{Comparison} -\label{sec:two-bits:comparison} -Flipping the meaning of the LSB in the garbage collector amounts to swapping \ual{bne} and \ual{beq} and similar changes that do not effect the program's efficiency or size. - -By word-aligning all node entry addresses we lose one alignment byte per node entry address on average - (assuming that half of the node entry points are word-aligned already). -This increases code size slightly, but since many instructions that were 32-bit in ARM are now 16-bit, the overall code size is still smaller. -Aligning node entries has no effect on the program's efficiency, since the \ual{nop} instruction that is inserted above it will never be executed. - -\subsection{Other solutions} -\label{sec:two-bits:other-solutions} -The solution described above exploits the fact that the LSB of a code address is only used inside the garbage collector, - and has a fixed value everywhere else. -The solution for bit 1, however, is not specific to the Clean RTS. -Therefore, a general solution to the problem that the two LSBs of a code address cannot be used to store information in Thumb mode would be to align all addresses that we need to store info of on double-words, - that is, ensuring the three LSBs are always zero. -That way, the LSB can be used for ARM and Thumb interworking, and bit 1 and 2 can be used to store information. - -Of course, whether this is a viable solution depends on the density of code addresses that should be aligned. -If every second instruction needs to be aligned, it would introduce so many \ual{nop} instructions - that code size will increase dramatically (even compared to ARM) and - that performance is degraded significantly. - -Then again, in many programs the issue we have explored in this section will not be a problem at all, - because the two LSBs of code addresses are not commonly used. - -\end{multicols} -- cgit v1.2.3