summaryrefslogtreecommitdiff
path: root/thesis
diff options
context:
space:
mode:
Diffstat (limited to 'thesis')
-rw-r--r--thesis/preamble.tex6
-rw-r--r--thesis/reg-alloc.tex143
-rw-r--r--thesis/storing-pc.tex11
-rw-r--r--thesis/system.tex24
-rw-r--r--thesis/terms.tex4
-rw-r--r--thesis/thesis.tex3
6 files changed, 167 insertions, 24 deletions
diff --git a/thesis/preamble.tex b/thesis/preamble.tex
index 34f2ba2..ea4725f 100644
--- a/thesis/preamble.tex
+++ b/thesis/preamble.tex
@@ -17,6 +17,7 @@
\usepackage[british]{babel}
\usepackage[babel=true]{csquotes}
\usepackage{enumitem}
+\usepackage{array}
\usepackage{amsmath}
\usepackage[usenames,dvipsnames,svgnames,table]{xcolor}
@@ -56,6 +57,7 @@
\usepackage{subcaption}
\usepackage{pgfplots}
+\pgfplotsset{compat=newest}
\usepackage{tikz}
\usetikzlibrary{positioning,calc}
@@ -75,6 +77,10 @@
\let\oldsection\section
\renewcommand{\section}{\ifdraft\else\clearpage\fi\oldsection}
+\newenvironment{inlinefloat}
+ {\par\bigskip\noindent\minipage{\linewidth}}
+ {\endminipage\par\medskip}
+
\raggedcolumns
\renewcommand{\baselinestretch}{1.1}
diff --git a/thesis/reg-alloc.tex b/thesis/reg-alloc.tex
index 0fc764e..5918b59 100644
--- a/thesis/reg-alloc.tex
+++ b/thesis/reg-alloc.tex
@@ -14,7 +14,8 @@
, ytick=data
, scaled x ticks=real:1
, xtick scale label code/.code={}
- , ytick pos=left
+ , axis lines*=left
+ , compat={1.3}
]
\addplot coordinates {
(378618,S0)
@@ -47,7 +48,8 @@
, height=1.1\linewidth
, symbolic y coords={S0,A ptr.,A0,Heap ptr.,A1,B0,A2,B ptr.,B1,S1,B2,B3,B4,Heap ctr.,A3}
, ytick=data
- , ytick pos=left
+ , axis lines*=left
+ , compat={1.3}
]
\addplot coordinates {
(387,S0)
@@ -149,6 +151,30 @@ In Clean programs, the ARM registers are used as follows~\parencite[\texttt{arms
\item The program counter PC.
\end{itemize}
+\begin{table*}[b!]
+ \centering
+ \newcommand{\highlight}[1]{\textcolor{red}{\textbf{#1}}}
+ \begin{tabular}{l >{\tt\footnotesize}c >{\tt\footnotesize}c}
+ Register & {\normalsize\normalfont ARM} & {\normalsize\normalfont Thumb-2} \\\hline
+ A3 & r11 & \highlight{r12} \\
+ Heap counter & r5 & \highlight{r9} \\
+ B4 & r0 & \highlight{r10} \\
+ B3 & r1 & \highlight{r11} \\
+ B2 & r2 & \highlight{r8} \\
+ S1 & r14 & r14 \\
+ B1 & r3 & r3 \\
+ B pointer & sp & sp \\
+ A2 & r8 & \highlight{r2} \\
+ B0 & r4 & r4 \\
+ A1 & r7 & r7 \\
+ Heap pointer & r10 & \highlight{r1} \\
+ A0 & r6 & r6 \\
+ A pointer & r9 & \highlight{r5} \\
+ S0 & r12 & \highlight{r0} \\
+ \end{tabular}
+ \captionof{table}{Register allocation in the ARM and Thumb backends\label{tab:reg-alloc:arm-and-thumb}}
+\end{table*}
+
In \cref{fig:reg-usage}, we count for each register the number of instructions it occurs in, in the code generated for the Clean compiler~\parencite{cocl}.
When a register is used multiple times in the same instruction, this counts as one occurrence.
The vertical line at indicates the number of occurrences that a variable should have to justify putting it in the lower registers,
@@ -162,31 +188,106 @@ For small programs it may not be negligible,
but these programs are not likely to need rigorous optimisation for code size,
considering that they will be small anyway.
+The counting method used here is rather simplistic:
+ basing a register allocation on these counts alone means assuming that every instruction has a 16-bit variant, which is not the case.
+A more accurate method would only count those instructions where using a high or low register actually makes a difference.
+This is much more complicated, and for a rough estimate the simplistic method used here will already allow us to shrink down the code size.
+
+\begin{figure*}[b!]
+ \small
+ \centering
+ \begin{tikzpicture}
+ \begin{axis}
+ [ anchor=south east
+ , height=.4\linewidth
+ , xbar
+ , xmin=0, xmax=6000000
+ , xlabel={Bytes}
+ , ytick=\empty
+ , reverse stacked plots
+ , ylabel={Clean compiler}
+ , ymin=-1, ymax=1
+ , axis lines*=left
+ , nodes near coords, nodes near coords align=west
+ , reverse legend
+ , every axis plot/.append style={point meta=explicit symbolic}
+ , legend style={at={($(1,-0.3)+(1.5em,0pt)$)},anchor=north,legend columns=-1}
+ ]
+ \addplot coordinates { (3785144,0) [80.7\%] };
+ \addplot coordinates { (3827868,0) [81.6\%] };
+ \addplot coordinates { (4385964,0) [93.5\%] };
+ \addplot coordinates { (4692628,0) [100\%] };
+ \legend{Thumb object-code (approximation), Thumb optimised, Thumb, ARM}
+ \end{axis}
+ \begin{axis}
+ [ anchor=south west
+ , height=.4\linewidth
+ , xshift=3em
+ , xbar
+ , xmin=0, xmax=80000
+ , xlabel={Bytes}
+ , ytick=\empty
+ , ymin=-1, ymax=1
+ , ylabel={\texttt{frontend/scanner.icl}}
+ , axis lines*=left
+ , every axis plot/.append style={point meta=explicit symbolic}
+ , nodes near coords, nodes near coords align=west
+ ]
+ \addplot coordinates { (57296,0) [82.0\%] };
+ \addplot coordinates { (57864,0) [82.8\%] };
+ \addplot coordinates { (64588,0) [92.4\%] };
+ \addplot coordinates { (69896,0) [100\%] };
+ \end{axis}
+ \end{tikzpicture}
+ \caption{Code size for different backends\label{fig:reg-alloc:results}}
+\end{figure*}
+
\subsection{Optimisation}
\label{sec:reg-alloc:optimisation}
+The register allocation used in the ARM backend is inefficient for Thumb-2 on several points (\cref{tab:reg-alloc:arm-and-thumb}).
+A number of often-used registers (the main scratch register, the A-stack pointer and the heap pointer) are in the upper half,
+ while registers that are used less often are in the lower half (the heap counter and B2 through B4).
+
\Cref{fig:reg-usage} suggests that we put all variables that occur more than 85,000 times in the low registers.
However, the B-stack pointer has to be SP:
the B- and C-stacks are combined into one
- (an optimisation suggested early on by \cite{execspecs} \todo{exact ref.})
and the C-stack pointer should be the system pointer to allow for an efficient foreign function interface.
-% Before optimisation:
-% pi@rasppi:~/clean/src/compiler$ ls -l a.out
-% -rwxr-xr-x 1 pi pi 9987016 Dec 7 21:46 a.out
-% pi@rasppi:~/clean/src/compiler$ strip --strip-all a.out
-% pi@rasppi:~/clean/src/compiler$ ls -l a.out
-% -rwxr-xr-x 1 pi pi 5034316 Dec 7 21:46 a.out
-
-% After optimisation as suggested by John:
-% 1 r9 <-> r5
-% 2 r0 -> r10
-% 3 r1 -> r11
-% 4 r10 -> r1
-% 5 r11 -> r12
-% 6 r12 -> r0
-% static char *register_name[15]={"sp","r1","r5","r12","r8","r7","r6","r4","r3","r2","r11","r10","r9","r0","r14"};
-% static int register_order[15]={13,1,5,12,8,7,6,4,3,2,11,10,9,0,14};
-% pi@rasppi:~/clean/src/compiler$ ls -l a.out
-% -rwxr-xr-x 1 pi pi 7591584 Dec 7 22:16 a.out
+The third column in \cref{tab:reg-alloc:arm-and-thumb} shows the proposed allocation for Thumb.
+The heap counter and the A-stack pointer are swapped, as are A2 and B2.
+The registers for S0, B4, the heap pointer, B3 and A3 have been rotated.
+This way, all eight most often used registers are in the lower half except the B-stack pointer.
+
+\subsection{Results}
+\label{sec:reg-alloc:results}
+To measure the code size improvement introduced by this optimisation,
+ we again take the Clean compiler and compile it in three different ways:
+
+\begin{itemize}[itemsep=0pt]
+ \item
+ With the existing ARM backend \parencite{armcg}.
+ \item
+ With the new Thumb backend, leaving the register allocation as in the ARM backend.
+ \item
+ With the new Thumb backend, with the optimised register allocation as in the third column of \cref{tab:reg-alloc:arm-and-thumb}.
+ \item
+ With the new Thumb backend, with the optimised register allocation and leaving out any \ual{.align} directives to approximate object-code level efficiency.
+\end{itemize}
+
+The object code generator has not been finished yet at the moment these measurements were done,
+ so we can only estimate its efficiency.
+A modification described in \cref{sec:storing-pc:solution} required us to add \ual{.align} to jump instructions, adding a \ual{nop} instruction in approximately half the instances.
+In the last measurement, we approximate object-code level efficiency by leaving out these \ual{.align} directives, as they will not be needed in the object code generator.
+
+We only measure the size of generated code, that is, all \texttt{.text} segments except those that belong to the RTS or external libraries.
+These measurements were done on the system described in \cref{sec:system}.
+One may argue that the Clean compiler is not representative code because of some peculiarities like large lookup tables and arrays that are rarely used in practice.
+For this reason, we compare both the total code size and the size of a part of the lexer \parencite[\texttt{frontend/scanner.icl}]{cocl},
+ which can be considered representative.
+The results are in \cref{fig:reg-alloc:results}.
+
+We see that even without the optimisation discussed in this section, some code size is saved.
+After all, at least some instructions can be made 16-bit, and in the sections above we have hardly ever had to add extra instructions.
+However, optimising register allocation allows us to save much more, up to almost 20\%.
\end{multicols}
diff --git a/thesis/storing-pc.tex b/thesis/storing-pc.tex
index 40da1d1..bfb8442 100644
--- a/thesis/storing-pc.tex
+++ b/thesis/storing-pc.tex
@@ -121,13 +121,18 @@ We then get, for the example from \texttt{armstartup.s:540-2}:
We store the value \ual{0x2d}.
This address points to the \ual{tst} instruction as before, with the LSB set to 1 to indicate Thumb mode.
-In assembly code, we can solve this by adding labels for the addresses we want to store on the stack.
-In object code, we need to keep track of the current alignment and add either 9 or 11 to the read program counter.
-
The offset, 9, is calculated as the number of bytes to the instruction after the branch plus one for Thumb mode.
For \ual{b} and \ual{bl} instructions, this means an offset of 9, since these instructions are 32-bit.
The \ual{bx} and \ual{blx} instructions are 16-bit, and require an offset of 7.
+The second problem is that
+ when the \ual{add} instruction is located at the start of a halfword (e.g. \ual{0x22}),
+ the value that is read for PC will still be \ual{0x20}, as it is word-aligned~\parencite[A5.1.2]{armv7m}
+When generating object code, we need to keep track of the current alignment and add either 7, 9 or 11 to the read program counter,
+ depending on both the alignment and the size of the branch instruction.
+When generating assembly code, we are less concerned with efficiency.
+In this code generator, we simply force-align the \ual{add} instruction (by adding an \ual{.align} directive, which will insert a \ual{nop} if necessary).
+
\subsection{Implementation details}
\label{sec:storing-pc:implementation}
A jump always jumps to either a label (with \ual{b} or \ual{bl}) or a register (with \ual{bx} or \ual{blx}).
diff --git a/thesis/system.tex b/thesis/system.tex
new file mode 100644
index 0000000..669329e
--- /dev/null
+++ b/thesis/system.tex
@@ -0,0 +1,24 @@
+\section{System setup}
+\label{sec:system}
+
+\begin{multicols}{2}
+
+Tests were run on a Raspberry Pi 3, Model B with 1GB RAM.
+Its processor is listed in \texttt{/proc/cpuinfo} as an `ARMv7 Processor rev 4 (v7l)',
+ although the Hardware field is `BCM2709', which is an ARMv8 chip.
+
+To build programs, we used the following suite:
+
+\begin{itemize}[itemsep=0pt]
+ \item gcc (Raspbian 4.9.2-10) 4.9.2
+ \item GNU assembler (GNU Binutils for Raspbian) 2.25
+ \item GNU ld (GNU Binutils for Raspbian) 2.25
+ \item A Clean compiler built from intermediate ABC files retrieved on December 7, 2016 \parencite{cocl},
+ built with the ARM code generator revision 295 \parencite{armcg}
+ and the ARM RTS revision 387 \parencite{armrts}.
+\end{itemize}
+
+These programs are given \texttt{-march=armv7-a} when applicable to optimise for the ARMv7 architecture.
+The Thumb backend proposed in this thesis uses features that are deprecated by ARMv8-A (see \cref{sec:intro:conditionals}).
+
+\end{multicols}
diff --git a/thesis/terms.tex b/thesis/terms.tex
index 707b956..bb5e6a1 100644
--- a/thesis/terms.tex
+++ b/thesis/terms.tex
@@ -1,6 +1,8 @@
\section{Terminology}
\label{sec:terminology}
+\begin{multicols}{2}
+
\begin{description}
\item[HNF]
Head normal form, the state of a graph node that cannot be rewritten itself (though it children may).
@@ -22,3 +24,5 @@
Unified Assembler Language.
\todo{reference, more explanation.}
\end{description}
+
+\end{multicols}
diff --git a/thesis/thesis.tex b/thesis/thesis.tex
index 2ab1667..fed073f 100644
--- a/thesis/thesis.tex
+++ b/thesis/thesis.tex
@@ -35,6 +35,7 @@ The code generator proposed here can be used in situations where small code size
\cleardoublepage
\fi
+\setcounter{tocdepth}{2}
\ifdraft
\begin{multicols}{2}
\tableofcontents
@@ -58,8 +59,10 @@ The code generator proposed here can be used in situations where small code size
\ifdraft\else
\cleardoublepage
\fi
+\let\section\oldsection
\appendix
\input{terms}
+\input{system}
\ifdraft\else
\cleardoublepage