1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
|
\section{Introduction}
\label{sec:intro}
\begin{multicols}{2}
\subsection{ARM, Thumb and Thumb-2}
\label{sec:intro:arm-thumb-thumb2}
ARM is a RISC architecture with several enhancements to a basic RISC architecture allowing ARM processors to
\enquote{achieve a good balance of high performance, small program size, low power consumption, and small silicon area}~\parencite[A1.1]{armv7ar}.
Several instruction sets were designed for the ARM architecture.
First of all, the 32-bit ARM ISA allows the programmer to easily make full use of all the architecture's features.
The Thumb instruction set provides a 16-bit alternative to the ARM ISA, giving in on performance to achieve improved code density.
Starting from ARMv6T2, an extension to the Thumb instruction set, known as Thumb-2, adds 32-bit instructions to Thumb to
\enquote{achieve performance similar to ARM code, with code density better than that of earlier Thumb code}~\parencite[A1.2]{armv7ar}.
This gives the ARM and Thumb instruction sets \enquote{almost identical functionality}~\parencite[A1.2]{armv7ar},
whereas Thumb gives a smaller code size.
Using the Unified Assembler Language (UAL), one can write assembly code for both the ARM and the Thumb instruction set and fix the target ISA only at assemble-time.
The main differences between ARM and Thumb-2 are the following:
\subsubsection{Conditional execution}
\label{sec:intro:conditionals}
In ARM, every instruction has a 4-bit conditional field that allows for conditional execution.
In the Thumb instruction set, all conditional instructions except branches have to be in an \enquote{IT block}.
A first \ual{it} instruction gives the base condition and a then-else pattern.
The following instructions are executed conditionally.
For example:
\begin{minted}{ual}
itte gt @ tte: then, then, else
movgt r2,r3 @ mov if gt
movgt r0,r1 @ mov if gt
movle r0,#0 @ mov if le (= not gt)
\end{minted}
This is UAL syntax.
When assembling for Thumb, an \ual{it} instruction with three \ual{mov} instructions (without conditional field) is generated.
For ARM, the \ual{it} instruction is ignored and three conditional \ual{mov} instructions are generated.
%ARMv8-A deprecates some uses of the \ual{it} instruction for performance reasons~\parencite[J5.2]{armv8a}.
\subsubsection{Register usage}
\label{sec:intro:registers}
ARM processors have sixteen registers.
ARM instructions have 4-bit register fields to address them.
Some 16-bit Thumb instructions have 3-bit register fields that can only address the lowest eight registers.
For these instructions there exist 32-bit variants that can address all sixteen registers.
\begin{figure*}[t]
\centering
\begin{subfigure}[b]{.2\linewidth}
\centering
\begin{tikzpicture}[clean]
\node (start) {\clean{Start}};
\end{tikzpicture}
\caption{Initially}
\end{subfigure}%
\begin{subfigure}[b]{.2\linewidth}
\centering
\begin{tikzpicture}[clean]
\node[node args=1] (fac) {\clean{fac}};
\node[below=of fac.arg1.south] (4) {\clean{4}};
\draw (fac.arg1) -- (4.north);
\end{tikzpicture}
\caption{Applying \clean{Start}}
\end{subfigure}%
\begin{subfigure}[b]{.3\linewidth}
\centering
\begin{tikzpicture}[clean]
\node[node args=2] (times) {\clean{*}};
\node[below=of times.arg1.south] (4) {\clean{4}};
\node[node args=1,right=of times.arg2.east] (fac) {\clean{fac}};
\node[node args=2,right=of fac.arg1.east] (minus) {\clean{-}};
\node[right=of minus.arg2.east] (1) {\clean{1}};
\draw (times.arg1) -- (4.north);
\draw (times.arg2) -- (fac.west);
\draw (fac.arg1) -- (minus.west);
\draw (minus.arg1) |- (4.east);
\draw (minus.arg2) -- (1.west);
\end{tikzpicture}
\caption{Applying \clean{fac n}}
\end{subfigure}%
\begin{subfigure}[b]{.2\linewidth}
\centering
\begin{tikzpicture}[clean]
\node[node args=2] (times) {\clean{*}};
\node[below=of times.arg1.south] (4) {\clean{4}};
\node[node args=1,right=of times.arg2.east] (fac) {\clean{fac}};
\node[below=of fac.arg1.south] (3) {\clean{3}};
\draw (times.arg1) -- (4.north);
\draw (times.arg2) -- (fac.west);
\draw (fac.arg1) -- (3.north);
\end{tikzpicture}
\caption{Applying \clean{-}}
\end{subfigure}
\caption{Rewriting a Clean node\label{fig:intro:rewriting}}
\end{figure*}
\subsubsection{Interworking}
\label{sec:intro:interworking}
The ARM and Thumb instruction sets are designed to \emph{interwork}:
different parts of a program can be assembled for different instruction sets
and it is possible to switch instruction set when the program counter is written to~\parencite[A4.1]{armv7ar}.
The Thumb-2 code generator proposed in this thesis does not produce ARM code,
though the existence of the interworking facility has effects on the techniques that can be used in it.
This will be covered in \cref{sec:two-bits}.
\subsection{Clean}
\label{sec:intro:clean}
Clean is \enquote{a general purpose, state-of-the-art, pure and lazy functional programming language designed for making real-world applications}~\parencite{clean}.
It evolved from LEAN, an intermediate language based on graph rewriting~\parencite{lean}.
A Clean program consists of a list of rewrite rules, for example:
\begin{minted}{clean}
fac 0 = 1
fac n = n * (fac (n - 1))
Start = fac 4
\end{minted}
Executing a Clean program means rewriting the \clean{Start} rule until no rewriting is possible any more.
The first rewriting steps of the above program are shown in \cref{fig:intro:rewriting}.
A Clean program is compiled to ABC-code, a platform-independent bytecode for an abstract graph rewriting machine~\parencite{execspecs}.
A code generator is used to generate machine code equivalent to the ABC-code for concrete machines.
This, again, is done in several steps, so that only a relatively small part of the compilation process is target-dependent.
For adapting the ARM code generator to work for Thumb-2 one needs to have only very basic knowledge of the ABC-machine.
For this reason we will not discuss its internals here ---
they have been described in \cite{execspecs} and \cite{m680x0}.
After compilation, a Clean program is linked together with the Clean run-time system.
The RTS ensures that that the \clean{Start} node is reduced and printed and takes care of garbage collecting.
This system is written in C and assembly, so making Clean available on Thumb-2 inherently means adapting the platform-dependent parts of the RTS as well.
\subsection{A Thumb backend for Clean}
\label{sec:intro:clean-thumb}
In this thesis, we propose a Thumb backend for Clean.
The ARM code generator and RTS were taken as a starting point.
Thanks to the Unified Assembler Language, only little time was needed to make these systems assemble for the Thumb instruction set.
The proposed backend does not use ARM and Thumb interworking.
The reason for this is threefold.
First, there are several processors, like the ARMv7-M series~\parencite[A4.1]{armv7m}, that do support Thumb instructions but cannot run ARM code.
By only using Thumb, we target the widest possible range of devices.
Second, we doubt that using interworking can be done efficiently.
In the run-time system, only minimal time overhead is introduced by using Thumb instructions.
For generated code it would be complicated to detect if the ARM or Thumb instruction set would give better results,
and this would give significantly better results only in specific cases.
Third, the problem discussed in \cref{sec:two-bits} could be solved efficiently only without interworking.
Using interworking would introduce overhead at every branch instruction,
since the solution to this problem would have to be adapted.
\subsection{Terminology}
\label{sec:intro:terminology}
In this thesis, I will usually write \enquote{Thumb} where the Thumb-2 extension is meant.
Only when the distinction with pre-ARMv6T2 Thumb is important will I distinguish between (early) Thumb and Thumb-2.
As for \enquote{ARM}, it should be clear from the context whether the architecture or the instruction set is meant.
For brevity, a number of common abbreviations is used.
An overview can be found in \cref{sec:abbreviations}.
\subsection{Organisation}
\label{sec:intro:organisation}
In much of the rest of this thesis we discuss differences between ARM and Thumb,
their influences on code generation,
and the way they were dealt with in the Thumb backend for Clean proposed in this thesis.
In \cref{sec:storing-pc}, we consider an issue arising from halfword-aligned instructions and the way a read of PC is interpreted under Thumb.
\Cref{sec:two-bits} discusses problems related to the fact that Thumb instruction addresses use bit 1 and should have bit 0 set for interworking,
while under ARM a branch will automatically clear both these bits.
The Thumb backend for Clean has been benchmarked.
The results are collected and discussed in \cref{sec:results}.
\end{multicols}
|