diff options
Diffstat (limited to 'thesis/intro.tex')
-rw-r--r-- | thesis/intro.tex | 106 |
1 files changed, 103 insertions, 3 deletions
diff --git a/thesis/intro.tex b/thesis/intro.tex index 5be42af..0b75b49 100644 --- a/thesis/intro.tex +++ b/thesis/intro.tex @@ -1,6 +1,56 @@ \section{Introduction} \label{sec:intro} +\begin{figure*}[b] + \centering + \begin{subfigure}[b]{.2\linewidth} + \centering + \begin{tikzpicture}[clean] + \node (start) {\clean{Start}}; + \end{tikzpicture} + \caption{Initially} + \end{subfigure}% + \begin{subfigure}[b]{.2\linewidth} + \centering + \begin{tikzpicture}[clean] + \node[node args=1] (fac) {\clean{fac}}; + \node[below=of fac.arg1.south] (4) {\clean{4}}; + \draw (fac.arg1) -- (4.north); + \end{tikzpicture} + \caption{Applying \clean{Start}} + \end{subfigure}% + \begin{subfigure}[b]{.3\linewidth} + \centering + \begin{tikzpicture}[clean] + \node[node args=2] (times) {\clean{*}}; + \node[below=of times.arg1.south] (4) {\clean{4}}; + \node[node args=1,right=of times.arg2.east] (fac) {\clean{fac}}; + \node[node args=2,right=of fac.arg1.east] (minus) {\clean{-}}; + \node[right=of minus.arg2.east] (1) {\clean{1}}; + \draw (times.arg1) -- (4.north); + \draw (times.arg2) -- (fac.west); + \draw (fac.arg1) -- (minus.west); + \draw (minus.arg1) |- (4.east); + \draw (minus.arg2) -- (1.west); + \end{tikzpicture} + \caption{Applying \clean{fac n}} + \end{subfigure}% + \begin{subfigure}[b]{.2\linewidth} + \centering + \begin{tikzpicture}[clean] + \node[node args=2] (times) {\clean{*}}; + \node[below=of times.arg1.south] (4) {\clean{4}}; + \node[node args=1,right=of times.arg2.east] (fac) {\clean{fac}}; + \node[below=of fac.arg1.south] (3) {\clean{3}}; + \draw (times.arg1) -- (4.north); + \draw (times.arg2) -- (fac.west); + \draw (fac.arg1) -- (3.north); + \end{tikzpicture} + \caption{Applying \clean{-}} + \end{subfigure} + \caption{Rewriting a Clean node\label{fig:intro:rewriting}} +\end{figure*} + \begin{multicols}{2} \subsection{Introduction} @@ -24,20 +74,70 @@ Starting from ARMv6T2, an extension to the Thumb instruction set, known as Thumb This gives the ARM and Thumb instruction sets `almost identical functionality'~\citep[A1.2]{armv7ar}, whereas Thumb gives a smaller code size. +Using the Unified Assembly Language (UAL), one can write assembly code for both the ARM and the Thumb instruction set and fix the target ISA only at assemble-time. + +\label{sec:intro:interworking} +The ARM and Thumb instruction sets are designed to \emph{interwork}: + different parts of a program can be assembled for different instruction sets + and it is possible to switch instruction set when the program counter is written~\citep[A4.1]{armv7ar}. +The Thumb-2 code generator proposed in this thesis does not produce ARM code, + though the existence of the interworking facility has effects on the techniques that can be used in it. +This will be covered in \autoref{sec:two-bits}. + \subsection{Clean} \label{sec:intro:clean} Clean is `a general purpose, state-of-the-art, pure and lazy functional programming language designed for making real-world applications'~\cite{clean}. It evolved from LEAN, an intermediate language based on graph rewriting~\citep{lean}. -As such, one of the steps in the compilation of a Clean program is the translation to bytecode for the ABC-machine, an abstract graph rewriting machine~\citep{execspecs}. -A code generator is used to generate code equivalent to the ABC-code for concrete machines. -%todo +A Clean program consists of a list of rewrite rules, for example: + +\begin{minted}{clean} + fac 0 = 1 + fac n = n * (fac (n - 1)) + Start = fac 4 +\end{minted} + +Executing a Clean program means rewriting the \clean{Start} rule until no rewriting is possible any more. +The first rewriting steps of the above program are shown in \autoref{fig:intro:rewriting}. + +A Clean program is compiled to ABC-code, a platform-independent bytecode for an abstract graph rewriting machine~\citep{execspecs}. +A code generator is used to generate machine code equivalent to the ABC-code for concrete machines. +This, again, is done in several steps, so that only a relatively small part of the compilation process is target-dependent. +For adapting the ARM code generator to work for Thumb-2 one needs to have only very basic knowledge of the ABC-machine. +For this reason we will not discuss its internals here --- + they have been described in \cite{execspecs} and \cite{m680x0}. + +After compilation, a Clean program is linked together with the Clean run-time system. +The RTS ensures that that the \clean{Start} node is reduced and printed and takes care of garbage collecting. +This system is written in C and assembly, so making Clean available on Thumb-2 inherently means adapting the platform-dependent parts of the RTS as well. + +\subsection{A Thumb backend for Clean} +\label{sec:intro:clean-thumb} +In this thesis, we propose a Thumb backend for Clean. +The ARM code generator and RTS were taken as a starting point. +Thanks to the Unified Assembly Language, only little time was needed to make these systems assemble for the Thumb instruction set. + +The proposed backend does not use ARM and Thumb interworking. +The reason for this is threefold. +First, there are several processors, like the ARMv7-M series~\citep[A4.1]{armv7m}, that do support Thumb instructions but cannot run ARM code. +By only using Thumb, we target the widest possible range of devices. +Second, we doubt that using interworking can be done efficiently. +In the run-time system, only minimal time overhead is introduced by using Thumb instructions. +For generated code it would be complicated to detect if the ARM or Thumb instruction set would give better results, + and this would give significantly better results only in specific cases. +Third, the problem discussed in \autoref{sec:two-bits} could be solved efficiently only without interworking. +Using interworking would introduce overhead at every branch instruction, + since the solution to this problem would have to be adapted. \subsection{Terminology} \label{sec:intro:terminology} In this thesis, I will usually write `Thumb' where the Thumb-2 extension is meant. Only when the distinction with pre-ARMv6T2 Thumb is important will I distinguish between (early) Thumb and Thumb-2. +As for `ARM', it should be clear from the context whether the architecture or the instruction set is meant. For brevity, a number of common abbreviations is used. An overview can be found in \autoref{sec:abbreviations}. +\subsection{Organisation} +\label{sec:intro:organisation} + \end{multicols} |