From 45e52506a41870afa57389fe9741e6bfd57408e8 Mon Sep 17 00:00:00 2001 From: Camil Staps Date: Tue, 22 Nov 2016 23:56:49 +0100 Subject: More intro; draft mode --- thesis/Makefile | 6 +- thesis/intro.tex | 154 ++++++++++++++++++++++++++++++++++++++++++++++++-- thesis/preamble.tex | 30 +++++----- thesis/results.tex | 2 + thesis/storing-pc.tex | 18 +++--- thesis/thesis.bib | 9 +-- thesis/thesis.tex | 49 +++++++++++----- thesis/two-bits.tex | 5 +- 8 files changed, 218 insertions(+), 55 deletions(-) diff --git a/thesis/Makefile b/thesis/Makefile index f8cec89..3dc4522 100644 --- a/thesis/Makefile +++ b/thesis/Makefile @@ -26,5 +26,7 @@ all: $(DOC).pdf $(RM) $(basename $<).mlog clean: - $(RM) $(addprefix $(DOC).,aux bbl bcf blg fmt log mlog run.xml out pdf toc)\ - $(DOC)-blx.bib + $(RM) -r\ + $(addprefix $(DOC).,aux bbl bcf blg fmt log mlog run.xml out pdf toc)\ + $(DOC)-blx.bib\ + _minted-$(DOC) diff --git a/thesis/intro.tex b/thesis/intro.tex index 9299305..85f13b8 100644 --- a/thesis/intro.tex +++ b/thesis/intro.tex @@ -116,23 +116,167 @@ A Clean program consists of a list of rewrite rules, for example: \begin{minted}{clean} fac 0 = 1 - fac n = n * (fac (n - 1)) + fac n = n * fac (n - 1) Start = fac 4 \end{minted} Executing a Clean program means rewriting the \clean{Start} rule until no rewriting is possible any more. The first rewriting steps of the above program are shown in \cref{fig:intro:rewriting}. -A Clean program is compiled to ABC-code, a platform-independent bytecode for an abstract graph rewriting machine~\parencite{execspecs}. +\subsubsection{The ABC-machine} +\label{sec:intro:clean:abc} +A Clean program is compiled to ABC-code, a platform-independent language for the ABC-machine, an abstract graph rewriting machine~\parencite{execspecs}. A code generator is used to generate machine code equivalent to the ABC-code for concrete machines. This, again, is done in several steps, so that only a relatively small part of the compilation process is target-dependent. -For adapting the ARM code generator to work for Thumb-2 one needs to have only very basic knowledge of the ABC-machine. -For this reason we will not discuss its internals here --- + +The ABC-machine is \enquote{an imperative abstract machine architecture for graph rewriting}~\parencite{execspecs}. +It consists of three stacks: for arguments (A), basic values (B) and control information like return addresses (C). +It also has a program store containing the graph rewriting rules that make up the program, + a graph store that contains the graph to be rewritten, + and several other elements that we need not mention here. + +We do not discuss the internals of the ABC-machine in much depth here --- they have been described in \cite{execspecs} and \cite{m680x0}. +However, to get a better idea of the working of the ABC-machine, + we consider the ABC-code for the factorial example above briefly. + +The two rules for \clean{fac 0} and \clean{fac n} are translated to the following ABC-code% + \footnote{The actual code generated by the compiler is slightly larger, since it uses some unnecessary instructions. + They are removed automatically by the code generator and are not considered here for brevity.}: + +\begin{minted}[tabsize=4]{abc} + sfac.1 + eqI_b 0 0 | Is argument on B-stack 0? + jmp_true case.1 | Jump to first alternative + jmp case.2 | Jump to second alternative + case.1 | First alternative (fac 0) + pop_b 1 | Pop argument from B-stack + pushI 1 | Push result to B-stack + rtn | Return + case.2 | Second alternative (fac 1) + pushI 1 | Push 1 to B-stack + push_b 1 | Copy argument to B-stack + subI | Subtract on B-stack + jsr sfac.1 | Call factorial + mulI | Multiply two stack elements + rtn | Return +\end{minted} + +The code under \abc{sfac.1} is for the pattern match on the first argument. +In \abc{case.1}, we handle the case that it is zero. +We only need to remove it from the stack and push the return value 1. +In \abc{case.2}, we copy the argument, decrement it and recursively call \abc{sfac.1}. +After the \abc{jsr} instruction, we will have two elements on the stack: + the result of the recursive call and the original argument. +We can the multiply them and return. + +This function uses the B-stack, for basic values, for its arguments and return values, because it uses only integers. +More complex types would be passed on the A-stack. + +The \clean{Start} rule compiles to% + \footnote{Again, the code has been shorted insignificantly for brevity.}: + +\begin{minted}[tabsize=4]{abc} + __fac_Start + build _ 0 nStart.2 | Build the Start node + jmp _driver | Jump to the RTS + nStart.2 | The Start node entry + jsr eaStart.2 | Jump to actual code + fillI_b 0 0 | Copy result to A-stack + pop_b 1 | Pop B-stack + rtn | Return + eaStart.2 | Actual code + pushI 4 | Push 4 to B-stack + jmp sfac.1 | Call factorial +\end{minted} + +When the program starts and \abc{__fac_Start} is executed, a node for \clean{Start} is created in the graph store. +We then jump to the run-time system (discussed in \cref{sec:intro:clean:rts}), where the \abc{_driver} subroutine will make sure that this node is reduced and printed. +In \abc{nStart.2}, we call \abc{eaStart.2}. +This subroutine will execute \clean{fac 4} and return the result on the B-stack. +Since \abc{nStart.2} is expected to return its result on the A-stack, we need to copy it and clean up the B-stack. +In \abc{eaStart.2}, we only need to set the argument for the factorial, 4, on the stack, and we jump to \abc{sfac.1} to let it do the rest. + +\subsubsection{The code generator} +\label{sec:intro:clean:cg} +The code generator translates the abstract ABC-code into concrete machine code. +While the ABC-machine can be implemented in a straightforward manner using macro expansion, + Clean's code generator does more than that: + it introduces several optimisations, some of which are target-dependent. + +The ARM code for the factorial example is as follows% + \footnote{Some irrelevant peculiarities have been removed for brevity and clarity.}: + +\begin{minted}{ual} + sfac_P1: + cmp r4,#0 @ Is argument 0? + bne case_P2 + case_P1: + mov r4,#1 @ Return 1 + ldr pc,[sp],#4 + case_P2: + str r4,[sp,#-4]! @ Copy argument + add r4,r4,#-1 @ Decrement argument + str pc,[sp,#-4]! + bl sfac_P1 @ Call factorial + ldr r12,[sp],#4 @ Multiply with own argument + mul r4,r12,r4 + ldr pc,[sp],#4 +\end{minted} + +We see that the argument is passed in \ual{r4}, and that register is also used for the result. +In ARM, \ual{r4} through \ual{r0} are used for the top of the B-stack. +The machine stack pointer \ual{sp} is used as the B-stack pointer. +For intermediate results, \ual{r12} is used as a scratch register. +When a reduction has finished, control is passed back to the callee by loading the PC from the stack (\ual{ldr pc,[sp],#4}). + +The \clean{Start} rule translates roughly to: + +\begin{minted}{ual} + ____fac__Start: + ldr r12,=nStart_P2 @ Store Start node on the heap + str r12,[r10] + mov r6,r10 @ Save node pointer on A-stack + add r10,r10,#12 @ Reserve heap space + b __driver @ Jump to RTS + nStart_P2: + str r6,[r9],#4 @ Save node entry + str pc,[sp,#-4]! @ Save return address + bl eaStart_P2 @ Jump to actual code + ldr r6,[r9,#-4]! @ Restore node entry + ldr r12,=INT+2 @ Make an Int node on A-stack + str r12,[r6] + str r4,[r6,#4] @ Give that node value r4 + ldr pc,[sp],#4 @ Return to callee + eaStart_P2: + mov r4,#4 @ Push 4 to B-stack + sfac_P1 @ A jump to sfac_P1 is not + @ ... @ needed as it is right below +\end{minted} + +For the A-stack, \ual{r6} through \ual{r8} and \ual{r11} are used, and \ual{r9} is the A-stack pointer. +Nodes (the graph store of the ABC-machine) are stored on a heap which uses \ual{r10} as a pointer. +To store the \clean{Start} node on the heap, we need to write its address and then increase \ual{r10} to reserve the space. +The \ual{__driver} subroutine will reduce the top of the A-stack, so saving the pointer to the \clean{Start} node in \ual{r6}, the top of the A-stack, makes sure that \ual{__driver} jumps to \ual{nStart_P2}. +There, we do some administration and jump to \ual{eaStart_P2}. +In that routine, the literal \ual{4} is pushed to the B-stack and we \enquote{jump} to \ual{sfac_P1} --- + the code generator optimises this by placing \ual{sfac_P1} right below and removing the branch instruction. +When \ual{sfac_P1} returns, we continue in \ual{nStart_P2}. +There we need to copy the result from the B-stack to the A-stack. +Since the B-stack is untyped (the compiler makes sure that it is safe), while the A-stack is typed, + we need to create a node on the heap with the \ual{INT+2} entry address. + +Note that ABC instructions like \abc{pushI 4} (push 4 to the B-stack) are translated to ARM code like \ual{mov r4,#4}: + the content of the B-stack is not moved down, as one might expect. +This is possible because the code generator knows that the B-stack will be empty at this point. +Had it not been empty, but had one element, for example, a \ual{mov r3,r4} instruction would have preceded to move it down. +The actual ABC-code contains annotations that tell the code generator how many elements the stacks hold, so that these optimisations can be made. +\subsubsection{The run-time system} +\label{sec:intro:clean:rts} After compilation, a Clean program is linked together with the Clean run-time system. The RTS ensures that that the \clean{Start} node is reduced and printed and takes care of garbage collecting. -This system is written in C and assembly, so making Clean available on Thumb-2 inherently means adapting the platform-dependent parts of the RTS as well. +This system is written in Clean, C and assembly, so making Clean available on Thumb-2 inherently means adapting the platform-dependent parts of the RTS as well. \subsection{A Thumb backend for Clean} \label{sec:intro:clean-thumb} diff --git a/thesis/preamble.tex b/thesis/preamble.tex index 636d2ad..ae16a89 100644 --- a/thesis/preamble.tex +++ b/thesis/preamble.tex @@ -1,6 +1,13 @@ \documentclass[a4paper,twoside]{article} -\usepackage[hmarginratio={3:2}]{geometry} +\newif\ifdraft +\drafttrue + +\ifdraft + \usepackage[inner=2cm,outer=2cm,top=2cm,bottom=25mm]{geometry} +\else + \usepackage[hmarginratio={3:2}]{geometry} +\fi \usepackage{multicol} \usepackage[T1]{fontenc} @@ -30,27 +37,16 @@ \usepackage{latexgit} \usepackage{minted} -\usepackage{mdframed} -% Need to patch mdframed, see http://tex.stackexchange.com/a/292090/23992 -\usepackage{xpatch} -\makeatletter -\xpatchcmd{\endmdframed} - {\aftergroup\endmdf@trivlist\color@endgroup} - {\endmdf@trivlist\color@endgroup\@doendpe} - {}{} -\makeatother \definecolor{mintedbg}{rgb}{0.95,0.95,0.95} -\surroundwithmdframed[% - backgroundcolor=mintedbg, - linewidth=0pt, - skipabove=.5em,skipbelow=.5em]{minted} \setminted{% autogobble, tabsize=4, fontsize=\scriptsize, - style=lovelace - %bgcolor=mintedbg + style=lovelace, + bgcolor=mintedbg } +\BeforeBeginEnvironment{minted}{\noindent\begin{minipage}{\linewidth}} +\AfterEndEnvironment{minted}{\end{minipage}} \setmintedinline{fontsize=\footnotesize,bgcolor={}} \usepackage{ual} \usepackage{clean} @@ -74,7 +70,7 @@ \makeatother \let\oldsection\section -\renewcommand{\section}{\cleardoublepage\oldsection} +\renewcommand{\section}{\ifdraft\else\clearpage\fi\oldsection} \raggedcolumns \renewcommand{\baselinestretch}{1.1} diff --git a/thesis/results.tex b/thesis/results.tex index 8ffb2a7..cfd043d 100644 --- a/thesis/results.tex +++ b/thesis/results.tex @@ -5,8 +5,10 @@ \subsection{Code size} \label{sec:results:size} +\todo{\dots} \subsection{Running time} \label{sec:results:time} +\todo{\dots} \end{multicols} diff --git a/thesis/storing-pc.tex b/thesis/storing-pc.tex index b642ee9..e08fb4d 100644 --- a/thesis/storing-pc.tex +++ b/thesis/storing-pc.tex @@ -65,19 +65,19 @@ Since \clean{isEmpty} pattern matches on its first argument, it needs to be eval This is done with \abc{jsr_eval 0}. Only after that can we check if its constructor is \abc{_Nil} (the empty list), which is done with \abc{eq_desc _Nil 0 0}. -\begin{minted}[tabsize=4]{text} +\begin{minted}[tabsize=4]{abc} sisEmpty.1 - jsr_eval 0 || Evaluate argument - eq_desc _Nil 0 0 || If it equals [] - jmp_true case.1 || .. jump to case.1 - jmp case.2 || [else] to case.2 + jsr_eval 0 | Evaluate argument + eq_desc _Nil 0 0 | If it equals [] + jmp_true case.1 | .. jump to case.1 + jmp case.2 | [else] to case.2 case.1 - pop_a 1 || Pop argument - pushB TRUE || Return True + pop_a 1 | Pop argument + pushB TRUE | Return True rtn case.2 - pop_a 1 || Pop argument - pushB FALSE || Return False + pop_a 1 | Pop argument + pushB FALSE | Return False rtn \end{minted} diff --git a/thesis/thesis.bib b/thesis/thesis.bib index f7e5134..30f945c 100644 --- a/thesis/thesis.bib +++ b/thesis/thesis.bib @@ -1,24 +1,21 @@ @manual{armv7ar, label="ARM Ltd.", - key="ARM Ltd.", + key="ARM Ltd. 1996", title="ARM Architecture Reference Manual. ARMv7-A and ARMv7-R edition", - organization="ARM Ltd.", year=1996 } @manual{armv7m, label="ARM Ltd.", - key="ARM Ltd.", + key="ARM Ltd. 2006", title="ARMv7-M Architecture Reference Manual", - organization="ARM Ltd.", year=2006 } @manual{armv8a, label="ARM Ltd.", - key="ARM Ltd.", + key="ARM Ltd. 2013", title="ARM Architecture Reference Manual. ARMv8, for ARMv8-A architecture profile (Beta)", - organization="ARM Ltd.", year=2013 } diff --git a/thesis/thesis.tex b/thesis/thesis.tex index d2f1e06..8760159 100644 --- a/thesis/thesis.tex +++ b/thesis/thesis.tex @@ -1,19 +1,25 @@ %&thesis \begin{document} -\maketitleru[ - course={Bachelor Thesis\\[.4em]Computer Science}, - authorstext={Author:}, - authors={Camil Staps}, - righttextheader={First supervisor:}, - righttext={prof.~dr.~dr.h.c.~ir.~M.J.~Plasmeijer}, - righttextBheader={Second supervisor:}, - righttextB={drs.~J.H.G.~van~Groningen}, - pagenr=1] - -\setcounter{page}{2} +\ifdraft + \begin{center} + {\Large Bachelor Thesis} -\cleardoublepage + \textcolor{red}{\emph{This is a draft version. The layout has been changed to save paper.}} + \end{center} +\else + \maketitleru[ + course={Bachelor Thesis\\[.4em]Computer Science}, + authorstext={Author:}, + authors={Camil Staps}, + righttextheader={First supervisor:}, + righttext={prof.~dr.~dr.h.c.~ir.~M.J.~Plasmeijer}, + righttextBheader={Second supervisor:}, + righttextB={drs.~J.H.G.~van~Groningen}, + pagenr=1] + \setcounter{page}{2} + \cleardoublepage +\fi \begin{abstract} The Thumb-2 instruction set combines the best features of the ARM and Thumb instruction sets (speed and small code size, respectively). @@ -23,26 +29,41 @@ Specifically, we look at code generation for the purely functional programming l \todo{results, efficiency, etc.} \end{abstract} +\ifdraft\else \cleardoublepage +\fi -\thispagestyle{empty} -\tableofcontents +\ifdraft + \begin{multicols}{2} + \tableofcontents + \end{multicols} +\else + \thispagestyle{empty} + \tableofcontents +\fi +\ifdraft\else \cleardoublepage +\fi \input{intro} \input{storing-pc} \input{two-bits} \input{results} +\ifdraft\else \cleardoublepage +\fi \appendix \input{abbr} +\ifdraft\else \cleardoublepage +\fi \let\oldurl\url \renewcommand{\url}[1]{{\small\oldurl{#1}}} +\phantomsection\addcontentsline{toc}{section}{\bibname} \printbibliography \end{document} diff --git a/thesis/two-bits.tex b/thesis/two-bits.tex index fcd4046..56e7c45 100644 --- a/thesis/two-bits.tex +++ b/thesis/two-bits.tex @@ -1,4 +1,4 @@ -\section{Two bits} +\section{Code addresses} \label{sec:two-bits} \begin{multicols}{2} @@ -44,7 +44,8 @@ Flipping the meaning of the LSB in the copying collector fixes this issue. \subsection{Comparison} \label{sec:two-bits:comparison} -Flipping the meaning of the LSB in the garbage collector amounts to swapping \ual{bne} and \ual{beq} and similar changes that do not effect the program's efficiency. +Flipping the meaning of the LSB in the garbage collector amounts to swapping \ual{bne} and \ual{beq} and similar changes that do not effect the program's efficiency or size. + By word-aligning all node entry addresses we lose one alignment byte per node entry address on average (assuming that half of the node entry points are word-aligned already). This increases code size slightly, but since many instructions that were 32-bit in ARM are now 16-bit, the overall code size is still smaller. -- cgit v1.2.3