diff options
author | Camil Staps | 2023-01-30 21:36:34 +0100 |
---|---|---|
committer | Camil Staps | 2023-01-30 21:36:34 +0100 |
commit | 6b26fd43a6bfc4b45798c4e15c47159db9f880f4 (patch) | |
tree | 0ed25dc7b03e9e618df9681f801a19db82dc71c9 /doc | |
parent | Align on halfwords instead of double words; use data/text boundary to disting... (diff) |
Add documentation
Diffstat (limited to 'doc')
-rw-r--r-- | doc/docs/backend/registers.md | 29 | ||||
-rw-r--r-- | doc/docs/backend/rts.md | 205 | ||||
-rw-r--r-- | doc/docs/frontend/syntax.md | 80 | ||||
-rw-r--r-- | doc/docs/index.md | 45 | ||||
-rw-r--r-- | doc/mkdocs.yml | 20 | ||||
-rw-r--r-- | doc/requirements.txt | 1 |
6 files changed, 380 insertions, 0 deletions
diff --git a/doc/docs/backend/registers.md b/doc/docs/backend/registers.md new file mode 100644 index 0000000..60a9f20 --- /dev/null +++ b/doc/docs/backend/registers.md @@ -0,0 +1,29 @@ +--- +vim: noexpandtab tabstop=2 shiftwidth=2 +--- + +# Registers + +The following registers have a special use in the Snug run time system: + +| # | Name | Standard use | Snug use | +|---:|-------|--------------|----------| +| 0 | `$0` | Always zero. | | +| 1 | `$at` | Reserved for the assembler. | | +| 2-3 | `$v0`-`$v1` | Function result, callee-saved(?). | | +| 4-7 | `$a0`-`$a3` | Integer argument, caller-saved. | | +| 8-15 | `$t0`-`$t7` | Caller-saved GPR. | | +| 16 | `$s0` | Callee-saved GPR.| [Back pointer](rts.md#evaluation) for printing. | +| 17 | `$s1` | id. | [Front pointer](rts.md#evaluation) for printing. | +| 18 | `$s2` | id. | [Back pointer](rts.md#evaluation) for evaluation. | +| 19 | `$s3` | id. | [Front pointer](rts.md#evaluation) for evaluation. | +| 20 | `$s4` | id. | | +| 21 | `$s5` | id. | | +| 22 | `$s6` | id. | | +| 23 | `$s7` | id. | [Data/text region boundary](rts.md#node-layout). | +| 24-25 | `$t8`-`$t9` | Caller-saved GPR. | | +| 26-27 | `$k0`-`$k1` | Reserved for kernel use. | | +| 28 | `$gp` | Global pointer. | Heap pointer. | +| 29 | `$sp` | Stack pointer. | | +| 30 | `$fp` | Frame pointer / callee-saved GPR. | | +| 31 | `$ra` | Return address / caller-saved GPR. | | diff --git a/doc/docs/backend/rts.md b/doc/docs/backend/rts.md new file mode 100644 index 0000000..d401f3a --- /dev/null +++ b/doc/docs/backend/rts.md @@ -0,0 +1,205 @@ +--- +vim: noexpandtab tabstop=2 shiftwidth=2 +--- + +# The Snug run time system + +The Snug run time system is based on graph reduction. It is based on two +sources of inspiration: + +- The run time system of [Clean][], which I got to know in depth through my + work on the [ABC interpreter][]. I am grateful for many discussions with John + van Groningen and Rinus Plasmeijer during my work on Clean. +- The 1987 book *The implementation of functional programming languages* by + Simon L. Peyton Jones (henceforth IFPL). + +At times, unorthodox choices have been made to reduce memory usage as much as +possible, sacrificing speed when necessary. + +## Node layout + +There are two types of nodes: head normal forms (hnfs) and thunks. + +Both types of nodes are of variable width. Thunks have at least 2 words so that +they can always be overwritten with an indirection. Hnfs have no minimum size. + +The first word of a hnf node points to a [descriptor](#hnf-descriptors) +containing information such as arity and constructor name (for printing). The +first word of a thunk node points to the code that can evaluate the thunk. +Information about the arity can be found above this code (see +[thunk descriptors](#thunk-descriptors)). Thus the nodes look like this: + +| Descriptor | Argument 1 | Argument 2 | +|------------|---------------|---------------| +| CONS | (ptr to head) | (ptr to tail) | + +| Descriptor | Argument 1 | Argument 2 | +|------------|--------------|--------------| +| ++ | (ptr to arg) | (ptr to arg) | + +The two types of nodes can be distinguished because the descriptor of hnf nodes +is in the data section, while the code to evaluate a thunk is in the text +section. To quickly check whether a node is a hnf or a thunk, the address of +the boundary of the data and text section is held in a +[register](registers.md). + +The arguments of some built-in constructors can be unboxed: + +| Descriptor | Argument 1 | +|------------|------------| +| INT | 37 | + +## Descriptors + +### Hnf descriptors +Descriptors of hnf nodes are located in the data section. They contain +information about the arity of the node. It also contains the number of +arguments to be curried in, which is 0 for all data constructors but non-zero +for the [descriptors of closures](#closure-descriptors). (This means that, in +principle, data constructor descriptors only need a halfword.) + +```mipsasm + .align 1 +__main_cTuple: + .byte 2 # pointer arity + .byte 0 # basic value arity + .byte 0 # number of arguments to be curried in +``` + +### Thunk descriptors +Thunk nodes point to the code that can evaluate them. Above this code a +descriptor is used storing information about arity and strictness: + +```mipsasm + .align 1 + .byte 0x1 # strict in 1st argument + .byte 2 # arity +__main_nappend: + # ... +``` + +!!! warning "Implementation may change" + Strictness will probably be removed in the future, as evaluating arguments + will be moved out of the driver into generated code (see + [evaluation](#evaluation)). + + This also makes the implementation of `ap` simpler, as it does not need to + check strictness of the function it evaluates any more. + +### Closure descriptors +For each thunk with a non-zero number of arguments there is also a closure +descriptor table in the data section. This is a table of hnf descriptors used +for closures (unsaturated function applications) of the thunk. It looks like +this: + +```mipsasm + .align 2 + # entry for closure with 0 arguments: + .byte 0 # pointer arity + .byte 0 # basic value arity + .byte 2 # number of arguments that are still to be curried in + .byte 0 # reserved + # entry for closure with 1 argument: + .byte 1 # pointer arity + .byte 0 # basic value arity + .byte 1 # number of arguments that are still to be curried in + .byte 0 # reserved + # pointer to corresponding code entry: +__main_uappend: + .word __main_nappend +``` + +The node of a closure that still needs *n* arguments is given the descriptor +`__MODULE_uFUNCTION`-4\**n*. + +Data descriptors also have a closure descriptor table. The final word points to +a thunk entry that simply builds a hnf of the corresponding saturated data +descriptor. + +Arguments can be curried into a closure using a special `ap` thunk. It checks +the number of arguments that are still to be curried in. If this number is 1, +it evaluates the function; if not, it creates a new closure: + +```mipsasm + .align 1 + .byte 0x1 # strict in the first argument + .byte 2 # pointer arity +_nap: + # custom implementation in assembly +``` + +!!! note "Optimization" + This can be optimized for space by adding `ap2`, `ap3`, etc., for currying in + multiple arguments at once. That way we can, for example, use a 5-word `ap3` + node for currying in three arguments, instead of using a chain of three `ap` + nodes (9 words in total). However, since this is presumably not very + frequent, it is not high priority, especially not for large numbers of + arguments. + +## Evaluation + +Evaluation is done using pointer reversal so that the stack can be avoided. The +technique is described in IFPL ยง11.6.1. The idea is as follows. + +We have a *back pointer* which initially points to a special ROOT node. We have +a *front pointer* which initially points to the root of the graph to be +evaluated. During evaluation, the front pointer will always point to the node +under evaluation, and the back pointer will always point to its parent node. + +To evaluate a node, its strict arguments must be evaluated. To do this, the +following steps are executed simultaneously: + +- The pointer to the child node is updated with the value of the back pointer + and marked by setting the least significant bit. +- The back pointer is updated to the front pointer. +- The front pointer is updated to the child node. +- Jump to the evaluation entry of the child node. + +The old back pointer can be recovered by finding the marked cell in the parent +node, which is now pointed to by the back pointer. Therefore, after evaluation +of the child node, the following steps are performed simultaneously: + +- The cell pointing to the child node is recovered by checking which argument + of the parent node (pointed to by the back pointer) is marked. +- The cell pointing to the child node is updated with the new address of the + child node (it may have changed if the old node could not be overwritten + because it was smaller, or in case garbage collection ran). +- The back pointer is set to the old back pointer (the value in the cell + pointing to the child node). +- The front pointer is set to the back pointer. +- Jump to the evaluation entry of the parent node. + +The parent node will then check again if the child that has just been evaluated +needs to be evaluated. This isn't terribly efficient, but we assume there to be +few strict arguments. + +!!! warning "Implementation may change" + At the moment, thunk descriptors contain information about strictness. This + way, the central `eval` driver can check if arguments of a thunk node need to + be evaluated before evaluating the thunk itself. + + It is more efficient, and does not cost a lot in terms of program size, if + thunks get a separate entry point that first evaluates its strict arguments. + The descriptor then does not need to include strictness information, and the + central `eval` driver does not need to use this to dynamically evaluate + children. + +## Garbage collection + +Not implemented yet: a variant of Jonkers' algorithm, using the most +significant bit for pointer reversal. The most significant bit is always set on +the PIC32 devices we target. + +## Reserved bits + +The design leaves some space in addresses for special use of the run time +system: + +- The least significant bit is always zero and can be used for pointer reversal + during [evaluation](#evaluation). +- The most significant bit is always one (check the data sheet for the memory + map), so it can be used for pointer reversal during [garbage + collection](#garbage-collection). + +[ABC interpreter]: https://clean-lang.org/pkg/abc-interpreter +[Clean]: https://clean-lang.org diff --git a/doc/docs/frontend/syntax.md b/doc/docs/frontend/syntax.md new file mode 100644 index 0000000..836201c --- /dev/null +++ b/doc/docs/frontend/syntax.md @@ -0,0 +1,80 @@ +--- +vim: noexpandtab tabstop=2 shiftwidth=2 +--- + +# Syntax + +Snug syntax is based on [S-expressions][] to simplify parsing, with some +modifications. + +Importantly, parentheses are meaningful as they are used to express lists in +the syntax. For instance, the list of arguments to a function is enclosed in +parentheses. In such contexts it is an error to use redundant parentheses, or +to omit parentheses in case of a single element. + +## Data types + +Algebraic data types can be defined with: + +```snug +(data List (a) ( + Nil + (Cons a (List a)))) +``` + +In abstract terms, the syntax is: + +```snug +(data NAME (VARIABLES) (CONSTRUCTORS)) +``` + +where constructors are of the form `NAME` (without arguments) or +`(NAME TYPE [TYPE..])` (with arguments). + +Type synonyms can be defined with: + +```snug +(type String (List Char)) +``` + +## Functions + +```snug +(fun length ((xs : List a)) : Int + (length_acc 0 xs)) + +(fun length_acc ((n : Int) (xs : List a)) : Int + (case xs ( + (Nil -> n) + (Cons _ xs -> length_acc (+ n 1) xs)))) +``` + +In abstract terms, the syntax is: + +```snug +(fun NAME (ARGUMENTS) : TYPE (EXPRESSION)) +``` + +where arguments are of the form `(NAME : TYPE)`. + +## Expressions + +Expressions are: + +- Basic values like `37` or `'A'` +- Symbol identifiers, either global or local +- Data constructor identifiers +- Function applications +- Case expressions + +Case expressions have the following syntax: + +```snug +(case EXPRESSION (ALTERNATIVES)) +``` + +and alternatives have the form `(PATTERN -> EXPRESSION)`. Patterns use the same +form as expressions, but only basic values, symbol identifiers (for variable +binding), and fully saturated data constructors are allowed. + +[S-expressions]: https://en.wikipedia.org/wiki/S-expression diff --git a/doc/docs/index.md b/doc/docs/index.md new file mode 100644 index 0000000..c2bebe4 --- /dev/null +++ b/doc/docs/index.md @@ -0,0 +1,45 @@ +--- +vim: noexpandtab tabstop=2 shiftwidth=2 +--- + +# Snug + +Snug is a lazy functional programming language. It is designed to be tiny in +memory footprint of both generated code and run time memory use, so that it can +run on embedded systems. + +The current goal is to be able to run a Snug compiler written in Snug on a +[PIC32MZ2048EFG064][]. This microcontroller has 2MB flash memory, 512KB SRAM, +and a MIPS32 instruction set (with microMIPS code compression). + +The end goal is as follows: + +1. The microcontroller has access to a flash drive or SD card with a basic file + system. +2. The kernel contains a basic terminal application and can run programs from + the external memory. To this end it dynamically loads these programs into + RAM and executes from RAM. +3. There is an application to edit text files. +4. The Snug compiler is just one of these programs, and was compiled from Snug + itself. It outputs human-readable assembly code. +5. A basic assembler (presumably written in C) can turn this output into object + code. +6. A linker (presumably written in C) can link several object code files + together into an executable program. +7. Combining steps 3-6, possibly using a `make`-like program, enables the user + to edit, compile, and run new Snug programs entirely on the embedded system. + +Before we can accomplish this some intermediate steps are required: + +1. At first the compiler is written in [Clean][], and we use an assembler and + linker on a PC. This way an executable compiler can be built entirely on a + PC. +2. The next step is to test how much RAM the compiler requires. It should be + able to compile reasonably large programs with only 512KB of RAM (and + actually less, because part of the RAM must be reserved for the kernel and + to hold the program itself). +3. After this a basic assembler and linker can be written. If these are written + in C it is easily possible to test them on a PC. + +[Clean]: https://clean-lang.org +[PIC32MZ2048EFG064]: https://www.microchip.com/en-us/product/PIC32MZ2048EFG064 diff --git a/doc/mkdocs.yml b/doc/mkdocs.yml new file mode 100644 index 0000000..ca7f7ec --- /dev/null +++ b/doc/mkdocs.yml @@ -0,0 +1,20 @@ +site_name: Snug + +theme: + name: readthedocs + hljs_languages: + - mipsasm + +markdown_extensions: + - admonition + - toc: + permalink: true + - smarty + +nav: + - Introduction: index.md + - The Snug frontend: + - Syntax: frontend/syntax.md + - The Snug backend: + - Run time system: backend/rts.md + - Registers: backend/registers.md diff --git a/doc/requirements.txt b/doc/requirements.txt new file mode 100644 index 0000000..c212d74 --- /dev/null +++ b/doc/requirements.txt @@ -0,0 +1 @@ +mkdocs>=1.1.2 |