aboutsummaryrefslogtreecommitdiff
path: root/doc
diff options
context:
space:
mode:
authorCamil Staps2023-01-30 21:36:34 +0100
committerCamil Staps2023-01-30 21:36:34 +0100
commit6b26fd43a6bfc4b45798c4e15c47159db9f880f4 (patch)
tree0ed25dc7b03e9e618df9681f801a19db82dc71c9 /doc
parentAlign on halfwords instead of double words; use data/text boundary to disting... (diff)
Add documentation
Diffstat (limited to 'doc')
-rw-r--r--doc/docs/backend/registers.md29
-rw-r--r--doc/docs/backend/rts.md205
-rw-r--r--doc/docs/frontend/syntax.md80
-rw-r--r--doc/docs/index.md45
-rw-r--r--doc/mkdocs.yml20
-rw-r--r--doc/requirements.txt1
6 files changed, 380 insertions, 0 deletions
diff --git a/doc/docs/backend/registers.md b/doc/docs/backend/registers.md
new file mode 100644
index 0000000..60a9f20
--- /dev/null
+++ b/doc/docs/backend/registers.md
@@ -0,0 +1,29 @@
+---
+vim: noexpandtab tabstop=2 shiftwidth=2
+---
+
+# Registers
+
+The following registers have a special use in the Snug run time system:
+
+| # | Name | Standard use | Snug use |
+|---:|-------|--------------|----------|
+| 0 | `$0` | Always zero. | |
+| 1 | `$at` | Reserved for the assembler. | |
+| 2-3 | `$v0`-`$v1` | Function result, callee-saved(?). | |
+| 4-7 | `$a0`-`$a3` | Integer argument, caller-saved. | |
+| 8-15 | `$t0`-`$t7` | Caller-saved GPR. | |
+| 16 | `$s0` | Callee-saved GPR.| [Back pointer](rts.md#evaluation) for printing. |
+| 17 | `$s1` | id. | [Front pointer](rts.md#evaluation) for printing. |
+| 18 | `$s2` | id. | [Back pointer](rts.md#evaluation) for evaluation. |
+| 19 | `$s3` | id. | [Front pointer](rts.md#evaluation) for evaluation. |
+| 20 | `$s4` | id. | |
+| 21 | `$s5` | id. | |
+| 22 | `$s6` | id. | |
+| 23 | `$s7` | id. | [Data/text region boundary](rts.md#node-layout). |
+| 24-25 | `$t8`-`$t9` | Caller-saved GPR. | |
+| 26-27 | `$k0`-`$k1` | Reserved for kernel use. | |
+| 28 | `$gp` | Global pointer. | Heap pointer. |
+| 29 | `$sp` | Stack pointer. | |
+| 30 | `$fp` | Frame pointer / callee-saved GPR. | |
+| 31 | `$ra` | Return address / caller-saved GPR. | |
diff --git a/doc/docs/backend/rts.md b/doc/docs/backend/rts.md
new file mode 100644
index 0000000..d401f3a
--- /dev/null
+++ b/doc/docs/backend/rts.md
@@ -0,0 +1,205 @@
+---
+vim: noexpandtab tabstop=2 shiftwidth=2
+---
+
+# The Snug run time system
+
+The Snug run time system is based on graph reduction. It is based on two
+sources of inspiration:
+
+- The run time system of [Clean][], which I got to know in depth through my
+ work on the [ABC interpreter][]. I am grateful for many discussions with John
+ van Groningen and Rinus Plasmeijer during my work on Clean.
+- The 1987 book *The implementation of functional programming languages* by
+ Simon L. Peyton Jones (henceforth IFPL).
+
+At times, unorthodox choices have been made to reduce memory usage as much as
+possible, sacrificing speed when necessary.
+
+## Node layout
+
+There are two types of nodes: head normal forms (hnfs) and thunks.
+
+Both types of nodes are of variable width. Thunks have at least 2 words so that
+they can always be overwritten with an indirection. Hnfs have no minimum size.
+
+The first word of a hnf node points to a [descriptor](#hnf-descriptors)
+containing information such as arity and constructor name (for printing). The
+first word of a thunk node points to the code that can evaluate the thunk.
+Information about the arity can be found above this code (see
+[thunk descriptors](#thunk-descriptors)). Thus the nodes look like this:
+
+| Descriptor | Argument 1 | Argument 2 |
+|------------|---------------|---------------|
+| CONS | (ptr to head) | (ptr to tail) |
+
+| Descriptor | Argument 1 | Argument 2 |
+|------------|--------------|--------------|
+| ++ | (ptr to arg) | (ptr to arg) |
+
+The two types of nodes can be distinguished because the descriptor of hnf nodes
+is in the data section, while the code to evaluate a thunk is in the text
+section. To quickly check whether a node is a hnf or a thunk, the address of
+the boundary of the data and text section is held in a
+[register](registers.md).
+
+The arguments of some built-in constructors can be unboxed:
+
+| Descriptor | Argument 1 |
+|------------|------------|
+| INT | 37 |
+
+## Descriptors
+
+### Hnf descriptors
+Descriptors of hnf nodes are located in the data section. They contain
+information about the arity of the node. It also contains the number of
+arguments to be curried in, which is 0 for all data constructors but non-zero
+for the [descriptors of closures](#closure-descriptors). (This means that, in
+principle, data constructor descriptors only need a halfword.)
+
+```mipsasm
+ .align 1
+__main_cTuple:
+ .byte 2 # pointer arity
+ .byte 0 # basic value arity
+ .byte 0 # number of arguments to be curried in
+```
+
+### Thunk descriptors
+Thunk nodes point to the code that can evaluate them. Above this code a
+descriptor is used storing information about arity and strictness:
+
+```mipsasm
+ .align 1
+ .byte 0x1 # strict in 1st argument
+ .byte 2 # arity
+__main_nappend:
+ # ...
+```
+
+!!! warning "Implementation may change"
+ Strictness will probably be removed in the future, as evaluating arguments
+ will be moved out of the driver into generated code (see
+ [evaluation](#evaluation)).
+
+ This also makes the implementation of `ap` simpler, as it does not need to
+ check strictness of the function it evaluates any more.
+
+### Closure descriptors
+For each thunk with a non-zero number of arguments there is also a closure
+descriptor table in the data section. This is a table of hnf descriptors used
+for closures (unsaturated function applications) of the thunk. It looks like
+this:
+
+```mipsasm
+ .align 2
+ # entry for closure with 0 arguments:
+ .byte 0 # pointer arity
+ .byte 0 # basic value arity
+ .byte 2 # number of arguments that are still to be curried in
+ .byte 0 # reserved
+ # entry for closure with 1 argument:
+ .byte 1 # pointer arity
+ .byte 0 # basic value arity
+ .byte 1 # number of arguments that are still to be curried in
+ .byte 0 # reserved
+ # pointer to corresponding code entry:
+__main_uappend:
+ .word __main_nappend
+```
+
+The node of a closure that still needs *n* arguments is given the descriptor
+`__MODULE_uFUNCTION`-4\**n*.
+
+Data descriptors also have a closure descriptor table. The final word points to
+a thunk entry that simply builds a hnf of the corresponding saturated data
+descriptor.
+
+Arguments can be curried into a closure using a special `ap` thunk. It checks
+the number of arguments that are still to be curried in. If this number is 1,
+it evaluates the function; if not, it creates a new closure:
+
+```mipsasm
+ .align 1
+ .byte 0x1 # strict in the first argument
+ .byte 2 # pointer arity
+_nap:
+ # custom implementation in assembly
+```
+
+!!! note "Optimization"
+ This can be optimized for space by adding `ap2`, `ap3`, etc., for currying in
+ multiple arguments at once. That way we can, for example, use a 5-word `ap3`
+ node for currying in three arguments, instead of using a chain of three `ap`
+ nodes (9 words in total). However, since this is presumably not very
+ frequent, it is not high priority, especially not for large numbers of
+ arguments.
+
+## Evaluation
+
+Evaluation is done using pointer reversal so that the stack can be avoided. The
+technique is described in IFPL ยง11.6.1. The idea is as follows.
+
+We have a *back pointer* which initially points to a special ROOT node. We have
+a *front pointer* which initially points to the root of the graph to be
+evaluated. During evaluation, the front pointer will always point to the node
+under evaluation, and the back pointer will always point to its parent node.
+
+To evaluate a node, its strict arguments must be evaluated. To do this, the
+following steps are executed simultaneously:
+
+- The pointer to the child node is updated with the value of the back pointer
+ and marked by setting the least significant bit.
+- The back pointer is updated to the front pointer.
+- The front pointer is updated to the child node.
+- Jump to the evaluation entry of the child node.
+
+The old back pointer can be recovered by finding the marked cell in the parent
+node, which is now pointed to by the back pointer. Therefore, after evaluation
+of the child node, the following steps are performed simultaneously:
+
+- The cell pointing to the child node is recovered by checking which argument
+ of the parent node (pointed to by the back pointer) is marked.
+- The cell pointing to the child node is updated with the new address of the
+ child node (it may have changed if the old node could not be overwritten
+ because it was smaller, or in case garbage collection ran).
+- The back pointer is set to the old back pointer (the value in the cell
+ pointing to the child node).
+- The front pointer is set to the back pointer.
+- Jump to the evaluation entry of the parent node.
+
+The parent node will then check again if the child that has just been evaluated
+needs to be evaluated. This isn't terribly efficient, but we assume there to be
+few strict arguments.
+
+!!! warning "Implementation may change"
+ At the moment, thunk descriptors contain information about strictness. This
+ way, the central `eval` driver can check if arguments of a thunk node need to
+ be evaluated before evaluating the thunk itself.
+
+ It is more efficient, and does not cost a lot in terms of program size, if
+ thunks get a separate entry point that first evaluates its strict arguments.
+ The descriptor then does not need to include strictness information, and the
+ central `eval` driver does not need to use this to dynamically evaluate
+ children.
+
+## Garbage collection
+
+Not implemented yet: a variant of Jonkers' algorithm, using the most
+significant bit for pointer reversal. The most significant bit is always set on
+the PIC32 devices we target.
+
+## Reserved bits
+
+The design leaves some space in addresses for special use of the run time
+system:
+
+- The least significant bit is always zero and can be used for pointer reversal
+ during [evaluation](#evaluation).
+- The most significant bit is always one (check the data sheet for the memory
+ map), so it can be used for pointer reversal during [garbage
+ collection](#garbage-collection).
+
+[ABC interpreter]: https://clean-lang.org/pkg/abc-interpreter
+[Clean]: https://clean-lang.org
diff --git a/doc/docs/frontend/syntax.md b/doc/docs/frontend/syntax.md
new file mode 100644
index 0000000..836201c
--- /dev/null
+++ b/doc/docs/frontend/syntax.md
@@ -0,0 +1,80 @@
+---
+vim: noexpandtab tabstop=2 shiftwidth=2
+---
+
+# Syntax
+
+Snug syntax is based on [S-expressions][] to simplify parsing, with some
+modifications.
+
+Importantly, parentheses are meaningful as they are used to express lists in
+the syntax. For instance, the list of arguments to a function is enclosed in
+parentheses. In such contexts it is an error to use redundant parentheses, or
+to omit parentheses in case of a single element.
+
+## Data types
+
+Algebraic data types can be defined with:
+
+```snug
+(data List (a) (
+ Nil
+ (Cons a (List a))))
+```
+
+In abstract terms, the syntax is:
+
+```snug
+(data NAME (VARIABLES) (CONSTRUCTORS))
+```
+
+where constructors are of the form `NAME` (without arguments) or
+`(NAME TYPE [TYPE..])` (with arguments).
+
+Type synonyms can be defined with:
+
+```snug
+(type String (List Char))
+```
+
+## Functions
+
+```snug
+(fun length ((xs : List a)) : Int
+ (length_acc 0 xs))
+
+(fun length_acc ((n : Int) (xs : List a)) : Int
+ (case xs (
+ (Nil -> n)
+ (Cons _ xs -> length_acc (+ n 1) xs))))
+```
+
+In abstract terms, the syntax is:
+
+```snug
+(fun NAME (ARGUMENTS) : TYPE (EXPRESSION))
+```
+
+where arguments are of the form `(NAME : TYPE)`.
+
+## Expressions
+
+Expressions are:
+
+- Basic values like `37` or `'A'`
+- Symbol identifiers, either global or local
+- Data constructor identifiers
+- Function applications
+- Case expressions
+
+Case expressions have the following syntax:
+
+```snug
+(case EXPRESSION (ALTERNATIVES))
+```
+
+and alternatives have the form `(PATTERN -> EXPRESSION)`. Patterns use the same
+form as expressions, but only basic values, symbol identifiers (for variable
+binding), and fully saturated data constructors are allowed.
+
+[S-expressions]: https://en.wikipedia.org/wiki/S-expression
diff --git a/doc/docs/index.md b/doc/docs/index.md
new file mode 100644
index 0000000..c2bebe4
--- /dev/null
+++ b/doc/docs/index.md
@@ -0,0 +1,45 @@
+---
+vim: noexpandtab tabstop=2 shiftwidth=2
+---
+
+# Snug
+
+Snug is a lazy functional programming language. It is designed to be tiny in
+memory footprint of both generated code and run time memory use, so that it can
+run on embedded systems.
+
+The current goal is to be able to run a Snug compiler written in Snug on a
+[PIC32MZ2048EFG064][]. This microcontroller has 2MB flash memory, 512KB SRAM,
+and a MIPS32 instruction set (with microMIPS code compression).
+
+The end goal is as follows:
+
+1. The microcontroller has access to a flash drive or SD card with a basic file
+ system.
+2. The kernel contains a basic terminal application and can run programs from
+ the external memory. To this end it dynamically loads these programs into
+ RAM and executes from RAM.
+3. There is an application to edit text files.
+4. The Snug compiler is just one of these programs, and was compiled from Snug
+ itself. It outputs human-readable assembly code.
+5. A basic assembler (presumably written in C) can turn this output into object
+ code.
+6. A linker (presumably written in C) can link several object code files
+ together into an executable program.
+7. Combining steps 3-6, possibly using a `make`-like program, enables the user
+ to edit, compile, and run new Snug programs entirely on the embedded system.
+
+Before we can accomplish this some intermediate steps are required:
+
+1. At first the compiler is written in [Clean][], and we use an assembler and
+ linker on a PC. This way an executable compiler can be built entirely on a
+ PC.
+2. The next step is to test how much RAM the compiler requires. It should be
+ able to compile reasonably large programs with only 512KB of RAM (and
+ actually less, because part of the RAM must be reserved for the kernel and
+ to hold the program itself).
+3. After this a basic assembler and linker can be written. If these are written
+ in C it is easily possible to test them on a PC.
+
+[Clean]: https://clean-lang.org
+[PIC32MZ2048EFG064]: https://www.microchip.com/en-us/product/PIC32MZ2048EFG064
diff --git a/doc/mkdocs.yml b/doc/mkdocs.yml
new file mode 100644
index 0000000..ca7f7ec
--- /dev/null
+++ b/doc/mkdocs.yml
@@ -0,0 +1,20 @@
+site_name: Snug
+
+theme:
+ name: readthedocs
+ hljs_languages:
+ - mipsasm
+
+markdown_extensions:
+ - admonition
+ - toc:
+ permalink: true
+ - smarty
+
+nav:
+ - Introduction: index.md
+ - The Snug frontend:
+ - Syntax: frontend/syntax.md
+ - The Snug backend:
+ - Run time system: backend/rts.md
+ - Registers: backend/registers.md
diff --git a/doc/requirements.txt b/doc/requirements.txt
new file mode 100644
index 0000000..c212d74
--- /dev/null
+++ b/doc/requirements.txt
@@ -0,0 +1 @@
+mkdocs>=1.1.2