Add documentation

author: Camil Staps 2023-01-30 21:36:34 +0100
committer: Camil Staps 2023-01-30 21:36:34 +0100
commit: 6b26fd43a6bfc4b45798c4e15c47159db9f880f4 (patch)
tree: 0ed25dc7b03e9e618df9681f801a19db82dc71c9 /doc
parent: Align on halfwords instead of double words; use data/text boundary to disting... (diff)
6 files changed, 380 insertions, 0 deletions
diff --git a/doc/docs/backend/registers.md b/doc/docs/backend/registers.md
new file mode 100644
index 0000000..60a9f20
--- /dev/null
+++ b/doc/docs/backend/registers.md
@@ -0,0 +1,29 @@
+---
+vim: noexpandtab tabstop=2 shiftwidth=2
+---
+
+# Registers
+
+The following registers have a special use in the Snug run time system:
+
+| #  | Name  | Standard use | Snug use |
+|---:|-------|--------------|----------|
+| 0  | `$0`  | Always zero. | |
+| 1  | `$at` | Reserved for the assembler. | |
+| 2-3 | `$v0`-`$v1` | Function result, callee-saved(?). | |
+| 4-7 | `$a0`-`$a3` | Integer argument, caller-saved. | |
+| 8-15 | `$t0`-`$t7` | Caller-saved GPR. | |
+| 16 | `$s0` | Callee-saved GPR.| [Back pointer](rts.md#evaluation) for printing. |
+| 17 | `$s1` | id. | [Front pointer](rts.md#evaluation) for printing. |
+| 18 | `$s2` | id. | [Back pointer](rts.md#evaluation) for evaluation. |
+| 19 | `$s3` | id. | [Front pointer](rts.md#evaluation) for evaluation. |
+| 20 | `$s4` | id. | |
+| 21 | `$s5` | id. | |
+| 22 | `$s6` | id. | |
+| 23 | `$s7` | id. | [Data/text region boundary](rts.md#node-layout). |
+| 24-25 | `$t8`-`$t9` | Caller-saved GPR. | |
+| 26-27 | `$k0`-`$k1` | Reserved for kernel use. | |
+| 28 | `$gp` | Global pointer. | Heap pointer. |
+| 29 | `$sp` | Stack pointer. | |
+| 30 | `$fp` | Frame pointer / callee-saved GPR. | |
+| 31 | `$ra` | Return address / caller-saved GPR. | |
diff --git a/doc/docs/backend/rts.md b/doc/docs/backend/rts.md
new file mode 100644
index 0000000..d401f3a
--- /dev/null
+++ b/doc/docs/backend/rts.md
@@ -0,0 +1,205 @@
+---
+vim: noexpandtab tabstop=2 shiftwidth=2
+---
+
+# The Snug run time system
+
+The Snug run time system is based on graph reduction. It is based on two
+sources of inspiration:
+
+- The run time system of [Clean][], which I got to know in depth through my
+	work on the [ABC interpreter][]. I am grateful for many discussions with John
+	van Groningen and Rinus Plasmeijer during my work on Clean.
+- The 1987 book *The implementation of functional programming languages* by
+	Simon L. Peyton Jones (henceforth IFPL).
+
+At times, unorthodox choices have been made to reduce memory usage as much as
+possible, sacrificing speed when necessary.
+
+## Node layout
+
+There are two types of nodes: head normal forms (hnfs) and thunks.
+
+Both types of nodes are of variable width. Thunks have at least 2 words so that
+they can always be overwritten with an indirection. Hnfs have no minimum size.
+
+The first word of a hnf node points to a [descriptor](#hnf-descriptors)
+containing information such as arity and constructor name (for printing). The
+first word of a thunk node points to the code that can evaluate the thunk.
+Information about the arity can be found above this code (see
+[thunk descriptors](#thunk-descriptors)). Thus the nodes look like this:
+
+| Descriptor | Argument 1    | Argument 2    |
+|------------|---------------|---------------|
+| CONS       | (ptr to head) | (ptr to tail) |
+
+| Descriptor | Argument 1   | Argument 2   |
+|------------|--------------|--------------|
+| ++         | (ptr to arg) | (ptr to arg) |
+
+The two types of nodes can be distinguished because the descriptor of hnf nodes
+is in the data section, while the code to evaluate a thunk is in the text
+section. To quickly check whether a node is a hnf or a thunk, the address of
+the boundary of the data and text section is held in a
+[register](registers.md).
+
+The arguments of some built-in constructors can be unboxed:
+
+| Descriptor | Argument 1 |
+|------------|------------|
+| INT        | 37         |
+
+## Descriptors
+
+### Hnf descriptors
+Descriptors of hnf nodes are located in the data section. They contain
+information about the arity of the node. It also contains the number of
+arguments to be curried in, which is 0 for all data constructors but non-zero
+for the [descriptors of closures](#closure-descriptors). (This means that, in
+principle, data constructor descriptors only need a halfword.)
+
+```mipsasm
+	.align	1
+__main_cTuple:
+	.byte	2	# pointer arity
+	.byte	0	# basic value arity
+	.byte	0	# number of arguments to be curried in
+```
+
+### Thunk descriptors
+Thunk nodes point to the code that can evaluate them. Above this code a
+descriptor is used storing information about arity and strictness:
+
+```mipsasm
+	.align	1
+	.byte	0x1	# strict in 1st argument
+	.byte	2	# arity
+__main_nappend:
+	# ...
+```
+
+!!! warning "Implementation may change"
+	Strictness will probably be removed in the future, as evaluating arguments
+	will be moved out of the driver into generated code (see
+	[evaluation](#evaluation)).
+
+	This also makes the implementation of `ap` simpler, as it does not need to
+	check strictness of the function it evaluates any more.
+
+### Closure descriptors
+For each thunk with a non-zero number of arguments there is also a closure
+descriptor table in the data section. This is a table of hnf descriptors used
+for closures (unsaturated function applications) of the thunk. It looks like
+this:
+
+```mipsasm
+	.align	2
+	# entry for closure with 0 arguments:
+	.byte	0	# pointer arity
+	.byte	0	# basic value arity
+	.byte	2	# number of arguments that are still to be curried in
+	.byte	0	# reserved
+	# entry for closure with 1 argument:
+	.byte	1	# pointer arity
+	.byte	0	# basic value arity
+	.byte	1	# number of arguments that are still to be curried in
+	.byte	0	# reserved
+	# pointer to corresponding code entry:
+__main_uappend:
+	.word	__main_nappend
+```
+
+The node of a closure that still needs *n* arguments is given the descriptor
+`__MODULE_uFUNCTION`-4\**n*.
+
+Data descriptors also have a closure descriptor table. The final word points to
+a thunk entry that simply builds a hnf of the corresponding saturated data
+descriptor.
+
+Arguments can be curried into a closure using a special `ap` thunk. It checks
+the number of arguments that are still to be curried in. If this number is 1,
+it evaluates the function; if not, it creates a new closure:
+
+```mipsasm
+	.align	1
+	.byte	0x1	# strict in the first argument
+	.byte	2	# pointer arity
+_nap:
+	# custom implementation in assembly
+```
+
+!!! note "Optimization"
+	This can be optimized for space by adding `ap2`, `ap3`, etc., for currying in
+	multiple arguments at once. That way we can, for example, use a 5-word `ap3`
+	node for currying in three arguments, instead of using a chain of three `ap`
+	nodes (9 words in total). However, since this is presumably not very
+	frequent, it is not high priority, especially not for large numbers of
+	arguments.
+
+## Evaluation
+
+Evaluation is done using pointer reversal so that the stack can be avoided. The
+technique is described in IFPL §11.6.1. The idea is as follows.
+
+We have a *back pointer* which initially points to a special ROOT node. We have
+a *front pointer* which initially points to the root of the graph to be
+evaluated. During evaluation, the front pointer will always point to the node
+under evaluation, and the back pointer will always point to its parent node.
+
+To evaluate a node, its strict arguments must be evaluated. To do this, the
+following steps are executed simultaneously:
+
+- The pointer to the child node is updated with the value of the back pointer
+	and marked by setting the least significant bit.
+- The back pointer is updated to the front pointer.
+- The front pointer is updated to the child node.
+- Jump to the evaluation entry of the child node.
+
+The old back pointer can be recovered by finding the marked cell in the parent
+node, which is now pointed to by the back pointer. Therefore, after evaluation
+of the child node, the following steps are performed simultaneously:
+
+- The cell pointing to the child node is recovered by checking which argument
+	of the parent node (pointed to by the back pointer) is marked.
+- The cell pointing to the child node is updated with the new address of the
+	child node (it may have changed if the old node could not be overwritten
+	because it was smaller, or in case garbage collection ran).
+- The back pointer is set to the old back pointer (the value in the cell
+	pointing to the child node).
+- The front pointer is set to the back pointer.
+- Jump to the evaluation entry of the parent node.
+
+The parent node will then check again if the child that has just been evaluated
+needs to be evaluated. This isn't terribly efficient, but we assume there to be
+few strict arguments.
+
+!!! warning "Implementation may change"
+	At the moment, thunk descriptors contain information about strictness. This
+	way, the central `eval` driver can check if arguments of a thunk node need to
+	be evaluated before evaluating the thunk itself.
+
+	It is more efficient, and does not cost a lot in terms of program size, if
+	thunks get a separate entry point that first evaluates its strict arguments.
+	The descriptor then does not need to include strictness information, and the
+	central `eval` driver does not need to use this to dynamically evaluate
+	children.
+
+## Garbage collection
+
+Not implemented yet: a variant of Jonkers' algorithm, using the most
+significant bit for pointer reversal. The most significant bit is always set on
+the PIC32 devices we target.
+
+## Reserved bits
+
+The design leaves some space in addresses for special use of the run time
+system:
+
+- The least significant bit is always zero and can be used for pointer reversal
+	during [evaluation](#evaluation).
+- The most significant bit is always one (check the data sheet for the memory
+	map), so it can be used for pointer reversal during [garbage
+	collection](#garbage-collection).
+
+[ABC interpreter]: https://clean-lang.org/pkg/abc-interpreter
+[Clean]: https://clean-lang.org
diff --git a/doc/docs/frontend/syntax.md b/doc/docs/frontend/syntax.md
new file mode 100644
index 0000000..836201c
--- /dev/null
+++ b/doc/docs/frontend/syntax.md
@@ -0,0 +1,80 @@
+---
+vim: noexpandtab tabstop=2 shiftwidth=2
+---
+
+# Syntax
+
+Snug syntax is based on [S-expressions][] to simplify parsing, with some
+modifications.
+
+Importantly, parentheses are meaningful as they are used to express lists in
+the syntax. For instance, the list of arguments to a function is enclosed in
+parentheses. In such contexts it is an error to use redundant parentheses, or
+to omit parentheses in case of a single element.
+
+## Data types
+
+Algebraic data types can be defined with:
+
+```snug
+(data List (a) (
+	Nil
+	(Cons a (List a))))
+```
+
+In abstract terms, the syntax is:
+
+```snug
+(data NAME (VARIABLES) (CONSTRUCTORS))
+```
+
+where constructors are of the form `NAME` (without arguments) or
+`(NAME TYPE [TYPE..])` (with arguments).
+
+Type synonyms can be defined with:
+
+```snug
+(type String (List Char))
+```
+
+## Functions
+
+```snug
+(fun length ((xs : List a)) : Int
+	(length_acc 0 xs))
+
+(fun length_acc ((n : Int) (xs : List a)) : Int
+	(case xs (
+		(Nil -> n)
+		(Cons _ xs -> length_acc (+ n 1) xs))))
+```
+
+In abstract terms, the syntax is:
+
+```snug
+(fun NAME (ARGUMENTS) : TYPE (EXPRESSION))
+```
+
+where arguments are of the form `(NAME : TYPE)`.
+
+## Expressions
+
+Expressions are:
+
+- Basic values like `37` or `'A'`
+- Symbol identifiers, either global or local
+- Data constructor identifiers
+- Function applications
+- Case expressions
+
+Case expressions have the following syntax:
+
+```snug
+(case EXPRESSION (ALTERNATIVES))
+```
+
+and alternatives have the form `(PATTERN -> EXPRESSION)`. Patterns use the same
+form as expressions, but only basic values, symbol identifiers (for variable
+binding), and fully saturated data constructors are allowed.
+
+[S-expressions]: https://en.wikipedia.org/wiki/S-expression
diff --git a/doc/docs/index.md b/doc/docs/index.md
new file mode 100644
index 0000000..c2bebe4
--- /dev/null
+++ b/doc/docs/index.md
@@ -0,0 +1,45 @@
+---
+vim: noexpandtab tabstop=2 shiftwidth=2
+---
+
+# Snug
+
+Snug is a lazy functional programming language. It is designed to be tiny in
+memory footprint of both generated code and run time memory use, so that it can
+run on embedded systems.
+
+The current goal is to be able to run a Snug compiler written in Snug on a
+[PIC32MZ2048EFG064][]. This microcontroller has 2MB flash memory, 512KB SRAM,
+and a MIPS32 instruction set (with microMIPS code compression).
+
+The end goal is as follows:
+
+1. The microcontroller has access to a flash drive or SD card with a basic file
+	 system.
+2. The kernel contains a basic terminal application and can run programs from
+	 the external memory. To this end it dynamically loads these programs into
+	 RAM and executes from RAM.
+3. There is an application to edit text files.
+4. The Snug compiler is just one of these programs, and was compiled from Snug
+	 itself. It outputs human-readable assembly code.
+5. A basic assembler (presumably written in C) can turn this output into object
+	 code.
+6. A linker (presumably written in C) can link several object code files
+	 together into an executable program.
+7. Combining steps 3-6, possibly using a `make`-like program, enables the user
+	 to edit, compile, and run new Snug programs entirely on the embedded system.
+
+Before we can accomplish this some intermediate steps are required:
+
+1. At first the compiler is written in [Clean][], and we use an assembler and
+	 linker on a PC. This way an executable compiler can be built entirely on a
+	 PC.
+2. The next step is to test how much RAM the compiler requires. It should be
+	 able to compile reasonably large programs with only 512KB of RAM (and
+	 actually less, because part of the RAM must be reserved for the kernel and
+	 to hold the program itself).
+3. After this a basic assembler and linker can be written. If these are written
+	 in C it is easily possible to test them on a PC.
+
+[Clean]: https://clean-lang.org
+[PIC32MZ2048EFG064]: https://www.microchip.com/en-us/product/PIC32MZ2048EFG064
diff --git a/doc/mkdocs.yml b/doc/mkdocs.yml
new file mode 100644
index 0000000..ca7f7ec
--- /dev/null
+++ b/doc/mkdocs.yml
@@ -0,0 +1,20 @@
+site_name: Snug
+
+theme:
+  name: readthedocs
+  hljs_languages:
+    - mipsasm
+
+markdown_extensions:
+  - admonition
+  - toc:
+      permalink: true
+  - smarty
+
+nav:
+  - Introduction: index.md
+  - The Snug frontend:
+    - Syntax: frontend/syntax.md
+  - The Snug backend:
+    - Run time system: backend/rts.md
+    - Registers: backend/registers.md
diff --git a/doc/requirements.txt b/doc/requirements.txt
new file mode 100644
index 0000000..c212d74
--- /dev/null
+++ b/doc/requirements.txt
@@ -0,0 +1 @@
+mkdocs>=1.1.2
author	Camil Staps	2023-01-30 21:36:34 +0100
committer	Camil Staps	2023-01-30 21:36:34 +0100
commit	6b26fd43a6bfc4b45798c4e15c47159db9f880f4 (patch)
tree	0ed25dc7b03e9e618df9681f801a19db82dc71c9 /doc
parent	Align on halfwords instead of double words; use data/text boundary to disting... (diff)