doc/docs/backend/rts.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205

---
vim: noexpandtab tabstop=2 shiftwidth=2
---

# The Snug run time system

The Snug run time system is based on graph reduction. It is based on two
sources of inspiration:

- The run time system of [Clean][], which I got to know in depth through my
	work on the [ABC interpreter][]. I am grateful for many discussions with John
	van Groningen and Rinus Plasmeijer during my work on Clean.
- The 1987 book *The implementation of functional programming languages* by
	Simon L. Peyton Jones (henceforth IFPL).

At times, unorthodox choices have been made to reduce memory usage as much as
possible, sacrificing speed when necessary.

## Node layout

There are two types of nodes: head normal forms (hnfs) and thunks.

Both types of nodes are of variable width. Thunks have at least 2 words so that
they can always be overwritten with an indirection. Hnfs have no minimum size.

The first word of a hnf node points to a [descriptor](#hnf-descriptors)
containing information such as arity and constructor name (for printing). The
first word of a thunk node points to the code that can evaluate the thunk.
Information about the arity can be found above this code (see
[thunk descriptors](#thunk-descriptors)). Thus the nodes look like this:

| Descriptor | Argument 1    | Argument 2    |
|------------|---------------|---------------|
| CONS       | (ptr to head) | (ptr to tail) |

| Descriptor | Argument 1   | Argument 2   |
|------------|--------------|--------------|
| ++         | (ptr to arg) | (ptr to arg) |

The two types of nodes can be distinguished because the descriptor of hnf nodes
is in the data section, while the code to evaluate a thunk is in the text
section. To quickly check whether a node is a hnf or a thunk, the address of
the boundary of the data and text section is held in a
[register](registers.md).

The arguments of some built-in constructors can be unboxed:

| Descriptor | Argument 1 |
|------------|------------|
| INT        | 37         |

## Descriptors

### Hnf descriptors
Descriptors of hnf nodes are located in the data section. They contain
information about the arity of the node. It also contains the number of
arguments to be curried in, which is 0 for all data constructors but non-zero
for the [descriptors of closures](#closure-descriptors). (This means that, in
principle, data constructor descriptors only need a halfword.)

```mipsasm
	.align	1
__main_cTuple:
	.byte	2	# pointer arity
	.byte	0	# basic value arity
	.byte	0	# number of arguments to be curried in
```

### Thunk descriptors
Thunk nodes point to the code that can evaluate them. Above this code a
descriptor is used storing information about arity and strictness:

```mipsasm
	.align	1
	.byte	0x1	# strict in 1st argument
	.byte	2	# arity
__main_nappend:
	# ...
```

!!! warning "Implementation may change"
	Strictness will probably be removed in the future, as evaluating arguments
	will be moved out of the driver into generated code (see
	[evaluation](#evaluation)).

	This also makes the implementation of `ap` simpler, as it does not need to
	check strictness of the function it evaluates any more.

### Closure descriptors
For each thunk with a non-zero number of arguments there is also a closure
descriptor table in the data section. This is a table of hnf descriptors used
for closures (unsaturated function applications) of the thunk. It looks like
this:

```mipsasm
	.align	2
	# entry for closure with 0 arguments:
	.byte	0	# pointer arity
	.byte	0	# basic value arity
	.byte	2	# number of arguments that are still to be curried in
	.byte	0	# reserved
	# entry for closure with 1 argument:
	.byte	1	# pointer arity
	.byte	0	# basic value arity
	.byte	1	# number of arguments that are still to be curried in
	.byte	0	# reserved
	# pointer to corresponding code entry:
__main_uappend:
	.word	__main_nappend
```

The node of a closure that still needs *n* arguments is given the descriptor
`__MODULE_uFUNCTION`-4\**n*.

Data descriptors also have a closure descriptor table. The final word points to
a thunk entry that simply builds a hnf of the corresponding saturated data
descriptor.

Arguments can be curried into a closure using a special `ap` thunk. It checks
the number of arguments that are still to be curried in. If this number is 1,
it evaluates the function; if not, it creates a new closure:

```mipsasm
	.align	1
	.byte	0x1	# strict in the first argument
	.byte	2	# pointer arity
_nap:
	# custom implementation in assembly
```

!!! note "Optimization"
	This can be optimized for space by adding `ap2`, `ap3`, etc., for currying in
	multiple arguments at once. That way we can, for example, use a 5-word `ap3`
	node for currying in three arguments, instead of using a chain of three `ap`
	nodes (9 words in total). However, since this is presumably not very
	frequent, it is not high priority, especially not for large numbers of
	arguments.

## Evaluation

Evaluation is done using pointer reversal so that the stack can be avoided. The
technique is described in IFPL §11.6.1. The idea is as follows.

We have a *back pointer* which initially points to a special ROOT node. We have
a *front pointer* which initially points to the root of the graph to be
evaluated. During evaluation, the front pointer will always point to the node
under evaluation, and the back pointer will always point to its parent node.

To evaluate a node, its strict arguments must be evaluated. To do this, the
following steps are executed simultaneously:

- The pointer to the child node is updated with the value of the back pointer
	and marked by setting the least significant bit.
- The back pointer is updated to the front pointer.
- The front pointer is updated to the child node.
- Jump to the evaluation entry of the child node.

The old back pointer can be recovered by finding the marked cell in the parent
node, which is now pointed to by the back pointer. Therefore, after evaluation
of the child node, the following steps are performed simultaneously:

- The cell pointing to the child node is recovered by checking which argument
	of the parent node (pointed to by the back pointer) is marked.
- The cell pointing to the child node is updated with the new address of the
	child node (it may have changed if the old node could not be overwritten
	because it was smaller, or in case garbage collection ran).
- The back pointer is set to the old back pointer (the value in the cell
	pointing to the child node).
- The front pointer is set to the back pointer.
- Jump to the evaluation entry of the parent node.

The parent node will then check again if the child that has just been evaluated
needs to be evaluated. This isn't terribly efficient, but we assume there to be
few strict arguments.

!!! warning "Implementation may change"
	At the moment, thunk descriptors contain information about strictness. This
	way, the central `eval` driver can check if arguments of a thunk node need to
	be evaluated before evaluating the thunk itself.

	It is more efficient, and does not cost a lot in terms of program size, if
	thunks get a separate entry point that first evaluates its strict arguments.
	The descriptor then does not need to include strictness information, and the
	central `eval` driver does not need to use this to dynamically evaluate
	children.

## Garbage collection

Not implemented yet: a variant of Jonkers' algorithm, using the most
significant bit for pointer reversal. The most significant bit is always set on
the PIC32 devices we target.

## Reserved bits

The design leaves some space in addresses for special use of the run time
system:

- The least significant bit is always zero and can be used for pointer reversal
	during [evaluation](#evaluation).
- The most significant bit is always one (check the data sheet for the memory
	map), so it can be used for pointer reversal during [garbage
	collection](#garbage-collection).

[ABC interpreter]: https://clean-lang.org/pkg/abc-interpreter
[Clean]: https://clean-lang.org