*This is part 3 in a series on running the [Clean][] compiler in
[WebAssembly][], with the proof of concept in the [Clean Sandbox][]. In this
part I discuss how all the individual components are integrated. See the
[introduction][] for a high-level overview. In the previous part, I discussed
[the compilation pipeline][pipeline].*

[[toc]]

# Overview

We have to tie all the different components together in JavaScript. In
particular, we have the following elements:

- The C tools (compiler backend, bytecode generator, bytecode linker, and
  bytecode prelinker)
- The WebAssembly interpreters in which the Clean tools run (compiler frontend,
  `make`-like program)
- The WebAssembly interpreter for the compiled program itself.

# Linking the compiler frontend and backend

We have already seen how [the `make` tool communicates with the C tools and
with the compiler][pipeline]. It is a little trickier to set up the
communication between the compiler frontend and the backend. This is because
the `make` tool was written for this application specifically, while the
compiler wasn't.

In Clean, the foreign function interface works through a special ABC
instruction, `ccall`. For example, if we have a C function `int c_add(int,
int)`, we can define `add` as below. The string `II:I` describes the type: two
integer arguments and an integer return value.

```clean
add :: !Int !Int -> Int
add _ _ = code {
	ccall c_add "II:I"
}
```

This instruction is not supported by the interpreter, because there is no way
to implement it generically in that context. When interpretation reaches a
`ccall` (or any other unimplemented instruction), the WebAssembly interpreter
calls a JavaScript function that can try to handle it. This function can read
and modify the state of the ABC machine, and return to it after it has handled
the instruction. This allows us to link the compiler frontend to the backend in
JavaScript:

```js
var clean_compiler;
function create_clean_compiler () {
	return ABCInterpreter.instantiate({
		interpreter_imports: {
			handle_illegal_instr: (pc, instr, asp, bsp, csp, hp, hp_free) => {
				instr = ABCInterpreter.instructions[instr];
				if (instr == 'ccall')
					return i_ccall.bind(c_compiler)(clean_compiler, pc, asp, bsp);
				else
					return 0;
			},
			/* other options .. */
		}
	}).then(abc => {
		clean_compiler = abc;
	});
}
```

With `i_ccall` we can link a WebAssembly interpreter (`abc`) to an Emscripten
module (bound as `this`), which contains the global C functions:

```js
function i_ccall (abc, pc, asp, bsp) {
	const fun = '_' + abc.get_clean_string(abc.memory_array[pc/4+2]-8); /* e.g. c_add; emscripten adds an underscore */
	var type = abc.get_clean_string (abc.memory_array[pc/4+4]-8); /* e.g. II:I */

	if (!(fun in this))
		throw 'ccall: unknown function '+fun;

	/* parse type; get arguments from the Clean heap and stack .. */

	const result = this[fun].apply(null, args);

	/* copy the result to the WebAssembly interpreter .. */

	return pc+24; /* return the address of the next instruction */
}
```
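The type-parsing step elided above can be sketched as follows. This is a
minimal sketch, not the sandbox's actual code: `parse_ccall_type` is a
hypothetical helper, and it only assumes that the string has the shape
`arguments:results` with one character per value, as in `II:I`. Only `I`
(integer) is taken from the example above; any other characters are passed
through uninterpreted.

```js
/* Hypothetical helper: split a ccall type string such as "II:I" into
 * argument and result type characters. */
function parse_ccall_type (type) {
	const colon = type.indexOf(':');
	if (colon < 0)
		throw new Error('ccall: malformed type ' + type);

	return {
		args: type.slice(0, colon).split(''),
		results: type.slice(colon + 1).split('')
	};
}

const parsed = parse_ccall_type('II:I');
console.log(parsed.args.length, parsed.results.length); /* 2 1 */
```

The number of argument characters tells the handler how many values to take
from the stacks before calling the C function.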

# Clean file I/O

We have a similar problem with Clean programs doing file I/O. The compiler
frontend uses file I/O, but for this it needs ABC instructions that are not
implemented in the WebAssembly interpreter. We extend `handle_illegal_instr` to
also catch these instructions, and use Emscripten's `FS` library to implement
them. For example, `writeFS` writes a string to a file. `handle_illegal_instr`
becomes:

```js
handle_illegal_instr: (pc, instr, asp, bsp, csp, hp, hp_free) => {
	instr = ABCInterpreter.instructions[instr];
	if (instr == 'ccall')
		return i_ccall.bind(c_compiler)(clean_compiler, pc, asp, bsp);
	else if (instr == 'writeFS')
		return i_writeFS.bind(c_compiler)(clean_compiler, pc, asp, bsp);
	else
		return 0;
},
```

And again `i_writeFS` is defined to link an arbitrary WebAssembly interpreter
and Emscripten module together:

```js
function i_writeFS (abc, pc, asp, bsp) {
	/* get arguments from the stack: the file index from the B stack and the
	 * string from the A stack */
	const i = abc.memory_array[bsp/4+2];
	const s_ptr = abc.memory_array[asp/4];
	const size = abc.memory_array[s_ptr/4+2];
	const s = new Uint8Array(abc.memory_array.buffer, s_ptr+16, size);

	this.FS.write(abc.files[i].stream, s, 0, size);

	abc.interpreter.instance.exports.set_asp(asp-8); /* pop the string from the stack */
	return pc+8; /* return the address of the next instruction */
}
```
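To make the memory accesses in `i_writeFS` more concrete, the sketch below
builds a small mock memory and reads a string from it using the same offsets:
the length at `s_ptr+8` and the characters from `s_ptr+16`. It is only an
illustration under assumptions: `memory_array` is taken to be a `Uint32Array`
over the interpreter's memory, and `read_clean_string` is a hypothetical
helper, not part of the ABC interpreter's API.

```js
/* Hypothetical helper: view the bytes of a Clean string at s_ptr, using the
 * same offsets as i_writeFS (length at s_ptr+8, characters at s_ptr+16). */
function read_clean_string (memory_array, s_ptr) {
	const size = memory_array[s_ptr/4+2];
	return new Uint8Array(memory_array.buffer, s_ptr+16, size);
}

/* Mock memory: 64 bytes, with a string node at byte offset 8. */
const memory_array = new Uint32Array(16);
const s_ptr = 8;
const text = 'hello';
memory_array[s_ptr/4+2] = text.length;
const bytes = new Uint8Array(memory_array.buffer);
for (let i = 0; i < text.length; i++)
	bytes[s_ptr+16+i] = text.charCodeAt(i);

const s = read_clean_string(memory_array, s_ptr);
console.log(String.fromCharCode(...s)); /* hello */
```

The `Uint8Array` is a view on the same buffer rather than a copy, so it
reflects any later writes to that memory.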

In reality we need about 10 instructions for file I/O, but the others are
similar.
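
With that many instructions, the `if`/`else` chain in `handle_illegal_instr`
could be replaced by a lookup table. The following is a sketch of that idea,
not necessarily how the sandbox is written; `make_handler`, the stub handlers,
and the stub instruction list are hypothetical stand-ins for
`ABCInterpreter.instructions`, `i_ccall`, and `i_writeFS`.

```js
/* Build a handle_illegal_instr function from a table of handlers.
 * instructions: maps instruction numbers to names;
 * handlers: maps instruction names to functions returning the next pc. */
function make_handler (instructions, handlers) {
	return (pc, instr, asp, bsp, csp, hp, hp_free) => {
		const name = instructions[instr];
		if (Object.prototype.hasOwnProperty.call(handlers, name))
			return handlers[name](pc, asp, bsp);
		else
			return 0; /* not handled, as in the else branch above */
	};
}

/* Stubs for illustration only; real handlers also modify the ABC machine. */
const instructions = ['add', 'ccall', 'writeFS'];
const handle_illegal_instr = make_handler(instructions, {
	ccall: (pc, asp, bsp) => pc+24,
	writeFS: (pc, asp, bsp) => pc+8
});

console.log(handle_illegal_instr(100, 1, 0, 0, 0, 0, 0)); /* 124 */
console.log(handle_illegal_instr(100, 0, 0, 0, 0, 0, 0)); /* 0 */
```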

# Running the generated bytecode

We can now run a Clean program from our editor through the [compilation
pipeline][pipeline] and generate a prelinked bytecode file. This is similar to
a native executable, but for the WebAssembly interpreter. What remains is
actually running the bytecode.

We do this by simply creating a new instance of the ABC interpreter. The
`instantiate` function uses `fetch()` to get supporting WebAssembly and
bytecode files. In this case, our bytecode comes from the local file system, so
we override `fetch()` through the `fetch` option to supply the bytecode from
the Emscripten file system (this is the case where the requested path is
`null`):

```js
function run (path) {
	const opts = {
		fetch: (p) => {
			return p !== null ? fetch(p) : new Promise(resolve => {
				const pbc = c_bcprelink.FS.readFile(path);
				resolve({
					ok: true,
					arrayBuffer: () => pbc.buffer
				});
			});
		},
	};

	return ABCInterpreter.instantiate(opts).then(abc => {
		abc.interpreter.instance.exports.set_pc(abc.start);
		var r = 0;
		try {
			r = abc.interpreter.instance.exports.interpret();
		} catch (e) {
			sandbox.stderr(e+'\n');
			r = -1;
		}

		/* flush output buffer */
		if (sandbox.stdout_buffer.length>0)
			sandbox.stdout('\n');

		if (r!=0)
			sandbox.stderr('failed with return code '+r);
	});
}
```
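
The object passed to `resolve` only needs to look enough like a `Response` for
the caller. As a sketch of a slightly more defensive variant, the hypothetical
helpers below return a promise from `arrayBuffer()`, as the real
`Response.arrayBuffer()` does, and copy exactly the bytes covered by the
`Uint8Array`, which matters if the array is a view on a larger buffer. Whether
`instantiate` needs either of these is an assumption; the version above may
work as is.

```js
/* Hypothetical helper: copy exactly the bytes covered by a Uint8Array into
 * a fresh ArrayBuffer, even when the array is a view on a larger buffer. */
function exact_buffer (bytes) {
	return bytes.buffer.slice(bytes.byteOffset, bytes.byteOffset+bytes.byteLength);
}

/* Hypothetical helper: wrap bytes in a minimal Response-like object. */
function fake_response (bytes) {
	return {
		ok: true,
		arrayBuffer: () => Promise.resolve(exact_buffer(bytes))
	};
}

/* A 2-byte view on a 4-byte buffer yields a 2-byte ArrayBuffer. */
const view = new Uint8Array([1, 2, 3, 4]).subarray(1, 3);
fake_response(view).arrayBuffer().then(buf => {
	console.log(buf.byteLength); /* 2 */
});
```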

# Wrapping up

That's it! There are many other small tricks here and there, but these four
posts covered the architecture and most interesting implementation details. You
can try out the [Clean Sandbox][] to see it live, or check out the [source
code](https://gitlab.com/camilstaps/clean-sandbox).

[Clean]: http://clean.cs.ru.nl/
[Clean Sandbox]: https://camilstaps.gitlab.io/clean-sandbox/
[WebAssembly]: https://webassembly.org/

[introduction]: 2021-06-23-compiling-clean-in-the-browser-with-webassembly-part-1-introduction.html
[pipeline]: 2021-06-23-compiling-clean-in-the-browser-with-webassembly-part-2-the-pipeline.html
[integration]: 2021-06-23-compiling-clean-in-the-browser-with-webassembly-part-3-putting-it-all-together.html