README.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92

# Tests for an LLVM-based minimum effort code generator for ABC

This directory contains some preliminary tests to see if a code generator for
ABC could be built with minimum effort using LLVM.

The general idea is due to Erin van der Veen, and his repository is located at
https://gitlab.com/top-software/llvm-code-generator.

## General idea

Non-control-flow ABC instructions are implemented in LLVM IR, in this directory
in `rts.ll`. For example:

```llvm
attributes #0 = { alwaysinline }

define private i64* @addI(i64* %bsp.0) #0 {
  %t.0 = load i64, i64* %bsp.0
  store i64 undef, i64* %bsp.0
  %bsp.1 = getelementptr i64, i64* %bsp.0, i64 1
  %t.1 = load i64, i64* %bsp.1
  %t.2 = add i64 %t.0, %t.1
  store i64 %t.2, i64* %bsp.1
  ret i64* %bsp.1
}
```

ABC code can then be compiled without much effort; we only need to cleverly
break the ABC code up into subroutines and handle the control flow instructions
correctly. For example:

```llvm
  %bsp.307 = call i64* @pushI(i64 %r.1, i64* %bsp.306.0)
  %bsp.308 = call i64* @addI(i64* %bsp.307)
  %bsp.309 = call i64* @pushI(i64 1, i64* %bsp.308)
  %bsp.310 = call i64* @push_b(i64 1, i64* %bsp.309)
  ; ...
```

Because the definitions of the instructions (e.g. `addI`) are `alwaysinline`,
there is no overhead for these calls, and we can rely on LLVM to optimize
subroutine bodies after the `alwaysinline` pass.

This means we need a custom LLVM pipeline, implemented in the Makefile as:

```sh
	| opt-11 -S -always-inline \
	| opt-11 -S -O3 \
	| sed 's/noinline nounwind optnone/nounwind/' \
	| opt-11 -S -O3 \
```

## Stacks

B-stack arguments are passed as subroutine arguments and return values; there
is no global B-stack. Subroutines do have a global A-stack:

```llvm
define private i64 @s1(i64** %globasp, i64 %arg) #0 {
  %astack = alloca i64*, i64 10000
  %asp.000 = getelementptr i64*, i64** %astack
  %bstack = alloca i64, i64 10000
  %bsp.000 = getelementptr i64, i64* %bstack, i64 9999

  %bsp.001 = call i64* @pushI(i64 %arg, i64* %bsp.000)
  ; ...
```

This ensures that the B-stack is combined with the C-stack, and the `alloca`s
for subroutine-local stacks are optimized away by LLVM.

## Control flow

Example of a `jmp_false else.1` instruction, using `peek_b` and `pop_b1` from
the RTS:

```llvm
  %t.0 = call i64 @peek_b(i64* %bsp.102)
  %bsp.103 = call i64* @pop_b1(i64* %bsp.102)
  %t.1 = trunc i64 %t.0 to i1
  br i1 %t.1, label %l.0, label %else.1
```

Example of a `jsr s1` instruction with one B-stack argument and one B-stack
return value:

```llvm
  %arg.0 = call i64 @peek_b(i64* %bsp.302)
  %bsp.302.0 = call i64* @pop_b1(i64* %bsp.302)
  %r.0 = call i64 @s1(i64** %globasp, i64 %arg.0)
  %bsp.303 = call i64* @pushI(i64 %r.0, i64* %bsp.302.0)
```