1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
|
This file contains documentation of the transformation phase.
Contents
- Overview
- The analysation phase
- The transformation phase
- Overview
- Generating a new function
- Status
Originally the transformation phase was designed to solve three tasks:
- specialisation of overloaded functions (in the following just
"overloading specialisation"):
- inlining of higher order arguments in functions that are
defined locally in a macro (Nijmegen slang: "fold optimalisation")
- deforestation
All these optimisations have some things in common (more or less): they
are all a kind of specialisation. Specialisation involves creating new
versions of functions. That's why it was decided to implement one algorithm
that can handle all three optimalisations at the same time.
We call this algorithm the "fusion" algorithm.
Overview
********
In fact there are two phases involved:
- the analysation phase
- the transformation phase
Goal of the analysys is to assign two properties to each icl function argument
(or: function argument position): The consumer classification (::Int)
and the linearity (::Bool). These values are returned by analyseGroups in the
{! ConsClasses} value. Furtheron so called "active" cases are marked as such (these
markings are stored in the ExpressionHeap). The consumer classification can be
one of these four values:
cPassive:
if none of the following
cActive:
Only these function arguments can be specialized. The
variable ("x" below) appears at least in one of the following positions:
- x.member (record selection: could be overloading specialised)
- x @ ... (higher order app: fold optimalisation)
- case x of .. (could be deforested)
- in a call to a function that's also cActive in that argument
cAccumulating:
function argument can't be specialized because there
is a recursive call where the argument is not a variable, which
could lead into non termination of that algorithm.
cVarOfMultimatchCase:
A multimatch case is a case where one constructor could match
twice, e.g.:
g x = case x of
Cons h t | p1 h -> t
Cons h t | p2 h -> t
Such cases were exluded from deforestation in a first
approximation because they were difficult to handle (e.g. linearity
analysys)
The linearity classification indicates, well, whether a function argument
is used linearily. Only linear variables can be substituted with expressions
(otherwise work could be duplicated).
The transformation phase uses this information to generate new functions.
The transformation algorithm will run over all icl functions, beginning
with the leafs of the component tree. For every icl function he will run
through the body expression (postorder!) and when he finds a function
appliation that can be specialised
he will do so (if the function wasn't specialised in that way already).
The criteria to specialize a function "f" in an argument "A" are the following:
- f must be cActive in that argument.
- A must be an application
- If A is an application of a dictionary constructor: YEAH, go for it!
- If A is a curried application and f is defined as a local macro function:
YEAH!!!
- If A is a function application (g E1 .. En) and f is linear in that
argument: YEAH! Deforest! Cut all trees.
Otherwise nothing happens.
E.g. if (f E1 (g E2 E3) E4) meets the above condition then a new function
f_g will be created and that function application will become
f_g E1 E2 E3 E4 (if g is a function)
f_g E1 E4 (if g is a dictionary constructor) (it's not always that simple)
f is called the _consumer_ and g the _producer_.
Of course a new type for f_g has to be derived.
How is f_g created? In case g was
- a dictionary:the formal argument in f is substituted the dictionary application
- a curried function: the formal (higher order) argument in f is substituted
with "g"
- an uncurried function: the formal argument in f is substituted with the
_body_ of g.
The analysation phase
*********************
The analysys happens component wise, beginning with the leafs.
The fundamental function here is the consumerRequirements class whose
instances run through an expression. It's main data structure it is working
on is the *ConsClassSubst:=={#Int}. Each variable that is used in a function
that is currently analysed is assigned a number. This number is an index into
the ConsClassSubst array. So the ConsClassSubst assigns a consumer classification
to every variable. The classification process reminds on typing. If an array
element is negative then the corresponding variable currently has been assigned
to one of the four classifications:
cPassive :== -1
cActive :== -2
cAccumulating :== -3
cVarOfMultimatchCase :== -4
If it is non negative (say i) then this states that the classification is the same
as the variable which corresponds to i.
The instances of consumerRequirements return a boolean that indicates whether
the expression is unsafe, e.g. in
case x of
Cons h t -> case p h of True -> t
...
(assumed that these cases result from patterns and guards) the
consumerRequirements instance would return True for the inner case,
because this case could fail so that control would GOTO the outer case.
In this case the boolean would be used to classify x as
cVarOfMultimatchCase.
The ai_cur_ref_counts field of the AnalyseInfo record is used to
determine the linearity of all variables, the arguments and the local
variables.
The ai_cases_of_vars_for_function field of the AnalyseInfo record
accumulates all cases of which the case_expr is a variable. At the end
of analyzing a component the case_info_ptr is marked (in function
set_case_expr_info) with an EEI_ActiveCase tag, if the corresponding
variable is active. So after the analysation phase every case is
either "active" or not.
The information whether a case is active is stored with the
case_info_ptr. But there is already type information stored in the
ExpressionHeap. For every case the case_info_ptr points to an
EI_CaseType entry. For every "let" expression the let_info_ptr
points to an EI_LetType entry. These entries assign the variables
that are introduced by a pattern or let bind a type. This information
is needed for lifting (also in the following phases). So I invented
the EI_Extended constructor which allows to store several independent
pieces of information for one pointer. At the end of the transformation
phase the original heap contents has to be restored again
(function "cleanup_attributes") (the advantages of functional
programming shine through here very clearly).
The transformation phase
************************
The transformation phase - Overview
************************^^^^^^^^^^^
There are two basic transformations that are applied for deforestation
(overloading specialisation and the fold optimalisation are just simple
inlining!):
- The "case in case" transformation:
case (case x of
P1 -> A1
P2 -> A2)
P3 -> A3
P4 -> A4
transforms into
case x of
P1 -> case A1 of
P3 -> A3
P4 -> A4
P2 -> case A2 of
P3 -> A3
P4 -> A4
This happens in function lift_case. Note that this transformation
duplicates code.
- The "match" transformation:
case (Cons A1 A2) of
Cons h t -> A3
..
transforms into
let h=A1
t=A2
in A3
If "h" or "t" are used linearily in A3 the can be inlined, too.
The match transformation reduces the amount of code (all the other patterns).
Note that this transformation is only valid, if the case is not
a multimatch case
Let's consider some examples first:
sum l =
case l of
Nil -> 0
Cons h t -> +I h (sum t) // +I: plus on integers
every_second toggle l2 =
case l2 of
Cons h2 t2
-> case toggle of
True -> Cons h2 (every_second (not toggle) t2)
False -> (every_second (not toggle) t2)
Nil -> Nil
The analysys phase will assign l (the argument of sum) "cActive" and "linear".
"cActive" because
- the recursive call has a variable (t) as argument (in case of a
non variable it would be cAccumulating)
- l is used as a case_expression (case l of) (otherwise it would be cPassive
or cVarOfMultiMatchCase
- the case is not a multimatch case
"linear" because l appears only once
The case in the sum function will be marked as "active case of 'sum' in 'l'":
sum l =
case<*sum*> l of ...
Indeed, sum is a typical list consumer!
The arguments of every_second are classified as:
- toggle:cAccumulating, because there is a recursive call (in fact two)
with a non variable argument (not toggle).
- l2: cActive (for the same reasons as "l" in function sum)
Now if we transform the following function
Start = sum (every_second True [42,42,42])
we see that we have an uncurried function application in the active
consumer position of sum. We fuse these two functions by replacing
"l" in sum with the body of every_second (happens in "generateFunction"):
Start = sum_every_second True [42,42,42]
sum_every_second toggle l2 =
case<*sum*> (case l2 of
Cons h2 t2
-> case toggle of
True -> Cons h2 (every_second (not toggle) t2)
False -> (every_second (not toggle) t2)
Nil -> Nil) of
Nil -> 0
Cons h t -> +I h (sum t)
This newly generated function will be stored in the FunctionHeap.
Now we transform sum_every_second. First we apply the case in case
transformation.
sum_every_second toggle l2 =
case l2 of
Cons h2 t2
-> case<*sum*> (case toggle of
True -> Cons h2 (every_second (not toggle) t2)
False -> (every_second (not toggle) t2))
Nil -> 0
Cons h t -> +I h (sum t)
Nil -> case<*sum*> Nil of
Nil -> 0
Cons h t -> +I h (sum t)
Now we apply the case in case transformation once more in the
Cons case and in the Nil case we do the match:
sum_every_second toggle l2 =
case l2 of
Cons h2 t2
-> case toggle of
True -> case<*sum*> Cons h2 (every_second (not toggle) t2) of
Nil -> 0
Cons h t -> +I h (sum t)
False -> case<*sum*> every_second (not toggle) t2 of
Nil -> 0
Cons h t -> +I h (sum t)
Nil -> 0
Finally we apply the match in the True case and we do the FOLD step:
sum_every_second toggle l2 =
case l2 of
Cons h2 t2
-> case toggle of
True -> +I h2 (sum (every_second (not toggle) t2))
False -> sum_every_second (not toggle) t2
Nil -> 0
The fold step can be applied, because we know that this particular
case is the root case of the sum function, hence
case<*sum*> every_second (not toggle) t2 of
Nil -> 0
Cons h t -> +I h (sum t)
is equivalent to
sum (every_second (not toggle) t2)
which in turn is known to be specialised already:
sum_every_second (not toggle) t2
So we can apply the fold step, if the root case comes together with the
producer again. What do we do when during transformation a non-root case
comes together with the producer? Answer: We generate a new function.
Comment: This is bullshit. It would be better to generate functions for
_all_ non root cases in a seperate phase before,
because currently after the transformation
phase (while interfacing with the backend) this is done anyway! BTW, the
every_second function has no non root case.
Generating a new function would be necessary in the following case:
consumer l =
1 + (case l of
Cons h t -> h + consumer t
Nil -> 0)
filter p l2 =
case l2 of
Nil -> Nil
Cons h2 t2 -> case p @ h2 of
True -> Cons h2 (filter p t2)
False -> filter p t2
Fusing these two would proceed like follows:
consumer_filter p l2 =
1 + (case<*consumer*> (case l2 of
Nil -> Nil
Cons h2 t2 -> case p h2 of
True -> Cons h2 (filter p t2)
False -> filter p t2) of
Cons h t -> h + consumer t
Nil -> 0 )
consumer_filter p l2 =
1 + (case l2 of
Nil -> case<*consumer*> Nil of
Cons h t -> h + consumer t
Nil -> 0
Cons h2 t2 -> case<*consumer*> (case p h2 of
True -> Cons h2 (filter p t2)
False -> filter p t2) of
Cons h t -> h + consumer t
Nil -> 0)
consumer_filter p l2 =
1 + (case l2 of
Nil -> 0
Cons h2 t2 -> case p h2 of
True -> case<*consumer*> Cons h2 (filter p t2)
Cons h t -> h + consumer t
Nil -> 0
False -> case<*consumer*> filter p t2 of
Cons h t -> h + consumer t
Nil -> 0)
consumer_filter p l2 =
1 + (case l2 of
Nil -> 0
Cons h2 t2 -> case p h2 of
True -> h2 + (consumer_filter p t2)
False -> case<*consumer*> filter p t2 of
Cons h t -> h + consumer t
Nil -> 0)
Now we should _not_ apply the fold step like
consumer_filter p l2 =
1 + (case l2 of
Nil -> 0
Cons h2 t2 -> case p h2 of
True -> h2 + (consumer_filter p t2)
False -> consumer_filter p t2
Then (consumer_filter isPositive [-1]) would evaluate to 2 while
(consumer (filter isPositive [-1])) would evaluate to 1!
Instead we do so as if consumer was defined like
consumer l = 1 + (consumer_case l)
consumer_case l = case l of
Cons h t -> h + consumer t
Nil -> 0
Now we can do the fold step. So the result would be
consumer_filter p l2 = 1+(consumer_case_filter p l2)
consumer_case_filter p l2
= case l2 of
Nil -> 0
Cons h2 t2 -> case p h2 of
True -> h2 + (consumer_filter p t2)
False -> consumer_case_filter p t2
The additional functions that are generated for non root cases
are created in function "generate_case_function".
To finish this overview I give an example for fold optimalisation:
foldl f l init =
case l of Nil -> init
Cons h t -> foldl f t (f @ h init)
Start = foldl (+) [] 0
Classification:
foldl f:(cActive, not linear) l:(cActive, linear) init:(cAccumulating, linear)
The first arg is active because it appears on the RHS of @.
I don't fuse the constructor []! (Sjaak does -> code explosion)
But I specialize in the first arg. Being not linear doesn't matter
for higher order arguments, because there is no calculation
duplicated (same with dictionaries)
foldl_plus l init =
case l of Nil -> init
Cons h t -> foldl_plus t (h + init)
Start = foldl_plus [] 0
And I give example for overloading specialisation
The programmer writes:
gimme_da_first :: (a e) -> e | Array a e
gimme_da_first a = a.[0]
gimme_a_strict_array :: {!Int}
gimme_a_strict_array = {42}
Start = gimme_da_first gimme_a_strict_array
The compiler makes from this while typing:
:: ArrayDictionary a e =
{ select :: (a e) Int -> e
, update :: (a e) Int e -> (a e)
, size :: (a e) -> Int
..
gimme_da_first :: (ArrayDictionary a e) (a e) -> e
gimme_da_first dictionary a = dictionary.select @ a 0
Start = gimme_da_first { select = strictArraySelect, update=strictArrayUpdate,
size = strictArraySize .. }
gimme_a_strict_array
We see that the first dictionary argument of "gimme_da_first" is classified as
a consumer (cActive because of the selection and even linear). So we fuse
"gimme_da_first" with the record constructor:
gimme_da_first_StrictArray :: {!e} -> e
gimme_da_first_StrictArray a = strictArraySelect a 0
Start = gimme_da_first_StrictArray gimme_a_strict_array
In this example we can see very clear that through fusion (be it
overloading specialisation, deforestation or fold optimalisation) the type
of the newly created function has become specialised, too. In this case
for (a e) the a has been substituted with {!} yielding {!e}
This happens by unifying the producer's result type with the type of the
argument of the producer. In this case we unify
(ArrayDictionary {!} e) // producer { select = strictArraySelect, ...
with (ArrayDictionary a e) // consumer gimme_da_first
yielding the substitution a->{!}
The transformation phase - Generating a new function
************************^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The function "generateFunction" is called when a function is specialised.
As the important parameters it gets the consumers function definition, the
classification of it's arguments and an array "prods" that contains a producer
description for each consumer argument. So generateFunction indeed
specialises one consumer with possibly multiple producers. A producer
description of type "Producer" indicates, whether the producer is a
dictionary, a curried application or a function.
First some trouble with types happens. As stated before we have to apply
type unification. Therefore we reuse the infrastructure from the typing
phase. Unfortunately this infrastructure makes certain assumptions about
the given types for which some work has to be done to fulfill them.
The unification algorithm assumes type variables in "integer form" (TempV)
instead of "pointer form" (TV). Furtheron we have to take care about
uniqueness information. We have to add propagation information (see function
"add_propagation_attributes). This will transform a type
consumer :: [*{Int}] -> Int
into
consumer :: *[*{Int}] -> Int
After giving every involved type variable a number we create an initial
type variable substitution "subst". This array assigns every type variable
(number) a type. With every producer this substitution can be further specialised.
The "determine_args" function is the called. To explain what this function
does here is an example (I write (List a) instead of [a]):
consumer :: (List a) (List b) (a, b) -> (Int, Int, Int)
consumer x y z = (case x of ..., case y of ..., fun0 z z z)
producer :: (List c) T2 -> (List Int)
producer1 a b = fun1 a b
producer2 :: T3 -> (List Real)
producer2 c = fun2 c
Our application is:
Start = consumer (producer1 E1 E2) (producer E3) E4
Because we are generating a new functions all variables in that function
should be fresh. The generated function should look like this:
consumer` a1 b1 c1 z1 = (case fun1 a1 b1 of ..., case fun2 c1 of ..., fun0 z1 z1 z1)
The RHS expression is later created by calling the "unfold" function. "unfold" substitutes
all local variables of a expression with fresh ones, but the free variables (here x, y and z)
are replaced with an expression on which the free variable has been bound before.
So determine_args establishes the following bindings (stored in the var_heap):
x -> fun1 a1 b1
y -> fun2 c1
z -> z1
As it's first result value determine_args delivers the new function args: [a1, b1, c1, z1]
What about the types of these arguments? The second result value of determine args
is an array whose elements either contain the producers argument types or (in case there
was no producer for that argument) the consumers original type. This would be in the
upper example:
{[List c, T2],[T3],[(a, b)]}
But the following type would be wrong:
consumer` :: (List c) T2 T3 (a, b) -> (Int, Int, Int)
The right type is
consumer` :: (List c1) T2 T3 (Int, Real) -> (Int, Int, Int) // c1 "fresh"
Fortunately we can infer this type correctly because "determine_args" corrects
our type variable substitution "subst" with every producer. So the
substitution would look like this
a -> Int // from producer1
b -> Real // from producer2
c -> c1 // fresh
Furtheron determine_args also accumulates uniqueness requirements: inequalities between
attributed types. Later the inference of the attributes of the type of the generated
function happens like in the typing phase: the (attribute less) substitution is lifted
and attribute coercions are derived from the expanded uniqueness requirements,
the coercion graph is partitionated, unused attribute variables are removed.
Finally, the "integer" type representation is again copied into a "pointer" type
representation.
After we have let "unfold" create the RHS of our new function, we of course transform
it.
There is a lot of type information stored in the ExpressionHeap. This information has
to be kept up to date when creating the RHS expression. Especially the type
information for case and let expressions has to be refreshened and specialized.
E.g. when in the upper original consumer for some local variable the polymorph
type "a" was stored then this should be "Int" in the specialized consumer`.
Status (June 2001)
******
The following issues are suboptimal:
- The algorithm does not terminate, e.g. when transforming:
zipWith f l1 l2 :== [f e1 e2 \\ e1<-l1 & e2<-l2]
l = [1 : zipWith (+) [1] l]
t [] = []
t [e:l] = [e: t l]
Start = t l
One could try to find criteria that ensure that fusion will terminate. Alternatively
one could also decide while fusing, whether this fusion should be aborted using
ad hoc methods (e.g. a counter that counts keeps track of the recursion depth, or
whatever). It can be quite nice if a compiler is known to terminate always.
- It is possible that through fusion some functions can get unused. Imagine a function
that is called only once. When this call is specialised the original won't be
called anymore. But this superflous function still remains in the functions array,
doesn't get garbage collected and is further processed (e.g. in the convertcases
phase). One could infer the components once more after the transformation phase
to deal with that issue.
- Local definitions don't get inlined, e.g. let "c" be a consumer and "p" a producer.
The follwing would be fused:
c p
while the following would not:
let pp = p in c pp
This could be improved.
- Only two functions are fused together, but a third one is not getting involved.
E.g. when fusing the follwing:
i x = x
producer x = i [x:producer x]
consumer l = case l of ...
the algorithm stops at
consumer_producer x = case (i [x:producer x]) of
- Multimatch cases are ignored (see above). The compiler generates lots of
multimatch cases (I remember the zip2 function).
- Only function arguments get deforested, but not e.g. the following
case [] of [] -> 1
The following are real bugs:
- Strict expressions get optimised away although they shouldn't
:: T = C !Int
producer = C (abort "ABORT")
consumer x = case x of C _ -> 1
Start = consumer producer
fusion result:
Start = consumer_producer
consumer_producer = 1
(no abort)
- It can happen that while fusing one sees that a case will never match:
producer = []
consumer [_:_] = 1
Start = consumer producer
fusion result:
Start = consumer_producer
consumer_producer = <never_matching_case>
- Deforestation should better be done _after_ strictness analysys. Look at:
consumer :: ![a] -> [Int]
consumer l
= [case l of
[] -> 0
[_:t] -> 1
]
producer = undef
Start = consumer producer
In this case "consumer" and "producer" may not be fused here although this
happens with the current implementation. The problem here is that the
argument of "consumer" has been made artificially strict. Possible solutions:
- VERBIEDEN!!!!
Never fuse consumer arguments that have been marked strict.
- If a consumer argument has been marked strict, ensure that it is indeed
strict. Therefore one has to reason about strictness of expressions.
A primitive strictness analysys could be part of the transformation
phase (prima!).
- The algorithm makes no distinction between cases that come from patterns/guards
and cases that were specified by the programmer. The algorithm assumes that no
case was specified by the programmer. I think two places where this leads into
wrong results are:
- the second return value of the consumerRequirements class
- the removeNeverMatchingSubcases function
|