1 files changed, 711 insertions, 0 deletions
diff --git a/coclmaindll/doc/trans.doc.txt b/coclmaindll/doc/trans.doc.txt
new file mode 100644
index 0000000..bd9a491
--- /dev/null
+++ b/coclmaindll/doc/trans.doc.txt
@@ -0,0 +1,711 @@
+This file contains documentation of the transformation phase.
+
+Contents
+  - Overview
+  - The analysation phase
+  - The transformation phase
+      - Overview
+      - Generating a new function
+  - Status
+
+Originally the transformation phase was designed to solve three tasks:
+ - specialisation of overloaded functions (in the following just
+   "overloading specialisation"):
+ - inlining of higher order arguments in functions that are
+   defined locally in a macro (Nijmegen slang: "fold optimalisation")
+ - deforestation
+
+All these optimisations have some things in common (more or less): they
+are all a kind of specialisation. Specialisation involves creating new
+versions of functions. That's why it was decided to implement one algorithm
+that can handle all three optimalisations at the same time.
+We call this algorithm the "fusion" algorithm.
+
+Overview
+********
+
+In fact there are two phases involved: 
+ - the analysation phase
+ - the transformation phase
+
+Goal of the analysys is to assign two properties to each icl function argument
+(or: function argument position): The consumer classification (::Int)
+and the linearity (::Bool). These values are returned by analyseGroups in the
+{! ConsClasses} value. Furtheron so called "active" cases are marked as such (these
+markings are stored in the ExpressionHeap). The consumer classification can be
+one of these four values:
+
+  cPassive: 
+    if none of the following
+  cActive: 
+    Only these function arguments can be specialized. The
+    variable ("x" below) appears at least in one of the following positions:
+      - x.member (record selection: could be overloading specialised)
+      - x @ ... (higher order app: fold optimalisation)
+      - case x of .. (could be deforested)
+      - in a call to a function that's also cActive in that argument
+  cAccumulating: 
+    function argument can't be specialized because there
+    is a recursive call where the argument is not a variable, which
+    could lead into non termination of that algorithm.
+  cVarOfMultimatchCase:
+    A multimatch case is a case where one constructor could match
+    twice, e.g.:
+
+      g x = case x of
+              Cons h t | p1 h -> t
+              Cons h t | p2 h -> t
+
+    Such cases were exluded from deforestation in a first
+    approximation because they were difficult to handle (e.g. linearity
+    analysys)
+
+The linearity classification indicates, well, whether a function argument
+is used linearily. Only linear variables can be substituted with expressions
+(otherwise work could be duplicated).
+
+The transformation phase uses this information to generate new functions.
+The transformation algorithm will run over all icl functions, beginning
+with the leafs of the component tree. For every icl function he will run
+through the body expression (postorder!) and when he finds a function
+appliation that can be specialised
+he will do so (if the function wasn't specialised in that way already).
+
+The criteria to specialize a function "f" in an argument "A" are the following:
+  - f must be cActive in that argument.
+  - A must be an application
+  - If A is an application of a dictionary constructor: YEAH, go for it!
+  - If A is a curried application and f is defined as a local macro function:
+    YEAH!!!
+  - If A is a function application (g E1 .. En) and f is linear in that
+    argument: YEAH! Deforest! Cut all trees.  
+Otherwise nothing happens.
+
+E.g. if (f E1 (g E2 E3) E4) meets the above condition then a new function
+f_g will be created and that function application will become
+
+  f_g E1 E2 E3 E4  (if g is a function)
+  f_g E1 E4        (if g is a dictionary constructor) (it's not always that simple)
+
+f is called the _consumer_ and g the _producer_.
+
+Of course a new type for f_g has to be derived.
+
+How is f_g created? In case g was
+  - a dictionary:the formal argument in f is substituted the dictionary application
+  - a curried function: the formal (higher order) argument in f is substituted
+    with "g"
+  - an uncurried function: the formal argument in f is substituted with the
+    _body_ of g.
+
+
+The analysation phase
+*********************
+
+The analysys happens component wise, beginning with the leafs.
+
+The fundamental function here is the consumerRequirements class whose
+instances run through an expression. It's main data structure it is working
+on is the *ConsClassSubst:=={#Int}. Each variable that is used in a function
+that is currently analysed is assigned a number. This number is an index into
+the ConsClassSubst array. So the ConsClassSubst assigns a consumer classification
+to every variable. The classification process reminds on typing. If an array
+element is negative then the corresponding variable currently has been assigned
+to one of the four classifications:
+
+  cPassive              :== -1
+  cActive               :== -2
+  cAccumulating         :== -3
+  cVarOfMultimatchCase  :== -4
+
+If it is non negative (say i) then this states that the classification is the same
+as the variable which corresponds to i.
+
+The instances of consumerRequirements return a boolean that indicates whether
+the expression is unsafe, e.g. in
+
+  case x of
+    Cons h t -> case p h of True -> t
+    ... 
+
+(assumed that these cases result from patterns and guards) the
+consumerRequirements instance would return True for the inner case,
+because this case could fail so that control would GOTO the outer case.
+In this case the boolean would be used to classify x as 
+cVarOfMultimatchCase.
+
+The ai_cur_ref_counts field of the AnalyseInfo record is used to
+determine the linearity of all variables, the arguments and the local
+variables.
+
+The ai_cases_of_vars_for_function field of the AnalyseInfo record 
+accumulates all cases of which the case_expr is a variable. At the end
+of analyzing a component the case_info_ptr is marked (in function
+set_case_expr_info) with an EEI_ActiveCase tag, if the corresponding
+variable is active. So after the analysation phase every case is
+either "active" or not.
+
+The information whether a case is active is stored with the
+case_info_ptr. But there is already type information stored in the
+ExpressionHeap. For every case the case_info_ptr points to an 
+EI_CaseType entry. For every "let" expression the let_info_ptr
+points to an EI_LetType entry. These entries assign the variables
+that are introduced by a pattern or let bind a type. This information
+is needed for lifting (also in the following phases). So I invented
+the EI_Extended constructor which allows to store several independent
+pieces of information for one pointer. At the end of the transformation
+phase the original heap contents has to be restored again
+(function "cleanup_attributes") (the advantages of functional
+programming shine through here very clearly).
+
+The transformation phase
+************************
+
+The transformation phase - Overview
+************************^^^^^^^^^^^
+
+There are two basic transformations that are applied for deforestation
+(overloading specialisation and the fold optimalisation are just simple
+inlining!):
+
+ - The "case in case" transformation:
+
+     case (case x of
+             P1 -> A1
+             P2 -> A2)
+       P3 -> A3
+       P4 -> A4
+   
+   transforms into
+
+     case x of
+       P1 -> case A1 of
+               P3 -> A3
+               P4 -> A4
+       P2 -> case A2 of
+               P3 -> A3
+               P4 -> A4
+       
+   This happens in function lift_case. Note that this transformation
+   duplicates code.
+
+ - The "match" transformation:
+
+     case (Cons A1 A2) of
+       Cons h t -> A3
+       ..
+
+   transforms into
+
+     let h=A1
+         t=A2
+     in A3
+
+   If "h" or "t" are used linearily in A3 the can be inlined, too.
+   The match transformation reduces the amount of code (all the other patterns).
+   Note that this transformation is only valid, if the case is not
+   a multimatch case 
+
+Let's consider some examples first:
+
+  sum l =
+    case l of
+      Nil -> 0
+      Cons h t -> +I h (sum t) // +I: plus on integers
+
+  every_second toggle l2 =
+    case l2 of
+      Cons h2 t2
+          -> case toggle of
+               True -> Cons h2 (every_second (not toggle) t2)
+               False -> (every_second (not toggle) t2)
+      Nil -> Nil
+
+The analysys phase will assign l (the argument of sum) "cActive" and "linear".
+"cActive" because 
+ - the recursive call has a variable (t) as argument (in case of a
+   non variable it would be cAccumulating)
+ - l is used as a case_expression (case l of) (otherwise it would be cPassive
+   or cVarOfMultiMatchCase
+ - the case is not a multimatch case
+"linear" because l appears only once
+
+The case in the sum function will be marked as "active case of 'sum' in 'l'":
+
+  sum l =
+    case<*sum*> l of ...
+
+Indeed, sum is a typical list consumer!
+
+The arguments of every_second are classified as:
+ - toggle:cAccumulating, because there is a recursive call (in fact two)
+     with a non variable argument (not toggle).
+ - l2: cActive (for the same reasons as "l" in function sum)
+
+Now if we transform the following function
+
+  Start = sum (every_second True [42,42,42])
+
+we see that we have an uncurried function application in the active
+consumer position of sum. We fuse these two functions by replacing
+"l" in sum with the body of every_second (happens in "generateFunction"):
+
+  Start = sum_every_second True [42,42,42]
+
+  sum_every_second toggle l2 =
+    case<*sum*> (case l2 of
+                   Cons h2 t2
+                    -> case toggle of
+                         True -> Cons h2 (every_second (not toggle) t2)
+                         False -> (every_second (not toggle) t2)
+                   Nil -> Nil) of
+      Nil -> 0
+      Cons h t -> +I h (sum t)
+
+This newly generated function will be stored in the FunctionHeap.
+
+Now we transform sum_every_second. First we apply the case in case
+transformation. 
+
+  sum_every_second toggle l2 =
+    case l2 of
+      Cons h2 t2
+          -> case<*sum*> (case toggle of
+                            True -> Cons h2 (every_second (not toggle) t2)
+                            False -> (every_second (not toggle) t2))
+               Nil -> 0
+               Cons h t -> +I h (sum t)
+      Nil -> case<*sum*> Nil of
+               Nil -> 0
+               Cons h t -> +I h (sum t)
+
+Now we apply the case in case transformation once more in the
+Cons case and in the Nil case we do the match:
+
+  sum_every_second toggle l2 =
+    case l2 of
+      Cons h2 t2
+        -> case toggle of
+             True  -> case<*sum*> Cons h2 (every_second (not toggle) t2) of
+                        Nil -> 0
+                        Cons h t -> +I h (sum t)
+             False -> case<*sum*> every_second (not toggle) t2 of
+                        Nil -> 0
+                        Cons h t -> +I h (sum t)
+      Nil -> 0
+
+Finally we apply the match in the True case and we do the FOLD step:
+
+  sum_every_second toggle l2 =
+    case l2 of
+      Cons h2 t2
+        -> case toggle of
+             True  -> +I h2 (sum (every_second (not toggle) t2))
+             False -> sum_every_second (not toggle) t2
+      Nil -> 0
+
+The fold step can be applied, because we know that this particular
+case is the root case of the sum function, hence 
+
+  case<*sum*> every_second (not toggle) t2 of
+    Nil -> 0
+    Cons h t -> +I h (sum t)
+
+is equivalent to 
+
+  sum (every_second (not toggle) t2)
+
+which in turn is known to be specialised already:
+
+  sum_every_second (not toggle) t2
+
+So we can apply the fold step, if the root case comes together with the
+producer again. What do we do when during transformation a non-root case
+comes together with the producer? Answer: We generate a new function.
+Comment: This is bullshit. It would be better to generate functions for
+_all_ non root cases in a seperate phase before,
+because currently after the transformation
+phase (while interfacing with the backend) this is done anyway! BTW, the 
+every_second function has no non root case.
+
+Generating a new function would be necessary in the following case:
+
+  consumer l = 
+    1 + (case l of
+           Cons h t -> h + consumer t
+           Nil -> 0)
+
+  filter p l2 =
+    case l2 of
+      Nil -> Nil 
+      Cons h2 t2 -> case p @ h2 of
+                      True -> Cons h2 (filter p t2)
+                      False -> filter p t2
+
+Fusing these two would proceed like follows:
+
+  consumer_filter p l2 =
+    1 + (case<*consumer*> (case l2 of
+                             Nil -> Nil 
+                             Cons h2 t2 -> case p h2 of
+                                             True -> Cons h2 (filter p t2)
+                                             False -> filter p t2) of
+           Cons h t -> h + consumer t
+           Nil      -> 0 )
+
+  consumer_filter p l2 =
+    1 + (case l2 of
+           Nil        -> case<*consumer*> Nil of
+                           Cons h t -> h + consumer t
+                           Nil      -> 0
+           Cons h2 t2 -> case<*consumer*> (case p h2 of
+                                             True -> Cons h2 (filter p t2)
+                                             False -> filter p t2) of
+                           Cons h t -> h + consumer t
+                           Nil      -> 0) 
+
+  consumer_filter p l2 =
+    1 + (case l2 of
+           Nil        -> 0
+           Cons h2 t2 -> case p h2 of
+                           True  -> case<*consumer*> Cons h2 (filter p t2)
+                                      Cons h t -> h + consumer t
+                                      Nil      -> 0
+                           False -> case<*consumer*> filter p t2 of
+                                      Cons h t -> h + consumer t
+                                      Nil      -> 0) 
+
+  consumer_filter p l2 =
+    1 + (case l2 of
+           Nil        -> 0
+           Cons h2 t2 -> case p h2 of
+                           True  -> h2 + (consumer_filter p t2)
+                           False -> case<*consumer*> filter p t2 of
+                                      Cons h t -> h + consumer t
+                                      Nil      -> 0) 
+
+Now we should _not_ apply the fold step like
+
+  consumer_filter p l2 =
+    1 + (case l2 of
+           Nil        -> 0
+           Cons h2 t2 -> case p h2 of
+                           True  -> h2 + (consumer_filter p t2)
+                           False -> consumer_filter p t2
+
+Then (consumer_filter isPositive [-1]) would evaluate to 2 while
+(consumer (filter isPositive [-1])) would evaluate to 1!
+
+Instead we do so as if consumer was defined like
+
+  consumer l = 1 + (consumer_case l)
+  consumer_case l = case l of
+                      Cons h t -> h + consumer t
+                      Nil -> 0
+
+Now we can do the fold step. So the result would be
+
+  consumer_filter p l2  = 1+(consumer_case_filter p l2)
+  consumer_case_filter p l2
+    = case l2 of
+           Nil        -> 0
+           Cons h2 t2 -> case p h2 of
+                           True  -> h2 + (consumer_filter p t2)
+                           False -> consumer_case_filter p t2
+
+The additional functions that are generated for non root cases
+are created in function "generate_case_function".
+
+To finish this overview I give an example for fold optimalisation:
+
+  foldl f l init =
+    case l of Nil -> init
+              Cons h t -> foldl f t (f @ h init)
+
+  Start = foldl (+) [] 0
+
+Classification:
+   foldl f:(cActive, not linear) l:(cActive, linear) init:(cAccumulating, linear)
+
+The first arg is active because it appears on the RHS of @.
+
+I don't fuse the constructor []! (Sjaak does -> code explosion)
+
+But I specialize in the first arg. Being not linear doesn't matter
+for higher order arguments, because there is no calculation 
+duplicated (same with dictionaries)
+
+  foldl_plus l init =
+    case l of Nil -> init
+              Cons h t -> foldl_plus t (h + init)
+
+  Start = foldl_plus [] 0
+
+And I give example for overloading specialisation
+
+The programmer writes:
+
+  gimme_da_first :: (a e) -> e | Array a e
+  gimme_da_first a = a.[0]
+
+  gimme_a_strict_array :: {!Int}
+  gimme_a_strict_array = {42}
+  
+  Start = gimme_da_first gimme_a_strict_array
+
+The compiler makes from this while typing:
+
+  :: ArrayDictionary a e =
+      { select :: (a e) Int -> e
+      , update :: (a e) Int e -> (a e)
+      , size   :: (a e) -> Int
+      ..  
+
+  gimme_da_first :: (ArrayDictionary a e) (a e) -> e
+  gimme_da_first dictionary a = dictionary.select @ a 0
+
+  Start = gimme_da_first { select = strictArraySelect, update=strictArrayUpdate,
+                           size = strictArraySize .. }
+                         gimme_a_strict_array
+
+We see that the first dictionary argument of "gimme_da_first" is classified as
+a consumer (cActive because of the selection and even linear). So we fuse
+"gimme_da_first" with the record constructor:
+
+  gimme_da_first_StrictArray :: {!e} -> e
+  gimme_da_first_StrictArray a = strictArraySelect a 0
+
+  Start = gimme_da_first_StrictArray gimme_a_strict_array
+
+In this example we can see very clear that through fusion (be it 
+overloading specialisation, deforestation or fold optimalisation) the type
+of the newly created function has become specialised, too. In this case
+for (a e) the a has been substituted with {!} yielding {!e}
+
+This happens by unifying the producer's result type with the type of the
+argument of the producer. In this case we unify
+
+       (ArrayDictionary {!} e) // producer { select = strictArraySelect, ...
+  with (ArrayDictionary a   e) // consumer gimme_da_first  
+  
+yielding the substitution a->{!}
+
+
+The transformation phase - Generating a new function
+************************^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The function "generateFunction" is called when a function is specialised.
+
+As the important parameters it gets the consumers function definition, the
+classification of it's arguments and an array "prods" that contains a producer
+description for each consumer argument. So generateFunction indeed
+specialises one consumer with possibly multiple producers. A producer
+description of type "Producer" indicates, whether the producer is a
+dictionary, a curried application or a function.
+
+First some trouble with types happens. As stated before we have to apply
+type unification. Therefore we reuse the infrastructure from the typing
+phase. Unfortunately this infrastructure makes certain assumptions about
+the given types for which some work has to be done to fulfill them.
+
+The unification algorithm assumes type variables in "integer form" (TempV) 
+instead of "pointer form" (TV). Furtheron we have to take care about
+uniqueness information. We have to add propagation information (see function
+"add_propagation_attributes). This will transform a type
+
+  consumer :: [*{Int}] -> Int
+
+into
+
+  consumer :: *[*{Int}] -> Int
+
+After giving every involved type variable a number we create an initial
+type variable substitution "subst". This array assigns every type variable
+(number) a type. With every producer this substitution can be further specialised.
+
+The "determine_args" function is the called. To explain what this function
+does here is an example (I write (List a) instead of [a]):
+
+  consumer :: (List a) (List b) (a, b) -> (Int, Int, Int)
+  consumer x y z = (case x of ..., case y of ..., fun0 z z z)
+
+  producer :: (List c) T2 -> (List Int)
+  producer1 a b = fun1 a b
+
+  producer2 :: T3 -> (List Real)
+  producer2 c = fun2 c
+  
+Our application is:
+
+  Start = consumer (producer1 E1 E2) (producer E3) E4
+
+Because we are generating a new functions all variables in that function
+should be fresh. The generated function should look like this:
+ 
+  consumer` a1 b1 c1 z1 = (case fun1 a1 b1 of ..., case fun2 c1 of ..., fun0 z1 z1 z1)
+
+The RHS expression is later created by calling the "unfold" function. "unfold" substitutes
+all local variables of a expression with fresh ones, but the free variables (here x, y and z)
+are replaced with an expression on which the free variable has been bound before.
+So determine_args establishes the following bindings (stored in the var_heap):
+
+  x -> fun1 a1 b1
+  y -> fun2 c1
+  z -> z1
+
+As it's first result value determine_args delivers the new function args: [a1, b1, c1, z1]
+
+What about the types of these arguments? The second result value of determine args
+is an array whose elements either contain the producers argument types or (in case there
+was no producer for that argument) the consumers original type. This would be in the
+upper example:
+
+  {[List c, T2],[T3],[(a, b)]}
+
+But the following type would be wrong:
+
+  consumer` :: (List c) T2 T3 (a, b) -> (Int, Int, Int)
+
+The right type is
+
+  consumer` :: (List c1) T2 T3 (Int, Real) -> (Int, Int, Int) // c1 "fresh"
+
+Fortunately we can infer this type correctly because "determine_args" corrects
+our type variable substitution "subst" with every producer. So the 
+substitution would look like this
+
+  a -> Int      // from producer1
+  b -> Real     // from producer2
+  c -> c1       // fresh
+
+Furtheron determine_args also accumulates uniqueness requirements: inequalities between
+attributed types. Later the inference of the attributes of the type of the generated
+function happens like in the typing phase: the (attribute less) substitution is lifted
+and attribute coercions are derived from the expanded uniqueness requirements,
+the coercion graph is partitionated, unused attribute variables are removed.
+
+Finally, the "integer" type representation is again copied into a "pointer" type
+representation.
+
+After we have let "unfold" create the RHS of our new function, we of course transform
+it.
+
+There is a lot of type information stored in the ExpressionHeap. This information has
+to be kept up to date when creating the RHS expression. Especially the type
+information for case and let expressions has to be refreshened and specialized.
+E.g. when in the upper original consumer for some local variable the polymorph
+type "a" was stored then this should be "Int" in the specialized consumer`.
+
+Status  (June 2001)
+******
+
+The following issues are suboptimal:
+
+- The algorithm does not terminate, e.g. when transforming:
+
+    zipWith f l1 l2 :== [f e1 e2 \\ e1<-l1 & e2<-l2]
+
+    l = [1 : zipWith (+) [1] l]
+
+    t [] = []
+    t [e:l] = [e: t l]
+
+  Start = t l
+
+  One could try to find criteria that ensure that fusion will terminate. Alternatively
+  one could also decide while fusing, whether this fusion should be aborted using
+  ad hoc methods (e.g. a counter that counts keeps track of the recursion depth, or
+  whatever). It can be quite nice if a compiler is known to terminate always.
+
+- It is possible that through fusion some functions can get unused. Imagine a function
+  that is called only once. When this call is specialised the original won't be
+  called anymore. But this superflous function still remains in the functions array,
+  doesn't get garbage collected and is further processed (e.g. in the convertcases
+  phase). One could infer the components once more after the transformation phase
+  to deal with that issue.
+
+- Local definitions don't get inlined, e.g. let "c" be a consumer and "p" a producer.
+  The follwing would be fused:
+
+    c p
+
+  while the following would not:
+
+    let pp = p in c pp
+
+  This could be improved.
+
+- Only two functions are fused together, but a third one is not getting involved.
+  E.g. when fusing the follwing:
+
+    i x = x
+    producer x = i [x:producer x]
+    consumer l = case l of ...
+
+  the algorithm stops at
+
+    consumer_producer x = case (i [x:producer x]) of
+
+- Multimatch cases are ignored (see above). The compiler generates lots of
+  multimatch cases (I remember the zip2 function).
+
+- Only function arguments get deforested, but not e.g. the following
+
+    case [] of [] -> 1
+
+The following are real bugs:
+
+- Strict expressions get optimised away although they shouldn't
+    
+    :: T = C !Int
+    producer = C (abort "ABORT")
+    consumer x = case x of C _ -> 1
+    Start = consumer producer
+
+  fusion result:
+
+    Start = consumer_producer
+    consumer_producer = 1
+  
+  (no abort)
+
+- It can happen that while fusing one sees that a case will never match:
+
+    producer = []
+    consumer [_:_] = 1
+    Start = consumer producer
+
+  fusion result:
+
+    Start = consumer_producer
+    consumer_producer = <never_matching_case>
+
+- Deforestation should better be done _after_ strictness analysys. Look at:
+
+    consumer :: ![a] -> [Int]
+    consumer l
+      = [case l of
+          [] -> 0
+          [_:t] -> 1
+        ]
+
+    producer = undef
+
+    Start = consumer producer
+
+  In this case "consumer" and "producer" may not be fused here although this
+  happens with the current implementation. The problem here is that the
+  argument of "consumer" has been made artificially strict. Possible solutions:
+  - VERBIEDEN!!!!
+    Never fuse consumer arguments that have been marked strict.
+  - If a consumer argument has been marked strict, ensure that it is indeed
+    strict. Therefore one has to reason about strictness of expressions.
+    A primitive strictness analysys could be part of the transformation
+    phase (prima!).
+
+- The algorithm makes no distinction between cases that come from patterns/guards
+  and cases that were specified by the programmer. The algorithm assumes that no
+  case was specified by the programmer. I think two places where this leads into
+  wrong results are:
+  - the second return value of the consumerRequirements class
+  - the removeNeverMatchingSubcases function