documentation

git-svn-id: https://svn.cs.ru.nl/repos/clean-compiler/trunk@509 1f8540f1-abd5-4d5b-9d24-4c5ce8603e2d
author: martinw 2001-06-29 11:53:15 +0000
committer: martinw 2001-06-29 11:53:15 +0000
commit: 181644b09de46cd200d83ec69b32b3ca740e59d0 (patch)
tree: f7c6d9dedc28a51f03f39d9e15b5bbe653099657 /coclmaindll
parent: disable PA_bug (diff)
2 files changed, 1046 insertions, 0 deletions
diff --git a/coclmaindll/doc/explicitImports.doc.txt b/coclmaindll/doc/explicitImports.doc.txt
new file mode 100644
index 0000000..b542f58
--- /dev/null
+++ b/coclmaindll/doc/explicitImports.doc.txt
@@ -0,0 +1,335 @@
+This file documents the mechanism used for explicit imports
+
+Basic problem: The modules are checked one after the other.
+When there are cycles between dcl modules, the algorithm has
+to solve imports from a module that is unchecked yet. The exported
+declarations of such an unchecked module are not known. 
+
+A difficulty in designing this algorithm was inctroduced by the
+fact that there can only be one symbol table at a time (because
+of the use of pointers). But for cyclic module dependencies
+one wants to incrementally build up symbol table information for
+all modules on the cycle at the same time.
+
+To solve this difficulty we introduced a new datastructure "ExplImpInfos", that can
+contain symbol table information for all modules at once:
+
+  :: *ExplImpInfos :== *{!*{!*ExplImpInfo}}
+  :: ExplImpInfo = ExplImpInfo Ident !.DeclaringModulesSet
+  :: DeclaringModulesSet :== IntKeyHashtable DeclarationInfo
+
+  :: DeclarationInfo = { di_decl :: !Declaration, ... }
+
+The outer array is indexed with the "module component number"
+(short:component number, see below, this is _not_ the module number).
+The inner array
+is indexed with the module component's "symbol number". We do
+not use pointers to identify symbols (like id_info). Instead
+we identify each symbol with a number. Caution: For one symbol
+(one id_info pointer) this number can vary from module component
+to module component! Finally each array element contains the "Ident"
+representation of the symbol and a "DeclaringModulesSet". This set,
+implemented as a hash table, contains the module numbers (not component numbers)
+of all currently known modules that declare/define the symbol, together with
+the "Declaration" information, which allows to find the original definition
+of that symbol's instance (no, I don't mean "instance of a class" here).
+
+When the dcl modules have to be checked, at first the dependency graph
+of these modules is partitioned (in function checkDclModules). Then an
+initial ExplImpInfos array is created. We only take symbols into account
+that somewhere appear in an explicit import statement and assign numbers to these
+symbols. This is also the place were the component numbers are created. The parser
+delivers import statements as a "ParsedImport" in which symbols are identified
+by pointers. These pointers are translated into symbol
+numbers (type "ImportNrAndIdents"). Now we process all components
+beginning with the leafs.
+
+Processing a component:
+All explicit imports are solved before the symbol table for
+any module is actually built. Take the following modules:
+  _________________________
+  definition module t1
+  from t2 import ::T2, ::TDouble
+  :: T1
+  _________________________
+  definition module t2
+  from t1 import ::T1
+  from t4 import ::T4
+  import t3
+  :: T2
+  _________________________
+  definition module t3
+  :: TDouble
+  _________________________
+  definition module t4
+  :: TDouble
+  :: T4
+  
+We assume that t1 becomes checked before t2. The problem here
+is to find out that t1 imports ::TDouble from t3 and not from
+t4
+
+Let's assume the following:
+  module t1 has been assigned number 11
+  module t2 ------------"----------- 12
+  module t3 ------------"----------- 13
+  module t4 ------------"----------- 14
+  component {t1,t2} has been assigned number 0
+  component {t3}    ------------"----------- 1
+  component {t4}    ------------"----------- 2
+  ::TDouble has been assigned nr 0 in component nr 0
+  ::T1      ------"------------- 1 -------"---------
+  ::T2      ------"------------- 2 -------"---------
+  ::T4      ------"------------- 3 -------"---------
+
+So the inital "ExplImpInfos" structure looks like this
+(we use a list notation here to denote the hashtable contents)
+
+        { {(TDouble,[]), (T1,[]), (T2,[]), (T4,[])}
+        , {}
+        , {}
+        }
+
+It is always known which components import directly from the actual
+module (this info is stored in "super_components", which maps module indices to
+lists of component numbers)
+ 
+First lets say module t4 is checked. At the end of checking this
+module "ExplImpInfos" is updated for those components that directly
+import from t4: component nr 0 (={t1,t2}):
+
+        { {(TDouble,[14]), (T1,[]), (T2,[]), (T4,[14])}
+        , {}
+        , {}
+        }
+
+So now we got the information that within component 0 (={t1,t2}) the
+symbols TDouble and T4 could be imported from module nr. 14 (=t4).
+
+After checking module t3:
+
+        { {(TDouble,[14, 13]), (T1,[]), (T2,[]), (T4,[14])}
+        , {}
+        , {}
+        }
+
+Now we check component {t1,t2}. At the beginning of checking any module
+component we add all symbols that are defined (in contrast to declared)
+to the ExplImpInfos structure, here ::T1 and ::T2:
+
+        { {(TDouble,[14, 13]), (T1,[11]), (T2,[12]), (T4,[14])}
+        , {}
+        , {}
+        }
+
+Now we try to solve all explicit imports in module t1:
+
+  from t2 import ::T2, ::TDouble
+  
+This involves a depth first search algorithm (function 
+"depth_first_search" in module explicitimports). 
+
+At first we search a path from module t1 to the definition of ::T2.
+
+The search will begin in module t2. We can infer that ::T2 is 
+indeed defined in t2 from
+the ExplImpInfos structure: We simply search the module t2 (nr. 12) in the
+entry for that symbol in the current component:
+
+        { {(TDouble,[14, 13]), (T1,[11]), (T2,[12]), (T4,[14])}
+                                          ^^^^^^^^^
+
+So we found the definition of ::T2. 
+
+The statement in module t1 was:
+
+  from t2 import ::T2, ::TDouble
+
+So we have to search for ::TDouble now. Beginning in t2 we can skip
+the following two imports, because they can't import ::TDouble:
+
+  from t1 import ::T1
+  from t4 import ::T4
+
+The remaining import statement
+
+  import t3
+
+is followed by our depth search algorithm. Again we infer from
+our ExplImpInfos structure that ::TDouble is indeed declared in
+module t3 (nr 13):
+
+        { {(TDouble,[14, 13]), (T1,[11]), (T2,[12]), (T4,[14])}
+                        ^^^^
+
+So we solved that import, too. Always when we have found a path to
+a desired symbol we will store that new information in the ExplImpInfos
+structure for possible later use. In this case we only add the fact that
+::TDouble is delared in module t2:
+
+        { {(TDouble,[14, 13, 12]), (T1,[11]), (T2,[12,11]), (T4,[14])}
+
+If there would have been a third module in the {t1, t2} component that
+would have tried to import ::TDouble from t2, we wouldn't have had 
+to search the path to the definition again (kinda cache).
+
+(The presentation of this example ends here.)
+
+The algorithm distinguishes between two kinds of symbols: top
+level symbols and belonging symbols. Top level symbols are types,
+functions, classes and instances. Belonging symbols are
+those that can occur in brackets in an explicit import
+statement: constructors (belonging to a type), fields (dito)
+an members (belonging to a class). At first all imports of
+top level symbols are solved. Later the belonging symbols are
+searched. This happens in a different manner, because for these
+symbols we don't assign numbers in the beginning. We don't assign
+numbers because the belonging symbols don't have to be explicitly
+stated in an import statement. For instance in
+  
+  from m import ::T(..)
+
+we could assign a number to the top level symbol ::T but not to
+the belonging constructor symbols: just _after_ solving the type
+symbol we know the constructors. We still have to apply a depth
+first search for every constructor of ::T, because it could happen
+that some of them are _not_ exported by "m". We have to find
+out which of them.
+
+Belonging symbols are identified by two values: the number of the
+corresponding top level symbol and the "ds_index" of the belonging
+symbol. E.g. with
+
+:: T = C0 | C1 |  C2
+
+the "ds_index" for C0 would be 0, and the "ds_index" for C2 would be 2.
+
+To search for a belonging symbol one needs both numbers. The ExplImpInfos
+data structure stores for each symbol in which modules it is declared. But
+there are only entries for the top level symbols in each hash table. To
+infer from the ExplImpInfos data whether a given belonging symbol is
+declared in a given module we use a field of the "DeclarationInfo": The
+"di_belonging" field which is a kind of bit vector. E.g. to test whether
+the upper belonging symbol "C1" is declared in module nr i we first check
+whether the (already known) top level symbol ::T is declared in module i. If
+not, then "C1" cannot be declared there either. If yes, we consult the second
+bit in the "di_belonging" bit vector.
+
+Some additional info:
+
+The depth search algorithm only visits modules within the actual component
+and those that are directly imported from the actual component. E.g. with
+"m1" importing only from "m2" and "m2" importing only from "m3", while
+checking "m1" module "m3" would never be visited.
+
+
+Status
+******
+
+1)
+
+One thing that doesn't work: macro members.
+E.g. the Ord class in module StdClass defines a "member" <=
+which indeed is a macro:
+
+  class Ord a	| < a
+  where
+    (<=) infix 4 :: !a !a -> Bool | Ord a
+    (<=) x y :== not (y<x)
+
+The following will fail:
+
+  from StdClass import class Ord(<=)
+  "<= does not belong to Ord"
+
+Do the following instead:
+
+  from StdClass import class Ord, <=
+
+Also using ".." doesn't import <= :
+
+  from StdClass import class Ord(..)
+  Start = 1<=2
+  "<= undefined"
+
+BTW:
+
+  from StdClass import Ord
+  Start = 1<=2
+
+makes Clean 1.3.3 crash
+
+2)
+
+There is an optimisation that could still be applied.
+
+Imagine the following situation:
+  ______________________
+  definition module t1
+  import m1,m2, .. mn 
+  from t2 import ::T
+  ______________________
+  definition module mi (for all i)
+  from t2 import ::T
+  ______________________
+  definition module t2
+  ::T
+  import m1,m2, .. mn 
+  ______________________
+
+
+There are no cycles. First t2 would be checked, then the mi and then t1.
+The ExplImpInfos data structure would look like this just before checking t1:
+
+   { {(T,[t2, m0, m1, ..mn])} ..
+
+But because the explicit import of ::T in module t1 is _not_ via any module
+within the component (there is only one: t1 itself) it would be completely
+sufficient just to store 
+
+   { {(T,[t2])} ..
+
+When resolving the explicit import we would never search the mi!
+
+I don't know whether you could gain much with such an optimisation.
+
+
+A better optimisation:
+
+The ExplImpInfos data structure is massively unique. The elements of
+the two dimensional array are unique, too. So one has to select the
+array elements with the "replace" mechanism. Unfortunately there is no
+"replace" like primitive for two dimensional arrays. I implemented
+this one:
+
+  replaceTwoDimArrElt :: !Int !Int !.e !{!*{!.e}} -> (!.e, !{!.{!.e}})
+  
+But this one allocates a new array with every call. In a (quite constructed)
+test case I measured that 4% of allocation was done by allocating arrays.
+I guess this was the array in replaceTwoDimArrElt. One can solve this problem
+by faking around with casts or abc code.
+
+An even more better optimisation:
+
+This optimisation doesn't deal with explicit imports, but with implicit
+imports. Consider the following situation:
+
+  ______________________
+  definition module t1
+  import m1,m2
+  ______________________
+  definition module m1
+  import StdIO
+  ______________________
+  definition module m2
+  import StdIO
+  ______________________
+
+When building the t1's symbol table the current algorithm will try to add
+StdIO's symbols _twice_ to that symbol table. In the first turn these
+symbols will be added indeed, but in the second turn trying to add new
+symbols of StdIO will not change the symbol table (because the symbol
+table already contains all these symbols). One could invent a mechanism
+that keeps track which modules have been added to the symbol table
+_as a whole_. In this way superflous work could be saved.
+
diff --git a/coclmaindll/doc/trans.doc.txt b/coclmaindll/doc/trans.doc.txt
new file mode 100644
index 0000000..bd9a491
--- /dev/null
+++ b/coclmaindll/doc/trans.doc.txt
@@ -0,0 +1,711 @@
+This file contains documentation of the transformation phase.
+
+Contents
+  - Overview
+  - The analysation phase
+  - The transformation phase
+      - Overview
+      - Generating a new function
+  - Status
+
+Originally the transformation phase was designed to solve three tasks:
+ - specialisation of overloaded functions (in the following just
+   "overloading specialisation"):
+ - inlining of higher order arguments in functions that are
+   defined locally in a macro (Nijmegen slang: "fold optimalisation")
+ - deforestation
+
+All these optimisations have some things in common (more or less): they
+are all a kind of specialisation. Specialisation involves creating new
+versions of functions. That's why it was decided to implement one algorithm
+that can handle all three optimalisations at the same time.
+We call this algorithm the "fusion" algorithm.
+
+Overview
+********
+
+In fact there are two phases involved: 
+ - the analysation phase
+ - the transformation phase
+
+Goal of the analysys is to assign two properties to each icl function argument
+(or: function argument position): The consumer classification (::Int)
+and the linearity (::Bool). These values are returned by analyseGroups in the
+{! ConsClasses} value. Furtheron so called "active" cases are marked as such (these
+markings are stored in the ExpressionHeap). The consumer classification can be
+one of these four values:
+
+  cPassive: 
+    if none of the following
+  cActive: 
+    Only these function arguments can be specialized. The
+    variable ("x" below) appears at least in one of the following positions:
+      - x.member (record selection: could be overloading specialised)
+      - x @ ... (higher order app: fold optimalisation)
+      - case x of .. (could be deforested)
+      - in a call to a function that's also cActive in that argument
+  cAccumulating: 
+    function argument can't be specialized because there
+    is a recursive call where the argument is not a variable, which
+    could lead into non termination of that algorithm.
+  cVarOfMultimatchCase:
+    A multimatch case is a case where one constructor could match
+    twice, e.g.:
+
+      g x = case x of
+              Cons h t | p1 h -> t
+              Cons h t | p2 h -> t
+
+    Such cases were exluded from deforestation in a first
+    approximation because they were difficult to handle (e.g. linearity
+    analysys)
+
+The linearity classification indicates, well, whether a function argument
+is used linearily. Only linear variables can be substituted with expressions
+(otherwise work could be duplicated).
+
+The transformation phase uses this information to generate new functions.
+The transformation algorithm will run over all icl functions, beginning
+with the leafs of the component tree. For every icl function he will run
+through the body expression (postorder!) and when he finds a function
+appliation that can be specialised
+he will do so (if the function wasn't specialised in that way already).
+
+The criteria to specialize a function "f" in an argument "A" are the following:
+  - f must be cActive in that argument.
+  - A must be an application
+  - If A is an application of a dictionary constructor: YEAH, go for it!
+  - If A is a curried application and f is defined as a local macro function:
+    YEAH!!!
+  - If A is a function application (g E1 .. En) and f is linear in that
+    argument: YEAH! Deforest! Cut all trees.  
+Otherwise nothing happens.
+
+E.g. if (f E1 (g E2 E3) E4) meets the above condition then a new function
+f_g will be created and that function application will become
+
+  f_g E1 E2 E3 E4  (if g is a function)
+  f_g E1 E4        (if g is a dictionary constructor) (it's not always that simple)
+
+f is called the _consumer_ and g the _producer_.
+
+Of course a new type for f_g has to be derived.
+
+How is f_g created? In case g was
+  - a dictionary:the formal argument in f is substituted the dictionary application
+  - a curried function: the formal (higher order) argument in f is substituted
+    with "g"
+  - an uncurried function: the formal argument in f is substituted with the
+    _body_ of g.
+
+
+The analysation phase
+*********************
+
+The analysys happens component wise, beginning with the leafs.
+
+The fundamental function here is the consumerRequirements class whose
+instances run through an expression. It's main data structure it is working
+on is the *ConsClassSubst:=={#Int}. Each variable that is used in a function
+that is currently analysed is assigned a number. This number is an index into
+the ConsClassSubst array. So the ConsClassSubst assigns a consumer classification
+to every variable. The classification process reminds on typing. If an array
+element is negative then the corresponding variable currently has been assigned
+to one of the four classifications:
+
+  cPassive              :== -1
+  cActive               :== -2
+  cAccumulating         :== -3
+  cVarOfMultimatchCase  :== -4
+
+If it is non negative (say i) then this states that the classification is the same
+as the variable which corresponds to i.
+
+The instances of consumerRequirements return a boolean that indicates whether
+the expression is unsafe, e.g. in
+
+  case x of
+    Cons h t -> case p h of True -> t
+    ... 
+
+(assumed that these cases result from patterns and guards) the
+consumerRequirements instance would return True for the inner case,
+because this case could fail so that control would GOTO the outer case.
+In this case the boolean would be used to classify x as 
+cVarOfMultimatchCase.
+
+The ai_cur_ref_counts field of the AnalyseInfo record is used to
+determine the linearity of all variables, the arguments and the local
+variables.
+
+The ai_cases_of_vars_for_function field of the AnalyseInfo record 
+accumulates all cases of which the case_expr is a variable. At the end
+of analyzing a component the case_info_ptr is marked (in function
+set_case_expr_info) with an EEI_ActiveCase tag, if the corresponding
+variable is active. So after the analysation phase every case is
+either "active" or not.
+
+The information whether a case is active is stored with the
+case_info_ptr. But there is already type information stored in the
+ExpressionHeap. For every case the case_info_ptr points to an 
+EI_CaseType entry. For every "let" expression the let_info_ptr
+points to an EI_LetType entry. These entries assign the variables
+that are introduced by a pattern or let bind a type. This information
+is needed for lifting (also in the following phases). So I invented
+the EI_Extended constructor which allows to store several independent
+pieces of information for one pointer. At the end of the transformation
+phase the original heap contents has to be restored again
+(function "cleanup_attributes") (the advantages of functional
+programming shine through here very clearly).
+
+The transformation phase
+************************
+
+The transformation phase - Overview
+************************^^^^^^^^^^^
+
+There are two basic transformations that are applied for deforestation
+(overloading specialisation and the fold optimalisation are just simple
+inlining!):
+
+ - The "case in case" transformation:
+
+     case (case x of
+             P1 -> A1
+             P2 -> A2)
+       P3 -> A3
+       P4 -> A4
+   
+   transforms into
+
+     case x of
+       P1 -> case A1 of
+               P3 -> A3
+               P4 -> A4
+       P2 -> case A2 of
+               P3 -> A3
+               P4 -> A4
+       
+   This happens in function lift_case. Note that this transformation
+   duplicates code.
+
+ - The "match" transformation:
+
+     case (Cons A1 A2) of
+       Cons h t -> A3
+       ..
+
+   transforms into
+
+     let h=A1
+         t=A2
+     in A3
+
+   If "h" or "t" are used linearily in A3 the can be inlined, too.
+   The match transformation reduces the amount of code (all the other patterns).
+   Note that this transformation is only valid, if the case is not
+   a multimatch case 
+
+Let's consider some examples first:
+
+  sum l =
+    case l of
+      Nil -> 0
+      Cons h t -> +I h (sum t) // +I: plus on integers
+
+  every_second toggle l2 =
+    case l2 of
+      Cons h2 t2
+          -> case toggle of
+               True -> Cons h2 (every_second (not toggle) t2)
+               False -> (every_second (not toggle) t2)
+      Nil -> Nil
+
+The analysys phase will assign l (the argument of sum) "cActive" and "linear".
+"cActive" because 
+ - the recursive call has a variable (t) as argument (in case of a
+   non variable it would be cAccumulating)
+ - l is used as a case_expression (case l of) (otherwise it would be cPassive
+   or cVarOfMultiMatchCase
+ - the case is not a multimatch case
+"linear" because l appears only once
+
+The case in the sum function will be marked as "active case of 'sum' in 'l'":
+
+  sum l =
+    case<*sum*> l of ...
+
+Indeed, sum is a typical list consumer!
+
+The arguments of every_second are classified as:
+ - toggle:cAccumulating, because there is a recursive call (in fact two)
+     with a non variable argument (not toggle).
+ - l2: cActive (for the same reasons as "l" in function sum)
+
+Now if we transform the following function
+
+  Start = sum (every_second True [42,42,42])
+
+we see that we have an uncurried function application in the active
+consumer position of sum. We fuse these two functions by replacing
+"l" in sum with the body of every_second (happens in "generateFunction"):
+
+  Start = sum_every_second True [42,42,42]
+
+  sum_every_second toggle l2 =
+    case<*sum*> (case l2 of
+                   Cons h2 t2
+                    -> case toggle of
+                         True -> Cons h2 (every_second (not toggle) t2)
+                         False -> (every_second (not toggle) t2)
+                   Nil -> Nil) of
+      Nil -> 0
+      Cons h t -> +I h (sum t)
+
+This newly generated function will be stored in the FunctionHeap.
+
+Now we transform sum_every_second. First we apply the case in case
+transformation. 
+
+  sum_every_second toggle l2 =
+    case l2 of
+      Cons h2 t2
+          -> case<*sum*> (case toggle of
+                            True -> Cons h2 (every_second (not toggle) t2)
+                            False -> (every_second (not toggle) t2))
+               Nil -> 0
+               Cons h t -> +I h (sum t)
+      Nil -> case<*sum*> Nil of
+               Nil -> 0
+               Cons h t -> +I h (sum t)
+
+Now we apply the case in case transformation once more in the
+Cons case and in the Nil case we do the match:
+
+  sum_every_second toggle l2 =
+    case l2 of
+      Cons h2 t2
+        -> case toggle of
+             True  -> case<*sum*> Cons h2 (every_second (not toggle) t2) of
+                        Nil -> 0
+                        Cons h t -> +I h (sum t)
+             False -> case<*sum*> every_second (not toggle) t2 of
+                        Nil -> 0
+                        Cons h t -> +I h (sum t)
+      Nil -> 0
+
+Finally we apply the match in the True case and we do the FOLD step:
+
+  sum_every_second toggle l2 =
+    case l2 of
+      Cons h2 t2
+        -> case toggle of
+             True  -> +I h2 (sum (every_second (not toggle) t2))
+             False -> sum_every_second (not toggle) t2
+      Nil -> 0
+
+The fold step can be applied, because we know that this particular
+case is the root case of the sum function, hence 
+
+  case<*sum*> every_second (not toggle) t2 of
+    Nil -> 0
+    Cons h t -> +I h (sum t)
+
+is equivalent to 
+
+  sum (every_second (not toggle) t2)
+
+which in turn is known to be specialised already:
+
+  sum_every_second (not toggle) t2
+
+So we can apply the fold step, if the root case comes together with the
+producer again. What do we do when during transformation a non-root case
+comes together with the producer? Answer: We generate a new function.
+Comment: This is bullshit. It would be better to generate functions for
+_all_ non root cases in a seperate phase before,
+because currently after the transformation
+phase (while interfacing with the backend) this is done anyway! BTW, the 
+every_second function has no non root case.
+
+Generating a new function would be necessary in the following case:
+
+  consumer l = 
+    1 + (case l of
+           Cons h t -> h + consumer t
+           Nil -> 0)
+
+  filter p l2 =
+    case l2 of
+      Nil -> Nil 
+      Cons h2 t2 -> case p @ h2 of
+                      True -> Cons h2 (filter p t2)
+                      False -> filter p t2
+
+Fusing these two would proceed like follows:
+
+  consumer_filter p l2 =
+    1 + (case<*consumer*> (case l2 of
+                             Nil -> Nil 
+                             Cons h2 t2 -> case p h2 of
+                                             True -> Cons h2 (filter p t2)
+                                             False -> filter p t2) of
+           Cons h t -> h + consumer t
+           Nil      -> 0 )
+
+  consumer_filter p l2 =
+    1 + (case l2 of
+           Nil        -> case<*consumer*> Nil of
+                           Cons h t -> h + consumer t
+                           Nil      -> 0
+           Cons h2 t2 -> case<*consumer*> (case p h2 of
+                                             True -> Cons h2 (filter p t2)
+                                             False -> filter p t2) of
+                           Cons h t -> h + consumer t
+                           Nil      -> 0) 
+
+  consumer_filter p l2 =
+    1 + (case l2 of
+           Nil        -> 0
+           Cons h2 t2 -> case p h2 of
+                           True  -> case<*consumer*> Cons h2 (filter p t2)
+                                      Cons h t -> h + consumer t
+                                      Nil      -> 0
+                           False -> case<*consumer*> filter p t2 of
+                                      Cons h t -> h + consumer t
+                                      Nil      -> 0) 
+
+  consumer_filter p l2 =
+    1 + (case l2 of
+           Nil        -> 0
+           Cons h2 t2 -> case p h2 of
+                           True  -> h2 + (consumer_filter p t2)
+                           False -> case<*consumer*> filter p t2 of
+                                      Cons h t -> h + consumer t
+                                      Nil      -> 0) 
+
+Now we should _not_ apply the fold step like
+
+  consumer_filter p l2 =
+    1 + (case l2 of
+           Nil        -> 0
+           Cons h2 t2 -> case p h2 of
+                           True  -> h2 + (consumer_filter p t2)
+                           False -> consumer_filter p t2
+
+Then (consumer_filter isPositive [-1]) would evaluate to 2 while
+(consumer (filter isPositive [-1])) would evaluate to 1!
+
+Instead we do so as if consumer was defined like
+
+  consumer l = 1 + (consumer_case l)
+  consumer_case l = case l of
+                      Cons h t -> h + consumer t
+                      Nil -> 0
+
+Now we can do the fold step. So the result would be
+
+  consumer_filter p l2  = 1+(consumer_case_filter p l2)
+  consumer_case_filter p l2
+    = case l2 of
+           Nil        -> 0
+           Cons h2 t2 -> case p h2 of
+                           True  -> h2 + (consumer_filter p t2)
+                           False -> consumer_case_filter p t2
+
+The additional functions that are generated for non root cases
+are created in function "generate_case_function".
+
+To finish this overview I give an example for fold optimalisation:
+
+  foldl f l init =
+    case l of Nil -> init
+              Cons h t -> foldl f t (f @ h init)
+
+  Start = foldl (+) [] 0
+
+Classification:
+   foldl f:(cActive, not linear) l:(cActive, linear) init:(cAccumulating, linear)
+
+The first arg is active because it appears on the RHS of @.
+
+I don't fuse the constructor []! (Sjaak does -> code explosion)
+
+But I specialize in the first arg. Being not linear doesn't matter
+for higher order arguments, because there is no calculation 
+duplicated (same with dictionaries)
+
+  foldl_plus l init =
+    case l of Nil -> init
+              Cons h t -> foldl_plus t (h + init)
+
+  Start = foldl_plus [] 0
+
+And I give example for overloading specialisation
+
+The programmer writes:
+
+  gimme_da_first :: (a e) -> e | Array a e
+  gimme_da_first a = a.[0]
+
+  gimme_a_strict_array :: {!Int}
+  gimme_a_strict_array = {42}
+  
+  Start = gimme_da_first gimme_a_strict_array
+
+The compiler makes from this while typing:
+
+  :: ArrayDictionary a e =
+      { select :: (a e) Int -> e
+      , update :: (a e) Int e -> (a e)
+      , size   :: (a e) -> Int
+      ..  
+
+  gimme_da_first :: (ArrayDictionary a e) (a e) -> e
+  gimme_da_first dictionary a = dictionary.select @ a 0
+
+  Start = gimme_da_first { select = strictArraySelect, update=strictArrayUpdate,
+                           size = strictArraySize .. }
+                         gimme_a_strict_array
+
+We see that the first dictionary argument of "gimme_da_first" is classified as
+a consumer (cActive because of the selection and even linear). So we fuse
+"gimme_da_first" with the record constructor:
+
+  gimme_da_first_StrictArray :: {!e} -> e
+  gimme_da_first_StrictArray a = strictArraySelect a 0
+
+  Start = gimme_da_first_StrictArray gimme_a_strict_array
+
+In this example we can see very clear that through fusion (be it 
+overloading specialisation, deforestation or fold optimalisation) the type
+of the newly created function has become specialised, too. In this case
+for (a e) the a has been substituted with {!} yielding {!e}
+
+This happens by unifying the producer's result type with the type of the
+argument of the producer. In this case we unify
+
+       (ArrayDictionary {!} e) // producer { select = strictArraySelect, ...
+  with (ArrayDictionary a   e) // consumer gimme_da_first  
+  
+yielding the substitution a->{!}
+
+
+The transformation phase - Generating a new function
+************************^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The function "generateFunction" is called when a function is specialised.
+
+As the important parameters it gets the consumers function definition, the
+classification of it's arguments and an array "prods" that contains a producer
+description for each consumer argument. So generateFunction indeed
+specialises one consumer with possibly multiple producers. A producer
+description of type "Producer" indicates, whether the producer is a
+dictionary, a curried application or a function.
+
+First some trouble with types happens. As stated before we have to apply
+type unification. Therefore we reuse the infrastructure from the typing
+phase. Unfortunately this infrastructure makes certain assumptions about
+the given types for which some work has to be done to fulfill them.
+
+The unification algorithm assumes type variables in "integer form" (TempV) 
+instead of "pointer form" (TV). Furtheron we have to take care about
+uniqueness information. We have to add propagation information (see function
+"add_propagation_attributes). This will transform a type
+
+  consumer :: [*{Int}] -> Int
+
+into
+
+  consumer :: *[*{Int}] -> Int
+
+After giving every involved type variable a number we create an initial
+type variable substitution "subst". This array assigns every type variable
+(number) a type. With every producer this substitution can be further specialised.
+
+The "determine_args" function is the called. To explain what this function
+does here is an example (I write (List a) instead of [a]):
+
+  consumer :: (List a) (List b) (a, b) -> (Int, Int, Int)
+  consumer x y z = (case x of ..., case y of ..., fun0 z z z)
+
+  producer :: (List c) T2 -> (List Int)
+  producer1 a b = fun1 a b
+
+  producer2 :: T3 -> (List Real)
+  producer2 c = fun2 c
+  
+Our application is:
+
+  Start = consumer (producer1 E1 E2) (producer E3) E4
+
+Because we are generating a new functions all variables in that function
+should be fresh. The generated function should look like this:
+ 
+  consumer` a1 b1 c1 z1 = (case fun1 a1 b1 of ..., case fun2 c1 of ..., fun0 z1 z1 z1)
+
+The RHS expression is later created by calling the "unfold" function. "unfold" substitutes
+all local variables of a expression with fresh ones, but the free variables (here x, y and z)
+are replaced with an expression on which the free variable has been bound before.
+So determine_args establishes the following bindings (stored in the var_heap):
+
+  x -> fun1 a1 b1
+  y -> fun2 c1
+  z -> z1
+
+As it's first result value determine_args delivers the new function args: [a1, b1, c1, z1]
+
+What about the types of these arguments? The second result value of determine args
+is an array whose elements either contain the producers argument types or (in case there
+was no producer for that argument) the consumers original type. This would be in the
+upper example:
+
+  {[List c, T2],[T3],[(a, b)]}
+
+But the following type would be wrong:
+
+  consumer` :: (List c) T2 T3 (a, b) -> (Int, Int, Int)
+
+The right type is
+
+  consumer` :: (List c1) T2 T3 (Int, Real) -> (Int, Int, Int) // c1 "fresh"
+
+Fortunately we can infer this type correctly because "determine_args" corrects
+our type variable substitution "subst" with every producer. So the 
+substitution would look like this
+
+  a -> Int      // from producer1
+  b -> Real     // from producer2
+  c -> c1       // fresh
+
+Furtheron determine_args also accumulates uniqueness requirements: inequalities between
+attributed types. Later the inference of the attributes of the type of the generated
+function happens like in the typing phase: the (attribute less) substitution is lifted
+and attribute coercions are derived from the expanded uniqueness requirements,
+the coercion graph is partitionated, unused attribute variables are removed.
+
+Finally, the "integer" type representation is again copied into a "pointer" type
+representation.
+
+After we have let "unfold" create the RHS of our new function, we of course transform
+it.
+
+There is a lot of type information stored in the ExpressionHeap. This information has
+to be kept up to date when creating the RHS expression. Especially the type
+information for case and let expressions has to be refreshened and specialized.
+E.g. when in the upper original consumer for some local variable the polymorph
+type "a" was stored then this should be "Int" in the specialized consumer`.
+
+Status  (June 2001)
+******
+
+The following issues are suboptimal:
+
+- The algorithm does not terminate, e.g. when transforming:
+
+    zipWith f l1 l2 :== [f e1 e2 \\ e1<-l1 & e2<-l2]
+
+    l = [1 : zipWith (+) [1] l]
+
+    t [] = []
+    t [e:l] = [e: t l]
+
+  Start = t l
+
+  One could try to find criteria that ensure that fusion will terminate. Alternatively
+  one could also decide while fusing, whether this fusion should be aborted using
+  ad hoc methods (e.g. a counter that counts keeps track of the recursion depth, or
+  whatever). It can be quite nice if a compiler is known to terminate always.
+
+- It is possible that through fusion some functions can get unused. Imagine a function
+  that is called only once. When this call is specialised the original won't be
+  called anymore. But this superflous function still remains in the functions array,
+  doesn't get garbage collected and is further processed (e.g. in the convertcases
+  phase). One could infer the components once more after the transformation phase
+  to deal with that issue.
+
+- Local definitions don't get inlined, e.g. let "c" be a consumer and "p" a producer.
+  The follwing would be fused:
+
+    c p
+
+  while the following would not:
+
+    let pp = p in c pp
+
+  This could be improved.
+
+- Only two functions are fused together, but a third one is not getting involved.
+  E.g. when fusing the follwing:
+
+    i x = x
+    producer x = i [x:producer x]
+    consumer l = case l of ...
+
+  the algorithm stops at
+
+    consumer_producer x = case (i [x:producer x]) of
+
+- Multimatch cases are ignored (see above). The compiler generates lots of
+  multimatch cases (I remember the zip2 function).
+
+- Only function arguments get deforested, but not e.g. the following
+
+    case [] of [] -> 1
+
+The following are real bugs:
+
+- Strict expressions get optimised away although they shouldn't
+    
+    :: T = C !Int
+    producer = C (abort "ABORT")
+    consumer x = case x of C _ -> 1
+    Start = consumer producer
+
+  fusion result:
+
+    Start = consumer_producer
+    consumer_producer = 1
+  
+  (no abort)
+
+- It can happen that while fusing one sees that a case will never match:
+
+    producer = []
+    consumer [_:_] = 1
+    Start = consumer producer
+
+  fusion result:
+
+    Start = consumer_producer
+    consumer_producer = <never_matching_case>
+
+- Deforestation should better be done _after_ strictness analysys. Look at:
+
+    consumer :: ![a] -> [Int]
+    consumer l
+      = [case l of
+          [] -> 0
+          [_:t] -> 1
+        ]
+
+    producer = undef
+
+    Start = consumer producer
+
+  In this case "consumer" and "producer" may not be fused here although this
+  happens with the current implementation. The problem here is that the
+  argument of "consumer" has been made artificially strict. Possible solutions:
+  - VERBIEDEN!!!!
+    Never fuse consumer arguments that have been marked strict.
+  - If a consumer argument has been marked strict, ensure that it is indeed
+    strict. Therefore one has to reason about strictness of expressions.
+    A primitive strictness analysys could be part of the transformation
+    phase (prima!).
+
+- The algorithm makes no distinction between cases that come from patterns/guards
+  and cases that were specified by the programmer. The algorithm assumes that no
+  case was specified by the programmer. I think two places where this leads into
+  wrong results are:
+  - the second return value of the consumerRequirements class
+  - the removeNeverMatchingSubcases function
author	martinw	2001-06-29 11:53:15 +0000
committer	martinw	2001-06-29 11:53:15 +0000
commit	181644b09de46cd200d83ec69b32b3ca740e59d0 (patch)
tree	f7c6d9dedc28a51f03f39d9e15b5bbe653099657 /coclmaindll
parent	disable PA_bug (diff)