diff options
Diffstat (limited to 'coclmaindll/doc/explicitImports.doc.txt')
-rw-r--r-- | coclmaindll/doc/explicitImports.doc.txt | 335 |
1 files changed, 335 insertions, 0 deletions
diff --git a/coclmaindll/doc/explicitImports.doc.txt b/coclmaindll/doc/explicitImports.doc.txt new file mode 100644 index 0000000..b542f58 --- /dev/null +++ b/coclmaindll/doc/explicitImports.doc.txt @@ -0,0 +1,335 @@ +This file documents the mechanism used for explicit imports + +Basic problem: The modules are checked one after the other. +When there are cycles between dcl modules, the algorithm has +to solve imports from a module that is unchecked yet. The exported +declarations of such an unchecked module are not known. + +A difficulty in designing this algorithm was inctroduced by the +fact that there can only be one symbol table at a time (because +of the use of pointers). But for cyclic module dependencies +one wants to incrementally build up symbol table information for +all modules on the cycle at the same time. + +To solve this difficulty we introduced a new datastructure "ExplImpInfos", that can +contain symbol table information for all modules at once: + + :: *ExplImpInfos :== *{!*{!*ExplImpInfo}} + :: ExplImpInfo = ExplImpInfo Ident !.DeclaringModulesSet + :: DeclaringModulesSet :== IntKeyHashtable DeclarationInfo + + :: DeclarationInfo = { di_decl :: !Declaration, ... } + +The outer array is indexed with the "module component number" +(short:component number, see below, this is _not_ the module number). +The inner array +is indexed with the module component's "symbol number". We do +not use pointers to identify symbols (like id_info). Instead +we identify each symbol with a number. Caution: For one symbol +(one id_info pointer) this number can vary from module component +to module component! Finally each array element contains the "Ident" +representation of the symbol and a "DeclaringModulesSet". This set, +implemented as a hash table, contains the module numbers (not component numbers) +of all currently known modules that declare/define the symbol, together with +the "Declaration" information, which allows to find the original definition +of that symbol's instance (no, I don't mean "instance of a class" here). + +When the dcl modules have to be checked, at first the dependency graph +of these modules is partitioned (in function checkDclModules). Then an +initial ExplImpInfos array is created. We only take symbols into account +that somewhere appear in an explicit import statement and assign numbers to these +symbols. This is also the place were the component numbers are created. The parser +delivers import statements as a "ParsedImport" in which symbols are identified +by pointers. These pointers are translated into symbol +numbers (type "ImportNrAndIdents"). Now we process all components +beginning with the leafs. + +Processing a component: +All explicit imports are solved before the symbol table for +any module is actually built. Take the following modules: + _________________________ + definition module t1 + from t2 import ::T2, ::TDouble + :: T1 + _________________________ + definition module t2 + from t1 import ::T1 + from t4 import ::T4 + import t3 + :: T2 + _________________________ + definition module t3 + :: TDouble + _________________________ + definition module t4 + :: TDouble + :: T4 + +We assume that t1 becomes checked before t2. The problem here +is to find out that t1 imports ::TDouble from t3 and not from +t4 + +Let's assume the following: + module t1 has been assigned number 11 + module t2 ------------"----------- 12 + module t3 ------------"----------- 13 + module t4 ------------"----------- 14 + component {t1,t2} has been assigned number 0 + component {t3} ------------"----------- 1 + component {t4} ------------"----------- 2 + ::TDouble has been assigned nr 0 in component nr 0 + ::T1 ------"------------- 1 -------"--------- + ::T2 ------"------------- 2 -------"--------- + ::T4 ------"------------- 3 -------"--------- + +So the inital "ExplImpInfos" structure looks like this +(we use a list notation here to denote the hashtable contents) + + { {(TDouble,[]), (T1,[]), (T2,[]), (T4,[])} + , {} + , {} + } + +It is always known which components import directly from the actual +module (this info is stored in "super_components", which maps module indices to +lists of component numbers) + +First lets say module t4 is checked. At the end of checking this +module "ExplImpInfos" is updated for those components that directly +import from t4: component nr 0 (={t1,t2}): + + { {(TDouble,[14]), (T1,[]), (T2,[]), (T4,[14])} + , {} + , {} + } + +So now we got the information that within component 0 (={t1,t2}) the +symbols TDouble and T4 could be imported from module nr. 14 (=t4). + +After checking module t3: + + { {(TDouble,[14, 13]), (T1,[]), (T2,[]), (T4,[14])} + , {} + , {} + } + +Now we check component {t1,t2}. At the beginning of checking any module +component we add all symbols that are defined (in contrast to declared) +to the ExplImpInfos structure, here ::T1 and ::T2: + + { {(TDouble,[14, 13]), (T1,[11]), (T2,[12]), (T4,[14])} + , {} + , {} + } + +Now we try to solve all explicit imports in module t1: + + from t2 import ::T2, ::TDouble + +This involves a depth first search algorithm (function +"depth_first_search" in module explicitimports). + +At first we search a path from module t1 to the definition of ::T2. + +The search will begin in module t2. We can infer that ::T2 is +indeed defined in t2 from +the ExplImpInfos structure: We simply search the module t2 (nr. 12) in the +entry for that symbol in the current component: + + { {(TDouble,[14, 13]), (T1,[11]), (T2,[12]), (T4,[14])} + ^^^^^^^^^ + +So we found the definition of ::T2. + +The statement in module t1 was: + + from t2 import ::T2, ::TDouble + +So we have to search for ::TDouble now. Beginning in t2 we can skip +the following two imports, because they can't import ::TDouble: + + from t1 import ::T1 + from t4 import ::T4 + +The remaining import statement + + import t3 + +is followed by our depth search algorithm. Again we infer from +our ExplImpInfos structure that ::TDouble is indeed declared in +module t3 (nr 13): + + { {(TDouble,[14, 13]), (T1,[11]), (T2,[12]), (T4,[14])} + ^^^^ + +So we solved that import, too. Always when we have found a path to +a desired symbol we will store that new information in the ExplImpInfos +structure for possible later use. In this case we only add the fact that +::TDouble is delared in module t2: + + { {(TDouble,[14, 13, 12]), (T1,[11]), (T2,[12,11]), (T4,[14])} + +If there would have been a third module in the {t1, t2} component that +would have tried to import ::TDouble from t2, we wouldn't have had +to search the path to the definition again (kinda cache). + +(The presentation of this example ends here.) + +The algorithm distinguishes between two kinds of symbols: top +level symbols and belonging symbols. Top level symbols are types, +functions, classes and instances. Belonging symbols are +those that can occur in brackets in an explicit import +statement: constructors (belonging to a type), fields (dito) +an members (belonging to a class). At first all imports of +top level symbols are solved. Later the belonging symbols are +searched. This happens in a different manner, because for these +symbols we don't assign numbers in the beginning. We don't assign +numbers because the belonging symbols don't have to be explicitly +stated in an import statement. For instance in + + from m import ::T(..) + +we could assign a number to the top level symbol ::T but not to +the belonging constructor symbols: just _after_ solving the type +symbol we know the constructors. We still have to apply a depth +first search for every constructor of ::T, because it could happen +that some of them are _not_ exported by "m". We have to find +out which of them. + +Belonging symbols are identified by two values: the number of the +corresponding top level symbol and the "ds_index" of the belonging +symbol. E.g. with + +:: T = C0 | C1 | C2 + +the "ds_index" for C0 would be 0, and the "ds_index" for C2 would be 2. + +To search for a belonging symbol one needs both numbers. The ExplImpInfos +data structure stores for each symbol in which modules it is declared. But +there are only entries for the top level symbols in each hash table. To +infer from the ExplImpInfos data whether a given belonging symbol is +declared in a given module we use a field of the "DeclarationInfo": The +"di_belonging" field which is a kind of bit vector. E.g. to test whether +the upper belonging symbol "C1" is declared in module nr i we first check +whether the (already known) top level symbol ::T is declared in module i. If +not, then "C1" cannot be declared there either. If yes, we consult the second +bit in the "di_belonging" bit vector. + +Some additional info: + +The depth search algorithm only visits modules within the actual component +and those that are directly imported from the actual component. E.g. with +"m1" importing only from "m2" and "m2" importing only from "m3", while +checking "m1" module "m3" would never be visited. + + +Status +****** + +1) + +One thing that doesn't work: macro members. +E.g. the Ord class in module StdClass defines a "member" <= +which indeed is a macro: + + class Ord a | < a + where + (<=) infix 4 :: !a !a -> Bool | Ord a + (<=) x y :== not (y<x) + +The following will fail: + + from StdClass import class Ord(<=) + "<= does not belong to Ord" + +Do the following instead: + + from StdClass import class Ord, <= + +Also using ".." doesn't import <= : + + from StdClass import class Ord(..) + Start = 1<=2 + "<= undefined" + +BTW: + + from StdClass import Ord + Start = 1<=2 + +makes Clean 1.3.3 crash + +2) + +There is an optimisation that could still be applied. + +Imagine the following situation: + ______________________ + definition module t1 + import m1,m2, .. mn + from t2 import ::T + ______________________ + definition module mi (for all i) + from t2 import ::T + ______________________ + definition module t2 + ::T + import m1,m2, .. mn + ______________________ + + +There are no cycles. First t2 would be checked, then the mi and then t1. +The ExplImpInfos data structure would look like this just before checking t1: + + { {(T,[t2, m0, m1, ..mn])} .. + +But because the explicit import of ::T in module t1 is _not_ via any module +within the component (there is only one: t1 itself) it would be completely +sufficient just to store + + { {(T,[t2])} .. + +When resolving the explicit import we would never search the mi! + +I don't know whether you could gain much with such an optimisation. + + +A better optimisation: + +The ExplImpInfos data structure is massively unique. The elements of +the two dimensional array are unique, too. So one has to select the +array elements with the "replace" mechanism. Unfortunately there is no +"replace" like primitive for two dimensional arrays. I implemented +this one: + + replaceTwoDimArrElt :: !Int !Int !.e !{!*{!.e}} -> (!.e, !{!.{!.e}}) + +But this one allocates a new array with every call. In a (quite constructed) +test case I measured that 4% of allocation was done by allocating arrays. +I guess this was the array in replaceTwoDimArrElt. One can solve this problem +by faking around with casts or abc code. + +An even more better optimisation: + +This optimisation doesn't deal with explicit imports, but with implicit +imports. Consider the following situation: + + ______________________ + definition module t1 + import m1,m2 + ______________________ + definition module m1 + import StdIO + ______________________ + definition module m2 + import StdIO + ______________________ + +When building the t1's symbol table the current algorithm will try to add +StdIO's symbols _twice_ to that symbol table. In the first turn these +symbols will be added indeed, but in the second turn trying to add new +symbols of StdIO will not change the symbol table (because the symbol +table already contains all these symbols). One could invent a mechanism +that keeps track which modules have been added to the symbol table +_as a whole_. In this way superflous work could be saved. + |