Now that I'm back to working on the DL2->C transpiler, I've had to think more about A-Normal Form (ANF). ANF is a program form in which expressions are divided into statements and atomic expressions. The primary result of ANF conversion is to separate and order statements, yielding sequential "blocks" of statements bound to variables with a terminating statement or atomic expression.
Some discussion and history of the name ANF:
https://web.archive.org/web/20250415115410/https://www.williamjbowman.com/blog/2022/06/30/the-a-means-a/
>Parametric ANF
For DL2 there are functions which should be considered "atomic", and these vary by compilation context. Arithmetic, for example, has specific optimizations that are best done on whole expressions. For this reason, it's best for arithmetic expressions to be collected together as completely as possible for optimization. Similarly, there are functions which should never be duplicated. Inherently expensive functions like make-vector and repeat can be expected to fall in this set. Some are more ambiguous: length and index vector operations are cheap but require access to the relevant vector, and can therefore be considered important for resource semantics, so that re-ordering such expressions is questionable. However, these vectors also have their own specific optimizations.
To deal with this, I suggest that DL2-ANF should be parametrically defined over DL2: the compiler's user should be able to supply a predicate to examine whether a specific DL2 expression is an atomic expression or not. If a DL2 expression is in fact atomic, then ANF will embed it freely in other expressions.
>DL2-ANF as Restricted DL2
Originally, I thought that DL2-ANF should stand separately from DL2, but during implementation I've come to think that DL2-ANF should actually be a form of DL2. Specifically, DL2-ANF should come from allowing "block-form" bodies in lambda expressions.
- Block-form bodies consist of nested "lambda-apply" constructs, where each application's arguments consist of statements, and terminating in either a statement or atomic expression.
- Statements are either a case of some atomic expression to some blocks or a non-atomic application whose arguments consist only of atomic expressions.
- Atomic expressions are variables, constants, kinds, types, choose constructions and a subset of additional expressions specified by the user.
>ANF Types Pain Point
One of the more painful issues in ANF is the practical necessity to provide types for statements. This means that ANF needs to carry and return type information as it operates. Although it is technically possible to avoid this, doing so carries its own troubles.
This pain point of deriving types for intermediate terms has to occur somewhere in many compilation pipelines. For some specific pipelines, such as a C back-end, this type derivation will have to occur at some point in the pipeline. In the particular case of C, there's an argument to be made for pushing this near the end of the compilation pipeline, simply to avoid type substitution and normalization.
In the particular case of ANF, it isn't sufficient to use the regular procedures. The problem is that the easiest way to implement ANF involves temporarily using a variant of DL2 with unique IDs for variables which I call DL2-ID. It is also preferred that type normalization is as fast as possible. My current plan here is to stick with doing everything inside DL2-ANF as described above, take on the pain of deriving the necessary types and implement a "DL2-ID normalizing substitution" procedure. This is also necessary to implement another technique which I will discuss later.
A pleasant side effect of all of this is that the ANF conversion implemented this way also acts as an up-front type in-lining and normalization procedure. This allows accepting a larger subset of DL2 based on type in-lining without a separate pass. My current HM front-end produces code based on naively in-lining prefix-polymorphic functions, so this allows me to conveniently support a "DL2-inline-boxed" variant of the language.
>Closure Conversion as ANF
In DL2, there are 9 term forms:
>var, constant
>constructors lambda, rec-lambda and destructor apply
>constructor choose and destructor case
>constructor iso-wrap and destructor iso-unwrap
Variables are optimized by in-lining, constants can be considered "optimal" in themselves. It's clear how to optimize choose and case in an ANF context.
>Treat choose as an expression so it can get in-lined.
>Case terms can check if they resolve to a known choose expression. If so, then eliminate the case statement by evaluating the resulting branch instead.
The iso-wrap and iso-unwrap forms are the same: iso-wrap is an expression and iso-unwrap can be eliminated if its argument is a direct iso-wrap form.
That leaves lambda and rec-lambda themselves. The problem is that lambda and rec-lambda can't be so comfortably in-lined, since doing so naively duplicates compilation. However, lambda and rec-lambda are core components of DL2 programs. With their prominence in the core language, there is a strong argument that lambda and rec-lambda should be treated as expressions too.
Users will strongly appreciate being able to freely use lambda and rec-lambda for their own higher-order constructs, so it makes sense to provide "guaranteed optimizations" for lambda and rec-lambda so that users can treat them as ordinary expressions. If users are unnecessarily worried about the costs of closure construction and closure calls, they may resort to excessive techniques to avoid the use of lambda, limiting the practiced usefulness of the language. In particular, in-lined control flow for loops, monadic and effect mechanisms would take a major performance hit, whether those mechanisms are user-defined or provided by the runtime. Users especially shouldn't be afraid to write repeat loops just because they use lambda internally, since doing so would take out an important branch of DL2 and render the whole project pointless. There's a strong argument in favor of not just in optimizing the use of lambda but in favor of aggressively in-lining lambda and rec-lambda forms as ordinary expressions, if the code duplication can be avoided. A principled approach could also significantly reduce pressure on heuristic-based function call in-lining, allowing for more consistent performance.
There are two "obvious strategies":
>closure conversion then ANF treating closures as expressions
>ANF, closure conversion, then an additional closure optimization pass
Neither strategy is completely satisfying. Ideally, one would want to in-line all instances of closures, but both strategies fail to in-line known closures which are used by other closures. It would be better if ANF's own in-lining were used for closures. Since ANF is "optimal" in terms of in-lining, inherently in-lining every atomic expression, doing so would yield similarly "optimal" closure use, avoiding closure construction wherever possible.
For these reasons, I suggest that rather than doing closure conversion as a separate pass, closure conversion is instead fused with ANF into a single pass. This will allow exactly the sort of aggressive closure in-lining that I want. In fact, I believe this style of in-lining should be considered the standard for DL2, which is another reason to implement ANF this way. This fused ANF and closure conversion could act as a "standard transformation" which provides an intuitive resource semantics. Closures only need to be actually constructed when their call sites are unknown, alongside any other key forms of in-lining.
With call sites resolved as completely as possible, the fused ANF with closure conversion can be a standard transformation. It acts as a principled standard optimization pass and produces a form which is convenient for back-ends to produce high-quality code resembling the appropriate program in their target language.