This is a paper with some similarities to
>>45.
This paper details Morphic's approach to eliminating higher-order functions from a program: determine the set of closures that might be passed to each higher-order function, reify each such set as a type representing the particular selection, then use that type to branch to one of several specialized copies of the function.
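As a rough sketch of the idea (my own OCaml rendering with hypothetical names like `apply_twice`, not Morphic syntax or the paper's actual algorithm), suppose analysis finds that only two closures ever reach a higher-order function:

```ocaml
(* Hypothetical sketch, not Morphic syntax: analysis determines that only
   these two closures ever reach the higher-order function [apply_twice]. *)
type closure =
  | Add_n of int   (* fun x -> x + n, capturing n *)
  | Double         (* fun x -> x * 2 *)

(* One specialized copy per member of the set; the formerly indirect
   calls through the function argument become direct, inlinable calls. *)
let apply_twice_add_n n x = (x + n) + n
let apply_twice_double x = (x * 2) * 2

(* The reified type is used only to branch to the appropriate copy. *)
let apply_twice c x =
  match c with
  | Add_n n -> apply_twice_add_n n x
  | Double  -> apply_twice_double x
```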
>Restricted versions of specializing defunctionalization, such as C++ and Rust's aggressive monomorphizing strategy and Julia's JIT-enhanced approach, are already in widespread use [Bezanson et al. 2017; Hovgaard et al. 2018; Klabnik and Nichols 2019, Ch. 10; Paszke et al. 2021; Stroustrup 2013, Ch. 26]. However, all existing specializing techniques suffer from limitations of expressiveness—either not supporting truly first-class functions (as in [Hovgaard et al. 2018; Paszke et al. 2021]), or forcing the programmer to fall back to traditional, slow virtually-dispatched closures in the general case (as in C++ and Rust [Klabnik and Nichols 2019, Ch. 13; Stroustrup 2013, Ch. 11]).
Monomorphization can be seen as a form of specialization by substitution: the type arguments are substituted and the function is compiled with a matching ABI. Defunctionalization in general, and this paper in particular, argues that this isn't enough: it would be better to specialize on both the type and the function arguments.
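To illustrate the contrast (again a sketch of my own with hypothetical names, not taken from the paper): monomorphizing `map` at `int` fixes the data representation, but the call through `f` remains indirect; specializing on the function argument as well eliminates it entirely.

```ocaml
(* A generic combinator: polymorphic in the element type, higher-order in f. *)
let rec map f = function [] -> [] | x :: xs -> f x :: map f xs

(* Monomorphization-style specialization: the type argument is substituted,
   fixing the representation/ABI, but f is still called indirectly. *)
let rec map_int (f : int -> int) = function
  | [] -> []
  | x :: xs -> f x :: map_int f xs

(* Specializing on the function argument as well: the closure body is
   inlined and no indirect call remains. *)
let rec map_succ = function
  | [] -> []
  | x :: xs -> (x + 1) :: map_succ xs
```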
>We designed our benchmarks to make significant use of combinator-based library abstractions, as we believe such abstractions represent a common and important source of higher-order control flow in many functional programs. To this end, we wrote Morphic code to define a lazy iterator library, a state monad library, and a parser combinator library, all making heavy use of higher-order functions in both their implementations and their public APIs. The iterator library is used in the benchmarked sections of PrimesIter, PrimesSieve, Quicksort and Unify, the state monad library is used in Unify, and the parser combinator library is used in Calc and ParseJSON.
The benchmarks seem somewhat artificial: they were designed to use higher-order functions wherever possible. The most questionable choice is the heavy reliance on monadic programs. Monadic programs strongly favor inlining and defunctionalization, since they thread "continuation" functions through small, simple "bind" wrappers whose behavior is simple enough to justify inlining; once those wrappers are gone, the remaining compilation is likely to be extremely favorable to this style of defunctionalization.
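For instance, a state-monad bind (a minimal sketch of my own, not Morphic's actual library) is nothing more than a thin wrapper that threads state into its continuation; once the continuation is statically known, inlining or specializing `bind` leaves essentially first-order code:

```ocaml
type ('s, 'a) state = 's -> 'a * 's

(* bind is a thin wrapper: all it does is thread the state into the
   continuation k. Once k is statically known, inlining or specializing
   bind leaves straight-line, essentially first-order code. *)
let bind (m : ('s, 'a) state) (k : 'a -> ('s, 'b) state) : ('s, 'b) state =
  fun s -> let (x, s') = m s in k x s'

let return (x : 'a) : ('s, 'a) state = fun s -> (x, s)
let get : ('s, 's) state = fun s -> (s, s)
let put (s : 's) : ('s, unit) state = fun _ -> ((), s)

(* A small monadic program: every step is a closure handed to bind. *)
let tick : (int, int) state =
  bind get (fun n -> bind (put (n + 1)) (fun _ -> return n))
```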
In this context, the results don't appear to reflect an overwhelming success. It's expected that MLton's monovariant defunctionalization will perform poorly on such benchmarks. OCaml does very well on the iterator-based programs, indicating that its ordinary compilation pipeline already handles this style of program well. The paper is probably picking its battles: if Haskell's ordinary compilation were sufficient to get the same benefits, the paper would have to argue that Haskell is somehow "cheating" in its results. Admittedly, that's not a bad argument; Haskell's compiler is very aggressive and allows for such oddities as the MTL. Idiomatic Haskell would use the list monad instead of iterators, as well as the state monad in one of the benchmarks, and Haskell is all but guaranteed to do very well on those four of the six benchmarks. At that point it would come down to how well the parser combinator programs are handled, and Haskell would likely be competitive there too. The fact that OCaml only chokes on the parser combinator programs points in the same direction.
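For reference, the kind of higher-order control flow at stake in the parser benchmarks looks roughly like this (a minimal combinator sketch of my own, not the paper's library), where every parsing step goes through closures whose targets are only known per call site:

```ocaml
type 'a parser = char list -> ('a * char list) option

let return (x : 'a) : 'a parser = fun input -> Some (x, input)

(* Each combinator closes over other parsers and continuations, so every
   step of parsing is an indirect call through a closure. *)
let ( >>= ) (p : 'a parser) (k : 'a -> 'b parser) : 'b parser =
  fun input ->
    match p input with
    | None -> None
    | Some (x, rest) -> k x rest

let char_ (c : char) : char parser = function
  | x :: rest when x = c -> Some (c, rest)
  | _ -> None

(* A parser for "ab": nested closures at every level. *)
let ab : (char * char) parser =
  char_ 'a' >>= fun a ->
  char_ 'b' >>= fun b ->
  return (a, b)
```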
>Across all compilers, we additionally remark that our largest and most realistic benchmark program, ParseJSON, is also the program in which we see the largest performance gains due to LSS. We take this as evidence in support of the hypothesis that the benefits of specialization become more significant with increasing scale.
I would disagree strongly here. I think what ParseJSON demonstrates is the insufficiency of contemporary compilers in the face of higher-order constructs, not that the benefits of specialization grow with scale. The paper needs strong follow-ups that evaluate its effectiveness alongside inlining for contrast, and follow-ups that establish its worst-case behavior, or the absence of a pathological worst case.
While I'm skeptical of the results, they are encouraging for DL2 either way. The DL2 documentation currently argues in favor of a systematic and synergistic approach to compilation. If Morphic's approach is appropriate for general use, then its working well with even the simplest DL2 compilers would be strong evidence for developing such systematic approaches. If the result turns out to be a fluke, then DL2's argument works the other way, in favor of developing more sophisticated systematic approaches to inlining.