/bsp/ - bsp

Someone's Office

Programming Languages Bibliography bsp 10/24/2025 (Fri) 20:03:47 Id: 7fd227 No. 40
This thread is for collecting material on programming languages. Language design, compiler and interpreter design and implementation, runtime design and implementation and type theory are all appropriate for this thread.
Two papers and a presentation on the two major styles of intermediate representation used in compilers: continuation-passing style (CPS) and A-normal form (ANF). For both DL1 and DL2 compilers, my informal recommendation is ANF. An ANF transformation can be implemented in a similar style to an ordinary interpreter, can help explicate resource semantics, and tends to differ from the source program only by introducing intermediate variables for intermediate computations, which makes it intuitive and pleasant to work with. The papers and presentation go into the debate between CPS and ANF in more detail. Note that despite the ubiquity of SSA, it rarely enters the picture. The reason is simply that SSA is equivalent to ANF via a "block argument" transformation that replaces phi functions, and ANF is the more appropriate formalization for the comparison: CPS and ANF are both based directly on the lambda calculus, while SSA grew out of work on imperative languages and their compilers.
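To make the two styles concrete, here is a minimal sketch in OCaml (OCaml rather than DL1/DL2, purely as illustration) of the same small function in direct style, in ANF, and in CPS. The ANF version only introduces names for intermediate computations, while the CPS version threads an explicit continuation through primitive wrappers.

(* Direct style: nested intermediate computations. *)
let direct x y = (x + 1) * (y + 2)

(* ANF: every intermediate computation is bound to a variable;
   otherwise the program keeps its original shape. *)
let anf x y =
  let t1 = x + 1 in
  let t2 = y + 2 in
  let t3 = t1 * t2 in
  t3

(* CPS: every step takes an explicit continuation. *)
let add_k a b k = k (a + b)
let mul_k a b k = k (a * b)

let cps x y k =
  add_k x 1 (fun t1 ->
  add_k y 2 (fun t2 ->
  mul_k t1 t2 k))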
A paper on a loop optimization and SIMD vectorization system for Common Lisp. While working on DL2, I found myself looking into a looping primitive function "run-repeat" to match the resource semantics of imperative compilation targets. As I continue to work on the compiler, I've been thinking about the optimizability of such looping primitives. One of the advantages of "run-repeat" is that the loop state typically inferred in imperative programming languages is explicated by the user, reducing liveness analysis for a loop to free-variable analysis. It made me wonder whether adding more looping primitives might encourage loop optimizations through syntactic means rather than through data-flow analyses. I was considering writing a user guide section and an implementation guide section in DL2 about such looping primitives. It was during this time that I came across this paper on the other forum. It is a strong positive result in this direction, and on an initial skim the methodology strongly corresponds to what I had in mind. I might write more about the methodology later.
>>42
The paper describes a function-based SIMD procedure library and a macro-based loop optimization library. First, the SIMD library sb-simd:
>The biggest challenge in writing this library was the sheer number of SIMD instructions available on a modern machine. The AVX instruction set alone provides almost 1000 instructions, more than the number of functions in the CL package!
>The solution we came up with is, perhaps unsurprisingly, Lisp macros. We first create a table of metadata for each instruction set, which contains information about all data types, functions, corresponding mnemonics, mathematical properties, and so forth. Then we use the data in these tables to generate all the types, declarations, compiler transformations, machine code emitters, and functions, that are specified in each table.
>All of the tables that are used to generate the functions and data structures in sb-simd can also be queried at run time. This makes it possible to look up the supported instruction sets, the functions exported by each instruction set, the arguments and return types of each function and much more. Our loop optimization library Loopus uses this metadata to perform automatic vectorization.
>The central data type in sb-simd is the SIMD pack. A SIMD pack is very similar to a specialized vector, except that its length must be a particular power of two that depends on its element type and the underlying hardware.
Next, the loop optimization library Loopus:
>Even though the interface provided by sb-simd is relatively convenient — at least when comparing it to similar libraries in other programming languages — there are certain repetitive patterns when writing vectorized code that almost beg for another layer of abstraction via macros. The most frequent repetitive pattern is that of using two loops to process a range of data: One with a step size that is the vectorization width, and one with a step size of one for handling the remainder.
>After writing a variety of prototypes, we decided to create a portable loop optimization library for Common Lisp that can be used via macros. The library is invoked by using the loopus:for macro for looping over a range of integers.
>Only a small subset of Common Lisp is allowed in the body of a loopus:for macro: functions, macros, and the special operators let, let*, locally, and progn. For now, this subset strikes the right balance between expressiveness and ease of optimization, but we may add support for further special operators in the future. The good news is that once a programmer obeys these restrictions, the entire loop nest and all expressions therein are subject to the following optimizations:
>>Rewriting of multi-dimensional array references to references using only a single row-major index.
>>Symbolic optimization of polynomials, and especially of the expressions for calculating array indices.
>>Hoisting of loop invariant code.
>>Automatic SIMD vectorization.
>One may wonder why we use macros and didn’t just add our loop optimizations to SBCL directly. The reason is that implementing loop optimizations for the entire Common Lisp language is a daunting task. The many possible interactions of language features would force us to be conservative in terms of optimization, or spend much more time on the development than we can currently spare.
>We have narrowed the performance gap between Common Lisp and C for number crunching to something between 6% and 33%, depending on the target hardware.
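To picture the "two loops" pattern quoted above, here is a hedged sketch in OCaml rather than Common Lisp, with a plain scalar helper standing in for the single SIMD instruction that sb-simd would emit: a main loop stepping by the vectorization width, followed by a scalar remainder loop with a step size of one.

let vector_width = 8

(* Stand-in for a SIMD intrinsic: process one chunk of [vector_width]
   elements starting at index [i]. *)
let simd_add_chunk dst a b i =
  for j = i to i + vector_width - 1 do
    dst.(j) <- a.(j) +. b.(j)
  done

(* Element-wise addition of two float arrays, structured the way a
   vectorizer emits it. *)
let add (a : float array) (b : float array) : float array =
  let n = Array.length a in
  let dst = Array.make n 0.0 in
  let main_end = n - (n mod vector_width) in
  (* Main loop: step size is the vectorization width. *)
  let i = ref 0 in
  while !i < main_end do
    simd_add_chunk dst a b !i;
    i := !i + vector_width
  done;
  (* Remainder loop: step size of one. *)
  for j = main_end to n - 1 do
    dst.(j) <- a.(j) +. b.(j)
  done;
  dst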
>The loop optimization library Loopus is still in its infancy, but already powerful enough to vectorize inner loops written in a certain subset of Common Lisp. This makes it possible to harness SIMD instructions for a wide variety of loops using minimal effort. We hope that this paper will attract further contributors, and that Loopus will one day reach feature parity with the loop optimization machinery in GCC and Clang.
>We are confident that in many cases, such domain-specific optimizations can outperform the general-purpose work that is normally done by a compiler. We are looking forward to working on such optimizing Lisp macros in the future.
Consider some loops:
[defopaque-dl2 run-repeat [-> [[:: st type] [:: r type] [-> [st] [repeat-state st r]] [repeat-state st r]] r]]
[defopaque-dl2 range-accum [-> [[:: a type] [-> [word a] a] a word] a]]
[defopaque-dl2 make-vector [-> [[:: a type] [-> [word] a] word] [vector a]]]
The run-repeat primitive is the generic looping primitive. The range-accum primitive is the functional equivalent of Loopus's for loop. The make-vector primitive constructs a vector by calling a generation function for each index. Suppose that range-accum and make-vector tended to be provided as primitives by DL2 backends. Based on the paper, I imagine that wherever a backend supports such optimizations, simply choosing reasonable types when using range-accum would yield at least equivalent results without effort, and possibly without awareness, from the user. I would also expect that a more aggressive analysis of the function argument to run-repeat could determine relationships between iterations of a loop and expose "hidden" opportunities for vectorization that don't fit within the range-accum paradigm.
Going further, Loopus is actually at a severe disadvantage compared to DL2. DL2 is explicitly runtime-agnostic, so there's no concern about having to run such looping primitives alongside incompatible runtime features. DL2 does not require mutation, and CoW vectors are likely to become a "runtime standard" for DL2 in practice. In a DL2 context, the problem can be approached from a "whole-program" perspective instead of through Lisp macro hacks. Each implementation can use such loops as hints to guide backend-specific optimizations while remaining responsible for interactions with its own runtime features. Alternatively, a pre-packaged "standard" transformation could do such analyses in a backend-agnostic way, which different backends may or may not include in their own compilers. As a last resort, a DL2 front-end might do the vectorization itself and pass only pre-vectorized code to its backend, allowing users to vectorize code without direct tool-chain support. Based on this, I think I should add those sections to the user guide and the implementation guide.
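As an aside, here is one possible OCaml reading of the three signatures above. This is my own interpretation, not DL2 code: word becomes int, and repeat-state becomes a sum of "continue with a new state" and "finish with a result".

(* repeat-state st r: either keep looping with a new state or finish. *)
type ('st, 'r) repeat_state =
  | Continue of 'st
  | Finish of 'r

(* run-repeat: the generic looping primitive. Iterate the step
   function until it produces a final result. *)
let rec run_repeat (step : 'st -> ('st, 'r) repeat_state)
                   (state : ('st, 'r) repeat_state) : 'r =
  match state with
  | Finish r -> r
  | Continue st -> run_repeat step (step st)

(* range-accum: fold an accumulator over the indices [0, n). *)
let range_accum (f : int -> 'a -> 'a) (init : 'a) (n : int) : 'a =
  let rec go i acc = if i >= n then acc else go (i + 1) (f i acc) in
  go 0 init

(* make-vector: build a vector by calling a generator at each index. *)
let make_vector (f : int -> 'a) (n : int) : 'a array =
  Array.init n f

(* Example: sum of squares of 0..9, the kind of loop a backend
   could recognize and vectorize. *)
let sum_of_squares = range_accum (fun i acc -> acc + i * i) 0 10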
An old paper detailing various compiler optimizations. For the most part the optimizations it describes are the standard imperative compiler optimizations still in use today.
A paper on the theory of monomorphization. I'd been thinking of trying to figure out a "destructor monomorphization" for a while, but it seems the authors of this paper have already done something similar to what I was thinking. Notably, their method does not support polymorphic recursion. This isn't surprising: monomorphizing polymorphic recursion can require an unbounded number of instances. This is actually a very encouraging result for DL2. Ideally, DL2 can take this result, add box polymorphism, and get a very rich form of mono-box polymorphism. My biggest problem is that I'm not sure this really is the best approach. The paper focuses on a constraint system. I have to wonder if it would be better to use a more type-directed approach, where each polymorphic variable is directly annotated based on its uses. The idea I have in mind is to traverse the AST with an environment, returning an annotated AST and environment. It's hard to say whether this strategy will work without implementing the idea in full, and it might turn out to be equivalent in any case.
Regardless, I am very pleased to see that this kind of monomorphization is in fact possible. One of my primary worries about DL2 was that type-classes and other similar features would be complicated by the absence of an appropriate monomorphization mechanism. This paper shows that it's possible to use the natural encoding. Having a standard monomorphization transformation would make it easy to write code without worrying about these issues. Hacks to escape the issues of type-class instantiation would be unnecessary, and with the exception of unbounded polymorphic recursion it would be possible to use full DL2 without worry.
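To make the kind of pass I have in mind concrete, here is a toy single-pass sketch in OCaml (my own illustration, not the paper's constraint system and not DL2): walk the program body, record the concrete type arguments each polymorphic definition is used at, and emit one specialized copy per instantiation. A real pass would iterate this over the freshly specialized bodies until a fixed point, and polymorphic recursion is exactly what can make that iteration fail to terminate.

type ty = TInt | TBool | TVar of string | TArrow of ty * ty

type expr =
  | Var of string
  | Use of string * ty list   (* use of a polymorphic definition at concrete types *)
  | App of expr * expr
  | Lam of string * expr

type def = { name : string; tyvars : string list; body : expr }

module SMap = Map.Make (String)

(* Substitute type variables by their bindings in [env]. *)
let rec subst_ty env = function
  | TVar v -> (match List.assoc_opt v env with Some t -> t | None -> TVar v)
  | TArrow (a, b) -> TArrow (subst_ty env a, subst_ty env b)
  | t -> t

let rec subst_expr env = function
  | Use (f, tys) -> Use (f, List.map (subst_ty env) tys)
  | App (a, b) -> App (subst_expr env a, subst_expr env b)
  | Lam (x, b) -> Lam (x, subst_expr env b)
  | e -> e

(* Record every distinct list of type arguments each definition is used at. *)
let rec collect e acc =
  match e with
  | Use (f, (_ :: _ as tys)) ->
      let seen = Option.value ~default:[] (SMap.find_opt f acc) in
      if List.mem tys seen then acc else SMap.add f (tys :: seen) acc
  | App (a, b) -> collect b (collect a acc)
  | Lam (_, b) -> collect b acc
  | Var _ | Use _ -> acc

(* Name of the specialized copy of [f] at type arguments [tys]. *)
let mangle f tys =
  let rec s = function
    | TInt -> "int" | TBool -> "bool" | TVar v -> v
    | TArrow (a, b) -> s a ^ "_to_" ^ s b
  in
  f ^ "@" ^ String.concat "," (List.map s tys)

(* Point uses at their specialized copies. *)
let rec rewrite e =
  match e with
  | Use (f, (_ :: _ as tys)) -> Var (mangle f tys)
  | App (a, b) -> App (rewrite a, rewrite b)
  | Lam (x, b) -> Lam (x, rewrite b)
  | e -> e

(* One round of specialization driven by the uses in [main]. *)
let monomorphize (defs : def list) (main : expr) : def list * expr =
  let insts = collect main SMap.empty in
  let specialize d tys =
    let env = List.combine d.tyvars tys in
    { name = mangle d.name tys; tyvars = []; body = rewrite (subst_expr env d.body) }
  in
  let defs' =
    List.concat_map
      (fun d ->
        match SMap.find_opt d.name insts with
        | None -> [ d ]
        | Some uses -> List.map (specialize d) uses)
      defs
  in
  (defs', rewrite main)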
(1.30 MB SML-history.pdf)

(171.21 KB 1200x643 haskell-dead-end.jpeg)

https://smlfamily.github.io/history/SML-history.pdf
>This paper focuses on the history of Standard ML, which plays a central role in this family of languages, as it was the first to include the complete set of features that we now associate with the name “ML” (i.e., polymorphic type inference, datatypes with pattern matching, modules, exceptions, and mutable state).
>Standard ML, and the ML family of languages, have had enormous influence on the world of programming language design and theory. ML is the foremost exemplar of a functional programming language with strict evaluation (call-by-value) and static typing. The use of parametric polymorphism in its type system, together with the automatic inference of such types, has influenced a wide variety of modern languages (where polymorphism is often referred to as generics). It has popularized the idea of datatypes with associated case analysis by pattern matching.
>Standard ML also set a precedent by being a language whose design included a formal definition with an associated metatheory of mathematical proofs (such as soundness of the type system). A formal definition was one of the explicit goals from the beginning of the project. While some previous languages had rigorous definitions, these definitions were not integral to the design process, and the formal part was limited to the language syntax and possibly dynamic semantics or static semantics, but not both.
>More particularly, it is worth understanding how Standard ML pioneered and integrated its most characteristic features.
>Static typing with type inference and polymorphic types, now commonly known as the Hindley-Milner type system.
>Datatypes with the corresponding use of pattern matching for case analysis and destructuring over values of those types.
>A module sub-language that itself is a functional language with a substantial extension of the basic type system of the core language.
>This original ML was an embedded language within the interactive theorem proving system LCF, serving as a logically-secure scripting language.
>Standard ML popularized a number of features that functional-language programmers take for granted today. The most important of these being datatypes with pattern matching, Hindley-Milner type inference, and parametric polymorphic (or “generic”) types.
>What makes Standard ML (SML) an interesting and important programming language? For one thing, it is an exemplar of a strict, statically typed functional language. In this role, it has had a substantial influence on the design of many modern programming languages, including other statically-typed functional languages (e.g., OCaml, F#, Haskell, and Scala).
>One of the most significant impacts of SML has been the exploration of a wide and diverse range of implementation techniques in SML compilers.
>While the Standard ML language has not changed over the past 20+ years, there continue to be at least five actively supported implementations with an active user community.
>Standard ML occupies a “sweet spot” in the design space; it is flexible and expressive while not being overly complicated — in contrast with many functional languages (e.g., Haskell, OCaml, and Scala) that have very rich, but complicated, type systems, and so many features that most programmers use only a subset of the language.
>In both teaching and research, its stability allows its users to focus attention on their own work and avoid the distractions that arise when a language is defined by an ever-changing implementation.
Standard ML is similar to DL2. Standard ML is a statically-typed impure functional language featuring Hindley-Milner type inference and let-polymorphism. What is often called the "Hindley-Milner type system" is roughly the type system of SML and Haskell. Arguably, it is a variant of System F-omega, just as DL2 is. However, where DL2 defines itself in terms of the full System F-omega and describes restricted variants with different implementation properties, SML situates itself based on the limitations of the time. Monomorphization, let-polymorphism and HM type inference are most comfortably done under the restriction of "prenex polymorphism" that defines HM type systems. This practical straightforwardness in implementation comes from the fact that prenex-polymorphic code can be freely treated as STLC with type variables. Type inference is completely automatic, as the type variables remaining after HM type inference simply become prenex variables. Monomorphization in this context is just the replication of functions by substituting all type variables, a transformation intuitive to programmers if they even realize it's happening. In some languages monomorphization is given its own name and syntax to emphasize the mechanical nature of the process. Let-polymorphism is a slight generalization that gives the bound expression's internal type variables a temporary scope: the expression receives a most general type, which can then be instantiated for each appearance in the body of the let. Beyond the existence and use of type operators, SML gains higher-order features through its module and functor system, which might be seen as a restricted form of System F-omega that provides a more explicit mechanism for code generation in a style similar to monomorphization. Functors correspond to functions from existential types to existential types.
Another perspective on Standard ML is here: https://lawrencecpaulson.github.io/2022/10/05/Standard_ML.html
>Dave MacQueen (with Robert Harper and John Reppy) has written a 100-page History of Standard ML. Here I offer my brief, personal and utterly subjective impressions of its early days, up to the deeply unfortunate schism that created what was then called Caml. Standard ML was a tragic missed opportunity. But even today there are still several viable implementations: Poly/ML, SML/NJ, Moscow ML and MLton. They are substantially compatible because the language was defined by an operational semantics. It’s sad that well into the 21st Century, Computer Science has so regressed that people no longer see the point of distinguishing between a programming language and its implementation.
>The tragedy of the ML schism was that there were no disagreements about the abstract syntax, semantics, applications or general direction of ML. It was all about concrete syntax. Gérard preferred to stay close to the ISWIM syntax (“even though Landin was English!”), while Dave wanted a logical and robust syntax.
Another, smaller critique of SML, which also led me to find a funny Haskell-related image: https://blog.plover.com/prog/haskell/sml-defects.html
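Back to prenex polymorphism: its practical shape is easy to demonstrate in any ML. A small OCaml example (OCaml only to keep all the sketches in this thread in one language):

(* Let-bound names are generalized: id gets the prenex type 'a -> 'a
   and can be instantiated at several types in the body. *)
let let_poly () =
  let id x = x in
  (id 1, id true)

(* Lambda-bound names are not generalized; this is rejected because
   'a unifies with int and then fails against bool:
     let lambda_mono (f : 'a -> 'a) = (f 1, f true)
   Escaping the prenex restriction requires an explicitly quantified
   (non-prenex) type, e.g. via a record with a polymorphic field. *)
type poly_id = { id : 'a. 'a -> 'a }
let rank2 { id } = (id 1, id true)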
This is a paper with some similarities to >>45. This paper details Morphic's approach to eliminating higher-order functions from a program: determine the set of closures that might be passed to each higher-order function, construct a type for each set representing the particular selection, then use it to branch to one of several specialized functions.
>Restricted versions of specializing defunctionalization, such as C++ and Rust’s aggressive monomorphizing strategy and Julia’s JIT-enhanced approach, are already in widespread use [Bezanson et al. 2017; Hovgaard et al. 2018; Klabnik and Nichols 2019, Ch. 10; Paszke et al. 2021; Stroustrup 2013, Ch. 26]. However, all existing specializing techniques suffer from limitations of expressiveness—either not supporting truly first-class functions (as in [Hovgaard et al. 2018; Paszke et al. 2021]), or forcing the programmer to fall back to traditional, slow virtually-dispatched closures in the general case (as in C++ and Rust [Klabnik and Nichols 2019, Ch. 13; Stroustrup 2013, Ch. 11]).
Monomorphization can be seen as a form of specialization-by-substitution: the type arguments are substituted and the function is called with a matching ABI. Defunctionalization in general, and this paper in particular, argue that this isn't enough: it would be better to inline both type and function arguments.
>We designed our benchmarks to make significant use of combinator-based library abstractions, as we believe such abstractions represent a common and important source of higher-order control flow in many functional programs. To this end, we wrote Morphic code to define a lazy iterator library, a state monad library, and a parser combinator library, all making heavy use of higher-order functions in both their implementations and their public APIs. The iterator library is used in the benchmarked sections of PrimesIter, PrimesSieve, Quicksort and Unify, the state monad library is used in Unify, and the parser combinator library is used in Calc and ParseJSON.
The benchmarks seem somewhat artificial. They designed their benchmarks to use higher-order functions wherever possible. Worst of all, they were quick to use monadic programs in their benchmarks. Monadic programs strongly favor inlining and defunctionalization, as they involve "continuation" functions that are used by small, simple "bind" wrappers whose behavior is simple enough to justify inlining. Further compilation is then likely to be extremely favorable to this style of defunctionalization. In this context, the results don't appear to reflect an overwhelming success. It's expected that MLton's monovariant defunctionalization will perform poorly on such benchmarks. OCaml does very well on the iterator-based programs, indicating that its ordinary compilation procedures work well for this style of program. The paper is probably picking its battles: if Haskell's ordinary compilation is sufficient to get the same benefits, then the paper might have to argue as if Haskell is somehow "cheating" its results. Admittedly, that's not a bad argument; Haskell's compiler is very aggressive and allows for such oddities as the MTL. Idiomatic Haskell code would use the list monad instead of iterators, as well as the state monad on one of those benchmarks, and Haskell is all but guaranteed to do very well on those four of the six benchmarks. At that point, it would be a question of how well the competitors handle parser combinators, and Haskell would likely be competitive in that context.
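To make the mechanism being specialized concrete, here is a minimal defunctionalization sketch in OCaml (my own toy, not Morphic's algorithm): suppose analysis finds that only two closures ever flow into a higher-order function; that set is reified as a first-order variant, and calls branch to specialized code instead of going through an indirect closure call.

(* Higher-order original: twice may receive any int -> int closure. *)
let twice f x = f (f x)

(* Suppose analysis finds only two closures ever reach twice:
   (fun x -> x + k) and (fun x -> x * 2). Reify that set as a
   first-order datatype, keeping each closure's captured state. *)
type fn =
  | AddK of int       (* fun x -> x + k, with its captured k *)
  | Double            (* fun x -> x * 2 *)

(* apply branches to the specialized bodies. *)
let apply f x =
  match f with
  | AddK k -> x + k
  | Double -> x * 2

(* The defunctionalized twice takes the first-order representation;
   a specializing compiler could go further and emit one copy of
   twice per constructor. *)
let twice_defun f x = apply f (apply f x)

(* Call sites pass constructors instead of closures. *)
let example = (twice_defun (AddK 3) 10, twice_defun Double 10)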
Note also that OCaml only chokes on the parser-combinator programs.
>Across all compilers, we additionally remark that our largest and most realistic benchmark program, ParseJSON, is also the program in which we see the largest performance gains due to LSS. We take this as evidence in support of the hypothesis that the benefits of specialization become more significant with increasing scale.
I would disagree strongly here. I think what ParseJSON proves is the insufficiency of contemporary compilers in the face of higher-order constructs. This paper needs strong follow-ups that evaluate its effectiveness alongside inlining for contrast, and follow-ups establishing worst-case behavior, or the absence of it. While I'm skeptical of the results, I think they are encouraging for DL2 either way. Currently, the DL2 documentation argues in favor of a systematic and synergistic approach to compilation. If Morphic's approach is appropriate for general use, then its working well with even the simplest DL2 compilers would be strong evidence towards the development of such systematic approaches. If this is a fluke of a result, then DL2's argument can similarly work the other way, in favor of developing more sophisticated systematic approaches to inlining.

