/bsp/ - bsp

Someone's Office


STGL Thread bsp Board owner 08/25/2025 (Mon) 22:18:52 Id: d0c07b No. 23
This is a thread about STGL and its currently-existing prototype, PGL. STGL is the simply-typed graphics language, and will be a rough equivalent to OpenGL 3.3/WebGL 2 based on the simply-typed lambda calculus. STGL will simplify graphics programming by providing a type system directly to the programmer to immediately detect and prevent type errors. STGL will also generalize immediate processing, batch processing, display lists and command buffers by allowing render programs to be written as functions acting at any granularity and executed with any arguments. These functionalities already exist in a weaker form in the prototype PGL. This thread will likely be updated eventually with more/better images, once they exist.
I would appreciate comments on STGL. Which operations should it have? Which types should it have? Are there pitfalls I should be aware of? I already have a decent idea of what it should look like via my work on the prototype PGL. I'd like to make sure that STGL is a mostly-complete representation of OpenGL 3.3 Core/WebGL 2, maybe without some known-bad misfeatures. I'm only recently a graphics programmer, but I'd like it to be capable of decently advanced graphics. From a user perspective, it would also be nice to collect some techniques that can be distilled into code. In particular, I would really like to see advanced capabilities such as SDF-based alpha, impostors and weighted-blended order-independent transparency at some point. SDF-based alpha and impostors are likely possible in PGL already, but WB-OIT is almost certainly not. Additionally, are there any features that users want to have, either in the STGL core or as a "standard macro" provided by front-ends to STGL?
While working on PGL documentation, I was looking into early Direct3D and found these:
https://rmitz.org/carmack.on.opengl.html
>The overriding reason why GL is so much better than D3D has to do with ease of use. GL is easy to use and fun to experiment with. D3D is not (ahem). You can make sample GL programs with a single page of code. I think D3D has managed to make the worst possible interface choice at every oportunity. COM. Expandable structs passed to functions. Execute buffers. Some of these choices were made so that the API would be able to gracefully expand in the future, but who cares about having an API that can grow if you have forced it to be painful to use now and forever after? Many things that are a single line of GL code require half a page of D3D code to allocate a structure, set a size, fill something in, call a COM routine, then extract the result.
>GL's interface is procedural: You perform operations by calling gl functions to pass vertex data and specify primitives.
>D3D's interface is by execute buffers: You build a structure containing vertex data and commands, and pass the entire thing with a single call. On the surface, this apears to be an efficiency improvement for D3D, because it gets rid of a lot of procedure call overhead. In reality, it is a gigantic pain-in-the-ass.
>You wouldn't actually make an execute buffer with a single triangle in it, or your performance would be dreadfull. The idea is to build up a large batch of commands so that you pass lots of work to D3D with a single procedure call.
>A problem with that is that the optimal definition of "large" and "lots" varies depending on what hardware you are using, but instead of leaving that up to the driver, the application programmer has to know what is best for every hardware situation.
>You can cover some of the messy work with macros, but that brings its own set of problems.
>The only way I can see to make D3D generally usable is to create your own procedural interface that buffers commands up into one or more execute buffers and flushes when needed. But why bother, when there is this other nifty procedural API already there...
>With OpenGL, you can get something working with simple, straightforward code, then if it is warranted, you can convert to display lists or vertex arrays for max performance (although the difference usually isn't that large). This is the right way of doing things -- like converting your crucial functions to assembly language after doing all your development in C.
>With D3D, you have to do everything the painful way from the beginning. Like writing a complete program in assembly language, taking many times longer, missing chances for algorithmic improvements, etc. And then finding out it doesn't even go faster.
https://narkive.com/rfCLBCDU.1
>[John Carmack himself cursed DX for this feature, saying that OpenGL was easier to use.]
>Now vertex buffers stored in video memory are transformed, lit, and rasterized by the GPU, using shader programs previously uploaded to the video card, and the old execute buffer architecture which combined transform/lighting/rendering instructions with vertex data just don't fit the bill.
>The concept of batching commands to the driver is by no means gone, however. Behind the scenes, the calls that you make on the D3D device are cached to a "command buffer" prior to being sent to the driver. This helps to lower the number of switches between user and kernel mode to transfer instructions to the driver, which is the real cost behind rendering calls, far greater than the overhead of the API calls that issue these instructions. Also, you can use state blocks to encapsulate a set of state and shader constant changes, and re-use these state blocks.
Execute buffers seem at first glance to foreshadow the later development of OpenGL 4 and Vulkan.
The issue with execute buffers is the same as with display lists: they accelerate the wrong calls, and in the wrong way. The real advancement of that era was toward shaders and the beginning of programmable rendering. For OpenGL before OpenGL 4, this almost always meant a vertex shader to transform triangle vertices and a fragment shader to determine per-pixel colors, applied per-mesh rather than per-polygon. Display list and execute buffer examples corresponded more closely to manual triangle rendering, taking advantage of the OpenGL matrix stack.
Looking backwards, display lists seem to come from an earlier era with strange hardware. Many operations were then implemented as physical co-processor programs which could have built-in logic, so a sequential program was appropriate. It's plausible that Microsoft considered this and thought the thing to do was to optimize for a fixed sequential program, which wasn't unreasonable for GPUs of the time. By offering a stricter API than OpenGL, there was the possibility of one-upping OpenGL on efficiency.
Rather than foreshadowing later developments, I think display lists and execute buffers quickly became vestigial. The failure of execute buffers, and the reduction of display lists to a vestigial element followed by their eventual elimination, is what actually foreshadows later developments. Once you CAN get to one mesh per call, vertex-based display lists are all but pointless; at that point, it's easy to saturate the GPU with work. When GPUs advanced so that you really could use more, you could in principle make a display list and encode things there, but a better idea is to encode the new logic in already-programmable shaders. After all, if you write a "display list" in a texture array, all you need to do is increment the index into that array.
At that point, you've effectively implemented instanced rendering, so why not just make that part of the graphics API and use it forever? Even though all I have now is the slow prototype C library PGL built on OpenGL 3, the primary argument that I hope STGL can make long-term is that instead of stopping at instanced rendering, one should compile as much of the render program as possible. Instanced rendering is the acceleration of a single for loop; STGL is the acceleration of any amount of rendering, up to the whole rendering pipeline. There is also a more immediate benefit: the validation infrastructure of Vulkan is replaced with a simple type-check at the beginning, and it should in principle be possible to accelerate with either OpenGL 3 or any version of Vulkan.
All that said, Vulkan, WGSL, SPIR-V, etc. have some warts that suggest to me that STGL is the natural development. There is a decent amount of esoterica, and the user is expected to litter the program with annotations. If the entire pipeline is compiled, these suddenly seem much less relevant. I'm only recently a graphics programmer, but should I really need to provide workgroup sizes and location annotations to generate a texture?
https://webgpufundamentals.org/webgpu/lessons/webgpu-compute-shaders.html
>Unfortunately, the perfect size is GPU dependent and WebGPU can not provide that info. The general advice for WebGPU is to choose a workgroup size of 64 unless you have some specific reason to choose another size. Apparently most GPUs can efficiently run 64 things in lockstep. If you choose a higher number and the GPU can’t do it as a fast path it will chose a slower path. If on the other hand you chose a number below what the GPU can do then you may not get the maximum performance.

