Shame AI is a no-go, because I would have recommended some classical GPU architecture research: wavefront scheduling and the hardware "algorithms" behind it (something like a Tomasulo-algorithm equivalent, but much simpler), even better if you implement it in Verilog and optimize it properly. There is some existing work on this, including academic open-source RTL and simulators, but it's kind of lackluster. They certainly need a lot of work on the compilers too, but that's adjacent.
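To make the "Tomasulo but much simpler" point concrete, here is a toy sketch of scoreboard-style in-order wavefront scheduling: round-robin issue across wavefronts, where a wavefront stalls if any operand of its next instruction is still in flight. All names and the single-issue-port assumption are illustrative, not taken from any real RTL.

```python
# Toy model: per-wavefront register scoreboard + round-robin issue.
# An instruction is (dst_reg, src_regs, latency); a wavefront may issue
# only when none of its operands are marked pending.
from collections import deque

class Wavefront:
    def __init__(self, wid, program):
        self.wid = wid
        self.pc = 0
        self.program = program        # list of (dst, srcs, latency)
        self.pending = set()          # registers with results in flight

    def done(self):
        return self.pc >= len(self.program) and not self.pending

def simulate(wavefronts):
    """Single issue port; results retire at the start of their finish cycle.
    Returns a trace of (cycle, wavefront_id, instruction_index) issues."""
    in_flight = []                    # (finish_cycle, wavefront, dst_reg)
    cycle = 0
    ready = deque(wavefronts)
    trace = []
    while any(not w.done() for w in wavefronts):
        # retire completed results: clear their scoreboard bits
        still = []
        for finish, w, dst in in_flight:
            if finish <= cycle:
                w.pending.discard(dst)
            else:
                still.append((finish, w, dst))
        in_flight = still
        # try each wavefront in round-robin order; issue at most one
        for _ in range(len(ready)):
            w = ready[0]
            ready.rotate(-1)
            if w.pc >= len(w.program):
                continue
            dst, srcs, lat = w.program[w.pc]
            # scoreboard check: RAW/WAW hazard within this wavefront
            if w.pending & (set(srcs) | {dst}):
                continue              # stalled, try the next wavefront
            w.pending.add(dst)
            in_flight.append((cycle + lat, w, dst))
            trace.append((cycle, w.wid, w.pc))
            w.pc += 1
            break
        cycle += 1
    return trace
```

The payoff is visible in the trace: while one wavefront waits on a long-latency result, the scheduler keeps issuing from another, which is exactly the latency hiding that makes this so much simpler than full Tomasulo renaming.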
Also on AI, Tenstorrent has an interesting grid architecture (tiled cores that have to pass data to adjacent nodes). It's very scalable, but coding for it is challenging; I'd maybe recommend researching better ways to program and design such arbitrarily scalable GPGPU-like architectures.
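The "pass data to adjacent nodes" constraint can be sketched in a few lines: cores sit on a 2D grid, and getting a value from one tile to another means hopping through neighbors, with latency proportional to the Manhattan distance. The API and the dimension-ordered (XY) routing choice below are my assumptions for illustration, not Tenstorrent's actual programming model.

```python
# Toy tiled-grid model: data moves only between direct neighbors,
# so sender and receiver latency depends on their grid distance.

def xy_route(src, dst):
    """Hop-by-hop path using dimension-ordered routing: X first, then Y."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

class Grid:
    def __init__(self, width, height):
        self.width, self.height = width, height
        self.mailbox = {}             # (x, y) -> list of received values

    def send(self, src, dst, value):
        """Deliver value to dst; each neighbor hop costs one cycle.
        Returns the hop count, i.e. the transfer latency."""
        path = xy_route(src, dst)
        self.mailbox.setdefault(dst, []).append(value)
        return len(path) - 1
```

This is why the programming model is hard: unlike a flat shared-memory GPU, where data lands is a first-class performance concern, and the compiler or programmer has to place computation so operands are already on (or near) the consuming tile.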
Barring that, if you want some wonderful autism: one dude has been working on atomic-scale mechanical computers, designed using VLSI techniques and simulated at the atomic level (molecular dynamics with semi-empirical potentials, no QM). He has his hands full with a lot of other stuff, so he has only finished the ALU and some of the clocking; it might be fun to see more research here. The context was Drexlerian nanotech (the hard kind, which doesn't yet exist in reality, but a bootstrapping path has been proposed in the last decade, so the race is on!).
Never mind, my mistake: I just noticed the OP is from 2022, but my reply was already written, so here it is in 2025.