Here's my long-ass "V4 *so far*" writeup that nobody asked for. I took my sweet time because I watched Dosh talk about trash for 50 mins first in order to better prepare for this, and then I got distracted and watched Tenma get fucked by Sephiroth again for hours on end
If it's too long for you and/or you don't like my blogposts and/or tangents, don't bitch and just hide the post
>The Good (surprisingly many but not enough to outweigh the negatives)
1) Styles are more accurate, and so are characters, which also need fewer images than V3 to work (translation: I only cared to test Shoebill EX (yes, there's a Shoebill alter that's pretty neat but is too different and nowhere near as good as the original and it also doesn't really make that much sense?) but she works well enough with just the character and copyright tags and only has ~30 images, YMMV)
2) The inpainting model works well, they didn't fuck that up. It works REALLY well with the new inpaint focus area thing but that's also available on V3 apparently. Mixed feelings on the not-really-circular brush (also on V3)
3) NSFW concepts work way better than V3 on average
4) Single styles work really well, so far I've only found one that doesn't work anywhere near as well as I hoped but whatever. Your mileage WILL vary
5) 512 tokens are a godsend, it's nice to just export booru tags
6) The regional thing works well, it's annoying to use but at least with two characters (± 1boy pov) it just works™ and it's worth using just to specify the characters' positions, will test 3+ later. The source#target thing is annoying as shit though and I feel like this is VERY hit or miss for anything more complex than looking at/hugging
>The "who the fuck cares?"
1) NLP. Like I've said, it's nice when you need to fill in some gaps that tags can't fill but 90% of the time it's a case of "I wish shit was tagged better at a booru level" and not "I wish this hyper-specific tag existed at all"
2) Text. Again, faster to do in PS and unless I have a massive skill issue there's no way to actually have it blend in on surfaces instead of having it look like it was slapped on in MS Paint. At least it works better on Flux
3) The 16ch VAE is mostly a meme. Eyes and details ARE better on average compared to V3 but we're talking about 4070 SUPER vs 4070 Ti SUPER kind of gains in the best of cases. Very small details like the text on Shishiro Botan's HOLOLIVE strap are slightly less fucked but I expected the 16ch VAE to more or less fix stuff like that
>The Bad
1*) Emphasis does fucking nothing the vast majority of the time, I don't know if this is because DiT is inherently dogshit or if it's because they really did somehow train with emphasized tags
>The Ugly
1*) Anatomy (and backgrounds) seem to be COMPLETELY artist-dependent now (instead of mostly) and for some reason hands are just scuffed in general. You don't get extra fingers that often but the hands are just... "drawn" badly
2) Related to the issues* above, but emphasizing size tags (e.g. huge breasts) does ABSOLUTELY NOTHING IF YOU'RE USING AN ARTIST STYLE. You can test this by simply reusing the same seed with and without the emphasis. Without artist tags it actually works to a degree, but this brings us to the next issue
3) HOLY FUCKING MOTHER OF ABYSSORANGEMIX1, WHAT THE FUCK IS UP WITH THE DEFAULT STYLE? It's not ALWAYS AOM1 slop but it tends to gravitate towards that, even more so if you use one of the completely broken*
>The Dogshit
1)* medium tags. Traditional media, marker, millipen, etc etc. These are nearly completely broken. Some will actually change the style but not to what the tag says. Oil painting seems to work to a degree, woohoo
2) Mixing, obviously. Finicky, unstable and inconsistent IF you can even get shit to mix to begin with. DiT was a mistake
3) Weird jpeg-like-but-not-quite? artifacts everywhere, sometimes it almost looks like it's trying to imitate the grain effect used by some artists regardless of the artist tag. Some anons also reported incredibly bad frying on top of the artifacting but I haven't seen it yet
4) Shots wider than full body/tachi-e (sorry, "character image") are an absolute crapshoot, maybe even worse than V3's but I'm not gonna bother checking
5) Some combinations of character tags and framing tags (and/or resolution too, especially square) will occasionally/consistently (it really depends) output REALLY blurry images. This was worse on the preview but it's still frequent enough on Full to genuinely be an issue. It's not even an unprompted depth of field effect (which happens more often than you'd think) either. All framing tags seem to do this to a degree but it's extremely common with "close-up, portrait," even with artist tags
6) The fucking white gens. According to finetune, removing the "blank page" images from the dataset wouldn't solve the issue because "it's probably not the culprit and removing them would remove this (blank page in negs) workaround", even though it seems like that's exactly what it is. Either way this is completely fucking unacceptable, imagine if V1 (on their site) actually NaN'd out like the shitmixes did on local. The fact that they went from telling people how to fix it to actually having to intervene makes me think that they don't believe their own "oh it just happens sometimes, not a big deal" bullshit
>The Lovebite
Dancin' in the shadow with a dark smile
I probably forgot about some stuff but overall: see the attached video
Birdschizo out, I'm going back to playing TXR 2025.