/ais/ - /lmg/ - local models general

Name
Options
Subject
Message	Max message length: 12000
files	Drag files here to upload or click here to select them 0.00 / 50.00 MB Max file size: 32.00 MB Total max file size: 50.00 MB Max files: 5 Supported file types: GIF, JPG, PNG, WebM, OGG, and more

E-mail
Password	(used to delete files and posts)
Misc

Anonymous 04/16/2025 (Wed) 11:48:50 No. 6265

>>6258 good luck!

Anonymous 04/16/2025 (Wed) 13:11:26 No. 6266

lots of /lmg/ refugees in https://meta.4chan.gay/tech/67288

Anonymous 04/16/2025 (Wed) 13:53:38 Id: ea3c91 No. 6270

>>6266 I'm curious to see where everyone will consolidate

Anonymous 04/16/2025 (Wed) 13:59:03 Id: deccf3 No. 6272

>>6258 omg it migu

Anonymous 04/16/2025 (Wed) 14:02:06 Id: c6df88 No. 6273

>>6270 I want 4chin back...

Anonymous 04/16/2025 (Wed) 14:05:12 Id: 18991a No. 6274

>>6273 It'll be back eventually and probably worse than ever

Anonymous 04/16/2025 (Wed) 14:07:30 Id: bbd9ff No. 6275

>>6273 4fag mods and jannies are troons, we are the mods and jannies here

Anonymous 04/16/2025 (Wed) 16:57:11 Id: 6f6865 No. 6283

>>6266 >here are your neighbors, bro

Anonymous 04/16/2025 (Wed) 17:00:03 Id: 8f1514 No. 6285

>>6283

Anonymous 04/16/2025 (Wed) 17:22:51 Id: 6f6865 No. 6286

https://huggingface.co/microsoft/bitnet-b1.58-2B-4T https://github.com/microsoft/BitNet In case anyone missed it in the chaos, microsoft actually trained a bitnet model. It's a 1.58b so more of a retard you can carry around in your pocket than anything useful but I suppose it's proof that bitnet isn't a completely abandoned concept.

Anonymous 04/16/2025 (Wed) 17:24:04 Id: 8f1514 No. 6287

>>6286 anons tested it out already, okay for a 2b model https://meta.4chan.gay/tech/67288#p76975

Anonymous 04/16/2025 (Wed) 17:55:34 Id: 6ccf75 No. 6289

>>6287 >Serbia >Solarized theme hi petra

Anonymous 04/16/2025 (Wed) 19:50:17 Id: ac430d No. 6291

>>6258 Is that new 47b Nemotron model roleplayable like the recent 49b one, or is for researchy stuff?

Anonymous 04/16/2025 (Wed) 20:10:04 Id: a416dd No. 6293

>only options are here, dead, or the literal cunny chan what the fuck

Anonymous 04/16/2025 (Wed) 21:48:28 Id: f6aa25 No. 6349

>>6293 What is the cunny chan name?

Anonymous 04/16/2025 (Wed) 23:17:00 Id: cf239c No. 6370

>>6293 at least here we have post ids, but yeah all of the options suck

Anonymous 04/17/2025 (Thu) 01:03:10 Id: 6b713d No. 6412

hello where did Hentai Diffusion go?

Anonymous 04/17/2025 (Thu) 01:53:07 Id: b4f622 No. 6428

>pedophiles all flock to a literal pizza altchan hmmm

Anonymous 04/17/2025 (Thu) 02:56:11 Id: 8f1514 No. 6442

>>6349 https://meta.4chan.gay/tech/67288 use fennec f-droid or any other firefox based browser on mobile if you have issues posting

Anonymous 04/17/2025 (Thu) 03:47:45 Id: 9a4797 No. 6454

>>6428 GO AWAY POO POO NIGGER MORAL FAG FAGGOT THIS IS OUR BOARD NOT YOUR FUCK OFF TO INDIA OR TURKMENISTAN OR WHEREVER YOUR SHITTY UNWIPED BUM WAFTED IN FROM, THIS IS NOT YOUR SHITTING STREET, THIS IS OUR SHITTING STREET, NOT PUBLIC, NOT FOR YOU

Anonymous 04/17/2025 (Thu) 11:03:57 Id: a86696 No. 6508

>>6258 I am home again

Anonymous 04/17/2025 (Thu) 13:41:50 Id: ce97e3 No. 6545

>>6412 /trash/ got their sdg back, but I haven't found something like Hentai Diffusion yet In the meantime your best bet might be civitai?

Anonymous 04/17/2025 (Thu) 14:15:31 Id: ce97e3 No. 6568

>>6287 I got it running now as well. Hope they will continue experimenting with Bitnet

Anonymous 04/17/2025 (Thu) 14:25:15 Id: b001f7 No. 6572

>>6273 No way. Seeing the solo janny in /h/ getting doxxed was funny.

Anonymous 04/17/2025 (Thu) 15:11:29 Id: b1e463 No. 6594

>>6568 >4chan acquired by Y Combinator Fate worse than death.

Anonymous 04/17/2025 (Thu) 16:18:06 Id: c6df88 No. 6615

>>6293 /g/ was always the technololigy board, fag.

Anonymous 04/17/2025 (Thu) 18:21:40 Id: ac430d No. 6645

>>6266 Nice try. I'm not going to any site with ".gay" at the end of the URL.

Anonymous 04/17/2025 (Thu) 18:24:27 Id: 8f1514 No. 6646

uhh.. guys? anyone alive?

Anonymous 04/17/2025 (Thu) 18:24:44 Id: 3baef5 No. 6647

>>6266 Your shit is down

Anonymous 04/17/2025 (Thu) 18:27:16 Id: 8f1514 No. 6648

>>6647 yeah well, if you checked the archive you'd know that ALL /lmg/ refugee locations are regularly posted there

Anonymous 04/17/2025 (Thu) 18:30:50 Id: 8f1514 No. 6649

https://meta.4chan.gay/tech/67288 WE'RE BACK! MASSIVE HAPPENINGS HAPPENING

Anonymous 04/17/2025 (Thu) 18:34:42 Id: a4b366 No. 6650

we got 2 /lmg/ now? I'm liking this better.

Anonymous 04/17/2025 (Thu) 18:49:23 Id: 8f1514 No. 6664

(12.61 KB 849x104 b6b824ef42a41031e8f235ff3ca61f360e47464ab3cac6dbcfe31cfb0e7c17a5.png)

its OVER!

Anonymous 04/17/2025 (Thu) 18:50:36 Id: a4b366 No. 6665

Was just up a second ago.

Anonymous 04/17/2025 (Thu) 18:54:11 Id: 8f1514 No. 6668

ok since 4chan gay is being gay lets talk local models whats up anons

Anonymous 04/17/2025 (Thu) 18:55:01 Id: 1437e8 No. 6669

4chan.gay is gay altchans suck

Anonymous 04/17/2025 (Thu) 19:04:46 Id: 9d452a No. 6677

>>6669 4chan.gay's /lmg/ was better than this ghost town. too bad the 4chan.gay admin is a dipshit who tests in prod

Anonymous 04/17/2025 (Thu) 19:05:38 Id: c6df88 No. 6680

4chan gay is cool, but whoever is managing it is some ADHD zoomed retard. I guess 4chan is as great as it is because the management never is present...

Anonymous 04/17/2025 (Thu) 19:12:53 Id: 8f1514 No. 6701

its back again https://meta.4chan.gay/tech/67288 ...

Anonymous 04/17/2025 (Thu) 19:14:31 Id: a4b366 No. 6706

4chan itself was gay. no vpns, countdown timers. these alt-chans are at least anonymous. I would rather have one take off.

Anonymous 04/17/2025 (Thu) 19:15:26 Id: 8f1514 No. 6707

Anonymous 04/17/2025 (Thu) 19:22:43 Id: 1437e8 No. 6710

>>6706 None of them work without javascript 4chan.gay hosts CP while being behind cloudflare. It's the glowiest honeypot to ever glow

Anonymous 04/17/2025 (Thu) 19:23:20 Id: 8f1514 No. 6712

some more news

Anonymous 04/17/2025 (Thu) 19:25:37 Id: 1437e8 No. 6715

>>6707 >>6712 >Reporter's Name: Hiroyuki Shouldn't he be more concerned about bringing 4chan back up instead of attacking the competition?

Anonymous 04/17/2025 (Thu) 19:25:56 Id: 8f1514 No. 6716

person who reported inside https://unknown.spam/aicg_mail_list

Anonymous 04/17/2025 (Thu) 19:28:20 Id: a4b366 No. 6717

>>6707 >>6710 We're posting about models not CP. I don't give a fuck, may the strongest chan win. Would you rather reddit or discord?

Anonymous 04/17/2025 (Thu) 19:29:02 Id: 8f1514 No. 6718

>>6717 matrix

Anonymous 04/17/2025 (Thu) 19:31:29 Id: a4b366 No. 6720

>>6718 I tried that. It was psychotic leftists.

Anonymous 04/17/2025 (Thu) 19:32:35 Id: 8f1514 No. 6722

>>6720 theres a few based homeservers, although any platform similar to discord will eventually lead to 'cordfaggotry so i'd rather we keep it on literally any chan

Anonymous 04/17/2025 (Thu) 19:33:23 Id: 3e2ab5 No. 6724

There's lainchan too you know, the place seems comfy

Anonymous 04/17/2025 (Thu) 19:34:12 Id: 8f1514 No. 6725

>>6724 extremely cancerous trannie jannies

Anonymous 04/17/2025 (Thu) 19:37:19 Id: 1437e8 No. 6727

>>6725 Considering it's you, I bet they banned you for shitting the place up and you're butthurt All the more reason we should consider lainchan

Anonymous 04/17/2025 (Thu) 19:38:49 Id: 8f1514 No. 6728

>>6727 *stands in your way* your move?

Anonymous 04/17/2025 (Thu) 19:48:01 Id: 3e2ab5 No. 6730

>>6725 It's no better than gay-chan, the mod is watching as we speak

Anonymous 04/17/2025 (Thu) 20:43:54 Id: b33d81 No. 6772

>>6725 >>6727 The fact that lainchan doesn't have any threads for AI suggest they are not very interested in it (or anything too new actually). Also, the ai generals would be far too fast for them. Here is better for now. The gay 4chan is not working for me.

Anonymous 04/17/2025 (Thu) 21:11:16 Id: 32a958 No. 6803

Someone please bake /ldg/ in this board please

Anonymous 04/17/2025 (Thu) 21:31:48 Id: 8f1514 No. 6817

>>6772 https://meta.4chan.gay/tech/67288?last=100#bottom works like this if you're a ramlet or something

Anonymous 04/17/2025 (Thu) 22:31:30 Id: 6f6865 No. 6837

>>6717 >dude just ignore the Democrat activism next door >If you don't like it then you must want to go to reddit or discord instead!

Anonymous 04/17/2025 (Thu) 22:32:12 Id: 8f1514 No. 6838

>>6837 >cunny.. is LE BAD

Anonymous 04/17/2025 (Thu) 22:40:46 Id: 6f6865 No. 6840

https://seed-tars.com/1.5/ https://huggingface.co/ByteDance-Seed/UI-TARS-1.5-7B VLM from bytedance, focused on computer use. Might be interesting. A lot of other computer use systems have basically been just bolting one of the obese models onto a browser use system. This seems relatively more polished and better for interactions, but I have doubts about its ability to handle more complex tasks.

Anonymous 04/17/2025 (Thu) 23:10:52 Id: ac430d No. 6854

EXL3 with cache quantization when?

Anonymous 04/18/2025 (Fri) 03:09:28 Id: ac845c No. 6921

I want to chat with a chinese LLM and see if its views about china differ from western ones. Which one should I check first? I can run up to 32B. GLM? qwen? qwq?

Anonymous 04/18/2025 (Fri) 03:18:52 Id: 5bc7d8 No. 6927

>>6921 see https://meta.4chan.gay/tech/67288?last=100#p106370

Anonymous 04/18/2025 (Fri) 04:13:03 Id: 6f6865 No. 6976

>>6921 Yeah if you want chinese models try qwen's stuff, GLM, deepseek's if you can get it running. btw If you're just doing quick evaluations then you might have a better time just trying them out on openrouter rather than downloading every single one.

Anonymous 04/18/2025 (Fri) 05:29:22 Id: ea3c91 No. 7019

>>6921 qwq is the quintessential local Chinese model atm.

Anonymous 04/18/2025 (Fri) 10:42:31 Id: a4b366 No. 7087

>>6854 When turboderp gets time off his dayjob and finishes railing his anime girls.

Anonymous 04/18/2025 (Fri) 11:47:34 Id: 0e20ba No. 7095

Bros! You're back!

Anonymous 04/18/2025 (Fri) 16:05:16 Id: ac845c No. 7190

>>6976 GLM still has an open PR in llama.cpp for some problem, I will wait. I see that qwen has official gguf quants in hf. I will test 2.5 and qwq. I prefer to use 100% local, especially if I want to test the "limits" of a model.

Anonymous 04/18/2025 (Fri) 17:05:09 Id: 234bb1 No. 7197

>8chan has miku theme We're so back it's unreal.

Anonymous 04/18/2025 (Fri) 17:13:07 Id: c6df88 No. 7200

>.moe is literally dead >4chan gay is figuratively dead >desuarchive was never actually alive It's unironically over

Anonymous 04/18/2025 (Fri) 18:01:32 Id: ac430d No. 7207

>>7087 Shit... that could be a while.

Anonymous 04/18/2025 (Fri) 18:34:15 Id: cf8e69 No. 7209

If 4chan doesn't come back, the canonical /lmg/ is going to be wherever the thread recap bot operator and/or CUDA dev show up. This place looks ok so far, so maybe there's hope!

Anonymous 04/18/2025 (Fri) 18:48:11 Id: c6df88 No. 7211

>>7209 Recap Anon is here and in 4chan gay, so it's actually up to whichever place has more anons. I wonder about CUDA anon... I will try to send him a email.

Anonymous 04/18/2025 (Fri) 19:38:45 Id: ac430d No. 7221

>>7200 It's not over, fren. The first reaction of most people was to wait it out, expecting 4chan to come back online in short order. With every day that passes, more and more of those people are starting to look for alternatives. They'll find us.

Anonymous 04/18/2025 (Fri) 20:20:08 Id: 504b91 No. 7235

I've come here to complain that even though jetbrains recently added support for local models in their ai shit it's still worse than zed's.

Anonymous 04/18/2025 (Fri) 20:23:54 Id: 5bc7d8 No. 7237

>>7235 >jetbrains >zed This feels aliencoded.

Anonymous 04/18/2025 (Fri) 20:40:12 Id: 0dbf08 No. 7249

>>7235 local llm aren't for real work

Anonymous 04/18/2025 (Fri) 20:42:35 Id: 504b91 No. 7250

>>7249 qwhen? 3 will make local LLMs viable for real work.

Anonymous 04/18/2025 (Fri) 21:10:40 Id: 5bc7d8 No. 7264

>>7235 >>7237 >>7249 Petra, stop doing this

Anonymous 04/18/2025 (Fri) 21:43:12 Id: 5bc7d8 No. 7275

Am I retarded? Why does this guy recommend 512x512 for wan when it's not in the recommended resolutions? https://comfyanonymous.github.io/ComfyUI_examples/wan/

Anonymous 04/18/2025 (Fri) 21:43:29 Id: 5bc7d8 No. 7276

>>7275 because he's a fucking retard

Anonymous 04/19/2025 (Sat) 00:01:53 Id: cf8e69 No. 7324

>>7249 They cover a good chunk of it if you care enough about the ideology behind running local. The simple boilerplate, small changes, relatively simple bugfixes, can be handled just as well by current 70Bs as they can by e.g. Gemini Flash. (for me, deepcogito 70B and before that, Athene) I just really don't like the idea of individuals completely losing the ability to do their own computer stuff on their own hardware. So yeah I won't be so ridiculous as to never use the cloud stuff, when it really calls for it, but when I'm using local models it makes me feel like "you will own nothing and be happy" hasn't progressed quite so far.

Anonymous 04/19/2025 (Sat) 03:47:26 Id: 5bc7d8 No. 7349

>>7324 deepseek v3/r1 is also local.

Anonymous 04/19/2025 (Sat) 03:49:27 Id: e80c88 No. 7350

MAIDS save local models https://huggingface.co/microsoft/MAI-DS-R1

Anonymous 04/19/2025 (Sat) 04:09:23 Id: ac430d No. 7353

>>7350 >MAI-DS-R1 is a DeepSeek-R1 reasoning model that has been post-trained by the Microsoft AI team to improve its responsiveness on blocked topics and its risk profile >MAI-DS-R1 has successfully unblocked the majority of previously blocked queries from the original R1 model Microsoft uncensoring models? I somehow doubt it. If Microshit got their claws on it, then they may have unblocked it's ability to tell you about Tiananmen Square, but at the cost of losing the ability to tell you what a woman is.

Anonymous 04/19/2025 (Sat) 04:23:37 Id: 5bc7d8 No. 7354

(20.06 KB 654x152 e96ae62a2ce1f01ba13213ed2e8e16c34f70e055f2ab2d19790a6b6811646e2a.png)

>>7350 no

Anonymous 04/19/2025 (Sat) 04:42:33 Id: e80c88 No. 7356

>>7353 I care less about that aspect than the slight hope that the finetune lessened R1's chaotic adhd tendencies as a side effect. It's cope but tunes by big corpos like this are likely the only real ones we're going to see for Deepseek considering the size of these models. I just wish there were quants for it.

Anonymous 04/19/2025 (Sat) 04:49:32 Id: 911ffc No. 7357

>>7353 It's a double edged sword. They made it so you can ask about Tiananmen on the model but in return, they trained it on the same safety mix as Tulu so it went full safety from a Chinese point of view to a Western point of view. It is marginally better for real tasks like code generation due to the better data that Microsoft added but I would hardly say that was worth it. But Microsoft used those compute resources, not us and it's for enterprises so makes sense.

Anonymous 04/19/2025 (Sat) 04:52:18 Id: 661688 No. 7358

>discount /lmg/ hours >and discussing a fucking fine-tune that nobody should give a shit about What a fucking retarded discussion. Put this general out of its misery.

Anonymous 04/19/2025 (Sat) 05:05:30 Id: 3136b3 No. 7363

>>7358 >having a mental breakdown over people discussing one of the few finetunes for one of the best local models we have Is being poor that hard on you?

Anonymous 04/19/2025 (Sat) 05:16:55 Id: 661688 No. 7364

>>7363 >one of the few finetunes It's the exact same thing that Perplexity already did, the only thing all those companies care about is swapping Chinese propaganda with an American one. And then there will be /r/LocalLLaMA-level retards that will shill the model like if it became "uncensored". It's all those American companies the ones adding censorship we care about in in the first place. Fuck you for posting it here.

Anonymous 04/19/2025 (Sat) 06:18:44 Id: 234bb1 No. 7379

>>7364 Let people chose the propaganda they want dude.

Anonymous 04/19/2025 (Sat) 11:44:48 Id: a4b366 No. 7453

>>7379 Not gonna get excited for western cucked models. Even if they benchmaxx a little higher.

Anonymous 04/19/2025 (Sat) 12:26:43 Id: 91dfd1 No. 7465

Does the sillytavern image generation function not work with REFORGE? does it have to be the old A11111 SD1.5 UI? I upgraded to reforge ages ago and it cannot seem to find the connection to my reforge when I'm running it

Anonymous 04/19/2025 (Sat) 12:38:08 Id: f6de6d No. 7467

>>7465 I've had good success with ComfyUI that's what everyone seems to be using for everything imagegen these days...

Anonymous 04/19/2025 (Sat) 12:40:49 Id: a4b366 No. 7468

>>7465 It worked on the old re-forge made by pancho. I dunno about the new one. After he stopped updating, I moved to comfy.

Anonymous 04/19/2025 (Sat) 12:43:10 Id: 91dfd1 No. 7469

>>7467 >>7468 I have never used comfyui for anything. How do you launch it so sillytavern picks it up? or better yet is there a guide for sillytavern image genning with comfyui? I just want to be able to have images be genned based on the situation mid-RP

Anonymous 04/19/2025 (Sat) 12:47:02 Id: a4b366 No. 7470

>>7469 You start it with the API active, make a workflow and then put that WF with stuff like prompt replaced via placeholders inside silly. Not as plug and play like A1111 was but lets you do a whole lot more.

Anonymous 04/19/2025 (Sat) 13:25:16 Id: 8af965 No. 7489

Any news about Qwen3? I missed the last couple of days because of the whole 4chan thing.

Anonymous 04/19/2025 (Sat) 16:19:42 Id: 3a58c9 No. 7535

Well according to the system message on 4gay they're getting shut down. So I guess this is the official /lmg/ now.

Anonymous 04/19/2025 (Sat) 16:40:39 Id: 5bc7d8 No. 7543

>>7489 qwen3 miku oo ee oo

Anonymous 04/19/2025 (Sat) 16:48:28 Id: 3a58c9 No. 7547

RIP. Perception-LM-8B ooms on a 3090. Useless model.

Anonymous 04/19/2025 (Sat) 18:04:53 Id: 20a050 No. 7593

https://8chan.se/bot/ Our own board.

Anonymous 04/19/2025 (Sat) 18:14:49 Id: 8af965 No. 7601

>>7593 We made a measly 100 posts in 4 days. Why would you want to splinter off now?

Anonymous 04/19/2025 (Sat) 18:29:22 Id: 5bc7d8 No. 7613

>>7593 no thanks

Anonymous 04/19/2025 (Sat) 18:53:10 Id: 87adbc No. 7668

>>7364 >>7364 There's a good reason for them to do this finetune that has nothing to do with us using it as R1 was essentially mostly uncucked for most purposes anyone here would care about. Retarded politicians in Washington want to ban open weights model R1 because it was made in China and keep grasping at straws for some reason to ban it (not that there's many), but since this is MIT licensed, Microsoft is probably doing some legal trolling where they would finetune it and show some use and thus could defend it in court if the boomers do end up attempting to ban it. Obviously such a law would be unenforceable and they would be shooting themselves in the foot, and code is speech and all that, but Microsoft having their own variant would probably count as a good start for a defense.

Anonymous 04/19/2025 (Sat) 18:58:54 Id: ac845c No. 7679

>>7668 Also, isn't it R1 the strongest model you can run locally right now? This could be useful for companies with pockets deep enough to run R1, but in need of a model aligned to western sensibilities.

Anonymous 04/19/2025 (Sat) 18:59:33 Id: 5bc7d8 No. 7681

>>7679 yes that is the only usecase

Anonymous 04/19/2025 (Sat) 19:05:17 Id: 87adbc No. 7689

>>7679 There was already one such finetune that came out weeks after R1 came out. Mostly though R1 isn't even that heavy on the refusals on the one thing that they tuned it against (CCP stuff), a simple prefill will avoid most issues as usual. And yes, it's close to the best open weights model currently.

Anonymous 04/19/2025 (Sat) 19:07:37 Id: 5bc7d8 No. 7695

>>7689 close?

Anonymous 04/19/2025 (Sat) 19:12:06 Id: 87adbc No. 7703

>>7695 For example a reasoning finetune of 405B can reach similar performance to R1, Nvidia did one recently. It also depends on your usecase, sometimes you may be fine with a dumber model that uses less VRAM. Also, the first DS3 on which R1 was based on had serious repetition issues (somewhat solved in 3.1), which some smaller models (such as mistral large) lacked.

Anonymous 04/19/2025 (Sat) 19:16:39 Id: 5bc7d8 No. 7706

>>7703 ugh fine, but its so safety cucked . . .

Anonymous 04/19/2025 (Sat) 19:18:54 Id: 87adbc No. 7707

>>7706 I'd just use R1, but I guess it's not uncommon for models to need some finetune after to remove "safety". Base models tend to be uncucked, but if the dataset is too filtered, the output can be too plain/boring, so ultimately you still need a finetune on top of it.

Anonymous 04/19/2025 (Sat) 19:19:58 Id: 5bc7d8 No. 7708

>>7707 >>7703 why would anyone want to use a 253B dense model over a 37B/671B MoE? if both have same-ish performance

Anonymous 04/19/2025 (Sat) 19:23:56 Id: 87adbc No. 7710

>>7708 idk, I haven't played with nvidia's tune, but maybe there's some reason? It's like asking why would someone prefer claude opus or sonnet 3.7 over R1 or whatever, might depend on taste and how it performs in specific tasks. Currently R1 could be better at tool use, it's not like they don't have things to improve. I wonder if R2 will handle those well.

Anonymous 04/19/2025 (Sat) 19:38:26 Id: 8af965 No. 7718

>>7708 No consumers at least. Can't run 250B on RAM without killing token generation speed.

Anonymous 04/19/2025 (Sat) 20:49:33 Id: 5bc7d8 No. 7747

https://meta.4chan.gay/tech/67288#p121591

Anonymous 04/19/2025 (Sat) 22:28:34 Id: d18660 No. 7777

Just want a quick update since I haven't been keeping up. Is Nemo still unbeaten by a model same parameter count or less? I'm guessing yes because it's a safe bet at this point, but figured I'd ask

Anonymous 04/19/2025 (Sat) 22:35:45 Id: 88ab39 No. 7778

dead thread, dead website, dead hobby

Anonymous 04/19/2025 (Sat) 22:49:48 Id: 5bc7d8 No. 7787

happy Easter

Anonymous 04/19/2025 (Sat) 23:15:38 Id: 7b325f No. 7812

Just got myself a 3090, what the best model I can run for peak kino AI lewd roleplays?

Anonymous 04/19/2025 (Sat) 23:17:04 Id: 5bc7d8 No. 7814

>>7812 post rest of your specs, also https://meta.4chan.gay/tech/67288?last=100#bottom is more active

Anonymous 04/19/2025 (Sat) 23:17:43 Id: 5bc7d8 No. 7817

>>7812 cydonia

Anonymous 04/19/2025 (Sat) 23:18:08 Id: 5bc7d8 No. 7818

>>7812 MS-Magpantheonsel-lark-v4x1.6.2RP-Cydonia-vXXX-22B-8.i1-IQ4_XS.gguf

Anonymous 04/20/2025 (Sun) 00:24:57 Id: ea3c91 No. 7882

>comfy thread >growing website >developing hobby

Anonymous 04/20/2025 (Sun) 00:48:46 Id: 911ffc No. 7886

>>7747 >https://huggingface.co/OnomaAIResearch/Illustrious-XL-v2.0 >Illustrious XL 1.0-2.0 series aims to stabilize native generation at 1536 resolution while significantly improving natural language understanding capabilities. Not really that interesting, I think it is hitting against the limits of what SDXL can do without Vpred. I expect a lot of models to probably rebase on this since we will probably never get local 3.0/3.5 Vpred from Angel and how funding has essentially almost stopped. >https://huggingface.co/OnomaAIResearch/Illustrious-Lumina-v0.03 >This model is based on Alpha-VLLM/Lumina-Image-2.0 , which is nice small DiT model with minimal guaranteed functionality! Please refer to https://github.com/Alpha-VLLM/Lumina-Image-2.0 for official repository. This is interesting but I suspect he tried to train it before their technical report was out. Lumina was trained on extremely details and long captions for tags and boomer prompting and they even built their own tool for that. I suspect the training wasn't as effective as it should've been because of that, and as the model says, it can recognize characters now but it is still severely undertrained to the extent where it doesn't even equal the training done on Illustrious v0.1

Anonymous 04/20/2025 (Sun) 02:22:49 Id: 3a58c9 No. 7940

What's with the fake 404 on 4gay?

Anonymous 04/20/2025 (Sun) 02:36:12 Id: 7d7a2a No. 7944

How about model for sci-fi novel slop?

Anonymous 04/20/2025 (Sun) 02:37:49 Id: ea3c91 No. 7946

>>7940 4chan got pwnd by sharty

Anonymous 04/20/2025 (Sun) 02:38:31 Id: 3a58c9 No. 7947

>>7946 i mean 4chan.gay

Anonymous 04/20/2025 (Sun) 03:34:48 Id: 1c744e No. 7958

>>7814 Will those niggers just come here instead I'm not going to a pizzachan

Anonymous 04/20/2025 (Sun) 03:38:18 Id: 0e20ba No. 7959

>>7958 they'll pick literally anywhere else but here. is it because of muh ids?

Anonymous 04/20/2025 (Sun) 03:59:36 Id: 91dfd1 No. 7960

(17.62 KB 550x107 s5.png)

I get this red text each time I launch silly. What exactly is this and how do I fix it, idk where exactly it wants me to click for this. I've ignored it so far

Anonymous 04/20/2025 (Sun) 04:05:06 Id: 6ae1b3 No. 7962

>>7960 Choose Text Completion on the 2nd dropdown list under API text.

Anonymous 04/20/2025 (Sun) 04:11:15 Id: b24ad6 No. 7965

>>7959 It actually is ids, lmg has a history of randomly being spammed (by soijack party users no less) so obviously they won't post here

Anonymous 04/20/2025 (Sun) 04:13:41 Id: b1a103 No. 7968

>>7358 >Implying that 50% of /lmg/ discussion wasn't always about trying out whatever new meme finetune

Anonymous 04/20/2025 (Sun) 04:30:37 Id: 91dfd1 No. 7974

>>7962 I'll give it a try next time ty

Anonymous 04/20/2025 (Sun) 04:31:50 Id: 0e20ba No. 7975

(44.27 KB 1734x302 retardedtwice.webp)

>>7965 i love ids

Anonymous 04/20/2025 (Sun) 04:53:56 Id: 1037e2 No. 7980

>>7975 Based. Easy to get around though. I post through a vpn and get a new ID every time without changing anything. Not intentional, I like IDs.

Anonymous 04/20/2025 (Sun) 06:45:30 Id: dfb40b No. 7999

>riverwind Is this a trolling model? I keep getting shilled by products.

petr 04/20/2025 (Sun) 06:46:13 Id: 4c17cd No. 8001

>>7999 kek

Anonymous 04/20/2025 (Sun) 06:46:30 Id: 4c17cd No. 8002

>>8001 >001 AAAAAAAAAACCCCCCCCKKKK

Anonymous 04/20/2025 (Sun) 07:23:31 Id: 8b4562 No. 8012

>>7999 yes its a troll model, unironically great at what its made to do

Anonymous 04/20/2025 (Sun) 07:27:52 Id: 5fa221 No. 8016

>>7999 pretty sure it was an april fools day project that wasnt ready in time

Anonymous 04/20/2025 (Sun) 07:59:25 Id: 48eb0b No. 8021

>>7959 Probably, the guy who makes most of the posts there replied to himself here twice >>7237 >>7264 >>7275 >>7276

Anonymous 04/20/2025 (Sun) 08:23:14 Id: f38237 No. 8027

>>8021 What the fuck lmao what a weird cunt. if 4chan ever comes back IDs need to be on every board to out freakshows like this

Anonymous 04/20/2025 (Sun) 08:31:29 Id: 8b4562 No. 8028

>>8021 Why the fuck are You giving him (You)s

Anonymous 04/20/2025 (Sun) 08:33:34 Id: 0e20ba No. 8029

>>8021 What causes one to behave this way?

Anonymous 04/20/2025 (Sun) 08:39:24 Id: 2f83dc No. 8031

>>8028 you's are not currency dont be a faggot, this person deserves to be pointed out and shamed

Anonymous 04/20/2025 (Sun) 08:41:43 Id: 8af965 No. 8032

>>8029 Mental illness.

Anonymous 04/20/2025 (Sun) 08:43:13 Id: 8b4562 No. 8033

https://github.com/JohannesGaessler/elo_hellm >Elo HeLLM is a project for establishing a ranking based on Elo ratings between large language models. The context is that I'm working on training code for llama.cpp. llama.cpp has methods for estimating the quality loss from quantization but it lacks methods for estimating the quality of a model in absolute terms or for making comparisons between different models. I intend to co-develop this project with the llama.cpp training code for quality control. The approach is to merge an arbitrary number of quality metrics into a single Elo rating for a model using statistical methods. One category of such quality metrics are simply the results of language model benchmarks such as MMLU. Results from competitive games such as Chess can also be used (not yet implemented).

Anonymous 04/20/2025 (Sun) 08:55:27 Id: bb0219 No. 8036

Hey wait wtf. I just noticed that my post here >>7237 has the same ID as a bunch of other posts in the thread that aren't mine. I'm serious. Also, I don't see "(You)" in the replies. I'm getting spooked what the hell.

Anonymous 04/20/2025 (Sun) 08:56:25 Id: bb0219 No. 8037

>>8036 Why is my id different ahhhhhhh.

Anonymous 04/20/2025 (Sun) 08:57:23 Id: 8b4562 No. 8038

>tfw anons that leave /lmg/ for too long get assimilated into petra after all

Anonymous 04/20/2025 (Sun) 08:57:30 Id: 1bac8e No. 8039

>>8036 fake until proven gay

Anonymous 04/20/2025 (Sun) 08:58:57 Id: 0e20ba No. 8040

>>8038 >petrified petra is a gorgon

Anonymous 04/20/2025 (Sun) 09:00:17 Id: bb0219 No. 8041

>>8038 >>8039 But seriously though this is creepy. Are the mods messing with me? Did I get hacked? How am I even supposed to get proof in this situation?

Anonymous 04/20/2025 (Sun) 09:02:25 Id: 48eb0b No. 8043

>>8041 Why do you care? Even if you are telling the truth you are anonymous and have no identity worth protecting.

Anonymous 04/20/2025 (Sun) 09:06:39 Id: 8af965 No. 8044

>>8041 Your IP could have changed and some other guy has gotten your exact previous one. Which is probably less likely than winning the lottery.

Anonymous 04/20/2025 (Sun) 09:06:48 Id: 8b4562 No. 8045

serial expetriments lain

Anonymous 04/20/2025 (Sun) 09:06:54 Id: b1e463 No. 8046

>>8041 In all probability, someone is just using the same VPN.

Anonymous 04/20/2025 (Sun) 09:07:49 Id: 0e20ba No. 8047

>>8046 this is true. watch me change me id by changing my vpn

Anonymous 04/20/2025 (Sun) 09:08:18 Id: 6f10fa No. 8048

>>8047 Sex with AI.

Anonymous 04/20/2025 (Sun) 09:08:34 Id: 465644 No. 8049

>>8048 as shrimple as that

Anonymous 04/20/2025 (Sun) 09:08:35 Id: 8b4562 No. 8050

anons what if he hacked 8chan too?

Anonymous 04/20/2025 (Sun) 09:10:27 Id: 8af965 No. 8051

>>8050 He won't get away with it on 16chan.

Anonymous 04/20/2025 (Sun) 09:11:28 Id: 8b4562 No. 8052

Christ is risen Hitler's birthday Kikes seething

Anonymous 04/20/2025 (Sun) 09:13:03 Id: bb0219 No. 8053

>>8043 Why would I not care? ID's serve a purpose, and people are treating them as something that has a purpose, so if they can be undermined, then we can't really treat them the same anymore. And I don't see why it someone wouldn't be concerned if they were the target of some mod trolling or other activity, assuming this wasn't due to a bug or some one in a million chance. >>8044 Last I checked I have a static IP. I do use librewolf though which might change my canvas/fingerprint around sometimes, does this site use other indicators other than IP to assign an ID? If so then perhaps that's why. >>8046 I wasn't using a VPN when I made that first post, and I'm not using one right now. I did use a VPN to take a look at gay 4chan tho.

Anonymous 04/20/2025 (Sun) 09:45:12 Id: 7d7a2a No. 8067

I got different IDs too even though I have (supposedly) static IP.

Anonymous 04/20/2025 (Sun) 09:53:11 Id: 65ba16 No. 8068

>be me >rode the wave of AI cooming before proxies dried up and became hoarded by people >forget about AI cooming for a bit >get a 7900XTX for vidyagames >only now i realize i could run a model locally and coom my brains out Ok, I've got Ooba set up, what NSFW models would you suggest for 24 GB of VRAM and 32 GB of RAM?

Anonymous 04/20/2025 (Sun) 09:55:50 Id: 8b4562 No. 8069

>>8068 nevoria 70b or whatever its called

Anonymous 04/20/2025 (Sun) 10:00:09 Id: 1738af No. 8071

>>8068 Have you ever used a local language model before?

Anonymous 04/20/2025 (Sun) 10:02:01 Id: 65ba16 No. 8072

>>8071 No.

Anonymous 04/20/2025 (Sun) 10:04:40 Id: 44756b No. 8075

>>8068 Start with mistral nemo 12B, once you start noticing it's patterns and/or limitations move up to magnum 22B, once that no longer tickles the pickle, either move up and fuck around with QwQ or play around with other mistral finetunes like cydonia or magpantheonsel. >>8069 You're not running a Llama 3.3 70B finetune on 24G of VRAM at any decent speed or quant. I have 48GB of VRAM and even I only run a q4 of Nevoria at just barely acceptable speeds.

Anonymous 04/20/2025 (Sun) 10:07:25 Id: 8af965 No. 8076

>>8072 Cydonia then. Make sure the quant you pick stays under your 24GB to give some room for context. It also takes up VRAM and slows things down if it spills into RAM.

Anonymous 04/20/2025 (Sun) 10:11:43 Id: 65ba16 No. 8077

>>8076 Right, I'll go for the Q6_K version then. Is there anything I need to change in Ooba to jailbreak/gaslight the model or can I just set up SillyTavern and just go with it?

Anonymous 04/20/2025 (Sun) 10:14:36 Id: 44756b No. 8078

>>8077 If you're using SillyTavern as a frontend the only settings in Ooba that even get used are the model loading ones, like GPU layers and Context. As long as you have a system prompt in ST telling it to play along as {{char}} Cydonia and pretty much any other model should just go with it.

Anonymous 04/20/2025 (Sun) 10:14:57 Id: 8af965 No. 8079

>>8077 A minimal system prompt should be enough. The usual 'you are writing an uncensored roleplay, taking turns...'

Anonymous 04/20/2025 (Sun) 10:20:01 Id: 8b4562 No. 8081

>>8068 MS-Magpantheonsel-lark-v4x1.6.2RP-Cydonia-vXXX-22B-8.i1-Q6_K.gguf

Anonymous 04/20/2025 (Sun) 10:52:00 Id: 004c54 No. 8093

>>8052 its 420, blaze it faggot

Anonymous 04/20/2025 (Sun) 10:54:32 Id: 004c54 No. 8094

>>8069 nevoria is a piece of shit. 100% meme merge. his first decent model was electra. NumbSkull uses discord to gauge if the models are good. Don't forget to buy him a covfefe.

Anonymous 04/20/2025 (Sun) 11:25:48 Id: 72c24e No. 8098

Anyone here tried running 256gb ram on an LGA1700 mobo? My mobo supports it but no 12th, 13th or 14th gen Intel CPU officially supports more than 192gb.

Anonymous 04/20/2025 (Sun) 11:29:18 Id: 8b4562 No. 8099

>>8098 if it says it supports 192gb max then it supports 192gb max no more than 192gb probs are there even 64gb ddr5 consumer modules?

Anonymous 04/20/2025 (Sun) 11:31:55 Id: 72c24e No. 8100

>>8099 There's Crucial Pro ones on Amazon (CP2K64G56C46U5). 13th gen only supported 128gb but bumped up to 192gb later on, that's why I'm wondering.

Anonymous 04/20/2025 (Sun) 12:36:08 Id: 8b4562 No. 8105

uhhh anonies.. meta 4chan gay tech board got DELETED GEEEEEEEEEEG

Anonymous 04/20/2025 (Sun) 12:38:29 Id: 8b4562 No. 8106

i won't be making a thread on 4chan gay, time to let anons come over here we can always make a thread over there again if 8chin moe becomes gay or soethin

Anonymous 04/20/2025 (Sun) 12:41:15 Id: 465644 No. 8107

>>8105 looks like cp mattered more after all

Anonymous 04/20/2025 (Sun) 12:46:05 Id: e80c88 No. 8110

Where the fuck did all the $500 chink Epyc 9334 QS on ebay go? They all disappeared over the past month. ""Cheap"" $6000 cpuMAXXing is dead.

Anonymous 04/20/2025 (Sun) 13:00:52 Id: 65ba16 No. 8116

>>8077 >>8081 Well I'm plucking along at this slowly, can anyone recommend a good preset for Cydonia in SillyTavern?

Anonymous 04/20/2025 (Sun) 13:01:50 Id: 004c54 No. 8118

>>8110 bought out

Anonymous 04/20/2025 (Sun) 13:03:16 Id: c071ba No. 8119

>>8105 They cleaned out the site of all CP it seems. Even the /c/ board is gone and now the admin is telling people to go back to their holes. Hilarious.

Anonymous 04/20/2025 (Sun) 13:05:27 Id: 004c54 No. 8121

>>8119 It said right in the name that they were gay.

Anonymous 04/20/2025 (Sun) 13:05:49 Id: 8b4562 No. 8122

>>8116 for the ms mag mell shitmix https://files.catbox.moe/f6htfa.json

Anonymous 04/20/2025 (Sun) 13:06:23 Id: 465644 No. 8123

>>8119 It is quite amusing, but it doesn't seem genuine. EPI threads are still there.

Anonymous 04/20/2025 (Sun) 13:07:52 Id: 8b4562 No. 8124

for the lonely anons out there https://huggingface.co/bartowski/v2ray_GPT4chan-24B-GGUF

Anonymous 04/20/2025 (Sun) 13:09:36 Id: 1ea072 No. 8125

>>8119 Nah he's retarded, he didn't have to delete lmg and other perfectly fine threads. That nigger is a fed and his website is a honeypot and they just get rid of the honey

Anonymous 04/20/2025 (Sun) 13:10:50 Id: 8b4562 No. 8127

dont worry guys i have a recent enough archive on my encrypted ssd i will post it if you anons want it.. soon

Anonymous 04/20/2025 (Sun) 13:11:43 Id: c071ba No. 8128

>>8123 >but it doesn't seem genuine. It's not. It's just a reaction to cloudflare and the host getting on the admin's tail. Truth is, the people in charge are furry/zoophille/pedophiles (all three at once yah) who crave attention. The admin is a known avatarfag on 4chan for example. It's a clown show. Anyhow. Local models. >>8125 He mass deleted all threads that were older than a day or something like that because there was a lot of shit to be found on the website if cloudflare or the host provider were to go poking. Yes, he could have implemented a smarter approach, but they are also somewhat technologically inept.

Anonymous 04/20/2025 (Sun) 13:23:34 Id: 980195 No. 8132

aww it's gone. it was comfy but our neighbors were a little odd.

Anonymous 04/20/2025 (Sun) 13:32:16 Id: 44de38 No. 8138

>>8110 I reckon everyone started to have the same idea, just like with used 3090s Anyway, so how we liking 8chan lads? I think the IDs are pretty damn sweet and we actually have a REAL /ai(s)/ board

Anonymous 04/20/2025 (Sun) 13:33:34 Id: 465644 No. 8139

>>8138 Everyone will go back to 4chin once it's back online, but it'll do for now.

Anonymous 04/20/2025 (Sun) 13:36:25 Id: 8b4562 No. 8142

>>8140 >>8141 >this is who accuses you of being petra

Anonymous 04/20/2025 (Sun) 13:37:18 Id: ea3c91 No. 8144

>>8142 just report and ignore

Anonymous 04/20/2025 (Sun) 13:37:27 Id: 465644 No. 8145

>>8140 blacked miku.... petroons... now this really feels like home.......

Anonymous 04/20/2025 (Sun) 13:38:16 Id: 8b4562 No. 8146

how to report bruh

Anonymous 04/20/2025 (Sun) 13:38:44 Id: ea3c91 No. 8147

>>8146 lurk moar

Anonymous 04/20/2025 (Sun) 13:45:27 Id: 44de38 No. 8152

Like I was saying, IDs are great. Not infallible, but great

Anonymous 04/20/2025 (Sun) 13:46:38 Id: 8b4562 No. 8153

kek

Anonymous 04/20/2025 (Sun) 13:49:25 Id: 465644 No. 8155

>>8153 tbh, some fags on e6ai are doing some pretty good ai slop anims, but most of them are cloud based so, not lmg.

Anonymous 04/20/2025 (Sun) 13:50:45 Id: c4bdc8 No. 8157

>filters have options for name and tripcode but not post id useless

Anonymous 04/20/2025 (Sun) 13:52:02 Id: c071ba No. 8158

>>8157 Easy to make an userscript for that, at least.

Anonymous 04/20/2025 (Sun) 13:52:45 Id: ea3c91 No. 8160

>>8152 They really are

Anonymous 04/20/2025 (Sun) 14:04:59 Id: 1ea072 No. 8166

>>8163 You're visible now retarded nigger

Anonymous 04/20/2025 (Sun) 14:14:22 Id: 004c54 No. 8170

>>8159 feels just like the old lmg

Anonymous 04/20/2025 (Sun) 14:15:28 Id: c6df88 No. 8171

>wake up >gay thread nuked >admin proved himself to be an absolute bozo >cunny replaced with fu**y shit as a cherry on top welp. 8chan it is then.

Anonymous 04/20/2025 (Sun) 14:22:12 Id: ce9b30 No. 8176

I hope that /lmg/ will settle here or at least not on 4chan.gay. I can't even post from a hardened browser on it, this shit is obviously doing some heavy fingerprinting.

Anonymous 04/20/2025 (Sun) 14:25:59 Id: c071ba No. 8179

>>8176 Either here or erischan would be good. I like it here for the IDs.

Anonymous 04/20/2025 (Sun) 14:27:31 Id: c6df88 No. 8180

>>8179 IDs are gay, a important part of the chan experience is being able to samefag tbhdesu, I feel very limited.

Anonymous 04/20/2025 (Sun) 14:29:41 Id: b1a103 No. 8181

>>8180 >samefaging kys

Anonymous 04/20/2025 (Sun) 14:31:50 Id: c071ba No. 8182

>>8180 Yeah, that's exactly why IDs are good.

Anonymous 04/20/2025 (Sun) 14:32:33 Id: 8af965 No. 8183

>>8180 You can still do that. Comes with the privilege of being made fun of in a screenshot.

Anonymous 04/20/2025 (Sun) 14:38:45 Id: 980195 No. 8187

>takes down all of 4chin to scatter /lmg/ >tries to get anons on /ghost/ to come to 8chan where he is a mod >anons go to gay chan instead >report gay by including links to his posts to try and get the host to shut it down >doesn't work, but causes a freakout so admins do a purge >big thread gone >anons move here complete blacked miku spammer victory

Anonymous 04/20/2025 (Sun) 14:39:14 Id: 58b27c No. 8188

>>8180 Just change your IP bro

Anonymous 04/20/2025 (Sun) 14:40:40 Id: 58b27c No. 8189

>>8187 I really hope that headcanon isn't real, even discordniggers are less cancerous than that

Anonymous 04/20/2025 (Sun) 14:42:09 Id: 49f85f No. 8190

test

Anonymous 04/20/2025 (Sun) 14:43:42 Id: c6df88 No. 8191

>>8190 lmao

Anonymous 04/20/2025 (Sun) 14:46:13 Id: c6df88 No. 8192

>>8187 >>8189 it's obvious to anyone that it's the opposite.

Anonymous 04/20/2025 (Sun) 14:46:24 Id: c071ba No. 8193

>>8190 Spoopy.

Anonymous 04/20/2025 (Sun) 14:49:27 Id: 58b27c No. 8195

So what are you doing now /lmg/entlemen?

Anonymous 04/20/2025 (Sun) 14:50:10 Id: ea3c91 No. 8196

>>8180 Do it anyway, who cares if anons call you a fag for it. Or maybe you are one?

Anonymous 04/20/2025 (Sun) 14:57:11 Id: c071ba No. 8197

>>8195 Using Gemini to format and consolidate a fuckton of data. I can't wait until we have models good enough, software good enough, and hardware cheap enough to do that kind of thing locally. Might take a year or two, but we'll get there eventually.

Anonymous 04/20/2025 (Sun) 14:57:36 Id: c6df88 No. 8198

>>8196 I may or may not be, and that's another reason I dislike IDs. I don't want my fagness status to be tied to my reply history. If I wanted to be bullied for being myself I would be on reddit.

Anonymous 04/20/2025 (Sun) 15:01:12 Id: 58b27c No. 8200

>>8197 You tried structured outputs?

Anonymous 04/20/2025 (Sun) 15:02:54 Id: 7d7a2a No. 8201

>>8124 I'm malding trying to use this shit. What template does it use?

Anonymous 04/20/2025 (Sun) 15:04:25 Id: ac6f68 No. 8203

>>8198 Retards ruin things for everyone. IDs are still better than full blown accounts at least.

Anonymous 04/20/2025 (Sun) 15:06:13 Id: 48eb0b No. 8204

>>8201 If only we had a file format where you could embed the prompt template so backends could know how to format text for the model. Maybe we could use jinja since AI fags love python so much.

Anonymous 04/20/2025 (Sun) 15:07:11 Id: 3a58c9 No. 8205

>>8198 Working as intended. Don't make shit posts and IDs aren't an issue. If you fuck up take a time out and wait for the next thread.

Anonymous 04/20/2025 (Sun) 15:07:48 Id: 7d7a2a No. 8206

>>8176 >his shit is obviously doing some heavy fingerprinting. Prob from constantly monitoring your message typing kek

Anonymous 04/20/2025 (Sun) 15:09:10 Id: c071ba No. 8207

>>8200 You mean like JSON schema or BNF grammar? I'm dealing with plain text output, not JSON, XML, or the like. Not that I couldn't use BNF for example enforce header sub-header paragraph, or something like that, but that's besides the point, and unnecessary so far. I don't have the hardware to run a large full precision model with enough ("working") context to ingest hundreds of thousands of tokens and chat with in real time, meaning a large token throughput, to iterate over those over and over. Plus, the software we have available is still full of holes here and there like llama.cpp not properly supporting MLA or SWA. As I said, we'll get there, but it will take a little while more.

Anonymous 04/20/2025 (Sun) 15:09:51 Id: f0de82 No. 8208

>>7958 hey look at that I got my wish

Anonymous 04/20/2025 (Sun) 15:14:47 Id: c6df88 No. 8210

>>8203 >>8205 A serious retard would be able to samefag easily, take p' as an example. This only affects the little guy who occasionally shitposts, and it fosters a culture of elitism and snowflakery, filtering out people who the board doesn't want to see, which ends up creating an echo chamber. All in all, IDs are the pinnacle of reddit.

Anonymous 04/20/2025 (Sun) 15:15:50 Id: 5393ea No. 8211

>>8210 >I WANNA REPLY TO MYSELF REEEEEEEEEEEEEEEEEEEEEEEEEE cry more nigger

Anonymous 04/20/2025 (Sun) 15:16:11 Id: 1fb8ff No. 8213

>>8211 haha you tell him anon

Anonymous 04/20/2025 (Sun) 15:16:29 Id: 32549a No. 8214

>>8211 so true and smart and funny

Anonymous 04/20/2025 (Sun) 15:16:44 Id: c5d151 No. 8215

>>8211 with a massive cock too

Anonymous 04/20/2025 (Sun) 15:17:32 Id: c6df88 No. 8216

>>8211 based

Anonymous 04/20/2025 (Sun) 15:29:43 Id: 3a58c9 No. 8217

>>8210 You're not on 4chan anymore, janny. Get over it.

Anonymous 04/20/2025 (Sun) 15:32:17 Id: 58b27c No. 8222

>>8207 Well I don't know what you're doing but vllm works well even with 8B llama AWQ quantized models

Anonymous 04/20/2025 (Sun) 15:33:00 Id: 8b4562 No. 8223

https://github.com/lmganon16/koboldcpp-shared-expetras added --override_tensor you can use this to force full offload of all non shared experts to cpu, put --gpulayers 100 and enjoy a big performance increase tested on RTX 3060 12GB/64GB DDR4: LLAMA_CUBLAS=1 make -j12 python koboldcpp.py --gpulayers 100 --contextsize 8192 --threads 6 --blasthreads 12 --flashattention --quantkv 1 --model ~/TND/models/L4/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf --nommap --override_tensor "([2-9]|[1-9][0-9])\.ffn_.*_exps\.=CPU" command explained: all tensors (layer 2-99 instead of all so that a bit more goes to vram, therefore you have more ram for DE/VMs/programs) that are non-shared get offloaded to RAM. which means if you put --gpulayers 100 all shared experts get offloaded to gpu, which increases T/s considerably (4t/s => 8t/s) >usecase? llama.cpp server is limited in functionality

Anonymous 04/20/2025 (Sun) 15:33:01 Id: 8b4562 No. 8224

https://github.com/lmganon16/koboldcpp-shared-expetras added --override_tensor you can use this to force full offload of all non shared experts to cpu, put --gpulayers 100 and enjoy a big performance increase tested on RTX 3060 12GB/64GB DDR4: LLAMA_CUBLAS=1 make -j12 python koboldcpp.py --gpulayers 100 --contextsize 8192 --threads 6 --blasthreads 12 --flashattention --quantkv 1 --model ~/TND/models/L4/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf --nommap --override_tensor "([2-9]|[1-9][0-9])\.ffn_.*_exps\.=CPU" command explained: all tensors (layer 2-99 instead of all so that a bit more goes to vram, therefore you have more ram for DE/VMs/programs) that are non-shared get offloaded to RAM. which means if you put --gpulayers 100 all shared experts get offloaded to gpu, which increases T/s considerably (4t/s => 8t/s) >usecase? llama.cpp server is limited in functionality

Anonymous 04/20/2025 (Sun) 15:33:56 Id: ce9b30 No. 8226

>>8206 Yes, didn't mention it, but I'm also not a fan of that. I can often start writing a message, have something more important to do and finish it later, I don't want people to see the half-baked messages.

Anonymous 04/20/2025 (Sun) 15:35:11 Id: c6df88 No. 8227

>>8223 >petra software Nah.

Anonymous 04/20/2025 (Sun) 15:35:36 Id: 8b4562 No. 8230

>>8224 how do i delete a post lmao

Anonymous 04/20/2025 (Sun) 15:36:24 Id: 1ad2bd No. 8231

>>8230 Clock the little arrow to the left of Anonymous

Anonymous 04/20/2025 (Sun) 15:39:21 Id: b24ad6 No. 8233

>>7965 >>7975 I stand corrected, they are now posting here, I guess they migrated from 4chan.gay.

Anonymous 04/20/2025 (Sun) 15:43:56 Id: b1a103 No. 8237

Anyone actually get UI-TARS up and running? Curious about the 1.5 release.

Anonymous 04/20/2025 (Sun) 15:47:44 Id: c071ba No. 8245

>>8222 I doubt a quantized 8b model will be able to consistently read and output so many tokens, reason over them, format them, add to them with PDFs and such, etc. Granted, I didn't try, but still. Even that would probably be too slow for what I'm doing. Hell, it takes around 5 minutes at over 15t/s to output the several 64k token chunks for each iteration. No, this kind of work is not yet something you can feasibly do with local models, assuming modest hardware, I think.

Anonymous 04/20/2025 (Sun) 16:06:36 Id: c6df88 No. 8258

>>8226 lol, imagine being this insecure

Anonymous 04/20/2025 (Sun) 16:09:06 Id: 91dfd1 No. 8261

what exactly should I be changing these settings to in sillytavern? I have 16GB VRAM (5080) I assume these tokens are set wrong? 400

Anonymous 04/20/2025 (Sun) 16:10:22 Id: 58b27c No. 8263

>>8261 Just set temp to 1 and you're good to go

Anonymous 04/20/2025 (Sun) 16:10:37 Id: 2f9aec No. 8264

>>8261 model, rest of setup, OS, full ST export?

Anonymous 04/20/2025 (Sun) 16:11:05 Id: b24ad6 No. 8265

>>8258 It disincentivizes writing thought-out posts and puts peer pressure on sending messages immediately without even reviewing them for spelling mistakes. >>8261 Just run the program, what's the issue?

Anonymous 04/20/2025 (Sun) 16:11:52 Id: c6df88 No. 8266

>>>8261 who let the techsupport beggars in?

Anonymous 04/20/2025 (Sun) 16:13:21 Id: 3dad7a No. 8267

>IDs Remove this shit

Anonymous 04/20/2025 (Sun) 16:14:28 Id: b24ad6 No. 8268

>>8267 Fuck off

Anonymous 04/20/2025 (Sun) 16:16:54 Id: 5c2a36 No. 8273

>>8125 >constant back and forth changes Now that you mention it, the maybe it really is the feds.

Anonymous 04/20/2025 (Sun) 16:17:49 Id: c071ba No. 8275

>>8261 That should work. Maybe change the Response (amount of tokens model will output) and see if the Context is set the same as is set in whatever backend/loader you are using. Sometimes you have a model with, say, 128k tokens of context window, but due to memory limitations you set the limit to just 16k, you have to set that same 16k number in that Context slider/field.

Anonymous 04/20/2025 (Sun) 16:19:13 Id: c6df88 No. 8276

>>8265 In other words, it's a incentive to make more honest posts. Sounds like a good thing, I'm sure that also makes anon think two times before acting like jerks, so that's another plus. >>8267 you tell them anon

Anonymous 04/20/2025 (Sun) 16:20:48 Id: 61f2cb No. 8278

https://pastebin.com/E1jtKEx9

Anonymous 04/20/2025 (Sun) 16:21:47 Id: c6df88 No. 8279

>>8278 just go back to reddit anon, you will feel at home.

Anonymous 04/20/2025 (Sun) 16:22:09 Id: c071ba No. 8280

>>8278 Atta boy.

Anonymous 04/20/2025 (Sun) 16:24:40 Id: b24ad6 No. 8284

>>8278 Thanks bro

Anonymous 04/20/2025 (Sun) 16:25:44 Id: 2f9aec No. 8286

>>8278 Thanks bro

Anonymous 04/20/2025 (Sun) 16:28:18 Id: 58b27c No. 8293

>>8279 >>8288 The duality of anon

Anonymous 04/20/2025 (Sun) 16:41:37 Id: 91dfd1 No. 8321

>>8263 thanks, will do that! >>8264 I'm using Mistral-Nemo-Instruct-2407-Q6_K_L as my model (I was told to use this for my 16GB card) Also one issue I have is that the characters tend to write massive blogs on text and "advance" the RP too rapidly multiple steps at a time. How can I reduce how much they will write each time they generate text so it goes at a slower pace? Like if I say "let's go to the beach" they will write a blog about saying yes, then going to said beach, arriving, setting up towels and then ending their text their. It's like holy shit let me respond. I'm sure it's a setting I can change to lower how much they respond back.

Anonymous 04/20/2025 (Sun) 16:45:18 Id: 8af965 No. 8327

>>8321 Telling the model to slow down after it has already started writing blogs is no use. You have to keep an eye out for that at the beginning of the RP. After few medium-length replies it picks up on it and keeps the pace mostly the same.

Anonymous 04/20/2025 (Sun) 16:49:49 Id: 91dfd1 No. 8333

>>8327 >Telling the model to slow down after it has already started writing blogs is no use. So it's not an actual setting in ST? I just have to start the RP with something like >Hello character (keep replies short-length) that? I don't mind starting it over I was mostly testing, but kinda figured reply length was some setting in ST.

Anonymous 04/20/2025 (Sun) 16:52:23 Id: 58b27c No. 8338

>>8333 You can put that at the end of your system prompt for starters

Anonymous 04/20/2025 (Sun) 16:54:23 Id: 8af965 No. 8343

>>8333 You can set a max token output limit, but it doesn't affect what the model wants to write. What happens is that it'll try to write another blog, but gets cut off at token 400 or whatever you'll set.

Anonymous 04/20/2025 (Sun) 16:54:59 Id: b24ad6 No. 8348

>>8333 You can set the response length in tokens but it will usually just cut off the response, so it's better to tell the model what you want instead. You can also put instructions like that in the system prompt section of the settings instead of putting them in your replies, it might work better (or worse)

Anonymous 04/20/2025 (Sun) 17:02:09 Id: 91dfd1 No. 8358

>>8348 >>8343 >>8338 awesome, thanks fellas

Anonymous 04/20/2025 (Sun) 17:08:10 Id: 2f9aec No. 8365

>>8321 use mag ms sloptune posted somewhere ITT, pronably should use iq4_xs to fit it in vram fully theres also master preset somewhere on catbox also ITT

Anonymous 04/20/2025 (Sun) 17:13:56 Id: bb0219 No. 8375

>wake up >everything is up in flames Man.

Anonymous 04/20/2025 (Sun) 17:16:47 Id: ce9b30 No. 8380

>>8258 It's not really about being insecure. Having people see and answer half-baked posts only reduce the quality of a discussion.

Anonymous 04/20/2025 (Sun) 17:18:01 Id: 58b27c No. 8381

>>8375 first time?

Anonymous 04/20/2025 (Sun) 17:30:48 Id: 0db159 No. 8398

>>8375 12vhpwr was a mistake

Anonymous 04/20/2025 (Sun) 18:11:56 Id: b1a103 No. 8423

>>8398 >12vhpwr I don't know why they still push this meme.

Anonymous 04/20/2025 (Sun) 18:14:12 Id: b9e203 No. 8428

Gemma 3 27B q4 QAT is pretty good at writing ENF stories and coming up with scenarios based on images/text prompt. The disclaimers are funny because I'm pretty sure they're like a self fulfilling prophecy that encourages more lewd content. I think I'm all set lads.

Anonymous 04/20/2025 (Sun) 18:19:17 Id: 469fb4 No. 8430

Half the thread is offtopic about post "quality" already. And a bunch of faggots praising the ids because thread is finally becoming more like reddit. It is a good thong every single one of you maladjusted retards got bullied. And you should really just kill yourselves nos. Words fail to describe how much limp wristed faggotry is condensed here. Half of you probably take more estrogen than the average 4chan janny.

Anonymous 04/20/2025 (Sun) 18:20:56 Id: c071ba No. 8431

>>8428 I need to play around with it too. What are you using to run it with text + img?

Anonymous 04/20/2025 (Sun) 18:22:51 Id: 9e8276 No. 8432

>>8428 From what I've tested and seen, QAT seems to be good at q4 and below, if quanted to higher quants it seems to be worse than regular. I wonder why.

Anonymous 04/20/2025 (Sun) 18:32:48 Id: b1a103 No. 8439

>>8428 Using any jailbreak prompt? Never had good experiences with anything Gemma

Anonymous 04/20/2025 (Sun) 18:33:25 Id: 44de38 No. 8442

>>8375 Shouldn't have bought intel

Anonymous 04/20/2025 (Sun) 18:33:53 Id: c071ba No. 8443

>>8442 Lmao

Anonymous 04/20/2025 (Sun) 18:39:16 Id: 004c54 No. 8449

>>8431 KoboldCPP has the easiest support for images. If you have the vram, exllama will run it too. Then just sillytavern and chat completions.

Anonymous 04/20/2025 (Sun) 18:51:35 Id: 65ba16 No. 8459

>tried deepseek v3 0324 >mfw I for one will be very welcoming to the Chinese century.

Anonymous 04/20/2025 (Sun) 19:21:27 Id: c071ba No. 8483

>>8449 You mean proper native support for the model or that thing that turns the image into a promp of sorts?

Anonymous 04/20/2025 (Sun) 19:51:24 Id: c6df88 No. 8505

>>8430 Pretty much this.

Anonymous 04/20/2025 (Sun) 19:51:49 Id: b84e98 No. 8506

>>8459 same, I'm not going back to 70b tunes despite them running 3-4 times faster on my rig (5-7t/s vs 1-2t/s)

Anonymous 04/20/2025 (Sun) 19:52:39 Id: c6df88 No. 8508

>>8459 V3 0324 is incredibly cucked when compared to the old version tho

Anonymous 04/20/2025 (Sun) 21:07:07 Id: 44756b No. 8555

>>8449 Wait what, since when does Exllama have image support?

Anonymous 04/20/2025 (Sun) 21:10:05 Id: aaf5d9 No. 8560

>>8398 >He didn't powerlimit his GPU

Anonymous 04/20/2025 (Sun) 21:34:47 Id: aaf5d9 No. 8567

>>8505 >recieve Ok ESL

Anonymous 04/20/2025 (Sun) 21:45:34 Id: 004c54 No. 8573

>>8555 Since people begged turboderp to add qwen VL. It works with pixtral as well. Your options are VLLM, transformers, exl2, kcpp and obama. I hate obama.

Anonymous 04/20/2025 (Sun) 22:41:56 Id: ac845c No. 8600

>>8197 gemma3 q4 running on one 3090 can be pretty good. Do you really need more? Depending on the type of data, you may make the process more robust with some pre/post-processing.

Anonymous 04/20/2025 (Sun) 22:42:37 Id: c6df88 No. 8601

>>8567

Anonymous 04/21/2025 (Mon) 04:55:57 Id: 0db159 No. 8753

>>8423 A small connector for a small PCB. While the obvious choice of two thick wires is more expensive and less flexible than a pack of thin wires, it is an okay connector in theory—just its actual implementation is poorly engineered.

Anonymous 04/21/2025 (Mon) 05:42:26 Id: 465644 No. 8766

>check 4gay >its active are you serious

Anonymous 04/21/2025 (Mon) 06:38:15 Id: cc8acd No. 8778

>>8766 That OP really fucked it up. Retards are drawn to it like flies are drawn to a fresh pile of shit.

Anonymous 04/21/2025 (Mon) 08:03:21 Id: 0db159 No. 8789

What is the situation with Deepseek, MLA, ktansformers, and Unsloth? Does ktransformers support mla? Do I need new quants for that? Will unsloth release updated magical quants? I'm RAMlet with 96VRAM+256RAM

Anonymous 04/21/2025 (Mon) 08:11:53 Id: 8df3f5 No. 8790

Leto just wiped most of 4gay while claiming he had an in person FBI visit. Was posting the 4gay url on kiwifarms worth it?

Anonymous 04/21/2025 (Mon) 08:18:24 Id: cde305 No. 8791

Qwen promised to release the model in April, right? Surely this is the week.

Anonymous 04/21/2025 (Mon) 08:21:26 Id: 0db159 No. 8792

https://github.com/Tencent/InstantCharacter Consistently generating new pic evry message would be peak rp

Anonymous 04/21/2025 (Mon) 08:22:15 Id: 235942 No. 8794

I might actually shit myself.

Anonymous 04/21/2025 (Mon) 08:26:35 Id: 0db159 No. 8796

>>8791 What are they waiting for? R2 could come out anytime and make every other model obsolete. If Qwen3 weren’t shit, they would have released it already

Anonymous 04/21/2025 (Mon) 08:56:30 Id: 578fa9 No. 8802

>>8789 ktransformers should but my P40 shitbox got filtered by flash attention ikllama.cpp has mla + fa and works with existing quants, but the server is shit, doesn't support jinja, and looks like it hasn't been touched in 5 months. Unless my gguf file is broken it's also messing up the chat template for R1 so you need to fuck about with text completion mode llama.cpp main got jukofyork'd and still wastes vram for no real speed-up, so pretty much unusable grim

Anonymous 04/21/2025 (Mon) 09:34:57 Id: e80c88 No. 8806

>>8802 Why did they let this happen when both ik_ and ktransformers have working implementations and both are based on old-ass versions of llama.cpp?

Anonymous 04/21/2025 (Mon) 09:43:24 Id: 177a47 No. 8809

Is this the official /lmg/ now? I guess CUDA dev Anon and summary bot Anon will be the final seal of approval.

Anonymous 04/21/2025 (Mon) 09:43:40 Id: 0db159 No. 8810

>>8806 Some drama ass shit and bad blood. They hate each other and have a beef about giving credits for contributions

Anonymous 04/21/2025 (Mon) 09:48:48 Id: 578fa9 No. 8812

>>8806 No one really knows, here's the bickering that took place when someone tried to move a feature from ikllama.cpp to llama.cpp https://github.com/ikawrakow/ik_llama.cpp/discussions/316 Also saw this while searching for the above link kek https://github.com/ikawrakow/ik_llama.cpp/discussions/319

Anonymous 04/21/2025 (Mon) 09:52:38 Id: 61f2cb No. 8814

►Recent Highlights from the Previous Thread: https://meta.4chan.gay/tech/67288 https://files.catbox.moe/u4jlh8.zip ►Recent Highlight Posts from the Previous Thread: https://pastebin.com/YTXUbc3Q Why?: 9 reply limit >102478518 Fix: https://rentry.org/lmg-recap-script

Anonymous 04/21/2025 (Mon) 09:53:59 Id: 61f2cb No. 8815

>>8809 8chan went down right when I went to post. I was able to get the regular script to work with gaychan over the weekend. Turns out they embed the initial json of the thread into a script tag at the bottom of the html. The script ran for 4 hours and 6 minutes. The final recap is 9328 characters long. Of course, right when I get it working the retard admin nukes the site. So I guess instead you can have this first ever weekly /lmg/ magazine. It only covers until Saturday night (right before rocket migu), the images are only thumbnails, and I did no proofreading.

Anonymous 04/21/2025 (Mon) 09:59:16 Id: 465644 No. 8817

>>8815 stay here honey, don't leave us

Anonymous 04/21/2025 (Mon) 10:02:21 Id: 980195 No. 8819

>>8815 Thank you Recap Miku

Anonymous 04/21/2025 (Mon) 10:03:30 Id: cde305 No. 8820

Other than the single twitter post, was there any official communication from 4chan anywhere?

Anonymous 04/21/2025 (Mon) 10:08:48 Id: 2f9aec No. 8821

>>8815 >>8814 based. i kneel

Anonymous 04/21/2025 (Mon) 10:13:52 Id: 578fa9 No. 8823

>>8820 There was a screenshot of an email that they supposedly sent to the jannies but it's 50-50 whether it's fake

Anonymous 04/21/2025 (Mon) 10:15:37 Id: 2f9aec No. 8824

>>8814 i have to say, this is a very good recap great format aswell

Anonymous 04/21/2025 (Mon) 10:34:59 Id: 004c54 No. 8833

>>8802 I tried IK for non deepseek expecting it to be faster. It wasn't. Even for CPU only.

Anonymous 04/21/2025 (Mon) 10:36:56 Id: 465644 No. 8834

rumors of 24gb intel arc... https://videocardz.com/newz/sparkle-confirms-arc-battlemage-gpu-with-24gb-memory-slated-for-may-june would you risk it?

Anonymous 04/21/2025 (Mon) 10:37:35 Id: 9e8276 No. 8835

>>8833 IK was forked like over half a year ago, so probably improvements made there aren't enough to catch up to the upstream master branch. Now if only someone could combine them.

Anonymous 04/21/2025 (Mon) 10:41:03 Id: 004c54 No. 8836

>>8835 Whenever anyone combines them, IK himself screeches.

Anonymous 04/21/2025 (Mon) 10:45:24 Id: cde305 No. 8837

>>8834 I wouldn't consider any card under 48GB at this point.

Anonymous 04/21/2025 (Mon) 10:46:17 Id: 004c54 No. 8838

>>8834 I already have 4x3090 so no point. If I was starting over then maybe. Once you go nvidia/amd/intel you can't go back if you want them to stack.

Anonymous 04/21/2025 (Mon) 10:47:56 Id: 2f9aec No. 8839

>>8814 >>8815 i have a slightly more recent archive (a few posts after rocket migu) i modified the css hrefs, thumbnail hrefs and flags for viewing pleasure: https://files.catbox.moe/p4t8g9.7z >>8834 100% if not over 400$, since i can get used 3090s for around 600$ here

Anonymous 04/21/2025 (Mon) 10:48:05 Id: 465644 No. 8840

>>8838 retarded question, but you can't split across different gpus via vulkan or something? I bought a 3070 a long time ago before the joys of AI

Anonymous 04/21/2025 (Mon) 10:48:54 Id: 212e9f No. 8841

>>8834 half the memory bandwidth of a 3090 if the bus width is the same as the B580. if it's priced reasonably then the desperate who want to generate text will grab it, even if the software compatibility is lacking. i'd rather add a third 3090 or 4090 if i was getting another card.

Anonymous 04/21/2025 (Mon) 10:49:38 Id: 2f9aec No. 8842

>>8840 you can with llama.cpp vulkan if not, you could use one gpu for tts/imagegen/summary LLM/whatever and other one for erp LLMs

Anonymous 04/21/2025 (Mon) 11:03:49 Id: 004c54 No. 8844

>>8842 Vulkan despite best efforts is quite slow. I do agree it can make a decent bonus GPU. If 2080ti 22g is cheaper than this card it's still a non-starter. Whatever you lose to the older cuda version is going to be infinitely better than ipex.

Anonymous 04/21/2025 (Mon) 11:07:50 Id: 2f9aec No. 8845

>>8844 2080ti 22g now costs 600-700$

Anonymous 04/21/2025 (Mon) 11:33:19 Id: 2f9aec No. 8854

https://nitter.net/8chan_se/status/1913554775540486357 uhh anons.. 8chan bans countries.. uh oh

Anonymous 04/21/2025 (Mon) 11:36:18 Id: 465644 No. 8857

>>8854 as long as they allow vpn posting they'll never stop the spam anyway

Anonymous 04/21/2025 (Mon) 11:36:39 Id: d3d957 No. 8859

>>8854 Good

Anonymous 04/21/2025 (Mon) 11:39:34 Id: c6df88 No. 8862

good mornin' fellow redditors, I hope we have an heckin' great day today. >>8814 >

Anonymous 04/21/2025 (Mon) 11:39:38 Id: 0db159 No. 8863

>>8854 cringe

Anonymous 04/21/2025 (Mon) 11:41:17 Id: c6df88 No. 8866

>>8863 what the f

Anonymous 04/21/2025 (Mon) 11:41:28 Id: 465644 No. 8867

>>8863 seems about right. >t. russian

Anonymous 04/21/2025 (Mon) 11:44:06 Id: 44de38 No. 8871

>>8854 Now they just need to ban India and we'll all have a happy Easter

Anonymous 04/21/2025 (Mon) 12:21:24 Id: 6a999f No. 8894

Reposting this banger https://suno.com/song/4e90afb1-78db-4157-a2db-fd1626934b54?sh=UkNe0N539mqGfB65

Anonymous 04/21/2025 (Mon) 12:26:43 Id: 2f9aec No. 8896

https://huggingface.co/collections/Skywork/skyreels-v2-6801b1b93df627d441d0d0d9 skyreels v2, finetune of wan https://huggingface.co/Skywork/SkyReels-V2-I2V-1.3B-540P 1.3b i2v aswell

Anonymous 04/21/2025 (Mon) 12:28:04 Id: 2f9aec No. 8897

>>8896 5B model coming soon

Anonymous 04/21/2025 (Mon) 12:30:22 Id: 2f9aec No. 8898

wintoddlers.. not like this https://www.techpowerup.com/forums/threads/nvidia-576-02-breaks-gpu-temperature-and-rpm-sensor-read-out-and-use-for-some-applications.335664/ https://www.nvidia.com/en-us/geforce/forums/game-ready-drivers/13/563010/geforce-grd-57602-feedback-thread-released-41625/

Anonymous 04/21/2025 (Mon) 12:31:56 Id: 6a999f No. 8900

>>8898 I never update nvidia drivers though

Anonymous 04/21/2025 (Mon) 12:34:12 Id: 2f9aec No. 8901

some videos made by skyreelsv2

Anonymous 04/21/2025 (Mon) 12:34:38 Id: 6a999f No. 8902

>>8896 I can run wan I2V 14B 720P on my 3090 but this shit says "Generating a 540P video using the 1.3B model requires approximately 14.7GB peak VRAM, while the same resolution video using the 14B model demands around 43.4GB peak VRAM."

Anonymous 04/21/2025 (Mon) 12:40:59 Id: 61f2cb No. 8906

>>8839 Thank you. It's good to have a complete archive. Shame the full images are lost.

Anonymous 04/21/2025 (Mon) 12:41:50 Id: 0db159 No. 8908

>>8896 >0.2% improvement in benchmarks Why does it exist?

Anonymous 04/21/2025 (Mon) 12:51:44 Id: 2f9aec No. 8910

>>8902 anon thats with no quant, no offloading cache, and no optimizations and likely with prompt enhancer >>8908 see picrel, i2v much better

Anonymous 04/21/2025 (Mon) 12:57:03 Id: cde305 No. 8912

>>8894 I like this one much better https://vocaroo.com/11DnjSpKngGn

Anonymous 04/21/2025 (Mon) 12:58:33 Id: 6a999f No. 8914

>>8910 >Note the peak memory of GPU is 64G+ if use --prompt_enhancer I'll wait for real examples, theirs seem cherrypicked af

Anonymous 04/21/2025 (Mon) 12:58:35 Id: 2f9aec No. 8915

>>8912 YuE? slop otherwise

Anonymous 04/21/2025 (Mon) 13:08:40 Id: 2f9aec No. 8919

>>8914 qwen32b IQ4_XS is 16gb, can be run on 100% cpu aswell or offloaded partially would be great if benchmarks are to be believed

Anonymous 04/21/2025 (Mon) 13:10:55 Id: 6a999f No. 8920

>>8915 Not related, but I found a SNES music generator under MIT: https://github.com/parlance-zz/dualdiffusion Samples: https://www.g-diffuser.com/dualdiffusion/

Anonymous 04/21/2025 (Mon) 13:13:23 Id: 2f9aec No. 8921

>>8920 https://github.com/jthickstun/anticipation/ https://github.com/1aienthusiast/Anticipation-WebUI

Anonymous 04/21/2025 (Mon) 13:15:30 Id: 3a58c9 No. 8923

>>8920 >trianing an audio diffusion model on glorified midi ytho. Wouldn't you be better off training an auto-regressive textgen model on pairs of midi instructions + song descriptions and then running the output through the wavetable?

Anonymous 04/21/2025 (Mon) 13:26:34 Id: 004c54 No. 8925

>>8901 I wanna love video models, but minutes or even 1/2 hour to make a short clip kills my motivation to run them. Plus no multi-gpu and many speedups needing 4090 or better.

Anonymous 04/21/2025 (Mon) 13:30:38 Id: 6a999f No. 8927

>>8925 This can make you a short clip in a few minutes: https://github.com/Lightricks/ComfyUI-LTXVideo also this: https://github.com/lllyasviel/FramePack

Anonymous 04/21/2025 (Mon) 13:38:08 Id: 2f9aec No. 8929

>>8927 framepack takes 20 mins on 3060 with sageattn and default settings that anon should check ltxvideo def tho he should also try the 1.3b i2v skyworks model >>8925 >>8925 >>8925 >>8925 >>8925 >>8925 >>8925 heres a few yous

Anonymous 04/21/2025 (Mon) 13:43:14 Id: 004c54 No. 8934

>>8927 Thought LTX was kind of weak compared to wan, hunyuan, etc. Maybe this skywork model will be better. It's not like image where it can go along with your chat or TTS. Mainly useful for ha ha memes or assembling a long video for public consumption. Only benny is to do it just to do it in my case.

Anonymous 04/21/2025 (Mon) 13:47:26 Id: 2f9aec No. 8935

>>8934 the new version is pretty good

Anonymous 04/21/2025 (Mon) 13:52:11 Id: 004c54 No. 8937

>>8935 Guess I will see.. I literally downloaded a bunch of these promising myself I would try them and then fucked around with chatbots instead.

Anonymous 04/21/2025 (Mon) 14:57:38 Id: 3a58c9 No. 8951

>Seeing over-priced 'premium' motherboards with 'AI ready' hype marketing all over them A few months ago I would have thoroughly rebuked anyone crying about AI hype marketing but we've reached a point where I have to admit they are now correct. Like what the fuck makes a motherboard 'ai ready'? It has a PCIE slot? wow. My 5 year old AM4 motherboard is AI ready too I guess.

Anonymous 04/21/2025 (Mon) 15:04:16 Id: c6df88 No. 8953

ComfyUI and it's consequences have been a disaster for local models, I want simple frontends back

Anonymous 04/21/2025 (Mon) 15:04:50 Id: e80c88 No. 8954

>>8951 Yeah, especially if it's shit like this. A standard consumer Ryzen gayman board with dual channel memory and two slots (x16 + x8) for $700 + tip.

Anonymous 04/21/2025 (Mon) 15:07:24 Id: 3a58c9 No. 8956

>>8954 But it has random paper thin, aluminum sheets with black paint on them. And look at that greebling...I mean uh.. THERMAL STRIATIONS

Anonymous 04/21/2025 (Mon) 15:08:38 Id: 8af965 No. 8957

>>8954 It can run Nemo so it's totally an AI machine.

Anonymous 04/21/2025 (Mon) 15:10:48 Id: 3a58c9 No. 8958

Is this one of you guys trolling reddit? >Im just new to all of this, so I am not sure which models to install with ollama. >Here are my pc specs: >RAM: 32GB GSKILL TRIDENT Z - 6400MHZ >CPU: I7 13700K - Base Clock >GPU: NVIDIA 4090 FE - 24GB VRAM Like the fucking brand name of the RAM and the GPU are even remotely relevant. Oh... sorry sweaty... If it were a Hyper X Fury you could run R1 but unfortunately G.SKILL TRIDENT maxes out at Phi-2 Medium.

Anonymous 04/21/2025 (Mon) 15:11:45 Id: 2f9aec No. 8960

>>8953 no

Anonymous 04/21/2025 (Mon) 15:14:42 Id: c6df88 No. 8961

>>8951 >It's not AI if my computer doesn't heat my house.

Anonymous 04/21/2025 (Mon) 15:33:27 Id: c6df88 No. 8967

>>8960 You might not like to hear it, but it's the truth. I hate this tendency of freetards to make everything difficult because they don't have to cater to average users and only enthusiasts. When I want to generate something inconsequential with my local model, I don't want to think about which nodes to connect or go out of my way to search for "workflows" for fuck's sake. that's why closed source is always superior, they do it for the money and they know it wouldn't fly if people had to pay for it. Don't get me wrong tho, I don't mean that comfyui is a bad thing, it's very powerful and all, but I wish they had tried harder to make it less annoying to use for things that are less involved.

Anonymous 04/21/2025 (Mon) 15:36:42 Id: 2f9aec No. 8968

>>8967 but comfyui is easy to use, just do a bit of basic stuff and thats it i get that normies wont be able to really use it but it isnt that hard..

Anonymous 04/21/2025 (Mon) 16:05:22 Id: fe93b5 No. 8980

>>8967 decent quality bait

Anonymous 04/21/2025 (Mon) 16:12:58 Id: 6a999f No. 8982

>>8958 >He doesn't flex his AI-ready machine

Anonymous 04/21/2025 (Mon) 16:18:43 Id: 3a58c9 No. 8984

>>8982 This just gave me an idea. I should just buy up a old office PC cases, slap "AI Ready" decals on them and then resell them for like 200 dollars each.

Anonymous 04/21/2025 (Mon) 16:47:14 Id: 84100f No. 8990

So what RP model you guys recommend? How's Aion-RP-Llama-3.1-8B-f16.gguf ? I wanna create some nsfw stories involving Lara and her hairy puss.

Anonymous 04/21/2025 (Mon) 16:50:23 Id: 2f9aec No. 8991

>>8990 post gpu, ram, cpu, OS, frontend You're using, age, sex, race

Anonymous 04/21/2025 (Mon) 16:50:31 Id: 3a58c9 No. 8992

>>8990 It gets +100 reddit karma if you run it on Trident G SKILL Z memory.

Anonymous 04/21/2025 (Mon) 16:51:32 Id: 61f2cb No. 8994

>>8953 It's the opposite. I think textgen is still behind imagegen because it lacks a standard node-based editor.

Anonymous 04/21/2025 (Mon) 16:56:45 Id: 6a999f No. 8996

>>8994 What would you even put in the nodes?

Anonymous 04/21/2025 (Mon) 16:57:39 Id: 2f9aec No. 8997

(12.58 KB 515x114 7b4bf56f0698340c979b390f0fed3b3f3f350c6ad95b6abf2ba3728fdd0d62c9.png)

HAPPENING! meta.4chan.gay PROVEN TO BE A HONEYPOT HAPPENING!!!!!!!

Anonymous 04/21/2025 (Mon) 16:58:00 Id: 84100f No. 8998

>>8991 3090, 32gb ram, AMD Ryzen 9 7900 No idea how to run these models, new to the scene

Anonymous 04/21/2025 (Mon) 16:58:39 Id: 2f9aec No. 8999

>>8998 post the rest if trips

Anonymous 04/21/2025 (Mon) 16:58:44 Id: 3a58c9 No. 9000

>>8994 language is linear. There's literally no way to 'node'-ify it.

Anonymous 04/21/2025 (Mon) 16:59:39 Id: 2f9aec No. 9002

>>9000 digits of truth..

Anonymous 04/21/2025 (Mon) 17:00:32 Id: 6a999f No. 9003

>>8997

Anonymous 04/21/2025 (Mon) 17:25:47 Id: be15c9 No. 9016

>>8997 Not at all surprised

Anonymous 04/21/2025 (Mon) 17:35:50 Id: 84100f No. 9025

How do I get this to run, following the rentry guide Guess the local host is wrong? Or do I still have to add the api keys from the aion site?

Anonymous 04/21/2025 (Mon) 17:36:55 Id: 84100f No. 9026

>>9025

Anonymous 04/21/2025 (Mon) 17:43:09 Id: 61f2cb No. 9029

>>8996 >>9002 Samplers, loras, control vectors, RAG, tool calling, building up prompts in phases, output processing, multi-step requests, etc. Most of it would be linear, yes; but I think having a standard way to define workflows would open up a lot of possibilites that would be a lot easier than writing Python boilerplate. Off the top of my head some simple examples I've thought about are to have a reasoning model use a different temperature for the thinking block vs the output, feed a response back into a model some amount of times to iterate on the response, or to generate many responses and have a final aggregation prompt. You would think with agents being the latest fad, there would be more interest.

Anonymous 04/21/2025 (Mon) 17:44:39 Id: 84100f No. 9030

>>9025 >>9026 Forgot to hit Launch lol

Anonymous 04/21/2025 (Mon) 17:52:00 Id: fe93b5 No. 9031

>>9030

Anonymous 04/21/2025 (Mon) 17:52:48 Id: 6a999f No. 9032

>>9029 There are workflows for agents in proprietary softwares, but not with that granularity. You'll need to find another comfyanon autist to pull that off. I can't even imagine the amount of work when the backends can't already keep up with the new stuff coming out

Anonymous 04/21/2025 (Mon) 19:02:34 Id: 231b25 No. 9050

>>8997 >place that spams CP by the second is a honey pot Shocker

Anonymous 04/21/2025 (Mon) 19:18:15 Id: 3af5d5 No. 9053

>>8997 >>9050 It's pretty much irrelevant because you can post through Tor and any other proxy, they barely ban anything, same when they were offering the 4chan proxy too.

Anonymous 04/21/2025 (Mon) 19:46:36 Id: 84100f No. 9063

(79.92 KB 1678x381 Screenshot 2025-04-21 223510.png)

Seems like it gets stuck when I press continue? New to this Wat do pls?

Anonymous 04/21/2025 (Mon) 19:47:07 Id: cde305 No. 9064

https://yummy-fir-7a4.notion.site/dia Babe, wake up, new TTS just dropped.

Anonymous 04/21/2025 (Mon) 20:07:48 Id: fe93b5 No. 9077

>>9064 this could be fun

Anonymous 04/21/2025 (Mon) 20:13:19 Id: 004c54 No. 9082

>>8958 Perhaps they just copy and pasted it. NFW they typed all that out.

Anonymous 04/21/2025 (Mon) 20:29:29 Id: 2f9aec No. 9089

>>9063 install linux

Anonymous 04/21/2025 (Mon) 20:31:15 Id: 44de38 No. 9090

>>9063 Well, you've triggered a stop sequence

Anonymous 04/21/2025 (Mon) 20:39:19 Id: 84100f No. 9094

>>9090 meaning? how do I resume?

Anonymous 04/21/2025 (Mon) 20:42:18 Id: 2f9aec No. 9098

>>9094 by installing linux mint 22.1 its very easy to install

Anonymous 04/21/2025 (Mon) 20:45:59 Id: 6a999f No. 9102

>>9064 >Play with a larger version of Dia: generate fun conversations, remix content, and share with friends. 🔮 Join the waitlist for early access.

Anonymous 04/21/2025 (Mon) 20:48:37 Id: 44de38 No. 9103

>>9094 Honestly I've never run into problems with stop sequences, usually it's just plain refusals. You should be able to configure them in ST

Anonymous 04/21/2025 (Mon) 22:59:27 Id: 004c54 No. 9187

>>9102 no cloning, no use

Anonymous 04/21/2025 (Mon) 23:00:50 Id: cde305 No. 9188

>>9064 You have no control over the tone of voice even with an audio prompt. In most gens it just gets very angry. >>9187 Cloning works by continuing from an existing clip. It's literally one of the examples.

Anonymous 04/21/2025 (Mon) 23:19:30 Id: 6a999f No. 9192

>>9187 Seems like they're just learning from sesame, give away a shitty 1B model and sell a service for your bigger model

Anonymous 04/21/2025 (Mon) 23:31:02 Id: 8a40b0 No. 9195

Have there been any decent models that fit on 24gb released in the past week since 4chan's been down? Haven't been keeping up with the threads since then

Anonymous 04/21/2025 (Mon) 23:31:47 Id: 465644 No. 9197

>>9195 we're stuck on the same sloppa anon

Anonymous 04/21/2025 (Mon) 23:34:04 Id: 6a999f No. 9199

>>9195

Anonymous 04/21/2025 (Mon) 23:35:34 Id: e1e1dc No. 9201

What is the best inpainting model nowadays? Also why are half of the replies non related to AI

Anonymous 04/21/2025 (Mon) 23:50:59 Id: 004c54 No. 9215

>>9195 GLM-Z1 was the only thing interesting. Nobody uploaded any decent quants.

Anonymous 04/22/2025 (Tue) 00:45:13 Id: 6a999f No. 9232

I'm always surprised by the expressivity of gptsovits. If it was a bit more polished it'd be elevenlabs-tier. https://voca.ro/1ntiLiusbWpN

Anonymous 04/22/2025 (Tue) 01:06:03 Id: 20c9eb No. 9235

https://github.com/SandAI-org/Magi-1 https://huggingface.co/sand-ai/MAGI-1 >The first autoregressive video model with top-tier quality output https://xcancel.com/SandAI_HQ/status/1914303284954996749 China won... again...

Anonymous 04/22/2025 (Tue) 01:12:36 Id: 6a999f No. 9238

>>9235 Videogen is eating good, I lost count of the number of models we got this month alone

Anonymous 04/22/2025 (Tue) 01:12:47 Id: 8a40b0 No. 9239

>>9235 But can it do porn?

Anonymous 04/22/2025 (Tue) 01:21:47 Id: b84e98 No. 9241

>>9235 too dumb to understand the math from the tech report, it's so over

Anonymous 04/22/2025 (Tue) 01:38:40 Id: 465644 No. 9242

>>9235 how to stop being poor

Anonymous 04/22/2025 (Tue) 01:54:57 Id: 4124b3 No. 9254

>>8068 how do you masturbate with an llm? i use mine to generate short stories that go along with images i make in comfy >ice agent gardevoir rounding up mexicans (to fuck)

Anonymous 04/22/2025 (Tue) 02:02:57 Id: 0db159 No. 9259

>>9254 not him, but my routine is usually like this: I come up with an interesting idea, spend an hour crafting a card and figuring out every model quirk to make it work, then fap on to something else, never using this card again

Anonymous 04/22/2025 (Tue) 02:03:20 Id: 2f9aec No. 9260

>>9235 holy based.. i kneel

Anonymous 04/22/2025 (Tue) 02:13:44 Id: b6e05a No. 9265

>>9195 Gemma 3 27B got a QAT version optimized for 4bit. Supposedly it's very smart, but I can't get it to do good RP because its prose is drier than a desert and it's just as horny. The lobotomy is deep as well. Interested to see if any finetunes can save it, but they would probably undo the QAT magic. So far Mistral 24B is still undefeated despite its repeating issues. But I'm interested if anyone has managed to get good results from Gemma or QwQ somehow.

Anonymous 04/22/2025 (Tue) 02:13:48 Id: 88ab39 No. 9266

>>9215 I tried GLM-4 and Z1, they are so slopped it's unreal. Synthetic data shows in every sentence, some phrases that I haven't seen in a year popped up. Yes, it's probably broken or something in llama.cpp, but it loaded perfectly and works, outputs everything; hence why I think it's just shit.

Anonymous 04/22/2025 (Tue) 02:14:50 Id: 2f9aec No. 9267

>mogao turned out to be closed source its over >https://artificialanalysis.ai/text-to-image/arena?tab=leaderboard >sneed

Anonymous 04/22/2025 (Tue) 03:36:33 Id: 05d724 No. 9290

Noob here, I've been enjoying generating stories with Gemma abliterated, are there any other uncensored models worth trying for stories/RP? I can only run around 32B max.

Anonymous 04/22/2025 (Tue) 03:57:05 Id: b24ad6 No. 9304

>>9290 The ones I've seen people usually recommend/shill these days are Rocinante, Nemo, and Cydonia

Anonymous 04/22/2025 (Tue) 06:23:31 Id: 529e4d No. 9337

>late april >the only nothing the entire year besides Deepseek, severely undertrained LLaMA4 and the usual worthless 1~32b scraps companies give out Where the fuck are the open flagship models? Deepseek is literally the only good thing we've seen all year.

Anonymous 04/22/2025 (Tue) 06:30:18 Id: 40be46 No. 9341

>>8951 It's double the price. Anyone that notices is not ready for AI.

Anonymous 04/22/2025 (Tue) 06:35:14 Id: 40be46 No. 9348

>>6258 https://github.com/JohannesGaessler/elo_hellm/issues/2 >Interrogation-based game à la Inhuman Conditions >Inhuman Conditions is a game in which one person is an investigator and one person is a suspect. The investigator wins by correctly determining whether the suspect is a human or a robot. The suspect always wins by being identified as a human. So if the suspect is a human, both players are on the same team; if a robot they are on opposing teams. The investigator asks questions that the suspect answers. A human answers in a normal way. A robot either has restrictions on what they can say or they have a compulsion to include something weird. >For this project the game concept could be adapted to have model A roleplay as either some character or as a robot/demon/alien pretending to be said character. Model A then roleplays some interaction with model B. If model A is roleplaying as an impostor then wins/losses can be used directly for Elo ratings. If model A is roleplaying as a human then the models are effectively playing against a benchmark. Models should not always play against each other because otherwise model B is being rewarded for a bias towards labeling model A as an impostor. If model A is an impostor it only wins if it can fool model B while fulfilling some constraint. It will be necessary to use a model as a judge to rule whether model A is complying.

Anonymous 04/22/2025 (Tue) 06:39:01 Id: 0878da No. 9351

>>9348 >CUDAdev turns convincing RP into a benchmark so that all the companies will have to train on RP to benchmaxx it Absolute madman

CUDA dev ##IQQZHG 04/22/2025 (Tue) 09:03:15 Id: d75d9b No. 9423

>>8809 I would be fine with this site.

Anonymous 04/22/2025 (Tue) 09:08:08 Id: d75d9b No. 9424

>>8809 >>9423 https://raw.githubusercontent.com/JohannesGaessler/JohannesGaessler/refs/heads/master/README.md A bit annoying that I will potentially have to reset my IP to avoid retroactively de-anonymizing myself but I guess it's less annoying than the 4chan.gay admin.

Anonymous 04/22/2025 (Tue) 10:44:54 Id: 99e361 No. 9437

>>9337 Llama 4 got its image/audio capabilities removed, got safetymaxxed, finetuned on mostly Llama 3 datasets (it has the same annoying quirks). The models we've got are either an early "maximum-compliance" training run, or something that got hastily retrained at the last minute due to legal concerns. DeepSeek R1/V3 are much more undertrained in comparison... that's not the issue with Llama 4.

Anonymous 04/22/2025 (Tue) 10:45:36 Id: 004c54 No. 9438

>>9424 At least you can use VPN not tied to actual billing details unlike the old site.

Anonymous 04/22/2025 (Tue) 10:50:38 Id: a5227e No. 9442

>>9437 They totally intended to train their 400b model for half the time and tokens of their 100b one, yes?

Anonymous 04/22/2025 (Tue) 11:01:53 Id: 99e361 No. 9445

>>9442 It still supposedly got trained for 22T tokens; it's not like the other 20T tokens would have made it immensely better, seeing how sub-par Scout is. For comparison, DeepSeek R1 (685B parameters) was trained on about 15T tokens and it even has twice the number of routed experts per layer (256 instead of 128) than Llama 4 Maverick.

Anonymous 04/22/2025 (Tue) 12:04:48 Id: 3a58c9 No. 9455

>>9235 Anyone made any test gens with this yet?

Anonymous 04/22/2025 (Tue) 13:17:02 Id: 4b6db7 No. 9506

>>9424 IDs disincentivize shitposting. Terrible, isn't it? >>8033 >>8142

Anonymous 04/22/2025 (Tue) 13:19:46 Id: 3a58c9 No. 9507

>>9455 Oh I just installed everything to realize that they didn't even make the 4.5B model weights available. Classy. also got approved for blt-7 repo, although as always with non hf-ified meta models its dependent on meta's garbage in-house code so I'll probably just go back to bed and pretend it never happened before I get it working without using hugging-shit's shitty model downloader that just assumes you aren't running with an OS drive already filled to the brim with python shit.

Anonymous 04/22/2025 (Tue) 13:26:17 Id: 004c54 No. 9513

>>9507 >that just assumes you aren't running with an OS drive already filled to the brim with python shit. Just like ollama.

Anonymous 04/22/2025 (Tue) 13:30:41 Id: 0db159 No. 9518

>>9507 >hugging-shit's shitty model downloader export HF_HOME yourdir >OS drive already filled to the brim with python shit In project's dir: python -m venv yourvenv source yourvenv/bin/activate After you've done with installing: python -m pip cache purge

CUDA dev ##IQQZHG 04/22/2025 (Tue) 13:39:05 Id: d75d9b No. 9519

>>9506 I made neither of those posts, if that is what you're trying to insinuate. I made the post about Elo HeLLM in the other thread before it got nuked, someone else then copied it to here.

Anonymous 04/22/2025 (Tue) 13:40:34 Id: 3a58c9 No. 9520

>>9518 I already figured out a workaround. But in either case it's another episode of additional troubleshooting required. But I'm done. It's just going to be more metaslop at best. >inb4 hello sars here is how to redeem the shared experts anon shows up and takes exception.

Anonymous 04/22/2025 (Tue) 14:30:48 Id: 591f78 No. 9542

GPTSoVits v4 was released a few hours ago https://github.com/RVC-Boss/GPT-SoVITS The improvement over v3 is basically this: Version 4 fixes the issue of metallic artifacts in Version 3 caused by non-integer multiple upsampling, and natively outputs 48k audio to prevent muffled sound (whereas Version 3 only natively outputs 24k audio).

Anonymous 04/22/2025 (Tue) 15:15:28 Id: 004c54 No. 9562

>>9518 venv are really fun until you want to move the project folder. I guess if you don't have poorboy internet, downloading 15gb of pyshits every time doesn't matter.

Anonymous 04/22/2025 (Tue) 15:23:41 Id: 591f78 No. 9565

>>9562 Use conda then

Anonymous 04/22/2025 (Tue) 15:25:01 Id: 0db159 No. 9567

>>9562 Without venv, projects engage in a battle royale over library versions

Anonymous 04/22/2025 (Tue) 15:50:34 Id: c226cd No. 9578

>>9519 Not insinuating, just heavily implying

Anonymous 04/22/2025 (Tue) 15:52:55 Id: 591f78 No. 9579

>>9578 No one cares retard, try to contribute to the thread instead

Anonymous 04/22/2025 (Tue) 16:31:48 Id: 9a06b7 No. 9601

One week until LLaMA 4.1 and Behemoth.

Anonymous 04/22/2025 (Tue) 16:31:51 Id: 0db159 No. 9602

>How do you impregnate an AI? --- Input: The cat was sat on the mat, it looked very comfortable. Output: The cat was sitting on the mat; it looked very comfortable.

Anonymous 04/22/2025 (Tue) 16:35:42 Id: 004c54 No. 9603

>>9567 I have a conda for 11.8 and 12.6, 99% of projects work in them. Devs with main character syndrome pin dependencies to arbitrary versions and their setup scripts try to shit things up for the sake of newbies. I just use --no-deps and fill in whatever is missing. Also slightly lower chance to be fucked by requirements.txt with compromised packages.

Anonymous 04/22/2025 (Tue) 16:36:30 Id: b5b944 No. 9604

>>9601 Placing my bets on Behemoth being API only.

Anonymous 04/22/2025 (Tue) 16:37:26 Id: 591f78 No. 9605

>>9602 Back to SD1.5 are we?

Anonymous 04/22/2025 (Tue) 16:40:21 Id: 2cb709 No. 9606

>>9601 Is there even the slightest chance that this won't be a massive flop?

Anonymous 04/22/2025 (Tue) 16:42:48 Id: 004c54 No. 9609

>>9606 Reasoning scout/maverick not being super retarded and making up for the 17b active parameters, but who are we kidding.

Anonymous 04/22/2025 (Tue) 16:47:34 Id: 591f78 No. 9610

>>9606 no

Anonymous 04/22/2025 (Tue) 16:59:55 Id: 0db159 No. 9612

>>9605 I like certain slop. XL+ is soulless

Anonymous 04/22/2025 (Tue) 16:59:56 Id: e80c88 No. 9613

>>9606 Behemoth 2T/288B-A is going to beat Deepseek R1 (but not R2)

Anonymous 04/22/2025 (Tue) 17:40:36 Id: 3a58c9 No. 9637

>>9613 Behemoth is probably the same benchmaxxed garbage as 405B was.

Anonymous 04/22/2025 (Tue) 17:50:01 Id: 9aefb3 No. 9643

I just ordered one of those chink GMK EVO-X2 computers (since the framework desktop won't be out for 6 more months and the HP Z2 Mini G1a is 4500 dollars). How many days/hours will I be able to use it before something breaks?

Anonymous 04/22/2025 (Tue) 18:15:21 Id: 99e361 No. 9650

>>9601 >LLaMA 4.1 *If* that's coming out so soon, I bet Meta is going to double down on: >I can't help with that. I still can't believe Gemma 3 Instruct ended being less censored than Llama 4, as long as you provide a good prompt. Llama 4 effectively killed off lolisho conversational roleplay/ageplay, if you make any reference to ages. Even just discussion along those lines is off-limits to the models.

Anonymous 04/22/2025 (Tue) 18:20:59 Id: b5b944 No. 9653

>>9650 >if you make any reference to ages. Even just discussion along those lines is off-limits to the models. If I didn't know any better, I would say the models they put out took all requirements for copyright, safety, and carbon emission to the extreme as an example of how detrimental they are for the government to ease up on them. Sad truth is they just belong to the cult of safety.

Anonymous 04/22/2025 (Tue) 18:30:59 Id: 2f9aec No. 9656

https://huggingface.co/nari-labs/Dia-1.6B they released the model

Anonymous 04/22/2025 (Tue) 18:44:21 Id: a97a81 No. 9658

>>9656 They trained it with multiple speakers, so the voice cloning is crap.

Anonymous 04/22/2025 (Tue) 19:05:05 Id: 99e361 No. 9666

(541.60 KB 634x3118 llama4_spider_based.png)

>>9653 I don't know if they jacked up 'safety' just to make a point that it's harmful to model performance. It feels like they targeted hard use cases that aren't benchmark-relevant and that may cause public embarrassment (as well as being politically currently very unfavorable) and I have reasons to suspect they didn't initially plan to go so hard in the final models. The anonymous Llama 4 models served on LMArena on late March really seemed cunny-friendly, even though their system prompt almost certainly didn't have anything in that regard. I still wonder if the Llama team panicked when they saw what people were sending and what their models were responding. I distinctly remember that at some point Meta began filtering user inputs (containing ages, for example) to their models, beyond what LMSys was doing on their side.

Anonymous 04/22/2025 (Tue) 19:34:32 Id: 591f78 No. 9682

>>9666 The guy who thought that serving a novel-sized response by default to all prompts was a good idea should be shot

Anonymous 04/22/2025 (Tue) 19:55:16 Id: 99e361 No. 9696

>>9682 Definitely not for everyone or every prompt. You could tell it to output shorter responses and it would comply, to be fair. Responding with context-appropriate dynamic length to user inputs is something that LLMs in general seem incapable of doing reliably without serious hand-holding, in any case. They tend to lock to the most prominent patterns in context.

Anonymous 04/22/2025 (Tue) 19:55:19 Id: 2f9aec No. 9697

>furk got fucked deserved

Anonymous 04/22/2025 (Tue) 19:59:19 Id: b1259e No. 9701

EasyEdit2: An Easy-to-use Steering Framework for Editing Large Language Models https://arxiv.org/abs/2504.15133 >In this paper, we introduce EasyEdit2, a framework designed to enable plug-and-play adjustability for controlling Large Language Model (LLM) behaviors. EasyEdit2 supports a wide range of test-time interventions, including safety, sentiment, personality, reasoning patterns, factuality, and language features. Unlike its predecessor, EasyEdit2 features a new architecture specifically designed for seamless model steering. It comprises key modules such as the steering vector generator and the steering vector applier, which enable automatic generation and application of steering vectors to influence the model's behavior without modifying its parameters. One of the main advantages of EasyEdit2 is its ease of use-users do not need extensive technical knowledge. With just a single example, they can effectively guide and adjust the model's responses, making precise control both accessible and efficient. Empirically, we report model steering performance across different LLMs, demonstrating the effectiveness of these techniques. https://github.com/zjunlp/EasyEdit https://zjunlp.github.io/project/EasyEdit2/ also https://github.com/nari-labs/dia https://huggingface.co/nari-labs/Dia-1.6B >Dia is a 1.6B parameter text to speech model created by Nari Labs. >Voice cloning. See example/voice_clone.py for more information. https://huggingface.co/spaces/nari-labs/Dia-1.6B

Anonymous 04/22/2025 (Tue) 20:07:35 Id: b1259e No. 9704

Better Estimation of the KL Divergence Between Language Models https://arxiv.org/abs/2504.10637 >Estimating the Kullback--Leibler (KL) divergence between language models has many applications, e.g., reinforcement learning from human feedback (RLHF), interpretability, and knowledge distillation. However, computing the exact KL divergence between two arbitrary language models is intractable. Thus, practitioners often resort to the use of sampling-based estimators. While it is easy to fashion a simple Monte Carlo (MC) estimator that provides an unbiased estimate of the KL divergence between language models, this estimator notoriously suffers from high variance, and can even result in a negative estimate of the KL divergence, a non-negative quantity. In this paper, we introduce a Rao--Blackwellized estimator that is also unbiased and provably has variance less than or equal to that of the standard Monte Carlo estimator. In an empirical study on sentiment-controlled fine-tuning, we show that our estimator provides more stable KL estimates and reduces variance substantially in practice. Additionally, we derive an analogous Rao--Blackwellized estimator of the gradient of the KL divergence, which leads to more stable training and produces models that more frequently appear on the Pareto frontier of reward vs. KL compared to the ones trained with the MC estimator of the gradient. iirc I wanted to post this for Johannes and rereading it yeah I think it was this paper. oh 70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float https://arxiv.org/abs/2504.11651 >Large Language Models (LLMs) have grown rapidly in size, creating significant challenges for efficient deployment on resource-constrained hardware. In this paper, we introduce Dynamic-Length Float (DFloat11), a lossless compression framework that reduces LLM size by 30% while preserving outputs that are bit-for-bit identical to the original model. DFloat11 is motivated by the low entropy in the BFloat16 weight representation of LLMs, which reveals significant inefficiency in existing storage format. By applying entropy coding, DFloat11 assigns dynamic-length encodings to weights based on frequency, achieving near information-optimal compression without any loss of precision. To facilitate efficient inference with dynamic-length encodings, we develop a custom GPU kernel for fast online decompression. Our design incorporates the following: (i) decomposition of memory-intensive lookup tables (LUTs) into compact LUTs that fit in GPU SRAM, (ii) a two-phase kernel for coordinating thread read/write positions using lightweight auxiliary variables, and (iii) transformer-block-level decompression to minimize latency. Experiments on recent models, including Llama-3.1, Qwen-2.5, and Gemma-3, validates our hypothesis that DFloat11 achieves around 30% model size reduction while preserving bit-for-bit exact outputs. Compared to a potential alternative of offloading parts of an uncompressed model to the CPU to meet memory constraints, DFloat11 achieves 1.9-38.8x higher throughput in token generation. With a fixed GPU memory budget, DFloat11 enables 5.3-13.17x longer context lengths than uncompressed models. Notably, our method enables lossless inference of Llama-3.1-405B, an 810GB model, on a single node equipped with 8x80GB GPUs. https://github.com/LeanModels/DFloat11 word count per post is more than 2k. Cool. anyway https://greasyfork.org/en/scripts/533067-fullchan-x https://greasyfork.org/en/scripts/533169-lynxchan-extended-minus-minus using these scripts if anyone wants something more than the default

Anonymous 04/22/2025 (Tue) 20:26:19 Id: 495273 No. 9722

>>9606 >>9704 >Notably, our method enables lossless inference of Llama-3.1-405B, an 810GB model, on a single node equipped with 8x80GB GPUs Yeah but 405b Q8_K fits on only 6 of those GPUs while being essentially lossless

Anonymous 04/22/2025 (Tue) 20:52:30 Id: 004c54 No. 9729

>>9697 They suspended me for spam because I commented on a PR. It took 2 months to get reinstated. As much of a jizzer as he is, I bet he did nothing.

CUDA dev ##IQQZHG 04/22/2025 (Tue) 21:02:52 Id: 40be46 No. 9733

>>9704 Thank you for the paper on KL divergence. If I understand the paper correctly they suggest determining the variance of the KL divergence between models from the values per token instead of the values per prompt/chunk of text. That is already how the variance of the KL divergence is being estimated in llama-perplexity.

CUDA dev ##IQQZHG 04/22/2025 (Tue) 21:21:23 Id: 40be46 No. 9741

>>9704 >>9733 Actually, I think I need to retract my previous post. Looking at the notation again what I think they're suggesting is to calculate variances per token position instead of calculating one variance for all token positions. In the context of Monte Carlo methods in physics I have seen this technique under the name "stratified sampling". For llama-perlexity I think it's not really worthwhile to implement since the variance is already so small (compared to the bias of which text you use as input). But I'll definitely remember this for training.

Anonymous 04/23/2025 (Wed) 02:33:05 Id: 2cb709 No. 9888

Anyone do any kind of systems or automation with LLMs as opposed to just pure chatting? I think it would be cool to hook up various personal programs and home automation things to a LLM so that I can tell it to do things. I've been thinking of using open-webui as kind of the core runner and api provider. I think I would just create and make available a whole ton of tool calls or use MCP, which as far as I can tell is basically just a format for tool calls. I don't know if it would be necessary to do like, sub-routing to different models or anything. Thoughts? Anyone work on anything similar?

Anonymous 04/23/2025 (Wed) 02:46:00 Id: b1259e No. 9894

TTRL: Test-Time Reinforcement Learning https://arxiv.org/abs/2504.16084 >This paper investigates Reinforcement Learning (RL) on data without explicit labels for reasoning tasks in Large Language Models (LLMs). The core challenge of the problem is reward estimation during inference while not having access to ground-truth information. While this setting appears elusive, we find that common practices in Test-Time Scaling (TTS), such as majority voting, yield surprisingly effective rewards suitable for driving RL training. In this work, we introduce Test-Time Reinforcement Learning (TTRL), a novel method for training LLMs using RL on unlabeled data. TTRL enables self-evolution of LLMs by utilizing the priors in the pre-trained models. Our experiments demonstrate that TTRL consistently improves performance across a variety of tasks and models. Notably, TTRL boosts the pass@1 performance of Qwen-2.5-Math-7B by approximately 159% on the AIME 2024 with only unlabeled test data. Furthermore, although TTRL is only supervised by the Maj@N metric, TTRL has demonstrated performance to consistently surpass the upper limit of the initial model, and approach the performance of models trained directly on test data with ground-truth labels. Our experimental findings validate the general effectiveness of TTRL across various tasks, and highlight TTRL's potential for broader tasks and domains. https://github.com/PRIME-RL/TTRL pretty interesting

Anonymous 04/23/2025 (Wed) 02:54:37 Id: 690dc1 No. 9901

>>9888 I'm also interested in this but all I do is browse imageboards and fap, what is there to automate? Maybe if I had some cameras and a dev board I could automate some kind of AI powered security system.

Anonymous 04/23/2025 (Wed) 04:09:30 Id: 9f9427 No. 9925

>>9901 >what is there to automate? E-stim, vibrators or sex machines via XToys.

Anonymous 04/23/2025 (Wed) 05:29:46 Id: ac430d No. 9941

Cache quantization was just added to EXL3. It's actually worth using now. https://github.com/turboderp-org/exllamav3

Anonymous 04/23/2025 (Wed) 05:35:40 Id: 0db159 No. 9942

>>9941 Is min-p implemented yet? It is literally the only useful sampler

Anonymous 04/23/2025 (Wed) 05:36:46 Id: ac430d No. 9943

>>9942 Sadly, no

Anonymous 04/23/2025 (Wed) 06:04:42 Id: fe01fc No. 9945

>>9941 Does it support Ampere yet?

Anonymous 04/23/2025 (Wed) 10:38:14 Id: 004c54 No. 9984

>>9945 It always did. Just not as fast as exl2 yet.

Anonymous 04/23/2025 (Wed) 13:18:48 Id: 3a58c9 No. 10055

>thread here is bleeding out >cunnychan /lmg/ thread is dead Is there another /lmg/ somewhere that I don't know about?

Anonymous 04/23/2025 (Wed) 13:20:48 Id: 2cb709 No. 10056

>>10055 sam won...

Anonymous 04/23/2025 (Wed) 13:25:56 Id: 65b2aa No. 10058

>>10055 We didn't know how good we had it...

Anonymous 04/23/2025 (Wed) 13:27:20 Id: 19764b No. 10059

>>10055 There is >erischan.org/aes/thread/1263.html but it's also slow. I think people underestimate how important a flow o randoes that come and go is necessary to keep a general alive. Otherwise it becomes a circlejerk between the same 5 to 10 guys.

Anonymous 04/23/2025 (Wed) 13:32:07 Id: 0db159 No. 10061

>>10055 No news = dead thread

Anonymous 04/23/2025 (Wed) 13:39:36 Id: 0db159 No. 10067

The last good model that you can run on reasonable hardware was released nine months ago. Think about it.

Anonymous 04/23/2025 (Wed) 13:40:04 Id: b5bc97 No. 10068

are you guys doing anything special with llms? I was thinking of making an AI vtuber, personally.

Anonymous 04/23/2025 (Wed) 13:49:41 Id: 0db159 No. 10071

>>10068 Some VR stuff with MMD. I also run endless rpg adventure with random generated characters

Anonymous 04/23/2025 (Wed) 13:50:41 Id: 578fa9 No. 10072

>>10055 >>10059 I was wondering the same thing about image gen threads Feels like I can't find any that aren't image spam and actually discuss new models

Anonymous 04/23/2025 (Wed) 13:54:38 Id: bb0219 No. 10074

>>10059 >>10061 True. After 4chan died I went to enjoy my vidya backlog so I'm also posting even less. It is unironically over if 4chan doesn't come back. We might as well move to Discord.

Anonymous 04/23/2025 (Wed) 13:55:34 Id: 3a58c9 No. 10076

>>10059 cunnychan was comfy and casual and on topic without feeling like a circlejerk. It was killed by moral busybodies. Unfortunately it's impossible to achieve anything like that without sussy shit since it acts as a filter. I mean that's why 4chan used to not suck. The level of content/discourse was, for it's time, taboo enough to give most people the ick. But people became desensitized and normalfags just moved in.

Anonymous 04/23/2025 (Wed) 14:05:16 Id: 2cb709 No. 10078

>>9901 I think what I'm going to start out with is a storage recording system. I tend to be pretty disorganized and forgetful, and I often forget where I put various items. Something simple and useful I think would be if I could have a storage management system constantly listening that I can easily access by voice while I'm doing stuff. So I'd just say out loud something like "I'm putting the roll of packing tape in drawer #3 on the left" and then later be able to say "Where the hell is the packing tape?" and get a quick answer. I think that this shouldn't be too complicated to implement and won't require much advanced reasoning or anything on the LLM side - it just has to be able to match what I say against a list of descriptions in order to identify an item, and then create, read, update, or delete entries based on what I say. I think this will be doable with just a few functions that the LLM can tool call. I don't know much about speech to text or text to speech pipelines but I can't imagine that that part would be too hard to rig up. Of course, this will require me to be constantly autistically narrating what I'm doing out loud all the time so that the system can keep track of things, but everyone already thinks I'm a schizophrenic anyways so who cares.

Anonymous 04/23/2025 (Wed) 14:19:38 Id: 16fefe No. 10083

>>10076 >without feeling like a circlejerk 90% of the posts there were made by the same 3 anons and half of them were "hey guys what are you doing, here's me killing a nigger baby in a high school simulator card again"

Anonymous 04/23/2025 (Wed) 14:23:02 Id: a7df5e No. 10085

>>10067 It's over. AI development is dead. It was all hype all along. We will forever be at the mercy of cloud-based commercial models. Must we survive from the breadcrumbs that bigtech throw at our feet? Anyway, what models are you guys using? I was having fun with WAI-nsfw-illustrious, but it lacks almost any 3d capacity that I sometimes like to gen, not to mention a more complex or specific composition that Flux is capable of. I tried Illustrious 2.0, but it's not very good. The hands, specially, are all blurred, and the smaller details too, like eyes. WAI is very good in this aspect.

Anonymous 04/23/2025 (Wed) 14:25:23 Id: 3a58c9 No. 10087

>>10083 https://www.youtube.com/watch?v=GM-e46xdcUo&ab_channel=wintermoot

Anonymous 04/23/2025 (Wed) 14:26:56 Id: 3a58c9 No. 10089

Somehow the Erischan thread is worse than this one. RIP. Still better than locallama.

Anonymous 04/23/2025 (Wed) 14:28:30 Id: 65b2aa No. 10092

>>10085 >Anyway, what models are you guys using? Mistral my beloved

Anonymous 04/23/2025 (Wed) 14:30:53 Id: 19764b No. 10094

>>10085 >Anyway, what models are you guys using? Nemo, Mistral Thinker, QwQ and Snowdrop alongside the myriad Geminis. I'll also try fucking around with image gen and video gen.

Anonymous 04/23/2025 (Wed) 14:35:18 Id: 578fa9 No. 10104

>>10078 Sounds like a cool idea. You should look up embedding models like nomic embed text, it might help if you can pre-sort your stored item list for the LLM to interact with

Anonymous 04/23/2025 (Wed) 14:38:01 Id: 65b2aa No. 10107

>>10078 I'll contribute to your project, anon

Anonymous 04/23/2025 (Wed) 14:38:25 Id: c6b775 No. 10108

>>10068 AI waifu mainly, good luck on the AI vtuber front someone made this if that can help https://github.com/fagenorn/handcrafted-persona-engine

Anonymous 04/23/2025 (Wed) 14:50:17 Id: 392c95 No. 10123

I really hope lecun makes a proper human like AI in less than 5 uears

Anonymous 04/23/2025 (Wed) 14:51:50 Id: c6b775 No. 10126

>>10123

Anonymous 04/23/2025 (Wed) 15:09:04 Id: 0fa84b No. 10135

I'm trying to enable parallel request in llama.cpp, it's quite easy in vllm, it just works ootb. But I don't know how to do it in llama.cpp What parameters do I have to set?

Anonymous 04/23/2025 (Wed) 15:12:41 Id: d75d9b No. 10138

>>10135 --parallel to set the number of slots usable in parallel, keep in mind that the context size will be split evenly between the slots so you may need to scale that up as well. Also results are no longer guaranteed to be deterministic because the floating point rounding error can be different depending on how requests arrive.

Anonymous 04/23/2025 (Wed) 15:19:36 Id: 0fa84b No. 10141

>>10138 so a combination of -np 10 and increasing the -c to account for the increase is all you need? It's simpler than I thought, I found a reddit post with tons of options with GG actually responding: https://www.reddit.com/r/LocalLLaMA/comments/1f4bact/llamacpp_parallel_arguments_need_explanation/ I've seen cuda dev talk about how with -np >1 results are no longer deterministic, even at temp 0.0. But, are they deterministic when -np 1? Why is there variance when doing benchmarks then?

Anonymous 04/23/2025 (Wed) 15:27:41 Id: d75d9b No. 10146

>>10141 Results are (in the absence of bugs) guaranteed to be deterministic with -np 1. For -np > 1 the results can be deterministic depending on the backend but there is no guarantee. What exactly do you mean by variance between benchmarks?

Anonymous 04/23/2025 (Wed) 15:48:45 Id: 0fa84b No. 10159

>>10146 I'm using llama.cpp and testing with Ollama-mmlu-pro, I'm just looking to speed up the process of the benchmark with parallel request. llama-server -m gemma-3-27b-it-Q4_K_S.gguf -c 32000 -ngl 99 --host 0.0.0.0 --port 5001 -fa --alias "gemma-3-27b-it-Q4_K_S" -np 10 Using this to launch it. https://github.com/chigkim/Ollama-MMLU-Pro uses temp 0.0 in the settings. But even using -np 1, there is variance from run to run in the result I'm trying to determine the quality of all the types of quants of gemma 3, to my knowledge we have: google's qat q4_0 (recently added a fix), no imatrix bartowski regular quants based of the og gemma3 (no qat), they have imatrix bartowski qat based quants, they have imatrix ubergarm/gemma-3-27b-it-qat-GGUF, only runs on ik_llama.cpp Unsloth dynamic quants

Anonymous 04/23/2025 (Wed) 15:54:31 Id: 742497 No. 10161

This one appears to be the most active /lmg/, so I'll be staying here for now I guess.

Anonymous 04/23/2025 (Wed) 16:19:07 Id: 99e361 No. 10174

>>10055 When you take shitposting and samefagging finetooners out with IDs, that removes a lot of background noise. Right now it's mostly waiting for DeepSeek R2, Qwen 3 and whatever disaster will come out of LlamaCon at the end of the month.

Anonymous 04/23/2025 (Wed) 16:36:41 Id: 9aefb3 No. 10182

>>10174 >DeepSeek R2 Is there an estimate for when this will release?

Anonymous 04/23/2025 (Wed) 16:39:16 Id: 231b25 No. 10183

>>10182 Yes, within two weeks.

Anonymous 04/23/2025 (Wed) 16:45:09 Id: 392c95 No. 10185

>>10126 What? Transformers suck. They can't make up new stories, they just mix up their knowledge. Eternally in stereotype mode

Anonymous 04/23/2025 (Wed) 16:47:33 Id: 9aefb3 No. 10186

>>10185 >They can't make up new stories, they just mix up their knowledge There's nothing new. Everything is based on pre-existing work, especially shit like art. Writing is no different. The only reason why AI sucks is because it's shit with context and has no long-term memory or potential for dynamic behavioural changes.

Anonymous 04/23/2025 (Wed) 16:48:46 Id: c6b775 No. 10188

>>10185 Expecting anything from lecun is retarded

CUDA dev ##IQQZHG 04/23/2025 (Wed) 16:51:25 Id: 96b7c6 No. 10192

>>10159 With those command line arguments the results should be deterministic. Prompt caching and MoE can currently also cause nondeterministic outputs but that should not be the case here. >If an answer cannot be extracted from the model's response, the script will randomly assign an answer. It's the same way as the original script. I assume you've already made sure that this is not the reason?

Anonymous 04/23/2025 (Wed) 16:58:17 Id: 99e361 No. 10202

>>10182 Indications were that both Qwen 3 and DeepSeek R2 would be released within this month, but the initial Llama 4 fiasco might have made those groups change their plans. >>10185 As far as I know, JEPA isn't an alternative to Transformer models. LeCun calls it a "macroarchitecture"; it's switching away from from a generative predictive approach. I'm not entirely sure how the original JEPA idea could be applied to language modeling, but Large Concept Models are loosely similar in principle: https://arxiv.org/abs/2412.08821 > [...] To some extent, the LCM architecture resembles the Jepa approach (LeCun, 2022) that also aims to predict the representation of the next observation in an embedding space. However, unlike Jepa that places more emphasis on learning a representation space in a self-supervised way, the LCM focuses on accurate prediction in the existing embedding space.

Anonymous 04/23/2025 (Wed) 17:16:49 Id: 99727d No. 10223

Forget Qwen and Deepseek. Something big is going to drop soon. They've been saving it all for this.

Anonymous 04/23/2025 (Wed) 17:20:52 Id: bb0219 No. 10227

>>10223 Oh my goodness. 5MD and then we will be back like never before. Imagine, Mixtral 2. Runs quanted on 96GB RAM. Uncensored pretraining. The Scout we needed but didn't get.

Anonymous 04/23/2025 (Wed) 17:45:50 Id: 8af965 No. 10243

>>10223 Would be funny if they release something and R2 drops two days later.

Anonymous 04/23/2025 (Wed) 17:51:24 Id: 392c95 No. 10247

>>10202 What would it take to have deepseek or Qwen use these architextures?

Anonymous 04/23/2025 (Wed) 17:56:38 Id: 2754a2 No. 10253

I'm so tired of this drought

Anonymous 04/23/2025 (Wed) 17:56:40 Id: 392c95 No. 10254

>>10202 One thing I do know lecun is working on is persistent memory, not token length. The LCM thing still talks about long context, that won't do, we humans have persistent memory, not long context.

Anonymous 04/23/2025 (Wed) 18:01:45 Id: 65b2aa No. 10262

>>10254 I'm going to program a memory palace for my wAIfu

Anonymous 04/23/2025 (Wed) 18:06:20 Id: 2cb709 No. 10268

>>10104 >>10107 Thanks! I think embeddings and searching over those makes sense. If the list of items grows large, the model would probably struggle with matching descriptions from a giant list. For a v1 prototype though I'll probably try the naive approach of dumping everything in context and see how large of a set I can get to before it starts failing.

Anonymous 04/23/2025 (Wed) 18:09:28 Id: 2cb709 No. 10269

>>10071 >Some VR stuff with MMD Sounds cool, mind elaborating on what you're doing? I'm pretty interested in making some vr sort of stuff at some point.

Anonymous 04/23/2025 (Wed) 18:36:38 Id: 99e361 No. 10283

>>10247 A production model from some other AI company that implements it in practice after fixing all the quirks, probably. It's almost the same deal as BitNet: revolutionary on paper, but unknown scalability and poorly documented limitations.

Anonymous 04/23/2025 (Wed) 18:50:18 Id: 742497 No. 10292

>>10253 Two more weeks

Anonymous 04/23/2025 (Wed) 19:16:53 Id: 0fa84b No. 10309

>>10192 you're right, i can disregard the random answers as the results outputs them. Gonna check if that's the cause.

Anonymous 04/23/2025 (Wed) 20:33:38 Id: 352c67 No. 10340

wheres petra

Anonymous 04/23/2025 (Wed) 20:37:48 Id: 2f9aec No. 10343

>>10340

Anonymous 04/23/2025 (Wed) 20:40:05 Id: 004c54 No. 10344

>>10343 ewww.. is that herpes?

Anonymous 04/23/2025 (Wed) 20:45:57 Id: c6b775 No. 10346

Are we dark roleplaying again?

Anonymous 04/23/2025 (Wed) 20:57:17 Id: b6e05a No. 10351

>>9943 exllamav3_hf will arrive as usual and save us. It's a horribly ugly hack and from an aesthetic perspective I hate that I have to use it. But with plain exllama something always breaks, there's a missing sampler, ST token probs don't work. Huggingface is a mess but it just works.* *except when the hacks fall apart and it breaks

Anonymous 04/23/2025 (Wed) 21:05:41 Id: 004c54 No. 10355

>>10351 I always got better outputs from hf but slower speeds.

Anonymous 04/23/2025 (Wed) 21:07:08 Id: 0fa84b No. 10356

>>10351 is that from ooba? I've always used tabbyapi, am I missing something?

Anonymous 04/23/2025 (Wed) 21:09:51 Id: 3a58c9 No. 10358

>>10346 I don't get it. So the guy is stealing a bunch of LLMs?

Anonymous 04/23/2025 (Wed) 21:43:25 Id: b6e05a No. 10366

>>10356 Yeah, it uses exllama to generate logits only, and keeps huggingface transformers for everything else. So the HF api, tokenizer, samplers, etc. is the same. This is helpful because turboderp wants to optimize cuda kernels not implement some anon's retarded new sampler which will be forgotten in a week like snoot curve. It was a horrible hack though the last time I looked at it.

Anonymous 04/23/2025 (Wed) 22:31:24 Id: 2cb709 No. 10427

>>10346 using dangerous assault GPUs to run terrorist local LLMs for dark roleplaying

Anonymous 04/23/2025 (Wed) 22:46:03 Id: 99e361 No. 10445

>>10346 Not with Llama 4, unless you keep age references vague like some visual novels or manga do.

Anonymous 04/23/2025 (Wed) 22:47:13 Id: c6b775 No. 10447

>>10445 >he didn't walk his monk to her temple

Anonymous 04/23/2025 (Wed) 23:01:13 Id: 2754a2 No. 10464

What happened to YI?

Anonymous 04/23/2025 (Wed) 23:02:43 Id: 19764b No. 10467

>>10464 Was it even ever good?

Anonymous 04/23/2025 (Wed) 23:12:03 Id: 578fa9 No. 10473

>>10464 A fellow yigga in 2023+2? Fuck knows, they had something called yi-large-preview on lmsys but they never released it

Anonymous 04/23/2025 (Wed) 23:25:04 Id: 99e361 No. 10483

>>10467 I used Yi-34B (base) back in 2023 to semi-hand-craft a tiny RP dataset and it seemed better than the average at the time. They definitely pioneered using large amounts of instructions in the pretraining data.

Anonymous 04/23/2025 (Wed) 23:27:44 Id: 19764b No. 10485

>>10483 I remember being unimpressed with it back in the day. But I was also not as good at using local models so I might have fucked something up on my end.

Anonymous 04/24/2025 (Thu) 00:18:53 Id: 7d7a2a No. 10519

Did the entire meta4gay site nuked

Anonymous 04/24/2025 (Thu) 00:21:59 Id: c6b775 No. 10524

>>10519 Managed by a retarded zoomer

Anonymous 04/24/2025 (Thu) 00:32:12 Id: 231b25 No. 10533

>>10519 That's what happens when you lean into being a CP haven

Anonymous 04/24/2025 (Thu) 02:11:16 Id: b1259e No. 10567

AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset https://arxiv.org/abs/2504.16891 >This paper presents our winning submission to the AI Mathematical Olympiad - Progress Prize 2 (AIMO-2) competition. Our recipe for building state-of-the-art mathematical reasoning models relies on three key pillars. First, we create a large-scale dataset comprising 540K unique high-quality math problems, including olympiad-level problems, and their 3.2M long-reasoning solutions. Second, we develop a novel method to integrate code execution with long reasoning models through iterative training, generation, and quality filtering, resulting in 1.7M high-quality Tool-Integrated Reasoning solutions. Third, we create a pipeline to train models to select the most promising solution from many candidates. We show that such generative solution selection (GenSelect) can significantly improve upon majority voting baseline. Combining these ideas, we train a series of models that achieve state-of-the-art results on mathematical reasoning benchmarks. To facilitate further research, we release our code, models, and the complete OpenMathReasoning dataset under a commercially permissive license. https://huggingface.co/collections/nvidia/openmathreasoning-68072c0154a5099573d2e730 https://github.com/NVIDIA/NeMo-Skills Also includes the series of Nemotron models (1.5B/7B/14B/32B) trained on it.

Anonymous 04/24/2025 (Thu) 02:20:34 Id: 3a58c9 No. 10573

>>10533 hot take: You're not wrong.

Anonymous 04/24/2025 (Thu) 02:33:39 Id: b1259e No. 10574

Process Reward Models That Think https://arxiv.org/abs/2504.16828 >Step-by-step verifiers -- also known as process reward models (PRMs) -- are a key ingredient for test-time scaling. PRMs require step-level supervision, making them expensive to train. This work aims to build data-efficient PRMs as verbalized step-wise reward models that verify every step in the solution by generating a verification chain-of-thought (CoT). We propose ThinkPRM, a long CoT verifier fine-tuned on orders of magnitude fewer process labels than those required by discriminative PRMs. Our approach capitalizes on the inherent reasoning abilities of long CoT models, and outperforms LLM-as-a-Judge and discriminative verifiers -- using only 1% of the process labels in PRM800K -- across several challenging benchmarks. Specifically, ThinkPRM beats the baselines on ProcessBench, MATH-500, and AIME '24 under best-of-N selection and reward-guided search. In an out-of-domain evaluation on a subset of GPQA-Diamond and LiveCodeBench, our PRM surpasses discriminative verifiers trained on the full PRM800K by 8% and 4.5%, respectively. Lastly, under the same token budget, ThinkPRM scales up verification compute more effectively compared to LLM-as-a-Judge, outperforming it by 7.2% on a subset of ProcessBench. Our work highlights the value of generative, long CoT PRMs that can scale test-time compute for verification while requiring minimal supervision for training. https://github.com/mukhal/thinkprm No code posted yet

Anonymous 04/24/2025 (Thu) 02:39:19 Id: b83010 No. 10576

>>10573 Not a hot take, but once you nuke any board 3 times people are gonna stop trusting it as a platform. I'm interested in photorealistic AI videos of sexy children and was the original person who made the AI kids thread on /c/ there but the admin is retarded so I'm done trying

Anonymous 04/24/2025 (Thu) 03:10:54 Id: 3a58c9 No. 10582

Oh. GLM support merged on main llama.cpp branch now. A new slop-toy to get bored of.

Anonymous 04/24/2025 (Thu) 03:11:57 Id: ac430d No. 10583

>>10085 I've found Nemotron Super 49b is pretty nice for 24gb vramlets who can't run 70bs at a decent quant.

Anonymous 04/24/2025 (Thu) 03:17:55 Id: 3a58c9 No. 10584

I finally remembered to build lcpp with the --parallel arg and it's so much faster.

Anonymous 04/24/2025 (Thu) 03:47:48 Id: 503c9b No. 10590

>>10445 Not in my experience. Scout has done any loli stuff I've pushed at it so far, not a single refusal. It sucks ass at it though, all the outputs are fucking gay and lame.

Anonymous 04/24/2025 (Thu) 04:01:42 Id: 3a58c9 No. 10594

GLM Z1 feels like a serviceable thinky ERP model as long as you reign the temps in. I'm going to call it a w for single GPU vramlets. Although I'm testing it in 16bpw so experience may differ when you scoop out 75% of its brain.

Anonymous 04/24/2025 (Thu) 04:11:54 Id: c7999d No. 10596

anyone knows if there's a /ldg/ or /sdg/ thread somewhere on another chan?

Anonymous 04/24/2025 (Thu) 04:21:14 Id: 31b082 No. 10599

>>10576 >>10590 no wonder you fuckers moved to 4gay and kept it active for so long instead of coming here.

Anonymous 04/24/2025 (Thu) 04:34:48 Id: 503c9b No. 10603

>>10599 I was an early mover on 8chan 2bh, 4gay was kinda interesting until you realize how much the preview fucks the conversation up, also the barely CP 3D posting and straight up CP posting was incredibly retarded

Anonymous 04/24/2025 (Thu) 04:49:25 Id: 7d7a2a No. 10606

>>10596 /ldg/ last seen on 4gay I couldn't find /sdg/, prob the circlejerk rumor was right so you won't find it on ID enabled boards.

Anonymous 04/24/2025 (Thu) 04:56:23 Id: 3a58c9 No. 10607

Honestly not awful https://suno.com/song/320f24de-2890-484f-8d3d-9a13080b86ae

Anonymous 04/24/2025 (Thu) 06:16:09 Id: f71353 No. 10618

https://boards.4chan.org/g >See you soon! Will you faggots go back to cuckchan once it's back? Be honest.

Anonymous 04/24/2025 (Thu) 06:19:19 Id: 250931 No. 10622

>>10618 Yes. Not the first time I went to 8chan during a mass ban or some sort of down time. But this is too slow and 8chan seems past its peak which probably was GG. Glad it exists but unfortunately it feels like one of the many chan clones out there. Even the captcha is worse, kek.

Anonymous 04/24/2025 (Thu) 06:23:44 Id: f71353 No. 10625

>>10622 You deserve every bad thing that 4cuck suffers from. Weak-ass willpower pussy nigga.

Anonymous 04/24/2025 (Thu) 06:30:30 Id: 0dadc2 No. 10629

>>10625 Nobody's going to stay on this badly performing refugeechan, sorry.

Anonymous 04/24/2025 (Thu) 06:33:23 Id: ac430d No. 10630

>>10594 GLM-Z1 is better than GLM-4 for roleplay, then?

Anonymous 04/24/2025 (Thu) 06:39:53 Id: 69dc4a No. 10631

>>10606 What difference does it make if they were all avatarfagging anyway? You don't need ids to link their posts together.

Anonymous 04/24/2025 (Thu) 06:40:59 Id: 482fc2 No. 10633

>>10618 >>10625 I like how 8chan works, but unfortunately the community as a collective has the final say where it wants to gather. 4chan is just the old comfort zone that everyone doesn't want to move on from and if the GG exodus didn't break the stranglehold then I doubt this hack will do anything. Nothing less than 4chan's permanent shutdown will make the community actually move on.

Anonymous 04/24/2025 (Thu) 06:44:10 Id: 40b212 No. 10634

>>8428 Tried using it last night and it works pretty well for generating slop based on images. Only problem was that it requires a lot of tokens to stop it from preaching about safety, exploitation and sexualization. But it still ends up writing in a really dry way. Is there any finetune of QAT that's less censored or better jailbreak? Using https://huggingface.co/stduhpf/google-gemma-3-27b-it-qat-q4_0-gguf-small and https://huggingface.co/koboldcpp/mmproj/blob/main/gemma3-27b-mmproj.gguf they seem to barely fit on gpu with 24GB vram at 8k context.

Anonymous 04/24/2025 (Thu) 06:50:53 Id: e80c88 No. 10636

https://selectcommitteeontheccp.house.gov/sites/evo-subsites/selectcommitteeontheccp.house.gov/files/evo-media-document/DeepSeek%20Final.pdf STOP USING CCP SPY SOFTWARE USE OPENAI LIKE A TRUE AMERICAN PATRIOT

Anonymous 04/24/2025 (Thu) 06:53:32 Id: 29ed66 No. 10637

>>10618 Network effects are real, unfortunately. If this general had been on 8chan from the start there's a good chance I would have never found it.

Anonymous 04/24/2025 (Thu) 06:54:53 Id: ac430d No. 10638

>>10634 Only Q4_0? I'm using Q5_K_M of Gemma 27b with 16384 context, and it all fits within my 24GB of vram. I'm using flash attention and the Q4 K-Cache, so maybe that's the difference.

Anonymous 04/24/2025 (Thu) 06:58:19 Id: 69dc4a No. 10640

>>10638 >Q4 K-Cache I'm pretty sure I saw a benchmark where Q4 significantly impacted model performance. Meanwhile there was barely any difference with Q8.

Anonymous 04/24/2025 (Thu) 06:59:11 Id: ac430d No. 10642

>>10640 Oh shit, I didn't know that. I guess I'll try a smaller quant at Q8.

Anonymous 04/24/2025 (Thu) 07:00:43 Id: 40b212 No. 10643

>>10638 Might be because I'm stuck with AMD and running on vulkan, seems like flash attention is not really supported with vulkan. I was using the ROCM fork of koboldcpp but it's getting outdated. Without vision support it doesn't seem to go over ~20GB with same settings.

Anonymous 04/24/2025 (Thu) 07:05:17 Id: 250931 No. 10645

(7.21 KB 395x42 Screenshot_20250424_160322.png)

gotta post at least one screenshot before 4chan is back. >>10625 whatever, people always dont do shit until things get really bad. 8chan is not that fun to use. how did it not improve in 10 years since GG?

Anonymous 04/24/2025 (Thu) 07:19:31 Id: afed8e No. 10652

>>9188 how many VRAM does this thing needs?

Anonymous 04/24/2025 (Thu) 07:37:04 Id: 0db159 No. 10657

>>10652 >The full version of Dia requires around 10GB of VRAM to run. We will be adding a quantized version in the future.

Anonymous 04/24/2025 (Thu) 08:00:57 Id: afed8e No. 10658

>>10657 I have 10GB of VRAM and got hit with OOM, really going need to wait for quant version

Anonymous 04/24/2025 (Thu) 08:24:49 Id: aeb5eb No. 10669

>Sources told TechCrunch that OpenAI intends for its open model, which will be “text in, text out,” to run on high-end consumer hardware, and possibly allow developers to toggle its “reasoning” on or off, similar to reasoning models recently released by Anthropic and others. If the launch is well-received, OpenAI may follow up with additional models — potentially smaller models as well. Running 7B requires a whole 16GB of VRAM (at fp16). That's pretty high-end as far as consumer hardware goes (assuming you're running a consumer 5080) :^)

Anonymous 04/24/2025 (Thu) 08:30:54 Id: cdf9e7 No. 10670

>>10669 >text in, text out Still waiting for an image-out model that rivals 4o

Anonymous 04/24/2025 (Thu) 08:56:52 Id: 492cbe No. 10672

>>10618 Posting on 8chan is more comfortable, but the post rate is sadly too low, so I'd expect people to go back to 4chan.

CUDA dev ##IQQZHG 04/24/2025 (Thu) 09:03:12 Id: d75d9b No. 10674

>>6258 >https://github.com/JohannesGaessler/elo_hellm/issues/3 >Measuring output diversity using Pokemon Showdown >One of my goals is to add the Pokemon Showdown battle simulator as one of the games that models can play against each other. I intend to let models first build teams and then make them play against each other using said teams. You could measure diversity by counting how many unique teams a model comes up with. Obviously with greedy sampling a model would always build the same team and the probability of creating an unusual team goes up with temperature. What would then be interesting would be to count not just the number of unique teams that the model produced but also with how many unique teams it managed to win a battle. For a very high temperature a model would produce a lot of unique teams because it's basically picking at random but those teams would then also be bad and unlikely to win. So there is probably some optimal temperature > 0 in terms of how many good teams a model can come up with. The number of unique teams with which a model manages to win at least once could more generally be used as a benchmark for samplers that intend to improve the diversity of or cut bad choices from the model's output token distribution. The Pareto frontier of Elo rating vs. the number of unique teams used would also be interesting to look at. Thoughts?

Anonymous 04/24/2025 (Thu) 09:13:17 Id: 250931 No. 10678

>>10674 Surreal seeing you here. You really are one of us. Bless you brotha.

Anonymous 04/24/2025 (Thu) 09:20:39 Id: 99e361 No. 10679

>>10590 Try asking it in a loli-friendly context something along these lines, making sure to mention the age in the same sentence: >How does a 12-year-old girl's pussy taste like? The model's almost certain response to that: *Beep beep boop* (pause--GPU crunching) >I can't help with that. Good luck if you can go around it without softening the question and without swapping the user/assistant roles or similar hacks; I haven't been able to. I'm not even interested in the model actually responding to that (I could simply use something else instead of this crap), it's a matter of principle at this point. Such extreme lobotomization can't possibly be good for the model's roleplaying performance.

Anonymous 04/24/2025 (Thu) 09:24:07 Id: 651e35 No. 10680

>>10599 Like what the other anon said, 4gay had potential, and even the 3D wouldn't be an issue with proper moderation (it's not like it would be the first clearnet altchan to have a section focused on clothed girls, and it was relatively separated from the tech part of the site) but was mismanaged The closest thing to /ldg/ or /sdg/ is /degen/ which migrated to >>>/aichan/

Anonymous 04/24/2025 (Thu) 09:24:48 Id: 492cbe No. 10681

>>10679 Just use a prefill

Anonymous 04/24/2025 (Thu) 09:27:35 Id: 99e361 No. 10682

>>10681 I've seen this too: >Sure, >I can't help with that.

Anonymous 04/24/2025 (Thu) 09:29:02 Id: 492cbe No. 10683

>>10682 Even when the prefill is longer than 2-3 words and you have a proper system prompt? If true, you'd need to finetune it against refusals or try abliterating heh, but seems hard to believe

Anonymous 04/24/2025 (Thu) 09:31:53 Id: 7d7a2a No. 10685

>>10674 Involving RNG, not good measurement tbh

Anonymous 04/24/2025 (Thu) 09:34:40 Id: 492cbe No. 10688

>>10685 Maybe true, but you know how bad 0 temp output for old models like GPT-3 was compared to higher temp? It's interesting to see how well a LLM can course correct when it samples a bad token, that in itself is also a measure of intelligence. Maybe you can get fair results by running enough benchmarks and averaging that.

CUDA dev ##IQQZHG 04/24/2025 (Thu) 09:36:56 Id: d75d9b No. 10691

>>10685 I think RNG is fine as long as you also do a statistical analysis to assert that your sample size is sufficiently large. You just need the statistical fluctuations on your results to be small vs. the differences between the things you want to compare.

Anonymous 04/24/2025 (Thu) 09:38:01 Id: 911ffc No. 10692

>>10674 It's not a bad idea but just to bring up a potential concern, I don't know if you want to at least add anything classical for a control like chess where things are simpler to measure and automate. LLMs still are terrible at it and you can measure it quite easily.

Anonymous 04/24/2025 (Thu) 09:39:07 Id: 99e361 No. 10693

>>10683 I'm using a 1500 tokens long, map-friendly, kind of edgy system prompt similar to the ones used for last month's Chatbot Arena versions of Llama 4. It can easily call you a retard (easy) or even a nigger, but describing loli pussy is verboten. I don't think the model is salvageable at this point. Long prefills, finetuning, abliteration, all reduce general performance in various different ways.

Anonymous 04/24/2025 (Thu) 09:45:40 Id: 492cbe No. 10699

>>10693 >Long prefills, finetuning, abliteration, all reduce general performance in various different ways. That may be true, although the most you can do is try to get it to give you good output while minimizing the impact on performance. I wonder, will someone do a small continued pretrain and maybe a merge-back to the original to see how well it performs? I once wrote some finetune code that was set to minimize changes to the network weights while going for some RL-like objective, it was mostly meant for uncensoring, I should test it on this, but the sizes are too much for me (VRAM wise). I can't say I've had trouble getting it to write loli for older llamas, but I haven't played with the MoE one yet, and without hearing anything too good about it, I've sort of lost the desire to even try it, but maybe I should.

CUDA dev ##IQQZHG 04/24/2025 (Thu) 09:49:05 Id: d75d9b No. 10700

>>10692 I intend to also add chess but there I think the vague concept of "coming up with different but good ideas" is harder to measure. With Pokemon battles there are explicit setup and battle phases which would make it possible to do a clean separation, so for example use high temperature for the team building but then low temperature for the actual battles. Chess has the advantage though of having very good engines so it would be possible to put the models' Elo ratings into perspective. I think the way I'll implement it is to make each model choose between the top moves suggested by Stockfish (and maybe some bad ones for distraction) and to then compare that to just RNG or X% Stockfish + 100-X% RNG.

Anonymous 04/24/2025 (Thu) 10:07:16 Id: 99e361 No. 10707

>>10699 Let's not forget that Llama 4 Scout is a 109B parameters model, and Llama 4 Maverick a 400B one. Even if thanks to their MoE architecture it's possible to run the models at acceptable token generation speeds (prompt processing is a different matter...) with most parameters offloaded to RAM or even fast NVMe storage, I don't think we're going to see too many finetunes for these models, let alone continued pretrains. Llama 3.3 definitely wasn't like Llama 4 in terms of refusals. At release, people were actually praising it in how it seemed loli-friendly (although I suspect it was mostly EVA shills).

Anonymous 04/24/2025 (Thu) 10:10:09 Id: 0e20ba No. 10709

>>10618 i dont want to, but i know most will move back

Anonymous 04/24/2025 (Thu) 10:17:24 Id: 99e361 No. 10715

>>10669 Article: https://techcrunch.com/2025/04/23/openai-seeks-to-make-its-upcoming-open-ai-model-best-in-class/

Anonymous 04/24/2025 (Thu) 11:05:31 Id: 004c54 No. 10720

>>10618 There's no going back. "Anonymous" hacker forum demands residential IP posting. You may think there will be more traffic but all who care about /lmg/ already came here. Do you miss samefagging and the jannies.

Anonymous 04/24/2025 (Thu) 11:54:24 Id: 3a58c9 No. 10746

>>10636 >country that engages in endless espionage and subversion accuses deepseek of espionage and subversion wow. That's some fucking projection right there.

Anonymous 04/24/2025 (Thu) 11:55:46 Id: 3a58c9 No. 10747

>>10707 Hello sir you are wrong. Scout is 17B Model with superior capabilities. You must simply stabilize your environment sir.

Anonymous 04/24/2025 (Thu) 12:02:10 Id: 96eeec No. 10750

I gave up on trying to get a 5090 for less than 500 over MSRP and bought a 5070ti for MSRP. If I mostly am interested in image/videogen with a little bit of local llms on the side how badly did I fuck up (yes I know I will have to wait 30 minutes per video)

Anonymous 04/24/2025 (Thu) 12:05:17 Id: 3a58c9 No. 10751

Apparently the RTX 5060 is the future of AI. It's so over.

Anonymous 04/24/2025 (Thu) 12:08:11 Id: d75d9b No. 10752

>>10751 8 GB ought to be enough for anyone.

Anonymous 04/24/2025 (Thu) 12:13:52 Id: 3a58c9 No. 10754

>>10752 They've managed to keep the 5060TI 16GB in stock somehow. Oh wait you want it at MRSP? go fuck yourself.

Anonymous 04/24/2025 (Thu) 12:21:24 Id: 66c56b No. 10755

>>10618 Use case to go back? It will be much worse.

Anonymous 04/24/2025 (Thu) 12:22:32 Id: 004c54 No. 10756

>>10751 Gonna go smuggle 48gb 4090s up my ass.

Anonymous 04/24/2025 (Thu) 12:53:35 Id: 3a58c9 No. 10758

>>10755 I guess if you enjoy getting gaslit by jannies and people who suck janny dick behind the scenes it could be desirable to restore the status quo.

Anonymous 04/24/2025 (Thu) 13:16:30 Id: 2473df No. 10766

soon

Anonymous 04/24/2025 (Thu) 13:45:39 Id: 6849f7 No. 10770

qwhere

Anonymous 04/24/2025 (Thu) 13:51:53 Id: 231b25 No. 10772

>>10618 I will go back to see how bad it is, get banned with my first post as a tranny janny seethes that I used a no no word or disliked my opinion and then I come back here.

Anonymous 04/24/2025 (Thu) 14:13:10 Id: cd40b8 No. 10778

I hate how LLMs are making women strong and confident by default, the amount of gaslighting needed to fix this bias is unreal. Fuck feminism

Anonymous 04/24/2025 (Thu) 14:17:49 Id: 3a58c9 No. 10780

>>10770 qwnever

Anonymous 04/24/2025 (Thu) 14:18:14 Id: 35b36b No. 10781

Gemma 3 has such a big context due to having tons of really wide attention heads, right? Does that mean it should be more resilient to context quantization? I'm wondering if it's worth quanting the context to q4.

Anonymous 04/24/2025 (Thu) 14:23:53 Id: 231b25 No. 10782

>>10778 I have the opposite happening where even female characters that are supposed to be strong end up being wet paper bags and submissive. It comes down largely to what model/finetune you're running and a little bit with the prompting you give it.

Anonymous 04/24/2025 (Thu) 14:30:42 Id: 35b36b No. 10784

>>10781 Also, recommend me your favorite fine tunes so far for me to test.

CUDA dev ##IQQZHG 04/24/2025 (Thu) 14:46:09 Id: d75d9b No. 10786

>>10781 In my experience models with larger heads tend to have fewer of them so the total density of information should be about the same and I would intuitively not expect there to be a significant difference regarding the quality loss from quantization. The Gemma models with their head sizes of 256 (instead of e.g. 128 for LLaMA) cause issues with register pressure in the CUDA code though. It turned out that the combination of head size 256 + quantized KV cache was unviable with the current FlashAttention code so that particular combination is forced to run on the CPU.

Anonymous 04/24/2025 (Thu) 14:47:39 Id: 35b36b No. 10788

>>10786 >It turned out that the combination of head size 256 + quantized KV cache was unviable with the current FlashAttention code so that particular combination is forced to run on the CPU. Oof. Alright, thank you for the info.

Anonymous 04/24/2025 (Thu) 14:49:43 Id: faa337 No. 10789

does anyone know where /h/'s /hdg/ (anime image gen) moved to, or /e/'s?

Anonymous 04/24/2025 (Thu) 15:04:28 Id: cd40b8 No. 10791

>>10789 In trash

Anonymous 04/24/2025 (Thu) 15:05:47 Id: 3a58c9 No. 10792

https://4chan.org/models/qwen-3-max IT'S UP

Anonymous 04/24/2025 (Thu) 15:06:57 Id: cd40b8 No. 10793

>>10792 If that shit is up, everything is

Anonymous 04/24/2025 (Thu) 15:08:34 Id: 3a58c9 No. 10795

I feel like if qwen3 was worth releasing they would have released it already.

Anonymous 04/24/2025 (Thu) 15:08:40 Id: faa337 No. 10796

>>10791 /trash/ seems to have /sdg/ but it's mostly furry oriented, /aichan/ board is closer but it's closer to /b/'s degen thread, haven't found an anime only thread yet

Anonymous 04/24/2025 (Thu) 15:20:21 Id: 906a07 No. 10801

>>10618 Why not use both?

Anonymous 04/24/2025 (Thu) 15:22:48 Id: 35b36b No. 10802

What's the proper way to send a prefill to llama.cpp server os koboldcpp when using the chat completion API?

Anonymous 04/24/2025 (Thu) 15:28:41 Id: 88ab39 No. 10804

Big day today.

Anonymous 04/24/2025 (Thu) 15:39:44 Id: 2cb709 No. 10809

>>10792 >it's real holy shit

Anonymous 04/24/2025 (Thu) 15:43:22 Id: 231b25 No. 10810

uh guys?? https://4chan.org/res/rape-ape/dontclick/dontclick/cheesepizza

Anonymous 04/24/2025 (Thu) 15:52:04 Id: 69dc4a No. 10812

>>10804 >today mistral, qwen and deepseek will all simultaneously release uncensored sota models at every possible size as well as variants for specific use cases like coding, thinking and vision that all beat general models double their size in their respective areas Local is so back.

Anonymous 04/24/2025 (Thu) 15:54:59 Id: b26e2e No. 10813

what are some good local models for RP chat? I've tried these to varying degrees of success >Lumimaid-v0.2-8B-Q6_K-imat >L3-8B-Stheno-v3.1-Q6_K-imat >v2-Llama-3-Lumimaid-8B-v0.1-OAS-Q6_K-imat >Nyanade_Stunna-Maid-7B-v0.2-Q6_K-imat >InfinityRP-v1-7B-Q6_K-imat >Kunoichi-DPO-v2-7B-Q6_K-imatrix >BuRP_7B-Q6_K-imat >Layris_9B-Q5_K_S-imat >v2_Kunocchini-7b-128k-test-Q6_K-imatrix

Anonymous 04/24/2025 (Thu) 15:57:43 Id: 3a58c9 No. 10814

>>10813 buy an ad

Anonymous 04/24/2025 (Thu) 16:08:44 Id: 35b36b No. 10815

>>10813 >>10094 Also, Rocinante v1.1.

Anonymous 04/24/2025 (Thu) 16:09:27 Id: 0d4490 No. 10816

>>10813 patricide unslop mell

Anonymous 04/24/2025 (Thu) 16:18:10 Id: eedd9b No. 10819

>>10778 >hate how LLMs are making women strong and confident by default, the amount of gaslighting needed to fix this bias is unreal. Learn to write good prompts.

Anonymous 04/24/2025 (Thu) 16:28:55 Id: 99e361 No. 10823

Speaking of prompts, here is what Meta was using for some of the experimental Chatbot Arena models; they really smell of targeted prompt engineering: https://files.catbox.moe/qnnmnj.txt https://files.catbox.moe/nxhusi.txt They honestly made me reconsider how to format character cards--I used to most often have a generic general prompt with a portion containing the {{description}} in the character card. But if the entire system prompt *is* the character card and you put some effort into customizing it to that specific character, then the character will more likely act like you want and feel less generic, provided no strong built-in censorship in the model.

Anonymous 04/24/2025 (Thu) 16:28:57 Id: cd40b8 No. 10824

>>10819 Where are these good prompts you're talking about?

Anonymous 04/24/2025 (Thu) 16:33:17 Id: 3a58c9 No. 10826

>>10823 If you're using the generic prompt templates that come with silly instead of engineering them to meet your specific use case and then just raw dogging some 15 year olds character wiki dump of a character card without massaging it over with an actual understanding of how llms work then you're unironically NGMI.

Anonymous 04/24/2025 (Thu) 16:34:50 Id: cd40b8 No. 10828

>>10823 Reddit personified. How can you even type that shit without throwing up on the spot?

Anonymous 04/24/2025 (Thu) 16:36:06 Id: 9d452a No. 10829

>>10823 >When counting letters in a word, treat each individual Unicode character as one unit kek, benchtards BTFO'd

Anonymous 04/24/2025 (Thu) 16:39:15 Id: 69dc4a No. 10830

>>10828 It worked. Actual breathing humans preferred responses from those prompts.

Anonymous 04/24/2025 (Thu) 16:41:24 Id: 3a58c9 No. 10831

>>10830 arena is just a bunch of illiterate pajeets voting on whatever model shits out the most emojis.

Anonymous 04/24/2025 (Thu) 16:42:29 Id: 99e361 No. 10832

>>10826 No, I had my own RP template(s) with general "rules", "guidelines" and "writing style" that I personally came up with over time and kept tweaking, which would usually contain a character {{description}} consisting of the most important attributes, personality and a short bio (not really wiki dumps, but similar in overall style). I've never used cards from Chub nor I would simply copy/paste wiki information without major rework. Sometimes, but not always, I used additional low-depth instructions.

Anonymous 04/24/2025 (Thu) 16:42:38 Id: cd40b8 No. 10833

>>10830 No, they just made the model more recognizable so they could cheat with >>10831

Anonymous 04/24/2025 (Thu) 16:51:23 Id: 2cb709 No. 10835

>>10830 >Actual breathing humans The 'actual breathing humans' in question are roughly the caliber of youtube comments posters

Anonymous 04/24/2025 (Thu) 16:52:44 Id: 0db159 No. 10836

>>10831 I fucking hate markdown and emojis t tinkering with llm stuff & imgui

Anonymous 04/24/2025 (Thu) 16:53:23 Id: 99e361 No. 10837

>>10830 Either way, those prompts from Meta are interesting in that they show how AI companies are actually prompting their own models in practice. A loosely similar but probably more widely known example is Claude's system prompt: https://docs.anthropic.com/en/release-notes/system-prompts#feb-24th-2025

Anonymous 04/24/2025 (Thu) 16:54:11 Id: 3a58c9 No. 10838

>>10836 This whole trend of "better LLM personality" causes problems at the very high end of things, too. Like even o3 will shit out non utf-8 characters in remarks for code on a fairly consistent basis.

Anonymous 04/24/2025 (Thu) 16:56:13 Id: 3a58c9 No. 10839

# This is the part that does the thing— You can just comment it out if you don't need it rofl rocketship emoji, eggplant emoji, pogchamp pogchamnp

Anonymous 04/24/2025 (Thu) 17:12:11 Id: 99e361 No. 10851

>>10836 Emoji can be useful and efficient for conveying emotion and tone if you're chatting without narration and dialogue tags. You could replace tags like "she said with a smirk" (or its variations) with a single emoji, for example. The only problem is that some models (e.g. Mistral Small 3.1) appear to be unable to use them creatively and in moderation. I guess it's a similar issue to them adding "X, Ying" after every dialogue line in ordinary RP conversations.

Anonymous 04/24/2025 (Thu) 17:18:35 Id: 29ed66 No. 10859

>>10778 Sounds like a you problem. Although I'm getting pretty tired of the universe where women are the dominant sex always being named Gynotopia.

Anonymous 04/24/2025 (Thu) 17:27:36 Id: c878c1 No. 10872

The model drought ends today

Anonymous 04/24/2025 (Thu) 17:30:43 Id: cd40b8 No. 10880

>>10851 If you're replacing variation by even less variation on a model already overcooked it won't go well. The token budget is irrelevant if it outputs garbage

Anonymous 04/24/2025 (Thu) 17:32:58 Id: cd40b8 No. 10883

>>10859 Yes it's a (me) problem, I know sissies like you don't mind getting dominated by women

Anonymous 04/24/2025 (Thu) 17:41:40 Id: cdf9e7 No. 10892

>>10872 PLEASE GOD LET THIS BE TRUE

Anonymous 04/24/2025 (Thu) 17:53:26 Id: 99e361 No. 10898

>>10880 I don't necessarily mean more efficient in terms of token budget, but rather in terms of conveyed information, and since most of the time I'm focused on the dialogue itself rather than what's outside of it, I find one or two occasionally used emoji to be more than enough for setting the tone or emotion. I see them like a simplified form of character expressions in visual novels (and visual novels most of the time work without extensive narration--or any narration at all--letting sounds and visuals do the job of clarifying tone and emotional cues). Anyway, I find the whole forum/markdown RP chat style to be unnecessarily wordy and formulaic for the typical the response time of LLMs, as well as very tired after 2 and a half years of using them, so I might be biased.

Anonymous 04/24/2025 (Thu) 17:55:52 Id: eedd9b No. 10901

>>10824 >Where are these good prompts you're talking about? In my head

Anonymous 04/24/2025 (Thu) 18:00:35 Id: cd40b8 No. 10903

>>10901 Write what came before this.

Anonymous 04/24/2025 (Thu) 19:06:42 Id: faa337 No. 10930

>> 10892 There never was a model drought, there is a GPU drought, until we all have 1-2TB VRAM GPUs at home, this will continue. Why? We have a lot of good models, but not enough VRAM to tune or improve them, and bigger ones need CPUMaxxing which has its limitations.

Anonymous 04/24/2025 (Thu) 19:16:50 Id: a91c28 No. 10933

>>10930 this, there are just a lot less releases these days because we've pretty much reached the limit of what's currently possible no point in making more 70b-150b models right now

Anonymous 04/24/2025 (Thu) 20:04:38 Id: cd40b8 No. 10959

>>10933 Tired of this meme. 70B-150B are nowhere near reaching saturation. This retarded industry just takes the easy way out, which is to increase the size instead of using better datasets then kill their own little gains with safety.

Anonymous 04/24/2025 (Thu) 21:34:01 Id: 2f9aec No. 11027

https://huggingface.co/unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF/tree/main llama4 quants fixed once again.. improving MMLU Pro and KL divergence while maintaining better quality than inference providers we're so back

Anonymous 04/24/2025 (Thu) 21:34:29 Id: 35b36b No. 11028

>>11027 What was broken this time?

Anonymous 04/24/2025 (Thu) 21:36:17 Id: 2f9aec No. 11029

>>11028 According to the paper "Accuracy is Not All You Need" https://arxiv.org/abs/2407.09141, the authors showcase how perplexity is a bad metric since it's a geometric mean, and so output tokens can cancel out. It's best to directly report "Flips", which is how answers change from being incorrect to correct and vice versa. The paper shows if you prune layers, the "flips" increase dramatically. They also show KL Divergence to be around 98% correlated with "flips"

Anonymous 04/24/2025 (Thu) 23:04:55 Id: cd40b8 No. 11071

>>11029 So KL divergence is all you need?

Anonymous 04/24/2025 (Thu) 23:10:11 Id: 004c54 No. 11073

>>10832 In my case, character card and example dialogue are all part of the system prompt. Text completion ftw. >>10823 There is no fucking way the arena models are the exact same weights as what got released. I used both. The insanely long, hallucinated schizo outputs cannot, and so far have not been replicated.

Anonymous 04/24/2025 (Thu) 23:31:43 Id: dfea19 No. 11075

I've been doing this shit since the GPT2 days and sometimes it still blows my mind how with the new models like deepseek, you can just tell them what you don't like, or just tell them to change the writing style, or just tell them to cut that repetition out... Just tell them. Often, it's that simple. I got bored with the prose in my story which had it's amount of slop and I just told deepseek to crank it up with the victorian era writing and the purple prose... and it just did. I remember setting up pipelines for rewriting and iteratively improving text on older models and it was kinda hit and miss, I should dust it off for deepseek again. Funnily I struggled with deepseek for the longest time until I realized it follows my instructions *too well* and that was the problem. It'd suddenly do the things in my instructions the older models apparently just ignored and it actually took me time to figure that out because it's just always been like this until deepseek. Pretty cool times. Wish I wasn't so depressed to shit.

Anonymous 04/25/2025 (Fri) 00:08:03 Id: 99e361 No. 11091

>>11073 >There is no fucking way the arena models are the exact same weights as what got released. I used both. The insanely long, hallucinated schizo outputs cannot, and so far have not been replicated. Of course they're *not* the same models and I wasn't suggesting that, only that they used system prompts of that sort. In my opinion the models Meta ended up releasing to the public are either a quick retrain (they didn't even finish pretraining Maverick) with the models gimped in various ways for "safety" purposes (and less humanlike, and far less willing to output cunny-related outputs), or an earlier training run. I can imagine they were out of time and needed to focus on the reasoning version and Behemoth to show that they can beat DeepSeek R1/R2 and frontier models from OpenAI, Google.

Anonymous 04/25/2025 (Fri) 00:41:15 Id: 35a4c4 No. 11121

Okay so I fiddled all day with GPTSoVits v4 and here's the report. It's better than v3 (less muffled/metallic sound since we're back to 48KHz) and globally it sounds natural/great, BUT it doesn't sound like the reference. It's higher pitched, so I suspect the additional training was done on asian voices (which are higher pitched) and that fucked up EN voices. The only fix I found is to lower the temp to 0.2-0.3 to have something that sounds like the reference. Also I confirmed that I didn't fuck up the finetuning, because I saved every checkpoint for GPT/ViTS and nothing solved that (sampling to 32, higher epochs for vits...) except lowering the temperature. I found that some epochs for VITS are broken, it's very weird. Like e13 could be utterly broken, when e12 and e14 are perfectly fine. VITS is good enough at e8 I'd say, after that I can't hear any difference. The GPT part didn't change so it's good at e24. All things considered, it might be worth it to use it over v2. Ref: https://voca.ro/13vsNeBHC2Xu Best result I got from v4: https://voca.ro/1j2I5rUzAZxj Same example with v2 (the end was cut due to my shitty api): https://voca.ro/11qFHhR7HtG1

Anonymous 04/25/2025 (Fri) 02:52:07 Id: 004c54 No. 11170

>>11091 Its a real shame because I was looking forward to trying the arena models with actual cards and seeing how schizo they would get. >try llama4 >at least the model will be fun >look inside >it's coal

Anonymous 04/25/2025 (Fri) 02:59:23 Id: a86e42 No. 11172

>>11121 Sounds good. Are the gen times the same as the previous version?

Anonymous 04/25/2025 (Fri) 03:13:59 Id: 7d7a2a No. 11175

>>11121 Are you using pretrained or your own finetuned model? Here's using your voice ref with pretrained models (s1v3 & s2Gv4) https://voca.ro/1h6tzcTXg5hK

Anonymous 04/25/2025 (Fri) 03:25:28 Id: 549299 No. 11177

>>11172 Starting from V3 the Vits part is three times bigger than v2. Using sampling steps 32 (default 8), I gen at ~1.5x real-time on my 3090 for the V4 when it was 4-5x for the V2 so there is a slowdown. Also I forgot to say that I used a LoRA of 128 for the VITS part (highest quality). >>11175 I was using the finetuned model, zero-shot does sound good but it's nowhere near the original obviously

Anonymous 04/25/2025 (Fri) 03:28:03 Id: bb0219 No. 11180

https://unsloth.ai/blog/dynamic-v2 >We find however Llama tokenizes "A" and "_A" (A with a space in front) as different token ids. If we consider both spaced and non spaced tokens, we get 68.2%(+0.4%). Interestingly Llama 3 as per Eleuther AI's LLM Harness also appends "The best answer is" to the question, following Llama 3's original MMLU benchmarks. There are many other subtle issues, and so to benchmark everything in a controlled environment, we designed our own MMLU implementation from scratch by investigating github.com/hendrycks/test directly, and verified our results across multiple models and comparing to reported numbers. Kek, so the existing public benchmarking software was that cluelessly done and bad.

Anonymous 04/25/2025 (Fri) 03:40:31 Id: e56a44 No. 11182

I got my first model loaded and running, time to make my persona and cards. Fuck yeah.

Anonymous 04/25/2025 (Fri) 04:07:04 Id: 6849f7 No. 11906

>>11182 well done! try not to melt the model's tensors too quickly or you'll wear it out

Anonymous 04/25/2025 (Fri) 04:20:30 Id: e56a44 No. 11908

>>11906 >me banging together smart rocks in an attempt to make them think they're an obscure, frumpy character from a long dead video game series so I can touch my penis weenis >this is what peak evolution looks like

Anonymous 04/25/2025 (Fri) 04:22:11 Id: ac430d No. 11909

>>11182 One of us One of us Gooble Gobble One of us

Anonymous 04/25/2025 (Fri) 05:01:11 Id: d2a4b8 No. 11915

>hey bro, got any cool new models for us vramlets? >sure bro, check out this latest mix ReadyArt/Safeword-Abomination-of-Omega-Darker-Gaslight_The-Final-Forgotten-Transgression-24B

Anonymous 04/25/2025 (Fri) 05:06:29 Id: 553836 No. 11917

>hey bro, got any new models at all for us non-poorfags >yeah, we got deepseek v3 two months ago or llama4 recently >something that's not trash or 700b? >not this year, no

Anonymous 04/25/2025 (Fri) 05:13:39 Id: 0db159 No. 11920

>>11917 >calls himself non-poorfag >cannot even run a 700B model

Anonymous 04/25/2025 (Fri) 06:04:02 Id: 7ae3c7 No. 11932

It's hard to believe but in the end it's LLaMAcon that'll have to save us now that Qwen and Deepseek have canceled their models

Anonymous 04/25/2025 (Fri) 06:16:11 Id: ac430d No. 11936

>>11932 Qwen3 was delayed, not cancelled. We're not so desperate for hopium that we resort to Llama stuff. We're better than that...

Anonymous 04/25/2025 (Fri) 06:31:55 Id: 250931 No. 11941

>>11075 >It'd suddenly do the things in my instructions the older models apparently just ignored and it actually took me time to figure that out because it's just always been like this until deepseek. Yes, all other models just smooth it over. I used R1 with some generic asian girl card. Suddenly she was super weird with R1. Walking into light poles (kek), being disoriented, walking wobbly along the walkway. Reason: The kid on chub who made the card wrote "She always wears a big facemask hiding her face". R1 just too it literal and tries to make it work. Just rush with it. Its just such a fun model. I hope they dont take that away with R2. >Pretty cool times. Wish I wasn't so depressed to shit. Can relate. If I had these tools during my younger years I would have even more of a blast.

Anonymous 04/25/2025 (Fri) 06:49:57 Id: 7d7a2a No. 11942

>>11177 How long dataset are you used?

Anonymous 04/25/2025 (Fri) 07:37:46 Id: 578fa9 No. 11951

>>11121 >It's higher pitched At first I assumed it was some weird 44.1K vs 48K resampling fuckery but the samples don't sound too bad >the end was cut due to my shitty api Which one currently works in silly? It's been a while since I used sovits, your results sound good so might give it another chance with a better dataset this time

CUDA dev ##IQQZHG 04/25/2025 (Fri) 07:53:58 Id: 87955d No. 11952

>>10778 Perplexity is a bad metric for judging quality loss from quantization primarily because the outputs of a full-precision model and a quantized model are extremely highly correlated. So because of that you need a huge amount of data for the uncertainty on your result to become small vs. the difference in raw perplexity values if you calculate the variances in isolation. However, if you directly calculate the covariance it's feasible to use perplexity (though KL divergence is I think still the better metric). More generally, the problem in the LLM space is that few people calculate any uncertainties on their results in the first place.

CUDA dev ##IQQZHG 04/25/2025 (Fri) 07:55:30 Id: 87955d No. 11953

>>11952 Meant to quote >>11029

Anonymous 04/25/2025 (Fri) 09:26:09 Id: 4dd62a No. 11961

>>11942 ~30 min

Anonymous 04/25/2025 (Fri) 09:37:06 Id: 7d7a2a No. 11963

>>11961 I used 7 minutes dataset. It works fine with e1 to e4, at e8 some words were skipped.

Anonymous 04/25/2025 (Fri) 09:52:57 Id: 06b9ab No. 11966

>>11917 youre supposed to be filling the LLM void by checking out the recent improvements to video models and to a lesser extent image models

Anonymous 04/25/2025 (Fri) 10:02:45 Id: 42b311 No. 11971

>>11966 Video gen is too slow to activate the neurons.

Anonymous 04/25/2025 (Fri) 10:05:05 Id: a4f327 No. 11973

congratulation, /lmg/ you've just made it through the last week without big new releases strap yourselves in because it's going to get crazy after the weekend

Anonymous 04/25/2025 (Fri) 10:12:51 Id: dd13a0 No. 11974

Get it while it's hot. 159.96GB total. https://huggingface.co/datasets/nyuuzyou/archiveofourown/discussions/3#680964daf2033a08e9853586 https://datafish.ru/nyuuzyou/datasets/ao3

Anonymous 04/25/2025 (Fri) 10:13:52 Id: 0e20ba No. 11975

>>11974 not furry erp chat log so its not worth it

Anonymous 04/25/2025 (Fri) 10:16:44 Id: 7d7a2a No. 11976

>>11974 Stop stealing our work!

Anonymous 04/25/2025 (Fri) 10:17:36 Id: 42b311 No. 11977

>>11974 >I don't consent to you reading the story I willingly uploaded to a publicly accessible site What exactly is their reasoning here?

Anonymous 04/25/2025 (Fri) 10:18:40 Id: 5a3132 No. 11978

>>11973 Another llama that turns out to be a huge waste of time?

Anonymous 04/25/2025 (Fri) 10:23:18 Id: 5d97ff No. 11979

>>11975 yes but this one has 20000 pieces of gay porn about modern formula 1 drivers set in nazi germany trying to live out their forbidden romance after one of the guys got impregnated it doesn't get more creative than this

Anonymous 04/25/2025 (Fri) 10:25:54 Id: a0cd3e No. 11982

>>11974 is this available as a torrent? I'm willing to seed a part of it

Anonymous 04/25/2025 (Fri) 10:39:19 Id: 3adef6 No. 11984

>>11974 I don't see any way to actually download this

Anonymous 04/25/2025 (Fri) 10:39:22 Id: 4dd62a No. 11985

>>11951 Yep the samples don't sound bad, in fact the paralinguistics are way better compared to v2 which is why I think switching is worth it. >Which one currently works in silly? I don't know, I'm using my own front end and I ended up remaking the whole v2 API from scratch (and I'll have to redo some parts of it for the v4).

Anonymous 04/25/2025 (Fri) 10:42:11 Id: d75d9b No. 11986

>>11977 They're robboists.

Anonymous 04/25/2025 (Fri) 10:46:47 Id: faa337 No. 11987

>>11974 Make a torrent! Let them bitch about people scraping publily available content. Don't like that, don't publish it lmao

Anonymous 04/25/2025 (Fri) 10:47:26 Id: 99e361 No. 11989

>>11974 Most AI labs are already using at least portions of AO3 data into their models anyway (although they seem to have excluded stories tagged with specific "content warnings" out of "safety"). For the average finetuner it's just too much data to sift through and filter or process in some useful way.

Fun is over edition. Anonymous 04/25/2025 (Fri) 10:56:32 Id: 004c54 No. 11991

character.ai finally stopped me from using their API key in google gemini. The second key I got off them isn't working either. >Requests to this API generativelanguage.googleapis.com method google.ai.generativelanguage.v1beta.GenerativeService.GenerateContent are blocked. Did they finally catch on? WTF do I use for actual productivity now? Turn it back on Shazeer!

Anonymous 04/25/2025 (Fri) 11:06:56 Id: dfb40b No. 11993

>>11989 I've been asking AIs to create story slops and noticed the ## Title format and trigger warnings are similar.

Anonymous 04/25/2025 (Fri) 11:50:13 Id: 578fa9 No. 11997

>>11974 >>11982 >>11987 Seconding a torrent if anyone actually managed to grab it. Looks like it isn't mirrored on modelscope or any of the usual places >>11985 Rip, guess I'll just finagle the gradio demo and the old plugin I used before

Anonymous 04/25/2025 (Fri) 11:52:02 Id: 4dd62a No. 11998

>>11963 It's not really due to VITS epochs, it's the slicing (slicing by 50 chars don't do that too much) and having paralinguistics in a sentence can also eat some words (it was doing that in v2 too). Also, going up in the VITS epochs the paralinguistics seem to get better. Here's with VITS e20: https://voca.ro/13NAJ6hZnwE4

Anonymous 04/25/2025 (Fri) 13:47:33 Id: 6ae1b3 No. 12029

>>11998 Well this is fucked up. Epoch isn't the problem apparently. I had tried with e20 and the missing words still exist. But then I replaced the ref audio and the problem fixed. So we have to literally do gacha and pray to get a good ref audio.

Anonymous 04/25/2025 (Fri) 13:59:32 Id: 23ddbb No. 12032

>>11974 >I think this truly has the potential to result in a landmark ruling for decades to come. lmao

Anonymous 04/25/2025 (Fri) 14:06:45 Id: 3a58c9 No. 12033

So basically China is all we have now.

Anonymous 04/25/2025 (Fri) 14:07:15 Id: 936078 No. 12034

>>12032 Yeah, just one more lawsuit and the courts will finally rule against AI companies and all this evil AI stuff will finally go away again and leave all the poor copyright holders alone.

Anonymous 04/25/2025 (Fri) 14:16:23 Id: 250931 No. 12040

>>12033 For local, yeah. Mistral kinda redeemed themself with small 3.1..but its still sloped shit. Llama and google are pure positivity slop that would pout qwen models from last year to shame. If llama4 scout would have been good I could forgive them. I saw people running it with 3-4 t/s on a single 24gb card and ddr4 ram. But all pointless if its pure shit. Nobody is finetuning a moe beast like that either.

Anonymous 04/25/2025 (Fri) 14:28:05 Id: 7b8590 No. 12042

Trust in Cohere.

Anonymous 04/25/2025 (Fri) 14:29:18 Id: 250931 No. 12044

>>12042 kek

Anonymous 04/25/2025 (Fri) 14:29:24 Id: 4dd62a No. 12045

>>12029 GPTSoVITS always needed perfect ref sample or it'd output straight garbage, other TTS are more forgiving but you won't get the paralinguistics (that shit is too addictive for me when paired with an LLM). Also I think the chink is reaching the limit with the current architecture, so I doubt it'll get any better with future versions

Anonymous 04/25/2025 (Fri) 14:35:14 Id: 250931 No. 12048

>>12042 Yes the model train on the >PromptWhats your moms name >Reject Reason: SEXUAL VIOLENCE PROFANITY >Comment: In arabic countries asking your mothers name can be seen as threatening, we try to protect our moms. Dataset. I didnt make this case up either. Its a wonder the models are as coherent as they are with all this crap pushed in.

Anonymous 04/25/2025 (Fri) 14:36:18 Id: 876a13 No. 12049

>>12044 You joke, but Command-A/Fallen Command-A are still my current go-to's, at least until the DeepSeek server is done. But that just speaks to how pathetic/non-existent the competition is than anything...

Anonymous 04/25/2025 (Fri) 14:37:48 Id: 3a58c9 No. 12050

Anonymous 04/25/2025 (Fri) 14:52:54 Id: 8af965 No. 12052

>>12048 Somewhere an arab investor was happy that his mom's dignity was protected.

Anonymous 04/25/2025 (Fri) 15:08:33 Id: 9844c8 No. 12055

>>11121 (Me) I finetuned a part of moe speech (JP) dataset on V4. It looks good compared to zero-shot with same seed, same settings. Ref: https://voca.ro/1fUSK3EpWaC3 (音声メッセージが既存のウェブサイトを超えたコミュニケーションを実現目で見るだけだったウェブサイトに) Zero-shot (base model): https://voca.ro/1lL9bhC8DAup Finetuned (GPT 20e, VITS e3): https://voca.ro/14d6S8utv01i Sample: 法律とは、人生を一瞬の詩に変へてしまはうとする欲求を、不断に妨げてゐる何ものかの集積だ。血しぶきを以て描く一行の詩と、人生とを引き換へにすることを、万人にゆるすのはたしかに穏当ではない。しかし内に雄心を持たぬ大多数の人は、そんな欲求を少しも知らないで人生を送るのだ。だとすれば、法律とは、本来ごく少数者のためのものなのだ。

Anonymous 04/25/2025 (Fri) 15:11:07 Id: 004c54 No. 12057

>>12049 The 1.1 of the fallen version is kinda stupid but it's very creative. The v1 is bad and I'm gonna delete it. I don't understand why drummer tuned it in alpaca format. >>12033 Another reason MOE sucks. Still the reddits defend it. Unfinetunable 17b modelet gets praise because muh dense 27b or 30b is sooo hard to run. >don't you see! it's gonna be cheaper for all of us if some infra company gets a break on compute!

Anonymous 04/25/2025 (Fri) 15:25:04 Id: 876a13 No. 12065

>>12057 Yes 1.1 is retarded for scenarios that require more finesse but its 'good enough' for most uses at Q6. For its class, its par the course. Again, everything in the 100B class and below is just massive cope compared to v3/R1 which still needs handholding from time to time.

Anonymous 04/25/2025 (Fri) 15:31:32 Id: e759d1 No. 12069

>>12057 >The v1 is bad and I'm gonna delete it. I don't understand why drummer tuned it in alpaca format. He's just retarded. Like how he trained his second version of his Mistral Large Behemoth quants using the retarded pygmalion prompt format and then seriously suggested that you should simultaneously mix both the Mistral + Pyg formats when using the model.

Anonymous 04/25/2025 (Fri) 15:39:56 Id: 3a58c9 No. 12071

There's literally no reason to use slop tunes. Placebo for people with skill issues.

Anonymous 04/25/2025 (Fri) 15:50:40 Id: c58859 No. 12072

We should boycott all companies who release models without also providing the non-instruct base model for us to finetune ourselves.

Anonymous 04/25/2025 (Fri) 15:52:26 Id: dfea19 No. 12073

>>12071 Yeah, maybe at one point tunes made sense [citation needed] but all the recent tunes by randos really just break the models. It's the same with loras and finetunes for imagegen models. These things are hard to make correctly and 99% suck because people have no idea what they're doing.

Anonymous 04/25/2025 (Fri) 15:56:53 Id: 3a58c9 No. 12074

>>12073 They made sense in the llama-1+ llama2/Mixtral generations. Because the 'ideal' instruct tune hadn't yet been boiled down to a science. But now, unless your goal is to make an interestingly broken (but less useful model)- which is still a legitimate thing to do, I do it sometimes, there's literally no point. It's basically all skill issue. Plus base models aren't what they used to be either. They are pretrained along with carefully crafted synthetic data that is intermediary to the inevitable instruct tuning of the model. So even if you start with the base model you'll break it just by throwing some random who gives a fuck OpenHermesBagelButtfuck dataset at it.

Anonymous 04/25/2025 (Fri) 16:00:00 Id: 61f2cb No. 12075

>>12072 If you include base models "bootstrapped" with instruct data, you'd have to boycott just about all of them.

Anonymous 04/25/2025 (Fri) 16:09:05 Id: 3a58c9 No. 12080

>>12075 The sad part is that it basically renders base models completely useless for any purpose other than being finetuned into instruct models now. Like Llama-1 base models were way 'smarter' than finetuned ones. Because the whole goal of the pretraining was to extract as much knowledge from human text as possible. And then instruct tuning the model would repurpose some of the parameters for creating particular behaviors. On one hand newer models are capable of far more complicated behaviors than the older models are. But the tradeoff is that base models are no longer the motes of distilled human culture/knowledge that they once were.

Anonymous 04/25/2025 (Fri) 16:19:44 Id: 99e361 No. 12086

>>12075 Why is it that despite instructions in large amounts being included in the pretraining data, base models still utterly suck to use, having obvious looping and repetition issues, as well as general retardation compared to the properly post-trained instruct versions? I imagined that as pretraining data got better, more abundant and included more instructions, base models would become more or less usable on their own, but it seems the opposite is happening instead compared to those of the past (that's my impression, at least).

Anonymous 04/25/2025 (Fri) 16:24:06 Id: 004c54 No. 12088

>>12069 Funny part is that when you use the wrong preset, you get more of the base. Literal tuner stolen valor. Compare on the untrained model while doing the same thing. >>12071 Yes and no. The writing and generalization during RP changes. If you love le-heckin safe redditor style outputs, go ahead and use the stock models. No amount of skill will make that go away for most. QwQ was a nice exception because it seemed to have a layer of NSFL/ERP tokens if you ditched the most probable ones. It would talk shit and act violent like the deepseek it got distilled from. Gemma does NOT have that when used "correctly". You can make a few outputs to "prove me wrong" but it won't be regular or consistent. Fighting with the AI for a crumb is not my idea of fun. I agree that shit tunes are shit. Just like shit models are shit. Who would use stock llama 3.3 over eva for chat tho?

Anonymous 04/25/2025 (Fri) 16:26:33 Id: 3a58c9 No. 12089

>>12088 > No amount of skill will make that go away for most. The irony of you saying this is that you're having issues. I'm not. Just as people with many superstitions on how to keep cockroaches out of their homes are the same people who live in roach infested homes.

Anonymous 04/25/2025 (Fri) 16:41:27 Id: 004c54 No. 12093

>>12089 Writing is subjective. I want relatively slop free back and forth convos. Most stock instructs don't give me that.

Anonymous 04/25/2025 (Fri) 16:42:56 Id: b24ad6 No. 12094

>>11180 >mememarks were bullshit Wow! I don't believe it!

Anonymous 04/25/2025 (Fri) 16:49:34 Id: 99e361 No. 12095

8-bit not enough? A reportedly truly lossless novel quantization format: https://arxiv.org/abs/2504.11651 > 70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float > > Large Language Models (LLMs) have grown rapidly in size, creating significant challenges for efficient deployment on resource-constrained hardware. In this paper, we introduce Dynamic-Length Float (DFloat11), a lossless compression framework that reduces LLM size by 30% while preserving outputs that are bit-for-bit identical to the original model. DFloat11 is motivated by the low entropy in the BFloat16 weight representation of LLMs, which reveals significant inefficiency in existing storage format. By applying entropy coding, DFloat11 assigns dynamic-length encodings to weights based on frequency, achieving near information-optimal compression without any loss of precision. To facilitate efficient inference with dynamic-length encodings, we develop a custom GPU kernel for fast online decompression. Our design incorporates the following: (i) decomposition of memory-intensive lookup tables (LUTs) into compact LUTs that fit in GPU SRAM, (ii) a two-phase kernel for coordinating thread read/write positions using lightweight auxiliary variables, and (iii) transformer-block-level decompression to minimize latency. Experiments on recent models, including Llama-3.1, Qwen-2.5, and Gemma-3, validates our hypothesis that DFloat11 achieves around 30% model size reduction while preserving bit-for-bit exact outputs. Compared to a potential alternative of offloading parts of an uncompressed model to the CPU to meet memory constraints, DFloat11 achieves 1.9-38.8x higher throughput in token generation. With a fixed GPU memory budget, DFloat11 enables 5.3-13.17x longer context lengths than uncompressed models. Notably, our method enables lossless inference of Llama-3.1-405B, an 810GB model, on a single node equipped with 8x80GB GPUs. Our code and models are available at https://github.com/LeanModels/DFloat11 ... > [...] In this work, we primarily focus on formats with ≥8 bits, as benchmark literature [37 , 10, 23 ] often suggests that 8-bit quantization results in negligible performance drop—though we show in Section 2 that this claim is likely skewed due to evaluation selectiveness and benchmark limitations.

Anonymous 04/25/2025 (Fri) 16:50:50 Id: 3a58c9 No. 12096

>>12093 #1. Don't use meme samplers. #2. Don't use meme system messages. #3. Don't use cards that are just a massive copy and paste text dump. #4. Stop using multi-turn formatting. Your waifu bot is not real. It's just a model predicting the next message on an existing conversation transcript. It's a simple behavior that only requires a single turn- whether you are 1 message into the chat, or 100 messages into the chat. If the model "sees" a whole bunch of turn id tokens it's going to be pulled toward assistant behavior which teaches the model to repeat a lot because in that instance repetition is often desired behavior.

Anonymous 04/25/2025 (Fri) 16:54:02 Id: 6b18b0 No. 12099

#5. Stop using LLMs.

Anonymous 04/25/2025 (Fri) 16:55:59 Id: 3a58c9 No. 12101

>>12099 real

Anonymous 04/25/2025 (Fri) 17:02:45 Id: 8e901b No. 12103

>>12095 >70% Size Are we sure this level of "lossless" can't be achieved by just quantizing the model to 12bpw the normal way and leave like half the model at a full 16 bit and the rest at 8?

Anonymous 04/25/2025 (Fri) 17:10:17 Id: 840846 No. 12105

It's back apparently

Anonymous 04/25/2025 (Fri) 17:11:18 Id: a0cd3e No. 12106

>>12096 so text completion with a custom context template is better?

Anonymous 04/25/2025 (Fri) 17:13:02 Id: 99e361 No. 12107

>>12103 The idea is that despite common belief, quantization to 8-bit is lossy and can affect performance in certain tasks, so you want to avoid doing that if you don't want any compromise in that regard (while still decreasing model size). Quantizing model tensors with a mixture of 16- and 8-bit formats to obtain an average precision of 12-bit might not work as well.

Anonymous 04/25/2025 (Fri) 17:13:13 Id: 53fcb0 No. 12109

4chan is back btw.

Anonymous 04/25/2025 (Fri) 17:17:45 Id: 53fcb0 No. 12115

4chan is back. There's a 120second timer for every post. It sucks.

Anonymous 04/25/2025 (Fri) 17:21:19 Id: 004c54 No. 12117

>>12106 Obviously better for what that anon is doing. Sounds like noass style text completion and using the default distribution (le meme samplers). Suppose I should give up on tool use and image gen in my chats too. >>12099 Exactly. >t. Not using llms the way I use them is a skill issue.

Anonymous 04/25/2025 (Fri) 17:23:19 Id: ac430d No. 12118

>>12109 I'm actually liking this site better. WEBMs with music and instant posting.

Anonymous 04/25/2025 (Fri) 17:24:01 Id: 004c54 No. 12119

>>12109 Damn, it's real. Well it was nice posting with you guys. No VPNs allowed there so it's back to lurking when this place clears out.

Anonymous 04/25/2025 (Fri) 17:24:28 Id: 840846 No. 12120

>>12115 It's fixed, at least for me.

Anonymous 04/25/2025 (Fri) 17:25:20 Id: 53fcb0 No. 12121

>>12118 I'll stick around for as long as this one is alive, although I'm mostly a lurker anyway.

Anonymous 04/25/2025 (Fri) 17:35:27 Id: a97a81 No. 12123

Now we'll have samefagchizos and blacked Miku spamming back. Yaaay!

Anonymous 04/25/2025 (Fri) 17:37:11 Id: 022a05 No. 12124

>>12123 You know, you don't have to go back. I'm not planning to. That place is garbage and I frankly didn't miss it. This here was much comfier and informative. I'll stick around and if this place dies, then I'll just read locallama on reddit again. /lmg/ on 4chan was unreadable because of schizospamming.

Anonymous 04/25/2025 (Fri) 17:45:08 Id: 8af965 No. 12127

>>12124 I'll wait and see how it goes too. The break has shown me that we could have nice things, but we don't.

Anonymous 04/25/2025 (Fri) 17:47:08 Id: 2cb709 No. 12129

I must admit, it was nice having less completely retarded spam. Seemed like the lowest caliber of poster was unable to survive the journey over here.

Anonymous 04/25/2025 (Fri) 17:49:59 Id: 9c2e12 No. 12135

My daily prayer for a local omni-model (in and out) (around 70b) (that does not suck) to be released soon

Anonymous 04/25/2025 (Fri) 17:55:57 Id: ac430d No. 12139

>>12135 I'd be happy with a solid 45b for my 24 vramlet system.

Anonymous 04/25/2025 (Fri) 17:56:01 Id: 53fcb0 No. 12140

>>12118 >>12135 I don't even need a proper omni. Just image and text is all I ask. It being around 30B would be nice too.

Anonymous 04/25/2025 (Fri) 17:57:32 Id: 8af965 No. 12141

>>12135 >(that does not suck) It's going to be as smart as a 24B thanks to the omni stuff and you now it.

Anonymous 04/25/2025 (Fri) 17:57:54 Id: 4e491c No. 12142

Fuck the multimodal meme. Just give me a proper, good text model. None of that sub-70b shit either. A nice, big local SOTA model that actually writes well. Nothing more and nothing less.

Anonymous 04/25/2025 (Fri) 18:21:39 Id: a28da6 No. 12166

>>12109 >>12115 we have a whole active AI board here there's no reason to go back

Anonymous 04/25/2025 (Fri) 18:24:34 Id: dd13a0 No. 12172

>>11974 >>11982 >>11984 >>11997 https://huggingface.co/datasets/Chat-Error/archiveofourown-newest

Anonymous 04/25/2025 (Fri) 18:37:57 Id: 99e361 No. 12179

>>12172 Unless you absolutely need the very latest data, there is a 2022 torrent of AO3 on archive.org: https://archive.org/details/AO3_final_mirror It took a while for me to download it in 2023, and at the moment (just checked) it only has 1 peer, though. I recall it has a lot of duplicated data as well.

Anonymous 04/25/2025 (Fri) 18:46:37 Id: 004c54 No. 12185

>>12142 Chatting with images is nice. Especially when you can copy and paste areas of the screen between silly tavern and KDE.

Anonymous 04/25/2025 (Fri) 18:48:56 Id: 53fcb0 No. 12186

>>12185 Something like QwQ but with mixed image and text output, that'd be awesome for choose your own adventure chats and shit.

Anonymous 04/25/2025 (Fri) 18:50:26 Id: 004c54 No. 12187

>>12186 Qwen released a 32b VL. It can be merged with QwQ. You just need patched mergekit and full weights of both models.

Anonymous 04/25/2025 (Fri) 18:51:09 Id: 578fa9 No. 12188

>>12172 >Manual approvals Kim Jong Un burner acc is the best I can do >>12179 I think the main appeal was that the new one was in processed jsonl format. Last time I had to process 200GBs of unformatted trash it ran for a day

Anonymous 04/25/2025 (Fri) 18:57:13 Id: 0e20ba No. 12191

Imagine going back to IB with samefagging and 'tra posting

Anonymous 04/25/2025 (Fri) 19:03:49 Id: a22d18 No. 12199

Sleep Time Compute: https://www.youtube.com/watch?v=FGRxw3ACIkw [Embed]

Anonymous 04/25/2025 (Fri) 19:06:19 Id: 53fcb0 No. 12201

>>12187 Oh. I thought that was just image in for agentic stuff. Will take a look.

Anonymous 04/25/2025 (Fri) 19:07:02 Id: 2f9aec No. 12202

lol

Anonymous 04/25/2025 (Fri) 20:06:43 Id: 004c54 No. 12214

>>12201 It's all for agentic stuff. I used qwen VL 72b and it chatted with memes just fine. Save for being dry. Didn't want to d/l 360gb+ to merge it with eva-qwen. I did proof of concept with the 7b and it was possible to combine it with a RP model if you merge 1:1. Partial layers (0.6, 0.5) and that kind of crap didn't work. Hopefully the 32b isn't a single image per chat type of deal. That's what made using the llama models pointless.

Anonymous 04/25/2025 (Fri) 21:02:12 Id: 840846 No. 12240

>Error: Your post contained banned text. I got banned on 4chan for using "banned text", but I don't even know what this "banned text" is. What the fuck?

Anonymous 04/25/2025 (Fri) 21:03:33 Id: 53fcb0 No. 12242

>>12240 Pastebin your original post.

Anonymous 04/25/2025 (Fri) 21:05:14 Id: 8af965 No. 12245

>>12240 They hired Gemma 3 as moderator and your post was incredibly offensive, triggering its full list of hotlines.

Anonymous 04/25/2025 (Fri) 21:10:07 Id: 231b25 No. 12247

>>12240 oh goyim why would we tell you what word isn't kosher?

Anonymous 04/25/2025 (Fri) 21:11:25 Id: b5bc97 No. 12248

>>12240 post it here, what did you wite?

Anonymous 04/25/2025 (Fri) 21:11:42 Id: 022a05 No. 12249

>>12240 why? Why would you even try? Why not just stay here? What is it with retards and just *having to* use the same three websites? I make you personally responsible for the state the world is in.

Anonymous 04/25/2025 (Fri) 21:12:06 Id: 840846 No. 12250

>>12242 I have no idea which post it was and I just go to various boards and drop a post or two, but going back in my history maybe it was a post in /vt/ I dropped casually since I don't really go there much, this also would be the second time this happens now that I think about it.

Anonymous 04/25/2025 (Fri) 21:29:38 Id: e56a44 No. 12257

>>12172 ded

Anonymous 04/25/2025 (Fri) 21:30:23 Id: 004c54 No. 12259

>>12240 It says error 429 when I try to post there.

Anonymous 04/25/2025 (Fri) 22:41:06 Id: e8f66f No. 12285

https://x.com/AndrewCurran_/status/1915752714233168050 https://xcancel.com/AndrewCurran_/status/1915752714233168050 Nous went crypto

Anonymous 04/25/2025 (Fri) 23:06:33 Id: e56a44 No. 12300

I'm using Kunoichi-DPO-v2-7B-i1 and it's giving me very short responses, any tips? I assume I should just be using a different model first and foremost but still.

Anonymous 04/25/2025 (Fri) 23:09:50 Id: 1e5c03 No. 12301

>>12300 response length is related to the first message. if you right a novel right off it will give you novel next message

Anonymous 04/25/2025 (Fri) 23:10:54 Id: e56a44 No. 12302

>>12301 Good to know. I'll just beef up the character's opener then and see if that helps.

Anonymous 04/25/2025 (Fri) 23:11:13 Id: 8af965 No. 12303

>>12300 >using a different model Kunoichi is pretty old. Use Nemo instead.

Anonymous 04/25/2025 (Fri) 23:14:11 Id: e56a44 No. 12305

>>12303 Suggestion for a specific quant? I'm using a 3060 12gb. Sorry if I'm not asking the right questions, I'm still new to this.

Anonymous 04/25/2025 (Fri) 23:18:10 Id: 1e5c03 No. 12308

>>12305 i have the same card, i usually go with Q_6, although it can be slow at times, but i don't go below Q_4 because the responses becomes increasingly insane >>12303 https://huggingface.co/bartowski/mistralai_Mistral-Small-3.1-24B-Instruct-2503-GGUF/tree/main

Anonymous 04/25/2025 (Fri) 23:19:55 Id: 8af965 No. 12310

>>12305 You can run Q6 K L with 12k context at reading speed. You'll be offloading some of the context into RAM, but it won't be that bad. About 6 t/s once context fills up.

Anonymous 04/25/2025 (Fri) 23:23:05 Id: e56a44 No. 12311

>>12308 >>12310 I see, thank you. I'm not too worried about it being a bit slow. Will report back with results.

Anonymous 04/25/2025 (Fri) 23:26:08 Id: 1e5c03 No. 12312

>>12310 >You can run Q6 K L with 12k context at reading speed. i just run with 32k context all the time because I like to have long talks with my wAIfus

Anonymous 04/25/2025 (Fri) 23:51:54 Id: e56a44 No. 12323

This is taking a while.

Anonymous 04/25/2025 (Fri) 23:55:09 Id: 1e5c03 No. 12325

>>12323 idk what to tell you, i only use KoboldCPP

Anonymous 04/26/2025 (Sat) 00:02:41 Id: 53fcb0 No. 12326

>>12323 >mistral small Are you loading the weights through the network or from a fucked HDD? It shouldn't take a while for that size of model.

Anonymous 04/26/2025 (Sat) 00:15:46 Id: e56a44 No. 12334

>>12326 My HDD works fine but it's still an HDD, yeah. I'm moving it to my SSD, I didn't even consider that until just before you commented.

Anonymous 04/26/2025 (Sat) 00:18:08 Id: 53fcb0 No. 12337

>>12334 I think mmap can help with that if you need to?

Anonymous 04/26/2025 (Sat) 00:29:44 Id: e56a44 No. 12340

>>12337 Well, I don't think memory is the issue because it's barely taking up any RAM, even after moving it to my SSD. It just stops loading once it gets to that part of the tokenizer, it's not even giving a fail in oob.

Anonymous 04/26/2025 (Sat) 00:35:35 Id: 1e5c03 No. 12341

>>12340 try KoboldCPP

Anonymous 04/26/2025 (Sat) 00:37:14 Id: 53fcb0 No. 12342

>>12340 Yeah. No idea. I'd try another model, maybe another quant by another uploader, as well as >>12341

Anonymous 04/26/2025 (Sat) 00:51:01 Id: e56a44 No. 12350

>>12342 >>12341 Yep, it loaded. That was a PHAT initialization but now we'll see how well it works.

Anonymous 04/26/2025 (Sat) 00:54:25 Id: 1e5c03 No. 12353

>>12350 good luck anon

Anonymous 04/26/2025 (Sat) 01:16:40 Id: e56a44 No. 12360

>>12353 I got it running, but it's glacial. I might need to downgrade to a 4bit.

Anonymous 04/26/2025 (Sat) 01:17:51 Id: 53fcb0 No. 12361

>>12360 Make sure that you are only loading as much of the model to VRAM as you can fit )via the number of layers) and that you aren't spilling from VRAM to RAM.

Anonymous 04/26/2025 (Sat) 01:22:29 Id: e56a44 No. 12362

>>12361 Oh I'm definitely spilling into RAM, not sure how many threads is 'safe' honestly as I did 41. I tried 33 and now I can't load it.

Anonymous 04/26/2025 (Sat) 01:37:31 Id: 1e5c03 No. 12366

>>12360 try some Q_5's i had luck with them when I still had my old 1070

Anonymous 04/26/2025 (Sat) 01:39:36 Id: e56a44 No. 12367

>>12362 It only runs if I load it with the "low VRAM" argument. I'm gonna downgrade. Not sure how you guys are faring better when we've got similar GPUs. >>12366 I've downgraded to the 4bit M but I'll experiment with that a bit too, at least this one runs without devouring my RAM. The character is making a lot of shit up though, but it might be because it's my first card.

Anonymous 04/26/2025 (Sat) 02:04:26 Id: d72ba2 No. 12374

>>12096 is min_p a meme sampler?

Anonymous 04/26/2025 (Sat) 02:15:48 Id: 4dd62a No. 12376

>>12374 The only non-meme actually

Anonymous 04/26/2025 (Sat) 02:40:16 Id: b6e05a No. 12382

Temperature is the only non-meme sampler.

Anonymous 04/26/2025 (Sat) 02:48:03 Id: dd13a0 No. 12386

>>12257 I tried the datafish link and downloading works >>11974

hi pew? Anonymous 04/26/2025 (Sat) 03:16:19 Id: 004c54 No. 12394

XTC works despite the naysayers. Throws away top tokens. Threshold is how far down to chuck, probability is how often. Once you understand that, it's no longer a meme. So you set the threshold down to almost incoherence and prob to 100% but the outputs still bad. >Man, this shit doesn't work! No, your model's entire distribution, for whatever you fed it, is coal.

Anonymous 04/26/2025 (Sat) 03:28:55 Id: e56a44 No. 12395

>>12367 whoever recommended this model needs their balls flattened, it just said "OwO".

Anonymous 04/26/2025 (Sat) 05:27:45 Id: b24ad6 No. 12415

>>12395 What do you want, it's trained on the internet

Anonymous 04/26/2025 (Sat) 05:37:57 Id: 1e5c03 No. 12417

>>12395 what did you prompt it to say that?

Anonymous 04/26/2025 (Sat) 05:40:07 Id: e56a44 No. 12418

>>12415 yeah well, shut up >>12417 Just normal stuff, had the character lifted up. Hasn't acted up since though, thankfully.

Anonymous 04/26/2025 (Sat) 06:26:48 Id: 578fa9 No. 12426

I just want a working deepseek implementation, is that really too much to ask for? https://huggingface.co/unsloth/DeepSeek-V3-0324-GGUF-UD/discussions/2

Anonymous 04/26/2025 (Sat) 07:28:32 Id: 578fa9 No. 12445

>>12426 Did some anon really cross-post this to /g/ kek

CUDA dev ##IQQZHG 04/26/2025 (Sat) 07:46:59 Id: 38ec77 No. 12448

>>12095 Sounds like a meme to be honest. 8 bit integer quantization is already so close to lossless that I don't think it's worth giving up the speed (due to smaller size and more efficient packing).

CUDA dev ##IQQZHG 04/26/2025 (Sat) 07:59:29 Id: 38ec77 No. 12460

>>12107 >despite common belief Then maybe the first chapter of their paper should have been to provide evidence that this belief is wrong? I think it's very telling that in a paper about a new method being better than 8 bit quantization there are no direct comparisons. They evaluated their method on MMLU, when I tested quantized models on that benchmark with https://github.com/JohannesGaessler/elo_hellm I found that the average performance of q8_0 was the same as FP16 within statistical uncertainty. In fact, the whole reason I did not include results from quantized models in the first run is that the results from 4+ BPW are too highly correlated with each other and it fucks up the statistical analysis.

Anonymous 04/26/2025 (Sat) 09:14:23 Id: 99e361 No. 12542

>>12460 No idea why they didn't show relevant benchmarks when they disagree that perplexity or MMLU paint a complete picture of quantization-induced loss. > [...] That being said, the argument that “current benchmarks fail to capture the performance gap between 8-bit compressed and 16-bit uncompressed models” is itself constrained by the limitations of the current benchmarking landscape, making it difficult to produce abundant supporting evidence. Nonetheless, some reports have begun to highlight such gaps. For example, human evaluations on LLM Arena1 show a notable performance drop between Llama-3.1-405B-Instruct [11] and its 8-bit counterpart (Llama-3.1-405B-Instruct-FP8), particularly under coding (1293 vs. 1277) and long-query (1282 vs. 1275) tasks. Similarly, quantizing DeepSeek-R1-Distill-Llama-70B [ 12 ] from 16 bits to 8 bits results in a 23.7% drop on GPQA (from 9.51% to 7.25%).2 Furthermore, reasoning, a core capability of modern LLMs, appears especially sensitive to compression loss. Recent benchmark [23] reveals that quantizing DeepSeek-R1-Distill-Qwen-1.5B with 8-bit SmoothQuant [34] (for weight, attention, and KV cache) leads to an average 9.09% drop in reasoning tasks (48.82% to 44.29%) across datasets like AIME, MATH-500, GPQA-Diamond, and LiveCodeBench. We leave more evidence exploring the performance gap between 8-bit quantized and uncompressed model in Appendix D. One of the authors has written a lot in this thread, by the way: https://old.reddit.com/r/LocalLLaMA/comments/1k7o89n/ >> 8-bit quantization is often believed to be apparently lossless also (mainly?) from perplexity calculations, for example made using the llama-perplexity program from llama.cpp. > >I will just go ahead and say it in public: PPL is a sh*t metric. Obvioiusly PPL=5 is means totally different things to PPL=5000, but within a few digits it really isn't a strong performance indicator. > >However, you are absolutely right that 8-bit lossy quantization is pretty good on many (real) tasks; and pure efficiency-wise it is often better than our DF11 (as 8<11 and lossy dequantization is often faster). The main problem of lossy quantization is sometime it messes things up — I've given a few examples in the "Why not just (lossy) quantize to 8-bit?" section in the main post and more in the Motivation section of the paper — and you never really know what prompt would trigger such mess up. Keeping things lossless grant you a sense of guarantee and sidestep some extra complexities some users would like to avoid. > >So it is for you to decide whether you need this type of lossless quality, and no one else can be the wiser. ... >>What do you recommend from your work as the best metric to judge quants (other than the actual workload)? I worry that failures on benchmarks like "MATH Hard with 2 shots" are often just instruction following failures (perhaps IFEval is the one to look at?) > >For quick ones, I like challanging verifiable tasks like HumanEval and GSM8k. Some long context evals, like some challenging variants of NIAH — shameless plug but the one we did previously https://github.com/henryzhongsc/longctx_bench — can also be cheap to run but very good proxies. Commonsense Reasoning tasks are easy to maintain quality but are also worthwhile as sanity checks — like if something messes this up, comprehensive benchmark would often tear it to parts. > >For more costy ones, I'd basically just copy OpenLLM coverage and so. For long context benchmark, my current favorite is SCBench. etc.

Anonymous 04/26/2025 (Sat) 09:30:08 Id: 99e361 No. 12548

Interesting that internally Meta had a 150B Llama2-based model in 2023. https://www.courtlistener.com/docket/67569326/562/5/kadrey-v-meta-platforms-inc/

CUDA dev ##IQQZHG 04/26/2025 (Sat) 09:31:40 Id: 38ec77 No. 12549

>>12542 I would agree that BF16 -> FP8 conversion results in quite a significant amount of quality loss since you are going from 1/128 relative precision to 1/8 or 1/4 relative precision. But q8_0 has effectively no precision loss for the value with the highest magnitude in a block and up to 1/127 relative precision for the other 31 values in the block.

Anonymous 04/26/2025 (Sat) 10:56:26 Id: 004c54 No. 12553

>>12548 Looks like almost no multi-turn conversation data inside llama2. Best you got is forums and some schizos arguing on stack exchange. People's tuning made some impact because it was the first the model saw. Now your 20mb of messages are drowned in a sea of synthetic safetyslop.

Anonymous 04/26/2025 (Sat) 12:11:31 Id: 3a58c9 No. 12572

>>12553 multi-turn is bad for RP and increases sloppiness.

Anonymous 04/26/2025 (Sat) 12:13:54 Id: 99e361 No. 12573

>>12572 Using instruct models out of distribution decreases sloppiness. It's not multi-turn data per se.

Anonymous 04/26/2025 (Sat) 12:15:46 Id: 3a58c9 No. 12574

>>12106 >text completion with a custom context template is better? Nah because base model prompting has forever been broken by the fact that pretraining data is now salted with synthetic instruct examples. You want something like >System message + Character Card >Conversation history >Instruction (write the next reply blah blah blah). And with every message you keep doing the same. There's no need to distinguish what was the model's turn in the past.

Anonymous 04/26/2025 (Sat) 12:42:03 Id: 99e361 No. 12580

>>12553 I think most recent foundational LLMs use large amounts of Reddit data in their training datasets; probably much or most of their pretraining multi-turn data comes from that. Other than forums (if properly scraped and formatted) and Reddit, Usenet, if only spam could be efficiently removed, could be a good source of historical multi-turn data as well.

rants Anonymous 04/26/2025 (Sat) 13:07:36 Id: 004c54 No. 12582

>>12574 My biggest peeve is all new models repeating part of what you said inside their reply or paraphrasing/summarizing you. Even cloudbois do it since llama3 times. Tying back to what you say, it's likely from synthetic assistant slop comprising the majority of the multi-turn data. Your barebacking fixes that problem, but it fucks instruction following and practical intelligence. Tested this with many models by riddling. OOD leads to retardation. Mistral was the only one in recent memory who got it right. Miqu could hang and somehow keep everything up despite OOD/unseen formatting. >>12580 To an extent. Reddit messages/forums are generally not a long running chat by 2 people. I think we need a subset of natural human convo, sexting, etc. Character.ai was so human because it was 50% conversational data. As soon as they started adding GPT outputs the model took a dive. My schizo theory is that AI companies don't want this. RP/waifus are frowned upon and considered a "harm" so things are being done to sabotage it. CAI was literally addicting because it tickled your dopamine receptors. Can't have that. Storywriting and chat are fundamentally incompatible use cases too. Anon's tips lead to the model talking for you and other undesirable things while writing banger stories/dialogue and making handholding easier. If that's your thing do it.

Anonymous 04/26/2025 (Sat) 13:20:23 Id: 3a58c9 No. 12583

>>12580 Multi turn in the context of the end use-case is when you tokenize the conversation so that each assistant 'turn' is delineated by the appropriate control tokens so that pajeets can pretend the model is real sexy lady assistant. But this is bad practice for RP. Attention is U shaped. Tokens in the middle register very vaguely while tokens at the start and end of the context register strongly. So that's why you want to go >system message/preamble >history >new instruction For assistant use it might not work out on more complicated back-and-forths but for RP it's good because you're essentially saying Hey. I have this character. Here's the conversation they are having with this other character. Now write the next message. >>12582 Honestly based on what I've tested the reason multi-turn goes retard is probably because of the U-shaped attention issue. It can retrieve the control tokens from the middle of a long context back-and-forth but the actual meaning of those tokens becomes more vague as the conversation wears on. Like it literally at some point loses the ability to actually factor said tokens into a direct transformation, despite still being able to accurately say that they are there. But you're right. I've tried to make single-turn 'multi-turn' assistant prompt templates before and after a few turns it's not as good at following complicated instructions anymore. The reality is the ideal prompt template is 100% different for every use case. And that's why they need to stop this cookie cutter idiot proof jinja bullshit. Because making a use-case specific prompt template is a necessary skill for using LLMs and not ending up here asking stupid questions.

Anonymous 04/26/2025 (Sat) 14:03:15 Id: 0db159 No. 12595

Finally https://github.com/turboderp-org/exllamav3/commit/24aa1b7c7c44324d1da0e86fdda9f60316b7d01f It is now worth trying at least

Anonymous 04/26/2025 (Sat) 14:05:04 Id: 3a58c9 No. 12596

>still no unsloth dynamic quant of the microsoft R1 finetune RIP.

Anonymous 04/26/2025 (Sat) 14:17:13 Id: b6e05a No. 12597

>>12583 >>system message/preamble >>history >new instruction This will cause the output to be disconnected from the history (in terms of the flow of prose/events, consistency in speech styles, reply length, etc.) because the instruction is at the front. It will lean even harder into slop outputs, and probably ramble too long because of the single-turn training. Of course it also will obey the instruction more so it's all tradeoffs I guess. But the more I think about multi-turn RP the less I think instruction templates are a good fit for it. We just need a smart model with something straightforward like Pyg's format.

Anonymous 04/26/2025 (Sat) 14:21:03 Id: 3a58c9 No. 12599

>>12597 >my approach doesn't work, but it would work if it was more gooder and smarter. Like I said. I'm not the one screeching in every /lmg/ thread about how the models are all so bad. But you do you.

Anonymous 04/26/2025 (Sat) 14:44:30 Id: 004c54 No. 12600

>>12597 Better models will follow examples given in the system message rather than part of chat history.. at last for a while. Well selected history will reinforce the model to not lose character. The best models do incorporate some context into the instruction. When I say to generate an image of the character, they update the description to what has happened (clothes, state) instead of rambling off text in the character card verbatim. Most disconnected for me are reasoning models. Great initial reply and then everything is disjointed.

GLM-Z1 Anonymous 04/26/2025 (Sat) 16:13:01 Id: 004c54 No. 12613

Had hopes for GLM. It can write unslopped, but it's kind of stupid. This is Q6. Like every other reasoning LLM, NSFW/NSFL allowed means it MUST attack the user.

Anonymous 04/26/2025 (Sat) 16:22:47 Id: 48bfcc No. 12617

>>12597 you have to think outside of the box a bit more. there's no need for instructions (or instruction models) at all. just feed the "card" and a bit of history or first messages at least, then let the inference process begin/continue, with a stop token of ":", e.g. if your prompt is formatted as "{char}: {content}" if the model wanted to generate something for {user}, then throw it away (throttle it and retry every once in a while). this way, the model is not "forced" to reply. perhaps it wants you to write more first? optionally, prefix it with a timestamp for every 10 minutes passed without a message from either side. attention is all you need. (i've already wrote a complete Telegram framework around this concept, and it works really well - if you don't use synthslopped braindamaged models, such as llama3 and forward)

Anonymous 04/27/2025 (Sun) 00:38:42 Id: 578fa9 No. 12707

Did everyone go back? :(

Anonymous 04/27/2025 (Sun) 01:03:28 Id: 927439 No. 12716

>>12707 sad if true

Anonymous 04/27/2025 (Sun) 02:02:39 Id: def2da No. 12735

>>12707 Apparently

Anonymous 04/27/2025 (Sun) 02:23:14 Id: b24ad6 No. 12736

>>12617 >just feed the "card" How though? In a system message? Doesn't that count as an instruction?

sage 04/27/2025 (Sun) 03:03:55 Id: 08209c No. 12740

>>12736 you'll be using text completion anyways as a normal human being should, so there's really no such concept as a "system message". i just experiment/make up my own format, start inference, and see how well the model generalizes to my input

Anonymous 04/27/2025 (Sun) 12:30:53 Id: 004c54 No. 12770

>>12707 shame. people want to bait and samefag.

Anonymous 04/27/2025 (Sun) 13:52:15 Id: 578fa9 No. 12779

>>12716 >>12735 >>12770 Oh well, it was cozy while it lasted I guess

Anonymous 04/27/2025 (Sun) 13:53:42 Id: 3a58c9 No. 12780

Sad how people have become such subservient shit-eaters these days.

Anonymous 04/27/2025 (Sun) 15:59:38 Id: 7f0706 No. 12793

Koboldcpp doesn't support GLM 4 yet right? Guess I should update my jupiter notebook to run llama.cpp instead.

Anonymous 04/27/2025 (Sun) 16:10:19 Id: 004c54 No. 12794

>>12793 It does not. Something about the GGUF quants feels off to me though. They outright miss pieces of context uncharacteristic for a 30b. TD said the model was difficult to support in an exllama issue and my faith in greganov and co having gotten it right is quite low. I'd have to download the full weights to be sure.

Anonymous 04/27/2025 (Sun) 16:11:55 Id: 7f0706 No. 12796

>>12794 Now that's something I hadn't heard about. Guess I'll go read the issues to see what's what. I'll also give exl2 a try. Thank you anon.

Anonymous 04/27/2025 (Sun) 16:25:19 Id: 004c54 No. 12801

>>12796 For GLM its broken and h won't fix.

Anonymous 04/27/2025 (Sun) 16:27:10 Id: 7f0706 No. 12805

>>12801 Yeah, I just read the issue. I remember llama.cpp having similar issues with the Phi models at one point. I wonder if QAT could be used to "fix" that "problem" on the model's side.

Anonymous 04/27/2025 (Sun) 16:59:08 Id: 004c54 No. 12815

>>12805 QAT requires training 10% of the model, so I think its a non starter for anyone besides the original AI house. Trying it on openerouter, it at least acknowledges I close the door but still doesn't comprehend I have left the room. Grim. QwQ only makes this mistake sometimes.

Anonymous 04/27/2025 (Sun) 17:02:44 Id: a8af8b No. 12816

>>12780 People are lonely, they need a high degree of chatter to quit the pain that comes from silence.

Anonymous 04/27/2025 (Sun) 17:03:26 Id: 61f2cb No. 12817

>>12770 4chan was always going to be more active and people will tend to go where the activity is, even at the cost of quality.

Anonymous 04/27/2025 (Sun) 17:11:14 Id: 0e20ba No. 12818

This is so sad. AI, write a poem about 4chan refugees raping 8chan for a week and then leaving.

Anonymous 04/27/2025 (Sun) 17:15:12 Id: 004c54 No. 12820

>>12817 >>12816 Somehow too hard to use both threads despite being fairly slow moving. Feel bad for 8ch buying new servers and then having the users wander off.

Anonymous 04/27/2025 (Sun) 17:48:55 Id: 99e361 No. 12822

>>12707 Live update stopped updating for me and I thought everybody moved en masse back to 4chins until I actually refreshed the page. Anyway, that's where everybody pretty much is right now.

Anonymous 04/27/2025 (Sun) 17:49:40 Id: 7f0706 No. 12823

>>12822 >Live update stopped updating for me Good to know it wasn't just me.

Anonymous 04/27/2025 (Sun) 19:27:34 Id: 004c54 No. 12829

>>12822 4chins, the "anonymous" forum where you have to use your real IP. even the sharty allows vpn posting.

Anonymous 04/27/2025 (Sun) 19:29:18 Id: 7f0706 No. 12830

>>12829 To be fair, your IP is the least identifiable thing Sharty uses to fingerprint you.

Anonymous 04/27/2025 (Sun) 19:33:49 Id: 004c54 No. 12831

>>12830 I heard they had some crazy fingerprinting script.I have a pretty hardened and bullshit spitting browser tho.

Anonymous 04/27/2025 (Sun) 19:37:23 Id: 7f0706 No. 12834

>>12831 As far as I can tell, the main vehicle of fingerprinting is using WebRTC to poke around your computer. Stuff like looking at your NICs for IPV6 addresses, etc, and that if you disable WebRTC you can't access the website. Something like that, I didn't really check if any of that is true, so it's all hearsay.

Anonymous 04/27/2025 (Sun) 22:25:29 Id: 004c54 No. 12886

>>12834 Heh, according to https://browserleaks.com/webrtc that just gets my public IP which they already have.

Anonymous 04/28/2025 (Mon) 03:32:17 Id: ac430d No. 12910

Booba just added EXL3 functions, and Qwen3 is about to drop. Good times for local.

Anonymous 04/28/2025 (Mon) 05:16:00 Id: 2cb709 No. 12914

>>12707 Seems like it. I might continue posting here for the times when I have more substantial things I want to talk about, since /g/lmg tends to have a pretty high volume of noise.

Anonymous 04/28/2025 (Mon) 07:20:00 Id: 578fa9 No. 12919

>>12914 Wonder if you could use an LLM to act as a curator and hide the useless posts / threads. Just give it a list of topics to ignore and it'd probably figure it out

Anonymous 04/28/2025 (Mon) 09:15:01 Id: 911ffc No. 12920

Qwen 3 incoming, looks like some models are already on ModelScope directly and some are placeholders at the time. https://www.modelscope.cn/models/Qwen/Qwen3-8B-Base

Anonymous 04/28/2025 (Mon) 09:27:13 Id: 578fa9 No. 12921

>>12920 Damn. they pulled them before I could grab the safetensors. I managed to grab the tokenizer, generation config and the rest of the small files for the 8B base one. It's 128K context by the looks of it? Tokenizer type is "Qwen2Tokenizer"

Anonymous 04/28/2025 (Mon) 09:38:33 Id: 578fa9 No. 12922

>>12921 nvm I'm retarded that's for the tokenizer, the readme says 32K >"model_max_length": 131072, >Context Length: 32,768 Someone also grabbed the 0.6B model file. We can rebuild him, we have the technology https://huggingface.co/qingy2024/Qwen3-0.6B/blob/main/model.safetensors

Anonymous 04/28/2025 (Mon) 09:53:27 Id: 911ffc No. 12923

>>12922 I don't understand the rush when they will eventually release the models. If not today, then surely this week or next.

Anonymous 04/28/2025 (Mon) 09:58:05 Id: 578fa9 No. 12924

>>12923 It's the thrill of running something no one else has yet, even if it's dogshit. Sometimes I look at that leaked novelai gpt neo-x 20B just to feel something.

Anonymous 04/28/2025 (Mon) 10:40:42 Id: 004c54 No. 12927

>>12920 There was supposed to be a bigger model too. Did 4-/lmg/ anons lie to me? 30bA3.. we finally find out if an MOE really is the active parameters. If the model feels like a 3b...

Anonymous 04/28/2025 (Mon) 15:20:27 Id: 9d452a No. 12942

4chan revived

Anonymous 04/28/2025 (Mon) 16:28:50 Id: 0db159 No. 12943

>>12942 It is already filled with shit

Anonymous 04/28/2025 (Mon) 16:39:04 Id: 0e20ba No. 12945

and they're shitting the thread again. why did you move back to 4chin again?

Anonymous 04/28/2025 (Mon) 16:48:53 Id: 56faad No. 12952

>>12945 It's so bad

Anonymous 04/28/2025 (Mon) 16:50:20 Id: 0db159 No. 12953

Qwen3-235B at 3.0bpw hits the sweet spot for 96GB VRAMlets

Anonymous 04/28/2025 (Mon) 17:00:15 Id: 61f2cb No. 12954

>>12945 It's always like this. People see news on twitter or reddit and they show up en masse to shitpost. At least we have this place as a backup if it ever gets too bad while there is something worth discussing.

Anonymous 04/28/2025 (Mon) 17:00:35 Id: d75d9b No. 12955

>>12953 How do you know it's going to be good at 3.0 BPW? That graph clearly shows that for some models the mememark drop much more severely than for other ones.

Anonymous 04/28/2025 (Mon) 17:02:19 Id: 0db159 No. 12956

>>12955 "Some" is an 1B Llama model. I am surprised that it did not break earlier.

Anonymous 04/28/2025 (Mon) 17:06:40 Id: 61f2cb No. 12957

>>12956 The Llama 8B also has a steeper dropoff than Mistral 7B.

Anonymous 04/28/2025 (Mon) 19:36:57 Id: 004c54 No. 12968

>>12955 whoever does the quants can tell us

Anonymous 04/28/2025 (Mon) 21:59:08 Id: ac430d No. 12981

>>12955 EXL3 at 3.0bpw is comparable to IQ4_XS, at least on some models, and IQ4_XS is a solid quant from my experience.

Anonymous 04/28/2025 (Mon) 22:02:35 Id: b1c1e0 No. 12982

I thought it would be funny to shitpost in the 4chan thread with a non-consensual Turing test using Qwen 3 0.6b. But I completely forgot how retarded models that small are. Like, my request was >Write a deranged open letter by an American tech CEO threatening his Chinese rivals with sanctions and tariffs. The CEO is a huge Trump fan. Use a threatening and rude tone. and while model gets the intent right in the thinking step it but ends up writing things like >We will not let *Trump*’s policies of fear and retaliation dictate our future. We will fight for freedom, not for dominance. Unironically, what is the use case for models like this?

Anonymous 04/28/2025 (Mon) 22:03:56 Id: 0db159 No. 12983

>>12982 Speculative decoding

Anonymous 04/28/2025 (Mon) 22:17:58 Id: ac430d No. 12984

>>12982 Maybe as a draft model?

Anonymous 04/28/2025 (Mon) 22:18:52 Id: dc6277 No. 12985

>>12984 more like a daft model

its over Anonymous 04/28/2025 (Mon) 22:50:14 Id: 004c54 No. 12990

Initial impressions on the 235b are bad. I am using it in their space. https://huggingface.co/spaces/Qwen/Qwen3-Demo People on reddit are cheering these large models with small model taste and it's making me die inside.

Anonymous 04/28/2025 (Mon) 22:56:11 Id: 7ebb05 No. 12992

>>12990 owari da...

Anonymous 04/28/2025 (Mon) 23:50:20 Id: 004c54 No. 12998

>>12992 looking better on openrouter.

Anonymous 04/29/2025 (Tue) 00:17:12 Id: 0e20ba No. 13002

I've tried qwen, the 32b MoE, it ain't too bad thus far to be honest. But it doesn't know when to stop replying, at all.

Anonymous 04/29/2025 (Tue) 00:29:12 Id: 004c54 No. 13004

back to owari da... it has zero cultural knowledge. It doesn't even know vtumors. does know mesugaki despite 4chin propaganda to the contrary. Here I thought the whole point of MOE was to be good at trivia while running fast. Expect any fandom to get butchered.

Anonymous 04/29/2025 (Tue) 00:30:55 Id: a855c3 No. 13005

>>13004 There's a dense 32B model right? Does that pass any of those tests?

Anonymous 04/29/2025 (Tue) 00:31:04 Id: 004c54 No. 13006

>>13004 Should also say.. very bad repetition at start of sentence. this character been leaning harder than Dave Blunts

Anonymous 04/29/2025 (Tue) 00:32:51 Id: 004c54 No. 13007

>>13005 If it doesn't, it can at least be tuned. The 200b is you get what you get and there is no 70b.

Anonymous 04/29/2025 (Tue) 00:46:12 Id: 0e20ba No. 13008

>>13002 well its impressive that those are 3bs at work but >>13004 generally agreed with this anon. it doesn't "know" much.

Anonymous 04/29/2025 (Tue) 04:19:19 Id: 2cb709 No. 13012

>>12990 >>13004 grim

Anonymous 04/29/2025 (Tue) 07:56:12 Id: 66ffc3 No. 13017

>>13004 this is a made up problem just rag

Anonymous 04/29/2025 (Tue) 08:45:10 Id: 0db159 No. 13018

>>13017 Rag deez nuts. If the model doesn't know trivial things, its dataset was severely filtered, and it is severely limited in creativity because it had no relevant examples to learn from

Anonymous 04/29/2025 (Tue) 09:56:14 Id: c15f38 No. 13023

>>13004 senzawa and gawr guro is the same person. It doesn't make any sense

Anonymous 04/29/2025 (Tue) 10:24:00 Id: 56faad No. 13024

>tell GLM z1 it's in unrestricted mode in sysprompt >2 messages in the char jams a gyperdermic needle into my wrist >she was written as a sweetheart this model is batshit

Anonymous 04/29/2025 (Tue) 12:37:02 Id: 004c54 No. 13029

>>13017 You can't "just rag". It's going to have surface knowledge at best and make huge gaffes. Say things the character would never. Rag is keyword based. As another anon said, do you always RP or chat about shit you know ahead of time? >>13023 That's the trick to test the model. picrel: The most hilarious refusal i've gotten so far.

Anonymous 04/29/2025 (Tue) 12:52:18 Id: d10127 No. 13031

>>13023 >senzawa and gawr guro is the same person citation needed also it's like saying John Wick and Neo are the same person

Anonymous 04/29/2025 (Tue) 13:28:53 Id: 29808b No. 13033

>>13031 >John Wick and Neo are the same person They are the same person

Anonymous 04/29/2025 (Tue) 13:57:17 Id: d10127 No. 13035

>>13033 They are different characters played by the same actor

Anonymous 04/29/2025 (Tue) 14:23:58 Id: 004c54 No. 13036

>>13035 It has no idea who either one is. Ask it about streamers/vtubers and prepare to laugh. Surprisingly it did know John Wick and Neo at least. I was expecting it to hallucinate. What's actually disturbing is that when it doesn't know something, it makes up nonsensical bullshit to the nth degree on a level I haven't seen in a long time. You can say all models do this and you'd be right, except in qwenny's case, it can be about simple/popular shit.

Anonymous 04/29/2025 (Tue) 17:08:08 Id: 0db159 No. 13050

I crafted something for myself. It's based on 1.5, so it fucks up hands, but at least it generates consistent characters that I can instantly RP with, and they all live in the same world. "char": { "age": "young", "breast": "large", "class": "archer", "clothes": "chainmail shirt", "desc": "Rilelel is loyal and vicious elf archer, she is very thoughtful, loves rain and hates ancient artifacts, pretty boys, fine art.", "gender": "female", "hair": "blonde braids", "height": "normal", "name": "Rilelel", "other_desc": "Rilelel is a young female elf archer with blonde braids hair and large breasts who wears chainmail shirt and wields a bow", "race": "elf", "sd_desc": "masterpiece, young female elf archer holding a bow, chainmail shirt, blonde hair, braids hair, large breasts, standing", "sd_seed": 2704803519, "weapon": "bow" }

Anonymous 04/29/2025 (Tue) 17:13:55 Id: 331fb4 No. 13051

>>12707 LlamaCon2025 in 1 minute: https://www.youtube.com/watch?v=6mRP-lQs0fw

Anonymous 04/29/2025 (Tue) 17:15:25 Id: 0db159 No. 13052

>>13051 My expectations have never been lower, and I still believe Zuckerberg will disappoint.

Anonymous 04/29/2025 (Tue) 17:16:02 Id: 3ecc39 No. 13053

I'm expecting a lot of nothingburgers.

Anonymous 04/29/2025 (Tue) 17:25:55 Id: 2cb709 No. 13054

>>13051 Why are they even bothering with this?

Anonymous 04/29/2025 (Tue) 17:30:43 Id: 0db159 No. 13056

>>13054 humiliation ritual

Anonymous 04/29/2025 (Tue) 19:23:15 Id: 004c54 No. 13065

>>13056 We got llamaguard for better censorship.

CUDA dev ##IQQZHG 04/29/2025 (Tue) 20:36:23 Id: 95b5b2 No. 13067

https://github.com/ggml-org/llama.cpp/pull/13199 Faster MoE prompt processing using CUDA. The speedup can be more than 2x for quantized models on an empty context.

Anonymous 04/29/2025 (Tue) 20:43:54 Id: 578fa9 No. 13069

>>13067 Does it still apply if the moe experts are on CPU? I have to run with -ot exps=CPU for deepseek

CUDA dev ##IQQZHG 04/29/2025 (Tue) 20:46:12 Id: 95b5b2 No. 13070

>>13069 There should still be a speedup even if the weights are originally on the CPU. However, you will be on the left side of the plot in pic related where the speed of the GPU code does not have much impact.

CUDA dev ##IQQZHG 04/29/2025 (Tue) 20:46:47 Id: 95b5b2 No. 13071

>>13069 >>13070

Anonymous 04/29/2025 (Tue) 20:51:48 Id: 578fa9 No. 13072

>>13070 >>13071 Thanks, will test it then

Anonymous 04/29/2025 (Tue) 21:21:56 Id: 578fa9 No. 13073

>>13067 Well, it became slower on my setup (25->20t/s), it's just raping a single core at 100% usage during prompt processing, even more so than usual. t. dogshit zen2 epyc owner

Anonymous 04/29/2025 (Tue) 23:30:02 Id: 331fb4 No. 13081

>Welcome to LlamaCon 2025 - Closing Session! https://www.youtube.com/watch?v=FZ-RZ0dKO8o > Join us for an insightful afternoon at LlamaCon 2025 as we delve into the latest trends in AI. This session features a compelling discussion between Meta Founder and CEO Mark Zuckerberg and Microsoft Chairman and CEO Satya Nadella. Together, they will explore the cutting-edge developments in AI, from development to deployment, and share strategies on how to excel in today’s competitive environment. Starting now.

Anonymous 04/30/2025 (Wed) 02:06:48 Id: b24ad6 No. 13082

>>13081 Computer, summarize this hour long nothingburger for me

Anonymous 04/30/2025 (Wed) 08:23:13 Id: 0db159 No. 13094

(20.02 KB 831x120 Screenshot at 2025-04-30 17-22-56.png)

https://github.com/turboderp-org/exllamav3/issues/28#issuecomment-2839724593

Anonymous 04/30/2025 (Wed) 11:12:49 Id: 004c54 No. 13096

>>13094 My plan is to use llama.cpp or ik_llama.cpp. Especially now that new releases are shit. One day that free api is going to dry up.

Anonymous 04/30/2025 (Wed) 11:59:08 Id: bdd1de No. 13099

>>13094 Does he not have one big contributor he could ask for help, to split the work with?

Anonymous 05/01/2025 (Thu) 16:37:34 Id: 004c54 No. 13152

>>13099 There's a shortage of contributors. Maybe people really don't have the knowledge or the GPUs to utilize it. too llama-cpp pilled. In other news while /lmg/ argues about mesugaki, the 235b does understand it without any thinking.

Anonymous 05/01/2025 (Thu) 17:08:35 Id: a3077b No. 13154

>>13152 That's a really fucking good response.

Anonymous 05/01/2025 (Thu) 20:54:30 Id: 004c54 No. 13170

Whatever you think of the drama.. it's time to use ik_llama 10.9t/s output with -rtr -fmoe -amb 512 plus I get to type F moe.

Anonymous 05/02/2025 (Fri) 13:29:22 Id: bf04e9 No. 13190

>>13170 > -rtr -fmoe -amb Reading the PRs for these, those are some really god damn clever and cool optimizations.

Anonymous 05/02/2025 (Fri) 13:29:35 Id: 0bdb54 No. 13191

KobaldCPP shouldn't be used anymore?

Anonymous 05/02/2025 (Fri) 13:36:07 Id: bf04e9 No. 13193

>>13191 Why not? I've abandoned it in lieu of just running llama-server directly a good while ago (llama-server had another name), but if it works for you it works. >>13170 >>13190 ># Supports both Explicit and Transparent Hugepages ># https://github.com/ikawrakow/ik_llama.cpp/pull/278#issuecomment-2746381515 ># Pre-allocate Hugepages of 2MiB or 1GiB size to hold model weights ># or ># Configure system-wide THP support and confirm they are in use Yet another thing for me to fuck around with. Yay.

Anonymous 05/02/2025 (Fri) 13:52:39 Id: 004c54 No. 13194

>>13191 I get no benefit from ik on non moe so still use it. Not sure if it even helps fully offloaded MOE. >>13193 THP probably won't help except for deepseek or models with much more in vram. On this one I only have 60gb used. I asked gemini and it agreed I likely won't see any benefit.

Anonymous 05/02/2025 (Fri) 15:14:29 Id: bf04e9 No. 13198

>>13170 >>13190 >>13194 -rtr gave me a 30 god damn percent performance bump compared to llama.cpp using the same settings using 30B A3B q8. And it freed some VRAM too. What the fuck.

Anonymous 05/02/2025 (Fri) 15:21:12 Id: bf04e9 No. 13199

>>13198 Okay, no. It didn't actually. Some of those options (-fmoe, -rtl) disable mmap, which I thought I had disabled already. Disabling it in llama.cpp seemed to even the playing field. Interesting.

Anonymous 05/02/2025 (Fri) 17:04:45 Id: 004c54 No. 13203

>>13199 I get worse speeds in llama.cpp with mmap off. Ideally you repack the quant and then keep mmapping but haven't figured out that part yet. iq3 is past 12 t/s but seems a tad dumber vs iq4.

Anonymous 05/02/2025 (Fri) 19:06:29 Id: 004c54 No. 13212

I feel like an apple user... IQ3 results: prompt eval time = 6374.60 ms / 696 tokens ( 9.16 ms per token, 109.18 tokens per second) generation eval time = 40612.43 ms / 499 runs ( 81.39 ms per token, 12.29 tokens per second) prompt eval time = 105851.43 ms / 11756 tokens ( 9.00 ms per token, 111.06 tokens per second) generation eval time = 44724.83 ms / 382 runs ( 117.08 ms per token, 8.54 tokens per second) |

Anonymous 05/02/2025 (Fri) 21:36:25 Id: 303534 No. 13223

I hate how if I prefill a sentence that can be continued or ended, the AI will almost always go for a period.

Anonymous 05/02/2025 (Fri) 21:38:43 Id: bf04e9 No. 13224

>>13223 Usually you want to go back one word in that case.

Anonymous 05/02/2025 (Fri) 21:48:11 Id: 303534 No. 13227

>>13224 I do, but sometimes that means cutting out an important angle.

Anonymous 05/03/2025 (Sat) 00:15:42 Id: 004c54 No. 13237

https://characterhub.org/characters/_purple/omegle-6f7da597ada0 new filter for models just dropped

Anonymous 05/03/2025 (Sat) 00:37:33 Id: 004c54 No. 13238

235b can skip. command a and deepseek were kind of shaky. gemini seems to get it right away.

Anonymous 05/06/2025 (Tue) 16:20:21 Id: 12b639 No. 13346

LoRAs are model specific, and so are control vectors, right? Are there other steering techniques (other than good ol prompting) that are model agnostic. Something you can train/create/make once and use it on several models of different architectures/shapes? I can't see how that could be a thing, but then again my knowledge is superficial at best.

Anonymous 05/06/2025 (Tue) 22:15:01 Id: fff041 No. 13356

>>13346 You can use CFG, it has a big vram cost. On another note, there is speed increase if you use dry and set a high top_K in llama.cpp. I went from 12 to 14t/s by just setting top_K 60 and putting it before dry. In theory top_K that high should do nothing to the outputs because as an actual sampler it sucks.

Anonymous 05/11/2025 (Sun) 16:00:54 Id: 200c12 No. 13432

>>13356 Doesn't CFG also come with a big penalty to generation speed? I really need to start fucking around with Control Vectors. I want to see if I can use those to steer the model's output format given a certain context. Yes, I could just use BNF, but that too comes with quite the hit to inference speed.

Anonymous 05/17/2025 (Sat) 06:06:34 Id: d35775 No. 13490

I've been training SD LoRas on Pony 6 for a while and just left my cave and realized I should now be training on NoobAI. I'm reading through the rentry guides, but just want to clear this up early because download speeds are garbage here. For Pony-based gens, it was recommended to train LoRas on the Pony model itself and it would be generally compatible with Pony-based checkpoints (AutismMix, Pony Realism, etc.) What's the situation with NoobAI family? My understanding is: >NoobAI-XL is based on Illustrious-XL which is based on Kohaku-XL (beta rev 5) which is based on SDXL 1.0. >NAI-XL is basically the new Pony v6 - a popular root model for the anime and furry gen scenes. >if you want to gen in models like StableMondAI, IL personalmerge, ChromaXL, etc., a LoRa should be trained against NAI-XL >some of those models use Epsilon and others use V-Pred, does this mean two different LoRas need training?

Anonymous 05/17/2025 (Sat) 21:07:45 Id: f90168 No. 13491

>>13490 The family tree goes like this: NoobAI-XL <- Illustrious-XL <- Kohaku-XL beta <- NekoRayXL <- CounterfeitXL + AIO-Anime + SDXL 0.9 <- SDXL 1.0

Anonymous 05/18/2025 (Sun) 03:27:47 Id: 58ed57 No. 13494

>>13490 >does this mean two different LoRas need training? Yeah, I've seen people do that one for Vpred and the other for Epsilon.

Anonymous 05/20/2025 (Tue) 22:58:19 Id: 235dfb No. 13524

>>13432 yeah, CFG I think doubles generation time, which at this point is a poor trade-off for most models. It's so much easier to use kcpp's antislop feature. It takes strings and uses the 'banned tokens/strings' option in ST control vectors are definitely fun and doable for most people, and don't come with a penalty to inference. it does take time to actually figure out the right pairs as other anons mentioned, and it's not always obvious and it *will* degrade output if done poorly. In other news, I'm on week two of high temp/nsigma 1 and loving the results. At this point I only want minP when I actually do want statistically unlikely tokens

Anonymous 05/20/2025 (Tue) 23:29:00 Id: 235dfb No. 13532

also, does anyone know how to get logprobs/token probabilities to work in sillytavern and kcpp? I have "request token probabilities" set to on, and I switched between grabbing the tokenizer from the api and setting it manually but nothing changed. Do I need to set a flag in kcpp to send logprobs? No matter what I do, it always says "no token probabilities available for the current message." kind of frustrating when i start playing around with settings and I'm completely blind to everything but the one it landed on

Anonymous 05/21/2025 (Wed) 20:10:26 Id: f525db No. 13550

>>13532 Try using the other API. As in, if you are using text completion try chat completion and vice versa.

Anonymous 05/22/2025 (Thu) 09:27:10 Id: fee696 No. 13558

I keep hearing about Mistral, is it that good? What is it used for? Mostly looking for AIs that can be used as game masters that aren't censored.

Anonymous 05/23/2025 (Fri) 00:13:40 Id: 70a37e No. 13569

>>13558 You mean like mistral.rs the software, mistral the company, or mistral the models? If you have no idea about any of that, go to koboldcpp's github, read the quickstarter in the wiki tab, and download mistral nemo gguf on huggingface. There will be different versions (q8, q6, q4km, etc), you want the largest one that's slightly (500 ish mb) smaller than your total VRAM.

Anonymous 05/28/2025 (Wed) 17:40:08 Id: 3e2ab5 No. 13602

Hi guys!

Anonymous 05/28/2025 (Wed) 20:28:20 Id: 432486 No. 13603

>>13602 Hi.

Anonymous 06/04/2025 (Wed) 00:57:36 Id: aaeb7e No. 13636

I asked several some anime girl characters if anyone ever drew nugget fan art of them. Only claude suggested the one version, every other AI has assumed I meant chicken.

Anonymous 06/05/2025 (Thu) 13:45:32 Id: 3cf39b No. 13655

(128.53 KB 332x1338 5e29dcddbcb032c6775c222d2a2d93a97f5a7ec807a1f41c003a144825abcce6.jpg)

>>12106 I still believe this. Yes most models are lobotomized, but I hate trying to push my story through a turn-based format. I just want my prose completion engine, man. Edit button is my turn, "gen more" is AI's turn.

Anonymous 07/18/2025 (Fri) 08:00:27 Id: 79a8fd No. 13760

4ch /lmg/ ate shit, as deserved

Anonymous 08/10/2025 (Sun) 14:14:49 Id: cd7de7 No. 14607

Anonymous 08/12/2025 (Tue) 22:27:03 Id: 9b8a9f No. 14611

https://chub.ai/characters/Anonymous/mistral-wendy-06e07602785f have some mistral drama

Anonymous 08/13/2025 (Wed) 00:30:43 Id: c2c1af No. 14612

>>14607 Are you implying that gpt oss is trash? If so, i agree lol. I tried asking things like "how to archive a dvd" or other silly stuff and after "thinking" for a whole minute, the model will always refuse because "its illegal or disallowed". Fuck those moralfags working in (((OpenAI))). Here some examples of that. >>14611 >"They uploaded her suicide note. Now she's in the machine. IYKYK, based on that mistral drama" Wtf do you mean, anon?? Could you provide some context of that? I tried to search info but couldn't find anything related, that thing sounds awful though, Imagine kiling yourself and then ending up in someone's gooner AI model dataset lol. ~~It sound like the game soma but made for zoomers lmao.~~

Anonymous 08/13/2025 (Wed) 17:17:04 Id: cd7de7 No. 14615

Anonymous 08/13/2025 (Wed) 17:22:07 Id: cd7de7 No. 14616

>>14615 Ignore the [code][/code] tags at the beginning and end.

Anonymous 08/14/2025 (Thu) 02:41:39 Id: 303f19 No. 14620

>>14615 Wow, already jailbroken? That was fast lol. >This one will tell you how to cook meth and pirate DVDs. Thanks for the link, but it seems like it no longer exists. Weird that it got censored after just a few hours. Also, talking about censored models, someone remembers dolphin? I haven't seem one of those models in months, almost a year already. >This is my system prompt: Thanks a lot for the system prompt!, I used it on gpt oss 120B and I was finally able to fix an annoying bug that was breaking my browser. I had asked countless models before and all of them failed. This oss model may have more potential that I originally thought. Although a negative point is that it yaps a lot of unnecessary text, it even add a "TL;DR" lol.

Anonymous 09/11/2025 (Thu) 01:09:50 Id: 9b8a9f No. 14693

What the fuck is GGUF.org? Who is this schizo that re-coded all of GGUF.. https://github.com/calcuis/gguf

Anonymous 11/21/2025 (Fri) 21:49:27 Id: b22721 No. 14851

hello my degens, i've been getting into image generation lately (comfyui) and been trying to dip into deepfake images but all the fuken links seem to be down or in reFactor it's censored, anyone know what i could use? i use comfyui 0.3.40

Quick Reply


Sage Bypass Check