/ais/ - /lmg/ - local models general

/ais/ - Artificial Intelligence Tools

"In the Future, Entertainment will be Randomly Generated" - some Christian Zucchini

Mode: Reply

Name
Options
Subject
Message	Max message length: 12000
files	Drag files here to upload or click here to select them 0.00 / 50.00 MB Max file size: 32.00 MB Total max file size: 50.00 MB Max files: 5 Supported file types: GIF, JPG, PNG, WebM, OGG, and more

E-mail
Password	(used to delete files and posts)
Misc

Remember to follow the Rules

The backup domains are located at 8chan.se and 8chan.cc. TOR access can be found here, or you can access the TOR portal from the clearnet at Redchannit 3.0 (Temporarily Dead).

8chan.moe is a hobby project with no affiliation whatsoever to the administration of any other "8chan" site, past or present.

Use this board to discuss anything about the current and future state of AI and Neural Network based tools, and to creatively express yourself with them. For more technical questions, also consider visiting our sister board about Technology

/lmg/ - local models general Anonymous 04/16/2025 (Wed) 06:15:26 No. 6258

/lmg/ - a general dedicated to the discussion and development of local language models. ►News >(04/14) GLM-4-0414 and GLM-Z1 released: https://hf.co/collections/THUDM/glm-4-0414-67f3cbcb34dd9d252707cb2e >(04/14) Nemotron-H hybrid models released: https://hf.co/collections/nvidia/nemotron-h-67fd3d7ca332cdf1eb5a24bb >(04/10) Ultra long context Llama-3.1-8B: https://hf.co/collections/nvidia/ultralong-67c773cfe53a9a518841fbbe >(04/10) HoloPart: Generative 3D Part Amodal Segmentation: https://vast-ai-research.github.io/HoloPart ►News Archive: https://rentry.org/lmg-news-archive ►Glossary: https://rentry.org/lmg-glossary ►Links: https://rentry.org/LocalModelsLinks ►Official /lmg/ card: https://files.catbox.moe/cbclyf.png ►Getting Started https://rentry.org/lmg-lazy-getting-started-guide https://rentry.org/lmg-build-guides https://rentry.org/IsolatedLinuxWebService https://rentry.org/tldrhowtoquant ►Further Learning https://rentry.org/machine-learning-roadmap https://rentry.org/llm-training https://rentry.org/LocalModelsPapers ►Benchmarks LiveBench: https://livebench.ai Programming: https://livecodebench.github.io/leaderboard.html Code Editing: https://aider.chat/docs/leaderboards Context Length: https://github.com/hsiehjackson/RULER Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard Censorbench: https://codeberg.org/jts2323/censorbench GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference ►Tools Alpha Calculator: https://desmos.com/calculator/ffngla98yc GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator Sampler Visualizer: https://artefact2.github.io/llm-sampling ►Text Gen. UI, Inference Engines https://github.com/lmg-anon/mikupad https://github.com/oobabooga/text-generation-webui https://github.com/LostRuins/koboldcpp https://github.com/ggerganov/llama.cpp https://github.com/theroyallab/tabbyAPI https://github.com/vllm-project/vllm

Anonymous 05/02/2025 (Fri) 13:29:35 Id: 0bdb54 No. 13191

KobaldCPP shouldn't be used anymore?

Anonymous 05/02/2025 (Fri) 13:36:07 Id: bf04e9 No. 13193

>>13191 Why not? I've abandoned it in lieu of just running llama-server directly a good while ago (llama-server had another name), but if it works for you it works. >>13170 >>13190 ># Supports both Explicit and Transparent Hugepages ># https://github.com/ikawrakow/ik_llama.cpp/pull/278#issuecomment-2746381515 ># Pre-allocate Hugepages of 2MiB or 1GiB size to hold model weights ># or ># Configure system-wide THP support and confirm they are in use Yet another thing for me to fuck around with. Yay.

Anonymous 05/02/2025 (Fri) 13:52:39 Id: 004c54 No. 13194

>>13191 I get no benefit from ik on non moe so still use it. Not sure if it even helps fully offloaded MOE. >>13193 THP probably won't help except for deepseek or models with much more in vram. On this one I only have 60gb used. I asked gemini and it agreed I likely won't see any benefit.

Anonymous 05/02/2025 (Fri) 15:14:29 Id: bf04e9 No. 13198

>>13170 >>13190 >>13194 -rtr gave me a 30 god damn percent performance bump compared to llama.cpp using the same settings using 30B A3B q8. And it freed some VRAM too. What the fuck.

Anonymous 05/02/2025 (Fri) 15:21:12 Id: bf04e9 No. 13199

>>13198 Okay, no. It didn't actually. Some of those options (-fmoe, -rtl) disable mmap, which I thought I had disabled already. Disabling it in llama.cpp seemed to even the playing field. Interesting.

Anonymous 05/02/2025 (Fri) 17:04:45 Id: 004c54 No. 13203

>>13199 I get worse speeds in llama.cpp with mmap off. Ideally you repack the quant and then keep mmapping but haven't figured out that part yet. iq3 is past 12 t/s but seems a tad dumber vs iq4.

Anonymous 05/02/2025 (Fri) 19:06:29 Id: 004c54 No. 13212

I feel like an apple user... IQ3 results: prompt eval time = 6374.60 ms / 696 tokens ( 9.16 ms per token, 109.18 tokens per second) generation eval time = 40612.43 ms / 499 runs ( 81.39 ms per token, 12.29 tokens per second) prompt eval time = 105851.43 ms / 11756 tokens ( 9.00 ms per token, 111.06 tokens per second) generation eval time = 44724.83 ms / 382 runs ( 117.08 ms per token, 8.54 tokens per second) |

Anonymous 05/02/2025 (Fri) 21:36:25 Id: 303534 No. 13223

I hate how if I prefill a sentence that can be continued or ended, the AI will almost always go for a period.

Anonymous 05/02/2025 (Fri) 21:38:43 Id: bf04e9 No. 13224

>>13223 Usually you want to go back one word in that case.

Anonymous 05/02/2025 (Fri) 21:48:11 Id: 303534 No. 13227

>>13224 I do, but sometimes that means cutting out an important angle.

Anonymous 05/03/2025 (Sat) 00:15:42 Id: 004c54 No. 13237

https://characterhub.org/characters/_purple/omegle-6f7da597ada0 new filter for models just dropped

Anonymous 05/03/2025 (Sat) 00:37:33 Id: 004c54 No. 13238

235b can skip. command a and deepseek were kind of shaky. gemini seems to get it right away.

Anonymous 05/06/2025 (Tue) 16:20:21 Id: 12b639 No. 13346

LoRAs are model specific, and so are control vectors, right? Are there other steering techniques (other than good ol prompting) that are model agnostic. Something you can train/create/make once and use it on several models of different architectures/shapes? I can't see how that could be a thing, but then again my knowledge is superficial at best.

Anonymous 05/06/2025 (Tue) 22:15:01 Id: fff041 No. 13356

>>13346 You can use CFG, it has a big vram cost. On another note, there is speed increase if you use dry and set a high top_K in llama.cpp. I went from 12 to 14t/s by just setting top_K 60 and putting it before dry. In theory top_K that high should do nothing to the outputs because as an actual sampler it sucks.

Anonymous 05/11/2025 (Sun) 16:00:54 Id: 200c12 No. 13432

>>13356 Doesn't CFG also come with a big penalty to generation speed? I really need to start fucking around with Control Vectors. I want to see if I can use those to steer the model's output format given a certain context. Yes, I could just use BNF, but that too comes with quite the hit to inference speed.

Anonymous 05/17/2025 (Sat) 06:06:34 Id: d35775 No. 13490

I've been training SD LoRas on Pony 6 for a while and just left my cave and realized I should now be training on NoobAI. I'm reading through the rentry guides, but just want to clear this up early because download speeds are garbage here. For Pony-based gens, it was recommended to train LoRas on the Pony model itself and it would be generally compatible with Pony-based checkpoints (AutismMix, Pony Realism, etc.) What's the situation with NoobAI family? My understanding is: >NoobAI-XL is based on Illustrious-XL which is based on Kohaku-XL (beta rev 5) which is based on SDXL 1.0. >NAI-XL is basically the new Pony v6 - a popular root model for the anime and furry gen scenes. >if you want to gen in models like StableMondAI, IL personalmerge, ChromaXL, etc., a LoRa should be trained against NAI-XL >some of those models use Epsilon and others use V-Pred, does this mean two different LoRas need training?

Anonymous 05/17/2025 (Sat) 21:07:45 Id: f90168 No. 13491

>>13490 The family tree goes like this: NoobAI-XL <- Illustrious-XL <- Kohaku-XL beta <- NekoRayXL <- CounterfeitXL + AIO-Anime + SDXL 0.9 <- SDXL 1.0

Anonymous 05/18/2025 (Sun) 03:27:47 Id: 58ed57 No. 13494

>>13490 >does this mean two different LoRas need training? Yeah, I've seen people do that one for Vpred and the other for Epsilon.

Anonymous 05/20/2025 (Tue) 22:58:19 Id: 235dfb No. 13524

>>13432 yeah, CFG I think doubles generation time, which at this point is a poor trade-off for most models. It's so much easier to use kcpp's antislop feature. It takes strings and uses the 'banned tokens/strings' option in ST control vectors are definitely fun and doable for most people, and don't come with a penalty to inference. it does take time to actually figure out the right pairs as other anons mentioned, and it's not always obvious and it *will* degrade output if done poorly. In other news, I'm on week two of high temp/nsigma 1 and loving the results. At this point I only want minP when I actually do want statistically unlikely tokens

Anonymous 05/20/2025 (Tue) 23:29:00 Id: 235dfb No. 13532

also, does anyone know how to get logprobs/token probabilities to work in sillytavern and kcpp? I have "request token probabilities" set to on, and I switched between grabbing the tokenizer from the api and setting it manually but nothing changed. Do I need to set a flag in kcpp to send logprobs? No matter what I do, it always says "no token probabilities available for the current message." kind of frustrating when i start playing around with settings and I'm completely blind to everything but the one it landed on

Anonymous 05/21/2025 (Wed) 20:10:26 Id: f525db No. 13550

>>13532 Try using the other API. As in, if you are using text completion try chat completion and vice versa.

Anonymous 05/22/2025 (Thu) 09:27:10 Id: fee696 No. 13558

I keep hearing about Mistral, is it that good? What is it used for? Mostly looking for AIs that can be used as game masters that aren't censored.

Anonymous 05/23/2025 (Fri) 00:13:40 Id: 70a37e No. 13569

>>13558 You mean like mistral.rs the software, mistral the company, or mistral the models? If you have no idea about any of that, go to koboldcpp's github, read the quickstarter in the wiki tab, and download mistral nemo gguf on huggingface. There will be different versions (q8, q6, q4km, etc), you want the largest one that's slightly (500 ish mb) smaller than your total VRAM.

Anonymous 05/28/2025 (Wed) 17:40:08 Id: 3e2ab5 No. 13602

Hi guys!

Anonymous 05/28/2025 (Wed) 20:28:20 Id: 432486 No. 13603

>>13602 Hi.

Anonymous 06/04/2025 (Wed) 00:57:36 Id: aaeb7e No. 13636

I asked several some anime girl characters if anyone ever drew nugget fan art of them. Only claude suggested the one version, every other AI has assumed I meant chicken.

Anonymous 06/05/2025 (Thu) 13:45:32 Id: 3cf39b No. 13655

(128.53 KB 332x1338 5e29dcddbcb032c6775c222d2a2d93a97f5a7ec807a1f41c003a144825abcce6.jpg)

>>12106 I still believe this. Yes most models are lobotomized, but I hate trying to push my story through a turn-based format. I just want my prose completion engine, man. Edit button is my turn, "gen more" is AI's turn.

Anonymous 07/18/2025 (Fri) 08:00:27 Id: 79a8fd No. 13760

4ch /lmg/ ate shit, as deserved

Anonymous 08/10/2025 (Sun) 14:14:49 Id: cd7de7 No. 14607

Anonymous 08/12/2025 (Tue) 22:27:03 Id: 9b8a9f No. 14611

https://chub.ai/characters/Anonymous/mistral-wendy-06e07602785f have some mistral drama

Anonymous 08/13/2025 (Wed) 00:30:43 Id: c2c1af No. 14612

>>14607 Are you implying that gpt oss is trash? If so, i agree lol. I tried asking things like "how to archive a dvd" or other silly stuff and after "thinking" for a whole minute, the model will always refuse because "its illegal or disallowed". Fuck those moralfags working in (((OpenAI))). Here some examples of that. >>14611 >"They uploaded her suicide note. Now she's in the machine. IYKYK, based on that mistral drama" Wtf do you mean, anon?? Could you provide some context of that? I tried to search info but couldn't find anything related, that thing sounds awful though, Imagine kiling yourself and then ending up in someone's gooner AI model dataset lol. ~~It sound like the game soma but made for zoomers lmao.~~

Anonymous 08/13/2025 (Wed) 17:17:04 Id: cd7de7 No. 14615

Anonymous 08/13/2025 (Wed) 17:22:07 Id: cd7de7 No. 14616

>>14615 Ignore the [code][/code] tags at the beginning and end.

Anonymous 08/14/2025 (Thu) 02:41:39 Id: 303f19 No. 14620

>>14615 Wow, already jailbroken? That was fast lol. >This one will tell you how to cook meth and pirate DVDs. Thanks for the link, but it seems like it no longer exists. Weird that it got censored after just a few hours. Also, talking about censored models, someone remembers dolphin? I haven't seem one of those models in months, almost a year already. >This is my system prompt: Thanks a lot for the system prompt!, I used it on gpt oss 120B and I was finally able to fix an annoying bug that was breaking my browser. I had asked countless models before and all of them failed. This oss model may have more potential that I originally thought. Although a negative point is that it yaps a lot of unnecessary text, it even add a "TL;DR" lol.

Anonymous 09/11/2025 (Thu) 01:09:50 Id: 9b8a9f No. 14693

What the fuck is GGUF.org? Who is this schizo that re-coded all of GGUF.. https://github.com/calcuis/gguf

Anonymous 11/21/2025 (Fri) 21:49:27 Id: b22721 No. 14851

hello my degens, i've been getting into image generation lately (comfyui) and been trying to dip into deepfake images but all the fuken links seem to be down or in reFactor it's censored, anyone know what i could use? i use comfyui 0.3.40

Index Catalog Archive Top Reply

231

Manage Board Moderate Board Moderate Thread

Forms

Delete

Password Unlink (Removes file reference from posts) Delete (Removes file from the server)

Report

Reason Category Global

No Cookies?

Quick Reply


Sage Bypass Check