Do you host your own AI?

SuspiciousCarrot78@aussie.zone · 17 hours ago

jaschen306@sh.itjust.works · 7 hours ago

I’m running dwarfstar, which is a 2-bit quant of DeepSeek V4 Flash. It’s quite capable even at 2-bit.

zutto@lemmy.fedi.zutto.fi · 7 hours ago

This dwarfstar looks interesting. Can you elaborate on your setup and what kind of inference speeds you’re getting?

jaschen306@sh.itjust.works · 2 hours ago

I have a 5080 and 128 GB of RAM, running on an AMD 9950X.

Depending on the task, I get 170-200 t/s when the MoE only activates a few experts and everything fits in VRAM, or as low as 5-10 t/s when it activates more experts and has to hit system memory. For grunt work that doesn’t need professor-level reasoning, it’s more than capable, and if you have the time, it’s well worth it because the tokens are basically free.

I only use this for overnight work, to save on tokens during the day: pulling analytics for my job when it just needs basic analysis that doesn’t touch multiple tools.
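
For a rough sense of what those speeds mean for an overnight run, here is a back-of-the-envelope sketch. The 8-hour window and the assumption of a steady rate are mine for illustration, not measured figures; only the t/s ranges come from the numbers above.

```python
# Rough overnight token budget from the throughput figures quoted above.
# ASSUMPTION: an 8-hour unattended window at a steady rate; adjust as needed.
OVERNIGHT_HOURS = 8


def tokens_generated(tokens_per_second: float, hours: float = OVERNIGHT_HOURS) -> int:
    """Tokens produced at a steady rate over the given number of hours."""
    return int(tokens_per_second * hours * 3600)


# Throughput ranges from the post, in tokens per second.
scenarios = {
    "fits in VRAM": (170, 200),
    "spills to system RAM": (5, 10),
}

for name, (low, high) in scenarios.items():
    print(f"{name}: {tokens_generated(low):,} to {tokens_generated(high):,} tokens")
```

Under that assumption, even the slow end works out to roughly 144,000 to 288,000 tokens per night, which is plenty for basic analysis jobs.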

During work hours I’m using GLM5.2 for web development, Kimi k2.7 for complicated data analysis, and Minimax m3 if I need a bigger context window than Kimi k2.7 can give me.