Do you host your own AI?

SuspiciousCarrot78@aussie.zone · 17 hours ago

Domi@lemmy.secnd.me · 4 hours ago

Yes, I got a Strix Halo machine before the RAM price hike and run all my ML stuff on it.

Currently using llama-swap with llama.cpp/ComfyUI, and opencode/Open WebUI as frontends.

I’m running Qwen3.6-27b, Voxtral Mini 4b, Piper and Qwen Image, plus some embedding and reranking models.

I use them for:

SuspiciousCarrot78@aussie.zone · 3 hours ago

What sort of tok/s are you getting on the strix?

Domi@lemmy.secnd.me · 1 hour ago

About 200 t/s prompt processing and 10-20 t/s with MTP.

It depends greatly on the task: predictable things like code generate at 18-20 t/s, creative writing more like 10-17 t/s.
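
To put those rates in perspective, here is a rough back-of-the-envelope sketch. The request size (4,000 prompt tokens, 500 output tokens) is a made-up example, not something I measured, and it assumes prompt processing and generation simply run back to back at the rates above:

```python
def estimate_seconds(prompt_tokens, output_tokens, pp_rate, gen_rate):
    """Rough wall-clock time: prompt processing followed by generation."""
    return prompt_tokens / pp_rate + output_tokens / gen_rate


# Hypothetical request size, chosen only for illustration
PROMPT_TOKENS = 4000
OUTPUT_TOKENS = 500

# Rates from the numbers above: ~200 t/s prompt processing, 10-20 t/s generation
fast = estimate_seconds(PROMPT_TOKENS, OUTPUT_TOKENS, pp_rate=200, gen_rate=20)
slow = estimate_seconds(PROMPT_TOKENS, OUTPUT_TOKENS, pp_rate=200, gen_rate=10)

print(f"{fast:.0f}s to {slow:.0f}s")  # prints "45s to 70s"
```

So for a request like that, roughly 20 seconds go to reading the prompt and the rest to generating, which is why the generation rate dominates for longer answers.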

SuspiciousCarrot78@aussie.zone · 32 minutes ago

Damn - I thought strix would do a bit better than that, for how much it costs.