Guide to Self Hosting LLMs Faster/Better than Ollama

brucethemoose@lemmy.world · 2 months ago

It’s not! Use SonoBus; it’s dead simple, and superior to Discord. It’s far lower latency, peer-to-peer, totally free, and has customizable filters.

Now if you want emojis and video and rambling channels and stuff, you will have to go elsewhere.

brucethemoose@lemmy.world · 2 months ago

Yeah, I’m not against the idea philosophically. Especially for security. I love the idea of containerized isolation.

But in practice, I can see exactly how much disk space, RAM, CPU, and bandwidth these containers take, heh. Maintainers just can’t help themselves.

brucethemoose@lemmy.world · 2 months ago

I find the overhead of Docker crazy, especially for simpler apps. Like, do I really need 150GB of hard drive space, an extensive, poorly documented config, and a whole nested computer running just because some project refuses to fix its dependency hell?

Yet it’s so common. It does feel like usability has gone on the back burner, at least in some sectors of software. And it’s such a relief when I read that some project consolidated dependencies down to C++ or Rust, and it will just run and give me feedback without shipping a whole subcomputer.

brucethemoose@lemmy.world · edited 1 year ago

It’s less optimal.

On a 3090, I simply can’t run Command-R or Qwen 2.5 32B well at 64K-80K context with ollama. It’s slow even at lower context, and the lack of DRY sampling and some other features majorly hits quality.

Ollama is meant to be turnkey, and that’s fine, but LLMs are extremely resource intensive. Sometimes the manual setup/configuration is worth it to squeeze out every ounce of extra performance and quantization quality.
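
For a rough sense of why long context is so tight on a 24GB card, here is a back-of-envelope sketch of KV cache size. The layer count, KV head count, and head dimension below are my assumptions for a 32B-class model with grouped-query attention, not numbers read from any particular backend, so check your model's own config before trusting them. The function name is just for illustration.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bits_per_element):
    """Approximate KV cache size: one K and one V vector per layer, per KV head, per token."""
    elements = 2 * n_layers * n_kv_heads * head_dim * context_len
    return elements * bits_per_element // 8

# Assumed (hypothetical) shape for a 32B-class GQA model; verify against your model's config.
N_LAYERS, N_KV_HEADS, HEAD_DIM = 64, 8, 128
GIB = 1024 ** 3

for context_len in (8_192, 65_536):
    for bits in (16, 8):
        size = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, context_len, bits)
        print(f"{context_len:>6} tokens @ {bits:>2}-bit cache: {size / GIB:.1f} GiB")
```

Under those assumptions, a 16-bit cache at 64K tokens comes out to about 16 GiB before any weights are loaded, and an 8-bit cache halves that. Real backends add some overhead on top (quantized caches store scale factors, for example), so treat these as lower bounds.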

Even on CPU-only setups, you are missing out on (for instance) the CPU-optimized quantizations llama.cpp offers now, or the more advanced sampling kobold.cpp offers, or more fine-grained tuning of flash attention configs, or batched inference, just to start.

And as I hinted at, I don’t like some other aspects of ollama, like how they “leech” off llama.cpp and kinda hide the association without contributing upstream, some hype and controversies in the past, and hints that they may be cooking up something commercial.

brucethemoose@lemmy.world · edited 1 year ago

Guide to Self Hosting LLMs Faster/Better than Ollama