Partially. I started by hosting my own llama3.2 and granite4 models with Ollama, for my Home Assistant smart home and for general chat through OpenWebUI. I also run Whisper locally for speech-to-text on my 1080 Ti GPU. I like the privacy and ownership that come with self-hosted models, but I started running into the limits of the small models. So I built some tools that let me selectively route traffic to larger models hosted on DeepInfra when I need them, for example to GLM/Kimi models for code reviews, my custom harnesses, or harder problems.
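
For anyone curious what that kind of routing can look like, here is a minimal sketch, not the actual tooling described above. It assumes both backends are reached through OpenAI-compatible endpoints (Ollama's default local address and DeepInfra's hosted one). The task labels, the `Backend` type, the `pick_backend` function, and the hosted model name are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    base_url: str
    model: str


# Assumption: Ollama on its default port, via its OpenAI-compatible path.
LOCAL = Backend("local", "http://localhost:11434/v1", "llama3.2")

# Assumption: DeepInfra's OpenAI-compatible endpoint. The model name is a
# placeholder, not a real model ID; substitute whichever GLM/Kimi model you use.
HOSTED = Backend(
    "hosted",
    "https://api.deepinfra.com/v1/openai",
    "placeholder-large-model",
)

# Hypothetical task labels for work that outgrows the small local models.
HARD_TASKS = {"code_review", "harness", "hard_problem"}


def pick_backend(task: str) -> Backend:
    """Send demanding tasks to the hosted model, everything else stays local."""
    return HOSTED if task in HARD_TASKS else LOCAL
```

Calling `pick_backend("code_review")` selects the hosted backend, while something like `pick_backend("home_assistant")` stays on the local model, so private smart-home traffic never leaves the machine unless a task is explicitly listed as hard.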