I don’t use it at all. I do want a self-hosted speech-to-text model (Whisper?), but my computer is ancient, so it would be awfully slow. I have some multi-hour audio recordings of presentations, and it would be nice to have them as searchable text…
How ancient is ancient? TTS and STT models are much lighter than LLMs (e.g. Whisper, Piper, Kokoro, Coqui). You might have more capability than you think, especially if you’re doing batch processing like that.
A Haswell Xeon E5-1650 machine. I remember running LLaMA 7B in llama.cpp back in 2023 and it was quite sluggish. Guess I should try Whisper at some point…
Ha. You were doing CPU inference on a Haswell-era chip. Been there, done that.
OTOH, whisper.cpp is heavily optimised for CPU inference.
Plus, you’re doing batch transcription, not real-time, so slow doesn’t actually matter.
Fire off Whisper small or medium overnight and wake up to searchable text.
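If it helps, here's a rough Python sketch of that overnight batch, plus a dumb grep-style search over the results. It assumes you've already built whisper.cpp and downloaded a ggml model; the binary path (`./build/bin/whisper-cli`, older builds called it `main`), the model filename, the `*.mp3` pattern and the helper names are all my guesses, so adjust them to your setup. whisper.cpp wants 16 kHz mono WAV, hence the ffmpeg step first.

```python
import shlex
import subprocess
from pathlib import Path

# Assumptions: adjust to where you built whisper.cpp and which model you downloaded.
WHISPER_BIN = "./build/bin/whisper-cli"  # older whisper.cpp builds name this binary "main"
MODEL = "models/ggml-medium.bin"         # or ggml-small.bin if medium is too slow

def build_commands(audio: Path, threads: int = 6) -> list[list[str]]:
    """Return the ffmpeg + whisper.cpp commands for one recording (not executed here)."""
    wav = audio.with_suffix(".16k.wav")
    # whisper.cpp expects 16 kHz, mono, 16-bit PCM WAV input.
    convert = ["ffmpeg", "-y", "-i", str(audio),
               "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", str(wav)]
    # -t: CPU threads (the E5-1650 has 6 cores); -otxt: plain-text output;
    # -of: output path without extension, so the transcript lands next to the audio.
    transcribe = [WHISPER_BIN, "-m", MODEL, "-t", str(threads),
                  "-f", str(wav), "-otxt", "-of", str(audio.with_suffix(""))]
    return [convert, transcribe]

def transcribe_folder(folder: Path, pattern: str = "*.mp3", dry_run: bool = True) -> None:
    """Print (dry run) or run the commands for every recording matching pattern."""
    for audio in sorted(folder.glob(pattern)):
        for cmd in build_commands(audio):
            if dry_run:
                print(shlex.join(cmd))
            else:
                subprocess.run(cmd, check=True)  # stop on the first failure

def search(transcript_dir: Path, term: str) -> list[tuple[str, int, str]]:
    """Case-insensitive substring search over the .txt transcripts in a folder."""
    hits = []
    for txt in sorted(transcript_dir.glob("*.txt")):
        lines = txt.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if term.lower() in line.lower():
                hits.append((txt.name, lineno, line.strip()))
    return hits
```

Run `transcribe_folder(Path("recordings"))` once to eyeball the commands, then `transcribe_folder(Path("recordings"), dry_run=False)` before bed, and `search(Path("recordings"), "some keyword")` in the morning. For anything fancier than substring matching you'd want a proper full-text index, but for a handful of talks this is plenty.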
PS: If you want a good, fast little LLM, something like Qwen 3.6 2B will work well on the Xeon.