Do you host your own ML / AI / LLM? What do you use, and what do you use it for?

  • jaschen306@sh.itjust.works
    2 hours ago

    I have a 5080 and 128 GB of RAM running on an AMD 9950X.

    Depending on the task, I can get 170-200 t/s when the MoE model only activates a few experts and fits inside VRAM, or as low as 5-10 t/s when it activates more experts and has to hit system memory. But for grunt work that doesn’t need professor-level reasoning, it’s more than capable, and if you have the time, it’s super worth it because it’s basically free tokens.

    I only use this for overnight work to save on tokens during the day, for example when I’m pulling analytics for my work and it just needs basic analysis that doesn’t touch multiple tools.
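
    As a rough back-of-the-envelope sketch of why slow overnight runs can still pay off, the snippet below multiplies the throughput figures quoted above by an assumed 8-hour window. The window length and the `tokens_generated` helper are illustrative assumptions, not something stated in the comment, and real runs will not hold a steady rate.

```python
def tokens_generated(tokens_per_second: float, hours: float) -> int:
    """Total tokens produced at a steady rate over the given number of hours."""
    return int(tokens_per_second * hours * 3600)


# Assumed overnight window in hours (hypothetical; not stated in the comment).
OVERNIGHT_HOURS = 8

# Throughput figures quoted above, in tokens per second.
rates = {
    "spills to system RAM (low end)": 5,
    "spills to system RAM (high end)": 10,
    "fits in VRAM (low end)": 170,
    "fits in VRAM (high end)": 200,
}

for label, rate in rates.items():
    total = tokens_generated(rate, OVERNIGHT_HOURS)
    print(f"{label}: {total:,} tokens")
```

    Under these assumptions, even the 5 t/s case comes out to about 144,000 tokens per night, which is why the slow path can still be useful for unattended batch work.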

    During work hours I’m using GLM5.2 for web development, Kimi k2.7 for complicated data analysis, and Minimax m3 if I need a context window bigger than what Kimi k2.7 can give me.