Do you host your own ML / AI / LLM? What do you use, and what do you use it for?

    • zutto@lemmy.fedi.zutto.fi
      7 hours ago

      This dwarfstar looks interesting, can you elaborate on your setup and what kind of inference speeds you are getting?

      • jaschen306@sh.itjust.works
        2 hours ago

        I have a 5080 and 128 GB of RAM, running on an AMD 9950X.

        Depending on the task, I can get 170-200 t/s when the MoE only activates a few experts and everything fits inside the VRAM, or as low as 5-10 t/s when it activates more experts and has to hit system memory. But for grunt work that doesn’t need professor-level reasoning, it’s more than capable, and if you have the time it’s super worth it because the tokens are basically free.

        I only use this for overnight work to save on tokens during the day, for example when I’m pulling analytics for my work and it just needs basic analysis that doesn’t touch multiple tools.

        During work hours I’m using GLM5.2 for web development, Kimi k2.7 for complicated data analysis, and Minimax m3 if I need a bigger context window than Kimi k2.7 can give me.
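
To put the throughput figures quoted above in perspective, here is a rough back-of-the-envelope sketch of how many tokens an unattended run could produce at those rates. The 8-hour window and the `overnight_tokens` helper are assumptions for illustration only, not something the poster described, and real throughput will vary during a run.

```python
# Rough token budget for an unattended run at a fixed decode rate.
# The 8-hour window is an assumption, not a figure from the thread.
SECONDS_PER_HOUR = 3600


def overnight_tokens(tokens_per_second: float, hours: float = 8.0) -> int:
    """Tokens generated if the rate held steady for the whole window."""
    return int(tokens_per_second * hours * SECONDS_PER_HOUR)


# Rates quoted in the thread: 5-10 t/s when spilling to system RAM,
# 170-200 t/s when the active weights fit in VRAM.
for label, rate in [("RAM-bound low", 5), ("RAM-bound high", 10),
                    ("VRAM low", 170), ("VRAM high", 200)]:
    print(f"{label:>15}: {overnight_tokens(rate):>9,} tokens")
```

Under these assumptions, even the slow 5-10 t/s case works out to roughly 144,000 to 288,000 tokens per night, which is consistent with the point that slow local inference can still be worthwhile for batch work.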