Do you host your own ML / AI / LLM? What do you use, and what do you use it for?

  • Domi@lemmy.secnd.me
    1 hour ago

    About 200 t/s for prompt processing and 10-20 t/s for generation with MTP.

    It depends heavily on the task: predictable output like code generates at 18-20 t/s, while creative writing is more like 10-17 t/s.
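
    For a rough sense of what those rates mean in practice, here is a minimal sketch that turns them into a wall-clock estimate. The function name `estimate_seconds` and the example request size (4000-token prompt, 500-token reply) are made up for illustration; only the t/s figures come from the numbers above, and real timings will vary with model, context length, and hardware.

```python
def estimate_seconds(prompt_tokens, output_tokens, prompt_tps=200.0, gen_tps=18.0):
    """Rough estimate: prompt processing time plus generation time, in seconds."""
    return prompt_tokens / prompt_tps + output_tokens / gen_tps


# Hypothetical request size, not from the post: 4000-token prompt, 500-token reply.
code_like = estimate_seconds(4000, 500, gen_tps=18.0)  # upper end of the quoted code rate
creative = estimate_seconds(4000, 500, gen_tps=10.0)   # lower end of the quoted creative rate

print(f"code-like: {code_like:.0f} s, creative: {creative:.0f} s")
# prints "code-like: 48 s, creative: 70 s"
```

    With a prompt that size, the 20 s of prompt processing is a large share of the total, so generation speed is only part of the picture.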