For the longest time, I’ve been trying to figure out a way to “survive” in this new AI age without having to fork over a ton of money just to keep up. I’ve tried using local models via Ollama, and while they definitely work to a degree, they’re (unsurprisingly) not as good as the big model providers.

The local models tend to

  • Forget what they’re doing
  • Struggle to break larger tasks into smaller ones
  • Lose focus easily
  • Have weaker coding performance
  • Drift over longer sessions

So to improve the reliability of fully local, smaller models (and to keep all my data local and in my own network), I created Loki.

It’s a local-first, batteries-included command line tool and runtime for building and running LLM workflows locally. It’s model agnostic and supports things like

  • Agents and agent delegation
  • Roles/personas
  • MCP Servers
  • RAG
  • Custom tools
  • Macros
  • Workflow Scripting

A lot of the features it supports are specifically designed to compensate for weaknesses in smaller local models. For example:

  • Auto continuation to keep pushing models to completion instead of stopping halfway through problems
  • Parallel agent delegation so tasks can be split into smaller, focused scopes
  • Workflow-based execution (“If this, do that”) for building more reliable and repeatable automations
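
To make the auto-continuation idea concrete, here is a minimal sketch of the general technique, not Loki’s actual implementation. The `[DONE]` marker, the `generate` callable, and `fake_model` are all hypothetical stand-ins: the loop simply keeps nudging the model until it signals completion or a round limit is reached.

```python
from typing import Callable, List

DONE_MARKER = "[DONE]"  # hypothetical completion marker, not a Loki convention


def run_with_auto_continue(
    generate: Callable[[List[str]], str],
    task: str,
    max_rounds: int = 5,
) -> List[str]:
    """Call `generate` until a reply contains DONE_MARKER or max_rounds is hit."""
    transcript = [task]
    for _ in range(max_rounds):
        reply = generate(transcript)
        transcript.append(reply)
        if DONE_MARKER in reply:
            break
        # The model stopped early, so push it to keep going.
        transcript.append(
            f"Continue the task. Finish with {DONE_MARKER} when complete."
        )
    return transcript


def fake_model(transcript: List[str]) -> str:
    """Stand-in for a local model that stops early twice before finishing."""
    replies_so_far = sum(1 for m in transcript if m.startswith("step"))
    if replies_so_far < 2:
        return f"step {replies_so_far + 1}"
    return f"step 3 {DONE_MARKER}"


if __name__ == "__main__":
    for message in run_with_auto_continue(fake_model, "Refactor the parser"):
        print(message)
```

The `max_rounds` cap matters as much as the nudge: without it, a model that never emits the marker would loop forever.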

It also supports the major cloud providers if you want them (which definitely helped while testing 😄), but my long-term goal is simple:

Get as close as possible to Claude Code-style reliability using fully local models.

I’m always open to feedback, questions, or ideas.

Repo: https://github.com/Dark-Alex-17/loki

    • naught@sh.itjust.works
      15 minutes ago

      OpenCode isn’t very fun to set up with local LLMs, and I’ve had issues with tool calling, but it’s very doable! That said, OpenCode is my go-to; I absolutely love it compared to all the alternatives I’ve tried.

    • Dark-Alex-17@lemmy.world (OP)
      4 hours ago

      Yeah… 😅 I originally named it Loki because, well…if you leave LLMs unsupervised they just create mischief. Any ideas of a good rename? I’ve gotten this comment before and I just couldn’t think of anything good.

        • Dark-Alex-17@lemmy.world (OP)
          4 hours ago

          Ooh, I like Coyote! That’s definitely in the running now. Not to mention that it’s a really cool allusion to Native American mythology!

          • Helix 🧬@feddit.org
            44 minutes ago

            Or Coyode (a mix of “code” and “coyote”; could be written co[yo]de for extra yo). It only has 4 DuckDuckGo results, so it’s easily searchable and distinguishable.

            [Image: ChatGPT-generated logo example. co[yo]de logo: a coyote with a hoodie and the aforementioned spelling]

      • [object Object]@lemmy.ca
        4 hours ago

        I feel like that well describes a border collie.

        Wants to do stuff, but if you don’t attend to them, they’ll find stuff to do.

    • Dark-Alex-17@lemmy.world (OP)
      4 hours ago

      So actually, this was the original purpose of it. But the people I asked for help weren’t really interested in doing anything outside of the usual big model providers, so I tried advertising a more general use case to attract more input. I can’t deny that agnostic support for even the big providers is helpful when you’re trying to stay current with the rapid advances in LLMs.

      After that, I kind of gave up on getting feedback on local-first models. So, instead, I just dove in head-first the way I wanted: trying new things, building new agents to try and rival Claude Code, adding features as I found them useful and necessary to improve that reliability, etc., and iterating. Then, with the most recent release on Friday, I had made so many changes and improvements specifically for local models that I thought I finally had a strong enough tool to maybe pique enough people’s interest to get some feedback and input. 🙂

      Oh, and the config example shows how to add Ollama models here

  • CIA_chatbot@lemmy.world
    4 hours ago

    Just an FYI: Loki is also an extremely popular logging system by Grafana. You might want a rename if you don’t want people to have trouble finding your project because a larger project has the same name.

  • nimble@lemmy.blahaj.zone
    4 hours ago

    I’m confused. You say in the post title that you don’t want to send code to the cloud, but the image you attached shows OpenAI GPT-4o. So what’s the deal?

    • Dark-Alex-17@lemmy.world (OP)
      4 hours ago

      It was just the one gif I had available and also the model that worked fast enough to fit into a gif without taking forever between prompts so I could demo Loki well. You make a good point though. It’s an old build and is slightly outdated. I’ll update that. Thanks for pointing that out.

  • Ricky Rigatoni@piefed.zip
    4 hours ago

    Does it have built-in protections so it doesn’t randomly decide to delete every file it has permissions to?

    • Dark-Alex-17@lemmy.world (OP)
      4 hours ago

      Yes, it does. By default, the execute_command and fs_write/fs_patch/etc. tools all have guards around them that prompt for user confirmation before doing anything. The guards can be disabled via the AUTO_APPROVE environment variable if necessary (as they are when using the sisyphus agent). For bash tools, I’ve included functions that help you add the same guards when you write your own tools. For Python tools, you can use the usual input methods.
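
For a custom Python tool, a guard along these lines might look like the sketch below. This is only an illustration: the `confirm` and `delete_file` helpers are hypothetical names, and the set of values that switch `AUTO_APPROVE` on is an assumption, so check Loki’s docs for how the variable is actually parsed.

```python
import os
from typing import Callable


def confirm(action: str, ask: Callable[[str], str] = input) -> bool:
    """Return True if the action may proceed."""
    # Assumption: these values enable auto-approval; Loki's real parsing may differ.
    if os.environ.get("AUTO_APPROVE", "").strip().lower() in {"1", "true", "yes"}:
        return True
    answer = ask(f"Allow '{action}'? [y/N] ")
    # Anything other than an explicit yes counts as a refusal.
    return answer.strip().lower() in {"y", "yes"}


def delete_file(path: str, ask: Callable[[str], str] = input) -> str:
    """Hypothetical destructive tool that refuses to run without approval."""
    if not confirm(f"delete {path}", ask):
        return f"skipped: {path}"
    os.remove(path)
    return f"deleted: {path}"
```

Defaulting to “no” on an empty or unrecognized answer keeps an accidental Enter key from approving a destructive action.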

      • Ricky Rigatoni@piefed.zip
        4 hours ago

        As usual, leave it to the random developers on the internet to put more care and thought into something than the multibillion dollar companies.

    • Dark-Alex-17@lemmy.world (OP)
      4 hours ago

      I’m using a ton of different ones but the main ones I use daily are

      • gemma4:26b
      • deepseek-coder
      • deepseek-r1:32b
      • devstral:24b
      • granite-code:34b
      • openthinker:latest
      • phi4:latest
      • qwen3:30b
      • mixtral:8x22b

      I’m also going to use this opportunity to plug an amazing project that helps figure out which models will work well on my hardware: https://github.com/AlexsJones/llmfit

      • Blue_Morpho@lemmy.world
        4 hours ago

        Isn’t there a huge delay when swapping to a different ~30b model every few minutes depending on the use case?

        • Dark-Alex-17@lemmy.world (OP)
          4 hours ago

          Unfortunately, yes. It’s one reason I’m trying to figure out a good mechanism for using multiple Ollama hosts. Right now, you can specify which model an agent uses. But when an agent delegates to a sub-agent, the current model is unloaded and the new one is loaded. I’m trying to figure out if there’s a way to “alternate” between multiple hosts (say, Ollama running locally and another instance running on your server), so that when a switch happens, it happens on the secondary host, while also looking ahead to see what needs to be switched, if anything, on the primary host.

          It already supports multiple Ollama hosts as-is, so what I’ve honestly been doing for the time being is specifying which model on which host each agent uses, so that each host only loads one model, at the beginning of a session. Then there’s no unloading/loading/etc. The other thing I’ve been trying is to see how small I can get the models to be without losing performance. While the tricks implemented in Loki help dramatically, I know there’s still a lot more I can do to improve it further.
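
As a rough illustration of why pinning each agent to its own host avoids the swap delay, here is a small sketch that counts model loads for a sequence of agent calls. It is not Loki’s config format or scheduler: the agent names, the `server.lan` host, and the simplification that each host keeps exactly one model resident are all assumptions (11434 is Ollama’s default port).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Placement:
    host: str   # base URL of an Ollama instance
    model: str  # model tag served from that host


# Hypothetical setup: every agent shares the local host.
SINGLE_HOST: Dict[str, Placement] = {
    "planner": Placement("http://localhost:11434", "qwen3:30b"),
    "coder": Placement("http://localhost:11434", "devstral:24b"),
}

# Hypothetical setup: each agent is pinned to its own host.
PINNED: Dict[str, Placement] = {
    "planner": Placement("http://localhost:11434", "qwen3:30b"),
    "coder": Placement("http://server.lan:11434", "devstral:24b"),
}


def count_model_loads(calls: List[str], placements: Dict[str, Placement]) -> int:
    """Count loads, assuming each host keeps only one model resident at a time."""
    resident: Dict[str, str] = {}  # host -> model currently loaded there
    loads = 0
    for agent in calls:
        placement = placements[agent]
        if resident.get(placement.host) != placement.model:
            resident[placement.host] = placement.model
            loads += 1
    return loads


if __name__ == "__main__":
    calls = ["planner", "coder", "planner", "coder"]
    print("single host:", count_model_loads(calls, SINGLE_HOST))
    print("pinned:", count_model_loads(calls, PINNED))
```

With the delegation pattern above, the shared host reloads a model on every hand-off, while the pinned setup only pays the load cost once per host.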

  • lime!@feddit.nu
    5 hours ago

    any chance of an lsp server? i know the protocol is clunky as all hell, but local completions in any editor would be big.

    • Dark-Alex-17@lemmy.world (OP)
      4 hours ago

      I’ve been thinking about integrating LSP into it but I can’t think of a great way to do it. I’ve been meaning to look at OpenCode and see how they do it. Maybe I’ll work that into the next release!

  • DecronymB
    11 minutes ago

    Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I’ve seen in this thread:

    Fewer Letters   More Letters
    DNS             Domain Name Service/System
    IP              Internet Protocol
    SSL             Secure Sockets Layer, for transparent encryption
    TLS             Transport Layer Security, supersedes SSL

    3 acronyms in this thread; the most compressed thread commented on today has 16 acronyms.

    [Thread #318 for this comm, first seen 26th May 2026, 19:30] [FAQ] [Full list] [Contact] [Source code]