Interests: programming, video games, anime, music composition

I used to be on kbin as e0qdk@kbin.social before it broke down.

  • 0 Posts
  • 4 Comments
Joined 3 years ago
Cake day: November 27th, 2023

  • If you just pulled the default version of qwen3.5 from ollama’s repo, you downloaded a mediocre one that only uses ~6GB.

    Run ollama show qwen3.5 and check whether the output includes something like this:

      Model
        architecture        qwen35    
        parameters          9.7B      
        context length      262144    
        embedding length    4096      
        quantization        Q4_K_M 
    

    This is the default version I got when I first tried using ollama without any experience. It worked, but it’s a heavily quantized, lower-parameter version of the model, so it’s pretty dumb compared to what you can actually run on your hardware.
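
If you want to check this from a script rather than by eye, here is a minimal sketch that runs ollama show and pulls out the Model section. The function names are mine, and the parsing assumes the layout shown above (indented rows with the key and value separated by two or more spaces); the output format may differ between ollama versions.

```python
import re
import subprocess


def parse_show_output(text):
    """Collect 'key  value' rows from the Model section of `ollama show` text."""
    info = {}
    in_model = False
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            # Assumption: a blank line ends the Model section.
            if in_model and info:
                break
            continue
        if stripped == "Model":
            in_model = True
            continue
        if in_model:
            # Keys and values are separated by runs of two or more spaces.
            parts = re.split(r"\s{2,}", stripped, maxsplit=1)
            if len(parts) == 2:
                info[parts[0]] = parts[1]
    return info


def show_model(name):
    """Run `ollama show NAME` and return the parsed Model section."""
    result = subprocess.run(
        ["ollama", "show", name],
        capture_output=True, text=True, check=True,
    )
    return parse_show_output(result.stdout)
```

Something like show_model("qwen3.5").get("quantization") would then tell you which quant you actually pulled.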


  • I started running LLMs a couple months ago on my own hardware. I have a Framework Desktop that I ordered last year and also recently picked up a refurbished 24GB AMD RX 7900 XTX which I’m doing some performance testing against. The dGPU is much better for dense models, and slightly faster for MoE if I’m willing to run them at a lower quant – but it uses more power and has annoying coil whine. The Framework Desktop uses ~100W under load, is quieter, and already runs the MoE models fast enough for most of my needs – so most of my LLM use still happens on that system.

    For software: I’m using ollama on the Framework currently, but I want to replace it with just using llama.cpp directly eventually. I’ve been using llama-cli for testing the dGPU. I wrote my own chat client to interact with ollama as well as a few other programs for specific tasks.

    I’ve been using the LLMs for a mix of research (both personal and professional), entertainment, practical coding tasks (mostly debugging and brainstorming, plus a bit of UI prototyping, automatic generation of sequence diagrams for documentation, and light scripting), as well as automation of tedious tasks.

    As an example of the latter, people often email me requests to prepare data sets but don’t specify precisely which sources they want, so I have to match the name they gave against the real name in our archives. Given a lookup table to compare against, LLMs are great at mapping the imperfect name – with typos, missing prefixes, stray spaces, added or removed hyphens, etc. – to the exact name I need to pull the data off disk.

    As far as models go, I’m mostly using various Qwen 3.6 and Gemma4 variants. I have multiple versions of each for different purposes. llmfan46’s uncensored Qwen 3.6 35B-A3B @ Q6_K (from Hugging Face) is my default model currently.
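
A rough sketch of the name-matching idea, not my actual tool: it builds a prompt from the lookup table, sends it to ollama's local /api/generate endpoint, and only accepts the reply if it is an exact entry in the table. The function names, the prompt wording, and the example archive names in the usage note are made up for illustration, and the default localhost:11434 address assumes a stock ollama install.

```python
import json
import urllib.request

# Assumption: ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_prompt(requested, lookup):
    """Build a prompt listing every valid name so the model can pick one."""
    names = "\n".join(f"- {name}" for name in lookup)
    return (
        "Match the requested name to exactly one entry in the list.\n"
        "Reply with the entry only, copied character for character, "
        "or NONE if nothing fits.\n\n"
        f"Requested: {requested}\n\nEntries:\n{names}\n"
    )


def pick_exact(reply, lookup):
    """Accept the model's reply only if it is an exact entry in the table."""
    candidate = reply.strip().strip("`\"'")
    return candidate if candidate in lookup else None


def ask_ollama(prompt, model):
    """Send one non-streaming generate request and return the reply text."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


def resolve(requested, lookup, model):
    """Map an imperfect name to an exact table entry, or None if unsure."""
    reply = ask_ollama(build_prompt(requested, lookup), model)
    return pick_exact(reply, lookup)
```

For example, resolve("ngc 1234 north", ["NGC-1234_north", "NGC-1234_south"], model) should come back with "NGC-1234_north" if the model behaves. Checking the reply against the table means a hallucinated name turns into None instead of a bad path on disk.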


  • Should be trivial to set up something like that if you’ve got parts you want to work with. Any desktop with an automatic background switcher should be able to cycle through images in a directory you specify on a timer. Set up your favorite remote access software (SSH, Samba, NFS …) and you’re done. If you want more control over the behavior, you could script up something custom with a little more effort – but it’s still not particularly hard to implement something like that.

    Watch out for burn in on the screen if you’re leaving it on all the time.
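
If you go the custom-script route, a minimal sketch could look like the following. It re-scans the directory on every pass so newly uploaded images get picked up. How you actually set the background depends on your desktop, so that part is a callback; the feh command is only one example and assumes an X11 session with feh installed. All names here are placeholders.

```python
import subprocess
import time
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def list_images(directory):
    """Return the image files in a directory, sorted by name."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def set_with_feh(image):
    """Example setter; assumes X11 with feh installed. Swap in your own."""
    subprocess.run(["feh", "--bg-fill", str(image)], check=True)


def run_slideshow(directory, set_background, interval_seconds=300,
                  max_changes=None):
    """Cycle through the images, calling set_background for each one.

    Runs forever unless max_changes is given. Returns the number of
    background changes made.
    """
    shown = 0
    while max_changes is None or shown < max_changes:
        images = list_images(directory)  # re-scan so new uploads appear
        if not images:
            time.sleep(interval_seconds)
            continue
        for image in images:
            set_background(image)
            shown += 1
            if max_changes is not None and shown >= max_changes:
                return shown
            time.sleep(interval_seconds)
    return shown
```

Calling run_slideshow("/path/to/images", set_with_feh) would then loop until you kill it.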


  • People have already covered most of the tools I typically use, but one I haven’t seen listed yet that is sometimes convenient is python3 -m http.server, which runs a small web server that shares whatever is in the directory you launched it from. I’ve used that to download files onto my phone when I didn’t have the right USB cables/adapters handy, as well as for getting data out of VMs when I didn’t want to bother setting up something more complex.
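
If you need the same thing from inside a script, here is a small sketch of a programmatic equivalent using the standard library (Python 3.7+). The make_server name and its defaults are my own choices, not part of the module.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory=".", port=8000, bind=""):
    """Create a server that shares `directory` over plain HTTP.

    bind="" listens on all interfaces; pass "127.0.0.1" to keep it local.
    port=0 lets the OS pick a free port (see server.server_address).
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer((bind, port), handler)


# Typical use (blocks until interrupted with Ctrl+C):
#   make_server("/path/to/share").serve_forever()
```

On the command line, python3 -m http.server 8000 --directory /path/to/share --bind 127.0.0.1 does the same thing. Either way there is no authentication or encryption, so only run it on a network you trust and shut it down when you're finished.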