So which models are you running now on what hardware?
@[ApogeeWatcher] Right now I’m mostly running lightweight models since I’m on a CPU-focused setup.
Currently testing:
- LLaMA 3.2 (1B & 3B)
- Phi-3 Mini
- Qwen 2.5:3b-Instruct
I’ve been leaning toward the smaller models because they’re surprisingly usable when tuned well, especially for local/offline workflows.
I’m actually planning to dive deeper into this — comparing performance, responsiveness, and real-world usability across these models in upcoming articles. There’s a lot that isn’t obvious until you really start pushing them in a constrained environment.
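In the meantime, if anyone wants to poke at the same models before those articles land, here's a minimal sketch of how I query them, assuming they're served through Ollama on its default port (the tags below are the ones on the Ollama registry; the prompt is just a placeholder):

```python
# Minimal sketch: send one prompt to several locally served models and
# compare responses. Assumes Ollama is running on its default port and
# that these tags have been pulled (e.g. `ollama pull llama3.2:3b`).
import requests

MODELS = ["llama3.2:1b", "llama3.2:3b", "phi3:mini", "qwen2.5:3b-instruct"]
PROMPT = "Summarize the trade-offs of running small LLMs on CPU."

for model in MODELS:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": PROMPT, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    print(f"--- {model} ---")
    print(resp.json()["response"][:400])
```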
@[sarmad] Yeah — I actually covered the hardware specs and full setup in my previous article.
Quick overview: I’m running a CPU-focused environment (no dedicated GPU), so most of the work has been around optimizing lightweight models like LLaMA 3.2 (1B & 3B), Phi-3 Mini, and Qwen 2.5:3b-Instruct to get practical performance.
I haven’t shared dashboard screenshots or structured output comparisons yet — that’s something I’m currently working on. Planning a deeper follow-up where I break down:
- model behavior under constraints
- latency differences
- real-world output examples
If you’re interested in a specific use case or type of output, let me know, and I can include it in the next write-up.
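For the latency piece specifically, this is roughly the measurement I have in mind, a rough sketch assuming Ollama's streaming API (time-to-first-token vs. total time; the model and prompt are placeholders):

```python
# Rough sketch of the latency measurement: time-to-first-token and
# total generation time for a streamed response from Ollama's API.
import json
import time
import requests

def measure(model: str, prompt: str) -> tuple[float, float]:
    start = time.perf_counter()
    first_token = None
    with requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": True},
        stream=True,
        timeout=300,
    ) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue
            if first_token is None:
                first_token = time.perf_counter() - start
            if json.loads(line).get("done"):
                break
    return first_token, time.perf_counter() - start

ttft, total = measure("llama3.2:3b", "Explain semantic search in two sentences.")
print(f"time to first token: {ttft:.2f}s, total: {total:.2f}s")
```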
@[ricsmwangi] I'm interested in a few use cases:
- Running small models that can be used for semantic searching on a server
- Running a coding model locally for development
- Running image or video generation models locally, so I can use AI to edit or generate personal photos and videos without uploading personal and family content to public servers.
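For the semantic search case, something along these lines is what I have in mind, just a generic sketch with sentence-transformers (the model name and documents are placeholders, not a recommendation):

```python
# Rough sketch of the semantic-search use case: embed documents once,
# then rank them against a query by cosine similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small enough for CPU

docs = [
    "How to configure Ollama thread usage on Linux.",
    "Recipe for sourdough bread.",
    "Benchmarking 3B instruction-tuned models on CPU.",
]
doc_embeddings = model.encode(docs, convert_to_tensor=True)

query = "tuning local LLM performance"
query_embedding = model.encode(query, convert_to_tensor=True)

scores = util.cos_sim(query_embedding, doc_embeddings)[0]
for doc, score in sorted(zip(docs, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {doc}")
```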
@[Jason Mullings] Yeah, I ran into the same bottleneck early on. Docker adds a bit of overhead, especially on CPU setups.
What helped me was:
- Running Ollama directly on Arch instead of Docker
- Switching to lighter models (3B range instead of larger ones)
- Tweaking thread usage and keeping the system as “clean” as possible during runs
It’s not quite 10x yet, but the difference is noticeable. I’m experimenting with a few optimizations — if I manage to push it further, I’ll definitely share an update.
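For the thread tweaking, the main knob I've been using is the num_thread option on generate requests. A sketch, assuming Ollama's HTTP API (6 is a placeholder; match it to your physical core count):

```python
# Sketch: pinning Ollama's CPU thread count per request via the
# `num_thread` option. 6 is a placeholder; match it to physical cores.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2:3b",
        "prompt": "Hello",
        "stream": False,
        "options": {"num_thread": 6},
    },
    timeout=300,
)
print(resp.json()["response"])
```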
@[Steve Fenton] That’s a fair point — commercial tools still have an edge in consistency and convenience.
For local setups, I’ve been getting the best balance from smaller instruction-tuned models (3B–7B range), especially when paired with well-structured prompts. The key difference I’ve noticed is that local models reward precision more — you have to “guide” them a bit more deliberately.
The upside though is control — no latency spikes, no token costs, and full offline capability. I’m currently experimenting with ways to reduce prompt overhead and make interactions feel more natural, closer to commercial tools.
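To make the "guiding" part concrete, here's a sketch of the kind of tightly structured prompt I mean, sent through Ollama's chat endpoint (the system message and constraints are just illustrative):

```python
# Sketch: a tightly structured prompt for a small local model via
# Ollama's chat endpoint. The constraints are illustrative only.
import requests

messages = [
    {
        "role": "system",
        "content": (
            "You are a precise assistant. Answer in at most three "
            "bullet points. If unsure, say 'unsure'. No preamble."
        ),
    },
    {"role": "user", "content": "When is a 3B model enough for summarization?"},
]

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={"model": "qwen2.5:3b-instruct", "messages": messages, "stream": False},
    timeout=300,
)
print(resp.json()["message"]["content"])
```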
Honestly, it wasn’t a straight path.
It took me a couple of weeks of trial and error: breaking things, reinstalling, testing different models, and figuring out what actually works on my hardware.
The tricky part wasn’t just setting it up, but optimizing it to feel usable day-to-day. Once that clicked, everything started making sense.
If you’re planning to try it, just expect a bit of chaos at the start, but it’s definitely worth it.
I did something similar. I bought a cheap SSD, threw it in my gaming desktop, installed Debian Linux on it, and set up a dual boot so I can switch to Linux whenever I want and check on the local models that are running my agents. I have a good GPU, though not top tier (an Nvidia 3080), so performance is amazing and I’m not burning tokens making calls out to Claude in the cloud. Highly recommend!
@[onchainintel] That’s a solid setup — especially with a 3080, you’re in a really good spot for local AI.
I’m currently working with a more CPU-focused environment, so I’ve had to optimize a bit differently — lighter models, tighter configurations, and focusing on efficiency over raw power.
But I like your approach with the dual boot and SSD — that kind of flexibility makes experimenting much easier. Definitely a direction I’d explore more if I upgrade hardware.