So which models are you running now on what hardware?
@[ApogeeWatcher] Right now I’m mostly running lightweight models since I’m on a CPU-focused setup.
Currently testing:
- LLaMA 3.2 (1B & 3B)
- Phi-3 Mini
- Qwen 2.5:3b-Instruct
I’ve been leaning toward the smaller models because they’re surprisingly usable when tuned well, especially for local/offline workflows.
I’m actually planning to dive deeper into this — comparing performance, responsiveness, and real-world usability across these models in upcoming articles. There’s a lot that isn’t obvious until you really start pushing them in a constrained environment.
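In the meantime, if anyone wants to poke at the same models before those articles land, here's a minimal sketch of how I query them, assuming they're served through Ollama on its default port (the tags below are the ones on the Ollama registry; the prompt is just a placeholder):

```python
# Minimal sketch: send one prompt to several locally served models and
# compare responses. Assumes Ollama is running on its default port and
# that these tags have been pulled (e.g. `ollama pull llama3.2:3b`).
import requests

MODELS = ["llama3.2:1b", "llama3.2:3b", "phi3:mini", "qwen2.5:3b-instruct"]
PROMPT = "Summarize the trade-offs of running small LLMs on CPU."

for model in MODELS:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": PROMPT, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    print(f"--- {model} ---")
    print(resp.json()["response"][:400])
```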
@[sarmad] Yeah — I actually covered the hardware specs and full setup in my previous article.
Quick overview: I’m running a CPU-focused environment (no dedicated GPU), so most of the work has been around optimizing lightweight models like LLaMA 3.2 (1B & 3B), Phi-3 Mini, and Qwen 2.5:3b-Instruct to get practical performance.
I haven’t shared dashboard screenshots or structured output comparisons yet — that’s something I’m currently working on. Planning a deeper follow-up where I break down:
- model behavior under constraints
- latency differences
- real-world output examples
If you’re interested in a specific use case or type of output, let me know, and I can include it in the next write-up.
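For the latency piece specifically, this is roughly the measurement I have in mind, a rough sketch assuming Ollama's streaming API (time-to-first-token vs. total time; the model and prompt are placeholders):

```python
# Rough sketch of the latency measurement: time-to-first-token and
# total generation time for a streamed response from Ollama's API.
import json
import time
import requests

def measure(model: str, prompt: str) -> tuple[float, float]:
    start = time.perf_counter()
    first_token = None
    with requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": True},
        stream=True,
        timeout=300,
    ) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue
            if first_token is None:
                first_token = time.perf_counter() - start
            if json.loads(line).get("done"):
                break
    return first_token, time.perf_counter() - start

ttft, total = measure("llama3.2:3b", "Explain semantic search in two sentences.")
print(f"time to first token: {ttft:.2f}s, total: {total:.2f}s")
```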
@[ricsmwangi] I'm interested in a few use cases:
- Running small models that can be used for semantic searching on a server
- Running a coding model locally for development
- Running image or video generation models locally, so I can use AI to edit or generate personal photos and videos without uploading personal and family content to public servers.
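For the semantic search case, something along these lines is what I have in mind, just a generic sketch with sentence-transformers (the model name and documents are placeholders, not a recommendation):

```python
# Rough sketch of the semantic-search use case: embed documents once,
# then rank them against a query by cosine similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small enough for CPU

docs = [
    "How to configure Ollama thread usage on Linux.",
    "Recipe for sourdough bread.",
    "Benchmarking 3B instruction-tuned models on CPU.",
]
doc_embeddings = model.encode(docs, convert_to_tensor=True)

query = "tuning local LLM performance"
query_embedding = model.encode(query, convert_to_tensor=True)

scores = util.cos_sim(query_embedding, doc_embeddings)[0]
for doc, score in sorted(zip(docs, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {doc}")
```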
@[Jason Mullings] Yeah, I ran into the same bottleneck early on. Docker adds a bit of overhead, especially on CPU setups.
What helped me was:
- Running Ollama directly on Arch instead of Docker
- Switching to lighter models (3B range instead of larger ones)
- Tweaking thread usage and keeping the system as “clean” as possible during runs
It’s not quite 10x yet, but the difference is noticeable. I’m experimenting with a few optimizations — if I manage to push it further, I’ll definitely share an update.
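For the thread tweaking, the main knob I've been using is the num_thread option on generate requests. A sketch, assuming Ollama's HTTP API (6 is a placeholder; match it to your physical core count):

```python
# Sketch: pinning Ollama's CPU thread count per request via the
# `num_thread` option. 6 is a placeholder; match it to physical cores.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2:3b",
        "prompt": "Hello",
        "stream": False,
        "options": {"num_thread": 6},
    },
    timeout=300,
)
print(resp.json()["response"])
```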
@[Steve Fenton] That’s a fair point — commercial tools still have an edge in consistency and convenience.
For local setups, I’ve been getting the best balance from smaller instruction-tuned models (3B–7B range), especially when paired with well-structured prompts. The key difference I’ve noticed is that local models reward precision more — you have to “guide” them a bit more deliberately.
The upside though is control — no latency spikes, no token costs, and full offline capability. I’m currently experimenting with ways to reduce prompt overhead and make interactions feel more natural, closer to commercial tools.
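To make the "guiding" part concrete, here's a sketch of the kind of tightly structured prompt I mean, sent through Ollama's chat endpoint (the system message and constraints are just illustrative):

```python
# Sketch: a tightly structured prompt for a small local model via
# Ollama's chat endpoint. The constraints are illustrative only.
import requests

messages = [
    {
        "role": "system",
        "content": (
            "You are a precise assistant. Answer in at most three "
            "bullet points. If unsure, say 'unsure'. No preamble."
        ),
    },
    {"role": "user", "content": "When is a 3B model enough for summarization?"},
]

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={"model": "qwen2.5:3b-instruct", "messages": messages, "stream": False},
    timeout=300,
)
print(resp.json()["message"]["content"])
```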
Honestly, it wasn’t a straight path.
It took me a couple of weeks of trial and error: breaking things, reinstalling, testing different models, and figuring out what actually works on my hardware.
The tricky part wasn’t just setting it up, but optimizing it to feel usable day-to-day. Once that clicked, everything started making sense.
If you’re planning to try it, just expect a bit of chaos at the start, but it’s definitely worth it.
I did something similar. I bought a cheap SSD, threw it in my gaming desktop, installed Debian Linux on it, and set up a dual boot so I can switch to Linux whenever I want and check on the local models that are running my agents. I have a good GPU, though not top tier (an Nvidia 3080), so performance is amazing and I’m not burning tokens making calls out to Claude in the cloud. Highly recommend!
@[onchainintel] That’s a solid setup — especially with a 3080, you’re in a really good spot for local AI.
I’m currently working with a more CPU-focused environment, so I’ve had to optimize a bit differently — lighter models, tighter configurations, and focusing on efficiency over raw power.
But I like your approach with the dual boot and SSD — that kind of flexibility makes experimenting much easier. Definitely a direction I’d explore more if I upgrade hardware.