
Over the past few weeks, I’ve been experimenting with Ollama to run local models on my machine.
Here’s what I discovered:
⚡ Performance & Speed
- Lightweight models like Gemma 3 1B and Phi-3 Mini run surprisingly well even on constrained hardware.
- Caching and a modular setup helped me keep inference times low; one way to keep a model warm between calls is sketched below.
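A minimal sketch of the warm-model idea, assuming the ollama Python package (the model tag and the timeout are placeholders, not my exact setup):

```python
# Minimal sketch of keeping a model warm between calls so repeated prompts
# skip the weight-loading cost. Assumes the ollama Python package; the model
# tag and the 10-minute timeout are placeholders.
import ollama

MODEL = "gemma3:1b"  # placeholder tag for a small local model

def ask(prompt: str) -> str:
    response = ollama.chat(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        keep_alive="10m",  # keep the weights loaded for 10 minutes after this call
    )
    return response["message"]["content"]

if __name__ == "__main__":
    print(ask("Summarise what a context window is in one sentence."))
```

Longer keep_alive values trade memory for latency: the model stays in RAM, but follow-up prompts skip the reload stall.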
Workflow & Setup
- Ollama’s model management makes it easy to swap and curate a lean library.
- I trimmed unused models and settled on a three-model setup, each under 2 GB, for rapid prototyping; a sketch of the pruning step follows this list.
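The trim itself is easy to script. A rough sketch using the ollama Python client (the keep-list tags are placeholders, and older client versions return "name" instead of "model" for each entry):

```python
# Sketch of a local-library audit: list installed models with sizes and flag
# anything outside a small keep-list. Tags are placeholders; older
# ollama-python versions expose the tag under "name" instead of "model".
import ollama

KEEP = {"gemma3:1b", "phi3:mini", "qwen2.5:1.5b"}  # hypothetical 3-model setup

def audit(delete_others: bool = False) -> None:
    for entry in ollama.list()["models"]:
        name = entry["model"]
        size_gb = entry["size"] / 1e9
        status = "keep" if name in KEEP else "remove candidate"
        print(f"{name:<28} {size_gb:5.1f} GB  {status}")
        if delete_others and name not in KEEP:
            ollama.delete(name)  # frees disk space; the model can be re-pulled later

if __name__ == "__main__":
    audit(delete_others=False)  # dry run first; pass True to actually prune
```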
Use Cases
- Built a Streamlit chatbot powered by Gemma 3 1B: smooth local inference with a polished UI. A minimal sketch follows this list.
- Compared models side by side on reasoning depth vs. response speed, which made the trade-offs concrete.
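A simplified sketch of what such a chatbot can look like, assuming recent streamlit and ollama packages (the model tag is a placeholder):

```python
# Simplified sketch of a Streamlit chatbot backed by a local Ollama model.
# Assumes the streamlit and ollama packages and a locally pulled small model.
import ollama
import streamlit as st

MODEL = "gemma3:1b"  # placeholder tag

st.title("Local chat")

if "history" not in st.session_state:
    st.session_state.history = []  # list of {"role": ..., "content": ...} dicts

# Replay the conversation so far on each rerun.
for msg in st.session_state.history:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

if prompt := st.chat_input("Ask something"):
    st.session_state.history.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    # Stream tokens from the local model as they arrive.
    with st.chat_message("assistant"):
        stream = ollama.chat(model=MODEL, messages=st.session_state.history, stream=True)
        reply = st.write_stream(chunk["message"]["content"] for chunk in stream)

    st.session_state.history.append({"role": "assistant", "content": reply})
```

Saved as something like app.py and launched with streamlit run app.py, everything stays on the local machine.

And one way to run that kind of side-by-side comparison: the same prompt goes to each model, latency is timed, and the answers are reviewed by hand (tags and prompt are placeholders):

```python
# Rough sketch of a side-by-side check: same prompt to each model, record
# wall-clock latency, then read the answers for reasoning quality by hand.
import time
import ollama

MODELS = ["gemma3:1b", "phi3:mini"]  # placeholder tags
PROMPT = "A bat and a ball cost $1.10; the bat costs $1.00 more than the ball. How much is the ball?"

for model in MODELS:
    start = time.perf_counter()
    resp = ollama.chat(model=model, messages=[{"role": "user", "content": PROMPT}])
    elapsed = time.perf_counter() - start
    print(f"--- {model} ({elapsed:.1f}s) ---")
    print(resp["message"]["content"])
```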
Takeaways
- Local deployment isn’t just about independence from the cloud; it forces you to think strategically about efficiency.
- Branding, UI polish, and repo structure matter just as much as raw performance when sharing projects.
I’d love to hear how others are balancing model variety against hardware constraints.
What’s your favourite local LLM?