Every time I opened a terminal, I waited. Not long — maybe a second and a half — but long enough to notice. Long enough to be annoying. I finally decided to profile my zsh startup, and what I found took it from 1.4 seconds down to 53 milliseconds.
Here's what I learned.
## Profiling with zprof
Zsh has a built-in profiler. Add `zmodload zsh/zprof` at the top of your `.zshrc` and `zprof` at the bottom, then open a new shell:
```zsh
# top of .zshrc
zmodload zsh/zprof
# ... your config ...
# bottom of .zshrc
zprof
```
My initial profile told a clear story:
| Culprit | Time | % of startup |
| --- | --- | --- |
| NVM (`nvm.sh`) | ~430ms | 31% |
| Completion subprocesses (kubectl, helm, gh, ...) | ~400ms | 29% |
| `compinit` (full rebuild every time) | ~240ms | 17% |
| `brew shellenv` | ~30ms | 2% |
| `go env GOPATH` | ~20ms | 1% |
| Everything else | ~280ms | 20% |
Four of these five are subprocess calls — things like `eval "$(brew shellenv)"` or `source <(kubectl completion zsh)` that fork a process just to produce some static text. That's the low-hanging fruit.
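Since profiling is worth repeating after every change, it can be handy to leave the profiler lines in `.zshrc` and switch them on only when needed. A minimal sketch, assuming a hypothetical `ZSH_PROFILE` environment variable (the name is made up for this example, not something zsh defines):

```zsh
# top of .zshrc
# ZSH_PROFILE is a hypothetical variable name used only in this sketch.
if [ -n "${ZSH_PROFILE:-}" ]; then zmodload zsh/zprof; fi

# ... your config ...

# bottom of .zshrc
if [ -n "${ZSH_PROFILE:-}" ]; then zprof; fi
```

With that in place, `ZSH_PROFILE=1 zsh -i -c exit` should print the profile for a single startup, while ordinary shells skip the profiler entirely.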
## Optimization 1: Lazy-load NVM
NVM was the single biggest offender. Sourcing `nvm.sh` on every shell startup cost ~430ms, and I don't use `node` in every terminal session. The fix: wrapper functions that defer loading until you actually call `nvm`, `node`, `npm`, etc.
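The self-replacing wrapper is easier to see in isolation. Here is a toy sketch, with a hypothetical `demo_tool` standing in for `nvm` and a counter standing in for the expensive `nvm.sh` load. It uses `unset -f`, which zsh treats the same as `unfunction`:

```zsh
# demo_tool and _demo_loads are hypothetical names for illustration only.
_demo_loads=0

_demo_lazy_load() {
  # Drop the wrapper (same effect as `unfunction demo_tool` in zsh).
  unset -f demo_tool
  # Stands in for the slow `. nvm.sh` step.
  _demo_loads=$(( _demo_loads + 1 ))
  # Stands in for what the sourced script would define.
  demo_tool() { echo "real demo_tool: $*"; }
}

# Wrapper: load once, then hand the arguments to the real definition.
demo_tool() { _demo_lazy_load; demo_tool "$@"; }

demo_tool first     # triggers the load, then runs the real function
demo_tool second    # runs the real function directly
echo "loads: $_demo_loads"
```

The loader runs exactly once: the script prints the real function's output for both calls, followed by `loads: 1`.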
**Before:**
```zsh
export NVM_DIR="$HOME/.nvm"
[ -s "/opt/homebrew/opt/nvm/nvm.sh" ] && . "/opt/homebrew/opt/nvm/nvm.sh"
[ -s "/opt/homebrew/opt/nvm/etc/bash_completion.d/nvm" ] && . "/opt/homebrew/opt/nvm/etc/bash_completion.d/nvm"
```
**After:**
```zsh
export NVM_DIR="$HOME/.nvm"
_nvm_lazy_load() {
  unfunction nvm node npm npx corepack 2>/dev/null
  [ -s "/opt/homebrew/opt/nvm/nvm.sh" ] && \. "/opt/homebrew/opt/nvm/nvm.sh"
  [ -s "/opt/homebrew/opt/nvm/etc/bash_completion.d/nvm" ] && \. "/opt/homebrew/opt/nvm/etc/bash_completion.d/nvm"
}
nvm() { _nvm_lazy_load; nvm "$@" }
node() { _nvm_lazy_load; node "$@" }
npm() { _nvm_lazy_load; npm "$@" }
npx() { _nvm_lazy_load; npx "$@" }
corepack() { _nvm_lazy_load; corepack "$@" }
```
The wrapper functions replace themselves on first call via `unfunction`, then delegate to the real command. Cost at startup: zero. Cost on first `node` invocation: ~430ms (once).
## Optimization 2: Hardcode static values
Several lines in my config were spawning subprocesses to compute values that never change:
```zsh
# Before: subprocess every startup
eval "$(/opt/homebrew/bin/brew shellenv)"
export PATH="$PATH:$(go env GOPATH)/bin"
. "$HOME/.cargo/env"
```
These produce the same output every time. Just paste the result directly:
```zsh
# After: zero subprocesses
export HOMEBREW_PREFIX="/opt/homebrew"
export HOMEBREW_CELLAR="/opt/homebrew/Cellar"
export HOMEBREW_REPOSITORY="/opt/homebrew"
export PATH="/opt/homebrew/bin:/opt/homebrew/sbin:$PATH"
[ -z "${MANPATH-}" ] || export MANPATH=":${MANPATH#:}"
export INFOPATH="/opt/homebrew/share/info:${INFOPATH:-}"
export GOPATH="$HOME/go"
export PATH="$PATH:$GOPATH/bin"
export PATH="$HOME/.cargo/bin:$PATH"
```
Leave a comment like `# regenerate with: brew shellenv` so future-you knows where the values came from.
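Hardcoded values can go stale after a Homebrew or Go reinstall. One way to catch that is a helper you run by hand, never from `.zshrc`. A rough sketch, assuming `brew --prefix` and `go env GOPATH` are the sources of truth; `check_static_env` is a hypothetical name:

```zsh
# check_static_env is a hypothetical helper. Run it manually, not at startup,
# because it spawns the very subprocesses the hardcoding avoids.
check_static_env() {
  local drift=0
  if command -v brew >/dev/null 2>&1; then
    [ "$(brew --prefix)" = "${HOMEBREW_PREFIX:-}" ] || { echo "HOMEBREW_PREFIX is out of date"; drift=1; }
  fi
  if command -v go >/dev/null 2>&1; then
    [ "$(go env GOPATH)" = "${GOPATH:-}" ] || { echo "GOPATH is out of date"; drift=1; }
  fi
  return "$drift"
}
```

It prints nothing and returns 0 when the hardcoded values still match, and names each variable that no longer does.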
## Optimization 3: Cache completions into fpath
This was the big one. My original config eagerly sourced completions from 12 different tools on every shell startup:
```zsh
# Before — 12 subprocesses, every startup
command -v kubectl &>/dev/null && source <(kubectl completion zsh)
command -v helm &>/dev/null && source <(helm completion zsh)
command -v minikube &>/dev/null && source <(minikube completion zsh)
command -v gh &>/dev/null && source <(gh completion -s zsh)
# ... 8 more tools
```
Each `source <(tool completion zsh)` forks a subprocess AND evaluates thousands of lines of shell code. Minikube's completion alone is 5,000 lines.
The fix has two parts:
**For completions:** write them to files in an fpath directory. Compinit loads these lazily — only when you actually press TAB on that command:
```zsh
ZSH_COMP_CACHE="$HOME/.zsh-completion-cache"
[[ -d "$ZSH_COMP_CACHE" ]] || mkdir -p "$ZSH_COMP_CACHE"
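_cache_fpath() {
  local name="$1"; shift
  local cache_file="$ZSH_COMP_CACHE/_$name"
  # (N.mh+24): plain file modified more than 24 hours ago; N leaves the array empty on no match
  local -a stale=($cache_file(N.mh+24))
  # Regenerate only when the cache is missing or stale
  if [[ ! -f "$cache_file" ]] || (( $#stale )); then
    "$@" > "$cache_file" 2>/dev/null
  fi
}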
command -v kubectl &>/dev/null && _cache_fpath kubectl kubectl completion zsh
command -v helm &>/dev/null && _cache_fpath helm helm completion zsh
# ... etc
fpath=($ZSH_COMP_CACHE $fpath)
```
**For plugins that must run at startup** (fzf keybindings, direnv hook, oh-my-posh prompt), cache their init output and `zcompile` for faster sourcing:
```zsh
_cache_source() {
  local name="$1"; shift
  local cache_file="$ZSH_COMP_CACHE/$name.zsh"
  local -a stale=($cache_file(N.mh+24))
  if [[ ! -f "$cache_file" ]] || (( $#stale )); then
    "$@" > "$cache_file" 2>/dev/null
    # Writes $cache_file.zwc, which `source` reads instead while it is newer than the script
    zcompile "$cache_file" 2>/dev/null
  fi
  source "$cache_file"
}
_cache_source fzf fzf --zsh
_cache_source direnv direnv hook zsh
_cache_source oh-my-posh oh-my-posh init zsh --config ~/.poshthemes/theme.omp.json --print
```
Both functions use a 24-hour cache expiry via zsh glob qualifiers. Delete `~/.zsh-completion-cache` to force a refresh.
I also cached `compinit` itself: a full rebuild only runs once per day, and otherwise `compinit -C` skips straight to the dump file:
```zsh
autoload -Uz compinit
local -a zcompdump_stale=(~/.zcompdump(N.mh+24))
if (( $#zcompdump_stale )); then
  compinit
else
  compinit -C
fi
{ zcompile ~/.zcompdump } &!
```
## The bug that almost ruined everything
After implementing all of this, I ran `time zsh -i -c exit`. The result: **1.59 seconds**. *Slower* than before.
I profiled again and saw this:
```text
num calls time self name
-----------------------------------------------------------------
1) 15 1180.06 97.34% 1169.02 96.43% _cache_completion
2) 1 26.83 2.21% 7.49 0.62% compinit
```
The caching function was taking 97% of startup time across 15 calls. The caches existed on disk but were being **regenerated every single time**. The staleness check was broken.
I restructured the approach — separating completions (fpath-based, lazy) from plugins (source-based, eager) — and tried again. Same problem: `_cache_fpath` at 72%, `compinit` doing full rebuilds.
The bug was in this line:
```zsh
if [[ ! -f "$cache_file" || -n "$cache_file"(#qN.mh+24) ]]; then
```
This looks reasonable. The glob qualifier `(#qN.mh+24)` means "match if the file is older than 24 hours, with N (nullglob) to return empty string if no match." The `-n` test checks if the result is non-empty.
**The problem: glob qualifiers don't expand inside `[[ ]]` by default.**
Zsh's `[[ ]]` conditional construct does not perform filename generation (globbing) on its arguments. The `(#q...)` form can force it, but only when the `EXTENDED_GLOB` option is in effect. Without that option, the string `"$cache_file"(#qN.mh+24)` is treated as the literal path with `(#qN.mh+24)` appended as text. Since that string is always non-empty, the condition is **always true**. Every cache was being regenerated on every startup. The caching was doing nothing.
The same bug affected the `compinit` staleness check:
```zsh
# Also broken — compinit was doing a full rebuild every time
if [[ -n ~/.zcompdump(#qN.mh+24) ]]; then
```
The fix: expand the glob into an array variable first, then check its length:
```zsh
local -a stale=($cache_file(N.mh+24))
if [[ ! -f "$cache_file" ]] || (( $#stale )); then
```
Array assignments DO perform globbing. The `(N.mh+24)` qualifier (no `#q` prefix needed outside `[[ ]]`) restricts the match to plain files modified more than 24 hours ago, and `$#stale` gives us the match count. If the file is older than 24 hours, `stale` contains one element; otherwise it's empty.
This is a subtle footgun. The code looks correct, it doesn't produce errors, and the caches *are* created — they're just never *reused*. Without profiling, you'd never know.
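Profiling is one way to catch this. A more direct check is to see whether a startup rewrites any cache files. A minimal sketch using `find -newer` against a timestamp file; `rewritten_during` is a hypothetical helper name:

```zsh
# rewritten_during is a hypothetical helper.
# Usage: rewritten_during <cache-dir> <command...>
# Prints every file under <cache-dir> that was modified while <command> ran.
rewritten_during() {
  local dir="$1"; shift
  local marker
  marker="$(mktemp)" || return 1   # timestamp reference created "now"
  sleep 1                          # guard against coarse mtime granularity
  "$@" >/dev/null 2>&1
  find "$dir" -type f -newer "$marker" -print
  rm -f "$marker"
}
```

Once the caches are warm, `rewritten_during ~/.zsh-completion-cache zsh -i -c exit` should print nothing. Any path it does print was regenerated during that startup.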
## Result
```console
$ time zsh -i -c exit
zsh -i -c exit 0.03s user 0.02s system 93% cpu 0.053 total
```
53 milliseconds. A 96% reduction from 1.4 seconds.
Here's what each optimization contributed:
| Optimization | Savings |
| --- | --- |
| Lazy-load NVM | ~430ms |
| Cache completions into fpath (lazy compinit) | ~500ms |
| Cache plugin init scripts + zcompile | ~200ms |
| Hardcode brew/go/cargo | ~50ms |
| `compinit -C` (cached dump) | ~170ms |
| Total | ~1,350ms |
The first shell open after 24 hours takes a couple of seconds to regenerate caches, but every subsequent shell is instant. You can force a full refresh anytime:
```zsh
rm -rf ~/.zsh-completion-cache ~/.zcompdump*
```
## Takeaways
- Profile first. `zprof` told me exactly where the time was going. Don't guess.
- Subprocess calls add up. Each `eval $(...)` or `source <(...)` forks a process. Twelve of them cost almost a full second.
- `fpath` > `source` for completions. Compinit loads completion functions lazily from `fpath`. Don't eagerly source thousands of lines you might never use.
- Test that your caching actually works. A cache that regenerates every time is worse than no cache: it has the overhead of both the generation AND the file I/O.
- Glob qualifiers don't expand inside `[[ ]]` by default. This is the kind of bug that looks correct, produces no errors, and silently destroys your performance. Expand globs into array variables first.