What performance checks belong in CI (and what belongs in monitoring)

Jun 9 • 6 min read

— Originally published at apogeewatcher.hashnode.dev

Lighthouse CI can block merges or become a job teams ignore. Here is how we split pre-merge checks, bundle limits, and scheduled PageSpeed monitoring so CI only tests what it can do reliably.

A pull request passed Lighthouse CI on Friday and the preview URL looked fine, but on Monday scheduled monitoring flagged checkout INP on the same release.

The merge was not obviously wrong. The pipeline had checked the homepage and a marketing template. The regression was on a route the workflow never tested. That gap is normal once you track more than a handful of URLs, which is why we stopped using CI as a stand-in for production monitoring and limited it to a small set of pre-merge checks.

Below is what we put in the pipeline, what we keep in scheduled PageSpeed monitoring, and what we removed because it only produced failed builds that teams stopped trusting.

When a green CI build still breaks production webperf

CI answers one question: did this candidate build cross the thresholds we set on the URLs we test in a lab environment?

Production monitoring answers another: are the URLs clients care about still inside budget after deploy, next week, and on templates we never added to the workflow file?

The problems we see most often:

CI tests the wrong URLs: it guards / and /pricing while commerce runs on /checkout and /cart.
Preview differs from production: CDN rules, auth, feature flags, or third-party scripts differ between preview hosts and live domains.
Unstable runs: LCP and INP swing between runs on cold CI runners, and hard failures on metrics that noisy train teams to ignore the job.
Too much in one pipeline: the job tries to enforce field CrUX, SEO rank proxies, or twelve templates on every docs-only PR.

The fix is not more Lighthouse runs on every push. It is choosing checks CI can run reliably and moving the rest to monitoring, with a written rule for who tests what after deploy.
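
One way to make that written rule checkable is to diff the paths CI tests against the paths monitoring watches. The sketch below is a minimal illustration, assuming both URL lists are available as plain strings; the hosts, paths, and the function name `untested_paths` are hypothetical and not part of any tool.

```python
from urllib.parse import urlparse


def untested_paths(ci_urls, monitored_urls):
    """Return monitored paths that no CI URL covers, sorted for stable output."""
    def path_of(url):
        # Compare by path only, because preview and production hosts differ.
        path = urlparse(url).path or "/"
        return path.rstrip("/") or "/"

    ci_paths = {path_of(u) for u in ci_urls}
    return sorted({path_of(u) for u in monitored_urls} - ci_paths)


# Hypothetical lists: CI guards / and /pricing, monitoring also watches commerce routes.
ci_urls = ["https://preview.example.com/", "https://preview.example.com/pricing"]
monitored_urls = [
    "https://www.example.com/",
    "https://www.example.com/pricing/",
    "https://www.example.com/checkout",
    "https://www.example.com/cart",
]
print(untested_paths(ci_urls, monitored_urls))
```

The result is the list of routes that only monitoring will catch, which is the list the post-deploy owner needs to see.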

Performance budget CI gates: what we block on merge

We treat performance budget CI gates as hard stops on merge or deploy preview, not as a report for executives.

Block (error) when the check is stable and linked to what the PR changed:

CLS and layout stability on the templates in scope (lab, but repeatable).
Total Blocking Time (TBT) or Max Potential FID, as lab proxies for responsiveness, when the change touches main-thread JavaScript.
Resource summary limits when bundle size is the likely regression (script bytes, third-party count).
Performance category floor only when the team has stable reference scores on the preview host; otherwise it becomes a warning first.

Warn when the metric is useful but still noisy, and keep it a warning until preview scores stabilise:

LCP on preview URLs with ads, A/B snippets, or cold CDN edges.
INP unless the PR touches interaction-heavy UI and you run enough repetitions to trust the median.

Do not gate in CI (monitor instead):

Field CrUX or Search Console exports (belong after deploy, not in CI).
Rank or traffic proxies.
Every template in the sitemap on every commit.

Copy the numbers from the same policy doc you use in monitoring. Our Watcher guide on performance budgets for web teams defines the metrics and limits; CI should enforce a subset, not a second spreadsheet.
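
A rough sketch of "one policy, CI enforces a subset" follows. It assumes a single table of limits that both CI and monitoring read, with CI adding only a severity per metric. All metric keys and numbers here are placeholders for illustration, not recommended budgets.

```python
# Hypothetical shared policy: one table of limits, read by both CI and monitoring.
BUDGETS = {
    "cls": 0.1,
    "tbt_ms": 300,
    "script_kb": 350,
    "lcp_ms": 2500,
}

# CI enforces a subset and only assigns a severity; it never restates a limit.
CI_LEVELS = {"cls": "error", "tbt_ms": "error", "script_kb": "error", "lcp_ms": "warn"}


def evaluate(measured, budgets=BUDGETS, levels=CI_LEVELS):
    """Split over-budget metrics into merge-blocking errors and warnings."""
    errors, warnings = [], []
    for metric, level in levels.items():
        if metric not in measured:
            continue  # metric not collected on this run
        if measured[metric] > budgets[metric]:
            message = f"{metric}: {measured[metric]} > {budgets[metric]}"
            (errors if level == "error" else warnings).append(message)
    return errors, warnings


errors, warnings = evaluate({"cls": 0.02, "tbt_ms": 410, "lcp_ms": 3100})
print("block merge:", errors)
print("warn only:", warnings)
```

Because the severity map holds no numbers, promoting LCP from warning to error later is a one-word change and cannot introduce a second threshold.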

Resource and bundle budgets: what we enforce before Lighthouse runs

Lab scores can stay green while JavaScript weight climbs every sprint. We enforce at least one resource check that does not depend on Lighthouse score swings:

Main bundle size ceiling (webpack performance, size-limit, or bundlesize).
Dependency diff review when a PR adds more than an agreed chunk of kilobytes.
Image budget on static imports when LCP regressions usually trace to hero assets.

Resource gates are fast, deterministic, and easy to explain in a PR comment ("main bundle +42 KB"). They catch the regressions developers actually introduce. Lighthouse catches how those bytes behave on a throttled mobile profile.
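
As an illustration of how small such a gate can be, here is a sketch of a size check that produces the PR-comment wording. It assumes the byte sizes of the base and head bundles are already known (in a real job they might come from `os.path.getsize` on the build output); the function name and the ceiling are hypothetical.

```python
def bundle_report(name, base_bytes, head_bytes, ceiling_bytes):
    """Return (ok, message) for one bundle, phrased for a PR comment."""
    delta_kb = (head_bytes - base_bytes) / 1024
    head_kb = head_bytes / 1024
    ceiling_kb = ceiling_bytes / 1024
    ok = head_bytes <= ceiling_bytes
    status = "ok" if ok else "over ceiling"
    message = (
        f"{name} bundle {delta_kb:+.0f} KB "
        f"({head_kb:.0f} KB of {ceiling_kb:.0f} KB, {status})"
    )
    return ok, message


# Hypothetical sizes: 300 KB on the base branch, 342 KB on the PR, 340 KB ceiling.
ok, message = bundle_report("main", 300 * 1024, 342 * 1024, 340 * 1024)
print(message)
```

Tools such as size-limit or bundlesize already do this job; the sketch only shows that the check is plain arithmetic with no dependence on lab conditions.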

Order matters: run fast checks first. There is little value in a five-minute Lighthouse run when a bundle guard would have stopped the merge in thirty seconds.

Lighthouse CI vs scheduled monitoring: split the pipeline jobs

Run Lighthouse CI on the path that produces a candidate URL: build, preview deploy, health check, then lhci autorun (or collect + assert). Run it on PRs to protected branches and on release tags, with path filters so markdown-only changes do not run Lighthouse unless you want them to.
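
Many CI systems offer native path filters, so treat the following only as a sketch of the decision itself: skip Lighthouse when every changed file is documentation. The pattern list and the function name are assumptions for illustration.

```python
from fnmatch import fnmatch

# Hypothetical patterns for changes that cannot affect rendered pages.
DOCS_ONLY_PATTERNS = ["*.md", "docs/*", "LICENSE"]


def should_run_lighthouse(changed_files, skip_patterns=DOCS_ONLY_PATTERNS):
    """Run Lighthouse unless every changed file matches a skip pattern."""
    if not changed_files:
        return False  # nothing changed, nothing to test
    return not all(
        any(fnmatch(path, pattern) for pattern in skip_patterns)
        for path in changed_files
    )


print(should_run_lighthouse(["README.md", "docs/setup.md"]))
print(should_run_lighthouse(["README.md", "src/checkout.tsx"]))
```

A single non-documentation file is enough to trigger the run, which keeps the filter conservative.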

Run scheduled PageSpeed monitoring after merge: production or staging URLs, client URL groups, a test schedule you can state in a client report, alerts when budgets break in production.

We document the split in onboarding:

Phase	Owner	Question
Pre-merge	CI (LHCI + bundle guards)	Did this branch break budgets on the URLs we test in preview?
Post-deploy	Monitoring product	Are production templates still inside budget on the schedule we promised?

Use the same thresholds in both places. If CI and monitoring use different numbers, engineers ignore both.
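
If the two systems cannot read one shared file, a drift check is a cheap fallback. The sketch below assumes each side can export its thresholds as a flat mapping of metric name to limit; the keys and values are placeholders.

```python
def threshold_drift(ci_thresholds, monitoring_thresholds):
    """Map each metric defined on both sides with different limits to (ci, monitoring)."""
    shared = ci_thresholds.keys() & monitoring_thresholds.keys()
    return {
        metric: (ci_thresholds[metric], monitoring_thresholds[metric])
        for metric in sorted(shared)
        if ci_thresholds[metric] != monitoring_thresholds[metric]
    }


# Hypothetical exports: CI covers a subset, and script_kb has drifted.
ci_thresholds = {"cls": 0.1, "script_kb": 350}
monitoring_thresholds = {"cls": 0.1, "script_kb": 400, "lcp_ms": 2500}
print(threshold_drift(ci_thresholds, monitoring_thresholds))
```

Metrics that only monitoring tracks are ignored on purpose, since CI is expected to cover a subset.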

For wiring LHCI, assertions, and GitHub Actions patterns, the step-by-step lives on our blog: How to set up performance budgets in CI/CD pipelines. This post is the decision guide before you paste YAML.

Performance budgets in CI for agency client repositories

Agencies rarely maintain one pipeline. Client A is Shopify with theme checks; Client B is Next.js on Vercel; Client C still ships WordPress with a plugin stack nobody wants to touch in CI.

Our default agency CI template is intentionally small:

One or two preview URLs the client agrees are highest risk (often homepage plus one revenue template; two LHCI URLs max on small retainers).
One bundle or size guard when the stack exposes a build artefact.
Written rule in the README: after deploy, monitoring watches the full URL list the SOW names.

We do not paste the client's full performance budget table into every repo. Procurement and account teams already struggle when monitoring scope is vague; CI scope should be narrower and written in the README ("LHCI runs on these two preview URLs; full template list is in Watcher").

Multi-tenant monitoring tracks many sites over time. CI blocks merges when preview tests show a clear regression on the paths in scope.

CI webperf gates we moved to warnings or removed

After several quarters when the CI job failed too often, we changed policy:

Removed full-site crawls inside CI (flaky, slow, duplicates monitoring).
Stopped failing PRs on single-run LCP; an LCP error now requires numberOfRuns ≥ 2.
Moved performance score on preview to warn until preview parity with production was documented.
Moved INP on long-tail templates and logged-in routes to monitoring only unless the PR touches that flow.
Kept CLS regressions on in-scope templates and bundle ceilings on main entry points as errors.

When the main branch fails every other day, teams bypass the job. Fewer error rules that people act on work better than dozens of metrics.
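
The rule about single-run LCP can be expressed as a gate that refuses to fail on too few samples. This is a minimal sketch, assuming the per-run values have already been collected into a list; the function name, the `skip` outcome, and the sample values are illustrative.

```python
from statistics import median


def gate_on_median(samples, limit, min_runs=2):
    """Return 'pass', 'fail', or 'skip' when there are too few runs to trust an error."""
    if len(samples) < min_runs:
        return "skip"
    return "fail" if median(samples) > limit else "pass"


# Hypothetical LCP samples in milliseconds against a 2500 ms limit.
print(gate_on_median([3900], 2500))              # one slow run alone cannot fail the PR
print(gate_on_median([2300, 3900, 2400], 2500))  # the outlier does not move the median past the limit
print(gate_on_median([2600, 2700], 2500))        # consistently over the limit
```

Gating on the median means one cold-runner outlier does not fail a build, while a consistent regression still does.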

Internal checklist before adding another CI gate

Before we add a new webperf step to a client pipeline, we ask:

Does this metric fail for reasons the developer on the PR can fix?
Is the preview URL stable enough to trust an error (not just a warn)?
Is the same threshold already monitored post-deploy, or will CI create a second set of numbers?
Will the job finish in time for the team's PR review workflow (under ~10 minutes for small sites)?
Can we explain the failure in one sentence in the PR comment?

If two or more answers are no, leave that check in monitoring or in a weekly report, not in merge blocking.
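
The decision rule is simple enough to write down as code, which can help when several people review gate proposals. In this sketch the question keys are shorthand invented for illustration, each answer is True when the check passes that question, and an unanswered question is counted as a no.

```python
# Hypothetical shorthand keys for the five checklist questions, in order.
CHECKLIST = [
    "fixable_by_pr_author",
    "preview_stable_enough_for_error",
    "threshold_matches_monitoring",
    "finishes_within_review_window",
    "explainable_in_one_sentence",
]


def gate_placement(answers):
    """Return 'ci' or 'monitoring' from yes/no answers to the five questions."""
    # Missing answers count as "no" (an assumption of this sketch).
    no_count = sum(1 for question in CHECKLIST if not answers.get(question, False))
    # "monitoring" here also stands for the weekly-report option.
    return "monitoring" if no_count >= 2 else "ci"


proposal = {question: True for question in CHECKLIST}
proposal["preview_stable_enough_for_error"] = False
print(gate_placement(proposal))

proposal["finishes_within_review_window"] = False
print(gate_placement(proposal))
```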

Next step: pick three gates for one repository

Open the client repo where CI fails most often, or your own product site. Write three lines in the README:

URLs Lighthouse CI tests on preview (max two to start).
Errors: CLS, bundle KB, or TBT; pick one lab metric you trust.
Warnings: LCP or performance score until preview scores stay stable.

Implement those before adding a fourth check. Pair with the same numeric budgets you use in monitoring so "green in CI" and "green in the report" mean the same thing.

If you need assertion keys, lighthouserc shapes, and a GitHub Actions sketch, use performance budgets in CI/CD pipelines. If you need the full budget strategy and metric tables first, start with the complete guide to performance budgets.

CI should block merges you would have rolled back anyway. Run everything else on the full URL list in scheduled monitoring, without making every PR wait on Chrome.

Originally published on Hashnode.

2 Comments

Jarod42 · Answer 1 · 2026-06-11T05:13:51+0000

Makes sense. Do you think lightweight load tests belong in every PR, or only before releases?

ApogeeWatcher • Jun 11

@Jarod42 It depends on whether the PRs usually touch anything in the frontend. We need to be careful about adding too much noise to the process.
