Lighthouse CI can block merges or become a job teams ignore. Here is how we split pre-merge checks, bundle limits, and scheduled PageSpeed monitoring so CI only tests what it can do reliably.
A pull request passed Lighthouse CI on Friday and the preview URL looked fine, but on Monday scheduled monitoring flagged checkout INP on the same release.
The merge was not obviously wrong. The pipeline had checked the homepage and a marketing template. The regression was on a route the workflow never tested. That gap is normal once you track more than a handful of URLs, which is why we stopped using CI as a stand-in for production monitoring and limited it to a small set of pre-merge checks.
Below is what we put in the pipeline, what we keep in scheduled PageSpeed monitoring, and what we removed because it only produced failed builds that teams stopped trusting.
When a green CI build still breaks production webperf
CI answers one question: did this candidate build cross the thresholds we set on the URLs we test in a lab environment?
Production monitoring answers another: are the URLs clients care about still inside budget after deploy, next week, and on templates we never added to the workflow file?
The problems we see most often:
- CI tests the wrong URLs: it guards / and /pricing while commerce runs on /checkout and /cart.
- Preview differs from production: CDN rules, auth, feature flags, or third-party scripts differ between preview hosts and live domains.
- Unstable runs: LCP and INP swing from run to run on cold CI runners, and hard errors on metrics that noisy train teams to ignore the job.
- Too much in one pipeline: the job tries to enforce field CrUX, SEO rank proxies, or twelve templates on every docs-only PR.
The fix is not more Lighthouse runs on every push. It is choosing checks CI can run reliably and moving the rest to monitoring, with a written rule for who tests what after deploy.
We treat performance budget CI gates as hard stops on merge or deploy preview, not as a report for executives.
Block (error) when the check is stable and linked to what the PR changed:
- CLS and layout stability on the templates in scope (lab, but repeatable).
- Total Blocking Time (TBT) or Max Potential FID, as lab proxies for responsiveness, when the change touches main-thread JavaScript.
- Resource summary limits when bundle size is the likely regression (script bytes, third-party count).
- Performance category floor only when the team has stable reference scores on the preview host; otherwise it becomes a warning first.
Warn when the metric is useful but still too noisy on preview to block a merge:
- LCP on preview URLs with ads, A/B snippets, or cold CDN edges.
- INP unless the PR touches interaction-heavy UI and you run enough repetitions to trust the median.
Do not gate in CI (monitor instead):
- Field CrUX or Search Console exports (these belong after deploy, not in CI).
- Rank or traffic proxies.
- Every template in the sitemap on every commit.
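As a rough illustration of the error/warn split above, the sketch below turns a small policy table into the assertion map that Lighthouse CI reads from lighthouserc. The assertion keys are standard Lighthouse CI names, but every number is a placeholder assumption rather than a recommended budget, and `gatePolicy` and `buildAssertions` are hypothetical names. INP is left out because this post treats it as a warning or a monitoring concern.

```typescript
// Sketch only: all limits below are illustrative assumptions, not recommendations.
type Level = "error" | "warn" | "off";

type Gate = {
  level: Level;
  maxNumericValue?: number; // upper bound in the audit's own unit (ms, bytes, unitless for CLS)
  minScore?: number; // lower bound for category scores, 0 to 1
};

type AssertionOptions = { maxNumericValue?: number; minScore?: number };
type Assertion = Level | [Level, AssertionOptions];

// Hypothetical policy table; in practice the numbers come from the shared budget doc.
const gatePolicy: Record<string, Gate> = {
  "cumulative-layout-shift": { level: "error", maxNumericValue: 0.1 },
  "total-blocking-time": { level: "error", maxNumericValue: 300 },
  "resource-summary:script:size": { level: "error", maxNumericValue: 350_000 },
  "resource-summary:third-party:count": { level: "error", maxNumericValue: 10 },
  "largest-contentful-paint": { level: "warn", maxNumericValue: 2500 },
  "categories:performance": { level: "warn", minScore: 0.9 },
};

// Convert the policy table into the shape used under ci.assert.assertions.
function buildAssertions(policy: Record<string, Gate>): Record<string, Assertion> {
  const assertions: Record<string, Assertion> = {};
  for (const [key, gate] of Object.entries(policy)) {
    if (gate.level === "off") {
      assertions[key] = "off";
      continue;
    }
    const options: AssertionOptions = {};
    if (gate.maxNumericValue !== undefined) options.maxNumericValue = gate.maxNumericValue;
    if (gate.minScore !== undefined) options.minScore = gate.minScore;
    assertions[key] = [gate.level, options];
  }
  return assertions;
}
```

The returned object would sit under ci.assert.assertions in lighthouserc.js. Keeping the levels in one table makes it easier to promote a warning to an error later without touching the rest of the config.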
Copy the numbers from the same policy doc you use in monitoring. Our Watcher guide on performance budgets for web teams defines the metrics and limits; CI should enforce a subset, not a second spreadsheet.
Resource and bundle budgets: what we enforce before Lighthouse runs
Lab scores can stay green while JavaScript weight climbs every sprint. We enforce at least one resource check that does not depend on Lighthouse score swings:
- Main bundle size ceiling (webpack performance, size-limit, or bundlesize).
- Dependency diff review when a PR adds more than an agreed number of kilobytes.
- Image budget on static imports when LCP regressions usually trace to hero assets.
Resource gates are fast, deterministic, and easy to explain in a PR comment ("main bundle +42 KB"). They catch the regressions developers actually introduce. Lighthouse catches how those bytes behave on a throttled mobile profile.
Order matters: run fast checks first. There is little value in a five-minute Lighthouse run when a bundle guard would have stopped the merge in thirty seconds.
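A minimal sketch of that idea, assuming the build step already reports bundle sizes in bytes: a guard that produces the one-line PR message, and a runner that stops at the first failed check so slower steps never start. `checkBundle` and `runInOrder` are hypothetical helpers, not part of size-limit, bundlesize, or Lighthouse CI.

```typescript
// Sketch only: sizes are passed in; reading them from disk is left to the build tooling.
type CheckResult = { passed: boolean; message: string };

// Compare the current bundle with a baseline and a hard ceiling (all in bytes).
function checkBundle(
  name: string,
  sizeBytes: number,
  baselineBytes: number,
  ceilingBytes: number,
): CheckResult {
  const deltaKb = Math.round((sizeBytes - baselineBytes) / 1024);
  const sign = deltaKb >= 0 ? "+" : "-";
  const sizeKb = Math.round(sizeBytes / 1024);
  const ceilingKb = Math.round(ceilingBytes / 1024);
  return {
    passed: sizeBytes <= ceilingBytes,
    message: `${name} ${sign}${Math.abs(deltaKb)} KB (${sizeKb} KB of ${ceilingKb} KB ceiling)`,
  };
}

// Run checks in the order given and stop at the first failure,
// so a cheap bundle guard can prevent a slow Lighthouse run from starting.
function runInOrder(checks: Array<() => CheckResult>): CheckResult[] {
  const results: CheckResult[] = [];
  for (const check of checks) {
    const result = check();
    results.push(result);
    if (!result.passed) break;
  }
  return results;
}
```

For example, `checkBundle("main bundle", 300 * 1024, 258 * 1024, 280 * 1024)` fails with the message "main bundle +42 KB (300 KB of 280 KB ceiling)", and any check listed after it in `runInOrder` is skipped.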
Lighthouse CI vs scheduled monitoring: split the pipeline jobs
Run Lighthouse CI on the path that produces a candidate URL: build, preview deploy, health check, then lhci autorun (or collect + assert). Run it on PRs to protected branches and on release tags, with path filters so markdown-only changes do not run Lighthouse unless you want them to.
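The path filter can be sketched as a small predicate over the changed files. On GitHub Actions the usual route is a paths-ignore filter on the workflow trigger; the function below only shows the logic, and the patterns and the name `shouldRunLighthouse` are assumptions for illustration.

```typescript
// Sketch only: which paths count as "docs-only" is an assumption each team should set.
const DOCS_ONLY_PATTERNS: RegExp[] = [/\.mdx?$/i, /^docs\//];

// Run Lighthouse only when at least one changed file falls outside the docs-only patterns.
function shouldRunLighthouse(changedFiles: string[]): boolean {
  return changedFiles.some(
    (file) => !DOCS_ONLY_PATTERNS.some((pattern) => pattern.test(file)),
  );
}
```

With these patterns, a PR that touches only README.md and docs/setup.md skips the Lighthouse job, while one that also touches src/checkout.tsx runs it.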
Run scheduled PageSpeed monitoring after merge: production or staging URLs, client URL groups, a test schedule you can state in a client report, alerts when budgets break in production.
We document the split in onboarding:
| Phase | Owner | Question |
| --- | --- | --- |
| Pre-merge | CI (LHCI + bundle guards) | Did this branch break budgets on the URLs we test in preview? |
| Post-deploy | Monitoring product | Are production templates still inside budget on the schedule we promised? |
Use the same thresholds in both places. If CI and monitoring use different numbers, engineers ignore both.
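One way to keep the two sets of numbers honest is a small drift check that runs whenever either config changes. This is a sketch under the assumption that both sides can export their limits as plain metric-to-number maps; `findThresholdDrift` and the metric names in the example are hypothetical.

```typescript
// Sketch only: assumes CI and monitoring limits can both be exported as simple maps.
type Budgets = Record<string, number>;

// Report every metric gated in CI whose limit is missing from or different in monitoring.
// Metrics that exist only in monitoring are fine, because CI enforces a subset.
function findThresholdDrift(ci: Budgets, monitoring: Budgets): string[] {
  const drift: string[] = [];
  for (const [metric, ciLimit] of Object.entries(ci)) {
    const monitoredLimit = monitoring[metric];
    if (monitoredLimit === undefined) {
      drift.push(`${metric}: gated in CI but not monitored`);
    } else if (monitoredLimit !== ciLimit) {
      drift.push(`${metric}: CI ${ciLimit} vs monitoring ${monitoredLimit}`);
    }
  }
  return drift;
}
```

For instance, `findThresholdDrift({ cls: 0.1, tbt: 300 }, { cls: 0.1, tbt: 200, lcp: 2500 })` returns a single entry for tbt, and an empty array means the CI subset matches the monitoring policy.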
For wiring LHCI, assertions, and GitHub Actions patterns, the step-by-step lives on our blog: How to set up performance budgets in CI/CD pipelines. This post is the decision guide before you paste YAML.
Agencies rarely maintain one pipeline. Client A is Shopify with theme checks; Client B is Next.js on Vercel; Client C still ships WordPress with a plugin stack nobody wants to touch in CI.
Our default agency CI template is intentionally small:
- One or two preview URLs the client agrees are highest risk (often homepage plus one revenue template; two LHCI URLs max on small retainers).
- One bundle or size guard when the stack exposes a build artefact.
- Written rule in the README: after deploy, monitoring watches the full URL list the SOW names.
We do not paste the client's full performance budget table into every repo. Procurement and account teams already struggle when monitoring scope is vague; CI scope should be narrower and written in the README ("LHCI runs on these two preview URLs; full template list is in Watcher").
Multi-tenant monitoring tracks many sites over time. CI blocks merges when preview tests show a clear regression on the paths in scope.
CI webperf gates we moved to warnings or removed
After several quarters in which the CI job failed too often, we changed the policy:
- Removed full-site crawls inside CI (flaky, slow, duplicates monitoring).
- Removed the rule that failed PRs on single-run LCP; an LCP error now requires numberOfRuns ≥ 2.
- Moved performance score on preview to warn until preview parity with production was documented.
- Moved INP on long-tail templates and logged-in routes to monitoring only unless the PR touches that flow.
- Kept CLS regressions on in-scope templates and bundle ceilings on main entry points as errors.
When the main branch fails every other day, teams bypass the job. A few error rules that people act on work better than dozens that they ignore.
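To show why a single run should not fail a PR, here is a minimal sketch that judges LCP on the median of several runs and falls back to a warning when there are too few. It does not reproduce how Lighthouse CI aggregates runs internally; `median`, `lcpVerdict`, and the sample values are assumptions for illustration.

```typescript
// Sketch only: a standalone illustration, not Lighthouse CI's own aggregation logic.
type Verdict = "pass" | "fail" | "warn-only";

// Median of a non-empty list of numbers.
function median(values: number[]): number {
  if (values.length === 0) throw new Error("median needs at least one value");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Only allow a hard verdict when enough runs were collected; otherwise downgrade to a warning.
function lcpVerdict(runsMs: number[], budgetMs: number, minRuns = 2): Verdict {
  if (runsMs.length < minRuns) return "warn-only";
  return median(runsMs) <= budgetMs ? "pass" : "fail";
}
```

With runs of 2400, 3100, and 2500 ms against a 2500 ms budget, the median is 2500 ms and the verdict is a pass, even though one cold run on its own would have failed the PR.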
Internal checklist before adding another CI gate
Before we add a new webperf step to a client pipeline, we ask:
- Does this metric fail for reasons the developer on the PR can fix?
- Is the preview URL stable enough to trust an error (not just a warn)?
- Is the same threshold already monitored post-deploy, or will CI create a second set of numbers?
- Will the job finish in time for the team's PR review workflow (under ~10 minutes for small sites)?
- Can we explain the failure in one sentence in the PR comment?
If two or more answers are no, leave that check in monitoring or in a weekly report, not in merge blocking.
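The checklist rule is simple enough to write down as code, which can help when several people review gate proposals. This is a sketch: the field names are hypothetical paraphrases of the five questions above, with true meaning yes.

```typescript
// Sketch only: field names paraphrase the five checklist questions; true means "yes".
type GateChecklist = {
  fixableByPrAuthor: boolean;
  previewStableEnoughForError: boolean;
  sameThresholdAlreadyMonitored: boolean;
  finishesWithinReviewWindow: boolean;
  explainableInOneSentence: boolean;
};

// Two or more "no" answers keep the check out of merge blocking
// (it stays in monitoring or a weekly report instead).
function placeGate(answers: GateChecklist): "ci" | "monitoring" {
  const noCount = Object.values(answers).filter((yes) => !yes).length;
  return noCount >= 2 ? "monitoring" : "ci";
}
```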
Next step: pick three gates for one repository
Open the client repo where CI fails most often, or your own product site. Write three lines in the README:
- URLs Lighthouse CI tests on preview (max two to start).
- Errors: CLS, bundle KB, or TBT; pick one lab metric you trust.
- Warnings: LCP or performance score until preview scores stay stable.
Implement those before adding a fourth check. Pair with the same numeric budgets you use in monitoring so "green in CI" and "green in the report" mean the same thing.
If you need assertion keys, lighthouserc shapes, and a GitHub Actions sketch, see our post How to set up performance budgets in CI/CD pipelines. If you need the full budget strategy and metric tables first, start with the complete guide to performance budgets.
CI should block merges you would have rolled back anyway. Run everything else on the full URL list in scheduled monitoring, without making every PR wait on Chrome.
Originally published on Hashnode.