The Struggle
I remember the payment bug that kept me up until 3 AM.
Stripe was sending an invoice.payment_failed webhook, but only in production.
I checked my logs: Truncated.
I checked my tunneling tool: Session expired.
I checked my SaaS bin: History limit reached.
I realized I didn't have a debugging tool; I had a toy.
The Solution: Webhook Debugger
I decided to build my own solution. But I didn't just want a "bucket" that catches requests. I wanted to build a Reference Implementation for how a modern, secure Node.js application should look in 2026.
Here are the 13 Engineering Patterns I used to build it:
1. Global SSE Heartbeat & Padding
Most SSE implementations leak memory by creating a timer per connection.
My Approach: A single setInterval iterates a Set of clients.
The Pro Tip: I added res.write(' '.repeat(2048)) (2KB of whitespace) and an X-Accel-Buffering: no header. Why? Because corporate firewalls (and Nginx) love buffering streams. The padding forces them to flush the connection immediately.
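Here is a minimal sketch of the shared-heartbeat idea. The helper names (clients, addClient) are mine, not necessarily the repo's:

```javascript
// ONE timer shared by every connection, instead of one timer per client.
const clients = new Set();

setInterval(() => {
  for (const res of clients) {
    res.write(': ping\n\n'); // an SSE comment line keeps the socket alive
  }
}, 15000).unref(); // unref: the heartbeat never blocks process exit

function addClient(res) {
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'X-Accel-Buffering': 'no', // tell Nginx not to buffer this response
  });
  res.write(':' + ' '.repeat(2048) + '\n\n'); // 2KB padding forces proxies to flush
  clients.add(res);
  res.on('close', () => clients.delete(res));
}
```

Because the Set is the single source of truth, a dropped connection is one delete, not a timer to clean up.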
2. SSRF Protection (DNS & IP Verification)
Allowing user-defined webhooks is dangerous (Server-Side Request Forgery).
The Fix: I wrote a custom validator in src/utils/ssrf.js that resolves the DNS before the request. It checks the IP against a blocklist of private ranges (RFC 1918) and cloud metadata services (169.254.169.254). It even handles IPv4-mapped IPv6 addresses (::ffff:127.0.0.1).
3. Deep Replay with Exponential Backoff
Retrying a failed webhook isn't just "try again".
The Logic: If the destination yields a transient error (ECONNABORTED, 503), the system waits 1s, then 2s, then 4s.
Header Stripping: The replay engine automatically strips sensitive headers (Authorization, Cookie) so you don't accidentally send production credentials to your local dev environment.
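Both ideas fit in a few lines. This is an illustrative shape (function names are mine), not the repo's replay engine:

```javascript
const SENSITIVE = new Set(['authorization', 'cookie']);

function stripSensitiveHeaders(headers) {
  // Drop credentials before replaying to an untrusted destination.
  return Object.fromEntries(
    Object.entries(headers).filter(([key]) => !SENSITIVE.has(key.toLowerCase()))
  );
}

async function replayWithBackoff(sendFn, { retries = 3, baseDelayMs = 1000 } = {}) {
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      return await sendFn();
    } catch (err) {
      if (attempt === retries - 1) throw err; // out of attempts: surface the error
      // 1s, then 2s, then 4s: double the wait between transient failures.
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
}
```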
4. Timing-Safe Authentication ⏱️
Never compare API keys with ===.
The Attack: An attacker can measure how long your server takes to say "No" to guess the key character-by-character.
The Fix: I use crypto.timingSafeEqual in src/utils/auth.js to ensure the comparison takes the exact same time whether the key is 99% correct or 0% correct.
5. Memory-Safe Rate Limiting (LRU)
Standard rate limiters are often purely in-memory maps. If a botnet hits you with 1 million IPs, your server crashes (OOM).
The Pattern: My RateLimiter uses a Sliding Window with LRU Eviction. It hard-caps at 1,000 entries. If the map is full, the oldest IP is evicted to make room. It prioritizes stability over strictness.
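The trick is that a JavaScript Map preserves insertion order, so "least recently used" is simply the first key. A hypothetical shape (not the repo's exact class):

```javascript
class RateLimiter {
  constructor({ windowMs = 60000, max = 60, maxEntries = 1000 } = {}) {
    this.windowMs = windowMs;
    this.max = max;
    this.maxEntries = maxEntries;
    this.hits = new Map(); // ip -> array of hit timestamps (insertion-ordered)
  }

  allow(ip, now = Date.now()) {
    const cutoff = now - this.windowMs;
    // Sliding window: keep only timestamps still inside the window.
    const stamps = (this.hits.get(ip) || []).filter((t) => t > cutoff);
    stamps.push(now);
    // Delete + set re-inserts this IP as the most recently used key.
    this.hits.delete(ip);
    this.hits.set(ip, stamps);
    // Hard cap: evict the least recently used IP instead of growing forever.
    if (this.hits.size > this.maxEntries) {
      this.hits.delete(this.hits.keys().next().value);
    }
    return stamps.length <= this.max;
  }
}
```

Eviction means an attacker with a million IPs can reset someone's counter, but the process stays alive; that is the "stability over strictness" trade-off.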
6. Memory-Safe Dataset Filtering
Searching for a single timestamp in a 1GB JSON dataset will crash a standard Node.js process.
The Solution: Iterative Pagination. The /replay endpoint reads chunks of 1000 items (dataset.getData({ limit, offset })), searches for the event ID, and fetches the next chunk only if not found. This ensures we never load the entire dataset into memory.
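In sketch form (assuming the dataset.getData({ limit, offset }) shape named above returns { items }):

```javascript
async function findEvent(dataset, eventId, { limit = 1000 } = {}) {
  for (let offset = 0; ; offset += limit) {
    const { items } = await dataset.getData({ limit, offset });
    if (items.length === 0) return null; // dataset exhausted without a match
    const hit = items.find((item) => item.id === eventId);
    if (hit) return hit; // stop early: we never load more chunks than needed
  }
}
```

Memory stays bounded at one chunk regardless of dataset size.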
7. Defensive Type Coercion
Inputs from the wild are messy. Strings look like numbers; booleans look like strings.
The Pattern: A dedicated coerceRuntimeOptions utility in src/utils/config.js recursively walks the input object, coercing "true" -> true and "5" -> 5, so the runtime configuration doesn't crash on type mismatches.
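A minimal recursive coercer in the same spirit (a sketch, not the exact utility from src/utils/config.js):

```javascript
function coerceValue(value) {
  if (value === 'true') return true;
  if (value === 'false') return false;
  // Numeric-looking strings become numbers; '' stays a string (Number('') is 0).
  if (typeof value === 'string' && value.trim() !== '' && !Number.isNaN(Number(value))) {
    return Number(value);
  }
  return value;
}

function coerceRuntimeOptions(input) {
  if (Array.isArray(input)) return input.map(coerceRuntimeOptions);
  if (input && typeof input === 'object') {
    return Object.fromEntries(
      Object.entries(input).map(([key, value]) => [key, coerceRuntimeOptions(value)])
    );
  }
  return coerceValue(input);
}
```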
8. Index.html Caching
We serve a UI, but we aren't a CDN.
The Optimization: The index.html template is read from disk once at startup and cached in a string variable (indexTemplate). Placeholders like {{VERSION}} are replaced on-the-fly using escapeHtml(), but the disk I/O cost is paid only once.
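The render step then reduces to a string substitution. A sketch of the idea (the inline template stands in for the one-time fs.readFileSync; escapeHtml here is a minimal version):

```javascript
const escapeHtml = (str) =>
  String(str).replace(/[&<>"']/g, (c) =>
    ({ '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' }[c]));

// In the real app this string comes from a single readFileSync at startup.
const indexTemplate = '<h1>Webhook Debugger {{VERSION}}</h1>';

function renderIndex(version) {
  // Per-request cost is one string replace; the disk I/O already happened.
  return indexTemplate.replace('{{VERSION}}', escapeHtml(version));
}
```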
9. Bootstrap Validation Logic
What happens if the user manually edits the INPUT.json and breaks the JSON syntax?
Self-Healing: The ensureLocalInputExists function in src/utils/bootstrap.js detects corrupt JSON on startup. Instead of crashing, it automatically renames the bad file to .tmp and writes a fresh default configuration, logging a warning. The app always starts.
10. Streaming-Friendly HTTP Headers
Every millisecond counts.
Content-Encoding: identity: Disables gzip for the SSE stream (gzip buffers, which kills real-time).
Cache-Control: no-cache: Forces browsers to verify the stream status.
Connection: keep-alive: Critical for long-lived streams.
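The three headers above, as they would be set on the SSE response (a sketch; the real app sets them where the stream is opened):

```javascript
function writeSseHeaders(res) {
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Content-Encoding': 'identity', // no gzip: compression buffers the stream
    'Cache-Control': 'no-cache',    // browsers must not serve a stale stream
    'Connection': 'keep-alive',     // the socket stays open for the stream's life
  });
}
```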
11. Testing Asynchronous Code
Testing streaming and retries is notoriously hard.
The Strategy: We use jest with custom helpers (waitForCondition in tests/helpers/test-utils.js) and mocked timers. In resilience.test.js, we mock axios to fail exactly twice with ECONNABORTED to verify the retry logic attempts exactly 3 times before giving up.
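A waitForCondition helper in that spirit is small enough to show here (a sketch of the idea, not the exact tests/helpers/test-utils.js):

```javascript
async function waitForCondition(predicate, { timeoutMs = 2000, intervalMs = 20 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await predicate()) return true; // condition met: resolve immediately
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  // Failing loudly beats a test that hangs until the runner's global timeout.
  throw new Error(`Condition not met within ${timeoutMs}ms`);
}
```

Polling a predicate like this avoids the classic flaky pattern of sleeping a fixed amount and hoping the async work finished.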
12. Hot-Reloading (Zero Downtime Config)
The Problem: Restarting the server just to change the API key or add a webhook URL loses all SSE connections.
The Solution: A background poller in src/main.js reads the INPUT.json from the Key-Value Store every 5 seconds. When a change is detected:
- It diffs the new config against the old one.
- It updates middleware (body parser limits, rate limiter), auth keys, and webhook counts dynamically.
- It reconciles the webhook IDs: if the user increased urlCount, new IDs are generated; if decreased, no IDs are removed, to prevent data loss.
This is all enabled by the loggerMiddleware.updateOptions() function, which allows runtime reconfiguration of the logger instance.
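The poll-and-diff loop reduces to something like this (a simplified sketch; readInput and applyChange stand in for the KV-store read and the reconfiguration steps listed above):

```javascript
let currentConfig = null;

async function pollConfig(readInput, applyChange, intervalMs = 5000) {
  const next = await readInput(); // e.g. fetch INPUT.json from the Key-Value Store
  // Cheap structural diff: only react when the config actually changed.
  if (JSON.stringify(next) !== JSON.stringify(currentConfig)) {
    applyChange(next, currentConfig); // update limits, auth keys, webhook IDs
    currentConfig = next;
  }
  // Schedule the next poll; unref so the loop never blocks shutdown.
  setTimeout(() => pollConfig(readInput, applyChange, intervalMs), intervalMs).unref();
}
```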
13. Escape the "SaaS Tax" (Self-Hosting)
If you are an agency handling 50 clients, paying $30/mo per seat for debugging tools adds up.
Since this is a standard Dockerized Node.js app, you can deploy it to any generic VPS:
FROM apify/actor-node:20
COPY . .
RUN npm install
CMD npm start
GitHub Repo (v2.8.7 is out now!)