From “It Works” to Production: Fixing NDVI Rendering, Surviving WAF Blocks, and Stabilizing a Failing Server
Most backend systems don’t fail because of obvious bugs.
They fail quietly—through misleading data, unstable infrastructure, and assumptions that don’t hold under load.
This week, I pushed an NDVI (Normalized Difference Vegetation Index) pipeline through a full transition:
from “it renders images” to a production-grade, resilient system.
This article breaks down what actually went wrong—and how each fix improved the system holistically.
1. The Illusion of Correctness: When “Green” Lies
The first issue looked harmless:
NDVI raster PNGs were rendering—but everything was green.
Root Cause
The normalization formula:
normalized = (ndvi + 1.0) / 2.0
This assumes NDVI values span [-1, 1].
But real-world farm data was tighter:
NDVI range: 0.42 → 0.63
After normalization:
0.71 → 0.815, landing entirely in the “green zone” of the colormap
So the system was technically correct—but visually misleading.
2. The Fix: Dual Normalization Strategy
Instead of replacing one approach with another, I introduced two modes:
Histogram-Based Normalization (Default)
normalized = (ndvi - ndvi_min) / (ndvi_max - ndvi_min)
- Expands contrast per image
- Reveals actual variation
- Best for analysis
Fixed Normalization (Legacy)
normalized = (ndvi + 1.0) / 2.0
- Keeps consistency across images
- Useful for comparisons
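A minimal sketch of both modes, assuming the raster arrives as a NumPy array (the function name and signature are mine, not the pipeline’s):

import numpy as np

def normalize_ndvi(ndvi: np.ndarray, mode: str = "histogram") -> np.ndarray:
    """Scale NDVI values into [0, 1] for colormap rendering."""
    if mode == "histogram":
        # Per-image stretch: reveals variation within this raster.
        ndvi_min, ndvi_max = float(ndvi.min()), float(ndvi.max())
        if ndvi_max == ndvi_min:
            return np.zeros_like(ndvi)  # flat raster; avoid divide-by-zero
        return (ndvi - ndvi_min) / (ndvi_max - ndvi_min)
    # Fixed (legacy): assumes the full theoretical range of [-1, 1].
    return (ndvi + 1.0) / 2.0

With the farm data above, histogram mode stretches 0.42–0.63 across the full color ramp, while fixed mode compresses it into 0.71–0.815.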
Why This Matters
This isn’t just a rendering fix—it’s a data integrity decision:
Never force a single interpretation when the data has multiple valid contexts.
3. External Systems Don’t Like You (STAC WAF Blocks)
The next issue wasn’t in my system—it was in how it behaved externally.
Symptom
- Requests to Copernicus STAC API started failing
- Response: empty HTML or “Request Rejected”
Classic WAF (Web Application Firewall) behavior.
Root Cause
The system behaved like a bot:
- Rapid consecutive requests
- Predictable timing patterns
The Fix: Controlled Request Behavior
time.sleep(base_interval + random.uniform(jitter_min, jitter_max))
Implemented:
- Minimum delay between requests
- Random jitter (1–5 seconds)
- Graceful handling of invalid responses
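A sketch of the full pattern, assuming the requests library (the constants, URL handling, and function name are illustrative, not the project’s actual code):

import random
import time
from typing import Optional

import requests

BASE_INTERVAL = 2.0        # minimum seconds between requests
JITTER_RANGE = (1.0, 5.0)  # random jitter window, in seconds

def fetch_stac_page(url: str, params: dict) -> Optional[dict]:
    """Fetch one STAC page while pacing requests to avoid WAF triggers."""
    time.sleep(BASE_INTERVAL + random.uniform(*JITTER_RANGE))
    response = requests.get(url, params=params, timeout=30)
    # WAF rejections often arrive as HTML ("Request Rejected") rather than
    # JSON, so validate the content type before parsing.
    if "application/json" not in response.headers.get("Content-Type", ""):
        return None  # soft failure; the caller can back off and retry
    response.raise_for_status()
    return response.json()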
Result
The system stopped being “blocked” and started being tolerated.
This is a key distributed systems lesson:
You don’t just integrate with APIs—you adapt to their defensive behavior.
4. Architecture Maturity: Preparing for Redis Streams
The NDVI pipeline originally used Celery for async jobs.
Instead of rewriting everything, I introduced a routing abstraction:
if settings.NDVI_QUEUE_BACKEND == "celery":
    dispatch_celery_job(...)
elif settings.NDVI_QUEUE_BACKEND == "stream":
    raise NotImplementedError
Why This Is Important
This small change enables:
- Seamless migration to Redis Streams
- Zero disruption to current workflows
- Incremental architecture evolution
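When the stream backend lands, the NotImplementedError branch can give way to something like this sketch, built on redis-py’s XADD (the stream name and payload shape are assumptions, not the project’s actual schema):

import json

import redis

r = redis.Redis()  # connection details are deployment-specific

def dispatch_stream_job(scene_id: str, bbox: list[float]) -> None:
    """Publish an NDVI job to a Redis Stream instead of a Celery queue."""
    r.xadd(
        "ndvi:jobs",  # assumed stream name
        {"payload": json.dumps({"scene_id": scene_id, "bbox": bbox})},
    )

Consumers can then read via consumer groups (XREADGROUP), which preserves the at-least-once delivery semantics Celery currently provides.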
Principle
The best migrations don’t start with new systems—they start with abstractions.
5. The Real Problem: Server Crashes Every 10 Hours
This was the most critical issue.
Symptoms
- System crashes after ~10 hours
- No single obvious failure point
Root Cause
Unbounded resource usage across multiple services:
- MySQL buffer pool growing unchecked
- PHP-FPM workers accumulating memory
- Redis using unlimited memory
- Celery workers never restarting
- Monitoring stack filling disk
6. The Fix: System-Wide Resource Governance
Instead of patching one component, I enforced limits everywhere:
Database
innodb_buffer_pool_size = 512M
max_connections = 50
PHP-FPM
pm.max_children = 15
pm.max_requests = 500
Redis
mem_limit: 512mb               # container-level cap (docker-compose)
maxmemory-policy allkeys-lru   # evict least-recently-used keys at the limit (redis.conf)
Celery
worker_max_tasks_per_child = 100
worker_max_memory_per_child = 512000 # KB
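These are standard Celery settings; a minimal sketch of wiring them into the app (the app name and broker URL are placeholders):

from celery import Celery

app = Celery("ndvi_pipeline", broker="redis://localhost:6379/0")

app.conf.update(
    worker_max_tasks_per_child=100,      # recycle each worker after 100 tasks
    worker_max_memory_per_child=512000,  # recycle above ~512 MB (value in KB)
)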
Observability Stack
- Prometheus: 7-day retention
- Loki: compaction + retention limits
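For reference, Prometheus retention is usually set via a startup flag (Loki’s equivalent lives in its compactor and limits config):

--storage.tsdb.retention.time=7d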
Result
The system moved from eventual failure → controlled lifecycle stability
7. Observability and Documentation (The Often Ignored Layer)
Alongside code changes:
- Added alerts for system health
- Documented all NDVI configuration settings
- Created deployment + rollback guides
- Expanded test coverage
Why This Matters
Without this layer:
- Fixes don’t scale
- Bugs repeat
- Systems become tribal knowledge
8. Security Mistake (Worth Mentioning)
During the process, a sudo password was exposed in plain text.
No system is “production-ready” if secrets are handled casually.
- Never expose credentials in logs or chats
- Rotate immediately if leaked
Move toward:
- environment-based configs
- secret management systems
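A minimal sketch of the environment-based approach (the variable name is illustrative):

import os

# Fail fast if the secret is missing instead of silently falling back
# to a hardcoded value.
db_password = os.environ["NDVI_DB_PASSWORD"]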
Final Thoughts
This wasn’t about fixing a bug.
It was about addressing three deeper problems:
- Data correctness (NDVI visualization)
- System behavior under external constraints (WAF)
- Infrastructure stability (resource limits)
Before:
“The system works.”
After:
“The system is predictable, resilient, and ready for scale.”
Key Lessons
- Correct output is not always truthful output
- External APIs require behavioral adaptation
- Systems fail through unbounded growth, not sudden errors
- Architecture evolves through abstraction, not rewrites
- Stability is a system-wide property, not a single fix
If you’re building data pipelines or distributed systems, this pattern will repeat.
The earlier you design for it, the less painful it becomes.