Building a Production-Ready NDVI Pipeline: Normalization Fixes, WAF Resilience, and System Stability

From “It Works” to Production: Fixing NDVI Rendering, Surviving WAF Blocks, and Stabilizing a Failing Server

Most backend systems don’t fail because of obvious bugs.
They fail quietly—through misleading data, unstable infrastructure, and assumptions that don’t hold under load.

This week, I pushed an NDVI (Normalized Difference Vegetation Index) pipeline through a full transition:

from “it renders images” → to a production-grade, resilient system

This article breaks down what actually went wrong—and how each fix improved the system holistically.


1. The Illusion of Correctness: When “Green” Lies

The first issue looked harmless:
NDVI raster PNGs were rendering—but everything was green.

Root Cause

The normalization formula:

normalized = (ndvi + 1.0) / 2.0

This assumes NDVI values span [-1, 1].

But real-world farm data was tighter:

NDVI range: 0.42 → 0.63

After normalization:

0.71 → 0.815, entirely inside the “green zone”

So the system was technically correct—but visually misleading.
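
The compression is easy to check by hand. The following is a minimal sketch of that arithmetic using the range quoted above; the function name is only for illustration and is not taken from the pipeline.

```python
def fixed_normalize(ndvi: float) -> float:
    """Map NDVI from the theoretical [-1, 1] range onto [0, 1]."""
    return (ndvi + 1.0) / 2.0


# Observed farm range from the article.
low = fixed_normalize(0.42)
high = fixed_normalize(0.63)

# Fraction of the [0, 1] color scale that the field actually occupies.
used_span = high - low

print(f"{low:.3f} -> {high:.3f} (span {used_span:.3f})")
```

The field ends up using roughly a tenth of the available color scale, which is why every pixel lands in nearly the same shade.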


2. The Fix: Dual Normalization Strategy

Instead of replacing one approach with another, I introduced two modes:

Histogram-Based Normalization (Default)

normalized = (ndvi - ndvi_min) / (ndvi_max - ndvi_min)
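  • Stretches each image's own min-max range across the full color scale
  • Reveals actual variation within a field
  • Best for single-image analysis (colors are not comparable between images)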

Fixed Normalization (Legacy)

normalized = (ndvi + 1.0) / 2.0
  • Keeps consistency across images: the same NDVI value always maps to the same color
  • Useful for comparing different fields or dates
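
The two modes can be sketched as a single function. This is an illustrative sketch, not the pipeline's actual code: the function name, the mode strings, and the choice of 0.5 for a flat image are assumptions, and plain lists stand in for whatever raster array type the real renderer uses. Nodata handling is left out.

```python
from typing import List


def normalize_ndvi(values: List[float], mode: str = "histogram") -> List[float]:
    """Scale NDVI values to [0, 1] using one of the two modes described above."""
    if not values:
        return []
    if mode == "fixed":
        # Legacy mapping: identical for every image, so colors stay comparable.
        return [(v + 1.0) / 2.0 for v in values]
    if mode == "histogram":
        lo, hi = min(values), max(values)
        if hi == lo:
            # Flat image: avoid dividing by zero. Mid-scale is an arbitrary choice.
            return [0.5 for _ in values]
        return [(v - lo) / (hi - lo) for v in values]
    raise ValueError(f"unknown normalization mode: {mode!r}")


pixels = [0.42, 0.525, 0.63]
stretched = normalize_ndvi(pixels)            # per-image stretch
legacy = normalize_ndvi(pixels, mode="fixed")  # fixed [-1, 1] mapping
```

The guard for a flat image matters in practice: a uniform field (or a fully masked tile) makes `ndvi_max - ndvi_min` zero, and the per-image formula would otherwise divide by zero.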

Why This Matters

This isn’t just a rendering fix—it’s a data integrity decision:

Never force a single interpretation when the data has multiple valid contexts.


3. External Systems Don’t Like You (STAC WAF Blocks)

The next issue wasn’t in my system—it was in how it behaved externally.

Symptom

  • Requests to Copernicus STAC API started failing
  • Response: empty HTML or “Request Rejected”

Classic WAF (Web Application Firewall) behavior.


Root Cause

The system behaved like a bot:

  • Rapid consecutive requests
  • Predictable timing patterns

The Fix: Controlled Request Behavior

time.sleep(base_interval + random.uniform(jitter_min, jitter_max))  # needs: import random, time

Implemented:

  • Minimum delay between requests
  • Random jitter (1–5 seconds)
  • Graceful handling of invalid responses
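
The three measures above could be combined roughly as follows. This is a sketch under stated assumptions: the class and function names are hypothetical, the 2-second base interval is a placeholder, and the block check is a heuristic built only from the symptoms described earlier (an empty or HTML response, or a “Request Rejected” body, where JSON was expected).

```python
import random
import time
from typing import Callable, Optional


class RequestThrottle:
    """Enforce a minimum gap plus random jitter between outbound requests."""

    def __init__(
        self,
        base_interval: float = 2.0,  # assumed value, not from the article
        jitter_min: float = 1.0,
        jitter_max: float = 5.0,
        sleep: Callable[[float], None] = time.sleep,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.base_interval = base_interval
        self.jitter_min = jitter_min
        self.jitter_max = jitter_max
        self._sleep = sleep
        self._clock = clock
        self._next_allowed: Optional[float] = None

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None:
            delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule the earliest time for the following request.
        gap = self.base_interval + random.uniform(self.jitter_min, self.jitter_max)
        self._next_allowed = now + delay + gap
        return delay


def looks_like_waf_block(content_type: str, body: str) -> bool:
    """Heuristic: a JSON API answering with HTML or nothing was probably blocked."""
    if not body.strip():
        return True
    if "html" in content_type.lower():
        return True
    return "request rejected" in body.lower()
```

Calling `throttle.wait()` before each request keeps the gap logic in one place, and treating a suspicious response as "back off and retry later" rather than as a parse error is what makes the handling graceful.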

Result

The system stopped being “blocked” and started being tolerated.

This is a key distributed systems lesson:

You don’t just integrate with APIs—you adapt to their defensive behavior.


4. Architecture Maturity: Preparing for Redis Streams

The NDVI pipeline originally used Celery for async jobs.

Instead of rewriting everything, I introduced a routing abstraction:

if settings.NDVI_QUEUE_BACKEND == "celery":
    dispatch_celery_job(...)
elif settings.NDVI_QUEUE_BACKEND == "stream":
    raise NotImplementedError

Why This Is Important

This small change enables:

  • A later migration to Redis Streams behind a single setting
  • No disruption to current Celery workflows
  • Incremental architecture evolution
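
One way to keep that routing in a single place is a small registry keyed by backend name. The sketch below is illustrative only: the function bodies are placeholders, `field_id` is a hypothetical payload key, and the real `dispatch_celery_job` would enqueue a Celery task instead of returning a string.

```python
from typing import Any, Callable, Dict

Dispatcher = Callable[[Dict[str, Any]], str]


def dispatch_celery_job(payload: Dict[str, Any]) -> str:
    """Placeholder for the existing Celery path."""
    return f"celery:{payload['field_id']}"


def dispatch_stream_job(payload: Dict[str, Any]) -> str:
    """Redis Streams path, left unimplemented until the migration starts."""
    raise NotImplementedError("Redis Streams backend is not implemented yet")


DISPATCHERS: Dict[str, Dispatcher] = {
    "celery": dispatch_celery_job,
    "stream": dispatch_stream_job,
}


def dispatch_ndvi_job(backend: str, payload: Dict[str, Any]) -> str:
    """Route a job to the configured backend, rejecting unknown names early."""
    try:
        dispatcher = DISPATCHERS[backend]
    except KeyError:
        raise ValueError(f"unknown NDVI_QUEUE_BACKEND: {backend!r}") from None
    return dispatcher(payload)


job_ref = dispatch_ndvi_job("celery", {"field_id": 7})
```

Failing loudly on an unknown backend name is a deliberate choice here: a typo in the setting should stop the job at dispatch time instead of silently dropping it.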

Principle

The best migrations don’t start with new systems—they start with abstractions.


5. The Real Problem: Server Crashes Every 10 Hours

This was the most critical issue.

Symptoms

  • System crashes after ~10 hours
  • No single obvious failure point

Root Cause

Unbounded resource usage across multiple services:

  • MySQL buffer pool growing unchecked
  • PHP-FPM workers accumulating memory
  • Redis using unlimited memory
  • Celery workers never restarting
  • Monitoring stack filling disk

6. The Fix: System-Wide Resource Governance

Instead of patching one component, I enforced limits everywhere:

Database

innodb_buffer_pool_size = 512M
max_connections = 50

PHP-FPM

pm.max_children = 15
pm.max_requests = 500

Redis

mem_limit: 512mb
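maxmemory-policy: allkeys-lru

Note that Redis only evicts keys once it reaches its own maxmemory setting. If mem_limit here is a container-level cap (as in Docker Compose), it is likely to get the process killed rather than trigger eviction, so maxmemory should be set below that cap.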

Celery

worker_max_tasks_per_child = 100
worker_max_memory_per_child = 512000  # KB

Observability Stack

  • Prometheus: 7-day retention
  • Loki: compaction + retention limits
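
The memory limits above are only useful if their sum fits on the host. The following is a rough budgeting sketch: the host size, the number of Celery children, and the per-child PHP-FPM figure are assumptions for illustration, not values from this deployment. It also treats the InnoDB buffer pool as MySQL's whole footprint, which understates it, and Celery's per-child limit is checked between tasks, so a single task can exceed it briefly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryCap:
    name: str
    per_process_mib: float
    processes: int = 1

    @property
    def total_mib(self) -> float:
        return self.per_process_mib * self.processes


caps = [
    MemoryCap("mysql innodb buffer pool", 512),
    MemoryCap("redis", 512),
    # worker_max_memory_per_child = 512000 KB, i.e. 500 MiB per child.
    MemoryCap("celery children", 512000 / 1024, processes=2),  # 2 children is assumed
    MemoryCap("php-fpm children", 64, processes=15),  # 64 MiB per child is assumed
]

host_ram_mib = 8 * 1024  # assumed 8 GiB host
total_mib = sum(cap.total_mib for cap in caps)
headroom_mib = host_ram_mib - total_mib

for cap in caps:
    print(f"{cap.name:28s} {cap.total_mib:8.0f} MiB")
print(f"{'worst case total':28s} {total_mib:8.0f} MiB")
print(f"{'headroom':28s} {headroom_mib:8.0f} MiB")
```

If the headroom comes out negative or close to zero, the limits are not yet a guarantee against the slow crash described above; they only move the failure to whichever service hits the host's ceiling first.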

Result

The system moved from eventual failure → controlled lifecycle stability


7. Observability and Documentation (The Often Ignored Layer)

Alongside code changes:

  • Added alerts for system health
  • Documented all NDVI configuration settings
  • Created deployment + rollback guides
  • Expanded test coverage

Why This Matters

Without this layer:

  • Fixes don’t scale
  • Bugs repeat
  • Systems become tribal knowledge

8. Security Mistake (Worth Mentioning)

During the process, a sudo password was exposed in plain text.

No system is “production-ready” if secrets are handled casually.


Immediate Takeaways

  • Never expose credentials in logs or chats
  • Rotate immediately if leaked
  • Move toward:

    • environment-based configs
    • secret management systems
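
For the environment-based approach, a minimal sketch might look like the following. The helper names and the `NDVI_DB_PASSWORD` variable are hypothetical; the point is only that the process fails fast when a secret is missing and that log output records whether a secret is set, never its value.

```python
import os
from typing import Mapping


class MissingSecretError(RuntimeError):
    """Raised at startup when a required secret is absent."""


def require_secret(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Read a secret from the environment and fail fast if it is missing."""
    value = env.get(name, "")
    if not value:
        raise MissingSecretError(f"required secret {name} is not set")
    return value


def describe_secret(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Return a log-safe status line that never contains the secret itself."""
    state = "<set>" if env.get(name) else "<missing>"
    return f"{name}={state}"


# Example with an explicit mapping instead of the real environment.
example_env = {"NDVI_DB_PASSWORD": "not-a-real-password"}
password = require_secret("NDVI_DB_PASSWORD", example_env)
status_line = describe_secret("NDVI_DB_PASSWORD", example_env)
```

A dedicated secret manager would replace the environment lookup, but the same two habits carry over: validate at startup and keep values out of logs and chat.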

Final Thoughts

This wasn’t about fixing a bug.

It was about addressing three deeper problems:

  1. Data correctness (NDVI visualization)
  2. System behavior under external constraints (WAF)
  3. Infrastructure stability (resource limits)

The Real Transformation

Before:

“The system works.”

After:

“The system is predictable, resilient, and ready for scale.”


Key Lessons

  • Correct output is not always truthful output
  • External APIs require behavioral adaptation
  • Systems often fail through unbounded growth, not sudden errors
  • Architecture evolves through abstraction, not rewrites
  • Stability is a system-wide property, not a single fix

If you’re building data pipelines or distributed systems, this pattern will repeat.

The earlier you design for it—the less painful it becomes.

