Docker vs Kubernetes for Kafka + ELK + MDM
Docker assumptions
[Docker Host: localhost]
|
|-- Kafka (ports: 9092:9092) -> localhost access OK
|-- Zookeeper (ports: 2181:2181)
|-- Connect (depends_on: Kafka, Zookeeper)
|-- Elasticsearch (ports: 9200:9200)
|-- Kibana (ports: 5601:5601)
Assumptions Docker makes here:
Single machine → container addresses are stable and predictable
localhost reaches every service through its published host port
Startup order is only partially controlled by depends_on (containers start in order, but readiness is not awaited)
No strict health checks → the app assumes services are ready
Volumes persist across container restarts because host storage is always available
Unpinned latest image tags keep working
Host ports do not collide (developer responsibility)
No CI/CD enforcement → the dev environment feels “safe”
What can break in CI/CD / Kubernetes:
localhost:9092 in Connect fails → Kafka unreachable
Startup race → pipelines try to read before Kafka is ready
Host port collisions if multiple builds run simultaneously
Volumes may not exist or behave differently → Bronze/Silver/Gold data lost
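One way to avoid the hard-coded localhost:9092 problem is to resolve the broker address from the environment, so the same image runs under Compose and Kubernetes. The following is a minimal sketch, not a definitive implementation; the variable name KAFKA_BOOTSTRAP_SERVERS and the function bootstrap_servers are assumptions made for illustration.

```python
import os

# Fallback that only makes sense on a single Docker host.
DEFAULT_BOOTSTRAP = "localhost:9092"


def bootstrap_servers(env=None):
    """Return the Kafka bootstrap addresses as a list of host:port strings.

    KAFKA_BOOTSTRAP_SERVERS is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    raw = env.get("KAFKA_BOOTSTRAP_SERVERS", DEFAULT_BOOTSTRAP)
    # Accept a comma-separated list and drop empty entries.
    return [item.strip() for item in raw.split(",") if item.strip()]
```

Under Compose the variable could be set to the service name (for example kafka:9092); in a cluster it would carry the service DNS name instead.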
Kubernetes assumptions
[Cluster: multi-node]
|
|-- Pod: Kafka
| - ephemeral IP
| - service: kafka.<namespace>.svc.cluster.local:9092
| - StatefulSet for persistence
|
|-- Pod: Zookeeper
| - StatefulSet
|
|-- Pod: Connect
| - Deployment / replica
| - readinessProbe: report ready only once Kafka is reachable
|
|-- Pod: Elasticsearch
| - StatefulSet
| - PVC for data
|
|-- Pod: Kibana
| - Deployment
|
|-- MDM ETL Jobs
| - run as Jobs / CronJobs
| - read Kafka via service discovery
Kubernetes assumptions enforced here:
Each pod has its own network namespace → localhost reaches only containers in the same pod
Pod IPs change on reschedule → must use service DNS
Startup order not guaranteed → use readiness/liveness probes plus client-side retries
Pods can fail → apps must retry / be idempotent
Volumes are externalized via PVCs → pod-local storage is treated as disposable
No shared host ports → services are reached through ClusterIP Services, not host port mappings
Scaling / replicas expected → apps must handle multiple consumers
Secrets/configs injected via Secrets / ConfigMaps → apps cannot rely on a local .env
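Because startup order is not guaranteed, a client can wait for its dependency instead of assuming it is up. The sketch below retries a plain TCP connection with exponential backoff; the function name wait_for_port and its parameters are illustrative assumptions, and an open port only shows that something is listening, not that the broker is fully ready.

```python
import socket
import time


def wait_for_port(host, port, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Try to open a TCP connection, backing off exponentially between attempts.

    Returns True once a connection succeeds, False if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            # Sleep before the next attempt, but not after the last one.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    return False
```

An ETL job could call this before its first read and exit non-zero when it returns False, leaving the restart to the Job's own retry policy.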
Potential failures if Docker assumptions remain:
Connect tries localhost:9092 → fails
ETL Jobs start before Kafka is ready → missing data
MDM pipeline loses state if volume not properly configured → corrupted Bronze/Silver/Gold
Kibana points at the wrong host/IP → cannot reach Elasticsearch
Scaling Kafka / Connect exposes race conditions → duplicated or missed messages
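Retries and multiple consumers mean the same record may be delivered more than once, so writes should be idempotent. A minimal sketch of the idea follows, using an in-memory dict as a stand-in for a Bronze-layer store; the class IdempotentSink and the topic name in the usage note are hypothetical.

```python
class IdempotentSink:
    """In-memory stand-in for a Bronze-layer writer that ignores redelivered records."""

    def __init__(self):
        # Maps (topic, partition, offset) to the stored payload.
        self.rows = {}

    def write(self, topic, partition, offset, payload):
        """Store the record once; return False if this exact record was already written."""
        key = (topic, partition, offset)
        if key in self.rows:
            return False
        self.rows[key] = payload
        return True
```

Writing the record at ("customers", 0, 42) twice leaves a single row, so a restarted pod that replays a batch does not duplicate data. A real store would need the same key enforced as a unique constraint or document ID.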
Summary Table: Where things break
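╔════════════════════════════════════════════════════════════╗
║ Layer: Networking                                          ║
╠════════════════════════════════════════════════════════════╣
║ Docker: localhost works via published host ports           ║
║ Kubernetes: localhost is pod-local; service DNS required   ║
║ Break: Kafka Connect cannot reach Kafka                    ║
╚════════════════════════════════════════════════════════════╝
╔════════════════════════════════════════════════════════════╗
║ Layer: Startup order                                       ║
╠════════════════════════════════════════════════════════════╣
║ Docker: depends_on orders container start, not readiness   ║
║ Kubernetes: startup unordered                              ║
║ Break: ETL jobs fail on unavailable source                 ║
╚════════════════════════════════════════════════════════════╝
╔════════════════════════════════════════════════════════════╗
║ Layer: State                                               ║
╠════════════════════════════════════════════════════════════╣
║ Docker: local volumes                                      ║
║ Kubernetes: pods ephemeral; use PVs / PVCs                 ║
║ Break: Bronze/Silver/Gold lost on restart                  ║
╚════════════════════════════════════════════════════════════╝
╔════════════════════════════════════════════════════════════╗
║ Layer: Port binding                                        ║
╠════════════════════════════════════════════════════════════╣
║ Docker: host ports available                               ║
║ Kubernetes: Services route traffic; no host ports          ║
║ Break: port collision or misrouting                        ║
╚════════════════════════════════════════════════════════════╝
╔════════════════════════════════════════════════════════════╗
║ Layer: Scaling                                             ║
╠════════════════════════════════════════════════════════════╣
║ Docker: manual / optional                                  ║
║ Kubernetes: replicas normal                                ║
║ Break: consumer duplication / missed msgs                  ║
╚════════════════════════════════════════════════════════════╝
╔════════════════════════════════════════════════════════════╗
║ Layer: Config                                              ║
╠════════════════════════════════════════════════════════════╣
║ Docker: .env files                                         ║
║ Kubernetes: ConfigMaps / Secrets                           ║
║ Break: jobs fail; app misconfigured                        ║
╚════════════════════════════════════════════════════════════╝
╔════════════════════════════════════════════════════════════╗
║ Layer: Health                                              ║
╠════════════════════════════════════════════════════════════╣
║ Docker: health checks optional                             ║
║ Kubernetes: readiness/liveness probes expected             ║
║ Break: app starts too early → errors                       ║
╚════════════════════════════════════════════════════════════╝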
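Several of the breaks in the table can be caught before deployment by scanning configuration values for Docker-only assumptions. The sketch below is one possible pre-flight check, not an established tool; the function name, the flat dict shape of settings, and the example keys are all assumptions.

```python
LOCAL_HOSTS = ("localhost", "127.0.0.1")


def find_docker_assumptions(settings):
    """Return warnings for values that only work on a single Docker host.

    `settings` is a flat dict of setting name -> string value (hypothetical shape).
    """
    warnings = []
    for name in sorted(settings):
        value = settings[name]
        # Strip an optional scheme, then take the host part before any path or port.
        host = value.split("://")[-1].split("/")[0].split(":")[0]
        if host in LOCAL_HOSTS:
            warnings.append(f"{name}: {value} only resolves inside the same pod")
        if value.endswith(":latest"):
            warnings.append(f"{name}: {value} is an unpinned image tag")
    return warnings
```

Run against the values a pipeline is about to deploy, an empty list means none of these two patterns were found; it does not prove the configuration is otherwise correct.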
Mental picture
Docker Compose (dev)
Single host
localhost:9092 → Kafka
depends_on → partial order
Volumes → local paths
Ports → host mapped
App assumes success
Kubernetes (prod)
Multi-node cluster
kafka.<namespace>.svc.cluster.local:9092
readinessProbe + retries
PersistentVolumes / StatefulSets
Ports abstracted via Services
App assumes failure / retries
Key insight
Docker Compose hides bad assumptions: “works on my laptop”
Kubernetes exposes them: “does your pipeline really work anywhere?”
Warning:
In an enterprise MDM pipeline, ignoring these differences can lead to data loss, compliance violations, or audit failures.