If you've ever wondered how Kubernetes knows when to spin up more pods or give a pod more memory, that's autoscaling — and it's one of those concepts that sounds intimidating until you actually do it yourself. Day 17 of the #40DaysOfKubernetes challenge is where it clicked for me.
What is Autoscaling in Kubernetes?
At its core, autoscaling means Kubernetes adjusts resources automatically based on demand, so you don't have to intervene manually every time traffic spikes. There are two main types:
- HPA (Horizontal Pod Autoscaler): adds or removes pods based on CPU/memory usage (or custom metrics)
- VPA (Vertical Pod Autoscaler): adjusts the CPU/memory requests of pods, typically by evicting them so they're recreated with the new values
Think of HPA as hiring more staff when the shop gets busy. VPA is more like giving one staff member more tools to handle the workload alone.
What I Did — Setting Up HPA
First, I deployed the sample php-apache app with defined CPU requests and limits, plus a Service in front of it:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache
```
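The key part is setting resources.requests.cpu. HPA calculates utilization as a percentage of the requested CPU, so without a request the autoscaler has nothing to measure against. (HPA reads current CPU usage from the Metrics Server, so that also needs to be running in the cluster.)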
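To make the role of the request concrete, here's a minimal Python sketch of how a CPU utilization percentage is derived for a group of pods: total usage divided by total requested CPU. The function name and the plain lists of millicore values are hypothetical; the real controller gets usage from the metrics API, not from a list you pass in.

```python
from typing import Optional, Sequence


def cpu_utilization_percent(
    usage_millicores: Sequence[int],
    request_millicores: Sequence[Optional[int]],
) -> int:
    """Average CPU utilization (%) of a set of pods relative to their requests.

    Total usage divided by total requested CPU, as an integer percentage.
    With equal requests this is the plain average of per-pod utilization.
    """
    if not usage_millicores:
        raise ValueError("need at least one pod")
    if len(usage_millicores) != len(request_millicores):
        raise ValueError("expected one CPU request per pod")
    requests = []
    for request in request_millicores:
        if request is None or request <= 0:
            # No resources.requests.cpu: there is no baseline for a percentage.
            raise ValueError("missing CPU request; utilization is undefined")
        requests.append(request)
    return sum(usage_millicores) * 100 // sum(requests)


# Two php-apache pods, each requesting 200m, currently using 250m and 150m.
print(cpu_utilization_percent([250, 150], [200, 200]))  # 100
```

That percentage is the number the HPA compares against its averageUtilization target; drop the request and there is nothing to divide by.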
Then I created the HPA object, targeting 50% average CPU utilization with a minimum of 1 pod and a maximum of 10. This is the declarative method:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```
And this is the equivalent imperative command:
```bash
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
```
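The Kubernetes documentation gives the core HPA rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). Here's a simplified Python sketch of that rule with this HPA's settings (target 50%, min 1, max 10), including the default 10% tolerance band inside which the controller leaves the replica count alone. It deliberately ignores stabilization windows, scale-up/scale-down rate policies and pod readiness handling, and the function name is made up for illustration.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization: float,
    target_utilization: float = 50,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Simplified HPA rule: ceil(current * current/target), clamped to [min, max]."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # minReplicas / maxReplicas from the HPA spec bound the result.
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(1, 250))  # 5  (one pod at 250% of its request)
print(desired_replicas(4, 53))   # 4  (within the 10% tolerance)
print(desired_replicas(5, 200))  # 10 (would be 20, capped at maxReplicas)
print(desired_replicas(3, 10))   # 1  (ceil(0.6) = 1)
```

The clamp is why maxReplicas matters: even under heavy load, the count never exceeds 10 here.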
Generating Load to Watch It Scale
This is the fun part. I ran a load generator in a separate pod, basically a shell loop hammering the php-apache Service with requests:
```bash
kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- \
  /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
```
Then I watched the HPA respond in real time:
```bash
kubectl get hpa php-apache --watch
```
Watching the replica count climb from 1 to several pods as CPU utilization crossed 50% made the whole concept land in a way that reading documentation never does.
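For a rough feel of why the replica count climbs and then settles, here's a toy Python simulation, not a model of the real controller's internals. It assumes a hypothetical, constant 1000m of total CPU demand from the load generator, split evenly across pods, with each pod capped at the 500m limit and measured against the 200m request from the Deployment. Each loop iteration stands in for one HPA evaluation (every 15 seconds by default) and applies the documented ceil(currentReplicas × current / target) rule, clamped to 1 to 10 replicas.

```python
import math
from typing import List

REQUEST_M = 200       # resources.requests.cpu per pod (millicores)
LIMIT_M = 500         # resources.limits.cpu per pod (millicores)
TARGET_PCT = 50       # averageUtilization in the HPA spec
MIN_R, MAX_R = 1, 10  # minReplicas / maxReplicas
TOLERANCE = 0.1       # default HPA tolerance band


def simulate(total_demand_m: int, start_replicas: int = 1, steps: int = 5) -> List[int]:
    """Return the replica count after each simulated HPA evaluation.

    Toy model: demand is split evenly across pods and each pod can use at
    most LIMIT_M, so a single pod saturates at 250% of its request.
    """
    replicas = start_replicas
    history = [replicas]
    for _ in range(steps):
        per_pod_usage = min(total_demand_m / replicas, LIMIT_M)
        utilization = per_pod_usage / REQUEST_M * 100
        ratio = utilization / TARGET_PCT
        if abs(ratio - 1.0) > TOLERANCE:
            replicas = max(MIN_R, min(MAX_R, math.ceil(replicas * ratio)))
        history.append(replicas)
    return history


print(simulate(1000))  # [1, 5, 10, 10, 10, 10]
```

Because a lone pod is capped at its 500m limit, the first reading understates the real demand, so the count takes more than one step to catch up. The real controller also applies scale-up rate limits and a scale-down stabilization window (300 seconds by default), which this toy ignores.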
HPA vs VPA — When Do You Use Which?
| | HPA | VPA |
|---|---|---|
| Scales | Number of pods | Pod CPU/memory requests (limits are kept in proportion) |
| Best for | Stateless apps with variable traffic | Apps whose resource needs are hard to predict upfront |
| Works with | CPU, memory, custom metrics | CPU and memory |
In practice, HPA is the more common choice for production workloads; it's built into Kubernetes, while VPA is a separate add-on you install yourself. VPA is most useful early on, when you're still figuring out the right resource requests for an app.
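VPA's recommender suggests requests based on observed usage over time. The Python below is a deliberately simplified, hypothetical stand-in for that idea: take a high percentile of CPU usage samples and add a safety margin. The 90th percentile, the 15% margin and the function name are illustrative assumptions; the real recommender works from decaying histograms and is considerably more involved.

```python
from typing import Sequence


def recommend_request_millicores(
    samples_m: Sequence[int],
    percentile_pct: int = 90,
    safety_margin_pct: int = 15,
) -> int:
    """Hypothetical VPA-style CPU request suggestion (not the real algorithm).

    Picks the nearest-rank percentile of the usage samples, then adds a
    safety margin, rounding up to whole millicores. Integer math only.
    """
    if not samples_m:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples_m)
    # Nearest-rank percentile: rank = ceil(p/100 * n), at least 1.
    rank = max(1, (percentile_pct * len(ordered) + 99) // 100)
    base = ordered[rank - 1]
    # Ceiling of base * (1 + margin) without floating point.
    return (base * (100 + safety_margin_pct) + 99) // 100


usage = [120, 90, 150, 180, 110, 95, 160, 140, 130, 175]
print(recommend_request_millicores(usage))  # 202
```

Sizing from a high percentile rather than the peak keeps one-off spikes from inflating the request, while the margin leaves some headroom above typical usage.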
Key Takeaway
Don't skip setting resources.requests in your deployment spec. HPA is blind without it. That one line is what connects your workload to the autoscaler.