You’ve probably heard that AWS Lambda provisioned concurrency eliminates cold starts. What they don’t tell you? It can turn your $50 Lambda bill into $500+ overnight if you’re not careful.
Here’s the thing: provisioned concurrency is incredibly powerful, but most developers either avoid it entirely (suffering cold starts) or implement it carelessly (suffering bill shock). There’s a sweet spot in between, and Terraform makes it possible to hit it consistently.
The Problem: Cold Starts vs. Cost Explosions
Lambda cold starts happen when AWS needs to initialize a new execution environment. For a typical Node.js function, this adds 200-500ms of latency. For Java or .NET? You’re looking at 2-5 seconds.
The traditional solution is provisioned concurrency - keeping execution environments warm and ready. But here’s the catch:
- On-demand pricing: ~$0.20 per 1M requests + duration at $0.0000166667 per GB-second
- Provisioned concurrency: $0.0000041667 per GB-second for every second it's allocated + $0.20 per 1M requests + duration (billed at a reduced rate, ~$0.0000097222 per GB-second)
For a 512MB function with 10 provisioned instances running 24/7:
Monthly cost = 10 instances × 0.5 GB × 730 hours × 3600 seconds × $0.0000041667
= ~$55 per month
Just to keep one function's instances warm, before you’ve processed a single request. Run that configuration across ten services, or bump the memory allocation, and you’re into the hundreds of dollars per month.
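As a sanity check on that arithmetic, here's the same calculation as a few lines of Python (the rate is the us-east-1 list price; substitute your region's):

```python
# Monthly cost of keeping provisioned concurrency allocated 24/7.
PC_RATE = 0.0000041667  # $/GB-second, provisioned concurrency (us-east-1)

def monthly_provisioned_cost(instances: int, memory_gb: float, hours: float = 730) -> float:
    """Cost of holding `instances` warm environments for `hours` per month."""
    return instances * memory_gb * hours * 3600 * PC_RATE

print(f"${monthly_provisioned_cost(10, 0.5):.2f}/month")  # 10 x 512MB, always on
```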
The Strategy: Dynamic Provisioned Concurrency
The secret is treating provisioned concurrency like autoscaling for EC2 - scale it up when you need it, down when you don’t. Here’s how to do it right with Terraform.
Step 1: Calculate Your Break-Even Point
Not every function needs provisioned concurrency. Use this formula:
Break-even daily invocations = (Provisioned cost per day) / (Cold start cost savings per invocation)
Cold start cost = Cold start duration (s) × Memory (GB) × $0.0000166667
Example: 512MB function, 300ms cold start, 20% of requests are cold starts
Daily provisioned cost (1 instance) = 1 × 0.5 × 86400 × $0.0000041667 = $0.18
Cold start cost per invocation = 0.3 × 0.5 × $0.0000166667 = $0.0000025
Savings per invocation (20% cold) = $0.0000025 × 0.2 = $0.0000005
Break-even = $0.18 / $0.0000005 = 360,000 invocations/day
If you’re processing fewer than 360k requests per day, provisioned concurrency costs you money.
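This model is easy to fumble by a factor of ten, so here's the break-even calculation as a small Python sketch (same simplifying assumption as the formula above: cold-start init time is priced as billed compute):

```python
# Break-even model: how many daily invocations justify warm instances?
PC_RATE = 0.0000041667       # $/GB-second, provisioned concurrency (us-east-1)
COMPUTE_RATE = 0.0000166667  # $/GB-second, on-demand duration

def break_even_daily_invocations(memory_gb, cold_start_s, cold_rate, instances=1):
    """Daily invocations above which provisioned concurrency pays for itself."""
    daily_provisioned_cost = instances * memory_gb * 86_400 * PC_RATE
    cold_cost_per_invocation = cold_start_s * memory_gb * COMPUTE_RATE
    savings_per_invocation = cold_cost_per_invocation * cold_rate
    return daily_provisioned_cost / savings_per_invocation

# 512MB function, 300ms cold start, 20% of requests cold: ~360,000/day
print(round(break_even_daily_invocations(0.5, 0.3, 0.2)))
```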
Step 2: Implement Dynamic Scaling with Terraform
Here’s a production-ready Terraform configuration that scales provisioned concurrency based on actual demand:
Basic Lambda Function Setup
# main.tf
resource "aws_lambda_function" "api_handler" {
filename = "lambda_function.zip"
function_name = "api-handler"
role = aws_iam_role.lambda_role.arn
handler = "index.handler"
runtime = "nodejs20.x"
memory_size = 512
timeout = 30
environment {
variables = {
ENVIRONMENT = "production"
}
}
publish = true # Required for provisioned concurrency
}
# Create an alias for stable endpoint
resource "aws_lambda_alias" "live" {
name = "live"
description = "Live traffic alias"
function_name = aws_lambda_function.api_handler.function_name
function_version = aws_lambda_function.api_handler.version
}
Provisioned Concurrency with Auto Scaling
# provisioned_concurrency.tf
# Enable provisioned concurrency
resource "aws_lambda_provisioned_concurrency_config" "api_handler" {
function_name = aws_lambda_alias.live.function_name
qualifier = aws_lambda_alias.live.name
provisioned_concurrent_executions = 2 # Minimum instances
# Application Auto Scaling adjusts this value at runtime; without this
# block, every terraform apply would reset it back to 2
lifecycle {
ignore_changes = [provisioned_concurrent_executions]
}
}
# Auto Scaling Target
resource "aws_appautoscaling_target" "lambda_target" {
max_capacity = 10 # Maximum instances
min_capacity = 2 # Minimum instances (must match provisioned concurrency)
resource_id = "function:${aws_lambda_function.api_handler.function_name}:${aws_lambda_alias.live.name}"
scalable_dimension = "lambda:function:ProvisionedConcurrency"
service_namespace = "lambda"
depends_on = [aws_lambda_provisioned_concurrency_config.api_handler]
}
# Scale up when utilization > 70%
resource "aws_appautoscaling_policy" "lambda_scale_up" {
name = "lambda-scale-up"
policy_type = "TargetTrackingScaling"
resource_id = aws_appautoscaling_target.lambda_target.resource_id
scalable_dimension = aws_appautoscaling_target.lambda_target.scalable_dimension
service_namespace = aws_appautoscaling_target.lambda_target.service_namespace
target_tracking_scaling_policy_configuration {
target_value = 0.7 # Target 70% utilization (this metric is a 0-1 fraction, not a percentage)
predefined_metric_specification {
predefined_metric_type = "LambdaProvisionedConcurrencyUtilization"
}
scale_in_cooldown = 300 # Wait 5 min before scaling down
scale_out_cooldown = 60 # Wait 1 min before scaling up again
}
}
Scheduled Scaling for Predictable Workloads
If your traffic follows a pattern (business hours, weekend drops), use scheduled actions:
# scheduled_scaling.tf
# Scale up for business hours (Mon-Fri 8 AM - 6 PM EST)
resource "aws_appautoscaling_scheduled_action" "scale_up_business_hours" {
name = "scale-up-business-hours"
service_namespace = aws_appautoscaling_target.lambda_target.service_namespace
resource_id = aws_appautoscaling_target.lambda_target.resource_id
scalable_dimension = aws_appautoscaling_target.lambda_target.scalable_dimension
schedule = "cron(0 8 ? * MON-FRI *)" # 8 AM weekdays
timezone = "America/New_York" # Without this, schedules run in UTC
scalable_target_action {
min_capacity = 5
max_capacity = 15
}
}
# Scale down for nights and weekends
resource "aws_appautoscaling_scheduled_action" "scale_down_off_hours" {
name = "scale-down-off-hours"
service_namespace = aws_appautoscaling_target.lambda_target.service_namespace
resource_id = aws_appautoscaling_target.lambda_target.resource_id
scalable_dimension = aws_appautoscaling_target.lambda_target.scalable_dimension
schedule = "cron(0 18 ? * MON-FRI *)" # 6 PM weekdays
timezone = "America/New_York"
scalable_target_action {
min_capacity = 1
max_capacity = 3
}
}
# Weekend minimal capacity
resource "aws_appautoscaling_scheduled_action" "scale_down_weekend" {
name = "scale-down-weekend"
service_namespace = aws_appautoscaling_target.lambda_target.service_namespace
resource_id = aws_appautoscaling_target.lambda_target.resource_id
scalable_dimension = aws_appautoscaling_target.lambda_target.scalable_dimension
schedule = "cron(0 0 ? * SAT *)" # Midnight Saturday
timezone = "America/New_York"
scalable_target_action {
min_capacity = 1
max_capacity = 2
}
}
IAM Permissions
# iam.tf
resource "aws_iam_role" "lambda_role" {
name = "api-handler-lambda-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "lambda.amazonaws.com"
}
}]
})
}
resource "aws_iam_role_policy_attachment" "lambda_basic" {
role = aws_iam_role.lambda_role.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}
# Application Auto Scaling uses a service-linked role for Lambda provisioned
# concurrency. AWS creates it automatically the first time you register a
# scalable target, so no custom role is required. To manage it explicitly:
resource "aws_iam_service_linked_role" "lambda_autoscaling" {
aws_service_name = "lambda.application-autoscaling.amazonaws.com"
}
Step 3: Monitor and Optimize
Set up CloudWatch alarms to track your investment:
# monitoring.tf
resource "aws_cloudwatch_metric_alarm" "high_cost_alert" {
alarm_name = "lambda-provisioned-concurrency-high-cost"
comparison_operator = "LessThanThreshold"
evaluation_periods = "2"
metric_name = "ProvisionedConcurrencyUtilization"
namespace = "AWS/Lambda"
period = "300"
statistic = "Average"
threshold = "0.3" # Alert if avg utilization < 30% (the metric is a 0-1 fraction)
alarm_description = "Provisioned concurrency may be over-provisioned"
treat_missing_data = "notBreaching"
dimensions = {
FunctionName = aws_lambda_function.api_handler.function_name
Resource = "${aws_lambda_function.api_handler.function_name}:${aws_lambda_alias.live.name}"
}
}
Real-World Cost Comparison
Let’s say you have an API with these characteristics:
- Traffic: 50k requests/day during business hours (10 hours), 5k requests overnight
- Function: 512MB, 200ms average execution, 400ms cold start
- Cold start rate: 20% without provisioned concurrency
Without Provisioned Concurrency
Compute cost: 55k × 0.2s × 0.5GB × $0.0000166667 = $0.09/day
Request cost: 55k × $0.0000002 = $0.01/day
Cold start penalty: 55k × 0.2 × 0.4s × 0.5GB × $0.0000166667 = $0.04/day
Total: ~$4.25/month
With Always-On Provisioned Concurrency (5 instances)
Provisioned cost: 5 × 0.5GB × 86400s × $0.0000041667 = $0.90/day
Compute cost: 55k × 0.2s × 0.5GB × $0.0000166667 = $0.09/day
Request cost: 55k × $0.0000002 = $0.01/day
Total: ~$30/month
(Duration on warm provisioned environments actually bills at a reduced rate, ~$0.0000097222 per GB-second, so the compute line slightly overstates the real cost.)
With Scheduled Provisioned Concurrency (5 instances @ 10hrs, 1 instance @ 14hrs)
Provisioned cost: ((5 × 0.5 × 36000) + (1 × 0.5 × 50400)) × $0.0000041667 = $0.48/day
Compute cost: $0.09/day
Request cost: $0.01/day
Total: ~$18/month
Savings: ~$13/month vs always-on, while still eliminating cold starts during peak hours. Scale this across 20 functions and you’re saving roughly $3,100/year. Note that at this volume (well below the break-even point from Step 1), every provisioned option still costs more than running cold, so what you’re really buying is guaranteed low latency during business hours.
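The three scenarios can be re-derived in a few lines of Python, which makes the model easy to audit or adapt to your own traffic (same simplified model as above: on-demand duration rate for all compute, cold-start init priced as billed time, us-east-1 list prices):

```python
# Monthly cost of three provisioning strategies for the example workload:
# 55k requests/day, 512MB, 200ms execution, 400ms cold start, 20% cold rate.
PC_RATE = 0.0000041667       # $/GB-second, provisioned concurrency (us-east-1)
COMPUTE_RATE = 0.0000166667  # $/GB-second, on-demand duration
REQ_RATE = 0.0000002         # $0.20 per 1M requests

REQS, MEM = 55_000, 0.5
compute = REQS * 0.2 * MEM * COMPUTE_RATE             # execution time
requests = REQS * REQ_RATE                            # per-request charge
cold_penalty = REQS * 0.2 * 0.4 * MEM * COMPUTE_RATE  # cold starts, no-PC only

no_pc = compute + requests + cold_penalty
always_on = 5 * MEM * 86_400 * PC_RATE + compute + requests
scheduled = (5 * MEM * 36_000 + 1 * MEM * 50_400) * PC_RATE + compute + requests

for name, daily in [("no PC", no_pc), ("always-on", always_on), ("scheduled", scheduled)]:
    print(f"{name}: ${daily * 30.4:.2f}/month")
```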
Best Practices Checklist
✅ Always use an alias - Provisioned concurrency requires versioning
✅ Start small - Begin with min_capacity = 1 or 2, monitor utilization
✅ Set cooldown periods - Prevent thrashing (scale_in_cooldown = 300s minimum)
✅ Match workload patterns - Use scheduled actions for predictable traffic
✅ Monitor utilization - Alert when average utilization < 40% (you’re wasting money)
✅ Test scaling behavior - Simulate load spikes in staging first
✅ Version carefully - Updating code means reprovisioning the new version; until it finishes, traffic on the new version falls back to on-demand (cold) invocations
⚠️ Common Gotchas
1. The “Insufficient Data” Trap
When scaling down to zero, CloudWatch alarms can get stuck in “INSUFFICIENT_DATA” state. Solution: Keep min_capacity = 1 or use scheduled scaling to go to zero during known idle periods.
2. Deployment Delays
Updating a function with provisioned concurrency takes 2-3 minutes to reprovision. For CI/CD, either:
- Accept the delay
- Use blue/green deployments with two aliases
- Temporarily disable provisioned concurrency during deploys
3. Concurrency Limits
Provisioned concurrency draws from your account’s regional concurrency quota (1,000 by default), and at least 100 of that quota must remain unreserved for on-demand invocations. Request increases through Service Quotas if needed.
Quick Start Commands
# Initialize Terraform
terraform init
# Preview changes
terraform plan
# Deploy
terraform apply
# Check provisioned concurrency status
aws lambda get-provisioned-concurrency-config \
--function-name api-handler \
--qualifier live
# Monitor scaling activity
aws application-autoscaling describe-scaling-activities \
--service-namespace lambda \
--resource-id "function:api-handler:live"
When to Use Provisioned Concurrency
Great for:
- Customer-facing APIs with strict latency SLAs (<100ms p99)
- Functions with >2s cold starts (Java, .NET, large dependencies)
- Predictable, high-volume traffic patterns
- Functions processing >100k requests/day
Skip it for:
- Internal tools and batch processing
- Low-traffic endpoints (<10k requests/day)
- Functions with fast cold starts (<200ms)
- Unpredictable, sporadic workloads
Complete Example Repository
Here’s a complete working example structure:
lambda-provisioned-concurrency/
├── main.tf
├── provisioned_concurrency.tf
├── scheduled_scaling.tf
├── iam.tf
├── monitoring.tf
├── variables.tf
├── outputs.tf
└── lambda_function/
├── index.js
└── package.json
variables.tf:
variable "function_name" {
description = "Lambda function name"
type = string
default = "api-handler"
}
variable "min_capacity" {
description = "Minimum provisioned concurrency instances"
type = number
default = 2
}
variable "max_capacity" {
description = "Maximum provisioned concurrency instances"
type = number
default = 10
}
variable "target_utilization" {
description = "Target utilization for auto scaling (0-1 fraction)"
type = number
default = 0.7
}
outputs.tf:
output "function_arn" {
value = aws_lambda_function.api_handler.arn
}
output "function_url" {
value = aws_lambda_alias.live.invoke_arn
}
output "estimated_monthly_cost" {
value = format(
"Min: $%.2f, Max: $%.2f",
var.min_capacity * 0.5 * 730 * 3600 * 0.0000041667,
var.max_capacity * 0.5 * 730 * 3600 * 0.0000041667
)
}
Final Thoughts
Provisioned concurrency isn’t an all-or-nothing decision. The key is using it strategically:
- Calculate your break-even point
- Implement dynamic scaling with Terraform
- Monitor utilization and costs
- Optimize based on real data
With this approach, you can eliminate cold starts during peak hours while avoiding the $500+ monthly surprise on functions that only need it 20% of the time.
Remember: The goal isn’t zero cold starts - it’s optimal cost for your latency requirements.
Have you implemented provisioned concurrency? What’s your biggest cost optimization win with Lambda? Share in the comments!
Found this helpful? Follow me for more AWS cost optimization tips with Terraform!