The Silent AWS Bill Killer: Stop CloudWatch Logs from Eating Your Budget

Originally published at dev.to

CloudWatch logs set to “Never Expire” can cost thousands. Here’s how to automate retention policies with Terraform and cut your log storage costs by up to 90%.

Pop quiz: When was the last time you checked your CloudWatch Logs retention settings?

If you’re like many AWS users, the answer is “never”, because new log groups default to “Never Expire.”

Here’s what that means for your wallet:

Month 1:  $10 in logs
Month 6:  $60 in logs
Month 12: $120 in logs
Month 24: $240 in logs

Your logs are growing indefinitely. And you’re paying $0.03/GB per month for storage you probably never look at.

Let me show you how to fix this in 10 minutes with Terraform and cut your CloudWatch Logs storage costs by 80-90%.

The Hidden Cost of “Never Expire”

CloudWatch Logs pricing is deceptively simple:

  • Ingestion: $0.50 per GB
  • Storage: $0.03 per GB per month
  • Analysis: $0.005 per GB scanned

A typical production app generates 10-50 GB of logs per month. Let’s say you’re at 20 GB/month:

Year 1 accumulation:

Month 1:  20 GB × $0.03 = $0.60
Month 2:  40 GB × $0.03 = $1.20
Month 3:  60 GB × $0.03 = $1.80
...
Month 12: 240 GB × $0.03 = $7.20
Total Year 1: $46.80 (storage alone)

Year 2:

Starting: 240 GB
Ending:   480 GB × $0.03 = $14.40/month
Total Year 2: $133.20

Year 3: $219.60
Year 4: $306.00

After 4 years, you’re holding 960 GB and paying $28.80/month just to store logs you’ll never read. Multiply this by 50 log groups and you’re at $1,440/month.
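
The accumulation arithmetic is easy to check with a few lines of code. The sketch below assumes a constant 20 GB of new logs per month, the $0.03/GB-month storage rate quoted earlier, and billing on the volume stored at the end of each month. The function name is illustrative, not part of any AWS tooling.

```python
STORAGE_RATE = 0.03  # USD per GB-month, the storage price quoted in this article


def yearly_storage_costs(gb_per_month: float, years: int) -> list:
    """Return the storage cost of each year when nothing ever expires."""
    costs = []
    stored_gb = 0.0
    for _ in range(years):
        year_total = 0.0
        for _ in range(12):
            stored_gb += gb_per_month            # logs only ever accumulate
            year_total += stored_gb * STORAGE_RATE
        costs.append(round(year_total, 2))
    return costs


if __name__ == "__main__":
    for year, cost in enumerate(yearly_storage_costs(20, 4), start=1):
        print(f"Year {year}: ${cost:.2f}")
```

Swap in your own monthly volume to see how quickly the yearly totals grow.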

The Solution: Smart Retention Policies

The fix is ridiculously simple: Set retention policies based on log importance.

Here’s a sensible default strategy:

Log Type            Retention   Reasoning
Production errors   90 days     Compliance & debugging
Application logs    30 days     Recent troubleshooting
Access logs         14 days     Security reviews
Debug/verbose logs  7 days      Active development only
Lambda logs         14 days     Quick investigations

Terraform Implementation

Basic Retention Setup

# cloudwatch_logs.tf

# Production application logs
resource "aws_cloudwatch_log_group" "app_production" {
  name              = "/aws/application/production"
  retention_in_days = 30

  tags = {
    Environment = "production"
    Application = "web-app"
  }
}

# Lambda function logs
resource "aws_cloudwatch_log_group" "lambda_api" {
  name              = "/aws/lambda/api-handler"
  retention_in_days = 14

  tags = {
    Environment = "production"
    Function    = "api-handler"
  }
}

# Development logs (shorter retention)
resource "aws_cloudwatch_log_group" "app_dev" {
  name              = "/aws/application/dev"
  retention_in_days = 7

  tags = {
    Environment = "dev"
  }
}

Bulk Retention Manager Module

For existing log groups, here’s a module that sets retention across all groups:

# modules/cloudwatch-retention-manager/main.tf

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

variable "default_retention_days" {
  description = "Default retention in days for all log groups"
  type        = number
  default     = 30
}

variable "retention_rules" {
  description = "Map of rule names to a regular expression (matched against the log group name) and retention days"
  type = map(object({
    pattern        = string
    retention_days = number
  }))
  # Patterns are passed to regex(), so they use regular-expression syntax, not shell globs
  default = {
    production = {
      pattern        = "^/aws/[^/]+/production(/|$)"
      retention_days = 90
    }
    lambda = {
      pattern        = "^/aws/lambda/"
      retention_days = 14
    }
    dev = {
      pattern        = "^/aws/[^/]+/dev(/|$)"
      retention_days = 7
    }
  }
}

variable "exclude_patterns" {
  description = "Log groups whose names match any of these regular expressions won't be modified"
  type        = list(string)
  default     = ["^/aws/rds/", "^/aws/audit/"]  # Keep RDS and audit logs longer
}

# Data source to get all log groups
data "aws_cloudwatch_log_groups" "all" {}

locals {
  # Filter log groups based on patterns and exclusions
  log_groups_to_manage = [
    for lg in data.aws_cloudwatch_log_groups.all.log_group_names :
    lg if !contains([for pattern in var.exclude_patterns : can(regex(pattern, lg))], true)
  ]

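  # Map log groups to retention days. Rules are checked in alphabetical order
  # of their keys and the first match wins; unmatched groups get the default.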
  retention_map = {
    for lg in local.log_groups_to_manage :
    lg => try(
      [for k, v in var.retention_rules : v.retention_days if can(regex(v.pattern, lg))][0],
      var.default_retention_days
    )
  }
}

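# Apply retention policy to each log group.
# These log groups already exist, so each one must be imported into state
# (terraform import, or an import block in the root module) before apply;
# otherwise Terraform tries to create it and AWS rejects the duplicate name.
resource "aws_cloudwatch_log_group" "managed" {
  for_each = local.retention_map

  name              = each.key
  retention_in_days = each.value

  # Guard against accidental deletion of the log groups and their data
  lifecycle {
    prevent_destroy = true
  }
}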

# Output savings estimation
output "estimated_savings" {
  value = {
    log_groups_managed = length(local.retention_map)
    retention_policies = local.retention_map
    message = "Retention policies applied. Check AWS Cost Explorer in 30 days to see savings."
  }
}
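
One detail worth checking: the module feeds each pattern to Terraform's regex(), which uses regular-expression syntax, so `*` means "repeat the previous character" rather than "match anything" as in a shell glob. The sketch below mirrors the module's first-match lookup in Python to make the difference visible. Python's `re` module behaves the same way as Terraform's engine for these simple patterns; the function name, rule names, and log group names are made up for illustration.

```python
import re


def resolve_retention(log_group: str, rules: dict, default: int) -> int:
    """Return the retention of the first rule whose regex matches the name.

    `rules` maps a rule name to a (pattern, days) tuple. Rules are checked
    in sorted key order, as Terraform does when iterating over a map.
    """
    for name in sorted(rules):
        pattern, days = rules[name]
        if re.search(pattern, log_group):
            return days
    return default


# A glob-looking pattern: read as a regex, "/*" means "zero or more slashes"
glob_style = {"production": ("/aws/*/production/*", 90)}
# The same intent written as a regular expression
regex_style = {"production": ("^/aws/[^/]+/production(/|$)", 90)}

if __name__ == "__main__":
    group = "/aws/application/production"
    print(resolve_retention(group, glob_style, default=30))   # no match, falls back to the default
    print(resolve_retention(group, regex_style, default=30))  # matches the production rule
```

Testing your patterns against a handful of real log group names this way is cheaper than discovering a mismatch in `terraform plan`.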

Usage Example

# main.tf

module "cloudwatch_retention" {
  source = "./modules/cloudwatch-retention-manager"

  default_retention_days = 30

  # Patterns are regular expressions. Rules are checked in alphabetical key
  # order, so make sure overlapping patterns resolve the way you intend.
  retention_rules = {
    production_errors = {
      pattern        = "^/aws/[^/]+/production/errors"
      retention_days = 90
    }
    production_app = {
      pattern        = "^/aws/[^/]+/production$"
      retention_days = 30
    }
    lambda = {
      pattern        = "^/aws/lambda/"
      retention_days = 14
    }
    dev = {
      pattern        = "/dev(/|$)"
      retention_days = 7
    }
    staging = {
      pattern        = "/staging(/|$)"
      retention_days = 14
    }
  }

  exclude_patterns = [
    "^/aws/rds/instance/production-db/audit",  # Compliance requirement
    "^/aws/cloudtrail"                         # Keep CloudTrail longer
  ]
}

output "retention_summary" {
  value = module.cloudwatch_retention.estimated_savings
}

Apply and Monitor

# Preview changes
terraform plan

# Apply retention policies
terraform apply

# Output example:
# log_groups_managed = 47
# Retention policies applied to 47 log groups

Find Your Biggest Offenders

Before applying retention policies, identify which log groups are costing you the most:

# List log groups that have no retention policy, with their stored bytes
aws logs describe-log-groups \
  --query 'logGroups[?retentionInDays==`null`].[logGroupName,storedBytes]' \
  --output table

# Estimate their combined monthly storage cost in USD
aws logs describe-log-groups \
  --query 'logGroups[?retentionInDays==`null`].storedBytes' \
  --output json | jq '[.[] / 1073741824] | add * 0.03'
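
The same estimate can be scripted if you prefer to post-process the CLI output yourself. This is a minimal sketch that assumes you have captured the full output of `aws logs describe-log-groups --output json` (without `--query`) as a string. The function name and sample data are illustrative, and the rate is the $0.03/GB-month storage price used throughout this article.

```python
import json

BYTES_PER_GB = 1024 ** 3
STORAGE_RATE = 0.03  # USD per GB-month


def never_expire_report(describe_output: str) -> list:
    """Rank log groups without a retention policy by monthly storage cost.

    Returns (name, stored_gb, monthly_cost_usd) tuples, most expensive first.
    """
    groups = json.loads(describe_output)["logGroups"]
    report = []
    for group in groups:
        if "retentionInDays" in group:   # the key is absent for Never Expire
            continue
        stored_gb = group.get("storedBytes", 0) / BYTES_PER_GB
        report.append((group["logGroupName"], stored_gb, stored_gb * STORAGE_RATE))
    return sorted(report, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    sample = json.dumps({"logGroups": [
        {"logGroupName": "/aws/lambda/api-handler", "storedBytes": 50 * BYTES_PER_GB},
        {"logGroupName": "/aws/application/dev", "storedBytes": 200 * BYTES_PER_GB},
        {"logGroupName": "/aws/application/production",
         "storedBytes": 10 * BYTES_PER_GB, "retentionInDays": 30},
    ]})
    for name, gb, cost in never_expire_report(sample):
        print(f"{name}: {gb:.1f} GB, ${cost:.2f}/month")
```

Sorting by cost puts the groups worth fixing first at the top of the list.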

Add this as a Terraform data source:

# audit.tf

data "external" "cloudwatch_costs" {
  program = ["bash", "-c", <<-EOT
    aws logs describe-log-groups \
      --query 'logGroups[?retentionInDays==`null`]' \
      --output json | jq '{
        count: (length | tostring),
        total_gb: (([.[].storedBytes // 0] | add // 0) / 1073741824 | tostring),
        monthly_cost: (([.[].storedBytes // 0] | add // 0) / 1073741824 * 0.03 | tostring)
      }'
  EOT
  ]
}

output "current_cloudwatch_waste" {
  value = {
    log_groups_without_retention = data.external.cloudwatch_costs.result.count
    total_storage_gb            = data.external.cloudwatch_costs.result.total_gb
    estimated_monthly_cost      = format("$%s", data.external.cloudwatch_costs.result.monthly_cost)
  }
}

Advanced: Dynamic Retention Based on Environment

# dynamic_retention.tf

locals {
  environments = {
    production = 90
    staging    = 30
    dev        = 7
  }

  log_group_configs = {
    for env, retention in local.environments : env => {
      api_logs = {
        name      = "/aws/api/${env}"
        retention = retention
      }
      app_logs = {
        name      = "/aws/application/${env}"
        retention = retention
      }
      worker_logs = {
        name      = "/aws/worker/${env}"
        retention = retention
      }
    }
  }

  # Flatten into individual log groups
  all_log_groups = merge([
    for env, configs in local.log_group_configs : {
      for service, config in configs :
      "${env}-${service}" => config
    }
  ]...)
}

resource "aws_cloudwatch_log_group" "dynamic" {
  for_each = local.all_log_groups

  name              = each.value.name
  retention_in_days = each.value.retention

  tags = {
    ManagedBy   = "terraform"
    Environment = split("-", each.key)[0]
  }
}

Real Savings Example

Before retention policies:

  • 50 log groups
  • Average 5 GB per group after 1 year
  • Total: 250 GB × $0.03 = $7.50/month
  • After 3 years: 750 GB × $0.03 = $22.50/month

After implementing 30-day retention:

  • 50 log groups
  • Average ~0.42 GB per group (30 days of data at 5 GB/year)
  • Total: ~21 GB × $0.03 ≈ $0.63/month
  • Savings: ~$6.87/month → ~$82/year
  • After 3 years: Still ~$0.63/month (savings of ~$262/year)

For a mid-size company with 200 log groups:

  • Savings: ~$1,000/year by year three
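
To rerun this comparison with your own figures, the steady-state arithmetic can be sketched as below. It assumes each log group grows at a constant rate and uses the $0.03/GB-month storage price from earlier. The function and parameter names are illustrative.

```python
from typing import Optional

STORAGE_RATE = 0.03  # USD per GB-month


def monthly_storage_cost(groups: int, gb_per_group_per_year: float,
                         months_elapsed: int,
                         retention_days: Optional[int] = None) -> float:
    """Monthly storage bill after `months_elapsed` months of constant growth.

    With retention_days=None logs accumulate forever; otherwise each group
    holds at most `retention_days` worth of data.
    """
    gb_per_day = gb_per_group_per_year / 365
    days_elapsed = months_elapsed * 365 / 12
    if retention_days is None:
        days_kept = days_elapsed
    else:
        days_kept = min(days_elapsed, retention_days)
    return groups * gb_per_day * days_kept * STORAGE_RATE


if __name__ == "__main__":
    for months in (12, 36):
        before = monthly_storage_cost(50, 5, months)
        after = monthly_storage_cost(50, 5, months, retention_days=30)
        print(f"After {months} months: ${before:.2f} vs ${after:.2f} per month, "
              f"saving ${(before - after) * 12:.0f}/year")
```

Because the bill with retention stays flat while the bill without it keeps climbing, the yearly saving grows the longer the logs have been accumulating.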

⚠️ Important Considerations

1. Compliance Requirements

Some logs must be kept for regulatory reasons:

# compliance.tf

resource "aws_cloudwatch_log_group" "audit_logs" {
  name              = "/aws/audit/production"
  retention_in_days = 2557  # 7 years for SOX/HIPAA-style compliance; 2557 is the accepted value (2555 is rejected)

  tags = {
    Compliance = "required"
    Retention  = "7-years"
  }
}
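
Note that CloudWatch Logs accepts only a fixed set of retention values, so an arbitrary day count may be rejected when you apply. A small helper like the following can round a requirement up to the next accepted value. The list reflects the values documented for the PutRetentionPolicy API at the time of writing and should be checked against the current AWS documentation; the function name is illustrative.

```python
# Retention values (in days) accepted by CloudWatch Logs at the time of writing
VALID_RETENTION_DAYS = (
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731,
    1096, 1827, 2192, 2557, 2922, 3288, 3653,
)


def next_valid_retention(required_days: int) -> int:
    """Round a required retention period up to the nearest accepted value."""
    for days in VALID_RETENTION_DAYS:
        if days >= required_days:
            return days
    raise ValueError(f"{required_days} days exceeds the longest supported retention")


if __name__ == "__main__":
    # Seven calendar years expressed as 7 * 365 days is not itself accepted
    print(next_valid_retention(7 * 365))
```

Rounding up rather than down keeps you on the safe side of a minimum-retention requirement.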

2. Lambda Log Groups Auto-Creation

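Lambda creates its log group automatically on first invocation, with no retention policy. Create the group in Terraform first so it is managed from the start: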

resource "aws_lambda_function" "api" {
  # ... other config ...
  
  # Create log group BEFORE the Lambda function
  depends_on = [aws_cloudwatch_log_group.lambda_api]
}

resource "aws_cloudwatch_log_group" "lambda_api" {
  name              = "/aws/lambda/${var.function_name}"
  retention_in_days = 14

  # Create this FIRST so Lambda doesn't create it without retention
}

3. Existing Data Isn’t Deleted Immediately

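Setting retention doesn’t delete old data immediately. CloudWatch Logs typically removes expired log events within 72 hours of them passing the retention period, so stored bytes and the bill drop with a short delay.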

Quick Implementation Checklist

Audit current log groups - Find groups without retention
Categorize by importance - Production vs dev vs debug
Set retention policies - 7/14/30/90 days based on category
Handle Lambda logs - Create log groups before functions
Document compliance needs - Don’t auto-expire audit logs
Monitor savings - Check Cost Explorer after 30 days

5-Minute Quick Start

# 1. Check your current waste
terraform init
terraform apply -target=data.external.cloudwatch_costs

# 2. Apply retention module
terraform apply

# 3. Verify with the AWS CLI
aws logs describe-log-groups \
  --query 'logGroups[*].[logGroupName,retentionInDays]' \
  --output table

# 4. Celebrate! 

Pro Tips

1. Start with dev/staging
Apply aggressive retention (7 days) to non-production first. Production can stay at 30-90 days.

2. Use log exports for long-term storage
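If you need logs beyond the retention period, stream them to S3 (much cheaper) through a subscription filter and Kinesis Data Firehose. A subscription filter forwards only events that arrive after it is created:

resource "aws_cloudwatch_log_subscription_filter" "export_to_s3" {
  name            = "stream-logs-to-s3"
  log_group_name  = aws_cloudwatch_log_group.app_production.name
  filter_pattern  = ""  # Empty pattern matches every log event
  destination_arn = aws_kinesis_firehose_delivery_stream.logs_to_s3.arn
  # Required for Firehose destinations. The role name is hypothetical: define
  # a role elsewhere that lets CloudWatch Logs write to the delivery stream.
  role_arn        = aws_iam_role.cloudwatch_to_firehose.arn
}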

S3 storage: $0.023/GB per month vs CloudWatch: $0.03/GB per month (23% cheaper + Glacier options)
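
To put numbers on the comparison, here is a rough sketch using the two per-GB monthly rates quoted above. It ignores Firehose delivery, S3 request, and retrieval charges, so treat the result as an upper bound on the saving. The function name is illustrative.

```python
CLOUDWATCH_RATE = 0.03    # USD per GB-month (CloudWatch Logs storage)
S3_STANDARD_RATE = 0.023  # USD per GB-month (S3 Standard storage)


def archive_saving(stored_gb: float, months: int) -> tuple:
    """Return (USD saved, percent saved) by keeping the data in S3 instead."""
    cloudwatch_cost = stored_gb * CLOUDWATCH_RATE * months
    s3_cost = stored_gb * S3_STANDARD_RATE * months
    saved = cloudwatch_cost - s3_cost
    return saved, 100 * saved / cloudwatch_cost


if __name__ == "__main__":
    saved, percent = archive_saving(stored_gb=1000, months=12)
    print(f"Saved ${saved:.2f} over 12 months ({percent:.0f}% less than CloudWatch)")
```

The percentage is the same at any volume; colder S3 storage classes would widen the gap further.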

3. Set up alerts for high ingestion
Catch runaway logging before it costs you:

resource "aws_cloudwatch_metric_alarm" "high_log_ingestion" {
  alarm_name          = "high-cloudwatch-ingestion"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "IncomingBytes"
  namespace           = "AWS/Logs"
  period              = 3600
  statistic           = "Sum"
  threshold           = 10737418240  # 10 GB per hour
  alarm_description   = "Alert when log ingestion exceeds 10GB/hour"
}
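
The threshold is expressed in bytes, which is easy to mistype. A tiny helper, sketched here with illustrative names, converts a GB budget into the byte value the alarm expects and shows what sustained ingestion at that rate would cost at the $0.50/GB price quoted earlier.

```python
BYTES_PER_GB = 1024 ** 3
INGESTION_RATE = 0.50  # USD per GB ingested


def alarm_threshold_bytes(gb_per_period: float) -> int:
    """Convert a GB budget into the byte threshold for IncomingBytes."""
    return int(gb_per_period * BYTES_PER_GB)


def daily_ingestion_cost(gb_per_hour: float) -> float:
    """Ingestion cost in USD of sustaining this rate for 24 hours."""
    return gb_per_hour * 24 * INGESTION_RATE


if __name__ == "__main__":
    print(alarm_threshold_bytes(10))   # byte threshold for a 10 GB budget
    print(daily_ingestion_cost(10))    # USD per day at a sustained 10 GB/hour
```

Pick the budget from what a normal day looks like for your workload, then set the threshold comfortably above it.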

When This Makes the Biggest Impact

This optimization shines when you have:

  • Many Lambda functions (each creates a log group)
  • Multiple environments (dev/staging/prod all logging)
  • Verbose application logging (debug logs in production)
  • Long-running workloads (logs accumulating for years)
  • Microservices architecture (100+ services = 100+ log groups)

Summary: Why This Matters

CloudWatch Logs retention is one of those “set it and forget it” optimizations:

One-time setup - 10 minutes with Terraform
Automatic savings - Every month, forever
Zero operational impact - Logs you need are kept, old ones purged
Scales with your infrastructure - More log groups = more savings
Compound benefits - Savings grow over time as log accumulation stops

The math is simple: Stop paying to store logs you’ll never read.

Set retention policies today, thank yourself every month.


How much are you spending on CloudWatch Logs? Run the audit script and share in the comments!

Follow for more AWS cost optimization tips with Terraform!
