Stop Paying $150/Month for NAT Gateways: Save 90% with This Terraform Trick

Originally published at dev.to

AWS NAT Gateways cost about $32/month each plus $0.045/GB in data processing fees. Here's how to cut that by over 90% using fck-nat with Terraform, with a full HA setup included.

Quick question: Do you know how much your NAT Gateways cost?

Most teams don't realize they're spending about $32/month per NAT Gateway plus $0.045/GB in data processing fees. A typical multi-AZ setup with 3 NAT Gateways processing 1TB/month costs:

3 NAT Gateways × $32.40/month     = $97.20
Data processing: 1,000 GB × $0.045 = $45.00
Total monthly cost:                  $142.20
Annual cost:                         $1,706.40

For what? Giving your private subnet instances internet access.

There's a better way. Let me show you how to get the same functionality for under $10/month using Terraform.

Why NAT Gateways Are So Expensive

NAT Gateway pricing (us-east-1 rates) has two components:

  1. Hourly charge: $0.045/hour per gateway × 720 hours = $32.40/month
  2. Data processing: $0.045/GB processed

For a production multi-AZ setup (3 availability zones):

  • 3 NAT Gateways running 24/7: $97.20/month
  • Data processing (1TB): $45/month
  • Total: $142.20/month

And that's before you process any serious traffic. Handle 5TB/month? Data processing alone becomes $225, for $322.20/month in total.
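To see how the data fee takes over as traffic grows, here's a small calculator written as a standalone Terraform config (no provider or AWS account needed; run terraform init, then terraform apply). It's a sketch that assumes the us-east-1 rates and 720-hour month used above; with these numbers it reproduces $142.20 at 1TB and $322.20 at 5TB.

```hcl
# nat-gateway-cost.tf: standalone root module with no providers or resources.
# Rates are the us-east-1 figures quoted in this article (assumed; check your region).

locals {
  hours_per_month = 720   # 30 days, the basis for $32.40 per gateway
  nat_gateways    = 3
  nat_hourly_usd  = 0.045 # per NAT Gateway per hour
  nat_per_gb_usd  = 0.045 # per GB processed

  # Fixed monthly charge, paid even with zero traffic
  nat_fixed_usd = local.nat_gateways * local.nat_hourly_usd * local.hours_per_month

  # Monthly traffic scenarios, in GB
  traffic_gb = [1000, 2000, 5000]
}

output "nat_gateway_monthly_usd" {
  # Fixed charge plus per-GB processing, formatted to cents
  value = { for gb in local.traffic_gb : "${gb} GB" => format("%.2f", local.nat_fixed_usd + gb * local.nat_per_gb_usd) }
}
```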

The Solution: fck-nat

fck-nat is an open-source NAT solution that runs on a tiny EC2 instance. It does the same job as a NAT Gateway (outbound internet access for private subnets) but costs ~90% less.

Cost comparison:

Solution                           Monthly Cost   Annual Cost
3 NAT Gateways + 1TB data          $142           $1,706
3 fck-nat instances (t4g.nano)     $9             $109
Savings                            $133           $1,597

And there's no data processing fee. Zero. Nada. (Standard AWS data transfer charges still apply, but you pay those with NAT Gateway too.)
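The same arithmetic for fck-nat, as a standalone Terraform config you can rerun with your own traffic (for example terraform apply -var gb_per_month=5000). It assumes us-east-1 on-demand pricing for t4g.nano ($0.0042/hour) and leaves out public IPv4 charges, since every NAT Gateway needs an Elastic IP too. At 1TB it comes to about $9.07/month for fck-nat versus $142.20, roughly a 94% reduction.

```hcl
# fck-nat-vs-nat-gateway.tf: standalone root module, no providers needed.
# Assumed us-east-1 on-demand rates; public IPv4 charges omitted on both sides.

variable "gb_per_month" {
  type    = number
  default = 1000
}

locals {
  hours_per_month = 720
  azs             = 3

  # NAT Gateway: hourly charge per AZ plus $0.045 per GB processed
  nat_gateway_usd = local.azs * 0.045 * local.hours_per_month + var.gb_per_month * 0.045

  # fck-nat: one t4g.nano per AZ at $0.0042/hour, no per-GB processing fee
  fck_nat_usd = local.azs * 0.0042 * local.hours_per_month
}

output "monthly_comparison_usd" {
  value = {
    nat_gateway = format("%.2f", local.nat_gateway_usd)
    fck_nat     = format("%.2f", local.fck_nat_usd)
    savings     = format("%.2f", local.nat_gateway_usd - local.fck_nat_usd)
    savings_pct = format("%.1f", 100 * (1 - local.fck_nat_usd / local.nat_gateway_usd))
  }
}
```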

Terraform Implementation

Basic Single-AZ Setup (Simplest)

Start simple with one NAT instance:

# modules/fck-nat/main.tf
# Assumes variables vpc_id, public_subnet_id and private_subnet_ids (list of strings)

data "aws_ami" "fck_nat" {
  most_recent = true
  owners      = ["568608671756"]  # fck-nat AMI owner

  filter {
    name   = "name"
    values = ["fck-nat-al2023-*"]
  }

  filter {
    name   = "architecture"
    values = ["arm64"]  # ARM is cheaper
  }
}

resource "aws_instance" "fck_nat" {
  ami           = data.aws_ami.fck_nat.id
  instance_type = "t4g.nano"  # $3/month, plenty of power
  subnet_id     = var.public_subnet_id

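  source_dest_check = false  # Critical for NAT to work!

  # No security group is set here, so the VPC's default SG applies; it only
  # admits traffic from members of that same SG. Attach one that allows
  # inbound traffic from your private subnets (like the HA setup's fck-nat SG).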

  tags = {
    Name = "fck-nat-instance"
  }
}

resource "aws_eip" "fck_nat" {
  domain   = "vpc"
  instance = aws_instance.fck_nat.id

  tags = {
    Name = "fck-nat-eip"
  }
}

# Route table for private subnets
resource "aws_route_table" "private" {
  vpc_id = var.vpc_id

  route {
    cidr_block           = "0.0.0.0/0"
    network_interface_id = aws_instance.fck_nat.primary_network_interface_id
  }

  tags = {
    Name = "private-route-table"
  }
}

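# count instead of for_each: subnet IDs may be unknown until apply,
# and for_each keys must be known at plan time
resource "aws_route_table_association" "private" {
  count          = length(var.private_subnet_ids)
  subnet_id      = var.private_subnet_ids[count.index]
  route_table_id = aws_route_table.private.id
}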

Deploy it:

terraform apply
# Total cost: ~$7/month (t4g.nano ~$3 + public IPv4 address ~$3.60)

High-Availability Multi-AZ Setup (Production-Ready)

For production, you want HA across multiple AZs:

# modules/fck-nat-ha/main.tf

variable "availability_zones" {
  description = "AZs to deploy NAT instances"
  type        = list(string)
  default     = ["us-east-1a", "us-east-1b", "us-east-1c"]
}

variable "public_subnet_ids" {
  description = "Map of AZ to public subnet ID"
  type        = map(string)
}

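variable "private_subnet_ids" {
  description = "Map of AZ to list of private subnet IDs"
  type        = map(list(string))
}

variable "vpc_id" {
  description = "VPC to deploy the NAT instances into"
  type        = string
}

variable "vpc_cidr" {
  description = "VPC CIDR allowed through the NAT instances"
  type        = string
}

variable "aws_region" {
  description = "Region, used in the auto-recovery action ARN"
  type        = string
}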

data "aws_ami" "fck_nat" {
  most_recent = true
  owners      = ["568608671756"]

  filter {
    name   = "name"
    values = ["fck-nat-al2023-*"]
  }

  filter {
    name   = "architecture"
    values = ["arm64"]
  }
}

# Security group for NAT instances
resource "aws_security_group" "fck_nat" {
  name_prefix = "fck-nat-"
  vpc_id      = var.vpc_id

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = [var.vpc_cidr]  # Allow from VPC
  }

  tags = {
    Name = "fck-nat-sg"
  }
}

# One NAT instance per AZ
resource "aws_instance" "fck_nat" {
  for_each = toset(var.availability_zones)

  ami                    = data.aws_ami.fck_nat.id
  instance_type          = "t4g.nano"
  subnet_id              = var.public_subnet_ids[each.key]
  vpc_security_group_ids = [aws_security_group.fck_nat.id]

  source_dest_check = false

  tags = {
    Name = "fck-nat-${each.key}"
  }

  lifecycle {
    create_before_destroy = true
  }
}

# Elastic IPs for each NAT instance
resource "aws_eip" "fck_nat" {
  for_each = toset(var.availability_zones)

  domain   = "vpc"
  instance = aws_instance.fck_nat[each.key].id

  tags = {
    Name = "fck-nat-eip-${each.key}"
  }
}

# Route tables - one per AZ for fault isolation
resource "aws_route_table" "private" {
  for_each = toset(var.availability_zones)

  vpc_id = var.vpc_id

  route {
    cidr_block           = "0.0.0.0/0"
    network_interface_id = aws_instance.fck_nat[each.key].primary_network_interface_id
  }

  tags = {
    Name = "private-rt-${each.key}"
  }
}

# Associate private subnets with their AZ's route table
resource "aws_route_table_association" "private" {
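  # Key by AZ and list position rather than subnet ID: for_each keys must be
  # known at plan time, and subnet IDs aren't if the subnets are created in
  # the same apply
  for_each = {
    for item in flatten([
      for az, subnets in var.private_subnet_ids : [
        for idx, subnet in subnets : {
          key    = "${az}-${idx}"
          az     = az
          subnet = subnet
        }
      ]
    ]) : item.key => item
  }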

  subnet_id      = each.value.subnet
  route_table_id = aws_route_table.private[each.value.az].id
}

# Auto-recovery for failed instances
resource "aws_cloudwatch_metric_alarm" "auto_recover" {
  for_each = toset(var.availability_zones)

  alarm_name          = "fck-nat-auto-recover-${each.key}"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "StatusCheckFailed_System"
  namespace           = "AWS/EC2"
  period              = 60
  statistic           = "Average"
  threshold           = 0
  alarm_description   = "Auto-recover fck-nat instance if system check fails"

  alarm_actions = ["arn:aws:automate:${var.aws_region}:ec2:recover"]

  dimensions = {
    InstanceId = aws_instance.fck_nat[each.key].id
  }
}

output "nat_instance_ids" {
  value = { for az, instance in aws_instance.fck_nat : az => instance.id }
}

output "nat_public_ips" {
  value = { for az, eip in aws_eip.fck_nat : az => eip.public_ip }
}

Usage Example

# main.tf

module "fck_nat" {
  source = "./modules/fck-nat-ha"

  vpc_id             = aws_vpc.main.id
  vpc_cidr           = "10.0.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
  aws_region         = "us-east-1"

  public_subnet_ids = {
    "us-east-1a" = aws_subnet.public_a.id
    "us-east-1b" = aws_subnet.public_b.id
    "us-east-1c" = aws_subnet.public_c.id
  }

  private_subnet_ids = {
    "us-east-1a" = [aws_subnet.private_a.id]
    "us-east-1b" = [aws_subnet.private_b.id]
    "us-east-1c" = [aws_subnet.private_c.id]
  }
}

output "nat_details" {
  value = {
    instance_ids = module.fck_nat.nat_instance_ids
    public_ips   = module.fck_nat.nat_public_ips
  }
}

Deploy it:

terraform init
terraform apply

# Compute: 3 × t4g.nano (~$3/mo each) = ~$9/month
# Public IPv4: 3 × EIP (~$3.60/mo each), which NAT Gateways pay for too
# vs NAT Gateway: $142/month
# Savings: ~$133/month = ~$1,600/year

Cost Breakdown Comparison

NAT Gateway (Traditional AWS Approach)

3 NAT Gateways:
  - 3 × $0.045/hour × 720 hours     = $97.20
  - Data processing: 1TB × $0.045   = $45.00
  - Total:                            $142.20/month

Annual cost: $1,706.40

fck-nat (Optimized Approach)

3 t4g.nano instances:
  - 3 × $0.0042/hour × 720 hours    = $9.07
  - 3 × EIP (public IPv4)           = not counted (NAT Gateways need EIPs too)
  - Data processing                  = $0.00
  - Total:                            $9.07/month

Annual cost: $108.86
Savings: $1,597.54/year (~94% reduction!)

⚡ Performance Considerations

Q: Can a t4g.nano handle my traffic?

A: Almost certainly yes. Here's the math:

  • t4g.nano baseline: 5% CPU, bursts to 100%
  • Network performance: up to 5 Gbps burst, ~32 Mbps baseline
  • Typical NAT load: Very low CPU usage (mostly network I/O)

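Real-world test: A single t4g.nano easily handles:

  • 100+ Mbps in bursts (sustained rates above the ~32 Mbps baseline depend on network burst credits)
  • 10,000+ concurrent connections
  • 1TB+/month traffic (1TB/month averages only about 3 Mbps)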

If you need more, upgrade to t4g.micro (~$6/month) for a 10% CPU baseline, double the network baseline (~64 Mbps), and more burst credits.
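Averages make a useful sanity check here. This standalone Terraform sketch converts monthly traffic into an average rate and compares it with the baseline bandwidth AWS lists for t4g instances (32 Mbps for t4g.nano and 64 Mbps for t4g.micro, worth double-checking against the current EC2 documentation). 1TB/month averages about 3 Mbps. Real traffic is bursty, so treat the result as a floor: peaks above the baseline draw on network burst credits.

```hcl
# avg-throughput.tf: standalone root module, no providers needed.

variable "tb_per_month" {
  type    = number
  default = 1
}

locals {
  seconds_per_month = 720 * 3600 # 30 days, as in the cost figures

  # Decimal TB -> bits -> average megabits per second
  avg_mbps = var.tb_per_month * 1e12 * 8 / local.seconds_per_month / 1e6

  # Baseline network bandwidth per the EC2 instance specs (verify for your type)
  baseline_mbps = {
    "t4g.nano"  = 32
    "t4g.micro" = 64
  }
}

output "average_vs_baseline" {
  value = {
    average_mbps       = format("%.1f", local.avg_mbps)
    nano_baseline_used = format("%.0f%%", 100 * local.avg_mbps / local.baseline_mbps["t4g.nano"])
  }
}
```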

High Availability & Fault Tolerance

The HA setup includes:

  • Per-AZ NAT instances: each AZ has its own NAT (like NAT Gateway)
  • Auto-recovery: CloudWatch alarms recover instances whose underlying host fails (system status check)
  • Fault isolation: a failure in one AZ doesn't affect the others
  • Elastic IPs: the public IP stays the same through recovery

What happens if an instance fails?

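  1. CloudWatch detects the system status check failure (~2 minutes: two failed 1-minute periods)
  2. EC2 auto-recovery restarts the same instance on healthy hardware (~3-5 minutes), keeping its instance ID, private IP, and ENI, so the route tables stay valid
  3. The EIP stays associated, since it's still the same instance
  4. Total downtime: ~5-7 minutes (acceptable for most workloads)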
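One gap: the recover action only handles failures of the underlying host (the system status check). If the instance itself hangs, its own status check (StatusCheckFailed_Instance) fails instead. A companion alarm that reboots the instance in that case might look like this; it's a sketch that assumes it lives in the HA module next to the auto_recover alarm and reuses the same variables.

```hcl
# Companion to aws_cloudwatch_metric_alarm.auto_recover in modules/fck-nat-ha.
# Reboots an instance whose own status check fails (e.g. a hung OS),
# a case the recover action does not cover.
resource "aws_cloudwatch_metric_alarm" "auto_reboot" {
  for_each = toset(var.availability_zones)

  alarm_name          = "fck-nat-auto-reboot-${each.key}"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 3
  metric_name         = "StatusCheckFailed_Instance"
  namespace           = "AWS/EC2"
  period              = 60
  statistic           = "Maximum"
  threshold           = 0
  alarm_description   = "Reboot fck-nat instance if its instance status check fails"

  # Built-in EC2 reboot action for CloudWatch alarms
  alarm_actions = ["arn:aws:automate:${var.aws_region}:ec2:reboot"]

  dimensions = {
    InstanceId = aws_instance.fck_nat[each.key].id
  }
}
```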

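For faster self-healing, you can run each NAT instance in an Auto Scaling group of size 1 instead. This isn't zero downtime: a replacement still takes minutes to boot, and it comes up with a new ENI, so the route tables must point at a static ENI that the instance attaches at boot rather than at the instance itself (fck-nat's documentation covers an HA mode built around this).

# Optional: self-healing with an ASG of size 1. The ASG itself is free, and
# it replaces (not adds to) the standalone aws_instance.
# Assumes an aws_launch_template.fck_nat per AZ using the fck-nat AMI (not shown).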

resource "aws_autoscaling_group" "fck_nat" {
  for_each = toset(var.availability_zones)

  name                = "fck-nat-asg-${each.key}"
  vpc_zone_identifier = [var.public_subnet_ids[each.key]]
  min_size            = 1
  max_size            = 1
  desired_capacity    = 1

  launch_template {
    id      = aws_launch_template.fck_nat[each.key].id
    version = "$Latest"
  }

  tag {
    key                 = "Name"
    value               = "fck-nat-${each.key}"
    propagate_at_launch = true
  }
}
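Because every ASG replacement comes up with a fresh primary ENI, the routes need a target that outlives the instance. A minimal sketch of one static ENI per AZ for the HA module, assuming the launch template's instances attach it at boot (fck-nat's HA mode, not shown here) and have the IAM permissions to do so. An Elastic IP associated with this ENI stays with it across replacements.

```hcl
# Sketch for the ASG variant in modules/fck-nat-ha/main.tf:
# one long-lived ENI per AZ that routes point at instead of an instance.
resource "aws_network_interface" "fck_nat" {
  for_each = toset(var.availability_zones)

  subnet_id         = var.public_subnet_ids[each.key]
  security_groups   = [aws_security_group.fck_nat.id]
  source_dest_check = false # same requirement as on the NAT instance

  tags = {
    Name = "fck-nat-eni-${each.key}"
  }
}

# Then target the ENI in aws_route_table.private instead of the instance:
#   network_interface_id = aws_network_interface.fck_nat[each.key].id
```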

⚠️ When NOT to Use fck-nat

There are a few scenarios where NAT Gateway might be worth the cost:

  1. Extreme traffic: >10 Gbps sustained throughput (use NAT Gateway or multiple larger instances)
  2. Compliance requirements: Some regulations explicitly require AWS-managed services
  3. Zero-tolerance for downtime: a sub-minute failover SLA (instance recovery and ASG replacement both take minutes)
  4. No time for management: You value convenience over $1,500/year in savings

For most use cases, fck-nat is the smarter choice.

Migration Checklist

Switching from NAT Gateway to fck-nat:

Step 1: Deploy fck-nat alongside NAT Gateway

# Start with private_subnet_ids = {} so no subnet switches route tables yet
terraform apply -target=module.fck_nat

Step 2: Test with one private subnet

# Update one subnet's route table to point to fck-nat
# Test connectivity from instances in that subnet
curl -I https://api.github.com
# The reported public IP should be that AZ's fck-nat EIP
curl https://checkip.amazonaws.com
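With the HA module, moving one AZ at a time comes down to which subnets you list in private_subnet_ids. A sketch of the migration-phase module call, assuming the usage example's resource names; only the changed argument is shown. Because a subnet can have only one explicit route table association, drop its old association (to the NAT Gateway route table) in one apply and add it here in the next; in between, the subnet falls back to the VPC's main route table.

```hcl
module "fck_nat" {
  source = "./modules/fck-nat-ha"

  # vpc_id, vpc_cidr, availability_zones, aws_region and public_subnet_ids
  # stay exactly as in the usage example.

  # Phase 1: only us-east-1a's private subnet uses fck-nat.
  # Add the us-east-1b and us-east-1c entries in later applies.
  private_subnet_ids = {
    "us-east-1a" = [aws_subnet.private_a.id]
  }
}
```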

Step 3: Migrate remaining subnets

# Update route tables one AZ at a time
terraform apply

Step 4: Remove NAT Gateways

# Remove the NAT Gateway resources (and their EIPs) from your config, then:
terraform plan   # confirm only NAT Gateway resources will be destroyed
terraform apply

Step 5: Celebrate savings

# Watch your AWS bill drop next month

Pro Tips

1. Use ARM instances (t4g family)
t4g.nano is 20% cheaper than t3.nano and performs better for NAT workloads.

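2. Enable detailed monitoring (~$2/month per instance)
Worth it for 1-minute CPU and network metrics. Status check metrics, which drive auto-recovery, are already reported every minute at no charge, so this won't speed up recovery:

resource "aws_instance" "fck_nat" {
  # ...existing arguments (ami, instance_type, subnet_id, ...)
  monitoring = true  # Detailed (1-minute) CloudWatch metrics
}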

3. Tag your NAT instances
Makes cost tracking easier:

tags = {
  Name        = "fck-nat-${each.key}"
  Purpose     = "NAT"
  CostCenter  = "networking"
  Environment = "production"
}
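Rather than repeating cost tags on every resource, the AWS provider's default_tags block can apply them to everything it creates. A sketch, assuming a root-module provider block in us-east-1; resource-level tags such as Name are merged in. User-defined tags only show up in Cost Explorer after you activate them as cost allocation tags in the Billing console.

```hcl
# Root module: every taggable resource created through this provider
# (NAT instances, EIPs, alarms, ...) receives these tags.
provider "aws" {
  region = "us-east-1"

  default_tags {
    tags = {
      CostCenter  = "networking"
      Environment = "production"
    }
  }
}
```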

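4. Set up traffic alerts
Get notified if NAT traffic spikes unexpectedly:

resource "aws_cloudwatch_metric_alarm" "nat_traffic" {
  for_each = toset(var.availability_zones)  # one alarm per NAT instance

  alarm_name          = "high-nat-traffic-${each.key}"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "NetworkOut"
  namespace           = "AWS/EC2"
  period              = 3600
  statistic           = "Sum"
  threshold           = 100000000000  # 100 GB/hour, in bytes
  alarm_description   = "NAT instance sent >100GB in an hour"

  # var.alerts_topic_arn is a hypothetical SNS topic ARN; without an
  # action the alarm changes state but notifies no one
  alarm_actions = [var.alerts_topic_arn]

  dimensions = {
    InstanceId = aws_instance.fck_nat[each.key].id
  }
}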
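The alarm above watches traffic, not spend. For an alert on the bill itself, AWS Budgets can email you once actual costs cross a threshold. A minimal sketch, assuming a $50 monthly limit and a placeholder address (ops@example.com); the Service filter covers all EC2 instance spend in the account, not just the NAT instances, so size the limit accordingly.

```hcl
# AWS Budgets alert: email when actual EC2 instance spend passes 80% of
# a monthly limit. Limit and address are placeholders.
resource "aws_budgets_budget" "ec2_monthly" {
  name         = "ec2-compute-monthly"
  budget_type  = "COST"
  limit_amount = "50"
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  cost_filter {
    name   = "Service"
    values = ["Amazon Elastic Compute Cloud - Compute"]
  }

  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 80
    threshold_type             = "PERCENTAGE"
    notification_type          = "ACTUAL"
    subscriber_email_addresses = ["ops@example.com"]
  }
}
```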

Quick Start

Want to try it right now?

# Clone the fck-nat project (AMI sources and docs)
git clone https://github.com/AndrewGuenther/fck-nat.git

# Or use the examples from this article
mkdir fck-nat-setup
cd fck-nat-setup

# Copy the HA module into modules/fck-nat-ha/main.tf and the usage example into main.tf
# Update variables with your VPC/subnet IDs

terraform init
terraform plan  # Review what will be created
terraform apply # Deploy it!

# Monitor your NAT instance
aws ec2 describe-instances \
  --filters "Name=tag:Name,Values=fck-nat-*" \
  --query 'Reservations[].Instances[].[InstanceId,State.Name,PublicIpAddress]' \
  --output table

Real-World Success Story

Before (NAT Gateway setup):

  • 3 NAT Gateways across us-east-1a/b/c
  • 2TB/month average traffic
  • Monthly cost: $97 (hourly) + $90 (data) = $187/month

After (fck-nat setup):

  • 3 t4g.nano instances
  • Same 2TB/month traffic
  • Monthly cost: $9/month

Annual savings: $2,136

Time to implement: 2 hours
ROI: Literally infinite (one-time 2-hour investment)

Summary

Factor              NAT Gateway          fck-nat                         Winner
Cost (3 AZs)        $142/month           ~$9/month                       fck-nat
Processing fees     $0.045/GB            $0/GB                           fck-nat
Setup complexity    Low                  Medium                          NAT Gateway
Performance         Up to 100 Gbps       Up to 5 Gbps burst (t4g.nano)   NAT Gateway*
Management          Zero                 Minimal                         NAT Gateway
Annual savings      -                    ~$1,600                         fck-nat

*For most workloads, t4g.nano's 5 Gbps bursts over a ~32 Mbps baseline are enough; larger instance types raise both limits

Bottom line: Unless you have extreme requirements, fck-nat saves you $1,500+/year with minimal effort.

Stop overpaying for NAT. Your AWS bill will thank you.


Migrated from NAT Gateway to fck-nat? How much are you saving? Share in the comments!

Follow for more AWS cost optimization with Terraform! ⚡
