Terraform AWS Best Practices for Enterprise Infrastructure
Infrastructure as Code (IaC) has revolutionized how we manage cloud resources, and Terraform has emerged as the leading tool for multi-cloud infrastructure management. After managing hundreds of Terraform deployments for enterprise clients, we've compiled the essential best practices that ensure scalable, maintainable, and secure infrastructure.
Why Terraform Best Practices Matter
Terraform's flexibility is both its greatest strength and a potential weakness. Without proper structure and practices, Terraform codebases can become difficult to maintain, insecure, and error-prone. Following established best practices ensures:
- Maintainability - Code that's easy to understand and modify
- Security - Proper secret management and access controls
- Scalability - Architecture that grows with your organization
- Reliability - Consistent, repeatable deployments
Project Structure and Organization
1. Directory Structure
Organize your Terraform code with a clear, logical structure:
terraform/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── terraform.tfvars
│   │   └── outputs.tf
│   ├── staging/
│   └── production/
├── modules/
│   ├── vpc/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   └── README.md
│   ├── ec2/
│   └── rds/
├── shared/
│   ├── data.tf
│   └── providers.tf
└── scripts/
    ├── deploy.sh
    └── validate.sh
2. Environment Separation
Keep environments completely separate to prevent accidental changes:
# environments/production/main.tf
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state-prod"
    key            = "infrastructure/terraform.tfstate"
    region         = "us-west-2"
    dynamodb_table = "terraform-locks-prod"
    encrypt        = true
  }

  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      Environment = "production"
      Project     = "myapp"
      Owner       = "platform-team"
      Terraform   = "true"
      CostCenter  = "engineering"
    }
  }
}
Module Development Best Practices
1. Write Reusable Modules
Create modules that can be used across multiple environments:
# modules/vpc/main.tf
resource "aws_vpc" "main" {
  cidr_block           = var.cidr_block
  enable_dns_hostnames = var.enable_dns_hostnames
  enable_dns_support   = var.enable_dns_support

  tags = merge(var.tags, {
    Name = var.name
  })
}

resource "aws_subnet" "private" {
  count = length(var.private_subnets)

  vpc_id                  = aws_vpc.main.id
  cidr_block              = var.private_subnets[count.index]
  availability_zone       = data.aws_availability_zones.available.names[count.index]
  map_public_ip_on_launch = false

  tags = merge(var.tags, {
    Name = "${var.name}-private-${count.index + 1}"
    Type = "private"
  })
}

data "aws_availability_zones" "available" {
  state = "available"
}
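One refinement worth considering: count ties each subnet to its position in the list, so removing the first CIDR forces every later subnet to be destroyed and recreated. Keying the resource with for_each avoids that churn. A minimal sketch of the same resource under the same variables (it replaces the count-based definition above):

# Key subnets by CIDR so reordering the list doesn't recreate them
resource "aws_subnet" "private" {
  for_each = { for idx, cidr in var.private_subnets : cidr => idx }

  vpc_id                  = aws_vpc.main.id
  cidr_block              = each.key
  availability_zone       = data.aws_availability_zones.available.names[each.value]
  map_public_ip_on_launch = false

  tags = merge(var.tags, {
    Name = "${var.name}-private-${each.value + 1}"
    Type = "private"
  })
}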
2. Comprehensive Variable Validation
Add validation rules to catch errors early:
# modules/vpc/variables.tf
variable "cidr_block" {
  description = "CIDR block for the VPC"
  type        = string

  validation {
    condition     = can(cidrhost(var.cidr_block, 0))
    error_message = "The cidr_block must be a valid IPv4 CIDR block."
  }
}

variable "environment" {
  description = "Environment name"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "Environment must be one of: dev, staging, production."
  }
}

variable "instance_types" {
  description = "Allowed EC2 instance types"
  type        = list(string)
  default     = ["t3.micro", "t3.small", "t3.medium"]

  validation {
    condition = alltrue([
      for instance_type in var.instance_types :
      can(regex("^[a-z][0-9][a-z]?\\.", instance_type))
    ])
    error_message = "All instance types must follow the AWS instance type format (e.g. t3.micro)."
  }
}
3. Clear Outputs and Documentation
# modules/vpc/outputs.tf
output "vpc_id" {
  description = "ID of the created VPC"
  value       = aws_vpc.main.id
}

output "private_subnet_ids" {
  description = "List of private subnet IDs"
  value       = aws_subnet.private[*].id
}

output "vpc_cidr_block" {
  description = "CIDR block of the VPC"
  value       = aws_vpc.main.cidr_block
}
State Management and Security
1. Remote State with Locking
Always use remote state with locking for team environments:
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "environments/production/terraform.tfstate"
    region         = "us-west-2"
    dynamodb_table = "terraform-locks"
    encrypt        = true

    # Additional security
    kms_key_id = "arn:aws:kms:us-west-2:123456789012:key/12345678-1234-1234-1234-123456789012"
  }
}
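If the bucket or key differs per environment, a partial backend configuration keeps those values out of the versioned code. A sketch, assuming a file named prod.s3.tfbackend passed to terraform init -backend-config=prod.s3.tfbackend:

# prod.s3.tfbackend (same settings could also be passed as -backend-config flags)
bucket         = "mycompany-terraform-state"
key            = "environments/production/terraform.tfstate"
region         = "us-west-2"
dynamodb_table = "terraform-locks"
encrypt        = true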
2. State Bucket Security
Harden the state bucket itself with encryption, versioning, and a public access block:
# terraform-state-bucket.tf
resource "aws_s3_bucket" "terraform_state" {
  bucket = "mycompany-terraform-state"
}

resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.terraform_state.arn
    }
  }
}

resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_public_access_block" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
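A bucket policy adds a further layer on top of these settings. A minimal sketch of one common control, denying unencrypted transport; adapt the statements to your own IAM model:

resource "aws_s3_bucket_policy" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid       = "DenyInsecureTransport"
        Effect    = "Deny"
        Principal = "*"
        Action    = "s3:*"
        Resource = [
          aws_s3_bucket.terraform_state.arn,
          "${aws_s3_bucket.terraform_state.arn}/*"
        ]
        # Reject any request that isn't made over TLS
        Condition = {
          Bool = { "aws:SecureTransport" = "false" }
        }
      }
    ]
  })
}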
Security Best Practices
1. Secret Management
Never hardcode secrets in Terraform files:
# ❌ Bad: Hardcoded secrets
resource "aws_db_instance" "main" {
  password = "super-secret-password" # DON'T DO THIS
}

# ✅ Good: Use AWS Secrets Manager
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "production/database/password"
}

resource "aws_db_instance" "main" {
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
}

# ✅ Alternative: Use random password with Secrets Manager
resource "random_password" "db_password" {
  length  = 32
  special = true
}

resource "aws_secretsmanager_secret" "db_password" {
  name = "production/database/password"
}

resource "aws_secretsmanager_secret_version" "db_password" {
  secret_id     = aws_secretsmanager_secret.db_password.id
  secret_string = random_password.db_password.result
}
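Note that both approaches above still write the password value into Terraform state. Recent AWS provider releases (5.x onward) let RDS generate and rotate the master password in Secrets Manager itself, keeping the value out of state entirely. A sketch; the identifier and sizing values here are placeholders:

resource "aws_db_instance" "main" {
  identifier        = "myapp-production"
  engine            = "postgres"
  instance_class    = "db.t3.medium"
  allocated_storage = 20
  username          = "app"

  # RDS creates the password and stores it in Secrets Manager;
  # no password argument appears in configuration or state
  manage_master_user_password = true
}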
2. Least Privilege IAM
Apply the principle of least privilege to IAM policies:
# Create specific IAM roles rather than using overly broad permissions
resource "aws_iam_role" "app_role" {
  name = "${var.app_name}-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "ec2.amazonaws.com"
        }
      }
    ]
  })
}

resource "aws_iam_policy" "app_policy" {
  name = "${var.app_name}-policy"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "s3:GetObject",
          "s3:PutObject"
        ]
        Resource = "${aws_s3_bucket.app_bucket.arn}/*"
      },
      {
        Effect = "Allow"
        Action = [
          "secretsmanager:GetSecretValue"
        ]
        Resource = aws_secretsmanager_secret.app_secrets.arn
      }
    ]
  })
}
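On their own, the role and policy above aren't connected; an attachment binds them, and an instance profile makes the role usable from EC2. A short sketch completing the example:

resource "aws_iam_role_policy_attachment" "app" {
  role       = aws_iam_role.app_role.name
  policy_arn = aws_iam_policy.app_policy.arn
}

resource "aws_iam_instance_profile" "app" {
  name = "${var.app_name}-profile"
  role = aws_iam_role.app_role.name
}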
Performance and Reliability
1. Resource Dependencies
Use explicit dependencies when implicit ones aren't sufficient:
resource "aws_instance" "app" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.medium"
  subnet_id     = aws_subnet.private[0].id

  # The reference below already gives Terraform an implicit dependency on the
  # security group; reserve depends_on for relationships Terraform can't infer,
  # such as a policy attachment the instance needs at boot
  depends_on             = [aws_security_group.app]
  vpc_security_group_ids = [aws_security_group.app.id]

  user_data = base64encode(templatefile("${path.module}/user_data.sh", {
    app_version = var.app_version
  }))
}
2. Use Data Sources Effectively
Minimize API calls by using data sources efficiently:
# ✅ Good: Single data source call
data "aws_availability_zones" "available" {
  state = "available"
}

locals {
  az_count = length(data.aws_availability_zones.available.names)
}
# ❌ Bad: Multiple API calls
# data "aws_availability_zone" "az1" { name = "us-west-2a" }
# data "aws_availability_zone" "az2" { name = "us-west-2b" }
3. Error Handling and Validation
Implement proper error handling:
# Use lifecycle rules to prevent accidental deletion
resource "aws_s3_bucket" "critical_data" {
  bucket = "mycompany-critical-data"

  lifecycle {
    prevent_destroy = true
  }
}

# Use preconditions for validation
resource "aws_instance" "app" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = var.instance_type

  lifecycle {
    precondition {
      condition     = contains(["t3.micro", "t3.small", "t3.medium"], var.instance_type)
      error_message = "Instance type must be one of the approved types."
    }
  }
}
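Preconditions guard inputs; postconditions (Terraform 1.2+) assert facts about what a resource or data source actually returned. A sketch, assuming the Ubuntu AMI lookup referenced above:

data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }

  lifecycle {
    postcondition {
      condition     = self.architecture == "x86_64"
      error_message = "The selected AMI must be x86_64."
    }
  }
}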
CI/CD Integration
1. Automated Plan and Apply
Create a pipeline that validates and plans every pull request and applies only from main:
# .github/workflows/terraform.yml
name: Terraform

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  terraform:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.6.0

      # Without credentials the steps below cannot reach AWS;
      # swap in OIDC role assumption if your account supports it
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ secrets.AWS_REGION }}

      - name: Terraform Init
        run: terraform init
        working-directory: ./environments/production

      - name: Terraform Validate
        run: terraform validate
        working-directory: ./environments/production

      - name: Terraform Plan
        run: terraform plan -no-color
        working-directory: ./environments/production
        env:
          TF_VAR_aws_region: ${{ secrets.AWS_REGION }}

      - name: Terraform Apply
        if: github.ref == 'refs/heads/main'
        run: terraform apply -auto-approve
        working-directory: ./environments/production
2. Policy as Code
Implement security and compliance checks:
#!/bin/bash
# scripts/validate.sh
# Check for security best practices
echo "Running security checks..."

# Check for hardcoded secrets
if grep -rE "password[[:space:]]*=" . --include="*.tf" | grep -v "random_password"; then
  echo "❌ Found potential hardcoded passwords"
  exit 1
fi

# Check for public S3 buckets
if grep -r "acl.*public" . --include="*.tf"; then
  echo "❌ Found public S3 bucket ACLs"
  exit 1
fi

# Run terraform fmt check
if ! terraform fmt -check -recursive; then
  echo "❌ Terraform files are not properly formatted"
  exit 1
fi

# Run tflint for additional checks
if command -v tflint &> /dev/null; then
  tflint
fi

echo "✅ All security checks passed"
Advanced Patterns
1. Workspace Strategy
Workspaces offer an alternative to the per-environment directories shown earlier, letting one configuration serve several environments:
# Create the production workspace (first run only; 'new' also switches to it)
terraform workspace new production

# On subsequent runs, just switch to it
terraform workspace select production

# Apply with workspace-specific variables
terraform apply -var-file="production.tfvars"
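Inside the configuration, the terraform.workspace value lets one set of files vary per environment. A minimal sketch:

locals {
  environment = terraform.workspace

  # Size resources down outside production
  instance_type = terraform.workspace == "production" ? "t3.large" : "t3.micro"
}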
2. Dynamic Configuration
Use locals and functions for dynamic resource creation:
locals {
  # Create subnets based on AZ count
  subnet_count = min(length(data.aws_availability_zones.available.names), 3)

  # Generate CIDR blocks dynamically
  private_subnets = [
    for i in range(local.subnet_count) :
    cidrsubnet(var.vpc_cidr, 8, i + 10)
  ]

  public_subnets = [
    for i in range(local.subnet_count) :
    cidrsubnet(var.vpc_cidr, 8, i + 20)
  ]

  # Common tags
  common_tags = {
    Environment = var.environment
    Project     = var.project_name
    Owner       = var.owner
    Terraform   = "true"
    CreatedBy   = "terraform"
  }
}
3. Module Composition
Compose larger infrastructure from smaller modules:
# environments/production/main.tf
module "vpc" {
  source = "../../modules/vpc"

  name               = "${var.project_name}-${var.environment}"
  cidr_block         = var.vpc_cidr
  private_subnets    = local.private_subnets
  public_subnets     = local.public_subnets
  enable_nat_gateway = true

  tags = local.common_tags
}

module "app_cluster" {
  source = "../../modules/ecs-cluster"

  cluster_name    = "${var.project_name}-${var.environment}"
  vpc_id          = module.vpc.vpc_id
  private_subnets = module.vpc.private_subnet_ids

  tags = local.common_tags
}

module "database" {
  source = "../../modules/rds"

  identifier    = "${var.project_name}-${var.environment}"
  vpc_id        = module.vpc.vpc_id
  subnet_ids    = module.vpc.private_subnet_ids
  allowed_cidrs = [module.vpc.vpc_cidr_block]

  tags = local.common_tags
}
Monitoring and Maintenance
1. State Drift Detection
Regularly check for configuration drift:
#!/bin/bash
# scripts/drift-check.sh
echo "Checking for configuration drift..."

# -detailed-exitcode: 0 = no changes, 1 = error, 2 = drift detected
terraform plan -refresh-only -detailed-exitcode
case $? in
  0)
    echo "✅ No configuration drift detected"
    ;;
  2)
    echo "❌ Configuration drift detected!"
    echo "Run 'terraform plan' to see changes"
    exit 1
    ;;
  *)
    echo "❌ terraform plan failed"
    exit 1
    ;;
esac
2. Cost Monitoring
Track infrastructure costs:
# Add cost allocation tags
resource "aws_instance" "app" {
  # ... other configuration

  tags = merge(local.common_tags, {
    CostCenter = var.cost_center
    Team       = var.team
    Service    = var.service_name
  })
}
Common Pitfalls to Avoid
- Large State Files - Break down large configurations into smaller, focused modules; see the remote-state sketch after this list for how split stacks share outputs
- Circular Dependencies - Carefully plan resource dependencies
- Hardcoded Values - Use variables and data sources for flexibility
- Missing Error Handling - Implement proper lifecycle rules and validation
- Poor Naming Conventions - Use consistent, descriptive naming patterns
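When you do split a large configuration, the pieces can still share data: the terraform_remote_state data source reads another root module's outputs. A minimal sketch, assuming the network stack's state lives at the key shown:

# In the application stack: read outputs from the separately managed network stack
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "mycompany-terraform-state"
    key    = "environments/production/network/terraform.tfstate"
    region = "us-west-2"
  }
}

# Consume its outputs as if they were local values
locals {
  vpc_id             = data.terraform_remote_state.network.outputs.vpc_id
  private_subnet_ids = data.terraform_remote_state.network.outputs.private_subnet_ids
}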
Conclusion
Following these Terraform best practices will help you build maintainable, secure, and scalable infrastructure. The key is to start with good foundations—proper project structure, security practices, and automation—and then build upon them as your infrastructure grows.
Remember that Infrastructure as Code is not just about automation; it's about bringing software engineering practices to infrastructure management. Treat your Terraform code with the same care you would give to any critical application code.
Need help implementing Terraform best practices in your organization? Our team specializes in Infrastructure as Code implementations and can help you build robust, scalable infrastructure. Contact us for a consultation.