Service Operations Runbook
Deploy a New Version
Via GitHub Actions (Standard)
Pushing to the main branch triggers an automatic deployment.
Manual Deploy (Emergency)

    # Update service with new image
    aws ecs update-service \
      --cluster autom8-cluster \
      --service autom8y-{service} \
      --force-new-deployment

Rollback to Previous Version

    # List task definition revisions, newest first
    aws ecs list-task-definitions --family-prefix autom8y-{service} --sort DESC

    # Update to previous revision
    aws ecs update-service \
      --cluster autom8-cluster \
      --service autom8y-{service} \
      --task-definition autom8y-{service}:{previous-revision}

    # Wait for the rollback to stabilize
    aws ecs wait services-stable --cluster autom8-cluster --services autom8y-{service}

Scale Service
Immediate (via CLI)

    aws ecs update-service \
      --cluster autom8-cluster \
      --service autom8y-{service} \
      --desired-count 3

Permanent (via Terraform)
Update desired_count in your service module and apply.
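
The immediate CLI scaling step can be wrapped so that a mistyped count is rejected before AWS is called and the command returns only once the service has settled. This is a minimal sketch, not part of the existing tooling: `scale_service` is a hypothetical helper name, and the `CLUSTER` default simply reuses the cluster name from this runbook.

```shell
#!/usr/bin/env bash
# Sketch: scale an ECS service and wait until it is stable.
# Assumption: the cluster name from this runbook; override with CLUSTER=...
CLUSTER="${CLUSTER:-autom8-cluster}"

# scale_service is a hypothetical helper, not an AWS CLI command.
scale_service() {
  local service="$1" count="$2"
  # Reject anything that is not a non-negative integer before calling AWS.
  case "$count" in
    ''|*[!0-9]*)
      echo "desired count must be a non-negative integer" >&2
      return 2
      ;;
  esac
  # Apply the new desired count and echo it back for confirmation.
  aws ecs update-service \
    --cluster "$CLUSTER" \
    --service "$service" \
    --desired-count "$count" \
    --query 'service.desiredCount' --output text || return 1
  # Block until the running count matches and deployments have settled.
  aws ecs wait services-stable --cluster "$CLUSTER" --services "$service"
}
```

Usage would look like `scale_service autom8y-{service} 3`. A count set this way is not recorded in Terraform, so the permanent change still belongs in the service module.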
View Logs

    # Stream logs
    aws logs tail /ecs/autom8y-{service} --follow

    # Last hour of logs
    aws logs tail /ecs/autom8y-{service} --since 1h

CloudWatch Console: CloudWatch > Log groups > /ecs/autom8y-{service}
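
When the full stream is too noisy, the same `aws logs tail` command can be narrowed with a CloudWatch filter pattern. The sketch below assumes AWS CLI v2 (where `logs tail` is available) and the `/ecs/autom8y-{service}` log group naming used above; `tail_errors`, the `ERROR` default pattern, and the one-hour default window are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch: show only log lines matching a pattern for one service.
# tail_errors is a hypothetical helper; pass the bare service name (e.g. "api").
tail_errors() {
  local service="$1" since="${2:-1h}" pattern="${3:-ERROR}"
  # --filter-pattern uses CloudWatch Logs filter syntax; a bare word matches that term.
  aws logs tail "/ecs/autom8y-${service}" \
    --since "$since" \
    --filter-pattern "$pattern" \
    --format short
}
```

For example, `tail_errors api 30m Traceback` would look at the last 30 minutes for lines containing `Traceback`, assuming the application logs that term.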
Check Service Health

    # ECS service status
    aws ecs describe-services \
      --cluster autom8-cluster \
      --services autom8y-{service} \
      --query 'services[0].{Status:status,Running:runningCount,Desired:desiredCount}'

    # Target group health
    aws elbv2 describe-target-health \
      --target-group-arn {target-group-arn}

    # Recent events
    aws ecs describe-services \
      --cluster autom8-cluster \
      --services autom8y-{service} \
      --query 'services[0].events[:5]'

Restart Service
Force a new deployment without changing the image:

    aws ecs update-service \
      --cluster autom8-cluster \
      --service autom8y-{service} \
      --force-new-deployment

Debugging Failed Deployments
Task Fails to Start

    # Check stopped task reason
    aws ecs describe-tasks \
      --cluster autom8-cluster \
      --tasks $(aws ecs list-tasks \
        --cluster autom8-cluster \
        --service-name autom8y-{service} \
        --desired-status STOPPED \
        --query 'taskArns[0]' --output text)

Common causes:
- Image not found: Check the ECR repository and image tag
- Out of memory: Increase memory in the module
- Permission denied: Check the task execution role
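
The stopped-task lookup above can be made a little more robust: when no stopped task exists, `--output text` renders the empty result as `None`, which `describe-tasks` would then reject. The following sketch guards against that and trims the output to the fields that usually distinguish the causes listed above. `stopped_reason` is a hypothetical helper, and the `CLUSTER` default is taken from this runbook.

```shell
#!/usr/bin/env bash
# Sketch: print why the most recently listed stopped task of a service stopped.
CLUSTER="${CLUSTER:-autom8-cluster}"

# stopped_reason is a hypothetical helper; pass the full service name.
stopped_reason() {
  local service="$1" task_arn
  task_arn=$(aws ecs list-tasks \
    --cluster "$CLUSTER" \
    --service-name "$service" \
    --desired-status STOPPED \
    --query 'taskArns[0]' --output text)
  # With --output text, a missing first element is printed as "None".
  if [ -z "$task_arn" ] || [ "$task_arn" = "None" ]; then
    echo "no stopped tasks found for $service" >&2
    return 1
  fi
  # Keep only the task-level stop reason and per-container exit details.
  aws ecs describe-tasks \
    --cluster "$CLUSTER" \
    --tasks "$task_arn" \
    --query 'tasks[0].{stopCode:stopCode,stoppedReason:stoppedReason,containers:containers[].{name:name,exitCode:exitCode,reason:reason}}' \
    --output json
}
```

The task-level `stoppedReason` and the per-container `reason` and `exitCode` are the fields to compare against the common causes above.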
Health Check Failures
- Verify the health endpoint works locally
- Check container logs for startup errors
- Verify security group allows ALB traffic
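
To check the first item, one option is to run the container locally and poll its health endpoint until it answers. This is a rough sketch under stated assumptions: `wait_healthy` is a hypothetical helper, the URL is supplied by the caller because the runbook does not name the health path or port, and a plain HTTP 200 is assumed to mean healthy (the ALB target group may be configured to accept other codes).

```shell
#!/usr/bin/env bash
# Sketch: poll a health URL until it returns HTTP 200 or attempts run out.
# wait_healthy is a hypothetical helper: wait_healthy URL [ATTEMPTS] [DELAY_SECONDS]
wait_healthy() {
  local url="$1" attempts="${2:-10}" delay="${3:-3}" i=1 code
  while [ "$i" -le "$attempts" ]; do
    # -w prints only the status code; connection failures are reported as 000.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url") || code=000
    if [ "$code" = "200" ]; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    echo "attempt $i/$attempts: HTTP $code" >&2
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

A call such as `wait_healthy http://localhost:8080/health` uses a placeholder port and path; substitute whatever the service actually exposes.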
ALB Target Registration Timeout
- Check ECS service events for errors
- Verify subnet has NAT gateway for ECR access
- Check task execution role permissions
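
Scanning service events by eye gets tedious during an incident, so the events query from the health section can be piped through a keyword filter. The sketch below is illustrative only: `service_events` and `problem_events` are hypothetical helpers, and the keyword list is an assumption about which event messages matter, so widen it if it hides something relevant.

```shell
#!/usr/bin/env bash
# Sketch: list recent ECS service events and keep only the suspicious ones.
CLUSTER="${CLUSTER:-autom8-cluster}"

# Print the most recent events as "timestamp<TAB>message", one per line.
service_events() {
  local service="$1" limit="${2:-10}"
  aws ecs describe-services \
    --cluster "$CLUSTER" \
    --services "$service" \
    --query "services[0].events[:${limit}].[createdAt,message]" \
    --output text
}

# Keep events whose message mentions a failure-like keyword (assumed list).
problem_events() {
  service_events "$@" | grep -Ei 'unable|unhealthy|failed|error'
}
```

`problem_events autom8y-{service} 20` exits non-zero when nothing matches, which makes it usable as a quick check in a script.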
AWS CLI Quick Reference
| Command | Description |
|---|---|
| aws ecs list-services --cluster autom8-cluster | List all services |
| aws ecs describe-services --cluster autom8-cluster --services {name} | Service details |
| aws ecs list-tasks --cluster autom8-cluster --service-name {name} | Running tasks |
| aws logs tail /ecs/{name} --follow | Stream logs |
| aws ecs update-service --cluster autom8-cluster --service {name} --force-new-deployment | Restart |
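
As a companion to the quick reference, the rollback steps from earlier in this runbook can be combined into one function that finds the revision registered just before the one currently in use. Treat this as a sketch under assumptions: `previous_task_definition` and `rollback_service` are hypothetical helpers, the task definition family is assumed to match the service name, and "previous" means the next-older registered revision, which is not necessarily the last one that was deployed.

```shell
#!/usr/bin/env bash
# Sketch: roll a service back to the task definition revision before the current one.
CLUSTER="${CLUSTER:-autom8-cluster}"

# Print the ARN of the revision registered immediately before the one in use.
# Assumption: the task definition family equals the service name.
previous_task_definition() {
  local service="$1" current
  current=$(aws ecs describe-services \
    --cluster "$CLUSTER" --services "$service" \
    --query 'services[0].taskDefinition' --output text) || return 1
  # Newest first, one ARN per line; print the line that follows the current ARN.
  aws ecs list-task-definitions \
    --family-prefix "$service" --sort DESC \
    --query 'taskDefinitionArns[]' --output text \
    | tr '\t' '\n' \
    | awk -v cur="$current" 'found { print; exit } $0 == cur { found = 1 }'
}

# Point the service at the previous revision and wait for it to stabilize.
rollback_service() {
  local service="$1" previous
  previous=$(previous_task_definition "$service")
  if [ -z "$previous" ]; then
    echo "no earlier revision found for $service" >&2
    return 1
  fi
  echo "rolling back $service to $previous"
  aws ecs update-service \
    --cluster "$CLUSTER" --service "$service" \
    --task-definition "$previous" \
    --query 'service.taskDefinition' --output text || return 1
  aws ecs wait services-stable --cluster "$CLUSTER" --services "$service"
}
```

Running `previous_task_definition autom8y-{service}` on its own first shows which revision would be used before anything is changed.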