Terraform script not running correctly - error: Termination Reason: Client.InstanceInitiatedShutdown: Instance initiated shutdown


Hi, I have a Terraform script (below) which causes instances to consistently terminate before their status checks complete. For each instance, the Auto Scaling group activity history shows:

'Launching a new EC2 instance: i-04d7eb490406c33dc. Status Reason: Instance became unhealthy while waiting for instance to be in InService state. Termination Reason: Client.InstanceInitiatedShutdown: Instance initiated shutdown

At 2024-01-12T11:41:03Z an instance was launched in response to an unhealthy instance needing to be replaced.'

Below is the script - when I spin up an instance from the AMI via the console, it works fine:

provider "aws" { region = "eu-west-1" }

Create a VPC

resource "aws_vpc" "my_vpc" { cidr_block = "10.0.0.0/16" enable_dns_support = true enable_dns_hostnames = true }

Create Internet Gateway

resource "aws_internet_gateway" "gw" { vpc_id = aws_vpc.my_vpc.id }

# Create four subnets
resource "aws_subnet" "public_subnet_1" {
  vpc_id                  = aws_vpc.my_vpc.id
  cidr_block              = "10.0.1.0/24"
  availability_zone       = "eu-west-1a"
  map_public_ip_on_launch = true
}

resource "aws_subnet" "public_subnet_2" {
  vpc_id                  = aws_vpc.my_vpc.id
  cidr_block              = "10.0.2.0/24"
  availability_zone       = "eu-west-1b"
  map_public_ip_on_launch = true
}

resource "aws_subnet" "private_subnet_1" {
  vpc_id            = aws_vpc.my_vpc.id
  cidr_block        = "10.0.3.0/24"
  availability_zone = "eu-west-1a"
}

resource "aws_subnet" "private_subnet_2" {
  vpc_id            = aws_vpc.my_vpc.id
  cidr_block        = "10.0.4.0/24"
  availability_zone = "eu-west-1b"
}

# Create Route Table for Public Subnets
resource "aws_route_table" "public_rt" {
  vpc_id = aws_vpc.my_vpc.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.gw.id
  }
}

# Associate Route Table with Public Subnets
resource "aws_route_table_association" "public_rt_assoc1" {
  subnet_id      = aws_subnet.public_subnet_1.id
  route_table_id = aws_route_table.public_rt.id
}

resource "aws_route_table_association" "public_rt_assoc2" {
  subnet_id      = aws_subnet.public_subnet_2.id
  route_table_id = aws_route_table.public_rt.id
}

# Create a Security Group for the EC2 instances
resource "aws_security_group" "ec2_sg" {
  vpc_id = aws_vpc.my_vpc.id

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# Create a Load Balancer
resource "aws_lb" "my_lb" {
  name                       = "my-lb"
  internal                   = false
  load_balancer_type         = "application"
  security_groups            = [aws_security_group.ec2_sg.id]
  subnets                    = [aws_subnet.public_subnet_1.id, aws_subnet.public_subnet_2.id]
  enable_deletion_protection = false
}

resource "aws_lb_listener" "front_end" {
  load_balancer_arn = aws_lb.my_lb.arn
  port              = "80"
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.my_tg.arn
  }
}

resource "aws_lb_target_group" "my_tg" {
  name     = "my-tg"
  port     = 80
  protocol = "HTTP"
  vpc_id   = aws_vpc.my_vpc.id

  health_check {
    enabled             = true
    interval            = 30
    path                = "/"
    port                = "traffic-port"
    protocol            = "HTTP"
    healthy_threshold   = 3
    unhealthy_threshold = 3
    timeout             = 5
    matcher             = "200-299"
  }
}

# Define Launch Template
resource "aws_launch_template" "my_lt" {
  name_prefix   = "lt-"
  image_id      = "ami-05d4867ed58f446d9"
  instance_type = "t2.micro"

  block_device_mappings {
    device_name = "/dev/sda1"

    ebs {
      volume_size           = 20
      volume_type           = "gp3"
      throughput            = 125
      delete_on_termination = true
      encrypted             = false
    }
  }

  network_interfaces {
    associate_public_ip_address = true
    security_groups             = [aws_security_group.ec2_sg.id]
  }
}

# Create Auto Scaling Group
resource "aws_autoscaling_group" "my_asg" {
  vpc_zone_identifier = [aws_subnet.public_subnet_1.id, aws_subnet.public_subnet_2.id]
  max_size            = 8
  min_size            = 2
  desired_capacity    = 2

  launch_template {
    id      = aws_launch_template.my_lt.id
    version = "$Latest"
  }

  target_group_arns = [aws_lb_target_group.my_tg.arn]

  tag {
    key                 = "Name"
    value               = "my-instance"
    propagate_at_launch = true
  }

  health_check_type         = "ELB"
  health_check_grace_period = 900
  force_delete              = true
}

# Auto Scaling Policy for scale up
resource "aws_autoscaling_policy" "scale_up" {
  name                   = "scale-up-on-high-cpu"
  scaling_adjustment     = 1
  adjustment_type        = "ChangeInCapacity"
  cooldown               = 300
  autoscaling_group_name = aws_autoscaling_group.my_asg.name
}

# CloudWatch Metric Alarm for high CPU
resource "aws_cloudwatch_metric_alarm" "high_cpu_alarm" {
  alarm_name          = "high-cpu-usage"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = 300
  statistic           = "Average"
  threshold           = 65
  alarm_description   = "This alarm monitors EC2 CPU usage and triggers scaling up."

  dimensions = {
    AutoScalingGroupName = aws_autoscaling_group.my_asg.name
  }

  alarm_actions = [aws_autoscaling_policy.scale_up.arn]
}

CD
Asked 4 months ago · Viewed 454 times

1 Answer

It seems you are relying on the ELB health check to mark the instance as healthy. Because the ALB isn't marking it healthy, the ASG terminates the instance. Does the AMI you are using have a working, valid HTTP server running on it and listening on port 80? If not, it will never become healthy.
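If the AMI doesn't already bring up a web server on boot, one option is to bootstrap one through user_data in the launch template. A minimal sketch, assuming an Amazon Linux-style AMI where dnf and httpd are available (package manager, package name and paths are assumptions, adjust for your image):

resource "aws_launch_template" "my_lt" {
  # ... existing arguments from the question ...

  # Install and start a simple web server so the ALB health check on "/" gets a 200.
  # aws_launch_template requires user_data to be base64-encoded.
  user_data = base64encode(<<-EOF
    #!/bin/bash
    dnf install -y httpd
    echo "OK" > /var/www/html/index.html
    systemctl enable --now httpd
  EOF
  )
}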

If you changed the health check type to EC2, then I think you wouldn't have an issue: health_check_type = "EC2"
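In the ASG resource from the question, that is a change to one existing line (shown here in context only; the other arguments stay as they are):

resource "aws_autoscaling_group" "my_asg" {
  # ... existing arguments ...

  # Use EC2 status checks instead of ELB target health to decide instance health.
  health_check_type         = "EC2"
  health_check_grace_period = 900
}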

When you create the instance manually in the console, you are not adding it to an ASG and relying on an ELB health check, which is why it appears to work there.

Be wary of making your security group so open. I'd recommend having a different SG for the ALB and for the EC2 instances for more granular control.
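A rough sketch of that split, using hypothetical resource names (alb_sg and app_sg): the ALB accepts HTTP from the internet, and the instances accept HTTP only from the ALB's security group. The aws_lb would then reference alb_sg and the launch template app_sg.

# Hypothetical ALB security group: HTTP in from anywhere, all traffic out.
resource "aws_security_group" "alb_sg" {
  vpc_id = aws_vpc.my_vpc.id

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# Hypothetical instance security group: HTTP allowed only from the ALB's security group.
resource "aws_security_group" "app_sg" {
  vpc_id = aws_vpc.my_vpc.id

  ingress {
    from_port       = 80
    to_port         = 80
    protocol        = "tcp"
    security_groups = [aws_security_group.alb_sg.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}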

EXPERT · answered 4 months ago
EXPERT · reviewed 4 months ago
  • I changed the health check from ELB to EC2 but the same issue persists.

  • Try removing that line completely from your TF and reapply.

  • Also take a look at your ASG config in the console to make sure it was changed.

  • Was this resolved?
