I am trying to set up AWS Systems Manager for our private Linux instance so that we can patch/update without any internet connection.
I've read the AWS Systems Manager documentation and set it up accordingly:
-I've created an IAM role with the following managed policies: "AmazonSSMManagedInstanceCore", "AmazonEC2RoleforSSM", "AmazonEC2FullAccess", and "AdministratorAccess". I've also edited the role's trust relationship to the following:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "ssm.amazonaws.com",
          "backup.amazonaws.com",
          "ec2.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
-I've created a security group with an inbound rule allowing HTTPS (TCP port 443) from our VPC CIDR range.
-I've created the three VPC endpoints (ssm, ssmmessages, and ec2messages). They are all in the same Region as our private instance, on two private subnets (private subnet 1a and 1b).
-The security group and the instance profile are attached to the private instance.
-In addition, two other security groups are attached to that private Linux instance. One gives our SysAdmins SSH access, and the other allows users from various departments in our organization to access the instance.
-For reference, I've created a private instance in my own personal AWS account, and it works fine with Systems Manager: I can patch/update it and even connect to its terminal.
-However, I've been having issues when I try to replicate the same SSM configuration in our production AWS account.
-Running the command "ssm-cli get-diagnostics --output table" returned the following errors:
Connectivity to ssm endpoint Failed ssm.us-east-1.amazonaws.com is not reachable: dial tcp [ 0 ] i/o timeout
The "Connectivity to ec2messages endpoint" and "Connectivity to ssmmessages endpoint" checks also failed.
For security reasons, I've replaced the IP address in the error with 0.
According to the ssm-cli documentation (https://docs.aws.amazon.com/systems-manager/latest/userguide/ssm-cli.html), this check means the following:
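To reproduce these three failures outside ssm-cli, a plain TCP connect to port 443 on each regional endpoint is a rough stand-in. This is a minimal sketch, not how ssm-cli is actually implemented; the function names `endpoint_hosts` and `can_reach` are hypothetical, and us-east-1 is assumed only because it appears in the error above.

```python
import socket

# The three Systems Manager endpoints the agent talks to.
SSM_SERVICES = ("ssm", "ec2messages", "ssmmessages")


def endpoint_hosts(region: str) -> list[str]:
    """Build the regional hostnames, e.g. ssm.us-east-1.amazonaws.com."""
    return [f"{service}.{region}.amazonaws.com" for service in SSM_SERVICES]


def can_reach(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Timeouts, refused connections, and DNS failures are all OSError subclasses.
        return False
```

Run on the instance, `[h for h in endpoint_hosts("us-east-1") if not can_reach(h)]` lists the unreachable endpoints. A timeout (rather than a refusal) usually means packets are being dropped along the path, which points at security groups, network ACLs, or routing rather than at the agent itself.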
Indicates whether the node is able to reach the service endpoints for Systems Manager on TCP port 443. A failed test indicates connectivity issues to https://ssm.region.amazonaws.com depending on the AWS Region where the node is located. Connectivity issues can be caused by the VPC configuration including security groups, network access control lists, route tables, or OS firewalls and proxies.
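One of the causes listed above is the security group on the endpoints: its inbound rules must admit TCP 443 from the instance's private IP. The sketch below checks that condition for a simplified, hypothetical rule model (`InboundRule` and `allows_https` are illustrative names, not an AWS API); it ignores security-group-to-security-group references and network ACLs, which would need separate checks.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class InboundRule:
    """Hypothetical, simplified security group inbound rule (TCP only)."""
    from_port: int
    to_port: int
    cidr: str


def allows_https(rules: list[InboundRule], source_ip: str, port: int = 443) -> bool:
    """True if any rule admits TCP traffic on `port` from `source_ip`."""
    ip = ipaddress.ip_address(source_ip)
    return any(
        rule.from_port <= port <= rule.to_port
        # strict=False tolerates CIDRs written with host bits set, e.g. 10.0.1.0/16.
        and ip in ipaddress.ip_network(rule.cidr, strict=False)
        for rule in rules
    )
```

For example, `allows_https([InboundRule(443, 443, "10.0.0.0/16")], "10.0.1.25")` is true, while the same rule does not admit `172.31.5.10`. A rule whose CIDR does not contain the instance's private address would produce exactly the kind of silent timeout shown above.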
If anyone could suggest a fix or point me in the right direction, I'd much appreciate it, since I've hit a dead end.
Thank you all in advance!!
Hi Riku!
Here are the answers to your questions:
First question: Yes, private DNS is enabled for the VPC endpoint.
Second question: Yes, the VPC DNS settings are enabled.
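With private DNS enabled on an interface endpoint and DNS support/hostnames enabled on the VPC, a name such as ssm.us-east-1.amazonaws.com should resolve, from inside the VPC, to the private IPs of the endpoint's network interfaces. A quick way to confirm that from the instance is sketched below; it assumes the instance uses the VPC's Amazon-provided resolver, and `resolved_addresses` and `all_private` are hypothetical helper names.

```python
import ipaddress
import socket


def resolved_addresses(host: str) -> set[str]:
    """Return the IPv4 addresses the local resolver returns for host."""
    infos = socket.getaddrinfo(host, 443, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def all_private(addresses: set[str]) -> bool:
    """True only if there is at least one address and every address is private."""
    return bool(addresses) and all(ipaddress.ip_address(a).is_private for a in addresses)
```

If `all_private(resolved_addresses("ssm.us-east-1.amazonaws.com"))` is false on the instance, the name is resolving to public addresses, so traffic never reaches the endpoints and a connection timeout is the expected symptom.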
Third question: I still need to check with Reachability Analyzer. Before doing so, though, what do you think about troubleshooting with AWSSupport-TroubleshootSessionManager (https://repost.aws/articles/AR8nK-hk2aQoef9tTTE8CurQ/support-automation-workflow-saw-runbook-troubleshoot-aws-systems-manager-session-manager)?
Thank you for your prompt response!
It should be fine to troubleshoot using "AWSSupport-TroubleshootSessionManager", but since we already know a timeout error is occurring, it may not lead to a resolution.
Hi again Riku,
Thank you again for your help. If I run Reachability Analyzer, will any of the services running in my organization's VPC (EC2 instances, network, etc.) be affected or interrupted?
Thanks in advance!
Just running Reachability Analyzer will not change settings such as security groups, so other systems will not be affected.
Thank you, Riku! I've finally found where the problem was! I appreciate your help!! Once again, thanks so much!!!