Hi, Stevie.
The error message you posted, MasterServerWaitCondition Received FAILURE signal with UniqueId INSTANCE_ID, means there was an error when one of the nodes in the cluster was running its configuration tasks after booting. Usually this is due to a failure on the head node of the cluster.
You can start debugging the issue by first passing the `-nr/--norollback` flag to `pcluster create`. Then, if cluster creation fails again, you can log into the head node and look for the cause of the failure in /var/log/cfn-init.log.
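If the log is long, a small script can help narrow it down. The following is only a rough sketch, not an official tool: the function names `find_failures` and `scan_log` are made up for illustration, and the pattern assumes that failure lines in /var/log/cfn-init.log carry a level tag such as `[ERROR]`, contain a Python traceback, or mention a command that "failed". Adjust the pattern to whatever your log actually contains.

```python
import re
from typing import Iterable, List

# Assumption: interesting lines are tagged [ERROR]/[CRITICAL], start a Python
# traceback, or mention that something "failed". This is a heuristic only.
FAILURE_PATTERN = re.compile(r"\[(ERROR|CRITICAL)\]|Traceback|failed", re.IGNORECASE)


def find_failures(lines: Iterable[str]) -> List[str]:
    """Return the lines that match the failure heuristic, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if FAILURE_PATTERN.search(line)]


def scan_log(path: str = "/var/log/cfn-init.log") -> List[str]:
    """Read the log at `path` (run this on the head node) and return suspicious lines."""
    with open(path, errors="replace") as handle:
        return find_failures(handle)
```

On the head node you could call `scan_log()` and print each returned line. Reading the lines just before the first match in the full log is usually the next step, since the heuristic only shows where to start looking.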
This wiki page contains tips for debugging common cluster creation failures: https://github.com/aws/aws-parallelcluster/wiki/Stack-Creation-Failures
Let us know what you find in the logs, and we'll try and provide more specific guidance based on that.
~Tim
Thank you. I had to move on to another project for a bit. I will take the suggested remedies into consideration. One thing I noticed after trying to delete the failed cluster was that, although the cluster appeared to have been deleted, I had to go in and manually remove components (VPC, NAT gateway, etc.). In my case, it seemed that the failed create impaired the ability of pcluster delete to back out all supporting components.
Hi Stevie,
How did you create the VPC, NAT gateway, and the other network components?
You can use your own default resources, create them by hand or create them through the pcluster configure command.
In any case, the pcluster create command expects these resources to already exist, so the delete command doesn't delete them.
This is by design: the same VPC can be used for multiple clusters, and there is a limit on the number of VPCs, Elastic IPs, etc. that users can create, so we decided to decouple the VPC and network resources from the cluster's own resources. It is not related to the issue with your previous failed creation.
Let us know if it helps.