2 Answers
3
Hello,
Can you share whether any issue was reported after running the sample wordcount step, which uploads its output to an S3 bucket?
- Please check whether the Spark application was created successfully.
- Check whether the service role/execution role has permission to read data from and write data to S3.
- Please share the steps you followed to produce the outcome. (P.S. Please make sure not to include any sensitive information such as the cluster ID, bucket name, or anything else specific to your account.)
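For reference, a wordcount step of the kind described above could be submitted roughly like this. This is only a sketch: the helper names `build_wordcount_step` and `submit_step`, the step name, and every S3 path are hypothetical placeholders, and it assumes boto3 is installed and Spark is available on the cluster.

```python
def build_wordcount_step(script_uri, input_uri, output_uri):
    """Build an EMR step definition that runs a PySpark script via spark-submit.

    All three URIs are placeholders supplied by the caller (assumption:
    the script reads its input and output locations from its arguments).
    """
    return {
        "Name": "wordcount-smoke-test",  # hypothetical step name
        "ActionOnFailure": "CONTINUE",   # keep the cluster running if the step fails
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",
                script_uri,
                input_uri,
                output_uri,
            ],
        },
    }


def submit_step(cluster_id, step):
    """Submit the step to a running cluster and return the new step ID."""
    import boto3  # imported lazily so the builder above works without boto3

    emr = boto3.client("emr")
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]
```

If the step fails while writing its output, that usually points to the role permissions in the second bullet rather than to Spark itself.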
0
Hello, the problem is that your cluster has only one node. For spark.read.csv() to work, your cluster must have at least 2 nodes, one of which is the master.
answered 8 days ago
Hello, thank you for your response. I tried loading the sample wordcount notebook you suggested, but it was unable to complete the first step of loading the dataframe. I was using m4.xlarge for my cluster and figured I would try scaling up to m5.xlarge just to see if that would fix anything, and it appears that it did. I'm not sure why Spark was not working properly on the m4.xlarge. It seemed to hang on any Spark job step and would not complete even after half an hour on a simple task like creating that small dataframe.
Thanks for confirming that it worked on the larger instance type. I can think of two possible explanations: 1. The Spark job might not have used the full resources available on the cluster. 2. The Spark job needs more resources than the smaller instance type can provide. To confirm which one applies, please use CloudWatch metrics [1] to see whether the cluster resources were fully utilised, and also analyze the instance-state logs, which give an idea of each instance's resource consumption at the time of execution [2].
References:
[1] - https://docs.aws.amazon.com/emr/latest/ManagementGuide/UsingEMR_ViewingMetrics.html
[2] - https://repost.aws/articles/AR77wVn54aSQSjLzJGTQsKEQ/decoding-instance-state-log-in-emr
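The CloudWatch check from [1] could be sketched as follows. This is only an illustration under a few assumptions: the helper names `build_metric_query` and `fetch_datapoints` are hypothetical, boto3 is assumed to be installed with credentials configured, and the cluster ID is a placeholder you supply yourself. `YARNMemoryAvailablePercentage` and `ContainerPending` are two of the EMR metrics that tend to be useful when jobs hang.

```python
from datetime import datetime, timedelta, timezone


def build_metric_query(cluster_id, metric_name, end_time, hours=1):
    """Build the parameters for a CloudWatch GetMetricStatistics call
    covering the last `hours` hours of one EMR cluster metric."""
    return {
        "Namespace": "AWS/ElasticMapReduce",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "JobFlowId", "Value": cluster_id}],
        "StartTime": end_time - timedelta(hours=hours),
        "EndTime": end_time,
        "Period": 300,  # seconds; EMR publishes metrics every five minutes
        "Statistics": ["Average"],
    }


def fetch_datapoints(query):
    """Run the query and return the datapoints ordered by timestamp."""
    import boto3  # imported lazily so the builder above works without boto3

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(**query)
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])
```

Available YARN memory staying near zero, or pending containers staying above zero while a step hangs, would be consistent with the second explanation above (the job needs more resources than the instances provide).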