When I use an AWS CloudFormation template or Docker locally, I can't see the Apache Spark UI for AWS Glue ETL jobs.
Resolution
Based on how you access the Spark UI, take the following resolution steps.
CloudFormation stack
When you use a CloudFormation stack to view the Spark UI, an Amazon Elastic Compute Cloud (Amazon EC2) instance makes an HTTPS request. This request confirms whether the Spark UI works. If the request fails, then you get the following error:
"WaitCondition timed out. Received 0 conditions when expecting 1,"
After this error occurs, CloudFormation rolls back the stack.
To resolve this issue, take the following actions:
- Confirm that the subnet can reach the Amazon Simple Storage Service (Amazon S3) API endpoint. For example, if you use a private subnet, then confirm that the subnet has a virtual private cloud (VPC) endpoint or a NAT gateway.
- Confirm that you can access the subnet through the Spark history server port. For example, a firewall might block the port and cause the preceding error.
- Confirm that you entered a valid Amazon S3 path for the event log directory. You must use s3a:// for the event logs path scheme.
Note: If there are event log files in the Amazon S3 path that you specified, then the path is valid.
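As a quick local check of the last point, you can confirm that the path uses the s3a:// scheme and then list the prefix with the AWS CLI. The following sketch uses a hypothetical bucket name; the aws s3 ls call is commented out because it requires AWS credentials with s3:ListBucket permission:

```shell
#!/bin/sh
# Hypothetical event log path; replace with your own.
EVENT_LOG_PATH="s3a://my-glue-bucket/spark-events/"

# The Spark history server requires the s3a:// scheme.
case "$EVENT_LOG_PATH" in
  s3a://*) echo "Scheme OK" ;;
  *)       echo "Error: the event log path must use the s3a:// scheme" >&2 ;;
esac

# The AWS CLI uses s3://, so swap the scheme before listing.
CLI_PATH="s3://${EVENT_LOG_PATH#s3a://}"
echo "$CLI_PATH"

# Uncomment to confirm that event log files exist at the path:
# aws s3 ls "$CLI_PATH"
```

If the listing returns event log files, the path is valid and the scheme is the only thing left to verify.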
If you still get an error, then check the following log groups in Amazon CloudWatch Logs for additional details:
- /aws-glue/sparkui_cfn/cfn-init.log
- /aws-glue/sparkui_cfn/spark_history_server.log
Note: CloudFormation terminates the history server EC2 instance when the stack rolls back. To prevent CloudFormation from terminating the instance, activate termination protection for the stack.
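Even after the instance is terminated, both log groups remain in CloudWatch Logs. A sketch that reads them with the AWS CLI (the aws logs tail calls are commented out because they require credentials with permission to read the log groups; aws logs tail requires AWS CLI v2):

```shell
#!/bin/sh
# Log groups that the Spark UI CloudFormation stack writes to.
LOG_GROUPS="/aws-glue/sparkui_cfn/cfn-init.log /aws-glue/sparkui_cfn/spark_history_server.log"

for LOG_GROUP in $LOG_GROUPS; do
  # Print the command to run for each log group.
  echo "aws logs tail $LOG_GROUP --since 1h"
  # Uncomment to run it directly:
  # aws logs tail "$LOG_GROUP" --since 1h
done
```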
Docker
If you use Docker to view the Spark UI and can't connect to the Spark history server from your browser, then check the following configurations:
- Confirm that the access key and secret key AWS credentials are valid. To use temporary credentials, you must use spark.hadoop.fs.s3a.session.token in the command. Example command:
docker run -itd -e SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS \
-Dspark.history.fs.logDirectory=s3a://path_to_eventlog \
-Dspark.hadoop.fs.s3a.access.key=AWS_ACCESS_KEY_ID \
-Dspark.hadoop.fs.s3a.secret.key=AWS_SECRET_ACCESS_KEY \
-Dspark.hadoop.fs.s3a.session.token=SESSION_TOKEN \
-Dspark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider" \
-p 18080:18080 glue/sparkui:latest "/opt/spark/bin/spark-class org.apache.spark.deploy.history.HistoryServer"
Note: Replace AWS_ACCESS_KEY_ID with your key ID, AWS_SECRET_ACCESS_KEY with your secret access key, and SESSION_TOKEN with your session token.
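Rather than pasting credentials into the command, you can build SPARK_HISTORY_OPTS from the standard AWS environment variables. A sketch that assumes AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN are already exported in your shell (the placeholder defaults below exist only to make the sketch self-contained):

```shell
#!/bin/sh
# Placeholder defaults; in practice these are already exported.
AWS_ACCESS_KEY_ID="${AWS_ACCESS_KEY_ID:-EXAMPLEKEYID}"
AWS_SECRET_ACCESS_KEY="${AWS_SECRET_ACCESS_KEY:-EXAMPLESECRET}"
AWS_SESSION_TOKEN="${AWS_SESSION_TOKEN:-EXAMPLETOKEN}"

# Expand the credentials into the history server options.
SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS \
  -Dspark.history.fs.logDirectory=s3a://path_to_eventlog \
  -Dspark.hadoop.fs.s3a.access.key=$AWS_ACCESS_KEY_ID \
  -Dspark.hadoop.fs.s3a.secret.key=$AWS_SECRET_ACCESS_KEY \
  -Dspark.hadoop.fs.s3a.session.token=$AWS_SESSION_TOKEN \
  -Dspark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider"

echo "$SPARK_HISTORY_OPTS"

# Uncomment to start the history server with the expanded options:
# docker run -itd -e SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS" \
#   -p 18080:18080 glue/sparkui:latest \
#   "/opt/spark/bin/spark-class org.apache.spark.deploy.history.HistoryServer"
```

This avoids leaving long-lived secrets in your shell history.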
- Confirm that you entered a valid Amazon S3 path for the event log directory. You must use s3a:// for the event logs path scheme.
Note: If there are event log files in the Amazon S3 path that you specified, then the path is valid.
- Confirm that you entered the correct port number in the browser. To change the port number, change the -p parameter in the docker run command and the spark.history.ui.port parameter in the Dockerfile.
Note: By default, the port number is 18080. Example URL: http://localhost:18080.
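To confirm that the URL you are browsing to matches the published port, you can build the URL from the host port and probe it. A sketch assuming the default port of 18080 (the docker ps and curl lines are commented out because they require the container to be running):

```shell
#!/bin/sh
PORT=18080                     # Must match the host side of -p in docker run
URL="http://localhost:$PORT"
echo "$URL"

# Uncomment with the container running:
# docker ps --format '{{.Names}}\t{{.Ports}}'   # shows the published port mapping
# curl -sSf "$URL" > /dev/null && echo "Spark UI is reachable at $URL"
```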
If you still can't view the Spark UI, then review your logs for more details. To get the stdout and stderr logs for the Docker container, run docker run with the -it parameter instead of the -itd parameter. Example command:
docker run -it -e SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS \
-Dspark.history.fs.logDirectory=s3a://path_to_eventlog \
-Dspark.hadoop.fs.s3a.access.key=AWS_ACCESS_KEY_ID \
-Dspark.hadoop.fs.s3a.secret.key=AWS_SECRET_ACCESS_KEY \
-Dspark.hadoop.fs.s3a.session.token=SESSION_TOKEN \
-Dspark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider" \
-p 18080:18080 glue/sparkui:latest "/opt/spark/bin/spark-class org.apache.spark.deploy.history.HistoryServer"
Note: Replace AWS_ACCESS_KEY_ID with your key ID, AWS_SECRET_ACCESS_KEY with your secret access key, and SESSION_TOKEN with your session token.
Related information
Monitoring jobs using the Apache Spark web UI
Activating the Apache Spark web UI for AWS Glue jobs