That is not a "live" Spark UI, so it doesn't have access to the logs.
The stdout and stderr streams are stored in CloudWatch under /aws-glue/jobs/output and /aws-glue/jobs/error respectively. The driver's log stream is named after the job run ID, and the executor streams have longer names that append the executor container, starting with "_g-" (the tricky part is matching a container log stream to its executor ID).
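If you want to enumerate those streams programmatically, here is a minimal boto3 sketch (the job run ID is a placeholder; the log group names are the ones above):

```python
import boto3

logs = boto3.client("logs")

# Placeholder run ID -- use the JobRunId returned by start_job_run.
job_run_id = "jr_0123456789abcdef"

# The driver stream is named exactly after the run ID; executor streams
# share that prefix and contain "_g-", so one prefix filter finds both.
paginator = logs.get_paginator("describe_log_streams")
for page in paginator.paginate(
    logGroupName="/aws-glue/jobs/error",
    logStreamNamePrefix=job_run_id,
):
    for stream in page["logStreams"]:
        kind = "executor" if "_g-" in stream["logStreamName"] else "driver"
        print(kind, stream["logStreamName"])
```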
Hi
CloudWatch Logs: AWS Glue automatically sends continuous logs to CloudWatch every 5 seconds and before each executor terminates. You can view these logs in the CloudWatch console. However, by default, CloudWatch only stores the driver logs.
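Continuous logging is turned on through Glue job parameters. A minimal sketch, assuming a job named "my-glue-job" (a placeholder):

```python
import boto3

glue = boto3.client("glue")

response = glue.start_job_run(
    JobName="my-glue-job",  # placeholder job name
    Arguments={
        # Stream driver/executor logs to CloudWatch while the job runs.
        "--enable-continuous-cloudwatch-log": "true",
        # Optional: filter out Spark heartbeat/progress-bar noise.
        "--enable-continuous-log-filter": "true",
    },
)
print(response["JobRunId"])
```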
Spark UI: Enable the Spark UI for your AWS Glue job. This lets you view the Spark application logs, including the executor logs (stderr and stdout). You can configure the Spark UI to generate logs in two ways (a sketch of the job parameters follows the documentation link below):
- Legacy mode: This mode stores the logs in an S3 location.
- Spark History Server: This mode stores the logs in a dedicated Spark History Server.
AWS Documentation source: https://docs.aws.amazon.com/glue/latest/dg/monitor-spark-ui-jobs.html#monitor-spark-ui-jobs-console
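Enabling the Spark UI is also done with job parameters. A minimal sketch, assuming an S3 path you own for the event logs (bucket and job name are placeholders):

```python
import boto3

glue = boto3.client("glue")

response = glue.start_job_run(
    JobName="my-glue-job",  # placeholder job name
    Arguments={
        # Persist Spark event logs so the UI can replay the run later.
        "--enable-spark-ui": "true",
        "--spark-event-logs-path": "s3://my-bucket/spark-ui-logs/",  # placeholder path
    },
)
print(response["JobRunId"])
```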
Thanks for your answer. After experimenting with the code below (because foreachPartition runs on the executors), I found that the print output ends up in the "g-" executor log stream under /aws-glue/jobs/error. This was very hard to find.
# Sample DataFrame creation
# (`spark` is the SparkSession provided by the Glue job boilerplate)
data = [("Alice", 34), ("Bob", 45), ("Charlie", 25)]
columns = ["name", "age"]
df = spark.createDataFrame(data, columns)

# Custom function to print partition elements
def print_partition(partition):
    # Iterate over rows in the partition and print each one
    for row in partition:
        print(row)

# Apply foreachPartition to print each partition (runs on the executors)
df.foreachPartition(print_partition)
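One way to make that output easier to locate is to search the whole log group rather than opening each executor stream by hand. A minimal boto3 sketch, assuming the run ID is known ("Alice" is just the sample value printed above):

```python
import boto3

logs = boto3.client("logs")

# Placeholder run ID -- the driver and "_g-" executor streams share it
# as a prefix, so one filtered query searches all of them at once.
job_run_id = "jr_0123456789abcdef"

response = logs.filter_log_events(
    logGroupName="/aws-glue/jobs/error",
    logStreamNamePrefix=job_run_id,
    filterPattern='"Alice"',
)
for event in response["events"]:
    print(event["logStreamName"], event["message"].rstrip())
```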