That is not a "live" Spark UI, so it doesn't have access to logs.
The stdout and stderr are stored in CloudWatch under /aws-glue/jobs/output and /aws-glue/jobs/error respectively. The driver log stream is named after the job run id, while the executor streams have a longer name in which the executor container id starts after "_g-" (the tricky part is matching a container log back to its Spark executor id).
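A small helper along these lines can tell the two kinds of stream apart by name. This is only a sketch based on the naming convention described above (driver stream equals the job run id, executor streams append a container suffix beginning with "_g-"), not an official API:

```python
def classify_log_stream(stream_name, job_run_id):
    """Classify a CloudWatch log stream from /aws-glue/jobs/output or
    /aws-glue/jobs/error as driver or executor, based on the (assumed)
    naming convention described above."""
    if stream_name == job_run_id:
        return "driver"
    if stream_name.startswith(job_run_id + "_g-"):
        # Everything after "_g-" is the executor container id; mapping
        # it back to a Spark executor id still has to be done by hand.
        return "executor"
    return "unknown"

# Example with made-up stream names:
run_id = "jr_0123abc"
print(classify_log_stream("jr_0123abc", run_id))            # driver
print(classify_log_stream("jr_0123abc_g-4567def", run_id))  # executor
```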
Hi
CloudWatch Logs: AWS Glue automatically sends continuous logs to CloudWatch every 5 seconds and before each executor termination. You can view these logs in the CloudWatch console. However, by default, CloudWatch only stores driver logs.
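Continuous logging is switched on through the job's default arguments. A minimal sketch, using the documented Glue job parameters (the custom log group name is just an example; --continuous-log-logGroup is optional):

```python
# Job arguments that turn on continuous logging for a Glue job.
default_arguments = {
    "--enable-continuous-cloudwatch-log": "true",
    # Optional: send logs to a custom log group instead of the default.
    "--continuous-log-logGroup": "/aws-glue/jobs/my-custom-group",
}
```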
Spark UI: Enable the Spark UI for your AWS Glue job. This will allow you to view the Spark application logs, including the executor logs (stderr and stdout). You can configure the Spark UI to generate logs in two ways:
- Legacy mode: This mode stores the logs in an S3 location.
- Spark History Server: This mode stores the logs in a dedicated Spark History Server.
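Enabling the Spark UI is likewise done with job arguments. A sketch using the documented parameter names; the S3 path is a placeholder, so point it at a bucket your job role can write to:

```python
# Job arguments that make Glue write Spark UI event logs to S3.
default_arguments = {
    "--enable-spark-ui": "true",
    # Placeholder path: replace with an S3 location you control.
    "--spark-event-logs-path": "s3://my-bucket/spark-event-logs/",
}
```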
AWS Documentation source: https://docs.aws.amazon.com/glue/latest/dg/monitor-spark-ui-jobs.html#monitor-spark-ui-jobs-console
Thanks for your answer. After experimenting with the code below (because foreachPartition runs on the executors), I found that the "print" output ends up in the "g-" log stream under /aws-glue/jobs/error. This was very hard to find.
# Sample DataFrame creation
data = [("Alice", 34), ("Bob", 45), ("Charlie", 25)]
columns = ["name", "age"]
df = spark.createDataFrame(data, columns)

# Custom function to print partition elements
def print_partition(partition):
    # Iterate over rows in the partition and print
    for row in partition:
        print(row)

# Apply foreachPartition to print each partition
df.foreachPartition(print_partition)