Issues with Spark history server and S3

I have set up the Spark history server for AWS Glue as described here: https://docs.aws.amazon.com/glue/latest/dg/monitor-spark-ui-history.html However, the server cannot see any of my completed logs, although it can see all of my incomplete ones. I also tried setting up a Spark history server manually on an EC2 instance and ran into a similar issue. The history server logs contain many entries like the following, which I assume is also what the CF-configured server is seeing (I've redacted the S3 bucket name, but it is valid in the logs). Any ideas?

23/10/18 05:33:06 INFO FsHistoryProvider: Parsing s3a://*<redacted>*/sparkHistoryLogs/spark-application-1697461405003 for listing data...
23/10/18 05:33:06 INFO FsHistoryProvider: Looking for end event; skipping 83260370 bytes from s3a://*<redacted>*/sparkHistoryLogs/spark-application-1697461405003...
23/10/18 05:33:07 INFO FsHistoryProvider: Finished parsing s3a://*<redacted>*/sparkHistoryLogs/spark-application-1697461405003

I can also confirm that the completed logs are valid: if I copy them to the local file system on the history server and point it to that path, I can see and use the completed logs in the Spark UI.
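Since the log mentions "Looking for end event", one way to double-check a local copy is to scan the tail of the file for the application end event. The following is only a rough sketch, not what the history server itself does: it assumes the event log is uncompressed, newline-delimited JSON with an "Event" field, and the names `has_end_event` and `tail_bytes` are made up for illustration.

```python
import json
import os
import sys

# Event name Spark writes when an application finishes.
END_EVENT = "SparkListenerApplicationEnd"


def has_end_event(path, tail_bytes=1 << 20):
    """Return True if the last `tail_bytes` of the file contain the end event.

    Assumes an uncompressed, newline-delimited JSON event log (an assumption,
    not verified against the Glue-produced files).
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        # Jump close to the end instead of reading the whole (large) file.
        f.seek(max(0, size - tail_bytes))
        tail = f.read()
    for raw in tail.splitlines():
        try:
            event = json.loads(raw)
        except ValueError:
            # A partial first line or a non-JSON line: skip it.
            continue
        if isinstance(event, dict) and event.get("Event") == END_EVENT:
            return True
    return False


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_end_event.py /path/to/local/copy/of/event-log
    print(has_end_event(sys.argv[1]))
```

If this returns True for a log that the S3-backed server still fails to list as completed, the file content itself is probably not the problem.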

Alex T
asked 6 months ago · 63 views
No Answers
