Spark UI logs are incomplete

I have enabled Spark UI in AWS Glue as described in the documentation. Using Docker, I can read the different Spark executions.

However, I see 2 unexpected behaviours:

  • All jobs appear in "incomplete applications".
  • When a job has finished, not all Spark tasks are shown as finished; the information displayed seems to be a snapshot taken some seconds before the job ended.

How can I fix this?

posted 8 months ago, 325 views
2 Answers

The job doesn't wait for the logs to be published. For short jobs of a few minutes it is common (I'm not sure why) for the final log not to be published.
What you can do is add a short sleep (e.g. 30 seconds) at the end of the job, so that it at least has the chance to update the inprogress log with the job completion.
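
A minimal sketch of that delay, assuming the Glue job is a Python script. The helper name `pause_for_event_log_flush` and its parameters are hypothetical, and the 30-second default simply mirrors the figure suggested above; it is not a value taken from the Glue documentation.

```python
import time


def pause_for_event_log_flush(seconds=30.0, sleep=time.sleep):
    """Block for `seconds` so the final Spark event log has time to upload.

    Hypothetical helper: the name and the default are assumptions,
    not part of any Glue or Spark API.
    """
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    sleep(seconds)
    return seconds


# Intended use: call this as the very last statement of the job script,
# after the last Spark action has completed.
# pause_for_event_log_flush(30)
```

The `sleep` parameter exists only so the helper can be exercised without actually waiting; in the job itself the default `time.sleep` is what you would use.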

AWS
EXPERT
answered 8 months ago
  • It is not what I expected but I will try that. Thanks!

  • I tried the solution, but it only works partially. It is true that sometimes you must.

    However, the real problem is that Glue generates 2 files: one while the job is running ("spark-application-1695890646202.inprogress") and another when it has finished ("spark-application-1695890646202"). Then, when you launch Spark UI, it detects both files and only shows the incomplete one. To fix this, Glue must delete the "inprogress" file before writing the final one.

  • Yes, that's normal. In a local environment Spark would delete the "inprogress" file after creating the final one, but in this case the logs are uploaded to S3 asynchronously. On S3 that's normally not an issue because the final file is listed before the inprogress one, but yes, in some cases you might have to restart the history server and even delete the inprogress file manually.

Sometimes you might get just the complete Spark UI logs instead of the incomplete ones. Right now there are many conditions in which history files are left around with the ".inprogress" extension. The cleaner doesn't remove these because it can't distinguish between an application that is still running and abandoned leftover files. This is a known limitation of Spark.

As a workaround, you can write your own script to remove the unnecessary files in one go.
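
One possible shape for such a script, as a sketch only: it assumes the event logs live under an S3 prefix, that boto3 is installed, and that the credentials in use may list and delete objects there. The function names, and the rule of deleting an ".inprogress" file only when a completed file with the same base name exists, are my assumptions and not documented Glue behaviour.

```python
def stale_inprogress_keys(keys):
    """Return the ".inprogress" keys whose completed counterpart also exists."""
    present = set(keys)
    suffix = ".inprogress"
    return sorted(
        key
        for key in present
        if key.endswith(suffix) and key[: -len(suffix)] in present
    )


def delete_stale_inprogress(bucket, prefix):
    """Delete stale ".inprogress" event logs under s3://bucket/prefix.

    Hypothetical helper: bucket and prefix are whatever location the
    job was configured to write its Spark UI logs to.
    """
    import boto3  # imported lazily so the helper above has no dependencies

    s3 = boto3.client("s3")
    keys = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))

    stale = stale_inprogress_keys(keys)
    for key in stale:
        s3.delete_object(Bucket=bucket, Key=key)
    return stale
```

Because only files with a completed twin are removed, the event log of a job that is still running is left alone. The history server may still need a restart before it picks up the change.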

answered 8 months ago
