Hello, the error seems to be related to the Glue job bookmark feature (supported data formats are listed in the AWS docs: https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html). Please try disabling the job bookmark on the job, removing the transformation_ctx parameter from the nodes, and removing the job.init() and job.commit() calls. Those two functions initialize and commit the bookmark service, as described in the documentation linked above.
I can confirm that disabling Job Bookmarks does not help.
Of course, with Glue interactive Jupyter notebooks, the only way to explicitly disable Job Bookmarks is through magics. Below is my %%configure magic showing how I tried to disable Job Bookmarks:
%%configure
{
    "--datalake-formats": "iceberg",
    "--job-bookmark-option": "job-bookmark-disable",
    "--conf": "spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions --conf spark.sql.catalog.AwsDataCatalog=org.apache.iceberg.spark.SparkCatalog --conf spark.sql.catalog.AwsDataCatalog.warehouse=s3://sandbox-mededatalake-transformed --conf spark.sql.catalog.AwsDataCatalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog --conf spark.sql.catalog.AwsDataCatalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO --conf spark.sql.catalog.AwsDataCatalog.glue.lakeformation-enabled=true --conf spark.sql.catalog.AwsDataCatalog.glue.id=***********"
}
I can tell Job Bookmarks is actually disabled for this job: on the Job details tab it shows as below in the UI (greyed out):
Job bookmark
Specifies how AWS Glue processes job bookmark when the job runs. It can remember previously processed data (Enable), update state information (Pause), or ignore state information (Disable).
Disable (dropdown selection)
I have gotten rid of all references to the Job library. Literally the only code in my notebook, after my imports, is below:
sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
dyf = glueContext.create_dynamic_frame.from_catalog(database='<schema_name>', table_name='<table_name>')
dyf.printSchema()
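For what it's worth, a workaround that sidesteps the DynamicFrame reader entirely is to read the Iceberg table through the Spark session, using the "AwsDataCatalog" Spark catalog configured in the %%configure magic above. This is a sketch under that assumption, not something I can confirm AWS supports; the schema and table names are placeholders, as in my code above:

```python
# Workaround sketch (assumes the AwsDataCatalog Spark catalog from the
# %%configure magic above is in effect): read the Iceberg table directly
# through the Spark session instead of glueContext.create_dynamic_frame.
from awsglue.context import GlueContext
from pyspark.context import SparkContext

sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session

# Plain Spark read; the Iceberg SparkSessionExtensions handle the format.
# <schema_name> and <table_name> are placeholders for your own names.
df = spark.table("AwsDataCatalog.<schema_name>.<table_name>")
df.printSchema()

# If a DynamicFrame is still needed downstream, convert the DataFrame:
from awsglue.dynamicframe import DynamicFrame
dyf = DynamicFrame.fromDF(df, glueContext, "iceberg_dyf")
```

This avoids the bookmark-aware SparkSQLDataSource code path that the stack trace below goes through, which is why it may not hit the same "Unsupported dataframe format for job bookmarks" check.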
And yet, I still get this useless error message:
Py4JJavaError: An error occurred while calling o96.getDynamicFrame.
: java.lang.Exception: Unsupported dataframe format for job bookmarks
at org.apache.spark.sql.wrapper.SparkSqlDecoratorDataSource.resolveRelation(SparkSqlDecoratorDataSource.scala:103)
at com.amazonaws.services.glue.SparkSQLDataSource.$anonfun$getDynamicFrame$24(DataSource.scala:799)
at com.amazonaws.services.glue.util.FileSchemeWrapper.$anonfun$executeWithQualifiedScheme$1(FileSchemeWrapper.scala:102)
at com.amazonaws.services.glue.util.FileSchemeWrapper.executeWith(FileSchemeWrapper.scala:95)
at com.amazonaws.services.glue.util.FileSchemeWrapper.executeWithQualifiedScheme(FileSchemeWrapper.scala:102)
at com.amazonaws.services.glue.SparkSQLDataSource.getDynamicFrame(DataSource.scala:767)
at com.amazonaws.services.glue.DataSource.getDynamicFrame(DataSource.scala:104)
at com.amazonaws.services.glue.DataSource.getDynamicFrame$(DataSource.scala:104)
at com.amazonaws.services.glue.AbstractSparkSQLDataSource.getDynamicFrame(DataSource.scala:732)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:750)
I don't think it is actually possible to read Iceberg tables from the Glue Catalog into Glue DynamicFrames. AWS advertises this capability, but it does not appear to work, and it is a great disservice to customers to claim Iceberg support that is not really there: it causes an enormous waste of time in investigation and diagnosis. Error messages from AWS services are almost invariably useless because they hide the real problem. What should take seconds to diagnose instead takes days of support requests that usually go nowhere, because they are exhaustingly repetitive and just go in circles. (AWS support engineers can't even look at what you're working on.) Diagnosing problems like this would take a fraction of the time if AWS services didn't mask the real cause with bad error messages.