Creating a Delta table from Spark using the Glue Catalog


I am trying to create a Delta table from Spark SQL using the Glue Data Catalog as the metastore. I can correctly query an existing Delta table through the Glue metastore:

%%sql
select * from `my_table` VERSION AS OF 1 limit 2 

The issues start when I try to create a new Delta table in the same catalog:

%%sql
CREATE TABLE `my_table_v1` USING DELTA 
AS (select * from `my_table` VERSION AS OF 1 limit 2 )

##
# IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: s3://my_s3_bucketmy_table_v1

Since the Glue database's location is s3://my_s3_bucket, it looks like the full path where the table files must be written is computed incorrectly: the database location and the table name end up joined without a / between them.
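A simplified sketch of the symptom, assuming the table location is effectively derived by appending the table name to the database location with no separator. This is not Spark's or Glue's actual code (the real path handling goes through Hadoop/Java URI resolution), and both function names are hypothetical; it only shows why a location without a trailing / yields the path seen in the error, and why a trailing / avoids it.

```python
# Hypothetical illustration only: mimics the symptom, not Spark/Glue internals.
def naive_table_location(db_location: str, table_name: str) -> str:
    # No separator is inserted, so a location without a trailing "/" breaks.
    return db_location + table_name


def safe_table_location(db_location: str, table_name: str) -> str:
    # Strip any trailing "/" and insert exactly one separator.
    return db_location.rstrip("/") + "/" + table_name


print(naive_table_location("s3://my_s3_bucket", "my_table_v1"))   # same path as in the error
print(naive_table_location("s3://my_s3_bucket/", "my_table_v1"))  # trailing "/" avoids it
print(safe_table_location("s3://my_s3_bucket", "my_table_v1"))
```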

I then tried specifying the table LOCATION explicitly, creating an EXTERNAL table:

%%sql
CREATE TABLE `my_table_v1` USING DELTA 
LOCATION 's3://my_s3_bucket/my_table_v1'
AS (select * from `my_table` VERSION AS OF 1 limit 2 )

##
# IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: s3://my_s3_bucketmy_table_v1-__PLACEHOLDER__

This way the data files are correctly written to s3://my_s3_bucket/my_table_v1, but registering the table in the Glue database then fails.

Any suggestions?

  • I am able to use Spark SQL (CREATE OR REPLACE TABLE ... USING DELTA AS SELECT * FROM TempVIEW).

    Setting the database location to an appropriate S3 path resolves the issue. In the Glue Data Catalog, open the database and edit it so that its location is an S3 path. Use Glue version 4.0 or later.

    You don't need to specify the location in the Spark code.
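A minimal boto3 sketch of the comment's suggestion, assuming an "appropriate" location means one that ends with / (which is what the error path suggests). The helper names and the database name are hypothetical. As far as I know, Glue's UpdateDatabase replaces the database definition, so the sketch copies Description and Parameters from the existing definition; other fields (for example default table permissions) are not carried over.

```python
def with_trailing_slash(uri: str) -> str:
    # Hypothetical helper: ensure the S3 location ends with exactly one "/".
    return uri.rstrip("/") + "/"


def build_database_input(existing: dict, location_uri: str) -> dict:
    # Start from the required Name plus the new LocationUri, then copy the
    # fields we want to keep from the existing definition (from get_database).
    db_input = {"Name": existing["Name"], "LocationUri": with_trailing_slash(location_uri)}
    for key in ("Description", "Parameters"):
        if key in existing:
            db_input[key] = existing[key]
    return db_input


def set_database_location(database_name: str, location_uri: str) -> None:
    import boto3  # imported lazily so the helpers above work without boto3

    glue = boto3.client("glue")
    existing = glue.get_database(Name=database_name)["Database"]
    glue.update_database(
        Name=database_name,
        DatabaseInput=build_database_input(existing, location_uri),
    )


# Usage (hypothetical database name, requires AWS credentials):
# set_database_location("my_database", "s3://my_s3_bucket")
```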

asked 10 months ago · 1449 views

2 Answers

That is not supported. Even if you solve the path issues, the result is not a valid Delta/symlink table; it creates a generic Hive table with an array field.
The OSS Delta community is working to rectify that. In the meantime, you can register Delta tables in the catalog using a Glue crawler or the DeltaTable "generate" API, but not directly via Spark SQL.
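A sketch of the "generate" route in PySpark, assuming the delta-spark package is available on the job (for example Glue 4.0 with Delta Lake enabled). DeltaTable.forPath and generate("symlink_format_manifest") are the delta-spark calls; the helper names are hypothetical. As far as I know, generating the manifest does not itself create a catalog entry: a table whose location points at the manifest directory still has to be defined, for example by a crawler or DDL.

```python
def manifest_location(table_path: str) -> str:
    # generate("symlink_format_manifest") writes the manifest files into this
    # directory under the Delta table's root.
    return table_path.rstrip("/") + "/_symlink_format_manifest/"


def generate_symlink_manifest(spark, table_path: str) -> str:
    # Requires delta-spark on the cluster; imported lazily so the helper
    # above has no dependency on it.
    from delta.tables import DeltaTable

    DeltaTable.forPath(spark, table_path).generate("symlink_format_manifest")
    return manifest_location(table_path)


# Usage inside a Glue/Spark job (path taken from the question):
# generate_symlink_manifest(spark, "s3://my_s3_bucket/my_table_v1")
```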

AWS EXPERT
answered 10 months ago
reviewed by an EXPERT a month ago

Thanks for confirming. If it is not supported, the first example on this doc page https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-delta-lake.html should be corrected, as it produces exactly the same issue. It basically attempts the same thing using PySpark syntax.

Example: Write a Delta Lake table to Amazon S3 and register it to the AWS Glue Data Catalog. The following AWS Glue ETL script demonstrates how to write a Delta Lake table to Amazon S3 and register the table to the AWS Glue Data Catalog.

answered 10 months ago
