Creating a Delta table from Spark using the Glue Catalog

0

I am trying to create a Delta table from Spark SQL using the Glue Data Catalog as the metastore. I can query an existing Delta table through the Glue metastore without problems:

%%sql
select * from `my_table` VERSION AS OF 1 limit 2 

The problems start when I try to create a new Delta table in the same catalog:

%%sql
CREATE TABLE `my_table_v1` USING DELTA 
AS (select * from `my_table` VERSION AS OF 1 limit 2 )

##
# IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: s3://my_s3_bucketmy_table_v1

Given that the Glue database has the location s3://my_s3_bucket, it looks like the full path for the table files is computed incorrectly: the database location and the table name end up joined without a separator.
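To make the suspected cause concrete, here is a minimal sketch in plain Python. It only reproduces the string from the error message; it is not the actual Spark or Glue code path, and the helper names `naive_join` and `table_location` are hypothetical. It also shows that a location of the form s3://my_s3_bucket has an empty path component, which is consistent with the "Relative path in absolute URI" complaint.

```python
from urllib.parse import urlparse


def naive_join(db_location: str, table_name: str) -> str:
    """Concatenate without a separator (hypothetical illustration of the bug)."""
    return db_location + table_name


def table_location(db_location: str, table_name: str) -> str:
    """Join the database location and table name with exactly one slash."""
    return db_location.rstrip("/") + "/" + table_name.lstrip("/")


db_location = "s3://my_s3_bucket"

# Matches the path shown in the exception message.
print(naive_join(db_location, "my_table_v1"))      # s3://my_s3_bucketmy_table_v1

# The path one would expect for the table files.
print(table_location(db_location, "my_table_v1"))  # s3://my_s3_bucket/my_table_v1

# The bucket-only location has no path component at all.
print(repr(urlparse(db_location).path))            # ''
```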

I then tried to specify the LOCATION of the table explicitly, creating an external table:

%%sql
CREATE TABLE `my_table_v1` USING DELTA 
LOCATION 's3://my_s3_bucket/my_table_v1'
AS (select * from `my_table` VERSION AS OF 1 limit 2 )

##
# IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: s3://my_s3_bucketmy_table_v1-__PLACEHOLDER__

With this approach the data files are correctly written to s3://my_s3_bucket/my_table_v1, but registering the table in the Glue database then fails.

Any suggestions?

  • I am able to use Spark SQL (CREATE OR REPLACE TABLE USING DELTA AS SELECT * FROM TempVIEW)

    Setting the database location to an appropriate S3 path resolves the issue. In the Glue catalog, open the database and edit it so that its location is an S3 path. Use Glue version 4.0 or later.

    There is no need to specify the location in the Spark code.
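The comment above describes editing the database location in the Glue console. The same change could probably be scripted with boto3; the following is a sketch under assumptions, not a verified fix. The database name `my_database` and the location `s3://my_s3_bucket/my_database/` are hypothetical, and `build_database_input` is a helper introduced here for illustration. It uses the Glue client calls `get_database` and `update_database`.

```python
# Fields of a Glue "Database" that are also accepted in "DatabaseInput".
# Read-only fields such as CreateTime and CatalogId must not be sent back.
ALLOWED_KEYS = (
    "Name",
    "Description",
    "Parameters",
    "CreateTableDefaultPermissions",
    "TargetDatabase",
)


def build_database_input(database: dict, location_uri: str) -> dict:
    """Copy the writable fields of an existing database and set a new location."""
    database_input = {k: database[k] for k in ALLOWED_KEYS if k in database}
    database_input["LocationUri"] = location_uri
    return database_input


def set_database_location(database_name: str, location_uri: str) -> None:
    """Update the LocationUri of a Glue database (needs AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    glue = boto3.client("glue")
    current = glue.get_database(Name=database_name)["Database"]
    glue.update_database(
        Name=database_name,
        DatabaseInput=build_database_input(current, location_uri),
    )


# Hypothetical usage (names are assumptions, not taken from the question):
# set_database_location("my_database", "s3://my_s3_bucket/my_database/")
```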

asked 9 months ago · 1304 views
2 Answers
1

That is not supported. Even if you solve the path issues, the result is not a valid Delta/symlink table: it will create a generic Hive table with an array field.
The OSS Delta community is working to rectify that. In the meantime, you can register Delta tables in the catalog using a Glue crawler or the DeltaTable "generate" API, but not directly via Spark SQL.
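As a rough sketch of the second option, and assuming the "generate" API refers to `DeltaTable.generate` with the `symlink_format_manifest` mode from the delta-spark Python package, the call could look like the following. The function names and the table path are illustrative, and `spark` is assumed to be an existing SparkSession configured for Delta Lake.

```python
def manifest_location(table_path: str) -> str:
    """Directory where Delta writes the symlink manifest for a table path."""
    return table_path.rstrip("/") + "/_symlink_format_manifest/"


def generate_symlink_manifest(spark, table_path: str) -> str:
    """Generate a symlink manifest for an existing Delta table.

    `spark` must be a SparkSession with the Delta Lake extensions enabled.
    Returns the manifest directory, which a catalog table can then point to.
    """
    from delta.tables import DeltaTable  # requires the delta-spark package

    DeltaTable.forPath(spark, table_path).generate("symlink_format_manifest")
    return manifest_location(table_path)


# Hypothetical usage inside a Glue job or notebook:
# generate_symlink_manifest(spark, "s3://my_s3_bucket/my_table_v1")
```

The manifest only makes the table readable through the catalog by engines that understand symlink tables; registering it there is still a separate step.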

AWS EXPERT
answered 9 months ago
0

Thanks for confirming it. If this is not supported, the first example on this doc page https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-delta-lake.html should be corrected, as it produces exactly the same issue. It essentially attempts to do the same thing using PySpark syntax.

Example: Write a Delta Lake table to Amazon S3 and register it to the AWS Glue Data Catalog
The following AWS Glue ETL script demonstrates how to write a Delta Lake table to Amazon S3 and register the table to the AWS Glue Data Catalog.

answered 9 months ago
