Hello,
I have two issues:
from pyspark.sql import SparkSession

# Register an Iceberg catalog named "glue_catalog", backed by AWS Glue and S3.
spark = (
    SparkSession.builder
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.warehouse", "s3://co-raw-sales-dev")
    .config("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .enableHiveSupport()
    .getOrCreate()
)
df.writeTo("glue_catalog.co_raw_sales_dev.new_test").using("iceberg").create()
CREATED TABLE DDL:
CREATE TABLE co_raw_sales_dev.new_test (
id bigint,
name string,
points bigint)
LOCATION 's3://co-raw-sales-dev//new_test'
TBLPROPERTIES (
'table_type'='iceberg'
);
The first problem is that the table location in S3 contains a double slash (//) between the bucket name and the table name.
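One possible explanation, stated as an assumption and not confirmed anywhere in this thread: if the base location used for the table (for example the Glue database's location URI) already ends with a trailing slash, and the table name is appended with another "/", the result contains "//". The sketch below only illustrates that string behavior in plain Python. The function names are hypothetical and this is not the actual Iceberg or Glue code.

```python
def naive_join(base_location: str, table_name: str) -> str:
    # Plain concatenation: a base that already ends in "/" produces "//".
    return f"{base_location}/{table_name}"


def normalized_join(base_location: str, table_name: str) -> str:
    # Strip trailing slashes from the base before appending the table name.
    return f"{base_location.rstrip('/')}/{table_name}"


# Hypothetical base location with a trailing slash (an assumption).
db_location = "s3://co-raw-sales-dev/"

print(naive_join(db_location, "new_test"))       # s3://co-raw-sales-dev//new_test
print(normalized_join(db_location, "new_test"))  # s3://co-raw-sales-dev/new_test
```

If this is the cause, checking whether the database location in the Glue console ends with "/" would be a reasonable first step.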
The second issue: this call works:
df.writeTo("glue_catalog.co_raw_sales_dev.new_test2").using("iceberg").create()
But if I remove the "glue_catalog" prefix, like this:
df.writeTo("co_raw_sales_dev.new_test2").using("iceberg").create()
I get this error: An error occurred while calling o339.create. Table implementation does not support writes: co_raw_sales_dev.new_test2
Am I missing a parameter in the SparkSession config?
Thank you, Adas.
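A likely reading of the second error, offered as an assumption and not a confirmed diagnosis: without a catalog prefix, Spark resolves the identifier against its built-in session catalog and not against glue_catalog, so the Iceberg write path is never reached. Spark has a standard setting, spark.sql.defaultCatalog, that changes which catalog unqualified names resolve to. The sketch below builds the same configuration as in the question as a plain dictionary and optionally adds that setting. The helper name iceberg_glue_conf is hypothetical, and whether this behaves as expected on Glue has not been tested here.

```python
def iceberg_glue_conf(catalog_name: str, warehouse: str, make_default: bool = False) -> dict:
    """Build Spark settings for an Iceberg catalog backed by AWS Glue (hypothetical helper)."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    conf = {
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.warehouse": warehouse,
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    }
    if make_default:
        # Unqualified names such as "db.table" then resolve to this catalog.
        conf["spark.sql.defaultCatalog"] = catalog_name
    return conf


conf = iceberg_glue_conf("glue_catalog", "s3://co-raw-sales-dev", make_default=True)
for key, value in conf.items():
    print(f"{key} = {value}")

# Applying it (not run here; requires pyspark and AWS access):
#   builder = SparkSession.builder
#   for key, value in conf.items():
#       builder = builder.config(key, value)
#   spark = builder.getOrCreate()
```

Keeping the explicit "glue_catalog." prefix, as in the working call, avoids depending on the default catalog at all.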
Thank you, Gonzalo, for the explanation.
One more question. I am running the query below from Glue:
I inserted data and can query it, so everything seems fine. But when I run "SHOW CREATE TABLE {std_database}.{std_table}" on Athena, I get this error: CREATE TABLE statement cannot be generated because table has unsupported properties.
Both properties I added are described in: https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg-creating-tables.html What might be wrong?
Maybe use the table property 'table_type'='ICEBERG' instead of "using". Otherwise it works for me.
Hi Gonzalo,
I tested various scenarios, and "SHOW CREATE TABLE" on Athena only works when I create the table in PySpark without any TBLPROPERTIES.
Also, the "SHOW CREATE TABLE" output differs between Athena and PySpark:
Athena:
PySpark:
I think the problem is that Glue 4.0 creates Iceberg tables with format-version 1, while Athena uses format-version 2.
Spark defaults to format-version 1, but it should work with 2.
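To try format-version 2 explicitly, the Iceberg table property 'format-version' can be set when the table is created. The sketch below only builds the DDL string in plain Python. The helper name create_table_ddl is hypothetical, the columns are taken from the DDL earlier in this thread, and whether Athena's SHOW CREATE TABLE then accepts the table is not confirmed here.

```python
def create_table_ddl(table: str, columns: dict, format_version: int = 2) -> str:
    """Build a Spark SQL CREATE TABLE statement for an Iceberg table (hypothetical helper)."""
    cols = ",\n".join(f"  {name} {dtype}" for name, dtype in columns.items())
    return (
        f"CREATE TABLE {table} (\n{cols})\n"
        "USING iceberg\n"
        f"TBLPROPERTIES ('format-version'='{format_version}')"
    )


ddl = create_table_ddl(
    "glue_catalog.co_raw_sales_dev.new_test",
    {"id": "bigint", "name": "string", "points": "bigint"},
)
print(ddl)

# With a live session (not run here): spark.sql(ddl)
# DataFrame equivalent (not run here):
#   df.writeTo("glue_catalog.co_raw_sales_dev.new_test") \
#     .using("iceberg").tableProperty("format-version", "2").create()
```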