How can I use Apache Iceberg with a cross-account AWS Glue Data Catalog in Spark?


I want to use Spark on Amazon EMR or AWS Glue to interact with Apache Iceberg tables in a cross-account AWS Glue Data Catalog.


Set the following parameters to use Spark to interact with Apache Iceberg tables from the AWS Glue Data Catalog:

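--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.glue_catalog.glue.id=<CROSS_ACCOUNT_ID> \
--conf spark.sql.catalog.glue_catalog.warehouse=s3://<WAREHOUSE_DIR>/ \
--conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
--conf spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO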
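If you assemble the spark-submit command from a script, a small helper can keep the catalog name, account ID, and warehouse path in one place. The following is a minimal sketch, not an official tool. It assumes that the catalog is named glue_catalog and that it uses the Iceberg GlueCatalog and S3FileIO implementations. The function names, the account ID 111122223333, the bucket amzn-s3-demo-bucket, and job.py are placeholders for illustration.

```python
import shlex


def iceberg_glue_conf(catalog_id, warehouse, catalog="glue_catalog"):
    """Return Spark properties for an Iceberg catalog backed by the
    AWS Glue Data Catalog of the account given in catalog_id."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # glue.id points the catalog at the other account's Data Catalog.
        f"{prefix}.glue.id": catalog_id,
        f"{prefix}.warehouse": warehouse,
        # Assumed implementations: Glue as the catalog, S3 for file I/O.
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    }


def spark_submit_args(conf):
    """Turn a property dict into a flat list of --conf key=value arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Placeholder account ID and bucket; replace them with your own values.
conf = iceberg_glue_conf("111122223333", "s3://amzn-s3-demo-bucket/warehouse/")
print(shlex.join(["spark-submit", *spark_submit_args(conf), "job.py"]))
```

The script prints a single spark-submit command line that you can review before you run it.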

You can set these parameters in a number of ways, depending on whether you use an AWS Glue job or an Amazon EMR cluster.

For AWS Glue jobs, use job parameters. For example:

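Key: --conf
Value: spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions --conf spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog --conf spark.sql.catalog.glue_catalog.glue.id=<CROSS_ACCOUNT_ID> --conf spark.sql.catalog.glue_catalog.warehouse=s3://<WAREHOUSE_DIR>/ --conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog --conf spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO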
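Because job parameters are key-value pairs, the --conf key can appear only once, so the remaining properties are chained inside its value. The following sketch shows one way to build that value from a dictionary of properties. The function name, the account ID 111122223333, and the bucket amzn-s3-demo-bucket are placeholders, and the catalog-impl and io-impl values assume the Iceberg GlueCatalog and S3FileIO implementations.

```python
def glue_conf_parameter(conf):
    """Collapse Spark properties into one --conf job parameter.

    The first property follows the key directly; every later property
    is prefixed with ' --conf ' inside the same value string.
    """
    pairs = [f"{key}={value}" for key, value in conf.items()]
    return {"--conf": " --conf ".join(pairs)}


catalog = "spark.sql.catalog.glue_catalog"
conf = {
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    catalog: "org.apache.iceberg.spark.SparkCatalog",
    f"{catalog}.glue.id": "111122223333",  # placeholder account ID
    f"{catalog}.warehouse": "s3://amzn-s3-demo-bucket/warehouse/",  # placeholder bucket
    f"{catalog}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
    f"{catalog}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
}

parameter = glue_conf_parameter(conf)
print("Key:  ", "--conf")
print("Value:", parameter["--conf"])
```

You can paste the printed value into the job parameters of the AWS Glue job, or pass the returned dictionary as part of the job's default arguments.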

For an Amazon EMR cluster that runs version 6.5 or later, set the parameters when you submit the job. Or, use the Spark default configuration (/etc/spark/conf/spark-defaults.conf). For more information, see Use an Iceberg cluster with Spark.

Note: For cross-account scenarios, you must always use the glue.id catalog property (spark.sql.catalog.glue_catalog.glue.id) to specify the corresponding AWS Glue Data Catalog ID (AWS account ID).
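A quick check before you submit the job can catch a missing or malformed catalog ID. The following is an illustrative sketch only. It assumes that the catalog is named glue_catalog and that the catalog ID is a 12-digit AWS account ID. The function name and the sample account ID are placeholders.

```python
import re


def cross_account_catalog_id(conf, catalog="glue_catalog"):
    """Return the Glue Data Catalog ID configured for the catalog.

    Raises KeyError if the glue.id property is missing and ValueError
    if the value doesn't look like a 12-digit AWS account ID.
    """
    key = f"spark.sql.catalog.{catalog}.glue.id"
    value = conf.get(key)
    if value is None:
        # Without glue.id, requests go to the caller's own Data Catalog.
        raise KeyError(f"{key} is not set")
    if not re.fullmatch(r"\d{12}", value):
        raise ValueError(f"{key}={value!r} is not a 12-digit AWS account ID")
    return value


# Placeholder account ID; replace it with the catalog owner's account ID.
conf = {"spark.sql.catalog.glue_catalog.glue.id": "111122223333"}
print(cross_account_catalog_id(conf))
```

The check also rejects a placeholder such as <CROSS_ACCOUNT_ID> that was left in a copied configuration.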

If you're using Amazon EMR version 6.5 or later, then use the following spark-defaults configuration:

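[
    {
        "classification": "iceberg-defaults",
        "properties": {
            "iceberg.enabled": "true"
        },
        "configurations": []
    },
    {
        "classification": "spark-defaults",
        "properties": {
            "spark.jars": "/usr/share/aws/iceberg/lib/iceberg-spark3-runtime.jar",
            "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
            "spark.sql.catalog.glue_catalog": "org.apache.iceberg.spark.SparkCatalog",
            "spark.sql.catalog.glue_catalog.glue.id": "<CROSS_ACCOUNT_ID>",
            "spark.sql.catalog.glue_catalog.warehouse": "s3://<WAREHOUSE_DIR>/",
            "spark.sql.catalog.glue_catalog.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
            "spark.sql.catalog.glue_catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO"
        },
        "configurations": []
    }
]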

Note: The Amazon EMR or AWS Glue job must have sufficient AWS Identity and Access Management (IAM) permissions to access the cross-account AWS Glue Data Catalog. For more information, see Making a cross-account API call.
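If you create Amazon EMR clusters from a script, you can generate the iceberg-defaults and spark-defaults classifications instead of editing JSON by hand. The following is a rough sketch. It assumes that the catalog is named glue_catalog and that it uses the Iceberg GlueCatalog and S3FileIO implementations. The function name, the account ID 111122223333, and the bucket amzn-s3-demo-bucket are placeholders.

```python
import json


def emr_iceberg_configurations(catalog_id, warehouse, catalog="glue_catalog"):
    """Build the EMR configuration list that turns on Iceberg and points
    Spark at a Glue Data Catalog in the account given in catalog_id."""
    prefix = f"spark.sql.catalog.{catalog}"
    return [
        {
            "classification": "iceberg-defaults",
            "properties": {"iceberg.enabled": "true"},
            "configurations": [],
        },
        {
            "classification": "spark-defaults",
            "properties": {
                "spark.jars": "/usr/share/aws/iceberg/lib/iceberg-spark3-runtime.jar",
                "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
                prefix: "org.apache.iceberg.spark.SparkCatalog",
                f"{prefix}.glue.id": catalog_id,
                f"{prefix}.warehouse": warehouse,
                # Assumed implementations: Glue as the catalog, S3 for file I/O.
                f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
                f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
            },
            "configurations": [],
        },
    ]


# Placeholder account ID and bucket; replace them with your own values.
configurations = emr_iceberg_configurations(
    "111122223333", "s3://amzn-s3-demo-bucket/warehouse/"
)
print(json.dumps(configurations, indent=4))
```

You can save the output to a file and supply it as the cluster's software configuration, for example with the --configurations option of the aws emr create-cluster command.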

Related information

Read an Iceberg table from Amazon S3 using Spark

AWS OFFICIAL
Updated a year ago