When you create a table in an Amazon S3 table bucket with the S3 Tables API or the AWS CLI, you cannot specify partition fields in the initial create-table command. The schema definition that create-table accepts only covers each field's name, its type, and whether it is required.
To end up with a partitioned Iceberg table, follow a two-step process:
- First, create the table with its schema using the s3tables create-table command:
aws s3tables create-table --cli-input-json file://mytabledefinition.json
Where your JSON definition would include all your fields:
{
  "tableBucketARN": "arn:aws:s3tables:region:account-id:bucket/your-table-bucket",
  "namespace": "your_namespace",
  "name": "offers_dataset",
  "format": "ICEBERG",
  "metadata": {
    "iceberg": {
      "schema": {
        "fields": [
          {"name": "campaign_id", "type": "int"},
          {"name": "adset_id", "type": "int"},
          {"name": "targeting", "type": "string"},
          {"name": "match_type", "type": "string"},
          {"name": "impressions", "type": "int"},
          {"name": "clicks", "type": "int"},
          {"name": "spent", "type": "double"},
          {"name": "sales", "type": "double"},
          {"name": "sales_count", "type": "double"},
          {"name": "sold_units", "type": "double"},
          {"name": "sold_units_same_seller", "type": "double"},
          {"name": "sales_same_seller", "type": "double"},
          {"name": "sales_count_same_seller", "type": "double"},
          {"name": "advertiser_id", "type": "int"},
          {"name": "y", "type": "int"},
          {"name": "m", "type": "int"},
          {"name": "d", "type": "int"},
          {"name": "h", "type": "int"}
        ]
      }
    }
  }
}
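If you would rather generate the definition file than maintain it by hand, a minimal Python sketch follows. It only builds the same JSON structure shown above and writes it to mytabledefinition.json. The helper name `build_table_definition` and the `COLUMNS` list are illustrative, not part of any AWS SDK, and the ARN and namespace are the same placeholders used above.

```python
import json

# Column names and types copied from the definition above.
COLUMNS = [
    ("campaign_id", "int"), ("adset_id", "int"),
    ("targeting", "string"), ("match_type", "string"),
    ("impressions", "int"), ("clicks", "int"),
    ("spent", "double"), ("sales", "double"),
    ("sales_count", "double"), ("sold_units", "double"),
    ("sold_units_same_seller", "double"), ("sales_same_seller", "double"),
    ("sales_count_same_seller", "double"), ("advertiser_id", "int"),
    ("y", "int"), ("m", "int"), ("d", "int"), ("h", "int"),
]


def build_table_definition(bucket_arn, namespace, name, columns):
    """Hypothetical helper: return a dict shaped like the JSON definition above."""
    return {
        "tableBucketARN": bucket_arn,
        "namespace": namespace,
        "name": name,
        "format": "ICEBERG",
        "metadata": {
            "iceberg": {
                "schema": {
                    "fields": [{"name": n, "type": t} for n, t in columns]
                }
            }
        },
    }


# Placeholder values, exactly as in the example above; replace with your own.
definition = build_table_definition(
    "arn:aws:s3tables:region:account-id:bucket/your-table-bucket",
    "your_namespace",
    "offers_dataset",
    COLUMNS,
)

if __name__ == "__main__":
    # Write the file that the create-table command reads.
    with open("mytabledefinition.json", "w") as fh:
        json.dump(definition, fh, indent=2)
```

Running the script produces the file you then pass to aws s3tables create-table --cli-input-json file://mytabledefinition.json.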
- After creating the table, add the partition fields from an engine that supports Iceberg DDL, such as Apache Spark or an AWS Glue ETL job. For example, with Apache Spark (the Iceberg SQL extensions must be enabled for this statement), you could use:
ALTER TABLE catalog_name.database_name.offers_dataset
ADD PARTITION FIELD advertiser_id
Repeat the statement for each remaining partition field (y, m, d, h).
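To avoid typing one statement per field, the statements can be generated in a loop. The sketch below is one possible way to do that from PySpark. The function names are illustrative, the table identifier is the placeholder from the example above, and `add_partition_fields` assumes you pass in a SparkSession that is already configured with your S3 Tables catalog and the Iceberg SQL extensions.

```python
# Partition fields named in the answer above, in the order they should be added.
PARTITION_FIELDS = ["advertiser_id", "y", "m", "d", "h"]


def partition_statements(table, fields):
    """Build one ALTER TABLE ... ADD PARTITION FIELD statement per field."""
    return [f"ALTER TABLE {table} ADD PARTITION FIELD {field}" for field in fields]


def add_partition_fields(spark, table, fields):
    """Run each statement through spark.sql().

    Assumption: `spark` is an existing SparkSession with the Iceberg SQL
    extensions enabled and the catalog for the table bucket configured.
    """
    for statement in partition_statements(table, fields):
        spark.sql(statement)


# Placeholder identifier from the example above; replace with your own.
statements = partition_statements(
    "catalog_name.database_name.offers_dataset", PARTITION_FIELDS
)
for statement in statements:
    print(statement)
```

Printing the statements first lets you review them before calling `add_partition_fields(spark, ...)` from a Spark or Glue job.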
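Note that adding a partition field to an Iceberg table is a metadata change: data files already in the table keep their existing layout, and only data written afterwards uses the new partition spec. On a freshly created, empty table this costs nothing. If the table already holds data and you want it organized by the new partition structure, you need to rewrite that data, which can be resource-intensive for large datasets.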
