s3tables api - create table with partitions

I have noticed there is no option to specify which fields are partitioned in the api configuration from here:

https://awscli.amazonaws.com/v2/documentation/api/latest/reference/s3tables/create-table.html

The fields only contain name, type and required.

My table schema looks like this:

CREATE TABLE offers_dataset (
  campaign_id int,
  adset_id int,
  targeting string,
  match_type string,
  impressions int,
  clicks int,
  spent double,
  sales double,
  sales_count double,
  sold_units double,
  sold_units_same_seller double,
  sales_same_seller double,
  sales_count_same_seller double,
  advertiser_id int,
  y int,
  m int,
  d int,
  h int
)
PARTITIONED BY (advertiser_id, y, m, d, h)
TBLPROPERTIES ('table_type'='ICEBERG');

asked 5 months ago, 379 views
1 Answer
Accepted Answer

The s3tables create-table command (in the S3 Tables API and the AWS CLI) does not let you specify partition fields. Its schema definition only accepts each field's name, type, and whether the field is required.

To partition the Iceberg table, use a two-step process:

  1. First, create the table with its schema using the s3tables create-table command:
aws s3tables create-table --cli-input-json file://mytabledefinition.json

Here the JSON definition file lists all of your fields, including the ones you intend to partition by:

{
  "tableBucketARN": "arn:aws:s3tables:region:account-id:bucket/your-table-bucket",
  "namespace": "your_namespace",
  "name": "offers_dataset",
  "format": "ICEBERG",
  "metadata": {
    "iceberg": {
      "schema": {
        "fields": [
          {"name": "campaign_id", "type": "int"},
          {"name": "adset_id", "type": "int"},
          {"name": "targeting", "type": "string"},
          {"name": "match_type", "type": "string"},
          {"name": "impressions", "type": "int"},
          {"name": "clicks", "type": "int"},
          {"name": "spent", "type": "double"},
          {"name": "sales", "type": "double"},
          {"name": "sales_count", "type": "double"},
          {"name": "sold_units", "type": "double"},
          {"name": "sold_units_same_seller", "type": "double"},
          {"name": "sales_same_seller", "type": "double"},
          {"name": "sales_count_same_seller", "type": "double"},
          {"name": "advertiser_id", "type": "int"},
          {"name": "y", "type": "int"},
          {"name": "m", "type": "int"},
          {"name": "d", "type": "int"},
          {"name": "h", "type": "int"}
        ]
      }
    }
  }
}
  2. After creating the table, add the partition fields using Apache Spark or an AWS Glue ETL job. For example, in Spark SQL with the Iceberg SQL extensions enabled, you could run:
ALTER TABLE catalog_name.database_name.offers_dataset
ADD PARTITION FIELD advertiser_id

Repeat the statement for each remaining partition field (y, m, d, h).
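
The two steps above can be scripted rather than typed by hand. The following is only a minimal sketch in Python: it builds the same JSON definition from a column list and generates one ALTER TABLE statement per partition field. The helper names build_table_definition and partition_statements are hypothetical, the ARN, namespace, and catalog_name.database_name values are the placeholders from the answer, and the script does not call AWS or Spark itself.

```python
import json

# Column names and types copied from the question's schema.
COLUMNS = [
    ("campaign_id", "int"), ("adset_id", "int"),
    ("targeting", "string"), ("match_type", "string"),
    ("impressions", "int"), ("clicks", "int"),
    ("spent", "double"), ("sales", "double"),
    ("sales_count", "double"), ("sold_units", "double"),
    ("sold_units_same_seller", "double"), ("sales_same_seller", "double"),
    ("sales_count_same_seller", "double"), ("advertiser_id", "int"),
    ("y", "int"), ("m", "int"), ("d", "int"), ("h", "int"),
]

# Partition columns from the question's PARTITIONED BY clause.
PARTITION_FIELDS = ["advertiser_id", "y", "m", "d", "h"]


def build_table_definition(bucket_arn, namespace, name, columns):
    """Return the dict used as the --cli-input-json payload in step 1."""
    return {
        "tableBucketARN": bucket_arn,
        "namespace": namespace,
        "name": name,
        "format": "ICEBERG",
        "metadata": {
            "iceberg": {
                "schema": {
                    "fields": [{"name": n, "type": t} for n, t in columns]
                }
            }
        },
    }


def partition_statements(table, fields):
    """Return one ALTER TABLE ... ADD PARTITION FIELD statement per field (step 2)."""
    return [f"ALTER TABLE {table} ADD PARTITION FIELD {field}" for field in fields]


if __name__ == "__main__":
    definition = build_table_definition(
        "arn:aws:s3tables:region:account-id:bucket/your-table-bucket",  # placeholder
        "your_namespace",                                               # placeholder
        "offers_dataset",
        COLUMNS,
    )
    # Save this output as mytabledefinition.json for the create-table call.
    print(json.dumps(definition, indent=2))

    # Each statement would be run in a Spark session (for example via spark.sql)
    # that is already configured for the table's catalog; that setup is assumed.
    for stmt in partition_statements(
        "catalog_name.database_name.offers_dataset", PARTITION_FIELDS
    ):
        print(stmt)
```

Generating the statements from a list keeps the partition columns in one place, so the JSON schema and the partition spec are less likely to drift apart.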

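Note that in Iceberg, adding a partition field is a metadata-only change: existing data files keep their original layout, and only newly written data follows the new partition spec. If the table already holds data that you want organized under the new partitioning, that data has to be rewritten, which can be resource-intensive for large datasets.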
Sources
Creating an Amazon S3 table - Amazon Simple Storage Service
Build a data lake for streaming data with Amazon S3 Tables and Amazon Data Firehose | AWS Storage Blog
How to add partition fields to Iceberg table | AWS re:Post

answered 5 months ago
