1 Answer
Generally, the _SUCCESS marker is written once per job, not once per partition.
There are two options I can think of:
- Write a custom committer that records the partitions being written to, updates an accumulator, and then has the driver create a marker file in each of those partitions. This could be complex and error-prone.
- Write the files directly to the partition directory (path/to/table/partition_key1=foo/partition_key2=bar) without telling the writer that the output is partitioned. Each partition is then written as its own job, so the job-level marker should land in that partition's directory.
A generally better option is to use a persistent metadata store (such as the AWS Glue Data Catalog) and update the partition metadata only after the write is confirmed complete.
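As a rough illustration of the catalog approach, the sketch below registers one partition with the Glue Data Catalog through boto3 once the write has finished. The helper names (partition_location, partition_input, register_partition) are made up for this example, and it assumes the table already exists in the catalog and that its StorageDescriptor can be reused for the partition with only the Location changed.

```python
from typing import Dict, List


def partition_location(table_root: str, partition: Dict[str, str]) -> str:
    """Build the Hive-style directory for one partition under table_root."""
    suffix = "/".join(f"{key}={value}" for key, value in partition.items())
    return f"{table_root.rstrip('/')}/{suffix}/"


def partition_input(table_root: str, partition: Dict[str, str],
                    storage_descriptor: Dict) -> Dict:
    """Build the PartitionInput argument for Glue's create_partition call.

    Assumption: the partition shares the table's formats and SerDe, so the
    table's StorageDescriptor is copied and only Location is replaced.
    The values must be in the same order as the table's partition keys.
    """
    descriptor = dict(storage_descriptor)
    descriptor["Location"] = partition_location(table_root, partition)
    values: List[str] = list(partition.values())
    return {"Values": values, "StorageDescriptor": descriptor}


def register_partition(database: str, table: str, partition: Dict[str, str]) -> None:
    """Call this only after the write for the partition is confirmed complete."""
    import boto3  # imported lazily so the helpers above work without AWS access

    glue = boto3.client("glue")
    table_def = glue.get_table(DatabaseName=database, Name=table)["Table"]
    descriptor = table_def["StorageDescriptor"]
    # Reorder the values to match the partition key order defined on the table.
    ordered = {key["Name"]: partition[key["Name"]] for key in table_def["PartitionKeys"]}
    glue.create_partition(
        DatabaseName=database,
        TableName=table,
        PartitionInput=partition_input(descriptor["Location"], ordered, descriptor),
    )
```

Readers that go through the catalog then only see partitions that were registered this way, so the catalog entry plays the role of a per-partition completion marker. Note that create_partition fails if the partition is already registered, so a job that rewrites an existing partition would need to handle that case.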
Once the partition metadata is updated, you can use pushdown predicates on the partition columns. The predicate can be any SQL expression or user-defined function, as long as it uses only the partition columns for filtering. Remember that the predicate is applied to the metadata stored in the catalog, so you don’t have access to other fields in the schema.
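A minimal sketch of what such a read could look like in a Glue job follows. The function names are hypothetical; glue_context is assumed to be an awsglue GlueContext created by the job, and the partition values are assumed to be plain strings without quote characters.

```python
from typing import Dict


def pushdown_predicate(partition: Dict[str, str]) -> str:
    """Build a SQL filter that references partition columns only."""
    clauses = []
    for column, value in partition.items():
        if "'" in value:
            # Quoting rules are out of scope for this sketch, so reject such values.
            raise ValueError(f"unsupported quote in partition value: {value!r}")
        clauses.append(f"{column} = '{value}'")
    return " AND ".join(clauses)


def read_partition(glue_context, database: str, table: str, partition: Dict[str, str]):
    """Read only the partitions that the catalog reports as matching the filter."""
    return glue_context.create_dynamic_frame.from_catalog(
        database=database,
        table_name=table,
        push_down_predicate=pushdown_predicate(partition),
    )
```

Because the filter is evaluated against catalog metadata, a partition that was written but never registered in the catalog would not be returned by this read.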
Answered 2 years ago