Generally, the _SUCCESS marker is written once per job, in the job's output directory, not once per partition.
There are two options I can think of:
- Write a custom output committer that records the partitions being written to (for example, by updating an accumulator), and then have the driver create a _SUCCESS file in each of those partitions. This could be complex and error-prone.
- Write the files directly to the partition directory, e.g. path/to/table/partition_key1=foo/partition_key2=bar, without telling the writer that the output is partitioned. Because each write then targets a single partition directory, its _SUCCESS marker lands in that partition.
A generally better option is to use a persistent metadata store (such as the AWS Glue Data Catalog) and update the partition metadata only after the write is confirmed complete.
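The second option, combined with registering the partition only once it is complete, can be sketched with the Python standard library alone. This is a hypothetical local-filesystem simulation, not Spark or Glue code: the CSV file stands in for a real data file, partition_path and write_partition are illustrative names, and the catalog list is an assumed in-memory stand-in for a metadata store such as the Glue Data Catalog.

```python
import os
import tempfile


def partition_path(root, partition):
    """Build a Hive-style directory such as root/partition_key1=foo/partition_key2=bar."""
    parts = [f"{key}={value}" for key, value in partition.items()]
    return os.path.join(root, *parts)


def write_partition(root, partition, rows, catalog):
    """Write one partition's rows, mark it complete, then register it in the catalog."""
    path = partition_path(root, partition)
    os.makedirs(path, exist_ok=True)
    # Stand-in for the real data file (Parquet, ORC, ...) a Spark job would write.
    with open(os.path.join(path, "part-00000.csv"), "w") as f:
        for row in rows:
            f.write(",".join(str(v) for v in row) + "\n")
    # Per-partition completion marker, analogous to the job-level _SUCCESS file.
    open(os.path.join(path, "_SUCCESS"), "w").close()
    # Publish the partition metadata only after the write has finished.
    catalog.append({"values": dict(partition), "location": path})
    return path


if __name__ == "__main__":
    catalog = []  # hypothetical in-memory stand-in for a metastore
    with tempfile.TemporaryDirectory() as root:
        p = write_partition(
            root,
            {"partition_key1": "foo", "partition_key2": "bar"},
            [(1, "a"), (2, "b")],
            catalog,
        )
        print(os.path.relpath(p, root))
        print(sorted(os.listdir(p)))
```

In a real job, the catalog step would be a metastore call (for example, adding the partition to the Glue Data Catalog) issued only after the write has succeeded, so readers never see a half-written partition.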
Once the partition metadata is updated, you can use predicate pushdown on the partition columns. The predicate can be any SQL expression or user-defined function, as long as it uses only the partition columns for filtering. Remember that it is applied to the metadata stored in the catalog, so you don't have access to the other fields in the schema.
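To illustrate why only partition columns are usable, here is a small partition-pruning sketch. In AWS Glue the predicate is passed as a Spark SQL string, e.g. push_down_predicate="partition_key1 == 'foo'" on glueContext.create_dynamic_frame.from_catalog; this stdlib-only version swaps in a Python callable instead, and prune_partitions plus the catalog entries are hypothetical. The predicate receives only the partition values stored in the metadata, so it cannot refer to any other column.

```python
from typing import Callable, Dict, List

Partition = Dict[str, str]


def prune_partitions(catalog: List[Dict], predicate: Callable[[Partition], bool]) -> List[str]:
    """Return locations of partitions whose metadata satisfies the predicate.

    The predicate only sees partition-column values from the catalog; other
    fields of the table schema are not part of the metadata, so they are
    unavailable here.
    """
    return [entry["location"] for entry in catalog if predicate(entry["values"])]


# Hypothetical catalog entries; partition values are stored as strings.
catalog = [
    {"values": {"partition_key1": "foo", "partition_key2": "bar"},
     "location": "path/to/table/partition_key1=foo/partition_key2=bar"},
    {"values": {"partition_key1": "foo", "partition_key2": "baz"},
     "location": "path/to/table/partition_key1=foo/partition_key2=baz"},
    {"values": {"partition_key1": "qux", "partition_key2": "bar"},
     "location": "path/to/table/partition_key1=qux/partition_key2=bar"},
]

selected = prune_partitions(
    catalog, lambda p: p["partition_key1"] == "foo" and p["partition_key2"] != "baz"
)
print(selected)  # only the foo/bar location
```

Only the matching partition locations would then be read, which is what makes filtering on partition metadata cheap compared with scanning the data files.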
answered 2 years ago