1 Answer
By default, the UNLOAD command writes data in parallel and produces one or more files per slice in the cluster. A 4-node DC2.8xlarge cluster has 64 slices (4 nodes * 16 slices per node), so a default UNLOAD on that cluster produces on the order of 64 files. This default keeps all the slices working in parallel. When unloading in Parquet format, Redshift tries to write the files in 32 MB row groups. For smaller data volumes, where a 32 MB chunk is more than enough to hold the data, it generates smaller files. Multiple files are more efficient than a single file: in the latter case Redshift has to combine the data from the table and then generate one file, which makes poor use of the parallel compute nodes.
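The slice arithmetic above can be written out as a small Python sketch. The function name `total_slices` is hypothetical, and the 16 slices per node figure is the DC2.8xlarge value quoted in this answer; other node types have different slice counts.

```python
def total_slices(nodes: int, slices_per_node: int) -> int:
    """Return the number of slices in a cluster.

    With the default parallel UNLOAD, this is roughly the number of
    files to expect when every slice holds some of the data.
    """
    if nodes < 1 or slices_per_node < 1:
        raise ValueError("nodes and slices_per_node must be positive")
    return nodes * slices_per_node


# 4-node DC2.8xlarge cluster, 16 slices per node (figure from the answer)
print(total_slices(nodes=4, slices_per_node=16))
```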
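One way to get files of a more uniform size is to run UNLOAD with the options PARALLEL OFF and MAXFILESIZE 1 GB. With PARALLEL OFF, Redshift writes the output serially and starts a new file whenever the size limit is reached, so every file except possibly the last is close to the limit.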
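As a rough illustration of that suggestion, the sketch below assembles an UNLOAD statement with PARALLEL OFF and MAXFILESIZE. The helper `build_unload_sql`, the table name, the S3 prefix, and the IAM role ARN are all hypothetical placeholders, and the size bounds in the check (5 MB to 6.2 GB) reflect my understanding of the documented MAXFILESIZE limits rather than anything stated in this answer.

```python
def build_unload_sql(
    query: str,
    s3_prefix: str,
    iam_role_arn: str,
    max_file_size_gb: float = 1.0,
) -> str:
    """Build a serial Parquet UNLOAD statement with a per-file size cap."""
    # Assumed MAXFILESIZE range: 5 MB to 6.2 GB.
    if not 0.005 <= max_file_size_gb <= 6.2:
        raise ValueError("max_file_size_gb must be between 0.005 and 6.2")
    # The query is passed as a quoted string, so embedded single
    # quotes are doubled.
    escaped_query = query.replace("'", "''")
    return (
        f"UNLOAD ('{escaped_query}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET\n"
        "PARALLEL OFF\n"
        f"MAXFILESIZE {max_file_size_gb:g} GB;"
    )


# Placeholder values for illustration only.
sql = build_unload_sql(
    query="SELECT * FROM sales WHERE region = 'EU'",
    s3_prefix="s3://example-bucket/unload/sales_",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(sql)
```

The generated statement can then be run through any SQL client connected to the cluster. The trade-off is speed: a serial unload gives up the per-slice parallelism described above in exchange for fewer, more evenly sized files.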
answered 4 years ago