1 Answer
Hello,
Yes, you are right: INSERT INTO is not yet supported for bucketed tables. For your use case, where you want to control the number of buckets and the file sizes, Athena bucketing would be appropriate, but with the drawback that you cannot use INSERT INTO to add new incoming data.
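For context, a bucketed table in Athena is typically created with a CTAS statement. The sketch below only builds the SQL text in Python; the table names, column name, S3 location, and bucket count are hypothetical placeholders, and the Parquet format is an assumption rather than something taken from your setup.

```python
def build_bucketed_ctas(new_table, source_table, external_location,
                        bucket_column, bucket_count):
    """Return an Athena CTAS statement that writes a bucketed Parquet table.

    All arguments are illustrative placeholders supplied by the caller.
    """
    return (
        f"CREATE TABLE {new_table}\n"
        "WITH (\n"
        "  format = 'PARQUET',\n"
        f"  external_location = '{external_location}',\n"
        f"  bucketed_by = ARRAY['{bucket_column}'],\n"
        f"  bucket_count = {bucket_count}\n"
        ")\n"
        f"AS SELECT * FROM {source_table}"
    )


# Hypothetical names, for illustration only.
query = build_bucketed_ctas(
    new_table="events_bucketed",
    source_table="events_raw",
    external_location="s3://example-bucket/events_bucketed/",
    bucket_column="user_id",
    bucket_count=16,
)
print(query)
```

Each run of such a statement writes a fixed set of bucket files, which is why later INSERT INTO calls against the same table are the sticking point.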
As a workaround, I recommend using the S3DistCp utility on Amazon EMR to merge the small files into objects of roughly 128 MB, which addresses your small-file problem. S3DistCp can also move large amounts of data in an optimized fashion from HDFS to Amazon S3, from Amazon S3 to Amazon S3, and from Amazon S3 to HDFS.
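As a rough sketch of what such a merge job could look like, the Python snippet below builds an EMR step definition that runs `s3-dist-cp` through `command-runner.jar`. The bucket paths, step name, and the `--groupBy` pattern are hypothetical; `--targetSize` is given in MiB, and the regular expression needs a capture group, since files whose captured value matches are combined together.

```python
def build_s3distcp_step(src, dest, group_by, target_size_mib=128):
    """Return an EMR step definition (a plain dict) that runs s3-dist-cp.

    src, dest and group_by are caller-supplied placeholders; nothing is
    submitted to AWS by this function.
    """
    return {
        "Name": "merge-small-files",  # hypothetical step name
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "s3-dist-cp",
                f"--src={src}",
                f"--dest={dest}",
                f"--groupBy={group_by}",           # regex with one capture group
                f"--targetSize={target_size_mib}",  # target output size in MiB
            ],
        },
    }


# Hypothetical paths: group files by their dt=YYYY-MM-DD partition folder.
step = build_s3distcp_step(
    src="s3://example-bucket/raw/",
    dest="s3://example-bucket/merged/",
    group_by=r".*/(dt=[0-9-]+)/.*",
)
print(step["HadoopJarStep"]["Args"])
```

The resulting dict can be passed to a running cluster, for example with boto3's `add_job_flow_steps(JobFlowId=..., Steps=[step])`. Note that S3DistCp concatenates files as they are, so this approach fits formats such as plain text or CSV; columnar formats like Parquet would most likely need to be rewritten by a query engine instead.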
REFERENCES:
https://docs.aws.amazon.com/emr/latest/ReleaseGuide/UsingEMR_s3distcp.html