S3 Metadata Query across date folder or hundred thousand folders

I have a use case where I need to run a batch EMR job on a daily schedule. I can either create date-based folders for the data coming from my IoT devices, or create one folder per device and store the date as S3 object metadata. In the second case, the daily batch job would have to query S3 metadata across all 100,000 device folders. I am planning to go with date-wise folders. Is this the right approach?

1 Answer
Hello,

The decision between using date-wise folders or device-wise folders with date metadata for your IoT data storage in S3 depends on several factors, including your data access patterns, data volume, and performance requirements.

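With date-wise folders, your daily batch job can locate the data for a specific date simply by reading the corresponding folder. S3 "folders" are key prefixes, so the job only needs to list one prefix per run. With device-wise folders, the date lives in object metadata, which a prefix listing does not return, so the job would generally have to inspect objects across all 100,000 folders to find the ones for that day. The date-wise layout therefore keeps the job's logic simpler and more efficient.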
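As a rough illustration of the date-wise approach, here is a minimal Python sketch. It assumes boto3 is used on the job side; the `iot-data` root, the `dt=` prefix naming, and both function names are hypothetical and only serve the example. The S3 client is passed in so the listing logic stays easy to test.

```python
from datetime import date


def date_prefix(day: date, root: str = "iot-data") -> str:
    """Build the key prefix for one day, e.g. 'iot-data/dt=2024-05-01/'.

    The root name and the 'dt=' convention are assumptions for this sketch.
    """
    return f"{root}/dt={day.isoformat()}/"


def list_keys_for_day(s3_client, bucket: str, day: date, root: str = "iot-data") -> list:
    """Return every object key stored under a single day's prefix.

    s3_client is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    The paginator follows continuation tokens, so listings with more than
    1,000 objects are handled as well.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=date_prefix(day, root)):
        # "Contents" is absent from a page when nothing matches the prefix.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys
```

With this layout the daily job touches one prefix. In the device-wise layout, the equivalent lookup would need a listing per device folder plus a `head_object` call per object to read the date from its metadata.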

Keep in mind that this approach may not be optimal if you frequently need to access data by device ID, or if you have other use cases that require device-specific partitioning. Also make sure you follow best practices for S3 folder structure and object naming to keep performance and manageability in check. Finally, consider AWS Glue and Amazon Athena, which can simplify querying and processing: a Glue crawler can infer partitions from the folder structure, and Athena can then restrict a query to only the partitions it needs.
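If you also expect device-level access, one option is to keep the date as the first folder level and add the device ID as a second level. The sketch below is only an illustration under that assumption: it uses Hive-style `key=value` folder names, which Glue crawlers can pick up as partition columns, and the `iot-data` root, the `dt` and `device` names, and both helper functions are hypothetical.

```python
from datetime import date


def partitioned_key(day: date, device_id: str, filename: str, root: str = "iot-data") -> str:
    """Build an object key with date first and device second.

    Example: 'iot-data/dt=2024-05-01/device=sensor-42/part-0000.json'
    """
    return f"{root}/dt={day.isoformat()}/device={device_id}/{filename}"


def parse_partitions(key: str) -> dict:
    """Extract the key=value folder segments from an object key."""
    partitions = {}
    # Skip the last segment, which is the file name rather than a folder.
    for segment in key.split("/")[:-1]:
        if "=" in segment:
            name, value = segment.split("=", 1)
            partitions[name] = value
    return partitions
```

A daily job would still read a single `dt=` prefix, while a device-specific query can narrow further to one `device=` folder within that day.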

AWS
SUPPORT ENGINEER
answered 5 months ago