Hello.
If the number of objects runs into the hundreds of millions, retrieving them all with "list-objects" would be impractical.
Instead, why not use S3 Batch Operations, which work from a manifest file?
https://docs.aws.amazon.com/AmazonS3/latest/userguide/batch-ops-create-job.html
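As a rough sketch of what the linked guide describes (the account ID, role ARN, bucket names, manifest location, and Lambda function here are all placeholders, not values from the question), the request for a Batch Operations job could be assembled like this and then passed to the `create_job` API of the `s3control` client:

```python
def build_batch_job_request(account_id, role_arn, manifest_bucket,
                            manifest_key, manifest_etag):
    """Assemble the arguments for s3control.create_job (placeholder values).

    The manifest is typically an S3 Inventory report or a CSV of
    bucket/key pairs listing the objects the job should process.
    """
    return {
        "AccountId": account_id,
        "ConfirmationRequired": False,
        "Operation": {
            # Example operation: invoke a Lambda function once per object
            "LambdaInvoke": {
                "FunctionArn": (
                    f"arn:aws:lambda:us-east-1:{account_id}"
                    ":function:process-object"
                )
            }
        },
        "Manifest": {
            "Spec": {
                "Format": "S3BatchOperations_CSV_20180820",
                "Fields": ["Bucket", "Key"],
            },
            "Location": {
                "ObjectArn": f"arn:aws:s3:::{manifest_bucket}/{manifest_key}",
                "ETag": manifest_etag,
            },
        },
        "Report": {
            "Bucket": f"arn:aws:s3:::{manifest_bucket}",
            "Format": "Report_CSV_20180820",
            "Enabled": True,
            "ReportScope": "FailedTasksOnly",
            "Prefix": "batch-reports",
        },
        "Priority": 10,
        "RoleArn": role_arn,
    }

# The actual call would then be (sketch):
# import boto3
# s3control = boto3.client("s3control")
# response = s3control.create_job(**build_batch_job_request(...))
```

The Lambda operation shown is just one option; Batch Operations also supports built-in operations such as copy or tagging.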
The error is most likely caused by the IAM temporary credentials exceeding their session duration.
As suggested in the Stack Overflow answer below, using an IAM user's long-lived access keys might resolve the issue.
However, listing billions of objects via the CLI would still take a considerable amount of time, so I would recommend S3 Batch Operations instead.
https://stackoverflow.com/questions/62685910/retrieval-of-file-from-s3-fails-with-a-the-provided-token-has-expired-same-file
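To illustrate the failure mode: temporary STS credentials carry an expiration timestamp, and a listing job over billions of objects can easily outlive it. The hypothetical helper below (stdlib only, not part of boto3) shows the kind of check a long-running script would need to perform before each batch of API calls:

```python
from datetime import datetime, timedelta, timezone

def needs_refresh(expiration, margin_minutes=5):
    """Return True if temporary credentials expire within the safety margin.

    `expiration` is the timezone-aware datetime reported with the STS
    credentials. A long bulk listing should refresh its credentials
    (or switch to long-lived IAM user keys) once this returns True,
    otherwise subsequent calls fail with "The provided token has expired".
    """
    return datetime.now(timezone.utc) >= expiration - timedelta(minutes=margin_minutes)
```

Long-lived IAM user access keys avoid the problem entirely, at the cost of managing a static secret.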
Hello,
Use AWS managed services such as Lambda with S3 Inventory, or S3 Batch Operations.
S3 Batch Operations are ideal for this:
- They can process many objects at once, which is far more efficient than handling them individually.
See: https://docs.aws.amazon.com/AmazonS3/latest/userguide/batch-ops-create-job.html
AWS Lambda with S3 Inventory
Create an S3 Inventory:
- Configure S3 Inventory to generate a manifest file listing all objects in your bucket, including their sizes.
- Choose a suitable frequency for inventory generation based on your data update rate.
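As a sketch of the configuration step above (bucket names and account ID are placeholders), the payload passed to the S3 client's `put_bucket_inventory_configuration` call could look like this; `Size` is included in the optional fields so the report can be used to find the largest object:

```python
def build_inventory_config(destination_bucket_arn, account_id):
    """Inventory configuration for put_bucket_inventory_configuration.

    All names are placeholders; adjust the schedule to your data
    update rate.
    """
    return {
        "Id": "all-objects-sizes",
        "IsEnabled": True,
        "IncludedObjectVersions": "Current",
        "Schedule": {"Frequency": "Daily"},  # or "Weekly"
        "Destination": {
            "S3BucketDestination": {
                "AccountId": account_id,
                "Bucket": destination_bucket_arn,
                "Format": "CSV",
                "Prefix": "inventory",
            }
        },
        "OptionalFields": ["Size"],  # object size is the field we need
    }

# Usage (sketch):
# import boto3
# s3 = boto3.client("s3")
# s3.put_bucket_inventory_configuration(
#     Bucket="your-bucket-name",
#     Id="all-objects-sizes",
#     InventoryConfiguration=build_inventory_config(
#         "arn:aws:s3:::your-inventory-destination", "111122223333"),
# )
```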
Process Inventory with Lambda:
- Create a Lambda function that is triggered when a new inventory report is delivered to the destination bucket.
- The function processes the inventory file, identifies the object with the maximum size, and stores the result in a DynamoDB table or another desired location.
Example code:
import boto3

def lambda_handler(event, context):
    s3 = boto3.client('s3')
    bucket_name = 'your-bucket-name'
    max_size = 0
    largest_object = None
    # Paginate so buckets with more than 1,000 objects are fully scanned
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket_name):
        # 'Contents' is absent from pages of an empty result
        for obj in page.get('Contents', []):
            if obj['Size'] > max_size:
                max_size = obj['Size']
                largest_object = obj['Key']
    return {
        'largest_object': largest_object,
        'size': max_size
    }
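Note that the example above lists objects directly with list_objects_v2; an inventory-driven variant would instead parse the CSV report that S3 Inventory delivers. A minimal sketch, assuming a CSV report whose columns are bucket, key, size (i.e. an inventory with only the Size optional field, no header row):

```python
import csv
import io

def largest_from_inventory(csv_text):
    """Scan an S3 Inventory CSV report and return (key, size) of the largest object.

    Assumes columns: bucket, key, size, with no header row.
    """
    largest_key, max_size = None, 0
    for row in csv.reader(io.StringIO(csv_text)):
        key, size = row[1], int(row[2])
        if size > max_size:
            max_size = size
            largest_key = key
    return largest_key, max_size
```

Because the inventory report is read sequentially from S3 rather than built from paginated API calls, this scales to buckets where live listing would time out.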