Hello,
Please find a possible solution below; I hope it helps.
To store PDF records cheaply in Amazon S3 Glacier and still be able to find and retrieve them occasionally, you can combine a few AWS services:

1. Upload the PDFs to an S3 bucket and add a lifecycle rule that transitions them to an S3 Glacier storage class (for example S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive) for long-term storage. Scope the rule to the PDF prefix so that the metadata files from step 2 stay in S3 Standard, because S3 Select cannot query archived objects.
2. To make the records searchable by fields such as name, address, or phone number, use Amazon Textract to extract the text when each PDF is uploaded, before it is archived. An AWS Lambda function triggered by the S3 upload event can call Textract, pick out the relevant fields, and save them as a JSON file under a separate prefix in the same bucket.
3. To search, use Amazon S3 Select to query the JSON metadata directly in S3, so you don't need to set up a database. S3 Select runs against one object per request, so searching many metadata files means one request per file.
4. When you need a specific PDF, submit a restore request. S3 then makes a temporary copy of the archived object available for the number of days you specify, and you can download it during that window; the original stays in its Glacier storage class.

This approach keeps storage costs low and keeps storing and retrieving records for audits simple.
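The storage, indexing, and search steps above can be sketched in Python with boto3. This is a minimal sketch under stated assumptions, not a tested production setup: the `pdfs/` and `metadata/` prefixes, the assumption that each PDF contains lines like `Name: ...`, `Address: ...`, `Phone: ...`, and all helper names are hypothetical. The AWS clients are passed in as arguments so the logic can be exercised without an AWS account. The synchronous Textract call `detect_document_text` only handles single-page documents; multi-page PDFs would need the asynchronous `start_document_text_detection` API instead.

```python
import json
import re
import urllib.parse

# Hypothetical bucket layout: PDFs under pdfs/, extracted metadata under metadata/.
PDF_PREFIX = "pdfs/"
METADATA_PREFIX = "metadata/"
SEARCHABLE_FIELDS = ("name", "address", "phone")

# Hypothetical PDF layout: assumes lines such as "Name: Jane Doe" or "Phone: 555-0100".
FIELD_PATTERN = re.compile(r"^(name|address|phone)\s*:\s*(.+)$", re.IGNORECASE)


def lifecycle_configuration(days_before_archive=30):
    """Archive only the PDFs; metadata JSON stays in S3 Standard so S3 Select can read it."""
    return {
        "Rules": [
            {
                "ID": "archive-pdfs",
                "Filter": {"Prefix": PDF_PREFIX},
                "Status": "Enabled",
                # "GLACIER" is S3 Glacier Flexible Retrieval; "DEEP_ARCHIVE" is cheaper but slower.
                "Transitions": [{"Days": days_before_archive, "StorageClass": "GLACIER"}],
            }
        ]
    }


def extract_fields(lines):
    """Pick labelled fields out of the text lines returned by Textract."""
    fields = {}
    for line in lines:
        match = FIELD_PATTERN.match(line.strip())
        if match:
            fields[match.group(1).lower()] = match.group(2).strip()
    return fields


def handle_upload(event, s3, textract):
    """Process S3 ObjectCreated events for new PDFs and write one JSON file per PDF."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Keys in S3 event notifications are URL-encoded.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # Synchronous call: single-page documents only.
        response = textract.detect_document_text(
            Document={"S3Object": {"Bucket": bucket, "Name": key}}
        )
        lines = [b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE"]
        metadata = {"pdf_key": key, **extract_fields(lines)}
        stem = key.removeprefix(PDF_PREFIX).rsplit(".", 1)[0]
        s3.put_object(
            Bucket=bucket,
            Key=f"{METADATA_PREFIX}{stem}.json",
            Body=json.dumps(metadata).encode("utf-8"),
            ContentType="application/json",
        )


def lambda_handler(event, context):
    import boto3  # bundled with the AWS Lambda Python runtime

    handle_upload(event, boto3.client("s3"), boto3.client("textract"))


def build_select_sql(field, value):
    if field not in SEARCHABLE_FIELDS:
        raise ValueError(f"unsupported field: {field}")
    escaped = value.replace("'", "''")  # escape quotes inside the SQL string literal
    return f"SELECT s.pdf_key FROM S3Object s WHERE s.{field} = '{escaped}'"


def search_metadata(s3, bucket, field, value):
    """Run one S3 Select query per metadata file and collect matching PDF keys."""
    sql = build_select_sql(field, value)
    matches = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=METADATA_PREFIX):
        for obj in page.get("Contents", []):
            response = s3.select_object_content(
                Bucket=bucket,
                Key=obj["Key"],
                ExpressionType="SQL",
                Expression=sql,
                InputSerialization={"JSON": {"Type": "DOCUMENT"}},
                OutputSerialization={"JSON": {}},
            )
            # The payload is an event stream; only "Records" events carry result bytes.
            payload = b"".join(
                e["Records"]["Payload"] for e in response["Payload"] if "Records" in e
            )
            for row in payload.decode("utf-8").splitlines():
                if row:
                    matches.append(json.loads(row)["pdf_key"])
    return matches
```

For example, `s3.put_bucket_lifecycle_configuration(Bucket="my-records-bucket", LifecycleConfiguration=lifecycle_configuration())` applies the rule, and `search_metadata(boto3.client("s3"), "my-records-bucket", "phone", "555-0100")` returns the keys of matching PDFs (the bucket name is a placeholder). Configure the S3 event notification with a `pdfs/` prefix filter so the function is not re-triggered by the JSON files it writes. Because S3 Select needs one request per file, searching gets slow as the number of records grows; at that point Amazon Athena, which can query every file under the prefix with a single SQL statement, is likely the better fit.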
For more information about the restore (retrieval) options, see the AWS documentation:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/restoring-objects-retrieval-options.html
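The restore-and-download step can be sketched the same way. This is a minimal sketch assuming the boto3 S3 client; the helper names and the returned status strings are hypothetical. The retrieval tier trades speed for cost (Expedited is fastest, Bulk is cheapest), and Expedited is not offered for S3 Glacier Deep Archive.

```python
VALID_TIERS = ("Expedited", "Standard", "Bulk")


def restore_request(days=7, tier="Standard"):
    """Build the RestoreRequest: keep the restored copy for `days` days."""
    if tier not in VALID_TIERS:
        raise ValueError(f"unknown retrieval tier: {tier}")
    return {"Days": days, "GlacierJobParameters": {"Tier": tier}}


def request_restore(s3, bucket, key, days=7, tier="Standard"):
    """Ask S3 to create a temporary, readable copy of an archived object."""
    s3.restore_object(Bucket=bucket, Key=key, RestoreRequest=restore_request(days, tier))


def restore_status(s3, bucket, key):
    """Interpret the Restore header returned by HeadObject."""
    restore = s3.head_object(Bucket=bucket, Key=key).get("Restore")
    if restore is None:
        return "not-restored"  # never requested, or the temporary copy has expired
    if 'ongoing-request="true"' in restore:
        return "in-progress"
    return "available"  # ongoing-request="false": the temporary copy can be downloaded


def download_url(s3, bucket, key, expires_in=3600):
    """Presigned GET URL for the restored copy, valid for expires_in seconds."""
    return s3.generate_presigned_url(
        "get_object", Params={"Bucket": bucket, "Key": key}, ExpiresIn=expires_in
    )
```

A typical flow is to call `request_restore`, poll `restore_status` until it returns `"available"`, and then download the PDF with `get_object` or share `download_url`. S3 rejects a second restore request while one is still in progress, so checking the status first avoids that error.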
Thanks for the quick reply and detailed answer. I was wondering whether Athena could do the job, but the solution shared in the answer seems straightforward and very useful. Thanks very much.