I understand you're having trouble processing a 50MB TIFF file with Amazon Textract. Let me explain why this might be happening and suggest some solutions.
Amazon Textract has specific file size and page count limitations that you need to be aware of:
For synchronous operations:
- JPEG, PNG, PDF, and TIFF files have a limit of 10 MB in memory.
- PDF and TIFF files are limited to 1 page.
For asynchronous operations:
- JPEG and PNG files have a limit of 10 MB in memory.
- PDF and TIFF files have a limit of 500 MB in memory and 3,000 pages.
Your 50 MB TIFF file exceeds the 10 MB limit for synchronous operations, which is likely why it wasn't processed. A multi-page TIFF would also be rejected by the synchronous 1-page limit. You should, however, be able to process it with an asynchronous operation, because it is well within the 500 MB limit for asynchronous processing of TIFF files.
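The limits above, and the reasoning applied to your 50 MB file, can be expressed as a small pre-flight check. This is a minimal sketch: `choose_mode` and its return labels are hypothetical names, the thresholds are copied from the list above, and treating 1 MB as 1,048,576 bytes is an assumption.

```python
# Thresholds copied from the limits listed above.
# Assumption: "MB" is treated as 1024 * 1024 bytes.
MB = 1024 * 1024
SYNC_MAX_BYTES = 10 * MB        # synchronous: JPEG, PNG, PDF, TIFF
ASYNC_DOC_MAX_BYTES = 500 * MB  # asynchronous: PDF, TIFF
ASYNC_DOC_MAX_PAGES = 3000      # asynchronous: PDF, TIFF


def choose_mode(fmt: str, size_bytes: int, pages: int = 1) -> str:
    """Hypothetical helper: return 'sync', 'async', 'split', or 'too_large'."""
    fmt = fmt.lower().lstrip(".")
    if fmt in ("jpg", "jpeg", "png"):
        # Images are capped at 10 MB for both synchronous and asynchronous calls.
        return "sync" if size_bytes <= SYNC_MAX_BYTES else "too_large"
    if fmt in ("pdf", "tif", "tiff"):
        # Synchronous calls accept only single-page PDF/TIFF files up to 10 MB.
        if pages == 1 and size_bytes <= SYNC_MAX_BYTES:
            return "sync"
        # Asynchronous calls accept up to 500 MB and 3,000 pages.
        if size_bytes <= ASYNC_DOC_MAX_BYTES and pages <= ASYNC_DOC_MAX_PAGES:
            return "async"
        # Anything larger has to be split first.
        return "split"
    raise ValueError(f"unsupported format: {fmt}")
```

For example, `choose_mode("tiff", 50 * MB)` returns `"async"`, matching the conclusion above.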
To process your 50MB TIFF file, you'll need to use Amazon Textract's asynchronous operations. Here are the steps you can follow:
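- Upload the TIFF file to an Amazon S3 bucket. Asynchronous operations read the document from S3; they don't accept file bytes in the request.
- Use the StartDocumentTextDetection or StartDocumentAnalysis API (depending on your needs) to start the asynchronous processing.
- These operations return a JobId that you can use to check the status of the job.
- Once the job is complete, use the GetDocumentTextDetection or GetDocumentAnalysis API with the JobId to retrieve the results. Results are paginated, so keep passing the NextToken from each response until none is returned.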
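The asynchronous flow for text detection can be sketched as follows. This is a minimal sketch, assuming `client` is a boto3 Textract client; `detect_text_async`, the polling interval, and the timeout are illustrative choices, not part of the Textract API. The client is passed in rather than created inside the function so the logic can be exercised without AWS credentials.

```python
import time


def detect_text_async(client, bucket, key, poll_seconds=5.0, timeout_seconds=1800.0):
    """Hypothetical helper: run an async text-detection job and return all blocks.

    `client` is expected to behave like boto3.client("textract").
    """
    # Asynchronous operations read the document from S3.
    response = client.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    job_id = response["JobId"]

    # Poll until the job is no longer IN_PROGRESS (or we give up).
    deadline = time.monotonic() + timeout_seconds
    while True:
        result = client.get_document_text_detection(JobId=job_id)
        status = result["JobStatus"]
        if status != "IN_PROGRESS":
            break
        if time.monotonic() > deadline:
            raise TimeoutError(f"Textract job {job_id} is still running")
        time.sleep(poll_seconds)

    if status == "FAILED":
        raise RuntimeError(f"Textract job {job_id} failed: {result.get('StatusMessage')}")

    # Results are paginated: follow NextToken until it is absent.
    blocks = list(result.get("Blocks", []))
    next_token = result.get("NextToken")
    while next_token:
        page = client.get_document_text_detection(JobId=job_id, NextToken=next_token)
        blocks.extend(page.get("Blocks", []))
        next_token = page.get("NextToken")
    return blocks
```

Usage might look like `blocks = detect_text_async(boto3.client("textract"), "my-bucket", "scan.tiff")` (bucket and key are placeholders), followed by `[b["Text"] for b in blocks if b["BlockType"] == "LINE"]` to get the detected lines.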
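You can also have Amazon Textract publish a completion notification to an Amazon SNS topic (and subscribe an Amazon SQS queue to that topic if you want to consume it from a queue), so you are alerted when the job finishes instead of polling for the results.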
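The Start* operations accept a `NotificationChannel` containing an SNS topic ARN and the ARN of an IAM role that allows Textract to publish to that topic. The sketch below assumes a boto3 Textract client and an SQS queue subscribed to the topic with the default (non-raw) delivery, where the Textract notification is a JSON string inside the SNS envelope's `Message` field. `start_with_notification` and `parse_completion_message` are hypothetical helper names.

```python
import json


def start_with_notification(client, bucket, key, topic_arn, role_arn):
    """Hypothetical helper: start a job that reports completion to SNS."""
    response = client.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        # Textract assumes role_arn to publish the job status to topic_arn.
        NotificationChannel={"SNSTopicArn": topic_arn, "RoleArn": role_arn},
    )
    return response["JobId"]


def parse_completion_message(sqs_body):
    """Hypothetical helper: extract (JobId, Status) from an SQS message body.

    Assumes non-raw SNS-to-SQS delivery: the SQS body is the SNS envelope,
    and its "Message" field holds the Textract notification as a JSON string.
    """
    envelope = json.loads(sqs_body)
    notification = json.loads(envelope["Message"])
    return notification["JobId"], notification["Status"]
```

When the parsed status is `SUCCEEDED`, you can call GetDocumentTextDetection with the returned JobId to fetch the results.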
If your document is very large (over 3,000 pages), you'll need to split it before processing. Amazon Textract provides sample code for a DocumentSplitter that can be configured to split documents that exceed 3,000 pages.
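If you split a document yourself, the first step is deciding the page ranges. The sketch below is a hypothetical helper, not the DocumentSplitter sample: it only computes 1-based page ranges of at most 3,000 pages. Extracting those pages from the TIFF would need an imaging library (for example, Pillow), and each resulting file must also stay under the 500 MB limit.

```python
def plan_page_chunks(page_count, max_pages=3000):
    """Hypothetical helper: 1-based inclusive (first, last) ranges of <= max_pages pages."""
    if page_count < 1 or max_pages < 1:
        raise ValueError("page_count and max_pages must be positive")
    return [
        (first, min(first + max_pages - 1, page_count))
        for first in range(1, page_count + 1, max_pages)
    ]
```

For a 7,000-page document, `plan_page_chunks(7000)` returns `[(1, 3000), (3001, 6000), (6001, 7000)]`, and each range can then be submitted as its own asynchronous job.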
Remember to ensure you have the necessary permissions (AmazonTextractFullAccess) and that your AWS CLI and SDKs are properly set up before attempting to process the file.
I uploaded a .tiff file which is 50MB. But Amazon Textract didn't process it. Please help me.