1 Answer
Hello,
For asynchronous operations such as StartDocumentTextDetection, Amazon Textract requires the input document to be stored in an Amazon S3 bucket. The asynchronous API does not accept byte arrays or URLs as input. With that in mind, you have the following options:
- Upload the document to an S3 bucket before processing (the simplest option, if possible).
- If your documents are relatively small (under 5 MB) and you don't need the scalability of asynchronous processing, use the synchronous API (DetectDocumentText). It accepts byte arrays, so you can process documents without storing them in S3.
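To make the two options concrete, here is a minimal sketch using boto3. The function names, the `MAX_INLINE_BYTES` constant, and the optional client parameters are my own illustrative choices, not part of any AWS API. It assumes your credentials and region are already configured and that the bucket exists.

```python
MAX_INLINE_BYTES = 5 * 1024 * 1024  # 5 MB limit for inline bytes mentioned above


def _client(service_name):
    # Imported lazily so the helpers can be exercised with stub clients.
    import boto3
    return boto3.client(service_name)


def start_async_detection(document_bytes, bucket, key,
                          s3_client=None, textract_client=None):
    """Option 1: upload to S3, then start an asynchronous Textract job."""
    s3_client = s3_client or _client("s3")
    textract_client = textract_client or _client("textract")
    s3_client.put_object(Bucket=bucket, Key=key, Body=document_bytes)
    response = textract_client.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    # The job runs in the background; results are fetched later by JobId.
    return response["JobId"]


def detect_lines_from_bytes(document_bytes, textract_client=None):
    """Option 2: send the bytes straight to the synchronous API."""
    if len(document_bytes) > MAX_INLINE_BYTES:
        raise ValueError("Document exceeds 5 MB; upload it to S3 instead")
    textract_client = textract_client or _client("textract")
    response = textract_client.detect_document_text(
        Document={"Bytes": document_bytes}
    )
    # Keep only LINE blocks; PAGE and WORD blocks are also returned.
    return [b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE"]
```

With option 1 you get back a job ID and retrieve the results later with GetDocumentTextDetection. With option 2 the text comes back in the same call.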
Hope that helps.
Cheers
Does the synchronous API (DetectDocumentText) support multipage documents?
Synchronous APIs support single-page documents only (see https://docs.aws.amazon.com/textract/latest/dg/sync.html). You could also configure an S3 event notification that triggers a Lambda function, so your document starts processing automatically as soon as your client uploads it to S3. Alternatively, or in addition, you could set up a more complex workflow orchestrated by something like AWS Step Functions, which could take other steps, including potentially deleting the document from S3 once processing is done.
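As a rough illustration of the S3-notification approach, a Lambda handler could look something like the sketch below. The helper name `start_jobs_for_event` and the return shape are hypothetical. It assumes the function is subscribed to the bucket's object-created events and that its role is allowed to read the object and call Textract.

```python
from urllib.parse import unquote_plus


def start_jobs_for_event(event, textract_client):
    """Start one asynchronous Textract job per object in an S3 event."""
    job_ids = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        response = textract_client.start_document_text_detection(
            DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
        )
        job_ids.append(response["JobId"])
    return job_ids


def lambda_handler(event, context):
    # boto3 is available in the Lambda Python runtime; imported here so the
    # helper above can be tested locally with a stub client.
    import boto3
    textract_client = boto3.client("textract")
    return {"job_ids": start_jobs_for_event(event, textract_client)}
```

This only starts the jobs. Collecting the results (and any cleanup, such as deleting the document) would be a separate step, which is where something like Step Functions could come in.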