Extracting table data from PDF documents using AWS Textract in a React.js app


I am building a React.js web app that uses AWS Textract to read a PDF document containing the user's timetable or schedule. DetectDocumentTextCommand works, but it is not suitable for documents that contain tables. I want the data in table form, that is, blocks with BlockType: 'TABLE', their rows, and so on. When I use StartDocumentAnalysisCommand instead, the response contains the message "Request has invalid parameters".

import { TextractClient, StartDocumentAnalysisCommand } from "@aws-sdk/client-textract";
import React, { useState } from "react";
import AWS from 'aws-sdk'

export const DetectText = () => {
    const [file, setFile] = useState({});
    const bucketName = process.env.REACT_APP_SECRET_BUCKET_NAME;

    AWS.config.update({
        accessKeyId: process.env.REACT_APP_ACCESS_KEY_id,
        secretAccessKey: process.env.REACT_APP_SECRET_ACCESS_KEY,
        region: 'ap-south-1',
    })

    const client = new TextractClient({
        region: 'ap-south-1', credentials: {
            accessKeyId: process.env.REACT_APP_ACCESS_KEY_id,
            secretAccessKey: process.env.REACT_APP_SECRET_ACCESS_KEY,
        }
    });

    const onSelectFile = (e) => {
        if (!e.target.files || e.target.files.length === 0) return;
        const reader = new FileReader();
        const file = e.target.files[0];
        setFile(file);
        reader.readAsDataURL(file);
    }

    const detectText = async () => {
        // create an instance of the S3 client
        const s3 = new AWS.S3();
        const paramsForS3 = { Bucket: bucketName, Key: `folder3/${file.name}`, Body: file };

        // upload the file to S3
        s3.upload(paramsForS3, async (err, data) => {
            if (data) {
                const paramsforCheck = {
                    DocumentLocation: { S3Object: { Bucket: bucketName, Key: `folder3/${file.name}` } },
                    FeatureTypes: ['TABLES', 'FORMS', 'DOCUMENT_TEXT'],
                };
                const command = new StartDocumentAnalysisCommand(paramsforCheck);
                try {
                    const data = await client.send(command);
                    if (data?.Blocks) {
                        console.log(`Started document analysis with JobId: ${data.JobId}`);
                        console.log(data.Blocks)
                    }
                } catch (error) {
                    console.log('err', error);
                }
            } else console.error(err);
        });
    };

    return (
        <div>
            <input type='file' id='file' name='file' onChange={onSelectFile} className='inputfile' />
            <button onClick={detectText} style={{ margin: "10px" }}>Run OCR</button>
        </div>
    )
}

Please let me know what I am doing wrong. Any help or suggestion would be much appreciated.

I have tried everything I could think of to resolve this error but have not found a solution, so I am hoping to get a solution or suggestion from you.

2 Answers

Hi,

According to https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-textract/interfaces/startdocumentanalysiscommandinput.html, the command requires a DocumentLocation attribute, which you are not passing.

I hope this helps. Otherwise, please provide more details, such as the API version.

EXPERT
answered a year ago
  • Thanks for the response. Yes, you are right that DocumentLocation must be given in the params, but I still get the same error after updating the code.

    Updated params: const params = { Document: { Bytes: blob }, DocumentLocation: {S3Object: {Bucket: "schedulesdata", Name: filename}}, FeatureTypes: ['TABLES', 'FORMS', 'DOCUMENT_TEXT'], };

    I am using region: 'ap-south-1', apiVersion: '2018-06-27'.

  • There is no "Document" attribute. You need to store your blob/file/document in an S3 location and then provide that location as the DocumentLocation, like this:

    const params = { DocumentLocation: { S3Object: { Bucket: "your-s3-bucket-containing-your-stored-file", Name: "your-s3-object-key" } }, FeatureTypes: ["TABLES", "FORMS"] };

    Let me know

  • I am now storing the document in the bucket first and then running the analysis, but I still get the same error. The code in the post has been updated.
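
Two details in the updated code may explain the remaining "Request has invalid parameters" response. Textract's S3Object takes the object key under `Name` (as in the answer above), whereas the posted code passes `Key`, which is the S3 upload parameter. In addition, `DOCUMENT_TEXT` is not a documented FeatureTypes value. The following is a minimal sketch of a parameter builder that guards against both; `buildAnalysisParams` and `ALLOWED_FEATURE_TYPES` are hypothetical names introduced here for illustration and are not part of the AWS SDK.

```javascript
// Feature types documented for StartDocumentAnalysis at the time of writing.
// "DOCUMENT_TEXT" is not among them.
const ALLOWED_FEATURE_TYPES = ["TABLES", "FORMS", "QUERIES", "SIGNATURES", "LAYOUT"];

// Hypothetical helper (not part of the AWS SDK): builds the input object for
// StartDocumentAnalysisCommand and rejects values the service would refuse.
function buildAnalysisParams(bucket, objectKey, featureTypes = ["TABLES"]) {
  if (!bucket || !objectKey) {
    throw new Error("Both a bucket name and an object key are required");
  }
  const invalid = featureTypes.filter((t) => !ALLOWED_FEATURE_TYPES.includes(t));
  if (invalid.length > 0) {
    throw new Error(`Unsupported FeatureTypes: ${invalid.join(", ")}`);
  }
  return {
    // Textract's S3Object uses "Name" for the object key; S3 uploads use "Key".
    DocumentLocation: { S3Object: { Bucket: bucket, Name: objectKey } },
    FeatureTypes: featureTypes,
  };
}

// Usage with the v3 client from the question (commented out so the sketch
// runs without the SDK installed):
// const command = new StartDocumentAnalysisCommand(
//   buildAnalysisParams(bucketName, `folder3/${file.name}`, ["TABLES", "FORMS"])
// );
```

Validating on the client is optional, but it turns a generic service-side error into a message that names the offending value.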


Here you can find how to use the AWS SDK to call Textract: https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-textract/

The link I posted is for V3 of the SDK. The syntax won't work if you have installed V2. StartDocumentAnalysisCommand requires the following parameters:

https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-textract/classes/startdocumentanalysiscommand.html
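
Note that StartDocumentAnalysisCommand is asynchronous: its response carries a JobId, not Blocks, so the results have to be fetched afterwards with GetDocumentAnalysisCommand (from the same `@aws-sdk/client-textract` package). Below is a minimal polling sketch under that assumption. `collectBlocks` and its `fetchPage` callback are hypothetical names introduced here; the callback is injected so the loop can be exercised without the SDK.

```javascript
// Hypothetical helper: polls until the job finishes, then follows NextToken
// to gather every page of Blocks. `fetchPage(nextToken)` must resolve to a
// GetDocumentAnalysis-style response: { JobStatus, Blocks, NextToken, StatusMessage }.
async function collectBlocks(fetchPage, { delayMs = 2000, maxAttempts = 60 } = {}) {
  const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

  // Poll the first page until the job leaves the IN_PROGRESS state.
  let page = await fetchPage(undefined);
  for (let attempt = 1; page.JobStatus === "IN_PROGRESS"; attempt++) {
    if (attempt >= maxAttempts) throw new Error("Timed out waiting for the Textract job");
    await sleep(delayMs);
    page = await fetchPage(undefined);
  }
  // This sketch treats anything other than SUCCEEDED as a failure.
  if (page.JobStatus !== "SUCCEEDED") {
    throw new Error(`Textract job ended with status ${page.JobStatus}: ${page.StatusMessage ?? ""}`);
  }

  // Results are paginated; keep requesting pages while a NextToken is returned.
  const blocks = [...(page.Blocks ?? [])];
  while (page.NextToken) {
    page = await fetchPage(page.NextToken);
    blocks.push(...(page.Blocks ?? []));
  }
  return blocks;
}

// Usage with the v3 client (commented out so the sketch runs without the SDK):
// const { JobId } = await client.send(new StartDocumentAnalysisCommand(params));
// const blocks = await collectBlocks((NextToken) =>
//   client.send(new GetDocumentAnalysisCommand({ JobId, NextToken }))
// );
```

Fixed-interval polling is the simplest option for a browser demo; for production workloads, Textract can also publish job completion to an SNS topic through the NotificationChannel parameter.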

raffaeu
answered a year ago
  • Thanks for the response. I am using "@aws-sdk/client-textract": "^3.301.0" as a dependency in package.json, which I think is the latest version.

    I have also updated my params with DocumentLocation, but I still get the same error.
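
For the original goal of getting the data in table form: in a Textract analysis result, each TABLE block lists its CELL blocks through a CHILD relationship, each CELL carries 1-based RowIndex and ColumnIndex values, and the cell text comes from its child WORD blocks. The following is a minimal sketch that rebuilds each table as a 2-D array of strings. `tablesFromBlocks` is a hypothetical helper name, and the sketch ignores merged cells and selection elements such as checkboxes.

```javascript
// Hypothetical helper: turns Textract Blocks into one 2-D array of strings per table.
function tablesFromBlocks(blocks) {
  const byId = new Map(blocks.map((b) => [b.Id, b]));

  // Resolve the CHILD relationships of a block to the blocks they point at.
  const childrenOf = (block) =>
    (block.Relationships ?? [])
      .filter((rel) => rel.Type === "CHILD")
      .flatMap((rel) => rel.Ids)
      .map((id) => byId.get(id))
      .filter(Boolean);

  return blocks
    .filter((b) => b.BlockType === "TABLE")
    .map((table) => {
      const rows = [];
      for (const cell of childrenOf(table).filter((c) => c.BlockType === "CELL")) {
        // Join the WORD children of the cell into a single string.
        const text = childrenOf(cell)
          .filter((w) => w.BlockType === "WORD")
          .map((w) => w.Text)
          .join(" ");
        // RowIndex and ColumnIndex are 1-based in Textract responses.
        const r = cell.RowIndex - 1;
        rows[r] = rows[r] ?? [];
        rows[r][cell.ColumnIndex - 1] = text;
      }
      return rows;
    });
}
```

Each returned table can then be rendered directly, for example by mapping its rows to `<tr>` elements in the React component.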
