Extracting table data from PDF documents using AWS Textract in a React.js app


I am building a React.js web app that uses AWS Textract to read a PDF document containing the user's timetable or schedule. DetectDocumentTextCommand works, but it is not good at capturing content laid out as a table. I want the data in table form, i.e. blocks such as BlockType 'TABLE' with their rows and cells. When I use StartDocumentAnalysisCommand instead, the response says "Request has invalid parameters".

import { TextractClient, StartDocumentAnalysisCommand } from "@aws-sdk/client-textract";
import React, { useState } from "react";
import AWS from 'aws-sdk'

export const DetectText = () => {
    const [file, setFile] = useState({});
    const bucketName = process.env.REACT_APP_SECRET_BUCKET_NAME;

    AWS.config.update({
        accessKeyId: process.env.REACT_APP_ACCESS_KEY_id,
        secretAccessKey: process.env.REACT_APP_SECRET_ACCESS_KEY,
        region: 'ap-south-1',
    })

    const client = new TextractClient({
        region: 'ap-south-1', credentials: {
            accessKeyId: process.env.REACT_APP_ACCESS_KEY_id,
            secretAccessKey: process.env.REACT_APP_SECRET_ACCESS_KEY,
        }
    });

    const onSelectFile = (e) => {
        if (!e.target.files || e.target.files.length === 0) return;
        const reader = new FileReader();
        const file = e.target.files[0];
        setFile(file);
        reader.readAsDataURL(file);
    }

    const detectText = async () => {
        // create an instance of the S3 client
        const s3 = new AWS.S3();
        const paramsForS3 = { Bucket: bucketName, Key: `folder3/${file.name}`, Body: file };

        // upload the file to S3
        s3.upload(paramsForS3, async (err, data) => {
            if (data) {
                const paramsforCheck = {
                    DocumentLocation: { S3Object: { Bucket: bucketName, Key: `folder3/${file.name}` } },
                    FeatureTypes: ['TABLES', 'FORMS', 'DOCUMENT_TEXT'],
                };
                const command = new StartDocumentAnalysisCommand(paramsforCheck);
                try {
                    const data = await client.send(command);
                    if (data?.Blocks) {
                        console.log(`Started document analysis with JobId: ${data.JobId}`);
                        console.log(data.Blocks)
                    }
                } catch (error) {
                    console.log('err', error);
                }
            } else console.error(err);
        });
    };

    return (
        <div>
            <input type='file' id='file' name='file' onChange={onSelectFile} className='inputfile' />
            <button onClick={detectText} style={{ margin: "10px" }}>Run OCR</button>
        </div>
    )
}

Please let me know what I am doing wrong. Any help or suggestion would be truly appreciated.

I have tried everything I could think of to resolve this error but have not found a solution.

2 Answers

Hi,

According to https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-textract/interfaces/startdocumentanalysiscommandinput.html, StartDocumentAnalysisCommand requires a DocumentLocation attribute, which you are not passing.

Hope it helps. Otherwise, please provide more details, such as the API version.

EXPERT
answered a year ago
  • Thanks for the response. Yes, you are right that DocumentLocation is required in the params, but I am still getting the same error after updating the code.

    Updated params: const params = { Document: { Bytes: blob }, DocumentLocation: {S3Object: {Bucket: "schedulesdata", Name: filename}}, FeatureTypes: ['TABLES', 'FORMS', 'DOCUMENT_TEXT'], };

    I am using region: 'ap-south-1', apiVersion: '2018-06-27'.

  • There is no "Document" attribute. You need to store your blob/file/document in an S3 location and then provide that location as the DocumentLocation, like this:

    const params = { DocumentLocation: { S3Object: { Bucket: "your-s3-bucket-containing-your-stored-file", Name: "your-s3-object-key" } }, FeatureTypes: ["TABLES", "FORMS"] };

    Let me know

  • I am now uploading the document to the bucket first and then starting the analysis, but I still get the same error. The code in the post has been updated.
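
Two details in the code as posted may be worth checking. Textract's S3Object takes Bucket and Name (Key is the parameter name used by the S3 upload, not by Textract), and DOCUMENT_TEXT is not among the FeatureTypes values that StartDocumentAnalysis accepts (TABLES, FORMS, QUERIES, SIGNATURES, LAYOUT). Either one could plausibly produce "Request has invalid parameters". Below is a minimal sketch of a params builder; buildStartAnalysisParams and the bucket and key values are hypothetical names used only for illustration.

```javascript
// Feature types accepted by StartDocumentAnalysis; "DOCUMENT_TEXT" is not one of them.
const ALLOWED_FEATURE_TYPES = ["TABLES", "FORMS", "QUERIES", "SIGNATURES", "LAYOUT"];

// Hypothetical helper: builds the input object for StartDocumentAnalysisCommand.
function buildStartAnalysisParams(bucket, objectKey, featureTypes = ["TABLES"]) {
    if (!bucket || !objectKey) {
        throw new Error("Both the bucket and the object key are required");
    }
    const invalid = featureTypes.filter((t) => !ALLOWED_FEATURE_TYPES.includes(t));
    if (invalid.length > 0) {
        throw new Error(`Unsupported FeatureTypes: ${invalid.join(", ")}`);
    }
    return {
        // Textract calls the object key "Name"; "Key" is what the S3 upload uses.
        DocumentLocation: { S3Object: { Bucket: bucket, Name: objectKey } },
        FeatureTypes: featureTypes,
    };
}

// Example with placeholder values.
const params = buildStartAnalysisParams("your-bucket", "folder3/timetable.pdf", ["TABLES", "FORMS"]);
console.log(JSON.stringify(params, null, 2));
```

The returned object would then be passed to new StartDocumentAnalysisCommand(params) and sent with client.send, as in the original code.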


Here you can find how to use the AWS SDK to call Textract: https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-textract/

The link I posted is for V3; the syntax won't work if you have installed V2. StartDocumentAnalysisCommand requires the following parameters:

https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-textract/classes/startdocumentanalysiscommand.html
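
Note also that StartDocumentAnalysis is asynchronous: its response carries only a JobId, and the Blocks are returned later by GetDocumentAnalysisCommand once JobStatus is SUCCEEDED, so a check such as data?.Blocks on the start response never passes. A rough polling sketch follows. The names collectAnalysisBlocks, fetchPage, delayMs and maxAttempts are hypothetical; fetchPage(nextToken) is assumed to wrap client.send(new GetDocumentAnalysisCommand({ JobId, NextToken: nextToken })) and is injected so the loop can be read and exercised without AWS access.

```javascript
// Resolves after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: polls until the job finishes, then follows NextToken
// to gather the Blocks from every result page.
async function collectAnalysisBlocks(fetchPage, { delayMs = 2000, maxAttempts = 30 } = {}) {
    const blocks = [];
    let nextToken;
    let attempts = 0;
    while (true) {
        const page = await fetchPage(nextToken);
        if (page.JobStatus === "IN_PROGRESS") {
            attempts += 1;
            if (attempts >= maxAttempts) {
                throw new Error("Timed out waiting for the Textract job");
            }
            await sleep(delayMs);
            continue;
        }
        // Assumption: anything other than SUCCEEDED is treated as a failure here.
        if (page.JobStatus !== "SUCCEEDED") {
            throw new Error(`Textract job ended with status ${page.JobStatus}: ${page.StatusMessage ?? ""}`);
        }
        blocks.push(...(page.Blocks ?? []));
        if (!page.NextToken) return blocks;
        nextToken = page.NextToken;
    }
}

// Intended usage (needs the SDK client and a JobId from StartDocumentAnalysisCommand):
// const blocks = await collectAnalysisBlocks((token) =>
//     client.send(new GetDocumentAnalysisCommand({ JobId: jobId, NextToken: token })));
```

Polling from the browser keeps the sketch short. Textract can also publish job completion to an SNS topic through the NotificationChannel parameter, which avoids polling.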

raffaeu
answered a year ago
  • Thanks for the response. I am using "@aws-sdk/client-textract": "^3.301.0" as a dependency in package.json, which I think is the latest version.

    I have also updated my params with DocumentLocation but am still getting the same error.
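
Once the analysis Blocks are available, the table layout the question asks for can be rebuilt from them. Each TABLE block lists its CELL blocks through a CHILD relationship, each CELL carries a 1-based RowIndex and ColumnIndex, and the cell's own CHILD relationship points at the WORD blocks that hold the text. A minimal sketch, with tablesFromBlocks and childIds as hypothetical helper names, that ignores merged cells and selection elements:

```javascript
// Returns the ids listed under a block's CHILD relationships.
function childIds(block) {
    return (block.Relationships ?? [])
        .filter((rel) => rel.Type === "CHILD")
        .flatMap((rel) => rel.Ids);
}

// Hypothetical helper: converts Textract Blocks into an array of tables,
// each table being an array of rows, each row an array of cell strings.
function tablesFromBlocks(blocks) {
    const byId = new Map(blocks.map((b) => [b.Id, b]));
    return blocks
        .filter((b) => b.BlockType === "TABLE")
        .map((table) => {
            const rows = [];
            for (const id of childIds(table)) {
                const cell = byId.get(id);
                if (!cell || cell.BlockType !== "CELL") continue;
                // Join the cell's WORD children into one string.
                const text = childIds(cell)
                    .map((wordId) => byId.get(wordId))
                    .filter((w) => w && w.BlockType === "WORD")
                    .map((w) => w.Text)
                    .join(" ");
                const r = cell.RowIndex - 1;
                rows[r] = rows[r] ?? [];
                rows[r][cell.ColumnIndex - 1] = text;
            }
            return rows;
        });
}

// Tiny hand-written sample in the shape Textract returns.
const sampleBlocks = [
    { Id: "t1", BlockType: "TABLE", Relationships: [{ Type: "CHILD", Ids: ["c1", "c2"] }] },
    { Id: "c1", BlockType: "CELL", RowIndex: 1, ColumnIndex: 1, Relationships: [{ Type: "CHILD", Ids: ["w1"] }] },
    { Id: "c2", BlockType: "CELL", RowIndex: 1, ColumnIndex: 2, Relationships: [{ Type: "CHILD", Ids: ["w2", "w3"] }] },
    { Id: "w1", BlockType: "WORD", Text: "Monday" },
    { Id: "w2", BlockType: "WORD", Text: "Maths" },
    { Id: "w3", BlockType: "WORD", Text: "09:00" },
];
console.log(tablesFromBlocks(sampleBlocks));
```

Indexing rows by RowIndex rather than by arrival order keeps the output stable even if the cells are not listed row by row.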
