Web crawler on Amazon Q

Hi, I am trying to crawl a website by entering its sitemaps under Source sitemaps in Amazon Q. Is there a way to check which web pages Q has crawled?

asked a month ago, 166 views
1 Answer
Hi,

Yes, there is a way: use the script q_list_documents.py in my repo: https://github.com/didier-durand/qstensils

See the documentation at https://github.com/didier-durand/qstensils/blob/main/doc/q_list_documents.md

Feel free to reuse and share it: it is released under the permissive MIT license.

The script lets you spot documents that failed to index, like the second one below (status DOCUMENT_FAILED_TO_INDEX):

    {
        "createdAt": "2024-02-21 11:31:00.422000+01:00",
        "documentId": "s3://bucket-name/Togo.json",
        "error": {},
        "status": "INDEXED",
        "updatedAt": "2024-02-21 11:47:09.220000+01:00"
    },
    {
        "createdAt": "2024-02-21 11:31:00.709000+01:00",
        "documentId": "s3://bucket-name/What Ever Happened to Baby Jane?.json",
        "error": {},
        "status": "DOCUMENT_FAILED_TO_INDEX",
        "updatedAt": "2024-02-21 11:47:46.031000+01:00"
    },
    {
        "createdAt": "2024-02-21 11:31:00.698000+01:00",
        "documentId": "s3://bucket-name/Vicky Donor.json",
        "error": {},
        "status": "INDEXED",
        "updatedAt": "2024-02-21 11:47:53.677000+01:00"
    }
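
If you would rather not read through the whole listing by hand, here is a minimal Python sketch of how entries like the ones above could be filtered down to the documents that are not indexed. It is not taken from q_list_documents.py. The function names list_documents and not_indexed are hypothetical, and the first function assumes that your boto3 version provides a "qbusiness" client with a list_documents() call that returns documentDetailList and nextToken, so please check that against the boto3 documentation before relying on it.

```python
import json


def list_documents(application_id, index_id):
    """Collect all document entries for one Amazon Q Business index.

    Assumption: boto3 provides a "qbusiness" client whose list_documents()
    call accepts applicationId / indexId / nextToken and returns
    "documentDetailList" plus an optional "nextToken".
    """
    import boto3  # imported lazily so the filter below runs without boto3

    client = boto3.client("qbusiness")
    kwargs = {"applicationId": application_id, "indexId": index_id}
    documents = []
    while True:
        page = client.list_documents(**kwargs)
        documents.extend(page.get("documentDetailList", []))
        token = page.get("nextToken")
        if not token:
            return documents
        kwargs["nextToken"] = token  # ask for the next page


def not_indexed(documents):
    """Return the entries whose status is anything other than INDEXED."""
    return [doc for doc in documents if doc.get("status") != "INDEXED"]


# Sample entries copied from the listing above (timestamps left out).
sample = json.loads("""
[
    {"documentId": "s3://bucket-name/Togo.json",
     "error": {}, "status": "INDEXED"},
    {"documentId": "s3://bucket-name/What Ever Happened to Baby Jane?.json",
     "error": {}, "status": "DOCUMENT_FAILED_TO_INDEX"},
    {"documentId": "s3://bucket-name/Vicky Donor.json",
     "error": {}, "status": "INDEXED"}
]
""")

problem_ids = [doc["documentId"] for doc in not_indexed(sample)]
print(problem_ids)
```

With real data you would call not_indexed(list_documents(app_id, index_id)), where app_id and index_id are placeholders for your own application and index identifiers.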

Best,

Didier

AWS EXPERT, answered a month ago
EXPERT, reviewed a month ago
  • Thanks a ton for the quick response, Didier. I'm not a full-time developer, but I will try it out :-)
