Web crawler on Amazon Q

Question

Hi,
I am trying to crawl a website by entering its sitemaps under "Source sitemaps" in Amazon Q. Is there a way to check which web pages Amazon Q has crawled?

![Enter image description here](/media/postImages/original/IM57aK3e5WSomp3gAw_SZH5g)

Answer

Hi,

Yes, there is a way: use the script q_list_documents.py in my repo: https://github.com/didier-durand/qstensils

See doc at https://github.com/didier-durand/qstensils/blob/main/doc/q_list_documents.md

Feel free to reuse and share it further: it is under the permissive MIT license.

This script lets you locate documents that failed to index, like the second one below:

```
    {
        "createdAt": "2024-02-21 11:31:00.422000+01:00",
        "documentId": "s3://bucket-name/Togo.json",
        "error": {},
        "status": "INDEXED",
        "updatedAt": "2024-02-21 11:47:09.220000+01:00"
    },
    {
        "createdAt": "2024-02-21 11:31:00.709000+01:00",
        "documentId": "s3://bucket-name/What Ever Happened to Baby Jane?.json",
        "error": {},
        "status": "DOCUMENT_FAILED_TO_INDEX",
        "updatedAt": "2024-02-21 11:47:46.031000+01:00"
    },
    {
        "createdAt": "2024-02-21 11:31:00.698000+01:00",
        "documentId": "s3://bucket-name/Vicky Donor.json",
        "error": {},
        "status": "INDEXED",
        "updatedAt": "2024-02-21 11:47:53.677000+01:00"
    }
```
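If you would rather query the index yourself, here is a minimal sketch of the same idea. It is not taken from q_list_documents.py. It assumes the boto3 `qbusiness` client exposes `list_documents` with `applicationId`, `indexId` and `nextToken` parameters and returns records under `documentDetailList`, shaped like the output above. The function names `iter_documents` and `failed_documents` are hypothetical, and the client is passed in so the code can be tried without AWS access.

```python
from typing import Any, Dict, Iterable, Iterator, List


def iter_documents(client: Any, application_id: str, index_id: str) -> Iterator[Dict[str, Any]]:
    """Yield every document record of an index, following nextToken pagination.

    Assumption: `client` behaves like boto3.client("qbusiness") and its
    list_documents call returns {"documentDetailList": [...], "nextToken": ...}.
    """
    kwargs = {"applicationId": application_id, "indexId": index_id}
    while True:
        page = client.list_documents(**kwargs)
        yield from page.get("documentDetailList", [])
        token = page.get("nextToken")
        if not token:
            break
        # Ask for the next page on the following call.
        kwargs["nextToken"] = token


def failed_documents(
    records: Iterable[Dict[str, Any]],
    bad_statuses: Iterable[str] = ("DOCUMENT_FAILED_TO_INDEX",),
) -> List[Dict[str, Any]]:
    """Keep only the records whose status is one of bad_statuses."""
    bad = set(bad_statuses)
    return [record for record in records if record.get("status") in bad]


# Possible usage (needs AWS credentials; the IDs are placeholders):
#   import boto3
#   client = boto3.client("qbusiness")
#   docs = iter_documents(client, "<application-id>", "<index-id>")
#   for doc in failed_documents(docs):
#       print(doc["documentId"], doc["status"])
```

Only the `DOCUMENT_FAILED_TO_INDEX` status shown above is filtered by default; pass other values through `bad_statuses` if your index reports additional failure states.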

Best,

Didier
