Web crawler on Amazon Q


Hi, I am trying to crawl a website by entering the source sitemaps in Amazon Q. Is there a way I can check which web pages were crawled by Q?


Asked 2 months ago · Viewed 189 times
1 Answer

Hi,

Yes, there is a way: use the script q_list_documents.py in my repo: https://github.com/didier-durand/qstensils

See doc at https://github.com/didier-durand/qstensils/blob/main/doc/q_list_documents.md

Feel free to reuse and share it further: it is under the permissive MIT license.

This script lets you locate documents with indexing problems, such as the second one below:

[
    {
        "createdAt": "2024-02-21 11:31:00.422000+01:00",
        "documentId": "s3://bucket-name/Togo.json",
        "error": {},
        "status": "INDEXED",
        "updatedAt": "2024-02-21 11:47:09.220000+01:00"
    },
    {
        "createdAt": "2024-02-21 11:31:00.709000+01:00",
        "documentId": "s3://bucket-name/What Ever Happened to Baby Jane?.json",
        "error": {},
        "status": "DOCUMENT_FAILED_TO_INDEX",
        "updatedAt": "2024-02-21 11:47:46.031000+01:00"
    },
    {
        "createdAt": "2024-02-21 11:31:00.698000+01:00",
        "documentId": "s3://bucket-name/Vicky Donor.json",
        "error": {},
        "status": "INDEXED",
        "updatedAt": "2024-02-21 11:47:53.677000+01:00"
    }
]
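If you prefer not to use the script, here is a minimal sketch of the same idea: paginating over the document list and collecting the entries whose status indicates an indexing failure. It assumes a boto3-style `qbusiness` client exposing `list_documents` (the real call needs your own `applicationId` and `indexId`; the IDs shown in the usage comment are placeholders):

```python
def failed_documents(client, application_id, index_id):
    """Collect documents whose status is DOCUMENT_FAILED_TO_INDEX,
    following nextToken pagination until the list is exhausted."""
    failed = []
    token = None
    while True:
        kwargs = {"applicationId": application_id, "indexId": index_id}
        if token:
            kwargs["nextToken"] = token
        resp = client.list_documents(**kwargs)
        for doc in resp.get("documentDetailList", []):
            if doc.get("status") == "DOCUMENT_FAILED_TO_INDEX":
                failed.append(doc)
        token = resp.get("nextToken")
        if not token:
            return failed

# Usage (placeholder IDs, requires boto3 and AWS credentials):
# import boto3
# client = boto3.client("qbusiness")
# for doc in failed_documents(client, "my-app-id", "my-index-id"):
#     print(doc["documentId"], doc.get("error"))
```

This is only a sketch of the approach; the script in the repo above handles the full listing and output formatting.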

Best,

Didier

AWS EXPERT · answered 2 months ago · reviewed by an EXPERT 2 months ago
  • Thanks a ton for the quick response, Didier. I'm not a full-time developer, but I will try it out :-)
