Web crawler on Amazon Q

Hi, I am trying to crawl a website by entering its sitemaps under Source sitemaps in Amazon Q. Is there a way to check which web pages Amazon Q has crawled?

asked 2 months ago, 189 views
1 Answer

Hi,

Yes, there is a way: use the script q_list_documents.py from my repo: https://github.com/didier-durand/qstensils

See the documentation at https://github.com/didier-durand/qstensils/blob/main/doc/q_list_documents.md

Feel free to reuse it and share it further: it is released under the permissive MIT license.

The script lets you locate documents that failed to index, such as the second one in this output excerpt:

{
    "createdAt": "2024-02-21 11:31:00.422000+01:00",
    "documentId": "s3://bucket-name/Togo.json",
    "error": {},
    "status": "INDEXED",
    "updatedAt": "2024-02-21 11:47:09.220000+01:00"
},
{
    "createdAt": "2024-02-21 11:31:00.709000+01:00",
    "documentId": "s3://bucket-name/What Ever Happened to Baby Jane?.json",
    "error": {},
    "status": "DOCUMENT_FAILED_TO_INDEX",
    "updatedAt": "2024-02-21 11:47:46.031000+01:00"
},
{
    "createdAt": "2024-02-21 11:31:00.698000+01:00",
    "documentId": "s3://bucket-name/Vicky Donor.json",
    "error": {},
    "status": "INDEXED",
    "updatedAt": "2024-02-21 11:47:53.677000+01:00"
}
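
If the listing is long, it may help to filter it rather than read it by eye. Here is a minimal sketch, assuming the records have been collected into a JSON array with the same fields as the excerpt above (how the script's output is saved is an assumption, and the helper name failed_documents is only illustrative, not part of qstensils):

```python
import json

# Sample records copied from the excerpt above. In practice the list would
# come from the script's output, assumed here to be a JSON array.
SAMPLE = json.loads("""
[
    {"documentId": "s3://bucket-name/Togo.json",
     "error": {}, "status": "INDEXED"},
    {"documentId": "s3://bucket-name/What Ever Happened to Baby Jane?.json",
     "error": {}, "status": "DOCUMENT_FAILED_TO_INDEX"},
    {"documentId": "s3://bucket-name/Vicky Donor.json",
     "error": {}, "status": "INDEXED"}
]
""")


def failed_documents(documents, ok_status="INDEXED"):
    """Return the documentId of every record whose status is not ok_status."""
    return [doc["documentId"] for doc in documents if doc.get("status") != ok_status]


if __name__ == "__main__":
    # Print one problematic document ID per line.
    for doc_id in failed_documents(SAMPLE):
        print(doc_id)
```

For a web crawler data source, the documentId values would presumably be page URLs rather than S3 paths, so the same filter would list the pages that were not indexed.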

Best,

Didier

AWS EXPERT, answered 2 months ago
EXPERT, reviewed 2 months ago
  • Thanks a ton for the quick response, Didier. I'm not a full-time developer, but I will try it out :-)
