
SQS - Query Messages and generate a report


Hello,

We have an SQS DLQ with around 1 million messages, and I would like to generate a report for my manager. The report is pretty simple: I need to group all events by one metadata field. However, I can't find a way to do it. Polling for messages won't work, as it is limited to 10 messages at a time, and I would need to extract the metadata field value from each message manually.

Is there a way for me to create this report easily?

Thanks!

asked 3 months ago · 129 views
2 Answers

You did not indicate what you want to happen with the messages in the DLQ. Do you want to delete them after generating the report, or do you want to leave them in the queue?

If you want to delete them, you should just read messages from the queue (preferably in batches), analyze and then delete each batch, and repeat until there are no more messages left. You can also use a mechanism like the one described in Pandurangaswamy's answer (with the comment below it).
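
As a rough illustration, a minimal boto3 sketch of that loop could look like the following; the queue URL and the metadata key are placeholders, and it assumes the message bodies are JSON:

import json
from collections import Counter

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-dlq"  # placeholder
METADATA_FIELD = "metadata_field"  # placeholder

counts = Counter()
while True:
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=10,  # 10 is the per-call maximum
        WaitTimeSeconds=5,       # long polling to reduce empty receives
    )
    messages = resp.get("Messages", [])
    if not messages:
        break  # queue drained
    for msg in messages:
        body = json.loads(msg["Body"])  # assumes JSON message bodies
        counts[body.get(METADATA_FIELD, "<missing>")] += 1
    # Delete the batch we just counted.
    sqs.delete_message_batch(
        QueueUrl=QUEUE_URL,
        Entries=[
            {"Id": m["MessageId"], "ReceiptHandle": m["ReceiptHandle"]}
            for m in messages
        ],
    )

for value, n in counts.most_common():
    print(f"{value}\t{n}")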

If you want to leave them in the queue, it is trickier. You can do exactly the same, but without deleting the messages; they will remain in the queue. However, there are a few caveats:

  1. You need to process all messages before the visibility timeout expires. This is easy to achieve by setting a high value for the VisibilityTimeout parameter when you read the messages (see the sketch after this list).
  2. The more challenging issue is that you can have at most 120,000 in-flight messages (messages that have been read from the queue but not yet deleted and whose visibility timeout has not expired). This means that if you have more than 120,000 messages, this solution will not work.
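
For queues below that cap, a sketch of the same scan without the delete step might look like this (same placeholder names as the snippet above):

import json
from collections import Counter

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-dlq"  # placeholder
METADATA_FIELD = "metadata_field"  # placeholder

counts = Counter()
in_flight = 0
while in_flight < 120_000:  # SQS in-flight limit for standard queues
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=5,
        VisibilityTimeout=43200,  # 12 hours (the maximum) so nothing reappears mid-scan
    )
    messages = resp.get("Messages", [])
    if not messages:
        break
    in_flight += len(messages)
    for msg in messages:
        body = json.loads(msg["Body"])  # assumes JSON message bodies
        counts[body.get(METADATA_FIELD, "<missing>")] += 1
    # No delete call: the messages return to the queue when the timeout expires.

for value, n in counts.most_common():
    print(f"{value}\t{n}")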

So if you have more than 120,000 messages, you will need a different approach: something like the first option, but before deleting the messages, save them somewhere, and when done, resend them to the queue.

AWS
EXPERT
answered 3 months ago
EXPERT
reviewed 3 months ago
  • Thanks for the comment, Uri. You are right. I don't want to delete the messages. I will follow your suggestion with the EventBridge Pipes and then follow the rest of the steps Pandurangaswamy has suggested.


Hi Vladi,

Please go through the steps and documentation links below; I hope they help solve your issue.

1. Export Messages to S3

First, you should export the messages from the SQS DLQ to Amazon S3. This will allow you to handle large volumes of data more efficiently.

  • Use AWS Data Pipeline: AWS Data Pipeline can be used to periodically export SQS messages to S3. You can set up a pipeline that reads from your SQS DLQ and writes to an S3 bucket.

https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/what-is-datapipeline.html

  • Use Lambda and S3: Another approach is to use an AWS Lambda function triggered by SQS to batch messages and write them to S3. This Lambda function can process messages in larger batches (e.g., 10,000 messages) and save them in a format such as JSON or CSV.
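
As a hedged sketch (the bucket and prefix environment variables are hypothetical), such a function could write each incoming batch to S3 as JSON Lines; note that a successful invocation makes Lambda delete the batch from the queue:

import os
import time
import uuid

import boto3

s3 = boto3.client("s3")
BUCKET = os.environ["EXPORT_BUCKET"]                      # hypothetical
PREFIX = os.environ.get("EXPORT_PREFIX", "dlq-export/")   # hypothetical

def handler(event, context):
    # event["Records"] is the batch SQS hands to Lambda; with a batching
    # window configured, a standard queue can deliver large batches.
    lines = [record["body"] for record in event["Records"]]
    key = f"{PREFIX}{int(time.time())}-{uuid.uuid4()}.jsonl"
    # One message body per line (JSON Lines) keeps the objects easy to query.
    s3.put_object(Bucket=BUCKET, Key=key, Body="\n".join(lines).encode("utf-8"))
    return {"exported": len(lines)}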

2. Process Data from S3

Once the messages are in S3, you can process them using several tools:

  • AWS Glue: AWS Glue is a fully managed ETL (Extract, Transform, Load) service. It can crawl your data stored in S3, transform it, and load it into a data store like Amazon Redshift or Amazon Athena for analysis.

https://docs.aws.amazon.com/glue/latest/dg/what-is-glue.html

  • Amazon Athena: You can use Amazon Athena to query data directly from S3 using SQL. You would define a table schema based on the structure of your exported messages and then run SQL queries to group and analyze the data.

https://docs.aws.amazon.com/athena/latest/ug/what-is.html
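
For the Athena route, here is a minimal sketch of defining a table over JSON Lines exports via boto3; the database, table, field, and bucket names are all assumptions to adapt:

import boto3

athena = boto3.client("athena")

# Hypothetical names: adjust the database, table, field, and S3 locations.
DDL = """
CREATE EXTERNAL TABLE IF NOT EXISTS dlq_export (
  metadata_field string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://my-export-bucket/dlq-export/'
"""

athena.start_query_execution(
    QueryString=DDL,
    QueryExecutionContext={"Database": "default"},
    ResultConfiguration={"OutputLocation": "s3://my-export-bucket/athena-results/"},
)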

3. Group and Analyze Data

Once your data is accessible through a service like Athena or Redshift, you can run SQL queries to group by the metadata field and generate the required report.

Example SQL Query in Athena:

SELECT metadata_field, COUNT(*)
FROM your_s3_table
GROUP BY metadata_field
ORDER BY COUNT(*) DESC;

Replace metadata_field with the actual field name you're interested in and your_s3_table with the table you’ve defined.
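
If you would rather run the query programmatically, a sketch with boto3 could look like this; Athena queries are asynchronous, so you poll for completion (names are the same hypothetical ones as above):

import time

import boto3

athena = boto3.client("athena")

qid = athena.start_query_execution(
    QueryString=(
        "SELECT metadata_field, COUNT(*) AS n "
        "FROM dlq_export GROUP BY metadata_field ORDER BY n DESC"
    ),
    QueryExecutionContext={"Database": "default"},
    ResultConfiguration={"OutputLocation": "s3://my-export-bucket/athena-results/"},
)["QueryExecutionId"]

# Poll until the query reaches a terminal state.
while True:
    state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
    for row in rows[1:]:  # the first row is the header
        print([col.get("VarCharValue") for col in row["Data"]])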

4. Automate and Schedule

To handle such tasks regularly, consider automating the process:

  • Set up a regular Data Pipeline: Automate the export of new messages to S3.
  • Schedule AWS Glue Jobs or Athena Queries: Automate data processing and reporting tasks using scheduled jobs.
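
One hedged way to wire up such a schedule with boto3 and EventBridge (all names and ARNs hypothetical) is a rule that invokes a reporting Lambda once a day:

import boto3

events = boto3.client("events")
lambda_client = boto3.client("lambda")

RULE = "dlq-report-daily"  # hypothetical
FUNCTION_ARN = "arn:aws:lambda:us-east-1:123456789012:function:dlq-report"  # hypothetical

rule_arn = events.put_rule(Name=RULE, ScheduleExpression="rate(1 day)")["RuleArn"]
events.put_targets(Rule=RULE, Targets=[{"Id": "report-fn", "Arn": FUNCTION_ARN}])
# Allow EventBridge to invoke the function.
lambda_client.add_permission(
    FunctionName="dlq-report",
    StatementId="allow-eventbridge",
    Action="lambda:InvokeFunction",
    Principal="events.amazonaws.com",
    SourceArn=rule_arn,
)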
EXPERT
answered 3 months ago
  • I don't think that AWS Data Pipeline supports reading messages from SQS.

    To use a similar approach, you can use EventBridge Pipes to read messages from the queue (note that it will delete them from there) and send them to Data Firehose, which will write the data to S3. From there you can do the rest of the steps to generate your report.
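
    If you go that route, a minimal sketch of creating such a pipe with boto3 might look like this; the ARNs are hypothetical, and the role must allow reading/deleting from the queue and writing to the delivery stream:

    import boto3

    pipes = boto3.client("pipes")

    # Hypothetical ARNs; Pipes will delete the messages it reads from the source queue.
    pipes.create_pipe(
        Name="dlq-to-firehose",
        RoleArn="arn:aws:iam::123456789012:role/pipe-role",
        Source="arn:aws:sqs:us-east-1:123456789012:my-dlq",
        Target="arn:aws:firehose:us-east-1:123456789012:deliverystream/dlq-export",
    )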
