The backlog of messages in my Amazon Simple Queue Service (Amazon SQS) queue increased, and I want to prevent further increases.
Short description
Standard and FIFO SQS queues
The backlog of messages increases under the following circumstances:
- Producers send messages at a faster rate than the messages are consumed.
- Consumers don't delete messages within the visibility timeout period. When the timeout expires, the messages become visible again and are returned the next time the SQS queue is polled.
FIFO SQS queues
For FIFO (First-In-First-Out) SQS queues, an increase in the backlog of messages can also result from either of the following:
- The queue reaches the FIFO queue limit of 120,000 in-flight messages (the message buffer).
- A message in a message group is stuck with a consumer, which prevents other messages in the same message group from being processed.
Resolution
Follow these best practices to prevent the message backlog from increasing.
Standard and FIFO SQS queues
- Set the visibility timeout of the SQS queue so that the consumer has enough time to process and delete each message before the timeout expires. If you don't know how long it takes to process a message, then create a heartbeat for your consumer process. Specify an initial visibility timeout (for example, two minutes). While the consumer still needs more time to process the message, use the ChangeMessageVisibility API call to extend the visibility timeout.
- Increase the batch size when you make ReceiveMessage API calls. Set the MaxNumberOfMessages parameter value to more than 1 and up to a maximum of 10.
- Monitor the Amazon CloudWatch metric ApproximateNumberOfMessagesVisible for the SQS queue. This metric shows whether producers send messages at a higher rate than consumers can process them. If the metric keeps rising, then scale horizontally: increase the number of consumers or clients that consume from the SQS queue, or increase the number of threads that poll the queue.
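The first two practices above can be combined in one consumer loop. The following is a minimal sketch, not a definitive implementation. It assumes that `sqs` is a boto3 SQS client, and that `handler` is a hypothetical function of your own that processes one message body. The function name `consume_batch` and the choice to send a heartbeat halfway through each timeout window are also assumptions made for illustration.

```python
import threading


def consume_batch(sqs, queue_url, handler, visibility_timeout=120):
    """Receive up to 10 messages, keep them invisible while working, delete each when done.

    sqs     -- assumed to be a boto3 SQS client, for example boto3.client("sqs")
    handler -- hypothetical callable that processes a single message body
    Returns the number of messages that were processed and deleted.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,                 # largest batch that ReceiveMessage allows
        WaitTimeSeconds=20,                     # long polling
        VisibilityTimeout=visibility_timeout,   # initial timeout, for example two minutes
    )
    messages = response.get("Messages", [])
    if not messages:
        return 0

    # Receipt handles of messages that are received but not yet deleted.
    pending = {m["MessageId"]: m["ReceiptHandle"] for m in messages}
    lock = threading.Lock()
    stop = threading.Event()

    def heartbeat():
        # Every half timeout window, reset the timeout of all pending messages.
        while not stop.wait(visibility_timeout / 2):
            with lock:
                entries = [
                    {
                        "Id": message_id,
                        "ReceiptHandle": receipt_handle,
                        "VisibilityTimeout": visibility_timeout,
                    }
                    for message_id, receipt_handle in pending.items()
                ]
            if entries:
                sqs.change_message_visibility_batch(QueueUrl=queue_url, Entries=entries)

    worker = threading.Thread(target=heartbeat, daemon=True)
    worker.start()
    done = 0
    try:
        for message in messages:
            handler(message["Body"])
            # Delete only after successful processing.
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
            with lock:
                del pending[message["MessageId"]]
            done += 1
    finally:
        # Stop the heartbeat even if the handler raised an exception.
        stop.set()
        worker.join()
    return done
```

To use the sketch, call `consume_batch(boto3.client("sqs"), queue_url, handler)` in a loop. If the handler raises an exception, the heartbeat stops and the undeleted messages become visible again after the timeout expires.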
FIFO SQS queues
120,000 message buffer
FIFO queues allow a maximum of 120,000 in-flight messages. In-flight messages are messages that a consumer received from the queue but hasn't yet deleted. If you reach the 120,000 quota, then Amazon SQS doesn't return error messages.
A FIFO queue looks through the first 120,000 messages to determine available message groups. If you have a backlog of messages in a single message group, then you can't consume messages from other message groups that were sent to the queue at a later time until you successfully consume the messages from the backlog.
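Because Amazon SQS doesn't return an error when a FIFO queue reaches the in-flight quota, it can help to check the in-flight count yourself. The following is a minimal sketch that assumes `sqs` is a boto3 SQS client. The function names and the 80 percent warning threshold are hypothetical choices for illustration, and the quota constant is the figure stated above.

```python
FIFO_IN_FLIGHT_QUOTA = 120_000  # quota as stated in this article


def queue_depth(sqs, queue_url):
    """Return (visible, in_flight) approximate message counts for a queue.

    sqs -- assumed to be a boto3 SQS client, for example boto3.client("sqs")
    """
    response = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=[
            "ApproximateNumberOfMessages",            # messages available to receive
            "ApproximateNumberOfMessagesNotVisible",  # in-flight messages
        ],
    )
    attributes = response["Attributes"]
    # Attribute values are returned as strings.
    visible = int(attributes["ApproximateNumberOfMessages"])
    in_flight = int(attributes["ApproximateNumberOfMessagesNotVisible"])
    return visible, in_flight


def near_in_flight_quota(in_flight, quota=FIFO_IN_FLIGHT_QUOTA, threshold=0.8):
    """Return True when the in-flight count is at or above threshold * quota.

    The 0.8 default is an arbitrary example value, not an AWS recommendation.
    """
    return in_flight >= quota * threshold
```

A high in-flight count together with a growing visible count suggests that consumers receive messages but don't delete them in time.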
Scaling over message groups
Messages that belong to the same message group are processed one by one, in strict order relative to the message group. When you receive messages from a queue that has multiple message group IDs, Amazon SQS tries to return as many messages with the same message group ID as possible. This allows other consumers to process messages with a different message group ID.
When messages that belong to a specific message group ID are invisible, no other consumer can process messages with the same message group ID. However, consumers can process messages from other message groups. Where the order of messages isn't important, increase the number of message groups so that more consumers can process messages in parallel.
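One way to increase the number of message groups is to choose the message group ID on the producer side. The following is a minimal sketch that assumes `sqs` is a boto3 SQS client. The `ordering_key` (for example, an order ID) and `dedup_id` parameters are hypothetical values that your application supplies. Hashing the key is only one possible way to get a valid group ID.

```python
import hashlib
import uuid


def choose_group_id(ordering_key=None):
    """Return a MessageGroupId for a FIFO queue.

    ordering_key -- hypothetical business key whose messages must stay in order.
                    Pass None when the order of the message doesn't matter.
    """
    if ordering_key is None:
        # No ordering requirement: a unique group per message, so that a stuck
        # message never blocks any other message.
        return uuid.uuid4().hex
    # Same key, same group: order is preserved per key, and different keys
    # can be processed in parallel by different consumers.
    return hashlib.sha256(ordering_key.encode("utf-8")).hexdigest()


def send_fifo(sqs, queue_url, body, dedup_id, ordering_key=None):
    """Send one message to a FIFO queue with a group ID chosen as above.

    sqs      -- assumed to be a boto3 SQS client, for example boto3.client("sqs")
    dedup_id -- hypothetical caller-supplied ID that identifies this message
    """
    return sqs.send_message(
        QueueUrl=queue_url,
        MessageBody=body,
        MessageGroupId=choose_group_id(ordering_key),
        MessageDeduplicationId=dedup_id,
    )
```

With this sketch, a slow or failing message for one ordering key delays only messages with the same key.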