Why is the Lambda IteratorAge metric increasing for my DynamoDB stream?


When AWS Lambda consumes records from Amazon DynamoDB Streams, I see a spike in the Lambda IteratorAge metric.

Short description

The Lambda IteratorAge metric measures the latency between when a record is added to a DynamoDB stream, and when the function processes that record. An increase in IteratorAge means that Lambda isn't efficiently processing records that are written to the DynamoDB stream.

The IteratorAge can increase for the following reasons:

  • Invocation errors
  • Throttle occurrences
  • Low Lambda throughput

Resolution

Invocation errors

Lambda is designed to process batches of records in sequence and then retry on errors. If a function returns an error every time that it's invoked, then Lambda continues to retry until the records expire or until they exceed the maximum record age that you configure on the event source mapping. The DynamoDB stream retention period is 24 hours, so Lambda retries a failed batch for up to one day before it moves on to the next batch of records.

To confirm whether an invocation error is the root cause of the IteratorAge spike, check the Lambda Errors metric. If an invocation error is the cause, check the Lambda logs to debug the error, and then modify your code. As part of the fix, handle expected errors in your code, for example with a try-catch statement.
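The per-record error handling described above can be sketched as a Lambda handler that catches failures for individual records instead of failing the whole batch. This is a minimal sketch: the processing logic is hypothetical, and the batchItemFailures response shape assumes that you enable the ReportBatchItemFailures setting on the event source mapping.

```python
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def process_record(record):
    # Hypothetical business logic; replace with your own processing.
    new_image = record["dynamodb"].get("NewImage")
    if not new_image:
        raise ValueError("record has no NewImage")
    return new_image

def handler(event, context):
    # Wrap each record in try-except so that one bad record doesn't
    # fail the whole batch and stall the stream iterator.
    failures = []
    for record in event.get("Records", []):
        seq = record["dynamodb"]["SequenceNumber"]
        try:
            process_record(record)
        except Exception:
            logger.exception("failed to process record %s", seq)
            failures.append({"itemIdentifier": seq})
    # Reporting only the failed sequence numbers lets Lambda retry
    # those records without reprocessing the whole batch.
    return {"batchItemFailures": failures}
```

With this pattern, a single malformed record no longer forces Lambda to retry the entire batch until it expires.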

The event source mapping configuration has three parameters that can help you prevent IteratorAge spikes:

  • Retry attempts: This parameter sets the maximum number of times that Lambda retries when the function returns an error.
  • Maximum age of record: This parameter sets the maximum age of a record that Lambda sends to your function. Use this parameter to help you discard records that are too old.
  • Split batch on error: This parameter helps you retry with smaller batches to isolate bad records and work around timeout issues. When the function returns an error, split the batch before retrying.
    Note: Splitting a batch doesn't count towards the retry quota.

To retain discarded events, configure the event source mapping to send details about failed batches to an Amazon Simple Queue Service (Amazon SQS) queue. Or, configure the event source mapping to send details to an Amazon Simple Notification Service (Amazon SNS) topic. To do this, use the On-failure destination parameter.
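The three event source mapping parameters and the on-failure destination can all be set in one API call. The following is a hedged configuration sketch that uses boto3's update_event_source_mapping; the UUID, the retry values, and the SQS queue ARN are placeholders to replace with your own.

```python
import boto3

lambda_client = boto3.client("lambda")

# The UUID and the SQS queue ARN below are placeholders; use the
# values from your own event source mapping and failure destination.
lambda_client.update_event_source_mapping(
    UUID="esm-uuid-placeholder",
    MaximumRetryAttempts=2,           # Retry attempts
    MaximumRecordAgeInSeconds=3600,   # Maximum age of record (1 hour)
    BisectBatchOnFunctionError=True,  # Split batch on error
    DestinationConfig={
        "OnFailure": {
            "Destination": "arn:aws:sqs:us-east-1:123456789012:failed-records"
        }
    },
)
```
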

Throttle occurrences

Because Lambda reads event records from a shard in sequence, a throttled invocation prevents the function from progressing to the next record.

When you use DynamoDB streams, don't configure more than two consumers on the same stream shard. If you have more than two readers on a shard, then your function might be throttled. For more information, see Reading and processing a stream.

If you need more than two readers on a single stream shard, then use a fan-out pattern. Configure the Lambda function to consume records from the stream, and then forward them to other downstream Lambda functions or Amazon Kinesis streams. For Lambda, use a concurrency limit to prevent throttling.
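The fan-out pattern above can be sketched as a dispatcher function that forwards each stream record to downstream consumers. This is a minimal sketch: the route table and consumer names are hypothetical, and the forward callable is injected so that in production it can wrap a boto3 lambda invoke call with InvocationType="Event" (or a Kinesis put_record).

```python
from collections import defaultdict

# Hypothetical routing: map a stream record's event name to the
# downstream consumers that should receive it.
ROUTES = {
    "INSERT": ["audit-consumer", "search-indexer"],
    "MODIFY": ["search-indexer"],
    "REMOVE": ["audit-consumer"],
}

def fan_out(records, forward):
    """Forward each stream record to its downstream consumers.

    `forward(target, record)` is injected so that it can be an
    asynchronous Lambda invocation in production and a stub in tests.
    """
    counts = defaultdict(int)
    for record in records:
        for target in ROUTES.get(record["eventName"], []):
            forward(target, record)
            counts[target] += 1
    return dict(counts)
```

Because only the dispatcher reads the shard, the two-consumer limit applies to it alone, and the downstream functions scale independently under their own concurrency limits.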

Lambda throughput

Runtime duration

If the Duration metric of a Lambda function is high, then the function's throughput decreases and the IteratorAge increases.

To decrease your function's runtime duration, use one or both of these methods: increase the function's memory allocation, or optimize the function code so that it processes records faster.

Concurrent Lambda runs

The maximum number of concurrent Lambda runs is calculated as follows:

Concurrent runs = Number of shards x Concurrent batches per shard (parallelization factor)

  • Number of shards: In a DynamoDB stream, there's a one-to-one mapping between the number of partitions of the table and the number of stream shards. The size of the table and the table's throughput determine the number of partitions. Each partition on the table can serve up to 3,000 read request units or 1,000 write request units, or a linear combination of both. To increase concurrency, increase the table's provisioned capacity to increase the number of shards.
  • Concurrent batches per shard (Parallelization Factor): You can configure the number of concurrent batches per shard in the event source mapping. The default is 1 and can be increased up to 10.

For example, if the table has 10 partitions and the concurrent batches per shard is set to 5, then you can have up to 50 concurrent runs.
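The formula and the example above can be expressed as a small helper. This is an illustrative sketch; the 1-10 bound on the parallelization factor comes from the event source mapping limit described above.

```python
def max_concurrent_runs(num_shards, concurrent_batches_per_shard):
    """Concurrent runs = number of shards x concurrent batches per shard."""
    if not 1 <= concurrent_batches_per_shard <= 10:
        raise ValueError("parallelization factor must be between 1 and 10")
    return num_shards * concurrent_batches_per_shard
```

For the example above, `max_concurrent_runs(10, 5)` returns 50.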

Note: To process item-level modifications in the right order, Lambda sends items with the same partition key to the same batch. Make sure that your table partition key has high cardinality, and that your traffic doesn't generate hot keys. For example, if you set the Concurrent batches per shard value to 10 and your write traffic targets a single partition key, then you can have only one concurrent run per shard.
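The hot-key effect in the note above can be approximated with a simple model: because records that share a partition key must stay in order, the number of distinct keys in flight caps the concurrency that the parallelization factor can deliver. This is a rough illustration, not an exact description of Lambda's internal batching.

```python
def effective_concurrency(partition_keys, parallelization_factor):
    """Approximate the usable concurrency for one shard.

    Records with the same partition key map to the same concurrent
    batch, so effective concurrency is capped by the number of
    distinct partition keys, regardless of the configured factor.
    """
    return min(parallelization_factor, len(set(partition_keys)))
```

A workload that writes 100 records to one hot key gets only one concurrent run per shard, even with the factor set to 10.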

Batch size

To help you increase Lambda throughput, tune the batch size value. If you process a low number of records per batch, then the stream is processed more slowly. If you have a high number of records per batch, then the duration of the function run might increase. To find the best value for your use case, it's a best practice to test your batch size with multiple values.

If your function's runtime duration is independent of the number of records in an event, then increasing your function's batch size decreases the function's iterator age.
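The trade-off above can be sketched as a rough throughput model: each concurrent run processes one batch per invocation, so the consumer keeps up (and IteratorAge stays flat) only while the processing rate meets or exceeds the stream's write rate. The model is a simplification for tuning intuition, not a precise predictor.

```python
def iterator_age_trend(incoming_records_per_sec, batch_size,
                       invocation_duration_sec, concurrent_runs):
    """Rough model of stream consumption.

    Returns the approximate processing rate in records per second and
    whether the consumer keeps up with the incoming write rate (if it
    doesn't, IteratorAge grows).
    """
    processed_per_sec = concurrent_runs * batch_size / invocation_duration_sec
    return processed_per_sec, processed_per_sec >= incoming_records_per_sec
```

For example, at 1,000 incoming records per second, 5 concurrent runs with a batch size of 100 and a 1-second duration process only 500 records per second, so IteratorAge grows; raising the batch size to 300 (if the duration stays flat) clears the backlog.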

Related information

Using AWS Lambda with Amazon DynamoDB
