How do I troubleshoot a blocked or stuck KCL application for Kinesis Data Streams?


My Amazon Kinesis Client Library (KCL) application is stuck and can’t process any Amazon Kinesis Data Streams records.

Short description

The KCL application can get stuck or blocked for the following reasons:

  • The record processor (user-implemented code) performs a blocking operation or takes longer than usual.
  • No data records are being put into the shard.
  • The KCL gets stuck while it retrieves records.
  • The KCL can't schedule processing or fails to checkpoint.

To detect and troubleshoot these KCL issues, complete the following tasks:

  • Analyze KCL metrics.
  • Analyze the Amazon DynamoDB table for the KCL application.
  • Check the KCL configurations.
  • Turn on the KCL warning logs.
  • Turn on the KCL debug logs.

Resolution

Analyze KCL metrics

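Monitor the RecordProcessor.processRecords.Time metric, and confirm that your record processor's processRecords method completes in less than 60 seconds. The KCL doesn't deliver the next batch of records from a shard until processRecords returns, so if your processRecords method is blocked, then the KCL must wait and processing of that shard stalls. If the metric shows long processing times, then optimize your processRecords method, for example by removing blocking calls.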
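To surface the same signal in your application's own logs, you can time each batch inside your record processor. The following Java sketch isn't part of the KCL API: `TimedBatchProcessor`, `processBatch`, and the `List<String>` batch type are hypothetical stand-ins for your processRecords logic, and the sketch uses java.util.logging only to stay dependency-free. The 60-second threshold comes from the guidance above.

```java
import java.time.Duration;
import java.util.List;
import java.util.function.Consumer;
import java.util.logging.Logger;

// Hypothetical wrapper that times one batch of records, similar to what the
// RecordProcessor.processRecords.Time metric measures for processRecords.
final class TimedBatchProcessor {
    private static final Logger LOG = Logger.getLogger(TimedBatchProcessor.class.getName());

    // Guidance above: processRecords should finish in less than 60 seconds.
    static final Duration THRESHOLD = Duration.ofSeconds(60);

    private final Consumer<List<String>> handler;

    TimedBatchProcessor(Consumer<List<String>> handler) {
        this.handler = handler;
    }

    /** Runs the handler on one batch, logs a warning if it was slow, and returns the elapsed time. */
    Duration processBatch(List<String> records) {
        long start = System.nanoTime();
        handler.accept(records);
        Duration elapsed = Duration.ofNanos(System.nanoTime() - start);
        if (exceedsThreshold(elapsed)) {
            LOG.warning("Batch of " + records.size() + " records took " + elapsed.toMillis() + " ms");
        }
        return elapsed;
    }

    /** True when a batch took 60 seconds or longer. */
    static boolean exceedsThreshold(Duration elapsed) {
        return elapsed.compareTo(THRESHOLD) >= 0;
    }
}
```

If the warning appears regularly, then the slow part of your handler is the place to optimize.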

Analyze DynamoDB table for KCL application

Every KCL application creates a DynamoDB table with the same name as the KCL application to track the application's state. To troubleshoot the KCL application, analyze the columns in the DynamoDB table.

If the checkpoint column isn't updated but the leaseCounter column is, then the worker still holds the lease and the processRecords method logic is stuck. If neither the checkpoint column nor the leaseCounter column is updated, and maxLeasesPerWorker is set to 1, then that setting can prevent other workers from taking over the lease. To let other workers take over the lease, increase the maxLeasesPerWorker value.
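The two checks above amount to comparing two reads of the same lease item taken a few minutes apart. In this minimal Java sketch (Java 16 or later, for records), `LeaseDiagnosis`, `LeaseSnapshot`, and `Verdict` are hypothetical names, and the snapshot holds only the leaseKey, checkpoint, and leaseCounter values that you read from the table yourself, for example in the DynamoDB console. A shard that receives no new records also shows an unchanged checkpoint, so rule that case out first.

```java
import java.util.Objects;

// Hypothetical helper that compares two reads of one lease item (one shard's
// row in the KCL lease table) taken a few minutes apart. Requires Java 16+.
final class LeaseDiagnosis {

    enum Verdict { PROGRESSING, PROCESSOR_STUCK, LEASE_NOT_RENEWED }

    // Only the columns discussed above; you read these values from the table yourself.
    record LeaseSnapshot(String leaseKey, String checkpoint, long leaseCounter) {}

    static Verdict diagnose(LeaseSnapshot earlier, LeaseSnapshot later) {
        if (!earlier.leaseKey().equals(later.leaseKey())) {
            throw new IllegalArgumentException("Both snapshots must describe the same lease");
        }
        if (!Objects.equals(earlier.checkpoint(), later.checkpoint())) {
            return Verdict.PROGRESSING; // The checkpoint advanced.
        }
        if (later.leaseCounter() != earlier.leaseCounter()) {
            // The lease is still renewed, but the checkpoint is frozen: processRecords is stuck.
            return Verdict.PROCESSOR_STUCK;
        }
        // Neither column changed: no worker is renewing or taking over the lease
        // (for example, because maxLeasesPerWorker=1 blocks other workers).
        return Verdict.LEASE_NOT_RENEWED;
    }
}
```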

Check KCL configurations

Check the number of workers in your KCL fleet and the number of shards in the Kinesis data stream. If the number of shards increases, for example after resharding, then increase the maxLeasesPerWorker value so that the workers in the fleet can together hold a lease for every shard.
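As a rough sizing check, and assuming that leases are spread evenly across workers, a fleet of W workers can hold a lease for each of S shards only when W × maxLeasesPerWorker ≥ S. That means each worker needs a limit of at least ⌈S / W⌉. The class and method names in this Java sketch are hypothetical.

```java
// Hypothetical capacity check: can the fleet hold a lease for every shard?
final class LeaseCapacity {

    /** Smallest maxLeasesPerWorker that lets `workers` workers cover `shards` shards. */
    static int minMaxLeasesPerWorker(int shards, int workers) {
        if (shards < 0 || workers <= 0) {
            throw new IllegalArgumentException("shards must be >= 0 and workers must be > 0");
        }
        // Ceiling division, ceil(shards / workers), without floating point.
        return (shards + workers - 1) / workers;
    }

    /** True when workers * maxLeasesPerWorker is at least the number of shards. */
    static boolean fleetCanCoverShards(int shards, int workers, int maxLeasesPerWorker) {
        return (long) workers * maxLeasesPerWorker >= shards;
    }
}
```

For example, if a stream grows from 4 to 10 shards and 3 workers consume it, then minMaxLeasesPerWorker(10, 3) returns 4. A limit of 3 covers only 9 shards and leaves one shard without an owner.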

Turn on advanced KCL warning logs

To verify that the record processor is blocked, set the logWarningForTaskAfterMillis value in the KCL configuration to a duration in milliseconds. When a record processor task runs longer than this duration, the KCL emits a warning message about the processing time to the log. If warning messages are logged, then capture successive stack dumps from the JVM to discover what is blocked. You can use the jstack command to capture the stack traces of the JVM's threads. For more information about the logWarningForTaskAfterMillis value, see LifecycleConfig.java on the GitHub website.
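While you investigate, you can approximate the warning and the successive stack dumps inside your own code. This Java sketch is a hypothetical stand-in, not the KCL implementation: `TaskWatchdog` runs a task and, each time the task is still running after the threshold, prints a warning and the stack of every live thread from Thread.getAllStackTraces(). That output is similar to, but less detailed than, what jstack shows from outside the process.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical watchdog that mirrors the idea of logWarningForTaskAfterMillis.
final class TaskWatchdog {

    /** Runs the task; every warnAfterMillis that it is still running, prints a warning and a thread dump. */
    static <T> T runWithWarning(Callable<T> task, long warnAfterMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<T> future = executor.submit(task);
            while (true) {
                try {
                    return future.get(warnAfterMillis, TimeUnit.MILLISECONDS);
                } catch (TimeoutException stillRunning) {
                    // Report only; like the KCL warning, this doesn't cancel the task.
                    System.err.println("WARN: task still running after another " + warnAfterMillis + " ms");
                    System.err.print(dumpAllThreads());
                }
            }
        } catch (ExecutionException failed) {
            throw new RuntimeException("Task failed", failed.getCause());
        } catch (InterruptedException interrupted) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for task", interrupted);
        } finally {
            executor.shutdown();
        }
    }

    /** Formats the current stack of every live thread in the JVM. */
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }
}
```

Compare the successive dumps: a record processor thread that shows the same frames in every dump is likely blocked at that call.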

Turn on the KCL debug logs

You can turn on the KCL debug logs to identify issues that caused the KCL to stop consuming data from Kinesis Data Streams. It's also a best practice to restart the KCL application to clear any other application issues.

If you restarted the KCL and it's still stuck, then the issue might be caused by the transfer of shard ownership between workers. After a shard ownership transfer, the worker that you check might not have the logs for the data that you're trying to reproduce the issue with. To capture the relevant logs, turn on logging on all workers in the KCL fleet.

To turn on logs, complete the following steps:

  1. Choose a logger.

  2. Create a log4j.properties file in the src/main/resources folder to redirect log messages to the console:

    log4j.appender.stdout=org.apache.log4j.ConsoleAppender
    log4j.appender.stdout.Target=System.out
    log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
    log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
    log4j.logger.httpclient.wire=DEBUG

    Note: This example uses Apache log4j 1.x to write debug logs in Java.

  3. Redirect the log messages to a log file:

    log4j.appender.file=org.apache.log4j.RollingFileAppender
    # Replace the File value with the path of the log file that you want to create.
    log4j.appender.file.File=/path/to/logfolder/kcl.log
    log4j.appender.file.MaxFileSize=5MB
    log4j.appender.file.MaxBackupIndex=10
    log4j.appender.file.layout=org.apache.log4j.PatternLayout
    log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
    log4j.rootLogger=DEBUG, stdout, file

  4. Include the log4j dependency in your POM file:

    <dependency>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
        <version>1.2.17</version>
    </dependency>
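The rootLogger line in step 3 references both the stdout appender from step 2 and the file appender from step 3, and log4j 1.x logs an error for a referenced appender that isn't defined. To catch that before you deploy, you can load the file with java.util.Properties and compare the names. This is a minimal, hypothetical Java check; `Log4jConfigCheck` and its methods aren't part of log4j.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper: finds appenders named in log4j.rootLogger that have no
// matching log4j.appender.<name> definition in the same properties file.
final class Log4jConfigCheck {

    /** Parses properties text, for example the contents of src/main/resources/log4j.properties. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    static List<String> undefinedAppenders(Properties props) {
        // rootLogger format: level, appender1, appender2, ...
        String[] parts = props.getProperty("log4j.rootLogger", "").split(",");
        List<String> missing = new ArrayList<>();
        // Skip index 0, which holds the level (for example DEBUG).
        for (int i = 1; i < parts.length; i++) {
            String name = parts[i].trim();
            if (!name.isEmpty() && props.getProperty("log4j.appender." + name) == null) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

If the result isn't empty, then add the missing appender definition, or remove the name from log4j.rootLogger.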

Related information

Monitoring the Kinesis Client Library with Amazon CloudWatch

Resharding, scaling, and parallel processing

AWS OFFICIAL