Amazon Linux 2023: kernel kills my Java app in EKS - "task blocked for more than 122 seconds" (segmentation fault, exit code 139)

After upgrading from AL2 to AL2023, I sometimes get the error "task blocked for more than 122 seconds".

Environment: EKS 1.31, Java 21 (Amazon Corretto), EC2 c7i.4xlarge (16 vCPU / 32 GiB RAM), ami_type = "AL2023_x86_64_STANDARD", ami_release_ver = "1.31.7-20250610". The container exits with code 139.

I don't want to go back to AL2, because it is approaching end of life.


WORKAROUND: do not use io_uring. With the options below the error seems to occur less often, but I have not confirmed this.

javaOpts: >-
        -Djdk.io.uring.disable=true
        -Djava.nio.channels.spi.SelectorProvider=sun.nio.ch.EPollSelectorProvider
        -Djava.nio.channels.DefaultThreadPool.threadFactory=disabled

asked 7 months ago, 452 views
1 Answer

Several differences between Amazon Linux 2 (AL2) and Amazon Linux 2023 (AL2023) could contribute to "task blocked for more than 122 seconds" errors after upgrading EKS 1.31 nodes.

AL2023 uses a new node initialization process called nodeadm, which follows a YAML configuration schema and replaces the bootstrap.sh script used on AL2. This change in initialization could affect how your Java application interacts with the underlying system.

Additionally, AL2023 requires Instance Metadata Service Version 2 (IMDSv2) and sets the default hop count for managed node groups to 1, which restricts containers from accessing node metadata. If your Java application relies on accessing node metadata or credentials, this could be causing the blocking issue.
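One way to check whether the hop limit is involved is to run an IMDSv2 probe from inside the affected pod. The sketch below is a minimal illustration, not an official tool: the function names are hypothetical, while the endpoint and header names are the documented IMDSv2 ones. It assumes the pod does not use host networking, so with a hop limit of 1 the token request is expected to time out.

```python
import urllib.error
import urllib.request

IMDS_BASE = "http://169.254.169.254"
TOKEN_TTL_HEADER = "X-aws-ec2-metadata-token-ttl-seconds"
TOKEN_HEADER = "X-aws-ec2-metadata-token"


def build_token_request(ttl_seconds=21600):
    """IMDSv2 step 1: a PUT request that asks for a session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/api/token",
        method="PUT",
        headers={TOKEN_TTL_HEADER: str(ttl_seconds)},
    )


def build_metadata_request(token, path="latest/meta-data/instance-id"):
    """IMDSv2 step 2: a GET request that presents the session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/{path}",
        headers={TOKEN_HEADER: token},
    )


def probe_imds(timeout=2.0):
    """Return the instance ID, or None if IMDS did not answer in time."""
    try:
        with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
            token = resp.read().decode()
        with urllib.request.urlopen(build_metadata_request(token), timeout=timeout) as resp:
            return resp.read().decode()
    except (urllib.error.URLError, OSError):
        # A timeout here is consistent with the PUT response being dropped
        # by the hop limit, but it does not prove that this is the cause.
        return None
```

Calling `probe_imds()` from the pod and again from the node itself gives a quick comparison: if it returns an instance ID on the node but `None` in the pod, the hop limit is a plausible suspect.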

For troubleshooting this specific issue:

  1. Check if your application needs access to node metadata or credentials. If so, you may need to increase the HttpPutResponseHopLimit to 2 in your EC2 launch template or use Amazon EKS Pod Identity for credential access.

  2. Ensure you're using Amazon VPC CNI add-on version 1.16.2 or later, which is required for AL2023.

  3. If you're using any third-party add-ons that interact with node metadata, they might be affected by the hop count limitation.

  4. Review your Java application's resource usage and threading model, as there might be differences in how AL2023 handles resource allocation compared to AL2.
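For step 2, the installed add-on version can be compared against the 1.16.2 minimum numerically rather than as a string. This is a small sketch with hypothetical helper names; it assumes version tags of the form `v1.18.1-eksbuild.3` and ignores the build suffix.

```python
import re

# Minimum VPC CNI version named in step 2 above.
MIN_VPC_CNI = (1, 16, 2)


def parse_version(tag):
    """Extract (major, minor, patch) from a tag such as 'v1.16.2-eksbuild.1'."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError(f"unrecognized version tag: {tag!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(tag, minimum=MIN_VPC_CNI):
    """True if the tag is at or above the minimum, compared numerically."""
    return parse_version(tag) >= minimum
```

Comparing integer tuples avoids the string-comparison trap where "1.9.0" would sort above "1.16.2".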

While AL2 is approaching end of life, these configuration adjustments should help resolve the blocking task issues on AL2023.
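To judge whether an adjustment actually helped, it can be useful to count hung-task warnings in the node's kernel log before and after the change. The sketch below assumes the usual wording of the kernel's hung-task message ("INFO: task NAME:PID blocked for more than N seconds."); the function name and the sample lines in the usage note are illustrative, not taken from the affected cluster.

```python
import re

# Matches the kernel's hung-task warning, for example:
# "INFO: task java:4242 blocked for more than 122 seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_lines):
    """Return (task name, pid, seconds) for each hung-task warning found."""
    hits = []
    for line in log_lines:
        match = HUNG_TASK_RE.search(line)
        if match:
            hits.append((match["comm"], int(match["pid"]), int(match["secs"])))
    return hits
```

The input can be the lines of `dmesg` or `journalctl -k` output saved from the node. Note that these warnings only report a task stuck in uninterruptible sleep; exit code 139 (128 + 11, SIGSEGV) is a separate symptom and would not appear in this list.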
Sources
FAQs on Amazon Linux 2023 with Amazon EKS | AWS re:Post
AL2023.7 upgrades from AL2 - Amazon Linux 2023

answered 7 months ago
EXPERT
reviewed 7 months ago
