
Files going unavailable randomly in EBS


Hey, to give some context: I have a Jenkins machine that runs automated builds of about 10 projects every morning. Today, 7 of the 10 builds failed with the same error: an automated script that is part of the build pipeline was not found. For a span of about 10 minutes, every build failed because that one file, which resides on EBS, was "not present" according to the error. This is strange, because when we checked about an hour later the file was there, and all builds after that 10-minute window passed without this error. I am trying to understand whether a file on EBS can somehow become temporarily unavailable, since I have not encountered this kind of error before. I would also like to know whether there are any EBS logs that might show errors related to this. Thanks and regards

1 Answer

This issue sounds like a temporary impairment of your EBS volume that affected file availability for a short period. When an EBS volume becomes impaired, it can cause files to appear unavailable even though they physically exist on the volume.

To troubleshoot this issue, I recommend:

  1. Check the consistency of the file system on the volume:
  • For Linux instances: use the fsck command (run it only on an unmounted file system)
  • For Windows instances: use the chkdsk command
  2. Review your system and application logs for any relevant error messages during the 10-minute window when the builds failed.

  3. Monitor your EBS volume's performance metrics in CloudWatch, particularly:

  • VolumeReadOps and VolumeWriteOps, to confirm you're not exceeding the volume's IOPS capacity
  • Any I/O errors or volume status-check failures reported during that window
  4. If your workload involves many small I/O operations, the EBS volume might be hitting its IOPS limit, in which case I/O is throttled and latency rises until demand drops. Keep in mind that small, physically sequential I/O operations may be merged into larger ones, so the IOPS counted against the limit can differ from the number of requests your application issues.
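The log-review and CloudWatch steps can be sketched in a few lines of Python. This is a minimal, stdlib-only sketch under stated assumptions: `io_errors_in_window` and `average_iops` are hypothetical helper names, the log lines are assumed to start with an ISO 8601 timestamp that includes a UTC offset (as printed by `journalctl -k -o short-iso`), and the marker strings are common Linux kernel I/O error fragments, not an exhaustive list. The IOPS helper relies on VolumeReadOps and VolumeWriteOps being operation counts, so their Sum statistic for a period divided by the period length in seconds gives the average IOPS for that period. The sample log lines and numbers are made up for illustration.

```python
from datetime import datetime, timezone

# Illustrative (not exhaustive) fragments of Linux kernel messages that show
# up when block I/O fails or a file system reports an error.
IO_ERROR_MARKERS = ("I/O error", "EXT4-fs error", "EXT4-fs warning")


def io_errors_in_window(lines, start, end):
    """Hypothetical helper: return log lines inside [start, end] that mention
    one of the I/O error markers.

    Assumes each line starts with an ISO 8601 timestamp with a UTC offset,
    followed by a space (e.g. `journalctl -k -o short-iso` output).
    `start` and `end` must be timezone-aware datetimes.
    """
    hits = []
    for line in lines:
        stamp, _, rest = line.partition(" ")
        try:
            ts = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S%z")
        except ValueError:
            continue  # no parseable timestamp at the start of this line
        if start <= ts <= end and any(m in rest for m in IO_ERROR_MARKERS):
            hits.append(line)
    return hits


def average_iops(read_ops_sum, write_ops_sum, period_seconds):
    """Hypothetical helper: average IOPS for one CloudWatch period.

    VolumeReadOps and VolumeWriteOps are operation counts, so the sum of
    their Sum statistics for a period divided by its length gives average IOPS.
    """
    return (read_ops_sum + write_ops_sum) / period_seconds


# Made-up sample data for illustration only.
sample = [
    "2024-05-01T08:03:12+0000 build-host kernel: Buffer I/O error on dev nvme1n1p1",
    "2024-05-01T08:20:00+0000 build-host kernel: EXT4-fs error (device nvme1n1p1)",
    "2024-05-01T08:05:00+0000 build-host kernel: eth0: link up",
]
window_start = datetime(2024, 5, 1, 8, 0, tzinfo=timezone.utc)
window_end = datetime(2024, 5, 1, 8, 10, tzinfo=timezone.utc)

print(io_errors_in_window(sample, window_start, window_end))  # only the 08:03:12 line
print(average_iops(90_000, 90_000, 60))  # 3000.0
```

You could feed the first helper the output of `journalctl -k -o short-iso --since "<start>" --until "<end>"` for the failure window, and the second the Sum datapoints returned by `aws cloudwatch get-metric-statistics` with `--statistics Sum`. An average close to the volume's provisioned IOPS (for example, the 3,000 IOPS baseline of a gp3 volume) during the window would point toward throttling rather than a file that actually disappeared.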

If this issue persists or recurs, you might want to:

  • Create a snapshot of the volume
  • Create a new volume from that snapshot
  • Replace the problematic volume with the new one

For severe or persistent issues that last more than 20 minutes, you can contact AWS Support for further assistance.
Sources
Work with an impaired Amazon EBS volume - Amazon EBS
IO Errors in EC2 instance | AWS re:Post
Amazon EBS I/O characteristics and monitoring - Amazon EBS

answered 2 months ago

