Yes, you should absolutely pin by image digest rather than just by tag to avoid this issue in the future.
What Actually Happened:
Even though you were using what appeared to be a specific tag (2.6.0-transformers4.49.0-gpu-py312-cu124-ubuntu22.04), a tag is only a mutable pointer, and AWS can update the image content behind it. This is exactly what happened: AWS updated the Deep Learning Container image to include a newer version of NCCL, but the tag remained the same. Your CI/CD pipeline was pulling what it thought was the "same" base image, but it had actually been updated with incompatible dependencies.
Best Practices to Prevent This:
- Pin by Image Digest: Instead of using just the tag, specify the exact image digest in your Dockerfile. This ensures you're always building from the exact same base image. Your FROM statement should look like:
FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference@sha256:<digest>
- Retrieve and Store Digests: When you build a Docker image, retrieve and record the base image digest alongside the build. You can look up the digest a tag currently points to with the aws ecr describe-images command, which reports the imageDigest of each image in the Amazon Elastic Container Registry repository.
- Control Your Update Cycle: By pinning to a digest, you control when to update your base image rather than having changes forced upon you. When you're ready to update, you can pull the latest version, test it thoroughly, and then update your Dockerfile with the new digest.
- Consider Tag Immutability: For your own custom images in ECR, enable tag immutability to prevent tags from being overwritten, ensuring consistency across deployments.
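As a rough illustration of the first two points, here is a minimal Python sketch that looks up the digest behind a tag and builds the pinned FROM line. The function names resolve_digest and pinned_from_line are hypothetical helpers, not part of any AWS tooling. The ECR client is passed in, so it can be a boto3 ECR client or a stub. Whether your credentials are allowed to call describe_images on the AWS-owned repository is an assumption; if they are not, pulling the image and inspecting its digest locally is an alternative.

```python
import re

# Digests are "sha256:" followed by 64 lowercase hex characters.
DIGEST_RE = re.compile(r"^sha256:[0-9a-f]{64}$")


def resolve_digest(ecr_client, registry_id, repository, tag):
    """Return the digest that `tag` currently points to.

    `ecr_client` is expected to behave like boto3.client("ecr"):
    describe_images returns {"imageDetails": [{"imageDigest": ...}, ...]}.
    """
    response = ecr_client.describe_images(
        registryId=registry_id,
        repositoryName=repository,
        imageIds=[{"imageTag": tag}],
    )
    digest = response["imageDetails"][0]["imageDigest"]
    if not DIGEST_RE.match(digest):
        raise ValueError(f"unexpected digest format: {digest!r}")
    return digest


def pinned_from_line(registry_id, region, repository, digest):
    """Build a Dockerfile FROM line that references the image by digest."""
    host = f"{registry_id}.dkr.ecr.{region}.amazonaws.com"
    return f"FROM {host}/{repository}@{digest}"


# Example usage (requires boto3 and AWS credentials, so left commented out):
# import boto3
# ecr = boto3.client("ecr", region_name="us-east-1")
# digest = resolve_digest(
#     ecr, "763104351884", "huggingface-pytorch-inference",
#     "2.6.0-transformers4.49.0-gpu-py312-cu124-ubuntu22.04",
# )
# print(pinned_from_line("763104351884", "us-east-1",
#                        "huggingface-pytorch-inference", digest))
```

Committing the resulting FROM line, together with the tag it was resolved from, gives you a record of exactly which base image each build used.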
This approach gives you stability and predictability in your CI/CD pipeline while still allowing you to update base images on your own schedule after proper testing.
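If you want the pipeline itself to enforce this, one option is a small check that fails the build when a Dockerfile references a base image by tag only. The sketch below is an assumption about how such a check could look, not an AWS feature. The name unpinned_from_lines is hypothetical, and the parsing is deliberately simple: it handles --platform style flags and multi-stage aliases, but treats images supplied through ARG variables as unpinned.

```python
import re

# An image reference that ends in @sha256:<64 hex characters>.
PINNED = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_from_lines(dockerfile_text):
    """Return the FROM lines whose base image is not referenced by digest."""
    stage_names = set()  # aliases declared with "FROM ... AS name"
    offenders = []
    for raw in dockerfile_text.splitlines():
        line = raw.strip()
        parts = line.split()
        if not parts or parts[0].upper() != "FROM":
            continue
        # Drop flags such as --platform=linux/amd64.
        args = [p for p in parts[1:] if not p.startswith("--")]
        if not args:
            continue
        image = args[0]
        # "scratch" and earlier build stages are not registry images.
        is_local = image == "scratch" or image.lower() in stage_names
        if not is_local and not PINNED.search(image):
            offenders.append(line)
        if len(args) >= 3 and args[1].upper() == "AS":
            stage_names.add(args[2].lower())
    return offenders


if __name__ == "__main__":
    import sys

    # Usage: python check_pins.py Dockerfile
    with open(sys.argv[1], encoding="utf-8") as handle:
        bad = unpinned_from_lines(handle.read())
    for line in bad:
        print(f"not pinned by digest: {line}")
    sys.exit(1 if bad else 0)
```

Run as a step before docker build, a non-zero exit code stops the pipeline until the FROM line is updated with an explicit digest.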
