How to Use Public Container Images with AWS HealthOmics Workflows
Many publicly available genomics workflows use container images hosted in popular third-party registries. For security, reliability, and provenance, however, AWS HealthOmics runs containers only from private Amazon ECR repositories. This article explains how to access containerized tools from supported third-party registries without manually migrating them to private ECR repositories or changing your workflow definition.
AWS HealthOmics Container Registry Maps User Guide
Overview
Container registry maps are a feature of AWS HealthOmics that lets workflows use ECR pull through cache rules to access public container registries without manually replicating containers into private ECR repositories. The feature provides automatic mapping between upstream registries (such as Docker Hub and Quay.io) and your private ECR repositories.
Benefits
- Avoid manual container migration to ECR
- More reliable access than downloading from public registries at runtime
- Automatic synchronization with upstream registries
- Predictable mapping: HealthOmics uses the container registry map to translate a public container URI in a workflow into an ECR URI, which triggers a pull through of that image at workflow runtime
Prerequisites
- AWS CLI v2 installed and configured
- Appropriate IAM permissions for ECR and HealthOmics
Regions
Configure your ECR registry and HealthOmics workflows in the same Region. To use multiple Regions, repeat these steps in each one.
Step 1: Create Secrets Manager Secrets (For Authenticated Registries)
Some registries, such as Docker Hub or private registries, require authentication. To use pull through cache with them, you must create a secret in Secrets Manager that contains the credentials for the registry. These examples use the us-east-1 Region; change it as needed.
To obtain a Docker Hub token refer to https://docs.docker.com/security/access-tokens/
Docker Hub Secret
aws secretsmanager create-secret \
  --name "ecr-pullthroughcache/docker-hub" \
  --description "Docker Hub credentials for ECR pull through cache" \
  --secret-string '{
    "username": "your-docker-username",
    "accessToken": "your-docker-access-token"
  }' \
  --region us-east-1
Quay.io Secret (if using private repositories, not required for public repositories)
aws secretsmanager create-secret \
  --name "ecr-pullthroughcache/quay" \
  --description "Quay.io credentials for ECR pull through cache" \
  --secret-string '{
    "username": "your-quay-username",
    "accessToken": "your-quay-access-token"
  }' \
  --region us-east-1
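Both secret names above start with ecr-pullthroughcache/. ECR's pull through cache documentation describes this prefix as required for upstream registry credentials, so it can be worth checking a name before creating the secret. The following is a minimal sketch; the helper name is_ptc_secret_name is hypothetical and not part of the AWS CLI.

```shell
# Hypothetical helper: succeeds only if the secret name carries the
# ecr-pullthroughcache/ prefix and has at least one character after it.
is_ptc_secret_name() {
  case "$1" in
    ecr-pullthroughcache/?*) return 0 ;;
    *) return 1 ;;
  esac
}

for name in "ecr-pullthroughcache/docker-hub" "docker-hub-credentials"; do
  if is_ptc_secret_name "$name"; then
    echo "ok: $name"
  else
    echo "rename before use: $name"
  fi
done
```

The check is purely local string matching; it does not confirm that the secret exists or that its credentials are valid.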
Step 2: Create ECR Pull Through Cache Rules
Docker Hub Pull Through Cache
aws ecr create-pull-through-cache-rule \
  --ecr-repository-prefix docker-hub \
  --upstream-registry-url registry-1.docker.io \
  --credential-arn arn:aws:secretsmanager:us-east-1:123456789012:secret:ecr-pullthroughcache/docker-hub-AbCdEf \
  --region us-east-1
Quay.io Pull Through Cache
aws ecr create-pull-through-cache-rule \
  --ecr-repository-prefix quay \
  --upstream-registry-url quay.io \
  --region us-east-1
ECR Public Pull Through Cache
aws ecr create-pull-through-cache-rule \
  --ecr-repository-prefix ecr-public \
  --upstream-registry-url public.ecr.aws \
  --region us-east-1
Step 3: Configure Registry Permissions
Create a registry permissions policy that allows HealthOmics to use pull through cache, and save it as registry-policy.json:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowPTCinRegPermissions",
      "Effect": "Allow",
      "Principal": {
        "Service": "omics.amazonaws.com"
      },
      "Action": [
        "ecr:CreateRepository",
        "ecr:BatchImportUpstreamImage"
      ],
      "Resource": [
        "arn:aws:ecr:us-east-1:123456789012:repository/docker-hub/*",
        "arn:aws:ecr:us-east-1:123456789012:repository/quay/*",
        "arn:aws:ecr:us-east-1:123456789012:repository/ecr-public/*"
      ]
    }
  ]
}
Apply the policy:
aws ecr put-registry-policy \
  --policy-text file://registry-policy.json \
  --region us-east-1
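The Resource ARNs in the registry policy repeat the account ID, Region, and each pull through cache prefix, which makes them easy to get wrong when editing by hand. As a sketch, assuming the same statement as the policy above, a small shell function can generate the document from those three inputs. The function name registry_policy is hypothetical.

```shell
# Hypothetical helper: print a registry policy for the given account ID,
# Region, and one or more pull through cache prefixes.
registry_policy() {
  account="$1"
  region="$2"
  shift 2
  resources=""
  for prefix in "$@"; do
    arn="arn:aws:ecr:${region}:${account}:repository/${prefix}/*"
    # Comma-separate every entry after the first.
    resources="${resources:+$resources, }\"${arn}\""
  done
  printf '{\n'
  printf '  "Version": "2012-10-17",\n'
  printf '  "Statement": [\n'
  printf '    {\n'
  printf '      "Sid": "AllowPTCinRegPermissions",\n'
  printf '      "Effect": "Allow",\n'
  printf '      "Principal": { "Service": "omics.amazonaws.com" },\n'
  printf '      "Action": [ "ecr:CreateRepository", "ecr:BatchImportUpstreamImage" ],\n'
  printf '      "Resource": [ %s ]\n' "$resources"
  printf '    }\n'
  printf '  ]\n'
  printf '}\n'
}

# Print the policy for the three prefixes used in this guide.
registry_policy 123456789012 us-east-1 docker-hub quay ecr-public
```

Redirecting the output to registry-policy.json produces a file that the put-registry-policy command above can consume.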
Step 4: Create Repository Creation Templates
Docker Hub Template
aws ecr create-repository-creation-template \
  --prefix docker-hub \
  --applied-for PULL_THROUGH_CACHE \
  --repository-policy '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Sid": "PTCRepoCreationTemplate",
        "Effect": "Allow",
        "Principal": {
          "Service": "omics.amazonaws.com"
        },
        "Action": [
          "ecr:BatchGetImage",
          "ecr:GetDownloadUrlForLayer"
        ],
        "Resource": "*"
      }
    ]
  }' \
  --region us-east-1
Quay.io Template
aws ecr create-repository-creation-template \
  --prefix quay \
  --applied-for PULL_THROUGH_CACHE \
  --repository-policy '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Sid": "PTCRepoCreationTemplate",
        "Effect": "Allow",
        "Principal": {
          "Service": "omics.amazonaws.com"
        },
        "Action": [
          "ecr:BatchGetImage",
          "ecr:GetDownloadUrlForLayer"
        ],
        "Resource": "*"
      }
    ]
  }' \
  --region us-east-1
ECR Public Template
aws ecr create-repository-creation-template \
  --prefix ecr-public \
  --applied-for PULL_THROUGH_CACHE \
  --repository-policy '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Sid": "PTCRepoCreationTemplate",
        "Effect": "Allow",
        "Principal": {
          "Service": "omics.amazonaws.com"
        },
        "Action": [
          "ecr:BatchGetImage",
          "ecr:GetDownloadUrlForLayer"
        ],
        "Resource": "*"
      }
    ]
  }' \
  --region us-east-1
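The three template commands differ only in their prefix, so they can be expressed as a loop. The sketch below assumes the same repository policy for every prefix; the function name create_templates and the DRY_RUN switch are hypothetical conveniences, not AWS CLI features. With DRY_RUN=1 the function prints an abbreviated form of each command (the policy argument is omitted for brevity) instead of calling AWS.

```shell
# Same repository policy as the templates above, on one line.
TEMPLATE_POLICY='{"Version":"2012-10-17","Statement":[{"Sid":"PTCRepoCreationTemplate","Effect":"Allow","Principal":{"Service":"omics.amazonaws.com"},"Action":["ecr:BatchGetImage","ecr:GetDownloadUrlForLayer"],"Resource":"*"}]}'

# Hypothetical helper: create one repository creation template per prefix.
# Usage: create_templates REGION PREFIX [PREFIX...]
create_templates() {
  region="$1"
  shift
  for prefix in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Preview only: nothing is sent to AWS.
      echo "aws ecr create-repository-creation-template --prefix $prefix --applied-for PULL_THROUGH_CACHE --region $region"
    else
      aws ecr create-repository-creation-template \
        --prefix "$prefix" \
        --applied-for PULL_THROUGH_CACHE \
        --repository-policy "$TEMPLATE_POLICY" \
        --region "$region"
    fi
  done
}

# Preview the calls first; set DRY_RUN=0 and rerun to apply them.
DRY_RUN=1
create_templates us-east-1 docker-hub quay ecr-public
```

Keeping the policy in one variable also guarantees that every prefix receives identical permissions.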
Step 5: Create Container Registry Maps
Registry Mappings Example
Registry mappings can be used to map specific upstream registries to your private ECR repositories. In the example here, containers from Docker Hub, Quay.io and ECR Public used in a workflow will be mapped to your private ECR pull through caches.
Create a registry map file (registry-map.json):
{
  "registryMappings": [
    {
      "upstreamRegistryUrl": "registry-1.docker.io",
      "ecrRepositoryPrefix": "docker-hub"
    },
    {
      "upstreamRegistryUrl": "quay.io",
      "ecrRepositoryPrefix": "quay"
    },
    {
      "upstreamRegistryUrl": "public.ecr.aws",
      "ecrRepositoryPrefix": "ecr-public"
    }
  ]
}
Image Mappings Example
Image mappings can be used to map specific containers to your private ECR repositories. These mappings will take precedence over registryMappings if both are provided.
Create an image map file (image-map.json) for specific container overrides:
{
  "imageMappings": [
    {
      "sourceImage": "broadinstitute/gatk:4.6.0.2",
      "destinationImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/docker-hub/broadinstitute/gatk:latest"
    },
    {
      "sourceImage": "quay.io/biocontainers/samtools:1.17--h00cdaf9_0",
      "destinationImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/quay/biocontainers/samtools:1.17--h00cdaf9_0"
    }
  ]
}
Combined Registry and Image Map
Create a complete map file (container-registry-map.json):
{
  "registryMappings": [
    {
      "upstreamRegistryUrl": "registry-1.docker.io",
      "ecrRepositoryPrefix": "docker-hub"
    },
    {
      "upstreamRegistryUrl": "quay.io",
      "ecrRepositoryPrefix": "quay"
    }
  ],
  "imageMappings": [
    {
      "sourceImage": "ubuntu",
      "destinationImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/docker-hub/library/ubuntu:20.04"
    },
    {
      "sourceImage": "quay.io/biocontainers/bwa:0.7.17--hed695b0_7",
      "destinationImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/quay/biocontainers/bwa:0.7.17--hed695b0_7"
    }
  ]
}
Step 6: Configure HealthOmics Service Role
The HealthOmics service role used during workflow runs must have ECR permissions to pull container images from your pull through cache repositories.
Create Trust Policy File
cat > trust-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "omics.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
Create Service Role Policy File
cat > service-role-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::your-workflow-bucket/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::your-workflow-bucket"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:DescribeLogStreams",
        "logs:CreateLogStream",
        "logs:PutLogEvents",
        "logs:CreateLogGroup"
      ],
      "Resource": [
        "arn:aws:logs:us-east-1:123456789012:log-group:/aws/omics/WorkflowLog*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecr:BatchGetImage",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchCheckLayerAvailability"
      ],
      "Resource": [
        "arn:aws:ecr:us-east-1:123456789012:repository/docker-hub/*",
        "arn:aws:ecr:us-east-1:123456789012:repository/quay/*",
        "arn:aws:ecr:us-east-1:123456789012:repository/ecr-public/*"
      ]
    }
  ]
}
EOF
Create the Service Role
aws iam create-role \
  --role-name HealthOmicsWorkflowRole \
  --assume-role-policy-document file://trust-policy.json \
  --description "Service role for HealthOmics workflows with container registry mappings"
Create and Attach the Policy
aws iam create-policy \
  --policy-name HealthOmicsWorkflowPolicy \
  --policy-document file://service-role-policy.json \
  --description "Policy for HealthOmics workflows with ECR pull through cache access"
aws iam attach-role-policy \
  --role-name HealthOmicsWorkflowRole \
  --policy-arn arn:aws:iam::123456789012:policy/HealthOmicsWorkflowPolicy
Get Role ARN for Workflow Creation
aws iam get-role --role-name HealthOmicsWorkflowRole --query 'Role.Arn' --output text
Step 7: Create Workflows with Container Registry Maps
Method 1: Inline Container Registry Map
aws omics create-workflow \
  --name "genomics-pipeline-with-ptc" \
  --description "Genomics workflow using pull through cache" \
  --engine WDL \
  --definition-zip fileb://workflow.zip \
  --container-registry-map '{
    "registryMappings": [
      {
        "upstreamRegistryUrl": "registry-1.docker.io",
        "ecrRepositoryPrefix": "docker-hub"
      },
      {
        "upstreamRegistryUrl": "quay.io",
        "ecrRepositoryPrefix": "quay"
      }
    ]
  }' \
  --region us-east-1
Method 2: Container Registry Map from S3
You can also store the container registry map in S3, which is convenient when several workflows share the same map.
First, upload the map to S3:
aws s3 cp container-registry-map.json s3://my-workflow-bucket/maps/container-registry-map.json
Then create the workflow:
aws omics create-workflow \
  --name "genomics-pipeline-with-s3-map" \
  --description "Genomics workflow using S3-stored registry map" \
  --engine WDL \
  --definition-zip fileb://workflow.zip \
  --container-registry-map-uri "s3://my-workflow-bucket/maps/container-registry-map.json" \
  --region us-east-1
Step 8: Example Workflow Definitions
The following workflow contains tasks that use containers from Docker Hub and Quay.io. These containers are automatically mapped to your private ECR pull through caches if you create the workflow with an appropriate container registry map.
WDL Workflow Example
version 1.0

workflow GenomicsWorkflow {
  input {
    File input_bam
    String sample_name
  }
  call SamtoolsSort {
    input:
      input_bam = input_bam,
      sample_name = sample_name
  }
  call GatkHaplotypeCaller {
    input:
      sorted_bam = SamtoolsSort.sorted_bam,
      sample_name = sample_name
  }
  output {
    File vcf_file = GatkHaplotypeCaller.vcf
  }
}

task SamtoolsSort {
  input {
    File input_bam
    String sample_name
  }
  command <<<
    samtools sort ~{input_bam} -o ~{sample_name}.sorted.bam
  >>>
  runtime {
    docker: "quay.io/biocontainers/samtools:1.17--h00cdaf9_0"
    memory: "4 GB"
    cpu: 2
  }
  output {
    File sorted_bam = "~{sample_name}.sorted.bam"
  }
}

task GatkHaplotypeCaller {
  input {
    File sorted_bam
    String sample_name
  }
  command <<<
    gatk HaplotypeCaller \
      -I ~{sorted_bam} \
      -O ~{sample_name}.vcf \
      -R /opt/broad/references/hg38/v0/Homo_sapiens_assembly38.fasta
  >>>
  runtime {
    docker: "broadinstitute/gatk:4.6.0.2"
    memory: "8 GB"
    cpu: 4
  }
  output {
    File vcf = "~{sample_name}.vcf"
  }
}
What Happens During Workflow Execution
When a workflow runs with container registry maps:
- Container Resolution: HealthOmics examines each container reference in the workflow
- Registry Mapping: If a registry mapping exists, the service maps the upstream registry URL to the ECR repository prefix
- Image Mapping: If specific image mappings exist, they override registry mappings for those containers
- Repository Creation: ECR creates the repository using the repository creation template if it doesn't exist
- Pull Through Cache: ECR automatically pulls the image from the upstream registry if not already cached
- Task Execution: The workflow task runs using the cached container from your private ECR registry
Example Container Resolution
Original workflow reference: quay.io/biocontainers/samtools:1.17--h00cdaf9_0
With registry mapping:
- Upstream: quay.io
- ECR Prefix: quay
- Resolved: 123456789012.dkr.ecr.us-east-1.amazonaws.com/quay/biocontainers/samtools:1.17--h00cdaf9_0
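The resolution order described above (image mappings first, then registry mappings) can be modeled in a few lines of shell. This is only an illustrative sketch, not the HealthOmics implementation. The flat source=destination and upstream=prefix lists, the custom/bwa destination, and the function name resolve_image are all hypothetical, and the model handles only image references that begin with an explicit registry host.

```shell
# Hypothetical flat stand-ins for the JSON map (space-separated entries).
# IMAGE_MAP entries are sourceImage=destinationImage.
# REGISTRY_MAP entries are upstreamRegistryUrl=ecrRepositoryPrefix.
ECR_REGISTRY='123456789012.dkr.ecr.us-east-1.amazonaws.com'
IMAGE_MAP="quay.io/biocontainers/bwa:0.7.17--hed695b0_7=$ECR_REGISTRY/custom/bwa:0.7.17"
REGISTRY_MAP='quay.io=quay public.ecr.aws=ecr-public'

resolve_image() {
  image="$1"
  # 1. An exact image mapping takes precedence.
  for entry in $IMAGE_MAP; do
    if [ "${entry%%=*}" = "$image" ]; then
      echo "${entry#*=}"
      return 0
    fi
  done
  # 2. Otherwise replace the upstream registry host with the ECR prefix.
  for entry in $REGISTRY_MAP; do
    upstream=${entry%%=*}
    prefix=${entry#*=}
    case "$image" in
      "$upstream"/*)
        rest=${image#"$upstream"/}
        echo "$ECR_REGISTRY/$prefix/$rest"
        return 0
        ;;
    esac
  done
  # 3. No mapping applies.
  return 1
}

resolve_image "quay.io/biocontainers/samtools:1.17--h00cdaf9_0"
# prints 123456789012.dkr.ecr.us-east-1.amazonaws.com/quay/biocontainers/samtools:1.17--h00cdaf9_0
```

In this model the bwa image resolves to its image mapping even though the quay.io registry mapping would also match it, which mirrors the precedence rule stated in Step 5.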
Monitoring and Troubleshooting
Check Pull Through Cache Rules
aws ecr describe-pull-through-cache-rules --region us-east-1
Verify ECR Registry Policy Allows HealthOmics
aws ecr get-registry-policy --region us-east-1
Verify ECR Repository Creation Templates Grant Access to HealthOmics
aws ecr describe-repository-creation-templates --region us-east-1
Validate Repository Creation
aws ecr describe-repositories --repository-names docker-hub/broadinstitute/gatk --region us-east-1
Validate Repository Policy Grants Access to HealthOmics
aws ecr get-repository-policy --repository-name docker-hub/broadinstitute/gatk --region us-east-1
Extract Container Registry Map from a Workflow
aws omics get-workflow --id <workflow-id> --query 'containerRegistryMap' --output json --region us-east-1
Monitor Workflow Runs
aws omics get-run --id <run-id> --region us-east-1
Check Task Container Details
aws omics get-run-task --id <run-id> --task-id <task-id> --region us-east-1
Best Practices
- Use Registry Mappings: Prefer registry mappings over image mappings for broader coverage
- Consistent Prefixes: Use consistent repository prefixes across your organization
- Monitor Costs: Pull through cache incurs standard ECR storage costs
- Version Pinning: Always use specific container versions in workflows. For the highest level of reproducibility, reference images by SHA-256 digest in container URIs
- Test Mappings: Validate container registry maps before production use
- Documentation: Document your registry mapping strategy for your team
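The version pinning practice above can be checked mechanically before a workflow is registered. The sketch below classifies an image reference by how it is pinned; the function name pin_kind and its four labels are hypothetical, and the check is purely syntactic (it does not contact any registry).

```shell
# Hypothetical helper: report how an image reference is pinned.
# Prints one of: digest, tag, floating, unpinned.
pin_kind() {
  ref="$1"
  case "$ref" in
    *@sha256:*) echo "digest"; return 0 ;;
  esac
  # Look for a tag only in the last path component, so a registry port
  # such as localhost:5000/tool is not mistaken for a tag.
  name=${ref##*/}
  case "$name" in
    *:latest) echo "floating" ;;
    *:?*)     echo "tag" ;;
    *)        echo "unpinned" ;;
  esac
}

for ref in \
  "quay.io/biocontainers/samtools:1.17--h00cdaf9_0" \
  "broadinstitute/gatk:latest" \
  "ubuntu"; do
  echo "$(pin_kind "$ref"): $ref"
done
```

References reported as floating or unpinned are candidates for replacement with a specific tag or a digest.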
Security Considerations
- Secrets Management: Store registry credentials securely in AWS Secrets Manager
- IAM Permissions: Use least privilege principles for ECR and HealthOmics permissions
- Registry Policies: Restrict access to specific repository prefixes
- Image Scanning: Enable ECR image scanning for security vulnerabilities
- Access Logging: Enable CloudTrail logging for ECR and HealthOmics API calls
AWS OFFICIAL | Updated a year ago