Container Registry Best Practices Across Clouds
Image scanning, lifecycle policies, geo-replication, and immutable tags — how to run registries properly on ECR, ACR, Artifact Registry, and OCIR.
Why Container Registries Need Active Management
Container registries are the supply chain for your containerized applications. Every container image your applications run is pulled from a registry, and the security, availability, and cost of that registry directly affect your production workloads. Yet many teams treat their container registry as a passive storage service — push images and forget them. The result is registries bloated with thousands of untagged images consuming hundreds of gigabytes of storage, no vulnerability scanning, no lifecycle policies, and no image signing or provenance verification.
This article covers container registry best practices across the four major cloud registries: Amazon ECR, Azure Container Registry (ACR), Google Artifact Registry, and OCI Container Registry (OCIR). Each practice includes specific implementation guidance for each provider, because while the principles are universal, the configuration details vary significantly.
Image Scanning for Vulnerabilities
Every image in your registry should be scanned for known vulnerabilities before it reaches production. Container images contain operating system packages, language runtime dependencies, and application libraries, all of which can have CVEs (Common Vulnerabilities and Exposures) discovered after the image was built. A base image that was clean when you built it can have critical vulnerabilities disclosed days later.
On ECR, enable enhanced scanning, which uses Amazon Inspector to continuously scan images for OS and programming language package vulnerabilities. Enhanced scanning is more comprehensive than the basic scanning that ECR has offered historically. Configure scanning to trigger on push so every new image is scanned immediately. Set up EventBridge rules to alert on critical and high severity findings. ECR scanning costs $0.09 per image scan for the initial scan and $0.01 per image per rescan.
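The EventBridge rule mentioned above needs an event pattern. The following is a minimal sketch of one, built as a Python dictionary. The `source`, `detail-type`, and `detail` field names reflect my understanding of the finding events Amazon Inspector emits and should be verified against the current event schema. The helper name `inspector_finding_pattern` is hypothetical.

```python
import json


def inspector_finding_pattern(severities=("CRITICAL", "HIGH")):
    """Build an EventBridge event pattern for Inspector findings on ECR images.

    Assumption: the field names below follow the Amazon Inspector finding
    event format; confirm them against provider documentation before use.
    """
    return {
        "source": ["aws.inspector2"],
        "detail-type": ["Inspector2 Finding"],
        "detail": {
            "severity": list(severities),
            "resources": {"type": ["AWS_ECR_CONTAINER_IMAGE"]},
        },
    }


# Serialize the pattern so it can be pasted into a rule definition.
pattern_json = json.dumps(inspector_finding_pattern(), indent=2)
print(pattern_json)
```

The rule would then route matching events to a notification target of your choice, such as an SNS topic.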
On ACR, Microsoft Defender for Containers provides vulnerability scanning, now powered by Microsoft Defender Vulnerability Management (earlier versions used a Qualys-based scanner). Scanning is triggered automatically when images are pushed. Defender categorizes findings by severity and provides remediation guidance. Because findings surface in Microsoft Defender for Cloud (formerly Azure Security Center), container vulnerabilities appear alongside other security alerts in a unified dashboard.
On Google Artifact Registry, Artifact Analysis (formerly Container Analysis) provides on-push and continuous scanning. On-push scanning starts automatically once the Container Scanning API is enabled on the project and is billed per scanned image. Continuous analysis keeps re-evaluating recently pushed or pulled images against new vulnerability data after the initial push, which is critical because new CVEs are disclosed daily. Scanning results can feed Binary Authorization policies that block deployment of images with critical vulnerabilities.
On OCIR, vulnerability scanning is provided through the OCI Vulnerability Scanning Service. You create a container image scan recipe and a scan target that lists the repositories to scan. Scan results appear in the Vulnerability Scanning reports and can trigger notifications through the OCI Events and Notifications services.
Scan continuously, not just on push
On-push scanning catches vulnerabilities known at build time. Continuous scanning catches vulnerabilities disclosed after the image was built. A Log4Shell-type zero-day affects images already in production — only continuous scanning detects this. Enable continuous scanning on all registries that support it.
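To make the gap concrete, here is a small sketch that lists the findings an on-push scan could not have reported because they were disclosed after the image was pushed. The CVE identifiers, package names, and dates are made up for illustration, and `missed_by_on_push` is a hypothetical helper, not part of any scanner's API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Cve:
    cve_id: str
    package: str
    disclosed: date


def missed_by_on_push(image_packages, pushed_on, cves):
    """Return CVEs that affect the image but were disclosed after the push."""
    return [
        c for c in cves
        if c.package in image_packages and c.disclosed > pushed_on
    ]


# Illustrative data only: one CVE known before the push, one disclosed after.
cves = [
    Cve("CVE-A", "openssl", date(2024, 1, 10)),
    Cve("CVE-B", "log-lib", date(2024, 3, 5)),
]
late = missed_by_on_push({"openssl", "log-lib"}, date(2024, 2, 1), cves)
print([c.cve_id for c in late])  # ['CVE-B']
```

Only a rescan after the disclosure date would surface the second finding, which is the job continuous scanning does.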
Lifecycle Policies for Image Cleanup
Without lifecycle policies, container registries grow indefinitely. Every CI/CD pipeline push adds a new image, and old images are never cleaned up. A typical microservice with daily deployments accumulates 365 images per year per service. At 200 MB per image, that is 73 GB per year per service. Across 20 microservices, you are storing 1.46 TB of container images annually, most of which will never be pulled again.
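The arithmetic above can be reproduced with a few lines of Python. This is a rough upper bound that uses decimal units and assumes every image is stored in full. Registries store layers by digest, so images that share base layers take less space than this estimate suggests.

```python
def annual_storage_gb(images_per_year, image_mb, services=1):
    """Upper-bound storage added per year in GB, ignoring layer deduplication."""
    return images_per_year * image_mb * services / 1000  # decimal GB


per_service_gb = annual_storage_gb(365, 200)
fleet_gb = annual_storage_gb(365, 200, services=20)
print(per_service_gb, "GB per service;", fleet_gb / 1000, "TB across 20 services")
```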
On ECR, lifecycle policies let you define rules based on image age, image count, and tag status. A recommended baseline policy: keep the 10 most recent tagged images, delete untagged images older than 1 day (these are usually images whose tag has since moved to a newer build, so nothing references them by tag), and keep any image tagged with a semantic version pattern (v1.2.3) for 90 days. ECR applies lifecycle rules asynchronously, so a matching image can take up to 24 hours to expire rather than being deleted the moment it matches a rule.
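One way to express the baseline policy described above is as an ECR lifecycle policy document. The sketch below builds it in Python. The rule structure follows the lifecycle policy JSON format as I understand it, the thresholds come from the text, and the `v` tag prefix is an assumption about how version tags are named. ECR rules only expire images, so "keep" is expressed by expiring everything else. Rule precedence is worth confirming with a lifecycle policy preview before applying the policy.

```python
import json


def baseline_lifecycle_policy(keep_count=10, untagged_days=1,
                              version_prefix="v", version_days=90):
    """Return an ECR lifecycle policy document as a dict."""
    def expire(priority, description, selection):
        return {
            "rulePriority": priority,
            "description": description,
            "selection": selection,
            "action": {"type": "expire"},
        }

    return {"rules": [
        expire(1, "Expire untagged images", {
            "tagStatus": "untagged",
            "countType": "sinceImagePushed",
            "countUnit": "days",
            "countNumber": untagged_days,
        }),
        expire(2, "Expire version-tagged images after the retention window", {
            "tagStatus": "tagged",
            "tagPrefixList": [version_prefix],
            "countType": "sinceImagePushed",
            "countUnit": "days",
            "countNumber": version_days,
        }),
        # A rule with tagStatus "any" must carry the highest priority number.
        expire(3, "Keep only the most recent images", {
            "tagStatus": "any",
            "countType": "imageCountMoreThan",
            "countNumber": keep_count,
        }),
    ]}


policy_text = json.dumps(baseline_lifecycle_policy(), indent=2)
print(policy_text)
```

The resulting JSON string is what boto3's `put_lifecycle_policy` call accepts as `lifecyclePolicyText`.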
On ACR, the built-in retention policy is a registry-level setting (Premium tier) that deletes untagged manifests after a configurable number of days; it does not remove tagged images. For tagged images, use purge tasks built on ACR Tasks and the acr purge command, which provide more flexible cleanup based on age, tag filters, and whether a manifest is untagged. For example, you can create a purge task that runs weekly and deletes all images older than 30 days except those matching the pattern "release-*".
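The purge example above boils down to a simple selection rule. The sketch below models that rule locally in Python so the intent can be checked before it is translated into a purge task. It does not call ACR, and the real purge command has its own filter syntax. The function name and the tag data are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from fnmatch import fnmatchcase


def select_for_purge(tags, now, max_age_days=30, keep_pattern="release-*"):
    """Return the sorted tag names that are older than the cutoff and not exempt.

    `tags` maps a tag name to the datetime it was last updated.
    """
    cutoff = now - timedelta(days=max_age_days)
    return sorted(
        name for name, updated in tags.items()
        if updated < cutoff and not fnmatchcase(name, keep_pattern)
    )


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
tags = {
    "release-1.4": now - timedelta(days=90),  # old, but exempt by pattern
    "build-812": now - timedelta(days=45),    # old and not exempt
    "build-990": now - timedelta(days=2),     # recent
}
print(select_for_purge(tags, now))  # ['build-812']
```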
On Artifact Registry, cleanup policies can be defined per repository. You can configure keep policies (retain images matching specific tags or newer than a threshold) and delete policies (remove images matching criteria). Dry-run mode lets you preview what would be deleted before activating the policy.
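As an illustration, the keep and delete policies can be written as a JSON policy file generated from Python. The field names (`action`, `condition`, `tagState`, `olderThan`, `tagPrefixes`, `mostRecentVersions`) reflect my understanding of the cleanup policy file format and should be checked against the Artifact Registry documentation. The policy names, the 30-day threshold, and the `release` prefix are placeholders.

```python
import json


def cleanup_policies(delete_older_than="30d", keep_count=10,
                     keep_prefixes=("release",)):
    """Return a list of Artifact Registry cleanup policies as plain dicts."""
    return [
        {
            "name": "delete-old-images",
            "action": {"type": "Delete"},
            "condition": {"tagState": "any", "olderThan": delete_older_than},
        },
        {
            "name": "keep-release-tags",
            "action": {"type": "Keep"},
            "condition": {"tagState": "tagged",
                          "tagPrefixes": list(keep_prefixes)},
        },
        {
            "name": "keep-most-recent",
            "action": {"type": "Keep"},
            "mostRecentVersions": {"keepCount": keep_count},
        },
    ]


policy_file_body = json.dumps(cleanup_policies(), indent=2)
print(policy_file_body)
```

Writing the output to a file and applying it in dry-run mode first gives a preview of what the delete policy would remove.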
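On OCIR, image retention policies select images for deletion based on criteria such as how long ago an image was last pulled or how long it has been untagged, and images carrying tags you list as exempt are never deleted. You can edit the global policy that applies to repositories in a region or attach custom policies to specific repositories. For more sophisticated cleanup, use OCI Functions to build custom cleanup automation triggered on a schedule.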
Geo-Replication for Global Deployments
If your containers run in multiple regions, pulling images from a single-region registry adds latency and cross-region data transfer costs. Geo-replication copies images across regions automatically so that each region pulls from a local copy.
ECR supports cross-region replication through replication configuration on the registry. You specify destination regions, and ECR automatically replicates images to those regions. Cross-account replication is also supported for organizations with separate AWS accounts per region. The replication is asynchronous and typically completes within minutes of the image being pushed.
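A sketch of what the registry-level replication configuration looks like, built as a Python dictionary. The regions and the account ID are placeholders, and the helper name is hypothetical. The structure matches the `replicationConfiguration` argument of boto3's `put_replication_configuration` as I understand it.

```python
def replication_configuration(destinations):
    """Build an ECR replication configuration.

    `destinations` is an iterable of (region, account_id) pairs. Using a
    different account ID from the source registry gives cross-account
    replication.
    """
    return {
        "rules": [
            {
                "destinations": [
                    {"region": region, "registryId": account_id}
                    for region, account_id in destinations
                ]
            }
        ]
    }


# Placeholder account ID and regions for illustration.
config = replication_configuration([
    ("eu-west-1", "111122223333"),
    ("ap-southeast-2", "111122223333"),
])
print(config)
```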
ACR Premium tier supports geo-replication as a built-in feature. You add replica regions in the ACR configuration, and ACR maintains copies of all images in each region. ACR geo-replication also provides a single DNS name that resolves to the nearest replica, so your Kubernetes clusters automatically pull from the closest region without configuration changes. This is the most seamless geo-replication implementation among the four providers.
Artifact Registry supports multi-region repositories that store images across multiple regions within a geography (e.g., US multi-region). For cross-geography replication, you can set up multiple repositories and use Cloud Build or CI/CD pipelines to push to all target registries.
OCIR is available in all OCI regions, but automatic cross-region replication must be built using custom automation. You can use OCI Functions triggered by image push events to replicate images to registries in other regions.
Immutable Tags and Image Provenance
Mutable tags are a security risk. If an attacker gains write access to your registry, they can push a malicious image with the same tag as your production image (e.g., "latest" or "v1.2.3"). The next time your Kubernetes cluster scales up or restarts a pod, it pulls the malicious image. Immutable tags prevent this by ensuring that once a tag is assigned to an image, it cannot be reassigned to a different image.
On ECR, enable tag immutability at the repository level. Once enabled, pushing an image with a tag that already exists is rejected. This forces your CI/CD pipeline to use unique tags for every build, typically using the git commit SHA, build number, or a semantic version that is incremented with each release.
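The behavior is easy to model. The toy class below is an in-memory stand-in for a repository with tag immutability enabled, not a client for any real registry. `build_tag` shows one possible unique-tag scheme that combines the commit SHA and build number. Both names and the tag format are assumptions made for illustration.

```python
class TagAlreadyExists(Exception):
    """Raised when a push tries to reuse an existing tag."""


class ImmutableRepo:
    """In-memory model of a repository with immutable tags."""

    def __init__(self):
        self._tags = {}

    def push(self, tag, digest):
        if tag in self._tags:
            raise TagAlreadyExists(tag)
        self._tags[tag] = digest

    def resolve(self, tag):
        return self._tags[tag]


def build_tag(commit_sha, build_number):
    """Derive a unique tag from the short commit SHA and the CI build number."""
    return f"{commit_sha[:12]}-b{build_number}"


repo = ImmutableRepo()
tag = build_tag("9f2c1e7a4b6d8c0e1f3a5b7c9d0e2f4a6b8c0d1e", 1042)
repo.push(tag, "sha256:aaa")
try:
    repo.push(tag, "sha256:bbb")  # an attempt to overwrite the tag
    rejected = False
except TagAlreadyExists:
    rejected = True
print(tag, rejected)  # 9f2c1e7a4b6d-b1042 True
```

Because the second push fails, the tag keeps pointing at the digest that was originally published.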
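On ACR, tag immutability is not a push-time repository setting as it is on ECR, but you can lock an individual image or an entire repository against overwrites and deletion (for example, by setting write-enabled to false with az acr repository update). You can also use content trust (Docker Content Trust / Notary) to sign images and verify signatures before deployment. ACR also supports artifact streaming, which lets AKS nodes start a container before the whole image has been pulled, reducing startup time.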
On Artifact Registry, immutable tags are supported through the registry configuration. Enable them on all production repositories. Artifact Registry also supports container image signing with Binary Authorization, which verifies that only images signed by trusted builders are deployed to GKE clusters.
On OCIR, image signing is supported through OCI Vault integration. You can sign images with keys stored in OCI Vault and verify signatures before deployment using admission controllers in OKE.
Access Control and Authentication
Registry access control is critical because container images often contain application code, configuration files, and embedded secrets (embedding secrets in images is an anti-pattern, but it remains common). Overly permissive registry access can expose proprietary code and sensitive configuration to unauthorized users.
On ECR, use repository policies to control who can push and pull images. Use IAM roles for service authentication: the ECS task execution role, the EKS node role, and the role that deploys Lambda container images should have permission to pull from specific repositories, not all repositories. Enable ECR pull-through cache for public images to avoid rate limiting from Docker Hub and other public registries.
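A pull-only IAM policy scoped to named repositories might look like the following sketch. The repository ARN is a placeholder and the helper name is hypothetical. `ecr:GetAuthorizationToken` is granted on all resources because, to my knowledge, it cannot be restricted to a single repository.

```python
import json


def pull_only_policy(repository_arns):
    """Return an IAM policy document allowing image pulls from specific repos."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "GetAuthToken",
                "Effect": "Allow",
                "Action": "ecr:GetAuthorizationToken",
                "Resource": "*",
            },
            {
                "Sid": "PullFromNamedRepositories",
                "Effect": "Allow",
                "Action": [
                    "ecr:BatchCheckLayerAvailability",
                    "ecr:GetDownloadUrlForLayer",
                    "ecr:BatchGetImage",
                ],
                "Resource": list(repository_arns),
            },
        ],
    }


# Placeholder ARN for illustration.
policy_doc = pull_only_policy(
    ["arn:aws:ecr:us-east-1:111122223333:repository/payments-api"]
)
print(json.dumps(policy_doc, indent=2))
```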
On ACR, use Azure RBAC to assign the AcrPull role to workloads that need to pull images and AcrPush to CI/CD service principals that build and push images. Avoid assigning the Owner or Contributor role on ACR resources. Use managed identities for AKS-to-ACR authentication by attaching the AcrPull role to the AKS kubelet managed identity.
On Artifact Registry, use IAM roles to control access. The Artifact Registry Reader role provides pull access, and the Artifact Registry Writer role provides push access. For GKE, grant the Reader role to the node service account, because the kubelet pulls images with the node's credentials rather than with a pod's Workload Identity. For external CI/CD systems, use Workload Identity Federation to avoid service account keys.
On OCIR, use OCI IAM policies to control access at the compartment and repository level. Use instance principals for OKE workloads to authenticate to OCIR without managing credentials.
Image Size Optimization
Smaller images pull faster, start faster, consume less registry storage, and have a smaller attack surface. A common Django application image built on the full Python base image can be 900 MB. The same application built on python:slim is 200 MB. Built with a multi-stage Dockerfile that compiles dependencies in one stage and copies only the runtime artifacts to a minimal base image, it can be under 100 MB.
Use multi-stage builds to separate build-time dependencies from runtime dependencies. Use minimal base images: Alpine, Distroless, or slim variants of language-specific images. Remove package manager caches, temporary files, and build artifacts in the same layer they are created. Pin base image versions to specific digests rather than mutable tags to ensure reproducible builds.
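Digest pinning can be checked mechanically. The sketch below scans Dockerfile text for `FROM` lines whose image reference lacks an `@sha256:` digest, skipping `scratch` and references to earlier build stages. It is a simplified parser that ignores build arguments and line continuations, and the digest in the sample is an all-zero placeholder, not a real image digest.

```python
import re

FROM_RE = re.compile(r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)", re.IGNORECASE)
ALIAS_RE = re.compile(r"\s+AS\s+(\S+)\s*$", re.IGNORECASE)


def unpinned_base_images(dockerfile_text):
    """Return base images referenced by tag (or no tag) instead of a digest."""
    stages = set()
    unpinned = []
    for line in dockerfile_text.splitlines():
        match = FROM_RE.match(line)
        if not match:
            continue
        image = match.group(1)
        is_stage_ref = image in stages
        if image.lower() != "scratch" and not is_stage_ref and "@sha256:" not in image:
            unpinned.append(image)
        alias = ALIAS_RE.search(line)
        if alias:
            stages.add(alias.group(1))
    return unpinned


placeholder_digest = "0" * 64  # stands in for a real digest
sample = "\n".join([
    "FROM python:3.12-slim AS build",
    "RUN pip install --prefix=/install -r requirements.txt",
    f"FROM gcr.io/distroless/python3@sha256:{placeholder_digest}",
    "COPY --from=build /install /usr/local",
])
print(unpinned_base_images(sample))  # ['python:3.12-slim']
```

A check like this could run in CI to flag Dockerfiles that still depend on mutable base image tags.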
Scan your largest images and identify unnecessary contents. Tools like dive (an open-source tool for exploring Docker image layers) show what files each layer contains and their sizes. Common bloat sources include: entire SDKs when only the runtime is needed, documentation and man pages, test suites and development dependencies, multiple copies of the same file across layers, and cached package downloads.
Monitoring and Alerting
Monitor your container registries for: storage growth trends (unexpected spikes may indicate misconfigured CI/CD pipelines pushing too frequently), failed image pulls (which indicate availability issues that can prevent pod scheduling), vulnerability scan findings (especially new critical CVEs in images currently deployed to production), and unauthorized push or pull activity (which may indicate compromised credentials).
Set up alerts for critical vulnerability findings in images that are currently running in production. This requires correlating vulnerability scan results with deployment data — you need to know not just that an image has a vulnerability, but that the vulnerable image is currently deployed and receiving traffic. Tools like Kubernetes admission controllers with policy engines (OPA Gatekeeper, Kyverno) can prevent deployment of images with critical unpatched vulnerabilities.
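The correlation step can be sketched as a join between scan results and deployment data, both keyed by image digest. The data shapes, workload names, and CVE identifiers below are invented for illustration. In practice the inputs would come from your scanner's findings export and from the cluster's list of running images.

```python
def deployed_critical_findings(findings, deployments):
    """Return (workload, digest, cve_id) for critical CVEs in deployed images.

    `findings` maps an image digest to a list of (cve_id, severity) pairs.
    `deployments` maps a workload name to the digest it is running.
    """
    alerts = []
    for workload, digest in sorted(deployments.items()):
        for cve_id, severity in findings.get(digest, []):
            if severity == "CRITICAL":
                alerts.append((workload, digest, cve_id))
    return alerts


# Illustrative data: only one vulnerable image is actually deployed.
findings = {
    "sha256:aaa": [("CVE-A", "CRITICAL"), ("CVE-B", "LOW")],
    "sha256:bbb": [("CVE-C", "CRITICAL")],
}
deployments = {"checkout": "sha256:aaa", "search": "sha256:ccc"}
print(deployed_critical_findings(findings, deployments))
```

The critical finding on `sha256:bbb` produces no alert here because no workload runs that image, which keeps alerts focused on what is actually receiving traffic.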
Registry hygiene checklist
Enable vulnerability scanning on all repositories. Configure lifecycle policies to delete images older than 90 days (except release-tagged images). Enable immutable tags on production repositories. Use minimal base images and multi-stage builds. Set up geo-replication if you deploy to multiple regions. Review and tighten access controls quarterly.
Written by Jeff Monfield
Cloud architect and founder of CloudToolStack. Building free tools and writing practical guides to help engineers navigate AWS, Azure, GCP, and OCI.
Disclaimer: This article is for informational purposes. Cloud services and pricing change frequently; always verify with official provider documentation. AWS, Azure, GCP, and OCI are trademarks of their respective owners.