ACK: Managed Kubernetes
Deploy and manage Kubernetes clusters with ACK including node pools, Terway networking, storage integration, and security.
Prerequisites
- Basic Kubernetes concepts (pods, deployments, services)
- Alibaba Cloud VPC and ECS familiarity
Container Service for Kubernetes (ACK)
Alibaba Cloud Container Service for Kubernetes (ACK) is a fully managed Kubernetes service that simplifies deploying, managing, and scaling containerized applications. ACK is one of the first cloud-managed Kubernetes services to achieve CNCF Kubernetes Conformance certification, ensuring full compatibility with the upstream Kubernetes ecosystem. With deep integration into Alibaba Cloud services for networking, storage, monitoring, and security, ACK provides an enterprise-grade container platform that powers some of the largest containerized workloads in Asia.
ACK was battle-tested at scale during Alibaba Group's own digital transformation — the platform runs tens of thousands of Kubernetes clusters internally, managing millions of containers across Alibaba's e-commerce, logistics, and cloud infrastructure. This operational experience has produced features like intelligent node pool management, GPU sharing for AI workloads, and Sandboxed-Container runtime for multi-tenant security isolation.
This guide covers ACK cluster types, node pool configuration, networking models, storage integration, application deployment, monitoring, security, and cost optimization for production Kubernetes workloads on Alibaba Cloud.
ACK Cluster Types
ACK offers several cluster types to match different operational requirements and cost profiles:
ACK Managed Cluster
The most popular option for production workloads. Alibaba Cloud manages the Kubernetes control plane (API server, etcd, scheduler, controller manager) at no additional cost — you only pay for the worker nodes. The control plane is highly available across multiple availability zones with automatic upgrades and patching. You have full kubectl access and can install any Kubernetes add-ons.
ACK Pro Cluster
An enhanced version of the managed cluster with SLA-backed control plane availability (99.95%), enhanced API server performance, and additional features including managed Istio service mesh, Sandboxed-Container runtime, and advanced scheduling capabilities. Recommended for mission-critical production workloads that require guaranteed control plane availability.
ACK Serverless (ASK)
A fully serverless Kubernetes experience where you do not manage any nodes. Pods run on Elastic Container Instances (ECI), and you pay only for the CPU and memory consumed by your pods. ASK eliminates node management, patching, and capacity planning. Ideal for batch jobs, CI/CD workloads, and applications with unpredictable scaling requirements.
ACK Edge
Extends ACK to edge computing scenarios. Run Kubernetes workloads on edge nodes that are managed centrally from the cloud but execute locally at edge locations. Supports autonomous edge operation during network disconnection from the cloud control plane.
ACK Pro vs Managed
For production workloads, ACK Pro is the recommended choice. The additional cost (approximately $0.09/hour for the control plane) provides a 99.95% SLA, enhanced API server performance (3x the throughput of standard managed clusters), managed etcd backups, and support for Sandboxed-Container runtime. The SLA commitment alone justifies the upgrade for any workload where downtime has business impact.
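As a sanity check on that figure, the monthly control plane cost at the quoted rate works out to roughly $65 (the hourly rate is the article's approximation and varies by region):

```shell
# Rough monthly cost of the ACK Pro control plane at ~$0.09/hour.
# The rate is approximate; actual pricing varies by region.
awk 'BEGIN { printf "%.2f\n", 0.09 * 24 * 30 }'
```

For a 31-day month the figure is closer to $67, still small relative to the cost of even a few worker nodes.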
Creating an ACK Cluster
Create a production-ready ACK cluster with the following configuration:
# Create an ACK Pro managed cluster
aliyun cs POST /clusters --body '{
  "name": "prod-cluster",
  "cluster_type": "ManagedKubernetes",
  "kubernetes_version": "1.28.3-aliyun.1",
  "region_id": "cn-hangzhou",
  "vpcid": "vpc-bp1****",
  "container_cidr": "172.20.0.0/16",
  "service_cidr": "172.21.0.0/20",
  "num_of_nodes": 0,
  "addons": [
    {"name": "terway-eniip"},
    {"name": "csi-plugin"},
    {"name": "csi-provisioner"},
    {"name": "nginx-ingress-controller"},
    {"name": "arms-prometheus"}
  ],
  "cluster_spec": "ack.pro.small",
  "is_enterprise_security_group": true,
  "snat_entry": true,
  "endpoint_public_access": false
}'
# Get cluster credentials
aliyun cs GET /k8s/cluster-id/user_config
# Merge kubeconfig
export KUBECONFIG=~/.kube/config
kubectl cluster-info
Node Pool Management
Node pools are the primary mechanism for managing worker nodes in ACK. Each node pool defines the instance type, scaling configuration, disk settings, and Kubernetes labels/taints for a group of worker nodes. Best practices for node pool design include:
- Separate pools by workload type: Create dedicated node pools for system components, general applications, GPU workloads, and batch jobs
- Use multiple instance types: Configure node pools with multiple compatible instance types to improve availability and reduce the risk of capacity shortages
- Enable auto-scaling: Set minimum and maximum node counts and let the cluster autoscaler manage capacity based on pod resource requests
- Multi-zone deployment: Specify vSwitches in multiple availability zones for each node pool to ensure high availability
# Create a general-purpose node pool
aliyun cs POST /clusters/cluster-id/nodepools --body '{
  "nodepool_info": {
    "name": "general-pool"
  },
  "scaling_group": {
    "instance_types": ["ecs.g7.2xlarge", "ecs.g8a.2xlarge"],
    "vswitch_ids": ["vsw-bp1****", "vsw-bp2****"],
    "system_disk_category": "cloud_essd",
    "system_disk_size": 120,
    "data_disks": [
      {"category": "cloud_essd", "size": 200, "auto_snapshot_policy_id": "sp-****"}
    ],
    "multi_az_policy": "BALANCE",
    "scaling_policy": "release",
    "platform": "AliyunLinux3"
  },
  "auto_scaling": {
    "enable": true,
    "min_instances": 3,
    "max_instances": 20,
    "type": "cpu",
    "is_bond_eip": false
  },
  "kubernetes_config": {
    "labels": [
      {"key": "nodepool-type", "value": "general"}
    ],
    "runtime": "containerd",
    "runtime_version": "1.6.28"
  }
}'
Networking: Terway vs Flannel
ACK supports two Container Network Interface (CNI) plugins, each with different trade-offs:
Terway (Recommended)
Terway is Alibaba Cloud's high-performance CNI plugin that assigns real VPC ENI (Elastic Network Interface) or ENI secondary IP addresses to pods. This means pods get native VPC networking — they are directly addressable within the VPC without any overlay network overhead. Benefits include:
- Native VPC performance with no overlay encapsulation overhead
- Direct pod-to-pod communication within the VPC at line rate
- VPC security groups can be applied directly to pods
- Pods are visible in VPC flow logs and CloudMonitor
- Support for NetworkPolicy through integration with Calico
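Because Terway supports NetworkPolicy through Calico, standard upstream NetworkPolicy objects work unchanged. A minimal sketch that restricts ingress to pods labeled app: web so that only pods labeled app: frontend may connect (labels, namespace, and port are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-allow-frontend
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: web              # pods this policy protects
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend # only frontend pods may connect
      ports:
        - protocol: TCP
          port: 8080
```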
Flannel
Flannel uses a VXLAN overlay network to provide pod networking. Pods get IP addresses from the container CIDR (not the VPC CIDR), and traffic between nodes is encapsulated in VXLAN tunnels. Flannel is simpler to configure and does not consume VPC ENI resources, making it suitable for clusters with very large numbers of pods or environments where VPC ENI limits are a concern.
Terway ENI Limits
When using Terway in ENI multi-IP mode, each ECS instance has a limit on the number of ENIs and secondary IPs. For example, an ecs.g7.2xlarge supports up to 10 ENIs with 20 secondary IPs each, allowing approximately 200 pods per node. Plan your node pool instance types based on the expected pod density. Check the ECS instance type documentation for specific ENI and IP limits.
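The pod-density math above can be checked directly: in ENI multi-IP mode, approximate pod capacity per node is the number of ENIs times the secondary IPs per ENI. The figures below are the ones quoted for ecs.g7.2xlarge; confirm exact limits in the ECS documentation.

```shell
# Approximate max pods per node in Terway ENI multi-IP mode:
# (ENIs) x (secondary IPs per ENI), using the figures quoted above.
enis=10
ips_per_eni=20
echo "max pods ~= $((enis * ips_per_eni))"
```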
Storage Integration
ACK provides CSI (Container Storage Interface) drivers for integrating with Alibaba Cloud storage services:
- Cloud Disk (ESSD): Block storage for stateful workloads requiring persistent volumes. Supports dynamic provisioning, snapshots, and volume expansion. Use StorageClass to specify ESSD performance levels (PL0-PL3).
- NAS (Network Attached Storage): Shared file storage based on NFS protocol. Ideal for workloads that require ReadWriteMany (RWX) access mode — multiple pods reading and writing to the same volume simultaneously.
- OSS (Object Storage Service): Object storage mounted as a FUSE filesystem. Best for read-heavy workloads like model serving, static content, and log archives. Not recommended for write-intensive workloads due to FUSE overhead.
- CPFS (Cloud Parallel File System): High-performance parallel file system for AI/ML training workloads that require extremely high throughput and IOPS.
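For the RWX case described above, a NAS-backed claim might look like the following. This is a sketch: the provisioner name follows the ACK CSI plugin's naming convention, and the StorageClass parameters and mount-target domain are illustrative assumptions to verify against the ACK documentation.

```yaml
# Hypothetical NAS StorageClass for shared RWX volumes.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: alicloud-nas-subpath
provisioner: nasplugin.csi.alibabacloud.com
parameters:
  volumeAs: subpath
  server: "0abc****-xyz.cn-hangzhou.nas.aliyuncs.com:/"  # placeholder mount target
reclaimPolicy: Retain
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-assets
spec:
  accessModes:
    - ReadWriteMany       # NAS supports RWX: many pods, one volume
  storageClassName: alicloud-nas-subpath
  resources:
    requests:
      storage: 50Gi
```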
# StorageClass for ESSD PL1 volumes
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: alicloud-essd-pl1
provisioner: diskplugin.csi.alibabacloud.com
parameters:
  type: cloud_essd
  performanceLevel: PL1
  encrypted: "true"
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# PersistentVolumeClaim using ESSD
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: alicloud-essd-pl1
  resources:
    requests:
      storage: 100Gi
Ingress and Service Mesh
ACK supports multiple ingress options for exposing applications:
- NGINX Ingress Controller: The default ingress controller, installed as an ACK add-on. Provides full NGINX configuration flexibility through annotations and ConfigMaps.
- ALB Ingress Controller: Uses Alibaba Cloud Application Load Balancer as the ingress. Offloads TLS termination, WAF, and DDoS protection to the managed ALB service. Recommended for production web applications.
- MSE Ingress (Managed Service for Envoy): Cloud-native API gateway built on Envoy proxy. Supports gRPC, WebSocket, and advanced traffic management features.
For service mesh capabilities, ACK integrates with Alibaba Cloud Service Mesh (ASM), which is based on Istio. ASM provides traffic management, observability, and security features (mTLS, authorization policies) for microservice communication within and across clusters.
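With the NGINX Ingress Controller add-on installed, exposing a service follows the standard Kubernetes Ingress API. A minimal sketch in which the hostname and backend service name are placeholders:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
spec:
  ingressClassName: nginx      # matches the ACK NGINX ingress add-on
  rules:
    - host: app.example.com    # placeholder hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web      # placeholder service
                port:
                  number: 80
```

Switching to the ALB Ingress Controller is largely a matter of changing the ingress class and annotations; the rule structure stays the same.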
Monitoring and Observability
ACK integrates with several Alibaba Cloud observability services:
- ARMS Prometheus: Managed Prometheus service for collecting and querying Kubernetes metrics. Pre-configured dashboards for cluster, node, pod, and container metrics. Integrates with Grafana for visualization.
- SLS (Simple Log Service): Collect container stdout/stderr logs, Kubernetes events, and audit logs. SLS provides SQL-based log analytics and alerting.
- ARMS APM: Application performance monitoring with distributed tracing, slow transaction analysis, and code-level diagnostics.
- CloudMonitor: Infrastructure-level monitoring for ECS nodes, disks, and network interfaces.
# Enable ARMS Prometheus monitoring
aliyun cs InstallClusterAddons --ClusterId c-**** \
--body '[{"name":"arms-prometheus","config":"{}"}]'
# Enable log collection to SLS
aliyun cs InstallClusterAddons --ClusterId c-**** \
--body '[{"name":"logtail-ds","config":"{\"IngressDashboardEnabled\":\"true\"}"}]'
Security Best Practices
Securing ACK clusters requires attention at multiple layers:
- RBAC: Use Kubernetes RBAC with RAM integration. Map RAM users and roles to Kubernetes ClusterRoles and Roles for fine-grained access control.
- Pod Security: Enforce Pod Security Standards using Pod Security Admission (PSA) or OPA Gatekeeper. Prevent privileged containers, host networking, and root users in production namespaces.
- Network Policies: Use Calico or Terway network policies to restrict pod-to-pod communication. Implement default-deny policies and explicitly allow only required traffic.
- Image Security: Use Alibaba Cloud Container Registry (ACR) Enterprise Edition with image scanning to detect vulnerabilities before deployment. Enable image signing with Notary for supply chain security.
- Sandboxed Containers: For multi-tenant clusters, use the Sandboxed-Container runtime (based on Kata Containers) to provide hardware-level isolation between pods. Each sandboxed pod runs in its own lightweight VM.
- Audit Logging: Enable Kubernetes API audit logging and forward audit events to SLS for security monitoring and compliance. Configure audit policies to capture authorization decisions and resource modifications.
- Secrets Management: Integrate with Alibaba Cloud KMS to encrypt Kubernetes Secrets at rest. Use the External Secrets Operator to sync secrets from KMS or Secrets Manager into Kubernetes Secrets.
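The Pod Security Admission approach mentioned above is enforced with standard upstream namespace labels. For example, to require the restricted profile in a production namespace (namespace name is illustrative):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: prod
  labels:
    # Reject pods that violate the "restricted" profile
    pod-security.kubernetes.io/enforce: restricted
    # Also log and warn on violations, useful when tightening policies
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```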
ACK Cluster Upgrades
ACK supports in-place cluster upgrades with zero downtime for the control plane. Worker node upgrades can be performed through node pool rolling updates — ACK cordons and drains nodes one at a time, replaces them with new nodes running the updated Kubernetes version, and verifies pod health before proceeding. Always test upgrades in a staging cluster first and review the upgrade changelog for breaking changes.
Cost Optimization
Optimize ACK costs with these strategies:
- Use ARM instances (g8y/c8y): Yitian 710 ARM instances provide 20-30% better price-performance for most containerized workloads
- Enable cluster autoscaler: Automatically scale node pools based on pending pod requests to avoid over-provisioning
- Use Spot/Preemptible instances: Configure node pools with preemptible instances for fault-tolerant workloads like batch processing and CI/CD
- Right-size pod resources: Use VPA (Vertical Pod Autoscaler) recommendations to set accurate CPU and memory requests. Over-requesting resources wastes capacity.
- Use ASK for burst workloads: Run temporary or burst workloads on ACK Serverless (ECI) to avoid maintaining idle node capacity
- Schedule non-critical workloads off-peak: Use Kubernetes CronJobs and PriorityClasses to run batch jobs during off-peak hours when preemptible instance availability is higher
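The off-peak scheduling idea can be combined with a low PriorityClass so batch pods yield to production traffic under pressure. A sketch with illustrative names, schedule, and image:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: batch-low
value: -10                      # below the default pod priority of 0
globalDefault: false
description: "Low priority for preemptible batch jobs"
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report
spec:
  schedule: "0 18 * * *"        # 02:00 UTC+8; CronJob schedules run in UTC by default
  jobTemplate:
    spec:
      template:
        spec:
          priorityClassName: batch-low
          restartPolicy: OnFailure
          containers:
            - name: report
              image: registry.example.com/report:latest  # placeholder image
```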
Key Takeaways
- ACK Pro clusters provide a 99.95% control plane SLA with enhanced API server performance and managed etcd backups.
- Terway CNI assigns real VPC ENI IPs to pods for native VPC networking without overlay overhead.
- ACK Serverless (ASK) runs pods on ECI with zero node management and pay-per-pod pricing.
- CSI drivers integrate ESSD, NAS, OSS, and CPFS storage directly into Kubernetes persistent volumes.
Written by CloudToolStack Team
Cloud engineers and architects with hands-on experience across AWS, Azure, and GCP. We write guides based on real-world production patterns, not just documentation rewrites.
Disclaimer: This guide is for educational purposes. Cloud services change frequently; always refer to official documentation for the latest information. AWS, Azure, and GCP are trademarks of their respective owners.