Choosing the Right Load Balancer: ALB vs NLB vs Azure LB vs GCP Load Balancers
Covers L4 vs L7 load balancers, TLS termination strategies, WebSocket support, cost comparisons, and a decision tree for choosing the right load balancer across AWS, Azure, and GCP.
Load Balancers Are Not Interchangeable
The most expensive mistake I see teams make with load balancers is treating them as interchangeable. A team picks an Application Load Balancer because it's the default in every tutorial, then wonders why their gRPC service has 200ms of added latency, or why their WebSocket connections keep dropping during deployments. Another team deploys a Network Load Balancer for a REST API and then discovers they cannot do path-based routing, header-based routing, or any of the L7 features their application needs.
Load balancers operate at different layers of the networking stack, and that distinction is not academic. It determines what protocols you can route, how TLS is terminated, what health checks are possible, how much latency is added, what your cost structure looks like, and how the load balancer integrates with the rest of your infrastructure. Choosing the wrong type means either paying for capabilities you do not need or lacking capabilities you desperately need.
This guide walks through the load balancer options on AWS, Azure, and GCP with an emphasis on the decisions that matter in production. We cover L4 vs L7, TLS termination strategies, WebSocket handling, cost comparisons for realistic workloads, and a decision tree you can use to pick the right load balancer for your use case.
L4 vs L7: The Fundamental Distinction
Layer 4 load balancers operate at the transport layer. They see TCP connections and UDP datagrams -- source IP, destination IP, source port, destination port, and protocol. They do not inspect the content of those connections. A TCP packet carrying HTTP, gRPC, MQTT, a database protocol, or raw binary data all look the same to an L4 load balancer. It forwards the connection to a backend and gets out of the way.
Layer 7 load balancers operate at the application layer. They understand HTTP (and often HTTP/2, gRPC, and WebSocket). They can inspect request headers, URL paths, query strings, cookies, and sometimes request bodies. They can make routing decisions based on any of these fields: send requests to /api/v1 to one target group, requests to /api/v2 to another, and requests with a specific cookie to a canary deployment.
The tradeoff is clear: L7 load balancers give you more control at the cost of more processing per request (higher latency, higher cost). L4 load balancers give you less control but handle raw throughput more efficiently. For most HTTP-based web applications and APIs, L7 is the right choice. For anything that is not HTTP -- TCP services, UDP services, gaming servers, IoT protocols, database proxies -- L4 is typically the only option.
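The content-based routing an L7 load balancer performs can be sketched as a first-match rule table, similar in spirit to ALB listener rules evaluated in priority order (the rule structure and target names here are illustrative, not any provider's API):

```python
# Minimal sketch of L7 path-based routing: first matching rule wins,
# mirroring how listener rules are evaluated in priority order.
ROUTING_RULES = [
    ("/api/v1", "target-group-v1"),
    ("/api/v2", "target-group-v2"),
    ("/", "target-group-default"),  # catch-all, like a default rule
]

def route(path: str) -> str:
    """Return the target group for a request path (first prefix match)."""
    for prefix, target in ROUTING_RULES:
        if path.startswith(prefix):
            return target
    raise LookupError("no matching rule")
```

An L4 load balancer never gets to run logic like this: it forwards bytes before any HTTP parsing happens.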
AWS: ALB vs NLB vs GWLB
Application Load Balancer (ALB)
The ALB is AWS's L7 load balancer and the default choice for most web applications. It supports HTTP/1.1, HTTP/2, gRPC, and WebSocket. It performs TLS termination, meaning it decrypts HTTPS traffic and forwards plain HTTP (or re-encrypted HTTPS) to your backends. It supports content-based routing via listener rules -- up to 100 rules per load balancer by default, a raisable quota -- that can match on host headers, paths, HTTP methods, query strings, source IPs, and custom headers.
ALB's strengths in production include native authentication integration (it can authenticate users via Cognito or any OIDC provider before the request reaches your application), sticky sessions (both duration-based and application-based), built-in WAF integration (AWS WAF attaches directly to ALBs), and detailed access logs that include request processing time, target processing time, and response time breakdowns. The target group concept is powerful: you can route to EC2 instances, ECS tasks, Lambda functions, or IP addresses (including on-premises servers reachable over AWS Direct Connect or a VPN connection).
ALB pricing has two components: a fixed hourly charge ($0.0225/hour, roughly $16.20/month) and a per-LCU charge ($0.008/hour). An LCU (Load Balancer Capacity Unit) is the highest of four dimensions: new connections per second (25), active connections per minute (3,000), processed bytes per hour (1 GB), and rule evaluations per second (1,000). For a typical web application handling 100 requests per second with 10 KB average response size, expect to pay $25-40/month.
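The LCU model can be sketched as a small calculator using the published ALB dimension capacities; treat the output as an estimate only, since AWS bills each hour on that hour's peak dimension rather than a monthly average:

```python
# Rough ALB monthly cost estimate from the four LCU dimensions.
# Each hour is billed at the HIGHEST of the four dimension ratios.
HOURLY_FIXED = 0.0225      # USD per ALB-hour
HOURLY_PER_LCU = 0.008     # USD per LCU-hour
HOURS_PER_MONTH = 730

def alb_monthly_cost(new_conns_per_sec, active_conns, gb_per_hour, rule_evals_per_sec):
    lcus = max(
        new_conns_per_sec / 25,     # 25 new connections/sec per LCU
        active_conns / 3000,        # 3,000 active connections/min per LCU
        gb_per_hour / 1.0,          # 1 GB processed/hour per LCU
        rule_evals_per_sec / 1000,  # 1,000 rule evaluations/sec per LCU
    )
    return HOURS_PER_MONTH * (HOURLY_FIXED + HOURLY_PER_LCU * lcus)
```

For the 100 req/s, 10 KB example, processed bytes (~3.6 GB/hour) dominate, and the formula lands at roughly $37/month -- inside the $25-40 range quoted above.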
Network Load Balancer (NLB)
The NLB is AWS's L4 load balancer, designed for extreme performance. It handles millions of requests per second with sub-millisecond latency. It preserves the client source IP by default (with ALB, backends must instead read the client IP from the X-Forwarded-For header). It supports TCP, UDP, and TLS listeners. It integrates with AWS PrivateLink, allowing you to expose services to other VPCs or accounts without traversing the public internet.
NLB also supports TLS termination, but in a limited way compared to ALB. It terminates TLS and forwards TCP traffic to backends, but it does not understand HTTP. You cannot do path-based routing, header inspection, or any L7 features. If you need TLS termination and HTTP routing, use an ALB. If you need TLS termination for a non-HTTP TCP protocol (like a database proxy or a custom protocol), NLB is the right choice.
NLB has a critical feature that ALB lacks: static IP addresses. Each NLB gets a static IP per availability zone (or you can assign your own Elastic IPs). This matters when clients cannot resolve DNS names (embedded devices, legacy systems) or when you need to allowlist your load balancer IP in a firewall. ALB IPs are dynamic and can change without notice.
NLB pricing is similar in structure to ALB: $0.0225/hour fixed plus a per-NLCU charge ($0.006/hour). NLCUs are measured on three dimensions: new TCP connections per second (800), active TCP connections per minute (100,000), and processed bytes per hour (1 GB). For raw TCP/UDP workloads, NLB is typically 20-30% cheaper than ALB because of the higher connection capacity per LCU.
Gateway Load Balancer (GWLB)
GWLB is a specialized load balancer for deploying, scaling, and managing third-party virtual appliances -- firewalls, intrusion detection systems, deep packet inspection. It uses GENEVE encapsulation to transparently insert appliances into the traffic flow without changing source or destination IPs. Unless you are deploying network appliances, you will never use GWLB.
ALB + NLB Together
The two are not mutually exclusive. A common pattern is placing an NLB in front of an ALB -- AWS supports ALB as a target type for NLB target groups -- which gives you the NLB's static IPs and PrivateLink support combined with the ALB's L7 routing.
Azure Load Balancing Options
Azure Load Balancer (L4)
Azure Load Balancer is the L4 option on Azure, supporting TCP and UDP. It comes in two SKUs: Basic (free, but limited) and Standard (required for production). Standard Load Balancer supports zone-redundant and zonal frontends, HA ports (load balance all ports simultaneously), outbound rules for SNAT management, and up to 1,000 instances in the backend pool.
A common point of confusion: Azure Standard Load Balancer is a pass-through load balancer, so backends see the original client source IP by default -- no X-Forwarded-For equivalent is needed. Floating IP (Azure's implementation of Direct Server Return) changes destination IP handling instead: with it enabled, the frontend IP is not rewritten to the backend's IP, which is required for scenarios like SQL Server Always On listeners. Also, Azure Load Balancer participates in outbound connectivity for VMs in its backend pool, which catches many teams off guard -- without an explicit outbound rule or a NAT Gateway, backend VMs rely on implicit SNAT behavior that is easy to exhaust and that Azure is phasing out, so configure outbound access explicitly.
Azure Application Gateway (L7)
Azure Application Gateway is the L7 counterpart, supporting HTTP, HTTPS, HTTP/2, and WebSocket. It offers URL path-based routing, multi-site hosting, SSL termination, cookie-based session affinity, and a built-in web application firewall (WAF v2). Application Gateway v2 supports autoscaling, zone redundancy, static VIP, and header rewriting.
Application Gateway pricing is more complex than ALB. The Standard_v2 SKU charges a fixed hourly cost plus a consumption charge per capacity unit. Each capacity unit represents one compute unit (roughly 50 new TLS connections per second), 2.22 Mbps of throughput, or 2,500 persistent connections, and billing uses the highest of the three dimensions. In practice, for a moderate web application, expect $150-300/month -- significantly more expensive than AWS ALB for equivalent workloads.
Azure Front Door
Azure Front Door is a global L7 load balancer built on Microsoft's global edge network. It provides global HTTP load balancing with instant failover, SSL offloading at the edge, URL-based routing, WAF integration, and CDN-like caching. Think of it as Azure's answer to CloudFront + ALB combined.
Front Door is the right choice when you need global load balancing across multiple Azure regions, or when you want to terminate TLS as close to the user as possible (reducing TLS handshake latency). It is not the right choice for internal-only workloads or single-region deployments where Application Gateway is simpler and cheaper.
GCP Load Balancing
GCP's load balancing taxonomy is the most complex of the three clouds. Instead of two or three products, GCP has a matrix of load balancer types based on traffic type (HTTP vs TCP/UDP/SSL), scope (global vs regional), and tier (Premium vs Standard). Understanding this matrix is essential for choosing the right option.
Global External Application Load Balancer
This is GCP's flagship L7 load balancer and probably the most capable load balancer of any cloud provider. It uses Google's global anycast network, meaning users connect to the nearest Google edge location and traffic is routed over Google's private backbone to the closest healthy backend. This provides lower latency and better reliability than routing over the public internet.
It supports URL maps for sophisticated routing (host rules, path matchers, route rules with match conditions), traffic management (weighted traffic splitting for canary deployments, mirroring for testing), header-based routing, and integration with Cloud CDN, Cloud Armor (WAF), and Identity-Aware Proxy (IAP). The URL map configuration is more expressive than AWS ALB listener rules or Azure Application Gateway rules.
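Weighted traffic splitting -- the canary mechanism in GCP URL maps, also available as weighted target groups on ALB -- reduces to proportional random selection. A minimal sketch of the idea (backend names and weights are illustrative):

```python
import random

def pick_backend(weights: dict, rng=random.random):
    """Choose a backend in proportion to its weight (e.g. a 95/5 canary split)."""
    total = sum(weights.values())
    point = rng() * total
    cumulative = 0.0
    for backend, weight in weights.items():
        cumulative += weight
        if point < cumulative:
            return backend
    return backend  # floating-point edge case: fall back to the last backend
```

With weights of 95 and 5, about one request in twenty lands on the canary; shifting the split is just an edit to the weights, with no DNS or client changes.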
Regional External/Internal Application Load Balancer
For single-region workloads or when you need the load balancer to be backed by Envoy proxies that you can customize, regional application load balancers are the choice. They support advanced traffic management including traffic policies, outlier detection, circuit breakers, and retry policies -- features that typically require a service mesh. This makes GCP's regional L7 load balancer uniquely powerful for microservices architectures.
External/Internal Network Load Balancer
GCP's L4 options support TCP, UDP, and SSL protocols. The external network load balancer uses Maglev (Google's software-defined load balancing technology) and supports millions of packets per second with consistent hashing for connection affinity. The internal network load balancer is based on Andromeda (GCP's virtual networking stack) and supports pass-through load balancing -- traffic goes directly from client to backend without a proxy, preserving the original source IP.
TLS Termination Strategies
Where to terminate TLS is one of the most impactful load balancer decisions, affecting security posture, performance, operational complexity, and cost.
Strategy 1: Terminate at the Load Balancer
The load balancer decrypts HTTPS and forwards plain HTTP to backends. This is the simplest approach. Backends do not need certificates, certificate management is centralized at the load balancer, and the load balancer can inspect HTTP traffic for routing decisions. The downside: traffic between the load balancer and backends is unencrypted. In all three clouds this traffic stays within your VPC/VNet, so the exposure is internal. For compliance frameworks like PCI DSS, this may not be acceptable.
Strategy 2: TLS Termination + Re-encryption
The load balancer terminates client TLS, inspects the request for routing, then establishes a new TLS connection to the backend. Traffic is encrypted end-to-end, and the load balancer can still make L7 routing decisions. The cost is double TLS handshakes per connection (mitigated by connection pooling and session resumption). All three clouds support this pattern: AWS ALB supports backend HTTPS target groups, Azure Application Gateway supports end-to-end TLS, and GCP supports backend HTTPS for managed instance groups.
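Terminate-and-re-encrypt means holding two independent TLS contexts: one acting as a server toward clients, one acting as a client toward backends. A minimal sketch using Python's ssl module -- the certificate paths are placeholders, and a managed load balancer does all of this for you:

```python
import ssl

def make_tls_contexts(server_cert=None, server_key=None, backend_ca=None):
    """Build the two contexts a terminate-and-re-encrypt proxy needs."""
    # Server side: terminate the client's TLS with our certificate.
    client_facing = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    if server_cert:  # placeholder paths; a real proxy always loads a chain
        client_facing.load_cert_chain(server_cert, server_key)

    # Client side: open a NEW TLS connection to the backend, verified
    # against the CA that signs backend certificates (or the system CAs).
    backend_facing = ssl.create_default_context(cafile=backend_ca)
    backend_facing.minimum_version = ssl.TLSVersion.TLSv1_2
    return client_facing, backend_facing
```

The backend-facing context verifies the backend certificate, which is what makes the second hop trustworthy rather than merely encrypted.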
Strategy 3: TLS Passthrough
The load balancer forwards encrypted traffic directly to backends without decryption. The backend terminates TLS. This is the most secure option (no plaintext anywhere) but means the load balancer cannot make L7 routing decisions. Only L4 load balancers support this. Use this when backends must terminate their own TLS (mutual TLS requirements, specific cipher suite needs) or when the overhead of double TLS is unacceptable.
WebSocket Support
WebSocket support varies significantly across load balancer types, and the differences matter for real-time applications.
AWS ALB supports WebSocket natively. HTTP Upgrade requests are handled automatically, and idle WebSocket connections stay open up to the configurable idle timeout (maximum 4,000 seconds); connections with active traffic can live longer. However, ALB connection draining during deployments will close WebSocket connections, so clients must implement reconnection logic. NLB also supports WebSocket implicitly because it is a TCP passthrough -- it does not know or care that the TCP connection carries WebSocket frames.
Azure Application Gateway v2 supports WebSocket with no additional configuration. The idle timeout defaults to 4 minutes but can be extended. Azure Load Balancer supports WebSocket via TCP passthrough with idle timeouts up to 30 minutes.
GCP's global external Application Load Balancer supports WebSocket with a backend service timeout (default 30 seconds, configurable up to 86,400 seconds / 24 hours). The timeout applies to the entire WebSocket connection lifetime, so you must set it appropriately for long-lived connections.
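Since every load balancer above will eventually close a long-lived WebSocket (idle timeout, deployment draining, scale-in), clients need reconnection logic. A minimal sketch of capped exponential backoff with jitter -- the constants are illustrative defaults:

```python
import random

def reconnect_delays(attempts: int, base: float = 1.0, cap: float = 30.0,
                     jitter: float = 0.1, rng=random.random):
    """Yield capped exponential backoff delays (seconds) for WebSocket reconnects.

    base * 2**n, capped at `cap`, with +/- `jitter` fraction of randomness
    so a fleet of clients does not reconnect in lockstep after a deploy.
    """
    for n in range(attempts):
        delay = min(cap, base * (2 ** n))
        yield delay * (1 + jitter * (2 * rng() - 1))
```

The jitter matters more than it looks: after a deployment closes thousands of connections at once, unjittered clients reconnect in synchronized waves that can overwhelm the backends.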
WebSocket and Connection Draining
These two features interact badly. Connection draining (deregistration delay) waits for in-flight requests to finish, but a long-lived WebSocket connection never finishes on its own, so draining cuts it when the delay expires. Treat WebSocket disconnects during deployments as routine and design clients to reconnect cleanly.
Cost Comparison: A Realistic Workload
Let us compare costs for a realistic workload: a web application handling 500 requests per second with an average response size of 5 KB, using HTTPS with TLS termination at the load balancer. Monthly data transfer: approximately 6.5 TB.
AWS
- ALB: ~$16.20 fixed + ~$45 LCU charges = ~$61/month (data processing is the dominant LCU dimension at this throughput)
- NLB: ~$16.20 fixed + ~$38 NLCU charges = ~$54/month (but no L7 features)
Azure
- Application Gateway v2: ~$175 fixed + ~$90 capacity unit charges = ~$265/month
- Standard Load Balancer: ~$18 fixed + ~$4 rules + data processing = ~$30/month (but no L7 features)
GCP
- Global External Application LB: ~$18 forwarding rule + ~$55 data processing = ~$73/month
- Regional External Network LB: ~$18 forwarding rule + ~$20 data processing = ~$38/month (but no L7 features)
Azure Application Gateway is significantly more expensive for equivalent L7 workloads. If cost is a primary concern and you are on Azure, consider whether Azure Front Door or placing nginx/Envoy on VMs behind a Standard Load Balancer gives you the L7 features you need at lower cost.
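The traffic volume driving these estimates is simple arithmetic, worth making explicit because processed bytes are the dominant cost dimension on every provider. A sketch assuming uniform traffic and decimal units:

```python
# Convert a steady request rate into the figures load balancer pricing
# is keyed on. Assumes uniform traffic (real traffic is bursty, and AWS
# bills each hour on that hour's peak LCU dimension).
SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000

def monthly_traffic(requests_per_sec: float, avg_response_kb: float):
    """Return (GB processed per hour, TB transferred per month)."""
    bytes_per_sec = requests_per_sec * avg_response_kb * 1000
    gb_per_hour = bytes_per_sec * 3600 / 1e9      # drives the ALB LCU count
    tb_per_month = bytes_per_sec * SECONDS_PER_MONTH / 1e12
    return gb_per_hour, tb_per_month
```

For instance, the earlier ALB example of 100 requests per second at 10 KB works out to 3.6 GB per hour and about 2.6 TB per month.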
The Decision Tree
Use this decision tree to choose the right load balancer for your workload:
- Is your traffic HTTP/HTTPS? If no, you need an L4 load balancer (NLB, Azure LB Standard, or GCP Network LB). Stop here.
- Do you need global load balancing across regions? If yes, use GCP Global External Application LB, Azure Front Door, or AWS CloudFront + ALB (AWS does not have a single global L7 load balancer product).
- Do you need static IPs? If yes, on AWS use NLB or NLB in front of ALB. On Azure, Application Gateway v2 provides a static VIP. On GCP, the global external Application LB provides anycast IPs.
- Do you need path-based or header-based routing? If yes, you need an L7 load balancer: ALB, Application Gateway, or GCP Application LB.
- Is sub-millisecond latency critical? If yes, use an L4 load balancer. L7 load balancers add 1-5ms for HTTP processing and TLS termination.
- Do you need PrivateLink/Private Endpoint exposure? If yes, on AWS use NLB (required for PrivateLink). On Azure, both Standard LB and Application Gateway support private endpoints. On GCP, use internal Application LB or Network LB.
- Do you need integrated WAF? If yes, use ALB + AWS WAF, Application Gateway + WAF, or GCP Application LB + Cloud Armor.
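The tree can be encoded as a first-question-wins function, handy as a starting point in design reviews; the return strings are shorthand for the options above, not product SKUs, and real selections usually weigh several answers at once:

```python
def pick_load_balancer(is_http: bool, needs_global: bool, needs_static_ip: bool,
                       needs_l7_routing: bool, latency_critical: bool) -> str:
    """Walk the decision tree from the article; the first matching question wins."""
    if not is_http:
        return "L4: NLB / Azure LB Standard / GCP Network LB"
    if needs_global:
        return "Global L7: GCP Global App LB / Azure Front Door / CloudFront + ALB"
    if latency_critical and not needs_l7_routing:
        return "L4: NLB / Azure LB Standard / GCP Network LB"
    if needs_static_ip:
        return "Static-IP L7: NLB in front of ALB / App Gateway v2 / GCP App LB"
    return "Regional L7: ALB / Application Gateway / GCP App LB"
```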
When in Doubt
If the tree does not produce a clear answer, default to your cloud's managed L7 load balancer for HTTP traffic; move to an L4 load balancer only when measurements show you need its latency, protocol support, or static IPs.
Production Lessons
After running load balancers across all three clouds for years, here are the patterns that matter most in production:
- Health check paths matter. Do not health-check your root URL. Create a dedicated /health endpoint that checks database connectivity, cache connectivity, and any critical dependencies. A 200 OK from a static page tells you nothing about whether your application can actually serve traffic.
- Connection draining timeout must match your longest request. If your API has endpoints that take up to 30 seconds to respond, set your deregistration delay to at least 60 seconds. Otherwise, in-flight requests will be terminated during deployments.
- Cross-zone load balancing has cost implications. On AWS, ALB has cross-zone load balancing enabled by default (and it is free). NLB charges for cross-zone data transfer. On GCP, cross-region traffic from the global load balancer to backends incurs data transfer charges. Plan your AZ/zone distribution to minimize cross-zone costs.
- Pre-warm for traffic spikes. AWS ALB and Azure Application Gateway can take minutes to scale up for sudden traffic increases. If you know a traffic spike is coming (product launch, marketing campaign), pre-warm by gradually ramping up traffic or by contacting support. NLB and GCP's Maglev-based load balancers handle sudden spikes without pre-warming.
- Monitor 5xx errors at the load balancer, not just the backend. Load balancers generate their own 5xx errors (502, 503, 504) for connection failures, timeouts, and backend health issues. These are often the first signal that something is wrong, even before your backend metrics show problems.
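The dedicated health endpoint from the first point can be sketched as a check runner; the dependency checks themselves (database ping, cache ping) are stand-ins for whatever your service actually requires:

```python
def health_status(checks: dict) -> tuple:
    """Run named dependency checks; return 200 only if every one passes.

    `checks` maps a dependency name to a zero-argument callable that
    returns True when healthy (e.g. a database ping). Any exception
    counts as a failure, so a hung connection pool surfaces as unhealthy.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    status = 200 if all(results.values()) else 503
    return status, results
```

Expose this behind /health and point the load balancer's health check at it; any failing dependency returns 503, which removes the target from rotation until it recovers.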
Bottom Line
Load balancer selection is not a set-and-forget decision. Get it right by understanding whether you need L4 or L7 capabilities, where TLS should terminate, and what performance characteristics your workload demands. Changing load balancer types later is possible but disruptive -- it affects DNS, security groups, monitoring, and deployment pipelines. Spend the time upfront to make the right choice, and you will avoid painful migrations down the road.
Written by CloudToolStack Team
Cloud architects with 15+ years of production experience across AWS, Azure, GCP, and OCI. We build free tools and write practical guides to help engineers navigate multi-cloud infrastructure.
Disclaimer: This article is for informational purposes. Cloud services and pricing change frequently; always verify with official provider documentation. AWS, Azure, GCP, and OCI are trademarks of their respective owners.