
AWS Lambda Performance Optimization: From Cold Starts to Sub-100ms Responses

Cold start causes, SnapStart, Provisioned Concurrency, memory tuning, connection pooling, and concrete before-and-after performance numbers.

CloudToolStack Team · April 2, 2026 · 15 min read

The Performance Problem Nobody Warns You About

AWS Lambda is deceptively easy to get started with. Write a function, deploy it, and it runs. But the gap between "it runs" and "it runs fast enough for production" is where most teams spend months. Cold starts turn a 50 ms function into a 3-second function. Memory misconfigurations waste money or starve CPU. Connection pooling mistakes overwhelm downstream databases. VPC attachment adds seconds of initialization overhead.

I have optimized Lambda functions across dozens of production workloads, and the same patterns appear repeatedly. This article covers every meaningful optimization, from quick configuration wins to architectural changes that deliver sub-100 ms response times consistently. Each recommendation includes concrete before-and-after numbers from real workloads.

Understanding Cold Starts

A cold start occurs when Lambda needs to create a new execution environment for your function. This involves downloading your deployment package, starting the runtime, running your initialization code, and then executing the handler. Warm invocations skip all of this -- the execution environment is reused, and only the handler runs.

Cold start duration depends on several factors:

  • Runtime: Python and Node.js cold starts are typically 100 to 300 ms. Java and .NET cold starts range from 1 to 10 seconds without optimization. Go and Rust cold starts are 10 to 50 ms.
  • Package size: A 5 MB deployment package cold starts faster than a 250 MB package. Every megabyte adds roughly 5 to 10 ms of download time.
  • Initialization code: Code that runs outside the handler (imports, SDK client creation, database connections) executes during the cold start. A function that imports 50 Python packages at startup will be slower than one that imports 5.
  • VPC attachment: Functions in a VPC used to add 10 to 15 seconds of cold start time. Since AWS deployed Hyperplane (the shared ENI model in late 2019), VPC cold starts are typically under 1 second, but they still add 200 to 500 ms compared to non-VPC functions.
  • Memory allocation: Higher memory allocations get proportionally more CPU. A function with 128 MB memory gets a fraction of a vCPU, which slows down initialization. At 1769 MB, you get a full vCPU. Going beyond 1769 MB gives you multiple vCPUs but only helps if your code is multi-threaded.

Cold start frequency

Cold starts affect a fraction of invocations, not all of them. AWS keeps execution environments warm for 5 to 15 minutes after the last invocation (the exact duration is undocumented and varies). A function invoked every 30 seconds will rarely experience cold starts. A function invoked once every 30 minutes will cold start on most invocations. For API-facing functions with steady traffic, cold starts might affect 1 to 5 percent of requests. For scheduled functions that run hourly, cold starts affect nearly 100 percent of invocations.

Quick Wins: Configuration Optimization

Memory tuning

Memory configuration is the single highest-impact optimization you can make, and it is free or even cost-negative. Lambda charges by GB-second: doubling the memory doubles the per-millisecond cost but also doubles the CPU allocation, which often more than halves the execution time.

Concrete example: A Node.js function processing JSON payloads ran at 128 MB with an average execution time of 850 ms. At 256 MB: 420 ms. At 512 MB: 210 ms. At 1024 MB: 115 ms. At 1769 MB: 95 ms. At 3008 MB: 92 ms. The cost sweet spot was 512 MB: execution time dropped 75 percent while cost per invocation stayed essentially flat, because the function ran at four times the per-ms rate for roughly a quarter of the time. Beyond 512 MB, the latency gains shrank while cost per invocation climbed.
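The tradeoff is simple arithmetic: Lambda bills GB-seconds, so cost per invocation is memory (in GB) times duration (in seconds) times the per-GB-second rate. A quick sketch using the measured durations from this example (the rate below is a placeholder; check the current Lambda pricing page for your region):

```python
# Cost per invocation = memory (GB) x duration (s) x per-GB-second rate.
# The rate is an assumed placeholder; look up the current one for your region.
RATE_PER_GB_SECOND = 0.0000166667  # assumed on-demand rate, us-east-1

# Measured average durations from the example above (memory MB -> ms).
measurements = {128: 850, 256: 420, 512: 210, 1024: 115, 1769: 95, 3008: 92}

for memory_mb, duration_ms in measurements.items():
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    cost = gb_seconds * RATE_PER_GB_SECOND
    print(f"{memory_mb:>5} MB: {gb_seconds:.4f} GB-s, "
          f"${cost * 1_000_000:.2f} per 1M invocations")
```

Run against these numbers, GB-seconds per invocation stay roughly flat from 128 MB through 512 MB while latency drops about 4x, then cost climbs with little latency gain: that flattening point is the sweet spot.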

Use AWS Lambda Power Tuning (an open-source tool from Alex Casalboni) to find the optimal memory setting for your specific function. It runs your function at different memory configurations and generates a cost-performance graph. The optimal point is where the cost curve flattens -- adding more memory no longer significantly reduces execution time.

The 1769 MB Sweet Spot

At 1769 MB, Lambda allocates exactly one full vCPU to your function. Below this, your function gets a fractional CPU that throttles CPU-bound operations. Above this, you get additional vCPUs but only benefit if your code uses multi-threading. For most single-threaded functions, 1769 MB is the point of diminishing returns for CPU-bound work. For memory-bound work, you may need to go higher.

Deployment package optimization

Smaller packages cold start faster. The strategies differ by runtime:

  • Node.js: Use esbuild or webpack to bundle and tree-shake your code. A typical Express.js function goes from 80 MB (node_modules) to 2 to 5 MB bundled. Do not include the AWS SDK v3 in your bundle -- it is available in the Node.js 18+ Lambda runtimes. Use the specific client packages you need (@aws-sdk/client-s3) rather than the monolithic v2 aws-sdk package.
  • Python: Use Lambda layers for large dependencies like pandas, numpy, or boto3 extensions. Layers are cached separately and shared across functions. Strip .pyc files and test directories from your packages. Consider using a requirements.txt with only the packages you actually import.
  • Java: This is where package size matters most because the JVM loads classes at startup. Use the Maven shade plugin or Gradle shadow plugin to create a slim JAR. Exclude transitive dependencies you do not use. Consider AWS-specific SDKs like aws-lambda-java-core instead of the full AWS SDK.
  • Container images: Lambda supports container images up to 10 GB. Use multi-stage builds to keep the final image small. Use the AWS-provided base images, which are pre-cached on Lambda infrastructure and cold start faster than custom images.
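As a small illustration of the Python advice above, a build step can strip bytecode caches and test directories from a staging directory before zipping. A sketch; the directory names pruned here are assumptions about a typical package layout:

```python
import shutil
from pathlib import Path

# Directories to prune from the staging area before zipping the deployment
# package. "tests" and ".pytest_cache" are assumptions about your layout.
PRUNE_DIRS = {"__pycache__", "tests", ".pytest_cache"}

def prune_package(staging_dir: str) -> int:
    """Remove cache/test directories and .pyc files; return deletion count."""
    removed = 0
    # reverse=True processes deeper paths first, so files inside a pruned
    # directory are handled before the directory itself is removed.
    for path in sorted(Path(staging_dir).rglob("*"), reverse=True):
        if path.is_dir() and path.name in PRUNE_DIRS:
            shutil.rmtree(path)
            removed += 1
        elif path.is_file() and path.suffix == ".pyc":
            path.unlink()
            removed += 1
    return removed
```

Running this as the last step before `zip` typically shaves a noticeable fraction off a dependency-heavy Python package.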

Initialization optimization

Code outside the handler runs once during the cold start and is reused across warm invocations. This is the right place for expensive initialization: creating SDK clients, establishing database connections, loading configuration. But only initialize what you actually need.

A common anti-pattern is importing every module at the top of the file when only some are used per invocation path. In Python, use lazy imports -- import modules inside the function that uses them, not at the top of the file. In Node.js, use dynamic imports (await import()) for rarely-used modules. In Java, avoid static initializers that load large class hierarchies. Each unused import adds to cold start time without providing any benefit.
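In Python, the two patterns above look roughly like this (a sketch; `create_expensive_client` is a hypothetical stand-in for real SDK client construction):

```python
import json  # cheap and needed on every path: import at module scope

# Hypothetical stand-in for expensive setup such as SDK client creation.
def create_expensive_client():
    return {"initialized": True}

# Module scope runs once per execution environment, during the cold start,
# and the result is reused across warm invocations.
CLIENT = create_expensive_client()

def handler(event, context):
    if event.get("generate_report"):
        # Lazy import: csv is only loaded on the invocation path that
        # actually needs it, keeping it out of every other cold start.
        import csv  # noqa: F401
        return {"statusCode": 200, "body": json.dumps({"report": "queued"})}
    return {"statusCode": 200, "body": json.dumps({"ok": CLIENT["initialized"]})}
```

The same split applies in any runtime: pay once, at module scope, for things every invocation needs; defer everything that only some paths need.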


Advanced Optimizations

SnapStart (Java)

Lambda SnapStart, launched in late 2022, is a game-changer for Java cold starts. Instead of initializing the JVM on every cold start, SnapStart takes a snapshot of the initialized execution environment (including loaded classes and JIT-compiled code) and restores it on each cold start. This reduces Java cold starts from 3 to 10 seconds down to 200 to 500 ms.

Before SnapStart: A Spring Boot function with DynamoDB client initialization cold-started in 6.2 seconds. After SnapStart: 380 ms. That is a 94 percent reduction with a single configuration change.

The caveats: SnapStart requires the Java 11, Java 17, or Java 21 managed runtimes. It does not work with container images, provisioned concurrency, or arm64 architecture (as of early 2026). Your code must handle snapshot restoration correctly -- any state that should not be shared across invocations (random number generators, unique IDs, network connections) needs to be re-initialized after restore using runtime hooks.

SnapStart and Uniqueness

When SnapStart restores a snapshot, all execution environments start from the same state. If your initialization code generates a unique ID or seeds a random number generator, every restored environment will have the same value. Use the CRaC (Coordinated Restore at Checkpoint) afterRestore hook to re-seed random number generators and regenerate any state that must be unique per environment. Failing to do this can cause subtle bugs: duplicate request IDs, predictable tokens, or reused connection handles.

Provisioned Concurrency

Provisioned Concurrency pre-initializes a specified number of execution environments that are always warm. There are zero cold starts for requests served by provisioned environments. This is the brute-force solution to cold starts -- pay to keep environments warm.

The cost model: you pay for provisioned concurrency whether it is used or not, billed per GB-second at a rate lower than the on-demand duration rate (roughly $0.0000042 per GB-second in us-east-1 at the time of writing). For a function configured with 10 provisioned instances at 512 MB, the baseline is about $0.075 per hour, roughly $55 per month, regardless of invocations. On top of that, you pay duration charges (at a reduced rate) when requests are served.
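Since the baseline accrues around the clock, it is worth writing the arithmetic out. A sketch; the rate is a placeholder assumption, so substitute the current provisioned concurrency rate for your region:

```python
# Placeholder rate; verify against the current Lambda pricing page.
PC_RATE_PER_GB_SECOND = 0.0000041667  # assumed provisioned rate, us-east-1

def provisioned_concurrency_cost(instances: int, memory_mb: int, hours: float,
                                 rate: float = PC_RATE_PER_GB_SECOND) -> float:
    """Baseline cost of keeping `instances` environments warm for `hours`.

    Execution charges for requests actually served come on top of this.
    """
    gb = instances * (memory_mb / 1024)
    return gb * hours * 3600 * rate

# 10 instances at 512 MB, kept warm around the clock for a 30-day month:
monthly = provisioned_concurrency_cost(10, 512, 24 * 30)
```

Under the assumed rate this lands near $54 for the month; scheduling the warm pool down outside business hours scales the `hours` term directly.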

When to use it: API-facing functions where p99 latency is critical and cold starts are unacceptable. The math works when the alternative is over-provisioning a container or EC2 instance that costs even more. For a function handling 100 requests per second with a 3-second Java cold start affecting 2 percent of requests, provisioned concurrency eliminates 2 requests per second of 3-second latency at the cost of keeping 10 to 15 environments warm.

Use Application Auto Scaling to adjust provisioned concurrency based on a schedule (scale up during business hours, scale down at night) or based on utilization. A common pattern is provisioning enough concurrency for the morning traffic ramp and letting auto-scaling handle the rest. This reduces the cost compared to flat provisioning for peak capacity.

Connection pooling

Lambda functions that connect to databases face a unique challenge: each execution environment maintains its own connection, and Lambda can scale to hundreds or thousands of concurrent environments. At 500 concurrent invocations, you have 500 database connections. Most relational databases choke well before that -- a standard RDS PostgreSQL instance handles 100 to 200 concurrent connections comfortably, and performance degrades sharply beyond that.

The solution is RDS Proxy, which sits between Lambda and the database, pooling and multiplexing connections. Instead of each Lambda environment opening a direct connection, it connects to RDS Proxy, which maintains a smaller pool of connections to the actual database. 500 Lambda environments might share 50 database connections through the proxy.

Before RDS Proxy: A function at 300 concurrent invocations caused PostgreSQL connection exhaustion, resulting in connection timeouts and 500 errors. After RDS Proxy: The same function handled 1,000 concurrent invocations with stable database performance. RDS Proxy adds 1 to 5 ms of latency per query, which is negligible compared to the connection stability it provides.

For DynamoDB, connection management is less critical because the SDK uses HTTP connections rather than persistent database connections. However, creating a new DynamoDB client on every invocation is still wasteful. Create the client outside the handler so it is reused across warm invocations, and configure the SDK's HTTP agent with keepAlive: true and a reasonable maxSockets value (50 is a good default).

Lambda Powertools

AWS Lambda Powertools is an open-source library available for Python, TypeScript, Java, and .NET that provides structured logging, custom metrics, distributed tracing, and idempotency out of the box. It does not directly reduce cold start time, but it dramatically improves observability, which is essential for identifying and fixing performance issues.

The tracing module integrates with X-Ray and adds subsegment annotations for every AWS SDK call, HTTP request, and custom operation. This visibility lets you see exactly where time is spent during execution: 45 ms on the DynamoDB query, 120 ms on the external API call, 8 ms on JSON serialization. Without this granularity, you are guessing about what to optimize.

The idempotency module prevents duplicate processing when Lambda retries a failed invocation. For functions that write to databases or trigger external side effects, idempotency is critical for correctness. The module stores a hash of the request in DynamoDB and returns the cached response if the same request is seen again within the configured TTL.
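The mechanism can be sketched in a few lines. This is an in-memory toy for illustration only; the real Powertools module persists the hash in DynamoDB so deduplication works across execution environments:

```python
import hashlib
import json
import time

# request hash -> (expiry timestamp, cached response)
_seen: dict = {}
TTL_SECONDS = 3600

def idempotent_handler(event, process):
    """Return the cached response if this exact payload was already processed
    within the TTL; otherwise run `process` and cache its result."""
    key = hashlib.sha256(json.dumps(event, sort_keys=True).encode()).hexdigest()
    now = time.time()
    cached = _seen.get(key)
    if cached and cached[0] > now:
        return cached[1]
    response = process(event)  # the real side-effecting work
    _seen[key] = (now + TTL_SECONDS, response)
    return response
```

A retried invocation with an identical payload gets the cached response instead of triggering the side effect twice, which is the property that matters for writes and external calls.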

Architecture-Level Optimizations

Avoid VPC when possible

Functions that only call AWS services (DynamoDB, S3, SQS, SNS) and external APIs do not need to be in a VPC. VPC attachment adds cold start latency (200 to 500 ms) and complicates networking (you need NAT Gateways for internet access, VPC endpoints for AWS services). Only place functions in a VPC when they need to access VPC-bound resources like RDS databases, ElastiCache clusters, or internal services.

Before: A function calling DynamoDB and S3 was placed in a VPC because "everything goes in the VPC." Cold starts were 1.2 seconds. After removing VPC attachment: cold starts dropped to 350 ms, and the NAT Gateway was no longer needed, saving $32 per month.

Separate hot and cold paths

If your function handles multiple operations with different performance requirements, split it into separate functions. A single function that handles both lightweight reads (should be fast) and heavyweight writes (acceptable to be slower) forces you to optimize for the worst case. Separate functions can have different memory configurations, timeout settings, and concurrency limits.

A practical example: An API handler function processed both GET requests (read from DynamoDB, 20 ms average) and POST requests (validate, write to DynamoDB, publish to SNS, 250 ms average). Splitting into two functions allowed the read function to run at 256 MB memory with a 5-second timeout, while the write function ran at 512 MB with a 30-second timeout. The read function's cold start dropped from 800 ms to 400 ms because it loaded fewer dependencies.

Response streaming

Lambda response streaming (available for Node.js functions behind a Function URL) lets your function start sending the response before the entire response is generated. For functions that aggregate data from multiple sources, the client receives the first bytes in milliseconds rather than waiting for the entire operation to complete. Time to first byte (TTFB) can drop from seconds to under 100 ms for functions that return large responses or call multiple downstream services.


Monitoring and Continuous Optimization

Key metrics to track

Set up CloudWatch dashboards and alarms for these metrics:

  • Duration (p50, p95, p99): The p50 tells you typical performance. The p99 reveals cold start impact. A large gap between p50 and p99 indicates cold starts are a problem.
  • Init Duration: This metric (available in CloudWatch Logs Insights) shows the cold start initialization time separately from handler execution time. If Init Duration is high, focus on initialization optimization.
  • Concurrent Executions: Track how close you are to your concurrency limit. Hitting the limit causes throttling (429 errors), not cold starts.
  • Throttles: Any non-zero throttle count means you are hitting concurrency limits. Request a limit increase or implement request queuing.
  • Cold Start Rate: Calculate the percentage of invocations that are cold starts by comparing invocations to init events. A rate above 5 percent for high-traffic functions warrants investigation.
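The cold start rate in the last bullet is a one-liner worth keeping next to your dashboard definitions:

```python
def cold_start_rate(invocations: int, init_events: int) -> float:
    """Percent of invocations that were cold starts (had an init phase)."""
    if invocations == 0:
        return 0.0
    return 100.0 * init_events / invocations

# e.g. 1,240 init events across 36,500 invocations in the window:
rate = cold_start_rate(36_500, 1_240)
flag_for_review = rate > 5.0  # the threshold suggested above
```

The counts come from the Logs Insights data described in the next section: total REPORT lines versus REPORT lines carrying an init duration.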

CloudWatch Logs Insights queries

Use CloudWatch Logs Insights to analyze cold start patterns. Query for REPORT lines to get duration, billed duration, memory used, and init duration. Filter for init duration to isolate cold starts. Group by hour to see when cold starts concentrate (usually after scaling events or deployment).
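A sketch of such a query (field names match what Lambda emits in REPORT lines; adjust the bin size to taste):

```
filter @type = "REPORT"
| stats count(*) as invocations,
        count(@initDuration) as coldStarts,
        avg(@initDuration) as avgInitMs,
        pct(@duration, 99) as p99Ms
  by bin(1h) as hour
| sort hour desc
```

Because `count(@initDuration)` only counts rows where the field is present, the coldStarts column isolates cold starts without any extra filtering.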

The Optimization Priority

Optimize in this order for maximum impact with minimum effort: (1) Memory tuning -- 5 minutes, often the biggest improvement. (2) Package size reduction -- a few hours, reduces cold start by 10 to 50 percent. (3) Initialization optimization -- a few hours, reduces cold start by 20 to 40 percent. (4) SnapStart for Java -- a configuration change, reduces cold start by 80 to 95 percent. (5) Provisioned Concurrency -- a cost-performance tradeoff for latency-critical functions. (6) Architecture changes -- days to weeks, but addresses fundamental bottlenecks.

A Real Optimization Story

A team running a customer-facing API on Lambda with Java 17 had the following performance profile before optimization:

  • Cold start duration: 7.4 seconds (p99)
  • Warm invocation duration: 180 ms (p50), 450 ms (p99)
  • Memory: 512 MB
  • Package size: 85 MB (Spring Boot fat JAR)
  • Cold start rate: 3.2 percent of invocations

After a week of optimization:

  1. Increased memory from 512 MB to 1769 MB. Warm p50 dropped from 180 ms to 65 ms. Cold start dropped from 7.4s to 4.1s.
  2. Enabled SnapStart. Cold start dropped from 4.1s to 420 ms.
  3. Replaced Spring Boot with Micronaut (which has better Lambda support and faster startup). Package size dropped from 85 MB to 22 MB. Cold start dropped to 310 ms. Warm p50 dropped to 42 ms.
  4. Implemented connection pooling with RDS Proxy. Warm p99 dropped from 450 ms to 85 ms (eliminated connection establishment on cold starts).
  5. Added 5 provisioned concurrency instances for the morning traffic ramp.

Final performance:

  • Cold start duration: 310 ms (p99) -- reduced 96 percent
  • Warm invocation duration: 42 ms (p50), 85 ms (p99) -- reduced 77 to 81 percent
  • Cold start rate: effectively 0 percent during business hours (provisioned concurrency)
  • Monthly cost: increased $280 (provisioned concurrency) but decreased $120 (faster execution = fewer GB-seconds) for a net increase of $160

The $160 monthly cost increase eliminated all user-facing cold starts and reduced API response times by 80 percent. For a customer-facing API, that tradeoff is obvious. For a backend batch processing function, the calculus would be different -- cold starts on a function that processes SQS messages are invisible to end users and rarely worth the cost of provisioned concurrency.


When Lambda Is Not the Answer

Not every performance problem should be solved by optimizing Lambda. If your function consistently runs for 10+ seconds, handles persistent WebSocket connections, requires more than 10 GB of memory, or maintains complex in-memory state between requests, Lambda is the wrong compute platform. ECS Fargate, App Runner, or EC2 will give you more control over the execution environment and eliminate cold starts entirely.

The break-even point varies, but as a rough guide: if your function handles more than 5 million requests per month with consistent traffic, the cost of Lambda plus provisioned concurrency often exceeds the cost of a right-sized Fargate task or EC2 instance. Run the numbers for your specific workload. Lambda's strength is elastic scaling and per-request billing, not steady-state high-throughput workloads.
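That break-even can be sketched in a few lines. Every rate below is a placeholder assumption, not a quoted price; substitute current figures from the AWS pricing pages for your region:

```python
# All rates are placeholder assumptions for illustration only.
LAMBDA_GB_SECOND = 0.0000166667   # assumed on-demand duration rate
LAMBDA_PER_REQUEST = 0.0000002    # assumed $0.20 per million requests
FARGATE_TASK_MONTHLY = 36.0       # assumed small always-on task

def lambda_monthly_cost(requests: int, avg_ms: float, memory_mb: int,
                        provisioned_monthly: float = 0.0) -> float:
    """Rough monthly Lambda bill: duration charges + request charges +
    any flat provisioned concurrency baseline."""
    gb_seconds = requests * (avg_ms / 1000) * (memory_mb / 1024)
    return (gb_seconds * LAMBDA_GB_SECOND
            + requests * LAMBDA_PER_REQUEST
            + provisioned_monthly)

# 5M requests/month, 100 ms average, 512 MB, plus a warm-pool baseline:
with_pc = lambda_monthly_cost(5_000_000, 100, 512, provisioned_monthly=55.0)
fargate_is_cheaper = with_pc > FARGATE_TASK_MONTHLY
```

Under these made-up rates, raw execution at 5 million requests is only a few dollars a month; it is the provisioned concurrency baseline that pushes the steady-state workload past the Fargate task. Rerun the comparison with your real traffic shape and current prices before deciding.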

The fastest Lambda function is one that does not need to exist. Before optimizing, ask whether the function should be a Lambda at all, or whether it belongs on a different compute platform.

Written by CloudToolStack Team

Cloud architects with 15+ years of production experience across AWS, Azure, GCP, and OCI. We build free tools and write practical guides to help engineers navigate multi-cloud infrastructure.

Disclaimer: This article is for informational purposes. Cloud services and pricing change frequently; always verify with official provider documentation. AWS, Azure, GCP, and OCI are trademarks of their respective owners.