Durable Functions Patterns Guide
Build stateful workflows with Durable Functions: chaining, fan-out, human interaction, and eternal orchestrations.
Prerequisites
- Experience with Azure Functions basics
- Understanding of async programming and workflow patterns
What Are Durable Functions?
Durable Functions is an extension of Azure Functions that lets you write stateful workflows in a serverless environment. Instead of managing queues, state stores, and retry logic yourself, Durable Functions handles all of this through an orchestrator function that coordinates the execution of activity functions. The orchestrator's state is automatically checkpointed and replayed, so workflows can run for seconds or months without losing progress.
Durable Functions solves the problem of complex multi-step processes in serverless: order fulfillment pipelines, approval workflows, data processing with fan-out/fan-in, human interaction patterns, and long-running monitoring scenarios. Without Durable Functions, you would need to manually manage state in a database, handle retries with dead-letter queues, and coordinate between multiple functions using custom logic.
This guide covers the core patterns (chaining, fan-out/fan-in, async HTTP APIs, monitoring, human interaction, eternal orchestrations), implementation in C# and Python, error handling, and production best practices.
Runtime and Languages
Durable Functions supports C#, JavaScript/TypeScript, Python, Java, and PowerShell. The orchestration patterns are the same across languages, but the SDK syntax differs. This guide shows examples in both C# and Python. The underlying durable task framework uses Azure Storage (queues, tables, and blobs) or the newer Netherite and MSSQL storage providers for state management.
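For example, the storage provider and task hub are configured in host.json. The sketch below is illustrative, not a complete configuration: the hub name is a placeholder, and the Netherite provider additionally requires an Event Hubs connection setting that is omitted here; check the provider documentation for the full schema.

```json
{
  "version": "2.0",
  "extensions": {
    "durableTask": {
      "hubName": "MyTaskHub",
      "storageProvider": {
        "type": "Netherite"
      }
    }
  }
}
```

Leaving out the storageProvider section entirely falls back to the default Azure Storage provider.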
Pattern 1: Function Chaining
Function chaining executes a sequence of activity functions in a specific order, where the output of one function becomes the input to the next. This is the simplest Durable Functions pattern and replaces complex queue-based choreography.
```python
import azure.functions as func
import azure.durable_functions as df

app = func.FunctionApp()
bp = df.Blueprint()

# Orchestrator function
@bp.orchestration_trigger(context_name="context")
def order_processing_orchestrator(context: df.DurableOrchestrationContext):
    """Process an order through a series of steps."""
    order = context.get_input()

    # Step 1: Validate the order
    validated_order = yield context.call_activity("validate_order", order)

    # Step 2: Process payment
    payment_result = yield context.call_activity("process_payment", validated_order)

    # Step 3: Reserve inventory
    inventory_result = yield context.call_activity("reserve_inventory", {
        "order": validated_order,
        "payment": payment_result
    })

    # Step 4: Send confirmation
    yield context.call_activity("send_confirmation", {
        "order": validated_order,
        "payment": payment_result,
        "inventory": inventory_result
    })

    return {"status": "completed", "orderId": order["orderId"]}

# Activity functions
@bp.activity_trigger(input_name="order")
def validate_order(order: dict) -> dict:
    # Validate order items, customer, shipping address
    if not order.get("items"):
        raise ValueError("Order must contain items")
    order["validated"] = True
    return order

@bp.activity_trigger(input_name="data")
def process_payment(data: dict) -> dict:
    # Call payment gateway
    return {"transactionId": "txn-abc123", "status": "charged"}

@bp.activity_trigger(input_name="data")
def reserve_inventory(data: dict) -> dict:
    # Reserve inventory for each item
    return {"reserved": True, "warehouse": "us-east-1"}

@bp.activity_trigger(input_name="data")
def send_confirmation(data: dict) -> None:
    # Send email/SMS confirmation
    pass

# HTTP starter
@bp.route(route="start-order/{orderId}")
@bp.durable_client_input(client_name="client")
async def start_order(req: func.HttpRequest, client) -> func.HttpResponse:
    order_data = req.get_json()
    instance_id = await client.start_new(
        "order_processing_orchestrator", None, order_data
    )
    return client.create_check_status_response(req, instance_id)

app.register_functions(bp)
```

Pattern 2: Fan-Out / Fan-In
Fan-out/fan-in executes multiple activity functions in parallel, waits for all of them to complete, and then aggregates the results. This pattern is ideal for batch processing, parallel API calls, and map-reduce workloads.
```python
@bp.orchestration_trigger(context_name="context")
def parallel_processing_orchestrator(context: df.DurableOrchestrationContext):
    """Process multiple items in parallel and aggregate results."""
    items = context.get_input()  # List of items to process

    # Fan-out: launch parallel tasks
    parallel_tasks = []
    for item in items:
        task = context.call_activity("process_item", item)
        parallel_tasks.append(task)

    # Fan-in: wait for all tasks to complete
    results = yield context.task_all(parallel_tasks)

    # Aggregate results
    summary = yield context.call_activity("aggregate_results", results)
    return summary

@bp.activity_trigger(input_name="item")
def process_item(item: dict) -> dict:
    """Process a single item (runs in parallel with other instances)."""
    # Simulate processing (image resize, data transformation, etc.)
    return {
        "itemId": item["id"],
        "status": "processed",
        "size": len(str(item))
    }

@bp.activity_trigger(input_name="results")
def aggregate_results(results: list) -> dict:
    """Aggregate results from parallel processing."""
    return {
        "totalProcessed": len(results),
        "successCount": sum(1 for r in results if r["status"] == "processed"),
        "totalSize": sum(r["size"] for r in results)
    }
```

Use task_any for Racing
Use context.task_any() instead of task_all() to wait for the first task to complete (racing pattern). This is useful for implementing timeouts, redundant API calls where you take the fastest response, or competitive processing where you want the first result.
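As a minimal sketch of the racing shape (the activity names fetch_primary and fetch_backup are hypothetical, and the decorator is omitted so the generator body is visible on its own):

```python
# Hypothetical orchestrator body for the racing pattern. In a real app this
# would be decorated with @bp.orchestration_trigger, and `context` would be
# the df.DurableOrchestrationContext supplied by the runtime.
def fastest_response_orchestrator(context):
    # Start both calls without yielding, so they run concurrently
    primary = context.call_activity("fetch_primary", None)
    backup = context.call_activity("fetch_backup", None)

    # task_any resumes as soon as the first task completes
    winner = yield context.task_any([primary, backup])

    # Use the fastest result; the slower task's result is simply ignored
    return winner.result
```

The losing task still runs to completion in the background; racing trades some wasted work for lower tail latency.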
Pattern 3: Async HTTP APIs
The async HTTP API pattern provides a standard way to start long-running operations via HTTP and poll for their status. Durable Functions provides built-in HTTP endpoints for checking status, sending events to running orchestrations, and terminating instances.
```bash
# Start an orchestration
curl -X POST "https://myfuncapp.azurewebsites.net/api/start-order/ord-123" \
  -H "Content-Type: application/json" \
  -d '{"orderId": "ord-123", "items": [{"sku": "WIDGET-1", "qty": 5}]}'

# Response includes status query URLs:
# {
#   "id": "abc123",
#   "statusQueryGetUri": "https://myfuncapp.azurewebsites.net/runtime/webhooks/durabletask/instances/abc123",
#   "sendEventPostUri": "https://myfuncapp.azurewebsites.net/runtime/webhooks/durabletask/instances/abc123/raiseEvent/{eventName}",
#   "terminatePostUri": "https://myfuncapp.azurewebsites.net/runtime/webhooks/durabletask/instances/abc123/terminate?reason={text}"
# }

# Poll for status
curl "https://myfuncapp.azurewebsites.net/runtime/webhooks/durabletask/instances/abc123"

# Response when running:
# {"runtimeStatus": "Running", "input": {...}, "output": null}

# Response when completed:
# {"runtimeStatus": "Completed", "input": {...}, "output": {"status": "completed"}}
```

Pattern 4: Human Interaction
The human interaction pattern implements approval workflows where an orchestration pauses and waits for a human to take action (approve, reject, provide input). The orchestrator uses wait_for_external_event to pause execution until an external event is raised, with an optional timeout.
```python
import datetime

@bp.orchestration_trigger(context_name="context")
def approval_workflow(context: df.DurableOrchestrationContext):
    """Approval workflow with timeout and escalation."""
    request = context.get_input()

    # Send approval request
    yield context.call_activity("send_approval_request", {
        "approver": request["manager"],
        "details": request["details"],
        "instanceId": context.instance_id
    })

    # Wait for approval with 72-hour timeout
    timeout = context.current_utc_datetime + datetime.timedelta(hours=72)
    approval_event = context.wait_for_external_event("ApprovalResponse")
    timeout_event = context.create_timer(timeout)

    winner = yield context.task_any([approval_event, timeout_event])
    if winner == approval_event:
        # Cancel the timer so the instance doesn't stay active until it fires
        timeout_event.cancel()
        approval = approval_event.result
        if approval["decision"] == "approved":
            yield context.call_activity("process_approved_request", request)
            return {"status": "approved", "approver": approval["approvedBy"]}
        else:
            yield context.call_activity("notify_rejection", request)
            return {"status": "rejected", "reason": approval.get("reason")}
    else:
        # Timeout: escalate to VP
        yield context.call_activity("escalate_to_vp", request)
        return {"status": "escalated", "reason": "timeout"}

# Raise an event from outside (e.g., from an approval API)
@bp.route(route="approve/{instanceId}")
@bp.durable_client_input(client_name="client")
async def approve_request(req: func.HttpRequest, client) -> func.HttpResponse:
    instance_id = req.route_params["instanceId"]
    body = req.get_json()
    await client.raise_event(
        instance_id,
        "ApprovalResponse",
        {"decision": body["decision"], "approvedBy": body["approvedBy"]}
    )
    return func.HttpResponse("Event raised", status_code=202)
```

Pattern 5: Eternal Orchestrations
Eternal orchestrations are long-running (potentially infinite) workflows that periodically perform work. Unlike a timer-triggered function, eternal orchestrations maintain state between iterations, making them suitable for monitoring, polling external systems, and periodic data synchronization.
```python
@bp.orchestration_trigger(context_name="context")
def monitoring_orchestrator(context: df.DurableOrchestrationContext):
    """Monitor a resource and alert on anomalies (runs forever)."""
    config = context.get_input()
    endpoint = config["endpoint"]
    threshold = config.get("threshold", 95)

    # Check the resource
    status = yield context.call_activity("check_health", endpoint)
    if status["cpu_percent"] > threshold:
        yield context.call_activity("send_alert", {
            "endpoint": endpoint,
            "metric": "cpu",
            "value": status["cpu_percent"],
            "threshold": threshold
        })

    # Wait 5 minutes before next check
    next_check = context.current_utc_datetime + datetime.timedelta(minutes=5)
    yield context.create_timer(next_check)

    # Continue as new (restart orchestration with fresh history)
    # This prevents history from growing unbounded
    context.continue_as_new(config)
```

Continue As New Is Essential
Always use continue_as_new in eternal orchestrations. Without it, the orchestration history grows with every iteration, eventually causing performance degradation and memory issues. continue_as_new resets the history while preserving the orchestration instance ID.
Error Handling and Retries
Durable Functions provides built-in retry policies for activity functions. You can configure the number of retries, initial retry interval, backoff coefficient, and maximum retry interval. Failed activities throw exceptions in the orchestrator, which you can catch and handle with custom logic.
```python
from azure.durable_functions import RetryOptions

@bp.orchestration_trigger(context_name="context")
def resilient_orchestrator(context: df.DurableOrchestrationContext):
    """Orchestrator with retry policies and error handling."""
    order = context.get_input()

    # Configure retry policy
    retry_options = RetryOptions(
        first_retry_interval_in_milliseconds=5000,  # 5 seconds
        max_number_of_attempts=3,
        backoff_coefficient=2.0,  # Exponential backoff
        max_retry_interval_in_milliseconds=60000,  # Max 60 seconds
    )

    try:
        # Call with retry policy
        result = yield context.call_activity_with_retry(
            "call_external_api",
            retry_options,
            order
        )
    except Exception as e:
        # All retries failed - run compensation logic
        yield context.call_activity("compensate_order", {
            "order": order,
            "error": str(e)
        })
        return {"status": "failed", "error": str(e)}

    return {"status": "success", "result": result}
```

Terraform Deployment
```hcl
resource "azurerm_linux_function_app" "durable" {
  name                       = "myapp-durable-func"
  resource_group_name        = azurerm_resource_group.main.name
  location                   = azurerm_resource_group.main.location
  storage_account_name       = azurerm_storage_account.func.name
  storage_account_access_key = azurerm_storage_account.func.primary_access_key
  service_plan_id            = azurerm_service_plan.main.id

  site_config {
    application_stack {
      python_version = "3.11"
    }
  }

  app_settings = {
    "AzureWebJobsFeatureFlags" = "EnableWorkerIndexing"
    "FUNCTIONS_WORKER_RUNTIME" = "python"
    "AzureWebJobsStorage"      = azurerm_storage_account.func.primary_connection_string
    "WEBSITE_RUN_FROM_PACKAGE" = "1"
  }
}

resource "azurerm_service_plan" "main" {
  name                = "myapp-plan"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  os_type             = "Linux"
  sku_name            = "EP1" # Elastic Premium for Durable Functions
}
```

Production Best Practices
| Practice | Recommendation | Why |
|---|---|---|
| Hosting plan | Elastic Premium (EP1+) or Dedicated | Consumption plan has cold start and timeout limits |
| Orchestrator rules | No I/O, non-deterministic code, or blocking calls | Orchestrators replay; side effects cause bugs |
| Instance management | Purge completed instances regularly | Storage costs grow with instance history |
| Versioning | Use side-by-side versioning for breaking changes | Running instances cannot handle schema changes |
| Monitoring | Use Durable Functions Monitor extension | Visibility into orchestration state and history |
| Testing | Unit test orchestrators with mocked context | Verify workflow logic without Azure dependencies |
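The testing row deserves a concrete sketch. Python orchestrators are generators, so a unit test can drive one by hand with a stubbed context: each yield hands an activity call to the "runtime", and send() injects the result you want to simulate. The orchestrator and activity names below are illustrative, not taken from the samples above.

```python
from unittest.mock import MagicMock

# Minimal chaining orchestrator under test (a stand-in; in production it
# would be decorated with @bp.orchestration_trigger).
def greeting_orchestrator(context):
    name = yield context.call_activity("get_name", None)
    message = yield context.call_activity("say_hello", name)
    return message

def test_greeting_orchestrator():
    context = MagicMock()
    gen = greeting_orchestrator(context)

    gen.send(None)             # run to the first yield (get_name)
    gen.send("Ada")            # pretend get_name returned "Ada"
    try:
        gen.send("Hello Ada")  # pretend say_hello returned "Hello Ada"
    except StopIteration as stop:
        assert stop.value == "Hello Ada"
        return
    raise AssertionError("orchestrator should have completed")

test_greeting_orchestrator()
```

This verifies the workflow logic (ordering, data flow, return value) with no Azure dependencies at all.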
```bash
# Purge completed orchestration instances (older than 30 days)
az functionapp durable purge-history \
  --app-name myapp-durable-func \
  --resource-group myapp-rg \
  --created-before "2026-02-14T00:00:00Z" \
  --runtime-status Completed Terminated Failed

# Query orchestration instances
az functionapp durable get-instances \
  --app-name myapp-durable-func \
  --resource-group myapp-rg \
  --runtime-status Running
```

Orchestrator Code Constraints
Orchestrator functions must be deterministic because they replay from history. Never use datetime.now() (use context.current_utc_datetime instead), random numbers, environment variables that can change between replays, or direct I/O (HTTP calls, database queries). All side effects must happen in activity functions, not the orchestrator.
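To see why, here is a toy illustration in plain Python (not the Durable SDK): replay re-executes the orchestrator body from the top, so every branching decision must come out the same on every pass.

```python
import datetime

# Toy model of replay: the orchestrator body is re-executed from the top
# on every checkpoint, so branching must depend only on replayed values.

def bad_decision():
    # datetime.now() returns a different value on each replay, so the
    # branch taken during replay can differ from the original execution.
    return "escalate" if datetime.datetime.now().microsecond % 2 else "wait"

def good_decision(context_time):
    # context.current_utc_datetime (modeled here by context_time) is read
    # back from the orchestration history, so every replay sees the same
    # value and takes the same branch.
    return "escalate" if context_time.hour >= 17 else "wait"

# Replaying the deterministic version always yields the same decision.
checkpoint_time = datetime.datetime(2026, 1, 1, 9, 0)
first_execution = good_decision(checkpoint_time)
replay = good_decision(checkpoint_time)
```

The real runtime enforces this the same way: values like timestamps and activity results are recorded in history on first execution and fed back unchanged during replay.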
Key Takeaways
1. Durable Functions provides five core patterns: chaining, fan-out/fan-in, async HTTP APIs, human interaction, and eternal orchestrations.
2. Orchestrator functions must be deterministic because they replay from history.
3. Built-in retry policies with exponential backoff handle transient failures automatically.
4. Use continue_as_new in eternal orchestrations to prevent unbounded history growth.
Frequently Asked Questions
What hosting plan should I use for Durable Functions?
For production, use Elastic Premium (EP1 or higher) or a Dedicated plan. The Consumption plan works for lightweight workloads but has cold starts and tighter timeout limits.
Can Durable Functions run for days or weeks?
Yes. Orchestrations checkpoint their state to storage, so they can run for days, weeks, or even months. Individual activity functions are still bound by the normal function timeout, so keep each activity short and let the orchestrator do the waiting.
Written by CloudToolStack Team
Cloud engineers and architects with hands-on experience across AWS, Azure, and GCP. We write guides based on real-world production patterns, not just documentation rewrites.
Disclaimer: This guide is for educational purposes. Cloud services change frequently; always refer to official documentation for the latest information. AWS, Azure, and GCP are trademarks of their respective owners.