Azure Monitor & Application Insights
Master Azure Monitor, Application Insights, Log Analytics, KQL queries, metrics, alerts, distributed tracing, and workbooks.
Prerequisites
- Basic understanding of Azure services
- Familiarity with Azure portal and Azure CLI
- Experience with web application development
Azure Observability Overview
Observability in cloud-native environments goes far beyond traditional monitoring. It encompasses the ability to understand the internal state of your systems by examining their external outputs: logs, metrics, and traces. Azure provides a comprehensive observability stack built around Azure Monitor, which serves as the unified platform for collecting, analyzing, and acting on telemetry data from your cloud and on-premises environments.
Azure Monitor sits at the center of the observability ecosystem, aggregating data from multiple sources including Application Insights (application performance monitoring), Log Analytics (centralized log storage and querying), Azure Metrics (time-series performance data), and Azure Alerts (proactive notification and automation). Together, these services provide end-to-end visibility into the health, performance, and behavior of your applications and infrastructure.
Whether you are running a simple web application on Azure App Service, a complex microservices architecture on Azure Kubernetes Service, or a hybrid environment spanning on-premises data centers and multiple Azure regions, the observability principles and tools covered in this guide apply. We will walk through the architecture, configuration, querying, alerting, and cost management aspects of Azure Monitor and Application Insights with practical examples you can apply immediately.
OpenTelemetry & Azure Monitor
Azure Monitor now supports OpenTelemetry natively through the Azure Monitor OpenTelemetry Distro. This means you can instrument your applications using the vendor-neutral OpenTelemetry SDK and send telemetry directly to Azure Monitor without the classic Application Insights SDK. For new projects, Microsoft recommends using OpenTelemetry as the primary instrumentation approach, while the classic SDK remains fully supported for existing applications.
Azure Monitor Architecture
Azure Monitor is structured around a data platform that ingests telemetry from multiple sources and stores it in two primary data stores: Azure Monitor Metrics (a time-series database optimized for numeric values with timestamps) and Azure Monitor Logs (a log analytics store powered by Azure Data Explorer that supports the Kusto Query Language). Understanding this dual-store architecture is essential for designing effective monitoring strategies.
Data Sources
Azure Monitor collects data from several tiers of your application stack. At the application layer, Application Insights captures request rates, response times, failure rates, dependency calls, and custom events. At the infrastructure layer, VM Insights collects CPU, memory, disk, and network metrics. At the platform layer, Azure resource logs (formerly diagnostic logs) capture operations performed on resources like storage accounts, databases, and networking components.
| Data Source | Data Type | Store | Typical Use Case |
|---|---|---|---|
| Application Insights | Requests, dependencies, exceptions, traces, custom events | Logs (Log Analytics) | APM, distributed tracing, user behavior analytics |
| Platform Metrics | CPU, memory, request count, latency (per resource) | Metrics | Real-time dashboards, autoscale triggers, quick health checks |
| Resource Logs | Audit events, operational logs, data plane operations | Logs (Log Analytics) | Compliance auditing, troubleshooting, security investigation |
| Activity Logs | Subscription-level events (create, update, delete resources) | Logs (Log Analytics) | Change tracking, governance, who-did-what auditing |
| VM Insights | Guest OS performance, process data, network dependencies | Logs + Metrics | Infrastructure monitoring, capacity planning, dependency mapping |
| Container Insights | Pod/node metrics, container logs, Prometheus metrics | Logs + Metrics | AKS monitoring, container performance, Kubernetes health |
Log Analytics Workspace Design
A Log Analytics workspace is the logical container for log data in Azure Monitor. The workspace design decision is one of the most important architectural choices because it affects data access control, cost management, query scope, and data residency compliance. Most organizations should start with a centralized workspace strategy and only split into multiple workspaces when there is a concrete requirement such as data sovereignty, billing separation, or strict access isolation.
# Create a resource group for monitoring resources
az group create \
--name rg-monitoring-prod \
--location eastus
# Create a Log Analytics workspace with 90-day retention
az monitor log-analytics workspace create \
--resource-group rg-monitoring-prod \
--workspace-name law-central-prod \
--location eastus \
--retention-time 90 \
--sku PerGB2018
# Verify workspace creation
az monitor log-analytics workspace show \
--resource-group rg-monitoring-prod \
--workspace-name law-central-prod \
--query '{Name:name, Sku:sku.name, Retention:retentionInDays, Id:customerId}' \
--output table
Application Insights Setup & Configuration
Application Insights is the Application Performance Management (APM) component of Azure Monitor. It provides deep telemetry about your application's behavior including server-side request processing, client-side page loads, dependency calls to databases and external services, exception tracking, and custom event telemetry. Application Insights uses a workspace-based model where all telemetry is stored in a Log Analytics workspace, enabling cross-resource correlation and unified querying.
Creating an Application Insights Resource
Every Application Insights resource is associated with a Log Analytics workspace. When you create a new Application Insights resource, you must specify which workspace should receive its telemetry. This workspace-based model replaced the older classic model and provides better integration with the broader Azure Monitor ecosystem.
# Create a workspace-based Application Insights resource
az monitor app-insights component create \
--app appi-mywebapp-prod \
--location eastus \
--resource-group rg-monitoring-prod \
--workspace /subscriptions/<sub-id>/resourceGroups/rg-monitoring-prod/providers/Microsoft.OperationalInsights/workspaces/law-central-prod \
--application-type web \
--kind web
# Retrieve the connection string (used for SDK configuration)
az monitor app-insights component show \
--app appi-mywebapp-prod \
--resource-group rg-monitoring-prod \
--query connectionString \
--output tsv
Instrumenting a .NET Application
For .NET applications, the Azure Monitor OpenTelemetry Distro is the recommended approach for new projects. It provides auto-instrumentation for ASP.NET Core, HTTP clients, SQL clients, and other common libraries, while also allowing you to add custom telemetry.
using Azure.Monitor.OpenTelemetry.AspNetCore;
var builder = WebApplication.CreateBuilder(args);
// Add Azure Monitor OpenTelemetry distro
builder.Services.AddOpenTelemetry().UseAzureMonitor(options =>
{
options.ConnectionString = builder.Configuration
.GetConnectionString("ApplicationInsights");
});
// Add services
builder.Services.AddControllers();
builder.Services.AddHealthChecks();
var app = builder.Build();
app.MapControllers();
app.MapHealthChecks("/health");
app.Run();
The connection string is read from configuration; for example, in appsettings.json:
{
"ConnectionStrings": {
"ApplicationInsights": "InstrumentationKey=00000000-0000-0000-0000-000000000000;IngestionEndpoint=https://eastus-0.in.applicationinsights.azure.com/;LiveEndpoint=https://eastus.livediagnostics.monitor.azure.com/"
},
"Logging": {
"LogLevel": {
"Default": "Information",
"Microsoft.AspNetCore": "Warning"
}
},
"ApplicationInsights": {
"EnableAdaptiveSampling": true,
"EnableDependencyTrackingTelemetryModule": true,
"EnablePerformanceCounterCollectionModule": true
}
}
Connection Strings vs Instrumentation Keys
Microsoft recommends using connection strings instead of standalone instrumentation keys for configuring Application Insights. Connection strings include the ingestion endpoint URL, which enables features like regional ingestion, private link support, and sovereign cloud compatibility. Instrumentation keys alone will continue to work but lack these capabilities. Always store connection strings in Azure Key Vault or environment variables; never hard-code them in source code.
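As a concrete sketch of the environment-variable approach, the helper below reads the connection string from the environment and fails fast when it is missing (the function names are illustrative, not part of any SDK). The parser also shows why a connection string carries more than a bare instrumentation key:

```javascript
// Read the Application Insights connection string from the environment
// instead of hard-coding it. APPLICATIONINSIGHTS_CONNECTION_STRING is the
// variable name the Azure SDKs conventionally look for.
function getConnectionString() {
  const conn = process.env.APPLICATIONINSIGHTS_CONNECTION_STRING;
  if (!conn) {
    throw new Error(
      "APPLICATIONINSIGHTS_CONNECTION_STRING is not set; " +
      "configure it via App Service settings or a Key Vault reference."
    );
  }
  return conn;
}

// A connection string is a semicolon-separated list of key=value pairs;
// splitting it out makes the extra endpoints visible.
function parseConnectionString(conn) {
  return Object.fromEntries(
    conn.split(";").filter(Boolean).map((pair) => {
      const i = pair.indexOf("=");
      return [pair.slice(0, i), pair.slice(i + 1)];
    })
  );
}
```

Parsing a connection string yields fields such as `InstrumentationKey` and `IngestionEndpoint`, which is exactly the regional-endpoint information a standalone instrumentation key lacks.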
Instrumenting a Node.js Application
For Node.js applications, the @azure/monitor-opentelemetry package provides auto-instrumentation for Express, HTTP, and other popular frameworks. The setup follows the same OpenTelemetry pattern as the .NET distro.
const { useAzureMonitor } = require("@azure/monitor-opentelemetry");
// Must be called before other imports to patch modules
useAzureMonitor({
azureMonitorExporterOptions: {
connectionString: process.env.APPLICATIONINSIGHTS_CONNECTION_STRING
},
instrumentationOptions: {
http: { enabled: true },
azureSdk: { enabled: true },
mongoDb: { enabled: true },
mySql: { enabled: true },
postgreSql: { enabled: true },
redis: { enabled: true }
}
});
const express = require("express");
const app = express();
app.get("/", (req, res) => {
res.json({ status: "healthy", timestamp: new Date().toISOString() });
});
app.get("/api/users/:id", async (req, res) => {
// Dependency calls are automatically tracked
const user = await fetchUserFromDatabase(req.params.id);
res.json(user);
});
app.listen(3000, () => console.log("Server running on port 3000"));
Log Analytics & KQL Queries
Kusto Query Language (KQL) is the query language used to explore and analyze data in Azure Monitor Logs. KQL is a read-only, request-based language designed for big-data analytics. It combines a SQL-like syntax with pipeline-style data transformations, making it both powerful and intuitive once you learn the core operators. Mastering KQL is essential for getting value from your observability data, from simple log searches to complex statistical analyses and anomaly detection.
Essential KQL Operators
| Operator | Purpose | Example |
|---|---|---|
| `where` | Filter rows based on conditions | `requests \| where resultCode == 500` |
| `summarize` | Aggregate data (count, avg, sum, percentile) | `requests \| summarize count() by bin(timestamp, 1h)` |
| `project` | Select and rename columns | `requests \| project name, duration, resultCode` |
| `extend` | Add computed columns | `requests \| extend durationMs = duration * 1000` |
| `join` | Combine tables on matching keys | `requests \| join dependencies on operation_Id` |
| `render` | Visualize results as charts | `... \| render timechart` |
| `order by` | Sort results | `requests \| order by duration desc` |
| `top` | Return top N rows by a column | `requests \| top 10 by duration` |
Practical KQL Queries for Application Insights
The following KQL queries demonstrate common analysis scenarios when working with Application Insights data. These queries run against the standard Application Insights tables, including requests, dependencies, exceptions, traces, and customEvents.
// 1. Request failure rate by endpoint (last 24 hours)
requests
| where timestamp > ago(24h)
| summarize totalRequests = count(),
failedRequests = countif(success == false),
avgDuration = avg(duration),
p95Duration = percentile(duration, 95),
p99Duration = percentile(duration, 99)
by name
| extend failureRate = round(100.0 * failedRequests / totalRequests, 2)
| order by failureRate desc
| project name, totalRequests, failedRequests, failureRate,
avgDuration = round(avgDuration, 1),
p95Duration = round(p95Duration, 1),
p99Duration = round(p99Duration, 1)
// 2. Slowest dependency calls (database, HTTP, etc.)
dependencies
| where timestamp > ago(1h)
| summarize avgDuration = avg(duration),
p95Duration = percentile(duration, 95),
callCount = count(),
failureCount = countif(success == false)
by target, type, name
| where callCount > 10
| order by p95Duration desc
| take 20
// 3. Exception trends by type
exceptions
| where timestamp > ago(7d)
| summarize exceptionCount = count() by bin(timestamp, 1h), type
| render timechart
// 4. Unique users and sessions over time
customEvents
| where timestamp > ago(30d)
| summarize users = dcount(user_Id),
sessions = dcount(session_Id)
by bin(timestamp, 1d)
| render timechart
// 5. End-to-end transaction analysis
requests
| where timestamp > ago(1h) and duration > 5000
| project operation_Id, name, duration, resultCode, timestamp
| join kind=inner (
dependencies
| where timestamp > ago(1h)
| project operation_Id, depName = name, depDuration = duration,
depType = type, depTarget = target, depSuccess = success
) on operation_Id
| order by duration desc
| project timestamp, name, duration, resultCode,
depName, depDuration, depType, depTarget, depSuccess
KQL Performance Tips
Always place time filters (where timestamp > ago(24h)) as early as possible in your query pipeline. This dramatically reduces the data scanned and improves query performance. Also use project to limit columns early when you only need specific fields. For queries that run frequently (dashboards, alerts), test them in Log Analytics and check the query execution statistics to ensure they complete within acceptable time and cost limits.
Metrics & Custom Metrics
Azure Monitor Metrics is a time-series database optimized for storing and querying numeric values with timestamps. Platform metrics are collected automatically from Azure resources at one-minute intervals with no configuration required. These include metrics like CPU percentage, memory usage, request count, and latency for services like App Service, Virtual Machines, SQL Database, and Cosmos DB.
In addition to platform metrics, you can emit custom metrics from your application code to track business-specific KPIs such as orders processed per minute, cache hit ratios, queue depth, or any other numeric value relevant to your application's health.
Custom Metrics with OpenTelemetry
Custom metrics can be emitted using the OpenTelemetry metrics API. These metrics flow through the Azure Monitor exporter and appear in Azure Monitor Metrics alongside platform metrics, enabling unified dashboarding and alerting.
using System.Diagnostics.Metrics;
public class OrderMetrics
{
private static readonly Meter OrderMeter = new("MyApp.Orders", "1.0.0");
// Counter: monotonically increasing value
private static readonly Counter<long> OrdersProcessed =
OrderMeter.CreateCounter<long>(
"orders.processed",
description: "Total number of orders processed");
// Histogram: distribution of values (e.g., latency, size)
private static readonly Histogram<double> OrderProcessingTime =
OrderMeter.CreateHistogram<double>(
"orders.processing_time_ms",
unit: "ms",
description: "Time to process an order");
// UpDownCounter: value that can increase or decrease
private static readonly UpDownCounter<int> ActiveOrders =
OrderMeter.CreateUpDownCounter<int>(
"orders.active",
description: "Number of orders currently being processed");
public async Task ProcessOrderAsync(Order order)
{
ActiveOrders.Add(1);
var stopwatch = Stopwatch.StartNew();
try
{
await ValidateOrder(order);
await ChargePayment(order);
await FulfillOrder(order);
OrdersProcessed.Add(1,
new KeyValuePair<string, object?>("region", order.Region),
new KeyValuePair<string, object?>("type", order.Type));
}
finally
{
stopwatch.Stop();
OrderProcessingTime.Record(stopwatch.ElapsedMilliseconds);
ActiveOrders.Add(-1);
}
}
}
Querying Metrics via Azure CLI
You can query both platform and custom metrics programmatically using the Azure CLI or REST API. This is useful for building custom dashboards, integrating with external tools, or automating operational workflows.
# List available metrics for an App Service
az monitor metrics list-definitions \
--resource /subscriptions/<sub-id>/resourceGroups/rg-app/providers/Microsoft.Web/sites/mywebapp \
--query "[].{Name:name.value, Unit:unit, Aggregation:primaryAggregationType}" \
--output table
# Query CPU percentage for the last 6 hours (5-minute intervals)
az monitor metrics list \
--resource /subscriptions/<sub-id>/resourceGroups/rg-app/providers/Microsoft.Web/sites/mywebapp \
--metric "CpuPercentage" \
--aggregation Average Maximum \
--interval PT5M \
--start-time 2024-01-15T00:00:00Z \
--end-time 2024-01-15T06:00:00Z \
--output table
# Query custom metric from Application Insights
az monitor app-insights metrics show \
--app appi-mywebapp-prod \
--resource-group rg-monitoring-prod \
--metrics "customMetrics/orders.processed" \
--aggregation sum \
--interval PT1H
Alert Rules & Action Groups
Azure Monitor Alerts proactively notify you when conditions in your monitoring data indicate a potential problem. Alerts are composed of two parts: an alert rule that defines the condition to evaluate, and an action group that defines what happens when the alert fires. Azure Monitor supports three types of alert rules: metric alerts (evaluate metric values), log search alerts (run KQL queries on log data), and activity log alerts (trigger on subscription-level events).
Alert Rule Types
| Alert Type | Evaluation | Latency | Cost | Best For |
|---|---|---|---|---|
| Metric Alert | Checks metric value at regular intervals | 1–5 minutes | Low | Threshold-based alerts (CPU > 80%, response time > 2s) |
| Log Search Alert | Runs a KQL query at scheduled intervals | 5–15 minutes | Medium (per query execution) | Complex conditions, multi-resource correlation, pattern detection |
| Activity Log Alert | Triggers on subscription events | Near real-time | Free | Resource deletion, service health events, policy violations |
| Smart Detection | ML-based anomaly detection in App Insights | Varies | Included with App Insights | Failure anomalies, performance degradation, memory leaks |
# Create an action group with email and webhook notifications
az monitor action-group create \
--resource-group rg-monitoring-prod \
--name ag-platform-team \
--short-name PlatTeam \
--action email PlatformLead platform-lead@company.com \
--action webhook PagerDuty https://events.pagerduty.com/integration/<key>/enqueue
# Create a metric alert for high response time
az monitor metrics alert create \
--resource-group rg-monitoring-prod \
--name "alert-high-response-time" \
--description "Average response time exceeds 2 seconds" \
--scopes /subscriptions/<sub-id>/resourceGroups/rg-app/providers/Microsoft.Web/sites/mywebapp \
--condition "avg HttpResponseTime > 2000" \
--window-size 5m \
--evaluation-frequency 1m \
--severity 2 \
--action ag-platform-team
# Create a log search alert for error spike detection
az monitor scheduled-query create \
--resource-group rg-monitoring-prod \
--name "alert-error-spike" \
--description "More than 50 server errors in 5 minutes" \
--scopes /subscriptions/<sub-id>/resourceGroups/rg-monitoring-prod/providers/Microsoft.OperationalInsights/workspaces/law-central-prod \
--condition "count > 50" \
--condition-query "requests | where resultCode startswith '5'" \
--window-size 5m \
--evaluation-frequency 5m \
--severity 1 \
--action /subscriptions/<sub-id>/resourceGroups/rg-monitoring-prod/providers/Microsoft.Insights/actionGroups/ag-platform-team
Alert Fatigue
One of the most common pitfalls in monitoring is alert fatigue: creating too many alerts, or setting thresholds too aggressively, produces a flood of notifications that teams learn to ignore. Start with a small number of high-signal alerts focused on user-facing impact (error rates, latency, availability) rather than infrastructure metrics. Use severity levels consistently: Sev 0 for customer-impacting outages, Sev 1 for degraded performance, Sev 2 for potential issues requiring investigation, and Sev 3 for informational alerts.
Distributed Tracing & Application Map
Distributed tracing is essential for understanding request flow through microservices architectures. When a user request enters your system, it may traverse multiple services, databases, message queues, and external APIs before a response is returned. Distributed tracing correlates all of these interactions using a shared operation ID, allowing you to visualize the entire call chain, identify bottlenecks, and pinpoint failure points.
Application Insights automatically correlates telemetry across services using the W3C Trace Context standard. When Service A calls Service B via HTTP, the trace context headers (traceparent and tracestate) are propagated automatically by the SDK, ensuring that both services' telemetry is linked under the same operation ID.
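To make the propagated header concrete, here is a small standalone sketch (not part of any SDK) that parses a W3C `traceparent` header into its four fields, following the format defined by the Trace Context specification:

```javascript
// Parse a W3C Trace Context `traceparent` header:
//   version "-" trace-id "-" parent-id "-" trace-flags
// e.g. "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
// The shared trace-id is what links telemetry from caller and callee
// under the same operation in Application Insights.
function parseTraceparent(header) {
  const match = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!match) return null; // malformed header: the receiver starts a new trace
  const [, version, traceId, parentId, flags] = match;
  // All-zero trace-id or parent-id values are invalid per the spec
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return {
    version,
    traceId,   // correlates all spans of one distributed operation
    parentId,  // the calling service's span id
    sampled: (parseInt(flags, 16) & 0x01) === 1,
  };
}
```

In practice the Application Insights SDKs and the OpenTelemetry distro do this parsing and propagation for you; the sketch only illustrates what travels on the wire.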
Viewing the Application Map
The Application Map in Application Insights provides a visual topology of your application's components and their dependencies. Each node represents a component (your application, a database, an external API), and the edges show the calls between them with aggregate metrics like call count, average duration, and failure rate. This is invaluable for quickly identifying which dependency is causing performance issues.
End-to-End Transaction Diagnostics
For individual requests, the Transaction Diagnostics view shows a Gantt chart of every operation involved in processing a request. You can see exactly how long each dependency call took, which calls were made in parallel versus sequentially, and where exceptions occurred. This view is accessible from any request, dependency, or exception telemetry item in Application Insights.
// Find slow end-to-end transactions spanning multiple services
let slowOperations = requests
| where timestamp > ago(1h)
| where duration > 5000
| project operation_Id, requestName = name, requestDuration = duration;
// Get all telemetry for those operations across all services
union requests, dependencies, exceptions, traces
| where timestamp > ago(1h)
| where operation_Id in ((slowOperations | project operation_Id))
| project timestamp, operation_Id, itemType,
name = coalesce(name, type, problemId),
duration,
success,
cloud_RoleName,
cloud_RoleInstance
| order by operation_Id, timestamp asc
// Application dependency health summary
dependencies
| where timestamp > ago(24h)
| summarize totalCalls = count(),
failedCalls = countif(success == false),
avgDuration = avg(duration),
p99Duration = percentile(duration, 99)
by target, type, cloud_RoleName
| extend failureRate = round(100.0 * failedCalls / totalCalls, 2)
| order by failureRate desc, totalCalls desc
Availability Tests & Synthetic Monitoring
Availability tests (also called web tests) allow you to proactively monitor your application's health by sending synthetic requests from multiple Azure data center locations around the world. These tests verify that your application is reachable, responds within expected time limits, and returns the expected content, all without waiting for real users to encounter problems.
Types of Availability Tests
| Test Type | Description | Use Case |
|---|---|---|
| Standard Test | Single URL request with response validation | Health endpoint monitoring, simple uptime checks |
| Custom TrackAvailability | Code-based test using Azure Functions or custom logic | Multi-step workflows, authenticated endpoints, API sequences |
# Create a standard availability test (URL ping test)
az monitor app-insights web-test create \
--resource-group rg-monitoring-prod \
--name "avail-homepage-check" \
--defined-web-test-name "Homepage Availability" \
--location "us-fl-mia-edge" \
--location "emea-gb-db3-azr" \
--location "apac-sg-sin-azr" \
--location "us-ca-sjc-azr" \
--location "emea-nl-ams-azr" \
--frequency 300 \
--timeout 120 \
--kind standard \
--enabled true \
--web-test-kind standard \
--request-url "https://mywebapp.azurewebsites.net/health" \
--expected-status-code 200 \
--ssl-check true \
--ssl-lifetime-check 7 \
--tags "hidden-link:/subscriptions/<sub-id>/resourceGroups/rg-monitoring-prod/providers/Microsoft.Insights/components/appi-mywebapp-prod=Resource"Custom Availability Test with Azure Functions
For more complex scenarios like testing authenticated APIs, multi-step transactions, or checking specific response content, you can implement custom availability tests using Azure Functions that call the TrackAvailability API.
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.DataContracts;
using Microsoft.Azure.Functions.Worker;
using System.Diagnostics;
using System.Net.Http.Headers;
public class AvailabilityTestFunction
{
private readonly TelemetryClient _telemetry;
private readonly HttpClient _httpClient;
public AvailabilityTestFunction(
TelemetryClient telemetry,
IHttpClientFactory httpClientFactory)
{
_telemetry = telemetry;
_httpClient = httpClientFactory.CreateClient("ApiClient");
}
[Function("CheckApiHealth")]
public async Task Run(
[TimerTrigger("0 */5 * * * *")] TimerInfo timer)
{
var availability = new AvailabilityTelemetry
{
Name = "API Health Check",
RunLocation = Environment.GetEnvironmentVariable("REGION_NAME"),
Success = false
};
var stopwatch = Stopwatch.StartNew();
try
{
// Step 1: Authenticate
var token = await GetAccessTokenAsync();
// Step 2: Call protected API endpoint
_httpClient.DefaultRequestHeaders.Authorization =
new AuthenticationHeaderValue("Bearer", token);
var response = await _httpClient.GetAsync("/api/health/deep");
response.EnsureSuccessStatusCode();
var body = await response.Content.ReadAsStringAsync();
// Step 3: Validate response content
if (body.Contains("\"status\":\"healthy\""))
{
availability.Success = true;
availability.Message = "All health checks passed";
}
else
{
availability.Message = $"Unexpected response: {body[..200]}";
}
}
catch (Exception ex)
{
availability.Message = ex.Message;
_telemetry.TrackException(ex);
}
finally
{
stopwatch.Stop();
availability.Duration = stopwatch.Elapsed;
_telemetry.TrackAvailability(availability);
}
}
}
Workbooks & Dashboards
Azure Workbooks provide a flexible canvas for creating rich, interactive reports that combine text, KQL queries, metrics, and parameters into shareable documents. Unlike Azure Dashboards (which are designed for at-a-glance operational views), Workbooks are designed for deeper analysis and storytelling; they support parameterized queries, conditional visibility, drill-down navigation, and narrative text alongside visualizations.
Workbooks vs Dashboards
| Feature | Azure Workbooks | Azure Dashboards |
|---|---|---|
| Primary Use | Deep analysis, incident investigation, reports | Operational overview, NOC screens, quick glance |
| Interactivity | Parameters, drill-down, conditional sections | Time range picker, basic filtering |
| Data Sources | Logs, Metrics, Azure Resource Graph, custom endpoints | Metrics, Logs, Markdown, pinned query results |
| Sharing | Saved as Azure resources, gallery templates | Shared dashboards via RBAC, published to portal |
| Visualization Types | Grids, charts, tiles, maps, text, honeycomb | Charts, metrics tiles, Markdown, pinned blades |
Creating a Workbook Template
Workbook templates can be defined as ARM/Bicep resources, making them deployable through infrastructure-as-code pipelines. This is useful for standardizing monitoring views across environments and teams.
param location string = resourceGroup().location
param appInsightsId string
param workspaceName string = 'law-central-prod'
resource workspace 'Microsoft.OperationalInsights/workspaces@2022-10-01' existing = {
name: workspaceName
}
resource workbook 'Microsoft.Insights/workbooks@2022-04-01' = {
name: guid('app-health-workbook', resourceGroup().id)
location: location
kind: 'shared'
properties: {
displayName: 'Application Health Overview'
category: 'workbook'
sourceId: appInsightsId
serializedData: loadTextContent('workbook-template.json')
version: '1.0'
}
tags: {
'hidden-title': 'Application Health Overview'
environment: 'production'
}
}
Gallery Templates
Azure Monitor includes a gallery of pre-built workbook templates covering common scenarios like failure analysis, performance diagnostics, and usage analytics. Before building custom workbooks, check the gallery. You can clone an existing template and customize it, which is significantly faster than starting from scratch. Access the gallery from any Application Insights resource by navigating to Workbooks in the left menu.
Cost Management & Best Practices
Azure Monitor costs are primarily driven by two factors: log data ingestion volume (measured in GB per day) and log data retention (how long data is kept). Understanding the cost model and implementing data optimization strategies is critical, as monitoring costs can escalate quickly in large environments, especially when verbose application logging, diagnostic settings, and multiple data sources are enabled without careful planning.
Cost Optimization Strategies
The following strategies help control Azure Monitor costs while maintaining the observability coverage your team needs:
- Sampling: Application Insights supports adaptive sampling, which automatically reduces telemetry volume while preserving statistically accurate metrics. For high-traffic applications, sampling can reduce costs by 80–90% with minimal impact on diagnostic capability.
- Data Collection Rules (DCR): Use DCRs to filter and transform data before it reaches the workspace. You can drop unnecessary columns, filter out low-value events, and route different data types to different tables or workspaces based on their retention requirements.
- Basic Logs tier: For high-volume, low-query-frequency data (like verbose debug logs or security logs you only search during investigations), use the Basic Logs tier, which has a substantially lower per-GB ingestion price but limited query capabilities and a small per-query charge.
- Commitment Tiers: If your daily ingestion is consistently above 100 GB, consider commitment tiers (100, 200, 300, 400, 500 GB/day) which offer significant per-GB discounts compared to pay-as-you-go pricing.
- Archive tier: For long-term retention requirements (compliance, forensics), use the Archive tier which costs a fraction of interactive retention but requires a restore operation to query the data.
- Diagnostic settings audit: Regularly review which resources have diagnostic settings enabled and which log categories are being collected. Many organizations enable all categories during setup and never revisit, resulting in significant unnecessary ingestion.
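The pay-as-you-go versus commitment-tier trade-off above can be sanity-checked with a quick calculation. The prices in this sketch are illustrative placeholders, not current list prices; always confirm against the Azure Monitor pricing page for your region:

```javascript
// Rough monthly Log Analytics ingestion cost comparison.
// PRICES ARE ILLUSTRATIVE ASSUMPTIONS, not official Azure rates.
const PAY_AS_YOU_GO_PER_GB = 2.76; // assumed $/GB
const COMMITMENT_TIERS = [          // assumed per-day tier prices
  { gbPerDay: 100, pricePerDay: 196 },
  { gbPerDay: 200, pricePerDay: 368 },
];

function estimateMonthlyCost(dailyIngestGB, days = 30) {
  const payg = dailyIngestGB * PAY_AS_YOU_GO_PER_GB * days;
  let best = { plan: "pay-as-you-go", cost: payg };
  for (const tier of COMMITMENT_TIERS) {
    if (dailyIngestGB < tier.gbPerDay) continue; // tier not filled
    // Overage above the tier is billed here at the effective tier rate,
    // a simplification of the real billing model.
    const effectiveRate = tier.pricePerDay / tier.gbPerDay;
    const cost =
      (tier.pricePerDay + (dailyIngestGB - tier.gbPerDay) * effectiveRate) * days;
    if (cost < best.cost) best = { plan: `${tier.gbPerDay} GB/day commitment`, cost };
  }
  return best;
}
```

With these assumed rates, an environment ingesting 150 GB/day comes out noticeably cheaper on the 100 GB/day commitment tier than on pay-as-you-go, which is the kind of break-even check worth running before committing.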
Estimating Costs
// Daily ingestion volume by table (last 30 days)
Usage
| where TimeGenerated > ago(30d)
| where IsBillable == true
| summarize IngestedGB = round(sum(Quantity) / 1024, 2) by bin(TimeGenerated, 1d), DataType
| order by TimeGenerated desc, IngestedGB desc
// Top 10 tables by cost contribution
Usage
| where TimeGenerated > ago(30d)
| where IsBillable == true
| summarize TotalGB = round(sum(Quantity) / 1024, 2) by DataType
| top 10 by TotalGB
| extend EstimatedMonthlyCost = round(TotalGB * 2.76, 2) // ~$2.76/GB pay-as-you-go
| order by EstimatedMonthlyCost desc
// Identify verbose trace sources
traces
| where timestamp > ago(1d)
| summarize traceCount = count(), estimatedSizeMB = round(sum(estimate_data_size()) / 1048576, 2)
by cloud_RoleName, severityLevel
| order by estimatedSizeMB desc
| take 20
Best Practices Summary
- Centralize your workspace: Use a single Log Analytics workspace for most scenarios. Split only for data sovereignty, strict access isolation, or billing separation requirements.
- Use OpenTelemetry for new projects: The Azure Monitor OpenTelemetry Distro provides vendor-neutral instrumentation with full Azure Monitor integration.
- Implement structured logging: Use structured log formats (JSON) with consistent property names across services to enable powerful cross-service queries.
- Set up alerts for SLIs: Define Service Level Indicators (error rate, latency percentiles, availability) and alert on deviations from your Service Level Objectives.
- Enable distributed tracing: Ensure all services in your architecture propagate W3C Trace Context headers for end-to-end transaction visibility.
- Review costs monthly: Use the ingestion analysis KQL queries to track data volume trends and identify opportunities for optimization before costs spiral.
- Automate with Infrastructure as Code: Deploy monitoring resources (workspaces, Application Insights, alert rules, workbooks) through Bicep or Terraform to ensure consistency across environments.
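The structured-logging recommendation above can be sketched as follows. The property names (service, operation_Id, and so on) are illustrative conventions, not a required schema; the point is that every service emits the same shape so KQL can parse and correlate across them:

```javascript
// Emit one JSON object per log line with consistent property names so
// cross-service KQL queries can filter and join on them.
function createLogger(service, sink = (line) => console.log(line)) {
  function log(level, message, props = {}) {
    const entry = {
      timestamp: new Date().toISOString(),
      level,
      service,   // e.g. the cloud role name, constant per deployment
      message,
      ...props,  // e.g. operation_Id for cross-service correlation
    };
    sink(JSON.stringify(entry));
    return entry;
  }
  return {
    info: (msg, props) => log("Information", msg, props),
    warn: (msg, props) => log("Warning", msg, props),
    error: (msg, props) => log("Error", msg, props),
  };
}
```

A service would then log with, for example, `logger.info("order accepted", { operation_Id, orderId })`, and a KQL query can `extend` parsed fields from the JSON payload with identical names across every service.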
Azure Monitor Agent (AMA)
The Azure Monitor Agent (AMA) is the next-generation agent that replaces the legacy Log Analytics agent (MMA/OMS) and the Diagnostics extension. AMA uses Data Collection Rules (DCRs) for configuration, supports multiple workspaces, and provides more granular control over data collection. If you are still using the legacy agent, plan your migration to AMA, as the legacy agent was retired in August 2024.
Key Takeaways
1. Azure Monitor is the unified observability platform collecting metrics, logs, and traces.
2. Application Insights provides deep application performance monitoring with auto-instrumentation.
3. KQL (Kusto Query Language) enables powerful log analytics across all Azure monitoring data.
4. Alert rules with action groups automate incident response across email, SMS, webhooks, and ITSM.
5. Distributed tracing via Application Map visualizes dependencies across microservices.
6. Workbooks provide customizable interactive dashboards for operational and business metrics.
Written by CloudToolStack Team
Cloud engineers and architects with hands-on experience across AWS, Azure, and GCP. We write guides based on real-world production patterns, not just documentation rewrites.
Disclaimer: This guide is for educational purposes. Cloud services change frequently; always refer to official documentation for the latest information. AWS, Azure, and GCP are trademarks of their respective owners.