Azure Monitor & Application Insights
Master Azure Monitor, Application Insights, Log Analytics, KQL queries, metrics, alerts, distributed tracing, and workbooks.
Prerequisites
- Basic understanding of Azure services
- Familiarity with Azure portal and Azure CLI
- Experience with web application development
Azure Observability Overview
Observability in cloud-native environments goes far beyond traditional monitoring. It encompasses the ability to understand the internal state of your systems by examining their external outputs: logs, metrics, and traces. Azure provides a comprehensive observability stack built around Azure Monitor, which serves as the unified platform for collecting, analyzing, and acting on telemetry data from your cloud and on-premises environments.
Azure Monitor sits at the center of the observability ecosystem, aggregating data from multiple sources including Application Insights (application performance monitoring), Log Analytics (centralized log storage and querying), Azure Metrics (time-series performance data), and Azure Alerts (proactive notification and automation). Together, these services provide end-to-end visibility into the health, performance, and behavior of your applications and infrastructure.
Whether you are running a simple web application on Azure App Service, a complex microservices architecture on Azure Kubernetes Service, or a hybrid environment spanning on-premises data centers and multiple Azure regions, the observability principles and tools covered in this guide apply. We will walk through the architecture, configuration, querying, alerting, and cost management aspects of Azure Monitor and Application Insights with practical examples you can apply immediately.
OpenTelemetry & Azure Monitor
Azure Monitor now supports OpenTelemetry natively through the Azure Monitor OpenTelemetry Distro. This means you can instrument your applications using the vendor-neutral OpenTelemetry SDK and send telemetry directly to Azure Monitor without the classic Application Insights SDK. For new projects, Microsoft recommends using OpenTelemetry as the primary instrumentation approach, while the classic SDK remains fully supported for existing applications.
Azure Monitor Architecture
Azure Monitor is structured around a data platform that ingests telemetry from multiple sources and stores it in two primary data stores: Azure Monitor Metrics (a time-series database optimized for numeric values with timestamps) and Azure Monitor Logs (a log analytics store powered by Azure Data Explorer that supports the Kusto Query Language). Understanding this dual-store architecture is essential for designing effective monitoring strategies.
Data Sources
Azure Monitor collects data from several tiers of your application stack. At the application layer, Application Insights captures request rates, response times, failure rates, dependency calls, and custom events. At the infrastructure layer, VM Insights collects CPU, memory, disk, and network metrics. At the platform layer, Azure resource logs (formerly diagnostic logs) capture operations performed on resources like storage accounts, databases, and networking components.
| Data Source | Data Type | Store | Typical Use Case |
|---|---|---|---|
| Application Insights | Requests, dependencies, exceptions, traces, custom events | Logs (Log Analytics) | APM, distributed tracing, user behavior analytics |
| Platform Metrics | CPU, memory, request count, latency (per resource) | Metrics | Real-time dashboards, autoscale triggers, quick health checks |
| Resource Logs | Audit events, operational logs, data plane operations | Logs (Log Analytics) | Compliance auditing, troubleshooting, security investigation |
| Activity Logs | Subscription-level events (create, update, delete resources) | Logs (Log Analytics) | Change tracking, governance, who-did-what auditing |
| VM Insights | Guest OS performance, process data, network dependencies | Logs + Metrics | Infrastructure monitoring, capacity planning, dependency mapping |
| Container Insights | Pod/node metrics, container logs, Prometheus metrics | Logs + Metrics | AKS monitoring, container performance, Kubernetes health |
Log Analytics Workspace Design
A Log Analytics workspace is the logical container for log data in Azure Monitor. The workspace design decision is one of the most important architectural choices because it affects data access control, cost management, query scope, and data residency compliance. Most organizations should start with a centralized workspace strategy and only split into multiple workspaces when there is a concrete requirement such as data sovereignty, billing separation, or strict access isolation.
# Create a resource group for monitoring resources
az group create \
--name rg-monitoring-prod \
--location eastus
# Create a Log Analytics workspace with 90-day retention
az monitor log-analytics workspace create \
--resource-group rg-monitoring-prod \
--workspace-name law-central-prod \
--location eastus \
--retention-time 90 \
--sku PerGB2018
# Verify workspace creation
az monitor log-analytics workspace show \
--resource-group rg-monitoring-prod \
--workspace-name law-central-prod \
--query '{Name:name, Sku:sku.name, Retention:retentionInDays, Id:customerId}' \
--output table
Application Insights Setup & Configuration
Application Insights is the Application Performance Management (APM) component of Azure Monitor. It provides deep telemetry about your application's behavior including server-side request processing, client-side page loads, dependency calls to databases and external services, exception tracking, and custom event telemetry. Application Insights uses a workspace-based model where all telemetry is stored in a Log Analytics workspace, enabling cross-resource correlation and unified querying.
Creating an Application Insights Resource
Every Application Insights resource is associated with a Log Analytics workspace. When you create a new Application Insights resource, you must specify which workspace should receive its telemetry. This workspace-based model replaced the older classic model and provides better integration with the broader Azure Monitor ecosystem.
# Create a workspace-based Application Insights resource
az monitor app-insights component create \
--app appi-mywebapp-prod \
--location eastus \
--resource-group rg-monitoring-prod \
--workspace /subscriptions/<sub-id>/resourceGroups/rg-monitoring-prod/providers/Microsoft.OperationalInsights/workspaces/law-central-prod \
--application-type web \
--kind web
# Retrieve the connection string (used for SDK configuration)
az monitor app-insights component show \
--app appi-mywebapp-prod \
--resource-group rg-monitoring-prod \
--query connectionString \
--output tsv
Instrumenting a .NET Application
For .NET applications, the Azure Monitor OpenTelemetry Distro is the recommended approach for new projects. It provides auto-instrumentation for ASP.NET Core, HTTP clients, SQL clients, and other common libraries, while also allowing you to add custom telemetry.
using Azure.Monitor.OpenTelemetry.AspNetCore;
var builder = WebApplication.CreateBuilder(args);
// Add Azure Monitor OpenTelemetry distro
builder.Services.AddOpenTelemetry().UseAzureMonitor(options =>
{
options.ConnectionString = builder.Configuration
.GetConnectionString("ApplicationInsights");
});
// Add services
builder.Services.AddControllers();
builder.Services.AddHealthChecks();
var app = builder.Build();
app.MapControllers();
app.MapHealthChecks("/health");
app.Run();
The connection string is read from configuration; for example, in appsettings.json:
{
"ConnectionStrings": {
"ApplicationInsights": "InstrumentationKey=00000000-0000-0000-0000-000000000000;IngestionEndpoint=https://eastus-0.in.applicationinsights.azure.com/;LiveEndpoint=https://eastus.livediagnostics.monitor.azure.com/"
},
"Logging": {
"LogLevel": {
"Default": "Information",
"Microsoft.AspNetCore": "Warning"
}
},
"ApplicationInsights": {
"EnableAdaptiveSampling": true,
"EnableDependencyTrackingTelemetryModule": true,
"EnablePerformanceCounterCollectionModule": true
}
}
Connection Strings vs Instrumentation Keys
Microsoft recommends using connection strings instead of standalone instrumentation keys for configuring Application Insights. Connection strings include the ingestion endpoint URL, which enables features like regional ingestion, private link support, and sovereign cloud compatibility. Instrumentation keys alone will continue to work but lack these capabilities. Always store connection strings in Azure Key Vault or environment variables; never hard-code them in source code.
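As a concrete sketch of the environment-variable approach, the helper below reads the connection string from the environment and fails fast when it is missing (the function names are illustrative, not part of any SDK). The parser also shows why a connection string carries more than a bare instrumentation key:

```javascript
// Read the Application Insights connection string from the environment
// instead of hard-coding it. APPLICATIONINSIGHTS_CONNECTION_STRING is the
// variable name the Azure SDKs conventionally look for.
function getConnectionString() {
  const conn = process.env.APPLICATIONINSIGHTS_CONNECTION_STRING;
  if (!conn) {
    throw new Error(
      "APPLICATIONINSIGHTS_CONNECTION_STRING is not set; " +
      "configure it via App Service settings or a Key Vault reference."
    );
  }
  return conn;
}

// A connection string is a semicolon-separated list of key=value pairs;
// splitting it out makes the extra endpoints visible.
function parseConnectionString(conn) {
  return Object.fromEntries(
    conn.split(";").filter(Boolean).map((pair) => {
      const i = pair.indexOf("=");
      return [pair.slice(0, i), pair.slice(i + 1)];
    })
  );
}
```

Parsing a connection string yields fields such as `InstrumentationKey` and `IngestionEndpoint`, which is exactly the regional-endpoint information a standalone instrumentation key lacks.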
Instrumenting a Node.js Application
For Node.js applications, the @azure/monitor-opentelemetry package provides auto-instrumentation for Express, HTTP, and other popular frameworks. The setup follows the same OpenTelemetry pattern as the .NET distro.
const { useAzureMonitor } = require("@azure/monitor-opentelemetry");
// Must be called before other imports to patch modules
useAzureMonitor({
azureMonitorExporterOptions: {
connectionString: process.env.APPLICATIONINSIGHTS_CONNECTION_STRING
},
instrumentationOptions: {
http: { enabled: true },
azureSdk: { enabled: true },
mongoDb: { enabled: true },
mySql: { enabled: true },
postgreSql: { enabled: true },
redis: { enabled: true }
}
});
const express = require("express");
const app = express();
app.get("/", (req, res) => {
res.json({ status: "healthy", timestamp: new Date().toISOString() });
});
app.get("/api/users/:id", async (req, res) => {
// Dependency calls are automatically tracked
const user = await fetchUserFromDatabase(req.params.id);
res.json(user);
});
app.listen(3000, () => console.log("Server running on port 3000"));
Log Analytics & KQL Queries
Kusto Query Language (KQL) is the query language used to explore and analyze data in Azure Monitor Logs. KQL is a read-only, request-based language designed for big-data analytics. It combines a SQL-like syntax with pipeline-style data transformations, making it both powerful and intuitive once you learn the core operators. Mastering KQL is essential for getting value from your observability data, from simple log searches to complex statistical analyses and anomaly detection.
Essential KQL Operators
| Operator | Purpose | Example |
|---|---|---|
| `where` | Filter rows based on conditions | `requests \| where resultCode == 500` |
| `summarize` | Aggregate data (count, avg, sum, percentile) | `requests \| summarize count() by bin(timestamp, 1h)` |
| `project` | Select and rename columns | `requests \| project name, duration, resultCode` |
| `extend` | Add computed columns | `requests \| extend durationMs = duration * 1000` |
| `join` | Combine tables on matching keys | `requests \| join dependencies on operation_Id` |
| `render` | Visualize results as charts | `... \| render timechart` |
| `order by` | Sort results | `requests \| order by duration desc` |
| `top` | Return top N rows by a column | `requests \| top 10 by duration` |
Practical KQL Queries for Application Insights
The following KQL queries demonstrate common analysis scenarios when working with Application Insights data. These queries run against the standard Application Insights tables, including requests, dependencies, exceptions, traces, and customEvents.
// 1. Request failure rate by endpoint (last 24 hours)
requests
| where timestamp > ago(24h)
| summarize totalRequests = count(),
failedRequests = countif(success == false),
avgDuration = avg(duration),
p95Duration = percentile(duration, 95),
p99Duration = percentile(duration, 99)
by name
| extend failureRate = round(100.0 * failedRequests / totalRequests, 2)
| order by failureRate desc
| project name, totalRequests, failedRequests, failureRate,
avgDuration = round(avgDuration, 1),
p95Duration = round(p95Duration, 1),
p99Duration = round(p99Duration, 1)
// 2. Slowest dependency calls (database, HTTP, etc.)
dependencies
| where timestamp > ago(1h)
| summarize avgDuration = avg(duration),
p95Duration = percentile(duration, 95),
callCount = count(),
failureCount = countif(success == false)
by target, type, name
| where callCount > 10
| order by p95Duration desc
| take 20
// 3. Exception trends by type
exceptions
| where timestamp > ago(7d)
| summarize exceptionCount = count() by bin(timestamp, 1h), type
| render timechart
// 4. Unique users and sessions over time
customEvents
| where timestamp > ago(30d)
| summarize users = dcount(user_Id),
sessions = dcount(session_Id)
by bin(timestamp, 1d)
| render timechart
// 5. End-to-end transaction analysis
requests
| where timestamp > ago(1h) and duration > 5000
| project operation_Id, name, duration, resultCode, timestamp
| join kind=inner (
dependencies
| where timestamp > ago(1h)
| project operation_Id, depName = name, depDuration = duration,
depType = type, depTarget = target, depSuccess = success
) on operation_Id
| order by duration desc
| project timestamp, name, duration, resultCode,
depName, depDuration, depType, depTarget, depSuccess
KQL Performance Tips
Always place time filters (where timestamp > ago(24h)) as early as possible in your query pipeline. This dramatically reduces the data scanned and improves query performance. Also use project to limit columns early when you only need specific fields. For queries that run frequently (dashboards, alerts), test them in Log Analytics and check the query execution statistics to ensure they complete within acceptable time and cost limits.
Metrics & Custom Metrics
Azure Monitor Metrics is a time-series database optimized for storing and querying numeric values with timestamps. Platform metrics are collected automatically from Azure resources at one-minute intervals with no configuration required. These include metrics like CPU percentage, memory usage, request count, and latency for services like App Service, Virtual Machines, SQL Database, and Cosmos DB.
In addition to platform metrics, you can emit custom metrics from your application code to track business-specific KPIs such as orders processed per minute, cache hit ratios, queue depth, or any other numeric value relevant to your application's health.
Custom Metrics with OpenTelemetry
Custom metrics can be emitted using the OpenTelemetry metrics API. These metrics flow through the Azure Monitor exporter and appear in Azure Monitor Metrics alongside platform metrics, enabling unified dashboarding and alerting.
using System.Diagnostics.Metrics;
public class OrderMetrics
{
private static readonly Meter OrderMeter = new("MyApp.Orders", "1.0.0");
// Counter: monotonically increasing value
private static readonly Counter<long> OrdersProcessed =
OrderMeter.CreateCounter<long>(
"orders.processed",
description: "Total number of orders processed");
// Histogram: distribution of values (e.g., latency, size)
private static readonly Histogram<double> OrderProcessingTime =
OrderMeter.CreateHistogram<double>(
"orders.processing_time_ms",
unit: "ms",
description: "Time to process an order");
// UpDownCounter: value that can increase or decrease
private static readonly UpDownCounter<int> ActiveOrders =
OrderMeter.CreateUpDownCounter<int>(
"orders.active",
description: "Number of orders currently being processed");
public async Task ProcessOrderAsync(Order order)
{
ActiveOrders.Add(1);
var stopwatch = Stopwatch.StartNew();
try
{
await ValidateOrder(order);
await ChargePayment(order);
await FulfillOrder(order);
OrdersProcessed.Add(1,
new KeyValuePair<string, object?>("region", order.Region),
new KeyValuePair<string, object?>("type", order.Type));
}
finally
{
stopwatch.Stop();
OrderProcessingTime.Record(stopwatch.ElapsedMilliseconds);
ActiveOrders.Add(-1);
}
}
}
Querying Metrics via Azure CLI
You can query both platform and custom metrics programmatically using the Azure CLI or REST API. This is useful for building custom dashboards, integrating with external tools, or automating operational workflows.
# List available metrics for an App Service
az monitor metrics list-definitions \
--resource /subscriptions/<sub-id>/resourceGroups/rg-app/providers/Microsoft.Web/sites/mywebapp \
--query "[].{Name:name.value, Unit:unit, Aggregation:primaryAggregationType}" \
--output table
# Query CPU percentage for the last 6 hours (5-minute intervals)
az monitor metrics list \
--resource /subscriptions/<sub-id>/resourceGroups/rg-app/providers/Microsoft.Web/sites/mywebapp \
--metric "CpuPercentage" \
--aggregation Average Maximum \
--interval PT5M \
--start-time 2024-01-15T00:00:00Z \
--end-time 2024-01-15T06:00:00Z \
--output table
# Query custom metric from Application Insights
az monitor app-insights metrics show \
--app appi-mywebapp-prod \
--resource-group rg-monitoring-prod \
--metrics "customMetrics/orders.processed" \
--aggregation sum \
--interval PT1H
Alert Rules & Action Groups
Azure Monitor Alerts proactively notify you when conditions in your monitoring data indicate a potential problem. Alerts are composed of two parts: an alert rule that defines the condition to evaluate, and an action group that defines what happens when the alert fires. Azure Monitor supports three types of alert rules: metric alerts (evaluate metric values), log search alerts (run KQL queries on log data), and activity log alerts (trigger on subscription-level events).
Alert Rule Types
| Alert Type | Evaluation | Latency | Cost | Best For |
|---|---|---|---|---|
| Metric Alert | Checks metric value at regular intervals | 1–5 minutes | Low | Threshold-based alerts (CPU > 80%, response time > 2s) |
| Log Search Alert | Runs a KQL query at scheduled intervals | 5–15 minutes | Medium (per query execution) | Complex conditions, multi-resource correlation, pattern detection |
| Activity Log Alert | Triggers on subscription events | Near real-time | Free | Resource deletion, service health events, policy violations |
| Smart Detection | ML-based anomaly detection in App Insights | Varies | Included with App Insights | Failure anomalies, performance degradation, memory leaks |
# Create an action group with email and webhook notifications
az monitor action-group create \
--resource-group rg-monitoring-prod \
--name ag-platform-team \
--short-name PlatTeam \
--action email PlatformLead platform-lead@company.com \
--action webhook PagerDuty https://events.pagerduty.com/integration/<key>/enqueue
# Create a metric alert for high response time
az monitor metrics alert create \
--resource-group rg-monitoring-prod \
--name "alert-high-response-time" \
--description "Average response time exceeds 2 seconds" \
--scopes /subscriptions/<sub-id>/resourceGroups/rg-app/providers/Microsoft.Web/sites/mywebapp \
--condition "avg HttpResponseTime > 2000" \
--window-size 5m \
--evaluation-frequency 1m \
--severity 2 \
--action ag-platform-team
# Create a log search alert for error spike detection
az monitor scheduled-query create \
--resource-group rg-monitoring-prod \
--name "alert-error-spike" \
--description "More than 50 server errors in 5 minutes" \
--scopes /subscriptions/<sub-id>/resourceGroups/rg-monitoring-prod/providers/Microsoft.OperationalInsights/workspaces/law-central-prod \
--condition "count > 50" \
--condition-query "requests | where resultCode startswith '5'" \
--window-size 5m \
--evaluation-frequency 5m \
--severity 1 \
--action /subscriptions/<sub-id>/resourceGroups/rg-monitoring-prod/providers/Microsoft.Insights/actionGroups/ag-platform-team
Alert Fatigue
One of the most common pitfalls in monitoring is alert fatigue: creating too many alerts, or setting thresholds too aggressively, produces a flood of notifications that teams learn to ignore. Start with a small number of high-signal alerts focused on user-facing impact (error rates, latency, availability) rather than infrastructure metrics. Use severity levels consistently: Sev 0 for customer-impacting outages, Sev 1 for degraded performance, Sev 2 for potential issues requiring investigation, and Sev 3 for informational alerts.
Distributed Tracing & Application Map
Distributed tracing is essential for understanding request flow through microservices architectures. When a user request enters your system, it may traverse multiple services, databases, message queues, and external APIs before a response is returned. Distributed tracing correlates all of these interactions using a shared operation ID, allowing you to visualize the entire call chain, identify bottlenecks, and pinpoint failure points.
Application Insights automatically correlates telemetry across services using the W3C Trace Context standard. When Service A calls Service B via HTTP, the trace context headers (traceparent and tracestate) are propagated automatically by the SDK, ensuring that both services' telemetry is linked under the same operation ID.
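To make the propagated header concrete, here is a small standalone sketch (not part of any SDK) that parses a W3C `traceparent` header into its four fields, following the format defined by the Trace Context specification:

```javascript
// Parse a W3C Trace Context `traceparent` header:
//   version "-" trace-id "-" parent-id "-" trace-flags
// e.g. "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
// The shared trace-id is what links telemetry from caller and callee
// under the same operation in Application Insights.
function parseTraceparent(header) {
  const match = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!match) return null; // malformed header: the receiver starts a new trace
  const [, version, traceId, parentId, flags] = match;
  // All-zero trace-id or parent-id values are invalid per the spec
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return {
    version,
    traceId,   // correlates all spans of one distributed operation
    parentId,  // the calling service's span id
    sampled: (parseInt(flags, 16) & 0x01) === 1,
  };
}
```

In practice the Application Insights SDKs and the OpenTelemetry distro do this parsing and propagation for you; the sketch only illustrates what travels on the wire.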
Viewing the Application Map
The Application Map in Application Insights provides a visual topology of your application's components and their dependencies. Each node represents a component (your application, a database, an external API), and the edges show the calls between them with aggregate metrics like call count, average duration, and failure rate. This is invaluable for quickly identifying which dependency is causing performance issues.
End-to-End Transaction Diagnostics
For individual requests, the Transaction Diagnostics view shows a Gantt chart of every operation involved in processing a request. You can see exactly how long each dependency call took, which calls were made in parallel versus sequentially, and where exceptions occurred. This view is accessible from any request, dependency, or exception telemetry item in Application Insights.
// Find slow end-to-end transactions spanning multiple services
let slowOperations = requests
| where timestamp > ago(1h)
| where duration > 5000
| project operation_Id, requestName = name, requestDuration = duration;
// Get all telemetry for those operations across all services
union requests, dependencies, exceptions, traces
| where timestamp > ago(1h)
| where operation_Id in ((slowOperations | project operation_Id))
| project timestamp, operation_Id, itemType,
name = coalesce(name, type, problemId),
duration,
success,
cloud_RoleName,
cloud_RoleInstance
| order by operation_Id, timestamp asc
// Application dependency health summary
dependencies
| where timestamp > ago(24h)
| summarize totalCalls = count(),
failedCalls = countif(success == false),
avgDuration = avg(duration),
p99Duration = percentile(duration, 99)
by target, type, cloud_RoleName
| extend failureRate = round(100.0 * failedCalls / totalCalls, 2)
| order by failureRate desc, totalCalls desc
Availability Tests & Synthetic Monitoring
Availability tests (also called web tests) allow you to proactively monitor your application's health by sending synthetic requests from multiple Azure data center locations around the world. These tests verify that your application is reachable, responds within expected time limits, and returns the expected content, all without waiting for real users to encounter problems.
Types of Availability Tests
| Test Type | Description | Use Case |
|---|---|---|
| Standard Test | Single URL request with response validation | Health endpoint monitoring, simple uptime checks |
| Custom TrackAvailability | Code-based test using Azure Functions or custom logic | Multi-step workflows, authenticated endpoints, API sequences |
# Create a standard availability test (URL ping test)
az monitor app-insights web-test create \
--resource-group rg-monitoring-prod \
--name "avail-homepage-check" \
--defined-web-test-name "Homepage Availability" \
--location "us-fl-mia-edge" \
--location "emea-gb-db3-azr" \
--location "apac-sg-sin-azr" \
--location "us-ca-sjc-azr" \
--location "emea-nl-ams-azr" \
--frequency 300 \
--timeout 120 \
--kind standard \
--enabled true \
--web-test-kind standard \
--request-url "https://mywebapp.azurewebsites.net/health" \
--expected-status-code 200 \
--ssl-check true \
--ssl-lifetime-check 7 \
--tags "hidden-link:/subscriptions/<sub-id>/resourceGroups/rg-monitoring-prod/providers/Microsoft.Insights/components/appi-mywebapp-prod=Resource"Custom Availability Test with Azure Functions
For more complex scenarios like testing authenticated APIs, multi-step transactions, or checking specific response content, you can implement custom availability tests using Azure Functions that call the TrackAvailability API.
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.DataContracts;
using Microsoft.Azure.Functions.Worker;
using System.Diagnostics;
using System.Net.Http.Headers;
public class AvailabilityTestFunction
{
private readonly TelemetryClient _telemetry;
private readonly HttpClient _httpClient;
public AvailabilityTestFunction(
TelemetryClient telemetry,
IHttpClientFactory httpClientFactory)
{
_telemetry = telemetry;
_httpClient = httpClientFactory.CreateClient("ApiClient");
}
[Function("CheckApiHealth")]
public async Task Run(
[TimerTrigger("0 */5 * * * *")] TimerInfo timer)
{
var availability = new AvailabilityTelemetry
{
Name = "API Health Check",
RunLocation = Environment.GetEnvironmentVariable("REGION_NAME"),
Success = false
};
var stopwatch = Stopwatch.StartNew();
try
{
// Step 1: Authenticate
var token = await GetAccessTokenAsync();
// Step 2: Call protected API endpoint
_httpClient.DefaultRequestHeaders.Authorization =
new AuthenticationHeaderValue("Bearer", token);
var response = await _httpClient.GetAsync("/api/health/deep");
response.EnsureSuccessStatusCode();
var body = await response.Content.ReadAsStringAsync();
// Step 3: Validate response content
if (body.Contains("\"status\":\"healthy\""))
{
availability.Success = true;
availability.Message = "All health checks passed";
}
else
{
availability.Message = $"Unexpected response: {body[..200]}";
}
}
catch (Exception ex)
{
availability.Message = ex.Message;
_telemetry.TrackException(ex);
}
finally
{
stopwatch.Stop();
availability.Duration = stopwatch.Elapsed;
_telemetry.TrackAvailability(availability);
}
}
}
Workbooks & Dashboards
Azure Workbooks provide a flexible canvas for creating rich, interactive reports that combine text, KQL queries, metrics, and parameters into shareable documents. Unlike Azure Dashboards (which are designed for at-a-glance operational views), Workbooks are designed for deeper analysis and storytelling; they support parameterized queries, conditional visibility, drill-down navigation, and narrative text alongside visualizations.
Workbooks vs Dashboards
| Feature | Azure Workbooks | Azure Dashboards |
|---|---|---|
| Primary Use | Deep analysis, incident investigation, reports | Operational overview, NOC screens, quick glance |
| Interactivity | Parameters, drill-down, conditional sections | Time range picker, basic filtering |
| Data Sources | Logs, Metrics, Azure Resource Graph, custom endpoints | Metrics, Logs, Markdown, pinned query results |
| Sharing | Saved as Azure resources, gallery templates | Shared dashboards via RBAC, published to portal |
| Visualization Types | Grids, charts, tiles, maps, text, honeycomb | Charts, metrics tiles, Markdown, pinned blades |
Creating a Workbook Template
Workbook templates can be defined as ARM/Bicep resources, making them deployable through infrastructure-as-code pipelines. This is useful for standardizing monitoring views across environments and teams.
param location string = resourceGroup().location
param appInsightsId string
param workspaceName string = 'law-central-prod'
resource workspace 'Microsoft.OperationalInsights/workspaces@2022-10-01' existing = {
name: workspaceName
}
resource workbook 'Microsoft.Insights/workbooks@2022-04-01' = {
name: guid('app-health-workbook', resourceGroup().id)
location: location
kind: 'shared'
properties: {
displayName: 'Application Health Overview'
category: 'workbook'
sourceId: appInsightsId
serializedData: loadTextContent('workbook-template.json')
version: '1.0'
}
tags: {
'hidden-title': 'Application Health Overview'
environment: 'production'
}
}
Gallery Templates
Azure Monitor includes a gallery of pre-built workbook templates covering common scenarios like failure analysis, performance diagnostics, and usage analytics. Before building custom workbooks, check the gallery. You can clone an existing template and customize it, which is significantly faster than starting from scratch. Access the gallery from any Application Insights resource by navigating to Workbooks in the left menu.
Cost Management & Best Practices
Azure Monitor costs are primarily driven by two factors: log data ingestion volume (measured in GB per day) and log data retention (how long data is kept). Understanding the cost model and implementing data optimization strategies is critical, as monitoring costs can escalate quickly in large environments, especially when verbose application logging, diagnostic settings, and multiple data sources are enabled without careful planning.
Cost Optimization Strategies
The following strategies help control Azure Monitor costs while maintaining the observability coverage your team needs:
- Sampling: Application Insights supports adaptive sampling, which automatically reduces telemetry volume while preserving statistically accurate metrics. For high-traffic applications, sampling can reduce costs by 80–90% with minimal impact on diagnostic capability.
- Data Collection Rules (DCR): Use DCRs to filter and transform data before it reaches the workspace. You can drop unnecessary columns, filter out low-value events, and route different data types to different tables or workspaces based on their retention requirements.
- Basic Logs tier: For high-volume, low-query-frequency data (like verbose debug logs or security logs you only search during investigations), use the Basic Logs tier, which has a substantially lower per-GB ingestion price but limited query capabilities and a small per-query charge.
- Commitment Tiers: If your daily ingestion is consistently above 100 GB, consider commitment tiers (100, 200, 300, 400, 500 GB/day) which offer significant per-GB discounts compared to pay-as-you-go pricing.
- Archive tier: For long-term retention requirements (compliance, forensics), use the Archive tier which costs a fraction of interactive retention but requires a restore operation to query the data.
- Diagnostic settings audit: Regularly review which resources have diagnostic settings enabled and which log categories are being collected. Many organizations enable all categories during setup and never revisit, resulting in significant unnecessary ingestion.
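The pay-as-you-go versus commitment-tier trade-off above can be sanity-checked with a quick calculation. The prices in this sketch are illustrative placeholders, not current list prices; always confirm against the Azure Monitor pricing page for your region:

```javascript
// Rough monthly Log Analytics ingestion cost comparison.
// PRICES ARE ILLUSTRATIVE ASSUMPTIONS, not official Azure rates.
const PAY_AS_YOU_GO_PER_GB = 2.76; // assumed $/GB
const COMMITMENT_TIERS = [          // assumed per-day tier prices
  { gbPerDay: 100, pricePerDay: 196 },
  { gbPerDay: 200, pricePerDay: 368 },
];

function estimateMonthlyCost(dailyIngestGB, days = 30) {
  const payg = dailyIngestGB * PAY_AS_YOU_GO_PER_GB * days;
  let best = { plan: "pay-as-you-go", cost: payg };
  for (const tier of COMMITMENT_TIERS) {
    if (dailyIngestGB < tier.gbPerDay) continue; // tier not filled
    // Overage above the tier is billed here at the effective tier rate,
    // a simplification of the real billing model.
    const effectiveRate = tier.pricePerDay / tier.gbPerDay;
    const cost =
      (tier.pricePerDay + (dailyIngestGB - tier.gbPerDay) * effectiveRate) * days;
    if (cost < best.cost) best = { plan: `${tier.gbPerDay} GB/day commitment`, cost };
  }
  return best;
}
```

With these assumed rates, an environment ingesting 150 GB/day comes out noticeably cheaper on the 100 GB/day commitment tier than on pay-as-you-go, which is the kind of break-even check worth running before committing.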
Estimating Costs
// Daily ingestion volume by table (last 30 days)
Usage
| where TimeGenerated > ago(30d)
| where IsBillable == true
| summarize IngestedGB = round(sum(Quantity) / 1024, 2) by bin(TimeGenerated, 1d), DataType
| order by TimeGenerated desc, IngestedGB desc
// Top 10 tables by cost contribution
Usage
| where TimeGenerated > ago(30d)
| where IsBillable == true
| summarize TotalGB = round(sum(Quantity) / 1024, 2) by DataType
| top 10 by TotalGB
| extend EstimatedMonthlyCost = round(TotalGB * 2.76, 2) // ~$2.76/GB pay-as-you-go
| order by EstimatedMonthlyCost desc
// Identify verbose trace sources
traces
| where timestamp > ago(1d)
| summarize traceCount = count(), estimatedSizeMB = round(sum(estimate_data_size()) / 1048576, 2)
by cloud_RoleName, severityLevel
| order by estimatedSizeMB desc
| take 20
Best Practices Summary
- Centralize your workspace: Use a single Log Analytics workspace for most scenarios. Split only for data sovereignty, strict access isolation, or billing separation requirements.
- Use OpenTelemetry for new projects: The Azure Monitor OpenTelemetry Distro provides vendor-neutral instrumentation with full Azure Monitor integration.
- Implement structured logging: Use structured log formats (JSON) with consistent property names across services to enable powerful cross-service queries.
- Set up alerts for SLIs: Define Service Level Indicators (error rate, latency percentiles, availability) and alert on deviations from your Service Level Objectives.
- Enable distributed tracing: Ensure all services in your architecture propagate W3C Trace Context headers for end-to-end transaction visibility.
- Review costs monthly: Use the ingestion analysis KQL queries to track data volume trends and identify opportunities for optimization before costs spiral.
- Automate with Infrastructure as Code: Deploy monitoring resources (workspaces, Application Insights, alert rules, workbooks) through Bicep or Terraform to ensure consistency across environments.
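The structured-logging recommendation above can be sketched as follows. The property names (service, operation_Id, and so on) are illustrative conventions, not a required schema; the point is that every service emits the same shape so KQL can parse and correlate across them:

```javascript
// Emit one JSON object per log line with consistent property names so
// cross-service KQL queries can filter and join on them.
function createLogger(service, sink = (line) => console.log(line)) {
  function log(level, message, props = {}) {
    const entry = {
      timestamp: new Date().toISOString(),
      level,
      service,   // e.g. the cloud role name, constant per deployment
      message,
      ...props,  // e.g. operation_Id for cross-service correlation
    };
    sink(JSON.stringify(entry));
    return entry;
  }
  return {
    info: (msg, props) => log("Information", msg, props),
    warn: (msg, props) => log("Warning", msg, props),
    error: (msg, props) => log("Error", msg, props),
  };
}
```

A service would then log with, for example, `logger.info("order accepted", { operation_Id, orderId })`, and a KQL query can `extend` parsed fields from the JSON payload with identical names across every service.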
Azure Monitor Agent (AMA)
The Azure Monitor Agent (AMA) is the next-generation agent that replaces the legacy Log Analytics agent (MMA/OMS) and the Diagnostics extension. AMA uses Data Collection Rules (DCRs) for configuration, supports multiple workspaces, and provides more granular control over data collection. If you are still using the legacy agent, plan your migration to AMA, as the legacy agent was retired in August 2024.
Key Takeaways
1. Azure Monitor is the unified observability platform collecting metrics, logs, and traces.
2. Application Insights provides deep application performance monitoring with auto-instrumentation.
3. KQL (Kusto Query Language) enables powerful log analytics across all Azure monitoring data.
4. Alert rules with action groups automate incident response across email, SMS, webhooks, and ITSM.
5. Distributed tracing via Application Map visualizes dependencies across microservices.
6. Workbooks provide customizable interactive dashboards for operational and business metrics.
Written by CloudToolStack Team
Cloud engineers and architects with hands-on experience across AWS, Azure, and GCP. We write guides based on real-world production patterns, not just documentation rewrites.
Disclaimer: This guide is for educational purposes. Cloud services change frequently; always refer to official documentation for the latest information. AWS, Azure, and GCP are trademarks of their respective owners.