IBM Cloud Monitoring Alert Builder


Build IBM Cloud Monitoring alert configurations with metric and event conditions, notification channels, and dashboard panels.

Last verified: May 2026

IBM Cloud Monitoring Configuration

Required Fields

`instanceName`, `alerts`, `notificationChannels`

```json
{
  "instanceName": "prod-monitoring",
  "plan": "graduated-tier",
  "region": "us-south",
  "alerts": [
    {
      "name": "High CPU Usage",
      "severity": "high",
      "type": "metric",
      "condition": {
        "metric": "cpu.used.percent",
        "scope": "kubernetes.cluster.name = 'prod-iks-cluster'",
        "operator": ">",
        "threshold": 85,
        "duration": "5m",
        "aggregation": "avg"
      },
      "notificationChannels": ["ops-pagerduty", "ops-slack"]
    },
    {
      "name": "Pod CrashLooping",
      "severity": "critical",
      "type": "event",
      "condition": {
        "eventName": "CrashLoopBackOff",
        "scope": "kubernetes.namespace.name in ('production', 'staging')",
        "count": 3,
        "duration": "10m"
      },
      "notificationChannels": ["ops-pagerduty"]
    },
    {
      "name": "Memory Pressure",
      "severity": "medium",
      "type": "metric",
      "condition": {
        "metric": "memory.used.percent",
        "scope": "host.name starts with 'prod'",
        "operator": ">",
        "threshold": 90,
        "duration": "10m",
        "aggregation": "max"
      },
      "notificationChannels": ["ops-slack"]
    }
  ],
  "notificationChannels": [
    {
      "name": "ops-pagerduty",
      "type": "pagerduty",
      "serviceKey": "pd-integration-key",
      "autoResolve": true
    },
    {
      "name": "ops-slack",
      "type": "slack",
      "webhookUrl": "https://hooks.slack.com/services/xxx",
      "channel": "#cloud-ops-alerts"
    },
    {
      "name": "ops-email",
      "type": "email",
      "recipients": ["cloudops@example.com"]
    }
  ],
  "dashboards": [
    {
      "name": "Production Overview",
      "panels": [
        {
          "title": "CPU by Host",
          "type": "timechart",
          "metric": "cpu.used.percent",
          "groupBy": "host.name"
        },
        {
          "title": "Active Pods",
          "type": "number",
          "metric": "kubernetes.pod.count",
          "scope": "kubernetes.namespace.name = 'production'"
        }
      ]
    }
  ]
}
```
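In the sample above, every alert refers to its notification channels by name, and each name must match an entry in the top-level `notificationChannels` list. The sketch below checks for that, along with the three required top-level fields. It is a minimal illustration written for this page. It is not the builder's own validator, and the `ops-teams` channel is a hypothetical name that is deliberately left undefined.

```python
import json

# Top-level keys listed under "Required Fields"
REQUIRED_FIELDS = ("instanceName", "alerts", "notificationChannels")


def validate_config(config: dict) -> list[str]:
    """Return a list of problems in an alert-builder config; an empty list means none were found."""
    problems = [f"missing required field: {field}" for field in REQUIRED_FIELDS if field not in config]

    # Channels are defined once at the top level and referenced by name from each alert.
    defined = {channel.get("name") for channel in config.get("notificationChannels", [])}
    for alert in config.get("alerts", []):
        for ref in alert.get("notificationChannels", []):
            if ref not in defined:
                problems.append(f"alert {alert.get('name')!r} references unknown channel {ref!r}")
    return problems


# Hypothetical config: "ops-teams" is deliberately left undefined.
SAMPLE = json.loads("""
{
  "instanceName": "prod-monitoring",
  "alerts": [{"name": "High CPU Usage", "notificationChannels": ["ops-pagerduty", "ops-teams"]}],
  "notificationChannels": [{"name": "ops-pagerduty", "type": "pagerduty"}]
}
""")

if __name__ == "__main__":
    for problem in validate_config(SAMPLE):
        print(problem)  # alert 'High CPU Usage' references unknown channel 'ops-teams'
```

The full sample above passes both checks. Its alerts only reference `ops-pagerduty` and `ops-slack`, and both are defined.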

How This Tool Works

The builder collects alert name, severity, PromQL condition (metric, comparison, threshold, for-duration), notification channels, and grouping rules. It validates the PromQL syntax and emits YAML matching the IBM Cloud Monitoring Alert API schema. Notification channels are referenced by ID and assumed to be pre-configured in the Monitoring instance.
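The next sketch shows how a metric condition (metric, comparison, threshold, for-duration) could be turned into a PromQL expression plus a separate `for` value. It makes several assumptions. Dotted metric and label names are converted by a hypothetical dot-to-underscore rule, and the names your instance actually exposes to PromQL may differ. Only simple `label = 'value'` scopes are handled. The output is a plain dict that uses Prometheus alerting-rule field names (`alert`, `expr`, `for`, `labels`). It is not the IBM Cloud Monitoring Alert API schema.

```python
import re

# Only simple equality scopes such as: kubernetes.cluster.name = 'prod-iks-cluster'
SCOPE_EQ = re.compile(r"^\s*([\w.]+)\s*=\s*'([^']*)'\s*$")


def to_prom_name(dotted: str) -> str:
    """Hypothetical name mapping: 'cpu.used.percent' -> 'cpu_used_percent'.

    The metric and label names your instance actually exposes to PromQL may differ.
    """
    return dotted.replace(".", "_")


def to_alert_rule(alert: dict) -> dict:
    """Turn a builder 'metric' alert into a dict shaped like a Prometheus alerting rule."""
    cond = alert["condition"]
    selector = ""
    if "scope" in cond:
        match = SCOPE_EQ.match(cond["scope"])
        if match is None:
            raise ValueError(f"unsupported scope: {cond['scope']!r}")
        label, value = match.groups()
        selector = f'{{{to_prom_name(label)}="{value}"}}'
    expr = (
        f"{cond['aggregation']}({to_prom_name(cond['metric'])}{selector}) "
        f"{cond['operator']} {cond['threshold']}"
    )
    # 'for' carries the sustained-duration requirement; it is not part of the PromQL expression.
    return {
        "alert": alert["name"],
        "expr": expr,
        "for": cond["duration"],
        "labels": {"severity": alert["severity"]},
    }
```

For the "High CPU Usage" alert, this produces `avg(cpu_used_percent{kubernetes_cluster_name="prod-iks-cluster"}) > 85` with `for: 5m`. Because `avg(...)` has no `by (...)` clause, it averages across every matching series, so the alert fires on the cluster-wide average rather than per host.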

Overview

IBM Cloud Monitoring (powered by Sysdig) collects metrics from IBM Cloud resources and your applications, with PromQL-compatible querying and alerting. The IBM Cloud Monitoring Alert Builder generates alert definitions with PromQL conditions, severity, notification channels, and silencing rules. Output is YAML-ready for the Monitoring API and includes alert grouping patterns that reduce notification fatigue.

How Engineers Use This

• Building a baseline set of infrastructure alerts (CPU, memory, disk, network) targeted at workload tags rather than specific resources.
• Tuning an alert that fires too often by adjusting the PromQL threshold or extending the for-duration parameter.
• Configuring multi-condition alerts that only fire when several metrics agree (high CPU AND high latency, not either alone).
• Routing different alert severities to different destinations (Slack for warning, PagerDuty for critical) without manual per-alert configuration.
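Severity-based routing, the last use case in the list, can be expressed as a lookup table applied to every alert. The sketch below assumes a hypothetical table that maps the sample's severities to its channel names. Alerts that already list channels keep them, so a per-alert override still wins. This illustrates the idea only; it is not how the builder implements routing.

```python
from typing import Optional

# Hypothetical routing table: severity -> channel names defined in the config's notificationChannels.
DEFAULT_ROUTES = {
    "critical": ["ops-pagerduty"],
    "high": ["ops-pagerduty", "ops-slack"],
    "medium": ["ops-slack"],
}


def apply_routing(alerts: list[dict], routes: Optional[dict] = None) -> list[dict]:
    """Return copies of the alerts with notificationChannels filled in from a severity table.

    Alerts that already list channels keep them, so per-alert overrides still work.
    Severities missing from the table get an empty channel list.
    """
    routes = DEFAULT_ROUTES if routes is None else routes
    routed = []
    for alert in alerts:
        out = dict(alert)  # shallow copy so the caller's alert is not mutated
        if not out.get("notificationChannels"):
            out["notificationChannels"] = list(routes.get(out.get("severity"), []))
        routed.append(out)
    return routed
```

Keeping the table separate from the alert definitions means a change in routing policy (for example, sending `high` only to Slack) touches a single place rather than every alert.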

A Real Example

Your on-call rotation is getting woken up five or six times per week by transient alerts that recover before anyone can respond. You audit the alert rules, find that none of them have `for:` durations or grouping, and use the builder to regenerate them with 5-minute `for:` durations and per-service grouping. The next month, on-call is paged twice, both times for real incidents, and the team can finally focus on fixing the underlying causes.

Tips & Gotchas

TIP

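Use `for:` durations to avoid flapping. (`for:` is part of the Prometheus-style alerting rule, not of the PromQL expression itself. In the configuration above, it corresponds to each condition's `duration`.) An alert that triggers on a single 5-second metric spike is noise; an alert that triggers only after `for: 5m` of sustained breach is signal.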
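To see why a `for:` duration suppresses flapping, consider a simplified model. It assumes one evaluation per minute and follows the Prometheus behaviour in which a rule becomes pending on its first breach and fires only once the condition has held for the full duration. This is an illustration, not IBM Cloud Monitoring's evaluation engine. The CPU series is made up: three one-minute spikes followed by one sustained breach.

```python
def firing_episodes(samples: list[float], threshold: float, for_samples: int) -> int:
    """Count how many separate times an alert would start firing.

    samples: one value per evaluation interval (assumed here to be one per minute).
    for_samples: the for-duration expressed in evaluation intervals; 0 behaves like
    a rule with no `for:` clause, which fires on the first breaching evaluation.
    """
    episodes = 0
    streak = 0  # consecutive breaching evaluations
    firing = False
    for value in samples:
        if value > threshold:
            streak += 1
            # Pending on the first breach; firing once the breach has lasted for_samples intervals.
            if not firing and streak > for_samples:
                firing = True
                episodes += 1
        else:
            streak = 0
            firing = False
    return episodes


# Made-up per-minute CPU %: three short spikes, then eight minutes of sustained load.
CPU = [40, 95, 40, 92, 40, 97, 40] + [90] * 8 + [40]

if __name__ == "__main__":
    print(firing_episodes(CPU, 85, 0))  # 4: every spike pages someone
    print(firing_episodes(CPU, 85, 5))  # 1: only the sustained breach fires
```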

TIP

Set a per-alert `runbook_url` annotation that links to a documented response procedure. An alert page at 3am with no runbook is a recipe for resolution-by-vibes; an alert page with a runbook link gives the responder a known starting point.
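A small lint step can enforce this before rules are deployed. The sketch below assumes the rules are dicts shaped like Prometheus alerting rules, with an optional `annotations` map. The rule names and the runbook URL are hypothetical placeholders.

```python
def missing_runbooks(rules: list[dict]) -> list[str]:
    """Return the names of alerting rules without a non-empty runbook_url annotation."""
    return [
        rule["alert"]
        for rule in rules
        if not rule.get("annotations", {}).get("runbook_url")
    ]


# Hypothetical rules; the runbook URL is a placeholder.
RULES = [
    {"alert": "HighCpuUsage", "annotations": {"runbook_url": "https://runbooks.example.com/high-cpu"}},
    {"alert": "MemoryPressure", "annotations": {"summary": "Memory above 90% for 10m"}},
    {"alert": "PodCrashLooping"},
]

if __name__ == "__main__":
    print(missing_runbooks(RULES))  # ['MemoryPressure', 'PodCrashLooping']
```

Running a check like this in CI makes a missing runbook a failing check instead of something discovered at 3am.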

Questions & Answers

Why use IBM Cloud Monitoring instead of self-managed Prometheus?

IBM Cloud Monitoring runs Sysdig under the hood, with managed Prometheus-compatible storage and querying. You skip the operational cost of running Prometheus at scale (sharding, long-term storage, high availability). The trade-off is pricing based on time series and ingestion. For very large workloads with a very high count of unique time series, self-managed Prometheus may be cheaper. For typical workloads, the managed offering usually comes out ahead once the engineering time needed to operate Prometheus is included.

How does alert grouping work?

Alerts fire as events; a single alert condition can produce many events when fanning out across many resources. Grouping consolidates related events into a single notification (one Slack message for 'high CPU on 5 instances' rather than 5 separate messages). Set grouping based on the natural unit of incident response — usually 'all alerts for service X in 5 minutes' is the right unit.
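The grouping idea can be sketched as bucketing events by (alert, service, time window) and emitting one message per bucket. This minimal illustration uses fixed, non-overlapping 5-minute windows, so two events on either side of a window boundary land in separate notifications. Real notification pipelines typically use more elaborate timing. The `AlertEvent` type and its fields are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertEvent:
    alert: str     # alert name, e.g. "High CPU Usage"
    service: str   # grouping key, e.g. the workload or service the resource belongs to
    resource: str  # the individual host or pod that breached
    ts: int        # event time in seconds


def group_events(events: list[AlertEvent], window_s: int = 300) -> list[str]:
    """Collapse events into one notification per (alert, service, fixed time window)."""
    groups = defaultdict(list)
    for ev in events:
        bucket = ev.ts // window_s  # fixed, non-overlapping windows
        groups[(ev.alert, ev.service, bucket)].append(ev)

    messages = []
    for (alert, service, _bucket), evs in sorted(groups.items(), key=lambda kv: kv[0]):
        resources = sorted({ev.resource for ev in evs})
        messages.append(f"{alert} on {len(resources)} resource(s) in {service}: {', '.join(resources)}")
    return messages


if __name__ == "__main__":
    events = [AlertEvent("High CPU Usage", "checkout", f"host-{i}", 10 + i) for i in range(5)]
    for message in group_events(events):
        print(message)  # High CPU Usage on 5 resource(s) in checkout: host-0, host-1, host-2, host-3, host-4
```

Five per-host events become one message. That is the "one Slack message instead of five" behaviour described above.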
