ServiceNow Alert Grouping by CI and Resource: A Practical Rule Design

Good alert grouping is not “group everything on the same server.” That approach creates quiet failures: real problems that sit unnoticed beneath an unrelated parent alert. The safer pattern is to group by configuration item plus resource plus symptom, then allow related alerts to roll into the same operational story.

Quick answer: Use CI as the anchor, resource as the affected component, and metric or symptom as the grouping discriminator. Group CPU with CPU, disk with disk, service failures with the specific service, and application checks with the monitored endpoint or business service.

The grouping rule that usually works

For infrastructure events, start with this key:

Grouping key = node_ci + resource + metric_family + environment

Examples:

Event	Bad grouping	Better grouping
High CPU on server A	server A	server A + CPU + production
D drive low space on server A	server A	server A + D: + disk + production
SQL service stopped on server A	server A	server A + MSSQLSERVER + service + production
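
The key above can be sketched in a few lines of plain JavaScript. This is an illustration under assumptions, not a ServiceNow API: buildGroupingKey is a hypothetical helper, and the event field names simply mirror the names used in the key. For a CPU alert the resource and the metric family happen to be the same word, which is why "cpu" appears twice in the first key.

```javascript
// Hypothetical helper: builds a composite grouping key from normalized event fields.
// Field names (node_ci, resource, metric_family, environment) mirror the key above.
function buildGroupingKey(event) {
  const parts = [event.node_ci, event.resource, event.metric_family, event.environment];
  // Refuse to build a key from incomplete data rather than silently grouping on less.
  const incomplete = parts.some(
    (p) => p === undefined || p === null || String(p).trim() === ""
  );
  if (incomplete) {
    return null;
  }
  // Lowercase and trim so "D:" and "d: " do not produce two different groups.
  return parts.map((p) => String(p).trim().toLowerCase()).join("|");
}

const cpu = buildGroupingKey({
  node_ci: "server-a", resource: "CPU", metric_family: "cpu", environment: "production",
});
const disk = buildGroupingKey({
  node_ci: "server-a", resource: "D:", metric_family: "disk", environment: "production",
});

console.log(cpu);  // server-a|cpu|cpu|production
console.log(disk); // server-a|d:|disk|production
```

Returning null for an incomplete event is a design choice in this sketch: an event without a resource or environment is better sent to review than grouped on the CI alone.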

Why CI-only grouping causes pain

CI-only grouping looks clean during demos because the incident count drops fast. In production it creates a different problem: unrelated symptoms get buried under a parent alert. A server can have high CPU, a dead service, and low disk at the same time. Those may share a root cause, but they still need separate evidence until someone confirms the relationship.
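
A small comparison makes the burial effect concrete. The sketch below uses made-up events and a generic groupBy helper (both are assumptions for illustration) to count how many groups each strategy produces for the same four alerts.

```javascript
// Illustrative events; names and values are made up for this sketch.
const events = [
  { node_ci: "server-a", resource: "CPU", metric_family: "cpu", environment: "production" },
  { node_ci: "server-a", resource: "MSSQLSERVER", metric_family: "service", environment: "production" },
  { node_ci: "server-a", resource: "D:", metric_family: "disk", environment: "production" },
  { node_ci: "server-a", resource: "D:", metric_family: "disk", environment: "production" },
];

// Groups items by whatever key function is supplied.
function groupBy(items, keyFn) {
  const groups = new Map();
  for (const item of items) {
    const key = keyFn(item);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(item);
  }
  return groups;
}

const byCi = groupBy(events, (e) => e.node_ci);
const byComposite = groupBy(events, (e) =>
  [e.node_ci, e.resource, e.metric_family, e.environment].join("|")
);

console.log(byCi.size);        // 1: three different symptoms share one parent
console.log(byComposite.size); // 3: only the repeated disk alert is merged
```

The CI-only count looks better on a dashboard, but the single group mixes CPU, service, and disk evidence. The composite key still removes the true duplicate while keeping each symptom separately visible.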

Fields to normalize before grouping

CI: The actual operational CI, not a generic “other” record.
Resource: Disk letter, mount point, interface name, process, service, URL, database, queue, or synthetic check.
Metric family: CPU, memory, disk, availability, latency, errors, process, certificate, job, backup.
Environment: Production, DR, QA, dev, lab.
Source: SolarWinds, Dynatrace, Zabbix, Splunk, Azure Monitor, etc.
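
Normalization can be as simple as lookup tables applied before the grouping key is built. The following is a minimal sketch, assuming raw events arrive with node, resource, metric, environment, and source fields. The lookup entries and the normalizeEvent function are hypothetical; real monitoring sources use their own metric and environment names.

```javascript
// Hypothetical lookup tables mapping raw source values to normalized ones.
const METRIC_FAMILIES = new Map([
  ["cpu_load", "cpu"],
  ["disk_free", "disk"],
  ["service_status", "availability"],
]);
const ENVIRONMENTS = new Map([
  ["prod", "production"],
  ["production", "production"],
  ["dr", "dr"],
  ["qa", "qa"],
  ["dev", "dev"],
  ["lab", "lab"],
]);

// Converts missing values to an empty string and trims whitespace.
function clean(value) {
  return value === undefined || value === null ? "" : String(value).trim();
}

function normalizeEvent(raw) {
  return {
    node_ci: clean(raw.node).toLowerCase(),
    resource: clean(raw.resource),
    // Unknown values are flagged rather than guessed, so they surface in review.
    metric_family: METRIC_FAMILIES.get(clean(raw.metric).toLowerCase()) || "unknown",
    environment: ENVIRONMENTS.get(clean(raw.environment).toLowerCase()) || "unknown",
    source: clean(raw.source),
  };
}

const normalized = normalizeEvent({
  node: " SERVER-A ", resource: "D:", metric: "Disk_Free", environment: "PROD", source: "SolarWinds",
});
console.log(normalized.metric_family, normalized.environment); // disk production
```

Mapping unrecognized values to "unknown" instead of dropping them keeps the gaps countable, which helps when deciding which source to clean up first.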

Routing rule example

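// Sketch of assignment group fallback in plain JavaScript.
// The ci object and its field names are illustrative, not a specific table schema.
function resolveAssignmentGroup(ci) {
  // Prefer the CI's support group, then its change group.
  if (ci && ci.support_group) {
    return ci.support_group;
  }
  if (ci && ci.change_group) {
    return ci.change_group;
  }
  // Deterministic fallback when the CMDB has no ownership data.
  return "SysOps";
}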

This gives operations a deterministic route even when the CMDB is incomplete. Treat every alert that lands in the fallback group as evidence of missing CMDB ownership and report on it as a data quality issue, so alerts do not quietly die in a generic queue.

Suppression versus grouping

Grouping keeps related alerts visible under a shared operational context. Suppression hides or delays alerts that are not useful during a known condition. Do not use suppression to compensate for bad grouping. Suppress planned maintenance, flapping non-production checks, and duplicate raw events. Group actionable symptoms that still need visibility.
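
The distinction can be expressed as a small decision function. This is a sketch under assumptions: classifyEvent and the flags in_maintenance, is_flapping, and is_duplicate are hypothetical names for conditions your pipeline would have to detect on its own.

```javascript
// Hypothetical flags on the event: in_maintenance, is_flapping, is_duplicate.
function classifyEvent(event) {
  // Planned maintenance: the condition is known, so the alert adds no information.
  if (event.in_maintenance) return "suppress";
  // Flapping checks are suppressed only outside production.
  if (event.is_flapping && event.environment !== "production") return "suppress";
  // Duplicate raw events repeat an alert that is already open.
  if (event.is_duplicate) return "suppress";
  // Everything else is an actionable symptom and stays visible in a group.
  return "group";
}

console.log(classifyEvent({ in_maintenance: true, environment: "production" })); // suppress
console.log(classifyEvent({ is_flapping: true, environment: "qa" }));            // suppress
console.log(classifyEvent({ is_flapping: true, environment: "production" }));    // group
```

Note that a flapping production check falls through to grouping here. That follows the rule above, which limits flapping suppression to non-production checks.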

Validation checklist

Pull the top 25 alert patterns from the last 30 days.
Confirm every pattern has a CI, resource, source, severity, and environment.
Test grouping against historical events before enabling incident creation.
Manually review examples where multiple alerts hit the same CI.
Measure incident count reduction and complaints about missed high-severity alerts after rollout.
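
The field completeness check in the second step is easy to automate against an export of alert patterns. The sketch below assumes each pattern is a plain object with a name plus the five fields from the checklist. The sample data and the findIncompletePatterns helper are hypothetical.

```javascript
// Fields every alert pattern should carry, taken from the checklist above.
const REQUIRED_FIELDS = ["ci", "resource", "source", "severity", "environment"];

// Returns one entry per pattern that is missing at least one required field.
function findIncompletePatterns(patterns) {
  return patterns
    .map((p) => ({
      name: p.name,
      missing: REQUIRED_FIELDS.filter(
        (f) => p[f] === undefined || p[f] === null || p[f] === ""
      ),
    }))
    .filter((result) => result.missing.length > 0);
}

// Made-up sample patterns for illustration.
const patterns = [
  { name: "High CPU", ci: "server-a", resource: "CPU", source: "SolarWinds", severity: 2, environment: "production" },
  { name: "Disk low space", ci: "server-a", resource: "", source: "SolarWinds", severity: 3, environment: "production" },
];

const incomplete = findIncompletePatterns(patterns);
console.log(JSON.stringify(incomplete)); // [{"name":"Disk low space","missing":["resource"]}]
```

Running a check like this before enabling incident creation shows which patterns would otherwise be grouped on partial keys.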

Operator rule: If the child alerts do not tell the same story, they do not belong in the same group.

About the author

Jason Purvis works in enterprise monitoring and IT operations, with hands-on experience across ServiceNow ITOM/Event Management, SolarWinds-style infrastructure monitoring, Microsoft 365 operations, alert routing, and incident process improvement.