Alerts (On-Call)

Alerts groups deliver urgent, actionable notifications to on-call personnel. They’re designed for small membership, strict allowlisting, and zero noise.

prv-{owner}-alerts-{system}[-{scope}][-{env}]@{domain}
  • Small membership. Alerts go to on-call personnel, not the whole team. Keep membership at 10 or fewer.
  • Allowlist only. Only known monitoring systems can send. Reject everything else.
  • No human senders. If a human needs to notify on-call, they use a different channel (Slack, PagerDuty).
  • Subject prefixes. Every alerts group should have a prefix for downstream filtering.
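As a minimal sketch of the allowlist and subject-prefix principles above, the snippet below accepts a message only when its From address is on an allowlist and prepends a prefix for downstream filtering. The address `alerts@monitoring.example.com` and the function names are hypothetical, and a real mail platform would enforce this in its own group settings rather than in code like this.

```python
from email.message import EmailMessage
from email.utils import parseaddr

# Hypothetical allowlist; real sender addresses come from your monitoring systems.
ALLOWLIST = {"alerts@monitoring.example.com"}


def accept(msg: EmailMessage, allowlist=ALLOWLIST) -> bool:
    """Return True only when the From address is on the allowlist."""
    _, addr = parseaddr(str(msg.get("From", "")))
    return addr.lower() in allowlist


def with_prefix(subject: str, prefix: str) -> str:
    """Prepend the group's subject prefix unless it is already present."""
    return subject if subject.startswith(prefix) else f"{prefix} {subject}"
```

Anything that fails `accept` would be rejected outright, which also covers the "no human senders" rule as long as no personal addresses are added to the allowlist.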
| Email | Purpose | Owner |
| --- | --- | --- |
| prv-plt-alerts-aws-prd | AWS production alerts | Platform |
| prv-plt-alerts-wks-admin | Workspace admin alerts | Platform |
| prv-sec-alerts-gl-security | GitLab security events | Security |
| prv-plt-alerts-tf-prd | Terraform plan/apply failures | Platform |
| prv-org-auto-alerts | Global automation failure alerts | Platform |
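The naming pattern can be checked mechanically. The sketch below parses the local part of an address into owner, system, scope, and env. Two things are assumptions here: the set of environment tokens (only `prd` appears in the examples; `stg` and `dev` are guesses) and the rule that a trailing token counts as an env only if it is in that set. Note that `prv-org-auto-alerts` does not follow the pattern as written, so this sketch reports it as non-matching; it would need to be handled as a named exception.

```python
import re

# Assumption: only "prd" appears in the examples; "stg" and "dev" are guesses.
KNOWN_ENVS = {"prd", "stg", "dev"}

NAME_RE = re.compile(
    r"^prv-(?P<owner>[a-z0-9]+)-alerts-(?P<system>[a-z0-9]+)"
    r"(?P<rest>(?:-[a-z0-9]+){0,2})$"
)


def parse_alerts_name(address):
    """Parse prv-{owner}-alerts-{system}[-{scope}][-{env}]; return None on mismatch."""
    local_part = address.split("@", 1)[0]
    m = NAME_RE.match(local_part)
    if m is None:
        return None
    extras = [t for t in m.group("rest").split("-") if t]
    # A trailing token is treated as the env only if it is a known env name.
    env = extras.pop() if extras and extras[-1] in KNOWN_ENVS else None
    if len(extras) > 1:
        return None  # two trailing tokens, but the last one is not a known env
    scope = extras[0] if extras else None
    return {
        "owner": m.group("owner"),
        "system": m.group("system"),
        "scope": scope,
        "env": env,
    }
```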
  • Who can post: Anyone + allowlist only (reject non-allowlisted senders)
  • Members: Small on-call set (ideally <= 10 humans)
  • External posting: ON (monitoring systems are often external)
  • External members: ON (for allowlisted notifier systems)
  • Archive: ON (audit trail)
  • Security label: OFF
  • Subject prefix: Required (e.g., [AWS-PRD], [GL-SEC], [TF-FAIL])
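The settings above lend themselves to a simple lint. This is a sketch only: the `AlertsGroupSettings` field names are hypothetical and do not correspond to any particular mail platform's API. The idea is to capture the expected values once and flag any group that drifts from them.

```python
from dataclasses import dataclass, field

MAX_MEMBERS = 10  # from the membership guidance in this document


@dataclass
class AlertsGroupSettings:
    # Hypothetical field names, chosen to mirror the settings list above.
    email: str
    members: list = field(default_factory=list)
    allowlist_only: bool = True
    external_posting: bool = True
    archive: bool = True
    security_label: bool = False
    subject_prefix: str = ""


def lint(settings: AlertsGroupSettings) -> list:
    """Return a list of human-readable problems; an empty list means compliant."""
    problems = []
    if not settings.allowlist_only:
        problems.append("posting must be allowlist-only")
    if len(settings.members) > MAX_MEMBERS:
        problems.append(f"{len(settings.members)} members exceeds {MAX_MEMBERS}")
    if not settings.external_posting:
        problems.append("external posting should be ON")
    if not settings.archive:
        problems.append("archive should be ON")
    if settings.security_label:
        problems.append("security label should be OFF")
    if not settings.subject_prefix:
        problems.append("subject prefix is required")
    return problems
```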

Keep alerts group membership at 10 or fewer. If you need more recipients:

  • Use rotation aliases (week-on, week-off)
  • Fan out through an Infra router to multiple focused alerts groups
  • Don’t inflate a single alerts group to cover everyone
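One way to read "rotation aliases" is an alias that resolves to whoever is on call in the current week. The sketch below illustrates that idea under stated assumptions: rotations change on Mondays, the anchor date is arbitrary, and the names are placeholders. How the alias is actually updated (scheduler, directory sync, manual change) is outside what this document specifies.

```python
from datetime import date

# Assumption: rotations change on Mondays; 2024-01-01 was a Monday.
ANCHOR_MONDAY = date(2024, 1, 1)


def on_call(rotation, day):
    """Return the rotation member for the week containing `day`."""
    weeks_elapsed = (day - ANCHOR_MONDAY).days // 7
    return rotation[weeks_elapsed % len(rotation)]
```

Counting weeks from a fixed anchor, rather than using the ISO week number, avoids two consecutive weeks landing on the same person at a year boundary.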
AWS CloudWatch → prv-plt-alerts-aws-prd → on-call engineer
System A ──┐
System B ──┤→ Infra (router) → Alerts (on-call)
System C ──┘

Via Infra Classifier (One Source, Many Topics)

GitLab → Infra (classifier) ─┬→ prv-sec-alerts-gl-security
                             ├→ prv-eng-alerts-gl-deploy
                             └→ prv-plt-alerts-gl-infra
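The classifier pattern can be sketched as an ordered rule table that maps each incoming message to one alerts group. The group names come from the diagram; the keyword rules are hypothetical, since this document does not say how the Infra classifier decides.

```python
# Hypothetical keyword rules; first match wins. Group names are from the diagram.
RULES = [
    ("security", "prv-sec-alerts-gl-security"),
    ("deploy", "prv-eng-alerts-gl-deploy"),
    ("infra", "prv-plt-alerts-gl-infra"),
]


def classify(subject: str):
    """Return the target alerts group for a subject line, or None if no rule matches."""
    lowered = subject.lower()
    for keyword, group in RULES:
        if keyword in lowered:
            return group
    return None  # unclassified: hold for review rather than guess a group
```

Returning `None` instead of a default group keeps unclassified messages out of on-call inboxes, in line with the zero-noise goal.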

Creation Checklist

  1. Identify the monitoring system(s) and sender addresses.
  2. Set email/name/description.
  3. Labels: Mailing=ON, Security=OFF.
  4. Set allowlist-only posting (reject non-allowlisted).
  5. Add on-call members (keep small).
  6. Add subject prefix.
  7. Send a test alert to verify delivery.
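For the final step, a test alert can be built with the standard library. The sketch below only constructs the message. The sender address, the `example.com` domain, and the subject wording are placeholders, and the sender must already be on the group's allowlist for the test to be meaningful.

```python
from email.message import EmailMessage


def build_test_alert(sender: str, group_address: str, prefix: str) -> EmailMessage:
    """Build a clearly labelled synthetic alert for delivery verification."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = group_address
    msg["Subject"] = f"{prefix} SYNTHETIC TEST - please ignore"
    msg.set_content("Synthetic alert to verify the delivery chain. No action needed.")
    return msg


# Placeholder addresses for illustration only.
test_msg = build_test_alert(
    "alerts@monitoring.example.com",
    "prv-plt-alerts-aws-prd@example.com",
    "[AWS-PRD]",
)
```

Sending is left out on purpose. With an SMTP relay available, `smtplib.SMTP(host).send_message(test_msg)` would deliver it; the relay host depends on your environment.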

Maintenance

  • Monthly: synthetic test alert to verify delivery chain.
  • Quarterly: review membership (still correct on-call rotation?), allowlist (new/removed systems?).
  • Monitor: rejected messages (might indicate a new alerting system not yet allowlisted).
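Monitoring rejected messages can be as simple as counting rejections per sender. In the sketch below, the event shape (`sender` and `action` keys) and the `min_count` threshold are assumptions, since log formats differ by mail platform. The reasoning is that a sender rejected repeatedly is more likely a monitoring system missing from the allowlist than one-off spam.

```python
from collections import Counter


def rejected_senders(events, min_count=3):
    """Return (sender, count) pairs for senders rejected at least `min_count` times."""
    counts = Counter(e["sender"] for e in events if e["action"] == "rejected")
    return [(sender, n) for sender, n in counts.most_common() if n >= min_count]
```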

Decommissioning

  • Confirm no active monitoring routes to this group.
  • Remove from any Infra router downstream lists.
  • Export archive. Delete after hold.

Anti-Patterns

  • Alerts group with > 10 members (noise → alert fatigue → missed incidents)
  • Human senders posting to alerts groups
  • Missing allowlist (spam drowns real alerts)
  • Mixing alert urgencies in one group (separate by system/severity)
  • Alerts group used on ACLs (alerts groups are for notification, not for granting access)
| Metric | Target |
| --- | --- |
| Alert delivery latency | < 2 minutes |
| False positive rate | < 10% |
| Membership size | <= 10 |
| Monthly synthetic test | Pass |
| Rejected messages (allowlist gap) | Investigated within 1 biz day |
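The numeric targets can be checked with a few lines of code. This sketch assumes the latency target applies to the worst observed delivery in the period (the document does not say whether it is a maximum, mean, or percentile) and that false positives are counted against all alerts delivered. The function and parameter names are illustrative.

```python
# Targets taken from the metrics listed above.
MAX_LATENCY_S = 120
MAX_FALSE_POSITIVE_RATE = 0.10
MAX_MEMBERS = 10


def check_metrics(latencies_s, alerts_total, alerts_false, members):
    """Compare observed values against the targets; True means the target is met."""
    worst_latency = max(latencies_s) if latencies_s else 0
    fp_rate = alerts_false / alerts_total if alerts_total else 0.0
    return {
        "latency_ok": worst_latency < MAX_LATENCY_S,
        "false_positive_ok": fp_rate < MAX_FALSE_POSITIVE_RATE,
        "membership_ok": members <= MAX_MEMBERS,
    }
```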