Alerting for Cirata Symphony

Cirata Symphony does not include a built-in alerting engine. Instead, the Observability Extension collects OpenTelemetry Protocol (OTLP) telemetry (metrics, logs, and traces) and exports it to external backends such as Prometheus, Grafana, and InfluxDB. Standard alerting tools then evaluate rules against the exported metrics.

This approach gives operators full control over alerting rules, notification channels, and escalation policies using industry-standard tools.

How Metrics Flow to Alerting

  1. The platform and extensions produce OTLP telemetry (metrics, logs, traces)
  2. The Observability Extension collects this data over NATS on a 30-second interval
  3. The Observability Extension pushes metrics to configured OTLP backends
  4. Standard alerting tools (Alertmanager, Grafana Alerting) evaluate rules against the exported metrics

Setting Up Alerting

Prerequisites

  • The Observability Extension deployed and connected to Symphony
  • OTLP export configured in the Observability Extension to push to your metrics backend
  • An alerting tool such as Prometheus Alertmanager or Grafana Alerting

Configuring OTLP Export

The Observability Extension supports exporting collected telemetry to OTLP-compatible backends. Configure the export endpoint in the extension's environment or properties file:

# Push metrics to Prometheus's native OTLP receiver
# (disabled by default; Prometheus 3.x enables it with --web.enable-otlp-receiver)
observability.export.metrics.endpoint=http://prometheus:9090/api/v1/otlp/v1/metrics
observability.export.metrics.protocol=http/protobuf

# Push traces to Jaeger or Tempo
observability.export.traces.endpoint=http://tempo:4318/v1/traces
observability.export.traces.protocol=http/protobuf

Once export is configured, all platform and extension metrics are pushed automatically. No per-extension scrape configuration is required.
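
The example rules in the next section filter on service_namespace and service_name labels. Prometheus's OTLP receiver does not copy OTLP resource attributes onto every series by default, so those labels may be missing. The following prometheus.yml excerpt is a sketch that promotes them; it assumes Prometheus 3.x with the OTLP receiver enabled, and that Symphony and its extensions actually set the service.name and service.namespace resource attributes.

```yaml
# prometheus.yml (excerpt). Sketch only: assumes Prometheus 3.x with the OTLP
# receiver enabled, and that Symphony sets these OTLP resource attributes.
otlp:
  # Copy these resource attributes onto each pushed series as labels.
  # With the default name translation, dots become underscores
  # (service.namespace -> service_namespace).
  promote_resource_attributes:
    - service.name
    - service.namespace
```

If your backend already exposes these attributes under other label names, adjust the rule selectors instead.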

Example Alerting Rules

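Create a Prometheus alerting rules file (for example, symphony-alerts.yml) using the metric names emitted by the platform and extensions. The expressions below are examples. Prometheus generates the up series only for targets it scrapes, so the SymphonyDown and ExtensionDisconnected rules assume Prometheus also scrapes Symphony and the extension; with push-only OTLP export, base these checks on a metric the component actually pushes.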

groups:
  - name: symphony
    rules:
      - alert: SymphonyDown
        expr: up{job="symphony"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Symphony instance is unreachable"

      - alert: HighHTTPErrorRate
        # Aggregate both sides so the division does not depend on the two
        # counters carrying identical label sets.
        expr: |
          sum(rate(http_responses_5xx_total{service_namespace="symphony"}[5m]))
            / sum(rate(http_requests_total{service_namespace="symphony"}[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High HTTP error rate on Symphony"

      - alert: ExtensionDisconnected
        expr: absent(up{service_name="myextension"})
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Extension myextension is not reporting metrics"
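
Prometheus evaluates only the rule files it is told to load, and delivers firing alerts only to the Alertmanager instances listed in its configuration. A minimal prometheus.yml excerpt that wires both together is sketched below; the alertmanager:9093 address is an assumption (9093 is Alertmanager's default port), and the 30s evaluation interval is one reasonable choice rather than a requirement.

```yaml
# prometheus.yml (excerpt)
global:
  # How often alerting rules are evaluated. Keep this well below the
  # rules' "for" durations (2m and 5m in symphony-alerts.yml).
  evaluation_interval: 30s

rule_files:
  - symphony-alerts.yml

alerting:
  alertmanagers:
    - static_configs:
        # Assumed host name; replace with your Alertmanager address.
        - targets: ["alertmanager:9093"]
```

Before reloading Prometheus, you can validate the rules file with promtool check rules symphony-alerts.yml.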

Alertmanager Configuration

Configure Alertmanager to route alerts to your preferred notification channels:

route:
  receiver: "default"
  group_by: ["alertname", "instance"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h

  routes:
    - matchers:
        - severity="critical"
      receiver: "pagerduty"
    - matchers:
        - severity="warning"
      receiver: "slack"

receivers:
  - name: "default"
    email_configs:
      - to: "ops@example.com"
        from: "alertmanager@example.com"
        smarthost: "smtp.example.com:587"

  - name: "slack"
    slack_configs:
      - api_url: "https://hooks.slack.com/services/T.../B.../..."
        channel: "#symphony-alerts"
        title: '{{ .GroupLabels.alertname }}'
        text: '{{ .CommonAnnotations.summary }}'

  - name: "pagerduty"
    pagerduty_configs:
      - service_key: "<your-pagerduty-key>"
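
When a Symphony instance goes down, warning-level alerts for the same instance can fire alongside the critical one and add noise. One way to suppress them is an inhibition rule, sketched below as an addition to the same Alertmanager file. Matching on the instance label is an assumption about which label identifies a Symphony node in your setup.

```yaml
# alertmanager.yml (excerpt): top-level key, alongside route and receivers.
inhibit_rules:
  # While a critical alert is firing, mute warning-level alerts that share
  # the same instance label (for example, HighHTTPErrorRate during SymphonyDown).
  - source_matchers:
      - severity="critical"
    target_matchers:
      - severity="warning"
    equal: ["instance"]
```

You can check the combined file with amtool check-config alertmanager.yml before reloading Alertmanager.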

Recommended Alerts

| Alert | Condition | Severity | Purpose |
| --- | --- | --- | --- |
| SymphonyDown | Health endpoint unreachable for 2m | Critical | Instance health |
| HighHTTPErrorRate | >5% 5xx errors for 5m | Warning | API health |
| ExtensionDisconnected | Extension not reporting for 5m | Warning | Extension health |
| HighMemoryUsage | JVM/Go heap above threshold | Warning | Capacity planning |
| LicenseNearLimit | >90% license units consumed | Warning | License capacity |
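
The table includes two alerts, HighMemoryUsage and LicenseNearLimit, that the example rules file does not define. The sketch below shows how they could be expressed as a separate rules file. Every metric name in it is a hypothetical placeholder (Symphony's actual metric names are not listed here), and the 90% heap threshold and 10m durations are illustrative values only.

```yaml
# symphony-capacity-alerts.yml: hypothetical sketch.
# All metric names below are placeholders; replace them with the names your
# platform and extensions actually emit, and list this file under rule_files.
groups:
  - name: symphony-capacity
    rules:
      - alert: HighMemoryUsage
        # Hypothetical metrics: current heap usage and the configured maximum.
        expr: |
          sum(symphony_heap_used_bytes) / sum(symphony_heap_max_bytes) > 0.9
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Heap usage above 90% of the configured maximum"

      - alert: LicenseNearLimit
        # Hypothetical metrics: license units consumed and units available.
        expr: |
          sum(symphony_license_units_used) / sum(symphony_license_units_total) > 0.9
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "More than 90% of license units consumed"
```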

Metric names depend on what the platform and your extensions emit. See Observability for details on how extensions instrument metrics, and Monitoring for the platform's built-in metrics.

See Also

  • Monitoring—Platform telemetry and the Observability Extension
  • Observability—Instrumenting extensions with metrics, logs, and traces
  • Operations—Health checks and troubleshooting