Alerting for Cirata Symphony
Cirata Symphony does not include a built-in alerting engine. Instead, it emits OTLP (OpenTelemetry Protocol) telemetry (metrics, logs, and traces) through the Observability Extension, which can export data to external backends such as Prometheus, Grafana, and InfluxDB. Standard alerting tools then evaluate rules against the exported metrics.
This approach gives operators full control over alerting rules, notification channels, and escalation policies using industry-standard tools.
How Metrics Flow to Alerting
- The platform and extensions produce OTLP telemetry (metrics, logs, traces)
- The Observability Extension collects this data over NATS on a 30-second interval
- The Observability Extension pushes metrics to configured OTLP backends
- Standard alerting tools (Alertmanager, Grafana Alerting) evaluate rules against the exported metrics
Setting Up Alerting
Prerequisites
- The Observability Extension deployed and connected to Symphony
- OTLP export configured in the Observability Extension to push to your metrics backend
- An alerting tool such as Prometheus Alertmanager or Grafana Alerting
Configuring OTLP Export
The Observability Extension supports exporting collected telemetry to OTLP-compatible backends. Configure the export endpoint in the extension's environment or properties file:
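# Push metrics to Prometheus's OTLP receiver (this is not remote write; the receiver must be enabled in Prometheus)
# Prometheus serves OTLP metrics at /api/v1/otlp/v1/metrics; append /v1/metrics if the extension does not add the signal path itself
observability.export.metrics.endpoint=http://prometheus:9090/api/v1/otlp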
observability.export.metrics.protocol=http/protobuf
# Push traces to Jaeger or Tempo
observability.export.traces.endpoint=http://tempo:4318/v1/traces
observability.export.traces.protocol=http/protobuf
Once export is configured, all platform and extension metrics are pushed automatically. No per-extension scrape configuration is required.
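Prometheus does not accept OTLP pushes by default, so the receiving side needs a small amount of configuration as well. The fragment below is a sketch, assuming Prometheus 3.x started with the `--web.enable-otlp-receiver` flag (2.x releases used `--enable-feature=otlp-write-receiver`). The list of promoted attributes is an assumption about which OTLP resource attributes you want as labels. Without promotion, attributes such as `service.namespace` are attached to the `target_info` series rather than to each metric, and label selectors on them will not match.

```yaml
# prometheus.yml (receiving side). Assumes Prometheus was started with:
#   --web.enable-otlp-receiver
otlp:
  # Copy these OTLP resource attributes onto every ingested series as labels.
  # With the default translation, dots become underscores
  # (service.namespace -> service_namespace).
  promote_resource_attributes:
    - service.name
    - service.namespace
    - service.instance.id
```

Check the attribute names against what the Observability Extension actually sets on its resources before relying on them in alert expressions.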
Example Alerting Rules
Create a Prometheus alerting rules file (e.g., symphony-alerts.yml) using the metric names emitted by the platform and extensions:
groups:
  - name: symphony
    rules:
      # Prometheus generates `up` only for targets it scrapes. If Symphony is push-only
      # (OTLP export, no scrape job), base this alert on a metric the platform emits instead.
      - alert: SymphonyDown
        expr: up{job="symphony"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Symphony instance is unreachable"
      - alert: HighHTTPErrorRate
        expr: |
          rate(http_responses_5xx_total{service_namespace="symphony"}[5m])
            / rate(http_requests_total{service_namespace="symphony"}[5m]) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High HTTP error rate on Symphony"
      # The same caveat about `up` applies here.
      - alert: ExtensionDisconnected
        expr: absent(up{service_name="myextension"})
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Extension myextension is not reporting metrics"
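Alert rules can be exercised before deployment with Prometheus's rule unit tests. The following is a minimal sketch of a test for HighHTTPErrorRate, assuming the rules above are saved as `symphony-alerts.yml` in the same directory. The test file name (`symphony-alerts-test.yml`) and the synthetic series values are illustrative only: requests grow by 100 per minute and 5xx responses by 10 per minute, giving a steady 10% error ratio.

```yaml
# symphony-alerts-test.yml (hypothetical file name)
# Run with: promtool test rules symphony-alerts-test.yml
rule_files:
  - symphony-alerts.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Counter starting at 0, +100 per minute, for 15 minutes.
      - series: 'http_requests_total{service_namespace="symphony"}'
        values: '0+100x15'
      # Counter starting at 0, +10 per minute: a 10% error ratio.
      - series: 'http_responses_5xx_total{service_namespace="symphony"}'
        values: '0+10x15'
    alert_rule_test:
      # The ratio exceeds 0.05 from the first minute, so the 5m "for"
      # window has elapsed well before the 12-minute mark.
      - eval_time: 12m
        alertname: HighHTTPErrorRate
        exp_alerts:
          - exp_labels:
              severity: warning
              service_namespace: symphony
            exp_annotations:
              summary: "High HTTP error rate on Symphony"
```

`promtool check rules symphony-alerts.yml` is a quicker syntax-only check when a full test is not needed.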
Alertmanager Configuration
Configure Alertmanager to route alerts to your preferred notification channels:
route:
  receiver: "default"
  group_by: ["alertname", "instance"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - match:
        severity: critical
      receiver: "pagerduty"
    - match:
        severity: warning
      receiver: "slack"
receivers:
  - name: "default"
    email_configs:
      - to: "ops@example.com"
        from: "alertmanager@example.com"
        smarthost: "smtp.example.com:587"
  - name: "slack"
    slack_configs:
      - api_url: "https://hooks.slack.com/services/T.../B.../..."
        channel: "#symphony-alerts"
        title: '{{ .GroupLabels.alertname }}'
        text: '{{ .CommonAnnotations.summary }}'
  - name: "pagerduty"
    pagerduty_configs:
      - service_key: "<your-pagerduty-key>"
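For alerts to reach this Alertmanager, Prometheus also has to load the rules file and know where Alertmanager is listening. A minimal sketch of the relevant `prometheus.yml` sections follows. The host name `alertmanager` is an assumption about your deployment; 9093 is Alertmanager's default port.

```yaml
# prometheus.yml: load the alerting rules and point Prometheus at Alertmanager
rule_files:
  - symphony-alerts.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093   # hypothetical host name; adjust to your environment
```

The Alertmanager file itself can be validated with `amtool check-config alertmanager.yml` before reloading.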
Recommended Alerts
| Alert | Condition | Severity | Purpose |
|---|---|---|---|
| SymphonyDown | Health endpoint unreachable for 2m | Critical | Instance health |
| HighHTTPErrorRate | >5% 5xx errors for 5m | Warning | API health |
| ExtensionDisconnected | Extension not reporting for 5m | Warning | Extension health |
| HighMemoryUsage | JVM/Go heap above threshold | Warning | Capacity planning |
| LicenseNearLimit | >90% license units consumed | Warning | License capacity |
Metric names depend on what the platform and your extensions emit. See Observability for details on how extensions instrument metrics, and Monitoring for the platform's built-in metrics.
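The HighMemoryUsage and LicenseNearLimit rows in the table have no corresponding rule in the examples above. The sketch below shows one possible shape for them. Every metric name in it is an assumption: the heap metrics follow the OpenTelemetry JVM naming as it typically appears in Prometheus, and the license metrics are purely hypothetical placeholders, since this page does not document a license metric. The 85% heap threshold and the `for` durations are likewise illustrative.

```yaml
groups:
  - name: symphony-capacity
    rules:
      - alert: HighMemoryUsage
        # Assumed metric names (OpenTelemetry JVM conventions); substitute
        # whatever your platform and extensions actually emit.
        expr: |
          sum by (service_name) (jvm_memory_used_bytes{jvm_memory_type="heap"})
            / sum by (service_name) (jvm_memory_limit_bytes{jvm_memory_type="heap"}) > 0.85
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Heap usage above 85% on {{ $labels.service_name }}"
      - alert: LicenseNearLimit
        # Hypothetical metric names; replace with the real license metrics.
        expr: symphony_license_units_used / symphony_license_units_total > 0.9
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "More than 90% of license units consumed"
```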
See Also
- Monitoring—Platform telemetry and the Observability Extension
- Observability—Instrumenting extensions with metrics, logs, and traces
- Operations—Health checks and troubleshooting