ODC SRE Weekly Ops Dashboard

OutSystems Developer Cloud

6 Active Incidents
Generated: 2026-03-24  |  Source: Rootly LIVE
50 incidents · Last 7 Days · Teams: SRE
Operational Intelligence
Ops Alerts per day
Alert volume for selected period
Ops Alert Responders
Primary responder distribution
Incidents by Severity
Distribution for selected period
Executive Summary · Last 7 Days · Teams: SRE
Service Reliability Risk
2
Confirmed customer-impacting outages
Current Actionable Load
6
Incidents actively being investigated
Detection Speed (MTTD)
0.0h
Overall Mean Time to Detect
Resolution Speed (MTTR)
0.1h
Overall Mean Time to Resolve
Engineering Toil (Lost Hours)
2.9h
Total combined incident recovery time
Alert Fatigue (Noise)
4%
2 duplicated / false alerts
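Each tile in the executive summary is an aggregate over per-incident timestamps. The sketch below is illustrative only. It assumes a hypothetical `Incident` record (`started_at`, `detected_at`, `resolved_at`), assumes MTTD and MTTR are both measured from incident start, and assumes engineering toil is the sum of resolution durations. The dashboard's actual Rootly fields and formulas may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


@dataclass
class Incident:
    # Hypothetical record shape for illustration; not Rootly's actual schema.
    started_at: datetime
    detected_at: datetime
    resolved_at: Optional[datetime] = None  # None while the incident is still open


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def summary(incidents: list[Incident]) -> dict[str, Optional[float]]:
    """Mean detect time, mean resolve time, and summed recovery time, in hours."""
    detect = [hours(i.detected_at - i.started_at) for i in incidents]
    # Only resolved incidents contribute to MTTR and toil (assumption).
    resolve = [hours(i.resolved_at - i.started_at) for i in incidents if i.resolved_at is not None]
    return {
        "mttd_h": mean(detect) if detect else None,
        "mttr_h": mean(resolve) if resolve else None,
        "toil_h": sum(resolve),
    }


t0 = datetime(2026, 3, 20, 10, 0)
incidents = [
    Incident(t0, t0 + timedelta(minutes=6), t0 + timedelta(hours=2)),
    Incident(t0, t0),  # detected immediately, still open
]
print(summary(incidents))
```

Returning `None` instead of `0.0` when there is no data keeps an empty period from looking like an instant detection or resolution.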
AI Actionable Insights
Not yet generated for the 50 incidents in this view. When run, Gemini produces actionable SRE recommendations.
Response Performance
SEV1 Analysis
CRITICAL
1 incident · dominant: OS-BERT-SLIB-00000 - REST (Expose) Something went wrong on our side.
Total SEV1: 1 incident
Resolved / Closed: 0 (actionable)
Cancelled (noise): 0 (duplicates)
Active (in progress): 1 (started)
MTTD (mean): 0.00 h
MTTR (median): 0.00 h
MTTR (mean): 0.00 h
System-wide impact: 0 confirmed
Primary SLO firing: OS-BERT-SLIB-00000 - REST (Expose) Something went wrong on our side.
NO DATA: no SEV1 incidents were resolved in this period, so the 0.00 h resolution mean reflects missing data, not fast resolution.
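The SEV1 card reports a 0.00 h MTTR even though nothing was resolved. Below is a minimal sketch of per-severity MTTR statistics that reports "not measurable" instead of zero when a severity has no resolved incidents. `mttr_stats` and `describe` are hypothetical helper names, and the 2.21 h input is the SEV2 figure from this dashboard.

```python
from statistics import mean, median
from typing import Optional


def mttr_stats(resolution_hours: list[float]) -> Optional[dict[str, float]]:
    """Mean and median MTTR in hours, or None when nothing was resolved."""
    if not resolution_hours:
        # No resolved incidents: MTTR is undefined, not 0.00 h.
        return None
    return {"mean": mean(resolution_hours), "median": median(resolution_hours)}


def describe(severity: str, resolution_hours: list[float]) -> str:
    stats = mttr_stats(resolution_hours)
    if stats is None:
        return f"{severity}: no resolved incidents; MTTR not measurable"
    return f"{severity}: MTTR mean {stats['mean']:.2f} h, median {stats['median']:.2f} h"


print(describe("SEV1", []))
print(describe("SEV2", [2.21]))
```

With a single resolved incident, as in SEV2, the mean and median are necessarily identical.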
SEV2 Analysis
HIGH
4 incidents · Process breakdown
Total SEV2: 4 incidents
Resolved / Closed: 1 (actionable)
Cancelled (noise): 1
Active (in progress): 2 (started)
MTTD (mean): 0.00 h
MTTR (median): 2.21 h
MTTR (mean): 2.21 h
System-wide impact: 2 confirmed
NOISE LEVEL: the overall 4% cancellation rate (2 of 50 incidents) is within monitoring health targets.
MTTR by Value Stream (Resolved/Closed Only)
SEV1 Resolution Times
No resolved incidents to map
SEV2 Resolution Times (top)
System-wide - ga - ga Database Scripts Execution - SucessRate - il-central-1 (ga-database-scripts-execution-successrate-il-central-1): 2.21 h
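Below is a minimal sketch of the per-value-stream MTTR ranking. It assumes each resolved or closed incident reduces to a `(value_stream, resolution_hours)` pair. The helper name, the `top` cut-off, and the stream names in the example are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mttr_by_value_stream(resolved: list[tuple[str, float]], top: int = 5) -> list[tuple[str, float]]:
    """Mean resolution hours per value stream, slowest first (resolved/closed only)."""
    buckets: dict[str, list[float]] = defaultdict(list)
    for stream, hours in resolved:
        buckets[stream].append(hours)
    ranked = sorted(
        ((stream, mean(values)) for stream, values in buckets.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:top]


print(mttr_by_value_stream([("stream-x", 1.0), ("stream-y", 3.0), ("stream-x", 2.0)]))
```

Grouping before averaging keeps one long incident in a busy stream from being hidden by many short ones elsewhere.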
SLO Firing Frequency
Top SLOs by Incident Count
Stacked SEV1 (red) + SEV2 (amber) — includes cancelled/noise
Manually Created & Customer Escalated
Total
48
Manually Created
46
Customer Escalated
2
Incidents Over Time
Manually Created vs Customer Escalated per day
By Severity
SEV1 vs SEV2
By Status
Resolution breakdown
System-Wide SLO Incidents
Total System-Wide SLO Incidents
2
Total SEV1 System-Wide SLO Incidents
0
Total SEV2 System-Wide SLO Incidents
2
SEV1 Incidents by SLO
Distribution across SLOs
No data
SEV2 Incidents by SLO
Distribution across SLOs
Full Incident Log — SEV1 + SEV2 (50 of 50)
Manager Insights & Actions · Filtered View
Reliability Risk
2.21h
Longest resolution time in the selected period. A significantly high value may point to a monitoring gap or a complex degradation that warrants post-incident review.
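Below is a minimal sketch of the Reliability Risk figure as the maximum resolution time over resolved incidents. `REVIEW_THRESHOLD_H` is a hypothetical cut-off for "significantly high", because the dashboard does not state one. `reliability_risk` and `needs_review` are illustrative helper names.

```python
from typing import Optional

# Hypothetical review threshold in hours; the dashboard does not define one.
REVIEW_THRESHOLD_H = 4.0


def reliability_risk(resolution_hours: list[float]) -> Optional[float]:
    """Longest resolution time (hours) among resolved incidents, or None if none resolved."""
    return max(resolution_hours, default=None)


def needs_review(resolution_hours: list[float], threshold_h: float = REVIEW_THRESHOLD_H) -> bool:
    worst = reliability_risk(resolution_hours)
    return worst is not None and worst > threshold_h


print(reliability_risk([2.21]), needs_review([2.21]))
```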
Process Gap
4%
The current noise ratio (duplicate or false alerts as a share of all incidents) is 4%, below the < 15% target. Continued duplicate suppression and smarter alert grouping should keep it there.
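The noise ratio is a single division. The sketch below assumes noise means incidents cancelled as duplicates or false alerts, divided by all incidents in the view. That definition matches the 2 of 50 = 4% figure in the executive summary. `noise_ratio` is a hypothetical helper, and `NOISE_TARGET` encodes the < 15% target stated above.

```python
NOISE_TARGET = 0.15  # the "< 15%" target from the Process Gap card


def noise_ratio(noise_count: int, total: int) -> float:
    """Share of incidents cancelled as duplicates or false alerts."""
    if total <= 0:
        raise ValueError("total must be positive")
    return noise_count / total


ratio = noise_ratio(2, 50)
print(f"noise {ratio:.0%}, within target: {ratio < NOISE_TARGET}")
```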
Recurring SLO
Identifies the most frequent alert source. Repeat-offender SLOs should be reviewed for error budget burn rates and threshold calibration.
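Finding the recurring SLO is a frequency count over the SLO attached to each incident. This is a minimal sketch using `collections.Counter`. Cancelled and noise incidents are counted, as in the SLO Firing Frequency chart. The SLO names and the `min_count` cut-off for "repeat offender" are hypothetical.

```python
from collections import Counter


def top_slos(slo_names: list[str], n: int = 3) -> list[tuple[str, int]]:
    """Most frequently firing SLOs, counting every incident (cancelled/noise included)."""
    return Counter(slo_names).most_common(n)


def repeat_offenders(slo_names: list[str], min_count: int = 2) -> list[str]:
    """SLOs that fired at least `min_count` times; min_count is a hypothetical cut-off."""
    return [name for name, count in Counter(slo_names).items() if count >= min_count]


firings = ["slo-a", "slo-b", "slo-a", "slo-c", "slo-a", "slo-b"]  # hypothetical SLO names
print(top_slos(firings, 2))
print(repeat_offenders(firings))
```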
Detection Speed
0.0h
Mean Time to Detect (MTTD) is 0.0 h for the selected period. It measures how quickly incidents are detected and acknowledged by the on-call team.
Customer Impact
2
2 confirmed system-wide, customer-impacting incidents occurred in this period. These are the highest priority for remediation.
Action Items
3
Recommended focus: (1) Review outliers, (2) Implement alert deduplication for noisy streams, (3) Audit unassigned incidents.