Feb – Mar 2026
Agent success rate: 35% to 74.8%
Agent success rate jumped from 35.5% to 74.8%. Also in this release: eight new integrations, full ECS support, a multi-region infra graph, and p75 investigation time cut nearly in half (15 min to 8 min).
| Metric | Before | After |
|---|---|---|
| Agent Success Rate | 35.5% | 74.8% |
| Tool Call Success | 57.3% | 72% |
| Investigation Time (p75) | 15 min | 8 min |
| Alert Ingestion | 43% | 65% |
| Classification Cost | n/a | ↓ 70% |
| Conclusive RCAs | 55% | 61% |
AI SRE Agent Intelligence
Alert Context Agent: Pulls memory context across all connected platforms and past incidents before every investigation.
Alert Classification via Cypher Queries: Topology-aware classification over the graph. 30% faster, 70% cheaper.
Infra Q&A Agent v2: Rebuilt with structured formatting for the infra graph UI. More precise, more readable answers.
Agent Tool Call Tracking: Every tool call tracked for reliability metrics, debugging, and accountability.
OpenTelemetry Tracing: Full tracing and monitoring integrated into the agent runtime.
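As an illustration of the tool call tracking item above, here is a minimal sketch of how per-tool outcomes could be counted and turned into a success rate. It is not the product's implementation: `CALL_STATS`, `tracked`, `success_rate`, and the `fetch_logs` tool are all hypothetical names, and the store is a plain in-memory dictionary.

```python
import functools
from collections import defaultdict

# Hypothetical in-memory store: tool name -> {"ok": n, "error": n}.
CALL_STATS = defaultdict(lambda: {"ok": 0, "error": 0})


def tracked(tool_name):
    """Record the outcome of every call to the wrapped tool function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                result = func(*args, **kwargs)
            except Exception:
                # Count the failure, then let the caller see the error.
                CALL_STATS[tool_name]["error"] += 1
                raise
            CALL_STATS[tool_name]["ok"] += 1
            return result
        return wrapper
    return decorator


def success_rate(tool_name):
    """Return the fraction of successful calls, or None if never called."""
    stats = CALL_STATS.get(tool_name)
    if stats is None:
        return None
    total = stats["ok"] + stats["error"]
    return stats["ok"] / total if total else None


@tracked("fetch_logs")
def fetch_logs(service):
    # Stand-in for a real tool; fails on an empty service name.
    if not service:
        raise ValueError("service name required")
    return f"logs for {service}"
```

Calling `fetch_logs` a few times and then `success_rate("fetch_logs")` gives the share of calls that returned without raising, which is the kind of figure reported as Tool Call Success.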
Infrastructure Topology Graph
ECS Full Support: ECS entities integrated with Redis and CubeAPM mapping, debugging skills, and hypothesis tree generation.
Multi-Region / Multi-Cluster / Multi-AZ: Cross-region, multi-cluster Kubernetes, multi-AZ support. Critical enterprise milestone.
Slack Memories in the Graph: Past incident conversations and runbook references embedded directly into the infra graph.
MongoDB Atlas + RabbitMQ + External API nodes: All three added with full edge support for richer topology context.

Multi-region infrastructure topology view

Service-level detail with dependency connections
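To make the topology graph items above concrete, here is a small sketch of a graph with typed, region-tagged nodes and dependency edges, plus a traversal that finds everything upstream of a failing node. The `Node` and `TopologyGraph` classes, the node kinds, and the service names are assumptions for illustration only; the real graph schema is not described here.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    kind: str    # assumed kinds: "service", "mongodb_atlas", "rabbitmq", ...
    region: str


class TopologyGraph:
    def __init__(self):
        self.nodes = {}
        self.edges = defaultdict(set)  # node name -> names it depends on

    def add_node(self, node):
        self.nodes[node.name] = node

    def add_dependency(self, source, target):
        self.edges[source].add(target)

    def dependents_of(self, name):
        """Return every node that depends on `name`, directly or transitively."""
        # Build the reverse adjacency so we can walk "who depends on me".
        reverse = defaultdict(set)
        for source, targets in self.edges.items():
            for target in targets:
                reverse[target].add(source)
        seen, queue = set(), deque([name])
        while queue:
            current = queue.popleft()
            for parent in reverse[current]:
                if parent not in seen:
                    seen.add(parent)
                    queue.append(parent)
        return seen


# Hypothetical two-region topology.
graph = TopologyGraph()
graph.add_node(Node("checkout", "service", "us-east-1"))
graph.add_node(Node("orders", "service", "eu-west-1"))
graph.add_node(Node("orders-db", "mongodb_atlas", "eu-west-1"))
graph.add_node(Node("events", "rabbitmq", "eu-west-1"))
graph.add_dependency("checkout", "orders")
graph.add_dependency("orders", "orders-db")
graph.add_dependency("orders", "events")

impacted = graph.dependents_of("orders-db")
regions = sorted({graph.nodes[n].region for n in impacted})
```

In this example a problem on `orders-db` reaches `orders` and `checkout`, so the impact spans both regions, which is the kind of cross-region context a multi-region graph is meant to surface.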
Platform & Incident Management
Incidents with Context: Investigation trigger passes full alert context to the agent pipeline, not just the raw alert.
Impacted Entity Tracking: Every investigation stores impacted entity details for a richer audit trail.
Incident Conversation API: New API for listing incident conversations with namespace scoping.
GitHub + Slack as Data Providers: Both added as platform-level data sources for broader investigation context.
Performance & Reliability Fixes
Steampipe Query Optimization: Columns scoped to minimum required to eliminate permission blockers.
CubeAPM Service Normalization: Deduplicates and collapses similar services before graph enrichment.
ELK Latency Query Fix: Corrected metric source from transactions to spans for accurate APM data.
Throughput + MySQL + RDS fixes: Fixed interval conversions, host config key mapping, and RDS cluster resolution.
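The service normalization fix above could look roughly like the sketch below: normalize each raw service name, then group names that collapse to the same key. The actual normalization rules are not stated in this changelog, so the rules here (lowercasing, unifying separators, dropping a trailing environment suffix) and the names `ENV_SUFFIX`, `normalize_service_name`, and `collapse_services` are assumptions.

```python
import re
from collections import defaultdict

# Assumed environment suffixes; a real deployment would configure its own list.
ENV_SUFFIX = re.compile(r"-(prod|production|staging|stage|dev)$")


def normalize_service_name(name):
    """Lowercase, unify separators to '-', and drop a trailing env suffix."""
    cleaned = re.sub(r"[\s_.]+", "-", name.strip().lower())
    return ENV_SUFFIX.sub("", cleaned)


def collapse_services(names):
    """Group raw service names under their normalized form."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_service_name(name)].append(name)
    return dict(groups)


raw = ["Checkout-API", "checkout_api", "checkout-api-prod",
       "orders.staging", "orders"]
collapsed = collapse_services(raw)
```

Here the five raw names collapse to two graph nodes, `checkout-api` and `orders`, with the original names kept so nothing is lost before enrichment.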
New AI SRE Integrations
| Integration | What it enables |
|---|---|
| Elastic APM | Full dependency graph with K8s-to-Elastic service mapping; latency/throughput by transaction + percentile |
| MySQL | Self-hosted MySQL on VMs with discovery, index inspection, read-only query execution, K8s mapping |
| MongoDB Atlas | Managed cloud MongoDB with projects, clusters, processes, time-series metrics |
| MongoDB (self-hosted) | Raw query execution with safe formatting, full metrics provider |
| Grafana | Alert rules, firing alerts, alert-rule-by-UID for full alerting surface |
| GCP | GCP integration support + kubectl tool + Steampipe query support |
| CubeAPM | Metric provider, alert retrieval, service normalization, K8s mapping |
| PagerDuty | Slack message capturing and lifecycle management |

Infra Q&A Agent v2 with structured responses and actionable recommendations