Agent success rate: 35.5% to 74.8%

Agent success rate jumped from 35.5% to 74.8%. This release also adds eight new integrations, full ECS support, and a multi-region infra graph, and cuts p75 investigation time from 15 minutes to 8.

Agent Success Rate

35.5% → 74.8%

Tool Call Success

57.3% → 72%

Investigation Time (p75)

15 min → 8 min

Alert Ingestion

43% → 65%

Classification Cost

↓ 70%

Conclusive RCAs

55% → 61%
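To put the before/after figures above on a common scale, the relative change of each metric can be computed directly from the reported numbers. This is a minimal sketch: `relative_change` and the `METRICS` dictionary are illustrative names, and the values are copied from the stat cards above. Classification Cost is omitted because no baseline value is reported for it.

```python
def relative_change(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a fraction of `before`."""
    return (after - before) / before


# (before, after) pairs copied from the metrics reported above.
METRICS = {
    "Agent Success Rate (%)": (35.5, 74.8),
    "Tool Call Success (%)": (57.3, 72.0),
    "Investigation Time p75 (min)": (15.0, 8.0),
    "Alert Ingestion (%)": (43.0, 65.0),
    "Conclusive RCAs (%)": (55.0, 61.0),
}

if __name__ == "__main__":
    for name, (before, after) in METRICS.items():
        change = relative_change(before, after)
        print(f"{name}: {before} -> {after} ({change:+.1%})")
```

For the rate metrics, the percentage-point difference (for example 74.8 minus 35.5) is an equally valid way to read the change; the relative form is mainly useful for the investigation-time figure, where it works out to a reduction of roughly 47%.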

AI SRE Agent Intelligence

Alert Context Agent · Pulls memory context across all connected platforms and past incidents before every investigation.

Alert Classification via Cypher Queries · Topology-aware classification over the graph. 30% faster, 70% cheaper.

Infra Q&A Agent v2 · Rebuilt with structured formatting for the infra graph UI. More precise, more readable answers.

Agent Tool Call Tracking · Every tool call tracked for reliability metrics, debugging, and accountability.

OpenTelemetry Tracing · Full tracing and monitoring integrated into the agent runtime.
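As an illustration of what per-call tracking for reliability metrics can look like, here is a minimal sketch of a tracker that wraps each tool invocation and records call counts, failures, and elapsed time. The names `ToolCallTracker`, `ToolStats`, and the `kubectl_get` tool label are hypothetical and are not taken from the product; the actual agent runtime may record different fields.

```python
import time
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ToolStats:
    """Running counters for one tool (hypothetical structure)."""
    calls: int = 0
    failures: int = 0
    total_seconds: float = 0.0

    @property
    def success_rate(self) -> float:
        # Fraction of calls that returned without raising.
        return (self.calls - self.failures) / self.calls if self.calls else 0.0


class ToolCallTracker:
    """Wraps tool invocations and records an outcome for every call."""

    def __init__(self) -> None:
        self.stats = defaultdict(ToolStats)

    def call(self, tool_name, fn, *args, **kwargs):
        stats = self.stats[tool_name]
        stats.calls += 1
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        except Exception:
            # Count the failure, then let the caller handle the error.
            stats.failures += 1
            raise
        finally:
            stats.total_seconds += time.perf_counter() - start


if __name__ == "__main__":
    tracker = ToolCallTracker()
    tracker.call("kubectl_get", lambda ns: f"pods in {ns}", "default")
    print(tracker.stats["kubectl_get"].success_rate)
```

Aggregating `success_rate` across tools gives a figure comparable in spirit to the Tool Call Success metric, though how that metric is actually computed is not stated here.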

Infrastructure Topology Graph

ECS Full Support · ECS entities integrated with Redis and CubeAPM mapping, debugging skills, and hypothesis tree generation.

Multi-Region / Multi-Cluster / Multi-AZ · Cross-region, multi-cluster Kubernetes, and multi-AZ support. A critical milestone for enterprise deployments.

Slack Memories in the Graph · Past incident conversations and runbook references embedded directly into the infra graph.

MongoDB Atlas + RabbitMQ + External API nodes · All three added with full edge support for richer topology context.

[Figure: Multi-region infrastructure topology view, showing service dependencies across clusters]

[Figure: Service-level detail view, showing dependency connections for a single service]

Platform & Incident Management

Incidents with Context · Investigation trigger passes full alert context to the agent pipeline, not just the raw alert.

Impacted Entity Tracking · Every investigation stores impacted entity details for a richer audit trail.

Incident Conversation API · New API for listing incident conversations with namespace scoping.

GitHub + Slack as Data Providers · Both added as platform-level data sources for broader investigation context.

Performance & Reliability Fixes

Steampipe Query Optimization · Queries now select only the columns they need, which eliminates permission blockers.

CubeAPM Service Normalization · Deduplicates and collapses similar services before graph enrichment.

ELK Latency Query Fix · Corrected metric source from transactions to spans for accurate APM data.

Throughput + MySQL + RDS fixes · Fixed interval conversions, host config key mapping, and RDS cluster resolution.
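The service normalization item above describes collapsing similar service names before graph enrichment. The rules that decide which names count as "similar" are not given, so the sketch below assumes three: case-insensitive matching, treating `_`, `.`, and whitespace as equivalent to `-`, and dropping a trailing environment suffix. `canonical_name`, `collapse_services`, `ENV_SUFFIXES`, and the sample service names are all hypothetical.

```python
import re
from collections import defaultdict

# Assumed environment suffixes; the real normalization rules are not documented here.
ENV_SUFFIXES = ("prod", "production", "staging", "stg", "dev")


def canonical_name(service: str) -> str:
    """Map a raw service name to a canonical key under the assumed rules."""
    name = service.strip().lower()
    name = re.sub(r"[\s_.]+", "-", name)  # unify separators to "-"
    parts = [p for p in name.split("-") if p]
    # Drop a trailing environment suffix, but never the whole name.
    if len(parts) > 1 and parts[-1] in ENV_SUFFIXES:
        parts = parts[:-1]
    return "-".join(parts)


def collapse_services(services):
    """Group raw service names by canonical key, preserving input order."""
    groups = defaultdict(list)
    for service in services:
        groups[canonical_name(service)].append(service)
    return dict(groups)


if __name__ == "__main__":
    raw = ["Checkout_API", "checkout-api-prod", "checkout.api", "payments"]
    for key, members in collapse_services(raw).items():
        print(key, members)
```

Each resulting key would then correspond to a single node in the graph, with the raw names kept as aliases so the original APM identifiers remain traceable.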

New AI SRE Integrations

Integration	What it enables
Elastic APM	Full dependency graph with K8s-to-Elastic service mapping; latency/throughput by transaction + percentile
MySQL	Self-hosted MySQL on VMs with discovery, index inspection, read-only query execution, K8s mapping
MongoDB Atlas	Managed cloud MongoDB with projects, clusters, processes, time-series metrics
MongoDB (self-hosted)	Raw query execution with safe formatting, full metrics provider
Grafana	Alert rules, firing alerts, alert-rule-by-UID for full alerting surface
GCP	GCP integration support + kubectl tool + Steampipe query support
CubeAPM	Metric provider, alert retrieval, service normalization, K8s mapping
PagerDuty	Slack message capturing and lifecycle management

[Figure: Infra Q&A Agent v2 answering questions about Kubernetes clusters and services with structured responses and actionable recommendations]

Sherlocks AI Changelog