
Three philosophies define today's AI SRE tools: work with existing telemetry, collect your own, or assume no monitoring. Here is how to choose the right fit.

When teams evaluate AI SRE tools, the conversation usually starts with features. Who has the best RCA? Who integrates with Slack? Who supports Kubernetes?
The more useful question is: what does this tool assume about your observability stack?
That assumption shapes everything downstream: deployment complexity, ongoing cost, accuracy ceiling, and how much of your existing infrastructure you keep versus replace. Three distinct philosophies have emerged in the market, and understanding them saves months of evaluation time.
Approach 1: Work with existing telemetry
Philosophy: Your organization already runs monitoring. You have Datadog, or Grafana, or CloudWatch, or New Relic, or some combination. You have alerts configured. You have dashboards. You have years of institutional knowledge baked into those systems. An AI SRE should plug into that existing telemetry, not replace it.
How it works: The AI connects to your current observability tools through their APIs. When an alert fires, it pulls metrics, logs, and traces from the systems you already trust. It builds a knowledge graph of your infrastructure from the data those systems already collect: service dependencies, deployment history, past incidents, communication patterns.
The key engineering challenge is not data collection. It is retrieval: knowing which 2% of your existing data matters for this specific investigation, right now. That requires a structured model of your system, not just access to raw APIs.
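The retrieval idea can be illustrated with a minimal sketch. It assumes a tiny, hand-written dependency graph (`DEPENDS_ON`) and two hypothetical helpers, `services_in_scope` and `build_queries`; none of these names come from a real product. The point is only that a structured model of the system lets an investigation narrow itself to the alerting service and its nearby dependencies before any telemetry API is called.

```python
from collections import deque

# Hypothetical dependency graph: each service maps to the services it calls.
# A real system would derive this from existing telemetry rather than hard-code it.
DEPENDS_ON = {
    "checkout": ["payments", "cart"],
    "payments": ["postgres"],
    "cart": ["redis"],
    "search": ["elasticsearch"],
}


def services_in_scope(graph, alerting_service, max_hops=2):
    """Return the alerting service plus its dependencies within max_hops, in BFS order."""
    seen = {alerting_service}
    order = [alerting_service]
    queue = deque([(alerting_service, 0)])
    while queue:
        service, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for dep in graph.get(service, []):
            if dep not in seen:
                seen.add(dep)
                order.append(dep)
                queue.append((dep, hops + 1))
    return order


def build_queries(services, window_minutes=30):
    """Describe what to fetch for each scoped service (backend-agnostic placeholders)."""
    return [
        {
            "service": service,
            "signals": ["metrics", "logs", "traces"],
            "window_minutes": window_minutes,
        }
        for service in services
    ]


if __name__ == "__main__":
    scope = services_in_scope(DEPENDS_ON, "checkout")
    for query in build_queries(scope):
        print(query)
```

For an alert on `checkout`, the scope covers `checkout`, `payments`, `cart`, `postgres`, and `redis`, while `search` and `elasticsearch` are never queried. A production knowledge graph would also weigh deployment history and past incidents, which this sketch leaves out.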
What this gets right:
Where it gets hard:
Best fit for: Organizations with mature observability stacks that work well enough. Teams that do not want to rip and replace their monitoring. Environments with mixed infrastructure (Kubernetes, ECS, EC2, VMs) where a single-orchestrator solution will not cover everything.
Approach 2: Collect your own telemetry (eBPF)
Philosophy: Reliable RCA depends on having complete, high-fidelity signals. Existing observability platforms emit whatever they were configured to emit, which is often incomplete, inconsistent, or too coarse for root cause analysis. The AI should collect its own telemetry at the kernel level using eBPF, ensuring it always has the data it needs.
How it works: An eBPF-based agent is deployed into your Kubernetes clusters. It hooks into Linux kernel events to capture network traffic, system calls, and process behavior without requiring code changes or container restarts. The AI then has first-party access to high-resolution data: every HTTP request between services, every DNS lookup, every TCP connection, captured at the kernel level rather than the application level.
This gives the AI a complete picture of what actually happened, not what your monitoring tool was configured to report.
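To make the value of per-request capture concrete, here is a minimal sketch of what can be done with such events once they exist. It does not contain any eBPF code: the `HttpEvent` record shape and the `summarize_edges` helper are hypothetical stand-ins for whatever a kernel-level agent would actually emit, and the sample events are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class HttpEvent:
    # Hypothetical record shape; real agents define their own event schemas.
    source: str
    destination: str
    status: int
    latency_ms: float


def summarize_edges(events):
    """Aggregate per-request events into per-edge request counts, error rates, and latency."""
    stats = defaultdict(lambda: {"requests": 0, "errors": 0, "total_latency_ms": 0.0})
    for event in events:
        edge = stats[(event.source, event.destination)]
        edge["requests"] += 1
        edge["total_latency_ms"] += event.latency_ms
        if event.status >= 500:  # count server-side failures only
            edge["errors"] += 1
    return {
        key: {
            "requests": s["requests"],
            "error_rate": s["errors"] / s["requests"],
            "avg_latency_ms": s["total_latency_ms"] / s["requests"],
        }
        for key, s in stats.items()
    }


if __name__ == "__main__":
    # Illustrative events only, not captured data.
    sample = [
        HttpEvent("checkout", "payments", 200, 10.0),
        HttpEvent("checkout", "payments", 503, 30.0),
        HttpEvent("cart", "redis", 200, 2.0),
    ]
    for edge, summary in summarize_edges(sample).items():
        print(edge, summary)
```

Because every request between services is observed, the service dependency map and per-edge error rates fall out of the data itself, with no per-service instrumentation required.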
What this gets right:
Where it gets hard:
Best fit for: Kubernetes-native organizations that want deep, consistent telemetry without relying on per-service instrumentation. Teams frustrated with gaps in their current monitoring. Environments where it is acceptable for the AI vendor to replace (not supplement) the observability layer.
Approach 3: Assume no monitoring
Philosophy: Most AI SRE tools turn 100 alerts into 20 hypotheses. That does not help anyone. Instead of depending on observability data that may be incomplete or expensive, build automation that actively diagnoses problems by running checks against live infrastructure. Reduce dependency on "just in case" logging and dashboarding entirely.
How it works: Background agents run continuously, executing diagnostic checks across infrastructure, applications, and data platforms. Instead of waiting for an alert and then querying metrics, the agents proactively probe the system: checking health endpoints, verifying configurations, testing connectivity, validating resource states. When something is wrong, the agent already has the diagnostic data because it collected it as part of the investigation, not because a monitoring system happened to be watching.
The promise extends further: if agents can diagnose problems by running checks on demand, you do not need to store months of high-cardinality metrics "just in case." Observability spend goes down because automation replaces passive dashboarding.
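The check-based model described above can be sketched as a small runner. Everything here is an illustrative assumption: `CheckResult`, `run_checks`, `failing`, and the two example checks are hypothetical names, and the checks use hard-coded values rather than probing real infrastructure so the example runs anywhere.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    ok: bool
    detail: str


def run_checks(checks):
    """Run each (name, check) pair; a check returns (ok, detail) or raises on failure."""
    results = []
    for name, check in checks:
        try:
            ok, detail = check()
        except Exception as exc:
            # A check that cannot complete is itself diagnostic evidence.
            ok, detail = False, f"check raised {type(exc).__name__}: {exc}"
        results.append(CheckResult(name, ok, detail))
    return results


def failing(results):
    """Keep only the checks that did not pass."""
    return [result for result in results if not result.ok]


def disk_usage_check(used_fraction, limit=0.9):
    """Build a check from a supplied usage value (a real check would read it live)."""
    def check():
        ok = used_fraction < limit
        return ok, f"disk at {used_fraction:.0%} (limit {limit:.0%})"
    return check


def unreachable_endpoint_check():
    """Stand-in for a health probe whose target cannot be reached."""
    raise ConnectionError("health endpoint unreachable")


if __name__ == "__main__":
    results = run_checks([
        ("disk", disk_usage_check(0.95)),
        ("health", unreachable_endpoint_check),
    ])
    for result in failing(results):
        print(result.name, "->", result.detail)
```

The diagnostic detail is produced at check time, which is the core of this approach: the evidence exists because the agent went looking for it, not because a dashboard happened to be recording it.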
What this gets right:
Where it gets hard:
Best fit for: Organizations with minimal existing monitoring that want to leapfrog the "build out a full observability stack first" step. Teams with high observability costs looking to reduce spend. Environments where the primary pain is not "we cannot find the root cause" but "we do not have the automation to act on what we know."
| | Work With Existing | Collect Own (eBPF) | Assume No Monitoring |
|---|---|---|---|
| Telemetry cost | None (uses existing) | Additional (eBPF agents) | Claims to reduce overall spend |
| Deployment speed | Fast (API integration) | Medium (agent rollout) | Medium (agent rollout) |
| Infrastructure scope | Any (K8s, ECS, VMs, multi-cloud) | Primarily Kubernetes | Broad but check-based |
| Signal depth | Depends on existing tools | Kernel-level, very deep | Diagnostic, on-demand |
| Accuracy ceiling | Limited by existing telemetry quality | High if eBPF covers the environment | Limited for novel failures |
| Existing stack | Preserved | Partially replaced | Potentially reduced |
| Biggest risk | Blind spots in current monitoring | Operational overhead of kernel agents | Missing data during novel incidents |
There is no universally correct approach. The right choice depends on where you are:
If you have a mature observability stack and the problem is "we have the data but cannot investigate fast enough," start with Approach 1. You do not need to collect more data. You need something that can reason over what you already have.
If your monitoring is inconsistent across services and the problem is "we do not trust our telemetry," Approach 2 solves that by giving you uniform, high-fidelity data. Be prepared for the operational overhead of another agent in your cluster.
If you are early in your monitoring journey or your observability costs are unsustainable, Approach 3 offers a different path. Be honest about the risks: novel incidents still need historical data, and no amount of proactive checking replaces a good distributed trace when things go sideways.
The worst outcome is choosing an approach that fights your existing infrastructure instead of complementing it. An AI SRE that requires you to rip out your monitoring to work is not saving you time. One that ignores your monitoring entirely is leaving data on the table. The best tools meet you where you are.
At Sherlocks.ai, we took Approach 1 because we believe most organizations already have the signals they need. They just cannot reason over them fast enough when it matters. The knowledge graph, the agentic retrieval, the statistical pre-processing: all of it exists to extract maximum value from the telemetry you already collect. No additional agents, no kernel modules, no replacement of tools your team already trusts.