Vibe SRE vs Agentic SRE: What Karpathy's Coding Taxonomy Teaches Us About Incident Response

By Gaurav Toshniwal · Published on: Mar 14, 2026 · 10 min read

TL;DR

In 2025, Andrej Karpathy coined “vibe coding” to describe the practice of prompting an AI, accepting its output without reviewing it, and hoping it works. A year later, he introduced “agentic engineering” as a professional evolution: AI does the implementation while humans own the architecture, quality, and correctness.

The same shift is happening in SRE. Most teams doing “AI SRE” today are actually doing Vibe SRE — pasting alerts into ChatGPT or Claude Code with no system context, no verification, and no guardrails. Agentic SRE is the structured alternative: purpose-built AI agents investigate incidents using deep system knowledge while humans define policies, set guardrails, and approve actions.

This article maps Karpathy's taxonomy onto the SRE world, shows you where your team actually sits, and explains why the distinction matters when every minute of downtime counts.

Karpathy's Two Eras — and What They Mean for Operations

In February 2025, Andrej Karpathy fired off what he called a “shower thoughts throwaway tweet” that would define an era. He described a new way of programming where “you fully give in to the vibes, embrace exponentials, and forget that the code even exists.” You prompt, you accept, you run it. If it doesn't work, you paste the error back in and try again. He called it vibe coding.

It was fun. It was fast. It worked for prototypes and weekend projects. And then people started shipping vibe-coded software to production.

A year later, Karpathy course-corrected. In early 2026, he introduced agentic engineering — a term that acknowledges AI is doing the implementation, but insists that humans remain in the loop as architects, reviewers, and decision-makers. As he put it:

“You are not writing the code directly 99% of the time. You are orchestrating agents who do, and acting as oversight.”

The distinction matters because it maps to a fundamental difference in how you treat quality:

Vibe Coding

You prompt, you accept, you see if it works. YOLO.

Agentic Engineering

You plan, the agent executes, you verify: the Plan, Execute, Verify (PEV) loop.

Addy Osmani distilled it further: agentic engineering treats AI as “a fast but unreliable junior developer who needs constant oversight.” The human role doesn't shrink — it intensifies. You write clearer specifications, understand systems more deeply, and maintain ownership of outcomes.

Now here's the question nobody in the SRE world is asking yet: If this distinction matters for writing code, why would it not matter — possibly 10x more — for managing production systems where failures cost $400 billion a year?

The Four Quadrants of AI in SRE

Karpathy's taxonomy doesn't map to a simple timeline. It maps onto a quadrant grid — because the two axes that matter are independent:

  1. Depth of System Understanding — Does the AI have surface-level context (just the alert text) or deep context (telemetry, service dependencies, deployment history, past incidents)?
  2. Autonomy of Action — Does the human do everything, or does the AI investigate and act with human guidance?

This gives us four quadrants:

| | Shallow system understanding | Deep system understanding |
|---|---|---|
| AI investigates & acts (with oversight) | "Vibe SRE" — ChatGPT, Claude Code, Gemini CLI, Cursor. Fast but blind: no context, no memory. | Agentic SRE — Sherlocks.ai, Resolve.ai, Cleric, Traversal. Fast AND aware: full context, memory. |
| Human does everything | Manual SRE — SSH + grep, manual runbooks, war rooms, tribal knowledge. Slow and blind. | AI-Assisted SRE (AIOps) — Datadog, PagerDuty AIOps, New Relic AI, Dynatrace Davis AI. Aware but passive. |

The ideal path is bottom-left → bottom-right → top-right: you build deep system understanding first (AIOps), then add autonomy with guardrails (Agentic SRE).

The trap is bottom-left → top-left: you skip the hard work of building context and guardrails, jump straight to autonomy, and end up doing Vibe SRE.

Bottom-Left: Manual SRE (The Baseline)

This is where most teams started, and where many still are.

An alert fires at 3 AM. The on-call engineer wakes up, opens a laptop, SSHs into a server, and starts grepping logs. They check Grafana dashboards. They ping a colleague who might know the service. They pull up a runbook in Confluence — if one exists. They open a war room call.

The Toolset

Grafana, Prometheus, manual runbooks, SSH, war room calls, tribal knowledge passed down through Slack threads.

The problem: This approach doesn't scale. Modern systems generate too many signals across too many services. A modern microservices architecture can easily span 200+ services. No human can hold that dependency graph in their head, especially at 3 AM.

Average resolution time: 3-5 hours, heavily dependent on who's on call and whether they've seen this failure before.

Most organizations have recognized this and moved to the next stage — but some have leapfrogged to Vibe SRE, skipping the critical middle step.

Bottom-Right: AI-Assisted SRE / AIOps (Aware but Passive)

This is where the observability platforms live. They've been building deep system understanding for years — ingesting metrics, logs, traces, and events — and layering AI on top for anomaly detection, noise reduction, and smart alerting.

The Toolset

Datadog (anomaly detection, Watchdog), PagerDuty (AIOps noise reduction, event intelligence), New Relic (AI-powered anomaly detection), Dynatrace (Davis AI for automatic root cause analysis).

What they do well: These platforms know your system deeply. They understand service dependencies, baseline performance, and historical patterns. Dynatrace's Davis AI can automatically correlate a spike in error rates to a specific deployment across a complex service mesh.

What they don't do: Act. AIOps platforms are fundamentally observability-first — they tell you something is wrong and often tell you why, but the human still has to investigate the details, decide on a fix, and execute it. Unite.AI calls this the “Recommendation Gap”: understanding problems does not lead to faster resolution if the human still has to execute every step.

Average resolution time: 1-2 hours — significantly faster than manual SRE because the “what happened?” phase is shorter, but the “what do I do about it?” phase remains fully manual.

This quadrant represents necessary but insufficient progress. You need the depth of system knowledge that AIOps provides, but you also need to close the execution gap.

Top-Left: “Vibe SRE” — The Trap

This is where the industry is right now, and it doesn't realize it.

Here's the scenario: An alert fires. The on-call engineer opens ChatGPT (or Claude, or Gemini) and types: “I'm getting a 503 error on our checkout service. Here's the alert payload: [pastes alert]. What's wrong and how do I fix it?”

The LLM responds with a confident, well-structured analysis. It suggests checking the database connection pool, reviewing recent deployments, and looking at upstream service health. It sounds smart. The engineer follows the suggestions.

Sometimes it works. Sometimes the LLM hallucinates a plausible-sounding root cause that is completely wrong, and the engineer spends 45 minutes chasing a phantom before starting over.

This is Vibe SRE. And it's the exact same pattern Karpathy identified in coding: you prompt, you accept, you see if it works. If it doesn't, you paste the error back in and try again.

Why Vibe SRE is dangerous

1. No institutional memory.

When you paste an alert into ChatGPT, the model starts from zero. It doesn't know that this exact alert pattern occurred 6 months ago during Sprint 47, and the root cause was a config change to the rate limiter. It doesn't know that your team's post-mortem identified three contributing factors, not one. Every incident is a blank slate.

2. No system context.

The LLM doesn't know your service dependency graph. It doesn't know that Service A depends on Service B, which depends on a shared Redis cluster that was upgraded last Tuesday. It doesn't know your deployment cadence, your infrastructure topology, or your capacity thresholds. It's reasoning about a system it has never seen.

3. No verification loop.

When the LLM says “the issue is likely a database connection pool exhaustion,” there's no evidence trail. No confidence scoring. No link to the actual metrics that support or contradict the hypothesis. You're trusting a probabilistic model's best guess about a system it has no access to.

4. No guardrails.

If the LLM suggests “restart the service” or “roll back the last deployment,” there's nothing between that suggestion and the engineer executing it. No blast radius assessment. No policy check. No approval gate. No verification that the suggested action won't cascade into a bigger outage.

5. No learning.

After the incident is resolved, nothing is retained. The next time the same alert fires, the next on-call engineer starts from scratch. The organization doesn't get smarter.

The Claude Code question

This brings us to the most common question we get at Sherlocks.ai: “Why can't I just use Claude Code for incident response?”

It's a fair question. Claude Code is arguably the best agentic engineering tool available today. It reads your codebase, understands your git history, runs tests, and iterates on solutions with remarkable sophistication. It's everything Karpathy means by “agentic engineering” — in the coding domain.

But using Claude Code for SRE is like using a Formula 1 engine to power a submarine. The engine is world-class; the environment is wrong.

Claude Code operates in the development context: it sees your codebase, your git history, your test suite, your CI pipeline. It's agentic in that world because it has deep context there.

Production incident response requires a fundamentally different context: live telemetry streams, service dependency graphs, real-time infrastructure metrics, historical incident patterns, deployment timelines, runbook history, and the Slack thread from the last time this exact thing happened at 2 AM. Claude Code has access to none of that.

When you paste an alert into Claude Code and ask, “What's wrong?”, you're treating a tool that's agentic in one domain as a vibe tool in another. It will reason brilliantly — but blindly. It's like asking a world-class surgeon to diagnose a patient without any lab results, imaging, or medical history. The expertise is real; the context is missing.

Karpathy himself noted that vibe coding works for throwaway projects, but “you wouldn't vibe code an airplane's flight control system.” By the same logic, you shouldn't vibe-SRE your production infrastructure.

For a deeper technical comparison of the two approaches, see our in-depth breakdown of Claude Code vs Sherlocks.

Top-Right: Agentic SRE — Where the Industry Needs to Go

Agentic SRE combines the autonomy of the top-left quadrant with the deep system understanding of the bottom-right quadrant. AI agents investigate, correlate, and propose fixes — but they do so with full access to your system's context, history, and constraints.

This is the SRE equivalent of Karpathy's agentic engineering: the AI handles the implementation (investigation, correlation, hypothesis testing) while humans own the architecture (policies, guardrails, approval gates).

The three layers of Agentic SRE

  1. Unified Telemetry Layer (The "What")
     The system has direct access to your metrics, logs, traces, and events — not through copy-pasted snippets, but through real-time integrations with your observability stack. OpenTelemetry, Prometheus, Datadog, Elasticsearch, CloudWatch — the agent sees what your infrastructure sees.
  2. Reasoning Layer (The "Why")
     This is where Agentic SRE separates from both AIOps and Vibe SRE. The reasoning layer uses:
       • Retrieval-Augmented Generation (RAG) to pull relevant historical incidents and post-mortems
       • Service dependency graphs to assess blast radius and trace cascading failures
       • Deployment correlation to check whether recent code changes, config updates, or infrastructure modifications coincide with the incident
       • Institutional memory — the system knows that this alert pattern was seen before, what the root cause was, and what fixed it
  3. Action Layer (The "What Next")
     The agent proposes specific remediation steps, supported by evidence. But critically, it operates under policy-as-code guardrails: predefined rules governing which actions are safe, which require human approval, and which should never be automated. The action is verified before it's applied, and the outcome is validated after.
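At its simplest, a policy-as-code guardrail is a declarative mapping from action types to approval requirements, checked before anything executes. Here is a minimal sketch in Python — the action names, policy sets, and blast-radius threshold are illustrative assumptions, not any specific product's rules:

```python
# Minimal policy-as-code sketch: classify a proposed remediation before execution.
# Action names, policy sets, and the blast-radius threshold are illustrative.

FORBIDDEN = {"drop_database", "delete_volume"}    # never automate these
AUTO_APPROVED = {"restart_pod", "clear_cache"}    # safe, low-blast-radius actions
# Everything else (e.g. "rollback_deployment") goes to a human approval gate.

def evaluate_action(action: str, blast_radius: int) -> str:
    """Return 'deny', 'auto', or 'needs_approval' for a proposed action."""
    if action in FORBIDDEN:
        return "deny"
    if action in AUTO_APPROVED and blast_radius <= 1:  # touches at most one service
        return "auto"
    return "needs_approval"

print(evaluate_action("restart_pod", blast_radius=1))          # auto
print(evaluate_action("rollback_deployment", blast_radius=4))  # needs_approval
print(evaluate_action("drop_database", blast_radius=1))        # deny
```

Real platforms express these policies in richer forms (rate limits, per-environment rules, approved action sets), but the shape is the same: the policy is data that gates the agent, not a prompt the agent can talk its way around.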

The PEV Loop in SRE

Karpathy's agentic engineering relies on the PEV loop: Plan → Execute → Verify. In SRE, this translates to:

Plan

The agent receives an alert, consults the service dependency graph, reviews recent deployments, and formulates investigation hypotheses — all before taking any action.

Execute

The agent runs parallel investigations across multiple telemetry sources, correlates signals, and builds a narrative root cause analysis with an evidence trail.

Verify

The agent's conclusions are validated against the actual data. Confidence scores are attached. If the agent proposes a remediation, it goes through an approval gate. After execution, the system confirms that the fix actually worked.

No “prompt and hope.” No “sounds right, let's try it.” Structured investigation with built-in accountability.
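The Plan → Execute → Verify flow above can be sketched as a simple control loop. All the helper names, the stubbed findings, and the 0.8 confidence threshold below are hypothetical placeholders for illustration, not a real API:

```python
# Hedged sketch of a Plan -> Execute -> Verify loop for incident investigation.
# Function names, stub data, and the confidence threshold are assumptions.

def plan_investigation(alert: dict) -> list[str]:
    # Plan: consult the dependency graph and recent deploys to form hypotheses
    # (stubbed here as one hypothesis per suspect dependency).
    return [f"check {dep}" for dep in alert.get("suspect_dependencies", [])]

def execute(hypotheses: list[str]) -> dict:
    # Execute: run parallel investigations across telemetry sources
    # (stubbed as a fixed result with an evidence trail and confidence score).
    return {"root_cause": "connection pool exhaustion",
            "evidence": hypotheses, "confidence": 0.85}

def verify(findings: dict, threshold: float = 0.8) -> bool:
    # Verify: only conclusions backed by evidence above the threshold
    # proceed to the human approval gate.
    return bool(findings["evidence"]) and findings["confidence"] >= threshold

alert = {"service": "checkout", "suspect_dependencies": ["db-pool", "redis"]}
findings = execute(plan_investigation(alert))
if verify(findings):
    print("escalate to approval gate:", findings["root_cause"])
else:
    print("low confidence: widen the investigation")
```

The point of the structure is the gate at the end: a conclusion with no evidence or low confidence never reaches a human as a recommendation, let alone an automated action.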

What Agentic SRE tools look like in practice

At Sherlocks.ai, this translates to 16+ specialized agents — each trained for a specific domain (database, Kubernetes, networking, CI/CD, security) — that collaborate during an investigation. When an alert fires:

  1. The system immediately correlates the alert with recent deployments, infrastructure changes, and historical patterns
  2. The relevant Sherlocks are dispatched — the Database Sherlock, the Kubernetes Sherlock, the Code Analysis Sherlock — each investigating their domain in parallel
  3. Findings are synthesized into a narrative RCA: not just “CPU is high,” but “CPU spiked on pod checkout-v2-7b because the v2.4.1 deployment introduced an unindexed query on the orders table, following the same pattern we saw in Incident #847 on November 12.”
  4. Remediation options are proposed with blast radius assessment and supporting evidence
  5. The on-call engineer reviews, approves, and the system executes — or the engineer takes manual action with the full investigation context in hand
  6. Everything — the investigation path, the findings, the resolution — is captured as institutional memory for next time

This is the core difference: Agentic SRE tools arrive with context. Vibe SRE tools arrive with confidence.
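The fan-out in step 2 is essentially a parallel map over domain-specific investigators followed by a synthesis step. A toy sketch, with made-up agent names and canned findings standing in for live telemetry queries:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy sketch of dispatching domain agents in parallel and collecting findings.
# Agent names and return values are illustrative, not any vendor's actual API.

def database_agent(alert: dict) -> dict:
    return {"domain": "database", "finding": "unindexed query on orders table"}

def kubernetes_agent(alert: dict) -> dict:
    return {"domain": "kubernetes", "finding": "CPU spike on checkout pod"}

def code_agent(alert: dict) -> dict:
    return {"domain": "code", "finding": "introduced in deploy v2.4.1"}

AGENTS = [database_agent, kubernetes_agent, code_agent]

def investigate(alert: dict) -> list[dict]:
    # Each agent investigates its own domain concurrently; results come back
    # in dispatch order, ready to be synthesized into a narrative RCA.
    with ThreadPoolExecutor(max_workers=len(AGENTS)) as pool:
        return list(pool.map(lambda agent: agent(alert), AGENTS))

for f in investigate({"service": "checkout", "error": "503"}):
    print(f["domain"], "->", f["finding"])
```

In a real system each agent would query live metrics, logs, and history; the synthesis step would then stitch the per-domain findings into a single causal narrative like the one in step 3.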

For a thorough comparison of the leading platforms in this space, see our Top AI SRE Tools for 2026 guide, which evaluates eight tools across investigation depth, autonomy, integrations, and pricing.

The Checklist: Is Your Team Doing Vibe SRE or Agentic SRE?

| Dimension | Vibe SRE | Agentic SRE |
|---|---|---|
| Investigation | Paste alert into general-purpose LLM | Purpose-built agents traverse telemetry, dependencies, and history autonomously |
| System context | Zero — fresh prompt every time | Full — service graph, deployment history, infrastructure topology |
| Historical awareness | None — every incident starts from scratch | Institutional memory — similar incidents surface automatically |
| Verification | "Sounds right, let's try it" | Evidence trail, confidence scoring, human approval gate |
| Remediation | Engineer manually executes LLM's suggestion | Agent proposes fix with blast radius assessment; human approves; system executes |
| Guardrails | None | Policy-as-code, rate limiting, approved action sets |
| Learning | Nothing retained | Every incident feeds back into the knowledge graph |
| Accountability | "ChatGPT told me to" | Full audit trail of investigation steps, evidence, and decisions |

If you recognize your team in the left column, you're not alone. Most teams that claim to do “AI SRE” today are actually doing Vibe SRE. The good news: the path from vibe to agentic isn't about replacing your tools. It's about adding the missing layers — context, memory, verification, and guardrails.

Why This Distinction Matters Right Now

Three forces are converging to make this urgent:

1. The market is confusing AI-assisted with agentic.

The AIOps market is projected to grow from $14.6 billion in 2024 to over $36 billion by 2030. Every vendor now claims “AI SRE” or “AI-powered incident response.” But most of what's being sold is either AI-assisted SRE (bottom-right quadrant — anomaly detection and smart alerting) or a chatbot wrapper (top-left — Vibe SRE with a logo). Teams need a framework to evaluate what they're actually buying.

2. LLMs have made Vibe SRE dangerously easy.

Before ChatGPT, nobody would paste an alert into a text box and ask a machine what to do. Now it's the default behavior for a generation of engineers who grew up with AI assistants. The tool is so good at sounding confident that it obscures the lack of system context. The gap between “plausible answer” and “correct answer” is invisible until you're 45 minutes into a false investigation path during a production outage.

3. The human role is shifting, not shrinking.

Karpathy's key insight about agentic engineering applies directly: the human role becomes more important, not less. In Agentic SRE, engineers define policies and guardrails, architect the system's understanding, review investigation findings, and make high-stakes decisions. The toil disappears; the judgment stays. This is how SRE scales without scaling headcount.

Even Google's own SRE teams have started using Gemini CLI for incident response — but in a structured, agentic way, not a vibe way. They've built guardrails, integrated it with their internal systems, and embedded it in established workflows. If Google isn't doing Vibe SRE, you probably shouldn't be either.

The Bottom Line

Karpathy gave us the vocabulary. The shift from vibe coding to agentic engineering is the same shift that needs to happen in SRE.

Vibe SRE is tempting because it's easy. Paste an alert, get an answer, move on. But production reliability isn't a prototype you can throw away if it breaks. It's a $400 billion-a-year problem that demands the same rigor Karpathy is now advocating for in software development.

The question isn't whether AI belongs in SRE — it clearly does. The question is whether you're using it as a tool that understands your system, remembers your history, operates within guardrails, and learns from every incident — or as a magic 8-ball you shake at 3 AM and hope for the best.

Agentic SRE isn't a product category. It's a standard. And it's the one your team should be holding every “AI SRE” vendor to — including us.

Frequently Asked Questions

What is Agentic SRE?

Agentic SRE is an approach to site reliability engineering where AI agents autonomously investigate, correlate, and help resolve production incidents — but with deep system context, institutional memory, and human-defined guardrails. Unlike general-purpose AI assistants, Agentic SRE systems have direct access to your telemetry, service dependencies, deployment history, and past incident records. The term draws from Andrej Karpathy's "agentic engineering" concept: AI handles the implementation while humans own architecture, policy, and final decisions.

How is Agentic SRE different from "AI SRE"?

"AI SRE" is a broad category that includes everything from AI-enhanced alerting (AIOps) to chatbot wrappers to fully autonomous investigation platforms. "Agentic SRE" is a specific subset that requires three things: deep system context (not just alert text), a structured investigation loop with verification (not prompt-and-hope), and policy-governed guardrails for any recommended actions. Most products marketed as "AI SRE" today are either AI-assisted (enhanced alerting) or "Vibe SRE" (a general-purpose LLM with no system integration).

What is Vibe SRE?

Vibe SRE is the practice of using general-purpose AI tools like ChatGPT, Claude, or Gemini for incident response by pasting alert payloads and asking "what's wrong?" It mirrors Karpathy's "vibe coding" — fast and intuitive, but lacking system context, institutional memory, verification loops, and safety guardrails. It works for simple, isolated issues but is unreliable for complex production incidents where root causes span multiple services and historical context matters.

Can I just use Claude Code for incident response?

You can, but you'll be doing Vibe SRE. Claude Code is an exceptional agentic engineering tool for software development — it has deep context in the coding domain (codebase, git history, tests). But for SRE, it lacks the operational context that production incidents demand: live telemetry, service dependency graphs, historical incident patterns, and deployment correlation. Using Claude Code for incident response is like asking a world-class surgeon to diagnose without lab results. The expertise is real; the context is missing. For a detailed comparison, see our Claude Code vs Sherlocks analysis.

Is AIOps the same as Agentic SRE?

No. AIOps (bottom-right quadrant) focuses on enhancing observability — such as anomaly detection, noise reduction, and smart alerting. It gives you deep system understanding, though it doesn't close the "Recommendation Gap": the human still investigates and executes everything. Agentic SRE builds on the foundation of AIOps, adding autonomous investigation, causal reasoning, and policy-governed action. Think of AIOps as the "eyes" and Agentic SRE as the "eyes plus brain plus hands (with supervision)."

Will Agentic SRE replace SRE engineers?

No — and that's the point. Karpathy's insight about agentic engineering applies directly: the human role shifts but becomes more critical, not less. In Agentic SRE, engineers define policies and guardrails, architect the system's context model, review AI-generated root cause analyses, and make high-stakes remediation decisions. The toil of manual log correlation and dashboard diving disappears. The judgment, architectural thinking, and strategic reliability work stay. Agentic SRE is an "Iron Man suit" for SREs, not a replacement.

What did Karpathy say about agentic engineering?

In early 2026, Andrej Karpathy proposed "agentic engineering" as the evolution beyond vibe coding. His key statement: "You are not writing the code directly 99% of the time. You are orchestrating agents who do, and acting as oversight." He emphasized that agentic engineering is something people can learn and improve at — it's a discipline, not simply a tool. The distinction centers on retaining human responsibility for architecture, quality, and correctness while delegating implementation to AI agents operating within structured workflows.

Want to see Agentic SRE in action?

Stop pasting alerts into chatbots. See how Sherlocks.ai investigates incidents with full system context, institutional memory, and policy guardrails.

Book a Demo

Building a more resilient, autonomous ecosystem without the strain of traditional on-call work. © 2026 Sherlocks.ai. All rights reserved.