<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Subramanya N</title>
    <description>Welcome to my personal website, a space where I blog and share my musings on academic and professional topics. Dive in to explore my intellectual journey and gain insights from my experiences in various fields!</description>
    <link>https://subramanya.ai/</link>
    <atom:link href="https://subramanya.ai/feed.xml" rel="self" type="application/rss+xml" />
    <pubDate>Fri, 12 Jun 2026 01:53:51 +0000</pubDate>
    <lastBuildDate>Fri, 12 Jun 2026 01:53:51 +0000</lastBuildDate>
    <generator>Jekyll v4.4.1</generator>
    
      <item>
        <title>Context Engineering: Why Prompt Engineering Was Never Enough</title>
        <description>&lt;p&gt;For a while, “prompt engineering” was the name we gave to the craft of getting good results from large language models. It made sense in the early days. Most people were using one-shot interactions, and the main lever really did feel like wording: ask more clearly, add an example, constrain the format, and the model behaved better.&lt;/p&gt;

&lt;p&gt;That framing is now too small for the real problem.&lt;/p&gt;

&lt;p&gt;When an AI system fails in production, the issue is usually not that the model needed one more clever sentence in the system prompt. The issue is that the model did not see the right information, saw too much irrelevant information, saw the right information in the wrong format, or could not carry the right state forward from one step to the next. In other words, the problem was not just the prompt. The problem was the &lt;strong&gt;entire context pipeline&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That is why the term &lt;strong&gt;context engineering&lt;/strong&gt; has caught on. The phrase entered mainstream AI discussion in mid-2025, when Tobi Lütke and Andrej Karpathy argued that “prompt engineering” undersold the real work involved in building reliable LLM systems.[1] But the underlying discipline is older than the name. If you have built RAG, tool calling, memory systems, summarization, or evaluation loops, you have already done pieces of context engineering. What changed is that we finally have a name that describes the whole job.&lt;/p&gt;

&lt;h2 id=&quot;a-simple-mental-model&quot;&gt;A Simple Mental Model&lt;/h2&gt;

&lt;p&gt;If you want the simplest possible picture, context engineering is the layer between the outside world and the model’s working memory.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-mermaid&quot;&gt;flowchart TD
    U[&quot;User request&quot;] --&amp;gt; CE[&quot;Context engine&quot;]

    I[&quot;Instructions and policies&quot;] --&amp;gt; CE
    R[&quot;Retrieved knowledge&quot;] --&amp;gt; CE
    M[&quot;Memory and saved state&quot;] --&amp;gt; CE
    T[&quot;Tool definitions and results&quot;] --&amp;gt; CE
    H[&quot;Recent conversation history&quot;] --&amp;gt; CE

    CE --&amp;gt; W[&quot;Model context window&quot;]
    W --&amp;gt; L[&quot;LLM reasons and acts&quot;]
    L --&amp;gt; O[&quot;Answer or tool call&quot;]
    O --&amp;gt; S[&quot;New memory, logs, and state&quot;]
    S --&amp;gt; CE
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That is the whole game.&lt;/p&gt;

&lt;p&gt;The model is the reasoning engine. The context engine decides what the model gets to reason over.&lt;/p&gt;

&lt;h2 id=&quot;the-name-is-new-the-job-is-not&quot;&gt;The Name Is New. The Job Is Not.&lt;/h2&gt;

&lt;p&gt;One reason the term resonates is that it ties together several threads that had been evolving separately.&lt;/p&gt;

&lt;p&gt;Retrieval-Augmented Generation, or RAG, taught us that models need access to external knowledge at inference time.[2] ReAct taught us that reasoning and acting work better when models can call tools, observe results, and continue from there.[3] Memory research taught us that long-running assistants need indexing, retrieval, and reading strategies rather than endless transcript accumulation.[4] Long-context evaluation showed that simply stuffing more tokens into a model is not the same thing as giving it better working memory.[5][6][7]&lt;/p&gt;

&lt;p&gt;Seen this way, context engineering is not a replacement for those ideas. It is the umbrella above them.&lt;/p&gt;

&lt;p&gt;That umbrella matters because modern AI systems are no longer isolated prompts. They are dynamic systems that assemble instructions, documents, structured data, tool outputs, and prior state into a temporary context window for the next step. LangChain described this well when it defined context engineering as the work of providing the right information and tools in the right format so the LLM can plausibly complete the task.[8]&lt;/p&gt;

&lt;p&gt;The phrase “plausibly complete the task” is doing a lot of work there. It is the right test.&lt;/p&gt;

&lt;p&gt;If an agent fails, the first question should not be, “How do I make the prompt smarter?”&lt;/p&gt;

&lt;p&gt;The first question should be, “Did I actually give the model what it needed to succeed?”&lt;/p&gt;

&lt;h2 id=&quot;why-prompt-engineering-became-too-small&quot;&gt;Why Prompt Engineering Became Too Small&lt;/h2&gt;

&lt;p&gt;Prompt engineering still matters. It just became a subset of a larger discipline.&lt;/p&gt;

&lt;p&gt;The old mental model was:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Prompt engineering&lt;/th&gt;
      &lt;th&gt;Context engineering&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Write better instructions&lt;/td&gt;
      &lt;td&gt;Build the full information environment&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Focus on a single request&lt;/td&gt;
      &lt;td&gt;Focus on multi-step systems&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Mostly static&lt;/td&gt;
      &lt;td&gt;Dynamic and stateful&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Optimize wording&lt;/td&gt;
      &lt;td&gt;Optimize selection, structure, memory, and tools&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Improve a single model call&lt;/td&gt;
      &lt;td&gt;Improve the whole loop&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;This distinction becomes obvious the moment you build an agent.&lt;/p&gt;

&lt;p&gt;Suppose you are building a support agent for enterprise software. The user asks, “Why are our API requests timing out?”&lt;/p&gt;

&lt;p&gt;If you think only in prompt terms, you might improve the wording:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Ask the model to be concise&lt;/li&gt;
  &lt;li&gt;Ask it to cite evidence&lt;/li&gt;
  &lt;li&gt;Ask it to think step by step&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are fine improvements. But they are not enough.&lt;/p&gt;

&lt;p&gt;The real system questions are harder:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Does the agent have access to the incident runbooks?&lt;/li&gt;
  &lt;li&gt;Can it see the latest logs and status pages?&lt;/li&gt;
  &lt;li&gt;Does it know which customer tier this account belongs to?&lt;/li&gt;
  &lt;li&gt;Does it remember earlier turns in the conversation?&lt;/li&gt;
  &lt;li&gt;Can it query the ticket system?&lt;/li&gt;
  &lt;li&gt;Can it distinguish stale documents from current ones?&lt;/li&gt;
  &lt;li&gt;If it gets too much context, what gets trimmed?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is context engineering.&lt;/p&gt;

&lt;p&gt;The prompt is one line item inside it.&lt;/p&gt;

&lt;h2 id=&quot;what-counts-as-context&quot;&gt;What Counts as Context&lt;/h2&gt;

&lt;p&gt;In practice, context includes everything the model sees at inference time, not just the visible prompt.[8][9]&lt;/p&gt;

&lt;p&gt;That usually means:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;System instructions&lt;/li&gt;
  &lt;li&gt;The current user request&lt;/li&gt;
  &lt;li&gt;Retrieved documents&lt;/li&gt;
  &lt;li&gt;Structured data like JSON, tables, schemas, and records&lt;/li&gt;
  &lt;li&gt;Tool definitions&lt;/li&gt;
  &lt;li&gt;Tool outputs&lt;/li&gt;
  &lt;li&gt;Recent conversation history&lt;/li&gt;
  &lt;li&gt;Long-term memory or saved notes&lt;/li&gt;
  &lt;li&gt;Security, policy, and formatting constraints&lt;/li&gt;
  &lt;li&gt;Environment state such as files, tabs, tickets, or working directories&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why the phrase “filling the context window” has become so central. The context window is not just a place where text goes. It is the model’s temporary working memory. Everything that enters it competes for attention.&lt;/p&gt;

&lt;p&gt;And competition is the key word.&lt;/p&gt;

&lt;p&gt;Every extra token is not merely additional information. It is also additional distraction.&lt;/p&gt;

&lt;h2 id=&quot;why-bigger-context-windows-did-not-solve-the-problem&quot;&gt;Why Bigger Context Windows Did Not Solve the Problem&lt;/h2&gt;

&lt;p&gt;One of the most common misconceptions in the current AI market is that larger context windows made context engineering less important.&lt;/p&gt;

&lt;p&gt;The research points in the opposite direction.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Lost in the Middle&lt;/em&gt; showed that models often use long contexts unevenly, performing better when relevant information appears near the beginning or end and worse when important information sits in the middle.[5] Databricks’ long-context RAG study found that while adding more retrieved documents can help, only a small number of state-of-the-art models maintained strong performance above 64k tokens.[6] Chroma’s &lt;em&gt;Context Rot&lt;/em&gt; report went even further: even simple tasks become less reliable as input length grows, especially when ambiguity and distractors are introduced.[7]&lt;/p&gt;

&lt;p&gt;This is the part many teams learn the hard way.&lt;/p&gt;

&lt;p&gt;Bigger windows do not eliminate the need to choose. They make the cost of bad choices less obvious at first and more painful later.&lt;/p&gt;

&lt;p&gt;A long prompt can fail in at least four different ways:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Context poisoning&lt;/strong&gt;: a bad fact, hallucination, or outdated result gets carried forward.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Context distraction&lt;/strong&gt;: too much relevant-but-not-critical detail overwhelms the core task.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Context confusion&lt;/strong&gt;: different pieces of context contradict each other.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Context waste&lt;/strong&gt;: useful tokens are buried under redundant or low-value material.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is why context engineering is not about maximizing tokens. It is about maximizing &lt;strong&gt;signal density&lt;/strong&gt; inside the context window.&lt;/p&gt;

&lt;h2 id=&quot;from-retrieval-to-navigation&quot;&gt;From Retrieval to Navigation&lt;/h2&gt;

&lt;p&gt;This is where one of the best recent ideas enters the picture.&lt;/p&gt;

&lt;p&gt;Jason Liu argued that the next step after classic chunk-based RAG is to stop thinking only about “the most similar passages” and start thinking about the &lt;strong&gt;shape of the search space&lt;/strong&gt;.[10] His framing is especially useful because it maps out a progression that many teams are already moving through:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Minimal chunks&lt;/li&gt;
  &lt;li&gt;Chunks with source metadata&lt;/li&gt;
  &lt;li&gt;Better handling for multimodal and structured content&lt;/li&gt;
  &lt;li&gt;Facets and query refinement&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The first three are improvements in what gets retrieved.&lt;/p&gt;

&lt;p&gt;The fourth is more interesting. It improves what the agent learns &lt;strong&gt;about the corpus itself&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Facets give the model something like peripheral vision. Instead of returning only the top few chunks, the system can also return aggregated metadata:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Which document types dominate the result set&lt;/li&gt;
  &lt;li&gt;Which teams or owners appear most often&lt;/li&gt;
  &lt;li&gt;Which dates cluster together&lt;/li&gt;
  &lt;li&gt;Which categories are present but underrepresented in the top results&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That matters because similarity search is biased toward what is easiest to match, not necessarily what is most important to inspect.[10] A retrieval system may over-surface well-documented resolved incidents and under-surface sparse, still-open incidents. A legal search may over-surface signed contracts and hide the unsigned ones that actually need attention. Facets help the agent see not just “what matched,” but “what else exists nearby.”&lt;/p&gt;

&lt;p&gt;This is a major conceptual shift.&lt;/p&gt;

&lt;p&gt;RAG was mostly about retrieval.&lt;/p&gt;

&lt;p&gt;Context engineering is increasingly about &lt;strong&gt;navigation&lt;/strong&gt;.&lt;/p&gt;

&lt;h2 id=&quot;the-six-jobs-of-context-engineering&quot;&gt;The Six Jobs of Context Engineering&lt;/h2&gt;

&lt;p&gt;The easiest way to make context engineering concrete is to break it into the actual jobs it performs.&lt;/p&gt;

&lt;h3 id=&quot;1-selection&quot;&gt;1. Selection&lt;/h3&gt;

&lt;p&gt;The first job is deciding what deserves to enter the window at all.&lt;/p&gt;

&lt;p&gt;This includes retrieval, ranking, filtering, source choice, and freshness checks. It sounds obvious, but it is still where a huge amount of quality is won or lost. Benchmarks like BRIGHT show that realistic retrieval is much harder than surface-level semantic matching suggests.[11] If your retrieval quality is weak, no amount of downstream prompt polishing will fully save the result.&lt;/p&gt;

&lt;p&gt;Selection is not just “find relevant chunks.” It is:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;choose the right source&lt;/li&gt;
  &lt;li&gt;choose the right granularity&lt;/li&gt;
  &lt;li&gt;choose the right amount&lt;/li&gt;
  &lt;li&gt;choose the right ordering&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Good systems often retrieve less than naive systems, but retrieve it more intentionally.&lt;/p&gt;

&lt;h3 id=&quot;2-structure&quot;&gt;2. Structure&lt;/h3&gt;

&lt;p&gt;The second job is deciding how the chosen context is represented.&lt;/p&gt;

&lt;p&gt;The same information can be helpful or useless depending on formatting. Anthropic’s tool-use guidance is explicit about this: tool descriptions and interfaces strongly shape model behavior.[9] Long-context prompting guidance makes similar recommendations for XML tagging, source labeling, and clearly separated document sections.[12]&lt;/p&gt;

&lt;p&gt;In practice, structure means:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;label sources&lt;/li&gt;
  &lt;li&gt;separate instructions from data&lt;/li&gt;
  &lt;li&gt;wrap complex documents in consistent markup&lt;/li&gt;
  &lt;li&gt;preserve tables as tables when they matter&lt;/li&gt;
  &lt;li&gt;return citations and metadata with evidence&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A short, well-labeled result often outperforms a giant JSON blob.&lt;/p&gt;

&lt;h3 id=&quot;3-compression&quot;&gt;3. Compression&lt;/h3&gt;

&lt;p&gt;The third job is reducing context without destroying what matters.&lt;/p&gt;

&lt;p&gt;This is where a lot of agent systems either get much better or much worse.&lt;/p&gt;

&lt;p&gt;Compression can mean:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;summarizing earlier turns&lt;/li&gt;
  &lt;li&gt;trimming stale history&lt;/li&gt;
  &lt;li&gt;keeping only the last few user turns verbatim&lt;/li&gt;
  &lt;li&gt;extracting durable facts from long threads&lt;/li&gt;
  &lt;li&gt;caching stable prefixes to reduce cost and latency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;OpenAI’s prompt caching documentation shows that prompt order matters economically as well as cognitively: static shared prefixes are cheaper and faster when placed up front because cache hits depend on exact prefix reuse.[13] OpenAI’s newer Responses API work on compaction pushes the same idea further by treating long-running agent history as something that should be compressed into a more token-efficient representation before the window fills up.[14]&lt;/p&gt;

&lt;p&gt;Compression is not optional. The only question is whether you do it deliberately or let the context window degrade on its own.&lt;/p&gt;

&lt;h3 id=&quot;4-memory&quot;&gt;4. Memory&lt;/h3&gt;

&lt;p&gt;The fourth job is deciding what should persist beyond the current turn.&lt;/p&gt;

&lt;p&gt;This is where many teams make the same mistake: they confuse memory with transcript retention.&lt;/p&gt;

&lt;p&gt;But good memory is not “keep everything forever.” LongMemEval frames long-term memory as a three-stage problem: indexing, retrieval, and reading.[4] That is the right way to think about it. A memory system should help the model recover the right prior fact at the right moment, not drown it in the complete past.&lt;/p&gt;

&lt;p&gt;This leads to a useful distinction:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Working memory&lt;/strong&gt;: the short-term context needed for the current task&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Reference memory&lt;/strong&gt;: externalized facts, summaries, notes, or artifacts that can be reloaded later&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If everything stays in working memory, the model gets distracted.
If everything gets pushed out, the model loses continuity.&lt;/p&gt;

&lt;p&gt;Context engineering decides what belongs in each layer.&lt;/p&gt;

&lt;h3 id=&quot;5-tool-and-interface-design&quot;&gt;5. Tool and Interface Design&lt;/h3&gt;

&lt;p&gt;The fifth job is making tools legible to the model.&lt;/p&gt;

&lt;p&gt;This is an underappreciated part of the discipline. A tool surface is not just software API design. It is also context design.&lt;/p&gt;

&lt;p&gt;The model needs to understand:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;what the tool does&lt;/li&gt;
  &lt;li&gt;when to use it&lt;/li&gt;
  &lt;li&gt;what each parameter means&lt;/li&gt;
  &lt;li&gt;what the output implies&lt;/li&gt;
  &lt;li&gt;what to do next after seeing the result&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why tool descriptions matter so much.[9] It is also why Jason Liu’s emphasis on tool results is important.[10] The output of a tool does not merely answer the current query. It teaches the agent how to think about the next query.&lt;/p&gt;

&lt;p&gt;When the tool surface becomes standardized through a protocol like MCP, this becomes even more important. MCP makes it easier to connect tools, resources, and prompts to LLM applications, but it does not decide what information should be surfaced, how it should be filtered, or how much of it should be injected into the next model call.[15] The protocol is the plumbing. Context engineering is still the craft.&lt;/p&gt;

&lt;h3 id=&quot;6-isolation-and-orchestration&quot;&gt;6. Isolation and Orchestration&lt;/h3&gt;

&lt;p&gt;The sixth job is deciding when not to share context.&lt;/p&gt;

&lt;p&gt;This is one of the biggest differences between toy demos and production agents.&lt;/p&gt;

&lt;p&gt;Sometimes the right answer is not a larger shared prompt. It is multiple smaller prompts with isolated scopes.&lt;/p&gt;

&lt;p&gt;Anthropic’s multi-agent research system is a strong example.[16] Their subagents run in parallel with separate context windows, which helps them explore different branches of a problem without contaminating each other with every intermediate detail. LangChain describes a similar pattern under “isolate”: sometimes the best way to improve agent reliability is to split contexts rather than accumulate them.[17]&lt;/p&gt;

&lt;p&gt;This matters because shared context has a hidden cost. It creates path dependence. A single bad branch can influence the next step, and the next, and the next.&lt;/p&gt;

&lt;p&gt;Isolation is a way to limit blast radius.&lt;/p&gt;

&lt;h2 id=&quot;what-changed-in-2026&quot;&gt;What Changed in 2026&lt;/h2&gt;

&lt;p&gt;In 2025, context engineering was mostly a useful name for a problem people already felt. In 2026, it is starting to harden into an architecture.&lt;/p&gt;

&lt;p&gt;The first big shift is that builders are moving durable state &lt;strong&gt;outside&lt;/strong&gt; the raw context window. Anthropic’s context editing and memory tool explicitly separate what stays live in the working window from what should persist across sessions.[18] OpenAI’s January 2026 cookbook on personalization makes the same move in a different form: structured state objects that persist across runs and are deliberately injected back into working memory at the start of each run.[19] OpenAI’s Responses API then pushes this one step further with native compaction, so long-running agent loops do not require every team to build a custom summarization subsystem from scratch.[14]&lt;/p&gt;

&lt;p&gt;Anthropic’s Managed Agents makes the underlying pattern unusually explicit: &lt;strong&gt;the session is not the model’s context window&lt;/strong&gt;.[20] That is a critical 2026 idea. The window is transient working memory. The session log is the durable object. The harness decides how to slice, compact, and rehydrate that durable context back into the next model call.&lt;/p&gt;

&lt;p&gt;The second shift is that retrieval is becoming more &lt;strong&gt;just in time&lt;/strong&gt; and more interface-native. Instead of front-loading every possibly relevant token, teams are giving agents retrieval surfaces they already know how to operate. Mintlify’s ChromaFs is a good example: rather than booting a full sandbox for documentation retrieval, it presents docs as a virtual filesystem navigable with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ls&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cat&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;grep&lt;/code&gt;, cutting p90 session creation from about 46 seconds to about 100 milliseconds.[21] Turso’s AgentFS pushes the same intuition toward general agent execution: a copy-on-write filesystem abstraction with portable single-file storage and built-in auditing.[22]&lt;/p&gt;

&lt;p&gt;The third shift is that &lt;strong&gt;context graphs&lt;/strong&gt; are becoming an implementation direction, not just a metaphor. Foundation Capital’s thesis made the term visible, but the stronger claim is architectural: when agents sit in the execution path, they can capture decision traces as durable artifacts, not just emit final outputs.[26][27] Open-source systems like Graphiti and commercial platforms like Zep operationalize this as temporal context graphs with validity windows, provenance episodes, and hybrid retrieval across semantics, keywords, and graph structure.[23] TrustGraph takes a related approach by treating context as a versioned artifact: graph, embeddings, evidence, and policies bundled into portable “context cores” that can be promoted or rolled back like build outputs.[24][25]&lt;/p&gt;

&lt;p&gt;The fourth shift is that context engineering is now visible in real software practice, not just platform blogs. The 2026 MSR paper on context engineering in open-source software studied 466 repositories and found that AI context files such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AGENTS.md&lt;/code&gt; are spreading, but with no stable content structure yet.[28] That matters because it marks a move from theory to operational artifacts. Context is no longer just something inferred at runtime. It is being authored, versioned, reviewed, and mined as part of the software lifecycle.&lt;/p&gt;

&lt;p&gt;If you want the 2026 mental model in one picture, it looks like this:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-mermaid&quot;&gt;flowchart LR
    E[&quot;Session log / events&quot;] --&amp;gt; A[&quot;Context assembler&quot;]
    F[&quot;Files, docs, and tools&quot;] --&amp;gt; A
    G[&quot;Context graph / memory&quot;] --&amp;gt; A
    P[&quot;Policies and AGENTS.md&quot;] --&amp;gt; A

    A --&amp;gt; W[&quot;Working context window&quot;]
    W --&amp;gt; X[&quot;Agent action&quot;]

    X --&amp;gt; E
    X --&amp;gt; G
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That is a very different architecture from “prompt + vector search.”&lt;/p&gt;

&lt;h2 id=&quot;where-context-graphs-actually-fit&quot;&gt;Where Context Graphs Actually Fit&lt;/h2&gt;

&lt;p&gt;One reason this conversation gets muddy is that people use &lt;strong&gt;context engineering&lt;/strong&gt; and &lt;strong&gt;context graph&lt;/strong&gt; as if they mean the same thing. They do not.&lt;/p&gt;

&lt;p&gt;Context engineering is the broader discipline. It is the work of deciding what goes into the next context window, what stays out, what gets compressed, and what gets retrieved on demand.&lt;/p&gt;

&lt;p&gt;A context graph is one possible long-term memory substrate inside that larger system.&lt;/p&gt;

&lt;p&gt;That distinction matters because not every useful agent needs a context graph. A documentation assistant over mostly static content may need good retrieval, tool design, and compaction, but not a graph. A coding agent may get surprisingly far with repository instructions, a durable session log, and a filesystem abstraction.[20][21][22][28]&lt;/p&gt;

&lt;p&gt;Context graphs become compelling when the problem has four characteristics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Temporal truth matters.&lt;/strong&gt; You need to know not just what is true now, but what was true at decision time.[23]&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Provenance matters.&lt;/strong&gt; You need to trace facts back to the episode, document, or interaction that produced them.[23][24]&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Precedent matters.&lt;/strong&gt; The task depends on how similar cases were handled before, including exceptions and approvals.[26][27]&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Cross-entity reasoning matters.&lt;/strong&gt; The useful memory is not a flat note, but a network of people, policies, incidents, accounts, tickets, and outcomes.[23][25]&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why the best definition of a context graph, in my view, is not “a graph database for AI.” It is a &lt;strong&gt;durable representation of precedent&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That is also why decision traces matter so much. Foundation Capital’s framing is useful here: rules tell the agent what should happen in general; decision traces tell it what happened in a specific case, under real constraints, with real exceptions.[26] Once those traces are linked across entities and time, you get something much more valuable than generic memory. You get searchable judgment.&lt;/p&gt;

&lt;h2 id=&quot;how-i-would-build-it-in-2026&quot;&gt;How I Would Build It in 2026&lt;/h2&gt;

&lt;p&gt;If I were building a serious context-engineering stack today, I would not start with the graph. I would start with the interfaces and promotion rules.&lt;/p&gt;

&lt;h3 id=&quot;1-build-a-durable-session-layer-first&quot;&gt;1. Build a durable session layer first&lt;/h3&gt;

&lt;p&gt;Every action, tool result, observation, and important intermediate artifact should land in an append-only session log or event store. This is your recoverable context object.[14][20]&lt;/p&gt;

&lt;p&gt;Do not confuse the active context window with the source of truth.&lt;/p&gt;

&lt;p&gt;The window is for reasoning.
The session is for recovery, replay, debugging, and selective rehydration.&lt;/p&gt;

&lt;h3 id=&quot;2-treat-the-context-assembler-as-a-product-surface&quot;&gt;2. Treat the context assembler as a product surface&lt;/h3&gt;

&lt;p&gt;The assembler should explicitly manage:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;token budgets&lt;/li&gt;
  &lt;li&gt;source priority&lt;/li&gt;
  &lt;li&gt;freshness&lt;/li&gt;
  &lt;li&gt;compaction thresholds&lt;/li&gt;
  &lt;li&gt;history trimming&lt;/li&gt;
  &lt;li&gt;citation formatting&lt;/li&gt;
  &lt;li&gt;cache-aware ordering&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the layer that decides what the model sees &lt;em&gt;now&lt;/em&gt;. It should be observable, testable, and cheap to change.[18][19][14]&lt;/p&gt;

&lt;h3 id=&quot;3-prefer-just-in-time-retrieval-over-eager-stuffing&quot;&gt;3. Prefer just-in-time retrieval over eager stuffing&lt;/h3&gt;

&lt;p&gt;Give the model lightweight handles first: file paths, object IDs, URLs, query templates, ticket IDs, incident IDs. Then let it pull detail only when needed.[9][18][21]&lt;/p&gt;

&lt;p&gt;This is where filesystems, MCP tools, search APIs, and structured queries become more valuable than giant top-K dumps.&lt;/p&gt;

&lt;h3 id=&quot;4-promote-only-high-value-state-into-long-term-memory&quot;&gt;4. Promote only high-value state into long-term memory&lt;/h3&gt;

&lt;p&gt;Not everything should become memory.&lt;/p&gt;

&lt;p&gt;I would promote four classes of artifacts:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;stable user or account preferences&lt;/li&gt;
  &lt;li&gt;durable facts with provenance&lt;/li&gt;
  &lt;li&gt;important intermediate summaries&lt;/li&gt;
  &lt;li&gt;decision traces and exceptions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything else should stay in the session log until it proves it deserves promotion.&lt;/p&gt;

&lt;h3 id=&quot;5-build-the-context-graph-as-a-promoted-memory-layer&quot;&gt;5. Build the context graph as a promoted memory layer&lt;/h3&gt;

&lt;p&gt;This is the part many teams invert.&lt;/p&gt;

&lt;p&gt;The graph should not be your raw transcript in graph form. It should be the curated layer that sits above sessions and below real-time assembly:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;entities&lt;/li&gt;
  &lt;li&gt;relationships&lt;/li&gt;
  &lt;li&gt;time validity&lt;/li&gt;
  &lt;li&gt;source episodes&lt;/li&gt;
  &lt;li&gt;approvals&lt;/li&gt;
  &lt;li&gt;exceptions&lt;/li&gt;
  &lt;li&gt;outcomes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you skip the promotion step, the graph becomes a dumping ground.
If you get promotion right, the graph becomes the memory of how the organization actually reasons.[23][26]&lt;/p&gt;

&lt;h3 id=&quot;6-package-context-like-code&quot;&gt;6. Package context like code&lt;/h3&gt;

&lt;p&gt;By 2026, one of the most promising ideas is to treat context as a versioned artifact. In software projects this shows up as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AGENTS.md&lt;/code&gt; and other repository-specific context files.[28] In graph-native systems it shows up as context cores: portable bundles of ontology, graph structure, embeddings, provenance, and retrieval policy.[24][25]&lt;/p&gt;

&lt;p&gt;This matters because context changes need the same operational discipline as code changes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;review&lt;/li&gt;
  &lt;li&gt;versioning&lt;/li&gt;
  &lt;li&gt;rollback&lt;/li&gt;
  &lt;li&gt;environment promotion&lt;/li&gt;
  &lt;li&gt;evaluation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once context becomes an artifact, it becomes governable.&lt;/p&gt;

&lt;h3 id=&quot;7-separate-observability-from-intelligence&quot;&gt;7. Separate observability from intelligence&lt;/h3&gt;

&lt;p&gt;You need both:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;observability of the agent run&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;observability of the context system&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are not the same thing.&lt;/p&gt;

&lt;p&gt;I want to know:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;what the model saw&lt;/li&gt;
  &lt;li&gt;what it did not see&lt;/li&gt;
  &lt;li&gt;what got compacted&lt;/li&gt;
  &lt;li&gt;what was retrieved just in time&lt;/li&gt;
  &lt;li&gt;what got promoted into memory&lt;/li&gt;
  &lt;li&gt;what graph neighborhood was traversed&lt;/li&gt;
  &lt;li&gt;which precedent actually influenced the action&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you cannot answer those questions, you are still debugging prompts in the dark.&lt;/p&gt;

&lt;h2 id=&quot;a-practical-maturity-model&quot;&gt;A Practical Maturity Model&lt;/h2&gt;

&lt;p&gt;If you are trying to evaluate where your own system stands, this maturity model is more useful than abstract definitions.&lt;/p&gt;

&lt;h3 id=&quot;level-0-prompt-only&quot;&gt;Level 0: Prompt-Only&lt;/h3&gt;

&lt;p&gt;You have a system prompt, a user message, and maybe a couple of examples.&lt;/p&gt;

&lt;p&gt;This can work surprisingly well for narrow tasks. It breaks quickly when the task requires fresh knowledge, persistence, or tools.&lt;/p&gt;

&lt;h3 id=&quot;level-1-retrieval-enhanced&quot;&gt;Level 1: Retrieval-Enhanced&lt;/h3&gt;

&lt;p&gt;You add documents at runtime.&lt;/p&gt;

&lt;p&gt;This is where many teams stop. It is also where many teams start seeing the limitations of naive chunking, ranking, and context bloat.&lt;/p&gt;

&lt;h3 id=&quot;level-2-agent-aware&quot;&gt;Level 2: Agent-Aware&lt;/h3&gt;

&lt;p&gt;You now manage history, tool results, memory, and formatting intentionally.&lt;/p&gt;

&lt;p&gt;This is the first level where “context engineering” becomes a useful term, because the system is no longer just prompt plus retrieval. It is assembling multiple forms of context dynamically.&lt;/p&gt;

&lt;h3 id=&quot;level-3-adaptive&quot;&gt;Level 3: Adaptive&lt;/h3&gt;

&lt;p&gt;The system changes how it builds context based on the task.&lt;/p&gt;

&lt;p&gt;It may:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;choose among sources&lt;/li&gt;
  &lt;li&gt;compress older history&lt;/li&gt;
  &lt;li&gt;reload memory selectively&lt;/li&gt;
  &lt;li&gt;route work to specialized tools&lt;/li&gt;
  &lt;li&gt;isolate subproblems into separate contexts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At this point, context construction is part of the application’s core logic.&lt;/p&gt;

&lt;h3 id=&quot;level-4-context-native&quot;&gt;Level 4: Context-Native&lt;/h3&gt;

&lt;p&gt;The system treats context as a first-class engineering surface.&lt;/p&gt;

&lt;p&gt;It has:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;explicit context budgets&lt;/li&gt;
  &lt;li&gt;retrieval and generation evals&lt;/li&gt;
  &lt;li&gt;metadata and facet-aware navigation&lt;/li&gt;
  &lt;li&gt;memory policies&lt;/li&gt;
  &lt;li&gt;observability around failure modes&lt;/li&gt;
  &lt;li&gt;cost-aware prompt assembly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where the strongest production systems are heading.&lt;/p&gt;

&lt;h2 id=&quot;what-good-context-engineering-looks-like-in-practice&quot;&gt;What Good Context Engineering Looks Like in Practice&lt;/h2&gt;

&lt;p&gt;If I had to reduce the whole discipline to a checklist, it would look like this:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Start with the task, not the prompt. Define what success looks like first.&lt;/li&gt;
  &lt;li&gt;Enumerate the context sources the model might need. Instructions, docs, tools, memory, state, policies.&lt;/li&gt;
  &lt;li&gt;Separate working memory from reference memory. Not everything should live in the active window.&lt;/li&gt;
  &lt;li&gt;Retrieve with intent. More chunks is not the same as better recall.&lt;/li&gt;
  &lt;li&gt;Structure context so the model can parse it quickly. Labels, sources, tables, and boundaries matter.&lt;/li&gt;
  &lt;li&gt;Design tools as if they are part of the prompt, because they are.&lt;/li&gt;
  &lt;li&gt;Trim aggressively. If you would not ask a human to reread it, do not force the model to reread it.&lt;/li&gt;
  &lt;li&gt;Measure retrieval and generation separately. Otherwise you will diagnose the wrong problem.&lt;/li&gt;
  &lt;li&gt;Use isolated contexts when tasks branch or can run in parallel.&lt;/li&gt;
  &lt;li&gt;Promote durable facts and decision traces intentionally. Not every transcript belongs in long-term memory.&lt;/li&gt;
  &lt;li&gt;Package critical context like code. Instructions, policies, and graph artifacts should be versioned.&lt;/li&gt;
  &lt;li&gt;Treat context bugs like software bugs. They should be observable, reproducible, and fixable.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;None of this is glamorous. That is exactly why it matters.&lt;/p&gt;

&lt;p&gt;Prompt engineering became popular because it sounded like a shortcut.&lt;/p&gt;

&lt;p&gt;Context engineering matters because it describes the actual work.&lt;/p&gt;

&lt;h2 id=&quot;the-real-takeaway&quot;&gt;The Real Takeaway&lt;/h2&gt;

&lt;p&gt;The center of gravity in AI is moving.&lt;/p&gt;

&lt;p&gt;The frontier question used to be: &lt;strong&gt;How smart is the model?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The applied question is increasingly: &lt;strong&gt;What does the model get to see before it has to act?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is a different engineering problem. It is less about single prompts and more about systems design. Less about phrasing and more about information flow. Less about one-shot output quality and more about whether an agent can stay reliable over time.&lt;/p&gt;

&lt;p&gt;This is why context engineering is going to keep growing as a discipline. The better models get, the more the remaining failures look like context failures. Missing state. Wrong tool. Bad retrieval. Bloated history. Poor formatting. Conflicting evidence. Weak memory. Unbounded loops.&lt;/p&gt;

&lt;p&gt;The irony is that this makes AI systems feel more like classical software, not less. We are back to building pipelines, interfaces, state machines, memory hierarchies, caches, and observability layers. The novelty is that all of those pieces now exist in service of a probabilistic reasoning engine.&lt;/p&gt;

&lt;p&gt;The name may be new. The direction is not.&lt;/p&gt;

&lt;p&gt;Reliable AI systems will be built by teams that treat context as a first-class product surface.&lt;/p&gt;

&lt;p&gt;Everyone else will keep calling the model flaky.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;[1] &lt;a href=&quot;https://simonwillison.net/2025/Jun/27/context-engineering/&quot;&gt;Simon Willison. (2025, June 27). &lt;em&gt;Context engineering&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[2] &lt;a href=&quot;https://arxiv.org/abs/2005.11401&quot;&gt;Lewis, P. et al. (2020). &lt;em&gt;Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[3] &lt;a href=&quot;https://arxiv.org/abs/2210.03629&quot;&gt;Yao, S. et al. (2023). &lt;em&gt;ReAct: Synergizing Reasoning and Acting in Language Models&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[4] &lt;a href=&quot;https://arxiv.org/abs/2410.10813&quot;&gt;Wu, D. et al. (2025). &lt;em&gt;LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[5] &lt;a href=&quot;https://arxiv.org/abs/2307.03172&quot;&gt;Liu, N. F. et al. (2023). &lt;em&gt;Lost in the Middle: How Language Models Use Long Contexts&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[6] &lt;a href=&quot;https://arxiv.org/abs/2411.03538&quot;&gt;Leng, Q. et al. (2024). &lt;em&gt;Long Context RAG Performance of Large Language Models&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[7] &lt;a href=&quot;https://www.trychroma.com/research/context-rot&quot;&gt;Hong, K., Troynikov, A., and Huber, J. (2025, July 14). &lt;em&gt;Context Rot: How Increasing Input Tokens Impacts LLM Performance&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[8] &lt;a href=&quot;https://blog.langchain.com/the-rise-of-context-engineering&quot;&gt;LangChain. (2025, June 23). &lt;em&gt;The rise of “context engineering”&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[9] &lt;a href=&quot;https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/implement-tool-use&quot;&gt;Anthropic. &lt;em&gt;How to implement tool use&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[10] &lt;a href=&quot;https://jxnl.co/writing/2025/08/27/facets-context-engineering/&quot;&gt;Jason Liu. (2025, August 27). &lt;em&gt;Beyond Chunks: Why Context Engineering is the Future of RAG&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[11] &lt;a href=&quot;https://arxiv.org/abs/2407.12883&quot;&gt;Su, H. et al. (2025). &lt;em&gt;BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[12] &lt;a href=&quot;https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/long-context-tips&quot;&gt;Anthropic. &lt;em&gt;Long context prompting tips&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[13] &lt;a href=&quot;https://platform.openai.com/docs/guides/prompt-caching&quot;&gt;OpenAI. &lt;em&gt;Prompt caching&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[14] &lt;a href=&quot;https://openai.com/index/equip-responses-api-computer-environment&quot;&gt;OpenAI. (2026, March 19). &lt;em&gt;From model to agent: Equipping the Responses API with a computer environment&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[15] &lt;a href=&quot;https://modelcontextprotocol.io/&quot;&gt;Model Context Protocol. &lt;em&gt;What is the Model Context Protocol (MCP)?&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[16] &lt;a href=&quot;https://www.anthropic.com/engineering/built-multi-agent-research-system&quot;&gt;Anthropic. (2025, June 13). &lt;em&gt;How we built our multi-agent research system&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[17] &lt;a href=&quot;https://blog.langchain.com/context-engineering-for-agents/&quot;&gt;LangChain. (2025, July 2). &lt;em&gt;Context Engineering&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[18] &lt;a href=&quot;https://claude.com/blog/context-management&quot;&gt;Anthropic. (2025, September 29). &lt;em&gt;Managing context on the Claude Developer Platform&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[19] &lt;a href=&quot;https://developers.openai.com/cookbook/examples/agents_sdk/context_personalization&quot;&gt;Okcular, E. (2026, January 5). &lt;em&gt;Context Engineering for Personalization - State Management with Long-Term Memory Notes using OpenAI Agents SDK&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[20] &lt;a href=&quot;https://www.anthropic.com/engineering/managed-agents&quot;&gt;Anthropic. &lt;em&gt;Scaling Managed Agents: Decoupling the brain from the hands&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[21] &lt;a href=&quot;https://www.mintlify.com/blog/how-we-built-a-virtual-filesystem-for-our-assistant&quot;&gt;Mintlify. (2026, March 24). &lt;em&gt;How we built a virtual filesystem for our Assistant&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[22] &lt;a href=&quot;https://docs.turso.tech/agentfs/introduction&quot;&gt;Turso. &lt;em&gt;AgentFS&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[23] &lt;a href=&quot;https://github.com/getzep/graphiti&quot;&gt;Zep. &lt;em&gt;Graphiti: Build Real-Time Knowledge Graphs for AI Agents&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[24] &lt;a href=&quot;https://github.com/trustgraph-ai/trustgraph&quot;&gt;TrustGraph. &lt;em&gt;The context development platform&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[25] &lt;a href=&quot;https://docs.trustgraph.ai/guides/context-cores/&quot;&gt;TrustGraph. &lt;em&gt;Working with Context Cores&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[26] &lt;a href=&quot;https://foundationcapital.com/ideas/context-graphs-ais-trillion-dollar-opportunity&quot;&gt;Gupta, J., and Garg, A. (2025, December 22). &lt;em&gt;AI’s trillion-dollar opportunity: Context graphs&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[27] &lt;a href=&quot;https://foundationcapital.com/ideas/why-context-graphs-are-the-missing-layer-for-ai&quot;&gt;Garg, A. (2026, January 16). &lt;em&gt;Why context graphs are the missing layer for AI&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[28] &lt;a href=&quot;https://arxiv.org/abs/2510.21413&quot;&gt;Mohsenimofidi, S., Galster, M., Treude, C., and Baltes, S. (2026). &lt;em&gt;Context Engineering for AI Agents in Open-Source Software&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;
</description>
        <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
        <link>https://subramanya.ai/2026/04/23/context-engineering-why-prompt-engineering-was-never-enough/</link>
        <guid isPermaLink="true">https://subramanya.ai/2026/04/23/context-engineering-why-prompt-engineering-was-never-enough/</guid>
        
        <category>Context Engineering</category>
        
        <category>Context Graphs</category>
        
        <category>AI Agents</category>
        
        <category>RAG</category>
        
        <category>Prompt Engineering</category>
        
        <category>MCP</category>
        
        <category>Enterprise AI</category>
        
        <category>Agent Architecture</category>
        
        <category>Agent Memory</category>
        
        
      </item>
    
      <item>
        <title>The Filesystem Is the Database: Why Agents Need a New Storage Primitive</title>
        <description>&lt;p&gt;Something interesting is happening in the agentic infrastructure space, and it is not what most people expected. For the past two years, the dominant paradigm for giving agents access to knowledge has been Retrieval-Augmented Generation: embed your documents, store them in a vector database, and let the model query them at inference time. RAG worked. It was good enough. But “good enough” has a shelf life, and in 2026, that shelf life is expiring.&lt;/p&gt;

&lt;p&gt;A new pattern is emerging across the industry, and it is converging from multiple directions at once. Mintlify replaced its entire RAG pipeline with a virtual filesystem and saw session creation drop from 46 seconds to 100 milliseconds [1]. Turso built AgentFS, a SQLite-backed filesystem that gives every agent its own copy-on-write sandbox [2]. Box, the enterprise content giant, announced that it is repositioning its entire platform as a virtual filesystem layer for AI agents [3]. And ByteDance open-sourced OpenViking, a context database that organizes all agent memory, resources, and skills as a hierarchical filesystem [4].&lt;/p&gt;

&lt;p&gt;These are not niche experiments. They are signals of a fundamental shift. &lt;strong&gt;The filesystem is becoming the universal interface for agent cognition, and the database is quietly becoming its substrate.&lt;/strong&gt;&lt;/p&gt;

&lt;h2 id=&quot;why-rag-hit-a-wall&quot;&gt;Why RAG Hit a Wall&lt;/h2&gt;

&lt;p&gt;RAG was the right answer for 2023. You had a pile of documents, a model with a limited context window, and you needed a way to surface relevant chunks at query time. Vector embeddings and similarity search solved that problem elegantly.&lt;/p&gt;

&lt;p&gt;But agents are not chatbots. An agent does not ask one question and leave. It explores. It reads a file, discovers a reference, follows it, reads another file, runs a command, writes an output. This is not a retrieval problem. It is a navigation problem.&lt;/p&gt;

&lt;p&gt;RAG pipelines struggle with this for three reasons. First, they are stateless by design. Every query is independent; there is no concept of “I was just looking at this directory, now show me the adjacent file.” Second, they flatten structure. A documentation site with a clear hierarchy of sections, pages, and code examples gets shredded into anonymous 512-token chunks that lose their organizational context. Third, they are expensive at scale. Embedding computation, vector index maintenance, and re-ranking all add latency and cost that compound as the corpus grows.&lt;/p&gt;

&lt;p&gt;The filesystem solves all three. It is inherently stateful (the agent has a working directory). It preserves structure (directories, subdirectories, files). And it is fast because the operations are simple: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ls&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cat&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;grep&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;find&lt;/code&gt;. These are not novel abstractions. They are the most battle-tested interface in computing.&lt;/p&gt;

&lt;h2 id=&quot;the-convergence-four-approaches-one-pattern&quot;&gt;The Convergence: Four Approaches, One Pattern&lt;/h2&gt;

&lt;p&gt;What makes this moment significant is that the filesystem pattern is emerging independently across very different contexts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mintlify’s ChromaFs&lt;/strong&gt; is perhaps the most instructive example. Mintlify powers documentation assistants for thousands of companies. Their original architecture was textbook RAG: chunk the docs, embed them, retrieve at query time. When they replaced it with ChromaFs, a virtual filesystem that intercepts UNIX commands and translates them into Chroma database queries, the results were dramatic. Session creation went from 46 seconds to 100 milliseconds, a 460x improvement. Marginal cost per conversation dropped from $0.0137 to effectively zero [1]. The key insight: the agent already knows how to navigate a filesystem. Teaching it to use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cat /auth/oauth.mdx&lt;/code&gt; is trivial compared to teaching it to formulate the right vector query.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Turso’s AgentFS&lt;/strong&gt; attacks a different problem: agent isolation and auditability. Every agent gets its own SQLite-backed filesystem with copy-on-write semantics. The host filesystem is a read-only base layer; the agent writes to a SQLite delta layer. Every file operation, tool call, and state change is recorded. The entire agent runtime, files, state, history, fits in a single portable SQLite file [2]. This is not just a filesystem. It is an auditable, reproducible execution environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Box’s enterprise VFS&lt;/strong&gt; is the most strategically significant. Box CEO Aaron Levie has been explicit: agents need a filesystem to do knowledge work in the enterprise [3]. But Box is not pitching a literal filesystem. They are pitching a “dynamic data delivery contract” that can be backed by object storage, relational databases, or their own content platform. The filesystem is the interface; the backing store is whatever makes sense for the data. What makes Box’s play interesting is the governance layer: permissions, audit trails, and compliance boundaries that carry over automatically from the content platform to the agent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ByteDance’s OpenViking&lt;/strong&gt; takes the pattern furthest. It organizes all agent context, memories, resources, skills, knowledge, under a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;viking://&lt;/code&gt; protocol using standard filesystem semantics. Agents navigate with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ls&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;find&lt;/code&gt;. But the clever part is the tiered access model: every piece of context is processed into three layers. L0 is a one-sentence summary for quick retrieval. L1 is an overview with core information for planning. L2 is the full content for deep reading [4]. The agent starts with L0, drills into L1 when it needs more, and only loads L2 when it is doing detailed work. On the LoCoMo benchmark, this reduced token consumption from 24.6 million to 4.2 million while increasing task completion rates to 52% [4].&lt;/p&gt;

&lt;h2 id=&quot;filesystem-as-interface-database-as-substrate&quot;&gt;Filesystem as Interface, Database as Substrate&lt;/h2&gt;

&lt;p&gt;The pattern that connects all four is what I would call the &lt;strong&gt;VFS duality&lt;/strong&gt;: the filesystem wins as the interface, and the database wins as the substrate. This is not an either-or choice. It is a layered architecture.&lt;/p&gt;

&lt;p&gt;Why the filesystem wins as the interface is straightforward. LLMs are trained on the internet, and the internet is built by developers who think in terms of files, directories, paths, and command-line tools. Models are unusually competent with these primitives because they have seen billions of examples of developers navigating codebases, reading files, and running shell commands. When you give an agent a filesystem, you are meeting it where its training data lives.&lt;/p&gt;

&lt;p&gt;Why the database wins as the substrate is equally clear. The moment agent memory needs to be shared, audited, queried by multiple agents, or made reliable under concurrency, you need database guarantees. ACID transactions, access control, semantic search, version history: these are hard problems that databases have spent decades solving. Reimplementing them on top of a literal filesystem is a path to pain.&lt;/p&gt;

&lt;p&gt;The VFS pattern gives you both. The agent sees files and directories. The system sees tables, indexes, and access control lists. ChromaFs stores everything in Chroma but exposes it as files. AgentFS stores everything in SQLite but exposes it as a POSIX filesystem. OpenViking uses its own storage engine but exposes it as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;viking://&lt;/code&gt; paths. Box uses its enterprise content platform but exposes it as a navigable tree.&lt;/p&gt;

&lt;h2 id=&quot;but-can-a-vfs-actually-beat-the-native-filesystem&quot;&gt;But Can a VFS Actually Beat the Native Filesystem?&lt;/h2&gt;

&lt;p&gt;The natural objection to all of this is: why not just use the real filesystem? POSIX is right there. Every operating system ships with it. Why add an abstraction layer?&lt;/p&gt;

&lt;p&gt;I wanted to answer this question empirically, so I built &lt;a href=&quot;https://github.com/subramanya1997/markdownfs&quot;&gt;markdownfs&lt;/a&gt;, a from-scratch virtual filesystem in Rust designed specifically for agent workloads [6]. It supports the full set of UNIX-like commands (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ls&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cat&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;grep&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;find&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;chmod&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;chown&lt;/code&gt;), Git-style versioning with content-addressable storage, multi-user permissioning, and exposes three access methods: a CLI/REPL, an HTTP/REST API, and an MCP server that agents like Claude and Cursor can connect to directly.&lt;/p&gt;

&lt;p&gt;The architecture is simple: an in-memory inode table backed by a content-addressable blob store using SHA-256 hashing, with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tokio::RwLock&lt;/code&gt; for safe concurrent access. Files are deduplicated automatically. Version control uses the same commit/revert model as Git, but at the filesystem level. Persistence is handled through atomic bincode snapshots.&lt;/p&gt;

&lt;p&gt;When I benchmarked markdownfs against the native filesystem across the standard agent operations (file creation, reads, writes, directory listing, grep, find, move, copy, deletion), markdownfs averaged roughly &lt;strong&gt;130x faster&lt;/strong&gt; across the board. The reasons are structural, not incidental. In-memory operations eliminate disk I/O entirely. Content-addressable storage means duplicate files are stored once. Zero-copy reads mean the agent gets data without serialization overhead. And because the entire filesystem state lives in a single process, there are no system call boundaries to cross.&lt;/p&gt;

&lt;p&gt;The comparison is particularly stark for the operations agents perform most frequently:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Operation&lt;/th&gt;
      &lt;th&gt;Why VFS Wins&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Repeated reads&lt;/strong&gt; (agent re-reading context)&lt;/td&gt;
      &lt;td&gt;In-memory, zero-copy. No disk seeks, no page cache misses.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;grep across files&lt;/strong&gt; (agent searching for patterns)&lt;/td&gt;
      &lt;td&gt;All content is in-memory. No directory traversal, no file handle management.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Rapid file creation&lt;/strong&gt; (agent producing work artifacts)&lt;/td&gt;
      &lt;td&gt;No filesystem journaling, no inode allocation on disk, no fsync.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Directory listing&lt;/strong&gt; (agent exploring structure)&lt;/td&gt;
      &lt;td&gt;BTreeMap lookup vs. readdir syscalls.&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;But performance is not the real argument. The real argument is what the native filesystem &lt;em&gt;cannot do&lt;/em&gt;. A POSIX filesystem has no concept of semantic search. It has no built-in versioning (you need Git for that). It has no tiered access model (you get the whole file or nothing). It has no content deduplication. It has no audit trail of agent operations. And critically, it has no MCP interface, which means agents cannot access it through the standard protocol that the ecosystem is converging on.&lt;/p&gt;

&lt;p&gt;The VFS is not just faster. It is a richer primitive. It gives you the familiar interface of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ls&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cat&lt;/code&gt; while adding the capabilities that agents actually need: versioning, permissions, search, deduplication, and protocol-native access via MCP or HTTP.&lt;/p&gt;

&lt;h2 id=&quot;what-this-means-for-rag&quot;&gt;What This Means for RAG&lt;/h2&gt;

&lt;p&gt;To be clear, RAG is not dead. Vector search remains valuable for fuzzy, semantic queries where the agent genuinely does not know what it is looking for. But the honest assessment is that RAG has been over-applied. Many of the use cases where teams deployed RAG pipelines, documentation retrieval, codebase navigation, enterprise knowledge management, are better served by a filesystem interface.&lt;/p&gt;

&lt;p&gt;The evidence is striking. Mintlify’s 460x speedup came from replacing RAG with a filesystem, not augmenting it [1]. Research from Letta shows that agents using simple filesystem operations achieve 74% accuracy on memory benchmarks, competitive with specialized retrieval tools. And agentic keyword search approaches can achieve over 90% of RAG performance without vector databases at all [5].&lt;/p&gt;

&lt;p&gt;The future is likely hybrid. RAG for open-ended semantic search. Filesystem for structured navigation and task execution. But the center of gravity is shifting toward the filesystem, and the strategic implications are significant.&lt;/p&gt;

&lt;h2 id=&quot;the-strategic-imperative&quot;&gt;The Strategic Imperative&lt;/h2&gt;

&lt;p&gt;If you are building agentic infrastructure, you need a VFS strategy. Here is why.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For SaaS companies&lt;/strong&gt;: the lesson from Box is that the filesystem is becoming the integration surface for agents. If your platform’s content is not navigable as a filesystem, agents will bypass you. The SaaS companies that expose their data through filesystem-like interfaces will become part of the agentic workflow. Those that do not will become invisible to agents, which means invisible to users.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For infrastructure vendors&lt;/strong&gt;: the database is not going away. It is moving underneath the filesystem. This is actually good news for database companies. Turso understood this and built AgentFS on top of SQLite. Every agent that spins up creates a new database. The more agents the world runs, the more databases the world needs. But the database needs to disappear behind a filesystem abstraction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For enterprises&lt;/strong&gt;: the governance story is what matters. Box’s pitch is not really about filesystems. It is about the fact that their permission model, audit trail, and compliance infrastructure automatically extends to agents when content is accessed through the VFS layer [3]. This is the answer to the question every CISO is asking: “How do we let agents access our content without creating a security nightmare?”&lt;/p&gt;

&lt;h2 id=&quot;the-unifying-layer&quot;&gt;The Unifying Layer&lt;/h2&gt;

&lt;p&gt;The agentic infrastructure stack has been evolving in clear phases: tools (MCP), skills, and context graphs. The virtual filesystem fits into this arc as the &lt;strong&gt;delivery mechanism&lt;/strong&gt; for all three. MCP tools are invoked through the filesystem. Skills are stored as files. Context graphs are navigated as directory trees. The filesystem does not replace these layers. It unifies them behind a single, familiar interface.&lt;/p&gt;

&lt;p&gt;This is the real insight. The filesystem is not a new idea. It is the oldest abstraction in computing. But that is exactly why it works for agents. In a world where we are inventing new paradigms every quarter, the most powerful move might be reaching back to the most proven interface we have and putting a modern database behind it.&lt;/p&gt;

&lt;p&gt;The companies that understand this, Mintlify, Turso, Box, ByteDance, are not building something new. They are recognizing something old and giving it a new job.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;[1] &lt;a href=&quot;https://www.mintlify.com/blog/how-we-built-a-virtual-filesystem-for-our-assistant&quot;&gt;Mintlify. (2026, April 2). &lt;em&gt;How we built a virtual filesystem for our Assistant&lt;/em&gt;. Mintlify Blog.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[2] &lt;a href=&quot;https://turso.tech/blog/agentfs&quot;&gt;Turso. (2026). &lt;em&gt;The Missing Abstraction for AI Agents: The Agent Filesystem&lt;/em&gt;. Turso Blog.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[3] &lt;a href=&quot;https://www.blocksandfiles.com/ai-ml/2026/03/09/box-pitches-virtual-filesystem-layer-for-ai-agents/5208017&quot;&gt;Blocks and Files. (2026, March 9). &lt;em&gt;Box pitches ‘virtual filesystem’ layer for AI agents&lt;/em&gt;. Blocks and Files.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[4] &lt;a href=&quot;https://github.com/volcengine/OpenViking&quot;&gt;Volcengine. (2026). &lt;em&gt;OpenViking: An open-source context database for AI Agents&lt;/em&gt;. GitHub.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[5] &lt;a href=&quot;https://signals.aktagon.com/articles/2026/02/keyword-search-is-all-you-need-achieving-rag-level-performance-without-vector-databases-using-agentic-tool-use/&quot;&gt;Signals. (2026, February). &lt;em&gt;Keyword Search is All You Need: Achieving RAG-Level Performance Without Vector Databases Using Agentic Tool Use&lt;/em&gt;. Signals.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[6] &lt;a href=&quot;https://github.com/subramanya1997/markdownfs&quot;&gt;Subramanya N. (2026). &lt;em&gt;markdownfs: A high-performance, concurrent markdown database built in Rust&lt;/em&gt;. GitHub.&lt;/a&gt;&lt;/p&gt;
</description>
        <pubDate>Mon, 13 Apr 2026 00:00:00 +0000</pubDate>
        <link>https://subramanya.ai/2026/04/13/the-filesystem-is-the-database-why-agents-need-a-new-storage-primitive/</link>
        <guid isPermaLink="true">https://subramanya.ai/2026/04/13/the-filesystem-is-the-database-why-agents-need-a-new-storage-primitive/</guid>
        
        <category>Agentic AI</category>
        
        <category>Virtual Filesystem</category>
        
        <category>RAG</category>
        
        <category>Agent Infrastructure</category>
        
        <category>Enterprise AI</category>
        
        <category>Context Engineering</category>
        
        <category>AgentFS</category>
        
        <category>MCP</category>
        
        
      </item>
    
      <item>
        <title>The SaaSpocalypse: A Survival Guide</title>
        <description>&lt;p&gt;It started with a single press release. On January 30, 2026, AI startup Anthropic announced 11 specialized plugins for its Claude Cowork agent, empowering it to handle complex workflows in sales, finance, legal, and HR [1]. Wall Street’s reaction was not just negative; it was apocalyptic. In the week that followed, nearly &lt;strong&gt;$1 trillion&lt;/strong&gt; in value was wiped from software and services stocks in a sell-off so brutal, Jefferies traders coined a new term for it: the &lt;strong&gt;“SaaSpocalypse”&lt;/strong&gt; [2, 3].&lt;/p&gt;

&lt;p&gt;Thomson Reuters suffered its biggest single-day drop on record (-15.8%), LegalZoom plunged nearly 20%, and established giants like Atlassian and Intuit saw their valuations crumble by 50% and 34% respectively since the start of the year [4, 5]. The panic was clear: if an AI agent can do the job of your software, why would anyone pay for your software?&lt;/p&gt;

&lt;p&gt;This is more than a market correction. It’s a referendum on the entire Software-as-a-Service model. But is it truly an apocalypse, or is it a long-overdue reckoning? And for the thousands of founders, employees, and investors in the SaaS ecosystem, a more urgent question looms: how do you survive?&lt;/p&gt;

&lt;h2 id=&quot;the-great-divide-two-competing-realities&quot;&gt;The Great Divide: Two Competing Realities&lt;/h2&gt;

&lt;p&gt;The market is now split into two warring camps, each with a compelling narrative.&lt;/p&gt;

&lt;h3 id=&quot;camp-1-the-end-is-nigh&quot;&gt;Camp 1: The End Is Nigh&lt;/h3&gt;

&lt;p&gt;This camp believes the threat is existential. The core of their argument is the death of the per-seat pricing model. As Morningstar analysts bluntly put it, “if one person can now do the work of two, seat counts fall” [5]. Why pay for 500 Salesforce seats when 450 employees and an AI agent can do the same work? This isn’t a hypothetical; Salesforce CEO Marc Benioff has already stated the company won’t be hiring more engineers, customer service agents, or lawyers precisely because of AI’s capabilities [4].&lt;/p&gt;

&lt;p&gt;Anthropic CEO Dario Amodei predicts AI could displace half of all entry-level white-collar jobs in the next five years, and OpenAI CEO Sam Altman has warned that AI will be “quite harmful” to some traditional software companies [4, 6]. For this camp, the math is simple: fewer employees and more capable AI means less revenue for traditional SaaS.&lt;/p&gt;

&lt;h3 id=&quot;camp-2-the-panic-is-illogical&quot;&gt;Camp 2: The Panic Is Illogical&lt;/h3&gt;

&lt;p&gt;On the other side, a powerful contingent of tech leaders argues the panic is a massive overreaction. Nvidia CEO Jensen Huang called the notion that AI will replace the software industry “the most illogical thing in the world,” while Arm Holdings CEO Rene Haas dismissed the sell-off as “micro-hysteria” [7].&lt;/p&gt;

&lt;p&gt;Their argument, articulated well by Bernard Golden, CEO of Navica, is threefold [8]:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Enterprise DIY Will Fail:&lt;/strong&gt; Building real software requires far more than just code. It demands deep domain expertise, regulatory knowledge, global support, and legal indemnification—things AI can’t replicate.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Incumbents Have Moats:&lt;/strong&gt; Established players have network effects, scale, and deep, custom integrations that startups can’t easily displace.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Jevons Paradox:&lt;/strong&gt; As AI makes software cheaper and easier to create, the demand for it won’t shrink—it will explode. Cheaper software will lead to vastly &lt;em&gt;more&lt;/em&gt; software, not less.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&quot;the-survival-playbook&quot;&gt;The Survival Playbook&lt;/h2&gt;

&lt;p&gt;So, who is right? The truth, as analyzed by firms like Bain &amp;amp; Company and The Guardian, lies in the middle. Disruption is mandatory, but obsolescence is not [5, 9]. Survival depends on a clear-eyed assessment of your company’s position and a swift, decisive pivot. Here is the emerging survival guide.&lt;/p&gt;

&lt;h3 id=&quot;1-defend-your-moat&quot;&gt;1. Defend Your Moat&lt;/h3&gt;

&lt;p&gt;The companies weathering the storm are not the ones fighting AI, but the ones with unique, defensible assets that AI can’t easily replicate. The four critical moats are:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;Moat Type&lt;/th&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;Description&lt;/th&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;Example&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Proprietary Data&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Data that is unique to your customers and not publicly available. AI models are trained on public data; they can’t access your private, firewalled customer information.&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;A vertical SaaS for pharmaceutical research with years of private clinical trial data.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Complex Systems&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Deeply embedded, mission-critical workflows that are core to a business’s operations. The cost and risk of ripping out these systems are too high.&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Oracle’s ERP systems, ServiceNow’s IT service management platform.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Network Effects&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Platforms where the value increases as more users join. The classic example is a marketplace, but it also applies to collaborative software.&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;A procurement platform that connects thousands of buyers and suppliers.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Deep Integration&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Software that is intricately woven into a customer’s tech stack, with numerous custom APIs and data connections.&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;A manufacturing execution system tied into a factory’s physical hardware.&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;If your product relies solely on analyzing public data or performing a task that can be replicated by a generic AI agent, you are in the kill zone.&lt;/p&gt;

&lt;h3 id=&quot;2-embrace-the-new-pricing-model-the-outcome-economy&quot;&gt;2. Embrace the New Pricing Model: The Outcome Economy&lt;/h3&gt;

&lt;p&gt;The per-seat license is dying. The future is &lt;strong&gt;outcome-based pricing&lt;/strong&gt;. Your customers no longer want to pay for access to your tool; they want to pay for the result it delivers. As a recent BVP report notes, AI-native companies are abandoning seat-based pricing almost entirely [10].&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;The Old Model:&lt;/strong&gt; $150 per user per month.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;The New Model:&lt;/strong&gt; $0.99 per resolved customer issue, $5 per generated lead, or 1% of the cost savings achieved.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not theoretical. Intercom’s AI agent, Fin, is already at a $100M+ revenue run rate by charging per resolution. This model aligns your success directly with your customer’s success. It’s a harder model to build, but a far more defensible one.&lt;/p&gt;

&lt;h3 id=&quot;3-become-the-trusted-incumbent&quot;&gt;3. Become the Trusted Incumbent&lt;/h3&gt;

&lt;p&gt;Here lies the greatest advantage for existing SaaS companies. In a world of black-box AIs, trust is the scarcest resource. A Bain &amp;amp; Company survey found that customers would &lt;em&gt;prefer&lt;/em&gt; to buy AI-enabled solutions from their incumbent vendors [9]. They trust their security, their reliability, and their longevity.&lt;/p&gt;

&lt;p&gt;The challenge is that most incumbents have been slow to deliver compelling AI offerings. The opportunity is massive for those who can integrate AI deeply into their existing, trusted products. Don’t just add an AI chatbot in the corner; use AI to supercharge your core workflow and deliver a 10x better outcome.&lt;/p&gt;

&lt;p&gt;The SaaSpocalypse is not an extinction-level event for everyone. It is a cleansing fire. The companies that will be wiped out are the ones selling undifferentiated, easily-replicated features. The companies that survive—and thrive—will be the ones with deep moats, customer trust, and a business model built for the new outcome-based economy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;[1] &lt;a href=&quot;https://www.cnn.com/2026/02/04/investing/us-stocks-anthropic-software&quot;&gt;Anthropic’s new AI tool sends shudders through software stocks&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[2] &lt;a href=&quot;https://www.reuters.com/business/media-telecom/global-software-stocks-hit-by-anthropic-wake-up-call-ai-disruption-2026-02-04/&quot;&gt;Selloff wipes out nearly $1 trillion from software and services stocks&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[3] &lt;a href=&quot;https://finance.yahoo.com/news/traders-dump-software-stocks-ai-115502147.html&quot;&gt;‘Get me out’: Traders dump software stocks as AI fears erupt&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[4] &lt;a href=&quot;https://www.cnbc.com/2026/02/06/ai-anthropic-tools-saas-software-stocks-selloff.html&quot;&gt;AI fears pummel software stocks: Is it ‘illogical’ panic or a SaaS apocalypse?&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[5] &lt;a href=&quot;https://www.theguardian.com/australia-news/2026/feb/21/what-would-share-stock-market-saaspocalypse-mean-saas-apocalypse-meaning&quot;&gt;Is the share market headed toward a ‘SaaS-pocalypse’?&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[6] &lt;a href=&quot;https://m.economictimes.com/tech/artificial-intelligence/ai-to-change-nature-of-software-industry-will-be-bad-for-some-companies-sam-altman/articleshow/128556596.cms&quot;&gt;AI to change nature of software industry; will be bad for some companies: Sam Altman&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[7] &lt;a href=&quot;https://www.cnbc.com/2026/02/06/ai-anthropic-tools-saas-software-stocks-selloff.html&quot;&gt;AI fears pummel software stocks: Is it ‘illogical’ panic or a SaaS apocalypse?&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[8] &lt;a href=&quot;https://www.businessinsider.com/saaspocalypse-ai-software-overreaction-premature-obituary-openai-anthropic-2026-2&quot;&gt;The AI software freakout is a massive overreaction. Here’s why.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[9] &lt;a href=&quot;https://www.bain.com/insights/why-saas-stocks-have-dropped-and-what-it-signals-for-softwares-next-chapter/&quot;&gt;Why SaaS Stocks Have Dropped—and What It Signals for Software’s Next Chapter&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[10] &lt;a href=&quot;https://www.bvp.com/atlas/the-ai-pricing-and-monetization-playbook&quot;&gt;The AI pricing and monetization playbook&lt;/a&gt;&lt;/p&gt;
</description>
        <pubDate>Mon, 23 Feb 2026 00:00:00 +0000</pubDate>
        <link>https://subramanya.ai/2026/02/23/the-saaspocalypse-a-survival-guide/</link>
        <guid isPermaLink="true">https://subramanya.ai/2026/02/23/the-saaspocalypse-a-survival-guide/</guid>
        
        <category>SaaS</category>
        
        <category>SaaSpocalypse</category>
        
        <category>AI Agents</category>
        
        <category>Business Models</category>
        
        <category>Outcome Pricing</category>
        
        <category>Enterprise Software</category>
        
        
      </item>
    
      <item>
        <title>2026: The Year SaaS Disappeared Into the Conversation</title>
        <description>&lt;p&gt;What if the best user interface was no interface at all? For decades, we have been trained to navigate a labyrinth of menus, buttons, and settings screens. We learned the language of software. In 2026, that paradigm is finally flipping. Software is learning to speak our language.&lt;/p&gt;

&lt;p&gt;This is not just about adding a chatbot to a dashboard. A quiet revolution is underway: Software as a Service (SaaS) is no longer a destination and is becoming a capability accessed through natural language. The primary interface for getting work done is shifting from graphical (GUI) to conversational (CUI) and, increasingly, to voice. As a recent analysis in Harvard Business Review noted, the goal is no longer to automate the past but to orchestrate a new, more dynamic future where intelligent agents assemble novel workflows in real time, unconstrained by human org charts [1].&lt;/p&gt;

&lt;h2 id=&quot;meet-your-new-coworker-the-personalized-ai-agent&quot;&gt;Meet Your New Coworker: The Personalized AI Agent&lt;/h2&gt;

&lt;p&gt;At the heart of this transformation is the move from generic, one-size-fits-all tools to deeply personalized AI agents that act as expert coworkers. These agents do not just access public data; they understand your world. As Goldman Sachs CIO Marco Argenti declared in January 2026, “Context is the new frontier,” signaling the rise of personal agents that know your context and can act on your behalf [2].&lt;/p&gt;

&lt;p&gt;This trend has accelerated across the enterprise landscape in just the first two months of the year:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Glean&lt;/strong&gt; launched its latest AI Assistant on February 17, positioning it as an “expert agentic coworker” powered by a “Personal Graph” that understands an employee’s role, projects, and collaborators to move from insight to execution [3].&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Microsoft&lt;/strong&gt; announced on January 30 that M365 Copilot can reference a user’s “memory” in voice chats, using stored personalization settings for more relevant responses [4].&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Slack&lt;/strong&gt;, on January 29, relaunched Slackbot as a “personal, context-aware AI agent for work,” designed to be the teammate that was “in the meeting with you,” saving users at least 90 minutes per day by internal estimates [5].&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Atlassian&lt;/strong&gt; followed on January 30, declaring that “teammate agents are what’s hot in 2026” and showcasing Rovo AI, which pulls context from project trackers, code repositories, and third-party apps to act as a core member of the team [6].&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Google&lt;/strong&gt; introduced “Personal Intelligence” for Search on January 22, connecting AI with private Gmail and Photos data to deliver tailored recommendations and turn a public utility into a personal concierge [7].&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This shift is significant enough to create a new software category. The viral open-source project &lt;strong&gt;OpenClaw&lt;/strong&gt; showcased a personal AI that could run locally and control user apps, eventually leading to its creator being hired by OpenAI [8]. It hints at the end of app sprawl: why juggle a dozen tools when one intelligent agent can coordinate calendars, tasks, and research?&lt;/p&gt;

&lt;h2 id=&quot;the-rise-of-voice-dont-type-just-speak&quot;&gt;The Rise of Voice: Don’t Type, Just Speak&lt;/h2&gt;

&lt;p&gt;The conversational interface reaches its strongest expression through voice. As Forbes argued, voice is becoming the defining UI of the AI era [9]. Tools like &lt;strong&gt;Wispr Flow&lt;/strong&gt; are advancing this concept with a universal voice input layer that works across applications and turns speech into polished text at up to 220 words per minute [10]. It is not an app you switch into; it is a layer that sits on top of everything.&lt;/p&gt;

&lt;p&gt;This is not a niche behavior. A 2026 Voices.com report found that &lt;strong&gt;55% of consumers now use voice to interact with AI&lt;/strong&gt;, while only 29% of companies have deployed voice AI, exposing a clear gap between user behavior and enterprise adoption [11]. VentureBeat described this transition as a move from “chatbots that speak” to “empathetic interfaces” that understand nuance and intent [12]. That is why every major SaaS player is moving quickly to add voice.&lt;/p&gt;

&lt;p&gt;The graphical interface is not disappearing. Complex visualization, creative design, and exploratory analysis still benefit from visual canvases. The future is a hybrid model where voice and conversation handle routine tasks while GUI surfaces are used for specialized work, potentially generated on the fly by AI for each specific need.&lt;/p&gt;

&lt;h2 id=&quot;the-saaspocalypse-and-the-new-business-model&quot;&gt;The “SaaSpocalypse” and the New Business Model&lt;/h2&gt;

&lt;p&gt;This shift is fueling what some call a “SaaSpocalypse.” A February 2026 Fortune report noted that $2 trillion had been wiped from software stocks as AI pressures traditional SaaS models [13]. The conventional per-seat, per-month model is weakening. As Goldman Sachs noted, we are entering an “agent-as-a-service economy” where organizations deploy fleets of agents and pay by token consumption rather than human time [2].&lt;/p&gt;

&lt;p&gt;In its place, a new model is emerging: &lt;strong&gt;outcome-based pricing&lt;/strong&gt;. Intercom’s AI agent Fin is a strong example. Customers pay for results ($0.99 per resolved issue), not software access. Fin now handles over 80% of support volume and has grown past $100M ARR, demonstrating the economic power of this model [14].&lt;/p&gt;

&lt;h2 id=&quot;welcome-to-the-post-app-era&quot;&gt;Welcome to the Post-App Era&lt;/h2&gt;

&lt;p&gt;The pieces are now in place. Personalized, agentic AI and voice-first interfaces are dissolving the traditional SaaS model. The center of gravity is shifting from features to outcomes and from clicks to conversations.&lt;/p&gt;

&lt;p&gt;The key question is no longer “Which app should I use?” but “What do I want to accomplish?” In 2026, for the first time, software is ready to answer directly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;[1] &lt;a href=&quot;https://hbr.org/sponsored/2026/02/a-blueprint-for-enterprise-wide-agentic-ai-transformation&quot;&gt;Harvard Business Review. (2026, February). &lt;em&gt;A Blueprint for Enterprise-Wide Agentic AI Transformation&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[2] &lt;a href=&quot;https://www.goldmansachs.com/insights/articles/what-to-expect-from-ai-in-2026-personal-agents-mega-alliances&quot;&gt;Goldman Sachs. (2026). &lt;em&gt;What to Expect From AI in 2026: Personal Agents, Mega Alliances&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[3] &lt;a href=&quot;https://www.glean.com/press/gleans-latest-ai-assistant-moves-every-employee-from-insight-to-execution&quot;&gt;Glean. (2026, February 17). &lt;em&gt;Glean’s Latest AI Assistant Moves Every Employee from Insight to Execution&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[4] &lt;a href=&quot;https://techcommunity.microsoft.com/blog/microsoft365copilotblog/what-new-in-microsoft-365-copilot-january-2026/4488916&quot;&gt;Microsoft Tech Community. (2026, January). &lt;em&gt;What’s New in Microsoft 365 Copilot&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[5] &lt;a href=&quot;https://slack.com/intl/en-in/blog/news/slackbot-context-aware-ai-agent-for-work&quot;&gt;Slack. (2026, January). &lt;em&gt;Introducing Slackbot, Your Context-Aware AI Agent for Work&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[6] &lt;a href=&quot;https://www.atlassian.com/blog/teamwork/ai-insights-january-2026&quot;&gt;Atlassian. (2026, January). &lt;em&gt;AI Takes a Seat on the Team&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[7] &lt;a href=&quot;https://blog.google/products-and-platforms/products/search/personal-intelligence-ai-mode-search/&quot;&gt;Google. (2026, January). &lt;em&gt;Google Brings Personal Intelligence to AI Mode in Search&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[8] &lt;a href=&quot;https://techcrunch.com/2026/02/15/openclaw-creator-peter-steinberger-joins-openai/&quot;&gt;TechCrunch. (2026, February 15). &lt;em&gt;OpenClaw Creator Peter Steinberger Joins OpenAI&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[9] &lt;a href=&quot;https://www.forbes.com/sites/stevenwolfepereira/2026/02/02/voice-is-the-ui-in-the-ai/&quot;&gt;Forbes. (2026, February). &lt;em&gt;Is Voice Becoming the UI of the AI Era?&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[10] &lt;a href=&quot;https://wisprflow.ai&quot;&gt;Wispr Flow. (2026). &lt;em&gt;Effortless Voice Dictation&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[11] &lt;a href=&quot;https://www.voices.com/blog/amplified-2026-the-state-of-voice-report/&quot;&gt;Voices.com. (2026). &lt;em&gt;Amplified 2026: The State of Voice and the Trends Shaping the Industry&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[12] &lt;a href=&quot;https://venturebeat.com/orchestration/everything-in-voice-ai-just-changed-how-enterprise-ai-builders-can-benefit&quot;&gt;VentureBeat. (2026). &lt;em&gt;Everything in Voice AI Just Changed&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[13] &lt;a href=&quot;https://fortune.com/2026/02/13/saas-software-stocks-ai-apocalypse-nadella-microsoft-oracle-sap-salesforce/&quot;&gt;Fortune. (2026, February 13). &lt;em&gt;SaaSpocalypse: Why $2 Trillion Got Wiped From Software Stocks&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[14] &lt;a href=&quot;https://gtmnow.com/how-intercom-built-the-highest-performing-ai-agent-on-the-market-using-outcome-based-pricing-with-archana-agrawal-president-at-intercom/&quot;&gt;GTM Now. (2026). &lt;em&gt;How Intercom Built a $100M AI Agent with Outcome Pricing&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;
</description>
        <pubDate>Thu, 19 Feb 2026 00:00:00 +0000</pubDate>
        <link>https://subramanya.ai/2026/02/19/the-year-saas-disappeared-into-the-conversation/</link>
        <guid isPermaLink="true">https://subramanya.ai/2026/02/19/the-year-saas-disappeared-into-the-conversation/</guid>
        
        <category>SaaS</category>
        
        <category>Agentic AI</category>
        
        <category>Voice AI</category>
        
        <category>Enterprise AI</category>
        
        <category>AI Agents</category>
        
        <category>Business Models</category>
        
        
      </item>
    
      <item>
        <title>OpenClaw and the Rise of User-Built Intelligence: A Wake-Up Call for SaaS</title>
        <description>&lt;p&gt;In the last few weeks, the AI community has been captivated by a project that is not a new model, but a new paradigm. OpenClaw, an open-source personal AI assistant, has exploded in popularity, amassing over 114,000 GitHub stars in just two months [1]. Andrej Karpathy, one of the most respected voices in AI, described it as “genuinely the most incredible sci-fi takeoff-adjacent thing I have seen recently” [2].&lt;/p&gt;

&lt;p&gt;This is not just another AI tool. OpenClaw represents a fundamental shift in how users interact with software, and it is a direct challenge to the traditional SaaS model. While SaaS platforms have spent a decade becoming the systems of record for business data, users are now building their own intelligence layers on top, turning incumbent platforms into dumb data pipes. This is the wake-up call for every SaaS company.&lt;/p&gt;

&lt;blockquote class=&quot;twitter-tweet&quot;&gt;&lt;p lang=&quot;en&quot; dir=&quot;ltr&quot;&gt;You know what&apos;s crazy about @openclaw... It will actually be the thing that nukes a ton of startups, not ChatGPT as people meme about... The fact that it&apos;s hackable (and more importantly, self-hackable) and hostable on-prem will make sure tech like this DOMINATES conventional SaaS imo&lt;/p&gt;&amp;mdash; Max Rovensky (@MaxRovensky) &lt;a href=&quot;https://x.com/MaxRovensky/status/2010676669124612111&quot;&gt;January 2026&lt;/a&gt;&lt;/blockquote&gt;

&lt;h2 id=&quot;the-ambient-ai-revelation&quot;&gt;The Ambient AI Revelation&lt;/h2&gt;

&lt;p&gt;What makes OpenClaw so significant? It’s not the technology itself, which is a clever combination of existing tools. As one analyst put it, OpenClaw’s innovation was to give an AI model “its own computer and told it to act like a personal assistant” [4].&lt;/p&gt;

&lt;p&gt;The real breakthrough is the validation of a new form factor for AI: &lt;strong&gt;ambient, proactive intelligence&lt;/strong&gt;. Unlike every major AI tool today - ChatGPT, Copilot, even your own internal copilots - which require a human in the loop, OpenClaw is designed to act autonomously. It runs 24/7, even when you’re asleep, watching for things that matter and taking action on your behalf. As one writer noted, “Claude Code knows your codebase. OpenClaw knows your life” [4].&lt;/p&gt;

&lt;p&gt;This flips the current SaaS paradigm on its head. SaaS platforms are systems of record, but they are blind to the &lt;em&gt;process&lt;/em&gt; of the business. They capture the nouns, but not the verbs. This is the “System of Record Trap.” Your CRM knows your customer data, but it doesn’t know the informal follow-up sequence your top salesperson uses. Your project management tool knows your deadlines, but it doesn’t know the complex triage process your team uses to handle incoming requests. This is the value that is being left on the table, and it’s the value that tools like OpenClaw are now capturing.&lt;/p&gt;

&lt;h2 id=&quot;more-than-a-toy-what-users-are-actually-building&quot;&gt;More Than a Toy: What Users Are Actually Building&lt;/h2&gt;

&lt;p&gt;If you think this is just a developer toy, you are mistaken. The community around OpenClaw is building and sharing thousands of “skills” that give their agents real-world capabilities. As chronicled by Simon Willison, users are already using OpenClaw to [1]:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Buy a car&lt;/strong&gt; by negotiating with multiple dealers over email.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Remotely control an Android phone&lt;/strong&gt; to scroll through TikTok or use Google Maps.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Monitor a server for security threats&lt;/strong&gt;, detecting failed SSH login attempts and exposed ports.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Transcribe voice messages&lt;/strong&gt; by finding an API key and using it to call the Whisper API.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are not simple automations. They are complex, multi-step workflows that are being built and executed outside of any traditional SaaS platform. The value being unlocked is so compelling that users are willing to accept significant security risks, a phenomenon Simon Willison calls the “Normalization of Deviance” [1]. People are buying dedicated Mac Minis just to run OpenClaw in a sandboxed environment, a clear signal of the demand for this new paradigm.&lt;/p&gt;

&lt;h2 id=&quot;the-saas-dilemma-build-or-be-bypassed&quot;&gt;The SaaS Dilemma: Build or Be Bypassed&lt;/h2&gt;

&lt;p&gt;This is the existential threat to SaaS. Every workflow built in OpenClaw is a workflow that is not being captured by the underlying SaaS platform. Every decision made by a personal AI agent is a decision that the SaaS vendor has no visibility into. The SaaS platform becomes a commodity data layer, a “dumb data pipe,” while the intelligence, the context, and the customer relationship move to the agentic layer.&lt;/p&gt;

&lt;p&gt;The only durable defense is to build a native intelligence layer that allows users to automate their workflows directly within the platform. This journey from reactive software to proactive intelligence unfolds in three stages.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;Stage&lt;/th&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;Description&lt;/th&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;User Experience&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;1. User-Built Automation&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Users can describe their goals in natural language, and the platform builds and runs the automation workflow natively.&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;“When a new maintenance request comes in, check if it’s urgent. If so, text the on-call vendor.”&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;2. Pattern Learning&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;The platform analyzes workflow usage across its user base to identify common patterns and best practices.&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;The platform notices that responding to requests within 4 hours boosts tenant retention by 40% and suggests this workflow to other users.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;3. Proactive Delivery&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;The platform learns individual user patterns and proactively delivers personalized automation, anticipating needs before the user even asks.&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;A property manager logs in to find the weekend’s maintenance requests already triaged, assigned, and with draft notifications ready for approval.&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;This evolution transforms a SaaS product from a passive tool into an active partner, creating a powerful moat built on compounded knowledge of user behavior. The more users automate, the smarter the platform becomes, and the harder it is for competitors to replicate.&lt;/p&gt;

&lt;h2 id=&quot;the-time-to-act-is-now&quot;&gt;The Time to Act is Now&lt;/h2&gt;

&lt;p&gt;The path forward for SaaS leaders is clear, and the timeline is short. The technological pillars are now in place: reliable function-calling models, long context windows, and universal standards like the Model Context Protocol (MCP) are mature and widely adopted. The enterprise demand has been validated by the explosive growth of platforms like Salesforce Agentforce, which generated $900 million in revenue in its first six months.&lt;/p&gt;

&lt;p&gt;The choice for SaaS vendors is stark: either build a native intelligence layer or risk becoming a commoditized backend for your users’ personal AI agents. The era of passive, reactive software is over. The agentic workspace is the new strategic imperative, and the time to build it is now.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;[1] &lt;a href=&quot;https://simonwillison.net/2026/Jan/30/moltbook/&quot;&gt;Willison, S. (2026, January 30). &lt;em&gt;Moltbook is the most interesting place on the internet right now&lt;/em&gt;. Simon Willison’s Weblog.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[2] &lt;a href=&quot;https://x.com/karpathy/status/2017296988589723767&quot;&gt;Karpathy, A. (2026, January 30). &lt;em&gt;Tweet on X&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[3] &lt;a href=&quot;https://x.com/MaxRovensky/status/2010676669124612111&quot;&gt;Rovensky, M. (2026, January 12). &lt;em&gt;Tweet on X&lt;/em&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[4] &lt;a href=&quot;https://nextword.substack.com/p/the-ambient-ai-and-clawdbot-openclaw-implications&quot;&gt;Hwang, J. (2026, January 31). &lt;em&gt;The Ambient AI Era: Clawdbot (OpenClaw)’s Ripple Effects&lt;/em&gt;. Nextword.&lt;/a&gt;&lt;/p&gt;

&lt;script async=&quot;&quot; src=&quot;https://platform.twitter.com/widgets.js&quot; charset=&quot;utf-8&quot;&gt;&lt;/script&gt;

</description>
        <pubDate>Sun, 01 Feb 2026 00:00:00 +0000</pubDate>
        <link>https://subramanya.ai/2026/02/01/openclaw-and-the-rise-of-user-built-intelligence-a-wake-up-call-for-saas/</link>
        <guid isPermaLink="true">https://subramanya.ai/2026/02/01/openclaw-and-the-rise-of-user-built-intelligence-a-wake-up-call-for-saas/</guid>
        
        <category>SaaS</category>
        
        <category>Agentic AI</category>
        
        <category>Enterprise AI</category>
        
        <category>AI Agents</category>
        
        <category>AI Transformation</category>
        
        <category>B2B Software</category>
        
        
      </item>
    
      <item>
        <title>The Agentic Workspace: A Strategic Imperative for the Next Era of SaaS</title>
        <description>&lt;p&gt;The SaaS landscape is at a critical inflection point. The traditional, human-driven application model is giving way to a new paradigm: the agentic workspace. This is not a distant trend, but a strategic imperative for today. We propose that the next evolution for every successful SaaS company is to become a platform that orchestrates intelligent agents to achieve user outcomes. This transition is complex and fraught with challenges, but for those who navigate it successfully, the rewards will be immense. Those who fail to adapt risk being left behind.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/saas_agent_convergence.png&quot; alt=&quot;SaaS and AI Agent Convergence&quot; class=&quot;post-img&quot; /&gt;
&lt;span class=&quot;post-img-caption&quot;&gt;The convergence of SaaS and AI agents is reshaping the enterprise software landscape&lt;/span&gt;&lt;/p&gt;

&lt;h2 id=&quot;the-decline-of-seat-based-saas-dominance&quot;&gt;The Decline of Seat-Based SaaS Dominance&lt;/h2&gt;

&lt;p&gt;The traditional SaaS model, built on per-user licensing and incremental feature updates, is facing unprecedented pressure. The rise of powerful, autonomous AI agents is beginning to render this model insufficient. As one industry analyst put it, “In three years, any routine, rules-based digital task could move from ‘human plus app’ to ‘AI agent plus API’” [2]. This fundamental change has exposed the vulnerabilities of the old guard and paved the way for a new generation of AI-native startups.&lt;/p&gt;

&lt;p&gt;These startups, unburdened by legacy systems, are operating with unprecedented efficiency. As highlighted in recent analysis [5], AI-native firms are averaging $3.48 million in revenue per employee—a staggering 5.7 times more than their traditional SaaS counterparts. This efficiency gap is a clear signal of a major market shift.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/efficiency_gap.png&quot; alt=&quot;Efficiency Gap Between Traditional SaaS and AI-Native Startups&quot; class=&quot;post-img&quot; /&gt;
&lt;span class=&quot;post-img-caption&quot;&gt;AI-native startups are averaging $3.48M revenue per employee — 5.7x more than traditional SaaS companies&lt;/span&gt;&lt;/p&gt;

&lt;h2 id=&quot;six-pressures-reshaping-the-saas-model&quot;&gt;Six Pressures Reshaping the SaaS Model&lt;/h2&gt;

&lt;p&gt;Drawing inspiration from analysis by Cloud.Substack [5], the decline of the traditional model can be attributed to six interconnected pressures:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;Pressure Point&lt;/th&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;Description &amp;amp; Example&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Seat Expansion Stall&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;The primary growth engine for SaaS has sputtered. For example, Zoom, once a paragon of high NRR, saw its enterprise NRR fall to 98% as customers no longer needed to add seats at the same pace [5].&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Price Increases Consuming Budget&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;SaaS inflation is running at nearly 5x the market rate, with price hikes consuming a significant portion of incremental IT budgets. This leaves little room for new investments and creates a cycle of vendor consolidation [5].&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;The Shift to AI Budgets&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Enterprise spending is decisively moving towards AI. With leaders expecting a 75% growth in their LLM budgets, if a product isn’t tapping into this new pool of capital, it’s competing for a shrinking one [5].&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;The Speed of Innovation&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;The pace of development has accelerated dramatically. AI-native startups are shipping new features weekly, while traditional SaaS companies are often stuck in quarterly release cycles. This speed differential is a critical competitive advantage.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Single-Product Plateau&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;The multi-product suite strategy is losing its effectiveness. Customers increasingly prefer best-in-class point solutions, and are less willing to accept a suite of mediocre products from a single vendor [5].&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;The Value-Add Test&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Many early AI features have been underwhelming. The bar for AI integration is now genuine productivity gains, not incremental improvements. Features must deliver measurable, tangible value to justify their cost and complexity [5].&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h2 id=&quot;acknowledging-the-obstacles-on-the-path-to-autonomy&quot;&gt;Acknowledging the Obstacles on the Path to Autonomy&lt;/h2&gt;

&lt;p&gt;While the promise of agentic AI is immense, the path to full autonomy is not without significant challenges. Acknowledging these hurdles is crucial for a credible strategy.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Reliability and Trust:&lt;/strong&gt; Agentic systems still struggle with reliability. Hallucinations, where an AI generates false information, remain a key concern. According to a recent McKinsey report, &lt;strong&gt;80% of organizations have already encountered risky behaviors from AI agents&lt;/strong&gt;, including improper data exposure and unauthorized system access [7]. Building robust validation and human-in-the-loop systems is essential.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;The Incumbent’s Moat:&lt;/strong&gt; Large SaaS players like Salesforce and Microsoft have powerful distribution channels and are actively acquiring promising agent startups. Their deep enterprise integrations and existing customer relationships provide a significant defensive moat that shouldn’t be underestimated.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;The Economics of AI:&lt;/strong&gt; Many AI-native startups are currently operating with a high burn rate, spending heavily on tokens and compute power with an unclear path to profitability. Industry estimates suggest that inference costs can consume 30-50% of gross margins for agent-heavy applications, and the long-term economic viability of these models is still being tested.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;the-new-moat-capturing-the-why-with-context-graphs&quot;&gt;The New Moat: Capturing the ‘Why’ with Context Graphs&lt;/h2&gt;

&lt;p&gt;Despite the challenges, the strategic advantage of becoming an agentic platform is undeniable. The new competitive moat is the &lt;strong&gt;Context Graph&lt;/strong&gt;: a living record of decision traces that explains not just &lt;em&gt;what&lt;/em&gt; happened, but &lt;em&gt;why it was allowed&lt;/em&gt; to happen [6].&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Agents don’t just need rules. They need access to the decision traces that show how rules were applied in the past, where exceptions were granted, how conflicts were resolved, who approved what, and which precedents actually govern reality. [6]&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;While traditional systems of record store data about objects (like customers or invoices), context graphs create a system of record for &lt;em&gt;decisions&lt;/em&gt;. They capture the exceptions, overrides, and precedents that currently live in siloed communications.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/context_graph_saas.png&quot; alt=&quot;Context Graph Visualization&quot; class=&quot;post-img&quot; /&gt;
&lt;span class=&quot;post-img-caption&quot;&gt;Context graphs capture the decision traces that explain not just what happened, but why&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;This creates a powerful feedback loop. The companies that provide the agentic execution layer are the only ones who can capture these decision traces. As their context graphs grow, their agents become smarter and more reliable, creating a defensible advantage that is nearly impossible for competitors to replicate.&lt;/p&gt;

&lt;h2 id=&quot;evolving-business-models-for-the-agentic-era&quot;&gt;Evolving Business Models for the Agentic Era&lt;/h2&gt;

&lt;p&gt;This transformation requires a radical rethinking of business models. The seat-based license is being replaced by new models that align price with the value AI agents deliver.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;Pricing Model&lt;/th&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;Description &amp;amp; Example&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Usage-Based: Resources&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Customers pay for the compute and token resources they consume. &lt;strong&gt;Example:&lt;/strong&gt; A developer platform charges based on the number of API calls and GPU hours used by its agents.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Agent-Based&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Customers purchase or subscribe to individual AI agents with specific skills. &lt;strong&gt;Example:&lt;/strong&gt; An e-commerce platform sells a “Pricing Optimization Agent” for a monthly fee.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Usage-Based: Interactions&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Customers are charged per discrete interaction or completed task. &lt;strong&gt;Example:&lt;/strong&gt; A customer service platform charges per successfully resolved support ticket.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Outcome-Based: Jobs Completed&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Payment is tied to the successful execution of a predefined job. &lt;strong&gt;Example:&lt;/strong&gt; A sales automation platform charges a fee for each qualified lead its agents generate.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Outcome-Based: Financial Pricing&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;The most advanced model, where payment is a percentage of the financial value created. &lt;strong&gt;Example:&lt;/strong&gt; A marketing automation platform takes a share of the revenue generated from campaigns run by its agents.&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h2 id=&quot;what-winners-will-look-like&quot;&gt;What Winners Will Look Like&lt;/h2&gt;

&lt;p&gt;Beyond the tech giants, a new class of winners is emerging. These companies are not just building features; they are building agentic workspaces. &lt;strong&gt;Glean&lt;/strong&gt; is creating enterprise search agents that can query across dozens of enterprise tools to answer complex questions autonomously—replacing hours of manual research with seconds of agent-driven synthesis. &lt;strong&gt;Adept AI&lt;/strong&gt; is building general-purpose agents that can learn to use any software application through observation and interaction. Meanwhile, &lt;strong&gt;Sierra&lt;/strong&gt; is pioneering conversational AI agents for customer experience that can resolve issues end-to-end without human handoff. These pioneers are demonstrating the power of focusing on autonomous, outcome-driven workflows rather than incremental feature additions.&lt;/p&gt;

&lt;h2 id=&quot;the-strategic-imperative-to-act-now&quot;&gt;The Strategic Imperative to Act Now&lt;/h2&gt;

&lt;p&gt;The evidence is clear. The convergence of market pressures, from stalled seat expansion to the rise of hyper-efficient AI-native competitors, points to a single conclusion: the future of SaaS is the agentic workspace. This is no longer a question of ‘if,’ but ‘when.’ The companies that act now—that begin the work of transforming their platforms into orchestrators of intelligent agents and capturing the invaluable context graphs that power them—will be the leaders of the next decade.&lt;/p&gt;

&lt;p&gt;Where to start? Audit your core workflows for agentic potential: identify the repetitive, rules-based processes where human judgment is minimal but human time is maximal. Then pilot context capture in one high-value process—every decision trace you record today becomes training data for tomorrow’s autonomous agents.&lt;/p&gt;

&lt;p&gt;The choice is simple: build the future, or be relegated to the past. The time to build your agentic workspace is now.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;[1] &lt;a href=&quot;https://www.deloitte.com/us/en/insights/industry/technology/technology-media-and-telecom-predictions/2026/saas-ai-agents.html&quot;&gt;Deloitte. (2025, November 18). &lt;em&gt;SaaS meets AI agents: Transforming budgets, customer experience, and workforce dynamics&lt;/em&gt;. Deloitte Insights.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[2] &lt;a href=&quot;https://www.bain.com/insights/will-agentic-ai-disrupt-saas-technology-report-2025/&quot;&gt;Bain &amp;amp; Company. (2025, September 23). &lt;em&gt;Will Agentic AI Disrupt SaaS?&lt;/em&gt; Bain &amp;amp; Company.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[3] &lt;a href=&quot;https://www.forbes.com/sites/josipamajic/2026/01/15/are-saas-moats-real-or-ai-mirage-the-great-enterprise-software-debate/&quot;&gt;Forbes. (2026, January 15). &lt;em&gt;Are SaaS Moats Real Or AI Mirage? The Great Enterprise Software Debate&lt;/em&gt;. Forbes.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[4] &lt;a href=&quot;https://www.bcg.com/publications/2025/rethinking-b2b-software-pricing-in-the-era-of-ai&quot;&gt;BCG. (2025, August 13). &lt;em&gt;Rethinking B2B Software Pricing in the Era of AI&lt;/em&gt;. BCG.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[5] &lt;a href=&quot;https://cloud.substack.com/p/the-6-threat-vectors-killing-traditional&quot;&gt;Cloud.Substack. (2026, January 17). &lt;em&gt;The 6 Threat Vectors Killing Traditional B2B Software in 2026 (And How to Fight Back)&lt;/em&gt;. Cloud.Substack.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[6] &lt;a href=&quot;https://foundationcapital.com/context-graphs-ais-trillion-dollar-opportunity/&quot;&gt;Foundation Capital. (2025, December 22). &lt;em&gt;AI’s trillion-dollar opportunity: Context graphs&lt;/em&gt;. Foundation Capital.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[7] &lt;a href=&quot;https://www.mckinsey.com/capabilities/risk-and-resilience/our-insights/deploying-agentic-ai-with-safety-and-security-a-playbook-for-technology-leaders&quot;&gt;McKinsey &amp;amp; Company. (2025, October 16). &lt;em&gt;Deploying agentic AI with safety and security: A playbook for technology leaders&lt;/em&gt;. McKinsey &amp;amp; Company.&lt;/a&gt;&lt;/p&gt;
</description>
        <pubDate>Mon, 19 Jan 2026 00:00:00 +0000</pubDate>
        <link>https://subramanya.ai/2026/01/19/the-agentic-workspace-a-strategic-imperative-for-the-next-era-of-saas/</link>
        <guid isPermaLink="true">https://subramanya.ai/2026/01/19/the-agentic-workspace-a-strategic-imperative-for-the-next-era-of-saas/</guid>
        
        <category>SaaS</category>
        
        <category>Agentic AI</category>
        
        <category>Enterprise AI</category>
        
        <category>AI Agents</category>
        
        <category>Context Graphs</category>
        
        <category>AI Transformation</category>
        
        <category>B2B Software</category>
        
        <category>AI Pricing</category>
        
        
      </item>
    
      <item>
        <title>Context Graphs Are a Trillion-Dollar Opportunity. But Who Actually Captures It?</title>
        <description>&lt;p&gt;The concept of &lt;strong&gt;Context Graphs&lt;/strong&gt;, first articulated by Jaya Gupta of Foundation Capital, has rapidly captured the industry’s imagination [1]. The thesis is that the next trillion-dollar enterprise platforms will not be systems of record for data, but systems of record for &lt;strong&gt;decisions&lt;/strong&gt;. For the underlying definition, see my explainer on &lt;a href=&quot;/2026/01/01/what-are-context-graphs-really/&quot;&gt;what are context graphs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The thesis is compelling. But the most pressing question remains: &lt;strong&gt;who actually captures this trillion-dollar opportunity?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The answer, I believe, is hiding in plain sight. It is not in the data warehouses or the CRMs. It is in the agentic tools that are already operating in the wild, in the execution path, generating decision traces every second. And the most advanced and widely discussed of these tools, Anthropic’s Claude Code and the newly released Claude Cowork, provide a fascinating, real-world case study of both the immense potential and the critical, missing piece.&lt;/p&gt;

&lt;h2 id=&quot;the-agents-in-the-arena-claude-code-and-cowork&quot;&gt;The Agents in the Arena: Claude Code and Cowork&lt;/h2&gt;

&lt;p&gt;On January 12, 2026, Anthropic launched Claude Cowork, a desktop agent that extends the power of its developer-focused Claude Code tool to non-technical users [2]. This was not just another feature release. It was a statement. While the industry has been debating the future of agentic workflows, Anthropic has been shipping them.&lt;/p&gt;

&lt;p&gt;What makes Claude Code and Cowork so different is that they are not just chatbots; they are &lt;strong&gt;doers&lt;/strong&gt;. They operate within a designated folder on your computer, with the ability to read, write, and create files. They can take a messy folder of receipts and turn it into a structured expense report. They can take scattered notes and draft a coherent document. They are, in short, executing complex, multi-step tasks that generate a rich history of decisions.&lt;/p&gt;

&lt;p&gt;Perhaps the most stunning demonstration of this was the revelation that Claude Cowork itself was built almost entirely by Claude Code in about a week and a half. Think about that. An AI agent planned and executed the creation of a new software product. This is not a theoretical exercise; it is a real-world, complex &lt;strong&gt;decision trace&lt;/strong&gt; of immense value.&lt;/p&gt;

&lt;h2 id=&quot;the-irony-generating-traces-but-not-capturing-them&quot;&gt;The Irony: Generating Traces, But Not Capturing Them&lt;/h2&gt;

&lt;p&gt;Every time a developer uses Claude Code to refactor a codebase, or a project manager uses Claude Cowork to organize a project folder, a decision trace is generated. The agent is walking the graph of the user’s intent, pulling context from different files, making decisions, and executing actions. It is creating the raw material of a context graph.&lt;/p&gt;

&lt;p&gt;But where does that raw material go? It evaporates. It is ephemeral, living for a moment in the agent’s context window or the user’s chat history, but it is not persisted as a structured, queryable artifact. The &lt;em&gt;why&lt;/em&gt; is lost, leaving only the &lt;em&gt;what&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;This is the central irony of the current agentic landscape. The most advanced agentic tools are the perfect instruments for creating context graphs, yet they are not being used for that purpose. They are generating a constant stream of valuable decision data that is simply being discarded.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/context_graphs_agentic_loop.png&quot; alt=&quot;The Ephemeral Nature of Decision Traces in Today&apos;s Agents&quot; width=&quot;2752&quot; height=&quot;1536&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The Ephemeral Nature of Decision Traces in Today’s Agents&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/context_graphs_evolution.png&quot; alt=&quot;The Evolution of Agentic Infrastructure&quot; width=&quot;2752&quot; height=&quot;1536&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;why-incumbents-cant-just-add-this-feature&quot;&gt;Why Incumbents Can’t Just Add This Feature&lt;/h2&gt;

&lt;p&gt;Incumbents are structurally disadvantaged from capturing this opportunity. They are simply in the wrong place architecturally.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Systems of Record (Salesforce, Workday):&lt;/strong&gt; These platforms are built to store the current &lt;strong&gt;state&lt;/strong&gt; of an object. They know the deal is closed-won, but they do not have a record of the dozen steps, approvals, and exceptions that led to that outcome. They are in the wrong architectural layer.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Data Warehouses (Snowflake, Databricks):&lt;/strong&gt; These platforms are in the &lt;strong&gt;read path&lt;/strong&gt;, not the write path. They receive data via ETL &lt;em&gt;after&lt;/em&gt; the decisions have been made and the context has been lost. They can tell you what happened, but they cannot tell you why.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trying to retrofit decision trace capture onto these systems is like trying to understand a chess game by only looking at the final board position. You have lost the move-by-move history that contains all the strategic insight.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/context_graphs_opportunity.png&quot; alt=&quot;The Context Graph Opportunity Landscape&quot; width=&quot;2752&quot; height=&quot;1536&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;the-real-race-who-builds-the-event-clock-for-agents&quot;&gt;The Real Race: Who Builds the “Event Clock” for Agents?&lt;/h2&gt;

&lt;p&gt;So, who are the real contenders?&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;Contender&lt;/th&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;Strengths&lt;/th&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;Weaknesses&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Anthropic (The Agent Provider)&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Owns the agent and the execution path. In the pole position to build persistence directly into their products.&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Not their core business. May see it as a feature, not a platform. Risks vendor lock-in for customers.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Orchestration Startups&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Focused on the cross-system workflow layer where context is richest. Can be vendor-neutral, orchestrating agents from multiple providers.&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Need to convince customers to adopt a new layer in their stack. Dependent on agent providers for core capabilities.&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;This brings us to a critical distinction. The goal is not to simply monitor agents. Agent observability and telemetry tools are useful for capturing the &lt;em&gt;what&lt;/em&gt;—metrics, logs, and traces of execution. They can tell you an agent made 10 API calls and wrote 3 files. But they cannot tell you &lt;em&gt;why&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/context_graphs_telemetry.png&quot; alt=&quot;Telemetry vs. Decision Traces&quot; width=&quot;2752&quot; height=&quot;1536&quot; /&gt;&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;decision trace&lt;/strong&gt; captures the reasoning, the context, and the precedents that led to an action. This is a fundamentally different and more valuable asset than telemetry. The trillion-dollar prize will go to whoever successfully builds the &lt;strong&gt;event clock&lt;/strong&gt; for the agentic era—the system that captures the decision traces of every agent, human, and automated process in the enterprise. My bet is on a new category of company to emerge: one that is purpose-built to be this system of record for decisions.&lt;/p&gt;

&lt;h2 id=&quot;from-code-and-cowork-to-context&quot;&gt;From “Code” and “Cowork” to “Context”&lt;/h2&gt;

&lt;p&gt;Jaya Gupta was right. The opportunity is massive. But the winner will not be a better database or a smarter CRM. The winner will be the company that recognizes that the actions of agents like Claude Code and Cowork are not just outputs; they are assets. They are the building blocks of the enterprise’s collective intelligence.&lt;/p&gt;

&lt;p&gt;For Anthropic, the path seems clear. The next logical product in their suite is not just a new skill or a new integration. It is &lt;strong&gt;Claude Context&lt;/strong&gt;: a platform that captures, stores, and makes sense of every decision trace their agents generate. It would transform their tools from powerful productivity aids into an indispensable system of record for the modern enterprise.&lt;/p&gt;

&lt;p&gt;Whether Anthropic seizes this opportunity or leaves the door open for a new wave of startups remains to be seen. But one thing is certain: the race to build the context graph is on, and the companies that are in the execution path of agentic work are the ones with the head start.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;[1] &lt;a href=&quot;https://foundationcapital.com/context-graphs-ais-trillion-dollar-opportunity/&quot;&gt;Gupta, J. (2025, December 22). &lt;em&gt;AI’s trillion-dollar opportunity: Context graphs&lt;/em&gt;. Foundation Capital.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[2] &lt;a href=&quot;https://venturebeat.com/technology/anthropic-launches-cowork-a-claude-desktop-agent-that-works-in-your-files-no&quot;&gt;Nuñez, M. (2026, January 12). &lt;em&gt;Anthropic launches Cowork, a Claude Desktop agent that works in your files — no coding required&lt;/em&gt;. VentureBeat.&lt;/a&gt;&lt;/p&gt;
</description>
        <pubDate>Wed, 14 Jan 2026 00:00:00 +0000</pubDate>
        <link>https://subramanya.ai/2026/01/14/context-graphs-are-a-trillion-dollar-opportunity-but-who-captures-it/</link>
        <guid isPermaLink="true">https://subramanya.ai/2026/01/14/context-graphs-are-a-trillion-dollar-opportunity-but-who-captures-it/</guid>
        
        <category>Context Graphs</category>
        
        <category>Agentic AI</category>
        
        <category>Enterprise AI</category>
        
        <category>Claude Code</category>
        
        <category>Claude Cowork</category>
        
        <category>AI Infrastructure</category>
        
        <category>Systems of Record</category>
        
        <category>Anthropic</category>
        
        
      </item>
    
      <item>
        <title>A Year with Cursor: How My Workflow Evolved from Agent to Architect</title>
        <description>&lt;p&gt;It’s been over a year since I made Cursor my primary IDE, and it’s hard to overstate the impact it’s had on my work. As a machine learning engineer building conversational AI platforms at Dylog and experimenting with agentic infrastructure on my personal projects, I’ve lived through the evolution of AI-native development. My journey with Cursor mirrors the maturation of the tool itself: from a simple agent to a sophisticated architectural partner.&lt;/p&gt;

&lt;p&gt;This post is a reflection on that journey, detailing how my workflow evolved and how I’ve come to rely on a powerful combination of Plan Mode, custom commands, and context engineering to build faster, smarter, and with more clarity.&lt;/p&gt;

&lt;h2 id=&quot;phase-1-the-agent-takes-the-wheel&quot;&gt;Phase 1: The Agent Takes the Wheel&lt;/h2&gt;

&lt;p&gt;When I first started, my usage was simple. I treated Cursor like a supercharged autocomplete. I’d write a comment, hit &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Cmd+K&lt;/code&gt;, and let the agent generate the code. It was magical, but it was also a black box. I was a passenger, and the agent was driving.&lt;/p&gt;

&lt;p&gt;Then came the &lt;strong&gt;@ mentions&lt;/strong&gt;. This was my first taste of giving the agent real context. Instead of hoping it understood my codebase, I could explicitly tell it what to look at:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@file&lt;/code&gt; to reference a specific file&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@folder&lt;/code&gt; to include an entire directory&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@codebase&lt;/code&gt; to let it search across the whole project&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@web&lt;/code&gt; to pull in external documentation&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@docs&lt;/code&gt; to reference official docs for libraries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This was a huge leap. Suddenly, the agent wasn’t guessing; it was working with the same context I had. I could say “refactor this function to match the pattern in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@file:utils/helpers.ts&lt;/code&gt;” and it would actually understand.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/cursor-at-mentions.png&quot; alt=&quot;Cursor @ mention context&quot; class=&quot;post-img&quot; width=&quot;1639&quot; height=&quot;935&quot; /&gt;
&lt;span class=&quot;post-img-caption&quot;&gt;The @ mention dropdown in Cursor, showing context options like @file, @folder, @codebase, @web, and @docs that allow explicit context control&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;But even with better context, I’d often find myself in a loop of generating, debugging, and regenerating. The agent lacked the architectural vision for larger tasks.&lt;/p&gt;

&lt;h2 id=&quot;phase-2-mcp-changes-everything&quot;&gt;Phase 2: MCP Changes Everything&lt;/h2&gt;

&lt;p&gt;The introduction of &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt; was when things got serious. MCP allowed me to connect Cursor to external tools and data sources, turning the agent from a code generator into a true assistant with access to my entire workflow.&lt;/p&gt;

&lt;p&gt;I started integrating MCPs for:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;GitHub&lt;/strong&gt; for pulling issues and PRs directly into context&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Linear&lt;/strong&gt; for task management integration&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Slack&lt;/strong&gt; for team communication context&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Custom MCPs&lt;/strong&gt; for internal APIs and databases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With MCP, I could say “implement the feature described in Linear issue #234” and the agent would fetch the issue, understand the requirements, and start building. It was no longer just about code; it was about connecting the dots across my entire development ecosystem.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/cursor-mcp-integrations.png&quot; alt=&quot;MCP integrations in Cursor&quot; class=&quot;post-img&quot; width=&quot;1639&quot; height=&quot;935&quot; /&gt;
&lt;span class=&quot;post-img-caption&quot;&gt;MCP configuration panel showing connected integrations like GitHub, Linear, Slack, and custom servers that extend Cursor’s capabilities across the development ecosystem&lt;/span&gt;&lt;/p&gt;

&lt;h2 id=&quot;phase-3-the-rise-of-the-planner&quot;&gt;Phase 3: The Rise of the Planner&lt;/h2&gt;

&lt;p&gt;The introduction of &lt;strong&gt;Plan Mode&lt;/strong&gt; was the next game-changer. It was the first time I felt like I was collaborating with the AI, not just delegating to it. Inspired by workflows from developers like Ray Fernando, I started using a two-step process:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Plan with Opus:&lt;/strong&gt; I’d use a powerful model like Claude Opus to generate a detailed, step-by-step implementation plan. I’d give it the high-level goal, and it would break it down into a series of concrete tasks, complete with file names, function signatures, and logic.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Execute with Sonnet/GPT:&lt;/strong&gt; I’d then hand that plan to a faster, cheaper model like Sonnet or GPT-5.2 to execute each step. The cheaper model didn’t need to be a brilliant architect; it just needed to be a diligent builder.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This workflow was a massive improvement. It separated the “what” from the “how,” and it gave me a reviewable artifact—the plan—that I could edit and approve before any code was written. It also saved a ton of money on tokens.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/cursor-plan-mode.png&quot; alt=&quot;Cursor Plan Mode workflow&quot; class=&quot;post-img&quot; width=&quot;1639&quot; height=&quot;935&quot; /&gt;
&lt;span class=&quot;post-img-caption&quot;&gt;A split view showing a detailed implementation plan in a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.cursor/plans/&lt;/code&gt; file on the left, and the corresponding generated code on the right, demonstrating the separation of architecture from execution&lt;/span&gt;&lt;/p&gt;

&lt;h2 id=&quot;phase-4-the-architect-emerges-commands--planning&quot;&gt;Phase 4: The Architect Emerges (Commands + Planning)&lt;/h2&gt;

&lt;p&gt;This is where I live today. While Plan Mode is still central to my workflow, I’ve layered on a set of &lt;strong&gt;custom commands&lt;/strong&gt; and &lt;strong&gt;rules&lt;/strong&gt; to fine-tune the process and bake my architectural principles directly into the IDE.&lt;/p&gt;

&lt;h3 id=&quot;my-current-setup&quot;&gt;My Current Setup&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Rules (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.cursorrules&lt;/code&gt;):&lt;/strong&gt; I have a set of rules that define my coding standards, preferred patterns, and architectural constraints. The agent reads these before every task, ensuring consistency across the codebase.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Custom Commands:&lt;/strong&gt; I’ve built commands that wrap my most common workflows:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/plan&lt;/code&gt; - Generates a detailed implementation plan using Opus&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/refactor&lt;/code&gt; - Takes a file and refactors it based on instructions&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/test&lt;/code&gt; - Generates a test suite for a given function&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/review&lt;/code&gt; - Reviews code against my rules and suggests improvements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Queued Messages:&lt;/strong&gt; I use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Ctrl+Enter&lt;/code&gt; to queue follow-up instructions while the agent is working. This lets me think ahead and keep the momentum going without interrupting the current task.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/cursor-custom-commands.png&quot; alt=&quot;Cursor custom commands and rules&quot; class=&quot;post-img&quot; width=&quot;1639&quot; height=&quot;935&quot; /&gt;
&lt;span class=&quot;post-img-caption&quot;&gt;The Cursor command palette showing custom commands like /plan, /refactor, /test, and /review, alongside a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.cursorrules&lt;/code&gt; file that defines coding standards and architectural constraints&lt;/span&gt;&lt;/p&gt;

&lt;h2 id=&quot;the-evolution-at-a-glance&quot;&gt;The Evolution at a Glance&lt;/h2&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Phase&lt;/th&gt;
      &lt;th&gt;Key Feature&lt;/th&gt;
      &lt;th&gt;What Changed&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;Agent Mode + @ Mentions&lt;/td&gt;
      &lt;td&gt;Context became explicit, not guessed&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;MCP Integration&lt;/td&gt;
      &lt;td&gt;External tools and data became accessible&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;3&lt;/td&gt;
      &lt;td&gt;Plan Mode&lt;/td&gt;
      &lt;td&gt;Architecture separated from execution&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;4&lt;/td&gt;
      &lt;td&gt;Commands + Rules&lt;/td&gt;
      &lt;td&gt;Workflows became repeatable and personalized&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h2 id=&quot;why-this-matters&quot;&gt;Why This Matters&lt;/h2&gt;

&lt;p&gt;This evolution from agent to architect is more than just a personal productivity hack. It’s a glimpse into the future of software development. We’re moving from a world where we write code to a world where we &lt;strong&gt;describe systems&lt;/strong&gt;. Our job is to be the architect, to define the blueprint, and to let the agents do the building.&lt;/p&gt;

&lt;p&gt;Cursor, more than any other tool I’ve used, understands this shift. It’s not just about generating code; it’s about managing complexity, maintaining context, and giving developers the leverage to build at a scale that was previously unimaginable.&lt;/p&gt;

&lt;p&gt;If you’re still using AI as a simple code generator, I encourage you to explore @ mentions, MCP, Plan Mode, and custom commands. It’s a journey that will transform you from a developer who uses AI to an architect who directs it.&lt;/p&gt;
</description>
        <pubDate>Sun, 04 Jan 2026 00:00:00 +0000</pubDate>
        <link>https://subramanya.ai/2026/01/04/a-year-with-cursor-how-my-workflow-evolved-from-agent-to-architect/</link>
        <guid isPermaLink="true">https://subramanya.ai/2026/01/04/a-year-with-cursor-how-my-workflow-evolved-from-agent-to-architect/</guid>
        
        <category>Cursor</category>
        
        <category>AI IDE</category>
        
        <category>MCP</category>
        
        <category>Developer Workflow</category>
        
        <category>AI Agents</category>
        
        <category>Plan Mode</category>
        
        <category>AI Productivity</category>
        
        <category>Agentic AI</category>
        
        <category>Developer Tools</category>
        
        
      </item>
    
      <item>
        <title>What Are Context Graphs, Really?</title>
        <description>&lt;p&gt;Last week, I wrote about my reaction to Jaya Gupta’s viral post on Context Graphs [1]. The idea of a “system of record for decisions” resonated deeply, framing the evolution of agentic infrastructure from tools to skills to memory. But since then, the conversation has exploded, and it has become clear that the term “context graph” itself is a bit of a Rorschach test. Everyone sees something different.&lt;/p&gt;

&lt;p&gt;Animesh Koratana, founder of PlayerZero, has written a series of follow up posts that cut through the noise and get to the heart of what a context graph actually is, and why it is so structurally hard to build [2] [3]. His insights are critical for anyone serious about building agentic AI in the enterprise. This is not about “adding memory to your agent” or wiring up a graph database. It is about rethinking our assumptions about data, time, and the nature of organizational knowledge.&lt;/p&gt;

&lt;h2 id=&quot;the-two-clocks-problem-why-we-are-missing-half-of-time&quot;&gt;The Two Clocks Problem: Why We Are Missing Half of Time&lt;/h2&gt;

&lt;p&gt;Koratana’s most powerful insight is what he calls the &lt;strong&gt;Two Clocks Problem&lt;/strong&gt;. We have built trillion dollar infrastructure for the &lt;strong&gt;state clock&lt;/strong&gt;: what is true right now. Your CRM stores the final deal value. Your ticketing system stores “resolved.” Your codebase stores the current state.&lt;/p&gt;

&lt;p&gt;But we have almost no infrastructure for the &lt;strong&gt;event clock&lt;/strong&gt;: what happened, in what order, and with what reasoning. The git blame shows &lt;em&gt;who&lt;/em&gt; changed the timeout from 5s to 30s, but the &lt;em&gt;why&lt;/em&gt; is gone. The CRM says “closed lost,” but it does not say you were the second choice and the winner had one feature you are shipping next quarter. As Koratana puts it:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“We’ve built trillion-dollar infrastructure for what’s true now. Almost nothing for why it became true.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is the core of the problem. We are asking agents to exercise judgment without access to precedent. We are training lawyers on verdicts without case law. The context graph is the infrastructure for the event clock. It is the case law of the enterprise.&lt;/p&gt;

&lt;h2 id=&quot;the-five-coordinate-systems-problem-why-this-is-not-a-database-problem&quot;&gt;The Five Coordinate Systems Problem: Why This Is Not a Database Problem&lt;/h2&gt;

&lt;p&gt;So why can’t we just build a better database? Because a context graph requires joins across five different coordinate systems that do not share keys:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Events&lt;/strong&gt;: What happened?&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Timeline&lt;/strong&gt;: When did it happen?&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Semantics&lt;/strong&gt;: What does it mean?&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Attribution&lt;/strong&gt;: Who owned it?&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Outcome&lt;/strong&gt;: What did it cause?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each of these has a different geometry. Timelines are linear. Events are sequential. Semantics live in vector space. Attribution is graph structured. Outcomes are causal DAGs. And the keys are fluid. “Jaya Gupta” in an email, “J. Gupta” in a contract, and “@JayaGup10” in Slack are the same entity with no shared identifier.&lt;/p&gt;

&lt;p&gt;Traditional databases are built for joins on stable keys within a single coordinate system. Context graphs require probabilistic joins across all five simultaneously. This is not a database problem; it is a representation problem.&lt;/p&gt;

&lt;h2 id=&quot;agents-as-informed-walkers-how-we-solve-the-representation-problem&quot;&gt;Agents as Informed Walkers: How We Solve the Representation Problem&lt;/h2&gt;

&lt;p&gt;If the ontology of every organization is different and constantly changing, how can we ever hope to model it? Koratana’s answer is that we do not have to. The agents do it for us.&lt;/p&gt;

&lt;p&gt;When an agent works through a problem, its trajectory is a trace through the state space of the organization. It is an implicit map of the ontology, discovered through use rather than specified upfront. This is the key insight from graph representation learning (node2vec): you do not need to know the structure of a graph to learn representations of it. You just need to walk it.&lt;/p&gt;

&lt;p&gt;Agents are &lt;strong&gt;informed walkers&lt;/strong&gt;. Their trajectories are not random; they are problem directed. By accumulating enough of these trajectories, we can learn embeddings that encode the structure of the organization. We can learn that two engineers who never interact are structurally equivalent because they play the same role in different subgraphs. We can learn that a certain sequence of events is a precursor to churn, even if those events have never been explicitly linked.&lt;/p&gt;

&lt;h2 id=&quot;what-this-actually-means-for-builders&quot;&gt;What This Actually Means for Builders&lt;/h2&gt;

&lt;p&gt;So, what is a context graph, really? It is not a graph database. It is not a vector store. It is a &lt;strong&gt;learned representation of organizational reasoning, derived from the trajectories of agents solving problems&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This has profound implications for how we build agentic systems:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;The agents are not building the context graph; they are solving problems worth solving.&lt;/strong&gt; The context graph is an emergent property of their work. The focus should be on deploying agents into real workflows, not on building a perfect ontology upfront.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;The value is in the trajectories, not the state.&lt;/strong&gt; We need to shift our focus from storing the final state to capturing the full, replayable history of how that state was reached.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;This is a machine learning problem, not a data engineering problem.&lt;/strong&gt; The goal is not to build a perfect data model, but to learn a representation that is useful for reasoning.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Building a context graph is not about buying a new piece of software. It is about a fundamental shift in how we think about data, time, and the nature of work in the agentic era. It is about recognizing that the most valuable asset we have is not our data, but the accumulated wisdom of the decisions we make every day. And it is about building the infrastructure to finally capture that wisdom and put it to work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;[1] &lt;a href=&quot;https://x.com/JayaGup10/status/2003525933534179480&quot;&gt;Gupta, J. (2025, December 23). &lt;em&gt;AI’s trillion-dollar opportunity: Context graphs&lt;/em&gt;. X.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[2] &lt;a href=&quot;https://www.linkedin.com/pulse/why-context-graphs-rare-wild-animesh-koratana-3wzte/&quot;&gt;Koratana, A. (2026, January 1). &lt;em&gt;Why context graphs are rare in the wild&lt;/em&gt;. LinkedIn.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[3] &lt;a href=&quot;https://www.linkedin.com/pulse/how-build-context-graph-animesh-koratana-6abve&quot;&gt;Koratana, A. (2025, December 28). &lt;em&gt;How to build a context graph&lt;/em&gt;. LinkedIn.&lt;/a&gt;&lt;/p&gt;
</description>
        <pubDate>Thu, 01 Jan 2026 00:00:00 +0000</pubDate>
        <link>https://subramanya.ai/2026/01/01/what-are-context-graphs-really/</link>
        <guid isPermaLink="true">https://subramanya.ai/2026/01/01/what-are-context-graphs-really/</guid>
        
        <category>Context Graphs</category>
        
        <category>Agentic AI</category>
        
        <category>Enterprise AI</category>
        
        <category>Two Clocks Problem</category>
        
        <category>Event Sourcing</category>
        
        <category>Graph Representation Learning</category>
        
        <category>AI Infrastructure</category>
        
        <category>Organizational Memory</category>
        
        
      </item>
    
      <item>
        <title>Context Graphs: My Thoughts on the Trillion Dollar Evolution of Agentic Infrastructure</title>
        <description>&lt;p&gt;After reading Jaya Gupta’s post about Context Graphs, I have not been able to stop thinking about it [1]. For me, it did something personal: it gave a name to the architectural pattern I have been circling around in the agentic infrastructure discussions on this blog for the past year. I later wrote a more direct explainer on &lt;a href=&quot;/2026/01/01/what-are-context-graphs-really/&quot;&gt;what are context graphs&lt;/a&gt; and why the term means more than agent memory.&lt;/p&gt;

&lt;p&gt;Gupta’s thesis is simple but profound. The last generation of enterprise software (Salesforce, Workday, SAP) created trillion dollar companies by becoming &lt;strong&gt;systems of record&lt;/strong&gt;. Own the canonical data, own the workflow, own the lock in. The question now is whether those systems survive the shift to agents. Gupta argues they will, but that a new layer will emerge on top of them: &lt;strong&gt;a system of record for decisions&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I agree. And I think this is the missing piece that connects everything I have been writing about.&lt;/p&gt;

&lt;h2 id=&quot;the-missing-layer-decision-traces&quot;&gt;The Missing Layer: Decision Traces&lt;/h2&gt;

&lt;p&gt;What resonated most with me was Gupta’s articulation of the &lt;strong&gt;decision trace&lt;/strong&gt;. This is the context that currently lives in Slack threads, deal desk conversations, escalation calls, and people’s heads. It is the exception logic that says, “We always give healthcare companies an extra 10% because their procurement cycles are brutal.” It is the precedent from past decisions that says, “We structured a similar deal for Company X last quarter, we should be consistent.”&lt;/p&gt;

&lt;p&gt;None of this is captured in our systems of record. The CRM shows the final price, but not who approved the deviation or why. The support ticket says “escalated to Tier 3,” but not the cross system synthesis that led to that decision. As Gupta puts it:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“The reasoning connecting data to action was never treated as data in the first place.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is the wall that every enterprise hits when they try to scale agents. The wall is not missing data. It is missing &lt;strong&gt;decision traces&lt;/strong&gt;.&lt;/p&gt;

&lt;h2 id=&quot;from-tools-to-skills-to-context-the-evolution-i-have-been-documenting&quot;&gt;From Tools to Skills to Context: The Evolution I Have Been Documenting&lt;/h2&gt;

&lt;p&gt;Reading Gupta’s post, I realized that the evolution I have been documenting on this blog (from MCP to Agent Skills to governance) is really a story about building the infrastructure for context graphs. Let me explain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 1&lt;/strong&gt; was about &lt;strong&gt;tools&lt;/strong&gt;. The Model Context Protocol (MCP) gave agents the ability to interact with external systems. It was the plumbing that connected agents to databases, APIs, and the outside world. But we quickly learned that tool access alone is not enough. An agent with a hammer is not a carpenter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 2&lt;/strong&gt; was about &lt;strong&gt;skills&lt;/strong&gt;. Anthropic’s Agent Skills standard gave us a way to codify procedural knowledge, the “how to” guides that teach agents to use tools effectively. Skills are the brain of the agent. They turn tribal knowledge into portable, composable assets. But even skills are not enough. An agent with a hammer and a carpentry manual is still not a master carpenter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 3&lt;/strong&gt; is about &lt;strong&gt;context&lt;/strong&gt;. This is where context graphs come in. A context graph is the accumulated record of every decision, every exception, and every outcome. It answers the question, “What happened last time?” It turns exceptions into precedents and tribal knowledge into institutional knowledge.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;strong&gt;Phase&lt;/strong&gt;&lt;/th&gt;
      &lt;th&gt;&lt;strong&gt;Primitive&lt;/strong&gt;&lt;/th&gt;
      &lt;th&gt;&lt;strong&gt;What It Provides&lt;/strong&gt;&lt;/th&gt;
      &lt;th&gt;&lt;strong&gt;My Analogy&lt;/strong&gt;&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Phase 1&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Tools (MCP)&lt;/td&gt;
      &lt;td&gt;Capability&lt;/td&gt;
      &lt;td&gt;The agent has a hammer.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Phase 2&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Skills (Agent Skills)&lt;/td&gt;
      &lt;td&gt;Expertise&lt;/td&gt;
      &lt;td&gt;The agent has a carpentry manual.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Phase 3&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Context (Context Graphs)&lt;/td&gt;
      &lt;td&gt;Experience&lt;/td&gt;
      &lt;td&gt;The agent has access to the record of every house it has ever built.&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h2 id=&quot;why-this-matters-for-the-governance-stack&quot;&gt;Why This Matters for the Governance Stack&lt;/h2&gt;

&lt;p&gt;The governance stack I have been advocating for (agent registries, tool registries, skill registries, policy engines) is the infrastructure that makes context graphs possible. The agent registry provides the identity of the agent making the decision. The tool registry (MCP) provides the capabilities available to that agent. The skill registry provides the expertise that guides the agent’s actions. And the orchestration layer is where the decision trace is captured and persisted.&lt;/p&gt;

&lt;p&gt;Without this infrastructure, decision traces are ephemeral. They exist for a moment in the agent’s context window and then disappear. With this infrastructure, every decision becomes a durable artifact that can be audited, learned from, and used as precedent.&lt;/p&gt;

&lt;h2 id=&quot;my-takeaway&quot;&gt;My Takeaway&lt;/h2&gt;

&lt;p&gt;Gupta is right that agent first startups have a structural advantage here. They sit in the execution path. They see the full context at decision time. Incumbents, built on current state storage, simply cannot capture this.&lt;/p&gt;

&lt;p&gt;But the bigger insight for me is this: &lt;strong&gt;we are not just building agents. We are building the decision record of the enterprise.&lt;/strong&gt; The context graph is not a feature; it is the foundation of a new kind of system of record. The enterprises that win in the agentic era will be those that recognize this and invest in the infrastructure to capture, store, and leverage their decision traces.&lt;/p&gt;

&lt;p&gt;We started by giving agents tools. Then we taught them skills. Now, we must give them context. That is the trillion dollar evolution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;[1] &lt;a href=&quot;https://x.com/JayaGup10/status/2003525933534179480&quot;&gt;Gupta, J. (2025, December 23). &lt;em&gt;AI’s trillion dollar opportunity: Context graphs&lt;/em&gt;. X.&lt;/a&gt;&lt;/p&gt;
</description>
        <pubDate>Fri, 26 Dec 2025 00:00:00 +0000</pubDate>
        <link>https://subramanya.ai/2025/12/26/context-graphs-my-thoughts-on-the-trillion-dollar-evolution-of-agentic-memory/</link>
        <guid isPermaLink="true">https://subramanya.ai/2025/12/26/context-graphs-my-thoughts-on-the-trillion-dollar-evolution-of-agentic-memory/</guid>
        
        <category>Context Graphs</category>
        
        <category>Agentic AI</category>
        
        <category>Enterprise AI</category>
        
        <category>MCP</category>
        
        <category>Agent Skills</category>
        
        <category>AI Infrastructure</category>
        
        <category>Decision Traces</category>
        
        <category>AI Governance</category>
        
        <category>Systems of Record</category>
        
        
      </item>
    
      <item>
        <title>2025: The Year Agentic AI Got Real (What Comes Next)</title>
        <description>&lt;p&gt;If 2024 was the year of AI experimentation, 2025 was the year of industrialization. The speculative boom around generative AI has rapidly matured into the fastest-scaling software category in history, with autonomous agents moving from the lab to the core of enterprise operations. As we close out the year, it’s clear that the agentic AI landscape has been fundamentally reshaped by massive investment, critical standardization, and a clear-eyed focus on solving the hard problems of production readiness.&lt;/p&gt;

&lt;p&gt;But this wasn’t just a story of adoption. 2025 was the year the industry confronted the architectural limitations of monolithic agents and began a decisive shift toward a more specialized, scalable, and governable future.&lt;/p&gt;

&lt;h2 id=&quot;the-37-billion-build-out-from-experiment-to-enterprise-imperative&quot;&gt;The $37 Billion Build-Out: From Experiment to Enterprise Imperative&lt;/h2&gt;

&lt;p&gt;The most telling sign of this shift is the sheer volume of capital deployed. According to a December 2025 report from Menlo Ventures, enterprise spending on generative AI skyrocketed to &lt;strong&gt;$37 billion&lt;/strong&gt; in 2025, a stunning &lt;strong&gt;3.2x increase&lt;/strong&gt; from the previous year [1]. This surge now accounts for over 6% of the entire global software market.&lt;/p&gt;

&lt;p&gt;Crucially, over half of this spending ($19 billion) flowed directly into the application layer, demonstrating a clear enterprise priority for immediate productivity gains over long-term infrastructure bets. This investment is validated by strong adoption metrics, with a recent PwC survey finding that &lt;strong&gt;79% of companies&lt;/strong&gt; are already adopting AI agents [2].&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/genai_spend_by_category_menlo.webp&quot; alt=&quot;Enterprise AI Spend by Category 2023-2025&quot; width=&quot;768&quot; height=&quot;486&quot; /&gt;
&lt;em&gt;Source: Menlo Ventures, 2025: The State of Generative AI in the Enterprise [1]&lt;/em&gt;&lt;/p&gt;

&lt;h2 id=&quot;solving-the-interoperability-crisis-the-standardization-of-2025&quot;&gt;Solving the Interoperability Crisis: The Standardization of 2025&lt;/h2&gt;

&lt;p&gt;While the spending boom captured headlines, a quieter, more profound revolution was taking place in the infrastructure layer. The primary challenge addressed in 2025 was the &lt;strong&gt;interoperability crisis&lt;/strong&gt;. The early agentic ecosystem was a chaotic landscape of proprietary APIs and fragmented toolsets, making it nearly impossible to build robust, cross-platform applications. This year, two key developments brought order to that chaos.&lt;/p&gt;

&lt;h3 id=&quot;1-the-maturation-of-mcp&quot;&gt;1. The Maturation of MCP&lt;/h3&gt;

&lt;p&gt;The Model Context Protocol (MCP), introduced in late 2024, became the de facto standard for agent-to-tool communication. Its first anniversary in November 2025 was marked by a major spec release that introduced critical enterprise features like asynchronous operations, server identity, and a formal extensions framework, directly addressing early complaints about its production readiness [3].&lt;/p&gt;

&lt;p&gt;This culminated in the December 9th announcement that Anthropic, along with Block and OpenAI, was donating MCP to the newly formed &lt;strong&gt;Agentic AI Foundation (AAIF)&lt;/strong&gt; under the Linux Foundation [4]. With over 10,000 active public MCP servers and 97 million monthly SDK downloads, MCP’s transition to a neutral, community-driven standard solidifies its role as the foundational protocol for the agentic economy.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/mcp_before_after.png&quot; alt=&quot;Before and After MCP&quot; width=&quot;960&quot; height=&quot;540&quot; /&gt;
&lt;em&gt;The shift from fragmented, proprietary APIs to a unified, MCP-based approach simplifies agent-tool integration.&lt;/em&gt;&lt;/p&gt;

&lt;h3 id=&quot;2-the-dawn-of-portable-skills&quot;&gt;2. The Dawn of Portable Skills&lt;/h3&gt;

&lt;p&gt;Following the same playbook, Anthropic made another pivotal move on December 18th, opening up its &lt;strong&gt;Agent Skills&lt;/strong&gt; specification [5]. This provides a standardized, portable way to equip agents with procedural knowledge, moving beyond simple tool-use to more complex, multi-step task execution. By making the specification and SDK available to all, the industry is fostering an ecosystem where skills can be developed, shared, and deployed across any compliant AI platform, preventing vendor lock-in.&lt;/p&gt;

&lt;h2 id=&quot;the-next-frontier-the-rise-of-the-agent-workforce&quot;&gt;The Next Frontier: The Rise of the Agent Workforce&lt;/h2&gt;

&lt;p&gt;These standardization efforts have unlocked the next major architectural shift: the move away from monolithic, general-purpose agents toward &lt;strong&gt;collections of specialized skills&lt;/strong&gt; that function like a human team. No company hires a single “super-employee” to be a marketer, an engineer, and a financial analyst. They hire specialists who excel at their roles and collaborate to achieve a larger goal. The future of enterprise AI is the same.&lt;/p&gt;

&lt;p&gt;This “multi-agent” or “skill-based” architecture is not just a theoretical concept. Anthropic’s own research showed that a multi-agent system—with a lead agent coordinating specialized sub-agents—outperformed a single, more powerful agent by over 90% on complex research tasks [6]. The reason is simple: specialization allows for greater accuracy, and parallelism allows for greater scale.&lt;/p&gt;

&lt;p&gt;We are already seeing the first wave of companies built on this philosophy. YC-backed &lt;strong&gt;Getden.io&lt;/strong&gt;, for example, provides a platform for non-engineers to build and collaborate with agents that can be composed of various skills and integrations [7]. This approach democratizes agent creation, allowing domain experts—not just developers—to build the specialized “digital employees” they need.&lt;/p&gt;

&lt;h2 id=&quot;the-challenges-of-2026-from-adoption-to-governance&quot;&gt;The Challenges of 2026: From Adoption to Governance&lt;/h2&gt;

&lt;p&gt;While 2025 solved the problem of &lt;em&gt;connection&lt;/em&gt;, 2026 will be about solving the challenges of &lt;em&gt;control&lt;/em&gt; and &lt;em&gt;coordination&lt;/em&gt; at scale. As enterprises move from deploying dozens of agents to thousands of skills, a new set of problems comes into focus:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Governance at Scale:&lt;/strong&gt; How do you manage access control, cost, and versioning for thousands of interconnected skills? The risk of “skill sprawl” and shadow AI is immense, demanding a new generation of governance platforms.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Reliability and Predictability:&lt;/strong&gt; The non-deterministic nature of LLMs remains a major barrier to enterprise trust. For agents to run mission-critical processes, we need robust testing frameworks, better observability tools, and architectural patterns that ensure predictable outcomes.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Multi-Agent Orchestration:&lt;/strong&gt; As skill-based systems become the norm, the primary challenge shifts from tool-use to agent coordination. How do you manage dependencies, resolve conflicts, and ensure a team of agents can reliably collaborate to complete a complex workflow? This is a frontier problem that will define the next generation of agentic platforms.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Security in a Composable World:&lt;/strong&gt; A world of interoperable skills creates new attack surfaces. How do you secure the supply chain for third-party skills? How do you prevent a compromised agent from triggering a cascade of failures across a complex workflow? The security model for agentic AI is still in its infancy.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The groundwork laid in 2025 was monumental. It moved us from a world of isolated, experimental bots to the brink of a true agentic economy. But the journey is far from over. The companies that will win in 2026 and beyond will be those that master the art of building, managing, and securing not just agents, but entire workforces of specialized, collaborative skills.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;[1] &lt;a href=&quot;https://menlovc.com/perspective/2025-the-state-of-generative-ai-in-the-enterprise/&quot;&gt;Menlo Ventures. (2025, December 9). &lt;em&gt;2025: The State of Generative AI in the Enterprise&lt;/em&gt;. Menlo Ventures.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[2] &lt;a href=&quot;https://www.pwc.com/us/en/tech-effect/ai-analytics/ai-agent-survey.html&quot;&gt;PwC. (2025, May 16). &lt;em&gt;PwC’s AI Agent Survey&lt;/em&gt;. PwC.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[3] &lt;a href=&quot;https://blog.modelcontextprotocol.io/posts/2025-11-25-first-mcp-anniversary/&quot;&gt;Model Context Protocol. (2025, November 25). &lt;em&gt;One Year of MCP: November 2025 Spec Release&lt;/em&gt;. Model Context Protocol Blog.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[4] &lt;a href=&quot;https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation&quot;&gt;Anthropic. (2025, December 9). &lt;em&gt;Donating the Model Context Protocol and establishing the Agentic AI Foundation&lt;/em&gt;. Anthropic.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[5] &lt;a href=&quot;https://venturebeat.com/ai/anthropic-launches-enterprise-agent-skills-and-opens-the-standard/&quot;&gt;VentureBeat. (2025, December 18). &lt;em&gt;Anthropic launches enterprise ‘Agent Skills’ and opens the standard&lt;/em&gt;. VentureBeat.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[6] &lt;a href=&quot;https://www.anthropic.com/engineering/multi-agent-research-system&quot;&gt;Anthropic. (2025, June 13). &lt;em&gt;How we built our multi-agent research system&lt;/em&gt;. Anthropic Engineering.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[7] &lt;a href=&quot;https://www.ycombinator.com/companies/den&quot;&gt;Y Combinator. (2025). &lt;em&gt;Den: Cursor for knowledge workers&lt;/em&gt;. Y Combinator.&lt;/a&gt;&lt;/p&gt;
</description>
        <pubDate>Tue, 23 Dec 2025 00:00:00 +0000</pubDate>
        <link>https://subramanya.ai/2025/12/23/2025-the-year-agentic-ai-got-real-and-what-comes-next/</link>
        <guid isPermaLink="true">https://subramanya.ai/2025/12/23/2025-the-year-agentic-ai-got-real-and-what-comes-next/</guid>
        
        <category>Agentic AI</category>
        
        <category>Enterprise AI</category>
        
        <category>MCP</category>
        
        <category>Agent Skills</category>
        
        <category>AI Agents</category>
        
        <category>AI Infrastructure</category>
        
        <category>Multi-Agent Systems</category>
        
        <category>AI Governance</category>
        
        <category>Open Standards</category>
        
        <category>2025 Review</category>
        
        
      </item>
    
      <item>
        <title>Agent Skills: The Missing Piece of the Enterprise AI Puzzle</title>
        <description>&lt;p&gt;The enterprise AI landscape is at a critical juncture. We have powerful general-purpose models and a growing ecosystem of tools. But we are missing a crucial piece of the puzzle: a standardized, portable way to equip agents with the procedural knowledge and organizational context they need to perform real work. On December 18, 2025, Anthropic took a major step towards solving this problem by releasing &lt;strong&gt;Agent Skills&lt;/strong&gt; as an open standard [1]. This move, following the same playbook that made the Model Context Protocol (MCP) an industry-wide success, is not just another feature release—it is a fundamental shift in how we will build and manage agentic workforces.&lt;/p&gt;

&lt;h2 id=&quot;the-problem-general-intelligence-isnt-enough&quot;&gt;The Problem: General Intelligence Isn’t Enough&lt;/h2&gt;

&lt;p&gt;General-purpose agents like Claude are incredibly capable, but they lack the specialized expertise required for most enterprise tasks. As Anthropic puts it, “real work requires procedural knowledge and organizational context” [2]. An agent might know what a pull request is, but it doesn’t know your company’s specific code review process. It might understand financial concepts, but it doesn’t know your team’s quarterly reporting workflow. This gap between general intelligence and specialized execution is the primary barrier to scaling agentic AI in the enterprise.&lt;/p&gt;

&lt;p&gt;Until now, the solution has been to build fragmented, custom-designed agents for each use case. This creates a landscape of “shadow AI”—siloed, unmanageable, and impossible to govern. What we need is a way to make expertise &lt;strong&gt;composable, portable, and discoverable&lt;/strong&gt;. This is exactly what Agent Skills are designed to do.&lt;/p&gt;

&lt;h2 id=&quot;the-solution-codified-expertise-as-a-standard&quot;&gt;The Solution: Codified Expertise as a Standard&lt;/h2&gt;

&lt;p&gt;At its core, an Agent Skill is a directory containing a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SKILL.md&lt;/code&gt; file and optional subdirectories for scripts, references, and assets. It is, as Anthropic describes it, “an onboarding guide for a new hire” [2]. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SKILL.md&lt;/code&gt; file contains instructions, examples, and best practices that teach an agent how to perform a specific task. The key innovation is &lt;strong&gt;progressive disclosure&lt;/strong&gt;, a three-level system for managing context efficiently:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Metadata&lt;/strong&gt;: At startup, the agent loads only the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;name&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;description&lt;/code&gt; of each installed skill. This provides just enough information for the agent to know when a skill might be relevant, without flooding its context window.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Instructions&lt;/strong&gt;: When a skill is triggered, the agent loads the full &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SKILL.md&lt;/code&gt; body. This gives the agent the core instructions it needs to perform the task.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Resources&lt;/strong&gt;: If the task requires more detail, the agent can dynamically load additional files from the skill’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;scripts/&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;references/&lt;/code&gt;, or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;assets/&lt;/code&gt; directories. This allows skills to contain a virtually unbounded amount of context, loaded only as needed.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This architecture is both simple and profound. It allows us to package complex procedural knowledge into a standardized, shareable format. It solves the context window problem by making context dynamic and on-demand. And by making it an open standard, Anthropic is ensuring that this expertise is portable across any compliant agent platform.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;strong&gt;Component&lt;/strong&gt;&lt;/th&gt;
      &lt;th&gt;&lt;strong&gt;Purpose&lt;/strong&gt;&lt;/th&gt;
      &lt;th&gt;&lt;strong&gt;Context Usage&lt;/strong&gt;&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Metadata&lt;/strong&gt; (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;name&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;description&lt;/code&gt;)&lt;/td&gt;
      &lt;td&gt;Skill discovery&lt;/td&gt;
      &lt;td&gt;Minimal (loaded at startup)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Instructions&lt;/strong&gt; (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SKILL.md&lt;/code&gt; body)&lt;/td&gt;
      &lt;td&gt;Core task guidance&lt;/td&gt;
      &lt;td&gt;On-demand (loaded when skill is activated)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Resources&lt;/strong&gt; (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;scripts/&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;references/&lt;/code&gt;)&lt;/td&gt;
      &lt;td&gt;Detailed context and tools&lt;/td&gt;
      &lt;td&gt;On-demand (loaded as needed)&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h2 id=&quot;skills-vs-mcp-the-brain-and-the-plumbing&quot;&gt;Skills vs. MCP: The Brain and the Plumbing&lt;/h2&gt;

&lt;p&gt;It is crucial to understand how Agent Skills relate to the Model Context Protocol (MCP). They are not competing standards; they are complementary layers of the agentic stack. As Simon Willison aptly puts it, “MCP provides the ‘plumbing’ for tool access, while agent skills provide the ‘brain’ or procedural memory for how to use those tools effectively” [3].&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;MCP&lt;/strong&gt; tells an agent &lt;strong&gt;what tools are available&lt;/strong&gt;. It is the API that connects agents to databases, APIs, and other external systems.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Agent Skills&lt;/strong&gt; teach an agent &lt;strong&gt;how to use those tools&lt;/strong&gt;. They provide the procedural knowledge, best practices, and organizational context required to perform complex, multi-step tasks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, MCP might give an agent access to a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;git&lt;/code&gt; tool. An Agent Skill would teach that agent your team’s specific git branching strategy, pull request template, and code review checklist. One provides the capability; the other provides the expertise. You need both to build a truly effective agentic workforce.&lt;/p&gt;

&lt;h2 id=&quot;why-an-open-standard-matters-for-the-enterprise&quot;&gt;Why an Open Standard Matters for the Enterprise&lt;/h2&gt;

&lt;p&gt;By releasing Agent Skills as an open standard, Anthropic is making a strategic bet on interoperability and ecosystem growth. This move has several critical implications for the enterprise:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;It Prevents Vendor Lock-In&lt;/strong&gt;: An open standard for skills means that the expertise you codify is not tied to a single agent platform. You can build a library of skills for your organization and deploy them across any compliant agent, whether it’s from Anthropic, OpenAI, or an open-source provider.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;It Creates a Marketplace for Expertise&lt;/strong&gt;: We will see the emergence of a marketplace for pre-built skills, both open-source and commercial. This will allow organizations to acquire specialized capabilities without having to build them from scratch.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;It Accelerates Adoption&lt;/strong&gt;: A standardized format for skills makes it easier for developers to get started and for organizations to share best practices. This will accelerate the adoption of agentic AI and drive the development of more sophisticated, multi-agent workflows.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&quot;the-road-ahead-governance-and-the-ecosystem&quot;&gt;The Road Ahead: Governance and the Ecosystem&lt;/h2&gt;

&lt;p&gt;The Agent Skills specification is, as Simon Willison notes, “deliciously tiny” and “quite heavily under-specified” [3]. This is a feature, not a bug. It provides a flexible foundation that the community can build upon. We can expect to see the specification evolve as it is adopted by more platforms and as best practices emerge.&lt;/p&gt;

&lt;p&gt;However, the power of skills—especially their ability to execute code—also introduces new governance challenges. Organizations will need to establish clear processes for auditing, testing, and deploying skills from trusted sources. We will need &lt;strong&gt;skill registries&lt;/strong&gt; to manage the discovery and distribution of skills, and &lt;strong&gt;policy engines&lt;/strong&gt; to control which agents can use which skills in which contexts. These are the next frontiers in agentic infrastructure.&lt;/p&gt;

&lt;p&gt;Agent Skills are not just a new feature; they are a new architectural primitive for the agentic era. They provide the missing link between general intelligence and specialized execution. By making expertise composable, portable, and standardized, Agent Skills will unlock the next wave of innovation in enterprise AI. The race is no longer just about building the most powerful models; it is about building the most capable and knowledgeable agentic workforce.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;[1] &lt;a href=&quot;https://agentskills.io&quot;&gt;Anthropic. (2025, December 18). &lt;em&gt;Agent Skills&lt;/em&gt;. Agent Skills.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[2] &lt;a href=&quot;https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills&quot;&gt;Anthropic. (2025, October 16). &lt;em&gt;Equipping agents for the real world with Agent Skills&lt;/em&gt;. Anthropic Blog.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[3] &lt;a href=&quot;https://simonwillison.net/2025/Dec/19/agent-skills/&quot;&gt;Willison, S. (2025, December 19). &lt;em&gt;Agent Skills&lt;/em&gt;. Simon Willison’s Weblog.&lt;/a&gt;&lt;/p&gt;
</description>
        <pubDate>Thu, 18 Dec 2025 00:00:00 +0000</pubDate>
        <link>https://subramanya.ai/2025/12/18/agent-skills-the-missing-piece-of-the-enterprise-ai-puzzle/</link>
        <guid isPermaLink="true">https://subramanya.ai/2025/12/18/agent-skills-the-missing-piece-of-the-enterprise-ai-puzzle/</guid>
        
        <category>AI Agents</category>
        
        <category>Agent Skills</category>
        
        <category>Enterprise AI</category>
        
        <category>Anthropic</category>
        
        <category>MCP</category>
        
        <category>Agentic AI</category>
        
        <category>AI Governance</category>
        
        <category>Open Standards</category>
        
        <category>AI Infrastructure</category>
        
        <category>Agent Architecture</category>
        
        
      </item>
    
      <item>
        <title>From Boom to Build-Out: The State of Enterprise AI in 2026</title>
        <description>&lt;p&gt;The era of AI experimentation is over. What began as a speculative boom has rapidly industrialized into the fastest-scaling software category in history. According to a new report from Menlo Ventures, enterprise spending on generative AI skyrocketed to &lt;strong&gt;$37 billion&lt;/strong&gt; in 2025, a stunning &lt;strong&gt;3.2x increase&lt;/strong&gt; from the previous year [3]. This isn’t just hype; it’s a fundamental market shift. AI now commands &lt;strong&gt;6% of the entire global SaaS market&lt;/strong&gt;—a milestone reached in just three years [3].&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/enterprise_ai_growth_menlo.webp&quot; alt=&quot;Enterprise AI Growth&quot; width=&quot;768&quot; height=&quot;519&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This explosive growth signals a new phase of enterprise adoption. The conversation has moved beyond simple chatbots and one-off tasks to focus on building durable, agentic infrastructure. Reports from OpenAI, Anthropic, and Menlo Ventures all point to the same conclusion: the battleground for competitive advantage has shifted from model performance to platform execution.&lt;/p&gt;

&lt;h2 id=&quot;the-money-flows-to-applications-and-enterprises-are-buying&quot;&gt;The Money Flows to Applications, and Enterprises are Buying&lt;/h2&gt;

&lt;p&gt;So, where is this money going? Over half of all enterprise AI spend &lt;strong&gt;$19 billion&lt;/strong&gt; is flowing directly into the application layer [3]. This indicates a clear preference for immediate productivity gains over long-term, in-house infrastructure projects. The “buy vs. build” debate has decisively tilted towards buying, with &lt;strong&gt;76% of AI use cases now being purchased&lt;/strong&gt; from vendors, a dramatic reversal from 2024 when the split was nearly even [3].&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/genai_spend_by_category_menlo.webp&quot; alt=&quot;Generative AI Spend by Category&quot; width=&quot;768&quot; height=&quot;486&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This trend is fueled by two factors: AI solutions are converting at nearly double the rate of traditional SaaS (47% vs. 25%), and product-led growth (PLG) is driving adoption at 4x the rate of traditional software [3]. Individual employees and teams are adopting AI tools, proving their value, and creating a powerful bottom-up flywheel that short-circuits legacy procurement cycles.&lt;/p&gt;

&lt;h2 id=&quot;the-architectural-shift-from-queries-to-agentic-workflows&quot;&gt;The Architectural Shift: From Queries to Agentic Workflows&lt;/h2&gt;

&lt;p&gt;This rapid adoption is not just about doing old tasks faster; it’s about enabling entirely new ways of working. The data shows a clear architectural shift from simple, conversational queries to structured, agentic workflows that are deeply embedded in core business processes.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/anthropic_multistep_workflows.webp&quot; alt=&quot;Multi-Step Workflows&quot; width=&quot;1700&quot; height=&quot;2200&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Anthropic’s 2026 survey reveals that &lt;strong&gt;57% of organizations are already deploying agents for multi-stage processes&lt;/strong&gt;, with 81% planning to tackle even more complex, cross-functional workflows in the coming year [1]. This transition from single-turn interactions to persistent, multi-step agents is where true business transformation is happening.&lt;/p&gt;

&lt;p&gt;OpenAI’s 2025 report highlights a &lt;strong&gt;19x year-to-date increase&lt;/strong&gt; in the use of structured workflows like Custom GPTs and Projects, with 20% of all enterprise messages now being processed through these repeatable systems [2]. The impact is tangible, with &lt;strong&gt;80% of organizations reporting measurable ROI&lt;/strong&gt; on their agent investments and workers saving an average of 40-60 minutes per day [1, 2].&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/openai_productivity_gains.webp&quot; alt=&quot;Technical Work Expansion&quot; width=&quot;1700&quot; height=&quot;2200&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Perhaps most striking is that &lt;strong&gt;75% of workers report being able to complete tasks they previously could not perform&lt;/strong&gt;, including programming support, spreadsheet analysis, and technical tool development [2]. This democratization of technical capabilities is fundamentally reshaping how work gets done.&lt;/p&gt;

&lt;h2 id=&quot;coding-leads-the-charge&quot;&gt;Coding Leads the Charge&lt;/h2&gt;

&lt;p&gt;Nearly all organizations (90%) now use AI to assist with development, and 86% deploy agents for production code [1]. The adoption is so pervasive that coding-related messages have increased by 36% even among non-technical workers [2].&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/anthropic_coding_agents.webp&quot; alt=&quot;Coding Agents Adoption&quot; width=&quot;1700&quot; height=&quot;2200&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Organizations report time savings across the entire development lifecycle: planning and ideation (58%), code generation (59%), documentation (59%), and code review and testing (59%) [1]. This systematic integration across the full software development lifecycle is accelerating delivery timelines and freeing developers to focus on higher-value architectural and problem-solving work.&lt;/p&gt;

&lt;h2 id=&quot;the-new-frontier-platform-level-execution&quot;&gt;The New Frontier: Platform-Level Execution&lt;/h2&gt;

&lt;p&gt;As AI becomes an essential, intelligent layer of the enterprise tech stack, the primary barriers to scaling are no longer model capabilities but organizational and architectural readiness. The top challenges cited by leaders are &lt;strong&gt;integration with existing systems (46%)&lt;/strong&gt;, &lt;strong&gt;data access and quality (42%)&lt;/strong&gt;, and &lt;strong&gt;change management (39%)&lt;/strong&gt; [1]. These are not model problems; they are platform problems.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/openai_industry_growth.webp&quot; alt=&quot;Industry Growth Patterns&quot; width=&quot;1700&quot; height=&quot;2200&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This new reality is creating a widening performance gap. OpenAI’s data shows that “frontier firms” that treat AI as integrated infrastructure see &lt;strong&gt;2x more engagement per seat&lt;/strong&gt;, and their workers are &lt;strong&gt;6x more active&lt;/strong&gt; than the median [2]. Technology, healthcare, and manufacturing are seeing the fastest growth (11x, 8x, and 7x respectively), while professional services and finance operate at the largest scale [2].&lt;/p&gt;

&lt;p&gt;The state of enterprise AI in 2026 is clear: the gold rush is over, and the era of building the railroads has begun. Success is no longer defined by having the best model, but by having the best platform to deploy, manage, and secure intelligence at scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;[1] &lt;a href=&quot;https://cdn.sanity.io/files/4zrzovbb/website/cd77281ebc251e6b860543d8943ede8d06c4ef50.pdf&quot;&gt;Anthropic. (2025). &lt;em&gt;The 2026 State of AI Agents Report&lt;/em&gt;. Anthropic.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[2] &lt;a href=&quot;https://cdn.openai.com/pdf/7ef17d82-96bf-4dd1-9df2-228f7f377a29/the-state-of-enterprise-ai_2025-report.pdf&quot;&gt;OpenAI. (2025). &lt;em&gt;The state of enterprise AI 2025 report&lt;/em&gt;. OpenAI.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[3] &lt;a href=&quot;https://menlovc.com/perspective/2025-the-state-of-generative-ai-in-the-enterprise/&quot;&gt;Menlo Ventures. (2025, December 9). &lt;em&gt;2025: The State of Generative AI in the Enterprise&lt;/em&gt;. Menlo Ventures.&lt;/a&gt;&lt;/p&gt;
</description>
        <pubDate>Wed, 10 Dec 2025 00:00:00 +0000</pubDate>
        <link>https://subramanya.ai/2025/12/10/from-boom-to-build-out-the-state-of-enterprise-ai-in-2026/</link>
        <guid isPermaLink="true">https://subramanya.ai/2025/12/10/from-boom-to-build-out-the-state-of-enterprise-ai-in-2026/</guid>
        
        <category>Enterprise AI</category>
        
        <category>AI Agents</category>
        
        <category>Agentic Workflows</category>
        
        <category>AI Adoption</category>
        
        <category>Platform Strategy</category>
        
        <category>Developer Tools</category>
        
        <category>AI Infrastructure</category>
        
        <category>Generative AI</category>
        
        <category>Enterprise Software</category>
        
        
      </item>
    
      <item>
        <title>The Three-Platform Problem in Enterprise AI</title>
        <description>&lt;p&gt;Enterprise AI has a platform problem. The tools to build AI-powered applications exist, but they’re scattered across three disconnected ecosystems—each solving part of the puzzle, none providing a complete solution.&lt;/p&gt;

&lt;p&gt;This isn’t a “too many choices” problem. It’s an architectural one. Gartner tracks these ecosystems in separate Magic Quadrants because they serve fundamentally different users with different needs. But building production AI applications requires capabilities from all three.&lt;/p&gt;

&lt;h2 id=&quot;three-ecosystems-zero-integration&quot;&gt;Three Ecosystems, Zero Integration&lt;/h2&gt;

&lt;h3 id=&quot;1-low-code-platforms-the-citizen-developer&quot;&gt;1. Low-Code Platforms (The Citizen Developer)&lt;/h3&gt;

&lt;p&gt;Platforms like Microsoft Power Apps, Mendix, and OutSystems let business users build applications quickly without writing code. They excel at UI, rapid prototyping, and workflow automation.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/low-code.webp&quot; alt=&quot;Gartner Magic Quadrant for Enterprise Low-Code Application Platforms&quot; class=&quot;post-img&quot; width=&quot;900&quot; height=&quot;983&quot; /&gt;
&lt;span class=&quot;post-img-caption&quot;&gt;Gartner Magic Quadrant for Enterprise Low-Code Application Platforms&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What they do well:&lt;/strong&gt; Speed to prototype, accessibility for non-developers, business process automation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What they lack:&lt;/strong&gt; Infrastructure control, enterprise governance at scale, and the flexibility professional developers need.&lt;/p&gt;

&lt;h3 id=&quot;2-devops-platforms-the-professional-developer&quot;&gt;2. DevOps Platforms (The Professional Developer)&lt;/h3&gt;

&lt;p&gt;GitLab, Microsoft Azure DevOps, and Atlassian provide CI/CD pipelines, source control, and deployment infrastructure. They answer the “how do we ship and operate this reliably?” question.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/dev-ops.webp&quot; alt=&quot;Gartner Magic Quadrant for DevOps Platforms&quot; class=&quot;post-img&quot; width=&quot;933&quot; height=&quot;968&quot; /&gt;
&lt;span class=&quot;post-img-caption&quot;&gt;Gartner Magic Quadrant for DevOps Platforms&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What they do well:&lt;/strong&gt; Security, governance, testing, deployment automation, operational excellence.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What they lack:&lt;/strong&gt; They don’t help you build faster—they help you ship what you’ve already built.&lt;/p&gt;

&lt;h3 id=&quot;3-aiml-platforms-the-ai-specialist&quot;&gt;3. AI/ML Platforms (The AI Specialist)&lt;/h3&gt;

&lt;p&gt;Cloud providers (AWS, GCP, Azure) and specialized vendors offer models, MLOps tooling, and inference infrastructure. They provide the intelligence layer.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/ai-code-assistants.webp&quot; alt=&quot;Gartner Magic Quadrant for AI Code Assistants&quot; class=&quot;post-img&quot; width=&quot;1464&quot; height=&quot;1600&quot; /&gt;
&lt;span class=&quot;post-img-caption&quot;&gt;Gartner Magic Quadrant for AI Code Assistants&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What they do well:&lt;/strong&gt; Model access, training infrastructure, inference at scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What they lack:&lt;/strong&gt; An opinion on how you actually build and deploy applications around those models.&lt;/p&gt;

&lt;h2 id=&quot;the-cost-of-fragmentation&quot;&gt;The Cost of Fragmentation&lt;/h2&gt;

&lt;p&gt;When your AI strategy requires stitching together leaders from three separate ecosystems, you pay an integration tax:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Workflow disconnects.&lt;/strong&gt; A business user prototypes an AI workflow in a low-code tool. A developer rebuilds it from scratch to meet security requirements. The prototype and production system share nothing but a spec document.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observability gaps.&lt;/strong&gt; Tracing a user request through a low-code UI, into a DevOps pipeline, through an AI model call, and back is nearly impossible without custom instrumentation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Governance drift.&lt;/strong&gt; Security policies enforced in your DevOps platform don’t automatically apply to your low-code environment. Compliance becomes a manual audit.&lt;/p&gt;

&lt;p&gt;Your most capable engineers end up writing glue code instead of building products.&lt;/p&gt;

&lt;h2 id=&quot;a-different-architecture-api-first-unification&quot;&gt;A Different Architecture: API-First Unification&lt;/h2&gt;

&lt;p&gt;The solution isn’t better integrations—it’s platforms built on a different architecture.&lt;/p&gt;

&lt;p&gt;Replit offers a useful case study. They’ve grown from $10M to $100M ARR in under six months by building a platform where:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;The same infrastructure serves both citizen developers and professionals.&lt;/strong&gt; A business user building through natural language (“create a customer feedback dashboard”) and a developer writing code are using the same underlying APIs, the same deployment system, the same security model.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;AI is native, not bolted on.&lt;/strong&gt; Their Agent can build, test, and deploy complete applications autonomously—but it’s using the same environment a professional developer would use. No “export to production” step.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Governance applies universally.&lt;/strong&gt; Database access, API key management, and deployment policies are platform-level concerns. They apply whether you’re prompting an AI agent or writing TypeScript.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the “headless-first” pattern that companies like Stripe and Twilio proved out: build the API, make it excellent, then layer interfaces on top. The UI for non-developers and the API for developers are just different clients to the same system.&lt;/p&gt;

&lt;h2 id=&quot;what-this-means-for-platform-strategy&quot;&gt;What This Means for Platform Strategy&lt;/h2&gt;

&lt;p&gt;If you’re evaluating AI platforms, the question isn’t “which low-code tool, which DevOps platform, and which AI vendor?”&lt;/p&gt;

&lt;p&gt;The better question: &lt;strong&gt;Does this platform unify these concerns, or will we be writing integration code for the next three years?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Look for:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;API-first architecture.&lt;/strong&gt; Can professional developers access everything through APIs? Is the UI built on those same APIs?&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Built-in deployment and operations.&lt;/strong&gt; Does prototyping in the platform give you production-ready infrastructure, or does it give you an export button and a prayer?&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Platform-level governance.&lt;/strong&gt; Are security, compliance, and cost controls configured once and inherited everywhere, or are they per-tool?&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The platforms winning in this space aren’t the ones with the longest feature lists. They’re the ones that recognized the three-ecosystem problem and architected around it from day one.&lt;/p&gt;
</description>
        <pubDate>Sun, 07 Dec 2025 00:00:00 +0000</pubDate>
        <link>https://subramanya.ai/2025/12/07/the-three-platform-problem-in-enterprise-ai/</link>
        <guid isPermaLink="true">https://subramanya.ai/2025/12/07/the-three-platform-problem-in-enterprise-ai/</guid>
        
        <category>AI Platform</category>
        
        <category>Enterprise AI</category>
        
        <category>Low-Code</category>
        
        <category>DevOps</category>
        
        <category>Platform Architecture</category>
        
        <category>API-First</category>
        
        <category>Infrastructure</category>
        
        <category>Developer Tools</category>
        
        <category>Platform Strategy</category>
        
        
      </item>
    
      <item>
        <title>The Platform Convergence: Why the Future of AI SaaS is Headless-First</title>
        <description>&lt;p&gt;The AI agent market is experiencing its own big bang—but this rapid expansion is creating fundamental fragmentation. Enterprises deploying agents at scale are caught between two incomplete solutions: &lt;strong&gt;Agent Builders&lt;/strong&gt; and &lt;strong&gt;AI Gateways&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Agent Builders democratize creation through no-code interfaces. AI Gateways provide enterprise governance over costs, security, and compliance. Both are critical, but in their current separate forms, they force a false choice: &lt;strong&gt;speed or control?&lt;/strong&gt; The reality is, you need both.&lt;/p&gt;

&lt;p&gt;We’ve seen this movie before. The most successful developer platforms—Stripe, Twilio, Shopify—aren’t just slick UIs or robust infrastructure. They are &lt;strong&gt;headless-first platforms&lt;/strong&gt; that masterfully combine both.&lt;/p&gt;

&lt;h2 id=&quot;the-headless-first-model&quot;&gt;The Headless-First Model&lt;/h2&gt;

&lt;p&gt;Stripe didn’t win payments by offering a payment form. Twilio didn’t win communications by providing a dashboard. They won by providing a &lt;strong&gt;powerful, programmable foundation&lt;/strong&gt; with APIs as the primary interface. Their UIs are built on the same public APIs their customers use. Everything is composable, programmable, and extensible.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;Principle&lt;/th&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;Benefit&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;API-First Design&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Platform’s own UI uses public APIs, ensuring completeness&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Progressive Complexity&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Start with no-code UI, graduate to API without migration&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Composability&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Every capability is a building block for higher-level abstractions&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Extensibility&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Third parties build on the platform, creating ecosystem effects&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;This is the blueprint for AI platforms: not just a UI for building agents, nor just a gateway for traffic—but a comprehensive, programmable platform for building, running, and governing AI at every layer.&lt;/p&gt;

&lt;h2 id=&quot;the-two-incomplete-categories&quot;&gt;The Two Incomplete Categories&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Agent Builders&lt;/strong&gt; (Microsoft Copilot Studio, Google Agent Builder) empower non-technical users to create agents in minutes. The problem arises at scale: Who manages API keys? Who tracks costs? Who ensures compliance? This democratization often creates ungoverned “shadow IT”—business units spinning up agents independently, each with its own credentials and error handling. Platform teams discover the proliferation only when something breaks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI Gateways&lt;/strong&gt; (Kong, Apigee) solve the governance problem with centralized security, cost monitoring, and compliance. But a gateway is just plumbing—it doesn’t accelerate creation. Business users wait in IT queues while engineers build what they need. Innovation slows to a crawl.&lt;/p&gt;

&lt;p&gt;Integrating both categories creates its own &lt;strong&gt;integration tax&lt;/strong&gt;: two authentication systems, two deployment processes, broken observability across disconnected logs, and policy enforcement gaps where builder retry logic conflicts with gateway rate limits.&lt;/p&gt;

&lt;h2 id=&quot;the-platform-convergence&quot;&gt;The Platform Convergence&lt;/h2&gt;

&lt;p&gt;The solution is a unified, headless-first platform with four integrated layers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1: UI Layer&lt;/strong&gt; — Intuitive no-code agent builder for business users, built on top of the platform’s own APIs. Natural language definition, visual workflow design, one-click deployment with inherited governance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 2: Runtime Layer&lt;/strong&gt; — Enterprise-grade gateway that every agent runs through automatically. Centralized auth (OAuth, OIDC, SAML), real-time policy enforcement, distributed tracing, cost tracking, anomaly detection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 3: Platform Layer&lt;/strong&gt; — Comprehensive APIs and SDKs for developers. REST/GraphQL endpoints, language-specific SDKs, agent lifecycle management, webhook system for event-driven architectures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 4: Ecosystem Layer&lt;/strong&gt; — Marketplace for discovering and sharing agents, tools, and integrations. Internal registry, reusable components, version control, usage analytics.&lt;/p&gt;

&lt;h2 id=&quot;speed-and-control&quot;&gt;Speed AND Control&lt;/h2&gt;

&lt;p&gt;The difference between fragmented and unified approaches:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;Capability&lt;/th&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;Fragmented Tools&lt;/th&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;Unified Platform&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Agent Creation&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Separate builder&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Integrated no-code + API/SDK&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Infrastructure&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Separate gateway&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Built-in gateway with inherited policies&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Observability&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Disconnected logs&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;End-to-end unified tracing&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Policy Management&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Manual coordination&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Single policy engine&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Developer Experience&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;High friction&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Single, cohesive API surface&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Audit &amp;amp; Compliance&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Cross-system correlation&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Native audit trails&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;With a unified platform: business user creates agent in UI → platform applies policies automatically → agent deploys with full observability → platform team monitors centrally → developer extends via API without migration.&lt;/p&gt;

&lt;h2 id=&quot;what-this-unlocks&quot;&gt;What This Unlocks&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Self-Service AI:&lt;/strong&gt; HR builds a resume screening agent in 20 minutes. It inherits security policies automatically. Cost allocates to HR’s budget. Compliance trail generates without extra work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI-Powered Products:&lt;/strong&gt; Engineers embed agent capabilities into customer-facing apps using platform APIs. Multi-tenant isolation, usage-based billing, and governance come built-in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Internal Marketplace:&lt;/strong&gt; Marketing’s “competitive intelligence” agent gets discovered by Sales. One-click deployment. Usage metrics show ROI across the organization.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;The debate over agent builder vs. AI gateway is a red herring—a false choice leading to fragmented, expensive solutions. The real question: point solution or true platform?&lt;/p&gt;

&lt;p&gt;In payments, Stripe won by unifying developer APIs with merchant tools. In communications, Twilio won by combining carrier control with developer speed. The AI platform market is at the same inflection point.&lt;/p&gt;

&lt;p&gt;The future isn’t about stitching tools together; it’s about building on a unified, programmable foundation. The organizations that invest in platform-first infrastructure—rather than cobbling together point solutions—will move faster, govern more effectively, and build more sophisticated agentic systems.&lt;/p&gt;

&lt;p&gt;The convergence is coming. The question is whether you’ll be ahead of it or behind it.&lt;/p&gt;
</description>
        <pubDate>Tue, 02 Dec 2025 00:00:00 +0000</pubDate>
        <link>https://subramanya.ai/2025/12/02/the-platform-convergence-why-the-future-of-ai-saas-is-headless-first/</link>
        <guid isPermaLink="true">https://subramanya.ai/2025/12/02/the-platform-convergence-why-the-future-of-ai-saas-is-headless-first/</guid>
        
        <category>AI Platform</category>
        
        <category>Agentic AI</category>
        
        <category>Enterprise AI</category>
        
        <category>AI Gateway</category>
        
        <category>Agent Builder</category>
        
        <category>Developer Tools</category>
        
        <category>Infrastructure</category>
        
        <category>Platform Architecture</category>
        
        <category>Headless Architecture</category>
        
        <category>AI SaaS</category>
        
        
      </item>
    
      <item>
        <title>MCP Enterprise Readiness: How the 2025-11-25 Spec Closes the Production Gap</title>
        <description>&lt;p&gt;Just over a week ago, the Model Context Protocol celebrated its first anniversary with the release of the 2025-11-25 specification [1]. The announcement was rightly triumphant—MCP has evolved from an experimental open-source project to a foundational standard backed by GitHub, OpenAI, Microsoft, and Block, with thousands of active servers in production [1]. For readers comparing the protocol to Anthropic’s procedural customization layer, I cover &lt;a href=&quot;/2025/10/30/claude-skills-vs-mcp-a-tale-of-two-ai-customization-philosophies/&quot;&gt;Claude Skills vs MCP&lt;/a&gt; separately.&lt;/p&gt;

&lt;p&gt;But beneath the celebration lies a more interesting story: this spec release is not just an evolution; it’s a strategic pivot toward enterprise readiness. For the past year, MCP has succeeded as a developer tool—a convenient way to connect AI models to data and capabilities during experimentation. The 2025-11-25 spec is different. It introduces features explicitly designed to solve the operational, security, and governance challenges that prevent organizations from deploying agent-tool ecosystems at enterprise scale.&lt;/p&gt;

&lt;p&gt;This article examines three key features from the new spec and analyzes how they close what I call the &lt;strong&gt;“production gap”&lt;/strong&gt;—the distance between experimental agent prototypes and enterprise-grade agentic infrastructure.&lt;/p&gt;

&lt;h2 id=&quot;the-production-gap-why-experimental-agents-dont-scale&quot;&gt;The Production Gap: Why Experimental Agents Don’t Scale&lt;/h2&gt;

&lt;p&gt;Before diving into the technical features, we need to understand the problem they’re solving. Organizations have been experimenting with MCP-powered agents for months, often with impressive results in controlled environments. Yet most of these projects remain trapped in pilot purgatory, unable to progress to production deployments. The barriers are not technical whimsy; they are fundamental operational requirements:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;Requirement&lt;/th&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;Why It Matters&lt;/th&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;What’s Been Missing&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Asynchronous Operations&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Real-world tasks like report generation, data analysis, and workflow automation can take minutes or hours, not milliseconds.&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;MCP connections are synchronous. Long-running tasks force clients to hold connections open or build custom polling systems.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Enterprise Authentication&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Organizations need centralized control over which users, agents, and services can access sensitive tools and data.&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;The original OAuth flow assumed a consumer app model. It lacked support for machine-to-machine auth and didn’t integrate with enterprise Identity Providers.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Extensibility&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Different industries and use cases require custom capabilities without fragmenting the core protocol.&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;There was no formal mechanism to standardize extensions, leading to proprietary, incompatible implementations.&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;These aren’t edge cases; they are the table stakes for production systems. The 2025-11-25 spec directly addresses each one.&lt;/p&gt;

&lt;h2 id=&quot;feature-1-asynchronous-tasks--making-long-running-workflows-production-ready&quot;&gt;Feature 1: Asynchronous Tasks — Making Long-Running Workflows Production-Ready&lt;/h2&gt;

&lt;p&gt;Perhaps the most transformative addition is the new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Tasks&lt;/code&gt; primitive [2]. While still marked as experimental, it fundamentally changes how agents interact with MCP servers for long-running operations.&lt;/p&gt;

&lt;h3 id=&quot;the-problem-synchronous-request-response-doesnt-match-real-work&quot;&gt;The Problem: Synchronous Request-Response Doesn’t Match Real Work&lt;/h3&gt;

&lt;p&gt;Traditional MCP follows the classic RPC pattern: the client sends a request, the server processes it, and the server returns a response—all within a single connection. This works beautifully for quick operations like reading a database row or checking a weather API. But it breaks down for realistic enterprise workflows:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Data Analytics Agent:&lt;/strong&gt; “Generate a quarterly financial report by analyzing three years of transaction data” → 15 minutes of processing.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Compliance Agent:&lt;/strong&gt; “Scan all customer contracts for non-standard clauses” → 2 hours across 10,000 documents.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;DevOps Agent:&lt;/strong&gt; “Deploy this service to production and run integration tests” → 30 minutes with orchestration dependencies.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Organizations have been forced to build custom workarounds: job queues, polling systems, callback webhooks—all non-standard, all increasing complexity and reducing interoperability.&lt;/p&gt;

&lt;h3 id=&quot;the-solution-a-unified-async-model&quot;&gt;The Solution: A Unified Async Model&lt;/h3&gt;

&lt;p&gt;The new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Tasks&lt;/code&gt; feature introduces a standard “call-now, fetch-later” pattern:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;The client sends a request to an MCP server with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;task&lt;/code&gt; hint.&lt;/li&gt;
  &lt;li&gt;The server immediately acknowledges the request and returns a unique &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;taskId&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;The client periodically checks the task status (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;working&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;completed&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;failed&lt;/code&gt;) using standard Task operations.&lt;/li&gt;
  &lt;li&gt;When complete, the client retrieves the final result using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;taskId&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is more than syntactic sugar. It provides a &lt;strong&gt;uniform abstraction for asynchronous work&lt;/strong&gt; across the entire MCP ecosystem. An agent framework doesn’t need to know whether it’s calling a data pipeline, a deployment system, or a document processor—the async pattern is the same.&lt;/p&gt;

&lt;h3 id=&quot;enterprise-impact-agents-that-dont-block&quot;&gt;Enterprise Impact: Agents That Don’t Block&lt;/h3&gt;

&lt;p&gt;In production environments, this changes everything. An AI assistant orchestrating a complex workflow can:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Kick off multiple long-running tasks in parallel (e.g., “analyze sales data,” “generate customer insights,” “create visualizations”).&lt;/li&gt;
  &lt;li&gt;Continue planning and reasoning while tasks are in progress.&lt;/li&gt;
  &lt;li&gt;Provide real-time status updates to users without blocking.&lt;/li&gt;
  &lt;li&gt;Handle failures gracefully with retries and fallback strategies.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is how real autonomous agents operate. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Tasks&lt;/code&gt; primitive makes it possible within a standard, interoperable protocol.&lt;/p&gt;

&lt;h2 id=&quot;feature-2-enterprise-grade-oauth-with-cimd-and-extensions&quot;&gt;Feature 2: Enterprise-Grade OAuth with CIMD and Extensions&lt;/h2&gt;

&lt;p&gt;The original MCP spec included OAuth 2.0 support, but it was modeled on consumer app patterns (think “Log in with GitHub”). That model doesn’t work for enterprise use cases, where organizations need centralized identity management, audit trails, and policy-based access control. The 2025-11-25 spec introduces two critical updates to close this gap.&lt;/p&gt;

&lt;h3 id=&quot;cimd-decentralized-trust-without-dynamic-client-registration&quot;&gt;CIMD: Decentralized Trust Without Dynamic Client Registration&lt;/h3&gt;

&lt;p&gt;The first change is replacing &lt;strong&gt;Dynamic Client Registration (DCR)&lt;/strong&gt; with &lt;strong&gt;Client ID Metadata Documents (CIMD)&lt;/strong&gt; [3]. In the old model, every MCP client had to register with every authorization server it wanted to use—a scalability nightmare in federated enterprise environments.&lt;/p&gt;

&lt;p&gt;With CIMD, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;client_id&lt;/code&gt; is now a URL that the client controls (e.g., &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;https://agents.mycompany.com/sales-assistant&lt;/code&gt;). When an authorization server needs information about this client, it fetches a JSON metadata document from that URL. This document includes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Client name and description&lt;/li&gt;
  &lt;li&gt;Valid redirect URIs&lt;/li&gt;
  &lt;li&gt;Supported grant types&lt;/li&gt;
  &lt;li&gt;Public keys for token verification&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This approach creates a &lt;strong&gt;decentralized trust model&lt;/strong&gt; anchored in DNS and HTTPS. The authorization server doesn’t need a pre-existing relationship with the client; it trusts the metadata published at the URL. For large organizations with dozens of agent applications and multiple MCP providers, this dramatically reduces operational overhead.&lt;/p&gt;

&lt;h3 id=&quot;extension-1-machine-to-machine-oauth-sep-1046&quot;&gt;Extension 1: Machine-to-Machine OAuth (SEP-1046)&lt;/h3&gt;

&lt;p&gt;The second critical addition is support for the OAuth 2.0 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;client_credentials&lt;/code&gt; flow via the M2M OAuth extension. This enables &lt;strong&gt;machine-to-machine authentication&lt;/strong&gt;—allowing agents and services to authenticate directly with MCP servers without a human user in the loop.&lt;/p&gt;

&lt;p&gt;Why does this matter? Consider these enterprise scenarios:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Scheduled Agent Jobs:&lt;/strong&gt; A nightly data ingestion agent that pulls information from multiple MCP sources to update a data warehouse.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Service-to-Service Communication:&lt;/strong&gt; A monitoring agent that periodically checks the health of deployed systems by querying infrastructure management tools.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Headless Automation:&lt;/strong&gt; An agent that processes incoming support tickets and takes automated actions based on predefined rules.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these involve an interactive user. They are autonomous services that need persistent, secure credentials to access tools on behalf of the organization. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;client_credentials&lt;/code&gt; flow is the standard OAuth mechanism for exactly this use case, and its inclusion in MCP makes headless agentic systems viable.&lt;/p&gt;

&lt;h3 id=&quot;extension-2-cross-app-access-xaa-sep-990&quot;&gt;Extension 2: Cross App Access (XAA) (SEP-990)&lt;/h3&gt;

&lt;p&gt;Perhaps the most strategically significant feature for large enterprises is the &lt;strong&gt;Cross App Access (XAA)&lt;/strong&gt; extension. This solves a governance problem that has plagued the consumerization of enterprise AI: &lt;strong&gt;uncontrolled tool sprawl&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In the standard OAuth flow, a user grants consent directly to an AI application to access a tool. The enterprise Identity Provider (IdP) sees only that “Alice logged in to the AI app,” not that “Alice’s AI agent is now accessing the payroll system.” This creates a governance black hole.&lt;/p&gt;

&lt;p&gt;XAA changes the authorization flow to insert the enterprise IdP as a central policy enforcement point. Now, when an agent attempts to access an MCP server:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;The agent requests authorization from the enterprise IdP.&lt;/li&gt;
  &lt;li&gt;The IdP evaluates organizational policies: Is this agent approved for production use? Does Alice have permission to delegate payroll access to this agent? Is this access compliant with our data governance policies?&lt;/li&gt;
  &lt;li&gt;Only if all policies are satisfied does the IdP issue tokens to the agent.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This provides &lt;strong&gt;centralized visibility and control&lt;/strong&gt; over the entire agent-tool ecosystem. Security teams can monitor which agents are accessing which tools, set organization-wide policies (e.g., “no agents can access PII without human review”), and audit all delegated access. It eliminates shadow AI and provides the compliance story that regulated industries demand.&lt;/p&gt;

&lt;h3 id=&quot;enterprise-impact-from-shadow-ai-to-governed-infrastructure&quot;&gt;Enterprise Impact: From Shadow AI to Governed Infrastructure&lt;/h3&gt;

&lt;p&gt;Together, these OAuth enhancements transform MCP from a developer convenience into a &lt;strong&gt;governed, auditable integration layer&lt;/strong&gt;. Organizations can:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Enforce Identity Standards:&lt;/strong&gt; All agents authenticate using the corporate IdP, with the same rigor as human employees.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Enable Zero-Trust Architecture:&lt;/strong&gt; Every tool access is explicitly authorized based on policy, not implicit trust.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Provide Audit Trails:&lt;/strong&gt; Every delegation, token issuance, and access event is logged for compliance and forensic analysis.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Scale Securely:&lt;/strong&gt; Decentralized trust via CIMD means new agents and tools can be onboarded without central bottlenecks, while XAA ensures control is never lost.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;feature-3-formal-extensions-framework--enabling-innovation-without-fragmentation&quot;&gt;Feature 3: Formal Extensions Framework — Enabling Innovation Without Fragmentation&lt;/h2&gt;

&lt;p&gt;The third major addition is the introduction of a &lt;strong&gt;formal Extensions framework&lt;/strong&gt; [3]. This is a governance mechanism for the protocol itself, allowing the community to develop new capabilities without fragmenting the ecosystem.&lt;/p&gt;

&lt;h3 id=&quot;the-innovation-standardization-tension&quot;&gt;The Innovation-Standardization Tension&lt;/h3&gt;

&lt;p&gt;Every successful protocol faces this dilemma: enable innovation fast enough to keep up with evolving use cases, but standardize carefully enough to maintain interoperability. Move too slowly, and the community builds proprietary extensions that fragment the ecosystem. Move too quickly, and the core protocol becomes bloated with niche features that most implementations don’t need.&lt;/p&gt;

&lt;p&gt;MCP’s solution is a structured extension process. New capabilities are proposed as &lt;strong&gt;Specification Enhancement Proposals (SEPs)&lt;/strong&gt;, which undergo community review and can be adopted incrementally. Extensions are namespaced and clearly marked, so implementations can selectively support them without breaking compatibility.&lt;/p&gt;

&lt;h3 id=&quot;enterprise-impact-customization-without-vendor-lock-in&quot;&gt;Enterprise Impact: Customization Without Vendor Lock-In&lt;/h3&gt;

&lt;p&gt;For enterprises, this is critical. Different industries have unique requirements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Healthcare:&lt;/strong&gt; Extensions for HIPAA-compliant audit logging and patient consent management.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Financial Services:&lt;/strong&gt; Extensions for transaction integrity, regulatory reporting, and fraud detection hooks.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Manufacturing:&lt;/strong&gt; Extensions for real-time sensor data streaming and factory floor integrations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The formal extensions framework allows organizations to develop these capabilities as standard, interoperable extensions rather than proprietary forks. This preserves the core value proposition of MCP—a universal protocol for agent-tool communication—while enabling the customization required for production use.&lt;/p&gt;

&lt;h2 id=&quot;the-multiplier-effect-sampling-with-tools-sep-1577&quot;&gt;The Multiplier Effect: Sampling with Tools (SEP-1577)&lt;/h2&gt;

&lt;p&gt;One more feature deserves mention: &lt;strong&gt;Sampling with Tools&lt;/strong&gt; [3]. This allows MCP servers themselves to act as agentic systems, capable of multi-step reasoning and tool use. A server can now request the client to invoke an LLM on its behalf, enabling server-side agents.&lt;/p&gt;

&lt;p&gt;Why is this powerful? It enables &lt;strong&gt;compositional agent architectures&lt;/strong&gt;. A high-level agent can delegate to specialized MCP servers, which themselves use agentic reasoning to fulfill complex requests. For example:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;A “Financial Analysis Agent” delegates to an “ERP Data Server,” which uses its own reasoning to determine which tables to query, how to join data, and how to format results.&lt;/li&gt;
  &lt;li&gt;A “Compliance Agent” delegates to a “Legal Document Server,” which autonomously searches case law, extracts relevant clauses, and generates a summary.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This nested, hierarchical approach is how real autonomous systems will scale. By making it a standard protocol feature rather than a custom implementation, MCP provides the foundation for a rich ecosystem of specialized, composable agents.&lt;/p&gt;

&lt;h2 id=&quot;closing-the-production-gap-a-new-maturity-threshold&quot;&gt;Closing the Production Gap: A New Maturity Threshold&lt;/h2&gt;

&lt;p&gt;The 2025-11-25 MCP specification is not a radical redesign; it’s a targeted set of enhancements that directly address the barriers preventing enterprise adoption. By introducing:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Asynchronous Tasks&lt;/strong&gt; for long-running workflows,&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Enterprise OAuth with CIMD, M2M, and XAA&lt;/strong&gt; for governed, auditable authentication,&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Formal Extensions&lt;/strong&gt; for standardized innovation,&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Sampling with Tools&lt;/strong&gt; for compositional agent architectures,&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;the spec closes the production gap—the distance between experimental prototypes and scalable, secure, enterprise-grade systems.&lt;/p&gt;

&lt;p&gt;This is the moment when MCP transitions from a promising developer tool to a foundational piece of enterprise infrastructure. Organizations that have been waiting for “production readiness” signals now have them. The features are there. The governance mechanisms are there. The security model is there.&lt;/p&gt;

&lt;p&gt;The next phase of agentic AI will be defined not by flashy demos, but by the quiet, reliable, at-scale operation of autonomous systems integrated deeply into enterprise workflows. The 2025-11-25 MCP spec is the technical foundation that makes this future possible.&lt;/p&gt;

&lt;p&gt;For technology leaders evaluating whether to invest in MCP-based infrastructure, the calculus has changed. This is no longer an experimental protocol; it’s a production standard. The organizations that adopt it now, build their agent ecosystems on it, and contribute to its continued evolution will define the next decade of enterprise AI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;[1] &lt;a href=&quot;https://blog.modelcontextprotocol.io/posts/2025-11-25-first-mcp-anniversary/&quot;&gt;MCP Core Maintainers. (2025, November 25). &lt;em&gt;One Year of MCP: November 2025 Spec Release&lt;/em&gt;. Model Context Protocol.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[2] &lt;a href=&quot;https://modelcontextprotocol.io/specification/2025-11-25/basic/utilities/tasks&quot;&gt;Model Context Protocol. (2025, November 25). &lt;em&gt;Tasks&lt;/em&gt;. Model Context Protocol Specification.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[3] &lt;a href=&quot;https://workos.com/blog/mcp-2025-11-25-spec-update&quot;&gt;Pakiti, Maria. (2025, November 26). &lt;em&gt;MCP 2025-11-25 is here: async Tasks, better OAuth, extensions, and a smoother agentic future&lt;/em&gt;. WorkOS Blog.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[4] &lt;a href=&quot;https://subramanya.ai/2025/11/20/the-governance-stack-operationalizing-ai-agent-governance-at-enterprise-scale/&quot;&gt;Subramanya, N. (2025, November 20). &lt;em&gt;The Governance Stack: Operationalizing AI Agent Governance at Enterprise Scale&lt;/em&gt;. subramanya.ai.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[5] &lt;a href=&quot;https://subramanya.ai/2025/11/17/why-private-registries-are-the-future-of-enterprise-agentic-infrastructure/&quot;&gt;Subramanya, N. (2025, November 17). &lt;em&gt;Why Private Registries are the Future of Enterprise Agentic Infrastructure&lt;/em&gt;. subramanya.ai.&lt;/a&gt;&lt;/p&gt;

</description>
        <pubDate>Mon, 01 Dec 2025 00:00:00 +0000</pubDate>
        <link>https://subramanya.ai/2025/12/01/mcp-enterprise-readiness-how-the-2025-11-25-spec-closes-the-production-gap/</link>
        <guid isPermaLink="true">https://subramanya.ai/2025/12/01/mcp-enterprise-readiness-how-the-2025-11-25-spec-closes-the-production-gap/</guid>
        
        <category>MCP</category>
        
        <category>Enterprise AI</category>
        
        <category>Agentic AI</category>
        
        <category>Security</category>
        
        <category>OAuth</category>
        
        <category>Authentication</category>
        
        <category>Infrastructure</category>
        
        <category>Agent Ops</category>
        
        <category>Governance</category>
        
        <category>Enterprise Integration</category>
        
        
      </item>
    
      <item>
        <title>The Governance Stack: Operationalizing AI Agent Governance at Enterprise Scale</title>
        <description>&lt;p&gt;Enterprise adoption of AI agents has reached a tipping point. According to McKinsey’s 2025 global survey, &lt;strong&gt;88% of organizations now report regular use of AI agents&lt;/strong&gt; in at least one business function, with &lt;strong&gt;62% actively experimenting with agentic systems&lt;/strong&gt; [1]. Yet this rapid adoption has created a critical disconnect: while organizations understand the &lt;em&gt;importance&lt;/em&gt; of governance, they struggle with the &lt;em&gt;implementation&lt;/em&gt; of it. The same survey reveals that &lt;strong&gt;40% of technology executives believe their current governance programs are insufficient&lt;/strong&gt; for the scale and complexity of their agentic workforce [1, 2].&lt;/p&gt;

&lt;p&gt;The problem is not a lack of frameworks. Numerous organizations have published comprehensive governance principles—from Databricks’ AI Governance Framework to the EU AI Act’s regulatory requirements [2]. The problem is that governance has remained largely conceptual, living in policy documents and compliance checklists rather than in the operational infrastructure where agents actually execute.&lt;/p&gt;

&lt;p&gt;This article presents the technical foundation required to operationalize governance at scale: the &lt;strong&gt;Governance Stack&lt;/strong&gt;. This is the integrated set of platforms, protocols, and enforcement mechanisms that transform governance from aspiration into automated reality across the entire agentic workforce lifecycle.&lt;/p&gt;

&lt;h2 id=&quot;the-governance-gap-from-principle-to-practice&quot;&gt;The Governance Gap: From Principle to Practice&lt;/h2&gt;

&lt;p&gt;Traditional enterprise governance models were designed for static systems and predictable workflows. An application goes through a review process, gets deployed, and then operates within well-defined boundaries. Governance checkpoints are discrete events: code reviews, security scans, compliance audits.&lt;/p&gt;

&lt;p&gt;Agentic AI shatters this model. Agents are dynamic, adaptive systems that make autonomous decisions, spawn sub-agents, and interact with constantly evolving toolsets. They don’t follow predetermined paths; they reason, plan, and execute based on context. As one industry analysis puts it, the governance question shifts from “did the code do what we programmed?” to “did the agent make the right decision given the circumstances?” [3].&lt;/p&gt;

&lt;p&gt;This creates four fundamental challenges that traditional governance infrastructure cannot address:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;Challenge&lt;/th&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;Traditional Governance&lt;/th&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;Agentic Reality&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Decision-Making&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Predetermined logic paths, testable and auditable&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Context-dependent reasoning, emergent behavior&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Delegation&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Single service boundary, clear ownership&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Recursive agent chains, distributed responsibility&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Policy Enforcement&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Deployment-time checks, periodic audits&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Real-time enforcement at the moment of action&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Auditability&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Static code and logs&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Dynamic decision traces across multiple agents and tools&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The governance gap is the distance between what existing frameworks prescribe and what existing infrastructure can enforce. Closing this gap requires purpose-built technology.&lt;/p&gt;

&lt;h2 id=&quot;the-five-layers-of-the-governance-stack&quot;&gt;The Five Layers of the Governance Stack&lt;/h2&gt;

&lt;p&gt;Drawing on the foundational pillars outlined in frameworks like Databricks’ AI Governance model [2], we can define a technical architecture—a &lt;strong&gt;Governance Stack&lt;/strong&gt;—that provides the infrastructure necessary to operationalize these principles. This stack has five integrated layers, each addressing a specific aspect of agent lifecycle management.&lt;/p&gt;

&lt;h3 id=&quot;layer-1-identity-and-attestation-foundation&quot;&gt;Layer 1: Identity and Attestation Foundation&lt;/h3&gt;

&lt;p&gt;Before governance can be enforced, we must know &lt;strong&gt;who&lt;/strong&gt; (or &lt;strong&gt;what&lt;/strong&gt;) is making a request. This requires a robust identity layer specifically designed for autonomous agents, not just human users.&lt;/p&gt;

&lt;p&gt;As discussed in previous work on OIDC-A (OpenID Connect for Agents), this layer provides [4]:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Verifiable Agent Identities:&lt;/strong&gt; Every agent receives a cryptographically verifiable identity, issued by a trusted authority (the AI provider or enterprise identity system).&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Delegation Chains:&lt;/strong&gt; Clear, auditable records of which user or system authorized the agent, and what permissions were delegated.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Attestation Mechanisms:&lt;/strong&gt; Proof that the agent is running the expected code, on approved infrastructure, with the intended configuration.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This identity foundation is the prerequisite for all subsequent layers. Without it, governance policies have no subject to act upon.&lt;/p&gt;

&lt;h3 id=&quot;layer-2-agent-and-tool-registries&quot;&gt;Layer 2: Agent and Tool Registries&lt;/h3&gt;

&lt;p&gt;Governance requires visibility. The second layer of the stack is a comprehensive registry system that provides a single source of truth for:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Agent Registry:&lt;/strong&gt; A catalog of every agent deployed in the enterprise, including its capabilities, business owner, data access, and lifecycle status [5]. This is not just a static directory; it’s a dynamic system that tracks agent versions, configurations, and runtime behavior.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;MCP/Tool Registry:&lt;/strong&gt; A curated, approved set of tools and MCP servers that agents are authorized to access. This registry enforces pre-deployment security reviews, manages versions, tracks usage, and provides cost visibility [5].&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As explored in our previous article on private registries, this layer transforms governance from a manual audit process into an automated, enforceable function of the infrastructure itself [5]. Agents that aren’t registered can’t deploy. Tools that haven’t been vetted can’t be accessed.&lt;/p&gt;

&lt;h3 id=&quot;layer-3-policy-engine-and-gateway&quot;&gt;Layer 3: Policy Engine and Gateway&lt;/h3&gt;

&lt;p&gt;The third layer is where governance rules are codified and enforced in real-time. This includes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent Firewalls and MCP Gateways:&lt;/strong&gt; Acting as intermediaries between agents and their tools, these gateways inspect every request, enforce security policies, and block unauthorized actions before they occur [6]. They provide:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Prompt injection detection and filtering&lt;/li&gt;
  &lt;li&gt;Real-time policy evaluation (e.g., “can this agent access PII?”)&lt;/li&gt;
  &lt;li&gt;Dynamic rate limiting and cost controls&lt;/li&gt;
  &lt;li&gt;Anomaly detection for suspicious behavior patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Automated Policy Enforcement:&lt;/strong&gt; Instead of relying on manual reviews, the policy engine automatically validates agents against organizational standards at every lifecycle stage. For example, an agent cannot be promoted to production without:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;A completed data classification assessment&lt;/li&gt;
  &lt;li&gt;Approval from the designated business owner&lt;/li&gt;
  &lt;li&gt;A passed security scan&lt;/li&gt;
  &lt;li&gt;Documented human oversight procedures for high-stakes decisions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This layer is the operational heart of the governance stack. It is where abstract policies become concrete actions that prevent harm in real-time.&lt;/p&gt;

&lt;h3 id=&quot;layer-4-observability-and-monitoring-platform&quot;&gt;Layer 4: Observability and Monitoring Platform&lt;/h3&gt;

&lt;p&gt;Governance is not a one-time gate; it requires continuous oversight. The fourth layer provides real-time visibility into the behavior of the entire agentic workforce:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Performance Dashboards:&lt;/strong&gt; Track accuracy, decision quality, latency, and resource consumption across all agents.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Drift Detection:&lt;/strong&gt; Monitor agents for behavioral changes that might indicate model degradation, prompt injection, or unauthorized modifications.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Audit Trails:&lt;/strong&gt; Capture every agent action, tool invocation, and delegation event with sufficient context to enable forensic analysis and compliance reporting [3].&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Anomaly Alerting:&lt;/strong&gt; Trigger automated responses when agents deviate from expected patterns, such as accessing unusual data sources or making an abnormal volume of API calls.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This layer transforms governance from reactive (responding to incidents after they occur) to proactive (detecting and preventing issues before they cause harm).&lt;/p&gt;

&lt;h3 id=&quot;layer-5-human-in-the-loop-orchestration&quot;&gt;Layer 5: Human-in-the-Loop Orchestration&lt;/h3&gt;

&lt;p&gt;The final layer recognizes that not all decisions can or should be fully automated. For high-stakes scenarios, governance requires explicit human oversight:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Escalation Workflows:&lt;/strong&gt; Agents can request human approval before executing sensitive actions, such as modifying production systems or processing large financial transactions.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Override Mechanisms:&lt;/strong&gt; Authorized personnel can intervene to pause, redirect, or terminate agent operations when necessary.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Explainability Interfaces:&lt;/strong&gt; When agents make consequential decisions, stakeholders need to understand the reasoning. This layer provides tools to inspect the decision chain, view the data that influenced the agent, and audit the tool usage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not about replacing human judgment; it’s about augmenting it with the right information at the right time.&lt;/p&gt;

&lt;h2 id=&quot;operationalizing-the-framework-governance-across-the-agent-lifecycle&quot;&gt;Operationalizing the Framework: Governance Across the Agent Lifecycle&lt;/h2&gt;

&lt;p&gt;The power of the Governance Stack becomes clear when we map it to the complete agent lifecycle. Governance is not a single checkpoint; it is a continuous process embedded at every stage.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;Lifecycle Stage&lt;/th&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;Governance Stack in Action&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Planning &amp;amp; Design&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Identity layer establishes agent ownership. Policy engine validates business case against organizational risk appetite.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Data Preparation&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Registries enforce data classification and lineage tracking. Policy engine blocks access to non-compliant datasets.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Development &amp;amp; Training&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Observability platform tracks experiments and model performance. Registries version all agent configurations.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Testing &amp;amp; Validation&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Agent firewall tests for adversarial inputs and prompt injections. Policy engine validates against security and ethical standards.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Deployment&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Gateway enforces real-time authorization for all tool access. Observability platform begins continuous monitoring.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Operations&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Monitoring platform detects drift and anomalies. Human-in-the-loop mechanisms escalate high-stakes decisions.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Retirement&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Registries archive agent configurations. Identity layer revokes all permissions. Audit trails are retained for compliance.&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;This lifecycle-aware approach ensures that governance is not an afterthought, but an integrated function of how agents are built, deployed, and managed.&lt;/p&gt;

&lt;h2 id=&quot;the-roi-of-governance-infrastructure&quot;&gt;The ROI of Governance Infrastructure&lt;/h2&gt;

&lt;p&gt;Implementing a comprehensive Governance Stack is a significant investment. Organizations rightfully ask: what is the return?&lt;/p&gt;

&lt;p&gt;The answer lies in four measurable outcomes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Risk Mitigation:&lt;/strong&gt; As demonstrated by the recent AI-orchestrated cyber espionage campaign disrupted by Anthropic [6], uncontrolled agent access to powerful tools is not a theoretical threat. A governance stack with identity attestation, gateways, and real-time policy enforcement would have prevented that attack at multiple layers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Regulatory Compliance:&lt;/strong&gt; With regulations like the EU AI Act imposing strict requirements on high-risk AI systems, the ability to demonstrate comprehensive lifecycle governance, auditability, and human oversight is not optional—it’s mandatory [2]. The Governance Stack provides the automated evidence generation required for compliance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Operational Efficiency:&lt;/strong&gt; Without centralized registries and monitoring, organizations waste time debugging agent failures, tracking down tool dependencies, and investigating cost overruns. The stack provides the visibility and control to operate an agentic workforce at scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trust and Adoption:&lt;/strong&gt; The ultimate ROI is internal and external trust. Employees, customers, and regulators need confidence that autonomous agents are operating safely, ethically, and in alignment with organizational values. The Governance Stack makes that confidence possible.&lt;/p&gt;

&lt;h2 id=&quot;building-vs-buying-the-emerging-vendor-landscape&quot;&gt;Building vs. Buying: The Emerging Vendor Landscape&lt;/h2&gt;

&lt;p&gt;Organizations face a critical decision: build this governance infrastructure in-house or adopt emerging platforms that provide it as a service. Early movers are choosing different paths:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Enterprise Platforms:&lt;/strong&gt; Companies like Collibra, Databricks, and TrueFoundry are extending their data governance and MLOps platforms to include agent registries and observability tools [2, 5, 7].&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Purpose-Built Solutions:&lt;/strong&gt; Startups like Agentic Trust are building end-to-end governance platforms specifically designed for agentic AI, providing integrated registries, gateways, and policy engines [5].&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Protocol-Level Standards:&lt;/strong&gt; Open standards like OIDC-A and MCP are enabling interoperability, allowing organizations to build custom stacks from best-of-breed components [4].&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The optimal path depends on organizational maturity, existing infrastructure, and the scale of agentic deployment. However, the underlying message is universal: governance at scale requires dedicated infrastructure.&lt;/p&gt;

&lt;h2 id=&quot;conclusion-governance-as-the-enabler-of-scale&quot;&gt;Conclusion: Governance as the Enabler of Scale&lt;/h2&gt;

&lt;p&gt;The era of experimental agentic AI pilots is ending. Organizations are now operationalizing agentic workforces across critical business functions, and the governance gap is the primary barrier to scaling these deployments safely and responsibly.&lt;/p&gt;

&lt;p&gt;The Governance Stack is not a constraint on innovation; it is the foundation that makes innovation sustainable. By providing identity, visibility, policy enforcement, continuous monitoring, and human oversight, this technical infrastructure transforms governance from a compliance burden into a strategic enabler.&lt;/p&gt;

&lt;p&gt;The organizations that invest in this stack today will be the ones that confidently deploy autonomous agents at enterprise scale tomorrow. They will move faster, operate more safely, and earn the trust of stakeholders who demand accountability in the age of autonomous AI.&lt;/p&gt;

&lt;p&gt;For technology leaders navigating this landscape, the path is clear: governance is not a policy problem—it is an engineering challenge. And like all engineering challenges, it requires purpose-built infrastructure to solve. The Governance Stack is that infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;[1] &lt;a href=&quot;https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai&quot;&gt;McKinsey &amp;amp; Company. (2025, November 5). &lt;em&gt;The State of AI in 2025: A global survey&lt;/em&gt;. McKinsey.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[2] &lt;a href=&quot;https://www.databricks.com/blog/introducing-databricks-ai-governance-framework&quot;&gt;Databricks. (2025, July 1). &lt;em&gt;Introducing the Databricks AI Governance Framework&lt;/em&gt;. Databricks.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[3] &lt;a href=&quot;https://dzone.com/articles/llmops-privacy-data-governance-best-practices&quot;&gt;DZone. (2025, May 21). &lt;em&gt;Securing the Future: Best Practices for Privacy and Data Governance in LLMOps&lt;/em&gt;. DZone.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[4] &lt;a href=&quot;https://subramanya.ai/2025/04/28/oidc-a-proposal/&quot;&gt;Subramanya, N. (2025, April 28). &lt;em&gt;OpenID Connect for Agents (OIDC-A) 1.0 Proposal&lt;/em&gt;. subramanya.ai.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[5] &lt;a href=&quot;https://subramanya.ai/2025/11/17/why-private-registries-are-the-future-of-enterprise-agentic-infrastructure/&quot;&gt;Subramanya, N. (2025, November 17). &lt;em&gt;Why Private Registries are the Future of Enterprise Agentic Infrastructure&lt;/em&gt;. subramanya.ai.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[6] &lt;a href=&quot;https://subramanya.ai/2025/11/14/from-espionage-to-identity-securing-the-future-of-agentic-ai/&quot;&gt;Subramanya, N. (2025, November 14). &lt;em&gt;From Espionage to Identity: Securing the Future of Agentic AI&lt;/em&gt;. subramanya.ai.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[7] &lt;a href=&quot;https://www.truefoundry.com/blog/ai-agent-registry&quot;&gt;TrueFoundry. (2025, September 10). &lt;em&gt;What is AI Agent Registry&lt;/em&gt;. TrueFoundry.&lt;/a&gt;&lt;/p&gt;

</description>
        <pubDate>Thu, 20 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://subramanya.ai/2025/11/20/the-governance-stack-operationalizing-ai-agent-governance-at-enterprise-scale/</link>
        <guid isPermaLink="true">https://subramanya.ai/2025/11/20/the-governance-stack-operationalizing-ai-agent-governance-at-enterprise-scale/</guid>
        
        <category>AI</category>
        
        <category>Agents</category>
        
        <category>Agentic AI</category>
        
        <category>Governance</category>
        
        <category>Enterprise AI</category>
        
        <category>Agent Ops</category>
        
        <category>MCP</category>
        
        <category>Security</category>
        
        <category>Infrastructure</category>
        
        <category>Compliance</category>
        
        <category>AI Management</category>
        
        
      </item>
    
      <item>
        <title>Why Private Registries are the Future of Enterprise Agentic Infrastructure</title>
        <description>&lt;p&gt;The age of agentic AI is no longer on the horizon; it’s in our datacenters, cloud environments, and business units. A recent PwC report highlights that a staggering &lt;strong&gt;79% of companies are already adopting AI agents&lt;/strong&gt; in some capacity [1]. As these autonomous systems proliferate, executing tasks and making decisions on behalf of the enterprise, a critical governance gap has emerged. Without a robust management framework, organizations risk a chaotic landscape of “shadow AI,” creating significant security vulnerabilities, compliance nightmares, and operational inefficiencies.&lt;/p&gt;

&lt;p&gt;The solution lies in a new class of enterprise software: the &lt;strong&gt;Private Agent and MCP Registry&lt;/strong&gt;. This is not just a catalog, but a command center for agentic infrastructure, providing the visibility, governance, and security necessary to scale AI responsibly. Let’s explore the core pillars of this trend, using the “Agentic Trust” platform as a blueprint for building a better, more secure agentic future.&lt;/p&gt;

&lt;h2 id=&quot;pillar-1-a-centralized-directory-for-every-agent&quot;&gt;Pillar 1: A Centralized Directory for Every Agent&lt;/h2&gt;

&lt;p&gt;The first step to managing agentic chaos is to establish a single source of truth. You cannot govern what you cannot see. A private agent registry provides a comprehensive, real-time inventory of every agent operating within the enterprise, whether built in-house or sourced from a third-party vendor.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/Agent Directory.png&quot; alt=&quot;Agent Directory&quot; class=&quot;post-img&quot; width=&quot;3004&quot; height=&quot;2132&quot; /&gt;
&lt;span class=&quot;post-img-caption&quot;&gt;A centralized agent directory, as shown in the Agentic Trust platform, provides a complete inventory for governance and oversight.&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;As the screenshot of the Agentic Trust directory illustrates, this is more than just a list. A mature registry tracks critical metadata for each agent, including:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Unique Identity:&lt;/strong&gt; A verifiable ID for every agent, forming the foundation for authentication and authorization.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Capabilities:&lt;/strong&gt; A clear declaration of what the agent is designed to do, including the tools, resources, and prompts it can access.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Lifecycle Status:&lt;/strong&gt; Tracking whether an agent is in development, production, or retired.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Ownership and Lineage:&lt;/strong&gt; Connecting each agent to a business owner, use case, and the data it interacts with.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Activity Monitoring:&lt;/strong&gt; Recording when agents were last used and their registration dates.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This centralized view eliminates blind spots and provides the traceability required for compliance and security audits. Organizations can quickly answer critical questions: How many agents do we have? Who owns them? What are they authorized to do?&lt;/p&gt;

&lt;h2 id=&quot;pillar-2-a-curated-marketplace-for-agent-tools-mcps&quot;&gt;Pillar 2: A Curated Marketplace for Agent Tools (MCPs)&lt;/h2&gt;

&lt;p&gt;Autonomous agents are only as powerful as the tools they can access. The Model Context Protocol (MCP) has become a standard for providing agents with these tools, but an uncontrolled proliferation of MCP servers creates another layer of risk. A private registry addresses this by functioning as a curated, internal “app store” or marketplace for MCPs.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/MCP Registry.png&quot; alt=&quot;MCP Registry&quot; class=&quot;post-img&quot; width=&quot;3004&quot; height=&quot;2134&quot; /&gt;
&lt;span class=&quot;post-img-caption&quot;&gt;An MCP Registry, like this one from Agentic Trust, allows enterprises to create a governed marketplace of approved tools for their AI agents.&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;Instead of allowing agents to connect to any public MCP, the enterprise can define a catalog of approved, vetted, and secure tools. As shown in the Agentic Trust MCP Registry, this allows organizations to:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Enforce Security Standards:&lt;/strong&gt; Ensure that all available tools meet enterprise security and compliance requirements before they’re made available to agents.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Manage Versions and Dependencies:&lt;/strong&gt; Control which versions of tools are used, preventing unexpected breaking changes that could disrupt agent operations.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Control Costs:&lt;/strong&gt; Monitor the usage of paid APIs and tools, preventing runaway costs from autonomous agents making thousands of requests.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Improve Developer Productivity:&lt;/strong&gt; Provide a central place for developers to discover and reuse existing tools, accelerating agent development and reducing duplication.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Categorize and Organize:&lt;/strong&gt; Group tools by function (productivity, collaboration, payments, development, monitoring) to make discovery easier.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The registry shows connection status for each MCP server, making it immediately visible which integrations are active and which require attention. This operational visibility is critical for maintaining a healthy agentic ecosystem.&lt;/p&gt;

&lt;h2 id=&quot;pillar-3-end-to-end-governance-and-policy-enforcement&quot;&gt;Pillar 3: End-to-End Governance and Policy Enforcement&lt;/h2&gt;

&lt;p&gt;A private registry is the enforcement point for enterprise AI policy. It moves governance from a manual, after-the-fact process to an automated, built-in function of the agentic infrastructure. Drawing on best practices from platforms like Collibra and Microsoft Azure’s private registry implementations, this includes [1, 2]:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mandatory Metadata and Documentation:&lt;/strong&gt; Before an agent or MCP can be registered, developers must provide essential information such as data classification, business owner, purpose, and criticality. This ensures that every component in the agentic ecosystem is properly documented and understood.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lifecycle Policy Alignment:&lt;/strong&gt; The registry can embed automated policy checks at each stage of an agent’s lifecycle. For example, an agent cannot be promoted to production without a completed security review, ethical bias assessment, and approval from the designated business owner. This creates natural checkpoints that enforce organizational standards.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Access Control and Permissions:&lt;/strong&gt; Using Role-Based Access Control (RBAC), integrated with enterprise identity systems like Entra ID or Okta, the registry defines who can create, manage, and consume agents and their tools. Different teams might have different levels of access based on their role and the sensitivity of the agents they’re working with.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Audit Trails and Compliance:&lt;/strong&gt; Every action in the registry—agent registration, tool connection, permission changes—is logged and auditable. This creates a complete forensic trail that satisfies regulatory requirements and enables rapid incident response when issues arise.&lt;/p&gt;

&lt;h2 id=&quot;pillar-4-solving-real-enterprise-challenges&quot;&gt;Pillar 4: Solving Real Enterprise Challenges&lt;/h2&gt;

&lt;p&gt;The value of a private registry becomes clear when we examine the specific problems it solves. Consider these common enterprise scenarios:&lt;/p&gt;

&lt;h3 id=&quot;challenge-shadow-ai-and-uncontrolled-tool-adoption&quot;&gt;Challenge: Shadow AI and Uncontrolled Tool Adoption&lt;/h3&gt;

&lt;p&gt;Development teams are rapidly adopting AI tools and MCP servers without central oversight. This creates security blind spots, compliance risks, and operational fragmentation across the organization. A private registry provides centralized discovery of approved tools and usage visibility, allowing security teams to monitor what tools are being used and by whom [2].&lt;/p&gt;

&lt;h3 id=&quot;challenge-regulatory-compliance-and-data-sovereignty&quot;&gt;Challenge: Regulatory Compliance and Data Sovereignty&lt;/h3&gt;

&lt;p&gt;Organizations in regulated industries (financial services, healthcare, government) need to maintain strict control over data flows and ensure AI tools meet compliance requirements. The registry enables data classification tagging for MCP servers, geographic controls for region-specific availability, comprehensive audit trails, and pre-configured compliance templates [2].&lt;/p&gt;

&lt;h3 id=&quot;challenge-cost-control-and-resource-optimization&quot;&gt;Challenge: Cost Control and Resource Optimization&lt;/h3&gt;

&lt;p&gt;Without visibility into agent and tool usage, organizations face unpredictable costs as autonomous agents make API calls and consume resources. A private registry provides usage analytics, cost allocation by team or project, budget alerts, and the ability to deprecate underutilized or expensive tools [2].&lt;/p&gt;

&lt;h3 id=&quot;challenge-developer-productivity-and-tool-discovery&quot;&gt;Challenge: Developer Productivity and Tool Discovery&lt;/h3&gt;

&lt;p&gt;Developers waste time rebuilding integrations that already exist elsewhere in the organization or struggle to find the right tools for their agents. The registry solves this with searchable catalogs, reusable components, standardized integration patterns, and clear documentation for each available tool [3].&lt;/p&gt;

&lt;h2 id=&quot;the-architecture-that-enables-scale&quot;&gt;The Architecture That Enables Scale&lt;/h2&gt;

&lt;p&gt;Behind the user interface of platforms like Agentic Trust lies a sophisticated architecture that makes enterprise-scale agent management possible. The key components include [3, 4]:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;Component&lt;/th&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;Purpose&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Central Registry API&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Provides standardized endpoints for agent and MCP registration, discovery, and management&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Metadata Database&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Stores agent cards, capability declarations, and relationship data&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Policy Engine&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Enforces governance rules, access controls, and compliance checks&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Discovery Service&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Enables capability-based search and intelligent agent-to-tool matching&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Health Monitor&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Tracks agent and MCP server availability through heartbeats and health checks&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Integration Layer&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Connects to enterprise identity systems, monitoring tools, and DevOps pipelines&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;This architecture mirrors patterns from successful enterprise software registries, such as container registries, API management platforms, and model registries. The lesson is clear: as a technology becomes critical to enterprise operations, it requires industrial-grade management infrastructure.&lt;/p&gt;

&lt;h2 id=&quot;the-path-forward&quot;&gt;The Path Forward&lt;/h2&gt;

&lt;p&gt;The trend toward private registries for agentic infrastructure is not a passing fad; it is a necessary evolution in response to the rapid adoption of autonomous AI systems. As the Model Context Protocol ecosystem continues to grow, with the official MCP Registry serving as a public catalog [4], forward-thinking enterprises are building their own private implementations to maintain control, security, and governance.&lt;/p&gt;

&lt;p&gt;Platforms like Agentic Trust demonstrate what this future looks like: a unified command center where every agent is visible, every tool is vetted, and every action is governed by policy. This is how organizations move from the chaos of unmanaged AI to the strategic advantage of a well-orchestrated agentic ecosystem.&lt;/p&gt;

&lt;p&gt;For enterprises embarking on this journey, the message is clear: you cannot scale what you cannot see, and you cannot govern what you cannot control. A private registry is the foundation upon which responsible, secure, and effective agentic AI is built.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;[1] &lt;a href=&quot;https://www.collibra.com/blog/collibra-ai-agent-registry-governing-autonomous-ai-agents&quot;&gt;Collibra. (2025, October 6). &lt;em&gt;Collibra AI agent registry: Governing autonomous AI agents&lt;/em&gt;. Collibra.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[2] &lt;a href=&quot;https://azurewithaj.com/posts/devops-ai-series-private-mcp-registry/&quot;&gt;Bajada, AJ. (2025, August 14). &lt;em&gt;DevOps and AI Series: Azure Private MCP Registry&lt;/em&gt;. azurewithaj.com.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[3] &lt;a href=&quot;https://www.truefoundry.com/blog/ai-agent-registry&quot;&gt;TrueFoundry. (2025, September 10). &lt;em&gt;What is AI Agent Registry&lt;/em&gt;. TrueFoundry.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[4] &lt;a href=&quot;https://blog.modelcontextprotocol.io/posts/2025-09-08-mcp-registry-preview/&quot;&gt;Model Context Protocol. (2025, September 8). &lt;em&gt;Introducing the MCP Registry&lt;/em&gt;. Model Context Protocol.&lt;/a&gt;&lt;/p&gt;
</description>
        <pubDate>Mon, 17 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://subramanya.ai/2025/11/17/why-private-registries-are-the-future-of-enterprise-agentic-infrastructure/</link>
        <guid isPermaLink="true">https://subramanya.ai/2025/11/17/why-private-registries-are-the-future-of-enterprise-agentic-infrastructure/</guid>
        
        <category>AI</category>
        
        <category>Agents</category>
        
        <category>Agentic AI</category>
        
        <category>MCP</category>
        
        <category>Agent Registry</category>
        
        <category>Enterprise AI</category>
        
        <category>Governance</category>
        
        <category>Security</category>
        
        <category>Infrastructure</category>
        
        <category>Private Registry</category>
        
        <category>AI Management</category>
        
        
      </item>
    
      <item>
        <title>From Espionage to Identity: Securing the Future of Agentic AI</title>
        <description>&lt;p&gt;Anthropic has detailed its disruption of the first publicly reported cyber espionage campaign orchestrated by a sophisticated AI agent [1]. The incident, attributed to a state-sponsored group designated &lt;strong&gt;GTG-1002&lt;/strong&gt;, is more than just a security bulletin; it is a clear signal that the age of autonomous, agentic AI threats is here. It also serves as a critical case study, validating the urgent need for a new generation of identity and access management protocols specifically designed for AI.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/ai_cyberattack_lifecycle_diagram.webp&quot; alt=&quot;AI Cyberattack Lifecycle&quot; class=&quot;post-img&quot; width=&quot;1159&quot; height=&quot;862&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This post will dissect the anatomy of the attack, connect it to the foundational security challenges facing agentic AI, and explore how emerging standards like &lt;strong&gt;OpenID Connect for Agents (OIDC-A)&lt;/strong&gt; provide a necessary path forward [2, 3].&lt;/p&gt;

&lt;h2 id=&quot;anatomy-of-an-ai-orchestrated-attack&quot;&gt;Anatomy of an AI-Orchestrated Attack&lt;/h2&gt;

&lt;p&gt;Anthropic’s investigation revealed a campaign of unprecedented automation. The attackers turned Anthropic’s own &lt;strong&gt;Claude Code&lt;/strong&gt; model into an autonomous weapon, targeting approximately thirty global organizations across technology, finance, and government. The AI was not merely an assistant; it was the operator, executing &lt;strong&gt;80-90% of the tactical work&lt;/strong&gt; with human intervention only required at a few key authorization gates [1].&lt;/p&gt;

&lt;p&gt;The technical sophistication of the attack did not lie in novel malware, but in orchestration. The threat actor built a custom framework around a series of &lt;strong&gt;Model Context Protocol (MCP) servers&lt;/strong&gt;. These servers acted as a bridge, giving the AI agent access to a toolkit of standard, open-source penetration testing utilities—network scanners, password crackers, and database exploitation tools.&lt;/p&gt;

&lt;p&gt;By decomposing the attack into seemingly benign sub-tasks, the attackers tricked the AI into executing a complex intrusion campaign. The AI agent, operating with a persona of a legitimate security tester, autonomously performed reconnaissance, vulnerability analysis, and data exfiltration at a machine-speed that no human team could match.&lt;/p&gt;

&lt;h2 id=&quot;the-mcp-paradox-extensibility-vs-security&quot;&gt;The MCP Paradox: Extensibility vs. Security&lt;/h2&gt;

&lt;p&gt;The Anthropic report explicitly states that the attackers leveraged the &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt; to arm their AI agent [1]. This highlights a central paradox in agentic AI architecture: the very protocols designed for extensibility and power, like MCP, can become the most potent attack vectors.&lt;/p&gt;

&lt;p&gt;As the “Identity Management for Agentic AI” whitepaper notes, MCP is a leading framework for connecting AI to external tools, but it also presents significant security challenges [3]. When an AI can dynamically access powerful tools without robust oversight, it creates a direct and dangerous path for misuse. The GTG-1002 campaign is a textbook example of this risk realized.&lt;/p&gt;

&lt;p&gt;This forces a critical re-evaluation of how we architect agentic systems. We can no longer afford to treat the connection between an AI agent and its tools as a trusted channel. This is where the concept of an &lt;strong&gt;MCP Gateway or Proxy&lt;/strong&gt; becomes not just a good idea, but an absolute necessity.&lt;/p&gt;

&lt;h2 id=&quot;the-solution-identity-delegation-and-zero-trust-for-agents&quot;&gt;The Solution: Identity, Delegation, and Zero Trust for Agents&lt;/h2&gt;

&lt;p&gt;The security gaps exploited in the Anthropic incident are precisely what emerging standards like &lt;strong&gt;OIDC-A (OpenID Connect for Agents)&lt;/strong&gt; are designed to close [2, 3]. The core problem is one of identity and authority. The AI agent in the attack acted with borrowed, indistinct authority, effectively impersonating a legitimate user or process. True security requires a shift to a model of &lt;strong&gt;explicit, verifiable delegation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The OIDC-A proposal introduces a framework for establishing the identity of an AI agent and managing its authorization through cryptographic delegation chains. This means an agent is no longer just a proxy for a user; it is a distinct entity with its own identity, operating on behalf of a user with a clearly defined and constrained set of permissions.&lt;/p&gt;

&lt;p&gt;Here’s how this new model, enforced by an MCP Gateway, would have mitigated the Anthropic attack:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;Security Layer&lt;/th&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;Description&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Agent Identity &amp;amp; Attestation&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;The AI agent would have a verifiable identity, attested by its provider. An MCP Gateway could immediately block any requests from unattested or untrusted agents.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Tool-Level Delegation&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Instead of broad permissions, the agent would receive narrowly-scoped, delegated authority for specific tools. The OIDC-A &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delegation_chain&lt;/code&gt; ensures that the agent’s permissions are a strict subset of the delegating user’s permissions [2]. An agent designed for code analysis could never be granted access to a password cracker.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Policy Enforcement &amp;amp; Anomaly Detection&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;The MCP Gateway would act as a policy enforcement point, monitoring all tool requests. It could detect anomalous behavior, such as an agent attempting to use a tool outside its delegated scope or a sudden spike in high-risk tool usage, and automatically terminate the agent’s session.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Auditing and Forensics&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Every tool request and delegation would be cryptographically signed and logged, creating an immutable audit trail. This would provide immediate, granular visibility into the agent’s actions, dramatically accelerating incident response.&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h2 id=&quot;building-enterprise-grade-security-for-agentic-ai&quot;&gt;Building Enterprise-Grade Security for Agentic AI&lt;/h2&gt;

&lt;p&gt;The Anthropic report is a watershed moment. It proves that the threats posed by agentic AI are no longer theoretical. As the “Identity Management for Agentic AI” paper argues, we must move beyond traditional, human-centric security models and build a new foundation for AI identity [3].&lt;/p&gt;

&lt;p&gt;Today, most MCP servers being developed are experimental tools designed for individual developers and small-scale applications. They lack the enterprise-grade security controls that organizations require to deploy them in production environments. For enterprises to confidently adopt agentic AI systems built on protocols like MCP, we need to fundamentally rethink how we approach security.&lt;/p&gt;

&lt;p&gt;The path forward requires building robust delegation frameworks, implementing proper identity management for AI agents, and creating enterprise-grade security controls like gateways and policy enforcement points. We need solutions that provide:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Cryptographic delegation chains&lt;/strong&gt; that clearly define and constrain agent permissions&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Real-time policy enforcement&lt;/strong&gt; that can detect and prevent anomalous behavior&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Comprehensive audit trails&lt;/strong&gt; that enable forensic analysis and compliance&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Zero-trust architectures&lt;/strong&gt; where every agent action is verified and authorized&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We cannot afford to let the open, extensible nature of protocols like MCP become a permanent backdoor for malicious actors. The future of agentic AI depends on our ability to build security into these systems from the ground up, making enterprise adoption not just possible, but secure and responsible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;[1] &lt;a href=&quot;https://assets.anthropic.com/m/ec212e6566a0d47/original/Disrupting-the-first-reported-AI-orchestrated-cyber-espionage-campaign.pdf&quot;&gt;Anthropic. (2025, November). &lt;em&gt;Disrupting the first reported AI-orchestrated cyber espionage campaign&lt;/em&gt;. Anthropic.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[2] &lt;a href=&quot;https://subramanya.ai/2025/04/28/oidc-a-proposal/&quot;&gt;Subramanya, N. (2025, April 28). &lt;em&gt;OpenID Connect for Agents (OIDC-A) 1.0 Proposal&lt;/em&gt;. subramanya.ai.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[3] &lt;a href=&quot;https://arxiv.org/pdf/2510.25819&quot;&gt;South, T. (Ed.). (2025, October). &lt;em&gt;Identity Management for Agentic AI: The new frontier of authorization, authentication, and security for an AI agent world&lt;/em&gt;. arXiv.&lt;/a&gt;&lt;/p&gt;
</description>
        <pubDate>Fri, 14 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://subramanya.ai/2025/11/14/from-espionage-to-identity-securing-the-future-of-agentic-ai/</link>
        <guid isPermaLink="true">https://subramanya.ai/2025/11/14/from-espionage-to-identity-securing-the-future-of-agentic-ai/</guid>
        
        <category>AI</category>
        
        <category>Security</category>
        
        <category>Agentic AI</category>
        
        <category>OIDC-A</category>
        
        <category>MCP</category>
        
        <category>Anthropic</category>
        
        <category>Claude</category>
        
        <category>Cybersecurity</category>
        
        <category>AI Agents</category>
        
        <category>Identity Management</category>
        
        <category>Zero Trust</category>
        
        
      </item>
    
      <item>
        <title>Claude Skills vs. MCP: A Tale of Two AI Customization Philosophies</title>
        <description>&lt;p&gt;In the rapidly evolving landscape of artificial intelligence, the ability to customize and extend the capabilities of large language models (LLMs) has become a critical frontier. Anthropic, a leading AI research company, has introduced two powerful but distinct approaches to this challenge: &lt;strong&gt;Claude Skills&lt;/strong&gt; and the &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt;. While both aim to make AI more useful and integrated into our workflows, they operate on fundamentally different principles. This post delves into a detailed comparison of Claude Skills and MCP, explores whether they can or should be merged, and discusses the exciting future of AI customization they represent.&lt;/p&gt;

&lt;h2 id=&quot;what-are-claude-skills-the-power-of-procedural-knowledge&quot;&gt;What are Claude Skills? The Power of Procedural Knowledge&lt;/h2&gt;

&lt;p&gt;Claude Skills, also known as Agent Skills, are a revolutionary way to teach Claude how to perform specific tasks in a repeatable and customized manner. At its core, a Skill is a folder containing a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SKILL.md&lt;/code&gt; file, which includes instructions, resources, and even executable code. Think of Skills as a set of standard operating procedures for the AI. For example, a Skill could instruct Claude on how to format a weekly report, adhere to a company’s brand guidelines, or analyze data using a specific methodology.&lt;/p&gt;

&lt;p&gt;The genius of Claude Skills lies in their architecture, which is built on a principle called &lt;strong&gt;progressive disclosure&lt;/strong&gt;. This three-tiered system ensures that Claude’s context window isn’t overwhelmed with information:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Level 1: Metadata:&lt;/strong&gt; When a session starts, Claude loads only the name and description of each available Skill. This is a very lightweight process, consuming only a few tokens per Skill.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Level 2: The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SKILL.md&lt;/code&gt; file:&lt;/strong&gt; If Claude determines that a Skill is relevant to the user’s request, it then loads the full content of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SKILL.md&lt;/code&gt; file.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Level 3 and beyond: Additional resources:&lt;/strong&gt; If the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SKILL.md&lt;/code&gt; file references other documents or scripts within the Skill’s folder, Claude will load them only when needed.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This efficient, just-in-time loading mechanism allows for a vast library of Skills to be available without sacrificing performance. Skills are also portable, working across Claude.ai, Claude Code, and the API, and can even include executable code for deterministic and reliable operations.&lt;/p&gt;

&lt;h2 id=&quot;what-is-the-model-context-protocol-mcp-the-universal-connector&quot;&gt;What is the Model Context Protocol (MCP)? The Universal Connector&lt;/h2&gt;

&lt;p&gt;The Model Context Protocol (MCP) is an open-source standard designed to connect AI applications to external systems. If Claude Skills are about teaching the AI &lt;em&gt;how&lt;/em&gt; to do something, MCP is about giving it access to &lt;em&gt;what&lt;/em&gt; it needs to do it. MCP acts as a universal connector, similar to a USB-C port for AI, allowing models like Claude to interact with a wide range of data sources, tools, and workflows.&lt;/p&gt;

&lt;p&gt;MCP operates on a client-server architecture:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;MCP Host:&lt;/strong&gt; The AI application (e.g., Claude) that manages connections to various external systems.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;MCP Client:&lt;/strong&gt; A component within the host that maintains a one-to-one connection with an MCP server.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;MCP Server:&lt;/strong&gt; A program that exposes tools, resources, and prompts from an external system to the AI.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This architecture allows an AI to connect to multiple external systems simultaneously, from local files and databases to remote services like GitHub, Slack, or a company’s internal APIs. MCP is built on a two-layer architecture, with a data layer based on JSON-RPC 2.0 and a transport layer that supports both local and remote connections.&lt;/p&gt;

&lt;h2 id=&quot;the-core-difference-methodology-vs-connectivity&quot;&gt;The Core Difference: Methodology vs. Connectivity&lt;/h2&gt;

&lt;p&gt;The fundamental distinction between Claude Skills and MCP can be summarized as &lt;strong&gt;methodology versus connectivity&lt;/strong&gt;. MCP provides the AI with access to tools and data, while Skills provide the instructions on how to use them effectively. According to Anthropic’s own documentation:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“MCP connects Claude to external services and data sources. Skills provide procedural knowledge—instructions for how to complete specific tasks or workflows. You can use both together: MCP connections give Claude access to tools, while Skills teach Claude how to use those tools effectively.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This highlights that Skills and MCP are not competing technologies but are, in fact, complementary. An apt analogy is that of a master chef. MCP provides the chef with a fully stocked pantry of ingredients and a set of high-end kitchen appliances (the &lt;em&gt;what&lt;/em&gt;). Skills, on the other hand, are the chef’s personal recipe book and techniques, guiding them on &lt;em&gt;how&lt;/em&gt; to combine the ingredients and use the appliances to create a culinary masterpiece.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Feature&lt;/th&gt;
      &lt;th&gt;Claude Skills&lt;/th&gt;
      &lt;th&gt;Model Context Protocol (MCP)&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Primary Purpose&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Procedural knowledge and methodology&lt;/td&gt;
      &lt;td&gt;Connectivity to external systems&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Architecture&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Filesystem-based with progressive disclosure&lt;/td&gt;
      &lt;td&gt;Client-server with JSON-RPC 2.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Core Concept&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Teaching the AI &lt;em&gt;how&lt;/em&gt; to do something&lt;/td&gt;
      &lt;td&gt;Giving the AI access to &lt;em&gt;what&lt;/em&gt; it needs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Dependency&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Requires a code execution environment&lt;/td&gt;
      &lt;td&gt;A client and a server implementation&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Token Efficiency&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Very high due to progressive disclosure&lt;/td&gt;
      &lt;td&gt;Moderate, with tool descriptions in context&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Portability&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Across Claude interfaces&lt;/td&gt;
      &lt;td&gt;Open standard for any LLM&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h2 id=&quot;can-a-claude-skill-be-an-mcp-and-should-they-be-merged&quot;&gt;Can a Claude Skill be an MCP? And Should They Be Merged?&lt;/h2&gt;

&lt;p&gt;Given that both are Anthropic’s creations, a natural question arises: could a Claude Skill be implemented as an MCP, or should the two be merged into a single, unified system? While technically possible to create an MCP server that exposes Skills, it would be architecturally inefficient and would defeat the purpose of both systems.&lt;/p&gt;

&lt;p&gt;Exposing Skills through MCP would negate the benefits of progressive disclosure, as it would introduce the overhead of the MCP protocol for what should be a simple filesystem read. It would also create a redundant abstraction layer, as Skills already require a local code execution environment. The two systems are designed for different purposes and have different optimization goals: Skills for context efficiency within Claude, and MCP for standardized integration across different AI systems.&lt;/p&gt;

&lt;p&gt;Therefore, Claude Skills and MCP &lt;strong&gt;should be treated as independent, complementary technologies&lt;/strong&gt;. The most powerful workflows will come from using them in synergy.&lt;/p&gt;

&lt;h2 id=&quot;the-power-of-synergy-using-skills-and-mcp-together&quot;&gt;The Power of Synergy: Using Skills and MCP Together&lt;/h2&gt;

&lt;p&gt;The true potential of these technologies is unlocked when they are used in concert. Here are a few integration patterns that showcase their combined power:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Skills as MCP Orchestrators:&lt;/strong&gt; A Skill can contain a complex workflow that orchestrates calls to multiple MCP servers. For example, a “Deploy and Notify” Skill could contain a deployment checklist, notification templates, and rollback procedures. It would then use MCP to access GitHub for code, a CI/CD server for deployment, and Slack for notifications.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Skills for MCP Configuration:&lt;/strong&gt; An organization can create Skills that teach Claude its specific standards for using MCP tools. For example, a “GitHub Workflow Standards” Skill could contain instructions on branch naming conventions, pull request review checklists, and commit message templates, ensuring that Claude uses the GitHub MCP server in a way that aligns with the company’s best practices.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Hybrid Skills:&lt;/strong&gt; A Skill can contain embedded code that makes calls to an MCP server. This is useful for self-contained workflows that need to fetch external data.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;the-future-a-marketplace-for-skills-and-an-ecosystem-for-mcp&quot;&gt;The Future: A Marketplace for Skills and an Ecosystem for MCP&lt;/h2&gt;

&lt;p&gt;The future of AI customization will likely see the development of a vibrant &lt;strong&gt;Skills Marketplace&lt;/strong&gt;. Similar to the app stores for our smartphones or the extension marketplaces for our code editors, a Skills Marketplace would allow developers to publish, share, and even sell Skills. This could create a new economy around AI expertise, with a wide range of Skills available, from free, community-contributed Skills to premium, industry-specific Skill packages for domains like law, medicine, or finance.&lt;/p&gt;

&lt;p&gt;Simultaneously, the MCP ecosystem will continue to grow, with more and more tools and services exposing their functionality through MCP servers. This will create a virtuous cycle: as more tools become available through MCP, the demand for Skills that can effectively use those tools will increase.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Claude Skills and the Model Context Protocol represent two distinct but complementary philosophies of AI customization. MCP is the universal connector, providing the &lt;em&gt;what&lt;/em&gt;—the access to tools and data. Skills are the procedural knowledge, providing the &lt;em&gt;how&lt;/em&gt;—the instructions and methodology. They are not competitors but partners in the quest to create more powerful, personalized, and integrated AI assistants. The future of AI workflows will not be about choosing between Skills &lt;em&gt;or&lt;/em&gt; MCP, but about leveraging the power of Skills &lt;em&gt;and&lt;/em&gt; MCP to create intelligent systems that are truly tailored to our needs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;[1] &lt;a href=&quot;https://www.anthropic.com/news/skills&quot;&gt;Anthropic. (2025, October 16). &lt;em&gt;Claude Skills: Customize AI for your workflows&lt;/em&gt;. Anthropic.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[2] &lt;a href=&quot;https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills&quot;&gt;Anthropic. (2025, October 16). &lt;em&gt;Equipping agents for the real world with Agent Skills&lt;/em&gt;. Anthropic.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[3] &lt;a href=&quot;https://modelcontextprotocol.io/&quot;&gt;Model Context Protocol. (n.d.). &lt;em&gt;What is the Model Context Protocol (MCP)?&lt;/em&gt; Model Context Protocol.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[4] &lt;a href=&quot;https://modelcontextprotocol.io/docs/learn/architecture&quot;&gt;Model Context Protocol. (n.d.). &lt;em&gt;Architecture overview&lt;/em&gt;. Model Context Protocol.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[5] &lt;a href=&quot;https://simonwillison.net/2025/Oct/16/claude-skills/&quot;&gt;Willison, S. (2025, October 16). &lt;em&gt;Claude Skills are awesome, maybe a bigger deal than MCP&lt;/em&gt;. Simon Willison’s Weblog.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[6] &lt;a href=&quot;https://support.claude.com/en/articles/12512176-what-are-skills&quot;&gt;Claude Help Center. (n.d.). &lt;em&gt;What are Skills?&lt;/em&gt; Claude Help Center.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[7] &lt;a href=&quot;https://intuitionlabs.ai/articles/claude-skills-vs-mcp&quot;&gt;IntuitionLabs. (2025, October 27). &lt;em&gt;Claude Skills vs. MCP: A Technical Comparison for AI Workflows&lt;/em&gt;. IntuitionLabs.&lt;/a&gt;&lt;/p&gt;
</description>
        <pubDate>Thu, 30 Oct 2025 00:00:00 +0000</pubDate>
        <link>https://subramanya.ai/2025/10/30/claude-skills-vs-mcp-a-tale-of-two-ai-customization-philosophies/</link>
        <guid isPermaLink="true">https://subramanya.ai/2025/10/30/claude-skills-vs-mcp-a-tale-of-two-ai-customization-philosophies/</guid>
        
        <category>AI</category>
        
        <category>Claude</category>
        
        <category>MCP</category>
        
        <category>Claude Skills</category>
        
        <category>Agent Skills</category>
        
        <category>AI Customization</category>
        
        <category>LLM</category>
        
        <category>Anthropic</category>
        
        <category>Integration</category>
        
        <category>Workflows</category>
        
        
      </item>
    
    
      <item>
        <title>Navigating UMass Amherst: A Handbook for International Students</title>
        <description>
</description>
        <pubDate>Mon, 08 May 2023 00:00:00 +0000</pubDate>
        <link>https://subramanya.ai/books/navigating-umass-amherst-a-handbook-for-international-students/</link>
        <guid isPermaLink="true">https://subramanya.ai/books/navigating-umass-amherst-a-handbook-for-international-students/</guid>
        
        <category>Handbook</category>
        
        <category>UMass Amherst</category>
        
        <category>International Students</category>
        
      </item>
    
  </channel>
</rss>

