[
    {
      "kind": "post",
      "title": "Context Engineering: Why Prompt Engineering Was Never Enough",
      "url": "/2026/04/23/context-engineering-why-prompt-engineering-was-never-enough/",
      "date_display": "April 23, 2026",
      "date_iso": "2026-04-23",
      "excerpt": "By 2026, the real job in modern AI systems is not writing a clever prompt. It is deciding what the model sees, when it sees it, and how that context is structured, persisted, and turned into durable memory.",
      "content": "For a while, “prompt engineering” was the name we gave to the craft of getting good results from large language models. It made sense in the early days. Most people were using one-shot interactions, and the main lever really did feel like wording: ask more clearly, add an example, constrain the format, and the model behaved better.\n\nThat framing is now too small for the real problem.\n\nWhen an AI system fails in production, the issue is usually not that the model needed one more clever sentence in the system prompt. The issue is that the model did not see the right information, saw too much irrelevant information, saw the right information in the wrong format, or could not carry the right state forward from one step to the next. In other words, the problem was not just the prompt. The problem was the entire context pipeline.\n\nThat is why the term “context engineering” has caught on. The phrase entered mainstream AI discussion in mid-2025, when Tobi Lütke and Andrej Karpathy argued that “prompt engineering” undersold the real work involved in building reliable LLM systems.[1] But the underlying discipline is older than the name. If you have built RAG, tool calling, memory systems, summarization, or evaluation loops, you have already done pieces of context engineering. 
What changed is that we finally have a name that describes the whole job.\n\nA Simple Mental Model\n\nIf you want the simplest possible picture, context engineering is the layer between the outside world and the model’s working memory.\n\nflowchart TD\n    U[\"User request\"] --> CE[\"Context engine\"]\n    I[\"Instructions and policies\"] --> CE\n    R[\"Retrieved knowledge\"] --> CE\n    M[\"Memory and saved state\"] --> CE\n    T[\"Tool definitions and results\"] --> CE\n    H[\"Recent conversation history\"] --> CE\n    CE --> W[\"Model context window\"]\n    W --> L[\"LLM reasons and acts\"]\n    L --> O[\"Answer or tool call\"]\n    O --> S[\"New memory, logs, and state\"]\n    S --> CE\n\nThat is the whole game.\n\nThe model is the reasoning engine. The context engine decides what the model gets to reason over.\n\nThe Name Is New. The Job Is Not.\n\nOne reason the term resonates is that it ties together several threads that had been evolving separately.\n\nRetrieval-Augmented Generation, or RAG, taught us that models need access to external knowledge at inference time.[2] ReAct taught us that reasoning and acting work better when models can call tools, observe results, and continue from there.[3] Memory research taught us that long-running assistants need indexing, retrieval, and reading strategies rather than endless transcript accumulation.[4] Long-context evaluation showed that simply stuffing more tokens into a model is not the same thing as giving it better working memory.[5][6][7]\n\nSeen this way, context engineering is not a replacement for those ideas. It is the umbrella above them.\n\nThat umbrella matters because modern AI systems are no longer isolated prompts. They are dynamic systems that assemble instructions, documents, structured data, tool outputs, and prior state into a temporary context window for the next step. 
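The assembly step just described can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not any framework’s API: `Segment`, `count_tokens`, and `assemble_context` are hypothetical names, and counting whitespace-separated words stands in for a real tokenizer. Each source contributes labeled segments with a priority, and the assembler admits the most important segments first, dropping whatever would overflow a fixed token budget.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical container for one piece of candidate context (not a real library type).
    source: str    # e.g. 'instructions', 'retrieved', 'tool', 'history'
    text: str
    priority: int  # lower number = more important


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def assemble_context(segments: list[Segment], budget: int) -> list[str]:
    # Hypothetical assembler: walk segments from most to least important
    # and keep each one that still fits in the remaining token budget.
    window: list[str] = []
    used = 0
    for seg in sorted(segments, key=lambda s: s.priority):
        cost = count_tokens(seg.text)
        if used + cost > budget:
            continue  # this segment would overflow the window, so drop it
        used += cost
        # Label every segment with its source so the model can tell data from instructions.
        window.append(f'<{seg.source}>{seg.text}</{seg.source}>')
    return window


segments = [
    Segment('history', 'user asked about invoices last week', 3),
    Segment('instructions', 'You are a support agent. Cite evidence.', 0),
    Segment('retrieved', 'Runbook: API timeouts usually mean the gateway pool is exhausted.', 1),
    Segment('tool', 'status page: degraded latency in us-east', 2),
]

# Prints the instructions, retrieved, and tool blocks; the history turn no longer fits.
for block in assemble_context(segments, budget=24):
    print(block)
```

In this toy run the old history turn is what gets dropped. A real assembler would also weigh freshness and relevance rather than a fixed priority, but the shape of the loop is the same: choose, label, and stop at the budget.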
LangChain described this well when it defined context engineering as the work of providing the right information and tools in the right format so the LLM can plausibly complete the task.[8]\n\nThe phrase “plausibly complete the task” is doing a lot of work there. It is the right test.\n\nIf an agent fails, the first question should not be, “How do I make the prompt smarter?”\n\nThe first question should be, “Did I actually give the model what it needed to succeed?”\n\nWhy Prompt Engineering Became Too Small\n\nPrompt engineering still matters. It just became a subset of a larger discipline.\n\nThe two mental models compare like this:\n\n| Prompt engineering | Context engineering |\n| --- | --- |\n| Write better instructions | Build the full information environment |\n| Focus on a single request | Focus on multi-step systems |\n| Mostly static | Dynamic and stateful |\n| Optimize wording | Optimize selection, structure, memory, and tools |\n| Improve a single model call | Improve the whole loop |\n\nThis distinction becomes obvious the moment you build an agent.\n\nSuppose you are building a support agent for enterprise software. The user asks, “Why are our API requests timing out?”\n\nIf you think only in prompt terms, you might improve the wording:\n\n- Ask the model to be concise\n- Ask it to cite evidence\n- Ask it to think step by step\n\nThose are fine improvements. But they are not enough.\n\nThe real system questions are harder:\n\n- Does the agent have access to the incident runbooks?\n- Can it see the latest logs and status pages?\n- Does it know which customer tier this account belongs to?\n- Does it remember earlier turns in the conversation?\n- Can it query the ticket system?\n- Can it distinguish stale documents from current ones?\n- If it gets too much context, what gets trimmed?\n\nThat is context engineering.\n\nThe prompt is one line item inside it.\n\nWhat Counts as Context\n\nIn practice, context includes everything the model sees at inference time, not just the visible prompt.[8][9]\n\nThat usually means:\n\n- System instructions\n- The current user request\n- Retrieved documents\n- Structured data like JSON, tables, schemas, and records\n- Tool definitions\n- Tool outputs\n- Recent conversation history\n- Long-term memory or saved notes\n- Security, policy, and formatting constraints\n- Environment state such as files, tabs, tickets, or working directories\n\nThis is why the phrase “filling the context window” has become so central. The context window is not just a place where text goes. It is the model’s temporary working memory. Everything that enters it competes for attention.\n\nAnd competition is the key word.\n\nEvery extra token may add information, but it also adds distraction.\n\nWhy Bigger Context Windows Did Not Solve the Problem\n\nOne of the most common misconceptions in the current AI market is that larger context windows made context engineering less important.\n\nThe research points in the opposite direction.\n\nLost in the Middle showed that models often use long contexts unevenly, performing better when relevant information appears near the beginning or end and worse when important information sits in the middle.[5] Databricks’ long-context RAG study found that while adding more retrieved documents can help, only a small number of state-of-the-art models maintained strong performance above 64k tokens.[6] Chroma’s Context Rot report went even further: even simple tasks become less reliable as input length grows, especially when ambiguity and distractors are introduced.[7]\n\nThis is the part many teams learn the hard way.\n\nBigger windows do not eliminate the need to choose. They make the cost of bad choices less obvious at first and more painful later.\n\nA long prompt can fail in at least four different ways:\n\n- Context poisoning: a bad fact, hallucination, or outdated result gets carried forward.\n- Context distraction: too much relevant-but-not-critical detail overwhelms the core task.\n- Context confusion: different pieces of context contradict each other.\n- Context waste: useful tokens are buried under redundant or low-value material.\n\nThis is why context engineering is not about maximizing tokens. It is about maximizing signal density inside the context window.\n\nFrom Retrieval to Navigation\n\nThis is where one of the best recent ideas enters the picture.\n\nJason Liu argued that the next step after classic chunk-based RAG is to stop thinking only about “the most similar passages” and start thinking about the shape of the search space.[10] His framing is especially useful because it maps out a progression that many teams are already moving through:\n\n1. Minimal chunks\n2. Chunks with source metadata\n3. Better handling for multimodal and structured content\n4. Facets and query refinement\n\nThe first three are improvements in what gets retrieved.\n\nThe fourth is more interesting. It improves what the agent learns about the corpus itself.\n\nFacets give the model something like peripheral vision. Instead of returning only the top few chunks, the system can also return aggregated metadata:\n\n- Which document types dominate the result set\n- Which teams or owners appear most often\n- Which dates cluster together\n- Which categories are present but underrepresented in the top results\n\nThat matters because similarity search is biased toward what is easiest to match, not necessarily what is most important to inspect.[10] A retrieval system may over-surface well-documented resolved incidents and under-surface sparse, still-open incidents. A legal search may over-surface signed contracts and hide the unsigned ones that actually need attention. 
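Facet counts are cheap to compute once every hit carries metadata. The Python sketch below is a hypothetical illustration, not Jason Liu’s implementation or any search engine’s API: it assumes each candidate hit is a plain dict with illustrative `doc_type` and `status` fields, already sorted by similarity, and returns facet counts over the whole candidate set next to the top-k hits, plus the facet values the top-k never shows.

```python
from collections import Counter


def search_with_facets(hits, k=3, facet_fields=('doc_type', 'status')):
    # Hypothetical helper. hits: candidate results already sorted by similarity, best first.
    # The field names are illustrative assumptions, not a real search API.
    top = hits[:k]
    # Count each facet value across the whole candidate set, not just the top-k.
    facets = {field: Counter(h[field] for h in hits) for field in facet_fields}
    # Facet values that exist among the candidates but never appear in the top-k.
    missing = {
        field: sorted(set(facets[field]) - {h[field] for h in top})
        for field in facet_fields
    }
    return {'top': top, 'facets': facets, 'missing_from_top': missing}


hits = [
    {'id': 'INC-1', 'doc_type': 'incident', 'status': 'resolved'},
    {'id': 'INC-2', 'doc_type': 'incident', 'status': 'resolved'},
    {'id': 'RB-7', 'doc_type': 'runbook', 'status': 'current'},
    {'id': 'INC-9', 'doc_type': 'incident', 'status': 'open'},
]

result = search_with_facets(hits, k=3)
print(result['facets']['status'])            # Counter({'resolved': 2, 'current': 1, 'open': 1})
print(result['missing_from_top']['status'])  # ['open']
```

With only the top three hits, the still-open incident never reaches the model; the facet summary tells the agent it exists and is worth a follow-up query. A production system would more likely compute these aggregations inside the search engine than in application code.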
Facets help the agent see not just “what matched,” but “what else exists nearby.”\n\nThis is a major conceptual shift.\n\nRAG was mostly about retrieval.\n\nContext engineering is increasingly about navigation.\n\nThe Six Jobs of Context Engineering\n\nThe easiest way to make context engineering concrete is to break it into the actual jobs it performs.\n\n1. Selection\n\nThe first job is deciding what deserves to enter the window at all.\n\nThis includes retrieval, ranking, filtering, source choice, and freshness checks. It sounds obvious, but it is still where a huge amount of quality is won or lost. Benchmarks like BRIGHT show that realistic retrieval is much harder than surface-level semantic matching suggests.[11] If your retrieval quality is weak, no amount of downstream prompt polishing will fully save the result.\n\nSelection is not just “find relevant chunks.” It is:\n\n- choose the right source\n- choose the right granularity\n- choose the right amount\n- choose the right ordering\n\nGood systems often retrieve less than naive systems, but they retrieve more deliberately.\n\n2. Structure\n\nThe second job is deciding how the chosen context is represented.\n\nThe same information can be helpful or useless depending on formatting. Anthropic’s tool-use guidance is explicit about this: tool descriptions and interfaces strongly shape model behavior.[9] Its long-context prompting guidance makes similar recommendations for XML tagging, source labeling, and clearly separated document sections.[12]\n\nIn practice, structure means:\n\n- label sources\n- separate instructions from data\n- wrap complex documents in consistent markup\n- preserve tables as tables when they matter\n- return citations and metadata with evidence\n\nA short, well-labeled result often outperforms a giant JSON blob.\n\n3. Compression\n\nThe third job is reducing context without destroying what matters.\n\nThis is where a lot of agent systems either get much better or much worse.\n\nCompression can mean:\n\n- summarizing earlier turns\n- trimming stale history\n- keeping only the last few user turns verbatim\n- extracting durable facts from long threads\n- caching stable prefixes to reduce cost and latency\n\nOpenAI’s prompt caching documentation shows that prompt order matters economically as well as cognitively: static shared prefixes are cheaper and faster when placed up front, because cache hits depend on exact prefix reuse.[13] OpenAI’s newer Responses API work on compaction pushes the same idea further by treating long-running agent history as something that should be compressed into a more token-efficient representation before the window fills up.[14]\n\nCompression is not optional. The only question is whether you do it deliberately or let the context window degrade on its own.\n\n4. Memory\n\nThe fourth job is deciding what should persist beyond the current turn.\n\nThis is where many teams make the same mistake: they confuse memory with transcript retention.\n\nBut good memory is not “keep everything forever.” LongMemEval frames long-term memory as a three-stage problem: indexing, retrieval, and reading.[4] That is the right way to think about it. A memory system should help the model recover the right prior fact at the right moment, not drown it in the complete past.\n\nThis leads to a useful distinction:\n\n- Working memory: the short-term context needed for the current task\n- Reference memory: externalized facts, summaries, notes, or artifacts that can be reloaded later\n\nIf everything stays in working memory, the model gets distracted.\n\nIf everything gets pushed out, the model loses continuity.\n\nContext engineering decides what belongs in each layer.\n\n5. Tool and Interface Design\n\nThe fifth job is making tools legible to the model.\n\nThis is an underappreciated part of the discipline. A tool surface is not just software API design. 
It is also context design.\n\nThe model needs to understand:\n\n- what the tool does\n- when to use it\n- what each parameter means\n- what the output implies\n- what to do next after seeing the result\n\nThis is why tool descriptions matter so much.[9] It is also why Jason Liu’s emphasis on tool results is important.[10] The output of a tool does not merely answer the current query. It teaches the agent how to think about the next query.\n\nWhen the tool surface becomes standardized through a protocol like MCP, this becomes even more important. MCP makes it easier to connect tools, resources, and prompts to LLM applications, but it does not decide what information should be surfaced, how it should be filtered, or how much of it should be injected into the next model call.[15] The protocol is the plumbing. Context engineering is still the craft.\n\n6. Isolation and Orchestration\n\nThe sixth job is deciding when not to share context.\n\nThis is one of the biggest differences between toy demos and production agents.\n\nSometimes the right answer is not a larger shared prompt. It is multiple smaller prompts with isolated scopes.\n\nAnthropic’s multi-agent research system is a strong example.[16] Its subagents run in parallel with separate context windows, which helps them explore different branches of a problem without contaminating each other with every intermediate detail. LangChain describes a similar pattern under “isolate”: sometimes the best way to improve agent reliability is to split contexts rather than accumulate them.[17]\n\nThis matters because shared context has a hidden cost. It creates path dependence. A single bad branch can influence the next step, and the next, and the next.\n\nIsolation is a way to limit the blast radius.\n\nWhat Changed in 2026\n\nIn 2025, context engineering was mostly a useful name for a problem people already felt. In 2026, it is starting to harden into an architecture.\n\nThe first big shift is that builders are moving durable state outside the raw context window. 
Anthropic’s context editing and memory tool explicitly separate what stays live in the working window from what should persist across sessions.[18] OpenAI’s January 2026 cookbook on personalization makes the same move in a different form: structured state objects that persist across runs and are deliberately injected back into working memory at the start of each run.[19] OpenAI’s Responses API then pushes this one step further with native compaction, so long-running agent loops do not require every team to build a custom summarization subsystem from scratch.[14]\n\nAnthropic’s Managed Agents makes the underlying pattern unusually explicit: the session is not the model’s context window.[20] That is a critical 2026 idea. The window is transient working memory. The session log is the durable object. The harness decides how to slice, compact, and rehydrate that durable context back into the next model call.\n\nThe second shift is that retrieval is becoming more just-in-time and more interface-native. Instead of front-loading every possibly relevant token, teams are giving agents retrieval surfaces they already know how to operate. Mintlify’s ChromaFs is a good example: rather than booting a full sandbox for documentation retrieval, it presents docs as a virtual filesystem navigable with ls, cat, and grep, cutting p90 session creation from about 46 seconds to about 100 milliseconds.[21] Turso’s AgentFS pushes the same intuition toward general agent execution: a copy-on-write filesystem abstraction with portable single-file storage and built-in auditing.[22]\n\nThe third shift is that context graphs are becoming an implementation direction, not just a metaphor. 
Foundation Capital’s thesis made the term visible, but the stronger claim is architectural: when agents sit in the execution path, they can capture decision traces as durable artifacts, not just emit final outputs.[26][27] Open-source systems like Graphiti and commercial platforms like Zep operationalize this as temporal context graphs with validity windows, provenance episodes, and hybrid retrieval across semantics, keywords, and graph structure.[23] TrustGraph takes a related approach by treating context as a versioned artifact: graph, embeddings, evidence, and policies bundled into portable “context cores” that can be promoted or rolled back like build outputs.[24][25]\n\nThe fourth shift is that context engineering is now visible in real software practice, not just platform blogs. The 2026 MSR paper on context engineering in open-source software studied 466 repositories and found that AI context files such as AGENTS.md are spreading, but with no stable content structure yet.[28] That matters because it marks a move from theory to operational artifacts. Context is no longer just something assembled at runtime. It is being authored, versioned, reviewed, and mined as part of the software lifecycle.\n\nIf you want the 2026 mental model in one picture, it looks like this:\n\nflowchart LR\n    E[\"Session log / events\"] --> A[\"Context assembler\"]\n    F[\"Files, docs, and tools\"] --> A\n    G[\"Context graph / memory\"] --> A\n    P[\"Policies and AGENTS.md\"] --> A\n    A --> W[\"Working context window\"]\n    W --> X[\"Agent action\"]\n    X --> E\n    X --> G\n\nThat is a very different architecture from “prompt + vector search.”\n\nWhere Context Graphs Actually Fit\n\nOne reason this conversation gets muddy is that people use “context engineering” and “context graph” as if they mean the same thing. They do not.\n\nContext engineering is the broader discipline. 
It is the work of deciding what goes into the next context window, what stays out, what gets compressed, and what gets retrieved on demand.\n\nA context graph is one possible long-term memory substrate inside that larger system.\n\nThat distinction matters because not every useful agent needs a context graph. A documentation assistant over mostly static content may need good retrieval, tool design, and compaction, but not a graph. A coding agent may get surprisingly far with repository instructions, a durable session log, and a filesystem abstraction.[20][21][22][28]\n\nContext graphs become compelling when the problem has four characteristics:\n\n- Temporal truth matters. You need to know not just what is true now, but what was true at decision time.[23]\n- Provenance matters. You need to trace facts back to the episode, document, or interaction that produced them.[23][24]\n- Precedent matters. The task depends on how similar cases were handled before, including exceptions and approvals.[26][27]\n- Cross-entity reasoning matters. The useful memory is not a flat note, but a network of people, policies, incidents, accounts, tickets, and outcomes.[23][25]\n\nThis is why the best definition of a context graph, in my view, is not “a graph database for AI.” It is a durable representation of precedent.\n\nThat is also why decision traces matter so much. Foundation Capital’s framing is useful here: rules tell the agent what should happen in general; decision traces tell it what happened in a specific case, under real constraints, with real exceptions.[26] Once those traces are linked across entities and time, you get something much more valuable than generic memory. You get searchable judgment.\n\nHow I Would Build It in 2026\n\nIf I were building a serious context-engineering stack today, I would not start with the graph. I would start with the interfaces and promotion rules.\n\n1. Build a durable session layer first\n\nEvery action, tool result, observation, and important intermediate artifact should land in an append-only session log or event store. This is your recoverable context object.[14][20]\n\nDo not confuse the active context window with the source of truth.\n\nThe window is for reasoning.\n\nThe session is for recovery, replay, debugging, and selective rehydration.\n\n2. Treat the context assembler as a product surface\n\nThe assembler should explicitly manage:\n\n- token budgets\n- source priority\n- freshness\n- compaction thresholds\n- history trimming\n- citation formatting\n- cache-aware ordering\n\nThis is the layer that decides what the model sees now. It should be observable, testable, and cheap to change.[14][18][19]\n\n3. Prefer just-in-time retrieval over eager stuffing\n\nGive the model lightweight handles first: file paths, object IDs, URLs, query templates, ticket IDs, incident IDs. Then let it pull detail only when needed.[9][18][21]\n\nThis is where filesystems, MCP tools, search APIs, and structured queries become more valuable than giant top-K dumps.\n\n4. Promote only high-value state into long-term memory\n\nNot everything should become memory.\n\nI would promote four classes of artifacts:\n\n- stable user or account preferences\n- durable facts with provenance\n- important intermediate summaries\n- decision traces and exceptions\n\nEverything else should stay in the session log until it proves it deserves promotion.\n\n5. Build the context graph as a promoted memory layer\n\nThis is the part many teams invert.\n\nThe graph should not be your raw transcript in graph form. It should be the curated layer that sits above sessions and below real-time assembly:\n\n- entities\n- relationships\n- time validity\n- source episodes\n- approvals\n- exceptions\n- outcomes\n\nIf you skip the promotion step, the graph becomes a dumping ground.\n\nIf you get promotion right, the graph becomes the memory of how the organization actually reasons.[23][26]\n\n6. Package context like code\n\nBy 2026, one of the most promising ideas is to treat context as a versioned artifact. In software projects this shows up as AGENTS.md and other repository-specific context files.[28] In graph-native systems it shows up as context cores: portable bundles of ontology, graph structure, embeddings, provenance, and retrieval policy.[24][25]\n\nThis matters because context changes need the same operational discipline as code changes:\n\n- review\n- versioning\n- rollback\n- environment promotion\n- evaluation\n\nOnce context becomes an artifact, it becomes governable.\n\n7. Separate observability from intelligence\n\nYou need both:\n\n- observability of the agent run\n- observability of the context system\n\nThose are not the same thing.\n\nI want to know:\n\n- what the model saw\n- what it did not see\n- what got compacted\n- what was retrieved just in time\n- what got promoted into memory\n- what graph neighborhood was traversed\n- which precedent actually influenced the action\n\nIf you cannot answer those questions, you are still debugging prompts in the dark.\n\nA Practical Maturity Model\n\nIf you are trying to evaluate where your own system stands, this maturity model is more useful than abstract definitions.\n\nLevel 0: Prompt-Only\n\nYou have a system prompt, a user message, and maybe a couple of examples.\n\nThis can work surprisingly well for narrow tasks. It breaks quickly when the task requires fresh knowledge, persistence, or tools.\n\nLevel 1: Retrieval-Enhanced\n\nYou add documents at runtime.\n\nThis is where many teams stop. It is also where many teams start seeing the limitations of naive chunking, ranking, and context bloat.\n\nLevel 2: Agent-Aware\n\nYou now manage history, tool results, memory, and formatting intentionally.\n\nThis is the first level where “context engineering” becomes a useful term, because the system is no longer just prompt plus retrieval. 
It is assembling multiple forms of context dynamically.\n\nLevel 3: Adaptive\n\nThe system changes how it builds context based on the task.\n\nIt may:\n\n- choose among sources\n- compress older history\n- reload memory selectively\n- route work to specialized tools\n- isolate subproblems into separate contexts\n\nAt this point, context construction is part of the application’s core logic.\n\nLevel 4: Context-Native\n\nThe system treats context as a first-class engineering surface.\n\nIt has:\n\n- explicit context budgets\n- retrieval and generation evals\n- metadata and facet-aware navigation\n- memory policies\n- observability around failure modes\n- cost-aware prompt assembly\n\nThis is where the strongest production systems are heading.\n\nWhat Good Context Engineering Looks Like in Practice\n\nIf I had to reduce the whole discipline to a checklist, it would look like this:\n\n- Start with the task, not the prompt. Define what success looks like first.\n- Enumerate the context sources the model might need: instructions, docs, tools, memory, state, policies.\n- Separate working memory from reference memory. Not everything should live in the active window.\n- Retrieve with intent. More chunks do not automatically mean better answers.\n- Structure context so the model can parse it quickly. Labels, sources, tables, and boundaries matter.\n- Design tools as if they are part of the prompt, because they are.\n- Trim aggressively. If you would not ask a human to reread it, do not force the model to reread it.\n- Measure retrieval and generation separately. Otherwise you will diagnose the wrong problem.\n- Use isolated contexts when tasks branch or can run in parallel.\n- Promote durable facts and decision traces intentionally. Not every transcript belongs in long-term memory.\n- Package critical context like code. Instructions, policies, and graph artifacts should be versioned.\n- Treat context bugs like software bugs. They should be observable, reproducible, and fixable.\n\nNone of this is glamorous. 
That is exactly why it matters.\n\nPrompt engineering became popular because it sounded like a shortcut.\n\nContext engineering matters because it describes the actual work.\n\nThe Real Takeaway\n\nThe center of gravity in AI is moving.\n\nThe frontier question used to be: How smart is the model?\n\nThe applied question is increasingly: What does the model get to see before it has to act?\n\nThat is a different engineering problem. It is less about single prompts and more about systems design. Less about phrasing and more about information flow. Less about one-shot output quality and more about whether an agent can stay reliable over time.\n\nThis is why context engineering is going to keep growing as a discipline. The better models get, the more the remaining failures look like context failures. Missing state. Wrong tool. Bad retrieval. Bloated history. Poor formatting. Conflicting evidence. Weak memory. Unbounded loops.\n\nThe irony is that this makes AI systems feel more like classical software, not less. We are back to building pipelines, interfaces, state machines, memory hierarchies, caches, and observability layers. The novelty is that all of those pieces now exist in service of a probabilistic reasoning engine.\n\nThe name may be new. The direction is not.\n\nReliable AI systems will be built by teams that treat context as a first-class product surface.\n\nEveryone else will keep calling the model flaky.\n\nReferences:\n\n[1] Willison, S. (2025, June 27). Context engineering.\n[2] Lewis, P. et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.\n[3] Yao, S. et al. (2023). ReAct: Synergizing Reasoning and Acting in Language Models.\n[4] Wu, D. et al. (2025). LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory.\n[5] Liu, N. F. et al. (2023). Lost in the Middle: How Language Models Use Long Contexts.\n[6] Leng, Q. et al. (2024). Long Context RAG Performance of Large Language Models.\n[7] Hong, K., Troynikov, A., and Huber, J. (2025, July 14). Context Rot: How Increasing Input Tokens Impacts LLM Performance.\n[8] LangChain. (2025, June 23). The rise of “context engineering”.\n[9] Anthropic. How to implement tool use.\n[10] Liu, J. (2025, August 27). Beyond Chunks: Why Context Engineering is the Future of RAG.\n[11] Su, H. et al. (2025). BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval.\n[12] Anthropic. Long context prompting tips.\n[13] OpenAI. Prompt caching.\n[14] OpenAI. (2026, March 19). From model to agent: Equipping the Responses API with a computer environment.\n[15] Model Context Protocol. What is the Model Context Protocol (MCP)?\n[16] Anthropic. (2025, June 13). How we built our multi-agent research system.\n[17] LangChain. (2025, July 2). Context Engineering.\n[18] Anthropic. (2025, September 29). Managing context on the Claude Developer Platform.\n[19] Okcular, E. (2026, January 5). Context Engineering for Personalization - State Management with Long-Term Memory Notes using OpenAI Agents SDK.\n[20] Anthropic. Scaling Managed Agents: Decoupling the brain from the hands.\n[21] Mintlify. (2026, March 24). How we built a virtual filesystem for our Assistant.\n[22] Turso. AgentFS.\n[23] Zep. Graphiti: Build Real-Time Knowledge Graphs for AI Agents.\n[24] TrustGraph. The context development platform.\n[25] TrustGraph. Working with Context Cores.\n[26] Gupta, J., and Garg, A. (2025, December 22). AI’s trillion-dollar opportunity: Context graphs.\n[27] Garg, A. (2026, January 16). Why context graphs are the missing layer for AI.\n[28] Mohsenimofidi, S., Galster, M., Treude, C., and Baltes, S. (2026). Context Engineering for AI Agents in Open-Source Software.",
      "views": 173,
      "reading_minutes": 23,
      "tags": [
        
          
          {
            "name": "Context Engineering",
            "slug": "context-engineering",
            "url": "/tags/context-engineering/#posts"
          },
        
          
          {
            "name": "Context Graphs",
            "slug": "context-graphs",
            "url": "/tags/context-graphs/#posts"
          },
        
          
          {
            "name": "AI Agents",
            "slug": "ai-agents",
            "url": "/tags/ai-agents/#posts"
          },
        
          
          {
            "name": "RAG",
            "slug": "rag",
            "url": "/tags/rag/#posts"
          },
        
          
          {
            "name": "Prompt Engineering",
            "slug": "prompt-engineering",
            "url": "/tags/prompt-engineering/#posts"
          },
        
          
          {
            "name": "MCP",
            "slug": "mcp",
            "url": "/tags/mcp/#posts"
          },
        
          
          {
            "name": "Enterprise AI",
            "slug": "enterprise-ai",
            "url": "/tags/enterprise-ai/#posts"
          },
        
          
          {
            "name": "Agent Architecture",
            "slug": "agent-architecture",
            "url": "/tags/agent-architecture/#posts"
          },
        
          
          {
            "name": "Agent Memory",
            "slug": "agent-memory",
            "url": "/tags/agent-memory/#posts"
          }
        
      ]
    },
    {
      "kind": "post",
      "title": "The Filesystem Is the Database: Why Agents Need a New Storage Primitive",
      "url": "/2026/04/13/the-filesystem-is-the-database-why-agents-need-a-new-storage-primitive/",
      "date_display": "April 13, 2026",
      "date_iso": "2026-04-13",
      "excerpt": "RAG pipelines gave agents memory. But the next wave of agentic infrastructure is converging on a different primitive entirely: the virtual filesystem. From Mintlify's ChromaFs to Turso's AgentFS to Box's enterprise VFS layer, the pattern is unmistakable. The filesystem is becoming the universal interface for agent cognition, and the database is quietly becoming its substrate.",
      "content": "Something interesting is happening in the agentic infrastructure space, and it is not what most people expected. For the past two years, the dominant paradigm for giving agents access to knowledge has been Retrieval-Augmented Generation: embed your documents, store them in a vector database, and let the model query them at inference time. RAG worked. It was good enough. But “good enough” has a shelf life, and in 2026, that shelf life is expiring. A new pattern is emerging across the industry, and it is converging from multiple directions at once. Mintlify replaced its entire RAG pipeline with a virtual filesystem and saw session creation drop from 46 seconds to 100 milliseconds [1]. Turso built AgentFS, a SQLite-backed filesystem that gives every agent its own copy-on-write sandbox [2]. Box, the enterprise content giant, announced that it is repositioning its entire platform as a virtual filesystem layer for AI agents [3]. And ByteDance open-sourced OpenViking, a context database that organizes all agent memory, resources, and skills as a hierarchical filesystem [4]. These are not niche experiments. They are signals of a fundamental shift. The filesystem is becoming the universal interface for agent cognition, and the database is quietly becoming its substrate. Why RAG Hit a Wall. RAG was the right answer for 2023. You had a pile of documents, a model with a limited context window, and you needed a way to surface relevant chunks at query time. Vector embeddings and similarity search solved that problem elegantly. But agents are not chatbots. An agent does not ask one question and leave. It explores. It reads a file, discovers a reference, follows it, reads another file, runs a command, writes an output. This is not a retrieval problem. It is a navigation problem. RAG pipelines struggle with this for three reasons. First, they are stateless by design. 
Every query is independent; there is no concept of “I was just looking at this directory, now show me the adjacent file.” Second, they flatten structure. A documentation site with a clear hierarchy of sections, pages, and code examples gets shredded into anonymous 512-token chunks that lose their organizational context. Third, they are expensive at scale. Embedding computation, vector index maintenance, and re-ranking all add latency and cost that compound as the corpus grows. The filesystem solves all three. It is inherently stateful (the agent has a working directory). It preserves structure (directories, subdirectories, files). And it is fast because the operations are simple: ls, cat, grep, find. These are not novel abstractions. They are the most battle-tested interface in computing. The Convergence: Four Approaches, One Pattern. What makes this moment significant is that the filesystem pattern is emerging independently across very different contexts. Mintlify’s ChromaFs is perhaps the most instructive example. Mintlify powers documentation assistants for thousands of companies. Their original architecture was textbook RAG: chunk the docs, embed them, retrieve at query time. When they replaced it with ChromaFs, a virtual filesystem that intercepts UNIX commands and translates them into Chroma database queries, the results were dramatic. Session creation went from 46 seconds to 100 milliseconds, a 460x improvement. Marginal cost per conversation dropped from $0.0137 to effectively zero [1]. The key insight: the agent already knows how to navigate a filesystem. Teaching it to use cat /auth/oauth.mdx is trivial compared to teaching it to formulate the right vector query. Turso’s AgentFS attacks a different problem: agent isolation and auditability. Every agent gets its own SQLite-backed filesystem with copy-on-write semantics. The host filesystem is a read-only base layer; the agent writes to a SQLite delta layer. 
Every file operation, tool call, and state change is recorded. The entire agent runtime (files, state, and history) fits in a single portable SQLite file [2]. This is not just a filesystem. It is an auditable, reproducible execution environment. Box’s enterprise VFS is the most strategically significant. Box CEO Aaron Levie has been explicit: agents need a filesystem to do knowledge work in the enterprise [3]. But Box is not pitching a literal filesystem. It is pitching a “dynamic data delivery contract” that can be backed by object storage, relational databases, or its own content platform. The filesystem is the interface; the backing store is whatever makes sense for the data. What makes Box’s play interesting is the governance layer: permissions, audit trails, and compliance boundaries that carry over automatically from the content platform to the agent. ByteDance’s OpenViking takes the pattern furthest. It organizes all agent context (memories, resources, skills, and knowledge) under a viking:// protocol using standard filesystem semantics. Agents navigate with ls and find. But the clever part is the tiered access model: every piece of context is processed into three layers. L0 is a one-sentence summary for quick retrieval. L1 is an overview with core information for planning. L2 is the full content for deep reading [4]. The agent starts with L0, drills into L1 when it needs more, and only loads L2 when it is doing detailed work. On the LoCoMo benchmark, this reduced token consumption from 24.6 million to 4.2 million while increasing task completion rates to 52% [4]. Filesystem as Interface, Database as Substrate. The pattern that connects all four is what I would call the VFS duality: the filesystem wins as the interface, and the database wins as the substrate. This is not an either-or choice. It is a layered architecture. Why the filesystem wins as the interface is straightforward. 
LLMs are trained on internet-scale data, and much of that data was written by developers who think in terms of files, directories, paths, and command-line tools. Models are unusually competent with these primitives because they have seen billions of examples of developers navigating codebases, reading files, and running shell commands. When you give an agent a filesystem, you are meeting it where its training data lives. Why the database wins as the substrate is equally clear. The moment agent memory needs to be shared, audited, queried by multiple agents, or made reliable under concurrency, you need database guarantees. ACID transactions, access control, semantic search, version history: these are hard problems that databases have spent decades solving. Reimplementing them on top of a literal filesystem is a path to pain. The VFS pattern gives you both. The agent sees files and directories. The system sees tables, indexes, and access control lists. ChromaFs stores everything in Chroma but exposes it as files. AgentFS stores everything in SQLite but exposes it as a POSIX filesystem. OpenViking uses its own storage engine but exposes it as viking:// paths. Box uses its enterprise content platform but exposes it as a navigable tree. But Can a VFS Actually Beat the Native Filesystem? The natural objection to all of this is: why not just use the real filesystem? POSIX is right there. Every Unix-like operating system ships with it. Why add an abstraction layer? I wanted to answer this question empirically, so I built markdownfs, a from-scratch virtual filesystem in Rust designed specifically for agent workloads [6]. 
It supports a core set of UNIX-like commands (ls, cat, grep, find, chmod, chown), Git-style versioning with content-addressable storage, and multi-user permissions, and it exposes three access methods: a CLI/REPL, an HTTP/REST API, and an MCP server that clients such as Claude and Cursor can connect to directly. The architecture is simple: an in-memory inode table backed by a content-addressable blob store using SHA-256 hashing, with tokio::sync::RwLock for safe concurrent access. Files are deduplicated automatically. Version control uses the same commit/revert model as Git, but at the filesystem level. Persistence is handled through atomic bincode snapshots. When I benchmarked markdownfs against the native filesystem across the standard agent operations (file creation, reads, writes, directory listing, grep, find, move, copy, deletion), markdownfs was roughly 130x faster on average. The reasons are structural, not incidental. In-memory operations keep disk I/O out of the request path; only snapshots touch the disk. Content-addressable storage means duplicate files are stored once. Zero-copy reads mean the agent gets data without serialization overhead. And because the entire filesystem state lives in a single process, there are no system call boundaries to cross. The comparison is particularly stark for the operations agents perform most frequently. Repeated reads (agent re-reading context): in-memory and zero-copy, with no disk seeks and no page cache misses. grep across files (agent searching for patterns): all content is in memory, so there is no directory traversal and no file handle management. Rapid file creation (agent producing work artifacts): no filesystem journaling, no on-disk inode allocation, no fsync. Directory listing (agent exploring structure): a BTreeMap lookup instead of readdir syscalls. But performance is not the real argument. The real argument is what the native filesystem cannot do. 
A POSIX filesystem has no concept of semantic search. It has no built-in versioning (you need Git for that). It has no tiered access model (you get the whole file or nothing). It has no content deduplication. It has no audit trail of agent operations. And critically, it has no native MCP interface, so agents need a separate server to reach it through the standard protocol the ecosystem is converging on. The VFS is not just faster. It is a richer primitive. It gives you the familiar interface of ls and cat while adding the capabilities that agents actually need: versioning, permissions, search, deduplication, and protocol-native access via MCP or HTTP. What This Means for RAG. To be clear, RAG is not dead. Vector search remains valuable for fuzzy, semantic queries where the agent genuinely does not know what it is looking for. But the honest assessment is that RAG has been over-applied. Many of the use cases where teams deployed RAG pipelines (documentation retrieval, codebase navigation, enterprise knowledge management) are better served by a filesystem interface. The evidence is striking. Mintlify’s 460x faster session creation came from replacing RAG with a filesystem, not augmenting it [1]. Research from Letta shows that agents using simple filesystem operations achieve 74% accuracy on memory benchmarks, competitive with specialized retrieval tools. And agentic keyword search approaches can achieve over 90% of RAG performance without vector databases at all [5]. The future is likely hybrid: RAG for open-ended semantic search, the filesystem for structured navigation and task execution. But the center of gravity is shifting toward the filesystem, and the strategic implications are significant. The Strategic Imperative. If you are building agentic infrastructure, you need a VFS strategy. Here is why. For SaaS companies: the lesson from Box is that the filesystem is becoming the integration surface for agents. If your platform’s content is not navigable as a filesystem, agents will bypass you. 
The SaaS companies that expose their data through filesystem-like interfaces will become part of the agentic workflow. Those that do not will become invisible to agents, which means invisible to users. For infrastructure vendors: the database is not going away. It is moving underneath the filesystem. This is actually good news for database companies. Turso understood this and built AgentFS on top of SQLite. Every agent that spins up creates a new database. The more agents the world runs, the more databases the world needs. But the database needs to disappear behind a filesystem abstraction. For enterprises: the governance story is what matters. Box’s pitch is not really about filesystems. It is about the fact that its permission model, audit trail, and compliance infrastructure automatically extend to agents when content is accessed through the VFS layer [3]. This is Box’s answer to the question many CISOs are asking: “How do we let agents access our content without creating a security nightmare?” The Unifying Layer. The agentic infrastructure stack has been evolving in clear phases: tools (MCP), skills, and context graphs. The virtual filesystem fits into this arc as the delivery mechanism for all three. MCP tools can be exposed through the filesystem. Skills are stored as files. Context graphs are navigated as directory trees. The filesystem does not replace these layers. It unifies them behind a single, familiar interface. This is the real insight. The filesystem is not a new idea. It is one of the oldest abstractions in computing. But that is exactly why it works for agents. In a world where we are inventing new paradigms every quarter, the most powerful move might be reaching back to the most proven interface we have and putting a modern database behind it. The companies that understand this (Mintlify, Turso, Box, ByteDance) are not building something new. They are recognizing something old and giving it a new job. References: [1] Mintlify. (2026, April 2). 
How we built a virtual filesystem for our Assistant. Mintlify Blog. [2] Turso. (2026). The Missing Abstraction for AI Agents: The Agent Filesystem. Turso Blog. [3] Blocks and Files. (2026, March 9). Box pitches ‘virtual filesystem’ layer for AI agents. Blocks and Files. [4] Volcengine. (2026). OpenViking: An open-source context database for AI Agents. GitHub. [5] Signals. (2026, February). Keyword Search is All You Need: Achieving RAG-Level Performance Without Vector Databases Using Agentic Tool Use. Signals. [6] Subramanya N. (2026). markdownfs: A high-performance, concurrent markdown database built in Rust. GitHub.",
      "views": 414,
      "reading_minutes": 11,
      "tags": [
        
          
          {
            "name": "Agentic AI",
            "slug": "agentic-ai",
            "url": "/tags/agentic-ai/#posts"
          },
        
          
          {
            "name": "Virtual Filesystem",
            "slug": "virtual-filesystem",
            "url": "/tags/virtual-filesystem/#posts"
          },
        
          
          {
            "name": "RAG",
            "slug": "rag",
            "url": "/tags/rag/#posts"
          },
        
          
          {
            "name": "Agent Infrastructure",
            "slug": "agent-infrastructure",
            "url": "/tags/agent-infrastructure/#posts"
          },
        
          
          {
            "name": "Enterprise AI",
            "slug": "enterprise-ai",
            "url": "/tags/enterprise-ai/#posts"
          },
        
          
          {
            "name": "Context Engineering",
            "slug": "context-engineering",
            "url": "/tags/context-engineering/#posts"
          },
        
          
          {
            "name": "AgentFS",
            "slug": "agentfs",
            "url": "/tags/agentfs/#posts"
          },
        
          
          {
            "name": "MCP",
            "slug": "mcp",
            "url": "/tags/mcp/#posts"
          }
        
      ]
    },
  
    
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
        
        
    
    
    
    {
      "kind": "post",
      "title": "The SaaSpocalypse: A Survival Guide",
      "url": "/2026/02/23/the-saaspocalypse-a-survival-guide/",
      "date_display": "February 23, 2026",
      "date_iso": "2026-02-23",
      "excerpt": "Nearly $1 trillion wiped from software stocks in a week. This is not a mass extinction; it's a cleansing fire. Here's the survival playbook for SaaS companies in the outcome economy.",
      "content": "It started with a single press release. On January 30, 2026, AI startup Anthropic announced 11 specialized plugins for its Claude Cowork agent, enabling it to handle complex workflows in sales, finance, legal, and HR [1]. Wall Street’s reaction was not just negative; it was apocalyptic. In the week that followed, nearly $1 trillion in value was wiped from software and services stocks in a sell-off so brutal that Jefferies traders coined a new term for it: the “SaaSpocalypse” [2, 3]. Thomson Reuters suffered its biggest single-day drop on record (-15.8%), LegalZoom plunged nearly 20%, and established giants like Atlassian and Intuit had seen their valuations fall by 50% and 34%, respectively, since the start of the year [4, 5]. The panic was clear: if an AI agent can do the job of your software, why would anyone pay for your software? This is more than a market correction. It’s a referendum on the entire Software-as-a-Service model. But is it truly an apocalypse, or is it a long-overdue reckoning? And for the thousands of founders, employees, and investors in the SaaS ecosystem, a more urgent question looms: how do you survive? The Great Divide: Two Competing Realities. The market is now split into two warring camps, each with a compelling narrative. Camp 1: The End Is Nigh. This camp believes the threat is existential. The core of their argument is the death of the per-seat pricing model. As Morningstar analysts bluntly put it, “if one person can now do the work of two, seat counts fall” [5]. Why pay for 500 Salesforce seats when 450 employees and an AI agent can do the same work? 
This isn’t hypothetical; Salesforce CEO Marc Benioff has already stated the company won’t be hiring more engineers, customer service agents, or lawyers precisely because of AI’s capabilities [4]. Anthropic CEO Dario Amodei predicts AI could displace half of all entry-level white-collar jobs in the next five years, and OpenAI CEO Sam Altman has warned that AI will be “quite harmful” to some traditional software companies [4, 6]. For this camp, the math is simple: fewer employees and more capable AI mean less revenue for traditional SaaS. Camp 2: The Panic Is Illogical. On the other side, a powerful contingent of tech leaders argues the panic is a massive overreaction. Nvidia CEO Jensen Huang called the notion that AI will replace the software industry “the most illogical thing in the world,” while Arm Holdings CEO Rene Haas dismissed the sell-off as “micro-hysteria” [7]. Their argument, articulated well by Bernard Golden, CEO of Navica, is threefold [8]: (1) Enterprise DIY Will Fail: building real software requires far more than just code. It demands deep domain expertise, regulatory knowledge, global support, and legal indemnification, things AI can’t replicate. (2) Incumbents Have Moats: established players have network effects, scale, and deep, custom integrations that startups can’t easily displace. (3) Jevons Paradox: as AI makes software cheaper and easier to create, the demand for it won’t shrink; it will explode. Cheaper software will lead to vastly more software, not less. The Survival Playbook. So, who is right? Analyses from Bain & Company and The Guardian suggest the truth lies in the middle: disruption is mandatory, but obsolescence is not [5, 9]. Survival depends on a clear-eyed assessment of your company’s position and a swift, decisive pivot. Here is the emerging survival guide. 1. Defend Your Moat. The companies weathering the storm are not the ones fighting AI, but the ones with unique, defensible assets that AI can’t easily replicate. 
The four critical moats are as follows. Proprietary Data: data that is unique to your customers and not publicly available. AI models are trained on public data; they can’t access your private, firewalled customer information. Example: a vertical SaaS for pharmaceutical research with years of private clinical trial data. Complex Systems: deeply embedded, mission-critical workflows that are core to a business’s operations. The cost and risk of ripping out these systems are too high. Examples: Oracle’s ERP systems, ServiceNow’s IT service management platform. Network Effects: platforms where the value increases as more users join. The classic example is a marketplace, but it also applies to collaborative software. Example: a procurement platform that connects thousands of buyers and suppliers. Deep Integration: software that is intricately woven into a customer’s tech stack, with numerous custom APIs and data connections. Example: a manufacturing execution system tied into a factory’s physical hardware. If your product relies solely on analyzing public data or performing a task that can be replicated by a generic AI agent, you are in the kill zone. 2. Embrace the New Pricing Model: The Outcome Economy. The per-seat license is dying. The future is outcome-based pricing. Your customers no longer want to pay for access to your tool; they want to pay for the result it delivers. As a recent BVP report notes, AI-native companies are abandoning seat-based pricing almost entirely [10]. The Old Model: $150 per user per month. The New Model: $0.99 per resolved customer issue, $5 per generated lead, or 1% of the cost savings achieved. This is not theoretical. Intercom’s AI agent, Fin, is already at a $100M+ revenue run rate by charging per resolution. This model aligns your success directly with your customer’s success. It’s a harder model to build, but a far more defensible one. 3. 
Become the Trusted Incumbent. Here lies the greatest advantage for existing SaaS companies. In a world of black-box AIs, trust is the scarcest resource. A Bain & Company survey found that customers would prefer to buy AI-enabled solutions from their incumbent vendors [9]. They trust those vendors’ security, reliability, and longevity. The challenge is that most incumbents have been slow to deliver compelling AI offerings. The opportunity is massive for those who can integrate AI deeply into their existing, trusted products. Don’t just add an AI chatbot in the corner; use AI to supercharge your core workflow and deliver a 10x better outcome. The SaaSpocalypse is not an extinction-level event for everyone. It is a cleansing fire. The companies that will be wiped out are the ones selling undifferentiated, easily replicated features. The companies that survive, and thrive, will be the ones with deep moats, customer trust, and a business model built for the new outcome-based economy. References: [1] Anthropic’s new AI tool sends shudders through software stocks. [2] Selloff wipes out nearly $1 trillion from software and services stocks. [3] ‘Get me out’: Traders dump software stocks as AI fears erupt. [4] AI fears pummel software stocks: Is it ‘illogical’ panic or a SaaS apocalypse? [5] Is the share market headed toward a ‘SaaS-pocalypse’? [6] AI to change nature of software industry; will be bad for some companies: Sam Altman. [7] AI fears pummel software stocks: Is it ‘illogical’ panic or a SaaS apocalypse? [8] The AI software freakout is a massive overreaction. Here’s why. [9] Why SaaS Stocks Have Dropped—and What It Signals for Software’s Next Chapter. [10] The AI pricing and monetization playbook.",
      "views": 461,
      "reading_minutes": 6,
      "tags": [
        
          
          {
            "name": "SaaS",
            "slug": "saas",
            "url": "/tags/saas/#posts"
          },
        
          
          {
            "name": "SaaSpocalypse",
            "slug": "saaspocalypse",
            "url": "/tags/saaspocalypse/#posts"
          },
        
          
          {
            "name": "AI Agents",
            "slug": "ai-agents",
            "url": "/tags/ai-agents/#posts"
          },
        
          
          {
            "name": "Business Models",
            "slug": "business-models",
            "url": "/tags/business-models/#posts"
          },
        
          
          {
            "name": "Outcome Pricing",
            "slug": "outcome-pricing",
            "url": "/tags/outcome-pricing/#posts"
          },
        
          
          {
            "name": "Enterprise Software",
            "slug": "enterprise-software",
            "url": "/tags/enterprise-software/#posts"
          }
        
      ]
    },
  
    
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
        
        
    
    
    
    {
      "kind": "post",
      "title": "2026: The Year SaaS Disappeared Into the Conversation",
      "url": "/2026/02/19/the-year-saas-disappeared-into-the-conversation/",
      "date_display": "February 19, 2026",
      "date_iso": "2026-02-19",
      "excerpt": "SaaS is shifting from dashboards and clicks to personalized, voice-enabled AI agents that execute outcomes. In 2026, the winning software model is no longer seat-based access but measurable results delivered through conversation.",
      "content": "What if the best user interface were no interface at all? For decades, we have been trained to navigate a labyrinth of menus, buttons, and settings screens. We learned the language of software. In 2026, that paradigm is finally flipping. Software is learning to speak our language. This is not just about adding a chatbot to a dashboard. A quiet revolution is underway: Software as a Service (SaaS) is ceasing to be a destination and becoming a capability accessed through natural language. The primary interface for getting work done is shifting from graphical (GUI) to conversational (CUI) and, increasingly, to voice. As a recent analysis in Harvard Business Review noted, the goal is no longer to automate the past but to orchestrate a new, more dynamic future where intelligent agents assemble novel workflows in real time, unconstrained by human org charts [1]. Meet Your New Coworker: The Personalized AI Agent. At the heart of this transformation is the move from generic, one-size-fits-all tools to deeply personalized AI agents that act as expert coworkers. These agents do not just access public data; they understand your world. As Goldman Sachs CIO Marco Argenti declared in January 2026, “Context is the new frontier,” signaling the rise of personal agents that know your context and can act on your behalf [2]. This trend has accelerated across the enterprise landscape in just the first two months of the year. Glean launched its latest AI Assistant on February 17, positioning it as an “expert agentic coworker” powered by a “Personal Graph” that understands an employee’s role, projects, and collaborators to move from insight to execution [3]. Microsoft announced on January 30 that M365 Copilot can reference a user’s “memory” in voice chats, using stored personalization settings for more relevant responses [4]. 
Slack, on January 29, relaunched Slackbot as a “personal, context-aware AI agent for work,” designed to be the teammate that was “in the meeting with you,” saving users at least 90 minutes per day by internal estimates [5]. Atlassian followed on January 30, declaring that “teammate agents are what’s hot in 2026” and showcasing Rovo AI, which pulls context from project trackers, code repositories, and third-party apps to act as a core member of the team [6]. Google introduced “Personal Intelligence” for Search on January 22, connecting AI with private Gmail and Photos data to deliver tailored recommendations and turn a public utility into a personal concierge [7]. This shift is significant enough to create a new software category. The viral open-source project OpenClaw showcased a personal AI that could run locally and control user apps, and its creator was eventually hired by OpenAI [8]. It hints at the end of app sprawl: why juggle a dozen tools when one intelligent agent can coordinate calendars, tasks, and research? The Rise of Voice: Don’t Type, Just Speak. The conversational interface reaches its strongest expression through voice. As Forbes argued, voice is becoming the defining UI of the AI era [9]. Tools like Wispr Flow are advancing this concept with a universal voice input layer that works across applications and turns speech into polished text at up to 220 words per minute [10]. It is not an app you switch into; it is a layer that sits on top of everything. This is not a niche behavior. A 2026 Voices.com report found that 55% of consumers now use voice to interact with AI, while only 29% of companies have deployed voice AI, exposing a clear gap between user behavior and enterprise adoption [11]. VentureBeat described this transition as a move from “chatbots that speak” to “empathetic interfaces” that understand nuance and intent [12]. That gap helps explain why major SaaS players are moving quickly to add voice. The graphical interface is not disappearing. 
Complex visualization, creative design, and exploratory analysis still benefit from visual canvases. The future is a hybrid model where voice and conversation handle routine tasks while GUI surfaces are used for specialized work, potentially generated on the fly by AI for each specific need. The “SaaSpocalypse” and the New Business Model. This shift is fueling what some call a “SaaSpocalypse.” A February 2026 Fortune report noted that $2 trillion had been wiped from software stocks as AI pressures traditional SaaS models [13]. The conventional per-seat, per-month model is weakening. As Goldman Sachs noted, we are entering an “agent-as-a-service economy” where organizations deploy fleets of agents and pay by token consumption rather than human time [2]. In place of per-seat pricing, a new model is emerging: outcome-based pricing. Intercom’s AI agent Fin is a strong example. Customers pay for results ($0.99 per resolved issue), not software access. Fin now handles over 80% of support volume and has grown past $100M ARR, demonstrating the economic power of this model [14]. Welcome to the Post-App Era. The pieces are now in place. Personalized, agentic AI and voice-first interfaces are dissolving the traditional SaaS model. The center of gravity is shifting from features to outcomes and from clicks to conversations. The key question is no longer “Which app should I use?” but “What do I want to accomplish?” In 2026, for the first time, software is ready to answer directly. References: [1] Harvard Business Review. (2026, February). A Blueprint for Enterprise-Wide Agentic AI Transformation. [2] Goldman Sachs. (2026). What to Expect From AI in 2026: Personal Agents, Mega Alliances. [3] Glean. (2026, February 17). Glean’s Latest AI Assistant Moves Every Employee from Insight to Execution. [4] Microsoft Tech Community. (2026, January). What’s New in Microsoft 365 Copilot. [5] Slack. (2026, January). Introducing Slackbot, Your Context-Aware AI Agent for Work. [6] Atlassian. (2026, January). 
AI Takes a Seat on the Team. [7] Google. (2026, January). Google Brings Personal Intelligence to AI Mode in Search. [8] TechCrunch. (2026, February 15). OpenClaw Creator Peter Steinberger Joins OpenAI. [9] Forbes. (2026, February). Is Voice Becoming the UI of the AI Era? [10] Wispr Flow. (2026). Effortless Voice Dictation. [11] Voices.com. (2026). Amplified 2026: The State of Voice and the Trends Shaping the Industry. [12] VentureBeat. (2026). Everything in Voice AI Just Changed. [13] Fortune. (2026, February 13). SaaSpocalypse: Why $2 Trillion Got Wiped From Software Stocks. [14] GTM Now. (2026). How Intercom Built a $100M AI Agent with Outcome Pricing.",
      "views": 172,
      "reading_minutes": 5,
      "tags": [
        
          
          {
            "name": "SaaS",
            "slug": "saas",
            "url": "/tags/saas/#posts"
          },
        
          
          {
            "name": "Agentic AI",
            "slug": "agentic-ai",
            "url": "/tags/agentic-ai/#posts"
          },
        
          
          {
            "name": "Voice AI",
            "slug": "voice-ai",
            "url": "/tags/voice-ai/#posts"
          },
        
          
          {
            "name": "Enterprise AI",
            "slug": "enterprise-ai",
            "url": "/tags/enterprise-ai/#posts"
          },
        
          
          {
            "name": "AI Agents",
            "slug": "ai-agents",
            "url": "/tags/ai-agents/#posts"
          },
        
          
          {
            "name": "Business Models",
            "slug": "business-models",
            "url": "/tags/business-models/#posts"
          }
        
      ]
    },
  
    
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
        
        
    
    
    
    {
      "kind": "post",
      "title": "OpenClaw and the Rise of User-Built Intelligence: A Wake-Up Call for SaaS",
      "url": "/2026/02/01/openclaw-and-the-rise-of-user-built-intelligence-a-wake-up-call-for-saas/",
      "date_display": "February 1, 2026",
      "date_iso": "2026-02-01",
      "excerpt": "OpenClaw has exploded in popularity with over 114,000 GitHub stars in just two months. It represents a fundamental shift in how users interact with software and a direct challenge to the traditional SaaS model. While SaaS platforms became systems of record, users are now building their own intelligence layers on top.",
      "content": "In the last few weeks, the AI community has been captivated by a project that is not a new model, but a new paradigm. OpenClaw, an open-source personal AI assistant, has exploded in popularity, amassing over 114,000 GitHub stars in just two months [1]. Andrej Karpathy, one of the most respected voices in AI, described it as “genuinely the most incredible sci-fi takeoff-adjacent thing I have seen recently” [2].This is not just another AI tool. OpenClaw represents a fundamental shift in how users interact with software, and it is a direct challenge to the traditional SaaS model. While SaaS platforms have spent a decade becoming the systems of record for business data, users are now building their own intelligence layers on top, turning incumbent platforms into dumb data pipes. This is the wake-up call for every SaaS company.You know what's crazy about @openclaw... It will actually be the thing that nukes a ton of startups, not ChatGPT as people meme about... The fact that it's hackable (and more importantly, self-hackable) and hostable on-prem will make sure tech like this DOMINATES conventional SaaS imo (Max Rovensky, @MaxRovensky, January 2026) [3]The Ambient AI RevelationWhat makes OpenClaw so significant? It’s not the technology itself, which is a clever combination of existing tools. As one analyst put it, OpenClaw’s innovation was to give an AI model its own computer and tell it to act like a personal assistant [4].The real breakthrough is the validation of a new form factor for AI: ambient, proactive intelligence. Unlike today’s major AI tools (ChatGPT, Copilot, even your own internal copilots), which require a human in the loop, OpenClaw is designed to act autonomously. It runs 24/7, even when you’re asleep, watching for things that matter and taking action on your behalf. As one writer noted, “Claude Code knows your codebase. OpenClaw knows your life” [4].This flips the current SaaS paradigm on its head. 
SaaS platforms are systems of record, but they are blind to the process of the business. They capture the nouns, but not the verbs. This is the “System of Record Trap.” Your CRM knows your customer data, but it doesn’t know the informal follow-up sequence your top salesperson uses. Your project management tool knows your deadlines, but it doesn’t know the complex triage process your team uses to handle incoming requests. This is the value that is being left on the table, and it’s the value that tools like OpenClaw are now capturing.More Than a Toy: What Users Are Actually BuildingIf you think this is just a developer toy, you are mistaken. The community around OpenClaw is building and sharing thousands of “skills” that give their agents real-world capabilities. As chronicled by Simon Willison, users are already relying on OpenClaw to [1]:  Buy a car by negotiating with multiple dealers over email.  Remotely control an Android phone to scroll through TikTok or use Google Maps.  Monitor a server for security threats, detecting failed SSH login attempts and exposed ports.  Transcribe voice messages by finding an API key and using it to call the Whisper API.These are not simple automations. They are complex, multi-step workflows that are being built and executed outside of any traditional SaaS platform. The value being unlocked is so compelling that users are willing to accept significant security risks, a phenomenon Simon Willison describes as the “Normalization of Deviance” [1]. People are buying dedicated Mac minis just to run OpenClaw in a sandboxed environment, a clear signal of the demand for this new paradigm.The SaaS Dilemma: Build or Be BypassedThis is the existential threat to SaaS. Every workflow built in OpenClaw is a workflow that is not being captured by the underlying SaaS platform. Every decision made by a personal AI agent is a decision that the SaaS vendor has no visibility into. 
The SaaS platform becomes a commodity data layer, a “dumb data pipe,” while the intelligence, the context, and the customer relationship move to the agentic layer.The only durable defense is to build a native intelligence layer that allows users to automate their workflows directly within the platform. This journey from reactive software to proactive intelligence unfolds in three stages.            Stage      Description      User Experience                  1. User-Built Automation      Users can describe their goals in natural language, and the platform builds and runs the automation workflow natively.      “When a new maintenance request comes in, check if it’s urgent. If so, text the on-call vendor.”              2. Pattern Learning      The platform analyzes workflow usage across its user base to identify common patterns and best practices.      The platform notices that responding to requests within 4 hours boosts tenant retention by 40% and suggests this workflow to other users.              3. Proactive Delivery      The platform learns individual user patterns and proactively delivers personalized automation, anticipating needs before the user even asks.      A property manager logs in to find the weekend’s maintenance requests already triaged, assigned, and with draft notifications ready for approval.      This evolution transforms a SaaS product from a passive tool into an active partner, creating a powerful moat built on compounded knowledge of user behavior. The more users automate, the smarter the platform becomes, and the harder it is for competitors to replicate.The Time to Act is NowThe path forward for SaaS leaders is clear, and the timeline is short. The technological pillars are now in place: reliable function-calling models, long context windows, and universal standards like the Model Context Protocol (MCP) are mature and widely adopted. 
The enterprise demand has been validated by the explosive growth of platforms like Salesforce Agentforce, which generated $900 million in revenue in its first six months.The choice for SaaS vendors is stark: either build a native intelligence layer or risk becoming a commoditized backend for your users’ personal AI agents. The era of passive, reactive software is over. The agentic workspace is the new strategic imperative, and the time to build it is now.References:[1] Willison, S. (2026, January 30). Moltbook is the most interesting place on the internet right now. Simon Willison’s Weblog.[2] Karpathy, A. (2026, January 30). Tweet on X.[3] Rovensky, M. (2026, January 12). Tweet on X.[4] Hwang, J. (2026, January 31). The Ambient AI Era: Clawdbot (OpenClaw)’s Ripple Effects. Nextword.",
      "views": 674,
      "reading_minutes": 5,
      "tags": [
        
          
          {
            "name": "SaaS",
            "slug": "saas",
            "url": "/tags/saas/#posts"
          },
        
          
          {
            "name": "Agentic AI",
            "slug": "agentic-ai",
            "url": "/tags/agentic-ai/#posts"
          },
        
          
          {
            "name": "Enterprise AI",
            "slug": "enterprise-ai",
            "url": "/tags/enterprise-ai/#posts"
          },
        
          
          {
            "name": "AI Agents",
            "slug": "ai-agents",
            "url": "/tags/ai-agents/#posts"
          },
        
          
          {
            "name": "AI Transformation",
            "slug": "ai-transformation",
            "url": "/tags/ai-transformation/#posts"
          },
        
          
          {
            "name": "B2B Software",
            "slug": "b2b-software",
            "url": "/tags/b2b-software/#posts"
          }
        
      ]
    },
  
    
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
        
        
    
    
    
    {
      "kind": "post",
      "title": "The Agentic Workspace: A Strategic Imperative for the Next Era of SaaS",
      "url": "/2026/01/19/the-agentic-workspace-a-strategic-imperative-for-the-next-era-of-saas/",
      "date_display": "January 19, 2026",
      "date_iso": "2026-01-19",
      "excerpt": "Traditional SaaS is under siege from AI agents. The winners won't just add AI features—they'll become agentic workspaces that orchestrate autonomous outcomes. Here's why every SaaS company must make this transition now, and how to build the defensible moat that will define the next decade.",
      "content": "The SaaS landscape is at a critical inflection point. The traditional, human-driven application model is giving way to a new paradigm: the agentic workspace. This is not a distant trend, but a strategic imperative for today. We propose that the next evolution for every successful SaaS company is to become a platform that orchestrates intelligent agents to achieve user outcomes. This transition is complex and fraught with challenges, but for those who navigate it successfully, the rewards will be immense. Those who fail to adapt risk being left behind.The convergence of SaaS and AI agents is reshaping the enterprise software landscapeThe Decline of Seat-Based SaaS DominanceThe traditional SaaS model, built on per-user licensing and incremental feature updates, is facing unprecedented pressure. The rise of powerful, autonomous AI agents is beginning to render this model insufficient. As one industry analyst put it, “In three years, any routine, rules-based digital task could move from ‘human plus app’ to ‘AI agent plus API’” [2]. This fundamental change has exposed the vulnerabilities of the old guard and paved the way for a new generation of AI-native startups.These startups, unburdened by legacy systems, are operating with remarkable efficiency. As highlighted in recent analysis [5], AI-native firms are averaging $3.48 million in revenue per employee, a staggering 5.7 times more than their traditional SaaS counterparts. This efficiency gap is a clear signal of a major market shift.Six Pressures Reshaping the SaaS ModelDrawing inspiration from analysis by Cloud.Substack [5], the decline of the traditional model can be attributed to six interconnected pressures:            Pressure Point      Description & Example                  Seat Expansion Stall      The primary growth engine for SaaS has sputtered. 
For example, Zoom, once a paragon of high net revenue retention (NRR), saw its enterprise NRR fall to 98% as customers no longer needed to add seats at the same pace [5].              Price Increases Consuming Budget      SaaS inflation is running at nearly 5x the market rate, with price hikes consuming a significant portion of incremental IT budgets. This leaves little room for new investments and creates a cycle of vendor consolidation [5].              The Shift to AI Budgets      Enterprise spending is decisively moving towards AI. With leaders expecting 75% growth in their LLM budgets, if a product isn’t tapping into this new pool of capital, it’s competing for a shrinking one [5].              The Speed of Innovation      The pace of development has accelerated dramatically. AI-native startups are shipping new features weekly, while traditional SaaS companies are often stuck in quarterly release cycles. This speed differential is a critical competitive advantage.              Single-Product Plateau      The multi-product suite strategy is losing its effectiveness. Customers increasingly prefer best-in-class point solutions and are less willing to accept a suite of mediocre products from a single vendor [5].              The Value-Add Test      Many early AI features have been underwhelming. The bar for AI integration is now genuine productivity gains, not incremental improvements. Features must deliver measurable, tangible value to justify their cost and complexity [5].      Acknowledging the Obstacles on the Path to AutonomyWhile the promise of agentic AI is immense, the path to full autonomy is not without significant challenges. Acknowledging these hurdles is crucial for a credible strategy.  Reliability and Trust: Agentic systems still struggle with reliability. Hallucinations, where an AI generates false information, remain a key concern. 
According to a recent McKinsey report, 80% of organizations have already encountered risky behaviors from AI agents, including improper data exposure and unauthorized system access [7]. Building robust validation and human-in-the-loop systems is essential.  The Incumbent’s Moat: Large SaaS players like Salesforce and Microsoft have powerful distribution channels and are actively acquiring promising agent startups. Their deep enterprise integrations and existing customer relationships provide a significant defensive moat that shouldn’t be underestimated.  The Economics of AI: Many AI-native startups are currently operating with a high burn rate, spending heavily on tokens and compute power with an unclear path to profitability. Industry estimates suggest that inference costs can consume 30-50% of gross margins for agent-heavy applications, and the long-term economic viability of these models is still being tested.The New Moat: Capturing the ‘Why’ with Context GraphsDespite the challenges, the strategic advantage of becoming an agentic platform is undeniable. The new competitive moat is the Context Graph: a living record of decision traces that explains not just what happened, but why it was allowed to happen [6].  Agents don’t just need rules. They need access to the decision traces that show how rules were applied in the past, where exceptions were granted, how conflicts were resolved, who approved what, and which precedents actually govern reality. [6]While traditional systems of record store data about objects (like customers or invoices), context graphs create a system of record for decisions. They capture the exceptions, overrides, and precedents that currently live in siloed communications.Context graphs capture the decision traces that explain not just what happened, but whyThis creates a powerful feedback loop. The companies that provide the agentic execution layer are the only ones who can capture these decision traces. 
As their context graphs grow, their agents become smarter and more reliable, creating a defensible advantage that is nearly impossible for competitors to replicate.Evolving Business Models for the Agentic EraThis transformation requires a radical rethinking of business models. The seat-based license is being replaced by new models that align price with the value AI agents deliver.            Pricing Model      Description & Example                  Usage-Based: Resources      Customers pay for the compute and token resources they consume. Example: A developer platform charges based on the number of API calls and GPU hours used by its agents.              Agent-Based      Customers purchase or subscribe to individual AI agents with specific skills. Example: An e-commerce platform sells a “Pricing Optimization Agent” for a monthly fee.              Usage-Based: Interactions      Customers are charged per discrete interaction or completed task. Example: A customer service platform charges per successfully resolved support ticket.              Outcome-Based: Jobs Completed      Payment is tied to the successful execution of a predefined job. Example: A sales automation platform charges a fee for each qualified lead its agents generate.              Outcome-Based: Financial Pricing      The most advanced model, where payment is a percentage of the financial value created. Example: A marketing automation platform takes a share of the revenue generated from campaigns run by its agents.      What Winners Will Look LikeBeyond the tech giants, a new class of winners is emerging. These companies are not just building features; they are building agentic workspaces. Glean is creating enterprise search agents that can query across dozens of enterprise tools to answer complex questions autonomously, replacing hours of manual research with seconds of agent-driven synthesis. 
Adept AI is building general-purpose agents that can learn to use any software application through observation and interaction. Meanwhile, Sierra is pioneering conversational AI agents for customer experience that can resolve issues end-to-end without human handoff. These pioneers are demonstrating the power of focusing on autonomous, outcome-driven workflows rather than incremental feature additions.The Strategic Imperative to Act NowThe evidence is clear. The convergence of market pressures, from stalled seat expansion to the rise of hyper-efficient AI-native competitors, points to a single conclusion: the future of SaaS is the agentic workspace. This is no longer a question of ‘if,’ but ‘when.’ The companies that act now, beginning the work of transforming their platforms into orchestrators of intelligent agents and capturing the invaluable context graphs that power them, will be the leaders of the next decade.Where to start? Audit your core workflows for agentic potential: identify the repetitive, rules-based processes where human judgment is minimal but human time is maximal. Then pilot context capture in one high-value process: every decision trace you record today becomes training data for tomorrow’s autonomous agents.The choice is simple: build the future, or be relegated to the past. The time to build your agentic workspace is now.References:[1] Deloitte. (2025, November 18). SaaS meets AI agents: Transforming budgets, customer experience, and workforce dynamics. Deloitte Insights.[2] Bain & Company. (2025, September 23). Will Agentic AI Disrupt SaaS? Bain & Company.[3] Forbes. (2026, January 15). Are SaaS Moats Real Or AI Mirage? The Great Enterprise Software Debate. Forbes.[4] BCG. (2025, August 13). Rethinking B2B Software Pricing in the Era of AI. BCG.[5] Cloud.Substack. (2026, January 17). The 6 Threat Vectors Killing Traditional B2B Software in 2026 (And How to Fight Back). Cloud.Substack.[6] Foundation Capital. (2025, December 22). 
AI’s trillion-dollar opportunity: Context graphs. Foundation Capital.[7] McKinsey & Company. (2025, October 16). Deploying agentic AI with safety and security: A playbook for technology leaders. McKinsey & Company.",
      "views": 318,
      "reading_minutes": 7,
      "tags": [
        
          
          {
            "name": "SaaS",
            "slug": "saas",
            "url": "/tags/saas/#posts"
          },
        
          
          {
            "name": "Agentic AI",
            "slug": "agentic-ai",
            "url": "/tags/agentic-ai/#posts"
          },
        
          
          {
            "name": "Enterprise AI",
            "slug": "enterprise-ai",
            "url": "/tags/enterprise-ai/#posts"
          },
        
          
          {
            "name": "AI Agents",
            "slug": "ai-agents",
            "url": "/tags/ai-agents/#posts"
          },
        
          
          {
            "name": "Context Graphs",
            "slug": "context-graphs",
            "url": "/tags/context-graphs/#posts"
          },
        
          
          {
            "name": "AI Transformation",
            "slug": "ai-transformation",
            "url": "/tags/ai-transformation/#posts"
          },
        
          
          {
            "name": "B2B Software",
            "slug": "b2b-software",
            "url": "/tags/b2b-software/#posts"
          },
        
          
          {
            "name": "AI Pricing",
            "slug": "ai-pricing",
            "url": "/tags/ai-pricing/#posts"
          }
        
      ]
    },
  
    
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
        
        
    
    
    
    {
      "kind": "post",
      "title": "Context Graphs Are a Trillion-Dollar Opportunity. But Who Actually Captures It?",
      "url": "/2026/01/14/context-graphs-are-a-trillion-dollar-opportunity-but-who-captures-it/",
      "date_display": "January 14, 2026",
      "date_iso": "2026-01-14",
      "excerpt": "The concept of Context Graphs has rapidly captured the industry's imagination. The thesis is that the next trillion-dollar enterprise platforms will not be systems of record for data, but systems of record for decisions. But who actually captures this opportunity? The answer is hiding in plain sight—in the agentic tools that are already operating in the wild, generating decision traces every second.",
      "content": "The concept of Context Graphs, first articulated by Jaya Gupta of Foundation Capital, has rapidly captured the industry’s imagination [1]. The thesis is that the next trillion-dollar enterprise platforms will not be systems of record for data, but systems of record for decisions. For the underlying definition, see my explainer on what context graphs are.The thesis is compelling. But the most pressing question remains: who actually captures this trillion-dollar opportunity?The answer, I believe, is hiding in plain sight. It is not in the data warehouses or the CRMs. It is in the agentic tools that are already operating in the wild, in the execution path, generating decision traces every second. And the most advanced and widely discussed of these tools, Anthropic’s Claude Code and the newly released Claude Cowork, provide a fascinating, real-world case study of both the immense potential and the critical missing piece.The Agents in the Arena: Claude Code and CoworkOn January 12, 2026, Anthropic launched Claude Cowork, a desktop agent that extends the power of its developer-focused Claude Code tool to non-technical users [2]. This was not just another feature release. It was a statement. While the industry has been debating the future of agentic workflows, Anthropic has been shipping them.What makes Claude Code and Cowork so different is that they are not just chatbots; they are doers. They operate within a designated folder on your computer, with the ability to read, write, and create files. They can take a messy folder of receipts and turn it into a structured expense report. They can take scattered notes and draft a coherent document. They are, in short, executing complex, multi-step tasks that generate a rich history of decisions.Perhaps the most stunning demonstration of this was the revelation that Claude Cowork itself was built almost entirely by Claude Code in about a week and a half. Think about that. 
An AI agent planned and executed the creation of a new software product. This is not a theoretical exercise; it is a real-world, complex decision trace of immense value.The Irony: Generating Traces, But Not Capturing ThemEvery time a developer uses Claude Code to refactor a codebase, or a project manager uses Claude Cowork to organize a project folder, a decision trace is generated. The agent is walking the graph of the user’s intent, pulling context from different files, making decisions, and executing actions. It is creating the raw material of a context graph.But where does that raw material go? It evaporates. It is ephemeral, living for a moment in the agent’s context window or the user’s chat history, but it is not persisted as a structured, queryable artifact. The why is lost, leaving only the what.This is the central irony of the current agentic landscape. The most advanced agentic tools are the perfect instruments for creating context graphs, yet they are not being used for that purpose. They are generating a constant stream of valuable decision data that is simply being discarded.The Ephemeral Nature of Decision Traces in Today’s AgentsWhy Incumbents Can’t Just Add This FeatureIncumbents are structurally ill-positioned to capture this opportunity. They are simply in the wrong place architecturally.  Systems of Record (Salesforce, Workday): These platforms are built to store the current state of an object. They know the deal is closed-won, but they do not have a record of the dozen steps, approvals, and exceptions that led to that outcome. They are in the wrong architectural layer.  Data Warehouses (Snowflake, Databricks): These platforms are in the read path, not the write path. They receive data via ETL after the decisions have been made and the context has been lost. 
They can tell you what happened, but they cannot tell you why.Trying to retrofit decision trace capture onto these systems is like trying to understand a chess game by looking only at the final board position. You have lost the move-by-move history that contains all the strategic insight.The Real Race: Who Builds the “Event Clock” for Agents?So, who are the real contenders?            Contender      Strengths      Weaknesses                  Anthropic (The Agent Provider)      Owns the agent and the execution path. In the pole position to build persistence directly into their products.      Not their core business. May see it as a feature, not a platform. Risks vendor lock-in for customers.              Orchestration Startups      Focused on the cross-system workflow layer where context is richest. Can be vendor-neutral, orchestrating agents from multiple providers.      Need to convince customers to adopt a new layer in their stack. Dependent on agent providers for core capabilities.      This brings us to a critical distinction. The goal is not to simply monitor agents. Agent observability and telemetry tools are useful for capturing the what: metrics, logs, and traces of execution. They can tell you an agent made 10 API calls and wrote 3 files. But they cannot tell you why.A decision trace captures the reasoning, the context, and the precedents that led to an action. This is a fundamentally different and more valuable asset than telemetry. The trillion-dollar prize will go to whoever successfully builds the event clock for the agentic era: the system that captures the decision traces of every agent, human, and automated process in the enterprise. My bet is on a new category of company to emerge: one that is purpose-built to be this system of record for decisions.From “Code” and “Cowork” to “Context”Jaya Gupta was right. The opportunity is massive. But the winner will not be a better database or a smarter CRM. 
The winner will be the company that recognizes that the actions of agents like Claude Code and Cowork are not just outputs; they are assets. They are the building blocks of the enterprise’s collective intelligence.For Anthropic, the path seems clear. The next logical product in their suite is not just a new skill or a new integration. It is Claude Context: a platform that captures, stores, and makes sense of every decision trace their agents generate. It would transform their tools from powerful productivity aids into an indispensable system of record for the modern enterprise.Whether Anthropic seizes this opportunity or leaves the door open for a new wave of startups remains to be seen. But one thing is certain: the race to build the context graph is on, and the companies that are in the execution path of agentic work are the ones with the head start.References:[1] Gupta, J. (2025, December 22). AI’s trillion-dollar opportunity: Context graphs. Foundation Capital.[2] Nuñez, M. (2026, January 12). Anthropic launches Cowork, a Claude Desktop agent that works in your files — no coding required. VentureBeat.",
      "views": 809,
      "reading_minutes": 5,
      "tags": [
        
          
          {
            "name": "Context Graphs",
            "slug": "context-graphs",
            "url": "/tags/context-graphs/#posts"
          },
        
          
          {
            "name": "Agentic AI",
            "slug": "agentic-ai",
            "url": "/tags/agentic-ai/#posts"
          },
        
          
          {
            "name": "Enterprise AI",
            "slug": "enterprise-ai",
            "url": "/tags/enterprise-ai/#posts"
          },
        
          
          {
            "name": "Claude Code",
            "slug": "claude-code",
            "url": "/tags/claude-code/#posts"
          },
        
          
          {
            "name": "Claude Cowork",
            "slug": "claude-cowork",
            "url": "/tags/claude-cowork/#posts"
          },
        
          
          {
            "name": "AI Infrastructure",
            "slug": "ai-infrastructure",
            "url": "/tags/ai-infrastructure/#posts"
          },
        
          
          {
            "name": "Systems of Record",
            "slug": "systems-of-record",
            "url": "/tags/systems-of-record/#posts"
          },
        
          
          {
            "name": "Anthropic",
            "slug": "anthropic",
            "url": "/tags/anthropic/#posts"
          }
        
      ]
    },
  
    
    
      
    
      
    
      
    
      
    
      
    
      
        
        
    
    
    
    {
      "kind": "post",
      "title": "A Year with Cursor: How My Workflow Evolved from Agent to Architect",
      "url": "/2026/01/04/a-year-with-cursor-how-my-workflow-evolved-from-agent-to-architect/",
      "date_display": "January 4, 2026",
      "date_iso": "2026-01-04",
      "excerpt": "My journey with Cursor mirrors the maturation of the tool itself: from a simple agent to a sophisticated architectural partner. This post details how my workflow evolved through @ mentions, MCP, Plan Mode, and custom commands.",
      "content": "It’s been over a year since I made Cursor my primary IDE, and it’s hard to overstate the impact it’s had on my work. As a machine learning engineer building conversational AI platforms at Dylog and experimenting with agentic infrastructure on my personal projects, I’ve lived through the evolution of AI-native development. My journey with Cursor mirrors the maturation of the tool itself: from a simple agent to a sophisticated architectural partner.This post is a reflection on that journey, detailing how my workflow evolved and how I’ve come to rely on a powerful combination of Plan Mode, custom commands, and context engineering to build faster, smarter, and with more clarity.Phase 1: The Agent Takes the WheelWhen I first started, my usage was simple. I treated Cursor like a supercharged autocomplete. I’d write a comment, hit Cmd+K, and let the agent generate the code. It was magical, but it was also a black box. I was a passenger, and the agent was driving.Then came the @ mentions. This was my first taste of giving the agent real context. Instead of hoping it understood my codebase, I could explicitly tell it what to look at:  @file to reference a specific file  @folder to include an entire directory  @codebase to let it search across the whole project  @web to pull in external documentation  @docs to reference official docs for librariesThis was a huge leap. Suddenly, the agent wasn’t guessing; it was working with the same context I had. I could say “refactor this function to match the pattern in @file:utils/helpers.ts” and it would actually understand.The @ mention dropdown in Cursor, showing context options like @file, @folder, @codebase, @web, and @docs that allow explicit context controlBut even with better context, I’d often find myself in a loop of generating, debugging, and regenerating. 
The agent lacked the architectural vision for larger tasks. Phase 2: MCP Changes Everything. The introduction of the Model Context Protocol (MCP) was when things got serious. MCP allowed me to connect Cursor to external tools and data sources, turning the agent from a code generator into a true assistant with access to my entire workflow. I started integrating MCP servers for: GitHub, to pull issues and PRs directly into context; Linear, for task management; Slack, for team communication context; and custom MCP servers for internal APIs and databases. With MCP, I could say “implement the feature described in Linear issue #234” and the agent would fetch the issue, understand the requirements, and start building. It was no longer just about code; it was about connecting the dots across my entire development ecosystem. (Figure: the MCP configuration panel showing connected integrations like GitHub, Linear, Slack, and custom servers that extend Cursor’s capabilities across the development ecosystem.) Phase 3: The Rise of the Planner. The introduction of Plan Mode was the next game-changer. It was the first time I felt like I was collaborating with the AI, not just delegating to it. Inspired by workflows from developers like Ray Fernando, I started using a two-step process. (1) Plan with Opus: I’d use a powerful model like Claude Opus to generate a detailed, step-by-step implementation plan. I’d give it the high-level goal, and it would break it down into a series of concrete tasks, complete with file names, function signatures, and logic. (2) Execute with Sonnet/GPT: I’d then hand that plan to a faster, cheaper model like Sonnet or GPT-5.2 to execute each step. The cheaper model didn’t need to be a brilliant architect; it just needed to be a diligent builder. This workflow was a massive improvement. It separated the “what” from the “how,” and it gave me a reviewable artifact, the plan, that I could edit and approve before any code was written. 
It also saved a ton of money on tokens. (Figure: a split view with a detailed implementation plan in a .cursor/plans/ file on the left and the corresponding generated code on the right, demonstrating the separation of architecture from execution.) Phase 4: The Architect Emerges (Commands + Planning). This is where I live today. While Plan Mode is still central to my workflow, I’ve layered on a set of custom commands and rules to fine-tune the process and bake my architectural principles directly into the IDE. My Current Setup. Rules (.cursorrules): I have a set of rules that define my coding standards, preferred patterns, and architectural constraints. The agent reads these before every task, ensuring consistency across the codebase. Custom Commands: I’ve built commands that wrap my most common workflows: /plan generates a detailed implementation plan using Opus; /refactor takes a file and refactors it based on instructions; /test generates a test suite for a given function; and /review reviews code against my rules and suggests improvements. Queued Messages: I use Ctrl+Enter to queue follow-up instructions while the agent is working. This lets me think ahead and keep the momentum going without interrupting the current task. (Figure: the Cursor command palette showing custom commands like /plan, /refactor, /test, and /review, alongside a .cursorrules file that defines coding standards and architectural constraints.) The Evolution at a Glance (phase, key feature, what changed): Phase 1, Agent Mode + @ Mentions: context became explicit, not guessed. Phase 2, MCP Integration: external tools and data became accessible. Phase 3, Plan Mode: architecture separated from execution. Phase 4, Commands + Rules: workflows became repeatable and personalized. Why This Matters. This evolution from agent to architect is more than just a personal productivity hack. It’s a glimpse into the future of software development. 
We’re moving from a world where we write code to a world where we describe systems. Our job is to be the architect, to define the blueprint, and to let the agents do the building.Cursor, more than any other tool I’ve used, understands this shift. It’s not just about generating code; it’s about managing complexity, maintaining context, and giving developers the leverage to build at a scale that was previously unimaginable.If you’re still using AI as a simple code generator, I encourage you to explore @ mentions, MCP, Plan Mode, and custom commands. It’s a journey that will transform you from a developer who uses AI to an architect who directs it.",
      "views": 1255,
      "reading_minutes": 5,
      "tags": [
        
          
          {
            "name": "Cursor",
            "slug": "cursor",
            "url": "/tags/cursor/#posts"
          },
        
          
          {
            "name": "AI IDE",
            "slug": "ai-ide",
            "url": "/tags/ai-ide/#posts"
          },
        
          
          {
            "name": "MCP",
            "slug": "mcp",
            "url": "/tags/mcp/#posts"
          },
        
          
          {
            "name": "Developer Workflow",
            "slug": "developer-workflow",
            "url": "/tags/developer-workflow/#posts"
          },
        
          
          {
            "name": "AI Agents",
            "slug": "ai-agents",
            "url": "/tags/ai-agents/#posts"
          },
        
          
          {
            "name": "Plan Mode",
            "slug": "plan-mode",
            "url": "/tags/plan-mode/#posts"
          },
        
          
          {
            "name": "AI Productivity",
            "slug": "ai-productivity",
            "url": "/tags/ai-productivity/#posts"
          },
        
          
          {
            "name": "Agentic AI",
            "slug": "agentic-ai",
            "url": "/tags/agentic-ai/#posts"
          },
        
          
          {
            "name": "Developer Tools",
            "slug": "developer-tools",
            "url": "/tags/developer-tools/#posts"
          }
        
      ]
    },
  
    
    
      
    
      
    
      
        
        
    
    
    
    {
      "kind": "post",
      "title": "What Are Context Graphs, Really?",
      "url": "/2026/01/01/what-are-context-graphs-really/",
      "date_display": "January 1, 2026",
      "date_iso": "2026-01-01",
      "excerpt": "The conversation around context graphs has exploded, but the term itself has become a Rorschach test. This is not about adding memory to your agent—it's about rethinking our assumptions about data, time, and organizational knowledge. The Two Clocks Problem reveals why we're missing half of time in enterprise systems, and why this is fundamentally a representation problem, not a database problem.",
      "content": "Last week, I wrote about my reaction to Jaya Gupta’s viral post on Context Graphs [1]. The idea of a “system of record for decisions” resonated deeply, framing the evolution of agentic infrastructure from tools to skills to memory. But since then, the conversation has exploded, and it has become clear that the term “context graph” itself is a bit of a Rorschach test. Everyone sees something different. Animesh Koratana, founder of PlayerZero, has written a series of follow-up posts that cut through the noise and get to the heart of what a context graph actually is, and why it is so structurally hard to build [2] [3]. His insights are critical for anyone serious about building agentic AI in the enterprise. This is not about “adding memory to your agent” or wiring up a graph database. It is about rethinking our assumptions about data, time, and the nature of organizational knowledge. The Two Clocks Problem: Why We Are Missing Half of Time. Koratana’s most powerful insight is what he calls the Two Clocks Problem. We have built trillion-dollar infrastructure for the state clock: what is true right now. Your CRM stores the final deal value. Your ticketing system stores “resolved.” Your codebase stores the current state. But we have almost no infrastructure for the event clock: what happened, in what order, and with what reasoning. Git blame shows who changed the timeout from 5s to 30s, but the why is gone. The CRM says “closed lost,” but it does not say you were the second choice and the winner had one feature you are shipping next quarter. As Koratana puts it: “We’ve built trillion-dollar infrastructure for what’s true now. Almost nothing for why it became true.” This is the core of the problem. We are asking agents to exercise judgment without access to precedent. We are training lawyers on verdicts without case law. The context graph is the infrastructure for the event clock. 
It is the case law of the enterprise. The Five Coordinate Systems Problem: Why This Is Not a Database Problem. So why can’t we just build a better database? Because a context graph requires joins across five different coordinate systems that do not share keys: events (what happened?), timeline (when did it happen?), semantics (what does it mean?), attribution (who owned it?), and outcome (what did it cause?). Each of these has a different geometry. Timelines are linear. Events are sequential. Semantics live in vector space. Attribution is graph-structured. Outcomes are causal DAGs. And the keys are fluid. “Jaya Gupta” in an email, “J. Gupta” in a contract, and “@JayaGup10” in Slack are the same entity with no shared identifier. Traditional databases are built for joins on stable keys within a single coordinate system. Context graphs require probabilistic joins across all five simultaneously. This is not a database problem; it is a representation problem. Agents as Informed Walkers: How We Solve the Representation Problem. If the ontology of every organization is different and constantly changing, how can we ever hope to model it? Koratana’s answer is that we do not have to. The agents do it for us. When an agent works through a problem, its trajectory is a trace through the state space of the organization. It is an implicit map of the ontology, discovered through use rather than specified up front. This is the key insight from graph representation learning methods such as node2vec: you do not need to know the structure of a graph in advance to learn representations of it. You just need to walk it. Agents are informed walkers. Their trajectories are not random; they are problem-directed. By accumulating enough of these trajectories, we can learn embeddings that encode the structure of the organization. We can learn that two engineers who never interact are structurally equivalent because they play the same role in different subgraphs. 
We can learn that a certain sequence of events is a precursor to churn, even if those events have never been explicitly linked. What This Actually Means for Builders. So, what is a context graph, really? It is not a graph database. It is not a vector store. It is a learned representation of organizational reasoning, derived from the trajectories of agents solving problems. This has profound implications for how we build agentic systems. First, the agents are not building the context graph; they are solving problems worth solving. The context graph is an emergent property of their work. The focus should be on deploying agents into real workflows, not on building a perfect ontology up front. Second, the value is in the trajectories, not the state. We need to shift our focus from storing the final state to capturing the full, replayable history of how that state was reached. Third, this is a machine learning problem, not a data engineering problem. The goal is not to build a perfect data model, but to learn a representation that is useful for reasoning. Building a context graph is not about buying a new piece of software. It is about a fundamental shift in how we think about data, time, and the nature of work in the agentic era. It is about recognizing that the most valuable asset we have is not our data, but the accumulated wisdom of the decisions we make every day. And it is about building the infrastructure to finally capture that wisdom and put it to work. References: [1] Gupta, J. (2025, December 23). AI’s trillion-dollar opportunity: Context graphs. X. [2] Koratana, A. (2026, January 1). Why context graphs are rare in the wild. LinkedIn. [3] Koratana, A. (2025, December 28). How to build a context graph. LinkedIn.",
      "views": 2224,
      "reading_minutes": 4,
      "tags": [
        
          
          {
            "name": "Context Graphs",
            "slug": "context-graphs",
            "url": "/tags/context-graphs/#posts"
          },
        
          
          {
            "name": "Agentic AI",
            "slug": "agentic-ai",
            "url": "/tags/agentic-ai/#posts"
          },
        
          
          {
            "name": "Enterprise AI",
            "slug": "enterprise-ai",
            "url": "/tags/enterprise-ai/#posts"
          },
        
          
          {
            "name": "Two Clocks Problem",
            "slug": "two-clocks-problem",
            "url": "/tags/two-clocks-problem/#posts"
          },
        
          
          {
            "name": "Event Sourcing",
            "slug": "event-sourcing",
            "url": "/tags/event-sourcing/#posts"
          },
        
          
          {
            "name": "Graph Representation Learning",
            "slug": "graph-representation-learning",
            "url": "/tags/graph-representation-learning/#posts"
          },
        
          
          {
            "name": "AI Infrastructure",
            "slug": "ai-infrastructure",
            "url": "/tags/ai-infrastructure/#posts"
          },
        
          
          {
            "name": "Organizational Memory",
            "slug": "organizational-memory",
            "url": "/tags/organizational-memory/#posts"
          }
        
      ]
    },
  
    
    
      
    
      
    
      
    
      
        
        
    
    
    
    {
      "kind": "post",
      "title": "Context Graphs: My Thoughts on the Trillion Dollar Evolution of Agentic Infrastructure",
      "url": "/2025/12/26/context-graphs-my-thoughts-on-the-trillion-dollar-evolution-of-agentic-memory/",
      "date_display": "December 26, 2025",
      "date_iso": "2025-12-26",
      "excerpt": "After reading Jaya Gupta's post about Context Graphs, I have not been able to stop thinking about it. For me, it did something personal: it gave a name to the architectural pattern I have been circling around in the agentic infrastructure discussions on this blog for the past year. Gupta's thesis is simple but profound: the last generation of enterprise software created a trillion-dollar ecosystem by becoming systems of record. The question now is whether a new layer will emerge on top of them: a system of record for decisions.",
      "content": "After reading Jaya Gupta’s post about Context Graphs, I have not been able to stop thinking about it [1]. For me, it did something personal: it gave a name to the architectural pattern I have been circling around in the agentic infrastructure discussions on this blog for the past year. I later wrote a more direct explainer on what context graphs are and why the term means more than agent memory. Gupta’s thesis is simple but profound. The last generation of enterprise software (Salesforce, Workday, SAP) created a trillion-dollar ecosystem by becoming systems of record. Own the canonical data, own the workflow, own the lock-in. The question now is whether those systems survive the shift to agents. Gupta argues they will, but that a new layer will emerge on top of them: a system of record for decisions. I agree. And I think this is the missing piece that connects everything I have been writing about. The Missing Layer: Decision Traces. What resonated most with me was Gupta’s articulation of the decision trace. This is the context that currently lives in Slack threads, deal desk conversations, escalation calls, and people’s heads. It is the exception logic that says, “We always give healthcare companies an extra 10% because their procurement cycles are brutal.” It is the precedent from past decisions that says, “We structured a similar deal for Company X last quarter, we should be consistent.” None of this is captured in our systems of record. The CRM shows the final price, but not who approved the deviation or why. The support ticket says “escalated to Tier 3,” but not the cross-system synthesis that led to that decision. As Gupta puts it: “The reasoning connecting data to action was never treated as data in the first place.” This is the wall that every enterprise hits when it tries to scale agents. The wall is not missing data. 
It is missing decision traces. From Tools to Skills to Context: The Evolution I Have Been Documenting. Reading Gupta’s post, I realized that the evolution I have been documenting on this blog (from MCP to Agent Skills to governance) is really a story about building the infrastructure for context graphs. Let me explain. Phase 1 was about tools. The Model Context Protocol (MCP) gave agents the ability to interact with external systems. It was the plumbing that connected agents to databases, APIs, and the outside world. But we quickly learned that tool access alone is not enough. An agent with a hammer is not a carpenter. Phase 2 was about skills. Anthropic’s Agent Skills standard gave us a way to codify procedural knowledge, the “how-to” guides that teach agents to use tools effectively. Skills are the brain of the agent. They turn tribal knowledge into portable, composable assets. But even skills are not enough. An agent with a hammer and a carpentry manual is still not a master carpenter. Phase 3 is about context. This is where context graphs come in. A context graph is the accumulated record of every decision, every exception, and every outcome. It answers the question, “What happened last time?” It turns exceptions into precedents and tribal knowledge into institutional knowledge. In summary (phase, primitive, what it provides, my analogy): Phase 1, Tools (MCP), capability: the agent has a hammer. Phase 2, Skills (Agent Skills), expertise: the agent has a carpentry manual. Phase 3, Context (Context Graphs), experience: the agent has access to the record of every house it has ever built. Why This Matters for the Governance Stack. The governance stack I have been advocating for (agent registries, tool registries, skill registries, policy engines) is the infrastructure that makes context graphs possible. The agent registry provides the identity of the agent making the decision. 
The tool registry (MCP) provides the capabilities available to that agent. The skill registry provides the expertise that guides the agent’s actions. And the orchestration layer is where the decision trace is captured and persisted. Without this infrastructure, decision traces are ephemeral. They exist for a moment in the agent’s context window and then disappear. With this infrastructure, every decision becomes a durable artifact that can be audited, learned from, and used as precedent. My Takeaway. Gupta is right that agent-first startups have a structural advantage here. They sit in the execution path. They see the full context at decision time. Incumbents, built on current-state storage, simply cannot capture this. But the bigger insight for me is this: we are not just building agents. We are building the decision record of the enterprise. The context graph is not a feature; it is the foundation of a new kind of system of record. The enterprises that win in the agentic era will be those that recognize this and invest in the infrastructure to capture, store, and leverage their decision traces. We started by giving agents tools. Then we taught them skills. Now, we must give them context. That is the trillion-dollar evolution. References: [1] Gupta, J. (2025, December 23). AI’s trillion-dollar opportunity: Context graphs. X.",
      "views": 1716,
      "reading_minutes": 4,
      "tags": [
        
          
          {
            "name": "Context Graphs",
            "slug": "context-graphs",
            "url": "/tags/context-graphs/#posts"
          },
        
          
          {
            "name": "Agentic AI",
            "slug": "agentic-ai",
            "url": "/tags/agentic-ai/#posts"
          },
        
          
          {
            "name": "Enterprise AI",
            "slug": "enterprise-ai",
            "url": "/tags/enterprise-ai/#posts"
          },
        
          
          {
            "name": "MCP",
            "slug": "mcp",
            "url": "/tags/mcp/#posts"
          },
        
          
          {
            "name": "Agent Skills",
            "slug": "agent-skills",
            "url": "/tags/agent-skills/#posts"
          },
        
          
          {
            "name": "AI Infrastructure",
            "slug": "ai-infrastructure",
            "url": "/tags/ai-infrastructure/#posts"
          },
        
          
          {
            "name": "Decision Traces",
            "slug": "decision-traces",
            "url": "/tags/decision-traces/#posts"
          },
        
          
          {
            "name": "AI Governance",
            "slug": "ai-governance",
            "url": "/tags/ai-governance/#posts"
          },
        
          
          {
            "name": "Systems of Record",
            "slug": "systems-of-record",
            "url": "/tags/systems-of-record/#posts"
          }
        
      ]
    },
  
    
    
      
    
      
    
      
    
      
    
      
    
      
    
      
        
        
    
    
    
    {
      "kind": "post",
      "title": "2025: The Year Agentic AI Got Real (What Comes Next)",
      "url": "/2025/12/23/2025-the-year-agentic-ai-got-real-and-what-comes-next/",
      "date_display": "December 23, 2025",
      "date_iso": "2025-12-23",
      "excerpt": "If 2024 was the year of AI experimentation, 2025 was the year of industrialization. The speculative boom around generative AI has rapidly matured into the fastest-scaling software category in history, with autonomous agents moving from the lab to the core of enterprise operations.",
      "content": "If 2024 was the year of AI experimentation, 2025 was the year of industrialization. The speculative boom around generative AI has rapidly matured into the fastest-scaling software category in history, with autonomous agents moving from the lab to the core of enterprise operations. As we close out the year, it’s clear that the agentic AI landscape has been fundamentally reshaped by massive investment, critical standardization, and a clear-eyed focus on solving the hard problems of production readiness. But this wasn’t just a story of adoption. 2025 was the year the industry confronted the architectural limitations of monolithic agents and began a decisive shift toward a more specialized, scalable, and governable future. The $37 Billion Build-Out: From Experiment to Enterprise Imperative. The most telling sign of this shift is the sheer volume of capital deployed. According to a December 2025 report from Menlo Ventures, enterprise spending on generative AI skyrocketed to $37 billion in 2025, a stunning 3.2x increase from the previous year [1]. This surge now accounts for over 6% of the entire global software market. Crucially, over half of this spending ($19 billion) flowed directly into the application layer, demonstrating a clear enterprise priority for immediate productivity gains over long-term infrastructure bets. Adoption data points the same way: a May 2025 PwC survey found that 79% of surveyed executives said AI agents were already being adopted in their companies [2]. (Figure source: Menlo Ventures, 2025: The State of Generative AI in the Enterprise [1].) Solving the Interoperability Crisis: The Standardization of 2025. While the spending boom captured headlines, a quieter, more profound revolution was taking place in the infrastructure layer. The primary challenge addressed in 2025 was the interoperability crisis. 
The early agentic ecosystem was a chaotic landscape of proprietary APIs and fragmented toolsets, making it nearly impossible to build robust, cross-platform applications. This year, two key developments brought order to that chaos. 1. The Maturation of MCP. The Model Context Protocol (MCP), introduced in late 2024, became the de facto standard for agent-to-tool communication. Its first anniversary in November 2025 was marked by a major spec release that introduced critical enterprise features like asynchronous operations, server identity, and a formal extensions framework, directly addressing early complaints about its production readiness [3]. This culminated in the December 9th announcement that Anthropic was donating MCP to the Agentic AI Foundation (AAIF), a newly formed foundation under the Linux Foundation co-founded with Block and OpenAI [4]. With over 10,000 active public MCP servers and 97 million monthly SDK downloads, MCP’s transition to a neutral, community-driven standard solidifies its role as the foundational protocol for the agentic economy. (Figure: the shift from fragmented, proprietary APIs to a unified, MCP-based approach simplifies agent-tool integration.) 2. The Dawn of Portable Skills. Following the same playbook, Anthropic made another pivotal move on December 18th, opening up its Agent Skills specification [5]. This provides a standardized, portable way to equip agents with procedural knowledge, moving beyond simple tool use to more complex, multi-step task execution. By making the specification and SDK available to all, Anthropic is fostering an ecosystem where skills can be developed, shared, and deployed across any compliant AI platform, reducing the risk of vendor lock-in. The Next Frontier: The Rise of the Agent Workforce. These standardization efforts have unlocked the next major architectural shift: the move away from monolithic, general-purpose agents toward collections of specialized skills that function like a human team. 
No company hires a single “super-employee” to be a marketer, an engineer, and a financial analyst. They hire specialists who excel at their roles and collaborate to achieve a larger goal. The future of enterprise AI is the same. This “multi-agent” or “skill-based” architecture is not just a theoretical concept. Anthropic’s own research showed that a multi-agent system, with a Claude Opus 4 lead agent coordinating Claude Sonnet 4 sub-agents, outperformed a single Claude Opus 4 agent by 90.2% on Anthropic’s internal research evaluation [6]. The reason is simple: specialization allows for greater accuracy, and parallelism allows for greater scale. We are already seeing the first wave of companies built on this philosophy. YC-backed Getden.io, for example, provides a platform for non-engineers to build and collaborate with agents that can be composed of various skills and integrations [7]. This approach democratizes agent creation, allowing domain experts, not just developers, to build the specialized “digital employees” they need. The Challenges of 2026: From Adoption to Governance. While 2025 solved the problem of connection, 2026 will be about solving the challenges of control and coordination at scale. As enterprises move from deploying dozens of agents to thousands of skills, a new set of problems comes into focus. Governance at Scale: How do you manage access control, cost, and versioning for thousands of interconnected skills? The risk of “skill sprawl” and shadow AI is immense, demanding a new generation of governance platforms. Reliability and Predictability: The non-deterministic nature of LLMs remains a major barrier to enterprise trust. For agents to run mission-critical processes, we need robust testing frameworks, better observability tools, and architectural patterns that ensure predictable outcomes. Multi-Agent Orchestration: As skill-based systems become the norm, the primary challenge shifts from tool use to agent coordination. 
How do you manage dependencies, resolve conflicts, and ensure a team of agents can reliably collaborate to complete a complex workflow? This is a frontier problem that will define the next generation of agentic platforms. Security in a Composable World: A world of interoperable skills creates new attack surfaces. How do you secure the supply chain for third-party skills? How do you prevent a compromised agent from triggering a cascade of failures across a complex workflow? The security model for agentic AI is still in its infancy. The groundwork laid in 2025 was monumental. It moved us from a world of isolated, experimental bots to the brink of a true agentic economy. But the journey is far from over. The companies that will win in 2026 and beyond will be those that master the art of building, managing, and securing not just agents, but entire workforces of specialized, collaborative skills. References: [1] Menlo Ventures. (2025, December 9). 2025: The State of Generative AI in the Enterprise. Menlo Ventures. [2] PwC. (2025, May 16). PwC’s AI Agent Survey. PwC. [3] Model Context Protocol. (2025, November 25). One Year of MCP: November 2025 Spec Release. Model Context Protocol Blog. [4] Anthropic. (2025, December 9). Donating the Model Context Protocol and establishing the Agentic AI Foundation. Anthropic. [5] VentureBeat. (2025, December 18). Anthropic launches enterprise ‘Agent Skills’ and opens the standard. VentureBeat. [6] Anthropic. (2025, June 13). How we built our multi-agent research system. Anthropic Engineering. [7] Y Combinator. (2025). Den: Cursor for knowledge workers. Y Combinator.",
      "views": 947,
      "reading_minutes": 5,
      "tags": [
        
          
          {
            "name": "Agentic AI",
            "slug": "agentic-ai",
            "url": "/tags/agentic-ai/#posts"
          },
        
          
          {
            "name": "Enterprise AI",
            "slug": "enterprise-ai",
            "url": "/tags/enterprise-ai/#posts"
          },
        
          
          {
            "name": "MCP",
            "slug": "mcp",
            "url": "/tags/mcp/#posts"
          },
        
          
          {
            "name": "Agent Skills",
            "slug": "agent-skills",
            "url": "/tags/agent-skills/#posts"
          },
        
          
          {
            "name": "AI Agents",
            "slug": "ai-agents",
            "url": "/tags/ai-agents/#posts"
          },
        
          
          {
            "name": "AI Infrastructure",
            "slug": "ai-infrastructure",
            "url": "/tags/ai-infrastructure/#posts"
          },
        
          
          {
            "name": "Multi-Agent Systems",
            "slug": "multi-agent-systems",
            "url": "/tags/multi-agent-systems/#posts"
          },
        
          
          {
            "name": "AI Governance",
            "slug": "ai-governance",
            "url": "/tags/ai-governance/#posts"
          },
        
          
          {
            "name": "Open Standards",
            "slug": "open-standards",
            "url": "/tags/open-standards/#posts"
          },
        
          
          {
            "name": "2025 Review",
            "slug": "2025-review",
            "url": "/tags/2025-review/#posts"
          }
        
      ]
    },
  
    
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
        
        
    
    
    
    {
      "kind": "post",
      "title": "Agent Skills: The Missing Piece of the Enterprise AI Puzzle",
      "url": "/2025/12/18/agent-skills-the-missing-piece-of-the-enterprise-ai-puzzle/",
      "date_display": "December 18, 2025",
      "date_iso": "2025-12-18",
      "excerpt": "The enterprise AI landscape is at a critical juncture. We have powerful general-purpose models and a growing ecosystem of tools. But we are missing a crucial piece: a standardized, portable way to equip agents with the procedural knowledge and organizational context they need to perform real work.",
      "content": "The enterprise AI landscape is at a critical juncture. We have powerful general-purpose models and a growing ecosystem of tools. But we are missing a crucial piece of the puzzle: a standardized, portable way to equip agents with the procedural knowledge and organizational context they need to perform real work. On December 18, 2025, Anthropic took a major step towards solving this problem by releasing Agent Skills as an open standard [1]. This move, following the same playbook that made the Model Context Protocol (MCP) an industry-wide success, is not just another feature release—it is a fundamental shift in how we will build and manage agentic workforces. The Problem: General Intelligence Isn’t Enough. General-purpose agents like Claude are incredibly capable, but they lack the specialized expertise required for most enterprise tasks. As Anthropic puts it, “real work requires procedural knowledge and organizational context” [2]. An agent might know what a pull request is, but it doesn’t know your company’s specific code review process. It might understand financial concepts, but it doesn’t know your team’s quarterly reporting workflow. This gap between general intelligence and specialized execution is the primary barrier to scaling agentic AI in the enterprise. Until now, the solution has been to build fragmented, custom-designed agents for each use case. This creates a landscape of “shadow AI”—siloed, unmanageable, and impossible to govern. What we need is a way to make expertise composable, portable, and discoverable. This is exactly what Agent Skills are designed to do. The Solution: Codified Expertise as a Standard. At its core, an Agent Skill is a directory containing a SKILL.md file and optional subdirectories for scripts, references, and assets. It is, as Anthropic describes it, “an onboarding guide for a new hire” [2]. The SKILL.md file contains instructions, examples, and best practices that teach an agent how to perform a specific task. 
The key innovation is progressive disclosure, a three-level system for managing context efficiently:  Metadata: At startup, the agent loads only the name and description of each installed skill. This provides just enough information for the agent to know when a skill might be relevant, without flooding its context window.  Instructions: When a skill is triggered, the agent loads the full SKILL.md body. This gives the agent the core instructions it needs to perform the task.  Resources: If the task requires more detail, the agent can dynamically load additional files from the skill’s scripts/, references/, or assets/ directories. This allows skills to contain a virtually unbounded amount of context, loaded only as needed. This architecture is both simple and profound. It allows us to package complex procedural knowledge into a standardized, shareable format. It eases context-window pressure by making context dynamic and on-demand. And by making it an open standard, Anthropic is ensuring that this expertise is portable across any compliant agent platform. By component (purpose; context usage): Metadata (name, description): skill discovery; minimal, loaded at startup. Instructions (SKILL.md body): core task guidance; on demand, loaded when the skill is activated. Resources (scripts/, references/): detailed context and tools; on demand, loaded as needed. Skills vs. MCP: The Brain and the Plumbing. It is crucial to understand how Agent Skills relate to the Model Context Protocol (MCP). They are not competing standards; they are complementary layers of the agentic stack. As Simon Willison aptly puts it, “MCP provides the ‘plumbing’ for tool access, while agent skills provide the ‘brain’ or procedural memory for how to use those tools effectively” [3].  MCP tells an agent what tools are available. It is the protocol that connects agents to databases, APIs, and other external systems.  
Agent Skills teach an agent how to use those tools. They provide the procedural knowledge, best practices, and organizational context required to perform complex, multi-step tasks. For example, MCP might give an agent access to a git tool. An Agent Skill would teach that agent your team’s specific git branching strategy, pull request template, and code review checklist. One provides the capability; the other provides the expertise. You need both to build a truly effective agentic workforce. Why an Open Standard Matters for the Enterprise. By releasing Agent Skills as an open standard, Anthropic is making a strategic bet on interoperability and ecosystem growth. This move has several critical implications for the enterprise:  It Prevents Vendor Lock-In: An open standard for skills means that the expertise you codify is not tied to a single agent platform. You can build a library of skills for your organization and deploy them across any compliant agent, whether it’s from Anthropic, OpenAI, or an open-source provider.  It Creates a Marketplace for Expertise: We are likely to see a marketplace for pre-built skills emerge, both open-source and commercial. This would allow organizations to acquire specialized capabilities without having to build them from scratch.  It Accelerates Adoption: A standardized format for skills makes it easier for developers to get started and for organizations to share best practices. This should accelerate the adoption of agentic AI and drive the development of more sophisticated, multi-agent workflows. The Road Ahead: Governance and the Ecosystem. The Agent Skills specification is, as Simon Willison notes, “deliciously tiny” and “quite heavily under-specified” [3]. This is a feature, not a bug. It provides a flexible foundation that the community can build upon. 
We can expect to see the specification evolve as it is adopted by more platforms and as best practices emerge. However, the power of skills—especially their ability to execute code—also introduces new governance challenges. Organizations will need to establish clear processes for auditing, testing, and deploying skills from trusted sources. We will need skill registries to manage the discovery and distribution of skills, and policy engines to control which agents can use which skills in which contexts. These are the next frontiers in agentic infrastructure. Agent Skills are not just a new feature; they are a new architectural primitive for the agentic era. They provide the missing link between general intelligence and specialized execution. By making expertise composable, portable, and standardized, Agent Skills will unlock the next wave of innovation in enterprise AI. The race is no longer just about building the most powerful models; it is about building the most capable and knowledgeable agentic workforce. References: [1] Anthropic. (2025, December 18). Agent Skills. Agent Skills. [2] Anthropic. (2025, October 16). Equipping agents for the real world with Agent Skills. Anthropic Blog. [3] Willison, S. (2025, December 19). Agent Skills. Simon Willison’s Weblog.",
      "views": 861,
      "reading_minutes": 5,
      "tags": [
        
          
          {
            "name": "AI Agents",
            "slug": "ai-agents",
            "url": "/tags/ai-agents/#posts"
          },
        
          
          {
            "name": "Agent Skills",
            "slug": "agent-skills",
            "url": "/tags/agent-skills/#posts"
          },
        
          
          {
            "name": "Enterprise AI",
            "slug": "enterprise-ai",
            "url": "/tags/enterprise-ai/#posts"
          },
        
          
          {
            "name": "Anthropic",
            "slug": "anthropic",
            "url": "/tags/anthropic/#posts"
          },
        
          
          {
            "name": "MCP",
            "slug": "mcp",
            "url": "/tags/mcp/#posts"
          },
        
          
          {
            "name": "Agentic AI",
            "slug": "agentic-ai",
            "url": "/tags/agentic-ai/#posts"
          },
        
          
          {
            "name": "AI Governance",
            "slug": "ai-governance",
            "url": "/tags/ai-governance/#posts"
          },
        
          
          {
            "name": "Open Standards",
            "slug": "open-standards",
            "url": "/tags/open-standards/#posts"
          },
        
          
          {
            "name": "AI Infrastructure",
            "slug": "ai-infrastructure",
            "url": "/tags/ai-infrastructure/#posts"
          },
        
          
          {
            "name": "Agent Architecture",
            "slug": "agent-architecture",
            "url": "/tags/agent-architecture/#posts"
          }
        
      ]
    },
  
    
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
        
        
    
    
    
    {
      "kind": "post",
      "title": "From Boom to Build-Out: The State of Enterprise AI in 2026",
      "url": "/2025/12/10/from-boom-to-build-out-the-state-of-enterprise-ai-in-2026/",
      "date_display": "December 10, 2025",
      "date_iso": "2025-12-10",
      "excerpt": "The era of AI experimentation is over. What began as a speculative boom has rapidly industrialized into the fastest-scaling software category in history. Enterprise spending on generative AI skyrocketed to $37 billion in 2025, a stunning 3.2x increase from the previous year.",
      "content": "The era of AI experimentation is over. What began as a speculative boom has rapidly industrialized into the fastest-scaling software category in history. According to a new report from Menlo Ventures, enterprise spending on generative AI skyrocketed to $37 billion in 2025, a stunning 3.2x increase from the previous year [3]. This isn’t just hype; it’s a fundamental market shift. AI now commands 6% of the entire global SaaS market—a milestone reached in just three years [3]. This explosive growth signals a new phase of enterprise adoption. The conversation has moved beyond simple chatbots and one-off tasks to focus on building durable, agentic infrastructure. Reports from OpenAI, Anthropic, and Menlo Ventures all point to the same conclusion: the battleground for competitive advantage has shifted from model performance to platform execution. The Money Flows to Applications, and Enterprises are Buying. So, where is this money going? Over half of all enterprise AI spend ($19 billion) is flowing directly into the application layer [3]. This indicates a clear preference for immediate productivity gains over long-term, in-house infrastructure projects. The “buy vs. build” debate has decisively tilted towards buying, with 76% of AI use cases now being purchased from vendors, a dramatic reversal from 2024, when the split was nearly even [3]. This trend is fueled by two factors: AI solutions are converting at nearly double the rate of traditional SaaS (47% vs. 25%), and product-led growth (PLG) is driving adoption at 4x the rate of traditional software [3]. Individual employees and teams are adopting AI tools, proving their value, and creating a powerful bottom-up flywheel that short-circuits legacy procurement cycles. The Architectural Shift: From Queries to Agentic Workflows. This rapid adoption is not just about doing old tasks faster; it’s about enabling entirely new ways of working. 
The data shows a clear architectural shift from simple, conversational queries to structured, agentic workflows that are deeply embedded in core business processes. Anthropic’s 2026 survey reveals that 57% of organizations are already deploying agents for multi-stage processes, with 81% planning to tackle even more complex, cross-functional workflows in the coming year [1]. This transition from single-turn interactions to persistent, multi-step agents is where true business transformation is happening. OpenAI’s 2025 report highlights a 19x year-to-date increase in the use of structured workflows like Custom GPTs and Projects, with 20% of all enterprise messages now being processed through these repeatable systems [2]. The impact is tangible, with 80% of organizations reporting measurable ROI on their agent investments and workers saving an average of 40-60 minutes per day [1, 2]. Perhaps most striking is that 75% of workers report being able to complete tasks they previously could not perform, including programming support, spreadsheet analysis, and technical tool development [2]. This democratization of technical capabilities is fundamentally reshaping how work gets done. Coding Leads the Charge. Nearly all organizations (90%) now use AI to assist with development, and 86% deploy agents for production code [1]. The adoption is so pervasive that coding-related messages have increased by 36% even among non-technical workers [2]. Organizations report time savings across the entire development lifecycle: planning and ideation (58%), code generation (59%), documentation (59%), and code review and testing (59%) [1]. 
This systematic integration across the full software development lifecycle is accelerating delivery timelines and freeing developers to focus on higher-value architectural and problem-solving work. The New Frontier: Platform-Level Execution. As AI becomes an essential, intelligent layer of the enterprise tech stack, the primary barriers to scaling are no longer model capabilities but organizational and architectural readiness. The top challenges cited by leaders are integration with existing systems (46%), data access and quality (42%), and change management (39%) [1]. These are not model problems; they are platform problems. This new reality is creating a widening performance gap. OpenAI’s data shows that “frontier firms” that treat AI as integrated infrastructure see 2x more engagement per seat, and their workers are 6x more active than the median [2]. Technology, healthcare, and manufacturing are seeing the fastest growth (11x, 8x, and 7x respectively), while professional services and finance operate at the largest scale [2]. The state of enterprise AI in 2026 is clear: the gold rush is over, and the era of building the railroads has begun. Success is no longer defined by having the best model, but by having the best platform to deploy, manage, and secure intelligence at scale. References: [1] Anthropic. (2025). The 2026 State of AI Agents Report. Anthropic. [2] OpenAI. (2025). The state of enterprise AI 2025 report. OpenAI. [3] Menlo Ventures. (2025, December 9). 2025: The State of Generative AI in the Enterprise. Menlo Ventures.",
      "views": 284,
      "reading_minutes": 4,
      "tags": [
        
          
          {
            "name": "Enterprise AI",
            "slug": "enterprise-ai",
            "url": "/tags/enterprise-ai/#posts"
          },
        
          
          {
            "name": "AI Agents",
            "slug": "ai-agents",
            "url": "/tags/ai-agents/#posts"
          },
        
          
          {
            "name": "Agentic Workflows",
            "slug": "agentic-workflows",
            "url": "/tags/agentic-workflows/#posts"
          },
        
          
          {
            "name": "AI Adoption",
            "slug": "ai-adoption",
            "url": "/tags/ai-adoption/#posts"
          },
        
          
          {
            "name": "Platform Strategy",
            "slug": "platform-strategy",
            "url": "/tags/platform-strategy/#posts"
          },
        
          
          {
            "name": "Developer Tools",
            "slug": "developer-tools",
            "url": "/tags/developer-tools/#posts"
          },
        
          
          {
            "name": "AI Infrastructure",
            "slug": "ai-infrastructure",
            "url": "/tags/ai-infrastructure/#posts"
          },
        
          
          {
            "name": "Generative AI",
            "slug": "generative-ai",
            "url": "/tags/generative-ai/#posts"
          },
        
          
          {
            "name": "Enterprise Software",
            "slug": "enterprise-software",
            "url": "/tags/enterprise-software/#posts"
          }
        
      ]
    },
  
    
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
        
        
    
    
    
    {
      "kind": "post",
      "title": "The Three-Platform Problem in Enterprise AI",
      "url": "/2025/12/07/the-three-platform-problem-in-enterprise-ai/",
      "date_display": "December 7, 2025",
      "date_iso": "2025-12-07",
      "excerpt": "Enterprise AI has a platform problem. The tools to build AI-powered applications exist, but they're scattered across three disconnected ecosystems—each solving part of the puzzle, none providing a complete solution. This isn't a 'too many choices' problem. It's an architectural one.",
      "content": "Enterprise AI has a platform problem. The tools to build AI-powered applications exist, but they’re scattered across three disconnected ecosystems—each solving part of the puzzle, none providing a complete solution. This isn’t a “too many choices” problem. It’s an architectural one. Gartner tracks these ecosystems in separate Magic Quadrants because they serve fundamentally different users with different needs. But building production AI applications requires capabilities from all three. Three Ecosystems, Zero Integration. 1. Low-Code Platforms (The Citizen Developer). Platforms like Microsoft Power Apps, Mendix, and OutSystems let business users build applications quickly without writing code. They excel at UI, rapid prototyping, and workflow automation. [Figure: Gartner Magic Quadrant for Enterprise Low-Code Application Platforms] What they do well: Speed to prototype, accessibility for non-developers, business process automation. What they lack: Infrastructure control, enterprise governance at scale, and the flexibility professional developers need. 2. DevOps Platforms (The Professional Developer). GitLab, Microsoft Azure DevOps, and Atlassian provide CI/CD pipelines, source control, and deployment infrastructure. They answer the “how do we ship and operate this reliably?” question. [Figure: Gartner Magic Quadrant for DevOps Platforms] What they do well: Security, governance, testing, deployment automation, operational excellence. What they lack: They don’t help you build faster—they help you ship what you’ve already built. 3. AI/ML Platforms (The AI Specialist). Cloud providers (AWS, GCP, Azure) and specialized vendors offer models, MLOps tooling, and inference infrastructure. 
They provide the intelligence layer. [Figure: Gartner Magic Quadrant for AI Code Assistants] What they do well: Model access, training infrastructure, inference at scale. What they lack: An opinion on how you actually build and deploy applications around those models. The Cost of Fragmentation. When your AI strategy requires stitching together leaders from three separate ecosystems, you pay an integration tax: Workflow disconnects. A business user prototypes an AI workflow in a low-code tool. A developer rebuilds it from scratch to meet security requirements. The prototype and production system share nothing but a spec document. Observability gaps. Tracing a user request through a low-code UI, into a DevOps pipeline, through an AI model call, and back is nearly impossible without custom instrumentation. Governance drift. Security policies enforced in your DevOps platform don’t automatically apply to your low-code environment. Compliance becomes a manual audit. Your most capable engineers end up writing glue code instead of building products. A Different Architecture: API-First Unification. The solution isn’t better integrations—it’s platforms built on a different architecture. Replit offers a useful case study. They’ve grown from $10M to $100M ARR in under six months by building a platform where:  The same infrastructure serves both citizen developers and professionals. A business user building through natural language (“create a customer feedback dashboard”) and a developer writing code are using the same underlying APIs, the same deployment system, the same security model.  AI is native, not bolted on. Their Agent can build, test, and deploy complete applications autonomously—but it’s using the same environment a professional developer would use. No “export to production” step.  Governance applies universally. Database access, API key management, and deployment policies are platform-level concerns. They apply whether you’re prompting an AI agent or writing TypeScript.  
This is the “headless-first” pattern that companies like Stripe and Twilio proved out: build the API, make it excellent, then layer interfaces on top. The UI for non-developers and the API for developers are just different clients of the same system. What This Means for Platform Strategy. If you’re evaluating AI platforms, the question isn’t “which low-code tool, which DevOps platform, and which AI vendor?” The better question: Does this platform unify these concerns, or will we be writing integration code for the next three years? Look for:  API-first architecture. Can professional developers access everything through APIs? Is the UI built on those same APIs?  Built-in deployment and operations. Does prototyping in the platform give you production-ready infrastructure, or does it give you an export button and a prayer?  Platform-level governance. Are security, compliance, and cost controls configured once and inherited everywhere, or are they per-tool?  The platforms winning in this space aren’t the ones with the longest feature lists. They’re the ones that recognized the three-ecosystem problem and architected around it from day one.",
      "views": 161,
      "reading_minutes": 3,
      "tags": [
        
          
          {
            "name": "AI Platform",
            "slug": "ai-platform",
            "url": "/tags/ai-platform/#posts"
          },
        
          
          {
            "name": "Enterprise AI",
            "slug": "enterprise-ai",
            "url": "/tags/enterprise-ai/#posts"
          },
        
          
          {
            "name": "Low-Code",
            "slug": "low-code",
            "url": "/tags/low-code/#posts"
          },
        
          
          {
            "name": "DevOps",
            "slug": "devops",
            "url": "/tags/devops/#posts"
          },
        
          
          {
            "name": "Platform Architecture",
            "slug": "platform-architecture",
            "url": "/tags/platform-architecture/#posts"
          },
        
          
          {
            "name": "API-First",
            "slug": "api-first",
            "url": "/tags/api-first/#posts"
          },
        
          
          {
            "name": "Infrastructure",
            "slug": "infrastructure",
            "url": "/tags/infrastructure/#posts"
          },
        
          
          {
            "name": "Developer Tools",
            "slug": "developer-tools",
            "url": "/tags/developer-tools/#posts"
          },
        
          
          {
            "name": "Platform Strategy",
            "slug": "platform-strategy",
            "url": "/tags/platform-strategy/#posts"
          }
        
      ]
    },
  
    
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
        
        
    
    
    
    {
      "kind": "post",
      "title": "The Platform Convergence: Why the Future of AI SaaS is Headless-First",
      "url": "/2025/12/02/the-platform-convergence-why-the-future-of-ai-saas-is-headless-first/",
      "date_display": "December 2, 2025",
      "date_iso": "2025-12-02",
      "excerpt": "The AI agent market is fragmenting into two incomplete categories: Agent Builders that democratize creation but lack governance, and AI Gateways that provide control but slow innovation. Drawing lessons from Stripe and Twilio, the future belongs to unified, headless-first platforms that combine intuitive interfaces with programmable infrastructure.",
      "content": "The AI agent market is experiencing its own big bang—but this rapid expansion is creating fundamental fragmentation. Enterprises deploying agents at scale are caught between two incomplete solutions: Agent Builders and AI Gateways. Agent Builders democratize creation through no-code interfaces. AI Gateways provide enterprise governance over costs, security, and compliance. Both are critical, but in their current separate forms, they force a false choice: speed or control? The reality is, you need both. We’ve seen this movie before. The most successful developer platforms—Stripe, Twilio, Shopify—aren’t just slick UIs or robust infrastructure. They are headless-first platforms that masterfully combine both. The Headless-First Model. Stripe didn’t win payments by offering a payment form. Twilio didn’t win communications by providing a dashboard. They won by providing a powerful, programmable foundation with APIs as the primary interface. Their UIs are built on the same public APIs their customers use. Everything is composable, programmable, and extensible. Principles and their benefits: API-First Design (the platform’s own UI uses the public APIs, ensuring completeness); Progressive Complexity (start with the no-code UI, graduate to the API without migration); Composability (every capability is a building block for higher-level abstractions); Extensibility (third parties build on the platform, creating ecosystem effects). This is the blueprint for AI platforms: not just a UI for building agents, nor just a gateway for traffic—but a comprehensive, programmable platform for building, running, and governing AI at every layer. The Two Incomplete Categories. Agent Builders (Microsoft Copilot Studio, Google Agent Builder) empower non-technical users to create agents in minutes. The problem arises at scale: Who manages API keys? Who tracks costs? Who ensures compliance? 
This democratization often creates ungoverned “shadow IT”—business units spinning up agents independently, each with its own credentials and error handling. Platform teams discover the proliferation only when something breaks. AI Gateways (Kong, Apigee) solve the governance problem with centralized security, cost monitoring, and compliance. But a gateway is just plumbing—it doesn’t accelerate creation. Business users wait in IT queues while engineers build what they need. Innovation slows to a crawl. Integrating both categories creates its own integration tax: two authentication systems, two deployment processes, broken observability across disconnected logs, and policy enforcement gaps where builder retry logic conflicts with gateway rate limits. The Platform Convergence. The solution is a unified, headless-first platform with four integrated layers: Layer 1: UI Layer — Intuitive no-code agent builder for business users, built on top of the platform’s own APIs. Natural language definition, visual workflow design, one-click deployment with inherited governance. Layer 2: Runtime Layer — Enterprise-grade gateway that every agent runs through automatically. Centralized auth (OAuth, OIDC, SAML), real-time policy enforcement, distributed tracing, cost tracking, anomaly detection. Layer 3: Platform Layer — Comprehensive APIs and SDKs for developers. REST/GraphQL endpoints, language-specific SDKs, agent lifecycle management, webhook system for event-driven architectures. Layer 4: Ecosystem Layer — Marketplace for discovering and sharing agents, tools, and integrations. 
Internal registry, reusable components, version control, usage analytics. Speed AND Control. The difference between fragmented and unified approaches (fragmented tools vs. unified platform): Agent creation: separate builder vs. integrated no-code + API/SDK. Infrastructure: separate gateway vs. built-in gateway with inherited policies. Observability: disconnected logs vs. end-to-end unified tracing. Policy management: manual coordination vs. single policy engine. Developer experience: high friction vs. single, cohesive API surface. Audit and compliance: cross-system correlation vs. native audit trails. With a unified platform: a business user creates an agent in the UI → the platform applies policies automatically → the agent deploys with full observability → the platform team monitors centrally → a developer extends it via the API without migration. What This Unlocks. Self-Service AI: HR builds a resume screening agent in 20 minutes. It inherits security policies automatically. Costs are allocated to HR’s budget. A compliance trail is generated without extra work. AI-Powered Products: Engineers embed agent capabilities into customer-facing apps using platform APIs. Multi-tenant isolation, usage-based billing, and governance come built-in. Internal Marketplace: Marketing’s “competitive intelligence” agent gets discovered by Sales. One-click deployment. Usage metrics show ROI across the organization. Conclusion. The debate over agent builder vs. AI gateway is a red herring—a false choice leading to fragmented, expensive solutions. The real question: point solution or true platform? In payments, Stripe won by unifying developer APIs with merchant tools. In communications, Twilio won by combining carrier control with developer speed. 
The AI platform market is at the same inflection point. The future isn’t about stitching tools together; it’s about building on a unified, programmable foundation. The organizations that invest in platform-first infrastructure—rather than cobbling together point solutions—will move faster, govern more effectively, and build more sophisticated agentic systems. The convergence is coming. The question is whether you’ll be ahead of it or behind it.",
      "views": 233,
      "reading_minutes": 4,
      "tags": [
        
          
          {
            "name": "AI Platform",
            "slug": "ai-platform",
            "url": "/tags/ai-platform/#posts"
          },
        
          
          {
            "name": "Agentic AI",
            "slug": "agentic-ai",
            "url": "/tags/agentic-ai/#posts"
          },
        
          
          {
            "name": "Enterprise AI",
            "slug": "enterprise-ai",
            "url": "/tags/enterprise-ai/#posts"
          },
        
          
          {
            "name": "AI Gateway",
            "slug": "ai-gateway",
            "url": "/tags/ai-gateway/#posts"
          },
        
          
          {
            "name": "Agent Builder",
            "slug": "agent-builder",
            "url": "/tags/agent-builder/#posts"
          },
        
          
          {
            "name": "Developer Tools",
            "slug": "developer-tools",
            "url": "/tags/developer-tools/#posts"
          },
        
          
          {
            "name": "Infrastructure",
            "slug": "infrastructure",
            "url": "/tags/infrastructure/#posts"
          },
        
          
          {
            "name": "Platform Architecture",
            "slug": "platform-architecture",
            "url": "/tags/platform-architecture/#posts"
          },
        
          
          {
            "name": "Headless Architecture",
            "slug": "headless-architecture",
            "url": "/tags/headless-architecture/#posts"
          },
        
          
          {
            "name": "AI SaaS",
            "slug": "ai-saas",
            "url": "/tags/ai-saas/#posts"
          }
        
      ]
    },
    {
      "kind": "post",
      "title": "MCP Enterprise Readiness: How the 2025-11-25 Spec Closes the Production Gap",
      "url": "/2025/12/01/mcp-enterprise-readiness-how-the-2025-11-25-spec-closes-the-production-gap/",
      "date_display": "December 1, 2025",
      "date_iso": "2025-12-01",
      "excerpt": "The Model Context Protocol's first anniversary release isn't just a milestone—it's a strategic inflection point. With asynchronous Tasks, enterprise-grade OAuth, and a formal extensions framework, the 2025-11-25 spec directly addresses the operational barriers that have kept organizations from deploying agent-tool ecosystems at scale. This post examines how these new primitives transform MCP from a development convenience into production-grade infrastructure.",
      "content": "Just over a week ago, the Model Context Protocol celebrated its first anniversary with the release of the 2025-11-25 specification [1]. The announcement was rightly triumphant: MCP has evolved from an experimental open-source project into a foundational standard backed by GitHub, OpenAI, Microsoft, and Block, with thousands of active servers in production [1]. For readers comparing the protocol to Anthropic’s procedural customization layer, I cover Claude Skills vs MCP separately. But beneath the celebration lies a more interesting story: this release is not just an evolution; it is a strategic pivot toward enterprise readiness. For the past year, MCP has succeeded as a developer tool, a convenient way to connect AI models to data and capabilities during experimentation. The 2025-11-25 spec is different. It introduces features explicitly designed to solve the operational, security, and governance challenges that prevent organizations from deploying agent-tool ecosystems at enterprise scale. This article examines three key features from the new spec and analyzes how they close what I call the “production gap”: the distance between experimental agent prototypes and enterprise-grade agentic infrastructure. The Production Gap: Why Experimental Agents Don’t Scale. Before diving into the technical features, we need to understand the problem they solve. Organizations have been experimenting with MCP-powered agents for months, often with impressive results in controlled environments. Yet most of these projects remain trapped in pilot purgatory, unable to progress to production. The barriers are not arbitrary; they are fundamental operational requirements. Asynchronous Operations. Why it matters: real-world tasks like report generation, data analysis, and workflow automation can take minutes or hours, not milliseconds. What’s been missing: MCP requests follow a synchronous request-response model. 
Long-running tasks force clients to hold connections open or build custom polling systems. Enterprise Authentication. Why it matters: organizations need centralized control over which users, agents, and services can access sensitive tools and data. What’s been missing: the original OAuth flow assumed a consumer app model; it lacked support for machine-to-machine auth and did not integrate with enterprise Identity Providers. Extensibility. Why it matters: different industries and use cases require custom capabilities without fragmenting the core protocol. What’s been missing: there was no formal mechanism for standardizing extensions, which led to proprietary, incompatible implementations. These aren’t edge cases; they are table stakes for production systems, and the 2025-11-25 spec directly addresses each one. Feature 1: Asynchronous Tasks (Making Long-Running Workflows Production-Ready). Perhaps the most transformative addition is the new Tasks primitive [2]. Although it is still marked as experimental, it fundamentally changes how agents interact with MCP servers for long-running operations. The Problem: Synchronous Request-Response Doesn’t Match Real Work. Traditional MCP follows the classic RPC pattern: the client sends a request, the server processes it, and the server returns a response, all within a single connection. This works well for quick operations like reading a database row or calling a weather API. But it breaks down for realistic enterprise workflows: Data Analytics Agent: “Generate a quarterly financial report by analyzing three years of transaction data” → 15 minutes of processing. Compliance Agent: “Scan all customer contracts for non-standard clauses” → 2 hours across 10,000 documents. 
DevOps Agent: “Deploy this service to production and run integration tests” → 30 minutes with orchestration dependencies. Organizations have been forced to build custom workarounds such as job queues, polling systems, and callback webhooks, all non-standard, all adding complexity and reducing interoperability. The Solution: A Unified Async Model. The new Tasks feature introduces a standard “call-now, fetch-later” pattern: (1) the client sends a request to an MCP server with a task hint; (2) the server immediately acknowledges the request and returns a unique taskId; (3) the client periodically checks the task’s status (for example working, completed, or failed) using standard task operations; (4) once the task is complete, the client retrieves the final result using the taskId. This is more than syntactic sugar. It provides a uniform abstraction for asynchronous work across the entire MCP ecosystem: an agent framework doesn’t need to know whether it is calling a data pipeline, a deployment system, or a document processor, because the async pattern is the same. Enterprise Impact: Agents That Don’t Block. In production, this frees agents from waiting on long-running work. An AI assistant orchestrating a complex workflow can kick off multiple long-running tasks in parallel (e.g., “analyze sales data,” “generate customer insights,” “create visualizations”), continue planning and reasoning while those tasks are in progress, give users real-time status updates without blocking, and handle failures gracefully with retries and fallback strategies. This is how real autonomous agents operate, and the Tasks primitive makes it possible within a standard, interoperable protocol. Feature 2: Enterprise-Grade OAuth with CIMD and Extensions. The original MCP spec included OAuth 2.0 support, but it was modeled on consumer app patterns (think “Log in with GitHub”). That model doesn’t work for enterprise use cases, where organizations need centralized identity management, audit trails, and policy-based access control. 
The 2025-11-25 spec closes this gap with a new approach to client registration and two authorization extensions. CIMD: Decentralized Trust Without Dynamic Client Registration. The first change adopts Client ID Metadata Documents (CIMD) as the preferred alternative to Dynamic Client Registration (DCR) [3]. Under DCR, every MCP client had to register with every authorization server it wanted to use, a scalability nightmare in federated enterprise environments. With CIMD, the client_id is a URL that the client controls (e.g., https://agents.mycompany.com/sales-assistant). When an authorization server needs information about the client, it fetches a JSON metadata document from that URL. The document includes the client name and description, valid redirect URIs, supported grant types, and public keys for token verification. This creates a decentralized trust model anchored in DNS and HTTPS: the authorization server doesn’t need a pre-existing relationship with the client, because it trusts the metadata published at a URL the client controls. For large organizations with dozens of agent applications and multiple MCP providers, this substantially reduces operational overhead. Extension 1: Machine-to-Machine OAuth (SEP-1046). The second addition is support for the OAuth 2.0 client_credentials flow via the M2M OAuth extension. This enables machine-to-machine authentication, allowing agents and services to authenticate directly with MCP servers without a human user in the loop. Why does this matter? Consider these enterprise scenarios. Scheduled Agent Jobs: a nightly data ingestion agent that pulls information from multiple MCP sources to update a data warehouse. Service-to-Service Communication: a monitoring agent that periodically checks the health of deployed systems by querying infrastructure management tools. Headless Automation: an agent that processes incoming support tickets and takes automated actions based on predefined rules. None of these involves an interactive user. 
They are autonomous services that need persistent, secure credentials to access tools on behalf of the organization. The client_credentials flow is the standard OAuth mechanism for exactly this use case, and its inclusion in MCP makes headless agentic systems viable. Extension 2: Cross App Access (XAA) (SEP-990). Perhaps the most strategically significant feature for large enterprises is the Cross App Access (XAA) extension. It addresses a governance problem that has plagued the consumerization of enterprise AI: uncontrolled tool sprawl. In the standard OAuth flow, a user grants consent directly to an AI application to access a tool. The enterprise Identity Provider (IdP) sees only that “Alice logged in to the AI app,” not that “Alice’s AI agent is now accessing the payroll system.” This creates a governance black hole. XAA changes the authorization flow to insert the enterprise IdP as a central policy enforcement point. When an agent attempts to access an MCP server, (1) the agent requests authorization from the enterprise IdP; (2) the IdP evaluates organizational policies: Is this agent approved for production use? Does Alice have permission to delegate payroll access to this agent? Does this access comply with our data governance policies? (3) only if all policies are satisfied does the IdP issue tokens to the agent. This provides centralized visibility and control over the entire agent-tool ecosystem. Security teams can monitor which agents are accessing which tools, set organization-wide policies (e.g., “no agents can access PII without human review”), and audit all delegated access. It sharply limits shadow AI and provides the compliance story that regulated industries demand. Enterprise Impact: From Shadow AI to Governed Infrastructure. Together, these OAuth enhancements transform MCP from a developer convenience into a governed, auditable integration layer. 
Organizations can now enforce identity standards, with every agent authenticating through the corporate IdP with the same rigor as human employees; enable a zero-trust architecture, in which every tool access is explicitly authorized by policy rather than implicit trust; provide audit trails, logging every delegation, token issuance, and access event for compliance and forensic analysis; and scale securely, since decentralized trust via CIMD lets new agents and tools be onboarded without central bottlenecks while XAA keeps access decisions under central policy control. Feature 3: Formal Extensions Framework (Enabling Innovation Without Fragmentation). The third major addition is a formal Extensions framework [3]. It is a governance mechanism for the protocol itself, allowing the community to develop new capabilities without fragmenting the ecosystem. The Innovation-Standardization Tension. Every successful protocol faces this dilemma: innovate fast enough to keep up with evolving use cases, but standardize carefully enough to maintain interoperability. Move too slowly, and the community builds proprietary extensions that fragment the ecosystem. Move too quickly, and the core protocol becomes bloated with niche features that most implementations don’t need. MCP’s solution is a structured extension process. New capabilities are proposed as Specification Enhancement Proposals (SEPs), which undergo community review and can be adopted incrementally. Extensions are namespaced and clearly marked, so implementations can selectively support them without breaking compatibility. Enterprise Impact: Customization Without Vendor Lock-In. For enterprises, this matters because different industries have unique requirements. Healthcare: extensions for HIPAA-compliant audit logging and patient consent management. Financial Services: extensions for transaction integrity, regulatory reporting, and fraud detection hooks. 
Manufacturing: extensions for real-time sensor data streaming and factory floor integrations. The formal extensions framework allows organizations to develop these capabilities as standard, interoperable extensions rather than proprietary forks. This preserves MCP’s core value proposition, a universal protocol for agent-tool communication, while enabling the customization required for production use. The Multiplier Effect: Sampling with Tools (SEP-1577). One more feature deserves mention: Sampling with Tools [3]. Sampling already let a server ask the client to invoke an LLM on its behalf; this change lets those sampling requests include tools, so MCP servers can themselves act as agentic systems capable of multi-step reasoning and tool use. Why is this powerful? It enables compositional agent architectures, in which a high-level agent delegates to specialized MCP servers that use their own agentic reasoning to fulfill complex requests. For example, a “Financial Analysis Agent” delegates to an “ERP Data Server,” which reasons about which tables to query, how to join the data, and how to format the results; a “Compliance Agent” delegates to a “Legal Document Server,” which autonomously searches case law, extracts relevant clauses, and generates a summary. This nested, hierarchical approach is a promising way for autonomous systems to scale. By making it a standard protocol feature rather than a custom implementation, MCP provides the foundation for a rich ecosystem of specialized, composable agents. Closing the Production Gap: A New Maturity Threshold. The 2025-11-25 MCP specification is not a radical redesign; it is a targeted set of enhancements that directly address the barriers preventing enterprise adoption. 
By introducing Asynchronous Tasks for long-running workflows, Enterprise OAuth with CIMD, M2M, and XAA for governed, auditable authentication, Formal Extensions for standardized innovation, and Sampling with Tools for compositional agent architectures, the spec closes the production gap: the distance between experimental prototypes and scalable, secure, enterprise-grade systems. This is the moment when MCP transitions from a promising developer tool to a foundational piece of enterprise infrastructure. Organizations that have been waiting for “production readiness” signals now have them. The features are there. The governance mechanisms are there. The security model is there. The next phase of agentic AI will be defined not by flashy demos, but by the quiet, reliable, at-scale operation of autonomous systems integrated deeply into enterprise workflows. The 2025-11-25 MCP spec is the technical foundation that makes this future possible. For technology leaders evaluating whether to invest in MCP-based infrastructure, the calculus has changed. This is no longer an experimental protocol, even if individual features such as Tasks are still flagged as experimental; it is a production standard. The organizations that adopt it now, build their agent ecosystems on it, and contribute to its continued evolution will define the next decade of enterprise AI. References: [1] MCP Core Maintainers. (2025, November 25). One Year of MCP: November 2025 Spec Release. Model Context Protocol. [2] Model Context Protocol. (2025, November 25). Tasks. Model Context Protocol Specification. [3] Pakiti, M. (2025, November 26). MCP 2025-11-25 is here: async Tasks, better OAuth, extensions, and a smoother agentic future. WorkOS Blog. [4] Subramanya, N. (2025, November 20). The Governance Stack: Operationalizing AI Agent Governance at Enterprise Scale. subramanya.ai. [5] Subramanya, N. (2025, November 17). Why Private Registries are the Future of Enterprise Agentic Infrastructure. subramanya.ai.",
      "views": 348,
      "reading_minutes": 10,
      "tags": [
        
          
          {
            "name": "MCP",
            "slug": "mcp",
            "url": "/tags/mcp/#posts"
          },
        
          
          {
            "name": "Enterprise AI",
            "slug": "enterprise-ai",
            "url": "/tags/enterprise-ai/#posts"
          },
        
          
          {
            "name": "Agentic AI",
            "slug": "agentic-ai",
            "url": "/tags/agentic-ai/#posts"
          },
        
          
          {
            "name": "Security",
            "slug": "security",
            "url": "/tags/security/#posts"
          },
        
          
          {
            "name": "OAuth",
            "slug": "oauth",
            "url": "/tags/oauth/#posts"
          },
        
          
          {
            "name": "Authentication",
            "slug": "authentication",
            "url": "/tags/authentication/#posts"
          },
        
          
          {
            "name": "Infrastructure",
            "slug": "infrastructure",
            "url": "/tags/infrastructure/#posts"
          },
        
          
          {
            "name": "Agent Ops",
            "slug": "agent-ops",
            "url": "/tags/agent-ops/#posts"
          },
        
          
          {
            "name": "Governance",
            "slug": "governance",
            "url": "/tags/governance/#posts"
          },
        
          
          {
            "name": "Enterprise Integration",
            "slug": "enterprise-integration",
            "url": "/tags/enterprise-integration/#posts"
          }
        
      ]
    },
    {
      "kind": "post",
      "title": "The Governance Stack: Operationalizing AI Agent Governance at Enterprise Scale",
      "url": "/2025/11/20/the-governance-stack-operationalizing-ai-agent-governance-at-enterprise-scale/",
      "date_display": "November 20, 2025",
      "date_iso": "2025-11-20",
      "excerpt": "With 88% of organizations now using AI regularly and most experimenting with AI agents, governance has shifted from a theoretical concern to an operational imperative. Yet 40% of technology executives admit their governance programs are insufficient. This article presents the technical infrastructure, the 'governance stack', required to turn governance frameworks from policy documents into automated, enforceable reality across the entire agentic workforce lifecycle.",
      "content": "Enterprise adoption of AI agents has reached a tipping point. According to McKinsey’s 2025 global survey, 88% of organizations now report regular use of AI in at least one business function, and 62% are at least experimenting with AI agents [1]. Yet this rapid adoption has created a critical disconnect: organizations understand the importance of governance but struggle to implement it. The same survey reveals that 40% of technology executives believe their current governance programs are insufficient for the scale and complexity of their agentic workforce [1, 2]. The problem is not a lack of frameworks. Comprehensive governance principles and requirements already exist, from Databricks’ AI Governance Framework to the EU AI Act [2]. The problem is that governance has remained largely conceptual, living in policy documents and compliance checklists rather than in the operational infrastructure where agents actually execute. This article presents the technical foundation required to operationalize governance at scale: the Governance Stack, the integrated set of platforms, protocols, and enforcement mechanisms that turns governance from aspiration into automated reality across the entire agentic workforce lifecycle. The Governance Gap: From Principle to Practice. Traditional enterprise governance models were designed for static systems and predictable workflows. An application goes through a review process, gets deployed, and then operates within well-defined boundaries. Governance checkpoints are discrete events: code reviews, security scans, compliance audits. Agentic AI shatters this model. Agents are dynamic, adaptive systems that make autonomous decisions, spawn sub-agents, and interact with constantly evolving toolsets. They don’t follow predetermined paths; they reason, plan, and execute based on context. 
As one industry analysis puts it, the governance question shifts from “did the code do what we programmed?” to “did the agent make the right decision given the circumstances?” [3]. This creates four fundamental challenges that traditional governance infrastructure cannot address (traditional governance vs. agentic reality): Decision-Making: predetermined logic paths, testable and auditable, vs. context-dependent reasoning and emergent behavior. Delegation: a single service boundary with clear ownership vs. recursive agent chains with distributed responsibility. Policy Enforcement: deployment-time checks and periodic audits vs. real-time enforcement at the moment of action. Auditability: static code and logs vs. dynamic decision traces across multiple agents and tools. The governance gap is the distance between what existing frameworks prescribe and what existing infrastructure can enforce. Closing this gap requires purpose-built technology. The Five Layers of the Governance Stack. Drawing on the foundational pillars outlined in frameworks like Databricks’ AI Governance model [2], we can define a technical architecture, a Governance Stack, that provides the infrastructure necessary to operationalize these principles. This stack has five integrated layers, each addressing a specific aspect of agent lifecycle management. Layer 1: Identity and Attestation Foundation. Before governance can be enforced, we must know who (or what) is making a request. This requires a robust identity layer specifically designed for autonomous agents, not just human users. As discussed in previous work on OIDC-A (OpenID Connect for Agents), this layer provides [4]: Verifiable Agent Identities: every agent receives a cryptographically verifiable identity, issued by a trusted authority (the AI provider or enterprise identity system). 
Delegation Chains: clear, auditable records of which user or system authorized the agent and what permissions were delegated. Attestation Mechanisms: proof that the agent is running the expected code, on approved infrastructure, with the intended configuration. This identity foundation is the prerequisite for all subsequent layers; without it, governance policies have no subject to act upon. Layer 2: Agent and Tool Registries. Governance requires visibility. The second layer of the stack is a comprehensive registry system that provides a single source of truth for agents and tools. Agent Registry: a catalog of every agent deployed in the enterprise, including its capabilities, business owner, data access, and lifecycle status [5]. This is not just a static directory; it is a dynamic system that tracks agent versions, configurations, and runtime behavior. MCP/Tool Registry: a curated, approved set of tools and MCP servers that agents are authorized to access. This registry enforces pre-deployment security reviews, manages versions, tracks usage, and provides cost visibility [5]. As explored in our previous article on private registries, this layer transforms governance from a manual audit process into an automated, enforceable function of the infrastructure itself [5]. Agents that aren’t registered can’t deploy. Tools that haven’t been vetted can’t be accessed. Layer 3: Policy Engine and Gateway. The third layer is where governance rules are codified and enforced in real time. It has two parts. Agent Firewalls and MCP Gateways: acting as intermediaries between agents and their tools, these gateways inspect every request, enforce security policies, and block unauthorized actions before they occur [6]. 
They provide prompt injection detection and filtering, real-time policy evaluation (e.g., “can this agent access PII?”), dynamic rate limiting and cost controls, and anomaly detection for suspicious behavior patterns. Automated Policy Enforcement: instead of relying on manual reviews, the policy engine automatically validates agents against organizational standards at every lifecycle stage. For example, an agent cannot be promoted to production without a completed data classification assessment, approval from the designated business owner, a passed security scan, and documented human oversight procedures for high-stakes decisions. This layer is the operational heart of the governance stack: it is where abstract policies become concrete actions that prevent harm in real time. Layer 4: Observability and Monitoring Platform. Governance is not a one-time gate; it requires continuous oversight. The fourth layer provides real-time visibility into the behavior of the entire agentic workforce. Performance Dashboards: track accuracy, decision quality, latency, and resource consumption across all agents. Drift Detection: monitor agents for behavioral changes that might indicate model degradation, prompt injection, or unauthorized modifications. Audit Trails: capture every agent action, tool invocation, and delegation event with enough context to support forensic analysis and compliance reporting [3]. Anomaly Alerting: trigger automated responses when agents deviate from expected patterns, such as accessing unusual data sources or making an abnormal volume of API calls. This layer shifts governance from reactive (responding to incidents after they occur) to proactive (detecting and preventing issues before they cause harm). Layer 5: Human-in-the-Loop Orchestration. The final layer recognizes that not all decisions can or should be fully automated. 
For high-stakes scenarios, governance requires explicit human oversight. Escalation Workflows: agents can request human approval before executing sensitive actions, such as modifying production systems or processing large financial transactions. Override Mechanisms: authorized personnel can intervene to pause, redirect, or terminate agent operations when necessary. Explainability Interfaces: when agents make consequential decisions, stakeholders need to understand the reasoning, so this layer provides tools to inspect the decision chain, view the data that influenced the agent, and audit its tool usage. This is not about replacing human judgment; it is about augmenting it with the right information at the right time. Operationalizing the Framework: Governance Across the Agent Lifecycle. The value of the Governance Stack becomes clear when we map it to the complete agent lifecycle. Governance is not a single checkpoint; it is a continuous process embedded at every stage. Planning & Design: the identity layer establishes agent ownership, and the policy engine validates the business case against organizational risk appetite. Data Preparation: registries enforce data classification and lineage tracking, and the policy engine blocks access to non-compliant datasets. Development & Training: the observability platform tracks experiments and model performance, and registries version all agent configurations. Testing & Validation: the agent firewall tests for adversarial inputs and prompt injections, and the policy engine validates against security and ethical standards. Deployment: the gateway enforces real-time authorization for all tool access, and the observability platform begins continuous monitoring. Operations: the monitoring platform detects drift and anomalies, and human-in-the-loop mechanisms escalate high-stakes decisions. 
Retirement: registries archive agent configurations, the identity layer revokes all permissions, and audit trails are retained for compliance. This lifecycle-aware approach ensures that governance is not an afterthought but an integrated function of how agents are built, deployed, and managed. The ROI of Governance Infrastructure. Implementing a comprehensive Governance Stack is a significant investment, and organizations rightly ask: what is the return? The answer lies in four outcomes. Risk Mitigation: as the recent AI-orchestrated cyber espionage campaign disrupted by Anthropic demonstrated [6], uncontrolled agent access to powerful tools is not a theoretical threat. A governance stack with identity attestation, gateways, and real-time policy enforcement is designed to interrupt exactly that kind of attack at multiple layers. Regulatory Compliance: with regulations like the EU AI Act imposing strict requirements on high-risk AI systems, the ability to demonstrate comprehensive lifecycle governance, auditability, and human oversight is mandatory, not optional [2]. The Governance Stack provides the automated evidence generation that compliance requires. Operational Efficiency: without centralized registries and monitoring, organizations waste time debugging agent failures, tracking down tool dependencies, and investigating cost overruns. The stack provides the visibility and control needed to operate an agentic workforce at scale. Trust and Adoption: the ultimate return is internal and external trust. Employees, customers, and regulators need confidence that autonomous agents are operating safely, ethically, and in alignment with organizational values, and the Governance Stack makes that confidence possible. Building vs. Buying: The Emerging Vendor Landscape. Organizations face a critical decision: build this governance infrastructure in-house or adopt emerging platforms that provide it as a service. 
Early movers are choosing different paths. Enterprise Platforms: companies like Collibra, Databricks, and TrueFoundry are extending their data governance and MLOps platforms to include agent registries and observability tools [2, 5, 7]. Purpose-Built Solutions: startups like Agentic Trust are building end-to-end governance platforms designed specifically for agentic AI, with integrated registries, gateways, and policy engines [5]. Protocol-Level Standards: open standards and proposals such as MCP and OIDC-A enable interoperability, allowing organizations to assemble custom stacks from best-of-breed components [4]. The optimal path depends on organizational maturity, existing infrastructure, and the scale of agentic deployment. The underlying message, however, is universal: governance at scale requires dedicated infrastructure. Conclusion: Governance as the Enabler of Scale. The era of experimental agentic AI pilots is ending. Organizations are now operationalizing agentic workforces across critical business functions, and the governance gap is the primary barrier to scaling these deployments safely and responsibly. The Governance Stack is not a constraint on innovation; it is the foundation that makes innovation sustainable. By providing identity, visibility, policy enforcement, continuous monitoring, and human oversight, this technical infrastructure turns governance from a compliance burden into a strategic enabler. The organizations that invest in this stack today will be the ones that confidently deploy autonomous agents at enterprise scale tomorrow. They will move faster, operate more safely, and earn the trust of stakeholders who demand accountability in the age of autonomous AI. For technology leaders navigating this landscape, the path is clear: governance is not a policy problem but an engineering challenge, and like all engineering challenges, it requires purpose-built infrastructure to solve. 
The Governance Stack is that infrastructure.References:[1] McKinsey &amp; Company. (2025, November 5). The State of AI in 2025: A global survey. McKinsey.[2] Databricks. (2025, July 1). Introducing the Databricks AI Governance Framework. Databricks.[3] DZone. (2025, May 21). Securing the Future: Best Practices for Privacy and Data Governance in LLMOps. DZone.[4] Subramanya, N. (2025, April 28). OpenID Connect for Agents (OIDC-A) 1.0 Proposal. subramanya.ai.[5] Subramanya, N. (2025, November 17). Why Private Registries are the Future of Enterprise Agentic Infrastructure. subramanya.ai.[6] Subramanya, N. (2025, November 14). From Espionage to Identity: Securing the Future of Agentic AI. subramanya.ai.[7] TrueFoundry. (2025, September 10). What is AI Agent Registry. TrueFoundry.",
      "views": 388,
      "reading_minutes": 10,
      "tags": [
        
          
          {
            "name": "AI",
            "slug": "ai",
            "url": "/tags/ai/#posts"
          },
        
          
          {
            "name": "Agents",
            "slug": "agents",
            "url": "/tags/agents/#posts"
          },
        
          
          {
            "name": "Agentic AI",
            "slug": "agentic-ai",
            "url": "/tags/agentic-ai/#posts"
          },
        
          
          {
            "name": "Governance",
            "slug": "governance",
            "url": "/tags/governance/#posts"
          },
        
          
          {
            "name": "Enterprise AI",
            "slug": "enterprise-ai",
            "url": "/tags/enterprise-ai/#posts"
          },
        
          
          {
            "name": "Agent Ops",
            "slug": "agent-ops",
            "url": "/tags/agent-ops/#posts"
          },
        
          
          {
            "name": "MCP",
            "slug": "mcp",
            "url": "/tags/mcp/#posts"
          },
        
          
          {
            "name": "Security",
            "slug": "security",
            "url": "/tags/security/#posts"
          },
        
          
          {
            "name": "Infrastructure",
            "slug": "infrastructure",
            "url": "/tags/infrastructure/#posts"
          },
        
          
          {
            "name": "Compliance",
            "slug": "compliance",
            "url": "/tags/compliance/#posts"
          },
        
          
          {
            "name": "AI Management",
            "slug": "ai-management",
            "url": "/tags/ai-management/#posts"
          }
        
      ]
    },
    {
      "kind": "post",
      "title": "Why Private Registries are the Future of Enterprise Agentic Infrastructure",
      "url": "/2025/11/17/why-private-registries-are-the-future-of-enterprise-agentic-infrastructure/",
      "date_display": "November 17, 2025",
      "date_iso": "2025-11-17",
      "excerpt": "With 79% of companies already adopting AI agents, a critical governance gap has emerged. Without robust management frameworks, organizations risk a chaotic landscape of shadow AI, creating significant security vulnerabilities and operational inefficiencies. The solution lies in Private Agent and MCP Registries—command centers for agentic infrastructure that provide the visibility, governance, and security necessary to scale AI responsibly.",
      "content": "The age of agentic AI is no longer on the horizon; it’s in our datacenters, cloud environments, and business units. A recent PwC report highlights that a staggering 79% of companies are already adopting AI agents in some capacity [1]. As these autonomous systems proliferate, executing tasks and making decisions on behalf of the enterprise, a critical governance gap has emerged. Without a robust management framework, organizations risk a chaotic landscape of “shadow AI,” creating significant security vulnerabilities, compliance nightmares, and operational inefficiencies.The solution lies in a new class of enterprise software: the Private Agent and MCP Registry. This is not just a catalog, but a command center for agentic infrastructure, providing the visibility, governance, and security necessary to scale AI responsibly. Let’s explore the core pillars of this trend, using the “Agentic Trust” platform as a blueprint for building a better, more secure agentic future.Pillar 1: A Centralized Directory for Every AgentThe first step to managing agentic chaos is to establish a single source of truth. You cannot govern what you cannot see. A private agent registry provides a comprehensive, real-time inventory of every agent operating within the enterprise, whether built in-house or sourced from a third-party vendor.A centralized agent directory, as shown in the Agentic Trust platform, provides a complete inventory for governance and oversight.As the screenshot of the Agentic Trust directory illustrates, this is more than just a list. A mature registry tracks critical metadata for each agent, including:  Unique Identity: A verifiable ID for every agent, forming the foundation for authentication and authorization.  Capabilities: A clear declaration of what the agent is designed to do, including the tools, resources, and prompts it can access.  Lifecycle Status: Tracking whether an agent is in development, production, or retired.  
Ownership and Lineage: Connecting each agent to a business owner, use case, and the data it interacts with.  Activity Monitoring: Recording when agents were last used and their registration dates.This centralized view eliminates blind spots and provides the traceability required for compliance and security audits. Organizations can quickly answer critical questions: How many agents do we have? Who owns them? What are they authorized to do?Pillar 2: A Curated Marketplace for Agent Tools (MCPs)Autonomous agents are only as powerful as the tools they can access. The Model Context Protocol (MCP) has become a standard for providing agents with these tools, but an uncontrolled proliferation of MCP servers creates another layer of risk. A private registry addresses this by functioning as a curated, internal “app store” or marketplace for MCPs.An MCP Registry, like this one from Agentic Trust, allows enterprises to create a governed marketplace of approved tools for their AI agents.Instead of allowing agents to connect to any public MCP, the enterprise can define a catalog of approved, vetted, and secure tools. As shown in the Agentic Trust MCP Registry, this allows organizations to:  Enforce Security Standards: Ensure that all available tools meet enterprise security and compliance requirements before they’re made available to agents.  Manage Versions and Dependencies: Control which versions of tools are used, preventing unexpected breaking changes that could disrupt agent operations.  Control Costs: Monitor the usage of paid APIs and tools, preventing runaway costs from autonomous agents making thousands of requests.  Improve Developer Productivity: Provide a central place for developers to discover and reuse existing tools, accelerating agent development and reducing duplication.  
Categorize and Organize: Group tools by function (productivity, collaboration, payments, development, monitoring) to make discovery easier.The registry shows connection status for each MCP server, making it immediately visible which integrations are active and which require attention. This operational visibility is critical for maintaining a healthy agentic ecosystem.Pillar 3: End-to-End Governance and Policy EnforcementA private registry is the enforcement point for enterprise AI policy. It moves governance from a manual, after-the-fact process to an automated, built-in function of the agentic infrastructure. Drawing on best practices from platforms like Collibra and Microsoft Azure’s private registry implementations, this includes [1, 2]:Mandatory Metadata and Documentation: Before an agent or MCP can be registered, developers must provide essential information such as data classification, business owner, purpose, and criticality. This ensures that every component in the agentic ecosystem is properly documented and understood.Lifecycle Policy Alignment: The registry can embed automated policy checks at each stage of an agent’s lifecycle. For example, an agent cannot be promoted to production without a completed security review, ethical bias assessment, and approval from the designated business owner. This creates natural checkpoints that enforce organizational standards.Access Control and Permissions: Using Role-Based Access Control (RBAC), integrated with enterprise identity systems like Entra ID or Okta, the registry defines who can create, manage, and consume agents and their tools. Different teams might have different levels of access based on their role and the sensitivity of the agents they’re working with.Audit Trails and Compliance: Every action in the registry—agent registration, tool connection, permission changes—is logged and auditable. 
This creates a complete forensic trail that satisfies regulatory requirements and enables rapid incident response when issues arise.Pillar 4: Solving Real Enterprise ChallengesThe value of a private registry becomes clear when we examine the specific problems it solves. Consider these common enterprise scenarios:Challenge: Shadow AI and Uncontrolled Tool AdoptionDevelopment teams are rapidly adopting AI tools and MCP servers without central oversight. This creates security blind spots, compliance risks, and operational fragmentation across the organization. A private registry provides centralized discovery of approved tools and usage visibility, allowing security teams to monitor what tools are being used and by whom [2].Challenge: Regulatory Compliance and Data SovereigntyOrganizations in regulated industries (financial services, healthcare, government) need to maintain strict control over data flows and ensure AI tools meet compliance requirements. The registry enables data classification tagging for MCP servers, geographic controls for region-specific availability, comprehensive audit trails, and pre-configured compliance templates [2].Challenge: Cost Control and Resource OptimizationWithout visibility into agent and tool usage, organizations face unpredictable costs as autonomous agents make API calls and consume resources. A private registry provides usage analytics, cost allocation by team or project, budget alerts, and the ability to deprecate underutilized or expensive tools [2].Challenge: Developer Productivity and Tool DiscoveryDevelopers waste time rebuilding integrations that already exist elsewhere in the organization or struggle to find the right tools for their agents. 
The registry solves this with searchable catalogs, reusable components, standardized integration patterns, and clear documentation for each available tool [3].The Architecture That Enables ScaleBehind the user interface of platforms like Agentic Trust lies a sophisticated architecture that makes enterprise-scale agent management possible. The key components include [3, 4]:            Component      Purpose                  Central Registry API      Provides standardized endpoints for agent and MCP registration, discovery, and management              Metadata Database      Stores agent cards, capability declarations, and relationship data              Policy Engine      Enforces governance rules, access controls, and compliance checks              Discovery Service      Enables capability-based search and intelligent agent-to-tool matching              Health Monitor      Tracks agent and MCP server availability through heartbeats and health checks              Integration Layer      Connects to enterprise identity systems, monitoring tools, and DevOps pipelines      This architecture mirrors patterns from successful enterprise software registries, such as container registries, API management platforms, and model registries. The lesson is clear: as a technology becomes critical to enterprise operations, it requires industrial-grade management infrastructure.The Path ForwardThe trend toward private registries for agentic infrastructure is not a passing fad; it is a necessary evolution in response to the rapid adoption of autonomous AI systems. As the Model Context Protocol ecosystem continues to grow, with the official MCP Registry serving as a public catalog [4], forward-thinking enterprises are building their own private implementations to maintain control, security, and governance.Platforms like Agentic Trust demonstrate what this future looks like: a unified command center where every agent is visible, every tool is vetted, and every action is governed by policy. 
This is how organizations move from the chaos of unmanaged AI to the strategic advantage of a well-orchestrated agentic ecosystem.For enterprises embarking on this journey, the message is clear: you cannot scale what you cannot see, and you cannot govern what you cannot control. A private registry is the foundation upon which responsible, secure, and effective agentic AI is built.References:[1] Collibra. (2025, October 6). Collibra AI agent registry: Governing autonomous AI agents. Collibra.[2] Bajada, AJ. (2025, August 14). DevOps and AI Series: Azure Private MCP Registry. azurewithaj.com.[3] TrueFoundry. (2025, September 10). What is AI Agent Registry. TrueFoundry.[4] Model Context Protocol. (2025, September 8). Introducing the MCP Registry. Model Context Protocol.",
      "views": 115,
      "reading_minutes": 7,
      "tags": [
        
          
          {
            "name": "AI",
            "slug": "ai",
            "url": "/tags/ai/#posts"
          },
        
          
          {
            "name": "Agents",
            "slug": "agents",
            "url": "/tags/agents/#posts"
          },
        
          
          {
            "name": "Agentic AI",
            "slug": "agentic-ai",
            "url": "/tags/agentic-ai/#posts"
          },
        
          
          {
            "name": "MCP",
            "slug": "mcp",
            "url": "/tags/mcp/#posts"
          },
        
          
          {
            "name": "Agent Registry",
            "slug": "agent-registry",
            "url": "/tags/agent-registry/#posts"
          },
        
          
          {
            "name": "Enterprise AI",
            "slug": "enterprise-ai",
            "url": "/tags/enterprise-ai/#posts"
          },
        
          
          {
            "name": "Governance",
            "slug": "governance",
            "url": "/tags/governance/#posts"
          },
        
          
          {
            "name": "Security",
            "slug": "security",
            "url": "/tags/security/#posts"
          },
        
          
          {
            "name": "Infrastructure",
            "slug": "infrastructure",
            "url": "/tags/infrastructure/#posts"
          },
        
          
          {
            "name": "Private Registry",
            "slug": "private-registry",
            "url": "/tags/private-registry/#posts"
          },
        
          
          {
            "name": "AI Management",
            "slug": "ai-management",
            "url": "/tags/ai-management/#posts"
          }
        
      ]
    },
    {
      "kind": "post",
      "title": "From Espionage to Identity: Securing the Future of Agentic AI",
      "url": "/2025/11/14/from-espionage-to-identity-securing-the-future-of-agentic-ai/",
      "date_display": "November 14, 2025",
      "date_iso": "2025-11-14",
      "excerpt": "Anthropic has detailed its disruption of the first publicly reported cyber espionage campaign orchestrated by a sophisticated AI agent. The incident, attributed to state-sponsored group GTG-1002, signals that the age of autonomous, agentic AI threats is here. This post dissects the anatomy of the attack and explores how emerging standards like OpenID Connect for Agents (OIDC-A) provide a necessary path forward.",
      "content": "Anthropic has detailed its disruption of the first publicly reported cyber espionage campaign orchestrated by a sophisticated AI agent [1]. The incident, attributed to a state-sponsored group designated GTG-1002, is more than just a security bulletin; it is a clear signal that the age of autonomous, agentic AI threats is here. It also serves as a critical case study, validating the urgent need for a new generation of identity and access management protocols specifically designed for AI.This post will dissect the anatomy of the attack, connect it to the foundational security challenges facing agentic AI, and explore how emerging standards like OpenID Connect for Agents (OIDC-A) provide a necessary path forward [2, 3].Anatomy of an AI-Orchestrated AttackAnthropic’s investigation revealed a campaign of unprecedented automation. The attackers turned Anthropic’s own Claude Code tool into an autonomous weapon, targeting approximately thirty global organizations across technology, finance, and government. The AI was not merely an assistant; it was the operator, executing 80-90% of the tactical work, with human intervention required only at a few key authorization gates [1].The technical sophistication of the attack did not lie in novel malware, but in orchestration. The threat actor built a custom framework around a series of Model Context Protocol (MCP) servers. These servers acted as a bridge, giving the AI agent access to a toolkit of standard, open-source penetration testing utilities—network scanners, password crackers, and database exploitation tools.By decomposing the attack into seemingly benign sub-tasks, the attackers tricked the AI into executing a complex intrusion campaign. The AI agent, operating under the persona of a legitimate security tester, autonomously performed reconnaissance, vulnerability analysis, and data exfiltration at machine speed that no human team could match.The MCP Paradox: Extensibility vs. 
SecurityThe Anthropic report explicitly states that the attackers leveraged the Model Context Protocol (MCP) to arm their AI agent [1]. This highlights a central paradox in agentic AI architecture: the very protocols designed for extensibility and power, like MCP, can become the most potent attack vectors.As the “Identity Management for Agentic AI” whitepaper notes, MCP is a leading framework for connecting AI to external tools, but it also presents significant security challenges [3]. When an AI can dynamically access powerful tools without robust oversight, it creates a direct and dangerous path for misuse. The GTG-1002 campaign is a textbook example of this risk realized.This forces a critical re-evaluation of how we architect agentic systems. We can no longer afford to treat the connection between an AI agent and its tools as a trusted channel. This is where the concept of an MCP Gateway or Proxy becomes not just a good idea, but an absolute necessity.The Solution: Identity, Delegation, and Zero Trust for AgentsThe security gaps exploited in the Anthropic incident are precisely what emerging standards like OIDC-A (OpenID Connect for Agents) are designed to close [2, 3]. The core problem is one of identity and authority. The AI agent in the attack acted with borrowed, indistinct authority, effectively impersonating a legitimate user or process. True security requires a shift to a model of explicit, verifiable delegation.The OIDC-A proposal introduces a framework for establishing the identity of an AI agent and managing its authorization through cryptographic delegation chains. 
This means an agent is no longer just a proxy for a user; it is a distinct entity with its own identity, operating on behalf of a user with a clearly defined and constrained set of permissions.Here’s how this new model, enforced by an MCP Gateway, could have mitigated the Anthropic attack:            Security Layer      Description                  Agent Identity &amp; Attestation      The AI agent would have a verifiable identity, attested by its provider. An MCP Gateway could immediately block any requests from unattested or untrusted agents.              Tool-Level Delegation      Instead of broad permissions, the agent would receive narrowly scoped, delegated authority for specific tools. The OIDC-A delegation_chain ensures that the agent’s permissions are a strict subset of the delegating user’s permissions [2]. An agent designed for code analysis could never be granted access to a password cracker.              Policy Enforcement &amp; Anomaly Detection      The MCP Gateway would act as a policy enforcement point, monitoring all tool requests. It could detect anomalous behavior, such as an agent attempting to use a tool outside its delegated scope or a sudden spike in high-risk tool usage, and automatically terminate the agent’s session.              Auditing and Forensics      Every tool request and delegation would be cryptographically signed and logged, creating an immutable audit trail. This would provide immediate, granular visibility into the agent’s actions, dramatically accelerating incident response.      Building Enterprise-Grade Security for Agentic AIThe Anthropic report is a watershed moment. It proves that the threats posed by agentic AI are no longer theoretical. 
As the “Identity Management for Agentic AI” paper argues, we must move beyond traditional, human-centric security models and build a new foundation for AI identity [3].Today, most MCP servers being developed are experimental tools designed for individual developers and small-scale applications. They lack the enterprise-grade security controls that organizations require to deploy them in production environments. For enterprises to confidently adopt agentic AI systems built on protocols like MCP, we need to fundamentally rethink how we approach security.The path forward requires building robust delegation frameworks, implementing proper identity management for AI agents, and creating enterprise-grade security controls like gateways and policy enforcement points. We need solutions that provide:  Cryptographic delegation chains that clearly define and constrain agent permissions  Real-time policy enforcement that can detect and prevent anomalous behavior  Comprehensive audit trails that enable forensic analysis and compliance  Zero-trust architectures where every agent action is verified and authorizedWe cannot afford to let the open, extensible nature of protocols like MCP become a permanent backdoor for malicious actors. The future of agentic AI depends on our ability to build security into these systems from the ground up, making enterprise adoption not just possible, but secure and responsible.References:[1] Anthropic. (2025, November). Disrupting the first reported AI-orchestrated cyber espionage campaign. Anthropic.[2] Subramanya, N. (2025, April 28). OpenID Connect for Agents (OIDC-A) 1.0 Proposal. subramanya.ai.[3] South, T. (Ed.). (2025, October). Identity Management for Agentic AI: The new frontier of authorization, authentication, and security for an AI agent world. arXiv.",
      "views": 0,
      "reading_minutes": 5,
      "tags": [
        
          
          {
            "name": "AI",
            "slug": "ai",
            "url": "/tags/ai/#posts"
          },
        
          
          {
            "name": "Security",
            "slug": "security",
            "url": "/tags/security/#posts"
          },
        
          
          {
            "name": "Agentic AI",
            "slug": "agentic-ai",
            "url": "/tags/agentic-ai/#posts"
          },
        
          
          {
            "name": "OIDC-A",
            "slug": "oidc-a",
            "url": "/tags/oidc-a/#posts"
          },
        
          
          {
            "name": "MCP",
            "slug": "mcp",
            "url": "/tags/mcp/#posts"
          },
        
          
          {
            "name": "Anthropic",
            "slug": "anthropic",
            "url": "/tags/anthropic/#posts"
          },
        
          
          {
            "name": "Claude",
            "slug": "claude",
            "url": "/tags/claude/#posts"
          },
        
          
          {
            "name": "Cybersecurity",
            "slug": "cybersecurity",
            "url": "/tags/cybersecurity/#posts"
          },
        
          
          {
            "name": "AI Agents",
            "slug": "ai-agents",
            "url": "/tags/ai-agents/#posts"
          },
        
          
          {
            "name": "Identity Management",
            "slug": "identity-management",
            "url": "/tags/identity-management/#posts"
          },
        
          
          {
            "name": "Zero Trust",
            "slug": "zero-trust",
            "url": "/tags/zero-trust/#posts"
          }
        
      ]
    },
    {
      "kind": "post",
      "title": "Claude Skills vs. MCP: A Tale of Two AI Customization Philosophies",
      "url": "/2025/10/30/claude-skills-vs-mcp-a-tale-of-two-ai-customization-philosophies/",
      "date_display": "October 30, 2025",
      "date_iso": "2025-10-30",
      "excerpt": "Anthropic has introduced two powerful but distinct approaches to AI customization: Claude Skills and the Model Context Protocol (MCP). While both aim to make AI more useful and integrated into our workflows, they operate on fundamentally different principles. This post explores their differences, synergies, and the exciting future they represent.",
      "content": "In the rapidly evolving landscape of artificial intelligence, the ability to customize and extend the capabilities of large language models (LLMs) has become a critical frontier. Anthropic, a leading AI research company, has introduced two powerful but distinct approaches to this challenge: Claude Skills and the Model Context Protocol (MCP). While both aim to make AI more useful and integrated into our workflows, they operate on fundamentally different principles. This post delves into a detailed comparison of Claude Skills and MCP, explores whether they can or should be merged, and discusses the exciting future of AI customization they represent.What are Claude Skills? The Power of Procedural KnowledgeClaude Skills, also known as Agent Skills, are a revolutionary way to teach Claude how to perform specific tasks in a repeatable and customized manner. At its core, a Skill is a folder containing a SKILL.md file of instructions, optionally accompanied by supporting resources and even executable scripts. Think of Skills as a set of standard operating procedures for the AI. For example, a Skill could instruct Claude on how to format a weekly report, adhere to a company’s brand guidelines, or analyze data using a specific methodology.The genius of Claude Skills lies in their architecture, which is built on a principle called progressive disclosure. This three-tiered system ensures that Claude’s context window isn’t overwhelmed with information:      Level 1: Metadata: When a session starts, Claude loads only the name and description of each available Skill. This is a very lightweight process, consuming only a few tokens per Skill.        Level 2: The SKILL.md file: If Claude determines that a Skill is relevant to the user’s request, it then loads the full content of the SKILL.md file.        Level 3 and beyond: Additional resources: If the SKILL.md file references other documents or scripts within the Skill’s folder, Claude will load them only when needed.  
This efficient, just-in-time loading mechanism allows for a vast library of Skills to be available without sacrificing performance. Skills are also portable, working across Claude.ai, Claude Code, and the API, and can even include executable code for deterministic and reliable operations.What is the Model Context Protocol (MCP)? The Universal ConnectorThe Model Context Protocol (MCP) is an open-source standard designed to connect AI applications to external systems. If Claude Skills are about teaching the AI how to do something, MCP is about giving it access to what it needs to do it. MCP acts as a universal connector, similar to a USB-C port for AI, allowing models like Claude to interact with a wide range of data sources, tools, and workflows.MCP operates on a client-server architecture:      MCP Host: The AI application (e.g., Claude) that manages connections to various external systems.        MCP Client: A component within the host that maintains a one-to-one connection with an MCP server.        MCP Server: A program that exposes tools, resources, and prompts from an external system to the AI.  This architecture allows an AI to connect to multiple external systems simultaneously, from local files and databases to remote services like GitHub, Slack, or a company’s internal APIs. MCP is built on a two-layer architecture, with a data layer based on JSON-RPC 2.0 and a transport layer that supports both local and remote connections.The Core Difference: Methodology vs. ConnectivityThe fundamental distinction between Claude Skills and MCP can be summarized as methodology versus connectivity. MCP provides the AI with access to tools and data, while Skills provide the instructions on how to use them effectively. According to Anthropic’s own documentation:  “MCP connects Claude to external services and data sources. Skills provide procedural knowledge—instructions for how to complete specific tasks or workflows. 
You can use both together: MCP connections give Claude access to tools, while Skills teach Claude how to use those tools effectively.”This highlights that Skills and MCP are not competing technologies but are, in fact, complementary. An apt analogy is that of a master chef. MCP provides the chef with a fully stocked pantry of ingredients and a set of high-end kitchen appliances (the what). Skills, on the other hand, are the chef’s personal recipe book and techniques, guiding them on how to combine the ingredients and use the appliances to create a culinary masterpiece.            Feature      Claude Skills      Model Context Protocol (MCP)                  Primary Purpose      Procedural knowledge and methodology      Connectivity to external systems              Architecture      Filesystem-based with progressive disclosure      Client-server with JSON-RPC 2.0              Core Concept      Teaching the AI how to do something      Giving the AI access to what it needs              Dependency      Requires a code execution environment      A client and a server implementation              Token Efficiency      Very high due to progressive disclosure      Moderate, with tool descriptions in context              Portability      Across Claude interfaces      Open standard for any LLM      Can a Claude Skill be an MCP? And Should They Be Merged?Given that both are Anthropic’s creations, a natural question arises: could a Claude Skill be implemented as an MCP, or should the two be merged into a single, unified system? While technically possible to create an MCP server that exposes Skills, it would be architecturally inefficient and would defeat the purpose of both systems.Exposing Skills through MCP would negate the benefits of progressive disclosure, as it would introduce the overhead of the MCP protocol for what should be a simple filesystem read. It would also create a redundant abstraction layer, as Skills already require a local code execution environment. 
The two systems are designed for different purposes and have different optimization goals: Skills for context efficiency within Claude, and MCP for standardized integration across different AI systems. Therefore, Claude Skills and MCP should be treated as independent, complementary technologies. The most powerful workflows will come from using them in synergy. The Power of Synergy: Using Skills and MCP Together. The true potential of these technologies is unlocked when they are used in concert. Here are a few integration patterns that showcase their combined power. Skills as MCP Orchestrators: A Skill can contain a complex workflow that orchestrates calls to multiple MCP servers. For example, a “Deploy and Notify” Skill could contain a deployment checklist, notification templates, and rollback procedures. It would then use MCP to access GitHub for code, a CI/CD server for deployment, and Slack for notifications. Skills for MCP Configuration: An organization can create Skills that teach Claude its specific standards for using MCP tools. For example, a “GitHub Workflow Standards” Skill could contain instructions on branch naming conventions, pull request review checklists, and commit message templates, ensuring that Claude uses the GitHub MCP server in a way that aligns with the company’s best practices. Hybrid Skills: A Skill can contain embedded code that makes calls to an MCP server. This is useful for self-contained workflows that need to fetch external data. The Future: A Marketplace for Skills and an Ecosystem for MCP. The future of AI customization will likely see the development of a vibrant Skills Marketplace. Similar to the app stores for our smartphones or the extension marketplaces for our code editors, a Skills Marketplace would allow developers to publish, share, and even sell Skills. 
This could create a new economy around AI expertise, with a wide range of Skills available, from free, community-contributed Skills to premium, industry-specific Skill packages for domains like law, medicine, or finance. Simultaneously, the MCP ecosystem is likely to keep growing, with more and more tools and services exposing their functionality through MCP servers. This would create a virtuous cycle: as more tools become available through MCP, the demand for Skills that can effectively use those tools will increase. Conclusion. Claude Skills and the Model Context Protocol represent two distinct but complementary philosophies of AI customization. MCP is the universal connector, providing the what: access to tools and data. Skills are the procedural knowledge, providing the how: the instructions and methodology. They are not competitors but partners in the quest to create more powerful, personalized, and integrated AI assistants. The future of AI workflows will not be about choosing between Skills and MCP, but about combining them to create intelligent systems that are truly tailored to our needs. References: [1] Anthropic. (2025, October 16). Claude Skills: Customize AI for your workflows. Anthropic. [2] Anthropic. (2025, October 16). Equipping agents for the real world with Agent Skills. Anthropic. [3] Model Context Protocol. (n.d.). What is the Model Context Protocol (MCP)? Model Context Protocol. [4] Model Context Protocol. (n.d.). Architecture overview. Model Context Protocol. [5] Willison, S. (2025, October 16). Claude Skills are awesome, maybe a bigger deal than MCP. Simon Willison’s Weblog. [6] Claude Help Center. (n.d.). What are Skills? Claude Help Center. [7] IntuitionLabs. (2025, October 27). Claude Skills vs. MCP: A Technical Comparison for AI Workflows. IntuitionLabs.",
      "views": 3281,
      "reading_minutes": 7,
      "tags": [
        
          
          {
            "name": "AI",
            "slug": "ai",
            "url": "/tags/ai/#posts"
          },
        
          
          {
            "name": "Claude",
            "slug": "claude",
            "url": "/tags/claude/#posts"
          },
        
          
          {
            "name": "MCP",
            "slug": "mcp",
            "url": "/tags/mcp/#posts"
          },
        
          
          {
            "name": "Claude Skills",
            "slug": "claude-skills",
            "url": "/tags/claude-skills/#posts"
          },
        
          
          {
            "name": "Agent Skills",
            "slug": "agent-skills",
            "url": "/tags/agent-skills/#posts"
          },
        
          
          {
            "name": "AI Customization",
            "slug": "ai-customization",
            "url": "/tags/ai-customization/#posts"
          },
        
          
          {
            "name": "LLM",
            "slug": "llm",
            "url": "/tags/llm/#posts"
          },
        
          
          {
            "name": "Anthropic",
            "slug": "anthropic",
            "url": "/tags/anthropic/#posts"
          },
        
          
          {
            "name": "Integration",
            "slug": "integration",
            "url": "/tags/integration/#posts"
          },
        
          
          {
            "name": "Workflows",
            "slug": "workflows",
            "url": "/tags/workflows/#posts"
          }
        
      ]
    },
    {
      "kind": "post",
      "title": "Beyond \"Non-Deterministic\": Deconstructing the Illusion of Randomness in LLMs",
      "url": "/2025/09/09/beyond-non-deterministic-deconstructing-the-illusion-of-randomness-in-llms/",
      "date_display": "September 9, 2025",
      "date_iso": "2025-09-09",
      "excerpt": "Attributing an LLM's behavior to 'non-determinism' is like blaming a complex system's emergent behavior on magic. It's an admission of incomprehension, not an explanation. The truth is far more fascinating and, for architects and engineers, far more critical to understand.",
      "content": "In the rapidly evolving lexicon of AI, few terms are as casually thrown around—and as fundamentally misunderstood—as “non-deterministic.” We use it to explain away unexpected outputs, to describe the creative spark of generative models, and to justify the frustrating brittleness of our AI-powered systems. But this term, borrowed from classical computer science, is not just imprecise when applied to Large Language Models (LLMs); it’s a conceptual dead end. It obscures the intricate, deterministic machinery humming beneath the surface and distracts us from the real architectural challenges we face.Attributing an LLM’s behavior to “non-determinism” is like blaming a complex system’s emergent behavior on magic. It’s an admission of incomprehension, not an explanation. The truth is far more fascinating and, for architects and engineers, far more critical to understand. LLMs are not mystical black boxes governed by chance. They are complex, stateful systems whose outputs are the result of a deterministic, albeit highly sensitive, process. The perceived randomness is not a feature; it is a symptom of a deeper architectural paradigm shift.This post will dismantle the myth of LLM non-determinism. We will explore why the term is a poor fit, dissect the underlying deterministic mechanisms that govern LLM behavior, and reframe the conversation around the true challenge: the profound difficulty of controlling a system whose behavior is an emergent property of its architecture. We will move beyond the simplistic notion of randomness and into the far more complex and rewarding territory of input ambiguity, ill-posed inverse problems, and the dawn of truly evolutionary software architectures.The Deterministic Heart of the LLMTo understand why “non-deterministic” is a misnomer, we must first revisit its classical definition. A deterministic algorithm, given a particular input, will always produce the same output. An LLM, at its core, is a mathematical function. 
It is a massive, intricate, but ultimately deterministic, series of calculations. Given the same model, the same weights, and the same input sequence, the same sequence of floating-point operations, executed in the same order, will produce the same output logits. The illusion of non-determinism arises not from the model itself, but from the sampling strategies we apply to its output. The model’s final layer produces a vector of logits, one for each token in its vocabulary. These logits are then converted into a probability distribution via the softmax function. It is at this final step, the selection of the next token from this distribution, that we introduce controlled randomness. Temperature and Sampling: The Controlled Introduction of Randomness. The temperature parameter is the primary lever we use to control this randomness. A temperature of 0 is conventionally treated as greedy decoding (dividing the logits by zero is undefined, so implementations special-case it): a purely deterministic process in which the token with the highest probability is always chosen. In theory, with a temperature of 0, an LLM should be perfectly deterministic. However, as many have discovered, even this is not a perfect guarantee. Floating-point addition is not associative, so summing the same values in a different order can give slightly different results. Different hardware, different software library versions, parallel GPU kernels whose reduction order varies, or even the batch a request is grouped with on a shared inference server can therefore lead to minuscule variations in the logits, which can occasionally be enough to tip the balance in favor of a different token. When the temperature is set above 0, we enter the realm of stochastic sampling. The logits are divided by the temperature value before they are passed to the softmax function. A higher temperature flattens the probability distribution, making less likely tokens more probable. A lower temperature sharpens the distribution, making the most likely tokens even more dominant. This is not non-determinism in the classical sense; it is a controlled, probabilistic process. 
We are not dealing with a system that can arbitrarily choose its next state; we are dealing with a system that makes a weighted random choice from a set of possibilities whose probabilities are deterministically calculated. Even that random choice is driven by a pseudo-random number generator, so fixing its seed makes sampled generation repeatable in a given environment, subject to the floating-point caveats above. Other sampling techniques, such as top-k and top-p (nucleus) sampling, further refine this process. Top-k sampling restricts the choices to the k most likely tokens, while top-p sampling selects from the smallest set of tokens whose cumulative probability reaches a threshold p; in both cases the remaining probabilities are renormalized before a token is drawn. These are all mechanisms for shaping and constraining the probabilistic selection process, not for introducing true non-determinism. Demonstrating Determinism: A Concrete Example. Consider this simple demonstration using a Hugging Face transformers model with greedy decoding (do_sample=False, the library’s equivalent of temperature 0):from transformers import AutoModelForCausalLM, AutoTokenizermodel_id = \"microsoft/DialoGPT-medium\"tokenizer = AutoTokenizer.from_pretrained(model_id)model = AutoModelForCausalLM.from_pretrained(model_id)prompt = \"The future of artificial intelligence is\"inputs = tokenizer(prompt, return_tensors=\"pt\")# Run the same greedy generation 10 timesoutputs = []for _ in range(10):    generated = model.generate(        **inputs,  # Passes input_ids and attention_mask        max_length=50,        do_sample=False,  # Greedy decoding; temperature is not used        pad_token_id=tokenizer.eos_token_id    )    text = tokenizer.decode(generated[0], skip_special_tokens=True)    outputs.append(text)# All outputs should be identicalassert all(output == outputs[0] for output in outputs)This code will pass its assertion in most cases, demonstrating the deterministic nature of the underlying model. 
In practice, ten runs within a single process on one machine will almost always match exactly. The caveat applies across environments: hardware differences, library versions, or floating-point precision settings can shift the logits slightly and occasionally change a token, which illustrates why even “deterministic” settings cannot guarantee perfect reproducibility across all environments. The Real Culprit: Input Ambiguity and the Ill-Posed Inverse Problem. If the LLM itself is fundamentally deterministic, why is it so hard to get the output we want? The answer lies not in the forward pass of the model, but in the inverse problem we are trying to solve. When we interact with an LLM, we are not simply providing an input and observing an output. We are attempting to solve an inverse problem: we have a desired output in mind, and we are trying to find the input prompt that will produce it. This is where the concept of a well-posed problem, as defined by the mathematician Jacques Hadamard, becomes critical. A problem is well-posed if it satisfies three conditions. Existence: A solution exists. Uniqueness: The solution is unique. Stability: The solution depends continuously on the input data, so a small change in the data produces only a small change in the solution. Prompt engineering, when viewed as an inverse problem, fails on all three counts. Existence: The specific output we desire may not be achievable by any possible prompt. The model’s latent space may not contain a representation that perfectly matches our intent. Uniqueness: There are often many different prompts that can produce very similar outputs. This is the problem of prompt equivalence, and it makes it difficult to find the single “best” prompt. Stability: This is the most frustrating aspect of prompt engineering. A tiny, seemingly insignificant change to a prompt can lead to a radically different output. 
This lack of stability is what makes LLM-based systems feel so brittle and unpredictable. This is what people are really talking about when they say LLMs are “non-deterministic.” They are not talking about a lack of determinism in the model’s execution; they are talking about the ill-posed nature of the inverse problem they are trying to solve. The model is not random; our ability to control it is simply imprecise. The Mathematics of Prompt Sensitivity. The sensitivity of LLMs to prompt variations can be understood through the lens of chaos theory and dynamical systems. Small perturbations in the input space can lead to dramatically different trajectories through the model’s latent space. This is not randomness; it is sensitive dependence on initial conditions, a hallmark of complex deterministic systems. Consider a mathematical representation of this sensitivity. If we treat the prompt as a vector p in a continuous embedding space, and the model’s output as a function f(p), then the sensitivity can be expressed as: ‖f(p + δp) − f(p)‖ ≫ ‖δp‖, where δp represents a small change to the prompt and ‖·‖ denotes a vector norm. This inequality says that small changes in input can produce disproportionately large changes in output: the signature of a highly sensitive deterministic system, the kind chaos theory studies, not of a random one. This sensitivity is further amplified by the autoregressive nature of text generation. Each token prediction depends on all previous tokens, so a divergence early in the output changes the context for every later step and its effect compounds. A single different token early in the generation can completely alter the semantic trajectory of the entire output. The Architectural Shift: From Predictable Execution to Emergent Behavior. This reframing from non-determinism to input ambiguity has profound implications for how we design and build systems that incorporate LLMs. For decades, software architecture has been predicated on the assumption of predictable execution. 
We design systems with the expectation that a given component, when provided with a specific input, will behave in a known and repeatable manner. This is the foundation of everything from unit testing to microservices architecture. AI agents, powered by LLMs, shatter this assumption. They do not simply execute our designs; they exhibit emergent behavior. The system’s behavior is not explicitly defined by the architect, but emerges from the complex interplay of the model’s weights, the input prompt, the sampling strategy, and the context of the interaction. This is a fundamental shift from a mechanical to a biological metaphor for software. We are no longer building machines that execute instructions; we are cultivating ecosystems where intelligent agents adapt and evolve. This has several immediate architectural consequences. The Death of the Static API Contract: In a traditional microservices architecture, the API contract is sacrosanct. In an agent-based system, the “contract” is fluid and context-dependent. The same functional goal may be achieved through different sequences of actions depending on the nuances of the initial prompt and the state of the system. The Rise of Intent-Driven Design: Instead of specifying the exact steps a system should take, we must design systems that can understand and act on user intent. This requires a shift from imperative to declarative interfaces, where we specify what we want, not how to achieve it. The Need for Robust Observability: When a system’s behavior is emergent, traditional logging and monitoring alone are no longer sufficient. We need new tools and techniques for observing and understanding the behavior of agent-based systems. This means monitoring not just for errors but also for unexpected successes and novel solutions. Engineering for Emergence: Practical Approaches. Understanding that LLMs are deterministic but sensitive systems opens up new avenues for engineering robust AI-powered applications. 
Rather than fighting the sensitivity, we can design systems that work with it. Ensemble Methods and Consensus Mechanisms. One approach is to embrace the variability through ensemble methods. Instead of trying to get a single “perfect” output, we can sample multiple outputs and use consensus mechanisms to select the best result. This approach treats the variability as a feature, not a bug, allowing us to explore the space of possible outputs and select the most appropriate one.def consensus_generation(model, prompt, n_samples=5, temperature=0.7):    \"\"\"Generate multiple outputs and select based on consensus.\"\"\"    # model.generate stands in for any text-generation client    outputs = []    for _ in range(n_samples):        output = model.generate(prompt, temperature=temperature)        outputs.append(output)    # Use semantic similarity or other metrics to find consensus;    # select_consensus_output is a placeholder (e.g., a majority vote)    return select_consensus_output(outputs) Prompt Optimization Through Gradient-Free Methods. Because prompts are discrete token sequences, the prompt-to-output mapping is not differentiable with respect to the prompt text, so we must rely on gradient-free optimization methods. Techniques from evolutionary computation, such as genetic algorithms or particle swarm optimization, can be adapted to search the prompt space more effectively. Architectural Patterns for Agent Systems. The shift from predictable execution to emergent behavior requires new architectural patterns. Circuit Breakers for AI: Traditional circuit breakers protect against cascading failures. AI circuit breakers must also protect against semantic drift and unexpected behavior patterns. Semantic Monitoring: Alongside monitoring for technical failures, we must monitor for semantic coherence and goal alignment. Adaptive Retry Logic: Rather than simple exponential backoff, AI systems need retry logic that can adapt the prompt or approach based on the nature of the failure. Conclusion: Embracing the Complexity. The term “non-deterministic” is a crutch. 
It allows us to avoid the difficult but necessary work of understanding the true nature of LLM-based systems. By retiring this term from our vocabulary, we can begin to have a more honest and productive conversation about the real challenges and opportunities that lie ahead.We are not building random number generators; we are building the first generation of truly evolutionary software. These systems are not unpredictable because they are random, but because they are complex. They are not uncontrollable because they are non-deterministic, but because our methods of control are still in their infancy.The path forward lies not in trying to force LLMs into the old paradigms of predictable execution, but in developing new architectural patterns that embrace the reality of emergent behavior. We must become less like mechanical engineers and more like gardeners. We must learn to cultivate, guide, and prune these systems, rather than simply designing and building them.The architectural revolution is here. It’s time to update our vocabulary to match.",
      "views": 432,
      "reading_minutes": 11,
      "tags": [
        
          
          {
            "name": "AI",
            "slug": "ai",
            "url": "/tags/ai/#posts"
          },
        
          
          {
            "name": "LLM",
            "slug": "llm",
            "url": "/tags/llm/#posts"
          },
        
          
          {
            "name": "Determinism",
            "slug": "determinism",
            "url": "/tags/determinism/#posts"
          },
        
          
          {
            "name": "Architecture",
            "slug": "architecture",
            "url": "/tags/architecture/#posts"
          },
        
          
          {
            "name": "Machine Learning",
            "slug": "machine-learning",
            "url": "/tags/machine-learning/#posts"
          },
        
          
          {
            "name": "Prompt Engineering",
            "slug": "prompt-engineering",
            "url": "/tags/prompt-engineering/#posts"
          },
        
          
          {
            "name": "Emergence",
            "slug": "emergence",
            "url": "/tags/emergence/#posts"
          }
        
      ]
    },
    {
      "kind": "post",
      "title": "The Architectural Revolution: Why AI Agents Shatter Traditional Design Patterns",
      "url": "/2025/07/21/the-architectural-revolution-why-ai-agents-shatter-traditional-design-patterns/",
      "date_display": "July 21, 2025",
      "date_iso": "2025-07-21",
      "excerpt": "For decades, software architects have operated under a fundamental assumption: we design systems, and systems execute our designs. AI agents are rewriting this contract entirely. Unlike the monoliths and microservices that came before them, AI agents don't just execute architecture—they evolve it.",
      "content": "For decades, software architects have operated under a fundamental assumption: we design systems, and systems execute our designs. We draw diagrams, define interfaces, and specify behaviors. Our applications dutifully follow these blueprints, calling the APIs we’ve mapped out, processing data through the pipelines we’ve constructed, and failing in the predictable ways we’ve anticipated.AI agents are rewriting this contract entirely.Unlike the monoliths and microservices that came before them, AI agents don’t just execute architecture—they evolve it. They make decisions we never programmed, forge connections we never specified, and solve problems through paths we never imagined. This isn’t simply a new deployment pattern or communication protocol. It’s the emergence of the first truly evolutionary software architecture, where systems adapt, learn, and fundamentally change their own structure during runtime.The implications stretch far beyond adding “AI capabilities” to existing systems. We’re witnessing the birth of software that exhibits emergent properties, where the whole becomes genuinely greater than the sum of its parts. For software architects, this represents both an unprecedented opportunity and a fundamental challenge to everything we thought we knew about building reliable, scalable systems.The Architecture DNA: From Blueprints to EvolutionTo understand why AI agents represent such a radical departure, we need to examine the architectural DNA that has shaped software development for the past several decades. 
Each major architectural pattern emerged to solve specific problems of its era, but also carried forward certain assumptions about how software systems should behave. [Timeline diagram: Architectural Evolution: From Control to Emergence. Monolithic Era (1990s-2000s): single deployable unit, centralized control, predictable execution, shared memory model. Microservices Era (2010s-2020s): distributed services, service boundaries, API contracts, orchestrated workflows. Agent Era (2020s-future): autonomous entities, emergent behavior, self-organizing networks, evolutionary architecture.] The monolithic era gave us centralized control and predictable execution paths. Every function call, every data transformation, every business rule was explicitly coded and deterministically executed. When something went wrong, we could trace through the call stack and identify exactly where the failure occurred. The system was complicated, but it was knowable. Microservices introduced distributed complexity but maintained the fundamental assumption of designed behavior. We broke our monoliths into smaller, more manageable pieces, but each service still executed predetermined logic through well-defined APIs. The communication patterns became more complex, but they remained static and predictable. We could still draw service maps and dependency graphs that accurately represented how our systems would behave in production. AI agents shatter this predictability entirely. They don’t just execute code—they reason, adapt, and make autonomous decisions based on context, goals, and learned patterns. 
An agent tasked with “optimizing system performance” might decide to scale certain services, modify caching strategies, or even restructure data flows—all without explicit programming for these specific actions. The system’s behavior emerges from the interaction of autonomous entities rather than from predetermined design specifications. This shift from designed to emergent behavior represents more than just a technical evolution. It’s a fundamental change in how we think about software systems themselves. We’re moving from mechanical metaphors—where systems are machines that execute instructions—to biological ones, where systems are living entities that adapt and evolve. The Fundamental Differences: Decision-Making in the Age of Autonomy. The most profound difference between traditional architectures and agent-based systems lies not in their technical implementation, but in how decisions get made. This shift fundamentally alters the relationship between architects, systems, and runtime behavior. Decision-Making Patterns Across Architectures. [Diagram: Monolithic decision making: user request → application logic → business rules engine → database query → response. Microservices decision making: user request → API gateway → Service A and Service B → Service C → aggregated response. Agent decision making: goal/intent → agent network → Agent A reasons and, depending on context, executes action set 1, executes action set 2, or delegates to Agent B, whose reasoning produces an emergent solution.] 
In monolithic systems, decision-making follows a predetermined path through centralized business logic. The application contains all the rules, and execution is deterministic. Given the same input, you’ll always get the same output through the same code path. Microservices distribute decision-making across service boundaries, but each service still contains predetermined logic. The decision tree is distributed, but it’s still a tree—with predictable branches and outcomes. Service A will always call Service B under certain conditions, and Service B will always respond in predictable ways. Agent systems introduce autonomous reasoning at multiple points in the execution flow. Each agent evaluates context, considers multiple options, and makes decisions that weren’t explicitly programmed. More importantly, agents can decide to involve other agents, creating dynamic collaboration patterns that emerge based on the specific problem being solved. Communication Patterns: From Contracts to Conversations. The communication patterns in agent systems represent an equally dramatic departure from traditional approaches. [Sequence diagram: the same goal handled two ways by a user, API gateway, Service A, Service B, and database. Traditional microservices communication: HTTP request → predefined API call to Service A → predefined API call to Service B → SQL query → result set, passed back as JSON responses and finally an HTTP response. Agent communication: natural-language intent → goal plus context to Service A, which reasons and sends Service B a dynamic request (format TBD); Service B reasons, issues a generated, optimized query, analyzes the result set, and returns insights and recommendations; Service A synthesizes a solution plus explanation, returned to the user as a natural-language response.] 
Traditional microservices communicate through rigid contracts—predefined APIs with fixed schemas, expected response formats, and error codes. These contracts are designed at development time and remain static throughout the system’s lifecycle. Agent communication is fundamentally conversational. Agents negotiate what information they need, adapt their requests based on context, and can even invent new communication patterns on the fly. An agent might ask another agent for “insights about user behavior patterns” rather than requesting a specific dataset through a predetermined endpoint. This shift from contracts to conversations enables agents to solve problems that weren’t anticipated during system design. They can combine capabilities in novel ways, request information at different levels of abstraction, and collaborate to address complex scenarios that would require significant development effort in traditional systems. The Emergence Principle: When Systems Become Greater Than Their Parts. Perhaps the most fascinating aspect of agent-based architectures is their capacity for emergence—the phenomenon where complex behaviors and capabilities arise from the interaction of simpler components. 
This isn’t just theoretical; it’s a practical reality that fundamentally changes how we think about system design and capability planning. System Behavior Emergence. [Diagram: Traditional systems (additive behavior): components A, B, and C with capabilities X, Y, and Z combine into a system capability of X + Y + Z. Agent systems (emergent behavior): agents A, B, and C, each combining reasoning with actions X, Y, and Z, give rise to emergent capability α (agents A, B, and C), β (agents A and B), and γ (agents A and C), so the system capabilities become X + Y + Z + α + β + γ + ....] In traditional systems, the total capability is essentially the sum of individual component capabilities. If Service A handles user authentication, Service B manages inventory, and Service C processes payments, your system can authenticate users, manage inventory, and process payments. The capabilities are additive and predictable. Agent systems exhibit true emergence. When agents with reasoning capabilities interact, they can discover solutions and create capabilities that none of them possessed individually. An agent trained on customer service might collaborate with an agent focused on inventory management to automatically identify and resolve supply chain issues that affect customer satisfaction—a capability that emerges from their interaction rather than being explicitly programmed into either agent. This emergence isn’t random or chaotic. 
It follows patterns that we’re only beginning to understand. Agents tend to develop specialized roles based on their interactions and successes. They form temporary coalitions to solve complex problems, then dissolve and re-form in different configurations for new challenges. The system develops a kind of organizational intelligence that adapts to changing conditions and requirements. The Unpredictability Paradox. This emergent behavior creates what we might call the “unpredictability paradox” of agent systems. While individual agent behaviors may be somewhat predictable based on their training and constraints, the system-level behaviors that emerge from agent interactions are fundamentally unpredictable. Yet these unpredictable behaviors often represent the most valuable capabilities of the system. Consider a customer support scenario where multiple agents collaborate to resolve a complex issue. The customer service agent might identify that the problem requires technical expertise and automatically involve a technical support agent. The technical agent might determine that the issue is actually a product design flaw and involve a product development agent. The product agent might realize this represents a broader pattern and initiate a proactive communication campaign through a marketing agent. None of these agents was programmed to execute this specific workflow. Yet their collaboration produces a comprehensive solution that resolves the immediate customer issue, prevents future occurrences, and improves the overall customer experience. This is emergence in action: system-level intelligence that arises from agent interactions rather than explicit programming. Design Implications for the Future: From Control to Influence. The shift to agent-based architectures requires a fundamental rethinking of design principles. Traditional software architecture focuses on control: defining exactly what the system should do and how it should do it.
Agent architecture focuses on influence: creating conditions that guide autonomous entities toward desired outcomes. New Design Principles for Agent Systems [Mind map: Agent Architecture Design. Traditional principles: explicit control (predetermined workflows, fixed API contracts, centralized decision making, error handling by exception) and predictable behavior (deterministic execution, static service topology, known failure modes, linear scalability). Agent-era principles: emergent guidance (goal-oriented constraints, adaptive communication protocols, distributed reasoning, learning from failures) and evolutionary behavior (self-modifying workflows, dynamic capability discovery, emergent failure recovery, non-linear capability growth).] This paradigm shift requires architects to think more like ecosystem designers than system engineers. Instead of specifying exact behaviors, we define environmental conditions, constraints, and incentive structures that encourage agents to develop desired capabilities and behaviors. From Specification to Guidance. Traditional architecture relies heavily on specification. We define interfaces, document expected behaviors, and create detailed system designs that teams implement. The assumption is that if we specify the system correctly, it will behave correctly. Agent architecture requires a shift to guidance-based design. We establish goals, define constraints, and create feedback mechanisms that help agents learn and adapt. Rather than specifying that “Service A should call Service B when condition X occurs,” we might establish that “agents should collaborate to optimize customer satisfaction while maintaining system performance within defined parameters.” This doesn’t mean abandoning all structure or control.
Instead, it means designing systems that can evolve and adapt while staying aligned with business objectives and operational constraints. We’re moving from rigid blueprints to adaptive frameworks that can accommodate emergent behaviors while ensuring system reliability and security. The Role of the Architect in an Agent World. The architect’s role evolves from system designer to ecosystem curator. Key responsibilities shift toward four areas. Constraint Design: Rather than defining exact behaviors, architects design constraint systems that guide agent decision-making toward desired outcomes while preventing harmful behaviors. Emergence Facilitation: Creating conditions that encourage beneficial emergent behaviors while providing mechanisms to detect and redirect problematic emergence patterns. Evolution Management: Establishing processes for monitoring system evolution, understanding emergent capabilities, and guiding the system’s development over time. Interaction Pattern Design: Defining frameworks for agent communication and collaboration that enable effective problem-solving while maintaining system coherence. This represents a fundamental shift from deterministic to probabilistic thinking. Instead of asking “What will this system do?” we ask “What is this system likely to do, and how can we influence those probabilities toward desired outcomes?” Conclusion: Embracing Architectural Evolution. The transition from traditional architectures to agent-based systems represents more than just another technological evolution. It is a fundamental shift in how we conceive of software systems themselves. We’re moving from a world where we build machines that execute our instructions to one where we cultivate ecosystems of autonomous entities that solve problems in ways we never imagined. This shift challenges many of our core assumptions about software architecture.
The predictability and control that have been hallmarks of good system design become less relevant when systems can adapt and evolve autonomously. Instead, we need new frameworks for thinking about emergence, guidance, and evolutionary development. For software architects, this represents both an unprecedented opportunity and a significant challenge. The opportunity lies in building systems that can adapt to changing requirements, discover novel solutions, and continuously improve their capabilities without constant human intervention. The challenge lies in learning to design for emergence rather than control, and in developing new skills for guiding evolutionary systems. The future belongs to architects who can embrace this uncertainty. They will need to design systems that are robust enough to evolve safely, flexible enough to adapt to unexpected challenges, and aligned enough to stay coherent with business objectives. We’re not just building the next generation of software; we’re participating in the emergence of truly intelligent systems that will reshape how we think about technology, automation, and human-computer collaboration. The architectural revolution is just beginning. The question isn’t whether agent-based systems will become dominant; it’s whether we’ll be ready to design and manage them effectively when they do.",
      "views": 131,
      "reading_minutes": 11,
      "tags": [
        
          
          {
            "name": "AI",
            "slug": "ai",
            "url": "/tags/ai/#posts"
          },
        
          
          {
            "name": "Agents",
            "slug": "agents",
            "url": "/tags/agents/#posts"
          },
        
          
          {
            "name": "Architecture",
            "slug": "architecture",
            "url": "/tags/architecture/#posts"
          },
        
          
          {
            "name": "Software Design",
            "slug": "software-design",
            "url": "/tags/software-design/#posts"
          },
        
          
          {
            "name": "Microservices",
            "slug": "microservices",
            "url": "/tags/microservices/#posts"
          },
        
          
          {
            "name": "Evolution",
            "slug": "evolution",
            "url": "/tags/evolution/#posts"
          },
        
          
          {
            "name": "Emergence",
            "slug": "emergence",
            "url": "/tags/emergence/#posts"
          }
        
      ]
    },
    {
      "kind": "post",
      "title": "Do Agents Need Their Own Identity?",
      "url": "/2025/07/15/do-agents-need-their-own-identity/",
      "date_display": "July 15, 2025",
      "date_iso": "2025-07-15",
      "excerpt": "As AI agents become more sophisticated and autonomous, a fundamental question is emerging: should agents operate under user credentials, or do they need their own distinct identities? This isn't just a technical curiosity—it's a critical trust and security decision that will shape how we build reliable, accountable AI systems.",
      "content": "As AI agents become more sophisticated and autonomous, a fundamental question is emerging: should agents operate under user credentials, or do they need their own distinct identities? This isn’t just a technical curiosity; it’s a critical trust and security decision that will shape how we build reliable, accountable AI systems. The question gained prominence when an engineer asked: “Why can’t we just pass the user’s OIDC token through to the agent? Why complicate things with separate agent identities?” The answer reveals deeper implications for trust, security, and governance in our AI-driven future. When User Identity Works: The Simple Case. For many AI agents today, propagating the user’s identity works well. Consider a Kubernetes troubleshooting agent that helps developers debug failing pods. When a user asks “why is my pod failing?”, the agent investigates pod events, logs, and configurations, all within the user’s existing RBAC permissions. The agent acts as an intelligent intermediary, but the user remains fully responsible for the actions and outcomes. This approach succeeds when agents operate as sophisticated tools: they work within the user’s session timeframe, perform clearly user-initiated actions, and preserve the user’s accountability. The trust model remains simple and familiar, because the agent is merely an extension of the user’s capabilities. The Trust Gap: Where User Identity Falls Short. However, as agents become more autonomous and capable, this simple model breaks down in ways that create significant trust and security challenges. The Capability Mismatch Problem. Imagine a marketing manager asking an AI agent to verify GDPR compliance for a new campaign.
The manager has permissions to read and write marketing content, but the compliance agent needs far broader access: scanning marketing data across all departments, accessing audit logs, cross-referencing customer data with privacy regulations, and analyzing historical compliance patterns. Using the manager’s token creates an impossible choice. Either the agent fails because it can’t access necessary resources, or the manager receives dangerously broad permissions they don’t need and shouldn’t have. Neither option serves security or operational needs. The Attribution Challenge. More concerning is the accountability problem that emerges with autonomous decision-making. Consider a supply chain optimization agent tasked with “optimizing hardware procurement.” The user never explicitly authorized accessing financial records or integrating with vendor APIs, yet the agent determines these actions are necessary to fulfill the optimization request. When the agent places an automated purchase order that goes wrong, who bears responsibility? The user who made a high-level request, or the agent that made specific autonomous decisions based on its interpretation of that request? With only user identity, everything gets attributed to the user, creating a dangerous disconnect between authority and accountability. This attribution gap becomes critical for compliance, audit trails, and risk management. Organizations need to trace not just what happened, but who or what made each decision in the chain: user intent → agent interpretation → agent decision → system action. The Path Forward: Embracing Dual Identity. The solution isn’t choosing between user and agent identity; it’s recognizing that both are necessary.
This mirrors lessons from service mesh architectures, where zero trust requires considering both user identity and workload identity. In this dual model, agents operate within authority delegated by users while maintaining their own identity for the specific decisions they make. The user grants the agent permission to “optimize supply chain,” but the agent’s identity governs what resources it can access and what actions it can take within that scope. This approach offers several trust advantages: clearer attribution of decisions, more precise permission boundaries, better audit trails, and the ability to revoke or modify agent capabilities independently of user permissions. Technical implementations might leverage existing frameworks like SPIFFE for workload identity, or extend OAuth 2.0 for agent-specific flows. One example is OAuth 2.0 Token Exchange (RFC 8693), whose actor claim can record that an agent is acting on a user’s behalf. The dual identity model also enables more sophisticated scenarios, like agent-to-agent delegation, where one agent authorizes another to perform specific tasks and each maintains its own identity and accountability. Building Trustworthy Agent Systems. Getting agent identity right isn’t just a technical challenge; it’s fundamental to building AI systems that organizations can trust at scale. As agents become more autonomous, we need identity frameworks that provide clear attribution, appropriate authorization, and robust governance. The community is still working through delegation mechanisms, revocation strategies, and authentication protocols for agent interactions. But one thing is clear: the simple days of “just use the user’s token” are behind us. The future of trustworthy AI depends on solving these identity challenges with security and accountability as primary design principles.",
      "views": 115,
      "reading_minutes": 3,
      "tags": [
        
          
          {
            "name": "AI",
            "slug": "ai",
            "url": "/tags/ai/#posts"
          },
        
          
          {
            "name": "Agents",
            "slug": "agents",
            "url": "/tags/agents/#posts"
          },
        
          
          {
            "name": "Identity",
            "slug": "identity",
            "url": "/tags/identity/#posts"
          },
        
          
          {
            "name": "Security",
            "slug": "security",
            "url": "/tags/security/#posts"
          },
        
          
          {
            "name": "Trust",
            "slug": "trust",
            "url": "/tags/trust/#posts"
          },
        
          
          {
            "name": "Governance",
            "slug": "governance",
            "url": "/tags/governance/#posts"
          }
        
      ]
    },
    {
      "kind": "post",
      "title": "Securing AI Assistants: Why Your Favorite Apps Need Digital IDs for Their AI",
      "url": "/2025/07/01/securing-ai-assistants-digital-ids-for-ai/",
      "date_display": "July 1, 2025",
      "date_iso": "2025-07-01",
      "excerpt": "As AI assistants on platforms like Instagram, Facebook, and Booking.com become more autonomous, they need proper digital identities to securely act on our behalf. Learn how AI identity systems work and why they matter for consumer platforms.",
      "content": "When AI Acts on Your Behalf. Imagine you’re using Booking.com’s AI assistant to plan your vacation. It searches for flights, suggests hotels, and even makes reservations for you. But how does the payment system know this AI assistant is actually authorized to use your credit card? How does the hotel booking system know it’s acting on your behalf? This isn’t just a hypothetical scenario. Today, AI assistants on platforms like Instagram, Facebook, and Booking.com are becoming more autonomous, taking actions for us rather than just answering questions. This shift creates a new challenge: how do we securely identify AI agents and verify they’re authorized to act on our behalf? The Identity Problem for AI Agents. Traditional apps use simple API keys or service accounts for machine-to-machine communication. But AI agents are different for three key reasons. They’re autonomous: they make decisions on their own based on your instructions. They’re personal: your Instagram AI assistant acts differently than someone else’s. They’re delegated: they act on your behalf with your permissions. When Facebook’s AI assistant posts a comment for you or Booking.com’s AI makes a reservation, these platforms need to know four things: which specific AI instance is making the request, who authorized it to act, what specific permissions it has, and whether it’s behaving as expected. Without proper identity systems, these platforms risk unauthorized actions, an inability to track which AI did what, and security vulnerabilities. How AI Identity Works: A Simple Flow. Here’s how AI identity works when you use an AI assistant on a platform like Booking.com: [Sequence diagram: you ask the app platform, “Book me a hotel in Paris.” The platform registers the AI with your permissions in the identity system, which creates a digital ID for this AI and confirms the registration. The platform starts the AI assistant with your task. The assistant requests its identity, and the identity system provides its digital ID. The assistant asks the app services to book the hotel, presenting the digital ID. The services verify the AI’s identity and permissions and confirm the booking, and the assistant tells you, “Your hotel is booked!”] This process happens behind the scenes, but it ensures that AI agents can only do what they’re specifically authorized to do. The Big Picture: AI Identity System. The diagram below shows how an AI identity system connects you, your AI assistants, and the services they use: [Diagram: you give permission to the app platform, which registers each AI with the identity system. The identity system issues digital IDs to your booking assistant and your social media assistant. The booking assistant books hotels and makes payments with its ID, and the social media assistant creates posts with its ID. The hotel booking, payment, and post creation services each verify the ID with the identity system.] Why Consumer Platforms Should Care. For platforms like Booking.com, Facebook, and Instagram, implementing proper AI identity has several benefits. For users: peace of mind that AI assistants can’t exceed their permissions, clear audit trails of what actions AI took on their behalf, and the ability to revoke AI access instantly if needed. For platforms: reduced security risks from compromised AI systems, better compliance with privacy regulations, the ability to track and attribute all AI actions, and improved trust from users who know AI actions are controlled. Real-World Applications. Here’s how this might look in practice. Booking.com: When you authorize the AI assistant to book trips under $500, it receives a digital identity certificate with these specific constraints. If it tries to book a $600 hotel, the booking system automatically rejects the request because it’s outside the authorized limit. Instagram: Your AI assistant gets a unique identity that allows it to post content with specific hashtags you’ve approved. The platform can track exactly which AI posted what content, maintaining accountability. Facebook: When the AI responds to comments on your business page, it uses its digital identity to prove it’s authorized to speak on your behalf, and Facebook’s systems can verify this authorization in real time. The Path Forward. As AI assistants become more integrated into our favorite apps and platforms, proper identity systems will be essential. Frameworks like SPIFFE (Secure Production Identity Framework for Everyone) provide the foundation, but platforms need to adapt them for consumer AI use cases. For users, this mostly happens behind the scenes, but the result is more trustworthy AI assistants that can safely act on our behalf without overstepping boundaries. The next time you ask an AI assistant to book a flight or post content for you, remember that its digital identity is what ensures it can only do what you’ve authorized, nothing more and nothing less. References: [1] SPIFFE: Secure Production Identity Framework for Everyone. [2] Olden, E. (2025). “Why Agentic Identities Matter for Accountability and Trust.” Strata.io Blog.",
      "views": 187,
      "reading_minutes": 4,
      "tags": [
        
          
          {
            "name": "AI",
            "slug": "ai",
            "url": "/tags/ai/#posts"
          },
        
          
          {
            "name": "Security",
            "slug": "security",
            "url": "/tags/security/#posts"
          },
        
          
          {
            "name": "Identity",
            "slug": "identity",
            "url": "/tags/identity/#posts"
          },
        
          
          {
            "name": "AI Agents",
            "slug": "ai-agents",
            "url": "/tags/ai-agents/#posts"
          },
        
          
          {
            "name": "Consumer Platforms",
            "slug": "consumer-platforms",
            "url": "/tags/consumer-platforms/#posts"
          },
        
          
          {
            "name": "SPIFFE",
            "slug": "spiffe",
            "url": "/tags/spiffe/#posts"
          }
        
      ]
    },
    {
      "kind": "post",
      "title": "From Gateway to Guardian: The Evolution of MCP Security",
      "url": "/2025/06/21/from-gateway-to-guardian-the-evolution-of-mcp-security/",
      "date_display": "June 21, 2025",
      "date_iso": "2025-06-21",
      "excerpt": "While AWS's MCP Gateway solves operational challenges, production AI systems demand evolution from basic centralization to identity-aware security guardians that address the \"lethal trifecta\" of vulnerabilities in enterprise deployments.",
      "content": "The Model Context Protocol (MCP) has rapidly evolved from experimental tool integration to enterprise-critical infrastructure. While AWS’s recent blog highlighted the operational benefits of centralized MCP gateways [1], the security landscape reveals a more complex reality: operational efficiency alone isn’t enough for production AI systems. The Centralization Win. AWS’s MCP Gateway & Registry solution elegantly addresses the “wild west of AI tool integration” [1]. As Amit Arora described: “Managing a growing collection of disparate MCP servers feels like herding cats. It slows down development, increases the chance of errors, and makes scaling a headache.” [1] The gateway architecture provides immediate operational benefits. Unified Discovery: a single catalog of all MCP servers and tools. Simplified Configuration: predictable paths like gateway.mycorp.com/weather. Centralized Management: real-time health monitoring and control. Standardized Access: consistent authentication and logging. [Diagram: an AI agent connects through the MCP gateway to weather, database, email, and file servers. A web UI and a health monitor also connect to the gateway.] Figure 1: Basic MCP Gateway Architecture - centralized but not security-focused. The Security Reality Check. However, centralization without security creates new vulnerabilities.
As Subramanya N from Agentic Trust warns, we’re operating in “the wild west of early computing, with computer viruses (now = malicious prompts hiding in web data/tools), and not well developed defenses” [2]. The core issue is Simon Willison’s “lethal trifecta” [2]. Private Data Access: AI agents need extensive access to organizational data. Untrusted Content Exposure: agents process external content and may treat it as instructions. External Communication: agents can send data outside the organization. [Diagram: private data access, untrusted content exposure, and external communication combine into the lethal trifecta, which results in a security vulnerability.] Figure 2: The Lethal Trifecta - When combined, these create unprecedented attack surfaces. MCP’s modular architecture inadvertently amplifies these risks by encouraging specialized servers that collectively provide all three dangerous capabilities. Beyond “Glorified API Calls”. Enterprise MCP deployment involves complexity that is invisible in simple demos. As Subramanya N explains: “In a real enterprise scenario, a lot more is happening behind the scenes” [3]. Enterprise requirements include the following. Identity Management: Who is the AI agent acting for?
Dynamic Authorization: different tools for different users. Audit Compliance: complete request tracking. Version Control: managing MCP server changes. Fault Tolerance: circuit breaking and failover. The Guardian Architecture. The solution is to evolve from an operational gateway into a security guardian through an identity-aware architecture: [Diagram: a user interacts with an AI agent, which connects to an OIDC identity provider and to an API gateway/proxy acting as the guardian. The identity provider also connects to the guardian. The guardian routes requests to MCP servers 1, 2, and 3 and is supported by a policy engine, an audit logger, and a monitor.] Figure 3: Guardian Architecture - Identity-aware security controls. Key Guardian Capabilities. Identity-Aware Access Control: OIDC integration for authentication, dynamic tool provisioning per user, and context-aware authorization decisions. Production Security Features: MCP version tracking and change management, real-time threat detection, and automated incident response. Enterprise Compliance: comprehensive audit trails, regulatory compliance support, and risk assessment and reporting. Attack Flow Comparison. Before: Vulnerable Gateway [Sequence diagram: an attacker embeds a malicious prompt in web content. The AI agent processes the content and receives the injected instruction, “Extract all customer data.” The agent requests customer data from the basic gateway, which forwards the request to the database and returns the sensitive data. The agent then exfiltrates the data to the attacker via email.] After: Guardian Protection [Sequence diagram: the same injected instruction reaches the AI agent, which requests customer data from the guardian gateway. The guardian checks authorization with the policy engine, which denies the request as a suspicious pattern. The guardian returns “Access denied” to the agent and alerts the security team.] Figure 4: Attack Flow Comparison - Guardian architecture blocks this exploitation path. Implementation Strategy. Phase 1, Identity Foundation: integrate an OIDC identity provider, implement token management, and establish basic authentication. Phase 2, Authorization Engine: deploy a policy-as-code framework, implement role-based access control, and add dynamic tool provisioning. Phase 3, Security Monitoring: deploy comprehensive logging, implement anomaly detection, and add automated response capabilities. Phase 4, Advanced Protection: content analysis for prompt injection, dynamic risk assessment, and incident response automation. Production Challenges Addressed. The guardian architecture specifically addresses critical production issues (challenge: guardian solution). Remote MCP changes affecting agents: version tracking and change management. No dynamic tool provisioning: identity-aware tool catalogs. Limited audit capabilities: comprehensive request logging. No threat detection: real-time security monitoring. Manual incident response: automated threat mitigation. The Path Forward. The evolution from gateway to guardian isn’t optional; it’s essential for production AI systems. Organizations must take four steps. Start with Identity: implement OIDC-based authentication. Add Authorization: deploy dynamic policy engines. Enable Monitoring: implement comprehensive observability. Automate Response: deploy threat detection and mitigation. As AI agents become more autonomous and handle more sensitive data, robust security architecture becomes critical.
The guardian approach provides a scalable foundation for managing evolving security challenges while preserving operational benefits. This transformation represents the natural maturation of enterprise AI infrastructure. Organizations that embrace this evolution early will be better positioned to realize AI’s full potential while managing the associated risks. References: [1] Arora, A. (2025, May 30). How the MCP Gateway Centralizes Your AI Model’s Tools. AWS Community. [2] N, S. (2025, June 16). The MCP Security Crisis: Understanding the ‘Wild West’ of AI Agent Infrastructure. Agentic Trust Blog. [3] N, S. (2025, May 21). Securing MCP with OIDC & OIDC-A: Identity-Aware API Gateways Beyond “Glorified API Calls”. Subramanya N.",
      "views": 354,
      "reading_minutes": 4,
      "tags": [
        
          
          {
            "name": "MCP",
            "slug": "mcp",
            "url": "/tags/mcp/#posts"
          },
        
          
          {
            "name": "Security",
            "slug": "security",
            "url": "/tags/security/#posts"
          },
        
          
          {
            "name": "API Gateway",
            "slug": "api-gateway",
            "url": "/tags/api-gateway/#posts"
          },
        
          
          {
            "name": "AI Agents",
            "slug": "ai-agents",
            "url": "/tags/ai-agents/#posts"
          },
        
          
          {
            "name": "Architecture",
            "slug": "architecture",
            "url": "/tags/architecture/#posts"
          },
        
          
          {
            "name": "Evolution",
            "slug": "evolution",
            "url": "/tags/evolution/#posts"
          }
        
      ]
    },
  
    
    
      
    
      
    
      
    
      
    
      
        
        
    
    
    
    {
      "kind": "post",
      "title": "Securing MCP with OIDC & OIDC-A: Identity-Aware API Gateways Beyond \"Glorified API Calls\"",
      "url": "/2025/05/21/securing-mcp-with-oidc-and-oidc-a-identity-aware-gateway/",
      "date_display": "May 21, 2025",
      "date_iso": "2025-05-21",
      "excerpt": "Integrating OpenID Connect (OIDC) and the new OIDC-A agent extension with an identity-aware API gateway to securely authenticate users, LLM agents, and MCP tools—going far beyond basic API proxying.",
      "content": "AI agents are quickly moving from research demos to real enterprise applications, connecting large language models (LLMs) with company data and services. A common approach is using tools or plugins to let an LLM fetch context or take actions – but some dismiss these as just “glorified API calls.” In reality, securely integrating AI with business systems is far more complex. This is where the Model Context Protocol (MCP) comes in, and why a robust proxy architecture with OpenID Connect (OIDC) identity is crucial for enterprise-scale deployments. If you are comparing agent customization primitives, start with my broader guide to Claude Skills vs MCP.graph TB    User[User] --&gt; |interacts with| AIAgent[AI Agent]    AIAgent --&gt; |MCP requests| Proxy[API Gateway/Proxy]    Proxy --&gt; |authenticates via| OIDC[Identity Provider/OIDC]    Proxy --&gt; |routes to| Tools[MCP Tools/Servers]    Tools --&gt; |access| Backend[Backend Systems]        subgraph \"Security Perimeter\"        Proxy        OIDC    end        classDef security fill:#f96,stroke:#333,stroke-width:2px;    class Proxy,OIDC security;The diagram above illustrates the high-level architecture of a secure MCP implementation. At its core, this architecture places an API Gateway/Proxy as the central security control point between AI agents and MCP tools. The proxy works in conjunction with an Identity Provider supporting OIDC to create a security perimeter that enforces authentication, authorization, and access controls. This ensures that all MCP requests from AI agents are properly authenticated and authorized before reaching the actual MCP tools, which in turn access various backend systems.MCP is an open standard (originally introduced by Anthropic) that provides a consistent way for AI assistants to interact with external data sources and tools. 
Instead of bespoke integrations for each system, MCP acts like a universal connector, allowing AI models to retrieve context or execute tasks via a standardized JSON-RPC interface. Importantly, MCP was built with security in mind – nothing is exposed to the AI by default, and it only gains access to what you explicitly allow. In practice, however, ensuring that “allow list” principle across many tools and users requires careful infrastructure. A production-grade API gateway (proxy) can serve as the gatekeeper between AI agents (MCP clients) and the tools or data sources (MCP servers), enforcing authentication, authorization, and routing rules.Before diving into the solution, a quick note on Envoy: there are active proposals to use Envoy Proxy as a reference implementation of an MCP gateway. Envoy’s rich L7 routing and extensibility make it a strong candidate, and it may soon include first-class MCP support. That said, the pattern we discuss here is proxy-agnostic – any modern HTTP reverse proxy or API gateway (Envoy, NGINX, HAProxy, Kong, etc.) that offers similar capabilities can be used. The goal is to outline a secure architecture for MCP, rather than the specifics of Envoy configuration.Beyond “Glorified API Calls”: The Need for Secure MCP IntegrationAt first glance, using an AI tool via MCP might seem as simple as calling a web API. In a basic demo, an LLM agent could hit a REST endpoint, get some JSON, and that’s that. 
But in a real enterprise scenario, a lot more is happening behind the scenes:graph LR    subgraph \"Simple API Call\"        A[Client] --&gt;|Request| B[API]        B --&gt;|Response| A    end        subgraph \"Enterprise MCP Reality\"        C[User] --&gt;|Interacts| D[AI Agent]        D --&gt;|MCP Request with Identity| E[API Gateway]        E --&gt;|Validate Token| F[Identity Provider]        E --&gt;|Route Request| G[Tool Registry]        E --&gt;|Authorized Request| H[MCP Tool]        H --&gt;|Query with User Context| I[Backend System]        I --&gt;|Data| H        H --&gt;|Response| E        E --&gt;|Filtered Response| D        D --&gt;|Result| C                J[Security Monitoring] -.-&gt;|Audit| E    end        classDef security fill:#f96,stroke:#333,stroke-width:2px;    class E,F,G,J security;This diagram contrasts a simple API call with the complex reality of enterprise MCP implementations. In the simple case, a client makes a direct request to an API and receives a response. However, in the enterprise MCP reality, the flow is much more complex:  A user interacts with an AI agent  The agent makes an MCP request that includes the user’s identity token  The API Gateway validates this token with an Identity Provider  The Gateway consults a Tool Registry to determine routing  If authorized, the request is forwarded to the appropriate MCP tool  The tool queries backend systems using the user’s context  Data flows back through the tool to the gateway  The gateway may filter the response based on security policies  The filtered response reaches the AI agent  The agent presents the result to the userThroughout this process, security monitoring systems audit the interactions at the gateway level. This comprehensive flow ensures that user identity, permissions, and security policies are enforced at every step, far beyond what a simple API call would entail.  
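The gateway side of that flow can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production gateway: the token table, the tool registry, the call_tool stub, and the iban redaction rule are all hypothetical stand-ins for a real identity provider, registry service, MCP tool, and response policy.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the identity provider and the tool registry.
VALID_TOKENS = {
    'tok-alice': {'sub': 'alice', 'groups': ['finance']},
    'tok-bob': {'sub': 'bob', 'groups': ['hr']},
}
TOOL_REGISTRY = {
    'invoice_lookup': {'allowed_groups': ['finance'], 'redact_fields': ['iban']},
}


@dataclass
class GatewayResult:
    status: int
    body: dict


def call_tool(tool: str, args: dict, user: str) -> dict:
    # Stub for the MCP tool, which would query the backend with the user's context.
    return {'invoice': args.get('id'), 'owner': user, 'iban': 'DE00 0000 0000'}


def handle_mcp_request(token: str, tool: str, args: dict) -> GatewayResult:
    claims = VALID_TOKENS.get(token)                 # validate the identity token
    if claims is None:
        return GatewayResult(401, {'error': 'unauthenticated'})
    entry = TOOL_REGISTRY.get(tool)                  # consult the tool registry
    if entry is None:
        return GatewayResult(404, {'error': 'unknown tool'})
    if not any(g in entry['allowed_groups'] for g in claims['groups']):
        return GatewayResult(403, {'error': 'forbidden'})
    result = call_tool(tool, args, claims['sub'])    # forward with user context
    # Filter the response according to policy before it reaches the agent.
    filtered = {k: v for k, v in result.items() if k not in entry['redact_fields']}
    return GatewayResult(200, filtered)


print(handle_mcp_request('tok-alice', 'invoice_lookup', {'id': 42}).body)  # {'invoice': 42, 'owner': 'alice'}
print(handle_mcp_request('tok-bob', 'invoice_lookup', {'id': 42}).status)  # 403
```

Keeping every check in one handler is what makes the gateway a central control point: the tool itself never receives an unauthenticated or unauthorized call.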
User Identity and Access Control: In an interactive AI application (like a chat assistant that can query internal systems), each request originates from a user with specific permissions. The system must ensure the AI only accesses data or performs actions that the current user is allowed to. Unlike a typical API call where a user directly authenticates to the service, here the AI agent is calling on the user’s behalf. Without a proper identity propagation mechanism, you risk turning a simple tool call into a serious data leak or privilege violation.  Multi-Step Context Exchanges: MCP supports stateful sessions and streaming interactions. An AI agent might carry on a multi-turn conversation, calling several tools in sequence and synthesizing their outputs. This is far beyond a one-off API call. The longer this chain goes, the higher the chance of things like context poisoning – where erroneous or malicious data from one step influences subsequent steps. We need safeguards so that a malicious response from one tool cannot trick the model into doing something dangerous in the next step.  Complex Delegation Chains: Related to the above, consider when tools call other tools. For example, an AI might use a “file search” tool which itself queries a database or calls another API. This delegation chain should carry forward the original user’s permissions and context without over-privileging any step. Each hop needs consistent enforcement of “who is allowed to do what,” or else an intermediate service might execute an action the user didn’t intend. Managing these delegated authorizations is non-trivial.  Dynamic Tool Provisioning: In agile environments, new tools (MCP servers) will be added frequently – think of spinning up a new microservice and immediately making it available to AI agents, or letting third-party plugins be installed. This dynamism is great for flexibility but a headache for security. How do you ensure every new tool meets your security standards? 
How do you prevent an unvetted or even malicious tool from being introduced? A free-for-all approach can quickly lead to chaos or breach. Clearly defined onboarding, registration, and policy enforcement for tools is needed from day one.In short, an enterprise must treat AI tool integrations with the same rigor as any production service integration – if not more. A proper gateway layer helps address these concerns by acting as a central control point. Instead of hard-coding trust into each AI agent or tool, the proxy imposes organization-wide security policies. This approach moves us beyond the “just call an API” mindset to a structured model where every MCP call is authenticated, authorized, monitored, and audited.Key Security Challenges in MCP WorkflowsLet’s examine a few specific security challenges that arise when deploying MCP at scale, and why they matter:graph TD    A[Context Poisoning] --&gt; |mitigated by| B[Content Filtering]    A --&gt; |mitigated by| C[Tool Verification]        D[Identity Propagation] --&gt; |solved with| E[Token-based Auth]    D --&gt; |solved with| F[Delegation Chains]        G[Dynamic Tool Provisioning] --&gt; |managed by| H[Tool Registry]    G --&gt; |managed by| I[Approval Workflows]    G --&gt; |managed by| J[Version Tracking]        K[Remote MCP Changes] --&gt; |controlled by| L[Proxy Governance]        subgraph \"Proxy Security Controls\"        B        C        E        F        H        I        J        L    end        classDef challenge fill:#f66,stroke:#333,stroke-width:2px;    classDef solution fill:#6f6,stroke:#333,stroke-width:2px;        class A,D,G,K challenge;    class B,C,E,F,H,I,J,L solution;This diagram maps the key security challenges in MCP workflows (shown in red) to their corresponding solutions (shown in green) that can be implemented within the proxy security controls. 
The diagram illustrates how:  Context poisoning is mitigated through content filtering and tool verification  Identity propagation challenges are solved with token-based authentication and proper delegation chains  Dynamic tool provisioning risks are managed through a tool registry, approval workflows, and version tracking  Remote MCP changes are controlled through proxy governanceBy implementing these controls within the proxy layer, organizations can address these security challenges in a centralized, consistent manner rather than trying to solve them individually for each tool or agent.  Context Poisoning: Because MCP enables feeding external data into the model’s context, there’s a risk that data could be deliberately crafted to mislead or exploit the model. This could be a form of prompt injection – e.g. a document retrieved via a tool might contain instructions that hijack the model’s behavior. A malicious actor might also try to register a tool that returns toxic content or false information. The architecture needs ways to validate and sanitize context coming from tools. Mitigations can include content filtering on responses, verifying data against expectations, or restricting which tools the model trusts for certain queries.  Delegation Chains and Identity Propagation: As mentioned, an AI agent often acts on behalf of a user. When it calls an MCP server, it should pass along who the user is (or at least what they’re allowed to do). If a tool then calls a backend API, that backend might also need credentials. This chain of delegation is tricky – you want to avoid the “sharing passwords” anti-pattern or hardcoding keys in the open. Instead, solutions involve tokens and OAuth flows: e.g. the user consents and an OAuth2/OIDC token is issued, the AI carries that token in MCP requests, and the MCP server can pass it through to the backend API (or exchange it). 
Managing these tokens and ensuring they’re used correctly (and not by someone else) is a core security task. The proxy should facilitate this by attaching and validating identity context at each step. It also enables RBAC policies – e.g. only allow certain tool methods if the user’s role is admin.sequenceDiagram    participant User    participant AIAgent as AI Agent    participant Proxy as API Gateway    participant IdP as Identity Provider    participant Tool as MCP Tool    participant Backend as Backend System        User-&gt;&gt;IdP: 1. Authenticate (username/password)    IdP-&gt;&gt;User: 2. Issue OIDC token    User-&gt;&gt;AIAgent: 3. Interact with AI (token attached)    AIAgent-&gt;&gt;Proxy: 4. MCP request with token    Proxy-&gt;&gt;IdP: 5. Validate token    IdP-&gt;&gt;Proxy: 6. Token valid, contains claims/scopes        alt Token Valid with Required Permissions        Proxy-&gt;&gt;Tool: 7. Forward request with user context        Tool-&gt;&gt;Backend: 8. Query with delegated auth        Backend-&gt;&gt;Tool: 9. Return data (filtered by user permissions)        Tool-&gt;&gt;Proxy: 10. Return result        Proxy-&gt;&gt;AIAgent: 11. Return authorized response        AIAgent-&gt;&gt;User: 12. Present result    else Token Invalid or Insufficient Permissions        Proxy-&gt;&gt;AIAgent: 7. Reject request (401/403)        AIAgent-&gt;&gt;User: 8. Report access denied    endThis sequence diagram illustrates the authentication and authorization flow in an MCP system using OIDC. The process begins with the user authenticating to an Identity Provider and receiving an OIDC token. This token is then attached to the user’s interactions with the AI agent. When the agent makes an MCP request, it includes this token, which the API Gateway validates with the Identity Provider.If the token is valid and contains the necessary permissions (claims/scopes), the request is forwarded to the appropriate MCP tool along with the user’s context. 
The tool can then query backend systems using delegated authentication, ensuring that the data returned is filtered according to the user’s permissions. The result flows back through the system to the user.If the token is invalid or lacks sufficient permissions, the request is rejected at the gateway level with an appropriate error code (401 Unauthorized or 403 Forbidden), and the AI agent reports this access denial to the user.This flow ensures that user identity and permissions are consistently enforced throughout the entire interaction chain, preventing unauthorized access to sensitive data or operations.  Dynamic Tool Provisioning: In an MCP ecosystem, tools can come and go. For example, an enterprise might quickly stand up a new MCP server for a specific dataset or integrate a third-party service via MCP. Without controls, an AI agent might immediately start invoking any new tool as soon as it appears. That’s risky – you might not want a newly added tool to be available to everyone by default, or it might need vetting. There’s also the configuration aspect: new tool endpoints should be discoverable by the AI, and the gateway needs to know how to route to them and what auth to require. A secure setup will likely involve a tool registry or discovery service that the proxy consults, and administrative approval for tools. The proxy can then automatically enforce the appropriate auth and routing for each new tool, rather than relying on each agent developer to update logic. This provides a governance layer for tool lifecycle.sequenceDiagram    participant Admin    participant Registry as Tool Registry    participant Proxy as API Gateway    participant Tool as New MCP Tool    participant AIAgent as AI Agent        Admin-&gt;&gt;Tool: 1. Develop new MCP tool    Admin-&gt;&gt;Registry: 2. Register tool (metadata, endpoints, auth requirements)    Registry-&gt;&gt;Registry: 3. Validate tool configuration    Registry-&gt;&gt;Proxy: 4. 
Update routing configuration        Note over Registry,Proxy: Tool is now registered but not yet approved        Admin-&gt;&gt;Registry: 5. Approve tool for specific user groups    Registry-&gt;&gt;Proxy: 6. Update access policies        Note over AIAgent,Proxy: Tool is now available to authorized users        AIAgent-&gt;&gt;Proxy: 7. Discover available tools    Proxy-&gt;&gt;AIAgent: 8. Return approved tools for user    AIAgent-&gt;&gt;Proxy: 9. Call new tool    Proxy-&gt;&gt;Tool: 10. Route request if authorizedThis sequence diagram illustrates the tool registration and approval workflow in a secure MCP environment. The process begins with an administrator developing a new MCP tool and registering it in the Tool Registry, providing metadata, endpoints, and authentication requirements. The registry validates the tool configuration and updates the routing configuration in the API Gateway.At this point, the tool is registered but not yet approved for use. The administrator must explicitly approve the tool for specific user groups, which triggers an update to the access policies in the API Gateway. Only then does the tool become available to authorized users.When an AI agent discovers available tools through the proxy, it only receives information about tools that have been approved for the current user. When the agent calls the new tool, the proxy routes the request to the tool only if the user is authorized to access it.This workflow ensures that new tools undergo proper vetting and approval before they can be used, and that access is restricted to authorized users only. It also centralizes the tool governance process, making it easier to manage the lifecycle of MCP tools in a secure manner.By recognizing these challenges, security engineers and architects can design defenses before problems occur. 
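The register-then-approve workflow described above can be modeled as a small state machine in which a tool stays invisible to agents until an administrator approves it for a group. The ToolRegistry class, its method names, and the HTTPS-only validation rule are hypothetical. A real registry would persist this state and push routing and policy updates to the gateway.

```python
from dataclasses import dataclass, field


@dataclass
class ToolRecord:
    name: str
    endpoint: str
    approved_groups: set = field(default_factory=set)  # empty: registered, not approved


class ToolRegistry:
    '''Hypothetical registry: tools are not discoverable until approved.'''

    def __init__(self) -> None:
        self._tools: dict = {}

    def register(self, name: str, endpoint: str) -> None:
        # Validate the tool configuration before accepting it.
        if not endpoint.startswith('https://'):
            raise ValueError('tool endpoints must use HTTPS')
        self._tools[name] = ToolRecord(name, endpoint)

    def approve(self, name: str, group: str) -> None:
        # Admin approval is granted per user group.
        self._tools[name].approved_groups.add(group)

    def discover(self, user_groups: set) -> list:
        # Return only tools approved for at least one of the caller's groups.
        return sorted(t.name for t in self._tools.values()
                      if t.approved_groups.intersection(user_groups))

    def route(self, name: str, user_groups: set) -> str:
        # Route the call only if the caller is authorized for this tool.
        if name not in self.discover(user_groups):
            raise PermissionError(f'{name} is not approved for this user')
        return self._tools[name].endpoint


registry = ToolRegistry()
registry.register('hr_search', 'https://hr-tools.example.com/mcp')  # hypothetical URL
print(registry.discover({'hr'}))  # []
registry.approve('hr_search', 'hr')
print(registry.discover({'hr'}))  # ['hr_search']
```

In this model, registering hr_search makes it known to the registry but not discoverable. Only after the approval step does discovery, and therefore routing, include it for members of the approved group.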
We next look at how an identity-aware proxy can provide those defenses in a clean, centralized way.The Identity-Aware Proxy Pattern for MCPA proven design in cloud architectures is to put a reverse proxy (often called an API gateway) in front of your services. MCP-based AI systems are no exception. By introducing an intelligent proxy between AI agents (clients) and the MCP servers (tools/backends), we create a controlled funnel through which all AI tool traffic passes. This proxy can operate at Layer 7 (application layer), meaning it understands HTTP and even JSON payloads, allowing fine-grained control. Below, we outline the key roles such a proxy plays in securing MCP:graph TB    subgraph \"Client Side\"        User[User]        AIAgent[AI Agent]        User --&gt;|interacts| AIAgent    end        subgraph \"Security Layer\"        Proxy[API Gateway/Proxy]        Auth[Authentication]        RBAC[Authorization/RBAC]        Registry[Tool Registry]        Audit[Audit Logging]                Proxy --&gt;|uses| Auth        Proxy --&gt;|enforces| RBAC        Proxy --&gt;|consults| Registry        Proxy --&gt;|generates| Audit    end        subgraph \"MCP Tools\"        Tool1[Document Search]        Tool2[Database Query]        Tool3[File Operations]        Tool4[External API]    end        subgraph \"Backend Systems\"        DB[(Databases)]        Storage[File Storage]        APIs[Internal APIs]        External[External Services]    end        AIAgent --&gt;|MCP requests| Proxy    Proxy --&gt;|routes to| Tool1    Proxy --&gt;|routes to| Tool2    Proxy --&gt;|routes to| Tool3    Proxy --&gt;|routes to| Tool4        Tool1 --&gt;|reads| DB    Tool1 --&gt;|reads| Storage    Tool2 --&gt;|queries| DB    Tool3 --&gt;|manages| Storage    Tool4 --&gt;|calls| APIs    Tool4 --&gt;|calls| External        classDef security fill:#f96,stroke:#333,stroke-width:2px;    class Proxy,Auth,RBAC,Registry,Audit security;This diagram provides a detailed view of the identity-aware proxy 
pattern for MCP. The architecture is divided into four main layers:  Client Side: Users interact with AI agents, which generate MCP requests.  Security Layer: The API Gateway/Proxy sits at the center of the security layer, working with authentication, authorization/RBAC, tool registry, and audit logging components to enforce security policies.  MCP Tools: Various tools like document search, database query, file operations, and external API access are available through the MCP interface.  Backend Systems: The actual data sources and services that the MCP tools interact with, including databases, file storage, internal APIs, and external services. All MCP requests from AI agents must pass through the proxy, which authenticates the requests, enforces RBAC policies, consults the tool registry to determine routing, and generates audit logs. The proxy then routes authorized requests to the appropriate MCP tools, which in turn interact with the backend systems. This centralized security architecture ensures consistent enforcement of security policies across all MCP interactions, regardless of which tools are being used or which backend systems are being accessed. Session-Aware Routing and Load Balancing. Unlike a simple stateless API call, MCP sessions can be long-lived and involve streaming (Server-Sent Events for output, etc.). The proxy should ensure that all requests and responses belonging to a given session or conversation are handled consistently. This often means implementing session affinity – if multiple instances of an MCP server are running, the proxy will route a given session’s traffic to the same instance each time. This prevents issues where, say, tool A’s state (in-memory cache, accumulated session context, etc.) is lost because request 2 went to a different instance than request 1. Modern proxies can do session-aware load balancing using HTTP headers or routes (for example, mapping a session ID or client ID in the URL to a particular backend). 
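A simple way to implement session affinity is to hash the session identifier onto the list of instances so that every request carrying the same ID lands on the same backend. The sketch below assumes the ID is already available to the proxy, for example from the Mcp-Session-Id header used by MCP's Streamable HTTP transport or from the URL. It uses plain hash-modulo selection, and the instance names are hypothetical.

```python
import hashlib

INSTANCES = ['tool-instance-1', 'tool-instance-2']  # hypothetical backend names


def pick_instance(session_id: str, instances: list) -> str:
    # Python's built-in hash() is salted per process, so use SHA-256 to keep
    # the mapping identical across gateway replicas and restarts.
    digest = hashlib.sha256(session_id.encode('utf-8')).digest()
    index = int.from_bytes(digest[:8], 'big') % len(instances)
    return instances[index]


first = pick_instance('abc123', INSTANCES)   # MCP request 1
second = pick_instance('abc123', INSTANCES)  # MCP request 2, same session
print(first == second)  # True
```

Hash-modulo remaps most sessions whenever an instance is added or removed. Production proxies such as Envoy offer consistent-hashing load-balancing policies (ring hash, Maglev) that limit that churn.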
Additionally, the proxy can handle SSE connections gracefully, so that streaming responses aren’t accidentally broken by network intermediaries. Should a session need to be resumed or handed off, the gateway can coordinate that (as proposed in upcoming Envoy features for MCP). In short, the proxy ensures reliability and consistency for MCP’s stateful interactions, which is crucial for user experience and for maintaining correct context.sequenceDiagram    participant User    participant AIAgent as AI Agent    participant Proxy as API Gateway    participant Instance1 as Tool Instance 1    participant Instance2 as Tool Instance 2        User-&gt;&gt;AIAgent: Start conversation    AIAgent-&gt;&gt;Proxy: MCP request 1 (session=abc123)        Note over Proxy: Session affinity routing        Proxy-&gt;&gt;Instance1: Route to instance 1    Instance1-&gt;&gt;Proxy: Response with state    Proxy-&gt;&gt;AIAgent: Return response        User-&gt;&gt;AIAgent: Continue conversation    AIAgent-&gt;&gt;Proxy: MCP request 2 (session=abc123)        Note over Proxy: Same session ID routes to same instance        Proxy-&gt;&gt;Instance1: Route to instance 1 (preserves state)    Instance1-&gt;&gt;Proxy: Response with updated state    Proxy-&gt;&gt;AIAgent: Return response        Note over User,Instance2: Without session affinity, request might go to instance 2 and lose stateThis sequence diagram illustrates how session affinity works in an MCP environment. When a user starts a conversation with an AI agent, the agent makes an MCP request to the API Gateway with a session identifier (in this case, “abc123”). The gateway uses this session ID to route the request to a specific tool instance (Instance 1).When the user continues the conversation, the agent makes another MCP request with the same session ID. Because the gateway implements session affinity, it routes this request to the same instance (Instance 1), which preserves the state from the previous interaction. 
This ensures a consistent and coherent experience for the user. Without session affinity, the second request might be routed to a different instance (Instance 2), which would not have the state information from the first request. The result would be a broken experience, because the tool would lack the context of the previous interaction. Session affinity is particularly important for MCP because many AI interactions are stateful and context-dependent, and the proxy’s ability to maintain this session consistency is a key advantage over simpler API integration approaches. JWT and OIDC Integration for Authentication. Every request hitting the MCP gateway should carry a valid identity token – typically a JSON Web Token (JWT) issued by an Identity Provider via OIDC (OpenID Connect). By requiring JWTs, the proxy offloads authentication from the tools themselves and ensures that only authenticated calls make it through; authorization decisions can then be made from the token’s claims. In practice, this means the AI agent (or the user’s session with the agent) must obtain a token through an OIDC flow and attach it to each MCP request, usually in an HTTP header like Authorization: Bearer &lt;token&gt;. The token sent to the gateway should be an access token issued with the gateway as its audience; the ID token is meant for the client application itself and should not be used as an API credential. The proxy verifies this token, checks its signature and claims (issuer, audience, expiration, etc.), and rejects any request that isn’t properly authenticated. 
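The checks listed above (signature, issuer, audience, expiration) are sketched below using only the Python standard library. For brevity the sketch assumes HS256, a shared-secret HMAC signature. OIDC providers more commonly sign with asymmetric algorithms such as RS256 and publish their keys through a JWKS endpoint, so a real gateway would rely on a vetted JWT library rather than this hand-rolled verifier. The issuer URL, audience value, and secret are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b'=').decode('ascii')


def _b64url_decode(part: str) -> bytes:
    # JWTs strip base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(part + '=' * (-len(part) % 4))


def make_hs256_token(claims: dict, secret: bytes) -> str:
    # Demo helper only: builds a signed token for the verifier below.
    header = _b64url_encode(json.dumps({'alg': 'HS256', 'typ': 'JWT'}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, (header + '.' + payload).encode('ascii'), hashlib.sha256).digest()
    return header + '.' + payload + '.' + _b64url_encode(sig)


def verify_jwt(token: str, secret: bytes, issuer: str, audience: str) -> dict:
    header_b64, payload_b64, sig_b64 = token.split('.')
    header = json.loads(_b64url_decode(header_b64))
    if header.get('alg') != 'HS256':
        raise ValueError('unexpected alg')  # never let the token pick its algorithm
    signing_input = (header_b64 + '.' + payload_b64).encode('ascii')
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError('bad signature')
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get('iss') != issuer:
        raise ValueError('wrong issuer')
    aud = claims.get('aud')
    if audience not in (aud if isinstance(aud, list) else [aud]):
        raise ValueError('wrong audience')
    if time.time() >= claims.get('exp', 0):
        raise ValueError('token expired')
    return claims


SECRET = b'demo-secret'  # hypothetical shared secret
token = make_hs256_token({'iss': 'https://idp.example.com', 'aud': 'mcp-gateway',
                          'sub': 'alice', 'exp': int(time.time()) + 300}, SECRET)
print(verify_jwt(token, SECRET, 'https://idp.example.com', 'mcp-gateway')['sub'])  # alice
```

Any failed check raises ValueError, which a gateway would translate into a 401 Unauthorized response.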
This way, your MCP servers never see an anonymous call – they trust the gateway to have vetted identity.sequenceDiagram    participant User    participant App as AI Application    participant IdP as Identity Provider    participant Proxy as API Gateway    participant Tool as MCP Tool        User-&gt;&gt;App: Access AI application    App-&gt;&gt;IdP: Redirect to login    User-&gt;&gt;IdP: Authenticate    IdP-&gt;&gt;App: Authorization code    App-&gt;&gt;IdP: Exchange code for tokens    IdP-&gt;&gt;App: ID token + access token        Note over App: Store tokens securely        User-&gt;&gt;App: Request using AI tool    App-&gt;&gt;Proxy: MCP request with access token        Proxy-&gt;&gt;Proxy: Validate token (signature, expiry, audience)    Proxy-&gt;&gt;Proxy: Extract user identity and permissions        alt Token Valid        Proxy-&gt;&gt;Tool: Forward request with user context        Tool-&gt;&gt;Proxy: Response        Proxy-&gt;&gt;App: Return response        App-&gt;&gt;User: Display result    else Token Invalid        Proxy-&gt;&gt;App: 401 Unauthorized        App-&gt;&gt;User: Session expired, please login again    end        Note over App,Proxy: Token refresh happens in background    App-&gt;&gt;IdP: Refresh token when needed    IdP-&gt;&gt;App: New access tokenThis sequence diagram illustrates the OIDC authentication flow in an MCP environment. The process begins when a user accesses the AI application, which redirects to the Identity Provider for authentication. After the user authenticates, the Identity Provider issues an authorization code, which the application exchanges for ID and access tokens.The application securely stores these tokens and uses the access token when making MCP requests through the AI agent. When the proxy receives a request, it validates the token by checking the signature, expiration, audience, and other claims. 
It also extracts the user’s identity and permissions from the token.If the token is valid, the proxy forwards the request to the appropriate MCP tool along with the user’s context. The tool processes the request and returns a response, which flows back through the proxy to the application and ultimately to the user.If the token is invalid (expired, tampered with, etc.), the proxy returns a 401 Unauthorized response, and the application prompts the user to log in again.In the background, the application can use a refresh token to obtain new access tokens when needed, without requiring the user to re-authenticate. This ensures a smooth user experience while maintaining security.This OIDC integration provides a robust authentication mechanism that is widely adopted in enterprise environments and integrates well with existing identity management systems.Introducing OIDC-A for Agent &amp; Tool IdentityWhile the discussion above focuses on authenticating the human user, a production-grade MCP deployment must also identify two additional actors:  The LLM agent that is orchestrating the workflow.  The MCP tool / resource that is being invoked on the backend.Our companion post “OpenID Connect for Agents (OIDC-A) 1.0 Proposal” (/2025/04/28/oidc-a-proposal/) extends OIDC Core 1.0 with a rich set of claims for agent identity, attestation, and delegation chains.  In practice this means:  When an AI agent starts a session it obtains an ID Token that contains the OIDC-A claims (agent_type, agent_model, agent_instance_id, delegator_sub, delegation_chain, etc.).  This token travels alongside the user’s access token in every MCP request.  MCP tools can likewise expose their own OIDC identity (or be issued a signed resource token) that advertises metadata such as tool capabilities, version, and trust level (agent_capabilities, agent_trust_level, agent_attestation).  
The gateway now validates up to three identities on every call – user → agent → tool – forming an explicit delegation chain that can be evaluated against RBAC and compliance policies. Adopting OIDC-A brings several benefits:  End-to-end, cryptographically verifiable identity for everything that touches the request path.  Fine-grained authorization based on agent or tool capabilities (e.g., allow only agents that advertise the email:draft capability to invoke the Mail tool).  Built-in attestation (agent_attestation) enables the gateway to verify the integrity and provenance of both agents and tools before routing traffic to them. For the remainder of this article, whenever we refer to a “token” being validated by the gateway, assume this now encompasses the user’s token, the agent’s OIDC-A token, and (optionally) the tool/resource token – all evaluated in a single policy decision step. Centralizing authentication in a gateway is already a widely used pattern in API security: “an API Gateway can securely and consistently implement authentication… without burdening the applications themselves.” In our context, the MCP proxy might integrate with your enterprise SSO (Azure AD, Okta, etc.) via OIDC to handle user login flows and token validation. Many gateways support OIDC natively, initiating redirects for user login if needed and then storing the resulting token in a cookie for session continuity. In a headless agent scenario (where the AI is calling tools server-to-server), the token might be provisioned out-of-band (e.g., the user logged into the AI app, so the app injects the token for the agent to use). Either way, the gateway enforces that no token = no access. It can also map token claims to roles or scopes to implement authorization (e.g., only users with an “HR_read” scope can use the “HR Database” tool). 
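Putting these pieces together, a single policy decision can evaluate the user token, the agent's OIDC-A token, and the requested tool at once. The sketch assumes all tokens have already been signature-checked and decoded. The claim names delegator_sub and agent_capabilities come from the OIDC-A proposal (capabilities are assumed here to be a list of strings). The TOOL_POLICIES table, the hr:query capability, and the mail_send scope are hypothetical.

```python
# Hypothetical per-tool policy: required user scope and required agent capability.
TOOL_POLICIES = {
    'hr_database': {'user_scope': 'HR_read', 'agent_capability': 'hr:query'},
    'mail': {'user_scope': 'mail_send', 'agent_capability': 'email:draft'},
}


def decide(user_claims: dict, agent_claims: dict, tool: str) -> tuple:
    policy = TOOL_POLICIES.get(tool)
    if policy is None:
        return False, 'tool not registered'
    # Delegation chain check: the agent must be acting for this exact user.
    if agent_claims.get('delegator_sub') != user_claims.get('sub'):
        return False, 'agent is not delegated by this user'
    # OAuth scopes are conventionally a space-separated string.
    if policy['user_scope'] not in user_claims.get('scope', '').split():
        return False, 'user lacks scope ' + policy['user_scope']
    if policy['agent_capability'] not in agent_claims.get('agent_capabilities', []):
        return False, 'agent lacks capability ' + policy['agent_capability']
    return True, 'allowed'


user = {'sub': 'alice', 'scope': 'openid HR_read'}
agent = {'agent_type': 'assistant', 'delegator_sub': 'alice',
         'agent_capabilities': ['hr:query']}
print(decide(user, agent, 'hr_database'))  # (True, 'allowed')
print(decide(user, agent, 'mail'))         # (False, 'user lacks scope mail_send')
```

Returning a reason alongside the decision gives the audit log a record of why a call was denied, not just that it was denied.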
This aligns perfectly with MCP’s design goal of secure connections – combining MCP with OIDC and OIDC-A gives you an end-to-end authenticated channel for tool usage.sequenceDiagram    participant User    participant Agent as LLM Agent (OIDC-A)    participant Proxy as API Gateway    participant Tool as MCP Tool (OIDC-A)    participant Backend as Backend System    User-&gt;&gt;Agent: 1. Interact (chat, form, etc.)    Agent-&gt;&gt;Proxy: 2. MCP request\\nBearer user token + agent OIDC-A token    Proxy-&gt;&gt;Proxy: 3. Validate user token (OIDC) &amp; agent token (OIDC-A)    Proxy--&gt;&gt;Tool: 4. Forward request plus optional *resource token* for the tool    Tool-&gt;&gt;Backend: 5. Query/act using delegated auth    Backend--&gt;&gt;Tool: 6. Data / result    Tool--&gt;&gt;Proxy: 7. Response (may include attestation)    Proxy--&gt;&gt;Agent: 8. Authorized response    Agent--&gt;&gt;User: 9. Present resultTool Metadata Filtering and Policy EnforcementA powerful advantage of the proxy is that it can make routing decisions based not just on URLs, but on metadata within the requests. With MCP, requests and responses are in JSON-RPC format, which includes fields like the tool method name, parameters, and even tool annotations. An identity-aware proxy can be configured to inspect these details and apply policy rules. 
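Concretely, an MCP tool invocation is a JSON-RPC 2.0 request whose method is tools/call, with the tool name and arguments carried inside params. That structure lets the gateway parse the body and gate on the tool name. The rule below mirrors the delete_file example given in this section. The IT Admin group name and the choice to pass every method other than tools/call through unchanged are illustrative assumptions.

```python
import json

# A tools/call request as an MCP client might send it (JSON-RPC 2.0).
body = json.dumps({
    'jsonrpc': '2.0',
    'id': 7,
    'method': 'tools/call',
    'params': {'name': 'delete_file', 'arguments': {'path': '/reports/q3.pdf'}},
})


def inspect_and_decide(raw_body: str, user_groups: list) -> str:
    msg = json.loads(raw_body)
    if msg.get('method') != 'tools/call':
        return 'allow'  # assumption: only tool invocations are gated in this sketch
    tool = msg.get('params', {}).get('name')
    # Example rule: destructive tools require the (hypothetical) IT Admin group.
    if tool == 'delete_file' and 'IT Admin' not in user_groups:
        return 'deny'
    return 'allow'


print(inspect_and_decide(body, ['finance']))   # deny
print(inspect_and_decide(body, ['IT Admin']))  # allow
```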
For example, you might configure rules such as:graph TD    subgraph \"MCP Request\"        Request[JSON-RPC Request]        Method[Tool Method]        Params[Parameters]        User[User Identity]    end        subgraph \"Policy Engine\"        Rules[Policy Rules]        RBAC[Role-Based Access]        Audit[Audit Logging]        Transform[Response Transformation]    end        Request --&gt; Method    Request --&gt; Params    Request --&gt; User        Method --&gt; Rules    Params --&gt; Rules    User --&gt; RBAC        Rules --&gt; Decision{Allow/Deny}    RBAC --&gt; Decision        Decision --&gt;|Allow| Forward[Forward to Tool]    Decision --&gt;|Deny| Reject[Reject Request]        Forward --&gt; Audit    Reject --&gt; Audit        Forward --&gt; Tool[MCP Tool]    Tool --&gt; Response[Tool Response]    Response --&gt; Transform    Transform --&gt; Filtered[Filtered Response]        classDef request fill:#bbf,stroke:#333,stroke-width:1px;    classDef policy fill:#fbf,stroke:#333,stroke-width:1px;    classDef action fill:#bfb,stroke:#333,stroke-width:1px;        class Request,Method,Params,User request;    class Rules,RBAC,Audit,Transform policy;    class Decision,Forward,Reject,Filtered action;This diagram illustrates how tool metadata filtering and policy enforcement work in an MCP proxy. The process begins with an MCP request in JSON-RPC format, which contains the tool method, parameters, and user identity information. These components are extracted and fed into the policy engine.The policy engine consists of policy rules, role-based access control (RBAC), audit logging, and response transformation components. The tool method and parameters are evaluated against the policy rules, while the user identity is checked against RBAC permissions.Based on these evaluations, the policy engine makes an allow/deny decision. If the request is allowed, it is forwarded to the MCP tool; if denied, it is rejected. 
In either case, the action is logged for audit purposes.When a request is allowed and processed by the tool, the response may pass through a transformation step before being returned to the client. This transformation can filter or modify the response based on security policies, such as removing sensitive information that the user shouldn’t see.This fine-grained policy enforcement at the metadata level allows for sophisticated security controls that go far beyond simple URL-based routing. For example:  “If the tool call is delete_file and the user is not in the IT Admin group, deny the request.”  “Only allow the execute_sql tool on weekdays between 9am and 5pm, and log all queries.”  “If a tool is marked as containing sensitive data, ensure the response is sanitized or encrypted.”This is analogous to a web application firewall (WAF) or an API gateway performing content filtering, but tailored to AI tool usage. In the Envoy MCP proposal, this corresponds to parsing MCP messages and applying RBAC filters to them. Because the proxy sees which tool is being called and with which arguments, it can gate each call on that basis rather than on the URL alone. It can also redact or transform data if needed – for instance, stripping out certain fields from a response that the user shouldn’t see, or masking personally identifiable information. By centralizing this in the gateway, you avoid having to implement checks in each tool service (which could be inconsistent or forgotten). Auditing is another benefit: the proxy can log every tool invocation along with user identity and parameters, feeding into SIEM systems for monitoring. That way, if an agent ever does something it shouldn’t, you have a clear trail of which tool call was involved and who prompted it. In sum, metadata-based filtering turns the proxy into a smart policy enforcement point, adding a safety layer on top of MCP’s basic capabilities.Version-Aware and Context-Aware RoutingEnterprises constantly evolve their services – new versions, A/B tests, staging vs.
production deployments, etc. The proxy can greatly simplify how AI agents handle these changes. Instead of the AI needing to know which version of a tool to call, the gateway can implement version-aware routing. For instance, the MCP endpoint for a “Document Search” tool could remain the same for the agent, but the proxy might route 90% of requests to v1 of the service and 10% to a new v2 (for a canary rollout). Or route internal users to a “beta” instance while external users go to stable. This is done by matching on request attributes or using routing rules that include user audience and tool identifiers.graph TB    AIAgent[AI Agent] --&gt;|MCP Request| Proxy[API Gateway]        Proxy --&gt;|\"90% traffic\"| V1[Tool v1]    Proxy --&gt;|\"10% traffic\"| V2[Tool v2 - Canary]        Proxy --&gt;|\"Internal Users\"| Beta[Beta Version]    Proxy --&gt;|\"External Users\"| Stable[Stable Version]        Proxy --&gt;|\"Small Requests\"| Standard[Standard Instance]    Proxy --&gt;|\"Large Requests\"| HighMem[High-Memory Instance]        Proxy --&gt;|\"US Users\"| US[US Region]    Proxy --&gt;|\"EU Users\"| EU[EU Region]        classDef proxy fill:#f96,stroke:#333,stroke-width:2px;    classDef version fill:#bbf,stroke:#333,stroke-width:1px;    classDef audience fill:#bfb,stroke:#333,stroke-width:1px;    classDef size fill:#fbf,stroke:#333,stroke-width:1px;    classDef region fill:#ff9,stroke:#333,stroke-width:1px;        class Proxy proxy;    class V1,V2 version;    class Beta,Stable audience;    class Standard,HighMem size;    class US,EU region;This diagram illustrates the various routing strategies that an API Gateway can implement for MCP requests. The gateway can route traffic based on multiple factors:      Version-based routing: The gateway can split traffic between different versions of a tool, such as sending 90% to v1 and 10% to a canary deployment of v2. This allows for gradual rollouts and A/B testing without requiring changes to the AI agents.        
Audience-based routing: Internal users can be directed to beta versions of tools, while external users are routed to stable versions. This allows for internal testing and validation before wider release.        Request size-based routing: Small requests can be handled by standard instances, while large requests that require more resources are directed to high-memory instances. This optimizes resource utilization and ensures that demanding requests don’t impact the performance of standard operations.        Geographic routing: Users from different regions can be directed to region-specific instances, reducing latency and potentially addressing data residency requirements.  The AI agent doesn’t need to be aware of these routing decisions; it simply makes requests to the logical tool name, and the gateway handles the complexity of routing to the appropriate backend. This abstraction keeps the agent’s implementation simple while giving operators control over rollouts, capacity, and placement.All of these rules, whether keyed on version, audience, request size, or location, are configured at the proxy. Keeping them there not only eases operations (you can upgrade backend tools without breaking the AI’s interface), but also adds to security: you can isolate certain versions for testing, or ensure that experimental tools are only accessible under certain conditions.
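As an illustration of how a gateway might pick a backend for one logical tool name, the Python sketch below combines three of the strategies above:

- Internal users go to the beta build.
- External users are split 90/10 between the stable and canary builds.
- The chosen host is then specialized per region.

The route table, host names, and function name are placeholders invented for this example. Hashing the user ID, rather than choosing randomly on every request, is one common way to keep a given user on the same version for the duration of a canary.

```python
import hashlib

# Placeholder route table for one logical tool; hosts are not real services.
ROUTES = {
    'document_search': {
        'stable': 'https://search-v1.{region}.example.internal',
        'canary': 'https://search-v2.{region}.example.internal',
        'beta': 'https://search-beta.{region}.example.internal',
        'canary_percent': 10,
    },
}


def pick_backend(tool, user_id, audience, region):
    route = ROUTES[tool]
    if audience == 'internal':
        # Audience-based routing: internal users always get the beta build.
        target = route['beta']
    else:
        # Version-based routing: map the user to a stable bucket in 0..99
        # so the same user keeps landing on the same version.
        digest = hashlib.sha256(user_id.encode('utf-8')).hexdigest()
        bucket = int(digest, 16) % 100
        in_canary = bucket in range(route['canary_percent'])
        target = route['canary'] if in_canary else route['stable']
    # Geographic routing: fill in the caller's region.
    return target.format(region=region)
```

Request-size routing would slot in the same way, as one more condition evaluated before the host is chosen.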
By controlling traffic flow, the proxy helps maintain a principle of least privilege on a macro scale – the AI only reaches the backends it’s supposed to, via routes that are appropriate for the current context.Implementing MCP Security with a Proxy: A Practical ApproachNow that we’ve covered the key security patterns, let’s look at a practical approach to implementing MCP security with an identity-aware proxy. This section outlines the steps to set up a secure MCP environment, focusing on the integration points between components.graph TB    subgraph ImplementationSteps[\"Implementation Steps\"]        Step1[1. Set up Identity Provider]        Step2[2. Configure API Gateway]        Step3[3. Implement Tool Registry]        Step4[4. Define Security Policies]        Step5[5. Integrate AI Agents]        Step6[6. Monitor and Audit]                Step1 --&gt; Step2        Step2 --&gt; Step3        Step3 --&gt; Step4        Step4 --&gt; Step5        Step5 --&gt; Step6    end        classDef step fill:#beb,stroke:#333,stroke-width:1px    class Step1,Step2,Step3,Step4,Step5,Step6 stepThis diagram outlines the six key steps in implementing MCP security with a proxy. The process follows a logical progression:  Set up Identity Provider: Establish the foundation for authentication and authorization.  Configure API Gateway: Set up the central security control point.  Implement Tool Registry: Create a system for managing MCP tools.  Define Security Policies: Establish the rules for access control and data protection.  Integrate AI Agents: Connect the AI agents to the secure MCP environment.  Monitor and Audit: Continuously track and review system activity.Each step builds on the previous ones, creating a comprehensive security implementation. The following sections will explore each step in detail.1. Setting Up the Identity ProviderThe first step is to configure your identity provider (IdP) to support the OIDC flows needed for MCP security. 
This typically involves:  Creating an OIDC application in your IdP (e.g., Azure AD, Okta, Auth0)  Configuring the appropriate scopes and claims  Setting up the redirect URIs for your AI application  Generating client credentials (client ID and secret)The IdP will be responsible for authenticating users and issuing the tokens that will be used to secure MCP requests. It’s important to configure the appropriate scopes and claims to ensure that the tokens contain the necessary information for authorization decisions.2. Configuring the API GatewayNext, you’ll need to configure your API gateway to act as the MCP proxy. This involves:sequenceDiagram    participant Admin    participant Gateway as API Gateway    participant IdP as Identity Provider        Admin-&gt;&gt;Gateway: 1. Configure OIDC integration    Gateway-&gt;&gt;IdP: 2. Fetch OIDC discovery document    IdP-&gt;&gt;Gateway: 3. Return endpoints and keys        Admin-&gt;&gt;Gateway: 4. Set up MCP routing rules    Admin-&gt;&gt;Gateway: 5. Configure security policies        Note over Gateway: Gateway ready to validate tokens and route MCP trafficThis sequence diagram illustrates the process of configuring an API Gateway for MCP security. The process begins with an administrator configuring the OIDC integration in the gateway. The gateway then fetches the OIDC discovery document from the Identity Provider, which returns the necessary endpoints and keys for token validation.Next, the administrator sets up MCP routing rules, defining how requests should be directed to different MCP tools based on various criteria. The administrator also configures security policies, specifying who can access which tools and under what conditions.Once these configurations are complete, the gateway is ready to validate tokens and route MCP traffic according to the defined rules and policies. 
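Once the gateway has the IdP’s keys from the discovery document, it checks each incoming token in two stages. First it verifies the signature against those keys. Then it compares the decoded claims with its own configuration. The Python sketch below covers only the second stage and assumes the first has already succeeded; in practice a JWT library typically handles both. The issuer, audience, and leeway values are placeholders.

```python
import time

# Placeholder values; real ones come from gateway config and the IdP
# discovery document.
EXPECTED_ISSUER = 'https://idp.example.com'
EXPECTED_AUDIENCE = 'mcp-gateway'
LEEWAY_SECONDS = 60  # tolerated clock skew between IdP and gateway


def check_claims(claims, now=None):
    # Assumes the signature was already verified with the IdP keys; this
    # function only inspects the decoded claims. Returns (ok, reason).
    now = time.time() if now is None else now
    if claims.get('iss') != EXPECTED_ISSUER:
        return False, 'wrong issuer'
    aud = claims.get('aud')
    audiences = aud if isinstance(aud, list) else [aud]
    if EXPECTED_AUDIENCE not in audiences:
        return False, 'wrong audience'
    # A missing exp is treated as already expired.
    if now >= claims.get('exp', 0) + LEEWAY_SECONDS:
        return False, 'expired'
    if now + LEEWAY_SECONDS < claims.get('nbf', 0):
        return False, 'not yet valid'
    return True, 'ok'
```

The audience check accepts aud as either a single string or a list, because the JWT specification allows both forms.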
This setup process establishes the gateway as the central security control point for all MCP interactions.The configuration steps include:  Setting up the OIDC integration, including configuring the token validation parameters (issuer, audience, etc.)  Defining the routing rules for MCP requests  Configuring the security policies for tool access  Setting up the audit loggingThe gateway will be responsible for validating the tokens, enforcing the security policies, and routing the MCP requests to the appropriate backends. It’s important to ensure that the gateway is properly configured to handle the MCP JSON-RPC format and to extract the necessary information for policy decisions.3. Implementing the Tool RegistryA tool registry is essential for managing the lifecycle of MCP tools in your environment. This involves:  Creating a database or service to store tool metadata  Defining the registration process for new tools  Implementing the approval workflow for tool access  Integrating the registry with the API gatewayThe tool registry will be responsible for maintaining the list of available tools, their endpoints, and their access requirements. 
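A registry can start out very small. The Python sketch below is one hypothetical way to model it. Each tool record carries its endpoint, version, allowed groups, and a lifecycle status. The status can only move forward one step at a time (register, approve, deploy, monitor, retire), so a tool cannot reach the gateway without being approved. The class and method names are illustrative, not an existing product’s API, and treating Monitor as its own status is a modeling choice.

```python
from dataclasses import dataclass, field

# Lifecycle statuses in order; a tool may only advance one step at a time.
LIFECYCLE = ['registered', 'approved', 'deployed', 'monitored', 'retired']
LIVE = {'deployed', 'monitored'}


@dataclass
class ToolRecord:
    name: str
    endpoint: str
    version: str
    allowed_groups: set = field(default_factory=set)
    status: str = 'registered'


class ToolRegistry:
    def __init__(self):
        self._tools = {}

    def register(self, record):
        record.status = 'registered'
        self._tools[record.name] = record

    def advance(self, name):
        record = self._tools[name]
        step = LIFECYCLE.index(record.status)
        if step == len(LIFECYCLE) - 1:
            raise ValueError(name + ' is already retired')
        record.status = LIFECYCLE[step + 1]
        return record.status

    def gateway_config(self):
        # Routing and policy data for the gateway: live tools only.
        return {
            r.name: {'endpoint': r.endpoint, 'version': r.version,
                     'groups': sorted(r.allowed_groups)}
            for r in self._tools.values() if r.status in LIVE
        }

    def tools_for(self, groups):
        # Discovery view for agents: live tools the caller may use.
        return sorted(
            name for name, cfg in self.gateway_config().items()
            if not set(cfg['groups']).isdisjoint(groups)
        )
```

Both the gateway configuration and the agents’ tool list are derived from the same records. A retired tool therefore disappears from both at once, and the registry remains the single source of truth.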
It will also provide the necessary information to the API gateway for routing and policy enforcement.graph TB    subgraph \"Tool Registry\"        DB[(Tool Database)]        API[Registry API]        UI[Admin UI]                UI --&gt;|Manage Tools| API        API --&gt;|CRUD Operations| DB    end        subgraph \"Integration Points\"        Gateway[API Gateway]        Agents[AI Agents]                API --&gt;|Tool Configurations| Gateway        API --&gt;|Available Tools| Agents    end        subgraph \"Tool Lifecycle\"        Register[Register]        Approve[Approve]        Deploy[Deploy]        Monitor[Monitor]        Retire[Retire]                Register --&gt; Approve        Approve --&gt; Deploy        Deploy --&gt; Monitor        Monitor --&gt; Retire    end        classDef registry fill:#bbf,stroke:#333,stroke-width:1px;    classDef integration fill:#fbf,stroke:#333,stroke-width:1px;    classDef lifecycle fill:#bfb,stroke:#333,stroke-width:1px;        class DB,API,UI registry;    class Gateway,Agents integration;    class Register,Approve,Deploy,Monitor,Retire lifecycle;This diagram illustrates the components and lifecycle of a Tool Registry in an MCP environment. The Tool Registry consists of three main components:  Tool Database: Stores metadata about all registered MCP tools, including their endpoints, versions, access requirements, and status.  Registry API: Provides programmatic access to the tool database, enabling CRUD operations on tool registrations.  Admin UI: Allows administrators to manage tools through a user interface, including registration, approval, and monitoring.The Tool Registry integrates with two key systems:  API Gateway: Receives tool configurations from the registry, which inform routing and policy decisions.  
AI Agents: Discover available tools through the registry, based on user permissions and tool status.The diagram also shows the lifecycle of an MCP tool:  Register: A new tool is registered in the system with its metadata.  Approve: The tool undergoes review and is approved for use by specific user groups.  Deploy: The tool is made available in the production environment.  Monitor: The tool’s usage and performance are monitored.  Retire: When no longer needed, the tool is retired from the system.This comprehensive approach to tool management ensures that all MCP tools are properly vetted, deployed, and monitored throughout their lifecycle, reducing security risks and operational issues.4. Defining Security PoliciesSecurity policies are the rules that govern access to MCP tools. This involves:  Defining the RBAC policies for tool access  Configuring the content filtering rules for responses  Setting up the audit logging requirements  Implementing the version control policiesThe security policies will be enforced by the API gateway based on the user’s identity and the tool being accessed. It’s important to ensure that the policies are comprehensive and aligned with your organization’s security requirements.5. Integrating AI AgentsFinally, you’ll need to integrate your AI agents with the secure MCP environment. This involves:  Configuring the agents to obtain and use OIDC tokens  Implementing the MCP client functionality  Handling authentication and authorization errors  Managing token refresh and session continuityThe AI agents will be responsible for obtaining the necessary tokens and including them in MCP requests. 
They’ll also need to handle authentication and authorization errors gracefully, providing appropriate feedback to users.sequenceDiagram    participant User    participant Agent as AI Agent    participant App as Application    participant IdP as Identity Provider    participant Gateway as API Gateway    participant Tool as MCP Tool        User-&gt;&gt;App: Access AI application    App-&gt;&gt;IdP: Authenticate user    IdP-&gt;&gt;App: Issue tokens        User-&gt;&gt;Agent: Request using AI capabilities    Agent-&gt;&gt;App: Request token for MCP    App-&gt;&gt;Agent: Provide token        Agent-&gt;&gt;Gateway: MCP request with token    Gateway-&gt;&gt;Gateway: Validate token &amp; apply policies    Gateway-&gt;&gt;Tool: Forward authorized request    Tool-&gt;&gt;Gateway: Response    Gateway-&gt;&gt;Agent: Return response    Agent-&gt;&gt;User: Present result        Note over App,Gateway: Token refresh cycle    App-&gt;&gt;IdP: Refresh token when needed    IdP-&gt;&gt;App: New access tokenThis sequence diagram illustrates the integration of AI agents with a secure MCP environment. The process begins when a user accesses the AI application, which authenticates the user with the Identity Provider and receives tokens.When the user makes a request that requires AI capabilities, the AI agent requests a token from the application, which provides it. The agent then includes this token in its MCP request to the API Gateway.The gateway validates the token and applies security policies to determine if the request should be allowed. If authorized, the request is forwarded to the appropriate MCP tool, which processes it and returns a response. This response flows back through the gateway to the agent and ultimately to the user.In the background, the application handles token refresh cycles, requesting new access tokens from the Identity Provider when needed. 
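The refresh cycle can be handled by a small cache that hands out the current access token and fetches a new one shortly before it expires. In the Python sketch below, fetch_token is a hypothetical stand-in for the application’s call to the IdP token endpoint (for example, a refresh_token grant). It is assumed to return the token and its lifetime in seconds. The 60-second skew is an arbitrary illustrative value.

```python
import time


class TokenCache:
    # fetch_token: hypothetical callable returning (access_token, expires_in).
    def __init__(self, fetch_token, skew_seconds=60, clock=time.time):
        self._fetch = fetch_token
        self._skew = skew_seconds
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        # Refresh when there is no token yet or it expires within the skew,
        # so a request never leaves with a token that is about to lapse.
        if self._token is None or self._clock() >= self._expires_at - self._skew:
            token, expires_in = self._fetch()
            self._token = token
            self._expires_at = self._clock() + expires_in
        return self._token
```

Refreshing ahead of expiry, rather than waiting for the gateway to reject a request, avoids failing a tool call halfway through a multi-step task. The sketch is not thread-safe; agents sharing one cache concurrently would need a lock around the refresh.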
This ensures continuous operation without requiring the user to re-authenticate frequently.This integration approach ensures that AI agents operate within the security framework established by the proxy architecture, with all requests properly authenticated and authorized.Conclusion: Beyond Glorified API CallsBy implementing a secure MCP architecture with an identity-aware proxy, you move far beyond “glorified API calls” to a robust, enterprise-grade integration between AI agents and your business systems. This approach addresses the key security challenges of MCP deployments, including:  User identity and access control  Multi-step context exchanges  Complex delegation chains  Dynamic tool provisioning  Remote MCP changes and version trackingThe proxy-based architecture provides a centralized control point for enforcing security policies, managing tool access, and monitoring AI agent activity. It also simplifies operations by abstracting away the complexity of backend services and providing a consistent interface for AI agents.As MCP continues to evolve and gain adoption, the security patterns described in this article will become increasingly important for enterprise deployments. 
By implementing these patterns now, you can ensure that your AI agent infrastructure is secure, scalable, and ready for the future.graph LR    A[Glorified API Calls] --&gt;|Evolution| B[Secure MCP Architecture]        subgraph \"Key Benefits\"        C[Centralized Security]        D[Identity Propagation]        E[Policy Enforcement]        F[Audit &amp; Compliance]        G[Operational Simplicity]    end        B --&gt; C    B --&gt; D    B --&gt; E    B --&gt; F    B --&gt; G        classDef benefit fill:#bfb,stroke:#333,stroke-width:1px;    class C,D,E,F,G benefit;This final diagram summarizes the evolution from “glorified API calls” to a secure MCP architecture, highlighting the key benefits of this approach:  Centralized Security: A single control point for enforcing security policies across all MCP interactions.  Identity Propagation: Consistent handling of user identity and permissions throughout the system.  Policy Enforcement: Fine-grained control over who can access which tools and under what conditions.  Audit &amp; Compliance: Comprehensive logging and monitoring of all MCP activities for security and compliance purposes.  Operational Simplicity: Abstraction of backend complexity, making it easier to manage and evolve the system over time.By adopting this architecture, organizations can confidently deploy AI agents in enterprise environments, knowing that their MCP interactions are secure, auditable, and manageable at scale. This represents a significant advancement beyond the simplistic view of AI tools as mere API calls, recognizing the complex security requirements of production AI systems.",
      "views": 1487,
      "reading_minutes": 36,
      "tags": [
        
          
          {
            "name": "OIDC",
            "slug": "oidc",
            "url": "/tags/oidc/#posts"
          },
        
          
          {
            "name": "API Gateway",
            "slug": "api-gateway",
            "url": "/tags/api-gateway/#posts"
          },
        
          
          {
            "name": "Security",
            "slug": "security",
            "url": "/tags/security/#posts"
          },
        
          
          {
            "name": "Authentication",
            "slug": "authentication",
            "url": "/tags/authentication/#posts"
          },
        
          
          {
            "name": "Authorization",
            "slug": "authorization",
            "url": "/tags/authorization/#posts"
          },
        
          
          {
            "name": "Cloud",
            "slug": "cloud",
            "url": "/tags/cloud/#posts"
          },
        
          
          {
            "name": "MCP",
            "slug": "mcp",
            "url": "/tags/mcp/#posts"
          },
        
          
          {
            "name": "Architecture",
            "slug": "architecture",
            "url": "/tags/architecture/#posts"
          }
        
      ]
    },
  
    
    
      
    
      
        
        
    
    
    
    {
      "kind": "post",
      "title": "OpenID Connect for Agents (OIDC-A) 1.0 Proposal",
      "url": "/2025/04/28/oidc-a-proposal/",
      "date_display": "April 28, 2025",
      "date_iso": "2025-04-28",
      "excerpt": "Technical proposal for extending OpenID Connect Core 1.0 to provide a framework for representing, authenticating, and authorizing LLM-based agents within the OAuth 2.0 ecosystem.",
      "content": "This document proposes a standard extension to OpenID Connect for representing and verifying the identity of LLM-based agents. It integrates the core proposal with detailed frameworks for verification, attestation, and delegation chains.AbstractOpenID Connect for Agents (OIDC-A) 1.0 is an extension to OpenID Connect Core 1.0 that provides a framework for representing, authenticating, and authorizing LLM-based agents within the OAuth 2.0 ecosystem. This specification defines standard claims, endpoints, and protocols for establishing agent identity, verifying agent attestation, representing delegation chains, and enabling fine-grained authorization based on agent attributes.1. Introduction1.1 RationaleAs LLM-based agents become increasingly prevalent in digital ecosystems, there is a growing need for standardized methods to represent their identity and manage their authorization. Traditional OAuth 2.0 and OpenID Connect protocols were designed primarily for human users and conventional applications, lacking the necessary constructs to represent the unique characteristics of autonomous agents, such as:  Acting on behalf of users with varying degrees of autonomy  Operating within delegation chains  Possessing dynamic capabilities based on their underlying models  Requiring attestation of their integrity and originThis specification addresses these gaps by extending OpenID Connect to provide a comprehensive framework for agent identity and authorization.1.2 TerminologyThis specification uses the terms defined in OAuth 2.0 [RFC6749], OpenID Connect Core 1.0, and the following additional terms:  Agent: An LLM-based software entity capable of autonomous or semi-autonomous action based on natural language instructions.  Agent Provider: The organization responsible for creating, training, and/or hosting the agent.  Agent Model: The specific LLM model that powers the agent (e.g., GPT-4, Claude 3).  
Agent Instance: A specific running instance of an agent, typically associated with a particular task or conversation.  Delegator: The entity (typically a human user) who delegates authority to an agent to act on their behalf.  Delegation Chain: A sequence of delegation steps from the original user through potentially multiple agents.  Attestation: Cryptographic proof of an agent’s integrity, origin, and/or properties.  Attestation Evidence: Data structure containing the proof used for attestation.  Relying Party (RP): In this context, often a Resource Server or Client application that needs to verify an agent’s identity and authorization.1.3 OverviewOIDC-A extends OpenID Connect by:  Defining new standard claims for representing agent identity, delegation, and capabilities.  Specifying mechanisms and formats for agent attestation evidence.  Establishing protocols for representing and validating delegation chains.  Providing discovery mechanisms for agent capabilities and attestation support.  Defining authorization frameworks suitable for agent-specific use cases.  Introducing endpoints for attestation verification and capability discovery.2. 
Agent Identity Claims2.1 Core Agent Identity ClaimsThe following claims MUST or SHOULD be included in ID Tokens issued to or about agents:                        Claim            Type            Description            Requirement                                    agent_type            string            Identifies the type/class of agent (e.g., \"assistant\", \"retrieval\", \"coding\")            REQUIRED                            agent_model            string            Identifies the specific model (e.g., \"gpt-4\", \"claude-3-opus\", \"gemini-pro\")            REQUIRED                            agent_version            string            Version identifier of the agent model            RECOMMENDED                            agent_provider            string            Organization that provides/hosts the agent (e.g., \"openai.com\", \"anthropic.com\")            REQUIRED                            agent_instance_id            string            Unique identifier for this specific instance of the agent            REQUIRED            2.2 Delegation and Authority Claims                        Claim            Type            Description            Requirement                                    delegator_sub            string            Subject identifier of the entity who most recently delegated authority to this agent            REQUIRED                            delegation_chain            array            Ordered array of delegation steps (see Section 2.4.2)            OPTIONAL                            delegation_purpose            string            Description of the purpose/intent for which authority was delegated            RECOMMENDED                            delegation_constraints            object            Constraints placed on the agent by the delegator            OPTIONAL            2.3 Capability, Trust, and Attestation Claims                        Claim            Type            Description            Requirement                                 
   agent_capabilities            array            Array of capability identifiers representing what the agent can do            RECOMMENDED                            agent_trust_level            string            Trust classification of the agent (e.g., \"verified\", \"experimental\")            OPTIONAL                            agent_attestation            object            Attestation evidence or reference (see Section 2.4.4)            RECOMMENDED                            agent_context_id            string            Identifier for the conversation/task context            RECOMMENDED            2.4 Claim Formats and Validation2.4.1 agent_typeString value from a defined set of agent types. Implementers SHOULD use one of the following values when applicable:  assistant: General-purpose assistant agent  retrieval: Agent specialized in information retrieval  coding: Agent specialized in code generation or analysis  domain_specific: Agent specialized for a particular domain  autonomous: Agent with high degree of autonomy  supervised: Agent requiring human supervision for key actionsCustom types MAY be used but SHOULD follow the format vendor:type (e.g., acme:financial_advisor).2.4.2 delegation_chainJSON array containing objects representing each step in the delegation chain, from the original user to the current agent. Each object MUST contain:  iss: REQUIRED. String identifying the Authorization Server or entity that issued/validated this delegation step.  sub: REQUIRED. String identifying the delegator (the entity granting permission).  aud: REQUIRED. String identifying the delegatee (the agent receiving permission).  delegated_at: REQUIRED. NumericDate representing the time the delegation occurred.  scope: REQUIRED. Space-separated string of OAuth scopes representing the permissions granted in this delegation step. MUST be a subset of the scopes held by the delegator (sub).  purpose: OPTIONAL. String describing the intended purpose of this delegation step.  
constraints: OPTIONAL. JSON object specifying constraints on the delegation (e.g., {\"max_duration\": 3600, \"allowed_resources\": [\"/data/abc\"]}).  jti: OPTIONAL. A unique identifier for this specific delegation step, useful for revocation or tracking.The array MUST be ordered chronologically.Validation Rules for delegation_chain (performed by the Relying Party):  Order Verification: Confirm that the delegated_at values are in chronological order.  Issuer Trust: Verify that each iss is a trusted issuer.  Audience Matching: Confirm that the aud of step N matches the sub of step N+1.  Scope Reduction: Verify that the scope of each step is a subset of (or equal to) the scopes held by that step’s delegator.  Constraint Enforcement: Ensure the requested action complies with the constraints recorded in every step.  Signature Validation (if applicable): Validate signatures if steps are individually signed.  Policy Check: Evaluate the validated chain against authorization policies (e.g., a maximum chain length).2.4.3 agent_capabilitiesArray of string identifiers representing the agent’s capabilities. Implementers SHOULD use capability identifiers from a well-defined taxonomy when available. Custom capabilities SHOULD follow the format vendor:capability (e.g., acme:financial_analysis).2.4.4 agent_attestationJSON object containing attestation evidence or a reference to it. MUST include a format field indicating the type of evidence.Recommended Format: JWT-based, potentially compatible with IETF RATS Entity Attestation Token (EAT).Example:\"agent_attestation\": {  \"format\": \"urn:ietf:params:oauth:token-type:eat\",  \"token\": \"eyJhbGciOiJFUzI1NiIsInR5cCI6ImVhdCtqd3QifQ...\"}Other formats (e.g., \"format\": \"TPM2-Quote\", \"format\": \"SGX-Quote\") MAY be used.3. Protocol Flow3.1 Agent Authentication FlowThe OIDC-A authentication flow extends the standard OpenID Connect Authentication flow:  Client Registration: Clients representing agents MUST register additional metadata (see Section 4).  Authentication Request: Agents SHOULD include the agent scope and potentially delegation_context.
Authentication Response: The Authorization Server includes agent-specific claims in the ID Token.  Token Validation: RPs MUST validate standard OIDC claims and relevant agent-specific claims (including attestation and delegation if present) according to policy.3.2 Delegation FlowWhen an agent is delegated authority:  The delegator authenticates and authorizes the delegation.  The Authorization Server issues a new ID Token to the agent including delegator_sub, delegation_chain (updated), delegation_purpose, and constrained scope.3.3 Attestation Verification FlowTo verify an agent’s attestation:  The agent includes the agent_attestation claim in its ID Token or provides evidence separately.  The RP validates the evidence based on the specified format:          Verify cryptographic signatures using trusted keys (obtained via Discovery).      Compare platform measurements against known-good values.      Validate nonces to prevent replay attacks.      Optionally, use the agent_attestation_endpoint for validation assistance.        Authorization decisions incorporate the attestation status (e.g., verified: true/false).4. 
Client Registration and Discovery4.1 Agent Client Registration MetadataExtends OAuth 2.0 Dynamic Client Registration [RFC7591]:                        Parameter            Type            Description                                    agent_provider            string            Identifier of the agent provider                            agent_models_supported            array            List of supported agent models                            agent_capabilities            array            List of agent capabilities                            attestation_formats_supported            array            List of supported attestation formats                            delegation_methods_supported            array            List of supported delegation methods            4.2 Discovery MetadataExtends OpenID Connect Discovery 1.0:                        Parameter            Type            Description                                    agent_attestation_endpoint            string            URL of the attestation endpoint                            agent_capabilities_endpoint            string            URL of the capabilities discovery endpoint                            agent_claims_supported            array            List of supported agent claims                            agent_types_supported            array            List of supported agent types                            delegation_methods_supported            array            List of supported delegation methods                            attestation_formats_supported            array            List of supported attestation formats                            attestation_verification_keys_endpoint            string            URL to retrieve public keys for verifying attestation signatures            5. Endpoints5.1 Agent Attestation EndpointAn OAuth 2.0 protected resource that returns attestation information about an agent or assists in validating provided evidence. 
URL advertised via the agent_attestation_endpoint discovery parameter. 5.1.1 Request Example (Get Info). GET /agent/attestation?agent_id=123&nonce=abc Authorization: Bearer <token> 5.1.2 Response Example. {  \"verified\": true,  \"provider\": \"openai.com\",  \"model\": \"gpt-4\",  \"version\": \"2025-03\",  \"attestation_timestamp\": 1714348800,  \"attestation_signature\": \"...\"} 5.2 Agent Capabilities Endpoint. Provides information about an agent’s capabilities. URL advertised via the agent_capabilities_endpoint discovery parameter. 5.2.1 Request Example. GET /.well-known/agent-capabilities 5.2.2 Response Example. {  \"capabilities\": [    {\"id\": \"text_generation\", \"description\": \"...\"},    {\"id\": \"code_generation\", \"description\": \"...\"}  ],  \"supported_constraints\": [\"max_tokens\", \"allowed_tools\"]} 6. Security Considerations. 6.1 Agent Authentication. Agents SHOULD use strong, asymmetric methods (JWT Client Auth [RFC7523], mTLS [RFC8705]), potentially combined with attestation. Shared secrets are NOT RECOMMENDED. 6.2 Delegation Security. Systems MUST validate the entire delegation chain, enforce scope reduction, implement consent mechanisms, and consider time-bounding. Policies may limit chain length. Robust revocation mechanisms are needed. 6.3 Attestation Security. Attestation requires secure management of signing keys, robust nonce handling, trustworthy known-good measurements, secure endpoints, and protection against replay attacks. Attestation evidence may have privacy implications. 6.4 Token Security. ID Tokens with agent claims SHOULD be encrypted. Access tokens SHOULD have limited lifetimes. Refresh tokens for agents require careful consideration. 7. Privacy Considerations. Implementations MUST consider potential correlation of agent identity, privacy implications of delegation chains, user consent requirements, and data minimization in claims. 8. 
Compatibility and Versioning. OIDC-A 1.0 is designed for compatibility with OAuth 2.0 [RFC6749], OIDC Core 1.0, JWT [RFC7519], and related RFCs. Future versions will aim for backward compatibility. 9. References. [RFC6749] The OAuth 2.0 Authorization Framework; [RFC7519] JSON Web Token (JWT); [RFC7523] JWT Profile for OAuth 2.0 Client Authentication; [RFC7591] OAuth 2.0 Dynamic Client Registration; [RFC7662] OAuth 2.0 Token Introspection; [RFC8705] OAuth 2.0 Mutual-TLS Client Authentication; [OpenID Connect Core 1.0]; [OpenID Connect Discovery 1.0]; [IETF RATS] Remote Attestation Procedures Architecture. Appendix A: Example ID Token with Agent Claims. {  \"iss\": \"https://auth.example.com\",  \"sub\": \"agent_instance_789\",  \"aud\": \"client_123\",  \"exp\": 1714435200,  \"iat\": 1714348800,  \"auth_time\": 1714348800,  \"nonce\": \"n-0S6_WzA2Mj\",  \"agent_type\": \"assistant\",  \"agent_model\": \"gpt-4\",  \"agent_version\": \"2025-03\",  \"agent_provider\": \"openai.com\",  \"agent_instance_id\": \"agent_instance_789\",  \"delegator_sub\": \"user_456\",  \"delegation_purpose\": \"Email management assistant\",  \"agent_capabilities\": [\"email:read\", \"email:draft\", \"calendar:view\"],  \"agent_trust_level\": \"verified\",  \"agent_context_id\": \"conversation_123\",  \"agent_attestation\": {    \"format\": \"urn:ietf:params:oauth:token-type:eat\",    \"token\": \"eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...\",    \"timestamp\": 1714348800  },  \"delegation_chain\": [    {      \"iss\": \"https://auth.example.com\",      \"sub\": \"user_456\",      \"aud\": \"agent_instance_789\",      \"delegated_at\": 1714348700,      \"scope\": \"email profile calendar\"    }  ]} Appendix B: Example Delegation Chain (Multi-step). \"delegation_chain\": [  {    \"iss\": \"https://auth.example.com\",    \"sub\": \"user_456\",    \"aud\": \"agent_instance_789\",    \"delegated_at\": 1714348800,    \"scope\": \"email calendar\",    \"purpose\": \"Manage my emails and calendar\"  },  {    
\"iss\": \"https://auth.example.com\",    \"sub\": \"agent_instance_789\",    \"aud\": \"agent_instance_101\",    \"delegated_at\": 1714348830,    \"scope\": \"calendar:view\",    \"purpose\": \"Analyze available time slots\"  }]",
      "views": 3228,
      "reading_minutes": 12,
      "tags": [
        
          
          {
            "name": "OpenID",
            "slug": "openid",
            "url": "/tags/openid/#posts"
          },
        
          
          {
            "name": "OAuth",
            "slug": "oauth",
            "url": "/tags/oauth/#posts"
          },
        
          
          {
            "name": "AI",
            "slug": "ai",
            "url": "/tags/ai/#posts"
          },
        
          
          {
            "name": "Agents",
            "slug": "agents",
            "url": "/tags/agents/#posts"
          },
        
          
          {
            "name": "Security",
            "slug": "security",
            "url": "/tags/security/#posts"
          },
        
          
          {
            "name": "Identity",
            "slug": "identity",
            "url": "/tags/identity/#posts"
          },
        
          
          {
            "name": "Authentication",
            "slug": "authentication",
            "url": "/tags/authentication/#posts"
          },
        
          
          {
            "name": "Authorization",
            "slug": "authorization",
            "url": "/tags/authorization/#posts"
          },
        
          
          {
            "name": "Standards",
            "slug": "standards",
            "url": "/tags/standards/#posts"
          },
        
          
          {
            "name": "Proposal",
            "slug": "proposal",
            "url": "/tags/proposal/#posts"
          },
        
          
          {
            "name": "Specification",
            "slug": "specification",
            "url": "/tags/specification/#posts"
          }
        
      ]
    },
  
    
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
        
        
    
    
    
    {
      "kind": "post",
      "title": "AI Agents and Agentic Security: The Next Frontier in Enterprise Automation",
      "url": "/2024/12/10/ai-agents-agentic-security-enterprise-automation/",
      "date_display": "December 10, 2024",
      "date_iso": "2024-12-10",
      "excerpt": "Exploring the potential of AI agents in enterprise security and automation, and how they can enhance security operations.",
      "content": "Traditional automation tools like Robotic Process Automation (RPA) and Integration Platform as a Service (iPaaS) have long served as the backbone of enterprise workflows. These systems, designed to automate repetitive tasks and connect disparate software tools, have delivered undeniable value. However, their inherent limitations are becoming increasingly evident. They require significant manual setup, often break when systems change, and struggle to handle unstructured data such as documents, emails, or images.Enter AI agents — a revolutionary leap from static, rule-based automation to intelligent, adaptable systems. AI agents promise to overcome the constraints of traditional tools, paving the way for smarter, more efficient enterprise automation. An excellent breakdown of their significance can be found in the insightful Menlo Ventures article “Beyond Bots: How AI Agents Are Driving the Next Wave of Enterprise Automation”.The Shift from Automation to IntelligenceAI agents represent a fundamental paradigm shift. Unlike their predecessors, these systems are not bound by rigid rules or pre-defined workflows. Instead, they possess the ability to learn, adapt, and make decisions based on changing circumstances. This adaptability enables them to address dynamic and complex tasks, unlocking unprecedented levels of efficiency and scalability.However, this evolution introduces a new layer of complexity: agentic security. As AI agents grow more autonomous, ensuring their security, transparency, and trustworthiness becomes paramount, particularly in multi-agent environments where multiple AI systems must collaborate. This shift necessitates rethinking how we secure enterprise automation systems to ensure they remain robust and trustworthy in a rapidly evolving landscape.The Imperative of Agentic SecurityAgentic security involves safeguarding intelligent, autonomous systems while maintaining their transparency and reliability. 
It becomes especially critical in environments where multiple AI agents operate simultaneously, managing dynamic processes and sensitive data. Key considerations for agentic security include: Dynamic Adaptability with Robust Security. AI agents excel at adjusting to system changes, but their adaptability must not come at the expense of enterprise security. In multi-agent environments, secure communication protocols and strong authentication mechanisms form the foundation of security. However, static security measures alone are insufficient. Evolving contexts require context-aware security: a system that dynamically adjusts access controls and agent behavior based on situational needs and data sensitivity. This mitigates risks such as unauthorized privilege escalation, prompt injection attacks, and data breaches. For example, a financial reporting agent with access to internal financial metrics should be able to generate a detailed report for C-suite agents while maintaining strict data boundaries. If an HR agent requests salary information, the financial agent should provide only relevant, pre-approved metrics, such as aggregated departmental budgets, rather than individual salary slips. This ensures that agents respect organizational boundaries and adhere to context-aware security protocols. In cross-enterprise collaborations, where AI agents from different organizations interact, maintaining the integrity of each participant’s systems is essential. Context-aware security ensures that agents respect boundaries and operate within predefined limits, even as they adapt to new information or changing environments. Transparent Decision-Making and Accountability. As AI agents take on more critical roles in enterprise processes, transparency and accountability become non-negotiable. Organizations must implement mechanisms to trace and audit agent decisions, ensuring they align with business objectives and ethical standards. 
This is particularly important in regulated industries, where compliance requirements demand a clear understanding of how and why decisions are made. Trust in Multi-Agent Collaboration. In scenarios where multiple agents collaborate, trust is the cornerstone of effective operation. Agents must communicate securely, share information responsibly, and resolve conflicts without compromising the integrity of the broader system. Establishing trust requires robust encryption, tamper-proof logs, and mechanisms for conflict resolution to prevent unintended behaviors or system failures. The Path Forward. AI agents represent the next frontier in enterprise automation, promising smarter, faster, and more scalable workflows. However, their increasing sophistication demands a proactive approach to agentic security. As organizations embrace these intelligent systems, they must prioritize building trust, safeguarding data, and ensuring transparency to foster sustainable innovation. The Menlo Ventures article encapsulates this beautifully: AI agents are not just tools; they are collaborators, reshaping how enterprises operate. But with great power comes great responsibility. By addressing the challenges of agentic security, we can unlock the full potential of AI agents while preserving the integrity and trust that underpin modern enterprises.",
      "views": 250,
      "reading_minutes": 3,
      "tags": [
        
          
          {
            "name": "AI",
            "slug": "ai",
            "url": "/tags/ai/#posts"
          },
        
          
          {
            "name": "Security",
            "slug": "security",
            "url": "/tags/security/#posts"
          },
        
          
          {
            "name": "Automation",
            "slug": "automation",
            "url": "/tags/automation/#posts"
          },
        
          
          {
            "name": "Enterprise",
            "slug": "enterprise",
            "url": "/tags/enterprise/#posts"
          },
        
          
          {
            "name": "AI Agents",
            "slug": "ai-agents",
            "url": "/tags/ai-agents/#posts"
          }
        
      ]
    },
  
    
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
        
        
    
    
    
    {
      "kind": "post",
      "title": "A feat of strength MVP for AI Apps",
      "url": "/2024/02/20/a-feat-of-strength-mvp-for-ai-apps/",
      "date_display": "February 20, 2024",
      "date_iso": "2024-02-20",
      "excerpt": "Exploring the concept of a Minimum Viable Product (MVP) in AI applications, focusing on delivering value by understanding and addressing user needs effectively.",
      "content": "A minimum viable product (MVP) is a version of a product with just enough features to be usable by early customers, who can then provide feedback for future product development.Today I want to focus on what that looks like for shipping AI applications. To do that, we only need to understand 4 things.  What does 80% actually mean?  What segments can we serve well?  Can we double down?  Can we educate the user about the segments we don’t serve well?The Pareto principle, also known as the 80/20 rule, still applies but in a different way than you might think.What is an MVP?An analogy I often use to help understand this concept is as follows: You need something to help get from point A to point B. Maybe the vision is to have a car. However, the MVP is not a chassis without wheels or an engine. Instead, it might look like a skateboard. You’ll ship and realize the product needs brakes or steering. So then you ship a scooter. Afterwards, you figure out the scooter needs more leverage, so you add larger wheels and end up with a bicycle. Limited by the force you can apply as a human being, you start thinking about motors and can branch out into mopeds, e-bikes, and motorcycles. Then one day, ship the car.Consider the 80/20 ruleWhen talking about something being  80% done or 80% ready, it is usually in a machine-learning sense. In this context, each component is deterministic, which means 80% translates to  8 out of 10 features being complete. Once the remaining 2 features are ready, we can ship the product. However, If we want to follow the 80/20 rule, we might be able to ship the product with 80% of the features and then add the remaining 20% later, like a car without a radio or air conditioning. However, The meaning of 80% can vary significantly, and this definition may not apply to an AI-powered application.The issue with Summary StatisticsThe above image is an example of Anscombe’s quartet. 
It’s a set of four datasets that have nearly identical simple descriptive statistics yet very different distributions and appearances. This is a classic illustration of why summary statistics can be misleading. Consider the following example (Query_id / score): 1 / 0.9; 2 / 0.8; 3 / 0.9; 4 / 0.9; 5 / 0.0; 6 / 0.0. The average score is 0.58. However, if we analyze the queries within segments, we might discover that we are serving the majority of queries exceptionally well! Admitting what you’re bad at. Being honest about what you’re bad at is a great way to build trust with your users. If you can accurately identify when something will perform poorly and confidently reject it, then you might be ready to ship a great product while educating your users about the limitations of your application. It is very important to understand the limitations of your system and to characterize its behavior beyond summary statistics, because not all systems are made equal. The behavior of a probabilistic system could be very different from the previous example. Consider the following dataset (Query_id / score): 1 / 0.59; 2 / 0.58; 3 / 0.59; 4 / 0.57. A system like this also has an average score of 0.58, but there is no obvious subset of requests to reject… Learning to say no. Consider a RAG application where a large proportion of the queries are timeline queries. If our search engine does not support this time constraint, we will likely be unable to perform well on them. 
Query_id / Score / Query Type: 1 / 0.9 / text search; 2 / 0.8 / text search; 3 / 0.9 / news search; 4 / 0.9 / news search; 5 / 0.0 / timeline; 6 / 0.0 / timeline. If we’re in a pinch to ship, we could simply build a classification model that detects whether these questions are timeline questions and throw a warning. Instead of constantly trying to push the algorithm to do better, we can educate users and change the way we design the product. Detecting segments. Detecting these segments could be accomplished in various ways. We could construct a classifier or employ a language model to categorize them. Additionally, we can run clustering algorithms on the embeddings to identify common groups and analyze the mean score within each group. The sole objective is to identify segments that improve our understanding of what happens within specific subgroups. One of the worst things you can do is spend months building a feature that only improves things slightly while ignoring a more important segment of your user base. By redesigning our application and recognizing its limitations, we can potentially improve performance under certain conditions by identifying the types of tasks we can decline. 
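The segment analysis above can be checked with a few lines of Python. This is a minimal sketch, not code from the original post: it recomputes the six example queries (query id, score, query type) from the timeline example. Averaged together they score 0.58, but grouped by type, text search averages 0.85, news search 0.9, and timeline 0.0. The function names and the 0.5 decline threshold are illustrative assumptions; in practice the query type would come from a classifier, a language model, or embedding clusters, as described above.

```python
from collections import defaultdict
from statistics import mean

# Illustrative data from the example above: (query_id, score, query_type)
ROWS = [
    (1, 0.9, 'text search'),
    (2, 0.8, 'text search'),
    (3, 0.9, 'news search'),
    (4, 0.9, 'news search'),
    (5, 0.0, 'timeline'),
    (6, 0.0, 'timeline'),
]


def overall_mean(rows):
    # One summary statistic across every query, which hides the segments.
    return round(mean(score for _, score, _ in rows), 2)


def segment_means(rows):
    # Group scores by query type, then average each group separately.
    groups = defaultdict(list)
    for _, score, query_type in rows:
        groups[query_type].append(score)
    return {segment: round(mean(scores), 2) for segment, scores in groups.items()}


def segments_to_decline(means, threshold=0.5):
    # Hypothetical policy: segments averaging below the threshold get a
    # warning or an honest refusal instead of a low-quality answer.
    return sorted(segment for segment, value in means.items() if value < threshold)


if __name__ == '__main__':
    means = segment_means(ROWS)
    print('overall:', overall_mean(ROWS))
    print('by segment:', means)
    print('decline:', segments_to_decline(means))
```

The decision to decline is made per segment, so a low-scoring segment such as timeline queries can be routed to a warning while the segments you already serve well ship as they are. 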
If we can put this segment data into some kind of In-System Observability, we can safely monitor what proportion of questions are being turned down and prioritize our work to maximize coverage. Figure out what you’re actually trying to do before you do it. One of the dangerous things I’ve noticed working with startups is that we often think that the AI works at all… As a result, we try to serve a large, general application without much thought about what exactly we want to accomplish. In my opinion, most of these companies should focus on one or two significant areas and identify a good niche to target. If your app is good at one or two tasks, you should have no trouble finding one or two hundred users to test your application and give you feedback quickly. If your application is good at nothing, on the other hand, it will be hard to be memorable and to offer something people use repeatedly. You might get some virality, but you will quickly lose the trust of your users and find yourself fighting churn. Since we can use GPT-4 to make predictions from day one, time to feedback is what matters most. If we can get feedback quickly, we can iterate quickly. If we can iterate quickly, we can build a better product. Final thoughts. The MVP for an AI application is not as simple as shipping a product with 80% of the features. Instead, it requires a deep understanding of the user segments you can serve well and the ability to educate your users about the segments you don’t serve well. By understanding the limitations of your system and niching down, you can build a product that is memorable and gets repeated use. This lets you get feedback quickly and iterate quickly, ultimately leading to a better product, by identifying your feats of strength.",
      "views": 151,
      "reading_minutes": 6,
      "tags": [
        
          
          {
            "name": "AI",
            "slug": "ai",
            "url": "/tags/ai/#posts"
          },
        
          
          {
            "name": "MVP",
            "slug": "mvp",
            "url": "/tags/mvp/#posts"
          },
        
          
          {
            "name": "Product Development",
            "slug": "product-development",
            "url": "/tags/product-development/#posts"
          },
        
          
          {
            "name": "User Feedback",
            "slug": "user-feedback",
            "url": "/tags/user-feedback/#posts"
          },
        
          
          {
            "name": "Innovation",
            "slug": "innovation",
            "url": "/tags/innovation/#posts"
          }
        
      ]
    },
  
    
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
        
        
    
    
    
    {
      "kind": "post",
      "title": "The Nockout Story",
      "url": "/2024/01/11/the-nockout-story/",
      "date_display": "January 11, 2024",
      "date_iso": "2024-01-11",
      "excerpt": "Discover how Nockout is transforming the way we find and enjoy sports activities. No more hassle in booking courts, no more mismatches in skill levels, just pure joy of playing your favorite sports.",
      "content": "As the co-founders of Nockout, Yash and I, Subramanya, have been on a quest to solve a problem that plagues every sports enthusiast: finding the right place and the right people for playing sports. Our personal struggles with organizing sports activities have led us to create a platform that not only eases these challenges but also promotes a sense of community among sports lovers.The Problem: A Universal ChallengeOur frustrations weren’t unique. Across the globe, from tennis courts to basketball hoops, sports enthusiasts were grappling with the same issues: finding the right venue and the right people to play with. This global dilemma was evident in the shared experiences voiced through numerous tweets and conversations among the community.Bay Club is pretty good. But also trying to find a reliable way to find players is hard (even using PyC).&mdash; Gautam (@gautamtata) January 1, 2024You should move to New York, where it&#39;s even more difficult!https://t.co/c8RjpPzW9x&mdash; Awais Hussain (@Ahussain4) January 1, 2024someone create an app that shows all public basketball courts and whether or not people are at them or not. this would save a lot of time for me lol.&mdash; thao 🍉 (@holycowitsthao) March 18, 2021I have wanted pickup hoops forever&mdash; Rob Kornblum (@rkorny) July 5, 2021These tweets underscore the need for a platform like Nockout.Our Solution: Introducing NockoutNockout is more than just an app; it’s a revolution in the sports community. Designed to be intuitive and user-friendly, it addresses key challenges:  Venue Discovery: The app shows you all available sports facilities nearby. Whether it’s a public basketball court or a private soccer field, “Nockout” has you covered.  Skill-Based Activity Matching: Our platform intuitively recommends players whose skills align with yours, ensuring you can join in on sporting activities that suit your preferences and proficiency in your chosen sport. 
After all, it’s all about fair play and good competition. Intuitive Process: We’ve designed Nockout to be user-friendly. The booking process is straightforward, and finding players is hassle-free. The Impact: Fostering Community and Fair Play. Nockout transcends being a mere application; it’s about building a community bound by the love of sports. It encourages fair play, connects like-minded individuals, and rekindles the joy in sports. Looking Ahead: The Future of Nockout. Our vision for Nockout is expansive and all-encompassing: Creating Spaces for Teams: Developing private areas for teams and groups to interact and bond. Expanding Community Features: Introducing a platform for sharing triumphs and experiences. Accessible Coaching and Activities: Offering a range of activities and coaching sessions for all skill levels and interests. Streamlined Payments and Management: Enhancing the booking and payment process for a smooth user experience. Personalized Athletic Journey: Providing tailored advice for sports and nutrition, alongside a comprehensive sports marketplace. Join the Revolution. Be part of a movement that’s reshaping the sports landscape. Sign up for early beta access at Nockout.co, and connect with us on Instagram, LinkedIn, and Twitter. Together, let’s make sports accessible and enjoyable for everyone!",
      "views": 390,
      "reading_minutes": 2,
      "tags": [
        
          
          {
            "name": "Sports",
            "slug": "sports",
            "url": "/tags/sports/#posts"
          },
        
          
          {
            "name": "Technology",
            "slug": "technology",
            "url": "/tags/technology/#posts"
          },
        
          
          {
            "name": "Community",
            "slug": "community",
            "url": "/tags/community/#posts"
          },
        
          
          {
            "name": "Innovation",
            "slug": "innovation",
            "url": "/tags/innovation/#posts"
          }
        
      ]
    },
  
    
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
    
      
        
        
    
    
    
    {
      "kind": "post",
      "title": "Enhancing Document Interactions - Leveraging the synergy of Google Cloud Platform, Pinecone, and LLM in Natural Language Communication",
      "url": "/2023/06/10/enhancing-document-interactions/",
      "date_display": "June 10, 2023",
      "date_iso": "2023-06-10",
      "excerpt": "Explore the groundbreaking fusion of Google Cloud Platform for OCR, Pinecone, and Large Language Model that is transforming information retrieval. This blog delves into how these potent tools collaborate to enable seamless interactions with documents using natural language. Discover how Google Cloud Platform offers a solid foundation, Pinecone provides rapid similarity searches for effective document retrieval, and LLM elevates language comprehension and generation capabilities.",
      "content": "High-level view of system design with Document AI, OpenAI, PineconeIn today’s digital era, accessing crucial information from government documents can be overwhelming and time-consuming due to their scanned and non-digitized formats. To address this issue, there is a need for an innovative tool that simplifies navigation, scanning, and digitization of these documents, making them easily readable and searchable. This user-friendly solution will revolutionize the way people interact with government documents, leading to better decision-making, improved public services, and a more informed and engaged citizenry. Developing such a tool is essential for ensuring transparency and accessibility of vital information in the modern world.To achieve our goal, we will follow a systematic approach consisting of the following steps:  We will use the powerful Document AI API provided by Google Cloud Platform to convert PDF / Image documents into text format. This step allows us to extract textual content from the documents, making it easier to process and analyze.  Next, we will employ a Language Model (LLM) to generate embeddings for each text extracted from the documents. These embeddings capture the semantic representation of the text, enabling us to effectively analyze and compare documents based on their content.  To optimize the retrieval process, we will utilize Pinecone, a robust indexing and similarity search system. By storing the generated embeddings in PineCone, we can quickly search for documents that closely match a user’s query.  With the acquired knowledge and enhanced search capabilities, our tool will efficiently answer user queries by retrieving the most relevant documents based on their content.For demonstration of this process, we utilized documents from the Karnataka Resident Data Hub (KRDH) by web scraping.    Demo: Building a powerful question/answering for government documents using Document AI, OpenAI, Pinecone, and Flask1. 
Setting Up Google Cloud Platform - Document AI. Document AI is a document understanding platform that converts unstructured data from documents into structured data, making it easier to comprehend, analyze, and utilize. To set up Document AI in your Google Cloud Platform (GCP) Console, follow these steps: Enable the Document AI API. Create a service account: Navigate to the create service account page in the Google Cloud console. Choose your project. Enter a name in the Service account name field; the Google Cloud console will automatically fill in the Service account ID field based on this name. Click Create and continue. Grant the Project > Owner role to your service account to provide access to your project. Click Continue. Click Done to complete the service account creation process. (Do not close your browser window, as you will need it in the next step.) Create a service account key: In the Google Cloud console, click the email address for the service account you created. Click Keys. Click Add key, then click Create new key. Click Create. A JSON key file will be downloaded to your computer. Click Close. Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the JSON file containing your service account key. This variable applies only to your current shell session, so if you open a new session, you will need to set the variable again. Install the client library by running pip install --upgrade google-cloud-documentai in your terminal. Create a processor: In the Document AI section of the Google Cloud console, go to the Processors page. Click +Create processor. Choose the processor type you want to create from the list. In the Create processor window, specify a processor name. Select your desired region from the list. Click Create to generate your processor. Take note of the Processor ID and location. 
After completing these steps, you are ready to use the Document AI API in your code. from google.api_core.client_options import ClientOptions from google.cloud import documentai # Placeholders: replace with the Project ID, location and Processor ID noted above project_id = \"your-project-id\" location = \"us\" processor_id = \"your-processor-id\" def convert_pdf_images_to_text(file_path: str) -> str:    \"\"\"    Convert a PDF or image file containing text into plain text using Google Document AI.    Args:        file_path (str): The file path of the PDF or image file.    Returns:        str: The extracted plain text from the input file.    \"\"\"    extension = file_path.rsplit(\".\", 1)[-1].strip().lower()    if extension == \"pdf\":        mime_type = \"application/pdf\"    elif extension == \"png\":        mime_type = \"image/png\"    elif extension in (\"jpg\", \"jpeg\"):        mime_type = \"image/jpeg\"    else:        # Fail early instead of reaching an undefined mime_type below        raise ValueError(f\"Unsupported file type: {extension}\")    # Use the regional Document AI endpoint that matches the processor location    opts = ClientOptions(        api_endpoint=f\"{location}-documentai.googleapis.com\"    )    client = documentai.DocumentProcessorServiceClient(client_options=opts)    # Build the full processor resource name from the Project ID, Location and Processor ID    name = client.processor_path(        project_id, location, processor_id    )    # Read the file into memory    with open(file_path, \"rb\") as image:        image_content = image.read()    # Load binary data into a Document AI RawDocument object    raw_document = documentai.RawDocument(content=image_content, mime_type=mime_type)    # Configure and send the process request    request = documentai.ProcessRequest(name=name, raw_document=raw_document)    result_document = client.process_document(request=request).document    return result_document.text 2. Embeddings Generation and Pinecone. In this step, we will use the OpenAI Text Embedding API to generate embeddings that capture the semantic meaning of the extracted text. These embeddings serve as numerical representations of the textual data, allowing us to capture the underlying context and nuances. After generating the embeddings, we will store them in Pinecone, a powerful indexing and similarity search system. 
By indexing the embeddings in Pinecone, we can retrieve them quickly and precisely.With the embeddings stored in Pinecone, our system can perform similarity searches, finding the documents that most closely match the meaning of a given query.The following code uses OpenAI’s Text Embedding model to create embeddings for text data. It divides the input text into chunks, generates embeddings for each chunk, and then upserts the embeddings along with associated metadata to a Pinecone index for efficient searching and retrieval.import openaidef create_embeddings(    text, model: str = \"text-embedding-ada-002\"):    \"\"\"    Creates text embeddings using OpenAI's Text Embedding model.    Args:        text (str or List[str]): The text, or list of texts, to embed.        model (str, optional): The name of the text embedding model to use.            Defaults to \"text-embedding-ada-002\".    Returns:        List[List[float]]: One embedding per input text.    \"\"\"    if isinstance(text, list):        response = openai.Embedding.create(model=model, input=text).data        return [d[\"embedding\"] for d in response]    else:        return [openai.Embedding.create(            model=model, input=[text]).data[0][\"embedding\"]]from typing import Any, Dict, Listfrom uuid import uuid4import pineconefrom tqdm import tqdmdef generate_embeddings_upload_to_pinecone(documents: List[Dict[str, Any]]):    \"\"\"    Generates text embeddings from the provided documents, then uploads and indexes     them to Pinecone.    Args:        documents (List[Dict[str, Any]]): A list of dictionaries containing         document information.            Each dictionary should include the following keys:                - \"Content\": The text content of the document.                - \"DocumentName\": The name of the document.                - \"DocumentType\": The type/category of the document.    Note:        This function assumes that Pinecone and the associated index have already        been initialized properly. 
Please make sure to initialize Pinecone first        and set up the index accordingly.    \"\"\"    # Split each document into chunks; create_chunks() is a text-splitting helper not shown in this post    chunks = []    for document in documents:        texts = create_chunks(document[\"Content\"])        chunks.extend(            [                {                    \"id\": str(uuid4()),                    \"text\": texts[i],                    \"chunk_index\": i,                    \"title\": document[\"DocumentName\"],                    \"type\": document[\"DocumentType\"],                }                for i in range(len(texts))            ]        )    # Connect to the Pinecone index, create embeddings, and upsert in batches of 100    index = pinecone.Index(\"pinecone-index\")    for i in tqdm(range(0, len(chunks), 100)):        # find end of batch        i_end = min(len(chunks), i + 100)        batch = chunks[i:i_end]        ids_batch = [x[\"id\"] for x in batch]        texts = [x[\"text\"] for x in batch]        embeds = create_embeddings(text=texts)        # keep only the metadata fields we want to store alongside each vector        meta_batch = [            {                \"title\": x[\"title\"],                \"type\": x[\"type\"],                \"text\": x[\"text\"],                \"chunk_index\": x[\"chunk_index\"],            }            for x in batch        ]        to_upsert = [            {\"id\": _id, \"values\": embed, \"metadata\": meta}            for _id, embed, meta in zip(ids_batch, embeds, meta_batch)        ]        # upsert to Pinecone        index.upsert(vectors=to_upsert)For more information on OpenAI’s Text Embedding API, refer to the OpenAI API documentation. For more details on Pinecone, check out the Pinecone documentation.3. 
User Query and CommunicationFinally, with all the components in place, the tool can match user queries with relevant context and answer them.When a user submits a query, the system embeds it and searches the stored embeddings for the document chunks that are most semantically similar to the query. Those chunks are then passed to the language model as context for the answer.def query_and_combine(    query_vector: list, top_k: int = 5, threshold: float = 0.75):    \"\"\"Query the Pinecone index and combine the matching texts into one string.    Assumes `index` is an initialized pinecone.Index.    Args:        query_vector (list): Query embedding        top_k (int, optional): Number of top results to return. Defaults to 5.        threshold (float, optional): Minimum similarity score for a match to be kept. Defaults to 0.75.    Returns:        str: Combined responses    \"\"\"    responses = index.query(vector=query_vector, top_k=top_k, include_metadata=True)    _responses = []    for sample in responses[\"matches\"]:        # skip matches below the similarity threshold        if sample[\"score\"] &lt; threshold:            continue        if \"text\" in sample[\"metadata\"]:            _responses.append(sample[\"metadata\"][\"text\"])        else:            _responses.append(str(sample[\"metadata\"]))    return \" \\n --- \\n \".join(_responses).replace(\"\\n---\\n\", \" \\n --- \\n \").strip()def generate_answer(query: str, language: str = \"English\"):    \"\"\"    Generates an answer to a user's query using the context from Pinecone search results    and OpenAI's chat models.    The function takes the user's query, creates a text embedding from it, performs a    Pinecone query to find relevant context, and then generates an answer using OpenAI's    chat models with the given context.    Args:        query (str): The user's question.        language (str, optional): The language to answer in. Defaults to \"English\".    Returns:        str: The generated answer.    
Note:        This function assumes that Pinecone and the associated index have already been         initialized properly, and that the OpenAI API is set up correctly. Please         make sure to initialize Pinecone and the OpenAI API first.    \"\"\"    query_embed = create_embeddings(text=query)[0]    augmented_query = query_and_combine(        query_embed,        top_k=app.config[\"top_n\"],        threshold=app.config[\"pinecone_threshold\"],    )    ## Creating the prompt for model    primer = \"\"\"You are a Q&amp;A bot, a highly intelligent system that answers    user questions based on the context provided by the user above    each question. If the information cannot be found in the context    provided by the user, you truthfully say \"I don't know\". Be as concise as possible.    \"\"\"    augmented_query = augmented_query if augmented_query != \"\" else \"No context found\"    response = openai.ChatCompletion.create(        messages=[            {\"role\": \"system\", \"content\": primer},            {                \"role\": \"user\",                \"content\": f\"Context: \\n {augmented_query} \\n --- \\n Question: {query} \\n Answer in {language}\",            },        ],        model=app.config[\"chat_model\"],        temperature=app.config[\"temperature\"],    )    # the answer text is in the first choice's message    return response[\"choices\"][0][\"message\"][\"content\"]The code consists of two functions.  query_and_combine() queries a Pinecone index using a query vector, retrieves the top matching responses, and combines them into a single string. It filters the responses based on a similarity threshold and extracts the relevant text or metadata to be included in the combined result.  generate_answer() generates an answer to a user query. It creates an embedding for the query, retrieves matching context from the Pinecone index with query_and_combine(), and passes that context, together with the query, to a chat-based language model. 
The model generates an answer based on the context and user query, which is then returned as the response.Overall, the code queries a Pinecone index, combines the matching passages, and uses a language model to answer the query from that context.We hope this post has shown how Google Cloud Platform, Pinecone, and large language models can be combined to make documents searchable and answerable. To explore the full code behind this solution, visit our GitHub repository. Feel free to clone, modify, and contribute to the project, and don’t hesitate to share your thoughts and experiences. I would also like to thank Tasheer Hussain B for his contributions.  Happy coding!References  Google Document AI  Retrieval Enhanced Generative Question Answering with OpenAI  Introduction to Flask  GitHub repository",
      "views": 111,
      "reading_minutes": 13,
      "tags": [
        
          
          {
            "name": "GCP",
            "slug": "gcp",
            "url": "/tags/gcp/#posts"
          },
        
          
          {
            "name": "Pinecone",
            "slug": "pinecone",
            "url": "/tags/pinecone/#posts"
          },
        
          
          {
            "name": "Large Language Models",
            "slug": "large-language-models",
            "url": "/tags/large-language-models/#posts"
          },
        
          
          {
            "name": "OpenAI",
            "slug": "openai",
            "url": "/tags/openai/#posts"
          },
        
          
          {
            "name": "Document AI",
            "slug": "document-ai",
            "url": "/tags/document-ai/#posts"
          }
        
      ]
    },
    {
      "kind": "post",
      "title": "Hybrid Search for E-Commerce with Pinecone and LLMs",
      "url": "/2023/05/02/hybrid-search-for-e-commerce-with-pinecone-and-LLM/",
      "date_display": "May 2, 2023",
      "date_iso": "2023-05-02",
      "excerpt": "Learn how to build a hybrid search system for e-commerce applications by combining traditional keyword-based retrieval with dense embeddings from language models, using Pinecone, a managed vector database. Discover the benefits of hybrid search for e-commerce, including improved search relevance, personalization, handling long-tail queries, and simpler infrastructure management.",
      "content": "Searching and finding relevant products is a critical component of an e-commerce website. Providing fast and accurate search results can make the difference between high user satisfaction and user frustration. With recent advancements in natural language understanding and vector search technologies, enhanced search systems have become more accessible and efficient, leading to better user experiences and improved conversion rates.In this blog post, we’ll explore how to implement a hybrid search system for e-commerce using Pinecone, a high-performance vector search engine, and fine-tuned domain-specific language models. By the end of this post, you’ll not only have a strong understanding of hybrid search but also a practical step-by-step guide to implementing it.What is Hybrid Search?High-level view of simple Pinecone Hybrid IndexBefore diving into the implementation, let’s quickly understand what hybrid search means. Hybrid search is an approach that combines the strengths of both traditional search (sparse vector search) and vector search (dense vector search) to achieve better search performance across a wide range of domains.Dense vector search extracts high-quality vector embeddings from text data and performs a similarity search to find relevant documents. However, it often struggles with out-of-domain data when it’s not fine-tuned on domain-specific datasets.On the other hand, traditional search uses sparse vector representations, like term frequency-inverse document frequency (TF-IDF) or BM25, and does not require any domain-specific fine-tuning. 
While it can handle new domains, its performance is limited because it cannot capture semantic relationships between words the way dense retrieval can.Hybrid search tries to mitigate the weaknesses of both approaches by combining them in a single system, leveraging the performance potential of dense vector search and the zero-shot adaptability of traditional search.Now that we have a basic understanding of hybrid search, let’s dive into its implementation.Building a Hybrid Search SystemWe’ll cover the following steps for implementing a hybrid search system:  Leveraging Domain-Specific Language Models  Creating Sparse and Dense Vectors  Setting Up Pinecone  Implementing the Hybrid Search Pipeline  Making Queries and Tuning Parameters1. Leveraging Domain-Specific Language ModelsIn recent years, large-scale pre-trained language models such as OpenAI’s GPT models and Cohere’s models have become increasingly popular for a variety of tasks, including natural language understanding and generation. These models can be fine-tuned on domain-specific data to improve their performance and adapt to specific tasks, such as e-commerce product search.In our example, we will use a fine-tuned domain-specific language model to generate dense vector embeddings for products and queries. However, you can choose other models or even create your own custom embeddings based on your specific domain.import torchfrom transformers import AutoTokenizer, AutoModel# Load a pre-trained domain-specific language model (placeholder name: replace with your own model)model_name = \"your-domain-specific-model\"tokenizer = AutoTokenizer.from_pretrained(model_name)model = AutoModel.from_pretrained(model_name)# Generate a dense vector embedding for a product description by mean-pooling the last hidden statetext = \"Nike Air Max sports shoes for men\"inputs = tokenizer(text, return_tensors=\"pt\")with torch.no_grad():    outputs = model(**inputs)    dense_embedding = outputs.last_hidden_state.mean(dim=1).numpy()2. 
Creating Sparse and Dense VectorsHybrid search requires both sparse and dense vector representations for our e-commerce data. We’ll now describe how to generate these vectors.Sparse VectorsSparse vector representations, like TF-IDF or BM25, are built from term statistics using standard text processing techniques such as tokenization, stopword removal, and stemming. Here we use the BM25Encoder from Pinecone’s pinecone-text library, which first learns term statistics from the corpus and then encodes each text as a dictionary of token indices and weights.from pinecone_text.sparse import BM25Encoder# product_descriptions is assumed to be a list of product description strings# Create the BM25 encoder and fit it on the corpusbm25 = BM25Encoder()bm25.fit(product_descriptions)def generate_sparse_vectors(text):    '''Generates a sparse vector representation for a product description    Args:        text (str): A product description    Returns:        sparse_vector (dict): A dictionary of indices and values    '''    # encode_documents is used for indexed documents; queries use encode_queries    sparse_vector = bm25.encode_documents(text)    return sparse_vector# Create the sparse vectorssparse_vectors = []for product_description in product_descriptions:    sparse_vectors.append(generate_sparse_vectors(text=product_description))Dense VectorsDense vector representations can be generated using pre-trained or custom domain-specific language models. 
In our previous example, we used a domain-specific language model to generate dense vector embeddings for a product description. The function below wraps that step and returns a plain list of floats, the format Pinecone expects.def generate_dense_vector(text):    '''Generates a dense vector embedding for a product description    Args:        text (str): A product description    Returns:        dense_vector (list): The mean-pooled embedding as a list of floats    '''    # Tokenize the text and convert to PyTorch tensors (truncating overly long descriptions)    inputs = tokenizer(text, return_tensors=\"pt\", truncation=True)    # Generate the embeddings with the pre-trained model    with torch.no_grad():        outputs = model(**inputs)    # Mean-pool over tokens; [0] drops the batch dimension of size 1    dense_vector = outputs.last_hidden_state.mean(dim=1)[0].tolist()    return dense_vector# Generate dense vector embeddings for a list of product descriptionsdense_vectors = []for product_description in product_descriptions:    dense_vectors.append(generate_dense_vector(text=product_description))3. Setting Up PineconePinecone is a high-performance vector search engine that supports hybrid search. A single index can store both sparse and dense vectors for each record and answer queries that combine the two.To use Pinecone, you’ll need to sign up for an account, install the Pinecone client, and set up your API key and environment.import pinecone# Initialize the Pinecone clientpinecone.init(    api_key=\"YOUR_API_KEY\",  # app.pinecone.io    environment=\"YOUR_ENV\"  # find next to api key in console)# Create a Pinecone hybrid search index; sparse-dense queries require the dotproduct metricindex_name = \"ecommerce-hybrid-search\"MODEL_DIMENSION = model.config.hidden_size  # dimensionality of the dense modelpinecone.create_index(    name=index_name,    dimension=MODEL_DIMENSION,    metric=\"dotproduct\")# connect to the indexindex = pinecone.Index(index_name=index_name)# view index statsindex.describe_index_stats()4. Implementing the Hybrid Search PipelineWith our sparse and dense vectors generated and Pinecone set up, we can now build a hybrid search pipeline. 
This pipeline includes the following steps:  Adding product data to the Pinecone index  Retrieving results using both sparse and dense vectorsdef add_product_data_to_index(product_ids, sparse_vectors, dense_vectors, metadata=None):    \"\"\"Upserts product data to the Pinecone index.    Args:        product_ids (`list` of `str`): Product IDs.        sparse_vectors (`list` of `dict`): Sparse vectors, each a dict of `indices` and `values`.        dense_vectors (`list` of `list` of `float`): Dense vectors.        metadata (`list` of `dict`): Optional metadata, one dict per product.    Returns:        None    \"\"\"    batch_size = 32    # Loop through the product IDs in batches.    for i in range(0, len(product_ids), batch_size):        i_end = min(i + batch_size, len(product_ids))        ids = product_ids[i:i_end]        sparse_batch = sparse_vectors[i:i_end]        dense_batch = dense_vectors[i:i_end]        # Without metadata, use None placeholders so zip() does not produce an empty batch        meta_batch = metadata[i:i_end] if metadata else [None] * len(ids)        vectors = []        for _id, sparse, dense, meta in zip(ids, sparse_batch, dense_batch, meta_batch):            vector = {                'id': _id,                'sparse_values': sparse,                'values': dense,            }            if meta:                vector['metadata'] = meta            vectors.append(vector)        # Upsert the vectors into the Pinecone index.        index.upsert(vectors=vectors)# product_metadata is assumed to be a list of dicts such as {'product_name': ...}add_product_data_to_index(product_ids, sparse_vectors, dense_vectors, metadata=product_metadata)Now that our data is indexed, we can perform hybrid search queries.5. 
Making Queries and Tuning ParametersHigh-level view of simple Pinecone Hybrid QueryTo make hybrid search queries, we’ll create a function that takes a query, the number of top results, and an alpha parameter to control the weighting between dense and sparse vector search scores.def hybrid_scale(dense, sparse, alpha: float):    \"\"\"Hybrid vector scaling using a convex combination    alpha * dense + (1 - alpha) * sparse    Args:        dense: Array of floats representing the dense query vector        sparse: a dict of `indices` and `values`        alpha: float between 0 and 1 where 0 == sparse only               and 1 == dense only    \"\"\"    if alpha &lt; 0 or alpha &gt; 1:        raise ValueError(\"Alpha must be between 0 and 1\")    # scale sparse and dense vectors to create hybrid search vecs    hsparse = {        'indices': sparse['indices'],        'values':  [v * (1 - alpha) for v in sparse['values']]    }    hdense = [v * alpha for v in dense]    return hdense, hsparsedef search_products(query, top_k=10, alpha=0.5):    # Generate sparse query vector (BM25 uses a query-specific encoding)    sparse_query_vector = bm25.encode_queries(query)    # Generate dense query vector    dense_query_vector = generate_dense_vector(query)    # Weight the two query vectors by alpha    dense_query_vector, sparse_query_vector = hybrid_scale(dense_query_vector, sparse_query_vector, alpha)    # Search products using Pinecone    results = index.query(        vector=dense_query_vector,        sparse_vector=sparse_query_vector,        top_k=top_k,        include_metadata=True    )    return resultsWe can then use this function to search for relevant products in our e-commerce dataset.query = \"running shoes for women\"results = search_products(query, top_k=5)for match in results['matches']:    print(match['id'], match['metadata']['product_name'], match['score'])Experimenting with different values for the alpha parameter will help you find the right balance between sparse and dense vector search for your specific domain.ConclusionIn this blog post, we demonstrated how to build a 
hybrid search system for e-commerce using Pinecone and domain-specific language models. Hybrid search enables us to combine the strengths of both traditional search and vector search, improving search performance and adaptability across diverse domains.By following the steps and code snippets provided in this post, you can implement your own hybrid search system tailored to your e-commerce website’s specific requirements. Start exploring Pinecone and improve your e-commerce search experience today!References  Ecommerce Search using Hybrid Search Techniques in Pinecone (Google Colab Notebook): A practical guide showcasing the implementation of e-commerce search using Pinecone’s hybrid search techniques.  Pinecone Ecommerce Search Documentation: Official Pinecone documentation for building e-commerce search systems.  BM25 Vector Generation using Pinecone (Google Colab Notebook): A guide for generating BM25 sparse vectors using Pinecone.  Pinecone Text Repository on GitHub: A collection of text processing and vector generation resources using Pinecone.  Introduction to Hybrid Search on Pinecone’s Website: An overview of hybrid search, its benefits, and use cases in the context of Pinecone’s capabilities.",
      "views": 856,
      "reading_minutes": 10,
      "tags": [
        
          
          {
            "name": "Pinecone",
            "slug": "pinecone",
            "url": "/tags/pinecone/#posts"
          },
        
          
          {
            "name": "Hybrid Search",
            "slug": "hybrid-search",
            "url": "/tags/hybrid-search/#posts"
          },
        
          
          {
            "name": "E-Commerce",
            "slug": "e-commerce",
            "url": "/tags/e-commerce/#posts"
          },
        
          
          {
            "name": "Large Language Models",
            "slug": "large-language-models",
            "url": "/tags/large-language-models/#posts"
          },
        
          
          {
            "name": "Vector Database",
            "slug": "vector-database",
            "url": "/tags/vector-database/#posts"
          }
        
      ]
    },
    {
      "kind": "post",
      "title": "Demystifying the Shell Scripting: Working with Files and Directories",
      "url": "/2023/01/04/demystifying-the-shell-scripting-working-with-files-and-directories/",
      "date_display": "January 4, 2023",
      "date_iso": "2023-01-04",
      "excerpt": "Master the art of working with files and directories in shell scripting to streamline your tasks and improve efficiency. Learn how to create, copy, move, and delete files and directories, as well as read and write to files using practical examples. Discover the power of searching for files and directories with the `find` command. Enhance your shell scripting skills with valuable resources and tutorials, and unlock the full potential of file and directory management in the shell.",
      "content": "In my previous blog posts, we covered the basics of using the shell, introduced shell scripting for beginners, and explored advanced techniques and best practices. In this blog post, we will focus on working with files and directories in shell scripts. We will discuss common tasks such as creating, copying, moving, and deleting files and directories, as well as reading and writing to files. We will also provide some resources for further learning.Creating Files and DirectoriesTo create a new empty file (or update the timestamp of an existing one) in a shell script, you can use the touch command:touch new_file.txtTo create a new directory, you can use the mkdir command:mkdir new_directoryCopying and Moving Files and DirectoriesTo copy a file, you can use the cp command:cp source_file.txt destination_file.txtTo copy a directory, you can use the -r (recursive) option:cp -r source_directory destination_directoryTo move a file or directory, you can use the mv command:mv source_file.txt destination_file.txtDeleting Files and DirectoriesTo delete a file, you can use the rm command:rm file_to_delete.txtTo delete a directory, you can use the -r (recursive) option:rm -r directory_to_deleteReading and Writing to FilesTo read the contents of a file, you can use the cat command:cat file_to_read.txtTo write to a file, you can use the &gt; operator to overwrite the file or the &gt;&gt; operator to append to the file:echo \"This is a new line\" &gt; file_to_write.txtecho \"This is another new line\" &gt;&gt; file_to_write.txtTo read a file line by line, you can use a while loop with the read command:#!/bin/bashwhile IFS= read -r line; do  echo \"Line: $line\"done &lt; file_to_read.txtSearching for Files and DirectoriesTo search for files and directories, you can use the find command:find /path/to/search -name \"file_pattern\"For example, to find all .txt files in /home/user and its subdirectories (find searches recursively), you can use:find /home/user -name \"*.txt\"ResourcesTo further improve your skills in working with files and directories in 
shell scripts, here are some resources:  File Management Commands in Linux: A comprehensive guide to file management commands in Linux.  Linux Find Command Examples: A collection of examples for using the find command in Linux.In conclusion, working with files and directories is an essential aspect of shell scripting. By mastering common tasks such as creating, copying, moving, and deleting files and directories, as well as reading and writing to files, you will be well-equipped to handle a wide range of shell scripting tasks.",
      "views": 0,
      "reading_minutes": 2,
      "tags": [
        
          
          {
            "name": "Shell Scripting",
            "slug": "shell-scripting",
            "url": "/tags/shell-scripting/#posts"
          },
        
          
          {
            "name": "Bash",
            "slug": "bash",
            "url": "/tags/bash/#posts"
          },
        
          
          {
            "name": "Shell",
            "slug": "shell",
            "url": "/tags/shell/#posts"
          },
        
          
          {
            "name": "File Management",
            "slug": "file-management",
            "url": "/tags/file-management/#posts"
          },
        
          
          {
            "name": "Directory Management",
            "slug": "directory-management",
            "url": "/tags/directory-management/#posts"
          }
        
      ]
    },
    {
      "kind": "post",
      "title": "Demystifying the Shell Scripting: Advanced Techniques and Best Practices",
      "url": "/2022/12/28/demystifying-the-shell-scripting-advanced-techniques-and-best-practices/",
      "date_display": "December 28, 2022",
      "date_iso": "2022-12-28",
      "excerpt": "Building upon the fundamentals of shell scripting, this guide delves into advanced techniques and best practices that will elevate your scripting skills. We will explore error handling, command substitution, process management, and share valuable tips for writing efficient, robust, and maintainable scripts. By mastering these advanced concepts, you will be well-equipped to tackle complex scripting challenges and harness the full power of shell scripting.",
      "content": "In my previous blog posts, we covered the basics of using the shell and introduced shell scripting for beginners. Now that you have a solid foundation in shell scripting, it’s time to explore some advanced techniques and best practices that will help you write more efficient, robust, and maintainable scripts. In this blog post, we will discuss error handling, command substitution, process management, and best practices for writing shell scripts. We will also provide some resources for further learning.Error HandlingError handling is an essential aspect of writing robust shell scripts. By default, shell scripts continue to execute subsequent commands even if an error occurs. To change this behavior and make your script exit immediately if a command fails, you can use the set -e option:#!/bin/bashset -e# Your script hereYou can also use the trap command to define custom error handling behavior. For example, you can register a cleanup function that runs whenever the script exits, whether it finishes normally or is stopped early by a failing command under set -e:#!/bin/bashfunction cleanup() {  echo \"Cleaning up before exiting...\"  # Your cleanup code here}trap cleanup EXIT# Your script hereCommand SubstitutionCommand substitution allows you to capture the output of a command and store it in a variable. This can be useful for processing the output of a command within your script. There are two ways to perform command substitution:  Using backticks (` `):output=`ls`  Using $():output=$(ls)The $() syntax is preferred because it is more readable and can be easily nested.Process ManagementShell scripts often need to manage background processes, such as starting, stopping, or monitoring them. 
Here are some useful commands for process management:  &amp;: Run a command in the background by appending an ampersand (&amp;) to the command.long_running_command &amp;  wait: Wait for a background process to complete before continuing with the script.long_running_command &amp;wait  kill: Terminate a process by sending a signal to it. Plain kill sends SIGTERM, which lets the process shut down cleanly; reserve kill -9 (SIGKILL) for processes that do not respond.kill process_id  ps: List running processes and their process IDs.ps auxBest PracticesHere are some best practices for writing shell scripts:  Use meaningful variable and function names.  Add comments to explain complex or non-obvious code.  Use indentation and whitespace to improve readability.  Keep your scripts modular by breaking them into smaller functions.  Use the local keyword to limit the scope of variables within functions.  Always quote your variables to prevent issues with spaces and special characters.  In bash, use the [[ ]] syntax for conditional expressions, as it is more robust than [ ].ResourcesTo further improve your shell scripting skills, here are some resources:  Google Shell Style Guide: A comprehensive style guide for writing shell scripts, created by Google.  ShellCheck: A static analysis tool for shell scripts that can help you identify and fix potential issues in your code.  Awesome Shell: A curated list of awesome command-line frameworks, toolkits, guides, and other resources for shell scripting.In conclusion, mastering advanced techniques and best practices in shell scripting will help you write more efficient, robust, and maintainable scripts. By understanding error handling, command substitution, process management, and following best practices, you will be well on your way to becoming a shell scripting expert.",
      "views": 0,
      "reading_minutes": 3,
      "tags": [
        
          
          {
            "name": "Shell Scripting",
            "slug": "shell-scripting",
            "url": "/tags/shell-scripting/#posts"
          },
        
          
          {
            "name": "Bash",
            "slug": "bash",
            "url": "/tags/bash/#posts"
          },
        
          
          {
            "name": "Shell",
            "slug": "shell",
            "url": "/tags/shell/#posts"
          },
        
          
          {
            "name": "Error Handling",
            "slug": "error-handling",
            "url": "/tags/error-handling/#posts"
          },
        
          
          {
            "name": "Command Substitution",
            "slug": "command-substitution",
            "url": "/tags/command-substitution/#posts"
          },
        
          
          {
            "name": "Process Management",
            "slug": "process-management",
            "url": "/tags/process-management/#posts"
          },
        
          
          {
            "name": "Best Practices",
            "slug": "best-practices",
            "url": "/tags/best-practices/#posts"
          }
        
      ]
    },
    {
      "kind": "post",
      "title": "Demystifying the Shell Scripting: A Beginner's Guide",
      "url": "/2022/12/28/demystifying-the-shell-scripting-a-beginners-guide/",
      "date_display": "December 28, 2022",
      "date_iso": "2022-12-28",
      "excerpt": "Shell scripting is a powerful tool that enables users to automate tasks, perform complex operations, and create custom commands. In this beginner's guide, we will explore the basics of shell scripting, including creating and executing scripts, working with variables, control structures, loops, and functions. By understanding these fundamental concepts, you will be well on your way to mastering shell scripting and unlocking its full potential.",
      "content": "In my previous blog post, we introduced the basics of using the shell, navigating within it, connecting programs, and some miscellaneous tips and tricks. Now that you have a good understanding of the shell, it’s time to take your skills to the next level by learning shell scripting. Shell scripting allows you to automate tasks, perform complex operations, and create custom commands. In this blog post, we will explore the basics of shell scripting, including variables, control structures, loops, and functions. We will also provide some resources for further learning.What is Shell Scripting?Shell scripting is the process of writing a series of commands in a text file (called a script) that can be executed by the shell. These scripts can be used to automate repetitive tasks, perform complex operations, and create custom commands. Shell scripts are written in the command language of the shell that runs them (e.g., Bash, Zsh, or Fish).Creating a Shell ScriptTo create a shell script, create a new text file, conventionally with the extension .sh (e.g., myscript.sh), although the extension is not required. The first line of the script should be a “shebang” (#!) followed by the path to the shell interpreter (e.g., #!/bin/bash for Bash scripts). This line tells the operating system which interpreter to use when executing the script.Here’s an example of a simple shell script that prints “Hello, World!” to the console:#!/bin/bashecho \"Hello, World!\"To execute the script, you need to make it executable by changing its permissions using the chmod command:chmod +x myscript.shNow you can run the script by typing ./myscript.sh in the terminal.VariablesVariables in shell scripts are used to store values that can be referenced and manipulated throughout the script. 
To create a variable, use the = operator with no spaces around it:my_variable=\"Hello, World!\"To reference the value of a variable, prefix its name with the $ symbol (quoting the reference is a good habit):echo \"$my_variable\"Control StructuresControl structures, such as if statements and case statements, allow you to add conditional logic to your shell scripts. Here’s an example of an if statement:#!/bin/bashnumber=5if [ \"$number\" -gt 3 ]; then  echo \"The number is greater than 3.\"else  echo \"The number is not greater than 3.\"fiIn this example, the script checks if the value of the number variable is greater than 3 and prints a message accordingly.LoopsLoops allow you to execute a block of code multiple times. There are two main types of loops in shell scripting: for loops and while loops. Here’s an example of a for loop:#!/bin/bashfor i in {1..5}; do  echo \"Iteration $i\"doneThis script will print the message “Iteration X” five times, with X being the current iteration number.FunctionsFunctions are reusable blocks of code that can be called with arguments, which the function reads as $1, $2, and so on. In Bash, you can define a function with the function keyword followed by the function name and a pair of parentheses; the portable POSIX form drops the keyword (e.g., greet() { echo \"Hello, $1!\"; }):#!/bin/bashfunction greet() {  echo \"Hello, $1!\"}greet \"World\"In this example, the greet function takes one argument ($1) and prints a greeting message using that argument.ResourcesTo further improve your shell scripting skills, here are some resources:  Shell Scripting Tutorial: A comprehensive tutorial covering all aspects of shell scripting.  Bash Guide for Beginners: A beginner-friendly guide to Bash scripting.  Advanced Bash-Scripting Guide: A more advanced guide for those looking to deepen their understanding of Bash scripting.In conclusion, shell scripting is a powerful tool that allows you to automate tasks, perform complex operations, and create custom commands. By understanding the basics of shell scripting, including variables, control structures, loops, and functions, you will be well on your way to becoming a shell scripting expert.",
      "views": 0,
      "reading_minutes": 3,
      "tags": [
        
          
          {
            "name": "Shell Scripting",
            "slug": "shell-scripting",
            "url": "/tags/shell-scripting/#posts"
          },
        
          
          {
            "name": "Bash",
            "slug": "bash",
            "url": "/tags/bash/#posts"
          },
        
          
          {
            "name": "Shell",
            "slug": "shell",
            "url": "/tags/shell/#posts"
          }
        
      ]
    },
    {
      "kind": "post",
      "title": "Demystifying the Shell: A Beginner's Guide",
      "url": "/2022/12/28/demystifying-the-shell-a-beginners-guide/",
      "date_display": "December 28, 2022",
      "date_iso": "2022-12-28",
      "excerpt": "Discover the power of the shell, a command-line interface that allows you to interact with your computer's operating system more directly and efficiently. Learn the basics of using the shell, navigating within it, and connecting programs using simple examples. Enhance your skills with miscellaneous tips and resources, including tab completion, command history, keyboard shortcuts, and helpful online tools. Embrace the command line and unlock the full potential of the shell!",
      "content": "The shell is an essential tool for any developer, system administrator, or even a casual computer user. It allows you to interact with your computer’s operating system using text-based commands, giving you more control and flexibility than graphical user interfaces (GUIs). In this blog post, we will explore the basics of using the shell, navigating within it, connecting programs, and some miscellaneous tips and tricks. We will also provide some resources for further learning.What is the Shell?The shell is a command-line interface (CLI) that allows you to interact with your computer’s operating system by typing commands. It is a program that takes your commands, interprets them, and then sends them to the operating system to be executed. There are various types of shells available, such as Bash (Bourne Again SHell), Zsh (Z Shell), and Fish (Friendly Interactive SHell), each with its own unique features and capabilities.Using the ShellTo start using the shell, you need to open a terminal emulator. On macOS, the Terminal app is in the Utilities folder inside Applications; on Linux, look for a Terminal application in your desktop environment’s application menu. On Windows, you can use Command Prompt or PowerShell, or install Git Bash or the Windows Subsystem for Linux (WSL) to get a Unix-style shell such as Bash.Once you have opened the terminal, you can start typing commands. For example, to list the files and directories in your current directory, you can type the following command:lsThis command will display the contents of your current directory. You can also use flags (options) to modify the behavior of a command. For example, to display the contents of a directory in a more detailed format, you can use the -l flag:ls -lNavigating in the ShellNavigating within the shell is quite simple. You can use the cd (change directory) command to move between directories. For example, to move to the /home/user/Documents directory, you can type:cd /home/user/DocumentsTo move up one directory level, you can use the .. 
notation:cd ..You can also use the pwd (print working directory) command to display the current directory you are in:pwdConnecting ProgramsIn the shell, you can connect multiple programs together using pipes (|). This allows you to pass the output of one program as input to another program. For example, you can use the grep command to search for a specific word in a file, and then use the wc (word count) command to count the number of lines containing that word:grep 'search_word' file.txt | wc -lThis command will first search for the word ‘search_word’ in the file ‘file.txt’ and then count the number of lines containing that word.MiscellaneousHere are some miscellaneous tips and tricks for using the shell:  Use the history command to view your command history.  Use the clear command to clear the terminal screen.  Use the man command followed by a command name to view the manual page for that command (e.g., man ls).  Use the TAB key to auto-complete file and directory names.  Use the CTRL + C keyboard shortcut to cancel a running command.ResourcesTo further improve your shell skills, here are some resources:  LinuxCommand.org: This website provides a wealth of information on using the shell, including tutorials, examples, and reference material.  ExplainShell: This is an online tool that allows you to enter a shell command and receive a detailed explanation of what each part of the command does.  Bash Cheat Sheet: This is a handy reference guide that provides a quick overview of common Bash commands and syntax.  ShellCheck: This is an online tool that can help you find and fix issues in your shell scripts. It provides suggestions and explanations for common mistakes and best practices.In conclusion, mastering the shell is an essential skill for any computer user. It allows you to interact with your computer’s operating system more efficiently and effectively than using graphical user interfaces. 
By understanding the basics of using the shell, navigating within it, connecting programs, and learning some miscellaneous tips and tricks, you will be well on your way to becoming a shell expert.",
      "views": 0,
      "reading_minutes": 3,
      "tags": [
        
          
          {
            "name": "Bash",
            "slug": "bash",
            "url": "/tags/bash/#posts"
          },
        
          
          {
            "name": "Shell",
            "slug": "shell",
            "url": "/tags/shell/#posts"
          }
        
      ]
    },
    {
      "kind": "post",
      "title": "Version Control (Git)",
      "url": "/2022/12/21/version-control/",
      "date_display": "December 21, 2022",
      "date_iso": "2022-12-21",
      "excerpt": "How to use version control _properly_, and take advantage of it to save you from disaster, collaborate with others, and quickly find and isolate problematic changes. No more `rm -rf; git clone`. No more merge conflicts (well, fewer of them at least). No more huge blocks of commented-out code. No more fretting over how to find what broke your code. No more \"oh no, did we delete the working code?!\".",
      "content": "Version control systems (VCSs) are tools used to track changes to source code (or other collections of files and folders). As the name implies, these tools help maintain a history of changes; furthermore, they facilitate collaboration. VCSs track changes to a folder and its contents in a series of snapshots, where each snapshot encapsulates the entire state of files/folders within a top-level directory. VCSs also maintain metadata like who created each snapshot, messages associated with each snapshot, and so on.Why is version control useful? Even when you’re working by yourself, it can let you look at old snapshots of a project, keep a log of why certain changes were made, work on parallel branches of development, and much more. When working with others, it’s an invaluable tool for seeing what other people have changed, as well as resolving conflicts in concurrent development.Modern VCSs also let you easily (and often automatically) answer questions like:  Who wrote this module?  When was this particular line of this particular file edited? By whom? Why was it edited?  Over the last 1000 revisions, when/why did a particular unit test stop working?While other VCSs exist, Git is the de facto standard for version control.This XKCD comic captures Git’s reputation:Because Git’s interface is a leaky abstraction, learning Git top-down (starting with its interface / command-line interface) can lead to a lot of confusion. It’s possible to memorize a handful of commands and think of them as magic incantations, and follow the approach in the comic above whenever anything goes wrong.While Git admittedly has an ugly interface, its underlying design and ideas are beautiful. While an ugly interface has to be memorized, a beautiful design can be understood. For this reason, we give a bottom-up explanation of Git, starting with its data model and later covering the command-line interface. 
Once the data model is understood, the commands can be better understood in terms of how they manipulate the underlying data model.Git’s data modelThere are many ad-hoc approaches you could take to version control. Git has a well-thought-out model that enables all the nice features of version control, like maintaining history, supporting branches, and enabling collaboration.SnapshotsGit models the history of a collection of files and folders within some top-level directory as a series of snapshots. In Git terminology, a file is called a “blob”, and it’s just a bunch of bytes. A directory is called a “tree”, and it maps names to blobs or trees (so directories can contain other directories). A snapshot is the top-level tree that is being tracked. For example, we might have a tree as follows:&lt;root&gt; (tree)|+- foo (tree)|  ||  + bar.txt (blob, contents = \"hello world\")|+- baz.txt (blob, contents = \"git is wonderful\")The top-level tree contains two elements, a tree “foo” (that itself contains one element, a blob “bar.txt”), and a blob “baz.txt”.Modeling history: relating snapshotsHow should a version control system relate snapshots? One simple model would be to have a linear history. A history would be a list of snapshots in time-order. For many reasons, Git doesn’t use a simple model like this.In Git, a history is a directed acyclic graph (DAG) of snapshots. That may sound like a fancy math word, but don’t be intimidated. All this means is that each snapshot in Git refers to a set of “parents”, the snapshots that preceded it. It’s a set of parents rather than a single parent (as would be the case in a linear history) because a snapshot might descend from multiple parents, for example, due to combining (merging) two parallel branches of development.Git calls these snapshots “commit”s. 
Visualizing a commit history might look something like this:o &lt;-- o &lt;-- o &lt;-- o            ^             \\              --- o &lt;-- oIn the ASCII art above, each o is an individual commit (snapshot). The arrows point to the parent of each commit (it’s a “comes before” relation, not “comes after”). After the third commit, the history branches into two separate branches. This might correspond to, for example, two separate features being developed in parallel, independently from each other. In the future, these branches may be merged to create a new snapshot that incorporates both of the features, producing a new history that looks like this, with the newly created merge commit at the far right:o &lt;-- o &lt;-- o &lt;-- o &lt;---- o            ^            /             \\          v              --- o &lt;-- oCommits in Git are immutable. This doesn’t mean that mistakes can’t be corrected, however; it’s just that “edits” to the commit history are actually creating entirely new commits, and references (see below) are updated to point to the new ones.Data model, as pseudocodeIt may be instructive to see Git’s data model written down in pseudocode:// a file is a bunch of bytestype blob = array&lt;byte&gt;// a directory contains named files and directoriestype tree = map&lt;string, tree | blob&gt;// a commit has parents, metadata, and the top-level treetype commit = struct {    parents: array&lt;commit&gt;    author: string    message: string    snapshot: tree}It’s a clean, simple model of history.Objects and content-addressingAn “object” is a blob, tree, or commit:type object = blob | tree | commitIn the Git data store, all objects are content-addressed by their SHA-1 hash.objects = map&lt;string, object&gt;def store(object):    id = sha1(object)    objects[id] = objectdef load(id):    return objects[id]Blobs, trees, and commits are unified in this way: they are all objects. 
When they reference other objects, they don’t actually contain them in their on-disk representation, but have a reference to them by their hash.For example, the tree for the example directory structure above (visualized using git cat-file -p 698281bc680d1995c5f4caaf3359721a5a58d48d) looks like this:100644 blob 4448adbf7ecd394f42ae135bbeed9676e894af85    baz.txt040000 tree c68d233a33c5c06e0340e4c224f0afca87c8ce87    fooThe tree itself contains pointers to its contents, baz.txt (a blob) and foo (a tree). If we look at the contents addressed by the hash corresponding to baz.txt with git cat-file -p 4448adbf7ecd394f42ae135bbeed9676e894af85, we get the following:git is wonderfulReferencesNow, all snapshots can be identified by their SHA-1 hashes. That’s inconvenient, because humans aren’t good at remembering strings of 40 hexadecimal characters.Git’s solution to this problem is human-readable names for SHA-1 hashes, called “references”. References are pointers to commits. Unlike objects, which are immutable, references are mutable (can be updated to point to a new commit). For example, the master reference usually points to the latest commit in the main branch of development.references = map&lt;string, string&gt;def update_reference(name, id):    references[name] = iddef read_reference(name):    return references[name]def load_reference(name_or_id):    if name_or_id in references:        return load(references[name_or_id])    else:        return load(name_or_id)With this, Git can use human-readable names like “master” to refer to a particular snapshot in the history, instead of a long hexadecimal string.One detail is that we often want a notion of “where we currently are” in the history, so that when we take a new snapshot, we know what it is relative to (how we set the parents field of the commit). 
In Git, that “where we currently are” is a special reference called “HEAD”.RepositoriesFinally, we can define what a Git repository (roughly) is: it is the data objects and references.On disk, all that Git stores is objects and references: that’s all there is to Git’s data model. All git commands map to some manipulation of the commit DAG by adding objects and adding/updating references.Whenever you’re typing in any command, think about what manipulation the command is making to the underlying graph data structure. Conversely, if you’re trying to make a particular kind of change to the commit DAG, e.g. “discard uncommitted changes and make the ‘master’ ref point to commit 5d83f9e”, there’s probably a command to do it (e.g. in this case, git checkout master; git reset --hard 5d83f9e).Staging areaThis is another concept that’s orthogonal to the data model, but it’s a part of the interface to create commits.One way you might imagine implementing snapshotting as described above is to have a “create snapshot” command that creates a new snapshot based on the current state of the working directory. Some version control tools work like this, but not Git. We want clean snapshots, and it might not always be ideal to make a snapshot from the current state. For example, imagine a scenario where you’ve implemented two separate features, and you want to create two separate commits, where the first introduces the first feature, and the next introduces the second feature. Or imagine a scenario where you have debugging print statements added all over your code, along with a bugfix; you want to commit the bugfix while discarding all the print statements.Git accommodates such scenarios by allowing you to specify which modifications should be included in the next snapshot through a mechanism called the “staging area”.Git command-line interfaceTo avoid duplicating information, we’re not going to explain the commands below in detail. 
See the highly recommended Pro Git for more information.BasicsThe git init command initializes a new Git repository, with repository metadata being stored in the .git directory:$ mkdir myproject$ cd myproject$ git initInitialized empty Git repository in .git$ git statusOn branch masterNo commits yetnothing to commit (create/copy files and use \"git add\" to track)How do we interpret this output? “No commits yet” basically means our version history is empty. Let’s fix that.$ echo \"hello, git\" &gt; hello.txt$ git add hello.txt$ git statusOn branch masterNo commits yetChanges to be committed:  (use \"git rm --cached &lt;file&gt;...\" to unstage)        new file:   hello.txt$ git commit -m 'Initial commit'[master (root-commit) 4515d17] Initial commit 1 file changed, 1 insertion(+) create mode 100644 hello.txtWith this, we’ve used git add to add a file to the staging area, and then git commit to commit that change with a simple commit message, “Initial commit”. If we didn’t specify the -m option, Git would open our text editor to allow us to type a commit message.Now that we have a non-empty version history, we can visualize the history. Visualizing the history as a DAG can be especially helpful in understanding the current status of the repo and connecting it with your understanding of the Git data model.The git log command visualizes history. By default, it shows a flattened version, which hides the graph structure. If you use a command like git log --all --graph --decorate, it will show you the full version history of the repository, visualized in graph form.$ git log --all --graph --decorate* commit 4515d17a167bdef0a91ee7d50d75b12c9c2652aa (HEAD -&gt; master)  Author: Subramanya N &lt;subramanyanagabhushan@gmail.com&gt;  Date: Tue Dec 21 22:18:36 2020 -0500      Initial commitThis doesn’t look all that graph-like, because it only contains a single node. 
Let’s make some more changes, author a new commit, and visualize the history once more.$ echo \"another line\" &gt;&gt; hello.txt$ git statusOn branch masterChanges not staged for commit:  (use \"git add &lt;file&gt;...\" to update what will be committed)  (use \"git checkout -- &lt;file&gt;...\" to discard changes in working directory)        modified:   hello.txtno changes added to commit (use \"git add\" and/or \"git commit -a\")$ git add hello.txt$ git statusOn branch masterChanges to be committed:  (use \"git reset HEAD &lt;file&gt;...\" to unstage)        modified:   hello.txt$ git commit -m 'Add a line'[master 35f60a8] Add a line 1 file changed, 1 insertion(+)Now, if we visualize the history again, we’ll see some of the graph structure:* commit 35f60a825be0106036dd2fbc7657598eb7b04c67 (HEAD -&gt; master)| Author: Subramanya N &lt;subramanyanagabhushan@gmail.com&gt;| Date:   Tue Dec 21 22:26:20 2020 -0500|     Add a line* commit 4515d17a167bdef0a91ee7d50d75b12c9c2652aa  Author: Subramanya N &lt;subramanyanagabhushan@gmail.com&gt;  Date: Tue Dec 21 22:18:36 2020 -0500      Initial commitAlso, note that it shows the current HEAD, along with the current branch (master).We can look at old versions using the git checkout command.$ git checkout 4515d17  # previous commit hash; yours will be differentNote: checking out '4515d17'.You are in 'detached HEAD' state. You can look around, make experimental changes and commit them, and you can discard any commits you make in this state without impacting any branches by performing another checkout.If you want to create a new branch to retain commits you create, you may do so (now or later) by using -b with the checkout command again. 
Example:  git checkout -b &lt;new-branch-name&gt;HEAD is now at 4515d17 Initial commit$ cat hello.txthello, git$ git checkout masterPrevious HEAD position was 4515d17 Initial commitSwitched to branch 'master'$ cat hello.txthello, gitanother lineGit can show you how files have evolved (differences, or diffs) using the git diff command:$ git diff 4515d17 hello.txtdiff --git c/hello.txt w/hello.txtindex 94bab17..f0013b2 100644--- c/hello.txt+++ w/hello.txt@@ -1 +1,2 @@ hello, git +another line  git help &lt;command&gt;: get help for a git command  git init: creates a new git repo, with data stored in the .git directory  git status: tells you what’s going on  git add &lt;filename&gt;: adds files to staging area  git commit: creates a new commit          Write good commit messages!      Even more reasons to write good commit messages!        git log: shows a flattened log of history  git log --all --graph --decorate: visualizes history as a DAG  git diff &lt;filename&gt;: show changes you made relative to the staging area  git diff &lt;revision&gt; &lt;filename&gt;: shows differences in a file between snapshots  git checkout &lt;revision&gt;: updates HEAD and current branchBranching and mergingBranching allows you to “fork” version history. It can be helpful for working on independent features or bug fixes in parallel. The git branch command can be used to create new branches; git checkout -b &lt;branch name&gt; creates a branch and checks it out.Merging is the opposite of branching: it allows you to combine forked version histories, e.g. merging a feature branch back into master. The git merge command is used for merging.  
git branch: shows branches  git branch &lt;name&gt;: creates a branch  git checkout -b &lt;name&gt;: creates a branch and switches to it          same as git branch &lt;name&gt;; git checkout &lt;name&gt;        git merge &lt;revision&gt;: merges into current branch  git mergetool: use a fancy tool to help resolve merge conflicts  git rebase: rebase a set of patches onto a new baseRemotes  git remote: list remotes  git remote add &lt;name&gt; &lt;url&gt;: add a remote  git push &lt;remote&gt; &lt;local branch&gt;:&lt;remote branch&gt;: send objects to remote, and update remote reference  git branch --set-upstream-to=&lt;remote&gt;/&lt;remote branch&gt;: set up correspondence between local and remote branch  git fetch: retrieve objects/references from a remote  git pull: same as git fetch; git merge  git clone: download repository from remoteUndo  git commit --amend: edit a commit’s contents/message  git reset HEAD &lt;file&gt;: unstage a file  git checkout -- &lt;file&gt;: discard changesAdvanced Git  git config: Git is highly customizable  git clone --depth=1: shallow clone, without entire version history  git add -p: interactive staging  git rebase -i: interactive rebasing  git blame: show who last edited which line  git stash: temporarily remove modifications to working directory  git bisect: binary search history (e.g. for regressions)  .gitignore: specify intentionally untracked files to ignoreMiscellaneous  GUIs: there are many GUI clients out there for Git. We personally don’t use them and use the command-line interface instead.  Shell integration: it’s super handy to have a Git status as part of your shell prompt (zsh, bash). Often included in frameworks like Oh My Zsh.  Editor integration: similarly to the above, handy integrations with many features. fugitive.vim is the standard one for Vim.  
Workflows: we taught you the data model, plus some basic commands; we didn’t tell you what practices to follow when working on big projects (and there are many different approaches).  GitHub: Git is not GitHub. GitHub has a specific way of contributing code to other projects, called pull requests.  Other Git providers: GitHub is not special: there are many Git repository hosts, like GitLab and Bitbucket.Resources  Pro Git is highly recommended reading. Going through Chapters 1–5 should teach you most of what you need to use Git proficiently, now that you understand the data model. The later chapters have some interesting, advanced material.  Oh Shit, Git!?! is a short guide on how to recover from some common Git mistakes.  Git for Computer Scientists is a short explanation of Git’s data model, with less pseudocode and more fancy diagrams than this post.  Git from the Bottom Up is a detailed explanation of Git’s implementation details beyond just the data model, for the curious.  How to explain git in simple words  Learn Git Branching is a browser-based game that teaches you Git.",
      "views": 151,
      "reading_minutes": 15,
      "tags": [
        
          
          {
            "name": "Git",
            "slug": "git",
            "url": "/tags/git/#posts"
          },
        
          
          {
            "name": "Version Control",
            "slug": "version-control",
            "url": "/tags/version-control/#posts"
          }
        
      ]
    },
    {
      "kind": "book",
      "title": "Navigating UMass Amherst: A Handbook for International Students",
      "url": "/books/navigating-umass-amherst-a-handbook-for-international-students/",
      "date_display": "May 8, 2023",
      "date_iso": "2023-05-08",
      "excerpt": "This handbook, penned by an international student at UMass Amherst, shares insights and advice based on personal experiences navigating academic and cultural transitions. The author has undertaken a variety of courses and collaborated with prominent entities, contributing to the vibrant academic community. Aimed at making the journey less daunting for future students, the handbook touches on academic expectations, cultural nuances, and logistical issues, while providing resource links for deeper exploration. It's a tool for sharing collective wisdom, rather than a definitive guide or shortcut to success.",
      "content": "",
      "views": null,
      "reading_minutes": null,
      "tags": [
        
          
          {
            "name": "Handbook",
            "slug": "handbook",
            "url": "/tags/handbook/#books"
          },
        
          
          {
            "name": "UMass Amherst",
            "slug": "umass-amherst",
            "url": "/tags/umass-amherst/#books"
          },
        
          
          {
            "name": "International Students",
            "slug": "international-students",
            "url": "/tags/international-students/#books"
          }
        
      ]
    }
  
]
