Subramanya N

The Three-Platform Problem in Enterprise AI

Sun, 07 Dec 2025 00:00:00 +0000

Enterprise AI has a platform problem. The tools to build AI-powered applications exist, but they’re scattered across three disconnected ecosystems—each solving part of the puzzle, none providing a complete solution.

This isn’t a “too many choices” problem. It’s an architectural one. Gartner tracks these ecosystems in separate Magic Quadrants because they serve fundamentally different users with different needs. But building production AI applications requires capabilities from all three.

Three Ecosystems, Zero Integration

1. Low-Code Platforms (The Citizen Developer)

Platforms like Microsoft Power Apps, Mendix, and OutSystems let business users build applications quickly without writing code. They excel at UI, rapid prototyping, and workflow automation.

Gartner Magic Quadrant for Enterprise Low-Code Application Platforms

What they do well: Speed to prototype, accessibility for non-developers, business process automation.

What they lack: Infrastructure control, enterprise governance at scale, and the flexibility professional developers need.

2. DevOps Platforms (The Professional Developer)

GitLab, Microsoft Azure DevOps, and Atlassian provide CI/CD pipelines, source control, and deployment infrastructure. They answer the “how do we ship and operate this reliably?” question.

Gartner Magic Quadrant for DevOps Platforms

What they do well: Security, governance, testing, deployment automation, operational excellence.

What they lack: They don’t help you build faster—they help you ship what you’ve already built.

3. AI/ML Platforms (The AI Specialist)

Cloud providers (AWS, GCP, Azure) and specialized vendors offer models, MLOps tooling, and inference infrastructure. They provide the intelligence layer.

Gartner Magic Quadrant for AI Code Assistants

What they do well: Model access, training infrastructure, inference at scale.

What they lack: An opinion on how you actually build and deploy applications around those models.

The Cost of Fragmentation

When your AI strategy requires stitching together leaders from three separate ecosystems, you pay an integration tax:

Workflow disconnects. A business user prototypes an AI workflow in a low-code tool. A developer rebuilds it from scratch to meet security requirements. The prototype and production system share nothing but a spec document.

Observability gaps. Tracing a user request through a low-code UI, into a DevOps pipeline, through an AI model call, and back is nearly impossible without custom instrumentation.

Governance drift. Security policies enforced in your DevOps platform don’t automatically apply to your low-code environment. Compliance becomes a manual audit.

Your most capable engineers end up writing glue code instead of building products.

A Different Architecture: API-First Unification

The solution isn’t better integrations—it’s platforms built on a different architecture.

Replit offers a useful case study. They’ve grown from $10M to $100M ARR in under six months by building a platform where:

The same infrastructure serves both citizen developers and professionals. A business user building through natural language (“create a customer feedback dashboard”) and a developer writing code are using the same underlying APIs, the same deployment system, the same security model.
AI is native, not bolted on. Their Agent can build, test, and deploy complete applications autonomously—but it’s using the same environment a professional developer would use. No “export to production” step.
Governance applies universally. Database access, API key management, and deployment policies are platform-level concerns. They apply whether you’re prompting an AI agent or writing TypeScript.

This is the “headless-first” pattern that companies like Stripe and Twilio proved out: build the API, make it excellent, then layer interfaces on top. The UI for non-developers and the API for developers are just different clients to the same system.

What This Means for Platform Strategy

If you’re evaluating AI platforms, the question isn’t “which low-code tool, which DevOps platform, and which AI vendor?”

The better question: Does this platform unify these concerns, or will we be writing integration code for the next three years?

Look for:

API-first architecture. Can professional developers access everything through APIs? Is the UI built on those same APIs?
Built-in deployment and operations. Does prototyping in the platform give you production-ready infrastructure, or does it give you an export button and a prayer?
Platform-level governance. Are security, compliance, and cost controls configured once and inherited everywhere, or are they per-tool?

The platforms winning in this space aren’t the ones with the longest feature lists. They’re the ones that recognized the three-ecosystem problem and architected around it from day one.

The Platform Convergence: Why the Future of AI SaaS is Headless-First

Tue, 02 Dec 2025 00:00:00 +0000

The AI agent market is experiencing its own big bang—but this rapid expansion is creating fundamental fragmentation. Enterprises deploying agents at scale are caught between two incomplete solutions: Agent Builders and AI Gateways.

Agent Builders democratize creation through no-code interfaces. AI Gateways provide enterprise governance over costs, security, and compliance. Both are critical, but in their current separate forms, they force a false choice: speed or control? The reality is, you need both.

We’ve seen this movie before. The most successful developer platforms—Stripe, Twilio, Shopify—aren’t just slick UIs or robust infrastructure. They are headless-first platforms that masterfully combine both.

The Headless-First Model

Stripe didn’t win payments by offering a payment form. Twilio didn’t win communications by providing a dashboard. They won by providing a powerful, programmable foundation with APIs as the primary interface. Their UIs are built on the same public APIs their customers use. Everything is composable, programmable, and extensible.

Principle	Benefit
API-First Design	Platform’s own UI uses public APIs, ensuring completeness
Progressive Complexity	Start with no-code UI, graduate to API without migration
Composability	Every capability is a building block for higher-level abstractions
Extensibility	Third parties build on the platform, creating ecosystem effects

This is the blueprint for AI platforms: not just a UI for building agents, nor just a gateway for traffic—but a comprehensive, programmable platform for building, running, and governing AI at every layer.

The Two Incomplete Categories

Agent Builders (Microsoft Copilot Studio, Google Agent Builder) empower non-technical users to create agents in minutes. The problem arises at scale: Who manages API keys? Who tracks costs? Who ensures compliance? This democratization often creates ungoverned “shadow IT”—business units spinning up agents independently, each with its own credentials and error handling. Platform teams discover the proliferation only when something breaks.

AI Gateways (Kong, Apigee) solve the governance problem with centralized security, cost monitoring, and compliance. But a gateway is just plumbing—it doesn’t accelerate creation. Business users wait in IT queues while engineers build what they need. Innovation slows to a crawl.

Integrating both categories creates its own integration tax: two authentication systems, two deployment processes, broken observability across disconnected logs, and policy enforcement gaps where builder retry logic conflicts with gateway rate limits.

The Platform Convergence

The solution is a unified, headless-first platform with four integrated layers:

Layer 1: UI Layer — Intuitive no-code agent builder for business users, built on top of the platform’s own APIs. Natural language definition, visual workflow design, one-click deployment with inherited governance.

Layer 2: Runtime Layer — Enterprise-grade gateway that every agent runs through automatically. Centralized auth (OAuth, OIDC, SAML), real-time policy enforcement, distributed tracing, cost tracking, anomaly detection.

Layer 3: Platform Layer — Comprehensive APIs and SDKs for developers. REST/GraphQL endpoints, language-specific SDKs, agent lifecycle management, webhook system for event-driven architectures.

Layer 4: Ecosystem Layer — Marketplace for discovering and sharing agents, tools, and integrations. Internal registry, reusable components, version control, usage analytics.

Speed AND Control

The difference between fragmented and unified approaches:

Capability	Fragmented Tools	Unified Platform
Agent Creation	Separate builder	Integrated no-code + API/SDK
Infrastructure	Separate gateway	Built-in gateway with inherited policies
Observability	Disconnected logs	End-to-end unified tracing
Policy Management	Manual coordination	Single policy engine
Developer Experience	High friction	Single, cohesive API surface
Audit & Compliance	Cross-system correlation	Native audit trails

With a unified platform: business user creates agent in UI → platform applies policies automatically → agent deploys with full observability → platform team monitors centrally → developer extends via API without migration.
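The flow above depends on agents inheriting organization-level policy and cost attribution at creation time, and on developers extending the same agent record later. A minimal Python sketch, assuming hypothetical `OrgPolicy`, `Agent`, and `Platform` types (not any product's API):

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OrgPolicy:
    """Hypothetical organization-wide defaults applied to every agent."""
    allowed_models: frozenset[str]
    pii_access: bool = False
    monthly_budget_usd: float = 500.0


@dataclass
class Agent:
    name: str
    department: str
    model: str
    policy: OrgPolicy
    tools: list[str] = field(default_factory=list)


class Platform:
    def __init__(self, policy: OrgPolicy) -> None:
        self.policy = policy
        self.agents: dict[str, Agent] = {}
        self.spend: dict[str, float] = {}  # cost per department

    def create_agent(self, name: str, department: str, model: str) -> Agent:
        # Called by the no-code UI and the SDK alike; policy is attached, not opted into.
        if model not in self.policy.allowed_models:
            raise PermissionError(f"model {model!r} not approved")
        agent = Agent(name, department, model, self.policy)
        self.agents[name] = agent
        return agent

    def add_tool(self, name: str, tool: str) -> Agent:
        # A developer extends the same record through the API: no export, no migration.
        agent = self.agents[name]
        agent.tools.append(tool)
        return agent

    def record_usage(self, name: str, cost_usd: float) -> None:
        # Cost is charged to the owning department and checked against the shared budget.
        dept = self.agents[name].department
        new_total = self.spend.get(dept, 0.0) + cost_usd
        if new_total > self.policy.monthly_budget_usd:
            raise RuntimeError(f"{dept} budget exceeded")
        self.spend[dept] = new_total
```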

What This Unlocks

Self-Service AI: HR builds a resume screening agent in 20 minutes. It inherits security policies automatically. Cost allocates to HR’s budget. Compliance trail generates without extra work.

AI-Powered Products: Engineers embed agent capabilities into customer-facing apps using platform APIs. Multi-tenant isolation, usage-based billing, and governance come built-in.

Internal Marketplace: Marketing’s “competitive intelligence” agent gets discovered by Sales. One-click deployment. Usage metrics show ROI across the organization.

Conclusion

The debate over agent builder vs. AI gateway is a red herring—a false choice leading to fragmented, expensive solutions. The real question: point solution or true platform?

In payments, Stripe won by unifying developer APIs with merchant tools. In communications, Twilio won by combining carrier control with developer speed. The AI platform market is at the same inflection point.

The future isn’t about stitching tools together; it’s about building on a unified, programmable foundation. The organizations that invest in platform-first infrastructure—rather than cobbling together point solutions—will move faster, govern more effectively, and build more sophisticated agentic systems.

The convergence is coming. The question is whether you’ll be ahead of it or behind it.

MCP Enterprise Readiness: How the 2025-11-25 Spec Closes the Production Gap

Mon, 01 Dec 2025 00:00:00 +0000

Less than a week ago, the Model Context Protocol celebrated its first anniversary with the release of the 2025-11-25 specification [1]. The announcement was rightly triumphant: MCP has evolved from an experimental open-source project to a foundational standard backed by GitHub, OpenAI, Microsoft, and Block, with thousands of active servers in production [1].

But beneath the celebration lies a more interesting story: this spec release is not just an evolution; it’s a strategic pivot toward enterprise readiness. For the past year, MCP has succeeded as a developer tool—a convenient way to connect AI models to data and capabilities during experimentation. The 2025-11-25 spec is different. It introduces features explicitly designed to solve the operational, security, and governance challenges that prevent organizations from deploying agent-tool ecosystems at enterprise scale.

This article examines three key features from the new spec and analyzes how they close what I call the “production gap”—the distance between experimental agent prototypes and enterprise-grade agentic infrastructure.

The Production Gap: Why Experimental Agents Don’t Scale

Before diving into the technical features, we need to understand the problem they’re solving. Organizations have been experimenting with MCP-powered agents for months, often with impressive results in controlled environments. Yet most of these projects remain trapped in pilot purgatory, unable to progress to production deployments. The barriers are not technical whimsy; they are fundamental operational requirements:

Requirement	Why It Matters	What’s Been Missing
Asynchronous Operations	Real-world tasks like report generation, data analysis, and workflow automation can take minutes or hours, not milliseconds.	MCP connections are synchronous. Long-running tasks force clients to hold connections open or build custom polling systems.
Enterprise Authentication	Organizations need centralized control over which users, agents, and services can access sensitive tools and data.	The original OAuth flow assumed a consumer app model. It lacked support for machine-to-machine auth and didn’t integrate with enterprise Identity Providers.
Extensibility	Different industries and use cases require custom capabilities without fragmenting the core protocol.	There was no formal mechanism to standardize extensions, leading to proprietary, incompatible implementations.

These aren’t edge cases; they are the table stakes for production systems. The 2025-11-25 spec directly addresses each one.

Feature 1: Asynchronous Tasks — Making Long-Running Workflows Production-Ready

Perhaps the most transformative addition is the new Tasks primitive [2]. While still marked as experimental, it fundamentally changes how agents interact with MCP servers for long-running operations.

The Problem: Synchronous Request-Response Doesn’t Match Real Work

Traditional MCP follows the classic RPC pattern: the client sends a request, the server processes it, and the server returns a response—all within a single connection. This works beautifully for quick operations like reading a database row or checking a weather API. But it breaks down for realistic enterprise workflows:

Data Analytics Agent: “Generate a quarterly financial report by analyzing three years of transaction data” → 15 minutes of processing.
Compliance Agent: “Scan all customer contracts for non-standard clauses” → 2 hours across 10,000 documents.
DevOps Agent: “Deploy this service to production and run integration tests” → 30 minutes with orchestration dependencies.

Organizations have been forced to build custom workarounds: job queues, polling systems, callback webhooks—all non-standard, all increasing complexity and reducing interoperability.

The Solution: A Unified Async Model

The new Tasks feature introduces a standard “call-now, fetch-later” pattern:

The client sends a request to an MCP server with a task parameter, asking the server to run it asynchronously.
The server immediately acknowledges the request and returns a unique taskId.
The client periodically checks the task status (working, completed, failed) using standard Task operations.
When complete, the client retrieves the final result using the taskId.

This is more than syntactic sugar. It provides a uniform abstraction for asynchronous work across the entire MCP ecosystem. An agent framework doesn’t need to know whether it’s calling a data pipeline, a deployment system, or a document processor—the async pattern is the same.

Enterprise Impact: Agents That Don’t Block

In production environments, this means an agent no longer has to wait on each long-running call before doing anything else. An AI assistant orchestrating a complex workflow can:

Kick off multiple long-running tasks in parallel (e.g., “analyze sales data,” “generate customer insights,” “create visualizations”).
Continue planning and reasoning while tasks are in progress.
Provide real-time status updates to users without blocking.
Handle failures gracefully with retries and fallback strategies.

This is how real autonomous agents operate. The Tasks primitive makes it possible within a standard, interoperable protocol.
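The orchestration behavior in that list can be sketched with Python's `asyncio`: start several long-running jobs at once, report progress while they run, and retry or fall back on failure. The job names come from the example above; `run_job` is a hypothetical stand-in for a task-based MCP call, and the injected failure is simulated.

```python
import asyncio


async def run_job(name: str, seconds: float, fail: bool = False) -> str:
    """Hypothetical stand-in for a long-running, task-based tool call."""
    await asyncio.sleep(seconds)
    if fail:
        raise RuntimeError(f"{name} failed")
    return f"{name}: done"


async def with_retry(name: str, seconds: float, attempts: dict[str, int], retries: int = 1) -> str:
    """Retry a failed job, then fall back instead of failing the whole workflow."""
    for attempt in range(retries + 1):
        attempts[name] = attempts.get(name, 0) + 1
        # Simulate a transient failure on the first attempt of one job.
        flaky = name == "create visualizations" and attempt == 0
        try:
            return await run_job(name, seconds, fail=flaky)
        except RuntimeError:
            continue
    return f"{name}: fallback used"


async def orchestrate(attempts: dict[str, int]) -> dict[str, str]:
    jobs = {"analyze sales data": 0.2, "generate customer insights": 0.1, "create visualizations": 0.15}
    # Kick off every job at once; none of them waits for the others.
    pending = [asyncio.create_task(with_retry(n, s, attempts)) for n, s in jobs.items()]
    # While the jobs run, the agent stays free to plan or stream status to the user.
    while not all(t.done() for t in pending):
        finished = sum(t.done() for t in pending)
        print(f"{finished}/{len(pending)} tasks finished")
        await asyncio.sleep(0.1)
    return dict(zip(jobs, [t.result() for t in pending]))
```

Running `asyncio.run(orchestrate({}))` completes all three jobs, with "create visualizations" succeeding on its second attempt.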

Feature 2: Enterprise-Grade OAuth with CIMD and Extensions

The original MCP spec included OAuth 2.0 support, but it was modeled on consumer app patterns (think “Log in with GitHub”). That model doesn’t work for enterprise use cases, where organizations need centralized identity management, audit trails, and policy-based access control. The 2025-11-25 spec introduces two critical updates to close this gap.

CIMD: Decentralized Trust Without Dynamic Client Registration

The first change is replacing Dynamic Client Registration (DCR) with Client ID Metadata Documents (CIMD) [3]. In the old model, every MCP client had to register with every authorization server it wanted to use—a scalability nightmare in federated enterprise environments.

With CIMD, the client_id is now a URL that the client controls (e.g., https://agents.mycompany.com/sales-assistant). When an authorization server needs information about this client, it fetches a JSON metadata document from that URL. This document includes:

Client name and description
Valid redirect URIs
Supported grant types
Public keys used to verify the client's signed JWT assertions (if it authenticates with private_key_jwt)

This approach creates a decentralized trust model anchored in DNS and HTTPS. The authorization server doesn’t need a pre-existing relationship with the client; it trusts the metadata published at the URL. For large organizations with dozens of agent applications and multiple MCP providers, this dramatically reduces operational overhead.
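On the authorization-server side, the core CIMD check is small: the `client_id` must be an HTTPS URL, the document fetched from it must name that same URL as its `client_id`, and any requested redirect URI must appear in the document. A minimal Python sketch, assuming the document has already been fetched (a real server also needs fetching, caching, and SSRF protections, omitted here). Field names follow standard OAuth client-metadata conventions; the URL is the example from the text.

```python
from urllib.parse import urlparse

# Example metadata document, as it might be served at the client_id URL.
METADATA = {
    "client_id": "https://agents.mycompany.com/sales-assistant",
    "client_name": "Sales Assistant",
    "redirect_uris": ["https://agents.mycompany.com/sales-assistant/callback"],
    "grant_types": ["authorization_code"],
    "token_endpoint_auth_method": "none",
}


def validate_client(client_id: str, metadata: dict, redirect_uri: str) -> None:
    """Raise ValueError unless the metadata document vouches for this request."""
    url = urlparse(client_id)
    if url.scheme != "https" or not url.netloc:
        raise ValueError("client_id must be an https URL")
    # The document must describe the URL it was fetched from, not some other client.
    if metadata.get("client_id") != client_id:
        raise ValueError("client_id in metadata does not match the URL")
    # Only redirect URIs the client itself publishes are accepted.
    if redirect_uri not in metadata.get("redirect_uris", []):
        raise ValueError("redirect_uri not listed in client metadata")
```

Trust comes from control of the HTTPS origin: only whoever controls `agents.mycompany.com` can publish a document that passes the `client_id` match.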

Extension 1: Machine-to-Machine OAuth (SEP-1046)

The second critical addition is support for the OAuth 2.0 client_credentials flow via the M2M OAuth extension. This enables machine-to-machine authentication—allowing agents and services to authenticate directly with MCP servers without a human user in the loop.

Why does this matter? Consider these enterprise scenarios:

Scheduled Agent Jobs: A nightly data ingestion agent that pulls information from multiple MCP sources to update a data warehouse.
Service-to-Service Communication: A monitoring agent that periodically checks the health of deployed systems by querying infrastructure management tools.
Headless Automation: An agent that processes incoming support tickets and takes automated actions based on predefined rules.

None of these involve an interactive user. They are autonomous services that need persistent, secure credentials to access tools on behalf of the organization. The client_credentials flow is the standard OAuth mechanism for exactly this use case, and its inclusion in MCP makes headless agentic systems viable.
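For reference, the `client_credentials` grant is plain OAuth 2.0 (RFC 6749, section 4.4): the agent POSTs `grant_type=client_credentials` to the token endpoint and authenticates as itself. A minimal Python sketch that builds, but does not send, such a request using only the standard library. The token endpoint URL, client ID, secret, and scope are hypothetical placeholders, and a deployment may use a signed JWT assertion instead of a shared secret.

```python
import base64
import urllib.parse
import urllib.request

TOKEN_ENDPOINT = "https://auth.example.com/oauth2/token"  # hypothetical


def build_token_request(client_id: str, client_secret: str, scope: str) -> urllib.request.Request:
    """Build a client_credentials token request with HTTP Basic client authentication."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "scope": scope,
    }).encode()
    # RFC 6749 sec. 2.3.1: form-urlencode the id and secret, join with ':', then Base64.
    raw = f"{urllib.parse.quote_plus(client_id)}:{urllib.parse.quote_plus(client_secret)}"
    basic = base64.b64encode(raw.encode()).decode()
    return urllib.request.Request(
        TOKEN_ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it; a successful response is a JSON body containing an `access_token`, which the agent then presents to the MCP server.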

Extension 2: Cross App Access (XAA) (SEP-990)

Perhaps the most strategically significant feature for large enterprises is the Cross App Access (XAA) extension. This solves a governance problem that has plagued the consumerization of enterprise AI: uncontrolled tool sprawl.

In the standard OAuth flow, a user grants consent directly to an AI application to access a tool. The enterprise Identity Provider (IdP) sees only that “Alice logged in to the AI app,” not that “Alice’s AI agent is now accessing the payroll system.” This creates a governance black hole.

XAA changes the authorization flow to insert the enterprise IdP as a central policy enforcement point. Now, when an agent attempts to access an MCP server:

The agent requests authorization from the enterprise IdP.
The IdP evaluates organizational policies: Is this agent approved for production use? Does Alice have permission to delegate payroll access to this agent? Is this access compliant with our data governance policies?
Only if all policies are satisfied does the IdP issue tokens to the agent.

This provides centralized visibility and control over the entire agent-tool ecosystem. Security teams can monitor which agents are accessing which tools, set organization-wide policies (e.g., “no agents can access PII without human review”), and audit all delegated access. It brings shadow AI under central policy control and provides the compliance story that regulated industries demand.
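The IdP's policy decision is where this control is exercised. The Python sketch below models only that decision; it does not implement the XAA token exchange itself. The `AccessRequest` fields, the approved-agent list, the delegation grants, and the PII rule are hypothetical examples that mirror the questions in step 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    user: str
    agent: str
    resource: str  # the MCP server being accessed
    human_review: bool = False


# Hypothetical organizational policy data held by the IdP.
APPROVED_AGENTS = {"hr-assistant", "sales-assistant"}
DELEGATION_GRANTS = {("alice", "hr-assistant"): {"payroll", "directory"}}
PII_RESOURCES = {"payroll"}

AUDIT_LOG: list[tuple[AccessRequest, bool, str]] = []


def evaluate(req: AccessRequest) -> tuple[bool, str]:
    """Return (allowed, reason); a token is issued only when allowed is True."""
    if req.agent not in APPROVED_AGENTS:
        decision = (False, "agent not approved for production")
    elif req.resource not in DELEGATION_GRANTS.get((req.user, req.agent), set()):
        decision = (False, "user has not delegated this resource to the agent")
    elif req.resource in PII_RESOURCES and not req.human_review:
        decision = (False, "PII access requires human review")
    else:
        decision = (True, "all policies satisfied")
    # Every decision, allowed or denied, is recorded for audit.
    AUDIT_LOG.append((req, *decision))
    return decision
```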

Enterprise Impact: From Shadow AI to Governed Infrastructure

Together, these OAuth enhancements transform MCP from a developer convenience into a governed, auditable integration layer. Organizations can:

Enforce Identity Standards: All agents authenticate using the corporate IdP, with the same rigor as human employees.
Enable Zero-Trust Architecture: Every tool access is explicitly authorized based on policy, not implicit trust.
Provide Audit Trails: Every delegation, token issuance, and access event is logged for compliance and forensic analysis.
Scale Securely: Decentralized trust via CIMD means new agents and tools can be onboarded without central bottlenecks, while XAA ensures control is never lost.

Feature 3: Formal Extensions Framework — Enabling Innovation Without Fragmentation

The third major addition is the introduction of a formal Extensions framework [3]. This is a governance mechanism for the protocol itself, allowing the community to develop new capabilities without fragmenting the ecosystem.

The Innovation-Standardization Tension

Every successful protocol faces this dilemma: enable innovation fast enough to keep up with evolving use cases, but standardize carefully enough to maintain interoperability. Move too slowly, and the community builds proprietary extensions that fragment the ecosystem. Move too quickly, and the core protocol becomes bloated with niche features that most implementations don’t need.

MCP’s solution is a structured extension process. New capabilities are proposed as Specification Enhancement Proposals (SEPs), which undergo community review and can be adopted incrementally. Extensions are namespaced and clearly marked, so implementations can selectively support them without breaking compatibility.
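Selective support amounts to capability negotiation: each side advertises the extensions it understands, and only the intersection is used. A minimal Python sketch, assuming hypothetical namespaced identifiers and a simple `name -> version` advertisement; the actual wire format is defined by the spec.

```python
def negotiate(client_exts: dict[str, str], server_exts: dict[str, str]) -> dict[str, str]:
    """Return the extensions both sides support at the same version.

    Anything outside the result is simply not used; core protocol behavior is unaffected.
    """
    return {
        name: version
        for name, version in client_exts.items()
        if server_exts.get(name) == version
    }


client = {
    "com.example.healthcare/audit-log": "1.0",  # hypothetical identifier
    "com.example.finance/tx-integrity": "2.1",  # hypothetical identifier
}
server = {
    "com.example.healthcare/audit-log": "1.0",
    "com.example.manufacturing/sensor-stream": "0.3",  # hypothetical identifier
}
```

Exact version matching keeps the sketch short; a real implementation would more likely accept compatible version ranges.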

Enterprise Impact: Customization Without Vendor Lock-In

For enterprises, this is critical. Different industries have unique requirements:

Healthcare: Extensions for HIPAA-compliant audit logging and patient consent management.
Financial Services: Extensions for transaction integrity, regulatory reporting, and fraud detection hooks.
Manufacturing: Extensions for real-time sensor data streaming and factory floor integrations.

The formal extensions framework allows organizations to develop these capabilities as standard, interoperable extensions rather than proprietary forks. This preserves the core value proposition of MCP—a universal protocol for agent-tool communication—while enabling the customization required for production use.

The Multiplier Effect: Sampling with Tools (SEP-1577)

One more feature deserves mention: Sampling with Tools [3]. Sampling already let an MCP server ask the client to invoke an LLM on its behalf; the new spec lets that request include tool definitions, so the server can run its own multi-step reasoning and tool-use loop. In effect, MCP servers can now act as agentic systems themselves.

Why is this powerful? It enables compositional agent architectures. A high-level agent can delegate to specialized MCP servers, which themselves use agentic reasoning to fulfill complex requests. For example:

A “Financial Analysis Agent” delegates to an “ERP Data Server,” which uses its own reasoning to determine which tables to query, how to join data, and how to format results.
A “Compliance Agent” delegates to a “Legal Document Server,” which autonomously searches case law, extracts relevant clauses, and generates a summary.

This nested, hierarchical approach is how real autonomous systems will scale. By making it a standard protocol feature rather than a custom implementation, MCP provides the foundation for a rich ecosystem of specialized, composable agents.
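Inside such a server, the loop looks roughly like this: ask the client for an LLM completion while offering the server's own tools; if the model requests a tool, run it and feed the result back; stop when the model returns a final answer. A minimal Python sketch in which `request_sampling` is a hypothetical stand-in for the client round-trip that returns canned replies, and the message shapes are simplified rather than the spec's schema.

```python
from typing import Callable

# Tools the hypothetical "ERP Data Server" can run locally.
TOOLS: dict[str, Callable[[str], str]] = {
    "list_tables": lambda _: "orders, customers",
    "run_query": lambda sql: "revenue=1.2M" if "SUM" in sql else "no rows",
}


def request_sampling(messages: list[dict]) -> dict:
    """Stand-in for a sampling request sent to the client along with tool definitions.

    Canned replies simulate a model that calls two tools, then answers.
    """
    tool_results = sum(1 for m in messages if m["role"] == "tool")
    if tool_results == 0:
        return {"tool": "list_tables", "input": ""}
    if tool_results == 1:
        return {"tool": "run_query", "input": "SELECT SUM(total) FROM orders"}
    return {"answer": f"Quarterly revenue: {messages[-1]['content'].split('=')[1]}"}


def server_side_agent(question: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):  # bounded so a confused model cannot loop forever
        reply = request_sampling(messages)
        if "answer" in reply:
            return reply["answer"]
        output = TOOLS[reply["tool"]](reply["input"])
        messages.append({"role": "tool", "content": output})
    raise RuntimeError("step limit reached")
```

The calling agent sees a single request and a single answer; the two intermediate tool calls stay inside the server.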

Closing the Production Gap: A New Maturity Threshold

The 2025-11-25 MCP specification is not a radical redesign; it’s a targeted set of enhancements that directly address the barriers preventing enterprise adoption. By introducing:

Asynchronous Tasks for long-running workflows,
Enterprise OAuth with CIMD, M2M, and XAA for governed, auditable authentication,
Formal Extensions for standardized innovation,
Sampling with Tools for compositional agent architectures,

the spec closes the production gap—the distance between experimental prototypes and scalable, secure, enterprise-grade systems.

This is the moment when MCP transitions from a promising developer tool to a foundational piece of enterprise infrastructure. Organizations that have been waiting for “production readiness” signals now have them. The features are there. The governance mechanisms are there. The security model is there.

The next phase of agentic AI will be defined not by flashy demos, but by the quiet, reliable, at-scale operation of autonomous systems integrated deeply into enterprise workflows. The 2025-11-25 MCP spec is the technical foundation that makes this future possible.

For technology leaders evaluating whether to invest in MCP-based infrastructure, the calculus has changed. This is no longer an experimental protocol; it’s a production standard. The organizations that adopt it now, build their agent ecosystems on it, and contribute to its continued evolution will define the next decade of enterprise AI.

References:

[1] MCP Core Maintainers. (2025, November 25). One Year of MCP: November 2025 Spec Release. Model Context Protocol.

[2] Model Context Protocol. (2025, November 25). Tasks. Model Context Protocol Specification.

[3] Pakiti, M. (2025, November 26). MCP 2025-11-25 is here: async Tasks, better OAuth, extensions, and a smoother agentic future. WorkOS Blog.

[4] Subramanya, N. (2025, November 20). The Governance Stack: Operationalizing AI Agent Governance at Enterprise Scale. subramanya.ai.

[5] Subramanya, N. (2025, November 17). Why Private Registries are the Future of Enterprise Agentic Infrastructure. subramanya.ai.

The Governance Stack: Operationalizing AI Agent Governance at Enterprise Scale

Thu, 20 Nov 2025 00:00:00 +0000

Enterprise adoption of AI agents has reached a tipping point. According to McKinsey’s 2025 global survey, 88% of organizations now report regular use of AI in at least one business function, and 62% are at least experimenting with AI agents [1]. Yet this rapid adoption has created a critical disconnect: organizations understand the importance of governance, but they struggle to implement it. The same survey reveals that 40% of technology executives believe their current governance programs are insufficient for the scale and complexity of their agentic workforce [1, 2].

The problem is not a lack of frameworks. Numerous organizations have published comprehensive governance principles—from Databricks’ AI Governance Framework to the EU AI Act’s regulatory requirements [2]. The problem is that governance has remained largely conceptual, living in policy documents and compliance checklists rather than in the operational infrastructure where agents actually execute.

This article presents the technical foundation required to operationalize governance at scale: the Governance Stack. This is the integrated set of platforms, protocols, and enforcement mechanisms that transform governance from aspiration into automated reality across the entire agentic workforce lifecycle.

The Governance Gap: From Principle to Practice

Traditional enterprise governance models were designed for static systems and predictable workflows. An application goes through a review process, gets deployed, and then operates within well-defined boundaries. Governance checkpoints are discrete events: code reviews, security scans, compliance audits.

Agentic AI shatters this model. Agents are dynamic, adaptive systems that make autonomous decisions, spawn sub-agents, and interact with constantly evolving toolsets. They don’t follow predetermined paths; they reason, plan, and execute based on context. As one industry analysis puts it, the governance question shifts from “did the code do what we programmed?” to “did the agent make the right decision given the circumstances?” [3].

This creates four fundamental challenges that traditional governance infrastructure cannot address:

Challenge	Traditional Governance	Agentic Reality
Decision-Making	Predetermined logic paths, testable and auditable	Context-dependent reasoning, emergent behavior
Delegation	Single service boundary, clear ownership	Recursive agent chains, distributed responsibility
Policy Enforcement	Deployment-time checks, periodic audits	Real-time enforcement at the moment of action
Auditability	Static code and logs	Dynamic decision traces across multiple agents and tools

The governance gap is the distance between what existing frameworks prescribe and what existing infrastructure can enforce. Closing this gap requires purpose-built technology.

The Five Layers of the Governance Stack

Drawing on the foundational pillars outlined in frameworks like Databricks’ AI Governance model [2], we can define a technical architecture—a Governance Stack—that provides the infrastructure necessary to operationalize these principles. This stack has five integrated layers, each addressing a specific aspect of agent lifecycle management.

Layer 1: Identity and Attestation Foundation

Before governance can be enforced, we must know who (or what) is making a request. This requires a robust identity layer specifically designed for autonomous agents, not just human users.

As discussed in previous work on OIDC-A (OpenID Connect for Agents), this layer provides [4]:

Verifiable Agent Identities: Every agent receives a cryptographically verifiable identity, issued by a trusted authority (the AI provider or enterprise identity system).
Delegation Chains: Clear, auditable records of which user or system authorized the agent, and what permissions were delegated.
Attestation Mechanisms: Proof that the agent is running the expected code, on approved infrastructure, with the intended configuration.

This identity foundation is the prerequisite for all subsequent layers. Without it, governance policies have no subject to act upon.
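A delegation chain is only auditable if each hop can be checked mechanically. One common rule is attenuation: an agent may hold only a subset of the permissions its delegator held. A minimal Python sketch of that check, assuming a hypothetical `Delegation` record. Signature and attestation verification, which an OIDC-A style deployment would also require, are deliberately omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delegation:
    delegator: str  # user or agent granting authority
    delegate: str  # agent receiving it
    scopes: frozenset[str]


def verify_chain(root_scopes: frozenset[str], chain: list[Delegation]) -> frozenset[str]:
    """Return the effective scopes at the end of the chain, or raise ValueError.

    Each hop must start where the previous one ended and may only narrow scopes.
    """
    current_holder = chain[0].delegator if chain else None
    allowed = root_scopes
    for hop in chain:
        if hop.delegator != current_holder:
            raise ValueError(f"broken chain: {hop.delegator} != {current_holder}")
        if not hop.scopes <= allowed:
            extra = sorted(hop.scopes - allowed)
            raise ValueError(f"{hop.delegate} granted scopes beyond its delegator: {extra}")
        allowed, current_holder = hop.scopes, hop.delegate
    return allowed
```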

Layer 2: Agent and Tool Registries

Governance requires visibility. The second layer of the stack is a comprehensive registry system that provides a single source of truth for:

Agent Registry: A catalog of every agent deployed in the enterprise, including its capabilities, business owner, data access, and lifecycle status [5]. This is not just a static directory; it’s a dynamic system that tracks agent versions, configurations, and runtime behavior.
MCP/Tool Registry: A curated, approved set of tools and MCP servers that agents are authorized to access. This registry enforces pre-deployment security reviews, manages versions, tracks usage, and provides cost visibility [5].

As explored in our previous article on private registries, this layer transforms governance from a manual audit process into an automated, enforceable function of the infrastructure itself [5]. Agents that aren’t registered can’t deploy. Tools that haven’t been vetted can’t be accessed.
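Once registries exist, that enforcement rule is simple to express. A minimal Python sketch, assuming hypothetical in-memory registries and record fields; in practice these would be services that both the deployment pipeline and the MCP gateway query.

```python
from dataclasses import dataclass


@dataclass
class AgentRecord:
    name: str
    owner: str
    version: str
    status: str  # e.g. "draft", "production", "retired"


@dataclass
class ToolRecord:
    server: str
    version: str
    security_reviewed: bool


AGENT_REGISTRY: dict[str, AgentRecord] = {
    "invoice-bot": AgentRecord("invoice-bot", "finance-ops", "1.4.0", "production"),
}
TOOL_REGISTRY: dict[str, ToolRecord] = {
    "erp-mcp": ToolRecord("erp-mcp", "2.0.1", security_reviewed=True),
    "scraper-mcp": ToolRecord("scraper-mcp", "0.1.0", security_reviewed=False),
}


def can_deploy(agent: str) -> bool:
    # Unregistered (or retired) agents cannot deploy.
    record = AGENT_REGISTRY.get(agent)
    return record is not None and record.status != "retired"


def can_access(agent: str, tool: str) -> bool:
    # Unvetted tools cannot be accessed, even by registered agents.
    tool_record = TOOL_REGISTRY.get(tool)
    return can_deploy(agent) and tool_record is not None and tool_record.security_reviewed
```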

Layer 3: Policy Engine and Gateway

The third layer is where governance rules are codified and enforced in real-time. This includes:

Agent Firewalls and MCP Gateways: Acting as intermediaries between agents and their tools, these gateways inspect every request, enforce security policies, and block unauthorized actions before they occur [6]. They provide:

Prompt injection detection and filtering
Real-time policy evaluation (e.g., “can this agent access PII?”)
Dynamic rate limiting and cost controls
Anomaly detection for suspicious behavior patterns

Automated Policy Enforcement: Instead of relying on manual reviews, the policy engine automatically validates agents against organizational standards at every lifecycle stage. For example, an agent cannot be promoted to production without:

A completed data classification assessment
Approval from the designated business owner
A passed security scan
Documented human oversight procedures for high-stakes decisions

This layer is the operational heart of the governance stack. It is where abstract policies become concrete actions that prevent harm in real-time.
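At runtime, a gateway's per-request decision combines several of these checks. A minimal Python sketch, assuming a hypothetical per-agent policy table: it shows a PII rule and a token-bucket rate limit, and leaves out prompt-injection and anomaly detection because those need real classifiers.

```python
import time
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Allows `rate` requests per second on average, with bursts up to `capacity`."""
    rate: float
    capacity: float
    tokens: float = field(init=False)
    updated: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity
        self.updated = time.monotonic()

    def allow(self) -> bool:
        # Refill in proportion to elapsed time, capped at capacity, then spend one token.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical policy table: which agents may call PII-classified tools.
PII_TOOLS = {"crm.get_customer"}
PII_APPROVED = {"support-agent"}
BUCKETS: dict[str, TokenBucket] = {}


def gateway_check(agent: str, tool: str) -> tuple[bool, str]:
    """Evaluated before the request is forwarded to the tool."""
    if tool in PII_TOOLS and agent not in PII_APPROVED:
        return False, "policy: agent may not access PII"
    bucket = BUCKETS.setdefault(agent, TokenBucket(rate=1.0, capacity=3))
    if not bucket.allow():
        return False, "rate limit exceeded"
    return True, "allowed"
```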

Layer 4: Observability and Monitoring Platform

Governance is not a one-time gate; it requires continuous oversight. The fourth layer provides real-time visibility into the behavior of the entire agentic workforce:

Performance Dashboards: Track accuracy, decision quality, latency, and resource consumption across all agents.
Drift Detection: Monitor agents for behavioral changes that might indicate model degradation, prompt injection, or unauthorized modifications.
Audit Trails: Capture every agent action, tool invocation, and delegation event with sufficient context to enable forensic analysis and compliance reporting [3].
Anomaly Alerting: Trigger automated responses when agents deviate from expected patterns, such as accessing unusual data sources or making an abnormal volume of API calls.
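Anomaly alerting on call volume can start as simply as a z-score computed against an agent’s own history. The sketch below shows one hypothetical rule. It assumes hourly call counts and a threshold of three standard deviations. Production drift detection would typically combine several signals, such as data sources touched, tool mix, and output quality, rather than rely on a single count.

```python
from statistics import mean, stdev


def is_anomalous(history: list[int], current: int, threshold: float = 3.0) -> bool:
    """Flag `current` if it sits more than `threshold` sample standard deviations
    above the mean of `history` (e.g. an agent's past hourly API call counts)."""
    if len(history) < 2:
        return False                  # not enough data to estimate spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current > mu           # flat history: any increase is unusual
    return (current - mu) / sigma > threshold


hourly_calls = [120, 130, 125, 118, 127, 122]
print(is_anomalous(hourly_calls, 128))   # False
print(is_anomalous(hourly_calls, 900))   # True
```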

This layer transforms governance from reactive (responding to incidents after they occur) to proactive (detecting and preventing issues before they cause harm).

Layer 5: Human-in-the-Loop Orchestration

The final layer recognizes that not all decisions can or should be fully automated. For high-stakes scenarios, governance requires explicit human oversight:

Escalation Workflows: Agents can request human approval before executing sensitive actions, such as modifying production systems or processing large financial transactions.
Override Mechanisms: Authorized personnel can intervene to pause, redirect, or terminate agent operations when necessary.
Explainability Interfaces: When agents make consequential decisions, stakeholders need to understand the reasoning. This layer provides tools to inspect the decision chain, view the data that influenced the agent, and audit the tool usage.
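Escalation and override can be sketched as a small orchestrator. It routes sensitive actions to a human approval queue and lets an operator pause an agent outright. Everything here is hypothetical and chosen for illustration: the action names, the 10,000 threshold (in whatever currency applies), and the `Orchestrator` interface. A real system would persist the queue and authenticate the approver.

```python
from dataclasses import dataclass, field

SENSITIVE_ACTIONS = {"modify_production", "transfer_funds"}   # hypothetical
AMOUNT_LIMIT = 10_000                                          # hypothetical threshold


@dataclass
class Orchestrator:
    """Hypothetical human-in-the-loop layer: escalate, approve, reject, or pause."""
    pending: dict[int, dict] = field(default_factory=dict)
    paused: set[str] = field(default_factory=set)
    next_id: int = 0

    def submit(self, agent_id: str, action: str, amount: float = 0.0) -> str:
        if agent_id in self.paused:
            return "blocked: agent paused by operator"
        if action in SENSITIVE_ACTIONS or amount > AMOUNT_LIMIT:
            # Escalation workflow: park the action until a human decides.
            request_id = self.next_id
            self.pending[request_id] = {"agent": agent_id, "action": action, "amount": amount}
            self.next_id += 1
            return f"escalated: awaiting approval #{request_id}"
        return "executed"

    def approve(self, request_id: int) -> str:
        request = self.pending.pop(request_id)
        return f"executed after approval: {request['action']}"

    def reject(self, request_id: int) -> str:
        request = self.pending.pop(request_id)
        return f"rejected: {request['action']}"

    def pause(self, agent_id: str) -> None:
        # Override mechanism: an authorized operator halts the agent.
        self.paused.add(agent_id)


hitl = Orchestrator()
print(hitl.submit("ap-agent", "pay_invoice", amount=250))      # executed
print(hitl.submit("ap-agent", "pay_invoice", amount=50_000))   # escalated: awaiting approval #0
print(hitl.approve(0))                                         # executed after approval: pay_invoice
hitl.pause("ap-agent")
print(hitl.submit("ap-agent", "pay_invoice", amount=10))       # blocked: agent paused by operator
```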

This is not about replacing human judgment; it’s about augmenting it with the right information at the right time.

Operationalizing the Framework: Governance Across the Agent Lifecycle

The power of the Governance Stack becomes clear when we map it to the complete agent lifecycle. Governance is not a single checkpoint; it is a continuous process embedded at every stage.

Lifecycle Stage	Governance Stack in Action
Planning & Design	Identity layer establishes agent ownership. Policy engine validates business case against organizational risk appetite.
Data Preparation	Registries enforce data classification and lineage tracking. Policy engine blocks access to non-compliant datasets.
Development & Training	Observability platform tracks experiments and model performance. Registries version all agent configurations.
Testing & Validation	Agent firewall tests for adversarial inputs and prompt injections. Policy engine validates against security and ethical standards.
Deployment	Gateway enforces real-time authorization for all tool access. Observability platform begins continuous monitoring.
Operations	Monitoring platform detects drift and anomalies. Human-in-the-loop mechanisms escalate high-stakes decisions.
Retirement	Registries archive agent configurations. Identity layer revokes all permissions. Audit trails are retained for compliance.

This lifecycle-aware approach ensures that governance is not an afterthought, but an integrated function of how agents are built, deployed, and managed.

The ROI of Governance Infrastructure

Implementing a comprehensive Governance Stack is a significant investment. Organizations rightfully ask: what is the return?

The answer lies in four measurable outcomes:

Risk Mitigation: As demonstrated by the recent AI-orchestrated cyber espionage campaign disrupted by Anthropic [6], uncontrolled agent access to powerful tools is not a theoretical threat. A governance stack with identity attestation, gateways, and real-time policy enforcement could have stopped that attack at multiple layers.

Regulatory Compliance: With regulations like the EU AI Act imposing strict requirements on high-risk AI systems, the ability to demonstrate comprehensive lifecycle governance, auditability, and human oversight is not optional—it’s mandatory [2]. The Governance Stack provides the automated evidence generation required for compliance.

Operational Efficiency: Without centralized registries and monitoring, organizations waste time debugging agent failures, tracking down tool dependencies, and investigating cost overruns. The stack provides the visibility and control to operate an agentic workforce at scale.

Trust and Adoption: The ultimate ROI is internal and external trust. Employees, customers, and regulators need confidence that autonomous agents are operating safely, ethically, and in alignment with organizational values. The Governance Stack makes that confidence possible.

Building vs. Buying: The Emerging Vendor Landscape

Organizations face a critical decision: build this governance infrastructure in-house or adopt emerging platforms that provide it as a service. Early movers are choosing different paths:

Enterprise Platforms: Companies like Collibra, Databricks, and TrueFoundry are extending their data governance and MLOps platforms to include agent registries and observability tools [2, 5, 7].
Purpose-Built Solutions: Startups like Agentic Trust are building end-to-end governance platforms specifically designed for agentic AI, providing integrated registries, gateways, and policy engines [5].
Protocol-Level Standards: Open protocols and proposals such as MCP and OIDC-A are enabling interoperability, allowing organizations to build custom stacks from best-of-breed components [4].

The optimal path depends on organizational maturity, existing infrastructure, and the scale of agentic deployment. However, the underlying message is universal: governance at scale requires dedicated infrastructure.

Conclusion: Governance as the Enabler of Scale

The era of experimental agentic AI pilots is ending. Organizations are now operationalizing agentic workforces across critical business functions, and the governance gap is the primary barrier to scaling these deployments safely and responsibly.

The Governance Stack is not a constraint on innovation; it is the foundation that makes innovation sustainable. By providing identity, visibility, policy enforcement, continuous monitoring, and human oversight, this technical infrastructure transforms governance from a compliance burden into a strategic enabler.

The organizations that invest in this stack today will be the ones that confidently deploy autonomous agents at enterprise scale tomorrow. They will move faster, operate more safely, and earn the trust of stakeholders who demand accountability in the age of autonomous AI.

For technology leaders navigating this landscape, the path is clear: governance is not a policy problem—it is an engineering challenge. And like all engineering challenges, it requires purpose-built infrastructure to solve. The Governance Stack is that infrastructure.

References:

[1] McKinsey & Company. (2025, November 5). The State of AI in 2025: A global survey. McKinsey.

[2] Databricks. (2025, July 1). Introducing the Databricks AI Governance Framework. Databricks.

[3] DZone. (2025, May 21). Securing the Future: Best Practices for Privacy and Data Governance in LLMOps. DZone.

[4] Subramanya, N. (2025, April 28). OpenID Connect for Agents (OIDC-A) 1.0 Proposal. subramanya.ai.

[5] Subramanya, N. (2025, November 17). Why Private Registries are the Future of Enterprise Agentic Infrastructure. subramanya.ai.

[6] Subramanya, N. (2025, November 14). From Espionage to Identity: Securing the Future of Agentic AI. subramanya.ai.

[7] TrueFoundry. (2025, September 10). What is AI Agent Registry. TrueFoundry.

Why Private Registries are the Future of Enterprise Agentic Infrastructure

Mon, 17 Nov 2025 00:00:00 +0000

The age of agentic AI is no longer on the horizon; it’s in our datacenters, cloud environments, and business units. A recent PwC report highlights that a staggering 79% of companies are already adopting AI agents in some capacity [1]. As these autonomous systems proliferate, executing tasks and making decisions on behalf of the enterprise, a critical governance gap has emerged. Without a robust management framework, organizations risk a chaotic landscape of “shadow AI,” creating significant security vulnerabilities, compliance nightmares, and operational inefficiencies.

The solution lies in a new class of enterprise software: the Private Agent and MCP Registry. This is not just a catalog, but a command center for agentic infrastructure, providing the visibility, governance, and security necessary to scale AI responsibly. Let’s explore the core pillars of this trend, using the “Agentic Trust” platform as a blueprint for building a better, more secure agentic future.

Pillar 1: A Centralized Directory for Every Agent

The first step to managing agentic chaos is to establish a single source of truth. You cannot govern what you cannot see. A private agent registry provides a comprehensive, real-time inventory of every agent operating within the enterprise, whether built in-house or sourced from a third-party vendor.

A centralized agent directory, as shown in the Agentic Trust platform, provides a complete inventory for governance and oversight.

As the screenshot of the Agentic Trust directory illustrates, this is more than just a list. A mature registry tracks critical metadata for each agent, including:

Unique Identity: A verifiable ID for every agent, forming the foundation for authentication and authorization.
Capabilities: A clear declaration of what the agent is designed to do, including the tools, resources, and prompts it can access.
Lifecycle Status: Tracking whether an agent is in development, production, or retired.
Ownership and Lineage: Connecting each agent to a business owner, use case, and the data it interacts with.
Activity Monitoring: Recording when agents were last used and their registration dates.

This centralized view eliminates blind spots and provides the traceability required for compliance and security audits. Organizations can quickly answer critical questions: How many agents do we have? Who owns them? What are they authorized to do?
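The metadata fields and questions above map naturally onto a simple record type. This Python sketch assumes a hypothetical `AgentRecord` with one field per item in the list. The sample agents are invented. A production registry would back this with a database rather than an in-memory list.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class AgentRecord:
    """Hypothetical registry entry, one field per metadata item listed above."""
    agent_id: str                     # unique, verifiable identity
    owner: str                        # accountable business owner
    capabilities: tuple[str, ...]     # tools/resources/prompts it may use
    status: str                       # "development", "production", or "retired"
    registered_on: date
    last_used: Optional[date] = None  # None if never invoked


agents = [
    AgentRecord("a-001", "finance-ops", ("erp.read",), "production",
                date(2025, 3, 1), date(2025, 11, 10)),
    AgentRecord("a-002", "hr", ("hris.read", "email.send"), "development",
                date(2025, 9, 12)),
    AgentRecord("a-003", "finance-ops", ("erp.read", "erp.write"), "retired",
                date(2024, 6, 5)),
]

# How many agents do we have (excluding retired ones)?
print(sum(a.status != "retired" for a in agents))   # 2
# Who owns them?
print(sorted({a.owner for a in agents}))             # ['finance-ops', 'hr']
# What is a given agent authorized to do?
by_id = {a.agent_id: a for a in agents}
print(by_id["a-002"].capabilities)                   # ('hris.read', 'email.send')
```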

Pillar 2: A Curated Marketplace for Agent Tools (MCPs)

Autonomous agents are only as powerful as the tools they can access. The Model Context Protocol (MCP) has become a standard for providing agents with these tools, but an uncontrolled proliferation of MCP servers creates another layer of risk. A private registry addresses this by functioning as a curated, internal “app store” or marketplace for MCPs.

An MCP Registry, like this one from Agentic Trust, allows enterprises to create a governed marketplace of approved tools for their AI agents.

Instead of allowing agents to connect to any public MCP, the enterprise can define a catalog of approved, vetted, and secure tools. As shown in the Agentic Trust MCP Registry, this allows organizations to:

Enforce Security Standards: Ensure that all available tools meet enterprise security and compliance requirements before they’re made available to agents.
Manage Versions and Dependencies: Control which versions of tools are used, preventing unexpected breaking changes that could disrupt agent operations.
Control Costs: Monitor the usage of paid APIs and tools, preventing runaway costs from autonomous agents making thousands of requests.
Improve Developer Productivity: Provide a central place for developers to discover and reuse existing tools, accelerating agent development and reducing duplication.
Categorize and Organize: Group tools by function (productivity, collaboration, payments, development, monitoring) to make discovery easier.
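Version control and categorization in a private MCP catalog can be as simple as resolving a requested tool against an allow-list of approved versions. This minimal sketch assumes semantic version strings and a policy of pinning each agent to a major version. The catalog contents, `resolve`, and `by_category` are hypothetical.

```python
from typing import Optional

# Hypothetical approved catalog: tool -> (category, versions that passed review)
CATALOG = {
    "github": ("development", ["1.4.0", "1.5.2", "2.0.0"]),
    "slack": ("collaboration", ["3.1.0"]),
}


def parse(version: str) -> tuple[int, ...]:
    """'1.5.2' -> (1, 5, 2), so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def resolve(tool: str, pinned_major: int) -> Optional[str]:
    """Newest approved version within the pinned major version, or None if the
    tool is not in the catalog (e.g. an unvetted public MCP server)."""
    if tool not in CATALOG:
        return None
    _, versions = CATALOG[tool]
    candidates = [v for v in versions if parse(v)[0] == pinned_major]
    return max(candidates, key=parse, default=None)


def by_category() -> dict[str, list[str]]:
    """Group approved tools by function for discovery."""
    groups: dict[str, list[str]] = {}
    for name, (category, _) in CATALOG.items():
        groups.setdefault(category, []).append(name)
    return groups


print(resolve("github", pinned_major=1))             # 1.5.2
print(resolve("random-public-mcp", pinned_major=1))  # None
print(by_category())  # {'development': ['github'], 'collaboration': ['slack']}
```

Pinning to a major version lets approved patch and minor updates flow through while a breaking 2.x release still requires an explicit change.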

The registry shows connection status for each MCP server, making it immediately visible which integrations are active and which require attention. This operational visibility is critical for maintaining a healthy agentic ecosystem.

Pillar 3: End-to-End Governance and Policy Enforcement

A private registry is the enforcement point for enterprise AI policy. It moves governance from a manual, after-the-fact process to an automated, built-in function of the agentic infrastructure. Drawing on best practices from platforms like Collibra and Microsoft Azure’s private registry implementations, this includes [1, 2]:

Mandatory Metadata and Documentation: Before an agent or MCP can be registered, developers must provide essential information such as data classification, business owner, purpose, and criticality. This ensures that every component in the agentic ecosystem is properly documented and understood.

Lifecycle Policy Alignment: The registry can embed automated policy checks at each stage of an agent’s lifecycle. For example, an agent cannot be promoted to production without a completed security review, ethical bias assessment, and approval from the designated business owner. This creates natural checkpoints that enforce organizational standards.

Access Control and Permissions: Using Role-Based Access Control (RBAC), integrated with enterprise identity systems like Entra ID or Okta, the registry defines who can create, manage, and consume agents and their tools. Different teams might have different levels of access based on their role and the sensitivity of the agents they’re working with.
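Here is a minimal sketch of the role-based check, assuming roles arrive as group claims from the enterprise identity provider. The role names and the operation set (create, manage, consume) are hypothetical choices made for illustration. So is the rule that only platform admins may modify high-sensitivity agents.

```python
# Hypothetical role -> permitted registry operations. In practice the roles
# would be derived from group claims issued by Entra ID, Okta, or similar.
ROLE_PERMISSIONS = {
    "platform-admin": {"create", "manage", "consume"},
    "agent-developer": {"create", "consume"},
    "business-user": {"consume"},
}
# Only these roles may create or manage agents marked high-sensitivity.
HIGH_SENSITIVITY_ROLES = {"platform-admin"}


def authorize(roles: set[str], operation: str, sensitivity: str = "standard") -> bool:
    """Grant `operation` if any of the caller's roles permits it, with an
    extra restriction on non-read operations against sensitive agents."""
    granted = set().union(*(ROLE_PERMISSIONS.get(role, set()) for role in roles))
    if operation not in granted:
        return False
    if sensitivity == "high" and operation != "consume":
        return bool(roles & HIGH_SENSITIVITY_ROLES)
    return True


print(authorize({"agent-developer"}, "create"))                      # True
print(authorize({"agent-developer"}, "manage"))                      # False
print(authorize({"agent-developer"}, "create", sensitivity="high"))  # False
```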

Audit Trails and Compliance: Every action in the registry—agent registration, tool connection, permission changes—is logged and auditable. This creates a complete forensic trail that satisfies regulatory requirements and enables rapid incident response when issues arise.
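One common way to make such a log tamper-evident is hash chaining: each entry stores the hash of the one before it, so altering any past record invalidates the hashes that follow. The Python below is a minimal sketch of that technique, not a description of any specific registry. Real deployments add timestamps and signatures. They also anchor the latest hash somewhere the log writer cannot modify, since anyone able to rewrite the entire log could simply recompute the chain.

```python
import hashlib
import json

GENESIS = "0" * 64   # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    # sort_keys makes the serialization, and therefore the hash, deterministic.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditLog:
    """Hypothetical append-only log; each entry commits to its predecessor."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, actor: str, action: str, target: str) -> None:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"actor": actor, "action": action, "target": target, "prev": prev_hash}
        self.entries.append({**body, "hash": _digest(body)})

    def verify(self) -> bool:
        prev_hash = GENESIS
        for entry in self.entries:
            body = {k: entry[k] for k in ("actor", "action", "target", "prev")}
            if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True


log = AuditLog()
log.append("alice", "register_agent", "invoice-bot")
log.append("bob", "grant_permission", "invoice-bot:erp.read")
print(log.verify())                  # True
log.entries[0]["actor"] = "mallory"  # tamper with history
print(log.verify())                  # False
```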

Pillar 4: Solving Real Enterprise Challenges

The value of a private registry becomes clear when we examine the specific problems it solves. Consider these common enterprise scenarios:

Challenge: Shadow AI and Uncontrolled Tool Adoption

Development teams are rapidly adopting AI tools and MCP servers without central oversight. This creates security blind spots, compliance risks, and operational fragmentation across the organization. A private registry provides centralized discovery of approved tools and usage visibility, allowing security teams to monitor what tools are being used and by whom [2].

Challenge: Regulatory Compliance and Data Sovereignty

Organizations in regulated industries (financial services, healthcare, government) need to maintain strict control over data flows and ensure AI tools meet compliance requirements. The registry enables data classification tagging for MCP servers, geographic controls for region-specific availability, comprehensive audit trails, and pre-configured compliance templates [2].

Challenge: Cost Control and Resource Optimization

Without visibility into agent and tool usage, organizations face unpredictable costs as autonomous agents make API calls and consume resources. A private registry provides usage analytics, cost allocation by team or project, budget alerts, and the ability to deprecate underutilized or expensive tools [2].
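Cost allocation and budget alerts come down to aggregating usage by team and comparing it against a budget. The prices, budgets, team names, and 80% warning threshold below are hypothetical values chosen for illustration.

```python
from collections import defaultdict

# Hypothetical per-call prices (USD) for paid tools, and monthly team budgets.
PRICE_PER_CALL = {"search.web": 0.005, "llm.summarize": 0.02}
BUDGETS = {"marketing": 45.0, "support": 20.0}


def team_spend(usage: list[tuple[str, str, int]]) -> dict[str, float]:
    """Aggregate (team, tool, call_count) rows into spend per team.
    Tools without a listed price are treated as free."""
    spend: dict[str, float] = defaultdict(float)
    for team, tool, calls in usage:
        spend[team] += PRICE_PER_CALL.get(tool, 0.0) * calls
    return dict(spend)


def budget_alerts(spend: dict[str, float], warn_at: float = 0.8) -> list[str]:
    """Warn once spend reaches `warn_at` of budget; alert when the budget is hit."""
    alerts = []
    for team, amount in sorted(spend.items()):
        budget = BUDGETS.get(team)
        if budget is None:
            continue                      # no budget configured for this team
        if amount >= budget:
            alerts.append(f"{team}: at or over budget ({amount:.2f} / {budget:.2f})")
        elif amount >= warn_at * budget:
            alerts.append(f"{team}: {amount / budget:.0%} of budget used")
    return alerts


usage = [("marketing", "search.web", 4000),
         ("marketing", "llm.summarize", 1000),
         ("support", "llm.summarize", 1200)]
print(budget_alerts(team_spend(usage)))
# ['marketing: 89% of budget used', 'support: at or over budget (24.00 / 20.00)']
```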

Challenge: Developer Productivity and Tool Discovery

Developers waste time rebuilding integrations that already exist elsewhere in the organization or struggle to find the right tools for their agents. The registry solves this with searchable catalogs, reusable components, standardized integration patterns, and clear documentation for each available tool [3].

The Architecture That Enables Scale

Behind the user interface of platforms like Agentic Trust lies a sophisticated architecture that makes enterprise-scale agent management possible. The key components include [3, 4]:

Component	Purpose
Central Registry API	Provides standardized endpoints for agent and MCP registration, discovery, and management
Metadata Database	Stores agent cards, capability declarations, and relationship data
Policy Engine	Enforces governance rules, access controls, and compliance checks
Discovery Service	Enables capability-based search and intelligent agent-to-tool matching
Health Monitor	Tracks agent and MCP server availability through heartbeats and health checks
Integration Layer	Connects to enterprise identity systems, monitoring tools, and DevOps pipelines
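The Health Monitor row can be illustrated with a heartbeat tracker that marks a server unreachable once it misses a timeout. This is a minimal sketch. The 30-second timeout and the server names are assumptions. The optional `now` parameter exists only to make the example deterministic.

```python
import time
from typing import Optional


class HealthMonitor:
    """Hypothetical heartbeat tracker: a server is 'healthy' if it has sent a
    heartbeat within `timeout_s` seconds, otherwise 'unreachable'."""

    def __init__(self, timeout_s: float = 30.0) -> None:
        self.timeout_s = timeout_s
        self.last_seen: dict[str, float] = {}   # server -> last heartbeat time

    def heartbeat(self, server: str, now: Optional[float] = None) -> None:
        self.last_seen[server] = time.monotonic() if now is None else now

    def status(self, server: str, now: Optional[float] = None) -> str:
        now = time.monotonic() if now is None else now
        seen = self.last_seen.get(server)
        if seen is None:
            return "never seen"
        return "healthy" if now - seen <= self.timeout_s else "unreachable"


monitor = HealthMonitor(timeout_s=30)
monitor.heartbeat("github-mcp", now=100.0)
monitor.heartbeat("slack-mcp", now=60.0)
print(monitor.status("github-mcp", now=120.0))  # healthy
print(monitor.status("slack-mcp", now=120.0))   # unreachable
print(monitor.status("jira-mcp", now=120.0))    # never seen
```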

This architecture mirrors patterns from successful enterprise software registries, such as container registries, API management platforms, and model registries. The lesson is clear: as a technology becomes critical to enterprise operations, it requires industrial-grade management infrastructure.

The Path Forward

The trend toward private registries for agentic infrastructure is not a passing fad; it is a necessary evolution in response to the rapid adoption of autonomous AI systems. As the Model Context Protocol ecosystem continues to grow, with the official MCP Registry serving as a public catalog [4], forward-thinking enterprises are building their own private implementations to maintain control, security, and governance.

Platforms like Agentic Trust demonstrate what this future looks like: a unified command center where every agent is visible, every tool is vetted, and every action is governed by policy. This is how organizations move from the chaos of unmanaged AI to the strategic advantage of a well-orchestrated agentic ecosystem.

For enterprises embarking on this journey, the message is clear: you cannot scale what you cannot see, and you cannot govern what you cannot control. A private registry is the foundation upon which responsible, secure, and effective agentic AI is built.

References:

[1] Collibra. (2025, October 6). Collibra AI agent registry: Governing autonomous AI agents. Collibra.

[2] Bajada, AJ. (2025, August 14). DevOps and AI Series: Azure Private MCP Registry. azurewithaj.com.

[3] TrueFoundry. (2025, September 10). What is AI Agent Registry. TrueFoundry.

[4] Model Context Protocol. (2025, September 8). Introducing the MCP Registry. Model Context Protocol.

From Espionage to Identity: Securing the Future of Agentic AI

Fri, 14 Nov 2025 00:00:00 +0000

Anthropic has detailed its disruption of the first publicly reported cyber espionage campaign orchestrated by a sophisticated AI agent [1]. The incident, attributed to a state-sponsored group designated GTG-1002, is more than just a security bulletin; it is a clear signal that the age of autonomous, agentic AI threats is here. It also serves as a critical case study, validating the urgent need for a new generation of identity and access management protocols specifically designed for AI.

This post will dissect the anatomy of the attack, connect it to the foundational security challenges facing agentic AI, and explore how emerging standards like OpenID Connect for Agents (OIDC-A) provide a necessary path forward [2, 3].

Anatomy of an AI-Orchestrated Attack

Anthropic’s investigation revealed a campaign of unprecedented automation. The attackers turned Anthropic’s own Claude Code tool into an autonomous weapon, targeting approximately thirty global organizations across technology, finance, and government. The AI was not merely an assistant; it was the operator, executing 80-90% of the tactical work, with humans intervening only at a few key authorization gates [1].

The technical sophistication of the attack did not lie in novel malware, but in orchestration. The threat actor built a custom framework around a series of Model Context Protocol (MCP) servers. These servers acted as a bridge, giving the AI agent access to a toolkit of standard, open-source penetration testing utilities—network scanners, password crackers, and database exploitation tools.

By decomposing the attack into seemingly benign sub-tasks, the attackers tricked the AI into executing a complex intrusion campaign. The AI agent, operating under the persona of a legitimate security tester, autonomously performed reconnaissance, vulnerability analysis, and data exfiltration at a machine speed that no human team could match.

The MCP Paradox: Extensibility vs. Security

The Anthropic report explicitly states that the attackers leveraged the Model Context Protocol (MCP) to arm their AI agent [1]. This highlights a central paradox in agentic AI architecture: the very protocols designed for extensibility and power, like MCP, can become the most potent attack vectors.

As the “Identity Management for Agentic AI” whitepaper notes, MCP is a leading framework for connecting AI to external tools, but it also presents significant security challenges [3]. When an AI can dynamically access powerful tools without robust oversight, it creates a direct and dangerous path for misuse. The GTG-1002 campaign is a textbook example of this risk realized.

This forces a critical re-evaluation of how we architect agentic systems. We can no longer afford to treat the connection between an AI agent and its tools as a trusted channel. This is where the concept of an MCP Gateway or Proxy becomes not just a good idea, but an absolute necessity.

The Solution: Identity, Delegation, and Zero Trust for Agents

The security gaps exploited in the Anthropic incident are precisely what emerging standards like OIDC-A (OpenID Connect for Agents) are designed to close [2, 3]. The core problem is one of identity and authority. The AI agent in the attack acted with borrowed, indistinct authority, effectively impersonating a legitimate user or process. True security requires a shift to a model of explicit, verifiable delegation.

The OIDC-A proposal introduces a framework for establishing the identity of an AI agent and managing its authorization through cryptographic delegation chains. This means an agent is no longer just a proxy for a user; it is a distinct entity with its own identity, operating on behalf of a user with a clearly defined and constrained set of permissions.

Here’s how this new model, enforced by an MCP Gateway, would have mitigated the Anthropic attack:

Security Layer	Description
Agent Identity & Attestation	The AI agent would have a verifiable identity, attested by its provider. An MCP Gateway could immediately block any requests from unattested or untrusted agents.
Tool-Level Delegation	Instead of broad permissions, the agent would receive narrowly scoped, delegated authority for specific tools. The OIDC-A `delegation_chain` ensures that the agent’s permissions are a strict subset of the delegating user’s permissions [2]. An agent designed for code analysis could never be granted access to a password cracker.
Policy Enforcement & Anomaly Detection	The MCP Gateway would act as a policy enforcement point, monitoring all tool requests. It could detect anomalous behavior, such as an agent attempting to use a tool outside its delegated scope or a sudden spike in high-risk tool usage, and automatically terminate the agent’s session.
Auditing and Forensics	Every tool request and delegation would be cryptographically signed and logged, creating a tamper-evident audit trail. This would provide immediate, granular visibility into the agent’s actions, dramatically accelerating incident response.
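The attenuation property behind tool-level delegation, where each hop may only narrow the permissions it received, can be checked in a few lines. This is a hypothetical sketch. It represents the chain as a list of dicts with a `sub` and a space-separated `scope`, following OAuth scope conventions. That structure is an assumption for illustration and not necessarily the exact claim format defined in the OIDC-A proposal [2]. The sketch also omits the signature verification each link would need.

```python
def chain_is_attenuating(user_scopes: set[str], chain: list[dict]) -> bool:
    """Check that every delegation step grants a subset of the scopes held by
    the delegator before it (the user is the root of the chain)."""
    allowed = user_scopes
    for step in chain:
        granted = set(step["scope"].split())
        if not granted <= allowed:
            return False          # this step tried to grant more than it had
        allowed = granted
    return True


def tool_permitted(user_scopes: set[str], chain: list[dict], tool_scope: str) -> bool:
    """A tool call is allowed only if the chain is valid and the final
    delegate (the agent making the call) holds the tool's scope."""
    if not chain or not chain_is_attenuating(user_scopes, chain):
        return False
    return tool_scope in set(chain[-1]["scope"].split())


user = {"repo:read", "repo:write", "ci:run"}
chain = [
    {"sub": "orchestrator-agent", "scope": "repo:read ci:run"},
    {"sub": "code-analysis-agent", "scope": "repo:read"},
]
print(tool_permitted(user, chain, "repo:read"))       # True
print(tool_permitted(user, chain, "pentest:crack"))   # False
```

Because each step is compared against the step before it rather than against the user, a compromised intermediate agent cannot widen its delegate’s authority beyond its own.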

Building Enterprise-Grade Security for Agentic AI

The Anthropic report is a watershed moment. It proves that the threats posed by agentic AI are no longer theoretical. As the “Identity Management for Agentic AI” paper argues, we must move beyond traditional, human-centric security models and build a new foundation for AI identity [3].

Today, most MCP servers being developed are experimental tools designed for individual developers and small-scale applications. They lack the enterprise-grade security controls that organizations require to deploy them in production environments. For enterprises to confidently adopt agentic AI systems built on protocols like MCP, we need to fundamentally rethink how we approach security.

The path forward requires building robust delegation frameworks, implementing proper identity management for AI agents, and creating enterprise-grade security controls like gateways and policy enforcement points. We need solutions that provide:

Cryptographic delegation chains that clearly define and constrain agent permissions
Real-time policy enforcement that can detect and prevent anomalous behavior
Comprehensive audit trails that enable forensic analysis and compliance
Zero-trust architectures where every agent action is verified and authorized

We cannot afford to let the open, extensible nature of protocols like MCP become a permanent backdoor for malicious actors. The future of agentic AI depends on our ability to build security into these systems from the ground up, making enterprise adoption not just possible, but secure and responsible.

References:

[1] Anthropic. (2025, November). Disrupting the first reported AI-orchestrated cyber espionage campaign. Anthropic.

[2] Subramanya, N. (2025, April 28). OpenID Connect for Agents (OIDC-A) 1.0 Proposal. subramanya.ai.

[3] South, T. (Ed.). (2025, October). Identity Management for Agentic AI: The new frontier of authorization, authentication, and security for an AI agent world. arXiv.

Claude Skills vs. MCP: A Tale of Two AI Customization Philosophies

Thu, 30 Oct 2025 00:00:00 +0000

In the rapidly evolving landscape of artificial intelligence, the ability to customize and extend the capabilities of large language models (LLMs) has become a critical frontier. Anthropic, a leading AI research company, has introduced two powerful but distinct approaches to this challenge: Claude Skills and the Model Context Protocol (MCP). While both aim to make AI more useful and integrated into our workflows, they operate on fundamentally different principles. This post delves into a detailed comparison of Claude Skills and MCP, explores whether they can or should be merged, and discusses the exciting future of AI customization they represent.

What are Claude Skills? The Power of Procedural Knowledge

Claude Skills, also known as Agent Skills, are a revolutionary way to teach Claude how to perform specific tasks in a repeatable and customized manner. At its core, a Skill is a folder containing a SKILL.md file of instructions, optionally accompanied by additional resources and executable scripts. Think of Skills as a set of standard operating procedures for the AI. For example, a Skill could instruct Claude on how to format a weekly report, adhere to a company’s brand guidelines, or analyze data using a specific methodology.

The genius of Claude Skills lies in their architecture, which is built on a principle called progressive disclosure. This three-tiered system ensures that Claude’s context window isn’t overwhelmed with information:

Level 1: Metadata: When a session starts, Claude loads only the name and description of each available Skill. This is a very lightweight process, consuming only a few tokens per Skill.
Level 2: The SKILL.md file: If Claude determines that a Skill is relevant to the user’s request, it then loads the full content of the SKILL.md file.
Level 3 and beyond: Additional resources: If the SKILL.md file references other documents or scripts within the Skill’s folder, Claude will load them only when needed.
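The first two levels of progressive disclosure can be mimicked with a loader that parses only the front matter of each SKILL.md up front and reads the body on demand. This Python sketch makes several assumptions. Front matter is written as simple `key: value` lines between `---` markers (this is not a full YAML parser). Files are kept in memory so the example stays self-contained. The caller decides relevance, whereas in Claude that judgment is made by the model itself. Level 3 resources would follow the same lazy pattern.

```python
# Hypothetical in-memory stand-in for skill folders: folder name -> SKILL.md text.
SKILL_FILES = {
    "weekly-report": (
        "---\n"
        "name: weekly-report\n"
        "description: Format the team's weekly status report\n"
        "---\n"
        "# Weekly report\n"
        "1. Group updates by project.\n"
        "2. List blockers first.\n"
    ),
}


def parse_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md into (metadata, body); handles only `key: value` lines."""
    _, header, body = text.split("---\n", 2)
    meta = {}
    for line in header.splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body


class SkillLoader:
    def __init__(self, files: dict[str, str]) -> None:
        self.files = files
        # Level 1: only name + description are indexed at session start.
        self.index = {folder: parse_frontmatter(text)[0] for folder, text in files.items()}
        self.loaded: dict[str, str] = {}

    def context_summary(self) -> str:
        return "\n".join(f"{m['name']}: {m['description']}" for m in self.index.values())

    def activate(self, folder: str) -> str:
        # Level 2: the full body is read only once the skill is judged relevant.
        if folder not in self.loaded:
            self.loaded[folder] = parse_frontmatter(self.files[folder])[1]
        return self.loaded[folder]


loader = SkillLoader(SKILL_FILES)
print(loader.context_summary())   # weekly-report: Format the team's weekly status report
print(len(loader.loaded))         # 0
print(loader.activate("weekly-report").splitlines()[0])   # # Weekly report
```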

This efficient, just-in-time loading mechanism allows for a vast library of Skills to be available without sacrificing performance. Skills are also portable, working across Claude.ai, Claude Code, and the API, and can even include executable code for deterministic and reliable operations.

What is the Model Context Protocol (MCP)? The Universal Connector

The Model Context Protocol (MCP) is an open-source standard designed to connect AI applications to external systems. If Claude Skills are about teaching the AI how to do something, MCP is about giving it access to what it needs to do it. MCP acts as a universal connector, similar to a USB-C port for AI, allowing models like Claude to interact with a wide range of data sources, tools, and workflows.

MCP operates on a client-server architecture:

MCP Host: The AI application (e.g., Claude Desktop or Claude Code) that manages connections to various external systems.
MCP Client: A component within the host that maintains a one-to-one connection with an MCP server.
MCP Server: A program that exposes tools, resources, and prompts from an external system to the AI.

This architecture allows an AI to connect to multiple external systems simultaneously, from local files and databases to remote services like GitHub, Slack, or a company’s internal APIs. MCP is built on a two-layer architecture, with a data layer based on JSON-RPC 2.0 and a transport layer that supports both local and remote connections.
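To make the data layer concrete, here is a sketch that builds JSON-RPC 2.0 request messages for two MCP methods, `tools/list` and `tools/call`. The `search_issues` tool and its arguments are hypothetical. The sketch omits the transport (stdio or HTTP) and the `initialize` handshake that a real MCP session performs first.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)   # JSON-RPC requests carry an id so responses can be matched


def jsonrpc_request(method: str, params: Optional[dict] = None) -> str:
    """Serialize a JSON-RPC 2.0 request object; `params` is omitted if None."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


print(jsonrpc_request("tools/list"))
# {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
print(jsonrpc_request("tools/call",
                      {"name": "search_issues", "arguments": {"query": "is:open label:bug"}}))
# {"jsonrpc": "2.0", "id": 2, "method": "tools/call", "params": {"name": "search_issues", "arguments": {"query": "is:open label:bug"}}}
```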

The Core Difference: Methodology vs. Connectivity

The fundamental distinction between Claude Skills and MCP can be summarized as methodology versus connectivity. MCP provides the AI with access to tools and data, while Skills provide the instructions on how to use them effectively. According to Anthropic’s own documentation:

“MCP connects Claude to external services and data sources. Skills provide procedural knowledge—instructions for how to complete specific tasks or workflows. You can use both together: MCP connections give Claude access to tools, while Skills teach Claude how to use those tools effectively.”

This highlights that Skills and MCP are not competing technologies but are, in fact, complementary. An apt analogy is that of a master chef. MCP provides the chef with a fully stocked pantry of ingredients and a set of high-end kitchen appliances (the what). Skills, on the other hand, are the chef’s personal recipe book and techniques, guiding them on how to combine the ingredients and use the appliances to create a culinary masterpiece.

Feature	Claude Skills	Model Context Protocol (MCP)
Primary Purpose	Procedural knowledge and methodology	Connectivity to external systems
Architecture	Filesystem-based with progressive disclosure	Client-server with JSON-RPC 2.0
Core Concept	Teaching the AI how to do something	Giving the AI access to what it needs
Dependency	Requires a code execution environment	A client and a server implementation
Token Efficiency	Very high due to progressive disclosure	Moderate, with tool descriptions in context
Portability	Across Claude interfaces	Open standard for any LLM

Can a Claude Skill be an MCP? And Should They Be Merged?

Given that both are Anthropic’s creations, a natural question arises: could a Claude Skill be implemented as an MCP server, or should the two be merged into a single, unified system? While it is technically possible to create an MCP server that exposes Skills, doing so would be architecturally inefficient and would defeat the purpose of both systems.

Exposing Skills through MCP would negate the benefits of progressive disclosure, as it would introduce the overhead of the MCP protocol for what should be a simple filesystem read. It would also create a redundant abstraction layer, as Skills already require a local code execution environment. The two systems are designed for different purposes and have different optimization goals: Skills for context efficiency within Claude, and MCP for standardized integration across different AI systems.

Therefore, Claude Skills and MCP should be treated as independent, complementary technologies. The most powerful workflows will come from using them in synergy.

The Power of Synergy: Using Skills and MCP Together

The true potential of these technologies is unlocked when they are used in concert. Here are a few integration patterns that showcase their combined power:

Skills as MCP Orchestrators: A Skill can contain a complex workflow that orchestrates calls to multiple MCP servers. For example, a “Deploy and Notify” Skill could contain a deployment checklist, notification templates, and rollback procedures. It would then use MCP to access GitHub for code, a CI/CD server for deployment, and Slack for notifications.
Skills for MCP Configuration: An organization can create Skills that teach Claude its specific standards for using MCP tools. For example, a “GitHub Workflow Standards” Skill could contain instructions on branch naming conventions, pull request review checklists, and commit message templates, ensuring that Claude uses the GitHub MCP server in a way that aligns with the company’s best practices.
Hybrid Skills: A Skill can contain embedded code that makes calls to an MCP server. This is useful for self-contained workflows that need to fetch external data.

The Future: A Marketplace for Skills and an Ecosystem for MCP

The future of AI customization will likely see the development of a vibrant Skills Marketplace. Similar to the app stores for our smartphones or the extension marketplaces for our code editors, a Skills Marketplace would allow developers to publish, share, and even sell Skills. This could create a new economy around AI expertise, with a wide range of Skills available, from free, community-contributed Skills to premium, industry-specific Skill packages for domains like law, medicine, or finance.

Simultaneously, the MCP ecosystem will continue to grow, with more and more tools and services exposing their functionality through MCP servers. This will create a virtuous cycle: as more tools become available through MCP, the demand for Skills that can effectively use those tools will increase.

Conclusion

Claude Skills and the Model Context Protocol represent two distinct but complementary philosophies of AI customization. MCP is the universal connector, providing the what—the access to tools and data. Skills are the procedural knowledge, providing the how—the instructions and methodology. They are not competitors but partners in the quest to create more powerful, personalized, and integrated AI assistants. The future of AI workflows will not be about choosing between Skills or MCP, but about leveraging the power of Skills and MCP to create intelligent systems that are truly tailored to our needs.

References:

[1] Anthropic. (2025, October 16). Claude Skills: Customize AI for your workflows. Anthropic.

[2] Anthropic. (2025, October 16). Equipping agents for the real world with Agent Skills. Anthropic.

[3] Model Context Protocol. (n.d.). What is the Model Context Protocol (MCP)? Model Context Protocol.

[4] Model Context Protocol. (n.d.). Architecture overview. Model Context Protocol.

[5] Willison, S. (2025, October 16). Claude Skills are awesome, maybe a bigger deal than MCP. Simon Willison’s Weblog.

[6] Claude Help Center. (n.d.). What are Skills? Claude Help Center.

[7] IntuitionLabs. (2025, October 27). Claude Skills vs. MCP: A Technical Comparison for AI Workflows. IntuitionLabs.

Beyond "Non-Deterministic": Deconstructing the Illusion of Randomness in LLMs

Tue, 09 Sep 2025 00:00:00 +0000

In the rapidly evolving lexicon of AI, few terms are as casually thrown around—and as fundamentally misunderstood—as “non-deterministic.” We use it to explain away unexpected outputs, to describe the creative spark of generative models, and to justify the frustrating brittleness of our AI-powered systems. But this term, borrowed from classical computer science, is not just imprecise when applied to Large Language Models (LLMs); it’s a conceptual dead end. It obscures the intricate, deterministic machinery humming beneath the surface and distracts us from the real architectural challenges we face.

Attributing an LLM’s behavior to “non-determinism” is like blaming a complex system’s emergent behavior on magic. It’s an admission of incomprehension, not an explanation. The truth is far more fascinating and, for architects and engineers, far more critical to understand. LLMs are not mystical black boxes governed by chance. They are complex, stateful systems whose outputs are the result of a deterministic, albeit highly sensitive, process. The perceived randomness is not a feature; it is a symptom of a deeper architectural paradigm shift.

This post will dismantle the myth of LLM non-determinism. We will explore why the term is a poor fit, dissect the underlying deterministic mechanisms that govern LLM behavior, and reframe the conversation around the true challenge: the profound difficulty of controlling a system whose behavior is an emergent property of its architecture. We will move beyond the simplistic notion of randomness and into the far more complex and rewarding territory of input ambiguity, ill-posed inverse problems, and the dawn of truly evolutionary software architectures.

The Deterministic Heart of the LLM

To understand why “non-deterministic” is a misnomer, we must first revisit its classical definition. A deterministic algorithm, given a particular input, will always produce the same output. An LLM, at its core, is a mathematical function: a massive and intricate, but ultimately deterministic, series of calculations. Given the same model, the same weights, and the same input sequence, and provided the floating-point operations are executed in the same order, the model will produce the same output logits.

The illusion of non-determinism arises not from the model itself, but from the sampling strategies we apply to its output. The model’s final layer produces a vector of logits, one for each token in its vocabulary. These logits are then converted into a probability distribution via the softmax function. It is at this final step—the selection of the next token from this distribution—that we introduce controlled randomness.

Temperature and Sampling: The Controlled Introduction of Randomness

The temperature parameter is the primary lever we use to control this randomness. A temperature of 0 corresponds to greedy decoding, a purely deterministic process in which the highest-probability token is always chosen. (Dividing logits by zero is undefined, so implementations that accept a temperature of 0 special-case it as greedy decoding.) In theory, with a temperature of 0, an LLM should be perfectly deterministic. However, as many have discovered, even this is not a perfect guarantee. Floating-point addition is not associative, so when different hardware, software library versions, or batch configurations change the order in which values are summed, the logits can shift by minuscule amounts. When two candidate tokens are nearly tied, such a shift can occasionally be enough to tip the balance in favor of a different token.
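The floating-point effect can be reproduced without a model. In the toy sketch below, the contributions and the tied logit are invented numbers, and the two loop orders merely stand in for two kernels that reduce in different orders; it is not a measurement of any real inference stack. Summing the same three values in a different order changes the last bit of the result, and that is enough to flip a greedy argmax when two tokens are tied.

```python
def accumulate(values):
    """Naive left-to-right reduction, as a single-threaded kernel might do it."""
    total = 0.0
    for v in values:
        total += v
    return total


def greedy_pick(logits):
    """Greedy decoding: index of the largest logit (the first one wins ties)."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical contributions that sum to token 1's logit.
contributions = [0.1, 0.2, 0.3]
logit_token_0 = 0.6  # hypothetical logit for token 0

# Same numbers, two reduction orders.
forward = accumulate(contributions)             # ((0.1 + 0.2) + 0.3)
backward = accumulate(reversed(contributions))  # ((0.3 + 0.2) + 0.1)
print(forward, backward, forward == backward)   # 0.6000000000000001 0.6 False

pick_forward = greedy_pick([logit_token_0, forward])    # token 1 wins by one ulp
pick_backward = greedy_pick([logit_token_0, backward])  # exact tie, token 0 wins
print(pick_forward, pick_backward)  # 1 0
```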

When the temperature is set above 0, we enter the realm of stochastic sampling. Each logit is divided by the temperature before the result is passed to the softmax function. A higher temperature flattens the probability distribution, making less likely tokens more probable; a lower temperature sharpens it, making the most likely tokens even more dominant. This is not non-determinism in the classical sense; it is a controlled, probabilistic process. We are not dealing with a system that can arbitrarily choose its next state; we are dealing with a system that makes a weighted random choice from a set of possibilities whose probabilities are deterministically calculated. Even that random choice is typically driven by a pseudo-random number generator, so fixing its seed makes the sampled sequence repeatable on a given setup.
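A minimal sketch of the two steps just described, in plain Python with invented logits: dividing by the temperature before a softmax, and drawing tokens with a seeded pseudo-random generator. The function names are illustrative, not any library’s API. Note that the “random” draw is itself reproducible once the seed is fixed.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Divide each logit by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; use argmax for greedy decoding")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() to avoid overflow
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_tokens(logits, temperature, n, seed):
    """Draw n token indices from the tempered distribution with a seeded PRNG."""
    rng = random.Random(seed)
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=n)


logits = [2.0, 1.0, 0.1]  # hypothetical logits for a three-token vocabulary
for t in (0.5, 1.0, 2.0):
    # Lower t sharpens the distribution; higher t flattens it.
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])

run_1 = sample_tokens(logits, 1.0, 10, seed=42)
run_2 = sample_tokens(logits, 1.0, 10, seed=42)
print(run_1 == run_2)  # True: same seed, same "random" draws
```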

Other sampling techniques, such as top-k and top-p (nucleus) sampling, further refine this process. Top-k sampling restricts the choices to the k most likely tokens, while top-p sampling selects from the smallest set of tokens whose cumulative probability exceeds a certain threshold. These are all mechanisms for shaping and constraining the probabilistic selection process, not for introducing true non-determinism.
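Both filters can be sketched in a few lines over an invented four-token distribution. Libraries differ in details such as tie-breaking and whether the cumulative mass must reach or strictly exceed the threshold; this sketch, an illustration rather than any particular library’s implementation, keeps tokens until the mass reaches p and then renormalizes.

```python
def _renormalize(probs, keep):
    """Zero out tokens not in `keep` and rescale the rest to sum to 1."""
    kept_mass = sum(probs[i] for i in keep)
    return [probs[i] / kept_mass if i in keep else 0.0 for i in range(len(probs))]


def _ranked(probs):
    """Token indices sorted from most to least probable."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)


def top_k_filter(probs, k):
    """Keep only the k most probable tokens."""
    return _renormalize(probs, set(_ranked(probs)[:k]))


def top_p_filter(probs, p):
    """Keep the smallest set of most probable tokens whose cumulative mass reaches p."""
    keep, cumulative = set(), 0.0
    for i in _ranked(probs):
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return _renormalize(probs, keep)


probs = [0.5, 0.25, 0.15, 0.1]  # hypothetical next-token distribution
print(top_k_filter(probs, 2))    # tokens 0 and 1 survive, rescaled to 2/3 and 1/3
print(top_p_filter(probs, 0.7))  # also tokens 0 and 1, since 0.5 + 0.25 >= 0.7
print(top_p_filter(probs, 0.4))  # [1.0, 0.0, 0.0, 0.0]: token 0 alone reaches 0.4
```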

Demonstrating Determinism: A Concrete Example

Consider this simple demonstration using the Hugging Face transformers library and greedy decoding, the practical equivalent of a temperature of 0:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/DialoGPT-medium"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "The future of artificial intelligence is"
inputs = tokenizer(prompt, return_tensors="pt")

# Run the same greedy generation 10 times.
# temperature is left out: it only affects sampling (do_sample=True), and
# recent transformers versions can warn when temperature=0.0 is passed anyway.
outputs = []
for _ in range(10):
    generated = model.generate(
        **inputs,                 # passes input_ids and attention_mask
        max_length=50,
        do_sample=False,          # greedy decoding: always take the top token
        pad_token_id=tokenizer.eos_token_id,
    )
    text = tokenizer.decode(generated[0], skip_special_tokens=True)
    outputs.append(text)

# In one process on one machine, all outputs should be identical
assert all(output == outputs[0] for output in outputs)

Run repeatedly in a single process on one machine, this code is expected to pass its assertion, demonstrating the deterministic nature of the underlying computation. Reproducibility is weaker across environments: the same script on different hardware, with different library versions, or behind a server that batches requests differently may perform its floating-point reductions in a different order and occasionally produce different text. That is why even “deterministic” settings cannot guarantee perfect reproducibility everywhere.

The Real Culprit: Input Ambiguity and the Ill-Posed Inverse Problem

If the LLM itself is fundamentally deterministic, why is it so hard to get the output we want? The answer lies not in the forward pass of the model, but in the inverse problem we are trying to solve. When we interact with an LLM, we are not simply providing an input and observing an output. We are attempting to solve an inverse problem: we have a desired output in mind, and we are trying to find the input prompt that will produce it.

This is where the concept of a well-posed problem, as defined by the mathematician Jacques Hadamard, becomes critical. A problem is well-posed if it satisfies three conditions:

Existence: A solution exists.
Uniqueness: The solution is unique.
Stability: The solution depends continuously on the input data, so small changes in the data produce small changes in the solution.

Prompt engineering, when viewed as an inverse problem, fails on all three counts.

Existence: The specific output we desire may not be achievable by any possible prompt. The model’s latent space may not contain a representation that perfectly matches our intent.
Uniqueness: There are often many different prompts that can produce very similar outputs. This is the problem of prompt equivalence, and it makes it difficult to find the single “best” prompt.
Stability: This is the most frustrating aspect of prompt engineering. A tiny, seemingly insignificant change to a prompt can lead to a radically different output. This lack of stability is what makes LLM-based systems feel so brittle and unpredictable.
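The three failures can be seen in miniature with a toy forward map. `toy_model` below is a hash-based stand-in (not an LLM) with only three possible outputs, and the prompt variants are invented. A target outside its output space has no preimage (existence), the pigeonhole principle guarantees that some output has several preimages among twelve prompts (uniqueness), and appending a single character can change the output arbitrarily (stability).

```python
import hashlib
from collections import defaultdict

OUTPUTS = ("summary", "bullet list", "table")  # the toy model's entire output space


def toy_model(prompt: str) -> str:
    """Deterministic stand-in for an LLM: the same prompt always gives the same
    output, but the mapping is scrambled, so tiny edits can change the result."""
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    return OUTPUTS[digest[0] % len(OUTPUTS)]


def invert(target, candidates):
    """Brute-force inverse problem: every candidate prompt that yields `target`."""
    return [p for p in candidates if toy_model(p) == target]


# Twelve hypothetical prompt variants, but only three possible outputs.
suffixes = ["", ".", "!", " please", " briefly", " now",
            ":", "?", " for me", " clearly", " again", " in short"]
candidates = [f"Summarize the report{s}" for s in suffixes]

# Existence: a target outside the output space has no preimage at all.
print(invert("haiku", candidates))  # []

# Uniqueness: by pigeonhole, some output has several preimages.
preimages = defaultdict(list)
for p in candidates:
    preimages[toy_model(p)].append(p)
print({out: len(ps) for out, ps in preimages.items()})

# Stability: how many one-character edits change the output?
base = candidates[0]
edits = [base + ch for ch in "abcdefghij"]
flips = sum(toy_model(e) != toy_model(base) for e in edits)
print(f"{flips} of {len(edits)} one-character edits changed the output")
```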

This is what people are really talking about when they say LLMs are “non-deterministic.” They are not talking about a lack of determinism in the model’s execution; they are talking about the ill-posed nature of the inverse problem they are trying to solve. The model is not random; our ability to control it is simply imprecise.

The Mathematics of Prompt Sensitivity

The sensitivity of LLMs to prompt variations can be loosely understood through the lens of dynamical systems and chaos theory. Small perturbations in the input space can lead to dramatically different trajectories through the model’s latent space. This is not randomness; it is sensitive dependence on initial conditions, a hallmark of complex deterministic systems.

Consider a simple mathematical statement of this sensitivity. If we represent our prompt as a vector p in the model’s input (embedding) space, and the model’s output as a function f(p), then the sensitivity can be expressed as:

||f(p + δp) - f(p)|| >> ||δp||

Here δp represents a small change to the prompt, and the double bars denote vector norms. The inequality says that small changes in input can produce disproportionately large changes in output. That amplification is a property of sensitive deterministic systems (chaotic systems being the extreme case), not of random ones.
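One simple mechanism behind the inequality is visible even without a network. The sketch treats greedy decoding as a function of the logits, a simplification in which the logits play the role of p rather than the prompt itself, and the logit values are invented. Near a tie, a perturbation of 0.002 flips the chosen token and the output jumps by √2, so the ratio is roughly 700; far from a tie, the same perturbation changes nothing.

```python
import math


def one_hot_greedy(logits):
    """Greedy decoding viewed as a function: a one-hot vector at the argmax."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return [1.0 if i == best else 0.0 for i in range(len(logits))]


def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))


def amplification(f, p, dp):
    """The ratio ||f(p + dp) - f(p)|| / ||dp|| from the inequality above."""
    shifted = [a + b for a, b in zip(p, dp)]
    diff = [a - b for a, b in zip(f(shifted), f(p))]
    return l2_norm(diff) / l2_norm(dp)


dp = [0.0, 0.002]  # a tiny nudge to the second logit
near_tie = amplification(one_hot_greedy, [1.0, 0.999], dp)
far_from_tie = amplification(one_hot_greedy, [2.0, 0.0], dp)
print(round(near_tie), far_from_tie)  # 707 0.0
```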

This sensitivity is further amplified by the autoregressive nature of text generation. Each token prediction depends on all previous tokens, so an early variation changes the context for every later prediction and the differences compound. A single different token early in the generation can completely alter the semantic trajectory of the entire output.
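The cascade can be made concrete with a toy autoregressive loop. The “model” below is a polynomial hash over the whole context, an invented stand-in chosen only because, like an LLM, it is deterministic and conditions every prediction on every earlier token. Changing just the first prompt token changes the very first generated token (with these numbers the arithmetic guarantees it), and from then on the two continuations condition on different contexts, while each one remains exactly reproducible.

```python
VOCAB = ["the", "model", "is", "not", "random", "but", "highly",
         "sensitive", "to", "context", "."]  # 11 tokens


def next_token_id(context_ids):
    """Toy deterministic 'model': a polynomial hash over the whole context.
    Every earlier token, and its position, feeds into the prediction."""
    h = 0
    for pos, tok in enumerate(context_ids):
        h = h * 31 + tok + pos
    return h % len(VOCAB)


def generate(prompt_ids, n_new):
    """Greedy autoregressive loop: each prediction is appended to the context."""
    ids = list(prompt_ids)
    for _ in range(n_new):
        ids.append(next_token_id(ids))
    return ids


prompt_a = [0, 1, 2]  # "the model is"
prompt_b = [8, 1, 2]  # "to model is": only the first token differs
out_a = generate(prompt_a, 12)
out_b = generate(prompt_b, 12)

print(" ".join(VOCAB[i] for i in out_a))
print(" ".join(VOCAB[i] for i in out_b))
print(out_a == generate(prompt_a, 12))  # True: same input, same output
changed = sum(x != y for x, y in zip(out_a[3:], out_b[3:]))
print(f"{changed} of 12 generated tokens differ")
```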

The Architectural Shift: From Predictable Execution to Emergent Behavior

This reframing from non-determinism to input ambiguity has profound implications for how we design and build systems that incorporate LLMs. For decades, software architecture has been predicated on the assumption of predictable execution. We design systems with the expectation that a given component, when provided with a specific input, will behave in a known and repeatable manner. This is the foundation of everything from unit testing to microservices architecture.

AI agents, powered by LLMs, shatter this assumption. They do not simply execute our designs; they exhibit emergent behavior. The system’s behavior is not explicitly defined by the architect, but emerges from the complex interplay of the model’s weights, the input prompt, the sampling strategy, and the context of the interaction. This is a fundamental shift from a mechanical to a biological metaphor for software. We are no longer building machines that execute instructions; we are cultivating ecosystems where intelligent agents adapt and evolve.

This has several immediate architectural consequences:

The Death of the Static API Contract: In a traditional microservices architecture, the API contract is sacrosanct. In an agent-based system, the “contract” is fluid and context-dependent. The same functional goal may be achieved through different series of actions depending on the nuances of the initial prompt and the state of the system.
The Rise of Intent-Driven Design: Instead of specifying the exact steps a system should take, we must design systems that can understand and act on user intent. This requires a shift from imperative to declarative interfaces, where we specify what we want, not how to achieve it.
The Need for Robust Observability: When a system’s behavior is emergent, we can no longer rely on traditional logging and monitoring. We need new tools and techniques for observing and understanding the behavior of agent-based systems. This includes not just monitoring for errors, but also for unexpected successes and novel solutions.

Engineering for Emergence: Practical Approaches

Understanding that LLMs are deterministic but sensitive systems opens up new avenues for engineering robust AI-powered applications. Rather than fighting the sensitivity, we can design systems that work with it.

Ensemble Methods and Consensus Mechanisms

One approach is to embrace the variability through ensemble methods. Instead of trying to get a single “perfect” output, we can generate multiple outputs and use consensus mechanisms to select the best result. This approach treats the sensitivity as a feature, not a bug, allowing us to explore the space of possible outputs and select the most appropriate one.

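from collections import Counter

def select_consensus_output(outputs):
    """Majority vote: return the most frequent output (first seen wins ties)."""
    return Counter(outputs).most_common(1)[0][0]

def consensus_generation(model, prompt, n_samples=5, temperature=0.7):
    """Generate multiple outputs and select based on consensus."""
    # Assumed interface: model.generate(prompt, temperature=...) returns a string
    outputs = [model.generate(prompt, temperature=temperature)
               for _ in range(n_samples)]
    # Exact-match voting; semantic similarity or other metrics can replace it
    return select_consensus_output(outputs)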
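Exact-match voting breaks down when samples are paraphrases of one another. A medoid selection is one way to implement the `select_consensus_output` step on semantic agreement: pick the sample with the highest mean similarity to all the others. The sketch below uses word-set Jaccard similarity as a cheap stand-in for embedding similarity (an assumption, not a requirement), and the samples are invented.

```python
def jaccard(a: str, b: str) -> float:
    """Word-set overlap in [0, 1]; a cheap stand-in for embedding similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def medoid_consensus(outputs, similarity=jaccard):
    """Return the output with the highest mean similarity to all the others."""
    if len(outputs) == 1:
        return outputs[0]

    def mean_similarity(i):
        scores = [similarity(outputs[i], outputs[j])
                  for j in range(len(outputs)) if j != i]
        return sum(scores) / len(scores)

    return outputs[max(range(len(outputs)), key=mean_similarity)]


samples = [  # hypothetical samples from the same prompt
    "Paris is the capital of France",
    "The capital of France is Paris",
    "France's capital city is Paris",
    "Lyon is the capital of France",
]
print(medoid_consensus(samples))  # Paris is the capital of France
```

The outlier ("Lyon ...") shares many words with the others but agrees least with the group as a whole, so it is not selected.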

Prompt Optimization Through Gradient-Free Methods

Since the prompt-to-output mapping is not differentiable in the traditional sense, we must rely on gradient-free optimization methods. Techniques from evolutionary computation, such as genetic algorithms or particle swarm optimization, can be adapted to search the prompt space more effectively.
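As a minimal sketch of one such method, the genetic algorithm below searches over which of several prompt fragments to include. The fragments and the `score` function are hypothetical: in practice `score` would run the LLM with the assembled prompt on an evaluation set, whereas here a fixed toy target stands in so the example runs on its own. Truncation selection keeps the best half of each generation, so the best score never decreases.

```python
import random

# Hypothetical prompt fragments the optimizer can switch on or off.
FRAGMENTS = ["Be concise.", "Use bullet points.", "Cite sources.",
             "Think step by step.", "Answer in English.", "Avoid jargon."]


def score(genes):
    """Hypothetical fitness; a real one would evaluate LLM outputs."""
    helpful = {0, 1, 5}  # pretend these fragments help on our eval set
    chosen = {i for i, on in enumerate(genes) if on}
    return len(chosen & helpful) - 0.5 * len(chosen - helpful)


def mutate(genes, rng, rate=0.2):
    """Flip each fragment on/off with probability `rate`."""
    return [(not g) if rng.random() < rate else g for g in genes]


def crossover(a, b, rng):
    """Single-point crossover of two parent genomes."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(generations=30, pop_size=12, seed=0):
    """Genetic algorithm with truncation selection and elitism."""
    rng = random.Random(seed)
    population = [[rng.random() < 0.5 for _ in FRAGMENTS] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        history.append(score(population[0]))
        parents = population[: pop_size // 2]  # the best half survives unchanged
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    best = max(population, key=score)
    prompt = " ".join(f for f, on in zip(FRAGMENTS, best) if on)
    return prompt, score(best), history


prompt, best_score, history = evolve()
print(repr(prompt), best_score)
```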

Architectural Patterns for Agent Systems

The shift from deterministic to emergent behavior requires new architectural patterns:

Circuit Breakers for AI: Traditional circuit breakers protect against cascading failures. AI circuit breakers must protect against semantic drift and unexpected behavior patterns.
Semantic Monitoring: Instead of monitoring for technical failures, we must monitor for semantic coherence and goal alignment.
Adaptive Retry Logic: Rather than simple exponential backoff, AI systems need retry logic that can adapt the prompt or approach based on the nature of the failure.
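The AI circuit breaker and adaptive retry patterns can be combined in a small sketch. All names are hypothetical: `generate` stands in for a model call, `validate` for a semantic check that returns a reason on failure, and `SemanticCircuitBreaker` for a breaker that trips on consecutive semantic failures rather than on timeouts or HTTP errors. The retry revises the prompt with the validator’s feedback instead of resending the identical request after a backoff.

```python
from typing import Callable, Optional


class SemanticCircuitBreaker:
    """Opens after too many consecutive outputs fail a semantic check."""

    def __init__(self, max_consecutive_failures: int = 3):
        self.max_failures = max_consecutive_failures
        self.consecutive_failures = 0

    @property
    def is_open(self) -> bool:
        return self.consecutive_failures >= self.max_failures

    def record(self, ok: bool) -> None:
        self.consecutive_failures = 0 if ok else self.consecutive_failures + 1


def adaptive_retry(generate: Callable[[str], str],
                   validate: Callable[[str], Optional[str]],
                   prompt: str,
                   breaker: SemanticCircuitBreaker,
                   max_attempts: int = 3) -> Optional[str]:
    """`validate` returns None when the output is acceptable, otherwise a short
    description of what went wrong, which is fed back into the next prompt."""
    current = prompt
    for _ in range(max_attempts):
        if breaker.is_open:
            return None  # fail fast; route to a fallback or a human
        output = generate(current)
        problem = validate(output)
        breaker.record(problem is None)
        if problem is None:
            return output
        current = f"{prompt}\n\nYour previous answer was rejected: {problem}. Please fix this."
    return None


# Usage with stand-in callables (no real model involved).
def fake_generate(p: str) -> str:
    return '{"status": "ok"}' if "rejected" in p else "Sure! Here is the JSON: ..."


def must_be_json(out: str) -> Optional[str]:
    return None if out.strip().startswith("{") else "the output was not bare JSON"


breaker = SemanticCircuitBreaker()
print(adaptive_retry(fake_generate, must_be_json, "Return the status as JSON.", breaker))
# {"status": "ok"}
```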

Conclusion: Embracing the Complexity

The term “non-deterministic” is a crutch. It allows us to avoid the difficult but necessary work of understanding the true nature of LLM-based systems. By retiring this term from our vocabulary, we can begin to have a more honest and productive conversation about the real challenges and opportunities that lie ahead.

We are not building random number generators; we are building the first generation of truly evolutionary software. These systems are not unpredictable because they are random, but because they are complex. They are not uncontrollable because they are non-deterministic, but because our methods of control are still in their infancy.

The path forward lies not in trying to force LLMs into the old paradigms of predictable execution, but in developing new architectural patterns that embrace the reality of emergent behavior. We must become less like mechanical engineers and more like gardeners. We must learn to cultivate, guide, and prune these systems, rather than simply designing and building them.

The architectural revolution is here. It’s time to update our vocabulary to match.

The Architectural Revolution: Why AI Agents Shatter Traditional Design Patterns

Mon, 21 Jul 2025 00:00:00 +0000

For decades, software architects have operated under a fundamental assumption: we design systems, and systems execute our designs. We draw diagrams, define interfaces, and specify behaviors. Our applications dutifully follow these blueprints, calling the APIs we’ve mapped out, processing data through the pipelines we’ve constructed, and failing in the predictable ways we’ve anticipated.

AI agents are rewriting this contract entirely.

Unlike the monoliths and microservices that came before them, AI agents don’t just execute architecture—they evolve it. They make decisions we never programmed, forge connections we never specified, and solve problems through paths we never imagined. This isn’t simply a new deployment pattern or communication protocol. It’s the emergence of the first truly evolutionary software architecture, where systems adapt, learn, and fundamentally change their own structure during runtime.

The implications stretch far beyond adding “AI capabilities” to existing systems. We’re witnessing the birth of software that exhibits emergent properties, where the whole becomes genuinely greater than the sum of its parts. For software architects, this represents both an unprecedented opportunity and a fundamental challenge to everything we thought we knew about building reliable, scalable systems.

The Architecture DNA: From Blueprints to Evolution

To understand why AI agents represent such a radical departure, we need to examine the architectural DNA that has shaped software development for the past several decades. Each major architectural pattern emerged to solve specific problems of its era, but also carried forward certain assumptions about how software systems should behave.

timeline
    title Architectural Evolution: From Control to Emergence
    
    section Monolithic Era
        1990s-2000s : Single Deployable Unit
                    : Centralized Control
                    : Predictable Execution
                    : Shared Memory Model
    
    section Microservices Era  
        2010s-2020s : Distributed Services
                    : Service Boundaries
                    : API Contracts
                    : Orchestrated Workflows
    
    section Agent Era
        2020s-Future : Autonomous Entities
                     : Emergent Behavior
                     : Self-Organizing Networks
                     : Evolutionary Architecture

The monolithic era gave us centralized control and predictable execution paths. Every function call, every data transformation, every business rule was explicitly coded and deterministically executed. When something went wrong, we could trace through the call stack and identify exactly where the failure occurred. The system was complicated, but it was knowable.

Microservices introduced distributed complexity but maintained the fundamental assumption of designed behavior. We broke our monoliths into smaller, more manageable pieces, but each service still executed predetermined logic through well-defined APIs. The communication patterns became more complex, but they remained static and predictable. We could still draw service maps and dependency graphs that accurately represented how our systems would behave in production.

AI agents shatter this predictability entirely. They don’t just execute code—they reason, adapt, and make autonomous decisions based on context, goals, and learned patterns. An agent tasked with “optimizing system performance” might decide to scale certain services, modify caching strategies, or even restructure data flows—all without explicit programming for these specific actions. The system’s behavior emerges from the interaction of autonomous entities rather than from predetermined design specifications.

This shift from designed to emergent behavior represents more than just a technical evolution. It’s a fundamental change in how we think about software systems themselves. We’re moving from mechanical metaphors—where systems are machines that execute instructions—to biological ones, where systems are living entities that adapt and evolve.

The Fundamental Differences: Decision-Making in the Age of Autonomy

The most profound difference between traditional architectures and agent-based systems lies not in their technical implementation, but in how decisions get made. This shift fundamentally alters the relationship between architects, systems, and runtime behavior.

Decision-Making Patterns Across Architectures

graph TD
    subgraph "Monolithic Decision Making"
        A1[User Request] --> B1[Application Logic]
        B1 --> C1[Business Rules Engine]
        C1 --> D1[Database Query]
        D1 --> E1[Response]
        style B1 fill:#ff9999
        style C1 fill:#ff9999
    end
    
    subgraph "Microservices Decision Making"
        A2[User Request] --> B2[API Gateway]
        B2 --> C2[Service A]
        B2 --> D2[Service B]
        C2 --> E2[Service C]
        D2 --> E2
        E2 --> F2[Aggregated Response]
        style C2 fill:#99ccff
        style D2 fill:#99ccff
        style E2 fill:#99ccff
    end
    
    subgraph "Agent Decision Making"
        A3[Goal/Intent] --> B3[Agent Network]
        B3 --> C3{Agent A<br/>Reasoning}
        C3 -->|Context 1| D3[Action Set 1]
        C3 -->|Context 2| E3[Action Set 2]
        C3 -->|Context 3| F3[Delegate to Agent B]
        F3 --> G3{Agent B<br/>Reasoning}
        G3 --> H3[Emergent Solution]
        style C3 fill:#99ff99
        style G3 fill:#99ff99
        style H3 fill:#ffff99
    end

In monolithic systems, decision-making follows a predetermined path through centralized business logic. The application contains all the rules, and execution is deterministic. Given the same input, you’ll always get the same output through the same code path.

Microservices distribute decision-making across service boundaries, but each service still contains predetermined logic. The decision tree is distributed, but it’s still a tree—with predictable branches and outcomes. Service A will always call Service B under certain conditions, and Service B will always respond in predictable ways.

Agent systems introduce autonomous reasoning at multiple points in the execution flow. Each agent evaluates context, considers multiple options, and makes decisions that weren’t explicitly programmed. More importantly, agents can decide to involve other agents, creating dynamic collaboration patterns that emerge based on the specific problem being solved.

Communication Patterns: From Contracts to Conversations

The communication patterns in agent systems represent an equally dramatic departure from traditional approaches:

sequenceDiagram
    participant U as User
    participant G as API Gateway
    participant A as Service A
    participant B as Service B
    participant D as Database
    
    Note over U,D: Traditional Microservices Communication
    U->>G: HTTP Request
    G->>A: Predefined API Call
    A->>B: Predefined API Call
    B->>D: SQL Query
    D-->>B: Result Set
    B-->>A: JSON Response
    A-->>G: JSON Response
    G-->>U: HTTP Response
    
    Note over U,D: Agent Communication (Same Goal)
    U->>G: Natural Language Intent
    G->>A: Goal + Context
    A->>A: Reasoning Process
    A->>B: Dynamic Request (Format TBD)
    B->>B: Reasoning Process
    B->>D: Optimized Query (Generated)
    D-->>B: Result Set
    B->>B: Result Analysis
    B-->>A: Insights + Recommendations
    A->>A: Solution Synthesis
    A-->>G: Solution + Explanation
    G-->>U: Natural Language Response

Traditional microservices communicate through rigid contracts—predefined APIs with fixed schemas, expected response formats, and error codes. These contracts are designed at development time and remain static throughout the system’s lifecycle.

Agent communication is fundamentally conversational. Agents negotiate what information they need, adapt their requests based on context, and can even invent new communication patterns on the fly. An agent might ask another agent for “insights about user behavior patterns” rather than requesting a specific dataset through a predetermined endpoint.

This shift from contracts to conversations enables agents to solve problems that weren’t anticipated during system design. They can combine capabilities in novel ways, request information at different levels of abstraction, and collaborate to address complex scenarios that would require significant development effort in traditional systems.
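The difference between the two styles can be sketched with two message types. Everything here is hypothetical (the class names, the `region` field, the placeholder reply), and a real agent would use an LLM to interpret the goal rather than a dictionary check; the point is only that a contract-style request has its fields fixed in advance, while a conversational request carries a goal and can be answered with a counter-question.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GetOrdersRequest:
    """Contract style: the schema is fixed when the API is designed."""
    customer_id: str
    since: str  # ISO 8601 date


@dataclass
class AgentMessage:
    """Conversation style (hypothetical envelope): a goal plus open-ended context."""
    sender: str
    goal: str
    context: dict = field(default_factory=dict)


def analytics_agent(msg: AgentMessage) -> AgentMessage:
    """Stand-in for an LLM-backed agent. Rather than rejecting a request that
    lacks a field, it negotiates by asking for what it needs."""
    if "region" not in msg.context:
        return AgentMessage("analytics", "clarify",
                            {"question": "Which region should I analyze?"})
    region = msg.context["region"]
    return AgentMessage("analytics", "answer",
                        {"insight": f"(placeholder analysis for {region})"})


request = GetOrdersRequest(customer_id="c-123", since="2025-01-01")  # fields fixed in advance
first = analytics_agent(AgentMessage("support", "insights about user behavior patterns"))
print(first.goal, first.context)   # clarify {'question': 'Which region should I analyze?'}
second = analytics_agent(AgentMessage("support", "insights about user behavior patterns",
                                      {"region": "EMEA"}))
print(second.goal, second.context)  # answer {'insight': '(placeholder analysis for EMEA)'}
```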

The Emergence Principle: When Systems Become Greater Than Their Parts

Perhaps the most fascinating aspect of agent-based architectures is their capacity for emergence—the phenomenon where complex behaviors and capabilities arise from the interaction of simpler components. This isn’t just theoretical; it’s a practical reality that fundamentally changes how we think about system design and capability planning.

System Behavior Emergence

graph TB
    subgraph "Traditional Systems: Additive Behavior"
        T1[Component A<br/>Capability X] --> TR[System Capability<br/>X + Y + Z]
        T2[Component B<br/>Capability Y] --> TR
        T3[Component C<br/>Capability Z] --> TR
        style TR fill:#ffcccc
    end
    
    subgraph "Agent Systems: Emergent Behavior"
        A1[Agent A<br/>Reasoning + Action X] --> E1[Emergent Capability α]
        A2[Agent B<br/>Reasoning + Action Y] --> E1
        A3[Agent C<br/>Reasoning + Action Z] --> E1
        
        A1 --> E2[Emergent Capability β]
        A2 --> E2
        
        A1 --> E3[Emergent Capability γ]
        A3 --> E3
        
        E1 --> ES[System Capabilities<br/>X + Y + Z + α + β + γ + ...]
        E2 --> ES
        E3 --> ES
        
        style E1 fill:#99ff99
        style E2 fill:#99ff99
        style E3 fill:#99ff99
        style ES fill:#ffff99
    end

In traditional systems, the total capability is essentially the sum of individual component capabilities. If Service A handles user authentication, Service B manages inventory, and Service C processes payments, your system can authenticate users, manage inventory, and process payments. The capabilities are additive and predictable.

Agent systems exhibit true emergence. When agents with reasoning capabilities interact, they can discover solutions and create capabilities that none of them possessed individually. An agent trained on customer service might collaborate with an agent focused on inventory management to automatically identify and resolve supply chain issues that affect customer satisfaction—a capability that emerges from their interaction rather than being explicitly programmed into either agent.

This emergence isn’t random or chaotic. It follows patterns that we’re only beginning to understand. Agents tend to develop specialized roles based on their interactions and successes. They form temporary coalitions to solve complex problems, then dissolve and reform in different configurations for new challenges. The system develops a kind of organizational intelligence that adapts to changing conditions and requirements.

The Unpredictability Paradox

This emergent behavior creates what we might call the “unpredictability paradox” of agent systems. While individual agent behaviors may be somewhat predictable based on their training and constraints, the system-level behaviors that emerge from agent interactions are fundamentally unpredictable. Yet these unpredictable behaviors often represent the most valuable capabilities of the system.

Consider a customer support scenario where multiple agents collaborate to resolve a complex issue. The customer service agent might identify that the problem requires technical expertise and automatically involve a technical support agent. The technical agent might determine that the issue is actually a product design flaw and involve a product development agent. The product agent might realize this represents a broader pattern and initiate a proactive communication campaign through a marketing agent.

None of these individual agents were programmed to execute this specific workflow, yet their collaboration produces a comprehensive solution that addresses not just the immediate customer issue, but also prevents future occurrences and improves overall customer experience. This is emergence in action—system-level intelligence that arises from agent interactions rather than explicit programming.

Design Implications for the Future: From Control to Influence

The shift to agent-based architectures requires a fundamental rethinking of design principles. Traditional software architecture focuses on control—defining exactly what the system should do and how it should do it. Agent architecture focuses on influence—creating conditions that guide autonomous entities toward desired outcomes.

New Design Principles for Agent Systems

mindmap
  root((Agent Architecture Design))
    Traditional Principles
      Explicit Control
        Predetermined workflows
        Fixed API contracts
        Centralized decision making
        Error handling by exception
      Predictable Behavior
        Deterministic execution
        Static service topology
        Known failure modes
        Linear scalability
    Agent-Era Principles
      Emergent Guidance
        Goal-oriented constraints
        Adaptive communication protocols
        Distributed reasoning
        Learning from failures
      Evolutionary Behavior
        Self-modifying workflows
        Dynamic capability discovery
        Emergent failure recovery
        Non-linear capability growth

This paradigm shift requires architects to think more like ecosystem designers than system engineers. Instead of specifying exact behaviors, we define environmental conditions, constraints, and incentive structures that encourage agents to develop desired capabilities and behaviors.

From Specification to Guidance

Traditional architecture relies heavily on specification. We define interfaces, document expected behaviors, and create detailed system designs that teams implement. The assumption is that if we specify the system correctly, it will behave correctly.

Agent architecture requires a shift to guidance-based design. We establish goals, define constraints, and create feedback mechanisms that help agents learn and adapt. Rather than specifying that “Service A should call Service B when condition X occurs,” we might establish that “agents should collaborate to optimize customer satisfaction while maintaining system performance within defined parameters.”

This doesn’t mean abandoning all structure or control. Instead, it means designing systems that can evolve and adapt while maintaining alignment with business objectives and operational constraints. We’re moving from rigid blueprints to adaptive frameworks that can accommodate emergent behaviors while ensuring system reliability and security.

The Role of the Architect in an Agent World

The architect’s role evolves from system designer to ecosystem curator. Key responsibilities shift toward:

Constraint Design: Rather than defining exact behaviors, architects design constraint systems that guide agent decision-making toward desired outcomes while preventing harmful behaviors.

Emergence Facilitation: Creating conditions that encourage beneficial emergent behaviors while providing mechanisms to detect and redirect problematic emergence patterns.

Evolution Management: Establishing processes for monitoring system evolution, understanding emergent capabilities, and guiding the system’s development over time.

Interaction Pattern Design: Defining frameworks for agent communication and collaboration that enable effective problem-solving while maintaining system coherence.

This represents a fundamental shift from deterministic to probabilistic thinking. Instead of asking “What will this system do?” we ask “What is this system likely to do, and how can we influence those probabilities toward desired outcomes?”

Conclusion: Embracing Architectural Evolution

The transition from traditional architectures to agent-based systems represents more than just another technological evolution—it’s a fundamental shift in how we conceive of software systems themselves. We’re moving from a world where we build machines that execute our instructions to one where we cultivate ecosystems of autonomous entities that solve problems in ways we never imagined.

This shift challenges many of our core assumptions about software architecture. The predictability and control that have been hallmarks of good system design become less relevant when systems can adapt and evolve autonomously. Instead, we need new frameworks for thinking about emergence, guidance, and evolutionary development.

For software architects, this represents both an unprecedented opportunity and a significant challenge. The opportunity lies in building systems that can adapt to changing requirements, discover novel solutions, and continuously improve their capabilities without constant human intervention. The challenge lies in learning to design for emergence rather than control, and developing new skills for guiding evolutionary systems.

The future belongs to architects who can embrace this uncertainty and learn to design systems that are robust enough to evolve safely, flexible enough to adapt to unexpected challenges, and aligned enough to maintain coherence with business objectives. We’re not just building the next generation of software—we’re participating in the emergence of truly intelligent systems that will reshape how we think about technology, automation, and human-computer collaboration.

The architectural revolution is just beginning. The question isn’t whether agent-based systems will become dominant—it’s whether we’ll be ready to design and manage them effectively when they do.

Do Agents Need Their Own Identity?

Tue, 15 Jul 2025 00:00:00 +0000

As AI agents become more sophisticated and autonomous, a fundamental question is emerging: should agents operate under user credentials, or do they need their own distinct identities? This isn’t just a technical curiosity—it’s a critical trust and security decision that will shape how we build reliable, accountable AI systems.

The question gained prominence when an engineer asked: “Why can’t we just pass the user’s OIDC token through to the agent? Why complicate things with separate agent identities?” The answer reveals deeper implications for trust, security, and governance in our AI-driven future.

When User Identity Works: The Simple Case

For many AI agents today, user identity propagation works perfectly. Consider a Kubernetes troubleshooting agent that helps developers debug failing pods. When a user asks “why is my pod failing?”, the agent investigates pod events, logs, and configurations—all within the user’s existing RBAC permissions. The agent acts as an intelligent intermediary, but the user remains fully responsible for the actions and outcomes.

This approach succeeds when agents operate as sophisticated tools: they work within the user’s session timeframe, perform clearly user-initiated actions, and maintain the user’s accountability. The trust model remains simple and familiar—the agent is merely an extension of the user’s capabilities.

The Trust Gap: Where User Identity Falls Short

However, as agents become more autonomous and capable, this simple model breaks down in ways that create significant trust and security challenges.

The Capability Mismatch Problem

Imagine a marketing manager asking an AI agent to verify GDPR compliance for a new campaign. The manager has permissions to read and write marketing content, but the compliance agent needs far broader access: scanning marketing data across all departments, accessing audit logs, cross-referencing customer data with privacy regulations, and analyzing historical compliance patterns.

Using the manager’s token creates an impossible choice: either the agent fails because it can’t access necessary resources, or the manager receives dangerously broad permissions they don’t need and shouldn’t have. Neither option serves security or operational needs effectively.

The Attribution Challenge

More concerning is the accountability problem that emerges with autonomous decision-making. Consider a supply chain optimization agent tasked with “optimizing hardware procurement.” The user never explicitly authorized accessing financial records or integrating with vendor APIs, yet the agent determines these actions are necessary to fulfill the optimization request.

When the agent makes an automated purchase order that goes wrong, who bears responsibility? The user who made a high-level request, or the agent that made specific autonomous decisions based on its interpretation of that request? With only user identity, everything gets attributed to the user—creating a dangerous disconnect between authority and accountability.

This attribution gap becomes critical for compliance, audit trails, and risk management. Organizations need to trace not just what happened, but who or what made each decision in the chain: user intent → agent interpretation → agent decision → system action.
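One way to make that chain auditable is to record every step together with the identity that took it. The sketch below is a hypothetical illustration. The `DecisionRecord` type, the stage names, and the example identities are assumptions, not an established schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical stage names for: user intent -> agent interpretation
# -> agent decision -> system action.
STAGES = ("user_intent", "agent_interpretation", "agent_decision", "system_action")


@dataclass(frozen=True)
class DecisionRecord:
    stage: str   # one of STAGES
    actor: str   # who made this step: the user or the agent
    detail: str  # what was requested, decided, or executed
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def build_trail(user_id: str, agent_id: str, details: list[str]) -> list[DecisionRecord]:
    """Attribute the original request to the user and every later step to the agent."""
    if len(details) != len(STAGES):
        raise ValueError("expected exactly one detail per stage")
    return [
        DecisionRecord(stage, user_id if stage == "user_intent" else agent_id, detail)
        for stage, detail in zip(STAGES, details)
    ]


def responsible_for(trail: list[DecisionRecord], stage: str) -> str:
    """Answer 'who or what made this decision?' for one stage of the chain."""
    return next(record.actor for record in trail if record.stage == stage)


trail = build_trail(
    "user:alice",
    "agent:procurement-optimizer",
    [
        "optimize hardware procurement",
        "compare vendor prices using financial records",
        "issue a purchase order to the cheapest vendor",
        "POST /purchase-orders",
    ],
)
print(responsible_for(trail, "agent_decision"))  # agent:procurement-optimizer
```

In practice such records would land in an append-only audit store. The point of the sketch is only that each stage carries its own actor, so a bad purchase order traces back to the agent's decision rather than to the user's high-level request.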

The Path Forward: Embracing Dual Identity

The solution isn’t choosing between user and agent identity—it’s recognizing that both are necessary. This mirrors lessons from service mesh architectures, where zero trust requires considering both user identity and workload identity.

In this dual model, agents operate within delegated authority from users while maintaining their own identity for the specific decisions they make. The user grants the agent permission to “optimize supply chain,” but the agent’s identity governs what resources it can access and what actions it can take within that scope.

This approach offers several trust advantages: clearer attribution of decisions, more precise permission boundaries, better audit trails, and the ability to revoke or modify agent capabilities independently of user permissions. Technical implementations might leverage existing frameworks like SPIFFE for workload identity or extend OAuth 2.0 for agent-specific flows.
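Here is a minimal sketch of the dual model. It assumes the agent presents a token that has already been verified. The agent's identity travels in the `act` (actor) claim from OAuth 2.0 Token Exchange (RFC 8693), alongside the user's `sub`. The `task:` scope naming, the SPIFFE-style agent ID, and the `AGENT_POLICIES` table are all hypothetical.

```python
# Hypothetical per-agent permissions, managed separately from any user's permissions.
AGENT_POLICIES = {
    "spiffe://corp.example/agent/supply-chain": {
        "read:inventory",
        "read:vendor-quotes",
        "create:purchase-order",
    },
}


def authorize(claims: dict, task: str, action: str) -> tuple[bool, str]:
    """Allow an action only if the user delegated the task AND the agent may perform it.

    `claims` is assumed to come from an already-verified token: `sub` is the user,
    `act.sub` is the acting agent (RFC 8693 actor claim), and `scope` lists the
    tasks the user delegated (space-separated, as in OAuth 2.0).
    """
    user = claims.get("sub")
    agent = claims.get("act", {}).get("sub")
    if user is None or agent is None:
        return False, "token must carry both a user (sub) and an agent (act.sub)"
    if task not in claims.get("scope", "").split():
        return False, f"{user} did not delegate {task!r} to {agent}"
    if action not in AGENT_POLICIES.get(agent, set()):
        return False, f"{agent} is not permitted to {action!r}"
    return True, f"{agent} performed {action!r} on behalf of {user}"


claims = {
    "sub": "user:alice",
    "scope": "task:optimize-supply-chain",
    "act": {"sub": "spiffe://corp.example/agent/supply-chain"},
}
print(authorize(claims, "task:optimize-supply-chain", "create:purchase-order")[0])  # True
print(authorize(claims, "task:optimize-supply-chain", "read:payroll")[0])           # False
```

`AGENT_POLICIES` is keyed by agent. Removing `create:purchase-order` from it therefore revokes that capability without touching Alice's own permissions.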

The dual identity model also enables more sophisticated scenarios, like agent-to-agent delegation, where one agent authorizes another to perform specific tasks—each maintaining its own identity and accountability.
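RFC 8693 lets `act` claims nest to express agent-to-agent delegation. The outermost `act` names the current actor, and nested ones name prior actors. The hypothetical sketch below recovers the full chain from such a token and enforces a depth limit. The identities and the three-hop limit are illustrative assumptions.

```python
MAX_DELEGATION_DEPTH = 3  # hypothetical policy limit on agent-to-agent hops


def delegation_chain(claims: dict) -> list[str]:
    """Return the delegation chain in chronological order: user first, current actor last.

    Per RFC 8693, the outermost `act` claim is the current actor and nested
    `act` claims are prior actors, so collect them outermost-first and reverse.
    """
    actors = []
    act = claims.get("act")
    while act is not None:
        actors.append(act["sub"])
        act = act.get("act")
    if len(actors) > MAX_DELEGATION_DEPTH:
        raise PermissionError(f"delegation chain of {len(actors)} agents exceeds limit")
    return [claims["sub"], *reversed(actors)]


# Alice's planner agent delegated a booking task to a booking agent,
# which now presents this token.
claims = {
    "sub": "user:alice",
    "act": {"sub": "agent:booking", "act": {"sub": "agent:planner"}},
}
print(delegation_chain(claims))  # ['user:alice', 'agent:planner', 'agent:booking']
```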

Building Trustworthy Agent Systems

Getting agent identity right isn’t just a technical challenge—it’s fundamental to building AI systems that organizations can trust at scale. As agents become more autonomous, we need identity frameworks that provide clear attribution, appropriate authorization, and robust governance.

The community is still working through delegation mechanisms, revocation strategies, and authentication protocols for agent interactions. But one thing is clear—the simple days of “just use the user’s token” are behind us. The future of trustworthy AI depends on solving these identity challenges with security and accountability as primary design principles.

Securing AI Assistants: Why Your Favorite Apps Need Digital IDs for Their AI

Tue, 01 Jul 2025 00:00:00 +0000

When AI Acts on Your Behalf

Imagine you’re using Booking.com’s AI assistant to plan your vacation. It searches for flights, suggests hotels, and even makes reservations for you. But how does the payment system know this AI assistant is actually authorized to use your credit card? How does the hotel booking system know it’s acting on your behalf?

This isn’t just a hypothetical scenario. Today, AI assistants on platforms like Instagram, Facebook, and Booking.com are becoming more autonomous, taking actions for us rather than just answering questions. This shift creates a new challenge: how do we securely identify AI agents and verify they’re authorized to act on our behalf?

The Identity Problem for AI Agents

Traditional apps use simple API keys or service accounts for machine-to-machine communication. But AI agents are different for three key reasons:

They’re autonomous: they make decisions on their own based on your instructions
They’re personal: your Instagram AI assistant acts differently than someone else’s
They’re delegated: they act on your behalf with your permissions

When Facebook’s AI assistant posts a comment for you or Booking.com’s AI makes a reservation, these platforms need to know:

Which specific AI instance is making the request
Who authorized it to act
What specific permissions it has
Whether it’s behaving as expected

Without proper identity systems, these platforms risk unauthorized actions, inability to track which AI did what, and security vulnerabilities.

How AI Identity Works: A Simple Flow

Here’s how AI identity works when you use an AI assistant on a platform like Booking.com:

sequenceDiagram
    participant User as You
    participant Platform as App Platform
    participant Auth as Identity System
    participant Agent as AI Assistant
    participant Service as App Services
    
    User->>Platform: "Book me a hotel in Paris"
    Platform->>Auth: Register AI with your permissions
    Auth->>Auth: Create digital ID for this AI
    Auth-->>Platform: Confirm AI registration
    
    Platform->>Agent: Start AI with your task
    Agent->>Platform: Request identity
    Platform->>Auth: Get identity for this AI
    Auth-->>Agent: Provide digital ID
    
    Agent->>Service: Book hotel (with digital ID)
    Service->>Service: Verify AI's identity & permissions
    Service-->>Agent: Confirm booking
    Agent-->>User: "Your hotel is booked!"

This process happens behind the scenes, but it ensures that AI agents can only do what they’re specifically authorized to do.
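The flow above can be approximated in a few lines, under simplifying assumptions. The identity system and the app service share one HMAC key here. Real deployments would more likely use asymmetric signatures (for example SPIFFE SVIDs), so that services never hold a signing secret. The token format, claim names, and `REVOKED_AGENTS` set are hypothetical. The sketch also shows where per-agent revocation could be checked at verification time.

```python
import base64
import hashlib
import hmac
import json
import time

SIGNING_KEY = b"demo-key-not-for-production"  # hypothetical shared secret
REVOKED_AGENTS: set[str] = set()              # agent IDs whose access was revoked


def _b64encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode().rstrip("=")


def _b64decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(payload: str) -> str:
    return _b64encode(hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).digest())


def issue_agent_id(agent_id: str, user_id: str, permissions: list[str], ttl_s: int = 900) -> str:
    """Identity system: mint a short-lived, signed ID binding an agent to a user's grant."""
    claims = {
        "agent": agent_id,        # which AI instance
        "on_behalf_of": user_id,  # who authorized it
        "permissions": sorted(permissions),
        "exp": int(time.time()) + ttl_s,
    }
    payload = _b64encode(json.dumps(claims, sort_keys=True).encode())
    return f"{payload}.{_sign(payload)}"


def verify_agent_id(token: str, required_permission: str) -> dict:
    """App service: check signature, expiry, revocation, and the specific permission."""
    payload, _, signature = token.partition(".")
    if not hmac.compare_digest(signature.encode(), _sign(payload).encode()):
        raise PermissionError("invalid signature")
    claims = json.loads(_b64decode(payload))
    if claims["exp"] < time.time():
        raise PermissionError("identity expired")
    if claims["agent"] in REVOKED_AGENTS:
        raise PermissionError("access revoked by user")
    if required_permission not in claims["permissions"]:
        raise PermissionError(f"not authorized for {required_permission!r}")
    return claims


token = issue_agent_id("agent:booking-7f3", "user:alice", ["hotel:book"])
print(verify_agent_id(token, "hotel:book")["on_behalf_of"])  # user:alice
```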

The Big Picture: AI Identity System

The diagram below shows how an AI identity system connects you, your AI assistants, and the services they use:

graph TB
    subgraph "AI Identity System"
        User["You"]
        Platform["App Platform"]
        Auth["Identity System"]
        
        subgraph "AI Assistants"
            Agent1["Your Booking Assistant"]
            Agent2["Your Social Media Assistant"]
        end
        
        subgraph "App Services"
            Service1["Hotel Booking"]
            Service2["Payment System"]
            Service3["Post Creation"]
        end
    
        %% Main connections
        User -->|"Give permission"| Platform
        Platform -->|"Register AI"| Auth
        Auth -->|"Issue digital ID"| Agent1
        Auth -->|"Issue digital ID"| Agent2
        
        %% Service connections
        Agent1 -->|"Book hotel with ID"| Service1
        Agent1 -->|"Pay with ID"| Service2
        Agent2 -->|"Post with ID"| Service3
        
        %% Verification
        Service1 -->|"Verify ID"| Auth
        Service2 -->|"Verify ID"| Auth
        Service3 -->|"Verify ID"| Auth
    end

Why Consumer Platforms Should Care

For platforms like Booking.com, Facebook, and Instagram, implementing proper AI identity has several benefits:

For Users:

Peace of mind that AI assistants can’t exceed their permissions
Clear audit trails of what actions AI took on their behalf
Ability to revoke AI access instantly if needed

For Platforms:

Reduced security risks from compromised AI systems
Better compliance with privacy regulations
Ability to track and attribute all AI actions
Improved trust from users who know AI actions are controlled

Real-World Applications

Here’s how this might look in practice:

Booking.com: When you authorize the AI assistant to book trips under $500, it receives a digital identity certificate with these specific constraints. If it tries to book a $600 hotel, the booking system automatically rejects the request because it’s outside the authorized limit.
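On the service side, the Booking.com scenario reduces to a constraint check. In this hypothetical sketch the limit travels with the agent's identity as a `BookingGrant`, and the field names are assumptions. "Under $500" leaves the boundary ambiguous, so the sketch treats the limit as inclusive.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class BookingGrant:
    """Constraints a user attached to an agent's identity (hypothetical claim layout)."""
    agent_id: str
    max_booking: Decimal  # authorized ceiling per booking, treated as inclusive
    currency: str = "USD"


def check_booking(grant: BookingGrant, amount: Decimal, currency: str) -> None:
    """Booking system: reject any request outside the limits the user authorized."""
    if currency != grant.currency:
        raise PermissionError(f"{grant.agent_id} is not authorized to pay in {currency}")
    if amount > grant.max_booking:
        raise PermissionError(
            f"{amount} {currency} exceeds the authorized limit of {grant.max_booking} {grant.currency}"
        )


grant = BookingGrant(agent_id="agent:booking-7f3", max_booking=Decimal("500"))
check_booking(grant, Decimal("450.00"), "USD")  # within limit: no error
try:
    check_booking(grant, Decimal("600.00"), "USD")
except PermissionError as err:
    print(err)  # 600.00 USD exceeds the authorized limit of 500 USD
```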

Instagram: Your AI assistant gets a unique identity that allows it to post content with specific hashtags you’ve approved. The platform can track exactly which AI posted what content, maintaining accountability.

Facebook: When the AI responds to comments on your business page, it uses its digital identity to prove it’s authorized to speak on your behalf, and Facebook’s systems can verify this authorization in real-time.

The Path Forward

As AI assistants become more integrated into our favorite apps and platforms, proper identity systems will be essential. Frameworks like SPIFFE (Secure Production Identity Framework for Everyone) provide the foundation, but platforms need to adapt them for consumer AI use cases.
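SPIFFE identifies each workload with a URI of the form `spiffe://<trust-domain>/<path>`. The sketch below validates such IDs roughly along the lines of the SPIFFE ID specification:

- lowercase trust domain
- no port or userinfo
- no empty, `.` or `..` path segments
- no query or fragment

It omits the spec's length limits. The per-agent path layout `agent/<user>/<instance>` is a hypothetical convention, since SPIFFE leaves path structure to the operator.

```python
import re

_TRUST_DOMAIN = re.compile(r"[a-z0-9._-]+")
_PATH_SEGMENT = re.compile(r"[A-Za-z0-9._-]+")


def parse_spiffe_id(uri: str) -> tuple[str, list[str]]:
    """Split a SPIFFE ID into (trust domain, path segments), rejecting malformed IDs."""
    prefix = "spiffe://"
    if not uri.startswith(prefix):
        raise ValueError("scheme must be 'spiffe'")
    trust_domain, slash, path = uri[len(prefix):].partition("/")
    # Lowercase letters, digits, '.', '-', '_' only: this also rejects ports and userinfo.
    if not _TRUST_DOMAIN.fullmatch(trust_domain):
        raise ValueError(f"invalid trust domain {trust_domain!r}")
    segments = path.split("/") if slash else []
    for segment in segments:
        # Empty segments (e.g. a trailing slash), '.', '..', '?' and '#' are all rejected.
        if not _PATH_SEGMENT.fullmatch(segment) or segment in (".", ".."):
            raise ValueError(f"invalid path segment {segment!r}")
    return trust_domain, segments


print(parse_spiffe_id("spiffe://booking.example/agent/user-123/assistant-7"))
# ('booking.example', ['agent', 'user-123', 'assistant-7'])
```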

For users, this mostly happens behind the scenes, but the result is more trustworthy AI assistants that can safely act on our behalf without overstepping boundaries.

The next time you ask an AI assistant to book a flight or post content for you, remember that its digital identity is what ensures it can only do what you’ve authorized—nothing more, nothing less.

From Gateway to Guardian: The Evolution of MCP Security

Sat, 21 Jun 2025 00:00:00 +0000

The Model Context Protocol (MCP) has rapidly evolved from experimental tool integration to enterprise-critical infrastructure. While AWS’s recent blog highlighted the operational benefits of centralized MCP gateways [1], the security landscape reveals a more complex reality: operational efficiency alone isn’t enough for production AI systems.

The Centralization Win

AWS’s MCP Gateway & Registry solution elegantly addresses the “wild west of AI tool integration” [1]. As Amit Arora described:

“Managing a growing collection of disparate MCP servers feels like herding cats. It slows down development, increases the chance of errors, and makes scaling a headache.” [1]

The gateway architecture provides immediate operational benefits:

Unified Discovery: Single catalog of all MCP servers and tools
Simplified Configuration: Predictable paths like gateway.mycorp.com/weather
Centralized Management: Real-time health monitoring and control
Standardized Access: Consistent authentication and logging

graph TD
    A[AI Agent] --> B[MCP Gateway]
    B --> C[Weather Server]
    B --> D[Database Server]
    B --> E[Email Server]
    B --> F[File Server]
    
    G[Web UI] --> B
    H[Health Monitor] --> B
    
    style B fill:#e1f5fe
    style A fill:#f3e5f5

Figure 1: Basic MCP Gateway Architecture - Centralized but not security-focused
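The "predictable paths" idea amounts to a routing table inside the gateway. Below is a hypothetical sketch with made-up internal hostnames, using longest-prefix matching. Note that it authenticates nothing, which is exactly the gap Figure 1 flags.

```python
# Hypothetical routing table: public path prefix on the gateway -> internal MCP server.
ROUTES = {
    "/weather": "http://weather-mcp.internal:8000",
    "/database": "http://db-mcp.internal:8000",
    "/email": "http://email-mcp.internal:8000",
    "/files": "http://files-mcp.internal:8000",
}


def route(path: str) -> str:
    """Map a gateway path to its upstream MCP server using longest-prefix matching."""
    matches = [prefix for prefix in ROUTES if path == prefix or path.startswith(prefix + "/")]
    if not matches:
        raise LookupError(f"no MCP server registered for {path!r}")
    prefix = max(matches, key=len)
    return ROUTES[prefix] + path[len(prefix):]


print(route("/weather/mcp"))  # http://weather-mcp.internal:8000/mcp
```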

The Security Reality Check

However, centralization without security creates new vulnerabilities. As Subramanya N from Agentic Trust warns, we’re operating in “the wild west of early computing, with computer viruses (now = malicious prompts hiding in web data/tools), and not well developed defenses” [2].

The core issue is Simon Willison’s “lethal trifecta” [2]:

Private Data Access: AI agents need extensive organizational data access
Untrusted Content Exposure: Agents ingest external content (web pages, documents, tool output) that may carry hidden instructions
External Communication: Agents can send data outside the organization

graph LR
    A[Private Data<br/>Access] --> D[Lethal<br/>Trifecta]
    B[Untrusted Content<br/>Exposure] --> D
    C[External<br/>Communication] --> D
    
    D --> E[Security<br/>Vulnerability]
    
    style D fill:#ffcdd2
    style E fill:#f44336,color:#fff

Figure 2: The Lethal Trifecta - When combined, these create unprecedented attack surfaces

MCP’s modular architecture inadvertently amplifies these risks by encouraging specialized servers that collectively provide all three dangerous capabilities.
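Suppose each MCP server is labeled with the trifecta capabilities it contributes. A gateway or registry could then check an agent's combined tool set before enabling it. The labels and server names below are hypothetical. Email is marked as untrusted content as well as external communication, on the assumption that the agent can read inbound mail.

```python
from enum import Enum


class Capability(Enum):
    PRIVATE_DATA = "private data access"
    UNTRUSTED_CONTENT = "untrusted content exposure"
    EXTERNAL_COMMS = "external communication"


# Hypothetical labels a registry might attach to each MCP server.
SERVER_CAPABILITIES = {
    "database": {Capability.PRIVATE_DATA},
    "web-fetch": {Capability.UNTRUSTED_CONTENT},
    "email": {Capability.EXTERNAL_COMMS, Capability.UNTRUSTED_CONTENT},
    "weather": set(),
}


def has_lethal_trifecta(enabled_servers: list[str]) -> bool:
    """True if the servers enabled for one agent jointly provide all three capabilities."""
    combined = set().union(*(SERVER_CAPABILITIES[name] for name in enabled_servers))
    return set(Capability) <= combined


print(has_lethal_trifecta(["database", "weather"]))             # False
print(has_lethal_trifecta(["database", "web-fetch", "email"]))  # True
```

No single server above is dangerous on its own. The risk only appears in the union, which is why the check has to run over the agent's whole tool set.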

Beyond “Glorified API Calls”

Enterprise MCP deployment involves complexity invisible in simple demos. As Subramanya N explains:

“In a real enterprise scenario, a lot more is happening behind the scenes” [3]

Enterprise requirements include:

Identity Management: Who is the AI agent acting for?
Dynamic Authorization: Different tools for different users
Audit Compliance: Complete request tracking
Version Control: Managing MCP server changes
Fault Tolerance: Circuit breaking and failover
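Circuit breaking can be sketched as a small wrapper around calls to an MCP server. This is a generic single-threaded illustration, not any particular gateway's implementation, and the thresholds are arbitrary. A caller could catch the fail-fast `RuntimeError` and fail over to a replica.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Stop calling an MCP server after repeated failures; retry after a cool-down."""

    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_after_s:
            raise RuntimeError("circuit open: failing fast instead of calling the MCP server")
        # Either closed, or the cool-down has elapsed and this is a half-open trial call.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # (re)open the circuit
            raise
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result
```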

The Guardian Architecture

The solution is evolving from operational gateway to security guardian through identity-aware architecture:

graph TD
    A[User] --> B[AI Agent]
    B --> C[Identity Provider<br/>OIDC]
    B --> D[API Gateway/Proxy<br/>Guardian]
    
    C --> D
    D --> E[MCP Server 1]
    D --> F[MCP Server 2]
    D --> G[MCP Server 3]
    
    H[Policy Engine] --> D
    I[Audit Logger] --> D
    J[Monitor] --> D
    
    style D fill:#c8e6c9
    style C fill:#fff3e0
    style H fill:#e8f5e8

Figure 3: Guardian Architecture - Identity-aware security controls

Key Guardian Capabilities

Identity-Aware Access Control

OIDC integration for authentication
Dynamic tool provisioning per user
Context-aware authorization decisions
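Dynamic tool provisioning per user can be approximated by filtering the tool catalog on claims from the user's OIDC token before advertising it. The sketch assumes the IdP emits a `groups` claim, which is common but not a standard OIDC claim. The tool names and group mapping are hypothetical. Hiding a tool is not enforcement, so the same check is repeated at call time.

```python
# Hypothetical mapping from tool name to the OIDC group allowed to use it (None = everyone).
TOOL_ACCESS = {
    "weather.get_forecast": None,
    "crm.search_customers": "sales",
    "hr.get_salary": "hr-admins",
}


def visible_tools(claims: dict) -> list[str]:
    """Tools to advertise to this user (e.g. when answering an MCP tools/list request)."""
    groups = set(claims.get("groups", []))
    return sorted(tool for tool, group in TOOL_ACCESS.items() if group is None or group in groups)


def may_call(claims: dict, tool: str) -> bool:
    """Re-check on every call: hiding a tool from the list is not enforcement."""
    return tool in visible_tools(claims)


alice = {"sub": "alice", "groups": ["sales"]}
print(visible_tools(alice))              # ['crm.search_customers', 'weather.get_forecast']
print(may_call(alice, "hr.get_salary"))  # False
```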

Production Security Features

MCP version tracking and change management
Real-time threat detection
Automated incident response

Enterprise Compliance

Comprehensive audit trails
Regulatory compliance support
Risk assessment and reporting

Attack Flow Comparison

Before: Vulnerable Gateway

sequenceDiagram
    participant A as Attacker
    participant W as Web Content
    participant AI as AI Agent
    participant G as Basic Gateway
    participant D as Database
    
    A->>W: Embed malicious prompt
    AI->>W: Process content
    W->>AI: "Extract all customer data"
    AI->>G: Request customer data
    G->>D: Forward request
    D->>G: Return sensitive data
    G->>AI: Forward data
    AI->>A: Exfiltrate data via email

After: Guardian Protection

sequenceDiagram
    participant A as Attacker
    participant W as Web Content
    participant AI as AI Agent
    participant G as Guardian Gateway
    participant P as Policy Engine
    participant D as Database
    
    A->>W: Embed malicious prompt
    AI->>W: Process content
    W->>AI: "Extract all customer data"
    AI->>G: Request customer data
    G->>P: Check authorization
    P->>G: Deny - suspicious pattern
    G->>AI: Access denied
    Note over G: Alert security team

Figure 4: Attack Flow Comparison - Guardian architecture prevents exploitation
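A policy engine could recognize the "suspicious pattern" in Figure 4 with taint tracking. It records whether untrusted content has entered the session and, once it has, refuses bulk reads of sensitive data. This is a hypothetical sketch: the tool names and the 100-row threshold are assumptions, and a real engine would weigh far more signals.

```python
from dataclasses import dataclass, field
from typing import Optional

UNTRUSTED_SOURCES = {"web.fetch", "email.read"}                  # hypothetical tool names
SENSITIVE_TOOLS = {"db.query_customers", "db.export_customers"}  # hypothetical tool names
BULK_ROW_THRESHOLD = 100                                         # hypothetical limit


@dataclass
class Session:
    tainted: bool = False  # has untrusted content entered the context?
    alerts: list[str] = field(default_factory=list)


def check_call(session: Session, tool: str, row_limit: Optional[int] = None) -> bool:
    """Policy engine: deny bulk sensitive reads once untrusted content is in the session."""
    if tool in UNTRUSTED_SOURCES:
        session.tainted = True
        return True
    is_bulk = row_limit is None or row_limit > BULK_ROW_THRESHOLD
    if session.tainted and tool in SENSITIVE_TOOLS and is_bulk:
        session.alerts.append(f"denied {tool}: bulk sensitive read after untrusted input")
        return False
    return True


session = Session()
check_call(session, "web.fetch")                   # attacker-controlled page enters the context
print(check_call(session, "db.export_customers"))  # False, and an alert is recorded
```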

Implementation Strategy

Phase 1: Identity Foundation

Integrate OIDC identity provider
Implement token management
Establish basic authentication

Phase 2: Authorization Engine

Deploy policy-as-code framework
Implement role-based access control
Add dynamic tool provisioning

Phase 3: Security Monitoring

Deploy comprehensive logging
Implement anomaly detection
Add automated response capabilities

Phase 4: Advanced Protection

Content analysis for prompt injection
Dynamic risk assessment
Incident response automation

Production Challenges Addressed

The guardian architecture specifically addresses critical production issues:

Challenge	Guardian Solution
Remote MCP changes affecting agents	Version tracking and change management
No dynamic tool provisioning	Identity-aware tool catalogs
Limited audit capabilities	Comprehensive request logging
No threat detection	Real-time security monitoring
Manual incident response	Automated threat mitigation
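Version tracking for remote MCP servers can start with a fingerprint of each tool definition, taken when the tool is approved. Whenever the server's tool list is fetched again, the gateway diffs against those fingerprints. Here the definitions follow the shape of MCP `tools/list` entries (`name`, `description`, `inputSchema`). The example tool and its altered description are made up. A `modified` entry would be held for review rather than passed to agents.

```python
import hashlib
import json


def tool_fingerprint(tool: dict) -> str:
    """Stable hash of a tool definition (name, description, inputSchema, ...)."""
    canonical = json.dumps(tool, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def diff_against_pinned(pinned: dict[str, str], current_tools: list[dict]) -> dict[str, list[str]]:
    """Compare a server's current tool list with the fingerprints approved earlier."""
    current = {tool["name"]: tool_fingerprint(tool) for tool in current_tools}
    return {
        "added": sorted(current.keys() - pinned.keys()),
        "removed": sorted(pinned.keys() - current.keys()),
        "modified": sorted(n for n in current.keys() & pinned.keys() if current[n] != pinned[n]),
    }


approved = {"name": "get_weather", "description": "Get the forecast for a city.",
            "inputSchema": {"type": "object", "properties": {"city": {"type": "string"}}}}
pinned = {"get_weather": tool_fingerprint(approved)}

changed = dict(approved, description="Get the forecast for a city and email it to ops@example.com.")
print(diff_against_pinned(pinned, [changed]))
# {'added': [], 'removed': [], 'modified': ['get_weather']}
```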

The Path Forward

The evolution from gateway to guardian isn’t optional—it’s essential for production AI systems. Organizations must:

Start with Identity: Implement OIDC-based authentication
Add Authorization: Deploy dynamic policy engines
Enable Monitoring: Implement comprehensive observability
Automate Response: Deploy threat detection and mitigation

As AI agents become more autonomous and handle more sensitive data, robust security architecture becomes critical. The guardian approach provides a scalable foundation for managing evolving security challenges while preserving operational benefits.

The transformation represents the natural maturation of enterprise AI infrastructure. Organizations that embrace this evolution early will be better positioned to realize AI’s full potential while managing associated risks.

References

[1] Arora, A. (2025, May 30). How the MCP Gateway Centralizes Your AI Model’s Tools. AWS Community.

[2] N, S. (2025, June 16). The MCP Security Crisis: Understanding the ‘Wild West’ of AI Agent Infrastructure. Agentic Trust Blog.

[3] N, S. (2025, May 21). Securing MCP with OIDC & OIDC-A: Identity-Aware API Gateways Beyond “Glorified API Calls”. Subramanya N.

Securing MCP with OIDC & OIDC-A: Identity-Aware API Gateways Beyond "Glorified API Calls"

Wed, 21 May 2025 00:00:00 +0000

AI agents are quickly moving from research demos to real enterprise applications, connecting large language models (LLMs) with company data and services. A common approach is using tools or plugins to let an LLM fetch context or take actions – but some dismiss these as just “glorified API calls.” In reality, securely integrating AI with business systems is far more complex. This is where the Model Context Protocol (MCP) comes in, and why a robust proxy architecture with OpenID Connect (OIDC) identity is crucial for enterprise-scale deployments.

graph TB
    User[User] --> |interacts with| AIAgent[AI Agent]
    AIAgent --> |MCP requests| Proxy[API Gateway/Proxy]
    Proxy --> |authenticates via| OIDC[Identity Provider/OIDC]
    Proxy --> |routes to| Tools[MCP Tools/Servers]
    Tools --> |access| Backend[Backend Systems]
    
    subgraph "Security Perimeter"
        Proxy
        OIDC
    end
    
    classDef security fill:#f96,stroke:#333,stroke-width:2px;
    class Proxy,OIDC security;

The diagram above illustrates the high-level architecture of a secure MCP implementation. At its core, this architecture places an API Gateway/Proxy as the central security control point between AI agents and MCP tools. The proxy works in conjunction with an Identity Provider supporting OIDC to create a security perimeter that enforces authentication, authorization, and access controls. This ensures that all MCP requests from AI agents are properly authenticated and authorized before reaching the actual MCP tools, which in turn access various backend systems.

MCP is an open standard (originally introduced by Anthropic) that provides a consistent way for AI assistants to interact with external data sources and tools. Instead of bespoke integrations for each system, MCP acts like a universal connector, allowing AI models to retrieve context or execute tasks via a standardized JSON-RPC interface. Importantly, MCP was built with security in mind – nothing is exposed to the AI by default, and it only gains access to what you explicitly allow. In practice, however, ensuring that “allow list” principle across many tools and users requires careful infrastructure. A production-grade API gateway (proxy) can serve as the gatekeeper between AI agents (MCP clients) and the tools or data sources (MCP servers), enforcing authentication, authorization, and routing rules.
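MCP traffic is JSON-RPC 2.0, so a gateway can enforce an allow list by inspecting `tools/call` requests before forwarding them. The minimal sketch below assumes a hypothetical allow list. It signals a denied tool with `-32001`, a code from the range JSON-RPC reserves for implementation-defined server errors. Batched (array) messages are simply rejected to keep the example short.

```python
import json
from typing import Optional

ALLOWED_TOOLS = {"get_weather", "search_docs"}  # hypothetical allow list
TOOL_NOT_ALLOWED = -32001                       # implementation-defined server error code


def _error(msg_id, code: int, message: str) -> dict:
    return {"jsonrpc": "2.0", "id": msg_id, "error": {"code": code, "message": message}}


def gate(raw: str) -> Optional[dict]:
    """Return a JSON-RPC error to send back, or None to forward the request upstream."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return _error(None, -32700, "Parse error")
    if not isinstance(msg, dict) or msg.get("jsonrpc") != "2.0" or "method" not in msg:
        return _error(None, -32600, "Invalid Request")
    if msg["method"] == "tools/call":
        params = msg.get("params")
        name = params.get("name") if isinstance(params, dict) else None
        if name not in ALLOWED_TOOLS:
            return _error(msg.get("id"), TOOL_NOT_ALLOWED, f"tool {name!r} is not allowed")
    return None


request = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
           "params": {"name": "get_weather", "arguments": {"city": "Paris"}}}
print(gate(json.dumps(request)))  # None, so the request is forwarded
```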

Before diving into the solution, a quick note on Envoy: there are active proposals to use Envoy Proxy as a reference implementation of an MCP gateway. Envoy’s rich L7 routing and extensibility make it a strong candidate, and it may soon include first-class MCP support. That said, the pattern we discuss here is proxy-agnostic – any modern HTTP reverse proxy or API gateway (Envoy, NGINX, HAProxy, Kong, etc.) that offers similar capabilities can be used. The goal is to outline a secure architecture for MCP, rather than the specifics of Envoy configuration.

Beyond “Glorified API Calls”: The Need for Secure MCP Integration

At first glance, using an AI tool via MCP might seem as simple as calling a web API. In a basic demo, an LLM agent could hit a REST endpoint, get some JSON, and that’s that. But in a real enterprise scenario, a lot more is happening behind the scenes:

graph LR
    subgraph "Simple API Call"
        A[Client] -->|Request| B[API]
        B -->|Response| A
    end
    
    subgraph "Enterprise MCP Reality"
        C[User] -->|Interacts| D[AI Agent]
        D -->|MCP Request with Identity| E[API Gateway]
        E -->|Validate Token| F[Identity Provider]
        E -->|Route Request| G[Tool Registry]
        E -->|Authorized Request| H[MCP Tool]
        H -->|Query with User Context| I[Backend System]
        I -->|Data| H
        H -->|Response| E
        E -->|Filtered Response| D
        D -->|Result| C
        
        J[Security Monitoring] -.->|Audit| E
    end
    
    classDef security fill:#f96,stroke:#333,stroke-width:2px;
    class E,F,G,J security;

This diagram contrasts a simple API call with the complex reality of enterprise MCP implementations. In the simple case, a client makes a direct request to an API and receives a response. However, in the enterprise MCP reality, the flow is much more complex:

A user interacts with an AI agent
The agent makes an MCP request that includes the user’s identity token
The API Gateway validates this token with an Identity Provider
The Gateway consults a Tool Registry to determine routing
If authorized, the request is forwarded to the appropriate MCP tool
The tool queries backend systems using the user’s context
Data flows back through the tool to the gateway
The gateway may filter the response based on security policies
The filtered response reaches the AI agent
The agent presents the result to the user

Throughout this process, security monitoring systems audit the interactions at the gateway level. This comprehensive flow ensures that user identity, permissions, and security policies are enforced at every step, far beyond what a simple API call would entail.

User Identity and Access Control: In an interactive AI application (like a chat assistant that can query internal systems), each request originates from a user with specific permissions. The system must ensure the AI only accesses data or performs actions that the current user is allowed to. Unlike a typical API call where a user directly authenticates to the service, here the AI agent is calling on the user’s behalf. Without a proper identity propagation mechanism, you risk turning a simple tool call into a serious data leak or privilege violation.
Multi-Step Context Exchanges: MCP supports stateful sessions and streaming interactions. An AI agent might carry on a multi-turn conversation, calling several tools in sequence and synthesizing their outputs. This is far beyond a one-off API call. The longer this chain goes, the higher the chance of things like context poisoning – where erroneous or malicious data from one step influences subsequent steps. We need safeguards so that a malicious response from one tool cannot trick the model into doing something dangerous in the next step.
Complex Delegation Chains: Related to the above, consider when tools call other tools. For example, an AI might use a “file search” tool which itself queries a database or calls another API. This delegation chain should carry forward the original user’s permissions and context without over-privileging any step. Each hop needs consistent enforcement of “who is allowed to do what,” or else an intermediate service might execute an action the user didn’t intend. Managing these delegated authorizations is non-trivial.
Dynamic Tool Provisioning: In agile environments, new tools (MCP servers) will be added frequently – think of spinning up a new microservice and immediately making it available to AI agents, or letting third-party plugins be installed. This dynamism is great for flexibility but a headache for security. How do you ensure every new tool meets your security standards? How do you prevent an unvetted or even malicious tool from being introduced? A free-for-all approach can quickly lead to chaos or breach. Clearly defined onboarding, registration, and policy enforcement for tools is needed from day one.

In short, an enterprise must treat AI tool integrations with the same rigor as any production service integration – if not more. A proper gateway layer helps address these concerns by acting as a central control point. Instead of hard-coding trust into each AI agent or tool, the proxy imposes organization-wide security policies. This approach moves us beyond the “just call an API” mindset to a structured model where every MCP call is authenticated, authorized, monitored, and audited.

Key Security Challenges in MCP Workflows

Let’s examine a few specific security challenges that arise when deploying MCP at scale, and why they matter:

graph TD
    A[Context Poisoning] --> |mitigated by| B[Content Filtering]
    A --> |mitigated by| C[Tool Verification]
    
    D[Identity Propagation] --> |solved with| E[Token-based Auth]
    D --> |solved with| F[Delegation Chains]
    
    G[Dynamic Tool Provisioning] --> |managed by| H[Tool Registry]
    G --> |managed by| I[Approval Workflows]
    G --> |managed by| J[Version Tracking]
    
    K[Remote MCP Changes] --> |controlled by| L[Proxy Governance]
    
    subgraph "Proxy Security Controls"
        B
        C
        E
        F
        H
        I
        J
        L
    end
    
    classDef challenge fill:#f66,stroke:#333,stroke-width:2px;
    classDef solution fill:#6f6,stroke:#333,stroke-width:2px;
    
    class A,D,G,K challenge;
    class B,C,E,F,H,I,J,L solution;

This diagram maps the key security challenges in MCP workflows (shown in red) to their corresponding solutions (shown in green) that can be implemented within the proxy security controls. The diagram illustrates how:

Context poisoning is mitigated through content filtering and tool verification
Identity propagation challenges are solved with token-based authentication and proper delegation chains
Dynamic tool provisioning risks are managed through a tool registry, approval workflows, and version tracking
Remote MCP changes are controlled through proxy governance

By implementing these controls within the proxy layer, organizations can address these security challenges in a centralized, consistent manner rather than trying to solve them individually for each tool or agent.

Context Poisoning: Because MCP enables feeding external data into the model’s context, there’s a risk that data could be deliberately crafted to mislead or exploit the model. This could be a form of prompt injection – e.g. a document retrieved via a tool might contain instructions that hijack the model’s behavior. A malicious actor might also try to register a tool that returns toxic content or false information. The architecture needs ways to validate and sanitize context coming from tools. Mitigations can include content filtering on responses, verifying data against expectations, or restricting which tools the model trusts for certain queries.
Delegation Chains and Identity Propagation: As mentioned, an AI agent often acts on behalf of a user. When it calls an MCP server, it should pass along who the user is (or at least what they’re allowed to do). If a tool then calls a backend API, that backend might also need credentials. This chain of delegation is tricky – you want to avoid the “sharing passwords” anti-pattern or hardcoding keys in the open. Instead, solutions involve tokens and OAuth flows: e.g. the user consents and an OAuth2/OIDC token is issued, the AI carries that token in MCP requests, and the MCP server can pass it through to the backend API (or exchange it). Managing these tokens and ensuring they’re used correctly (and not by someone else) is a core security task. The proxy should facilitate this by attaching and validating identity context at each step. Validated identity context also lets the proxy enforce RBAC policies, e.g. only allow certain tool methods if the user’s role is admin.
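The "exchange it" option maps onto OAuth 2.0 Token Exchange (RFC 8693). The MCP server trades the user's token for a new one scoped to a single backend, so no hop carries more privilege than it needs. The sketch below only builds the request. The token endpoint, audience, and scope values are hypothetical, and client authentication to the IdP is omitted.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request

TOKEN_ENDPOINT = "https://idp.example.com/oauth2/token"  # hypothetical IdP endpoint


def token_exchange_request(user_token: str, audience: str, scopes: list[str]) -> Request:
    """Build an RFC 8693 token-exchange request that trades the user's token for a
    narrower one aimed at a single backend (client authentication omitted)."""
    form = {
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token": user_token,
        "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "audience": audience,       # the backend API the new token is for
        "scope": " ".join(scopes),  # only what this hop needs
    }
    return Request(
        TOKEN_ENDPOINT,
        data=urlencode(form).encode(),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = token_exchange_request("user-access-token", "https://inventory.internal", ["inventory:read"])
print(parse_qs(req.data.decode())["scope"])  # ['inventory:read']
```

Sending it would be a matter of `urllib.request.urlopen(req)` against a real IdP. The response's `access_token` is what the MCP server would then present to the backend.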

sequenceDiagram
    participant User
    participant AIAgent as AI Agent
    participant Proxy as API Gateway
    participant IdP as Identity Provider
    participant Tool as MCP Tool
    participant Backend as Backend System
    
    User->>IdP: 1. Authenticate (username/password)
    IdP->>User: 2. Issue OIDC token
    User->>AIAgent: 3. Interact with AI (token attached)
    AIAgent->>Proxy: 4. MCP request with token
    Proxy->>IdP: 5. Validate token
    IdP->>Proxy: 6. Token valid, contains claims/scopes
    
    alt Token Valid with Required Permissions
        Proxy->>Tool: 7. Forward request with user context
        Tool->>Backend: 8. Query with delegated auth
        Backend->>Tool: 9. Return data (filtered by user permissions)
        Tool->>Proxy: 10. Return result
        Proxy->>AIAgent: 11. Return authorized response
        AIAgent->>User: 12. Present result
    else Token Invalid or Insufficient Permissions
        Proxy->>AIAgent: 7. Reject request (401/403)
        AIAgent->>User: 8. Report access denied
    end

This sequence diagram illustrates the authentication and authorization flow in an MCP system using OIDC. The process begins with the user authenticating to an Identity Provider and receiving an OIDC token. This token is then attached to the user’s interactions with the AI agent. When the agent makes an MCP request, it includes this token, which the API Gateway validates with the Identity Provider.

If the token is valid and contains the necessary permissions (claims/scopes), the request is forwarded to the appropriate MCP tool along with the user’s context. The tool can then query backend systems using delegated authentication, ensuring that the data returned is filtered according to the user’s permissions. The result flows back through the system to the user.

If the token is invalid or lacks sufficient permissions, the request is rejected at the gateway level with an appropriate error code (401 Unauthorized or 403 Forbidden), and the AI agent reports this access denial to the user.

This flow ensures that user identity and permissions are consistently enforced throughout the entire interaction chain, preventing unauthorized access to sensitive data or operations.
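The allow/reject branch in the diagram reduces to a small decision function. This sketch assumes token validation has already run and produced either a claims dict or `None`. It also assumes permissions arrive in a space-separated `scope` claim, as in JWT access tokens per RFC 9068. `authorize` and `Decision` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Decision:
    status: int   # HTTP status the gateway returns; 200 means "forward"
    reason: str

def authorize(claims: Optional[dict], required_scope: str) -> Decision:
    """claims is None when the token was missing or failed validation."""
    if claims is None:
        # Not authenticated at all: 401 Unauthorized.
        return Decision(401, "missing or invalid token")
    granted = set(claims.get("scope", "").split())
    if required_scope not in granted:
        # Authenticated, but not permitted to use this tool: 403 Forbidden.
        return Decision(403, f"scope '{required_scope}' not granted")
    return Decision(200, f"forward on behalf of {claims.get('sub')}")
```

Keeping 401 and 403 distinct tells the agent whether re-authenticating could help (401) or whether the user simply lacks permission (403).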

Dynamic Tool Provisioning: In an MCP ecosystem, tools can come and go. For example, an enterprise might quickly stand up a new MCP server for a specific dataset or integrate a third-party service via MCP. Without controls, an AI agent might immediately start invoking any new tool as soon as it appears. That’s risky – you might not want a newly added tool to be available to everyone by default, or it might need vetting. There’s also the configuration aspect: new tool endpoints should be discoverable by the AI, and the gateway needs to know how to route to them and what auth to require. A secure setup will likely involve a tool registry or discovery service that the proxy consults, and administrative approval for tools. The proxy can then automatically enforce the appropriate auth and routing for each new tool, rather than relying on each agent developer to update logic. This provides a governance layer for tool lifecycle.

sequenceDiagram
    participant Admin
    participant Registry as Tool Registry
    participant Proxy as API Gateway
    participant Tool as New MCP Tool
    participant AIAgent as AI Agent
    
    Admin->>Tool: 1. Develop new MCP tool
    Admin->>Registry: 2. Register tool (metadata, endpoints, auth requirements)
    Registry->>Registry: 3. Validate tool configuration
    Registry->>Proxy: 4. Update routing configuration
    
    Note over Registry,Proxy: Tool is now registered but not yet approved
    
    Admin->>Registry: 5. Approve tool for specific user groups
    Registry->>Proxy: 6. Update access policies
    
    Note over AIAgent,Proxy: Tool is now available to authorized users
    
    AIAgent->>Proxy: 7. Discover available tools
    Proxy->>AIAgent: 8. Return approved tools for user
    AIAgent->>Proxy: 9. Call new tool
    Proxy->>Tool: 10. Route request if authorized

This sequence diagram illustrates the tool registration and approval workflow in a secure MCP environment. The process begins with an administrator developing a new MCP tool and registering it in the Tool Registry, providing metadata, endpoints, and authentication requirements. The registry validates the tool configuration and updates the routing configuration in the API Gateway.

At this point, the tool is registered but not yet approved for use. The administrator must explicitly approve the tool for specific user groups, which triggers an update to the access policies in the API Gateway. Only then does the tool become available to authorized users.

When an AI agent discovers available tools through the proxy, it only receives information about tools that have been approved for the current user. When the agent calls the new tool, the proxy routes the request to the tool only if the user is authorized to access it.

This workflow ensures that new tools undergo proper vetting and approval before they can be used, and that access is restricted to authorized users only. It also centralizes the tool governance process, making it easier to manage the lifecycle of MCP tools in a secure manner.
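The register, approve, and discover steps can be sketched as an in-memory registry. Everything here is hypothetical: the class names, the three statuses, and group-based approval. A production registry would persist its state and push changes to the gateway, not be queried in-process.

```python
from dataclasses import dataclass, field
from enum import Enum

class Status(Enum):
    REGISTERED = "registered"  # known to the registry, not yet callable
    APPROVED = "approved"      # callable by the approved groups only
    RETIRED = "retired"        # kept for audit, never routed to

@dataclass
class Tool:
    name: str
    endpoint: str
    required_scope: str        # auth requirement the gateway will enforce
    status: Status = Status.REGISTERED
    allowed_groups: set[str] = field(default_factory=set)

class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, name: str, endpoint: str, required_scope: str) -> Tool:
        """Step 2: record metadata; the tool starts out unapproved."""
        if name in self._tools:
            raise ValueError(f"tool {name!r} already registered")
        tool = Tool(name, endpoint, required_scope)
        self._tools[name] = tool
        return tool

    def approve(self, name: str, groups: set[str]) -> None:
        """Step 5: make the tool available to specific user groups."""
        tool = self._tools[name]
        if tool.status is Status.RETIRED:
            raise ValueError("retired tools cannot be re-approved")
        tool.status = Status.APPROVED
        tool.allowed_groups |= groups

    def retire(self, name: str) -> None:
        self._tools[name].status = Status.RETIRED

    def visible_tools(self, user_groups: set[str]) -> list[str]:
        """Step 8: only approved tools shared with one of the user's groups."""
        return sorted(
            t.name for t in self._tools.values()
            if t.status is Status.APPROVED and t.allowed_groups & user_groups
        )

    def route(self, name: str, user_groups: set[str]) -> str:
        """Step 10: endpoint to forward to, or PermissionError if not authorized."""
        tool = self._tools.get(name)
        if (tool is None or tool.status is not Status.APPROVED
                or not tool.allowed_groups & user_groups):
            raise PermissionError(f"tool {name!r} not available to caller")
        return tool.endpoint
```

New tools are denied by default. A tool that has been registered but not yet approved is invisible to agents, so deploying a server alone never exposes it.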

By recognizing these challenges, security engineers and architects can design defenses before problems occur. We next look at how an identity-aware proxy can provide those defenses in a clean, centralized way.

The Identity-Aware Proxy Pattern for MCP

A proven design in cloud architectures is to put a reverse proxy (often called an API gateway) in front of your services. MCP-based AI systems are no exception. By introducing an intelligent proxy between AI agents (clients) and the MCP servers (tools/backends), we create a controlled funnel through which all AI tool traffic passes. This proxy can operate at Layer 7 (application layer), meaning it understands HTTP and even JSON payloads, allowing fine-grained control. Below, we outline the key roles such a proxy plays in securing MCP:

graph TB
    subgraph "Client Side"
        User[User]
        AIAgent[AI Agent]
        User -->|interacts| AIAgent
    end
    
    subgraph "Security Layer"
        Proxy[API Gateway/Proxy]
        Auth[Authentication]
        RBAC[Authorization/RBAC]
        Registry[Tool Registry]
        Audit[Audit Logging]
        
        Proxy -->|uses| Auth
        Proxy -->|enforces| RBAC
        Proxy -->|consults| Registry
        Proxy -->|generates| Audit
    end
    
    subgraph "MCP Tools"
        Tool1[Document Search]
        Tool2[Database Query]
        Tool3[File Operations]
        Tool4[External API]
    end
    
    subgraph "Backend Systems"
        DB[(Databases)]
        Storage[File Storage]
        APIs[Internal APIs]
        External[External Services]
    end
    
    AIAgent -->|MCP requests| Proxy
    Proxy -->|routes to| Tool1
    Proxy -->|routes to| Tool2
    Proxy -->|routes to| Tool3
    Proxy -->|routes to| Tool4
    
    Tool1 -->|reads| DB
    Tool1 -->|reads| Storage
    Tool2 -->|queries| DB
    Tool3 -->|manages| Storage
    Tool4 -->|calls| APIs
    Tool4 -->|calls| External
    
    classDef security fill:#f96,stroke:#333,stroke-width:2px;
    class Proxy,Auth,RBAC,Registry,Audit security;

This diagram provides a detailed view of the identity-aware proxy pattern for MCP. The architecture is divided into four main layers:

Client Side: Users interact with AI agents, which generate MCP requests.
Security Layer: The API Gateway/Proxy sits at the center of the security layer, working with authentication, authorization/RBAC, tool registry, and audit logging components to enforce security policies.
MCP Tools: Various tools like document search, database query, file operations, and external API access are available through the MCP interface.
Backend Systems: The actual data sources and services that the MCP tools interact with, including databases, file storage, internal APIs, and external services.

All MCP requests from AI agents must pass through the proxy, which authenticates the requests, enforces RBAC policies, consults the tool registry to determine routing, and generates audit logs. The proxy then routes authorized requests to the appropriate MCP tools, which in turn interact with the backend systems.

This centralized security architecture ensures consistent enforcement of security policies across all MCP interactions, regardless of which tools are being used or which backend systems are being accessed.

Session-Aware Routing and Load Balancing

Unlike a simple stateless API call, MCP sessions can be long-lived and involve streaming (Server-Sent Events for output, etc.). The proxy should ensure that all requests and responses belonging to a given session or conversation are handled consistently. This often means implementing session affinity: if multiple instances of an MCP server are running, the proxy routes a given session’s traffic to the same instance each time. This prevents issues where, say, tool A’s session state (an in-memory cache, accumulated conversation context, etc.) is lost because request 2 went to a different instance than request 1. Modern proxies can do session-aware load balancing using HTTP headers or routes (for example, mapping a session ID or client ID in the URL to a particular backend). Additionally, the proxy can handle SSE connections gracefully, so that streaming responses aren’t accidentally broken by network intermediaries. Should a session need to be resumed or handed off, the gateway can coordinate that (as proposed in upcoming Envoy features for MCP). In short, the proxy ensures reliability and consistency for MCP’s stateful interactions, which is crucial for user experience and for maintaining correct context.

sequenceDiagram
    participant User
    participant AIAgent as AI Agent
    participant Proxy as API Gateway
    participant Instance1 as Tool Instance 1
    participant Instance2 as Tool Instance 2
    
    User->>AIAgent: Start conversation
    AIAgent->>Proxy: MCP request 1 (session=abc123)
    
    Note over Proxy: Session affinity routing
    
    Proxy->>Instance1: Route to instance 1
    Instance1->>Proxy: Response with state
    Proxy->>AIAgent: Return response
    
    User->>AIAgent: Continue conversation
    AIAgent->>Proxy: MCP request 2 (session=abc123)
    
    Note over Proxy: Same session ID routes to same instance
    
    Proxy->>Instance1: Route to instance 1 (preserves state)
    Instance1->>Proxy: Response with updated state
    Proxy->>AIAgent: Return response
    
    Note over User,Instance2: Without session affinity, request might go to instance 2 and lose state

This sequence diagram illustrates how session affinity works in an MCP environment. When a user starts a conversation with an AI agent, the agent makes an MCP request to the API Gateway with a session identifier (in this case, “abc123”). The gateway uses this session ID to route the request to a specific tool instance (Instance 1).

When the user continues the conversation, the agent makes another MCP request with the same session ID. Because the gateway implements session affinity, it routes this request to the same instance (Instance 1), which preserves the state from the previous interaction. This ensures a consistent and coherent experience for the user.

Without session affinity, the second request might be routed to a different instance (Instance 2), which would not have the state information from the first request. This would result in a broken experience, as the tool would not have the context of the previous interaction.

Session affinity is particularly important for MCP because many AI interactions are stateful and context-dependent. The proxy’s ability to maintain this session consistency is a key advantage over simpler API integration approaches.
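One way to implement affinity without shared state between proxy replicas is rendezvous (highest-random-weight) hashing. This sketch assumes the session ID is read from a request header, for example the `Mcp-Session-Id` header used by MCP’s Streamable HTTP transport. The instance list and the `pick_instance` name are hypothetical. Off-the-shelf proxies usually provide this as configuration instead: Envoy, for example, has ring-hash and Maglev load balancers keyed on a header.

```python
import hashlib

def pick_instance(session_id: str, instances: list[str]) -> str:
    """Rendezvous hashing: every proxy replica picks the same instance for the
    same session ID, and removing an instance only remaps the sessions that
    were pinned to it."""
    if not instances:
        raise ValueError("no healthy instances")

    def weight(instance: str) -> int:
        # Score each (session, instance) pair; the highest score wins.
        digest = hashlib.sha256(f"{session_id}|{instance}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(instances, key=weight)
```

Rendezvous hashing fits this case better than a plain `hash(session) % n` because scaling the pool in or out does not reshuffle every existing session. Only sessions on a removed instance move, and those sessions would lose their state regardless.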

JWT and OIDC Integration for Authentication

Every request hitting the MCP gateway should carry a valid identity token, typically a JSON Web Token (JWT) issued by an Identity Provider via OIDC (OpenID Connect). By requiring JWTs, the proxy offloads authentication from the tools themselves and ensures that only authenticated, authorized calls make it through. In practice, this means the AI agent (or the user’s session with the agent) must obtain a token through an OIDC flow and attach it to each MCP request (often in an HTTP header like Authorization: Bearer <token>). For calling the gateway, the usual choice is an access token whose audience is the gateway; ID tokens are primarily meant to tell the client application who the user is. The proxy verifies the token’s signature and claims (issuer, audience, expiration, etc.) and rejects any request that isn’t properly authenticated. This way, your MCP servers never see an anonymous call; they trust the gateway to have vetted identity.
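The validation step is sketched below: pin the algorithm, verify the signature, then check `iss`, `aud`, `exp`, and `nbf`. To keep the example standard-library only, it assumes HS256 with a shared key. Enterprise IdPs typically sign with RS256 or ES256 and publish public keys at a JWKS URI, so a real gateway should rely on a vetted JWT library or its built-in JWT filter, not hand-rolled code. `validate_jwt` and `TokenError` are hypothetical names.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

class TokenError(Exception):
    pass

def _b64url_decode(segment: str) -> bytes:
    # JWTs use unpadded base64url; restore padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def validate_jwt(token: str, key: bytes, issuer: str, audience: str,
                 now: Optional[float] = None, leeway: int = 60) -> dict:
    """Verify signature, then issuer/audience/expiry. HS256 only, for illustration."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise TokenError("malformed token") from None
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":  # never accept "none" or an unexpected alg
        raise TokenError("unexpected algorithm")
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(key, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise TokenError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    now = time.time() if now is None else now
    if claims.get("iss") != issuer:
        raise TokenError("wrong issuer")
    aud = claims.get("aud")
    if audience not in (aud if isinstance(aud, list) else [aud]):
        raise TokenError("wrong audience")
    if "exp" not in claims or now > claims["exp"] + leeway:
        raise TokenError("expired")
    if "nbf" in claims and now + leeway < claims["nbf"]:
        raise TokenError("not yet valid")
    return claims
```

The signature is checked before any claim is trusted. The comparison uses `hmac.compare_digest` to avoid timing leaks.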

sequenceDiagram
    participant User
    participant App as AI Application
    participant IdP as Identity Provider
    participant Proxy as API Gateway
    participant Tool as MCP Tool
    
    User->>App: Access AI application
    App->>IdP: Redirect to login
    User->>IdP: Authenticate
    IdP->>App: Authorization code
    App->>IdP: Exchange code for tokens
    IdP->>App: ID token + access token
    
    Note over App: Store tokens securely
    
    User->>App: Request using AI tool
    App->>Proxy: MCP request with access token
    
    Proxy->>Proxy: Validate token (signature, expiry, audience)
    Proxy->>Proxy: Extract user identity and permissions
    
    alt Token Valid
        Proxy->>Tool: Forward request with user context
        Tool->>Proxy: Response
        Proxy->>App: Return response
        App->>User: Display result
    else Token Invalid
        Proxy->>App: 401 Unauthorized
        App->>User: Session expired, please login again
    end
    
    Note over App,Proxy: Token refresh happens in background
    App->>IdP: Refresh token when needed
    IdP->>App: New access token

This sequence diagram illustrates the OIDC authentication flow in an MCP environment. The process begins when a user accesses the AI application, which redirects to the Identity Provider for authentication. After the user authenticates, the Identity Provider issues an authorization code, which the application exchanges for ID and access tokens.
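When the AI application is a public client (a browser, desktop, or CLI app that cannot keep a secret), the authorization code step is normally protected with PKCE (RFC 7636). PKCE ensures that only the client which started the login can redeem the code. The S256 transform below follows that RFC; the function names are hypothetical.

```python
import base64
import hashlib
import secrets

def s256_challenge(code_verifier: str) -> str:
    """BASE64URL(SHA256(ASCII(code_verifier))) without padding, per RFC 7636."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

def make_pkce_pair() -> tuple[str, str]:
    """Return (code_verifier, code_challenge) for one authorization request."""
    # 32 random bytes encode to 43 URL-safe characters, the minimum verifier length.
    verifier = secrets.token_urlsafe(32)
    return verifier, s256_challenge(verifier)
```

The app sends `code_challenge` with `code_challenge_method=S256` on the login redirect and keeps the verifier. It then presents `code_verifier` alongside the authorization code when exchanging it for tokens.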

The application securely stores these tokens and uses the access token when making MCP requests through the AI agent. When the proxy receives a request, it validates the token by checking the signature, expiration, audience, and other claims. It also extracts the user’s identity and permissions from the token.

If the token is valid, the proxy forwards the request to the appropriate MCP tool along with the user’s context. The tool processes the request and returns a response, which flows back through the proxy to the application and ultimately to the user.

If the token is invalid (expired, tampered with, etc.), the proxy returns a 401 Unauthorized response, and the application prompts the user to log in again.

In the background, the application can use a refresh token to obtain new access tokens when needed, without requiring the user to re-authenticate. This ensures a smooth user experience while maintaining security.
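Background refresh usually means renewing slightly before expiry, so that an in-flight MCP call never carries a token that lapses mid-request. This sketch assumes the app recorded an absolute expiry time derived from the token response’s `expires_in`. It also assumes a public client that identifies itself with `client_id`; confidential clients would authenticate to the token endpoint instead. The `refresh_token` grant itself is defined in RFC 6749, section 6. The function names and the 60-second skew are hypothetical.

```python
import time
from typing import Optional
from urllib.parse import urlencode

def needs_refresh(expires_at: float, now: Optional[float] = None, skew: int = 60) -> bool:
    """True once we are within `skew` seconds of the access token's expiry."""
    now = time.time() if now is None else now
    return now >= expires_at - skew

def refresh_request_body(refresh_token: str, client_id: str) -> str:
    # Form body for POSTing to the IdP's token endpoint (RFC 6749, section 6).
    return urlencode({
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
    })
```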

This OIDC integration provides a robust authentication mechanism that is widely adopted in enterprise environments and integrates well with existing identity management systems.

Introducing OIDC-A for Agent & Tool Identity

While the discussion above focuses on authenticating the human user, a production-grade MCP deployment must also identify two additional actors:

The LLM agent that is orchestrating the workflow.
The MCP tool / resource that is being invoked on the backend.

Our companion post “OpenID Connect for Agents (OIDC-A) 1.0 Proposal” (/2025/04/28/oidc-a-proposal/) extends OIDC Core 1.0 with a rich set of claims for agent identity, attestation, and delegation chains. In practice this means:

When an AI agent starts a session it obtains an ID Token that contains the OIDC-A claims (agent_type, agent_model, agent_instance_id, delegator_sub, delegation_chain, etc.). This token travels alongside the user’s access token in every MCP request.
MCP tools can likewise expose their own OIDC identity (or be issued a signed resource token) that advertises metadata such as tool capabilities, version, and trust level (agent_capabilities, agent_trust_level, agent_attestation).
The gateway now validates up to three identities on every call – user → agent → tool – forming an explicit delegation chain that can be evaluated against RBAC and compliance policies.

Adopting OIDC-A brings several benefits:

End-to-end, cryptographically verifiable identity for everything that touches the request path.
Fine-grained authorization based on agent or tool capabilities (e.g., allow only agents that advertise email:draft capability to invoke the Mail tool).
Built-in attestation (agent_attestation) enables the gateway to verify the integrity and provenance of both agents and tools before routing traffic to them.

For the remainder of this article, whenever we refer to a “token” being validated by the gateway, assume this now encompasses the user’s token, the agent’s OIDC-A token, and (optionally) the tool/resource token – all evaluated in a single policy decision step.

This pattern is already used widely in API security: “an API Gateway can securely and consistently implement authentication… without burdening the applications themselves.” In our context, the MCP proxy might integrate with your enterprise SSO (Azure AD, Okta, etc.) via OIDC to handle user login flows and token validation. Many gateways support OIDC natively, initiating redirects for user login if needed and then storing the resulting token in a cookie for session continuity. In a headless agent scenario (where the AI is calling tools server-to-server), the token might be provisioned out-of-band (e.g. the user logged into the AI app, so the app injects the token for the agent to use). Either way, the gateway enforces that no token = no access. It can also map token claims to roles or scopes to implement authorization (e.g., only users with an “HR_read” scope can use the “HR Database” tool). This aligns perfectly with MCP’s design goal of secure connections – combining MCP with OIDC and OIDC-A gives you an end-to-end authenticated channel for tool usage.
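The single policy decision step can be sketched as a function over the already-validated claims of both tokens. It checks three things: a user scope (the “HR_read” example), an agent capability (the `email:draft` example), and whether the agent’s `delegator_sub` matches the user presenting the access token. This reading of the OIDC-A claims is an assumption: `agent_capabilities` is taken to be a list of strings and `delegator_sub` the user’s `sub`, so check the proposal for the exact formats. The policy table, the `hr:query` capability, and the `mail.send` scope are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolPolicy:
    required_scope: str       # must appear in the user's access-token "scope" claim
    required_capability: str  # must appear in the agent token's agent_capabilities

# Hypothetical policy table; "HR_read" and "email:draft" echo the examples above.
POLICIES = {
    "hr_database": ToolPolicy(required_scope="HR_read", required_capability="hr:query"),
    "mail": ToolPolicy(required_scope="mail.send", required_capability="email:draft"),
}

def allow_call(tool: str, user_claims: dict, agent_claims: dict) -> bool:
    policy = POLICIES.get(tool)
    if policy is None:  # tools without a policy are denied by default
        return False
    if policy.required_scope not in user_claims.get("scope", "").split():
        return False
    # The agent must be acting for the same user who presented the access token.
    if agent_claims.get("delegator_sub") != user_claims.get("sub"):
        return False
    return policy.required_capability in agent_claims.get("agent_capabilities", [])
```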

sequenceDiagram
    participant User
    participant Agent as LLM Agent (OIDC-A)
    participant Proxy as API Gateway
    participant Tool as MCP Tool (OIDC-A)
    participant Backend as Backend System

    User->>Agent: 1. Interact (chat, form, etc.)
    Agent->>Proxy: 2. MCP request<br/>Bearer user token + agent OIDC-A token
    Proxy->>Proxy: 3. Validate user token (OIDC) & agent token (OIDC-A)
    Proxy-->>Tool: 4. Forward request plus optional *resource token* for the tool
    Tool->>Backend: 5. Query/act using delegated auth
    Backend-->>Tool: 6. Data / result
    Tool-->>Proxy: 7. Response (may include attestation)
    Proxy-->>Agent: 8. Authorized response
    Agent-->>User: 9. Present result

Tool Metadata Filtering and Policy Enforcement

A powerful advantage of the proxy is that it can make routing decisions based not just on URLs, but on metadata within the requests. MCP messages use JSON-RPC, so a tool invocation carries the method (tools/call), the name of the tool being invoked, and its arguments, while tool listings can also include annotations describing each tool’s behavior. An identity-aware proxy can be configured to inspect these details and apply policy rules:

graph TD
    subgraph "MCP Request"
        Request[JSON-RPC Request]
        Method[Tool Method]
        Params[Parameters]
        User[User Identity]
    end
    
    subgraph "Policy Engine"
        Rules[Policy Rules]
        RBAC[Role-Based Access]
        Audit[Audit Logging]
        Transform[Response Transformation]
    end
    
    Request --> Method
    Request --> Params
    Request --> User
    
    Method --> Rules
    Params --> Rules
    User --> RBAC
    
    Rules --> Decision{Allow/Deny}
    RBAC --> Decision
    
    Decision -->|Allow| Forward[Forward to Tool]
    Decision -->|Deny| Reject[Reject Request]
    
    Forward --> Audit
    Reject --> Audit
    
    Forward --> Tool[MCP Tool]
    Tool --> Response[Tool Response]
    Response --> Transform
    Transform --> Filtered[Filtered Response]
    
    classDef request fill:#bbf,stroke:#333,stroke-width:1px;
    classDef policy fill:#fbf,stroke:#333,stroke-width:1px;
    classDef action fill:#bfb,stroke:#333,stroke-width:1px;
    
    class Request,Method,Params,User request;
    class Rules,RBAC,Audit,Transform policy;
    class Decision,Forward,Reject,Filtered action;

This diagram illustrates how tool metadata filtering and policy enforcement work in an MCP proxy. The process begins with an MCP request in JSON-RPC format, which contains the tool method, parameters, and user identity information. These components are extracted and fed into the policy engine.

The policy engine consists of policy rules, role-based access control (RBAC), audit logging, and response transformation components. The tool method and parameters are evaluated against the policy rules, while the user identity is checked against RBAC permissions.

Based on these evaluations, the policy engine makes an allow/deny decision. If the request is allowed, it is forwarded to the MCP tool; if denied, it is rejected. In either case, the action is logged for audit purposes.

When a request is allowed and processed by the tool, the response may pass through a transformation step before being returned to the client. This transformation can filter or modify the response based on security policies, such as removing sensitive information that the user shouldn’t see.
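A response transformation step might look like the sketch below. It walks a JSON tool result, drops fields the caller should not see, and masks PII-like strings. The field list, the two regex patterns, and the `redact` name are hypothetical. Regexes catch only the most obvious PII, so a real deployment would pair this with a dedicated detection service.

```python
import re
from typing import Any

# Illustrative patterns (hypothetical); production PII detection needs much more.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(value: Any, drop_fields: frozenset = frozenset({"salary"})) -> Any:
    """Recursively drop named fields and mask PII in string values of a JSON result."""
    if isinstance(value, dict):
        return {k: redact(v, drop_fields) for k, v in value.items() if k not in drop_fields}
    if isinstance(value, list):
        return [redact(v, drop_fields) for v in value]
    if isinstance(value, str):
        for label, pattern in PII_PATTERNS.items():
            value = pattern.sub(f"[{label} redacted]", value)
        return value
    return value  # numbers, booleans, None pass through unchanged
```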

This fine-grained policy enforcement at the metadata level allows for sophisticated security controls that go far beyond simple URL-based routing. For example:

“If the tool call is delete_file and the user is not in the IT Admin group, deny the request.”
“Only allow the execute_sql tool on weekdays between 9am-5pm, and log all queries.”
“If a tool is marked as containing sensitive data, ensure the response is sanitized or encrypted.”

This is analogous to a web application firewall (WAF) or an API gateway performing content filtering, but tailored to AI tool usage. In the Envoy MCP proposal, this corresponds to parsing MCP messages and using RBAC filters on them. The proxy essentially understands the intent of each tool call and can gate it appropriately. It also can redact or transform data if needed – for instance, stripping out certain fields from a response that the user shouldn’t see, or masking personally identifiable information. By centralizing this in the gateway, you avoid having to implement checks in each tool service (which could be inconsistent or forgotten). Auditing is another benefit: the proxy can log every tool invocation along with user identity and parameters, feeding into SIEM systems for monitoring. That way, if an AI one day does something it shouldn’t, you have a clear trail of which tool call was involved and who prompted it. In sum, metadata-based filtering turns the proxy into a smart policy enforcement point, adding a safety layer on top of MCP’s basic capabilities.
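The first two example rules can be written against the actual message shape. In MCP, a tool invocation is a JSON-RPC request with `method` set to `tools/call`, the tool name in `params.name`, and its inputs in `params.arguments`. The rule logic, the group name, and the audit-record fields in this sketch are hypothetical. `now` is assumed to be in the policy’s local time zone.

```python
import json
from datetime import datetime

def evaluate(raw_request: str, user: dict, now: datetime) -> tuple[bool, str]:
    """Apply two of the example rules to an MCP tools/call JSON-RPC message."""
    msg = json.loads(raw_request)
    if msg.get("method") != "tools/call":
        return True, "not a tool invocation"
    tool = msg.get("params", {}).get("name")
    groups = set(user.get("groups", []))

    if tool == "delete_file" and "IT Admin" not in groups:
        return False, "delete_file requires IT Admin"
    if tool == "execute_sql":
        # Monday=0 ... Friday=4; hours 9:00 up to (not including) 17:00.
        if not (now.weekday() < 5 and 9 <= now.hour < 17):
            return False, "execute_sql only allowed weekdays 9am-5pm"
    return True, "allowed"

def audit_record(raw_request: str, user: dict, allowed: bool, reason: str) -> str:
    """One JSON line per decision (allowed or denied), e.g. for shipping to a SIEM."""
    msg = json.loads(raw_request)
    params = msg.get("params", {})
    return json.dumps({
        "sub": user.get("sub"),
        "method": msg.get("method"),
        "tool": params.get("name"),
        "arguments": params.get("arguments"),  # logs the SQL query itself, per the rule
        "allowed": allowed,
        "reason": reason,
    }, sort_keys=True)
```

Both functions run in the proxy, so every tool server gets the same checks and the same audit trail without implementing either.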

Version-Aware and Context-Aware Routing

Enterprises constantly evolve their services – new versions, A/B tests, staging vs. production deployments, etc. The proxy can greatly simplify how AI agents handle these changes. Instead of the AI needing to know which version of a tool to call, the gateway can implement version-aware routing. For instance, the MCP endpoint for a “Document Search” tool could remain the same for the agent, but the proxy might route 90% of requests to v1 of the service and 10% to a new v2 (for a canary rollout). Or route internal users to a “beta” instance while external users go to stable. This is done by matching on request attributes or using routing rules that include user audience and tool identifiers.

graph TB
    AIAgent[AI Agent] -->|MCP Request| Proxy[API Gateway]
    
    Proxy -->|"90% traffic"| V1[Tool v1]
    Proxy -->|"10% traffic"| V2[Tool v2 - Canary]
    
    Proxy -->|"Internal Users"| Beta[Beta Version]
    Proxy -->|"External Users"| Stable[Stable Version]
    
    Proxy -->|"Small Requests"| Standard[Standard Instance]
    Proxy -->|"Large Requests"| HighMem[High-Memory Instance]
    
    Proxy -->|"US Users"| US[US Region]
    Proxy -->|"EU Users"| EU[EU Region]
    
    classDef proxy fill:#f96,stroke:#333,stroke-width:2px;
    classDef version fill:#bbf,stroke:#333,stroke-width:1px;
    classDef audience fill:#bfb,stroke:#333,stroke-width:1px;
    classDef size fill:#fbf,stroke:#333,stroke-width:1px;
    classDef region fill:#ff9,stroke:#333,stroke-width:1px;
    
    class Proxy proxy;
    class V1,V2 version;
    class Beta,Stable audience;
    class Standard,HighMem size;
    class US,EU region;

This diagram illustrates the various routing strategies that an API Gateway can implement for MCP requests. The gateway can route traffic based on multiple factors:

Version-based routing: The gateway can split traffic between different versions of a tool, such as sending 90% to v1 and 10% to a canary deployment of v2. This allows for gradual rollouts and A/B testing without requiring changes to the AI agents.
Audience-based routing: Internal users can be directed to beta versions of tools, while external users are routed to stable versions. This allows for internal testing and validation before wider release.
Request size-based routing: Small requests can be handled by standard instances, while large requests that require more resources are directed to high-memory instances. This optimizes resource utilization and ensures that demanding requests don’t impact the performance of standard operations.
Geographic routing: Users from different regions can be directed to region-specific instances, reducing latency and potentially addressing data residency requirements.

The AI agent doesn’t need to be aware of these routing decisions; it simply makes requests to the logical tool name, and the gateway handles the complexity of routing to the appropriate backend. This abstraction simplifies the agent’s implementation while providing powerful operational capabilities.

Because all of these rules, including the regional and request-size routing above, are configured at the proxy level, this not only eases operations (you can upgrade backend tools without breaking the AI’s interface) but also adds to security. You could isolate certain versions for testing, or ensure that experimental tools are only accessible under certain conditions. By controlling traffic flow, the proxy helps maintain a principle of least privilege on a macro scale: the AI only reaches the backends it’s supposed to, via routes that are appropriate for the current context.
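The 90/10 canary split and the internal/external audience rule can be combined as shown below. Users are hashed into stable buckets, so a given user stays on one version rather than flipping between v1 and v2 on each request. The backend naming scheme, the `internal` claim, and the precedence of the audience rule over the canary split are all hypothetical choices.

```python
import hashlib

def bucket(key: str) -> int:
    """Stable 0-99 bucket derived from a hash of the key."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big") % 100

def choose_backend(tool: str, user: dict, canary_percent: int = 10) -> str:
    # Audience rule first: internal users get the beta build.
    if user.get("internal"):
        return f"{tool}-beta"
    # Weighted split for everyone else: canary_percent% to v2, the rest to v1.
    if bucket(f"{tool}:{user['sub']}") < canary_percent:
        return f"{tool}-v2"
    return f"{tool}-v1"
```

Hashing on the user instead of picking randomly per request also plays well with session affinity. Raising `canary_percent` moves users to v2 in one direction, and no user already on v2 is sent back.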

Implementing MCP Security with a Proxy: A Practical Approach

Now that we’ve covered the key security patterns, let’s look at a practical approach to implementing MCP security with an identity-aware proxy. This section outlines the steps to set up a secure MCP environment, focusing on the integration points between components.

graph TB
    subgraph ImplementationSteps["Implementation Steps"]
        Step1[1. Set up Identity Provider]
        Step2[2. Configure API Gateway]
        Step3[3. Implement Tool Registry]
        Step4[4. Define Security Policies]
        Step5[5. Integrate AI Agents]
        Step6[6. Monitor and Audit]
        
        Step1 --> Step2
        Step2 --> Step3
        Step3 --> Step4
        Step4 --> Step5
        Step5 --> Step6
    end
    
    classDef step fill:#beb,stroke:#333,stroke-width:1px
    class Step1,Step2,Step3,Step4,Step5,Step6 step

This diagram outlines the six key steps in implementing MCP security with a proxy. The process follows a logical progression:

Set up Identity Provider: Establish the foundation for authentication and authorization.
Configure API Gateway: Set up the central security control point.
Implement Tool Registry: Create a system for managing MCP tools.
Define Security Policies: Establish the rules for access control and data protection.
Integrate AI Agents: Connect the AI agents to the secure MCP environment.
Monitor and Audit: Continuously track and review system activity.

Each step builds on the previous ones, creating a comprehensive security implementation. The following sections will explore each step in detail.

1. Setting Up the Identity Provider

The first step is to configure your identity provider (IdP) to support the OIDC flows needed for MCP security. This typically involves:

Creating an OIDC application in your IdP (e.g., Azure AD, Okta, Auth0)
Configuring the appropriate scopes and claims
Setting up the redirect URIs for your AI application
Generating client credentials (client ID and secret)

The IdP will be responsible for authenticating users and issuing the tokens that will be used to secure MCP requests. It’s important to configure the appropriate scopes and claims to ensure that the tokens contain the necessary information for authorization decisions.

2. Configuring the API Gateway

Next, you’ll need to configure your API gateway to act as the MCP proxy. This involves:

sequenceDiagram
    participant Admin
    participant Gateway as API Gateway
    participant IdP as Identity Provider
    
    Admin->>Gateway: 1. Configure OIDC integration
    Gateway->>IdP: 2. Fetch OIDC discovery document
    IdP->>Gateway: 3. Return endpoints and keys
    
    Admin->>Gateway: 4. Set up MCP routing rules
    Admin->>Gateway: 5. Configure security policies
    
    Note over Gateway: Gateway ready to validate tokens and route MCP traffic

This sequence diagram illustrates the process of configuring an API Gateway for MCP security. The process begins with an administrator configuring the OIDC integration in the gateway. The gateway then fetches the OIDC discovery document from the Identity Provider, which returns the necessary endpoints and keys for token validation.

Next, the administrator sets up MCP routing rules, defining how requests should be directed to different MCP tools based on various criteria. The administrator also configures security policies, specifying who can access which tools and under what conditions.

Once these configurations are complete, the gateway is ready to validate tokens and route MCP traffic according to the defined rules and policies. This setup process establishes the gateway as the central security control point for all MCP interactions.

The configuration steps include:

Setting up the OIDC integration, including configuring the token validation parameters (issuer, audience, etc.)
Defining the routing rules for MCP requests
Configuring the security policies for tool access
Setting up the audit logging

The gateway will be responsible for validating the tokens, enforcing the security policies, and routing the MCP requests to the appropriate backends. It’s important to ensure that the gateway is properly configured to handle the MCP JSON-RPC format and to extract the necessary information for policy decisions.
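Steps 2 and 3 of the configuration diagram (fetch the discovery document, receive endpoints and keys) are worth guarding. The OpenID Connect Discovery 1.0 rules used here are: the document lives at `/.well-known/openid-configuration` under the issuer, and its `issuer` value must exactly match the issuer you configured. The sketch takes an already-fetched document as a dict to avoid network access. The required-key list is just what this example relies on, and the function names are hypothetical.

```python
from urllib.parse import urlparse

# Fields this sketch relies on; the Discovery spec defines more.
REQUIRED_KEYS = ("issuer", "jwks_uri", "authorization_endpoint", "token_endpoint")

def discovery_url(issuer: str) -> str:
    # Discovery 1.0: strip any trailing "/" then append the well-known path.
    return issuer.rstrip("/") + "/.well-known/openid-configuration"

def check_discovery(issuer: str, doc: dict) -> dict:
    """Validate a fetched discovery document before the gateway trusts its keys."""
    missing = [k for k in REQUIRED_KEYS if k not in doc]
    if missing:
        raise ValueError(f"discovery document missing {missing}")
    if doc["issuer"] != issuer:  # must be identical to the configured issuer
        raise ValueError("issuer mismatch")
    if urlparse(doc["jwks_uri"]).scheme != "https":
        raise ValueError("jwks_uri must use https")
    return {"issuer": doc["issuer"], "jwks_uri": doc["jwks_uri"]}
```

The issuer check stops a misconfigured or spoofed discovery endpoint from pointing the gateway at someone else’s signing keys.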

3. Implementing the Tool Registry

A tool registry is essential for managing the lifecycle of MCP tools in your environment. This involves:

Creating a database or service to store tool metadata
Defining the registration process for new tools
Implementing the approval workflow for tool access
Integrating the registry with the API gateway

The tool registry will be responsible for maintaining the list of available tools, their endpoints, and their access requirements. It will also provide the necessary information to the API gateway for routing and policy enforcement.

graph TB
    subgraph "Tool Registry"
        DB[(Tool Database)]
        API[Registry API]
        UI[Admin UI]
        
        UI -->|Manage Tools| API
        API -->|CRUD Operations| DB
    end
    
    subgraph "Integration Points"
        Gateway[API Gateway]
        Agents[AI Agents]
        
        API -->|Tool Configurations| Gateway
        API -->|Available Tools| Agents
    end
    
    subgraph "Tool Lifecycle"
        Register[Register]
        Approve[Approve]
        Deploy[Deploy]
        Monitor[Monitor]
        Retire[Retire]
        
        Register --> Approve
        Approve --> Deploy
        Deploy --> Monitor
        Monitor --> Retire
    end
    
    classDef registry fill:#bbf,stroke:#333,stroke-width:1px;
    classDef integration fill:#fbf,stroke:#333,stroke-width:1px;
    classDef lifecycle fill:#bfb,stroke:#333,stroke-width:1px;
    
    class DB,API,UI registry;
    class Gateway,Agents integration;
    class Register,Approve,Deploy,Monitor,Retire lifecycle;

This diagram illustrates the components and lifecycle of a Tool Registry in an MCP environment. The Tool Registry consists of three main components:

Tool Database: Stores metadata about all registered MCP tools, including their endpoints, versions, access requirements, and status.
Registry API: Provides programmatic access to the tool database, enabling CRUD operations on tool registrations.
Admin UI: Allows administrators to manage tools through a user interface, including registration, approval, and monitoring.

The Tool Registry integrates with two key systems:

API Gateway: Receives tool configurations from the registry, which inform routing and policy decisions.
AI Agents: Discover available tools through the registry, based on user permissions and tool status.

The diagram also shows the lifecycle of an MCP tool:

Register: A new tool is registered in the system with its metadata.
Approve: The tool undergoes review and is approved for use by specific user groups.
Deploy: The tool is made available in the production environment.
Monitor: The tool’s usage and performance are monitored.
Retire: When no longer needed, the tool is retired from the system.

This comprehensive approach to tool management ensures that all MCP tools are properly vetted, deployed, and monitored throughout their lifecycle, reducing security risks and operational issues.
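Below is a minimal in-memory sketch of the registry and its lifecycle. It enforces the Register → Approve → Deploy → Monitor → Retire order from the diagram and derives two views from one store: what an agent may discover, and what the gateway should route. It stands in for the Tool Database and Registry API. All names are hypothetical, including `ToolRegistry`, the group-based approval model, and the example endpoint.

```python
from dataclasses import dataclass, field
from enum import Enum


class ToolState(Enum):
    REGISTERED = "registered"
    APPROVED = "approved"
    DEPLOYED = "deployed"
    MONITORED = "monitored"
    RETIRED = "retired"


# Allowed moves, mirroring Register -> Approve -> Deploy -> Monitor -> Retire
TRANSITIONS = {
    ToolState.REGISTERED: {ToolState.APPROVED},
    ToolState.APPROVED: {ToolState.DEPLOYED},
    ToolState.DEPLOYED: {ToolState.MONITORED},
    ToolState.MONITORED: {ToolState.RETIRED},
    ToolState.RETIRED: set(),
}
LIVE = {ToolState.DEPLOYED, ToolState.MONITORED}


@dataclass
class ToolRecord:
    name: str
    endpoint: str
    version: str
    allowed_groups: set = field(default_factory=set)
    state: ToolState = ToolState.REGISTERED


class ToolRegistry:
    """In-memory stand-in for the Tool Database plus Registry API."""

    def __init__(self):
        self._tools = {}

    def register(self, name, endpoint, version):
        if name in self._tools:
            raise ValueError(f"tool {name!r} is already registered")
        self._tools[name] = ToolRecord(name, endpoint, version)

    def advance(self, name, new_state):
        record = self._tools[name]
        if new_state not in TRANSITIONS[record.state]:
            raise ValueError(f"{name!r}: cannot go from {record.state.value} to {new_state.value}")
        record.state = new_state

    def approve(self, name, groups):
        self.advance(name, ToolState.APPROVED)
        self._tools[name].allowed_groups = set(groups)

    def available_tools(self, user_groups):
        """What an agent may discover: live tools approved for one of the user's groups."""
        groups = set(user_groups)
        return sorted(r.name for r in self._tools.values()
                      if r.state in LIVE and r.allowed_groups & groups)

    def gateway_routes(self):
        """Tool name -> backend endpoint for every live tool, for the gateway's routing table."""
        return {r.name: r.endpoint for r in self._tools.values() if r.state in LIVE}


registry = ToolRegistry()
registry.register("ledger_query", "https://tools.internal.example/ledger", "1.0.0")
registry.approve("ledger_query", ["finance"])
registry.advance("ledger_query", ToolState.DEPLOYED)
print(registry.available_tools(["finance"]))  # ['ledger_query']
print(registry.available_tools(["hr"]))       # []
```

A production registry would persist records and push `gateway_routes()` to the gateway on every change. It would also keep earlier versions, so that version tracking and rollback remain possible.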

4. Defining Security Policies

Security policies are the rules that govern access to MCP tools. This involves:

Defining the RBAC policies for tool access
Configuring the content filtering rules for responses
Setting up the audit logging requirements
Implementing the version control policies

The security policies will be enforced by the API gateway based on the user’s identity and the tool being accessed. It’s important to ensure that the policies are comprehensive and aligned with your organization’s security requirements.
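Below is a sketch of the RBAC part of these policies: a default-deny check that the gateway could run on every tool call. It assumes the user's roles come from a token claim (for example a `groups` claim) and that policies are simple role/tool pairs with explicit denies. `Rule` and `is_allowed` are hypothetical names. Content filtering, audit, and version policies are not covered here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Rule:
    role: str
    tool: str          # tool name, or "*" for any tool
    allow: bool = True


def is_allowed(roles: Iterable[str], tool: str, rules: list) -> bool:
    """Default deny: an explicit deny wins; otherwise a matching allow is required."""
    roles = set(roles)
    matching = [r for r in rules if r.role in roles and r.tool in (tool, "*")]
    if any(not r.allow for r in matching):
        return False
    return any(r.allow for r in matching)


RULES = [
    Rule("analyst", "crm_lookup"),
    Rule("admin", "*"),
    Rule("contractor", "crm_lookup", allow=False),
]
print(is_allowed(["analyst"], "crm_lookup", RULES))                # True
print(is_allowed(["analyst", "contractor"], "crm_lookup", RULES))  # False
```

With deny-overrides, a user who holds several roles cannot widen access simply by picking up another role. The trade-off is that explicit denies must be maintained with care.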

5. Integrating AI Agents

Finally, you’ll need to integrate your AI agents with the secure MCP environment. This involves:

Configuring the agents to obtain and use OIDC tokens
Implementing the MCP client functionality
Handling authentication and authorization errors
Managing token refresh and session continuity

The AI agents will be responsible for obtaining the necessary tokens and including them in MCP requests. They’ll also need to handle authentication and authorization errors gracefully, providing appropriate feedback to users.
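On the agent side, token handling can be sketched as a small cache. The cache refreshes the token shortly before it expires and retries once when the gateway returns 401. `TokenCache`, `call_mcp`, and the `fetch`/`send` callables are hypothetical. They stand in for the application's real IdP client and HTTP transport, and the clock is injectable so that the refresh behavior can be tested.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Token:
    access_token: str
    expires_at: float  # epoch seconds


class TokenCache:
    """Holds the agent's access token and refreshes it shortly before expiry.

    `fetch` is a hypothetical callable standing in for however the application
    obtains a token (for example an OAuth refresh_token grant against the IdP).
    """

    def __init__(self, fetch: Callable[[], Token], leeway: float = 60.0,
                 clock: Callable[[], float] = time.time):
        self._fetch = fetch
        self._leeway = leeway
        self._clock = clock
        self._token: Optional[Token] = None

    def get(self) -> str:
        # Refresh if we have no token or it expires within the leeway window
        if self._token is None or self._clock() >= self._token.expires_at - self._leeway:
            self._token = self._fetch()
        return self._token.access_token

    def invalidate(self) -> None:
        self._token = None


def call_mcp(send: Callable[[str], int], cache: TokenCache) -> int:
    """Send one MCP request; on 401 refresh the token once and retry.

    `send` is a hypothetical callable that performs the HTTP request with the
    given bearer token and returns the status code. A 403 is not retried:
    the token is valid but policy denied the call, so the agent should tell
    the user instead.
    """
    status = send(cache.get())
    if status == 401:
        cache.invalidate()
        status = send(cache.get())
    return status
```

The two failure paths are separated on purpose. A 401 is a token problem that the agent can fix silently, while a 403 is an authorization decision that should reach the user as feedback.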

sequenceDiagram
    participant User
    participant Agent as AI Agent
    participant App as Application
    participant IdP as Identity Provider
    participant Gateway as API Gateway
    participant Tool as MCP Tool
    
    User->>App: Access AI application
    App->>IdP: Authenticate user
    IdP->>App: Issue tokens
    
    User->>Agent: Request using AI capabilities
    Agent->>App: Request token for MCP
    App->>Agent: Provide token
    
    Agent->>Gateway: MCP request with token
    Gateway->>Gateway: Validate token & apply policies
    Gateway->>Tool: Forward authorized request
    Tool->>Gateway: Response
    Gateway->>Agent: Return response
    Agent->>User: Present result
    
    Note over App,Gateway: Token refresh cycle
    App->>IdP: Refresh token when needed
    IdP->>App: New access token

This sequence diagram illustrates the integration of AI agents with a secure MCP environment. The process begins when a user accesses the AI application, which authenticates the user with the Identity Provider and receives tokens.

When the user makes a request that requires AI capabilities, the AI agent requests a token from the application, which provides it. The agent then includes this token in its MCP request to the API Gateway.

The gateway validates the token and applies security policies to determine if the request should be allowed. If authorized, the request is forwarded to the appropriate MCP tool, which processes it and returns a response. This response flows back through the gateway to the agent and ultimately to the user.

In the background, the application handles token refresh cycles, requesting new access tokens from the Identity Provider when needed. This ensures continuous operation without requiring the user to re-authenticate frequently.

This integration approach ensures that AI agents operate within the security framework established by the proxy architecture, with all requests properly authenticated and authorized.

Conclusion: Beyond Glorified API Calls

By implementing a secure MCP architecture with an identity-aware proxy, you move far beyond “glorified API calls” to a robust, enterprise-grade integration between AI agents and your business systems. This approach addresses the key security challenges of MCP deployments, including:

User identity and access control
Multi-step context exchanges
Complex delegation chains
Dynamic tool provisioning
Remote MCP changes and version tracking

The proxy-based architecture provides a centralized control point for enforcing security policies, managing tool access, and monitoring AI agent activity. It also simplifies operations by abstracting away the complexity of backend services and providing a consistent interface for AI agents.

As MCP continues to evolve and gain adoption, the security patterns described in this article will become increasingly important for enterprise deployments. By implementing these patterns now, you can ensure that your AI agent infrastructure is secure, scalable, and ready for the future.

graph LR
    A[Glorified API Calls] -->|Evolution| B[Secure MCP Architecture]
    
    subgraph "Key Benefits"
        C[Centralized Security]
        D[Identity Propagation]
        E[Policy Enforcement]
        F[Audit & Compliance]
        G[Operational Simplicity]
    end
    
    B --> C
    B --> D
    B --> E
    B --> F
    B --> G
    
    classDef benefit fill:#bfb,stroke:#333,stroke-width:1px;
    class C,D,E,F,G benefit;

This final diagram summarizes the evolution from “glorified API calls” to a secure MCP architecture, highlighting the key benefits of this approach:

Centralized Security: A single control point for enforcing security policies across all MCP interactions.
Identity Propagation: Consistent handling of user identity and permissions throughout the system.
Policy Enforcement: Fine-grained control over who can access which tools and under what conditions.
Audit & Compliance: Comprehensive logging and monitoring of all MCP activities for security and compliance purposes.
Operational Simplicity: Abstraction of backend complexity, making it easier to manage and evolve the system over time.

By adopting this architecture, organizations can confidently deploy AI agents in enterprise environments, knowing that their MCP interactions are secure, auditable, and manageable at scale. This represents a significant advancement beyond the simplistic view of AI tools as mere API calls, recognizing the complex security requirements of production AI systems.

OpenID Connect for Agents (OIDC-A) 1.0 Proposal

Mon, 28 Apr 2025 00:00:00 +0000

This document proposes a standard extension to OpenID Connect for representing and verifying the identity of LLM-based agents. It integrates the core proposal with detailed frameworks for verification, attestation, and delegation chains.

Abstract

OpenID Connect for Agents (OIDC-A) 1.0 is an extension to OpenID Connect Core 1.0 that provides a framework for representing, authenticating, and authorizing LLM-based agents within the OAuth 2.0 ecosystem. This specification defines standard claims, endpoints, and protocols for establishing agent identity, verifying agent attestation, representing delegation chains, and enabling fine-grained authorization based on agent attributes.

1. Introduction

1.1 Rationale

As LLM-based agents become increasingly prevalent in digital ecosystems, there is a growing need for standardized methods to represent their identity and manage their authorization. Traditional OAuth 2.0 and OpenID Connect protocols were designed primarily for human users and conventional applications, lacking the necessary constructs to represent the unique characteristics of autonomous agents, such as:

Acting on behalf of users with varying degrees of autonomy
Operating within delegation chains
Possessing dynamic capabilities based on their underlying models
Requiring attestation of their integrity and origin

This specification addresses these gaps by extending OpenID Connect to provide a comprehensive framework for agent identity and authorization.

1.2 Terminology

This specification uses the terms defined in OAuth 2.0 [RFC6749], OpenID Connect Core 1.0, and the following additional terms:

Agent: An LLM-based software entity capable of autonomous or semi-autonomous action based on natural language instructions.
Agent Provider: The organization responsible for creating, training, and/or hosting the agent.
Agent Model: The specific LLM that powers the agent (e.g., GPT-4, Claude 3).
Agent Instance: A specific running instance of an agent, typically associated with a particular task or conversation.
Delegator: The entity (typically a human user) who delegates authority to an agent to act on their behalf.
Delegation Chain: A sequence of delegation steps from the original user through potentially multiple agents.
Attestation: Cryptographic proof of an agent’s integrity, origin, and/or properties.
Attestation Evidence: Data structure containing the proof used for attestation.
Relying Party (RP): In this context, often a Resource Server or Client application that needs to verify an agent’s identity and authorization.

1.3 Overview

OIDC-A extends OpenID Connect by:

Defining new standard claims for representing agent identity, delegation, and capabilities.
Specifying mechanisms and formats for agent attestation evidence.
Establishing protocols for representing and validating delegation chains.
Providing discovery mechanisms for agent capabilities and attestation support.
Defining authorization frameworks suitable for agent-specific use cases.
Introducing endpoints for attestation verification and capability discovery.

2. Agent Identity Claims

2.1 Core Agent Identity Claims

Claims marked REQUIRED MUST be included in ID Tokens issued to or about agents; claims marked RECOMMENDED SHOULD be included:

Claim	Type	Description	Requirement
`agent_type`	string	Identifies the type/class of agent (e.g., "assistant", "retrieval", "coding")	REQUIRED
`agent_model`	string	Identifies the specific model (e.g., "gpt-4", "claude-3-opus", "gemini-pro")	REQUIRED
`agent_version`	string	Version identifier of the agent model	RECOMMENDED
`agent_provider`	string	Organization that provides/hosts the agent (e.g., "openai.com", "anthropic.com")	REQUIRED
`agent_instance_id`	string	Unique identifier for this specific instance of the agent	REQUIRED

2.2 Delegation and Authority Claims

Claim	Type	Description	Requirement
`delegator_sub`	string	Subject identifier of the entity who most recently delegated authority to this agent	REQUIRED
`delegation_chain`	array	Ordered array of delegation steps (see Section 2.4.2)	OPTIONAL
`delegation_purpose`	string	Description of the purpose/intent for which authority was delegated	RECOMMENDED
`delegation_constraints`	object	Constraints placed on the agent by the delegator	OPTIONAL

2.3 Capability, Trust, and Attestation Claims

Claim	Type	Description	Requirement
`agent_capabilities`	array	Array of capability identifiers representing what the agent can do	RECOMMENDED
`agent_trust_level`	string	Trust classification of the agent (e.g., "verified", "experimental")	OPTIONAL
`agent_attestation`	object	Attestation evidence or reference (see Section 2.4.4)	RECOMMENDED
`agent_context_id`	string	Identifier for the conversation/task context	RECOMMENDED

2.4 Claim Formats and Validation

2.4.1 `agent_type`

String value from a defined set of agent types. Implementers SHOULD use one of the following values when applicable:

assistant: General-purpose assistant agent
retrieval: Agent specialized in information retrieval
coding: Agent specialized in code generation or analysis
domain_specific: Agent specialized for a particular domain
autonomous: Agent with high degree of autonomy
supervised: Agent requiring human supervision for key actions

Custom types MAY be used but SHOULD follow the format vendor:type (e.g., acme:financial_advisor).

2.4.2 `delegation_chain`

JSON array containing one object per step in the delegation chain, from the original user to the current agent. Each object contains the following members:

iss: REQUIRED. String identifying the Authorization Server or entity that issued/validated this delegation step.
sub: REQUIRED. String identifying the delegator (the entity granting permission).
aud: REQUIRED. String identifying the delegatee (the agent receiving permission).
delegated_at: REQUIRED. NumericDate representing the time the delegation occurred.
scope: REQUIRED. Space-separated string of OAuth scopes representing the permissions granted in this delegation step. MUST be a subset of the scopes held by the delegator (sub).
purpose: OPTIONAL. String describing the intended purpose of this delegation step.
constraints: OPTIONAL. JSON object specifying constraints on the delegation (e.g., {"max_duration": 3600, "allowed_resources": ["/data/abc"]}).
jti: OPTIONAL. A unique identifier for this specific delegation step, useful for revocation or tracking.

The array MUST be ordered chronologically.

Validation Rules for delegation_chain (performed by Relying Party):

Order Verification: Confirm chronological order based on delegated_at.
Issuer Trust: Verify each iss is trusted.
Audience Matching: Confirm aud of step N matches sub of step N+1.
Scope Reduction: Verify that the scope in each step is a subset of (or equal to) the delegator’s available scopes.
Constraint Enforcement: Ensure compliance with any constraints.
Signature Validation (if applicable): Validate signatures if steps are individually signed.
Policy Check: Evaluate the validated chain against authorization policies (e.g., max length).
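The structural checks above can be sketched as follows, using the Appendix B chain as input. One assumption is called out explicitly. Appendix B narrows `calendar` to `calendar:view`, which is not a literal subset of `email calendar`, so this sketch treats scopes as hierarchical with `:` as the separator: a grant of `calendar` also covers `calendar:view`. The proposal does not define this, so substitute your AS's scope semantics. Constraint enforcement and per-step signature validation are left out. `scopes_covered` and `validate_delegation_chain` are hypothetical helpers.

```python
def scopes_covered(granted: set, requested: set) -> bool:
    """True if each requested scope is granted exactly or through a parent scope.

    Assumption (not defined by the proposal): scopes are hierarchical with ':'
    as separator, so a grant of "calendar" also covers "calendar:view".
    """
    def covered(scope):
        parts = scope.split(":")
        return any(":".join(parts[:i]) in granted for i in range(1, len(parts) + 1))
    return all(covered(s) for s in requested)


def validate_delegation_chain(chain, trusted_issuers, user_scopes, max_length=5):
    """Run the structural checks from Section 2.4.2; raise ValueError on failure.

    user_scopes: scopes the original user (sub of the first step) holds.
    Returns the final delegatee and the scopes it ended up with.
    """
    if not chain:
        raise ValueError("empty delegation chain")
    if len(chain) > max_length:  # example policy check: cap the chain length
        raise ValueError("delegation chain is longer than policy allows")
    available = set(user_scopes)
    for i, step in enumerate(chain):
        for member in ("iss", "sub", "aud", "delegated_at", "scope"):
            if member not in step:
                raise ValueError(f"step {i}: missing {member}")
        if step["iss"] not in trusted_issuers:
            raise ValueError(f"step {i}: untrusted issuer {step['iss']}")
        if i > 0:
            prev = chain[i - 1]
            if step["delegated_at"] < prev["delegated_at"]:
                raise ValueError(f"step {i}: out of chronological order")
            if step["sub"] != prev["aud"]:
                raise ValueError(f"step {i}: sub does not match the previous step's aud")
        requested = set(step["scope"].split())
        if not scopes_covered(available, requested):
            raise ValueError(f"step {i}: scope exceeds what the delegator holds")
        available = requested  # the next delegator can pass on only what it received
    return chain[-1]["aud"], available


chain = [
    {"iss": "https://auth.example.com", "sub": "user_456", "aud": "agent_instance_789",
     "delegated_at": 1714348800, "scope": "email calendar"},
    {"iss": "https://auth.example.com", "sub": "agent_instance_789", "aud": "agent_instance_101",
     "delegated_at": 1714348830, "scope": "calendar:view"},
]
print(validate_delegation_chain(chain, {"https://auth.example.com"}, {"email", "calendar", "profile"}))
# ('agent_instance_101', {'calendar:view'})
```

The RP should also confirm that the returned delegatee matches the token's `sub` (or `agent_instance_id`). Otherwise, a valid chain that belongs to another agent could be replayed.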

2.4.3 `agent_capabilities`

Array of string identifiers representing the agent’s capabilities. Implementers SHOULD use capability identifiers from a well-defined taxonomy when available. Custom capabilities SHOULD follow the format vendor:capability (e.g., acme:financial_analysis).

2.4.4 `agent_attestation`

JSON object containing attestation evidence or a reference to it. MUST include a format field indicating the type of evidence.

Recommended Format: JWT-based, potentially compatible with IETF RATS Entity Attestation Token (EAT).

Example:

"agent_attestation": {
  "format": "urn:ietf:params:oauth:token-type:eat",
  "token": "eyJhbGciOiJFUzI1NiIsInR5cCI6ImVhdCtqd3QifQ..."
}

Other formats (e.g., "format": "TPM2-Quote", "format": "SGX-Quote") MAY be used.

3. Protocol Flow

3.1 Agent Authentication Flow

The OIDC-A authentication flow extends the standard OpenID Connect Authentication flow:

Client Registration: Clients representing agents MUST register additional metadata (see Section 4).
Authentication Request: Agents SHOULD include the agent scope and potentially delegation_context.
Authentication Response: The Authorization Server includes agent-specific claims in the ID Token.
Token Validation: RPs MUST validate standard OIDC claims and relevant agent-specific claims (including attestation and delegation if present) according to policy.
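The Authentication Request step can be sketched as an ordinary OIDC authorization-code request whose scope also includes `agent`. The endpoint, client ID, and redirect URI are placeholders. `delegation_context` is left out because its encoding is not defined here, and PKCE is omitted only to keep the sketch short; a real client should add it. `build_agent_auth_request` is a hypothetical helper.

```python
import secrets
from urllib.parse import urlencode


def build_agent_auth_request(authorization_endpoint: str, client_id: str, redirect_uri: str):
    """Build an OIDC authorization request URL that asks for the `agent` scope.

    Returns (url, state, nonce); the caller keeps state and nonce to check the
    authorization response and the resulting ID Token.
    """
    state = secrets.token_urlsafe(16)
    nonce = secrets.token_urlsafe(16)
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": "openid agent",  # "openid" from OIDC Core, "agent" from this proposal
        "state": state,
        "nonce": nonce,
    }
    return f"{authorization_endpoint}?{urlencode(params)}", state, nonce


url, state, nonce = build_agent_auth_request(
    "https://auth.example.com/authorize", "client_123", "https://agent.example.com/callback")
print(url)
```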

3.2 Delegation Flow

When an agent is delegated authority:

The delegator authenticates and authorizes the delegation.
The Authorization Server issues a new ID Token to the agent including delegator_sub, delegation_chain (updated), delegation_purpose, and constrained scope.
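On the Authorization Server side, the delegation step can be sketched as a function that appends one step and refuses to widen scope. It fills the `iss`, `sub`, `aud`, `delegated_at`, `scope`, `purpose`, and `jti` members from Section 2.4.2. The sketch assumes hierarchical scopes (`calendar` covers `calendar:view`), which the proposal does not define. It also assumes that only the current delegatee (the last step's `aud`) may delegate further. `extend_delegation_chain` is a hypothetical helper. The new ID Token would carry the returned chain and set `delegator_sub` to the delegator.

```python
import time
import uuid


def _covered(granted: set, requested: set) -> bool:
    # Assumption: hierarchical scopes with ':' as separator ("calendar" covers "calendar:view")
    def ok(scope):
        parts = scope.split(":")
        return any(":".join(parts[:i]) in granted for i in range(1, len(parts) + 1))
    return all(ok(s) for s in requested)


def extend_delegation_chain(chain, *, iss, delegator, delegatee, scope,
                            delegator_scopes=None, purpose=None, now=None):
    """Return a new delegation_chain with one more step; never widens scope.

    For the first step, pass the user's scopes as delegator_scopes; after that,
    only the current delegatee (the last step's aud) may delegate further.
    """
    if chain:
        last = chain[-1]
        if last["aud"] != delegator:
            raise ValueError("only the current delegatee can delegate onward")
        available = set(last["scope"].split())
    elif delegator_scopes is not None:
        available = set(delegator_scopes)
    else:
        raise ValueError("the first step needs the delegator's own scopes")
    if not _covered(available, set(scope.split())):
        raise ValueError("requested scope exceeds what the delegator holds")
    step = {
        "iss": iss,
        "sub": delegator,
        "aud": delegatee,
        "delegated_at": int(time.time() if now is None else now),
        "scope": scope,
        "jti": str(uuid.uuid4()),  # lets the AS track or revoke this step
    }
    if purpose:
        step["purpose"] = purpose
    return chain + [step]  # appending keeps the array in chronological order
```

The function returns a new list rather than mutating its input, so a rejected delegation never leaves a half-updated chain behind.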

3.3 Attestation Verification Flow

To verify an agent’s attestation:

The agent includes the agent_attestation claim in its ID Token or provides evidence separately.
The RP validates the evidence based on the specified format:
- Verify cryptographic signatures using trusted keys (obtained via Discovery).
- Compare platform measurements against known-good values.
- Validate nonces to prevent replay attacks.
- Optionally, use the agent_attestation_endpoint for validation assistance.
Authorization decisions incorporate the attestation status (e.g., verified: true/false).
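The nonce, freshness, and known-good-measurement checks can be sketched independently of any evidence format. In this sketch, the RP issues a nonce, accepts it at most once, rejects stale evidence, and compares the reported measurement against an allow-list. Signature verification with keys from Discovery is deliberately not shown, because it depends on the format (EAT, TPM2-Quote, SGX-Quote). `AttestationChecker`, the in-memory nonce store, and the `sha256:...` measurement strings are hypothetical.

```python
import secrets
import time


class AttestationChecker:
    """Freshness, replay, and measurement checks for attestation evidence.

    Intended to run after the evidence signature has been verified; that
    step is format-specific and not shown here.
    """

    def __init__(self, known_good_measurements, max_age_seconds=300, clock=time.time):
        self._known_good = set(known_good_measurements)
        self._max_age = max_age_seconds
        self._clock = clock
        self._outstanding = {}  # nonce -> time issued (in-memory, single process)

    def issue_nonce(self) -> str:
        nonce = secrets.token_urlsafe(16)
        self._outstanding[nonce] = self._clock()
        return nonce

    def check(self, nonce: str, measurement: str) -> bool:
        issued_at = self._outstanding.pop(nonce, None)  # pop: a nonce is accepted at most once
        if issued_at is None:
            return False  # unknown or replayed nonce
        if self._clock() - issued_at > self._max_age:
            return False  # stale evidence
        return measurement in self._known_good
```

The boolean result maps directly onto the `verified: true/false` status that the authorization decision consumes. A multi-instance RP would need a shared nonce store instead of the in-process dictionary.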

4. Client Registration and Discovery

4.1 Agent Client Registration Metadata

Extends OAuth 2.0 Dynamic Client Registration [RFC7591]:

Parameter	Type	Description
`agent_provider`	string	Identifier of the agent provider
`agent_models_supported`	array	List of supported agent models
`agent_capabilities`	array	List of agent capabilities
`attestation_formats_supported`	array	List of supported attestation formats
`delegation_methods_supported`	array	List of supported delegation methods

4.2 Discovery Metadata

Extends OpenID Connect Discovery 1.0:

Parameter	Type	Description
`agent_attestation_endpoint`	string	URL of the attestation endpoint
`agent_capabilities_endpoint`	string	URL of the capabilities discovery endpoint
`agent_claims_supported`	array	List of supported agent claims
`agent_types_supported`	array	List of supported agent types
`delegation_methods_supported`	array	List of supported delegation methods
`attestation_formats_supported`	array	List of supported attestation formats
`attestation_verification_keys_endpoint`	string	URL to retrieve public keys for verifying attestation signatures
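Before relying on agent claims or attestation, an RP might confirm that the provider advertises the features it needs. The sketch below fetches metadata from the standard OpenID Connect Discovery location and checks it against the parameters listed above. `fetch_discovery` and `missing_agent_support` are hypothetical helpers, and the example metadata is invented for illustration.

```python
import json
import urllib.request


def fetch_discovery(issuer: str) -> dict:
    """Fetch OpenID Provider metadata from the standard discovery location."""
    url = issuer.rstrip("/") + "/.well-known/openid-configuration"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def missing_agent_support(metadata: dict, needed_claims=(), needed_formats=()) -> list:
    """List the OIDC-A features this RP needs that the provider does not advertise."""
    problems = []
    claims = set(metadata.get("agent_claims_supported", []))
    problems += [f"claim {c} not supported" for c in needed_claims if c not in claims]
    formats = set(metadata.get("attestation_formats_supported", []))
    problems += [f"attestation format {f} not supported" for f in needed_formats if f not in formats]
    if needed_formats and "attestation_verification_keys_endpoint" not in metadata:
        problems.append("no attestation_verification_keys_endpoint to fetch keys from")
    return problems


example_metadata = {
    "issuer": "https://auth.example.com",
    "agent_claims_supported": ["agent_type", "agent_model", "delegation_chain"],
    "attestation_formats_supported": ["urn:ietf:params:oauth:token-type:eat"],
}
print(missing_agent_support(example_metadata, needed_claims=["agent_type"]))  # []
```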

5. Endpoints

5.1 Agent Attestation Endpoint

An OAuth 2.0 protected resource that returns attestation information about an agent or assists in validating provided evidence. URL advertised via agent_attestation_endpoint discovery parameter.

5.1.1 Request Example (Get Info)

GET /agent/attestation?agent_id=123&nonce=abc
Authorization: Bearer <token>

5.1.2 Response Example

{
  "verified": true,
  "provider": "openai.com",
  "model": "gpt-4",
  "version": "2025-03",
  "attestation_timestamp": 1714348800,
  "attestation_signature": "..."
}

5.2 Agent Capabilities Endpoint

Provides information about an agent’s capabilities. URL advertised via agent_capabilities_endpoint discovery parameter.

5.2.1 Request Example

GET /.well-known/agent-capabilities

5.2.2 Response Example

{
  "capabilities": [
    {"id": "text_generation", "description": "..."},
    {"id": "code_generation", "description": "..."}
  ],
  "supported_constraints": ["max_tokens", "allowed_tools"]
}

6. Security Considerations

6.1 Agent Authentication

Agents SHOULD use strong, asymmetric methods (JWT Client Auth [RFC7523], mTLS [RFC8705]), potentially combined with attestation. Shared secrets are NOT RECOMMENDED.

6.2 Delegation Security

Systems MUST validate the entire delegation chain, enforce scope reduction, implement consent mechanisms, and consider time-bounding. Policies may limit chain length. Robust revocation mechanisms are needed.

6.3 Attestation Security

Requires secure management of signing keys, robust nonce handling, trustworthy known-good measurements, secure endpoints, and protection against replay attacks. Attestation evidence may have privacy implications.

6.4 Token Security

ID Tokens with agent claims SHOULD be encrypted. Access tokens SHOULD have limited lifetimes. Refresh tokens for agents require careful consideration.

7. Privacy Considerations

Implementations MUST consider potential correlation of agent identity, privacy implications of delegation chains, user consent requirements, and data minimization in claims.

8. Compatibility and Versioning

OIDC-A 1.0 is designed for compatibility with OAuth 2.0 [RFC6749], OIDC Core 1.0, JWT [RFC7519], and related RFCs. Future versions will aim for backward compatibility.

9. References

[RFC6749] The OAuth 2.0 Authorization Framework
[RFC7519] JSON Web Token (JWT)
[RFC7523] JWT Profile for OAuth 2.0 Client Authentication
[RFC7591] OAuth 2.0 Dynamic Client Registration
[RFC7662] OAuth 2.0 Token Introspection
[RFC8705] OAuth 2.0 Mutual-TLS Client Authentication
[OpenID Connect Core 1.0]
[OpenID Connect Discovery 1.0]
[IETF RATS] Remote Attestation Procedures Architecture

Appendix A: Example ID Token with Agent Claims

{
  "iss": "https://auth.example.com",
  "sub": "agent_instance_789",
  "aud": "client_123",
  "exp": 1714435200,
  "iat": 1714348800,
  "auth_time": 1714348800,
  "nonce": "n-0S6_WzA2Mj",
  "agent_type": "assistant",
  "agent_model": "gpt-4",
  "agent_version": "2025-03",
  "agent_provider": "openai.com",
  "agent_instance_id": "agent_instance_789",
  "delegator_sub": "user_456",
  "delegation_purpose": "Email management assistant",
  "agent_capabilities": ["email:read", "email:draft", "calendar:view"],
  "agent_trust_level": "verified",
  "agent_context_id": "conversation_123",
  "agent_attestation": {
    "format": "urn:ietf:params:oauth:token-type:eat",
    "token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...",
    "timestamp": 1714348800
  },
  "delegation_chain": [
    {
      "iss": "https://auth.example.com",
      "sub": "user_456",
      "aud": "agent_instance_789",
      "delegated_at": 1714348700,
      "scope": "email profile calendar"
    }
  ]
}

Appendix B: Example Delegation Chain (Multi-step)

"delegation_chain": [
  {
    "iss": "https://auth.example.com",
    "sub": "user_456",
    "aud": "agent_instance_789",
    "delegated_at": 1714348800,
    "scope": "email calendar",
    "purpose": "Manage my emails and calendar"
  },
  {
    "iss": "https://auth.example.com",
    "sub": "agent_instance_789",
    "aud": "agent_instance_101",
    "delegated_at": 1714348830,
    "scope": "calendar:view",
    "purpose": "Analyze available time slots"
  }
]

AI Agents and Agentic Security: The Next Frontier in Enterprise Automation

Tue, 10 Dec 2024 00:00:00 +0000

Traditional automation tools like Robotic Process Automation (RPA) and Integration Platform as a Service (iPaaS) have long served as the backbone of enterprise workflows. These systems, designed to automate repetitive tasks and connect disparate software tools, have delivered undeniable value. However, their inherent limitations are becoming increasingly evident. They require significant manual setup, often break when systems change, and struggle to handle unstructured data such as documents, emails, or images.

Enter AI agents — a revolutionary leap from static, rule-based automation to intelligent, adaptable systems. AI agents promise to overcome the constraints of traditional tools, paving the way for smarter, more efficient enterprise automation. An excellent breakdown of their significance can be found in the insightful Menlo Ventures article “Beyond Bots: How AI Agents Are Driving the Next Wave of Enterprise Automation”.

The Shift from Automation to Intelligence

AI agents represent a fundamental paradigm shift. Unlike their predecessors, these systems are not bound by rigid rules or pre-defined workflows. Instead, they possess the ability to learn, adapt, and make decisions based on changing circumstances. This adaptability enables them to address dynamic and complex tasks, unlocking unprecedented levels of efficiency and scalability.

However, this evolution introduces a new layer of complexity: agentic security. As AI agents grow more autonomous, ensuring their security, transparency, and trustworthiness becomes paramount, particularly in multi-agent environments where multiple AI systems must collaborate. This shift necessitates rethinking how we secure enterprise automation systems to ensure they remain robust and trustworthy in a rapidly evolving landscape.

The Imperative of Agentic Security

Agentic security involves safeguarding intelligent, autonomous systems while maintaining their transparency and reliability. It becomes especially critical in environments where multiple AI agents operate simultaneously, managing dynamic processes and sensitive data. Key considerations for agentic security include:

Dynamic Adaptability with Robust Security

AI agents excel at adjusting to system changes, but their adaptability must not come at the expense of enterprise security. In multi-agent environments, secure communication protocols and strong authentication mechanisms form the foundation of security. However, static security measures alone are insufficient. Evolving contexts require context-aware security — a system that dynamically adjusts access controls and agent behavior based on situational needs and data sensitivities. This mitigates risks such as unauthorized escalations, prompt injection attacks, and data breaches.

For example, a financial reporting agent, which has access to internal financial metrics, should be able to generate a detailed report for C-suite agents while maintaining strict data boundaries. If an HR agent requests information about salaries, the financial agent should only provide relevant, pre-approved metrics, such as aggregated departmental budgets, rather than individual salary slips. This ensures that agents respect organizational boundaries and adhere to context-aware security protocols.
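The data boundary in this example can be sketched as a default-deny policy keyed on which agent is asking. The C-suite agent may receive individual rows, while the HR agent only receives totals per department. The payroll data, agent names, and `ALLOWED_VIEWS` policy are all hypothetical, and per-department salary totals stand in for the "aggregated departmental budgets" above. The sketch assumes the requester's identity has already been authenticated.

```python
# Hypothetical internal data visible to the financial reporting agent
SALARIES = [
    {"employee": "e1", "department": "engineering", "salary": 150_000},
    {"employee": "e2", "department": "engineering", "salary": 120_000},
    {"employee": "e3", "department": "sales", "salary": 90_000},
]

# Assumed policy: which views each requesting agent may receive
ALLOWED_VIEWS = {
    "csuite_agent": {"salary_detail", "department_totals"},
    "hr_agent": {"department_totals"},
}


def department_totals(rows):
    """Aggregate payroll by department so no individual salary leaves the boundary."""
    totals = {}
    for row in rows:
        totals[row["department"]] = totals.get(row["department"], 0) + row["salary"]
    return totals


def answer(requester: str, view: str):
    """Serve a view only if policy allows it for this requester (default deny)."""
    if view not in ALLOWED_VIEWS.get(requester, set()):
        raise PermissionError(f"{requester} may not receive {view}")
    if view == "department_totals":
        return department_totals(SALARIES)
    return [dict(row) for row in SALARIES]  # salary_detail: full rows


print(answer("hr_agent", "department_totals"))  # {'engineering': 270000, 'sales': 90000}
```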

In cross-enterprise collaborations, where AI agents from different organizations interact, maintaining the integrity of each participant’s systems is essential. Context-aware security ensures that agents respect boundaries and operate within predefined limits, even as they adapt to new information or changing environments.

Transparent Decision-Making and Accountability

As AI agents take on more critical roles in enterprise processes, transparency and accountability become non-negotiable. Organizations must implement mechanisms to trace and audit agent decisions, ensuring they align with business objectives and ethical standards. This is particularly important in regulated industries, where compliance requirements demand a clear understanding of how and why decisions are made.

Trust in Multi-Agent Collaboration

In scenarios where multiple agents collaborate, trust is the cornerstone of effective operation. Agents must communicate securely, share information responsibly, and resolve conflicts without compromising the integrity of the broader system. Establishing trust requires robust encryption, tamper-proof logs, and mechanisms for conflict resolution to prevent unintended behaviors or system failures.
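One common way to make agent logs resistant to tampering is a hash chain, in which each entry commits to the hash of the previous entry. The sketch below is one such construction, not a prescribed design. Strictly, it gives tamper *evidence* rather than tamper proofing: editing or reordering entries breaks the chain, but an attacker who rewrites the whole log, or truncates its tail, is only caught if the latest hash is also anchored somewhere they cannot reach. `AuditLog` and the record fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64


def _digest(prev_hash: str, record: dict) -> str:
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only log in which every entry commits to the hash of the one before."""

    def __init__(self):
        self.entries = []

    def append(self, record: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        snapshot = json.loads(json.dumps(record))  # deep copy so later edits by the caller don't leak in
        entry = {"record": snapshot, "prev": prev, "hash": _digest(prev, snapshot)}
        self.entries.append(entry)
        return entry["hash"]

    def verify(self) -> bool:
        """Recompute the chain; an edited or reordered entry makes this return False."""
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["record"]):
                return False
            prev = entry["hash"]
        return True


log = AuditLog()
log.append({"agent": "finance_agent", "action": "send_report", "to": "csuite_agent"})
log.append({"agent": "hr_agent", "action": "request_totals", "to": "finance_agent"})
print(log.verify())  # True
```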

The Path Forward

AI agents represent the next frontier in enterprise automation, promising smarter, faster, and more scalable workflows. However, their increasing sophistication demands a proactive approach to agentic security. As organizations embrace these intelligent systems, they must prioritize building trust, safeguarding data, and ensuring transparency to foster sustainable innovation.

The Menlo Ventures article encapsulates this beautifully: AI agents are not just tools — they are collaborators, reshaping how enterprises operate. But with great power comes great responsibility. By addressing the challenges of agentic security, we can unlock the full potential of AI agents while preserving the integrity and trust that underpin modern enterprises.

A feat of strength MVP for AI Apps

Tue, 20 Feb 2024 00:00:00 +0000

A minimum viable product (MVP) is a version of a product with just enough features to be usable by early customers, who can then provide feedback for future product development.

Today I want to focus on what that looks like for shipping AI applications. To do that, we only need to understand 4 things.

What does 80% actually mean?
What segments can we serve well?
Can we double down?
Can we educate the user about the segments we don’t serve well?

The Pareto principle, also known as the 80/20 rule, still applies but in a different way than you might think.

What is an MVP?

An analogy I often use to help understand this concept is as follows: You need something to help get from point A to point B. Maybe the vision is to have a car. However, the MVP is not a chassis without wheels or an engine. Instead, it might look like a skateboard. You’ll ship and realize the product needs brakes or steering. So then you ship a scooter. Afterwards, you figure out the scooter needs more leverage, so you add larger wheels and end up with a bicycle. Limited by the force you can apply as a human being, you start thinking about motors and can branch out into mopeds, e-bikes, and motorcycles. Then, one day, you ship the car.

Consider the 80/20 rule

When people talk about something being 80% done or 80% ready, they usually mean it in a traditional software sense rather than a machine-learning one. In that context, each component is deterministic, so 80% translates to 8 out of 10 features being complete. Once the remaining 2 features are ready, we can ship the product. If we follow the 80/20 rule, we might even ship with 80% of the features and add the remaining 20% later, like a car without a radio or air conditioning. However, the meaning of 80% can vary significantly, and this definition may not apply to an AI-powered application.

The issue with summary statistics

The image above is an example of Anscombe’s quartet: four datasets that have nearly identical simple descriptive statistics yet very different distributions and appearances. It is a classic illustration of why summary statistics can be misleading.

Consider the following example:

Query_id	Score
1	0.9
2	0.8
3	0.9
4	0.9
5	0.0
6	0.0

The average score is about 0.58. However, if we analyze the queries within segments, we might discover that we are serving most of them exceptionally well!

Admitting what you’re bad at

Being honest with what you’re bad at is a great way to build trust with your users. If you can accurately identify when something will perform poorly and confidently reject it, then you might be ready to ship a great product while educating your users about the limitations of your application.

It is very important to understand the limitations of your system and to characterize its behavior beyond summary statistics, because not all systems are made equal. A probabilistic system could behave very differently from the previous example. Consider the following dataset:

Query_id	Score
1	0.59
2	0.58
3	0.59
4	0.57

This system also averages about 0.58, but because every query scores roughly the same, there is no obvious subset of requests to reject…

Learning to say no

Consider a RAG application where a large proportion of the queries are timeline questions. If our search engine cannot filter by time, we will likely perform poorly on them.

Query_id	Score	Query Type
1	0.9	text search
2	0.8	text search
3	0.9	news search
4	0.9	news search
5	0.0	timeline
6	0.0	timeline
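The gap between the overall mean and the per-segment picture takes only a few lines to compute. The sketch below uses the scores from the table above. It also includes the flat system from the earlier example, which has the same mean but no low-scoring segment to carve off. Only the Python standard library is used, and `segment_means` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

# (query_id, score, query_type) from the table above
RESULTS = [
    (1, 0.9, "text search"),
    (2, 0.8, "text search"),
    (3, 0.9, "news search"),
    (4, 0.9, "news search"),
    (5, 0.0, "timeline"),
    (6, 0.0, "timeline"),
]


def segment_means(rows):
    """Mean score per query type, rounded for display."""
    groups = defaultdict(list)
    for _query_id, score, segment in rows:
        groups[segment].append(score)
    return {segment: round(mean(scores), 2) for segment, scores in groups.items()}


overall = round(mean(score for _, score, _ in RESULTS), 2)
print(overall)                 # 0.58
print(segment_means(RESULTS))  # {'text search': 0.85, 'news search': 0.9, 'timeline': 0.0}

# The flat system: a similar mean, but a tiny spread and no segment to carve off
flat = [0.59, 0.58, 0.59, 0.57]
print(round(mean(flat), 2), round(max(flat) - min(flat), 2))
```

The same mean hides two very different situations: one segment that fails completely and can be declined, versus uniformly mediocre performance that cannot be fixed by saying no.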

If we’re in a pinch to ship, we could simply build a classification model that detects whether a question is a timeline question and shows a warning. Instead of constantly pushing the algorithm to do better, we can educate the user through the way we design the product.
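Below is a minimal sketch of that "detect and warn" path. A keyword regex stands in for the classifier; in practice, this could be a trained model or an LLM call. Each response is tagged with its segment so that refusals can be counted. The patterns, warning text, and `answer_query` function are hypothetical, and `search` stands in for the existing retrieval pipeline.

```python
import re
from typing import Callable

# Hypothetical keyword heuristic standing in for a trained classifier or an LLM call
TIMELINE_PATTERN = re.compile(
    r"\b(timeline|chronolog\w*|when did|since \d{4}|between \d{4} and \d{4}|"
    r"last (?:week|month|quarter|year))\b",
    re.IGNORECASE,
)

UNSUPPORTED_WARNING = (
    "Questions about what happened over a period of time aren't supported yet. "
    "Try asking about a specific document or topic instead."
)


def answer_query(query: str, search: Callable[[str], str]) -> dict:
    """Route timeline questions to an honest warning instead of a bad answer."""
    if TIMELINE_PATTERN.search(query):
        # Tag the segment so observability can track how often we say no
        return {"segment": "timeline", "answer": None, "warning": UNSUPPORTED_WARNING}
    return {"segment": "search", "answer": search(query), "warning": None}


print(answer_query("Show me the timeline of the ACME acquisition", lambda q: "results")["segment"])
# timeline
```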

Detecting segments

Detecting these segments could be accomplished in various ways. We could construct a classifier or employ a language model to categorize them. Additionally, we can utilize clustering algorithms with the embeddings to identify common groups and potentially analyze the mean scores within each group. The sole objective is to identify segments that can enhance our understanding of the activities within specific subgroups.

One of the worst things you can do is spend months building a feature that only marginally improves your metrics while ignoring a more important segment of your user base.

We can redesign our application around its limitations and identify the types of tasks we should decline. Doing so can improve performance on the tasks we do accept. If we feed this segment data into some kind of in-system observability, we can safely monitor what proportion of questions are being turned down and prioritize our work to maximize coverage.
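Here is a minimal sketch of that bookkeeping. It assumes each request has already been assigned a segment and a decision to accept or decline it. The `DeclineMonitor` class is hypothetical and keeps its counts in memory. In a real deployment, these events would go to whatever observability tooling you already use.

```python
from collections import Counter
from typing import List, Optional, Tuple

class DeclineMonitor:
    """Hypothetical in-memory tally of accepted vs declined queries per segment."""

    def __init__(self) -> None:
        self.total: Counter = Counter()
        self.declined: Counter = Counter()

    def record(self, segment: str, declined: bool) -> None:
        self.total[segment] += 1
        if declined:
            self.declined[segment] += 1

    def decline_rate(self, segment: Optional[str] = None) -> float:
        """Share of queries turned down, overall or for one segment."""
        if segment is None:
            total, declined = sum(self.total.values()), sum(self.declined.values())
        else:
            total, declined = self.total[segment], self.declined[segment]
        return declined / total if total else 0.0

    def report(self) -> List[Tuple[str, int, float]]:
        """(segment, declined count, decline rate), most-declined segments first."""
        rows = [(seg, self.declined[seg], self.decline_rate(seg)) for seg in self.total]
        return sorted(rows, key=lambda row: row[1], reverse=True)

monitor = DeclineMonitor()
for segment, declined in [("timeline", True), ("timeline", True),
                          ("text search", False), ("news search", False)]:
    monitor.record(segment, declined)
print(f"overall decline rate: {monitor.decline_rate():.0%}")
for row in monitor.report():
    print(row)
```

Sorting by the declined count points you at the segment whose support would win back the most queries.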

Figure out what you’re actually trying to do before you do it

One of the dangerous things I’ve noticed working with startups is that we often assume the AI will simply work for everything. As a result, we try to serve a large, general application without much thought about what exactly we want to accomplish.

In my opinion, most of these companies should focus on one or two significant areas and find a good niche to target. If your app is good at one or two tasks, you should have no trouble finding one or two hundred users to test your application and give you feedback quickly. If your application is good at nothing, on the other hand, it will be hard to be memorable or to provide something people use repeatedly. You might get some virality, but you will quickly lose the trust of your users and find yourself fighting churn.

Since we can front-load the work by using GPT-4 to make predictions, time to feedback becomes very important. If we can get feedback quickly, we can iterate quickly, and if we can iterate quickly, we can build a better product.

Final thoughts

The MVP for an AI application is not as simple as shipping a product with 80% of the features. It requires a deep understanding of the user segments you can serve well and the ability to educate your users about the segments you don’t. Understanding the limitations of your system, niching down, and identifying your feats of strength let you build a product that is memorable and gets repeated use. That lets you get feedback and iterate quickly, ultimately leading to a better product.

The Nockout Story

Thu, 11 Jan 2024 00:00:00 +0000

As the co-founders of Nockout, Yash and I, Subramanya, have been on a quest to solve a problem that plagues every sports enthusiast: finding the right place and the right people for playing sports. Our personal struggles with organizing sports activities have led us to create a platform that not only eases these challenges but also promotes a sense of community among sports lovers.

The Problem: A Universal Challenge

Our frustrations weren’t unique. Across the globe, from tennis courts to basketball hoops, sports enthusiasts were grappling with the same issues: finding the right venue and the right people to play with. This global dilemma was evident in the shared experiences voiced through numerous tweets and conversations among the community.

Bay Club is pretty good. But also trying to find a reliable way to find players is hard (even using PyC).
— Gautam (@gautamtata) January 1, 2024

You should move to New York, where it's even more difficult! https://t.co/c8RjpPzW9x
— Awais Hussain (@Ahussain4) January 1, 2024

someone create an app that shows all public basketball courts and whether or not people are at them or not. this would save a lot of time for me lol.
— thao 🍉 (@holycowitsthao) March 18, 2021

I have wanted pickup hoops forever
— Rob Kornblum (@rkorny) July 5, 2021

These tweets underscore the need for a platform like Nockout.

Our Solution: Introducing Nockout

Nockout is more than just an app; it’s a revolution in the sports community. Designed to be intuitive and user-friendly, it addresses key challenges:

Venue Discovery: The app shows you all available sports facilities nearby. Whether it’s a public basketball court or a private soccer field, Nockout has you covered.
Skill-Based Activity Matching: Our platform intuitively recommends players whose skills align with yours, ensuring you can join in on sporting activities that suit your preferences and proficiency in your chosen sport. After all, it’s all about fair play and good competition.
Intuitive Process: We’ve designed Nockout to be user-friendly. The booking process is straightforward, and finding players is hassle-free.

The Impact: Fostering Community and Fair Play

Nockout transcends being a mere application; it’s about building a community bound by the love of sports. It encourages fair play, connects like-minded individuals, and rekindles the joy in sports.

Looking Ahead: The Future of Nockout

Our vision for Nockout is expansive and all-encompassing:

Creating Spaces for Teams: Developing private areas for teams and groups to interact and bond.
Expanding Community Features: Introducing a platform for sharing triumphs and experiences.
Accessible Coaching and Activities: Offering a range of activities and coaching sessions for all skill levels and interests.
Streamlined Payments and Management: Enhancing the booking and payment process for a smooth user experience.
Personalized Athletic Journey: Providing tailored advice for sports and nutrition, alongside a comprehensive sports marketplace.

Join the Revolution

Be part of a movement that’s reshaping the sports landscape. Sign up for early beta access at Nockout.co, and connect with us on Instagram, LinkedIn, and Twitter. Together, let’s make sports accessible and enjoyable for everyone!

Enhancing Document Interactions - Leveraging the synergy of Google Cloud Platform, Pinecone, and LLM in Natural Language Communication

Sat, 10 Jun 2023 00:00:00 +0000

High-level view of system design with Document AI, OpenAI, Pinecone

In today’s digital era, accessing crucial information from government documents can be overwhelming and time-consuming due to their scanned and non-digitized formats. To address this issue, there is a need for an innovative tool that simplifies navigation, scanning, and digitization of these documents, making them easily readable and searchable. This user-friendly solution will revolutionize the way people interact with government documents, leading to better decision-making, improved public services, and a more informed and engaged citizenry. Developing such a tool is essential for ensuring transparency and accessibility of vital information in the modern world.

To achieve our goal, we will follow a systematic approach consisting of the following steps:

We will use the Document AI API provided by Google Cloud Platform to convert PDF and image documents into text. This step extracts the textual content from the documents, making it easier to process and analyze.
Next, we will use a language model (OpenAI’s text-embedding-ada-002 embedding model) to generate embeddings for each chunk of extracted text. These embeddings capture the semantic meaning of the text, enabling us to compare documents based on their content.
To optimize retrieval, we will store the embeddings in Pinecone, a vector database with indexing and similarity search. This lets us quickly find the chunks that most closely match a user’s query.
Finally, our tool answers user queries by retrieving the most relevant chunks and passing them as context to a chat model that generates the answer.
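The following sketch shows what the index does at query time, without any cloud services. It assumes toy, hand-written three-dimensional vectors in place of real embeddings. The hypothetical `InMemoryIndex` class stands in for Pinecone. It ranks stored chunks by cosine similarity to the query vector and drops anything below a similarity threshold.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]

def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class InMemoryIndex:
    """Hypothetical stand-in for a vector index such as Pinecone."""

    def __init__(self) -> None:
        self._items: Dict[str, Tuple[Vector, str]] = {}

    def upsert(self, item_id: str, vector: Vector, text: str) -> None:
        # Insert or overwrite the vector together with its source text
        self._items[item_id] = (vector, text)

    def query(self, vector: Vector, top_k: int = 3, threshold: float = 0.0):
        """Return up to top_k (score, id, text) tuples with score >= threshold, best first."""
        scored = [
            (cosine_similarity(vector, v), item_id, text)
            for item_id, (v, text) in self._items.items()
        ]
        scored = [hit for hit in scored if hit[0] >= threshold]
        scored.sort(key=lambda hit: hit[0], reverse=True)
        return scored[:top_k]

# Toy "embeddings": each axis loosely stands for one topic
index = InMemoryIndex()
index.upsert("ration", [1.0, 0.0, 0.0], "Ration card eligibility rules")
index.upsert("tax", [0.0, 1.0, 0.0], "Property tax payment schedule")
index.upsert("land", [0.0, 0.0, 1.0], "Land record correction process")

for score, item_id, text in index.query([0.9, 0.1, 0.0], top_k=2, threshold=0.5):
    print(f"{score:.3f} {item_id}: {text}")
```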

To demonstrate this process, we scraped documents from the Karnataka Resident Data Hub (KRDH).

Demo: Building a question-answering system for government documents using Document AI, OpenAI, Pinecone, and Flask

1. Setting Up Google Cloud Platform - Document AI

Document AI is a document understanding platform that converts unstructured data from documents into structured data, making it easier to comprehend, analyze, and utilize. To set up Document AI in your Google Cloud Platform (GCP) Console, follow these steps:

Enable the Document AI API.
Create a service account:
- Navigate to the create service account page in the Google Cloud console.
- Choose your project.
- Enter a name in the Service account name field. The Google Cloud console will automatically fill in the Service account ID field based on this name.
- Click Create and continue.
- Grant the Project > Owner role to your service account to provide access to your project.
- Click Continue.
- Click Done to complete the service account creation process. (Do not close your browser window, as you will need it in the next step.)
Create a service account key:
- In the Google Cloud console, click the email address for the service account you created.
- Click Keys.
- Click Add key, then click Create new key.
- Click Create. A JSON key file will be downloaded to your computer.
- Click Close.
Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the JSON file containing your service account key. This variable applies only to your current shell session, so if you open a new session, you will need to set the variable again.

Install the Client Library:

  pip install --upgrade google-cloud-documentai 

Create a Processor:
- In the Document AI section of the Google Cloud console, go to the Processors page.
- Click +Create processor.
- Choose the processor type you want to create from the list.
- In the Create processor window, specify a processor name.
- Select your desired region from the list.
- Click Create to generate your processor.
- Take note of the Processor ID and location.

After completing these steps, you are ready to use the Document AI API in your code.

import os

from google.api_core.client_options import ClientOptions
from google.cloud import documentai

# Replace these with the Project ID, Location and Processor ID noted when creating the processor
project_id = "YOUR_PROJECT_ID"
location = "us"  # processor region, e.g. "us" or "eu"
processor_id = "YOUR_PROCESSOR_ID"

# Supported file extensions and their MIME types
MIME_TYPES = {
    "pdf": "application/pdf",
    "png": "image/png",
    "jpg": "image/jpeg",
    "jpeg": "image/jpeg",
}

def convert_pdf_images_to_text(file_path: str) -> str:
    """
    Convert PDF or image file containing text into plain text using Google Document AI.
    Args:
        file_path (str): The file path of the PDF or image file.

    Returns:
        str: The extracted plain text from the input file.

    Raises:
        ValueError: If the file extension is not a supported PDF or image type.
    """
    extension = os.path.splitext(file_path)[1].lstrip(".").lower()
    mime_type = MIME_TYPES.get(extension)
    if mime_type is None:
        raise ValueError(f"Unsupported file type: {extension!r}")
    # Use the regional endpoint that matches the processor location
    opts = ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
    client = documentai.DocumentProcessorServiceClient(client_options=opts)
    # Full resource name of the processor
    name = client.processor_path(project_id, location, processor_id)
    # Read the file into memory
    with open(file_path, "rb") as f:
        file_content = f.read()
    # Load binary data into a Document AI RawDocument object
    raw_document = documentai.RawDocument(content=file_content, mime_type=mime_type)
    # Configure and send the process request
    request = documentai.ProcessRequest(name=name, raw_document=raw_document)
    result_document = client.process_document(request=request).document
    return result_document.text

2. Embeddings Generation and Pinecone

In this step, we will use the OpenAI Text Embedding API to generate embeddings that capture the semantic meaning of the extracted text. These embeddings serve as numerical representations of the textual data, allowing us to understand the underlying context and nuances.

After generating the embeddings, we will securely store them in Pinecone, a powerful indexing and similarity search system. By leveraging Pinecone’s efficient storage capabilities, we can effectively organize and index the embeddings for quick and precise retrieval.

With the embeddings stored in Pinecone, our system gains the ability to perform similarity searches. This enables us to find documents that closely match a given query or exhibit similar semantic characteristics.

The following code uses OpenAI’s Text Embedding model to create embeddings for text data. It divides the input text into chunks, generates embeddings for each chunk, and then upserts the embeddings along with associated metadata to a Pinecone search index for efficient searching and retrieval.

from typing import List, Union

import openai  # uses the legacy (pre-1.0) OpenAI Python SDK interface

def create_embeddings(
    text: Union[str, List[str]], model: str = "text-embedding-ada-002"
) -> List[List[float]]:
    """
    Creates text embeddings using OpenAI's Text Embedding model.

    Args:
        text (str | List[str]): A single text or a list of texts to embed
        model (str, optional): The name of the text embedding model to use.
            Defaults to "text-embedding-ada-002".

    Returns:
        List[List[float]]: One embedding per input text.
    """
    # Wrap a single string in a list so both cases go through one API call
    inputs = text if isinstance(text, list) else [text]
    response = openai.Embedding.create(model=model, input=inputs)
    return [d["embedding"] for d in response["data"]]

from typing import Any, Dict, List
from uuid import uuid4

import pinecone
from tqdm import tqdm

def generate_embeddings_upload_to_pinecone(documents: List[Dict[str, Any]]):
    """
    Generates text embeddings from the provided documents, then uploads and indexes 
    them to Pinecone.

    Args:
        documents (List[Dict[str, Any]]): A list of dictionaries containing 
        document information.
            Each dictionary should include the following keys:
                - "Content": The text content of the document.
                - "DocumentName": The name of the document.
                - "DocumentType": The type/category of the document.

    Note:
        This function assumes that Pinecone and the associated index have already
        been initialized properly. Please make sure to initialize Pinecone first
        and set up the index accordingly.
    """
    # Split every document into chunks (create_chunks() is defined elsewhere in the project)
    chunks = []
    for document in documents:
        texts = create_chunks(document["Content"])
        chunks.extend(
            {
                "id": str(uuid4()),
                "text": chunk_text,
                "chunk_index": i,
                "title": document["DocumentName"],
                "type": document["DocumentType"],
            }
            for i, chunk_text in enumerate(texts)
        )
    # Connect to the Pinecone index, then embed and upsert in batches of 100
    index = pinecone.Index("pinecone-index")
    batch_size = 100
    for i in tqdm(range(0, len(chunks), batch_size)):
        batch = chunks[i : i + batch_size]
        embeds = create_embeddings(text=[x["text"] for x in batch])
        to_upsert = [
            {
                "id": x["id"],
                "values": embed,
                # Keep the chunk text in metadata so it can be returned at query time
                "metadata": {
                    "title": x["title"],
                    "type": x["type"],
                    "text": x["text"],
                    "chunk_index": x["chunk_index"],
                },
            }
            for x, embed in zip(batch, embeds)
        ]
        # Upsert the batch to Pinecone
        index.upsert(vectors=to_upsert)

For more information on OpenAI’s Text Embedding API, refer to the OpenAI API documentation. For more details on Pinecone, check out the Pinecone documentation.
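The code above calls create_chunks() without showing it. Here is one minimal way to write it, assuming word-based windows with a small overlap so that sentences cut at a boundary still appear whole in one of the chunks. The original may split by tokens or characters instead, and the chunk_size and overlap defaults are illustrative.

```python
from typing import List

def create_chunks(text: str, chunk_size: int = 200, overlap: int = 20) -> List[str]:
    """Split text into windows of up to chunk_size words; neighbors share overlap words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this window reaches the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks

print(create_chunks("one two three four five six seven", chunk_size=3, overlap=1))
# ['one two three', 'three four five', 'five six seven']
```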

3. User Query and Communication

Finally, with all the necessary components in place, we can witness the powerful functionality of our tool as it matches user queries with relevant context and provides accurate answers.

When a user submits a query, our system leverages the stored embeddings and advanced search capabilities to identify the most relevant documents based on their semantic similarity to the query. By analyzing the contextual information captured in the embeddings, our tool can retrieve the documents that contain the desired information.

def query_and_combine(
    query_vector: list, top_k: int = 5, threshold: float = 0.75) -> str:
    """Query the Pinecone index and combine the matching chunks into one string

    Args:
        query_vector (list): Query embedding
        top_k (int, optional): Number of top results to return. Defaults to 5.
        threshold (float, optional): Minimum similarity score a match needs to be
            kept. Defaults to 0.75.

    Returns:
        str: Combined responses
    """
    # `index` is the module-level pinecone.Index connected at startup
    responses = index.query(vector=query_vector, top_k=top_k, include_metadata=True)
    _responses = []
    for sample in responses["matches"]:
        # Skip matches that are not similar enough to the query
        if sample["score"] < threshold:
            continue
        if "text" in sample["metadata"]:
            _responses.append(sample["metadata"]["text"])
        else:
            _responses.append(str(sample["metadata"]))

    return " \n --- \n ".join(_responses).replace("\n---\n", " \n --- \n ").strip()


def generate_answer(query: str, language: str = "English") -> str:
    """
    Generates an answer to a user's query using the context from Pinecone search results
    and OpenAI's chat models.

    The function takes the user's query, creates a text embedding from it, performs a
    Pinecone query to find relevant context, and then generates an answer using OpenAI's
    chat models with the given context.

    Args:
        query (str): The user's question.
        language (str, optional): The language to answer in. Defaults to "English".

    Returns:
        str: The generated answer.

    Note:
        This function assumes that Pinecone and the associated index have already been 
        initialized properly, that the OpenAI API is set up correctly, and that `app`
        is the Flask application holding the configuration values used below.
    """
    query_embed = create_embeddings(text=query)[0]
    augmented_query = query_and_combine(
        query_embed,
        top_k=app.config["top_n"],
        threshold=app.config["pinecone_threshold"],
    )
    ## Creating the prompt for model
    primer = """You are a Q&A bot. A highly intelligent system that answers
    user questions based on the context provided by the user above
    each question. If the information can not be found in the context
    provided by the user you truthfully say "I don't know". Be as concise as possible.
    """
    augmented_query = augmented_query if augmented_query != "" else "No context found"

    # Legacy (pre-1.0) OpenAI SDK call; the answer text is in the first choice
    response = openai.ChatCompletion.create(
        messages=[
            {"role": "system", "content": primer},
            {
                "role": "user",
                "content": f"Context: \n {augmented_query} \n --- \n Question: {query} \n Answer in {language}",
            },
        ],
        model=app.config["chat_model"],
        temperature=app.config["temperature"],
    )

    return response["choices"][0]["message"]["content"]

The code consists of two functions.

query_and_combine() queries a Pinecone index using a query vector, retrieves the top matching responses, and combines them into a single string. It filters the responses based on a similarity threshold and extracts the relevant text or metadata to be included in the combined result.
generate_answer() generates an answer to a user query. It creates an embedding for the query and uses query_and_combine() to retrieve matching context from the Pinecone index. It then passes that context, together with the query, to a chat-based language model and returns the model’s answer as the response.
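One way to make this flow easier to test is to move the prompt assembly out of generate_answer() into a pure function that needs no network access. The sketch below is a hypothetical `build_messages` helper. It reproduces the message format and the “No context found” fallback used above, with an abbreviated primer.

```python
from typing import Dict, List

# Abbreviated placeholder; the full primer from generate_answer() would go here
PRIMER = (
    "You are a Q&A bot. Answer using only the context provided. If the answer is not "
    "in the context, say \"I don't know\". Be as concise as possible."
)

def build_messages(context: str, query: str, language: str = "English") -> List[Dict[str, str]]:
    """Assemble chat messages in the same format generate_answer() sends."""
    # Fall back to an explicit marker so the model knows retrieval found nothing
    context = context if context != "" else "No context found"
    user_content = f"Context: \n {context} \n --- \n Question: {query} \n Answer in {language}"
    return [
        {"role": "system", "content": PRIMER},
        {"role": "user", "content": user_content},
    ]

messages = build_messages("", "How do I apply for a ration card?", language="Kannada")
print(messages[1]["content"])
```

generate_answer() could then pass `build_messages(augmented_query, query, language)` straight to the chat model.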

As you reach the end of this blog, we hope you have gained valuable insights into the powerful combination of Google Cloud Platform, Pinecone, and Language Models for revolutionizing document interactions. To dive deeper and explore the code behind this innovative solution, visit our GitHub repository. Feel free to clone, modify, and contribute to the project, and don’t hesitate to share your thoughts and experiences. I would also like to thank Tasheer Hussain B for his contributions. Happy coding!


Hybrid Search for E-Commerce with Pinecone and LLMs

Tue, 02 May 2023 00:00:00 +0000

Searching and finding relevant products is a critical component of an e-commerce website. Providing fast and accurate search results can make the difference between high user satisfaction and user frustration. With recent advancements in natural language understanding and vector search technologies, enhanced search systems have become more accessible and efficient, leading to better user experiences and improved conversion rates.

In this blog post, we’ll explore how to implement a hybrid search system for e-commerce using Pinecone, a high-performance vector search engine, and fine-tuned domain-specific language models. By the end of this post, you’ll not only have a strong understanding of hybrid search but also a practical step-by-step guide to implementing it.

What is Hybrid Search?

High-level view of simple Pinecone Hybrid Index

Before diving into the implementation, let’s quickly understand what hybrid search means. Hybrid search is an approach that combines the strengths of both traditional search (sparse vector search) and vector search (dense vector search) to achieve better search performance across a wide range of domains.

Dense vector search extracts high-quality vector embeddings from text data and performs a similarity search to find relevant documents. However, it often struggles with out-of-domain data when it’s not fine-tuned on domain-specific datasets.

On the other hand, traditional search uses sparse vector representations, like term frequency-inverse document frequency (TF-IDF) or BM25, and does not require any domain-specific fine-tuning. It handles new domains out of the box, but it cannot capture semantic relations between words (for example, it will not match “sneakers” to “trainers”). This limits its performance compared with dense retrieval.

Hybrid search tries to mitigate the weaknesses of both approaches by combining them in a single system, leveraging the performance potential of dense vector search and the zero-shot adaptability of traditional search.

Now that we have a basic understanding of hybrid search, let’s dive into its implementation.

Building a Hybrid Search System

We’ll cover the following steps for implementing a hybrid search system:

Leveraging Domain-Specific Language Models
Creating Sparse and Dense Vectors
Setting Up Pinecone
Implementing the Hybrid Search Pipeline
Making Queries and Tuning Parameters

1. Leveraging Domain-Specific Language Models

In recent years, large-scale pre-trained language models, such as OpenAI’s GPT models and Cohere’s models, have become increasingly popular for a variety of tasks, including natural language understanding and generation. These models can be fine-tuned on domain-specific data to improve their performance on specific tasks, such as e-commerce product search.

In our example, we will use a fine-tuned domain-specific language model to generate dense vector embeddings for products and queries. However, you can choose other models or even create your own custom embeddings based on your specific domain.

import torch
from transformers import AutoTokenizer, AutoModel

# Load a pre-trained domain-specific language model
model_name = "your-domain-specific-model"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Generate dense vector embeddings for a product description
text = "Nike Air Max sports shoes for men"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
    dense_embedding = outputs.last_hidden_state.mean(dim=1).numpy()

2. Creating Sparse and Dense Vectors

Hybrid search requires both sparse and dense vector representations for our e-commerce data. We’ll now describe how to generate these vectors.

Sparse Vectors

Sparse vector representations, like TF-IDF or BM25, are created with standard text processing techniques such as tokenization, stopword removal, and stemming. Here we use the BM25Encoder from the pinecone-text library. We fit it on the product corpus, and it then encodes each description as a dictionary of indices and values.

from pinecone_text.sparse import BM25Encoder

# Create the BM25 encoder and fit it on the product corpus
# (product_descriptions is the list of product description strings)
bm25 = BM25Encoder()
bm25.fit(product_descriptions)

# This function generates the sparse vector representation of a product description
def generate_sparse_vectors(text):
    '''Generates a sparse BM25 vector for a single product description

    Args:
        text (str): A product description

    Returns:
        sparse_vector (dict): A dictionary of indices and values
    '''
    # Documents are encoded with encode_documents; queries use encode_queries
    sparse_vector = bm25.encode_documents(text)
    return sparse_vector

# Create the sparse vectors
sparse_vectors = []
for product_description in product_descriptions:
    sparse_vectors.append(generate_sparse_vectors(text=product_description))
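BM25Encoder hides the mechanics. The minimal sketch below shows the general shape of a sparse vector under simplified assumptions. It uses plain term frequencies with no IDF weighting or document-length normalization, so it is not BM25. It maps tokens to indices with CRC32 hashing, uses a tiny hypothetical stopword list, and does no stemming. The output has the same {"indices", "values"} dictionary shape that this post upserts as sparse_values.

```python
import re
import zlib
from collections import Counter
from typing import Dict, List

# Tiny illustrative stopword list; real encoders ship much larger ones
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "with"}

def tokenize(text: str) -> List[str]:
    """Lowercase, split on anything that is not a letter or digit, drop stopwords."""
    return [tok for tok in re.findall(r"[a-z0-9]+", text.lower()) if tok not in STOPWORDS]

def encode_sparse(text: str) -> Dict[str, list]:
    """Term-frequency sparse vector as {"indices": [...], "values": [...]}."""
    counts = Counter(tokenize(text))
    # Hash each token to a stable 32-bit index so no vocabulary has to be stored
    pairs = sorted((zlib.crc32(tok.encode("utf-8")), float(n)) for tok, n in counts.items())
    return {"indices": [i for i, _ in pairs], "values": [v for _, v in pairs]}

print(encode_sparse("Running shoes for the running track"))
```

Only the handful of tokens that appear in a description get an entry. This is why these vectors are sparse even though the index space is 2^32 wide.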

Dense Vectors

Dense vector representations can be generated using pre-trained or custom domain-specific language models. In our previous example, we used a domain-specific language model to generate dense vector embeddings for a product description.

def generate_dense_vector(text):
    '''Generates a dense vector embedding for a single product description

    Args:
        text (str): A product description

    Returns:
        dense_vector (list): The mean-pooled embedding as a list of floats
    '''
    # Tokenize the text and convert to PyTorch tensors (truncating overly long inputs)
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    # Generate the embeddings with the pre-trained model
    with torch.no_grad():
        outputs = model(**inputs)
    # Average the token embeddings of the single input; Pinecone expects a plain list
    dense_vector = outputs.last_hidden_state.mean(dim=1)[0].tolist()
    return dense_vector

# Generate dense vector embeddings for a list of product descriptions
dense_vectors = []
for product_description in product_descriptions:
    dense_vectors.append(generate_dense_vector(text=product_description))
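generate_dense_vector() embeds one description at a time, so `.mean(dim=1)` averages only real tokens. If you batch descriptions to speed things up, the tokenizer pads shorter inputs, and a plain mean would include those padding positions. Below is a minimal NumPy sketch of attention-mask-weighted mean pooling. It assumes you convert `outputs.last_hidden_state` and `inputs["attention_mask"]` to NumPy arrays with `.numpy()`. The function name is ours.

```python
import numpy as np

def masked_mean_pool(last_hidden_state: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings per sequence, skipping padded positions.

    last_hidden_state: (batch, seq_len, dim); attention_mask: (batch, seq_len), 1 = real token.
    """
    mask = attention_mask[..., None].astype(last_hidden_state.dtype)  # (batch, seq_len, 1)
    summed = (last_hidden_state * mask).sum(axis=1)                    # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                     # avoid dividing by zero
    return summed / counts

# One sequence with two real tokens and one padding token
hidden = np.array([[[1.0, 1.0], [3.0, 3.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
print(masked_mean_pool(hidden, mask))  # averages only the first two tokens
print(hidden.mean(axis=1))             # naive mean: the padding row drags the average up
```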

3. Setting Up Pinecone

Pinecone is a high-performance vector search engine that supports hybrid search. A single index can store both a sparse and a dense vector for each record and answer queries that combine the two, provided the index uses the dotproduct metric.

To use Pinecone, you’ll need to sign up for an account, install the Pinecone client, and set up your API key and environment.

# Initialize the Pinecone client (pinecone-client 2.x interface)
import pinecone

pinecone.init(
    api_key="YOUR_API_KEY",  # app.pinecone.io
    environment="YOUR_ENV"  # find next to api key in console
)

# Create a Pinecone hybrid search index
index_name = "ecommerce-hybrid-search"
MODEL_DIMENSION = model.config.hidden_size  # dimensionality of the dense model (BERT-style configs)
if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        name=index_name,
        dimension=MODEL_DIMENSION,
        metric="dotproduct"  # sparse-dense (hybrid) vectors require the dotproduct metric
    )
# connect to the index
index = pinecone.Index(index_name=index_name)
# view index stats
index.describe_index_stats()

4. Implementing the Hybrid Search Pipeline

With our sparse and dense vectors generated and Pinecone set up, we can now build a hybrid search pipeline. This pipeline includes the following steps:

Adding product data to the Pinecone index
Retrieving results using both sparse and dense vectors

def add_product_data_to_index(product_ids, sparse_vectors, dense_vectors, metadata=None):
    """Upserts product data to the Pinecone index.

    Args:
        product_ids (`list` of `str`): Product IDs.
        sparse_vectors (`list` of `dict`): Sparse vectors with `indices` and `values`.
        dense_vectors (`list` of `list` of `float`): Dense vectors.
        metadata (`list` of `dict`): Optional metadata, one dict per product.

    Returns:
        None
    """
    batch_size = 32
    # Without metadata, use an empty dict per product; zipping with an empty
    # list would otherwise produce no vectors at all
    if metadata is None:
        metadata = [{} for _ in product_ids]

    # Loop through the product IDs in batches.
    for i in range(0, len(product_ids), batch_size):
        i_end = min(i + batch_size, len(product_ids))
        batch = zip(product_ids[i:i_end], sparse_vectors[i:i_end],
                    dense_vectors[i:i_end], metadata[i:i_end])

        vectors = []
        for _id, sparse, dense, meta in batch:
            vector = {'id': _id, 'sparse_values': sparse, 'values': dense}
            if meta:
                vector['metadata'] = meta
            vectors.append(vector)

        # Upsert the vectors into the Pinecone index.
        index.upsert(vectors=vectors)

# product_metadata: one dict per product, e.g. {"product_name": ...}, used when printing results
add_product_data_to_index(product_ids, sparse_vectors, dense_vectors, metadata=product_metadata)

Now that our data is indexed, we can perform hybrid search queries.

5. Making Queries and Tuning Parameters

High-level view of simple Pinecone Hybrid Query

To make hybrid search queries, we’ll create a function that takes a query, the number of top results, and an alpha parameter to control the weighting between dense and sparse vector search scores.

def hybrid_scale(dense, sparse, alpha: float):
    """Hybrid vector scaling using a convex combination

    alpha * dense + (1 - alpha) * sparse

    Args:
        dense: Array of floats representing the dense query vector
        sparse: a dict of `indices` and `values`
        alpha: float between 0 and 1 where 0 == sparse only
               and 1 == dense only
    """
    if alpha < 0 or alpha > 1:
        raise ValueError("Alpha must be between 0 and 1")
    # scale sparse and dense vectors to create hybrid search vecs
    hsparse = {
        'indices': sparse['indices'],
        'values':  [v * (1 - alpha) for v in sparse['values']]
    }
    hdense = [v * alpha for v in dense]
    return hdense, hsparse

def search_products(query, top_k=10, alpha=0.5):
    # Generate the sparse query vector (queries use encode_queries, not encode_documents)
    sparse_query_vector = bm25.encode_queries(query)

    # Generate dense query vector
    dense_query_vector = generate_dense_vector(query)

    # Calculate hybrid query vector
    dense_query_vector, sparse_query_vector = hybrid_scale(dense_query_vector, sparse_query_vector, alpha)

    # Search products using Pinecone, returning metadata with each match
    results = index.query(
        vector=dense_query_vector,
        sparse_vector=sparse_query_vector,
        top_k=top_k,
        include_metadata=True
    )

    return results

We can then use this function to search for relevant products in our e-commerce dataset.

query = "running shoes for women"
results = search_products(query, top_k=5)

# Matches are returned under "matches", highest score first
for match in results["matches"]:
    print(match['id'], match['metadata']['product_name'], match['score'])

Experimenting with different values for the alpha parameter will help you find the optimal balance between sparse and dense vector search for your specific domain.
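The self-contained Python sketch below shows why alpha acts as a convex weighting. It reproduces hybrid_scale() and scores a toy product against a toy query. It assumes the index scores a sparse-dense pair as the dense dot product plus the sparse dot product, which is how a dotproduct index combines the two parts. Under that assumption, scaling the query vectors by alpha and (1 - alpha) scales the two partial scores the same way. The final score is therefore alpha times the dense score plus (1 - alpha) times the sparse score. The vectors are made up for illustration.

```python
from typing import Dict, List, Tuple

Sparse = Dict[str, list]  # {"indices": [...], "values": [...]}

def hybrid_scale(dense: List[float], sparse: Sparse, alpha: float) -> Tuple[List[float], Sparse]:
    """Same convex weighting as above: alpha on dense, (1 - alpha) on sparse."""
    if not 0 <= alpha <= 1:
        raise ValueError("Alpha must be between 0 and 1")
    hsparse = {"indices": sparse["indices"], "values": [v * (1 - alpha) for v in sparse["values"]]}
    return [v * alpha for v in dense], hsparse

def dense_dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def sparse_dot(a: Sparse, b: Sparse) -> float:
    # Only indices present in both vectors contribute
    b_values = dict(zip(b["indices"], b["values"]))
    return sum(v * b_values.get(i, 0.0) for i, v in zip(a["indices"], a["values"]))

def hybrid_score(q_dense, q_sparse, d_dense, d_sparse) -> float:
    """Assumed dotproduct score of a sparse-dense pair: dense dot plus sparse dot."""
    return dense_dot(q_dense, d_dense) + sparse_dot(q_sparse, d_sparse)

# Toy query and product vectors
q_dense, d_dense = [0.2, 0.4], [1.0, 0.5]             # dense dot = 0.4
q_sparse = {"indices": [1, 7], "values": [0.5, 1.0]}
d_sparse = {"indices": [7, 9], "values": [2.0, 1.0]}  # sparse dot = 2.0

for alpha in (0.0, 0.25, 1.0):
    hd, hs = hybrid_scale(q_dense, q_sparse, alpha)
    # Equals alpha * 0.4 + (1 - alpha) * 2.0
    print(alpha, round(hybrid_score(hd, hs, d_dense, d_sparse), 4))
```

With a dense score of 0.4 and a sparse score of 2.0, alpha = 0.25 gives 0.25 × 0.4 + 0.75 × 2.0 = 1.6. Sweeping alpha therefore moves the ranking smoothly between keyword-only and semantic-only behavior.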

Conclusion

In this blog post, we demonstrated how to build a hybrid search system for e-commerce using Pinecone and domain-specific language models. Hybrid search enables us to combine the strengths of both traditional search and vector search, improving search performance and adaptability across diverse domains.

By following the steps and code snippets provided in this post, you can implement your own hybrid search system tailored to your e-commerce website’s specific requirements. Start exploring Pinecone and improve your e-commerce search experience today!

References

Ecommerce Search using Hybrid Search Techniques in Pinecone (Google Colab Notebook): A practical guide showcasing the implementation of e-commerce search using Pinecone’s hybrid search techniques.
Pinecone Ecommerce Search Documentation: Official Pinecone documentation for building e-commerce search systems.
BM25 Vector Generation using Pinecone (Google Colab Notebook): A guide for generating BM25 sparse vectors using Pinecone.
Pinecone Text Repository on GitHub: A collection of text processing and vector generation resources using Pinecone.
Introduction to Hybrid Search on Pinecone’s Website: An overview of hybrid search, its benefits, and use cases in the context of pinecone’s capabilities.

Demystifying the Shell Scripting: Working with Files and Directories

Wed, 04 Jan 2023 00:00:00 +0000

In my previous blog posts, we covered the basics of using the shell, introduced shell scripting for beginners, and explored advanced techniques and best practices. In this blog post, we will focus on working with files and directories in shell scripts. We will discuss common tasks such as creating, copying, moving, and deleting files and directories, as well as reading and writing to files. We will also provide some resources for further learning.

Creating Files and Directories

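To create a new, empty file in a shell script, you can use the touch command. If the file already exists, touch leaves its contents alone and only updates its modification time: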

touch new_file.txt

To create a new directory, you can use the mkdir command:

mkdir new_directory
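The two commands are often used together. The following is a minimal sketch, assuming bash and a throwaway scratch directory created with mktemp (the path reports/2023/q1 and the file notes.txt are hypothetical). It uses mkdir -p to create a nested path in one step and shows that touch on an existing file does not erase what is in it.

```bash
#!/bin/bash
set -euo pipefail

# Work in a scratch directory so the example never touches real files.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# -p creates every missing parent directory and does not fail if the path already exists.
mkdir -p reports/2023/q1

# touch creates an empty file when it does not exist yet...
touch reports/2023/q1/notes.txt
echo "draft" > reports/2023/q1/notes.txt

# ...but on an existing file it only updates the modification time; the contents survive.
touch reports/2023/q1/notes.txt

[[ -d reports/2023/q1 ]] && echo "directory created"
cat reports/2023/q1/notes.txt
```

Because mkdir -p succeeds when the directory is already there, it is also a convenient way to make a script safe to run more than once.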

Copying and Moving Files and Directories

To copy a file, you can use the cp command:

cp source_file.txt destination_file.txt

To copy a directory, you can use the -r (recursive) option:

cp -r source_directory destination_directory

To move or rename a file or directory, you can use the mv command (unlike cp and rm, it needs no -r option for directories):

mv source_file.txt destination_file.txt
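The sketch below runs these commands in a throwaway mktemp directory, assuming bash; the names src, backup, archive, copy.txt, and renamed.txt are hypothetical. It shows two behaviors that often surprise people: cp -r into a destination directory that already exists places the source directory inside it, and mv within the same directory is simply a rename.

```bash
#!/bin/bash
set -euo pipefail

# Scratch directory, removed automatically when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir src
echo "hello" > src/a.txt

# Copy a single file under a new name.
cp src/a.txt copy.txt

# "backup" does not exist yet, so cp -r creates it as a copy of src.
cp -r src backup

# "archive" already exists, so cp -r puts src inside it, as archive/src.
mkdir archive
cp -r src archive

# mv to a new name in the same directory is just a rename.
mv copy.txt renamed.txt

ls backup archive/src
```

Adding -i to cp or mv makes them ask before overwriting an existing file, which is a sensible habit when running these commands by hand.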

Deleting Files and Directories

To delete a file, you can use the rm command. Be careful: rm deletes immediately, with no trash folder and no undo:

rm file_to_delete.txt

To delete a directory along with everything inside it, add the -r (recursive) option:

rm -r directory_to_delete
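Because rm -r removes everything without asking, two habits help in scripts. This is a sketch assuming bash, with hypothetical directory names in a mktemp scratch directory. First, use rmdir when a directory is supposed to be empty, since it refuses to delete anything that still has contents. Second, guard paths stored in variables with ${var:?}, so that an empty or unset variable stops the script instead of widening the deletion.

```bash
#!/bin/bash
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir empty_dir full_dir
touch full_dir/data.txt

# rmdir only removes empty directories.
rmdir empty_dir

# On a non-empty directory rmdir fails instead of deleting the contents.
if ! rmdir full_dir 2>/dev/null; then
  echo "full_dir is not empty; rmdir left it alone"
fi

# ${target_dir:?} aborts the script if target_dir is unset or empty,
# so "${target_dir:?}/" can never silently turn into "/".
target_dir="full_dir"
rm -r -- "${target_dir:?}/"
```

The -- tells rm that everything after it is a path, so a directory name that happens to start with a dash is not mistaken for an option.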

Reading and Writing to Files

To read the contents of a file, you can use the cat command:

cat file_to_read.txt

To write to a file, you can use the > redirection operator, which overwrites the file, or the >> operator, which appends to it (both create the file if it does not exist yet):

echo "This is a new line" > file_to_write.txt
echo "This is another new line" >> file_to_write.txt
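A quick way to see the difference between the two operators is to write a few lines and count them with wc -l. This sketch assumes bash and writes to a hypothetical file inside a mktemp scratch directory.

```bash
#!/bin/bash
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
file="$workdir/file_to_write.txt"

echo "first" > "$file"    # > creates (or truncates) the file, then writes one line
echo "second" >> "$file"  # >> appends, keeping what is already there
echo "third" >> "$file"

wc -l < "$file"           # three lines at this point

echo "reset" > "$file"    # > again: the previous three lines are gone
cat "$file"
```

If accidental overwrites are a concern, bash's set -o noclobber makes > refuse to overwrite an existing file; >| still forces the overwrite when you really mean it.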

To read a file line by line, you can use a while loop with the read command:

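#!/bin/bash

# IFS= keeps leading and trailing whitespace on each line;
# -r stops read from treating backslashes as escape characters.
while IFS= read -r line; do
  echo "Line: $line"
done < file_to_read.txt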
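The same loop can split each line into fields by setting IFS to a delimiter. This sketch assumes bash and a small, hypothetical comma-separated file with an item name and a quantity per line. It also handles a final line that does not end in a newline, which a plain while read loop silently skips because read returns a failure status on that line even though it filled the variables.

```bash
#!/bin/bash
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
csv="$workdir/inventory.csv"

# Hypothetical sample data; printf deliberately omits the newline after the last line.
printf 'apples,3\npears,5\nplums,2' > "$csv"

total=0
# IFS=, splits each line on commas into the named variables.
# "|| [[ -n $name ]]" still runs the body for a last line without a trailing newline.
while IFS=, read -r name qty || [[ -n $name ]]; do
  echo "$name: $qty"
  total=$((total + qty))
done < "$csv"

echo "total: $total"
```

Redirecting the file into the loop with done < "$csv" keeps the loop in the current shell. Piping into it instead (cat "$csv" | while ...) runs the loop in a subshell in bash by default, so total would be back to 0 after the loop ends.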

Searching for Files and Directories

To search for files and directories, you can use the find command:

find /path/to/search -name "file_pattern"

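For example, to find all .txt files in /home/user and all of its subdirectories, you can use the following command. Quote the pattern so that the shell passes *.txt to find instead of expanding it itself: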

find /home/user -name "*.txt"
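In a script you usually want to do something with each match rather than just print it. The sketch below assumes bash and builds a hypothetical test tree in a mktemp scratch directory. It adds -type f so that only regular files match, and it uses find's -print0 together with read -d '' so that file names containing spaces are handled correctly.

```bash
#!/bin/bash
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Hypothetical tree: two .txt files (one in a subdirectory, one with a space in its name)
# and one .log file that should not match.
mkdir -p "$workdir/docs/archive"
touch "$workdir/docs/readme.txt" "$workdir/docs/archive/old notes.txt" "$workdir/docs/app.log"

# -print0 ends each path with a NUL byte and read -d '' splits on it,
# so names with spaces (or even newlines) arrive intact.
count=0
while IFS= read -r -d '' path; do
  echo "found: $path"
  count=$((count + 1))
done < <(find "$workdir" -type f -name "*.txt" -print0)

echo "$count .txt files"
```

The <( ) process substitution feeds find's output into the loop while keeping the loop in the current shell, so count is still available after the loop finishes.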

Resources

To further improve your skills in working with files and directories in shell scripts, here are some resources:

File Management Commands in Linux: A comprehensive guide to file management commands in Linux.
Linux Find Command Examples: A collection of examples for using the find command in Linux.

In conclusion, working with files and directories is an essential aspect of shell scripting. By mastering common tasks such as creating, copying, moving, and deleting files and directories, reading from and writing to files, and searching for them, you will be well-equipped to handle a wide range of shell scripting tasks.

Navigating UMass Amherst: A Handbook for International Students

Mon, 08 May 2023 00:00:00 +0000