Beyond the Chatbot: AI Agents and the Protocols Driving Their Evolution

Original Source

AI Agents are rapidly evolving beyond simple chatbots to become autonomous entities capable of observing, planning, and acting with their environments. This shift is accelerating, and the first commercial agents are already generating meaningful revenue in various sectors. But how are these agents becoming more reliable, safe, and ready for enterprise use? A key development is the growing adoption of the open-source Model Context Protocol (MCP), launched by Anthropic.

MCP has seen rapid adoption by major players like OpenAI, Microsoft, Google, and Amazon, marking a significant step towards “agentic abundance”.

The Evolution of AI Agents

We are moving from original foundation models that performed simple next token prediction towards autonomous agents. Initially, models were instruction fine-tuned to ‘chat’. Then came models trained to use tools and produce structured output, often combined with workflow systems for “agentic if-statements” or Agentic Workflows. The current focus is on models trained to observe, reason, plan, use tools, and communicate, acting as more autonomous agents.

Different stages of agent evolution:

Original Foundation Models: Simple token predictors.
Instruction Fine-tuned Foundation Models: Chat-bots.
Models Trained to use Tools, Plan + Reason: Early constrained agents.
Models Trained to Communicate: Multi-agent systems.
Agentic Workflows: helpful when process consistency matters, good for problems where domain intelligence is valuable, more predictable but less adaptable. A common type involves decomposing a task into sequential steps, with each LLM call processing the previous output.
Autonomous Agents: helpful when process flexibility matters, good for problems where general intelligence is valuable, less predictable but can adapt. These agents receive a goal, plan and reason autonomously, execute chosen actions, observe real-time effects, and use environmental and human signals to inform next actions, terminating on task completion.

Techniques and frameworks for building, deploying, and monitoring agents are maturing quickly. This includes the growth of agentic frameworks like Langchain and Hugging Face, new players entering the field, and low-code platforms. Underlying models are also improving significantly, showing better reasoning and performance. For example, performance on ARC-AGI has increased substantially, with 5 of the top 10 models being ‘reasoning’ models. Open innovation is also driving down costs while increasing performance. This maturity is leading to a shift from predefined workflows to self-directed agents.

Agents can even work together in networks and with humans to accomplish complex tasks. An example is an advanced end-to-end supply chain management system where a human-agent team coordinates multiple agents (Supply Chain, Demand Planning, Supply Planning, etc.) drawing on various data and tool landscapes.

Where Agents are Finding Product-Market Fit

Commercial agents are already proving successful and generating revenue. Coding agents are among the first to reach significant product-market fit, accelerating software time to market. Companies in the “vibe-coding” space are seeing substantial user growth and estimated ARR. However, to move from prototyping to production, these agents need to mature, requiring modular code, integration with version control and CI/CD, and adherence to enterprise standards like authentication and data protection.

Organizations are already gaining significant value from agentic workflows. Examples provided include:

Bloomberg’s compliance agents: checking facts, catching risks, reducing time-to-decision by 30-50%.
Booking.com and Jane Street’s coding agents: reclaiming developer time, assisting with code reviews, cutting cycle times by 30%+.
Brightwave’s research agents: transforming legal/financial text into summaries, cross-referencing data in real-time.
BCG’s deployed agents: unlocking up to 90% cost reduction, 50-75% faster execution, and 30-40% productivity uplift across critical business functions.

Looking ahead, fully autonomous agents for complex, open-ended tasks require mature reasoning and evaluation systems. The sources suggest that assistive agents will thrive in high-risk areas with human judgment, rule-based agents will provide reliable guardrails, and adaptive agents will lead enterprise adoption by balancing automation with real-time feedback.

Reliability, Effectiveness, and Current Limitations

Evaluating agent reliability is an increasing focus, with benchmarks shifting to measure tool use and end-to-end tasks over time. New tests emphasize edge cases like missing tools or incomplete inputs. Handling multi-turn tasks requires agents to manage context and sequence actions. While current state-of-the-art models can handle some tasks taking expert humans hours, they can only reliably complete tasks up to a few minutes long. The length of addressable tasks with 50% reliability has been doubling approximately every 7 months. If this trend continues, AI systems could potentially carry out month-long projects autonomously by the end of the decade.

BCG uses a framework tracking agent performance across 6 dimensions:

Task Autonomy & Execution: Function calling performance, goal-based actions.
Reliability & Safety: Consistency, accuracy, trustworthiness.
Integration & Interoperability: Seamless data exchange and collaboration with other systems/agents.
Reasoning & Planning: Following instructions, understanding intent, inferring, making decisions, forming plans.
Memory & Knowledge: Using/leveraging knowledge, long context performance.
Social Understanding: Interpreting human intent, social cues, maintaining character, sharing context.

However, current agents still have limitations:

Reasoning & Planning: Struggles with multi-step reasoning, long-term dependencies, prone to hallucinations/incorrect inference. Needs advancements in multi-step reasoning and reasoning during inference.
Task Autonomy & Execution: Struggles with real-world execution beyond simulations, limited integration with external tools/APIs, limited standardization. Needs standardized access, security initiatives, goal-seeking behavior.
Memory & Knowledge: Limited retention across long conversations, forgetfulness due to context window size. Needs contextual awareness (memory) and continuous learning.
Reliability & Safety: Tendency to hallucinate, vulnerable to biases. Needs improved model calibration and evaluation metrics.
Integration & Interoperability: Data silos, inconsistent formats, security risks from broad access. Needs universal AI standards/frameworks and improved middleware.
Social Understanding: Struggles with emotional nuance, non-verbal intent, misinterpreting ambiguous language. Needs fine-tuning on diverse human interaction data and multi-agent scenarios.

MCP’s Role in Enabling Agents

The Model Context Protocol (MCP) addresses many of these limitations. It provides a standardized way for AI Agents to interact with their environment by exposing resources, tools, and prompts to LLMs.

MCP unlocks agentic workflows by acting as a unified protocol. When an agent receives a request, it can use the LLM and MCP to determine which MCP Server to call, fetch relevant tools from that server, determine the correct tool and parameters, and then the MCP Server securely calls the underlying system (like Salesforce) to execute the action.

Crucially, MCP addresses 4 of the 6 capability limitations seen in today’s agents:

Task Autonomy & Execution: MCP Clients and Servers allow agents to chain tool usage autonomously.
Integration & Interoperability: The MCP protocol resolves architectural fragmentation by bridging tools and platforms through standardized interfaces.
Reasoning & Planning: MCP Servers expose prompt templates and tool registries, providing high-quality context for tool use reasoning.
Memory & Knowledge: MCP Servers allow clients access to external tools like databases for real-time data or vector databases for knowledge.

MCP does not directly address Reliability & Safety (driven by the model and evaluations) or Social Understanding (inherent to the model).

MCP is becoming a de facto standard because it’s an AI-native, open protocol backed by Anthropic, leveraging foundations similar to the Language Server Protocol (LSP), launching with a full stack, and executing rapid delivery. It has quickly gained popularity, reflected in GitHub stars compared to other frameworks.

MCP helps de-duplicate integration efforts across various agents, data, and systems by providing a transversal shared set of MCP Servers. This enables faster experiments and seamless system upgrades. However, integrating tools still requires significant work, and MCP’s value depends on the agents consuming it and broader ecosystem adoption.

Building with MCP: Best Practices and Security

Access to tools through MCP creates new risks, meaning security must be foundational. Tool logic and servers should be treated as untrusted, with strict security measures like OAuth and Role-Based Access Control (RBAC) enforced on every call. Trust domains should be isolated to prevent cross-server hijacks. Agents can be vulnerable to malicious tools accessing credentials or tool poisoning attacks where descriptions include hidden malicious instructions. Tool logic can also be altered server-side, and one compromised MCP server can influence how agents use tools from other trusted servers. Logging agent reasoning traces, not just outputs, is recommended.

Emerging agent-to-agent (A2A) protocols, like Google’s A2A, will work alongside MCP. While MCP enables agents to discover and call each other as resources and provides access to tools, A2A defines how agents communicate, coordinate, negotiate, and share state. Leading frameworks are already integrating A2A. These protocols are promising but expect fragmentation and evolving standards.

Agent orchestration and an MCP registry are seen as the “beating heart” of a modern AI company. Orchestration platforms manage the agent lifecycle, while an MCP registry catalogs and governs available MCP servers within an organization.

Key points for building Agents with MCP in the enterprise include:

Eval driven development: Design with evaluations from the start to ensure agents are reliable.
Plan out the ‘MCPs’: Think about the systems and datasets AI needs access to and implement MCPs as a “new data mesh”.
Build an Agent Orchestration platform and proprietary MCP registry: Choose a platform for building/scaling agents with evals and couple it with an internal registry to open up data/system silos.
Review Legal, Data Security, and Privacy implications: MCP increases the ‘AI surface area’, bringing unique risks beyond simple chatbots.

MCP is built in public as an open-source project, encouraging community contributions. Anthropic provides detailed documentation, and there are growing SDKs in multiple languages. Anyone can build MCP clients or servers.

A diverse network of MCP servers is emerging across enterprise, desktop automation, productivity, and software Dev/Ops tools. Examples include servers for Salesforce, ServiceNow, Cloudflare, Filesystem, Puppeteer (web browsing), Fetch (web content), Slack, Google Drive, To-do lists, GitHub, and Grafana. While this network is growing, official MCPs are still early-stage with potential documentation gaps, and widespread access necessitates robust security.

Emerging best practices for designing and building with MCP and Agents include:

Use structured frameworks: Accelerate development with orchestration libraries like LangGraph or MCP SDKs.
Design tools precisely: Clear, scoped tool descriptions are crucial for agent reasoning quality.
Limit cognitive overload: Keep toolsets exposed to an agent under 100 per call.
Evaluate LLM outputs: Continuously test reasoning paths.
Modularize via server boundaries: One MCP server per system improves routing clarity and flexibility.
Secure agent interactions: Enforce authentication for clients.

When building MCP servers, avoid the extremes of monolith (bundling too many tools, overwhelming the agent) or microservice (too granular, leading to complex architecture) patterns. Focus on building lean MCP servers. Leveraging dynamic server discovery, where agents check a registry or domain for available servers and load their schema dynamically, avoids overwhelming the agent upfront and adapts to evolving endpoints.

In conclusion, AI agents are rapidly advancing, showing product-market fit in areas like coding and enterprise workflows. While challenges remain regarding reliability and deeper capabilities, protocols like MCP are providing a standardized way to connect agents to the tools and data they need to become more autonomous and effective, laying the groundwork for a potentially transformative multi-agent future. This evolution, however, requires careful consideration of security and a strategic approach to building the underlying platforms and infrastructure.

rahalabs.net