A discussion of AI, complexity, machine learning, and the future of the field, with real-world examples. It explores the benefits and challenges of machine learning in a friendly and engaging way.

by

Agents for The Curious

The AI world moves at a ridiculous pace. It feels like we’re constantly being bombarded with a hot new thing, and recently, the title for “most hyped buzzword” has to go to the AI agent. You’ve seen the tweets, the demos of agents doing complex tasks autonomously, and maybe you’ve even been a bit confused about it all. Are they the next big leap, or just another clever repackaging of old ideas?

I was right there with you. The signal-to-noise ratio was terrible, so I decided to take a step back and really figure it out for myself. This blog post is a summary of that journey. My explorations, my notes, my “aha!” moments. It’s what I wish I had when I first started digging.

So let’s cut through the fluff together. We’re going to get to the core of what an agent is from a technical perspective. We’ll look at their fundamental architectures, explore the different types, and demystify some of the common buzzwords you’ve probably heard tossed around. My goal is to help you get grounded, build a solid mental model, and leave with a clear take on what this “agent hype” actually means.

The Agent

Let’s start with the word itself. “Agent” comes from the Latin “agēns”, the present participle of “agere”, meaning “to do” or “to act.” It’s a fitting origin for a concept that is fundamentally about something that produces an effect. In the world of AI, an agent is a tool that does exactly that: it acts. It has “agentic” qualities: the capacity for independent decision-making and proactive behavior.

But for me, that description still feels a little fluffy. It’s a great starting point, but it doesn’t really scratch the itch. To truly get a handle on what an AI agent is from a technical perspective, we need to look past the dictionary definition and dig into the actual components that make these systems work.

Types of Agents:

Okay, so we’ve grounded ourselves on what an agent is at a high level. But what does that actually look like in practice? Not all agents are created equal. They exist on a spectrum of complexity, from simple, almost mechanical systems that just react, all the way up to sophisticated, self-improving machines. The real magic and the key to understanding this space is seeing how these different levels build on each other. Let’s peel back the layers and walk through the main types of agent architectures you’ll encounter.

Simple Reflex Agents

The most basic kind of agent you can build. It’s all about direct, instant reactions. A simple reflex agent looks at the current situation, and only the current situation, and then takes action based on a set of rules. Think of it like a thermostat: if it’s too hot, turn on the AC. It doesn’t remember what happened five minutes ago, or what the weather forecast is for tomorrow. It just acts on the present state of the environment.
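
To make that concrete, here’s a minimal sketch in Python. The setpoint and the one-degree hysteresis band are illustrative assumptions, not a real device spec; the point is that the mapping from percept to action is a fixed rule with no memory:

```python
# A simple reflex agent: condition-action rules over the current percept only.
def thermostat_agent(current_temp: float, setpoint: float = 21.0) -> str:
    """React to the present temperature; no history, no world model."""
    if current_temp > setpoint + 1.0:
        return "cool"
    if current_temp < setpoint - 1.0:
        return "heat"
    return "idle"
```

Call it with the same reading twice and you get the same answer twice; nothing the agent has seen before influences it.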

Model-Based Reflex Agents

This is a big step up. A model-based reflex agent still reacts, but it’s not blind. It carries a “world model,” or an internal representation of the environment, which it keeps updated. This allows it to use history and context to make more informed decisions. By tracking how the environment changes over time, it can make predictions about what will happen next. This lets it react more dynamically and with more foresight than a simple reflex agent, even without a formal goal.
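
Here’s a hedged sketch of the same thermostat upgraded with a tiny world model: a history of recent readings, so it can react to the temperature *trend* rather than just the instant value. The thresholds are invented for illustration:

```python
class ModelBasedThermostat:
    """A model-based reflex agent: keeps an internal state it updates each step."""

    def __init__(self, setpoint: float = 21.0):
        self.setpoint = setpoint
        self.history: list[float] = []  # the internal "world model": recent readings

    def act(self, current_temp: float) -> str:
        self.history.append(current_temp)
        trend = 0.0
        if len(self.history) >= 2:
            trend = self.history[-1] - self.history[-2]
        # Pre-emptively cool when the room is warming fast toward the setpoint,
        # even though a simple reflex agent would still say "heat" here.
        if current_temp > self.setpoint or (trend > 0.5 and current_temp > self.setpoint - 1.0):
            return "cool"
        if current_temp < self.setpoint:
            return "heat"
        return "idle"
```

Same percept, different action, because the internal model lets the agent anticipate where the environment is heading.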

Goal-Based Agents

Now we’re getting proactive. A goal-based agent takes the world model and adds a critical piece: a specific objective. It can distinguish between a successful state and a non-successful state. These agents aren’t just reacting to the environment; they’re actively working towards a target, like a GPS navigating a car. This is where we start to see the early signs of strategic thinking and purposeful, goal-directed behavior.
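
A toy version of that GPS idea: the agent searches for a sequence of moves that reaches an explicit goal state on a small grid. The grid size and obstacle set are illustrative assumptions, and breadth-first search stands in for real route planning:

```python
from collections import deque

def plan_path(start, goal, blocked, size=4):
    """Goal-based behavior: search for an action sequence that reaches the goal
    state on a size x size grid, avoiding blocked cells."""
    frontier = deque([(start, [start])])
    seen = {start}
    while frontier:
        (x, y), path = frontier.popleft()
        if (x, y) == goal:
            return path  # a plan: the sequence of states from start to goal
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            cell = (nx, ny)
            if 0 <= nx < size and 0 <= ny < size and cell not in blocked and cell not in seen:
                seen.add(cell)
                frontier.append((cell, path + [cell]))
    return None  # goal unreachable
```

The distinguishing feature versus the reflex agents above: the agent can tell a successful state from a non-successful one, and works backwards from that.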

Utility-Based Agents

This is the advanced evolution of a goal-based agent. A utility-based agent doesn’t just ask, “how do I reach my goal?” It asks, “what’s the best way to reach my goal?” These agents have a utility function that quantifies how desirable a given outcome is. They can weigh different factors and make tradeoffs to find the optimal path. For example, a navigation app might consider not just the fastest route, but also the most fuel efficient, or the one with the fewest turns, optimizing for a higher “utility” score.
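
A small sketch of a utility function in that navigation spirit. The weights and the route data are invented for illustration; the point is that the agent ranks outcomes on a continuous scale instead of just checking goal/no-goal:

```python
def utility(route, w_time=1.0, w_fuel=0.5, w_turns=0.2):
    """Lower time, fuel, and turns are all better, so utility is the
    negated weighted cost. The weights encode the agent's tradeoffs."""
    return -(w_time * route["minutes"] + w_fuel * route["litres"] + w_turns * route["turns"])

def choose_route(routes):
    """Pick the outcome with the highest utility, not merely any goal-reaching one."""
    return max(routes, key=utility)

routes = [
    {"name": "highway", "minutes": 30, "litres": 4.0, "turns": 2},
    {"name": "scenic",  "minutes": 45, "litres": 3.0, "turns": 12},
]
```

Both routes reach the goal; only the utility function lets the agent say one is *better*.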

Learning-Based Agents

This final category is more of a meta-architecture than a distinct type. A learning-based agent can be any of the above, but with a crucial added capability: it learns and improves over time. By using techniques like reinforcement learning, it can adapt its behavior without being explicitly reprogrammed. This is the secret sauce for real-world adaptability and is key to the vision of truly self-improving, dynamic systems that can face new challenges and evolve their own strategies.
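
To ground the idea, here’s a minimal tabular Q-learning sketch on a toy corridor: the agent starts with no policy at all and learns to walk right purely from reward feedback. The environment, reward values, and hyperparameters are all illustrative assumptions:

```python
import random

def train(episodes=500, alpha=0.5, gamma=0.9, eps=0.2, n=5, seed=0):
    """Learn a policy for a 1-D corridor of n states; state n-1 is the goal.
    Actions are -1 (left) and +1 (right)."""
    random.seed(seed)
    q = {(s, a): 0.0 for s in range(n) for a in (-1, 1)}  # the Q-table
    for _ in range(episodes):
        s = 0
        while s != n - 1:
            # Epsilon-greedy: mostly exploit the current estimate, sometimes explore.
            if random.random() < eps:
                a = random.choice((-1, 1))
            else:
                a = max((-1, 1), key=lambda act: q[(s, act)])
            s2 = min(max(s + a, 0), n - 1)
            r = 1.0 if s2 == n - 1 else -0.1  # small step cost, goal reward
            best_next = max(q[(s2, -1)], q[(s2, 1)])
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s2
    return q
```

After training, the greedy policy (pick the action with the higher Q-value) moves right in every state, even though no one ever told the agent which direction the goal was.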

Agent Collectives

As problems get bigger and more complex, a single agent just can’t cut it. This is where we shift our focus from a lone brain to a whole collective. It’s about how to get multiple agents working together—from simple teams to highly coordinated systems.

Hierarchical Agents

Think of this like a company or a military command structure. You have a boss at the top (a higher-level agent) whose job is to take a huge, hairy problem and break it down into smaller, more manageable tasks. They don’t do the work themselves; they just hand off these sub-tasks to agents at the next level down. Those agents might break it down further, and so on, until the simplest tasks are handled by a team of specialist agents at the bottom. This structure is a perfect way to manage a massive cognitive load and use specialized skills to solve a single, giant problem.
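
A toy sketch of that command structure: a manager agent that does no specialist work itself, only decomposing the task and routing sub-tasks to workers. The workers and the task breakdown are stand-ins, not any real framework’s API:

```python
def research_worker(topic: str) -> str:
    """Bottom-level specialist: gathers material."""
    return f"notes on {topic}"

def writing_worker(notes: str) -> str:
    """Bottom-level specialist: turns material into a draft."""
    return f"draft based on {notes}"

def manager(task: str) -> str:
    """Higher-level agent: breaks the task down and delegates; does no
    specialist work itself."""
    subtasks = [("research", task), ("write", None)]
    notes = None
    for kind, payload in subtasks:
        if kind == "research":
            notes = research_worker(payload)
        elif kind == "write":
            return writing_worker(notes)
```

In a real system each worker could itself be a manager for a deeper layer, which is exactly the recursive structure the hierarchy gives you.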

Multi-Agent Systems (MAS)

This is a more general idea: a team of agents, all interacting with each other to get a job done. The agents can all be the same type, like a swarm of bots searching for something, or they can be totally different, like a team of specialists collaborating on a project. In an MAS, the agents don’t necessarily have a top-down command structure. Instead, they talk to each other, share information, negotiate who does what, and sometimes even compete. It’s all about leveraging distributed intelligence to solve problems that are too big for any single agent.

Orchestrator Agents

In a complex Multi-Agent System, you often need a project manager. That’s the role of the Orchestrator Agent. This agent doesn’t necessarily have specialist skills for the main task, but it has one critical job: making sure everyone else is on the same page. It acts like a conductor in an orchestra, directing the different specialist agents to ensure their individual actions are synchronized and aligned with the overall goal. The orchestrator handles resource allocation and resolves conflicts, making the entire collaborative system run smoothly.

Intentional Information Control

This is where things get interesting. Instead of letting all agents see everything at once, you deliberately give them separate, limited contexts. This allows each agent to focus deeply on a specific part of the problem without getting distracted by irrelevant noise. After they’ve each had a chance to work on their piece independently, a higher-level agent can then strategically combine their different opinions and insights. This approach prevents groupthink, reduces bias, and ultimately leads to a more robust, well-reasoned solution than if they had all started with the same information. In advanced systems, different solutions to the same situation can even be run in parallel, then evaluated and combined either periodically or at key decision points.
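
A small sketch of the pattern: two reviewers, each deliberately given only its own slice of a change, with a higher-level aggregator merging their independent verdicts. The reviewers and the voting rule are illustrative assumptions:

```python
def security_reviewer(diff_slice: str) -> str:
    """Sees only the code; knows nothing about formatting."""
    return "risky" if "eval(" in diff_slice else "ok"

def style_reviewer(diff_slice: str) -> str:
    """Sees only the formatting; knows nothing about the logic."""
    return "risky" if "\t" in diff_slice else "ok"

def aggregate(verdicts: list) -> str:
    """Higher-level agent: merges independent opinions. Here any single
    'risky' verdict blocks the change."""
    return "block" if "risky" in verdicts else "approve"

def review(change: dict) -> str:
    # Intentionally separate contexts: each reviewer gets only its slice.
    verdicts = [security_reviewer(change["code"]),
                style_reviewer(change["formatting"])]
    return aggregate(verdicts)
```

Because neither reviewer sees the other’s input or opinion, their verdicts are genuinely independent, which is the whole point of the pattern.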

Burrowing opportunities & Side quests

Tool Calling

Tool calling is the mechanism that allows an AI agent to extend its capabilities beyond its core knowledge. An agent, often powered by an LLM, identifies when a task requires an external resource like an API or database. The agent then uses planning and parsing modules to format the request, invoke the tool, and interpret the output to complete the task.
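
That loop can be sketched in a few lines. Here `fake_llm` stands in for a real model call and the registry holds one made-up tool; the shape of the JSON tool request is an assumption for illustration, not any particular vendor’s format:

```python
import json

def get_weather(city: str) -> str:
    """A stand-in external tool (in reality, an API or database call)."""
    return f"18C and cloudy in {city}"

TOOLS = {"get_weather": get_weather}

def fake_llm(prompt: str) -> str:
    # A real model would decide whether and how to call a tool;
    # here we hard-code a structured tool request.
    return json.dumps({"tool": "get_weather", "args": {"city": "London"}})

def run_agent(user_msg: str) -> str:
    request = json.loads(fake_llm(user_msg))      # 1. model emits a tool request
    tool = TOOLS[request["tool"]]                 # 2. agent looks the tool up
    result = tool(**request["args"])              # 3. agent invokes it
    return f"Answer using tool output: {result}"  # 4. result folded into the reply
```

The parsing step (validating the model’s JSON before executing anything) is where most real-world tool-calling bugs live.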

Planning techniques:

Planning is a core component of advanced AI agents, enabling them to map out sequences of actions before execution. It involves breaking down complex problems into smaller, manageable tasks and logically sequencing them to achieve a goal. Agents use logic, machine learning models, or predefined heuristics to determine the best course of action.
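
One concrete planning primitive is ordering sub-tasks by their dependencies. This sketch uses Python’s standard-library topological sort on an invented task graph; each key maps a task to the tasks that must finish first:

```python
from graphlib import TopologicalSorter

# Illustrative task graph: task -> set of prerequisite tasks.
deps = {
    "book_flight": {"pick_dates"},
    "book_hotel":  {"pick_dates", "pick_city"},
    "pack":        {"book_flight", "book_hotel"},
}

# A valid execution order: every task appears after its prerequisites.
order = list(TopologicalSorter(deps).static_order())
```

Real planners layer cost estimates, re-planning, and learned heuristics on top, but the core move, decompose then sequence, is the same.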

Agentic Behavior Trees:

ABTs are a structured, hierarchical model for controlling AI agents. Originating from robotics and game development, they offer a visual, modular way to manage an agent’s decision-making and actions. Instead of a linear sequence of states, a behavior tree consists of nodes (selectors, sequences, and actions) that tick in a hierarchical manner. This allows agents to perform complex, multi-step tasks, and crucially, to react to changes in their environment in real time, making them more adaptable and resilient than traditional finite state machines.
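
A minimal behavior tree can be sketched with plain functions: a sequence succeeds only if every child does, while a selector succeeds on the first child that does. The “open the door, or knock if it’s locked” tree below is an illustrative toy, not a real BT library:

```python
SUCCESS, FAILURE = "success", "failure"

def sequence(*children):
    """Tick children in order; fail fast on the first failure."""
    def tick(state):
        for child in children:
            if child(state) == FAILURE:
                return FAILURE
        return SUCCESS
    return tick

def selector(*children):
    """Tick children in order; succeed on the first success (a fallback)."""
    def tick(state):
        for child in children:
            if child(state) == SUCCESS:
                return SUCCESS
        return FAILURE
    return tick

def condition(pred):
    return lambda state: SUCCESS if pred(state) else FAILURE

def action(fn):
    def tick(state):
        fn(state)
        return SUCCESS
    return tick

# "Open the door; if it's locked, fall back to knocking."
tree = selector(
    sequence(condition(lambda s: not s["locked"]),
             action(lambda s: s.__setitem__("did", "open"))),
    action(lambda s: s.__setitem__("did", "knock")),
)
```

Because the whole tree is re-ticked from the root, a change in the environment (the door gets unlocked) automatically changes the behavior on the next tick, which is the adaptability edge over a finite state machine.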

MCP:

Model Context Protocol (MCP) is an open standard designed to give AI agents a standardized, “plug-and-play” way to connect with external tools and services. Instead of requiring custom integrations, MCP uses a structured protocol for an AI application (the “host”) to communicate with servers that act as smart adapters for different tools, enabling complex, multi-step tasks.

Guardrails:

AI guardrails are essential safeguards that ensure generative AI systems operate ethically, safely, and within an organization’s policies. They function as a layered defense, preventing harmful, biased, or misleading outputs by filtering training data, aligning a model’s behavior, and enforcing post-deployment controls. For autonomous agents that act independently, guardrails provide critical boundaries for their behavior.
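
As one concrete layer of such a defense, here’s a sketch of a post-generation output filter: a denylist of regex patterns that withholds matching responses. Real guardrail stacks add input filtering, policy models, and human review on top of this; the patterns are illustrative:

```python
import re

# Illustrative denylist: things the agent must never echo back.
BLOCKED_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (r"\bpassword\s*[:=]", r"\bssn\b")
]

def guard(response: str) -> str:
    """Post-deployment control: screen the model's output before it
    reaches the user."""
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(response):
            return "[response withheld by guardrail]"
    return response
```

Pattern filters are cheap and predictable, which is why they often sit as the last line of defense even when smarter policy models run earlier in the pipeline.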

Caching:

Caching is used to manage an AI agent’s short-term memory, often storing recent conversations to maintain context and coherence within a session. This temporary storage is simpler than long-term memory solutions and helps agents avoid re-processing past messages repeatedly. Efficient caching avoids that performance penalty by providing quick access to recent information, improving the user experience.
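
A minimal sketch of that short-term memory: a bounded cache of recent turns, so the agent can rebuild session context cheaply instead of reprocessing the full history. The window size is an arbitrary assumption:

```python
from collections import deque

class SessionMemory:
    """Short-term session memory: keep only the last few turns."""

    def __init__(self, max_turns: int = 3):
        self.turns = deque(maxlen=max_turns)  # oldest turns fall off automatically

    def add(self, role: str, text: str):
        self.turns.append((role, text))

    def context(self) -> str:
        """Render the cached turns as a prompt prefix for the next call."""
        return "\n".join(f"{role}: {text}" for role, text in self.turns)
```

The bounded window is the trade-off in miniature: recency is cheap and always available, while anything older has to come from a proper long-term memory store.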

Role Play:

AI agent role-playing is a technique used in both research and commercial applications. Researchers use it to create interactive artificial societies that reflect believable human behavior, aiming to develop agents that can perform tasks in a human-like manner. In a practical business context, it is used in platforms to create realistic sales scenarios for training and coaching purposes.

Human in the loop:

Human-in-the-loop is an orchestration pattern for AI agents where a person can be involved in the workflow to guide or manage the process. In a “dynamic chat manager” role, a human can guide conversations to productive outcomes. In these scenarios, the agents typically operate in a read-only mode, without making any changes to a running system.
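
The core of the pattern is an approval gate: the agent proposes, a human disposes. In this sketch the `approve` callback stands in for a real review UI, and the names are illustrative:

```python
def run_with_approval(proposed_action, execute, approve):
    """Human-in-the-loop gate: only run a state-changing action if the
    human approver signs off; otherwise stay read-only."""
    if approve(proposed_action):
        return execute(proposed_action)
    return f"skipped: {proposed_action}"
```

Everything before the gate can be fully autonomous; the gate just ensures nothing irreversible happens without a person in the workflow.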

RAG:

Retrieval-Augmented Generation (RAG) is an AI framework that connects a generative model with an external knowledge base to enhance accuracy and context. It works by retrieving up-to-date information from sources like databases or the web and incorporating it into the LLM’s response. This process helps to mitigate hallucinations and provides factual grounding for the output.
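
The RAG loop can be sketched end to end in miniature: retrieve the most relevant document (here by naive keyword overlap, where real systems use vector embeddings), then fold it into the prompt. The two-document corpus is an illustrative assumption:

```python
import re

# A tiny stand-in knowledge base; real systems query databases or the web.
DOCS = [
    "MCP is an open standard for connecting agents to tools.",
    "Behavior trees organize agent actions into hierarchical nodes.",
]

def tokens(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str) -> str:
    """Naive retrieval: pick the document with the most shared words.
    Embedding similarity would replace this in practice."""
    return max(DOCS, key=lambda d: len(tokens(query) & tokens(d)))

def build_prompt(query: str) -> str:
    """Augmentation: ground the model's answer in the retrieved text."""
    context = retrieve(query)
    return f"Context: {context}\n\nQuestion: {query}\nAnswer using only the context."
```

The “answer using only the context” instruction is doing the anti-hallucination work: the model is asked to stay inside the retrieved facts rather than improvise.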

Finally

So after all this digging, where does it leave us? For me, the whole journey from simple reflexes to sprawling multi-agent systems led to a more philosophical conclusion. It forced me to boil the agent concept down to its absolute essence. If I had to sum it all up, I would say:

An AI agent is a system that takes actions to converge onto a specified long-term goal or state, despite the randomness of its underlying components and the chaos of a changing environment.

This definition is what ties everything together. It’s the unifying idea that cuts through all the different architectures, from the simplest reflex loops to the most complex hierarchical systems. It gets to the core of what an agent actually does. I could go on for hours about why I believe this is the ultimate takeaway, but for now, I’ll leave you with that thought. I hope this gives you a solid anchor point for your own thinking.

Some Useful Links

https://viper.cs.columbia.edu

https://arxiv.org/pdf/2404.14394

https://www.crewai.com/open-source

https://www.youtube.com/watch?v=kJLiOGle3Lw