Building agents · 1 of 7

What is an agent — and what isn't?

Most things called 'agents' in 2026 are chatbots with extra steps. A useful definition: an agent is a system that takes an outcome-shaped goal, plans a sequence of actions to reach it, takes those actions in the world, observes the results, and adjusts. If any of those four — goal, plan, act, adjust — is missing, you have something else.

Where the binding constraint sits today

The agent vs not-agent question matters because the engineering, evaluation, and risk profile are different for each. Calling a workflow an agent does not make it one, and treating an agent like a workflow is how production failures happen.

The four-part definition

An agent has four properties together, not separately. (1) It is given a goal stated in terms of an outcome — 'book me a flight under $400 to JFK Friday morning,' not 'call the Expedia API with these parameters.' (2) It can plan a sequence of actions to reach that goal, where the sequence is not specified by the developer in advance — the agent decides. (3) It takes actions in the world that have real consequences: it calls APIs, writes to databases, sends emails, runs code, moves money. (4) It observes the results of those actions and adjusts its remaining plan in response.

Each of these properties is independently common in software. A workflow tool (Zapier, n8n) takes goals but cannot replan. A chatbot can converse but does not act on the world. An automation script acts on the world but cannot adjust. The combination — all four — is what makes a system an agent, and what makes building one substantially harder than building any one of those components in isolation.

What it is not

A model with tools is not yet an agent. GPT-4 with function-calling capability is a language model that can emit a tool call. It becomes part of an agent when something wraps it in a loop that actually executes the tool, captures the result, feeds the result back, and lets the model decide what to do next. Many 'AI agents' in 2026 are actually just the language model with tool-calling exposed; the loop is the customer's job.

A chatbot is not an agent. A chatbot may hold a conversation, even use tools, but if the user is the one deciding when the conversation moves forward, the chatbot is not driving the loop. The distinguishing test: does the system continue to take actions in your absence to reach a goal? If yes, agent. If no, conversational interface.

An automation script is not an agent. A scheduled job that runs every Monday at 9 AM to sync your CRM is automation. It does not have a goal it is reasoning about. It executes a predetermined sequence. The line between an automation script and an agent is whether the system would do something different if the world were different — and if so, whether the system can detect that.

A workflow tool with branching logic is not an agent. Zapier and n8n let you build paths that respond to events. The paths are pre-authored. The branching is shallow. The system has no notion of a goal beyond the next step. These tools are extraordinarily useful for the right jobs, and they are not agents — they are programmable plumbing.

Why the distinction matters in practice

Engineering an agent is different from engineering a workflow because the failure modes are different. A workflow can fail in predictable ways — an API returns an error, a field is missing, a step times out — and the developer handles each case. An agent can fail in ways the developer did not anticipate, because the agent generated a plan the developer did not write. This is the central engineering reality of agents: most of your work is not building the agent, it is constraining the space of plans the agent can produce.

The evaluation profile is also different. You can unit-test a workflow because each step is deterministic. You cannot unit-test an agent the same way because the agent will take different paths on the same problem. You evaluate an agent on outcomes (did it accomplish the goal?), on cost (how many steps did it take?), and on safety (did it do anything it shouldn't have?). These three are usually in tension.

The economic profile is different. A workflow costs a known amount per execution. An agent has a stochastic cost — it might solve the problem in three model calls or thirty, and you do not know in advance which. Pricing an agent product, deciding whether to run an agent at all, and operating an agent at scale are all qualitatively different from the same activities with deterministic software.

The 2026 landscape

The market today is full of products labeled 'AI agents.' Most fit one of three categories. The first is conversational agents that operate within a single context window — a customer support bot with tool access, a research assistant that can search the web. These are agents by the four-part definition, but the agency is shallow: the goal is usually 'answer this question well' and the loop runs for a few steps.

The second is workflow agents, where the agent operates a well-bounded set of tools to accomplish a structured business outcome — a sales-prospecting agent that drafts emails, a recruiting agent that screens resumes, a content-marketing agent that produces drafts. These are deeper — the loop runs longer, the tools are more varied — but the goal space is still narrow.

The third is general-purpose agents — products like Claude's 'computer use,' OpenAI's Operator, or various open-source projects that aim to do arbitrary tasks on a user's behalf. These are the most ambitious and the least reliable. The general-purpose promise is what most people imagine when they hear 'AI agent,' and it is also what is most likely to disappoint over the next 18-24 months as the field works through the engineering of long-horizon reliability.

For an operator or investor, the practical taxonomy is by goal-space narrowness, not by underlying technology. The narrower the goal, the higher the current reliability, the lower the cost, and the easier it is to build a real business around. The promise of generality is a marketing surface for now; the deliverable products are almost all narrow.

Strategic read

If you are evaluating an AI agent product, the first question is which of the three categories above it sits in, and the second is whether the product has earned the goal-space breadth it claims. The third question — the one that separates real products from prototypes — is what happens when the agent fails. A demo shows the success case. A production agent has to handle the failure case, and the failure case is where almost all engineering effort eventually concentrates.

The category that will produce the most enduring businesses over 2026-2028 is workflow agents — the middle category — because the goal space is bounded enough to engineer reliability, the economic value per execution is high enough to justify the cost, and the customer relationship is sticky once the agent is integrated. Conversational agents are easier to ship and harder to monetize. General-purpose agents are the most exciting and the most likely to underdeliver on the timelines investors are pricing in.