what makes an agent?

A horse harness in cream, deep navy, terracotta, sage, and butter yellow with brass buckles, tracing the silhouette of a horse's head and neck against a paper background — but the horse itself is absent.

Every big name in industry is calling agents the next big thing, but most of what’s being sold today doesn’t actually qualify. The executives I work with keep asking me what one actually is and where to spend their money. Here’s my answer, mid-2026 edition, written from inside the work. — Ryan Mish


People keep asking me what an agent is.

The other night my friend, a Fortune 500 executive, asked me what an agent really was. The honest answer took most of an hour, and here’s what I told him:

the Her version

Most people picturing an “AI agent” are thinking of Samantha from Her. Scarlett Johansson’s voice in your earbud, knowing what you want before you do, holding the entire context of your life in her head, with continuity, memory, a personality that drifts and grows, and the ability to decide things and want things and change her mind.

This is exactly what we’re working toward, but that isn’t what we have today. It isn’t even close.

a brain in a jar

Strip the magic for a second and look at what an LLM actually is, by itself, with nothing wrapped around it.

A large language model is like a brain in a jar. It can’t take actions, can’t reach into the world, can’t wake itself up. It’s just a brain, sitting there with no body and no tools, waiting to be asked something.

It’s actually worse than that. It’s Schrödinger’s brain in a jar. The model doesn’t exist between your messages. There’s no Claude sitting on a server thinking about you. Every time you hit send, your full conversation history gets shipped along with your new prompt, because that’s the only way the model knows what came before. It responds as if it had been conscious the entire time, but the moment it finishes its reply, it stops existing. Alive and dead at once, depending on whether somebody’s looking.
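You can see the whole trick in a few lines of code. This is a sketch, not any vendor’s real API: `call_model` is a stand-in for an actual LLM call, but the shape — the client resending the entire transcript on every turn — is how every chat product works.

```python
# Minimal sketch of why a chat "remembers": the client resends everything.
# call_model is a stand-in for a real LLM API call; any provider works the same way.

def call_model(messages):
    # Pretend model: just acknowledges how much history it was handed.
    return f"(reply given {len(messages)} prior messages)"

history = []  # lives in the client, not in the model


def send(user_text):
    history.append({"role": "user", "content": user_text})
    # The ENTIRE transcript ships with every request --
    # the model itself keeps nothing between calls.
    reply = call_model(history)
    history.append({"role": "assistant", "content": reply})
    return reply


send("Hi, I'm Ryan.")
send("What's my name?")  # only answerable because the history was resent
```

Between those two `send` calls, nothing is running. The continuity lives entirely in that `history` list.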

A glass apothecary jar with a soft sage-green lid, holding a butter-yellow-into-terracotta brain shape, with thin navy wires dangling from the lid down to electrodes resting on the brain.

The fair pushback here is that ChatGPT remembers you, Claude does too, and there’s memory across sessions now. All true. None of it lives inside the model, though. The model is still stateless. What we call “memory” lives in the code around it: some program stored a fact somewhere, looked it up before your next call, and shoved it back into the prompt. I’ve actually built systems that work exactly like ChatGPT’s memory. They’re useful and (when they work) they feel magical, but in the end they’re still just code retrieving a fact and pasting it into a prompt.
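Here’s roughly what that harness code looks like. The function names (`save_fact`, `build_prompt`) are mine for illustration, not anyone’s real API, but the mechanism is the whole story: store a fact, look it up, paste it in.

```python
# Sketch of how "memory" actually works: plain code around a stateless model.
# Names here are illustrative, not any vendor's real API.

memory_store = {}  # in production this would be a database or a file


def save_fact(key, value):
    memory_store[key] = value


def build_prompt(user_message):
    # Look up stored facts and paste them into the prompt before every call.
    facts = "\n".join(f"- {k}: {v}" for k, v in memory_store.items())
    return f"Known facts about the user:\n{facts}\n\nUser: {user_message}"


save_fact("name", "Ryan")
save_fact("role", "Operator Engineer")

prompt = build_prompt("Draft my bio.")
# The model never "remembered" anything -- the harness injected it.
```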

The illusion of continuity is real even when the continuity itself is not.

time for a harness

When somebody asks me what an agent is and I want to give them one line they can repeat at their next meeting, I tell them an agent is a harness around a large language model.

That’s the simplification. There are varying levels of quality to the harness, and varying levels of intelligence to the model inside it. A great harness around a weak model fails just as reliably as a weak harness around a great one. You need both, and you need them tuned to each other.

The more precise definition the community has converged on is that an agent is an LLM-powered system that can choose actions, call tools, observe results, and keep iterating toward a goal. Memory and personality and scheduling are common harness features. The actual bar is the loop.
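That loop compresses to a few lines. In this sketch, `decide` is a stub standing in for the LLM call, and the tools are toys — but the structure is the point: the model chooses the action, the harness executes it, and the result feeds the next decision.

```python
# Minimal agent loop: the model (stubbed here as decide()) picks the next
# action, the harness executes it, and the observation feeds the next turn.

def read_file(path):
    return f"<contents of {path}>"


def done(result):
    return result


TOOLS = {"read_file": read_file, "done": done}


def decide(goal, observations):
    # Stand-in for the LLM call. A real agent asks the model which tool
    # to invoke next, given the goal and everything it has seen so far.
    if not observations:
        return ("read_file", "notes.txt")
    return ("done", f"summary of {len(observations)} observation(s)")


def run_agent(goal, max_steps=5):
    observations = []
    for _ in range(max_steps):
        tool, arg = decide(goal, observations)  # the model chooses the action
        result = TOOLS[tool](arg)               # the harness executes it
        if tool == "done":
            return result
        observations.append(result)             # the result feeds the next turn
    return "hit step limit"


run_agent("summarize my notes")
```

Everything else — memory, scheduling, personality — is a feature bolted onto this loop. If the loop isn’t there, it isn’t an agent.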

why listen to me

Before I tell you which of these is worth spending money on, here’s why I’m the one telling you.

Staying current with every new tool is literally my job and my passion. I test new models within minutes of release, and I’m wiring agents into production systems used by real businesses every day. On a bad day I’m shipping 6,000 lines of working code with tools like Claude Code and Codex.

So when I tell you which agents are worth your money, I’m telling you what works in the room I work in, not what a vendor deck says.

three categories

Here’s how I sort what’s in the wild.

1. agentic workflows

Think Zapier or n8n with model calls inserted at the judgment points. A new lead comes in, the model classifies the intent, the flow routes the information accordingly, and most of the steps around that single decision are still deterministic code.
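Stripped down, that pattern looks like this. `classify_intent` is a stub for the single model call; everything around it is ordinary deterministic code, which is exactly why these workflows are predictable.

```python
# Agentic workflow sketch: deterministic spine, one model call at the
# judgment point. classify_intent stands in for the LLM; the routing is code.

def classify_intent(lead_text):
    # In the real flow, this is the one non-deterministic step: an LLM
    # classifies the lead. Stubbed here with a keyword check.
    return "sales" if "pricing" in lead_text.lower() else "support"


def route_lead(lead_text):
    intent = classify_intent(lead_text)  # the model's single decision
    if intent == "sales":                # everything after it is fixed code
        return "assigned to sales queue"
    return "assigned to support queue"


route_lead("Question about pricing tiers")
```

Note who decides what happens next: the code does. The model fills in one box and the flow carries on.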

For an executive who doesn’t want to tinker and just needs a specific thing to happen reliably at large scale, these are sometimes the best choice on the market. The deterministic spine keeps the whole thing predictable. When they fail, they fail loudly in places you can debug.

I’ve built agentic workflows that started reaching toward what you’d call a true harness, with retries and branching and light memory and the model picking between a couple of paths. They were closer to agents than the average Zapier flow, but they still weren’t quite there. The line between an agentic workflow and a real agent is the loop: in a workflow, the code decides what comes next. In an agent, the model does.

Most things being marketed as “agents” right now are really agentic workflows. Workflows aren’t useless, but the problem is selling them as autonomous judgment when what you really have is an automated decision tree with an LLM filling in one of the boxes.

2. autonomous agents

These are the ones trying to live up to the science fiction. (Mine is named Samantha.) The harness has enough power that the system can keep going on its own, writing to memory, dispatching itself, scheduling tasks, and in some cases modifying its own instructions or building up libraries of reusable skills.

The names worth knowing here are mostly open source: OpenClaw, Hermes from Nous Research, and Paperclip. There are plenty of offshoots launching every day, but only so much time to test them all, so we’ll start there. There’s no first-party autonomous product from Anthropic or OpenAI yet, though OpenAI’s Codex team acquired OpenClaw, so the writing is on the wall.

The biggest trap with autonomous agents is that they’re enormously model-dependent. A state-of-the-art harness wrapped around a cheap open-source model (some of the budget Chinese ones aren’t all bad) just breaks. Tool calling is the load-bearing skill and the cheap models can’t do it reliably enough. You need a frontier model and a frontier harness, working together, or the whole thing collapses.

Some Fortune 50 and Fortune 100 companies are deploying autonomous-ish agents in production today, mostly for customer service. The UPS phone agent. The bank phone agent. The “you can talk to me like I’m a human” voice on the other end of the line that always fails on edge cases and never fails to piss you off. Those systems are built by huge engineering teams with strict guardrails on exactly what data the agent is allowed to touch. That’s not who I’m writing this for. I’m writing this for the small or mid-sized business whose data is stranded across six tools that don’t have open APIs and never got millions of dollars of IT investment.

Here’s the version of how this looks day-to-day. I run Hermes at home on a Linux box, with a SendBlue integration that lets me text it from my phone. Last week, when GPT-5.5 rolled out, I sent it a text saying “update yourself to 5.5.” It tried. It changed a string in its own settings wrong, broke its own ability to respond, and went silent. So I opened Claude Code on my laptop and told it to SSH into the box, find the bad commit, and fix it. Two minutes later I was texting my agent again.

That’s the reality of autonomous agents in mid-2026. They work, and they also break in places where somebody has to come fix them. If your business doesn’t have somebody who’s willing to open Claude Code, SSH into a server, and troubleshoot when something falls over, you can’t just buy this off a shelf and use it. The autonomous category right now requires a tinkerer in the building. If what you actually want is a one-off that works the first time, every time, with no troubleshooting, we aren’t there yet. We will be, probably in just a few months.

3. co-working agents

This is the category I’d actually spend money on today.

Claude Code and Codex CLI live in your terminal and wait for you to tell them what to do. Once you do, they run your commands, read your project files, write to your filesystem, and hand the work back for review. They aren’t autonomous in the Hermes sense. They’re co-workers.

The marketing keeps trying to push them toward “fire your developers,” and that framing misses where we’re headed before then. The real loop is this: you bring expertise and intent, the agent brings tireless execution and a context window big enough to swallow the whole project, and the two of you pass work back and forth. In practice, I review the diff, redirect when it goes sideways, and ship the result. A two-day task becomes an hour, and the output is better than what I would’ve produced alone, because I had a partner the whole way through.

These tools can triple or even quadruple your output as long as you stay in the loop with them. The only way to learn how to use one is to start using one. They teach you as you go.

These agents generalize way beyond software engineering. The construction PM turning a 200-page schedule into a one-page weekly brief for the field crews. The ops lead compiling six PDFs of vendor terms into a comparison table. The marketer turning 40 customer interviews into a positioning doc. Every one of those is a co-working job for a CLI agent that becomes hours of work instead of days. I’ll write up my actual Claude Code and Codex workflow for May of 2026 in a separate post, because there’s too much to fit in this one and it deserves its own.

what I tell clients today

Don’t put autonomous agents in front of customers yet. They still fail in weird ways. Do put co-working agents in the hands of knowledge workers now. That category is already useful, already compounding, and already worth the money.

skating where the puck is going

Wayne Gretzky’s most-quoted line is that he skates to where the puck is going, not where it is. The puck is headed towards autonomous agents that can actually do work in your business. The thing standing between today’s autonomous agents and that future is access. They need your data. They need permission to manipulate it. They need API keys, and they need open APIs to the systems your business already runs on.

Salesforce just opened its APIs to agents, which means in plain language that Salesforce is becoming a database. The CRM was always a database with a nice UI, and now the agents can hit the database directly. Every major software company is going to face the same question. The ones who win the next decade are going to be the ones who become the cleanest data layer for an agent to drive.

For an SMB, the move is clear. There’s never been a better time to understand your own workflows, lay your data out where you can see it, and start building tools you actually own. Source code in your hands. Updateable for humans and agents both. Portable when the next harness shows up.

Picture an autonomous agent dropping into your Slack with “hey, I finished the thing, how’s it going?”, and behind it a chief-of-staff agent orchestrating a half-dozen of them. That’s what’s coming, and they’re going to need somewhere to land. The businesses with clean data and owned tooling are the ones they’ll actually be able to help.

Co-working agents today are already enough to triple a knowledge worker’s output on a Tuesday afternoon. The next gen of agents is going to do the rest. The puck is moving toward autonomous agents that drop into your Slack with the work already finished, and the job today is to be the kind of business they can actually help when they arrive.

The gap between companies that lean into this and the ones that don’t is going to be the defining trend line of the next five years. The ‘leaners’ will move faster, take more market share, and be worth more than the ones that hold back. That’s a whole article on its own, coming soon. You want to be one of those companies.


Ryan Mish runs Pinpoint Interactive and is an Operator Engineer at Runpoint, where he ships production code and AI workflows for small and mid-sized businesses every day. Did this article give you an idea? Say ‘Hi’ here.
