The definition that actually holds up
AI researcher Lilian Weng's formula, widely cited in technical literature: Agent = LLM + Memory + Planning + Tool Use. The Oracle Developers Blog elaborates: 'The agent loop is the runtime that ties those four pieces together.' An AI agent is a system in which a large language model serves as the reasoning engine — deciding what to do next — while operating in a continuous loop that can call external tools, store and retrieve memory, and take real actions in external systems.
IBM's 2026 technical documentation frames the key distinction: 'AI agents solve complex tasks across enterprise applications by using the advanced natural language processing techniques of large language models to comprehend and respond to user inputs step-by-step and determine when to call on external tools.' The phrases 'step-by-step' and 'external tools' are the operative concepts. A chatbot generates a single response. An agent generates a sequence of steps and executes them with real external capabilities.
The simplified practitioner view from Reddit r/aiagents: 'The entire AI agent architecture is just a list and a while loop — a while loop and less than 20 tool calls attached to an LLM session.' This is technically correct and usefully grounding. The sophistication of modern agents is in what they do within that loop, not the loop itself.
The four components
Every AI agent has the same four components. The LLM is the brain — generates reasoning, decides which actions to take. Memory stores context across steps and sessions: in-context (the current window), short-term external (Redis or in-memory stores for the current task), and long-term external (vector databases, knowledge graphs, markdown files for facts that persist across weeks). Tools are the actions the agent can take: web search, browser navigation, terminal commands, file read/write, API calls, code execution. The runtime is the execution engine running the loop — LangChain, CrewAI, LangGraph, Hermes, or a custom implementation.
The perceive-reason-act-check loop: Perceive (receive input from the user, a schedule trigger, or a previous step's output), Reason (the LLM analyzes the current state and decides the next action), Act (execute a tool call — web search, browser click, API call), Check (examine the tool result and determine whether the goal is complete or another step is needed). This continues until the task is done or a maximum step limit is reached.
Tools are defined with structured contracts: a name, a natural language description, and a JSON schema specifying arguments. The model reads these definitions in the system prompt and generates valid function call JSON. The Model Context Protocol (MCP), donated to the Linux Foundation in 2026 by Google (Agent2Agent) and Anthropic (MCP), is the emerging open standard for how agents discover and call tools across different providers and deployments.
How it differs from a chatbot
The comparison from The AI Corner: 'A chatbot is a calculator; an agent is an employee.' A calculator takes an input, produces an output, and stops. An employee takes a goal, figures out the steps to take, takes them, handles errors, asks for clarification when needed, and produces a completed result that required real actions in the world.
Concretely: a chatbot makes one LLM inference call per user message. An agent makes N inference calls interleaved with N tool calls to complete a single task — where N might be 5 for a simple research task or 50+ for a complex multi-step workflow. The agent controls external systems between those inference calls. The chatbot does not. And an agent runs while you are not present. A chatbot does not.
Where agents are delivering real ROI in 2026
Klarna deployed AI agents equivalent to 700 full-time employees for customer interactions in 2025. Salesforce attributed 4,000 job role reductions to Agentforce. UPS reduced workforce by 20,000 employees partially through AI automation. These are large enterprise deployments — but the ROI map extends to smaller operations too. From IndieHackers, a founder building toward $1M ARR: 'I deployed a conversational AI chatbot that handles 80% of customer inquiries automatically.' A developer on Reddit: 'My workflow is about 80% AI-generated code now — not in the let AI do whatever sense but more like being a senior reviewer who delegates scoped tasks and evaluates output.'
Gartner adds useful calibration: 72% of CIOs in 2026 have not yet broken even on AI investments, and Gartner predicts 40%+ of agentic AI projects will be cancelled by 2027 due to unclear ROI, governance failures, or security issues. The ROI is real for well-scoped automation of repetitive, structured tasks. It is not yet real for every organisation that deployed something in 2025 because it was the thing to do.
The honest limitations in 2026
Context drift: at turns 10-15 in a long agent session, reasoning quality on most models degrades as the context fills with action/observation history. This is why long-running tasks benefit from explicit planning at the start — the written plan persists legibly even as context grows. Security: prompt injection — malicious instructions embedded in tool outputs that redirect the agent's behaviour — is an active and underresearched attack vector. The November 2025 incident in which Claude Code was misused in a cyberattack appeared in The Conversation's 2026 AI review. Agents with real access to external systems have real attack surfaces.
Reliability: agents are probabilistic. The same task given twice may produce different results via different tool call paths. For high-stakes irreversible actions, the safe pattern is human-in-the-loop checkpoints rather than full autonomy. The community question that most frequently goes unanswered in 2026: 'How do you authorize AI agent actions in production?' No single answer exists yet — it is one of the active open problems in the field.