What a multi-agent system actually is
A single agent runs one action loop: one LLM call, one tool call, one observation, repeat until done. A multi-agent system uses multiple agents running concurrently, each with a defined role, coordinated by an orchestrator that routes work and aggregates results.
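That single-agent action loop can be sketched in a few lines. Everything here is illustrative: `call_llm` and `run_tool` are hypothetical stand-ins for a real model and tool runtime, not any framework's API.

```python
# Minimal single-agent action loop (illustrative; call_llm and run_tool
# are hypothetical stand-ins for a real model and tool runtime).
def call_llm(history):
    # Pretend the model asks for one tool call, then finishes.
    if not any(step[0] == "observation" for step in history):
        return {"action": "tool", "tool": "search", "args": "competitor pricing"}
    return {"action": "finish", "answer": "done"}

def run_tool(tool, args):
    return f"results for {args!r} from {tool}"

def run_agent(task, max_steps=10):
    history = [("task", task)]
    for _ in range(max_steps):          # one LLM call, one tool call,
        decision = call_llm(history)    # one observation, repeat until done
        if decision["action"] == "finish":
            return decision["answer"]
        observation = run_tool(decision["tool"], decision["args"])
        history.append(("observation", observation))
    return None

print(run_agent("check competitor pricing"))  # → done
```

A multi-agent system wraps several of these loops in a coordination layer, which the rest of this section unpacks.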
The analogy from the f3fundit.com multi-agent orchestration guide (March 2026): 'Your AI agent handles customer support. Another scrapes competitor pricing. A third writes follow-up emails. They all work, but they don't talk to each other. You're running three isolated agents when you need an orchestrated system. The difference? Orchestration lets Agent A pass context to Agent B, which triggers Agent C only when conditions match.' That coordination layer is the whole point.
This is not about replacing a single agent with many for the same task. Specialization is the goal — a researcher who only does research, a coder who only writes code, an ops agent who monitors both — and then wiring them with defined handoff protocols so the system produces something none could produce alone.
The main orchestration frameworks in 2026
LangGraph is the most widely deployed multi-agent framework in production as of 2026. It models agent workflows as directed graphs — nodes are agent steps (LLM calls, tool calls, human checkpoints) and edges define the flow between them. This gives precise control over execution order, conditional branching, and parallel execution. LangGraph integrates with PostgresSaver for checkpointing and supports time-travel debugging. Verbose to set up, but precise for production workflows.
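The node-and-edge model is easier to see in code. The sketch below is not the LangGraph API; it is a toy stdlib-only executor showing the underlying idea: nodes transform shared state, plain edges define the flow, and a conditional edge branches on state.

```python
# Toy directed-graph executor illustrating the model LangGraph uses.
# This is NOT the LangGraph API, just the idea in plain Python.
END = "__end__"

def research(state):
    state["notes"] = "raw findings"
    return state

def write(state):
    state["draft"] = f"report based on {state['notes']}"
    return state

def review(state):
    state["approved"] = "report" in state["draft"]
    return state

nodes = {"research": research, "write": write, "review": review}
edges = {"research": "write", "write": "review"}

def route_after_review(state):
    # Conditional edge: loop back to the writer until review passes.
    return END if state["approved"] else "write"

conditional = {"review": route_after_review}

def run_graph(entry, state):
    node = entry
    while node != END:
        state = nodes[node](state)
        node = conditional[node](state) if node in conditional else edges[node]
    return state

final = run_graph("research", {})
print(final["draft"])  # → report based on raw findings
```

In LangGraph proper, the same structure is declared on a `StateGraph` and compiled, with a checkpointer handling persistence between steps.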
CrewAI takes a higher-level abstraction: define agents with roles (researcher, writer, analyst) and tasks, and CrewAI handles the orchestration. It is faster to prototype with than LangGraph but gives less control over exact execution flow. The March 2026 AutoGen vs CrewAI comparison at f3fundit identifies CrewAI as the fastest path to a working multi-agent prototype — 'no manual glue code, no brittle cron jobs, just workflow logic that adapts.'
AutoGen (Microsoft) focuses on conversational multi-agent patterns — agents that communicate through dialogue rather than explicit task handoff. It is the best fit for workflows where agents debate or iterate on each other's outputs.
Hermes Agent supports multi-agent coordination through subagent delegation: the primary agent can spawn up to 3 concurrent subagents, pass them structured tasks, and aggregate their outputs. This is built into the core tool set, not a separate framework layer. For Hermes OS users, a scheduled research task can automatically spawn a browser agent, a summarization agent, and a synthesis agent running in parallel — no additional configuration required.
Orchestration patterns and when to use each
Sequential pipeline: Agent A produces output, passes it to Agent B, which passes to Agent C. The simplest pattern. Use it for workflows where each stage depends on the previous output: data extraction → cleaning → analysis → report. Error isolation is clean — if stage B fails, stages A and C are unaffected.
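A minimal sketch of that pipeline, with hypothetical stand-in functions in place of real LLM-backed agents:

```python
# Sequential pipeline: each agent's output is the next agent's input.
# extract/clean/analyze/report are hypothetical stand-ins for real agents.
def extract(source):
    return {"rows": [" 42 ", "17", " bad "]}

def clean(data):
    rows = []
    for raw in data["rows"]:
        try:
            rows.append(int(raw.strip()))
        except ValueError:
            pass                      # stage-local error isolation: a bad
    return {"rows": rows}             # row never reaches the analysis stage

def analyze(data):
    return {"total": sum(data["rows"]), "count": len(data["rows"])}

def report(stats):
    return f"{stats['count']} rows, total {stats['total']}"

def pipeline(source, stages):
    result = source
    for stage in stages:              # Agent A -> Agent B -> Agent C
        result = stage(result)
    return result

print(pipeline("pricing.csv", [extract, clean, analyze, report]))
# → 2 rows, total 59
```

The error isolation claim shows up directly: a failure inside `clean` can be handled (or retried) without touching the extraction or analysis stages.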
Parallel execution: Multiple agents run concurrently on independent subtasks. An orchestrator assigns N tasks, waits for all N, and synthesizes. Use it for tasks that can be parallelized without dependency: N research agents each covering one competitor, results aggregated by the orchestrator. Hermes's 3-concurrent-subagent limit covers most practical parallel research use cases.
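The fan-out/fan-in shape of this pattern fits naturally on a thread pool. A stdlib-only sketch, where `research_agent` is a hypothetical stand-in and the 3-worker cap mirrors the Hermes subagent limit mentioned above:

```python
# Parallel fan-out/fan-in: the orchestrator assigns one independent
# subtask per agent, waits for all of them, then synthesizes.
from concurrent.futures import ThreadPoolExecutor

def research_agent(competitor):
    # Stand-in for a real browsing/research agent.
    return f"summary of {competitor}"

def orchestrate(competitors, max_subagents=3):
    with ThreadPoolExecutor(max_workers=max_subagents) as pool:
        results = list(pool.map(research_agent, competitors))
    # Synthesis step: aggregate all subagent outputs into one report.
    return " | ".join(results)

print(orchestrate(["AcmeAI", "BetaBot", "GammaWorks"]))
# → summary of AcmeAI | summary of BetaBot | summary of GammaWorks
```

`pool.map` preserves input order, so the synthesis step sees results in the same order the subtasks were assigned, even though they ran concurrently.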
Hierarchical (supervisor pattern): A supervisor agent decomposes a complex task, assigns subtasks to worker agents, reviews their outputs, and either accepts results or sends workers back for revision. The most powerful pattern for complex reasoning tasks, and the most expensive — the supervisor runs inference on all worker outputs, compounding token costs significantly.
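The accept-or-revise loop is the defining feature of this pattern, and also the cost driver. A sketch with hypothetical stand-ins (a real worker would be an LLM agent and the review would itself be an inference call):

```python
# Supervisor pattern sketch: decompose the task, review each worker's
# output, and send it back for revision until accepted. Every review is
# an extra inference pass, which is where the cost compounds.
def worker(subtask, attempt):
    # Hypothetical worker agent; quality improves with each revision.
    return {"subtask": subtask, "quality": 0.5 + 0.25 * attempt}

def supervisor_review(output):
    return output["quality"] >= 0.8    # accept, or send back for revision

def supervise(task, max_revisions=3):
    subtasks = [f"{task}: part {i}" for i in range(1, 4)]  # decomposition
    accepted = []
    for subtask in subtasks:
        for attempt in range(max_revisions):
            output = worker(subtask, attempt)
            if supervisor_review(output):  # supervisor pass per output
                accepted.append(output)
                break
    return accepted

results = supervise("market analysis")
print(len(results))  # → 3
```

Note that a bounded revision count (`max_revisions`) is what keeps the token bill finite; without it a strict supervisor can loop a weak worker indefinitely.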
Event-driven (reactive): Agents trigger in response to conditions rather than running on fixed schedules. An alert agent triggers a research agent when a competitor page changes; a research agent triggers a reporting agent when it crosses a quality threshold. Hermes's event hooks support this — gateway hooks fire on every incoming and outgoing message, plugin hooks intercept tool calls for conditional routing.
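A minimal publish/subscribe sketch of the reactive pattern. This is illustrative plumbing, not the Hermes hook API: agents register a condition, and only fire when an emitted event matches it.

```python
# Event-driven sketch: agents subscribe to conditions and fire only when
# an event matches, instead of running on a fixed schedule.
subscribers = []

def on(event_type, condition):
    def register(handler):
        subscribers.append((event_type, condition, handler))
        return handler
    return register

def emit(event_type, payload):
    fired = []
    for etype, condition, handler in subscribers:
        if etype == event_type and condition(payload):
            fired.append(handler(payload))
    return fired

@on("page_change", lambda p: p["page"] == "competitor/pricing")
def research_agent(payload):
    return f"researching change on {payload['page']}"

@on("research_done", lambda p: p["quality"] >= 0.9)  # quality threshold
def reporting_agent(payload):
    return "drafting report"

print(emit("page_change", {"page": "competitor/pricing"}))
# → ['researching change on competitor/pricing']
```

The key property: a `research_done` event below the quality threshold fires nothing, so downstream agents run only when the condition is genuinely met.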
When multi-agent is not the right answer
Multi-agent adds orchestration overhead, debugging complexity, and cost. It is the right choice when a single agent's context window genuinely cannot hold the full task, when different subtasks benefit from different model configurations, or when parallel execution would reduce total runtime for a time-sensitive workflow.
It is the wrong choice when the task can be accomplished in a single coherent agent loop (most tasks can), when you are still debugging the single-agent version (add complexity only after the simple version works), or when cost is constrained (orchestration overhead can mean 15x more tokens versus a single chat). According to Gartner's 2026 enterprise AI report, most teams that jumped straight to multi-agent architectures in 2025 later rebuilt simpler single-agent versions after finding the overhead exceeded the benefit for their actual workloads.
What multi-agent systems cost in practice
Costs scale with the orchestration pattern. A simple sequential pipeline costs roughly 3-5x a single agent (N agents each doing a fraction of the total work, plus orchestration inference). Parallel research with 5 concurrent agents costs roughly 5-8x. A hierarchical supervisor system with 3 worker agents costs 10-15x — the supervisor runs inference on all worker outputs, compounding quickly.
A real-world example: a developer team cited in a Promethium multi-agent platform comparison saw costs spike from $1,200/month to $4,800/month when expanding from a single agent to multi-agent across search, chatbot, and internal tooling — with no centralized visibility into where the tokens were going. The fix was model routing (cheaper models for simpler subtasks) plus explicit per-agent token budgets; both are configurable per subagent profile in Hermes Agent.
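Both fixes are simple to sketch. The model names, prices, and thresholds below are illustrative assumptions, not a real pricing table or the Hermes configuration format:

```python
# Sketch of the two cost controls: route cheap subtasks to a cheap model,
# and enforce a hard per-agent token budget. All names/prices are
# illustrative assumptions.
PRICE_PER_1K = {"small-model": 0.001, "large-model": 0.010}

def route_model(subtask_complexity):
    # Cheaper model for simpler subtasks (complexity score in [0, 1]).
    return "small-model" if subtask_complexity < 0.5 else "large-model"

class TokenBudget:
    def __init__(self, agent, limit):
        self.agent, self.limit, self.used = agent, limit, 0

    def charge(self, tokens):
        # Hard stop: refuse the call rather than silently overspend.
        if self.used + tokens > self.limit:
            raise RuntimeError(f"{self.agent} exceeded its token budget")
        self.used += tokens

budget = TokenBudget("research-subagent", limit=50_000)
model = route_model(0.3)
budget.charge(20_000)
print(model, budget.used)  # → small-model 20000
```

The design choice worth noting: the budget raises rather than truncates, so an over-budget subagent fails loudly and shows up in logs instead of quietly degrading output quality.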