What a multi-agent system actually is
A single agent runs one action loop: one LLM call, one tool call, one observation, repeat until done. A multi-agent system uses multiple agents running concurrently, each with a defined role, coordinated by an orchestrator that routes work and aggregates results.
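The single-agent loop described above can be sketched framework-free; `call_llm` and `run_tool` here are hypothetical stand-ins for a real model and tool layer, not any particular framework's API:

```python
# Toy single-agent action loop: one LLM call, one tool call, one
# observation, repeat until the model says it is done.

def call_llm(history):
    # Stand-in model: request a tool until an observation exists, then finish.
    if any(kind == "observation" for kind, _ in history):
        return {"done": True, "answer": "final answer"}
    return {"done": False, "tool": "search", "args": "query"}

def run_tool(name, args):
    return f"result of {name}({args})"

def agent_loop(task):
    history = [("task", task)]
    while True:
        action = call_llm(history)                       # one LLM call
        if action["done"]:
            return action["answer"]
        obs = run_tool(action["tool"], action["args"])   # one tool call
        history.append(("observation", obs))             # one observation, repeat
```

A multi-agent system replaces this single loop with several such loops plus an orchestrator that routes outputs between them.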
The analogy from the f3fundit.com multi-agent orchestration guide (March 2026) is useful: 'Your AI agent handles customer support. Another scrapes competitor pricing. A third writes follow-up emails. They all work, but they don't talk to each other. You're running three isolated agents when you need an orchestrated system. The difference? Orchestration lets Agent A pass context to Agent B, which triggers Agent C only when conditions match.'
This isn't about replacing a single agent with many agents for the same task. It is about letting agents specialize in a way that makes a complex workflow tractable — a researcher who only does research, a coder who only writes code, an ops agent who monitors both — and then wiring them together with defined handoff protocols.
The main orchestration frameworks in 2026
LangGraph is the most widely deployed multi-agent framework in production as of 2026. It models agent workflows as directed graphs — nodes are agent steps (LLM calls, tool calls, human checkpoints) and edges define the flow between them. This graph model gives precise control over execution order, conditional branching (run this node only if that condition is met), and parallel execution. The framework integrates with PostgresSaver for checkpointing and supports time-travel debugging. It is verbose to set up but highly precise for production workflows.
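The node/edge model can be illustrated without the framework itself. This is a minimal, framework-free sketch of a directed graph with a conditional branch, where nodes transform shared state and return the name of the next node; the node names and state keys are illustrative, not LangGraph's actual API:

```python
# Nodes are steps that mutate shared state; each returns the next edge.
def extract(state):
    state["data"] = [1, 2, 3]
    return "check"

def check(state):
    # Conditional branching: run "analyze" only if extraction found data.
    return "analyze" if state["data"] else "end"

def analyze(state):
    state["total"] = sum(state["data"])
    return "end"

NODES = {"extract": extract, "check": check, "analyze": analyze}

def run_graph(entry):
    state, node = {}, entry
    while node != "end":
        node = NODES[node](state)
    return state

# run_graph("extract") -> {"data": [1, 2, 3], "total": 6}
```

In LangGraph the same idea is expressed declaratively (nodes, edges, conditional edges compiled into a runnable graph), with checkpointing layered on top of the shared state.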
CrewAI takes a higher-level abstraction — you define agents with roles (researcher, writer, analyst) and tasks, and CrewAI handles the orchestration. It is faster to prototype than LangGraph but gives less control over exact execution flow. The March 2026 AutoGen vs CrewAI vs AgentOps comparison at f3fundit identifies CrewAI as the fastest path to a working multi-agent prototype for teams new to the pattern — 'no manual glue code, no brittle cron jobs, just workflow logic that adapts.'
AutoGen (Microsoft) focuses on conversational multi-agent patterns — agents that communicate through conversation rather than explicit task handoff. This maps well to workflows that genuinely benefit from agents debating or iterating on outputs: a code generator paired with a code reviewer, a proposal writer paired with a critical analysis agent. AutoGen's integration with the Microsoft Agent Framework makes it the default in enterprise Azure deployments.
Hermes Agent supports multi-agent coordination through subagent delegation: the primary agent can spawn up to 3 concurrent subagents, pass them structured tasks, and aggregate their outputs. This is not a separate framework layer — it is built into the core agent's tool set. For Hermes OS users, this means a scheduled research task can automatically spawn a browser agent, a summarization agent, and a synthesis agent running in parallel, aggregated by the primary agent. No additional framework configuration required.
Orchestration patterns and when to use each
Sequential pipeline: Agent A produces output, passes it to Agent B, which passes to Agent C. The simplest pattern. Use it for workflows where each stage depends on the previous output: data extraction → cleaning → analysis → report. Error isolation is clean — if stage B fails, stages A and C are unaffected. LangGraph implements this as a simple linear graph.
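A sequential pipeline reduces to function composition: each stage is an agent whose output becomes the next stage's input. The stage functions here are hypothetical placeholders for real agents:

```python
# Minimal sequential pipeline: extraction -> cleaning -> analysis.
def extract(raw):
    return [item.strip() for item in raw.split(",")]

def clean(rows):
    return [r for r in rows if r]          # drop empty rows

def analyze(rows):
    return {"count": len(rows)}

def pipeline(raw, stages=(extract, clean, analyze)):
    out = raw
    for stage in stages:
        out = stage(out)   # stage B only ever sees stage A's output
    return out

# pipeline("a, b, , c") -> {"count": 3}
```

The error-isolation property follows directly from the structure: a failure in one stage raises before later stages ever run, and earlier stages have already completed.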
Parallel execution: Multiple agents run concurrently on independent subtasks. An orchestrator assigns N tasks, waits for all N to complete, and synthesizes results. Use it for tasks that can be parallelized without dependency: N research agents each covering one competitor, results aggregated by the orchestrator. Hermes's 3-concurrent-subagent limit covers most practical parallel research use cases. LangGraph implements this via conditional parallel branches.
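The fan-out/fan-in shape of this pattern can be sketched with stdlib concurrency; `research` is a hypothetical worker agent, and the worker cap mirrors a concurrency limit like Hermes's three-subagent maximum:

```python
from concurrent.futures import ThreadPoolExecutor

def research(competitor):
    # Stand-in for a worker agent covering one competitor.
    return {"name": competitor, "summary": f"notes on {competitor}"}

def orchestrate(competitors, max_workers=3):
    # Fan out: assign N independent subtasks, wait for all to complete.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(research, competitors))
    # Fan in: the orchestrator synthesizes the worker outputs.
    return {r["name"]: r["summary"] for r in results}
```

In a real deployment each worker call is an LLM-backed agent run, so the thread pool would typically be replaced by async API calls, but the assign/wait/synthesize structure is the same.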
Hierarchical (supervisor pattern): A supervisor agent decomposes a complex task, assigns subtasks to worker agents, reviews their outputs, and either accepts results or sends workers back for revision. This is the most powerful pattern for complex reasoning tasks but also the most expensive — the supervisor runs inference on all worker outputs plus the original task, compounding token costs significantly. Promethium's March 2026 multi-agent platform comparison notes: 'unlike single-agent deployments optimized for response latency and context efficiency, multi-agent systems demand orchestration overhead management, unified context sharing, inter-agent communication protocols, and distributed governance.'
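A stripped-down supervisor loop looks like this; `worker` and `review` are hypothetical, and in a real system both the review step and each revision round are LLM calls, which is exactly where the compounding token cost comes from:

```python
# Supervisor pattern: decompose, delegate, review, resend failing work.
def worker(subtask, attempt):
    return f"{subtask} (v{attempt})"

def review(draft):
    # Toy acceptance rule: accept only second drafts, forcing one revision.
    return draft.endswith("(v2)")

def supervise(task, max_rounds=3):
    subtasks = [f"{task}: part {i}" for i in (1, 2)]   # decomposition
    accepted = []
    for sub in subtasks:
        for attempt in range(1, max_rounds + 1):
            draft = worker(sub, attempt)
            if review(draft):          # supervisor reviews each output
                accepted.append(draft)
                break                  # otherwise send back for revision
    return accepted
```

Note that the supervisor touches every draft of every subtask, so its inference volume grows with both the number of workers and the number of revision rounds.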
Event-driven (reactive): Agents trigger in response to conditions rather than running on fixed schedules or direct invocation. An alert agent triggers a research agent when a competitor page changes; a research agent triggers a reporting agent when it crosses a quality threshold. Hermes's event hooks (gateway hooks and plugin hooks) support this pattern — gateway hooks fire on every incoming/outgoing message, plugin hooks intercept tool calls for conditional routing.
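The reactive pattern is a publish/subscribe loop: agents register a predicate, and only those whose predicate matches an event are triggered. The event shapes and agent bodies below are illustrative, not Hermes's actual hook API:

```python
# Event-driven sketch: agents fire only when a condition matches.
subscribers = []

def on(predicate, agent):
    subscribers.append((predicate, agent))

def emit(event):
    # Trigger every agent whose condition matches this event.
    return [agent(event) for predicate, agent in subscribers if predicate(event)]

# A research agent subscribed to competitor page changes only:
on(lambda e: e["type"] == "page_changed",
   lambda e: f"researching {e['url']}")

# emit({"type": "page_changed", "url": "example.com/pricing"})
# -> ["researching example.com/pricing"]
# emit({"type": "heartbeat"}) -> []  (no agent triggered)
```

Gateway hooks correspond to predicates over messages; plugin hooks correspond to predicates over tool calls.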
When multi-agent is not the right answer
Multi-agent architectures add orchestration overhead, debugging complexity, and cost. They are the right choice when a single agent's context window genuinely cannot hold the full task, when different subtasks benefit from different model configurations, or when parallel execution would reduce total runtime for a time-sensitive workflow.
They are the wrong choice when: the task can be accomplished in a single coherent agent loop (most tasks can), you are still debugging the single-agent version (add complexity only after the simple version works), or cost is constrained (orchestration overhead can mean up to 15x more tokens for a hierarchical multi-agent system versus a single agent).
The practical guidance from Gartner's 2026 enterprise AI report: start with a single agent. Add specialization only when you have specific evidence that the single-agent version hits a capability or context limit that a second specialized agent would solve. Most teams that jumped straight to multi-agent architectures in 2025 rebuilt simpler single-agent versions after finding the overhead and debugging complexity exceeded the benefit for their actual workloads.
What multi-agent systems cost in practice
Multi-agent costs scale with the orchestration pattern. A simple sequential pipeline costs roughly 3-5x a single agent (N agents, each doing a fraction of the total work, plus orchestration inference). A parallel research system with 5 concurrent agents costs roughly 5-8x (5 parallel workers but shared synthesis). A hierarchical supervisor system with 3 worker agents costs 10-15x a single agent — the supervisor runs inference on all worker outputs, compounding the token bill substantially.
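These multipliers translate into a back-of-envelope budget estimate; the baseline figure below is illustrative, not a quoted price:

```python
# Multipliers from the pattern-by-pattern estimates above.
MULTIPLIERS = {
    "sequential": (3, 5),
    "parallel": (5, 8),
    "hierarchical": (10, 15),
}

def monthly_range(single_agent_cost, pattern):
    # Returns the (low, high) estimated monthly cost for the pattern.
    lo, hi = MULTIPLIERS[pattern]
    return single_agent_cost * lo, single_agent_cost * hi

# A $300/month single agent rebuilt as a hierarchical system:
# monthly_range(300, "hierarchical") -> (3000, 4500)
```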
Real-world example from the Promethium multi-agent platform comparison: a mid-sized development team saw costs spike from $1,200/month to $4,800/month when they expanded from a single agent to a multi-agent system spanning search, chatbot, and internal tooling — without a unified monitoring layer to see where the costs were coming from. The fix: centralized cost visibility, model routing (cheaper models for simpler subtasks), and explicit per-agent token budgets.
On Hermes OS, multi-agent workloads via subagent delegation are covered by the same API key and instance. The hosting cost does not change — only the API token consumption increases based on how many subagents are running and for how long. The Command plan (16 vCPU, 32 GB RAM) is designed for teams running 10+ concurrent agent profiles with coordination between them.