How the two models compare
Most AI SaaS products work in one of two ways. In the first model, you pay the product company a flat subscription and it serves AI from its own accounts: the company buys tokens wholesale, marks them up, and folds AI usage into the subscription fee. This is convenient, but you pay the markup and have limited visibility into how much you are actually using.
In the second model — BYO key — you create a developer account directly with the provider (Anthropic, OpenAI, Google, Mistral, or via OpenRouter as an aggregator), generate an API key, and paste it into the product. Your usage is billed directly by the provider. The product company charges only for the platform or infrastructure, not for AI tokens.
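Because each provider issues keys in a different format, a product accepting BYO keys can usually infer which provider a pasted key belongs to. A minimal sketch, assuming the current key-prefix conventions (these are informal, may change, and should be treated as a hint rather than validation):

```python
# Sketch: guess the provider for a BYO API key from its prefix.
# Prefix conventions here are assumptions based on current key formats.

def detect_provider(api_key: str) -> str:
    """Return a best-guess provider name for a pasted API key."""
    key = api_key.strip()
    if key.startswith("sk-ant-"):
        return "anthropic"
    if key.startswith("sk-or-"):
        return "openrouter"
    if key.startswith("sk-"):       # checked after the more specific sk-* prefixes
        return "openai"
    if key.startswith("AIza"):
        return "google"
    return "unknown"
```

Note the ordering: the generic `sk-` check must come after `sk-ant-` and `sk-or-`, since those share the same leading characters.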
Hermes OS uses the second model. You pay $19-49/month for the managed hosting and dashboard. Every AI request your agent makes goes directly through your API key to the provider. We never see the content of those requests and have nothing to do with your token billing.
What this costs in practice
API pricing as of April 2026: Claude Haiku 4.5 costs $1 per million input tokens and $5 per million output tokens — Anthropic's fastest production model for high-frequency agent tasks. The Batch API cuts this by 50%, to $0.50/$2.50 per MTok, for workloads that can tolerate a few hours of asynchronous processing. For a moderately active agent running 10-30 scheduled tasks per day, total API spend typically lands at $3-12/month on Haiku.
Claude Sonnet 4.6 ($3/$15 per MTok) covers most research, coding, and analysis tasks — it is the default choice for anything requiring sustained reasoning. With the 1 million token context window now available at standard pricing (no surcharge as of March 2026), long-context agent tasks cost the same per token as short ones. Agents doing heavy research or code generation at volume typically run $20-60/month at Sonnet-level. Opus 4.6 ($5/$25 per MTok, 1M context, 128K max output) is the ceiling tier — worth it for complex multi-step synthesis, not for monitoring or summarization tasks.
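The monthly figures above fall out of simple per-token arithmetic. A minimal estimator using the prices quoted in this section — the token volumes in the example are illustrative assumptions, not measurements:

```python
# Sketch: estimate monthly API spend for an agent workload.
# Prices are $/MTok as quoted above (April 2026); workload numbers
# in the example are assumptions for illustration.

PRICES = {  # model: (input $/MTok, output $/MTok)
    "haiku-4.5":  (1.00, 5.00),
    "sonnet-4.6": (3.00, 15.00),
    "opus-4.6":   (5.00, 25.00),
}

def monthly_cost(model, input_tok_per_day, output_tok_per_day,
                 days=30, batch=False):
    """Dollars per month for a steady daily token volume."""
    in_price, out_price = PRICES[model]
    if batch:  # Batch API: 50% off both input and output
        in_price, out_price = in_price / 2, out_price / 2
    daily = (input_tok_per_day * in_price
             + output_tok_per_day * out_price) / 1_000_000
    return daily * days

# Example: 20 tasks/day at ~8K input + 1K output tokens each on Haiku
# is 160K in + 20K out per day, i.e. about $7.80/month — squarely in
# the $3-12 range above. The same volume on Sonnet runs ~$23.40/month.
```

Running heavier workloads through the same function is a quick way to sanity-check whether a task belongs on Haiku, Sonnet, or the Batch API before committing to a schedule.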
If you are on OpenAI, GPT-5 mini at $0.25/$2 per MTok is the cheapest capable option in the market. The combined cost of Hermes OS hosting plus your API usage is almost always lower than what AI SaaS products charge for equivalent functionality, because those products layer their own margin on top of provider pricing and cannot offer the same model flexibility.
Which provider to use
Anthropic's Claude family (Haiku 4.5 for speed, Sonnet 4.6 for reasoning, Opus 4.6 for depth) is the best default for most agent tasks. Claude's instruction-following is highly reliable for multi-step agent workflows — the models are trained specifically to follow structured procedures without going off-script. Hermes Agent's tool-calling format was designed alongside the Claude model family, so the two work particularly well together.
OpenAI's API offers GPT-5 and GPT-5 mini. At $0.25/$2 per MTok, GPT-5 mini is useful if you are running very high-frequency lightweight tasks where cost-per-request matters. Note: ChatGPT Plus ($20/mo) and ChatGPT Pro ($200/mo) subscriptions do not include API access. The API is a completely separate billing relationship with OpenAI at pay-per-token rates.
OpenRouter gives you access to 300+ models from 60+ providers under a single API key and credit balance. No monthly minimums — you top up credits and pay only for what you use. Useful for evaluation (try 5 models against the same task), for accessing open-weight models hosted by third parties, or for building agents that route different task types to different models. The auto-routing option selects from available models automatically, though manual model selection gives more predictable performance.
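Routing different task types to different models can be as simple as a lookup table keyed on task type. A sketch assuming OpenRouter's "provider/model" slug convention — the specific model mapping here is an illustrative assumption, not a recommendation:

```python
# Sketch: manual task-type routing through a single OpenRouter key.
# Model slugs follow OpenRouter's "provider/model" naming; the mapping
# below is an assumed example, chosen to mirror the tiers discussed above.

ROUTES = {
    "monitor":   "anthropic/claude-haiku-4.5",   # high-frequency, cheap
    "summarize": "anthropic/claude-haiku-4.5",
    "research":  "anthropic/claude-sonnet-4.6",  # sustained reasoning
    "code":      "anthropic/claude-sonnet-4.6",
    "synthesis": "anthropic/claude-opus-4.6",    # complex multi-step work
}

DEFAULT_MODEL = "anthropic/claude-sonnet-4.6"

def pick_model(task_type: str) -> str:
    """Manual routing: a predictable model choice per task type."""
    return ROUTES.get(task_type, DEFAULT_MODEL)
```

This is the "manual model selection" trade-off in practice: unlike auto-routing, the mapping is explicit, auditable, and stable across runs.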
Privacy implications
With a managed service that controls the AI keys, your conversations and task data route through that service's infrastructure. With BYO key, your requests go directly from the agent to the provider — the hosting service is not in the path of your AI traffic.
This does not mean the data is private from the provider. Anthropic, OpenAI, and others have their own data handling policies, and API data may be used for model improvement, depending on your account settings. Check the provider's terms for your account type.
What it does mean: Hermes OS cannot read your agent's conversations, cannot log your task data, and has no access to your content. We see server metrics — CPU, memory, uptime — not content.