Your agent can actually use the internet.

Running browser automation locally means your laptop has to stay on, your VPN has to stay connected, and you have to babysit the process. It is a fragile, exhausting setup.

Hermes OS deploys your agent to a persistent cloud server with a full browser environment pre-configured. Your agent can browse, click, fill forms, extract data, and interact with any website — 24 hours a day, from a stable cloud IP.

What browser automation actually enables

The most common agent tasks people run with browser access: competitive monitoring (checking competitor pricing or feature pages on a schedule), research briefs (collecting content from a defined list of sources and summarizing it), lead enrichment (looking up a company or person from publicly available pages), and form submission automation for repetitive workflows.

Less obvious but equally useful: the agent can maintain a logged-in session on a website. If you have a tool that lacks an API but has a web interface, the agent can interact with it directly by exporting a report, updating a record, or checking a status. This covers a large number of tools that developers would otherwise have to scrape or work around.

The agent also uses screenshots as a verification step. Before taking an action on a page, it can take a screenshot, analyze it, and confirm it is on the right page and in the right state. This catches the common failure mode where a site's layout has changed and a previously working CSS selector no longer matches.
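To make the verification step concrete, here is a minimal sketch of the gate that sits between "observe the page" and "take the action". It is an illustration under stated assumptions, not Hermes's actual implementation: `PageObservation`, `ExpectedState`, and `verify_before_action` are hypothetical names, and the observation is assumed to have already been produced from the screenshot analysis.

```python
from dataclasses import dataclass, field


@dataclass
class PageObservation:
    """What the agent observed on the page (hypothetical structure)."""
    url: str
    title: str
    present_selectors: set = field(default_factory=set)


@dataclass
class ExpectedState:
    """What must hold before the action is allowed to run (hypothetical)."""
    url_prefix: str
    title_contains: str
    required_selectors: tuple = ()


def verify_before_action(observed: PageObservation, expected: ExpectedState) -> list:
    """Return a list of mismatches; an empty list means it is safe to proceed."""
    problems = []
    if not observed.url.startswith(expected.url_prefix):
        problems.append(f"unexpected URL: {observed.url}")
    if expected.title_contains.lower() not in observed.title.lower():
        problems.append(f"unexpected title: {observed.title!r}")
    for selector in expected.required_selectors:
        if selector not in observed.present_selectors:
            problems.append(f"missing element: {selector}")
    return problems


expected = ExpectedState(
    url_prefix="https://example.com/pricing",
    title_contains="Pricing",
    required_selectors=("#plan-table", "button.export"),
)

# Simulated layout change: the export button no longer matches its old selector.
observed = PageObservation(
    url="https://example.com/pricing",
    title="Pricing | Example",
    present_selectors={"#plan-table"},
)

problems = verify_before_action(observed, expected)
if problems:
    print("Stopping before the action:", problems)
```

Returning the list of mismatches, rather than a bare true/false, lets the agent report why it stopped instead of failing silently.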

The infrastructure the browser runs in

Hermes v0.5.0 supports four browser backends, selectable per task or globally: Browserbase (managed cloud browsers with residential IPs, anti-detection fingerprinting, and CAPTCHA solving), Browser Use cloud (Hermes's own cloud browser service, optimized for agent workflows), local Chrome via Chrome DevTools Protocol (CDP) for users who want to connect a browser instance they control, and local Chromium headless for fully self-contained execution.

On Hermes OS cloud hosting, Browserbase is the default backend. It handles bot detection better than a raw cloud IP and includes SSRF (server-side request forgery) safeguards that block agent-initiated browser requests to internal cloud metadata endpoints. This protection matters most in multi-tenant environments where agents run alongside other services.
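For readers unfamiliar with SSRF safeguards, the sketch below shows the general idea using only Python's standard library: refuse any browser request whose target is an internal address or a known metadata hostname. It is an assumption-laden illustration, not the actual safeguard in Hermes or Browserbase. `BLOCKED_HOSTNAMES`, `is_internal_address`, and `is_request_allowed` are hypothetical names, and the hostname list is deliberately incomplete.

```python
import ipaddress
from urllib.parse import urlsplit

# Illustrative, not exhaustive: hostnames that should never be reachable from a task.
BLOCKED_HOSTNAMES = {"metadata.google.internal", "localhost"}


def is_internal_address(host: str) -> bool:
    """True if host is a literal IP in a private, loopback, link-local, or reserved range."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal; hostnames are handled by the caller
    return ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved


def is_request_allowed(url: str, resolved_ips=()) -> bool:
    """Decide whether a browser request may proceed.

    resolved_ips holds the addresses the hostname resolved to. The caller
    supplies them so that a public-looking name pointing at an internal
    address is also rejected.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    if not host or host in BLOCKED_HOSTNAMES:
        return False
    if is_internal_address(host):
        return False
    return not any(is_internal_address(ip) for ip in resolved_ips)


# The link-local address used by several cloud metadata services is rejected.
print(is_request_allowed("http://169.254.169.254/latest/meta-data/"))
print(is_request_allowed("https://example.com/pricing"))
```

A production safeguard would also need to re-check the address after redirects and at connection time, since DNS answers can change between the check and the request.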

For tasks that maintain state across multiple pages — a multi-step form, a research workflow that follows links — the browser session persists for the duration of the task and closes cleanly when complete. Screenshots taken during the task are logged to task history and visible in the dashboard.

IP address and anti-bot considerations

The agent runs from a stable cloud IP at a Tier-1 data center. For most standard websites — news sites, corporate pages, SaaS pricing pages, public databases — this works without any special configuration. The browser sends standard Chromium headers and behaves like a real user session.

Some sites use aggressive anti-bot systems (Cloudflare's higher-level bot protection, PerimeterX, Akamai Bot Manager) that can detect cloud IP ranges and headless browsers regardless of header spoofing. For these sites, results vary. The agent will tell you when it hits a challenge page rather than silently scraping wrong data.

If you are targeting sites with known bot detection, configure residential proxy credentials in the agent's settings. The proxy configuration accepts any SOCKS5 or HTTP proxy endpoint and routes browser traffic through it.
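As a rough illustration of what a proxy endpoint looks like, the sketch below validates a SOCKS5 or HTTP proxy URL and splits out the credentials. The function name `parse_proxy_endpoint`, the returned keys, and the example hostname are assumptions made for this example. They are not the field names used in the agent's settings.

```python
from urllib.parse import urlsplit

# The two endpoint types named above.
SUPPORTED_SCHEMES = {"http", "socks5"}


def parse_proxy_endpoint(endpoint: str) -> dict:
    """Validate a proxy URL and separate the server address from the credentials."""
    parts = urlsplit(endpoint)
    if parts.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported proxy scheme: {parts.scheme!r}")
    if not parts.hostname or parts.port is None:
        raise ValueError("proxy endpoint needs a host and a port")
    return {
        "server": f"{parts.scheme}://{parts.hostname}:{parts.port}",
        "username": parts.username,  # None when no credentials are embedded
        "password": parts.password,
    }


# Placeholder credentials and host, for illustration only.
config = parse_proxy_endpoint("socks5://user:secret@proxy.example.net:1080")
print(config["server"])
```

Keeping the credentials separate from the server address makes it easier to store them as secrets rather than inside a single URL string.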

Combining browser automation with scheduled tasks

Browser automation and scheduling are designed to work together. A daily competitive monitoring brief, for example, is a scheduled task that triggers browser navigation to each target URL, extracts the relevant data, compares it to a stored baseline, and sends you a summary over Telegram or email if anything changed.
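The "compare it to a stored baseline" step can be sketched in a few lines. This is a simplified illustration, assuming the extracted data has already been reduced to a flat dictionary of labeled values. `diff_against_baseline`, `format_summary`, and the sample prices are hypothetical.

```python
def diff_against_baseline(baseline: dict, current: dict) -> dict:
    """Map each changed, added, or removed key to an (old, new) pair."""
    changes = {}
    for key in sorted(set(baseline) | set(current)):
        old, new = baseline.get(key), current.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


def format_summary(url: str, changes: dict) -> str:
    """Build the text that a notification step would send."""
    lines = [f"Changes detected on {url}:"]
    for key, (old, new) in changes.items():
        lines.append(f"- {key}: {old!r} -> {new!r}")
    return "\n".join(lines)


# Made-up values standing in for yesterday's and today's extraction.
baseline = {"starter_price": "$19", "pro_price": "$49"}
current = {"starter_price": "$19", "pro_price": "$59", "enterprise_price": "Contact us"}

changes = diff_against_baseline(baseline, current)
if changes:
    # In a real task this text would go out over Telegram or email.
    print(format_summary("https://example.com/pricing", changes))
```

An empty result means nothing changed, so the task can finish without sending a message.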

You can also build conditional schedules — run a deeper browser research pass only when a trigger condition is met, rather than on a fixed interval. For lightweight monitoring that just checks for a changed value, the agent can use a simpler HTTP request before spinning up the full browser, keeping API token usage down for high-frequency checks.
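One way to picture the lightweight check is a content fingerprint: fetch the page with a plain HTTP request, hash the body, and start the full browser pass only when the hash differs from the last run. The sketch below assumes that approach. `fingerprint`, `needs_browser_pass`, and `http_fetch` are hypothetical helpers, not part of Hermes.

```python
import hashlib
from urllib.request import urlopen


def http_fetch(url: str, timeout: float = 10.0) -> bytes:
    """Plain HTTP GET with no browser involved."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fingerprint(body: bytes) -> str:
    """Stable fingerprint of a response body."""
    return hashlib.sha256(body).hexdigest()


def needs_browser_pass(fetch, url: str, last_fingerprint):
    """Return (changed, new_fingerprint) for the given URL.

    fetch is any callable that takes a URL and returns bytes, for example
    http_fetch above. Passing it in keeps the check easy to test offline.
    """
    new_fingerprint = fingerprint(fetch(url))
    return new_fingerprint != last_fingerprint, new_fingerprint


# Offline demonstration with a stand-in fetcher instead of a real request.
fake_pages = {"https://example.com/status": b"status: operational"}
changed, stored = needs_browser_pass(fake_pages.__getitem__, "https://example.com/status", None)
print(changed)  # first run has no stored fingerprint, so it counts as changed
```

Hashing the raw body flags any byte-level difference, including timestamps or rotating tokens, so a real check would likely extract the one value of interest before hashing.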

What it does not do

Browser automation is not a JavaScript injection technique or a security tool. The agent navigates as a user would: it cannot get past authentication without credentials, read data behind user-specific sessions it has not been given access to, or circumvent server-side access controls.

It also does not work for sites that stream content exclusively through native apps with no web interface. If there is no publicly accessible URL for the data you want, browser automation cannot reach it.

What's included

Four browser backends: Browserbase cloud, Browser Use cloud, Chrome CDP, local Chromium
Runs 24/7 from cloud infrastructure — not your laptop
Autonomous web research, form filling, and data extraction
Screenshot verification before and after actions
Session persistence for multi-step flows
SSRF safeguards blocking internal network access from browser tasks
Residential proxy support for sites with strict bot detection
Vision paste: send any screenshot from your clipboard directly to the agent
Combine with scheduled tasks for timed automation

Common questions

What browser does the Hermes agent use for automation?

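It depends on the backend. On Hermes OS cloud hosting, the default is Browserbase, a managed cloud browser. Browser Use cloud, local Chrome over CDP, and local headless Chromium are also available, selectable per task or globally. The local headless Chromium option comes pre-configured and runs inside the isolated container on your Hermes OS server.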

Can my Hermes agent log in to websites and maintain sessions?

Yes. The agent keeps the browser session open for the duration of a task and handles authentication flows, cookies, and session management.

Will browser automation work with sites that block bots?

For most standard websites, yes. For sites with aggressive anti-bot protection (Cloudflare enterprise tiers, PerimeterX), results vary. Residential proxy credentials can be configured for these cases.

Can I see screenshots from what the agent did in the browser?

Yes. The agent logs browser actions including screenshots to the task history in the dashboard. You can review exactly what the agent saw and did on each run.

Does browser automation increase API costs significantly?

Screenshots are the main cost multiplier — each screenshot sent to the model for analysis uses tokens. For long research tasks that process many pages, this adds up. Text-only extraction runs are much cheaper. You can configure tasks to limit screenshot usage to verification steps only.

Deploy in 5 minutes.

7-day money-back guarantee. BYO AI key. From $9.99/mo.

