Browser

Fabric Agents ships with a built-in Chromium browser the agent can drive — navigate pages, interact with elements, read the DOM, take screenshots, inspect network traffic.

Fabric Agents has a real browser inside it. Not a headless one, not a subprocess — an actual Chromium window the agent can see and drive. Ask it to "log into this admin panel and pull the users table", "fill out this form", "take a screenshot of the live dashboard", and the browser opens, works through the steps, and reports back.

This section has three pages:

  • Overview — you're reading it. What the browser is, when it opens, how the agent drives it, ownership and lifecycle.
  • Examples — recipes for common workflows: logins, forms, scraping, screenshots, waiting for dynamic content.
  • API discovery — inspecting network requests to find the JSON APIs a page uses, so the agent can skip the UI and talk to the backend directly.

What it is

  • Chromium, embedded via Electron. Real rendering, real cookies, real JavaScript.
  • Persistent session — cookies and local storage survive between turns and even between app restarts. If you log in once, you stay logged in.
  • Chrome DevTools Protocol under the hood. That's how the agent performs clicks, fills fields, and reads the accessibility tree. CDP is also how API discovery works.
  • One browser window per session, created on demand when the agent calls open.
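
To make the CDP point concrete, here is a minimal sketch of the JSON messages a CDP client sends over the DevTools connection. The method names (Page.navigate, Accessibility.getFullAXTree, Input.dispatchMouseEvent) are part of the public Chrome DevTools Protocol. Which of them Fabric Agents issues, and in what order, is an assumption, and the helper names cdp_message and click_messages are hypothetical.

```python
import itertools
import json
from typing import List, Optional

# CDP commands carry a client-chosen id so responses can be matched to requests.
_ids = itertools.count(1)


def cdp_message(method: str, params: Optional[dict] = None) -> str:
    """Serialize one CDP command as JSON text (hypothetical helper)."""
    return json.dumps({"id": next(_ids), "method": method, "params": params or {}})


def click_messages(x: float, y: float) -> List[str]:
    """A left click is a mouse press followed by a release at the same point."""
    base = {"x": x, "y": y, "button": "left", "clickCount": 1}
    return [
        cdp_message("Input.dispatchMouseEvent", {"type": "mousePressed", **base}),
        cdp_message("Input.dispatchMouseEvent", {"type": "mouseReleased", **base}),
    ]


# Assumed sequence: load a page, read the accessibility tree, click a point.
messages = [
    cdp_message("Page.navigate", {"url": "https://example.com"}),
    cdp_message("Accessibility.getFullAXTree"),
    *click_messages(120, 48),
]

for message in messages:
    print(message)
```

The sketch only builds the messages. Sending them requires a WebSocket connection to the browser's DevTools endpoint, which is left out here.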

When it opens

The browser doesn't auto-launch when you start a session. It opens when the agent calls the open command — typically because you asked it to do something web-related.

By default, the window opens in the background so the agent can work without taking over your screen. To bring the window to the front, the agent passes the --foreground flag; it decides when to do so, usually when it needs to show you progress.

While the agent is active, a small overlay appears at the top of the browser window with an indicator and a "pause" affordance — you can see what the agent is doing at any moment.

What the agent can do

The agent-facing browser tool exposes a small vocabulary of commands:

Category      Commands
Navigation    open, navigate, back, forward, reload
Inspection    snapshot (returns accessibility tree with refs like @e1), find, console, network, downloads
Interaction   click, fill, select, drag, key, paste, upload
Canvas-level  click-at x y, drag x1 y1 x2 y2 for pages without an accessible DOM (Sheets, Figma, charts)
Capture       screenshot, screenshot --annotated, screenshot-region --ref @e12
Execution     evaluate <js> for running arbitrary JavaScript in the page
Waiting       wait network-idle, wait selector, wait text, wait url
Lifecycle     release (hand control back to you), close (destroy the window), hide
Clipboard     get-clipboard, set-clipboard
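
As a rough illustration of how these commands combine, the sketch below writes a post-deploy check as a list of command lines and maps each one back to its category in the table above. The command names come from the table; the argument syntax, the URL, and the "Healthy" text are assumptions made for the example.

```python
# Command names are taken from the table; "drag" appears under both
# Interaction and Canvas-level, so category_of returns the first match.
VOCABULARY = {
    "Navigation": {"open", "navigate", "back", "forward", "reload"},
    "Inspection": {"snapshot", "find", "console", "network", "downloads"},
    "Interaction": {"click", "fill", "select", "drag", "key", "paste", "upload"},
    "Canvas-level": {"click-at", "drag"},
    "Capture": {"screenshot", "screenshot-region"},
    "Execution": {"evaluate"},
    "Waiting": {"wait"},
    "Lifecycle": {"release", "close", "hide"},
    "Clipboard": {"get-clipboard", "set-clipboard"},
}


def category_of(command_line: str) -> str:
    """Return the table category for the first word of a command line."""
    name = command_line.split()[0]
    for category, commands in VOCABULARY.items():
        if name in commands:
            return category
    raise ValueError(f"unknown browser command: {name}")


# Hypothetical plan: argument syntax is assumed, not documented here.
plan = [
    "open",
    "navigate https://example.com/status",
    "wait text Healthy",
    "screenshot",
    "release",
]

for line in plan:
    print(f"{category_of(line):<12} {line}")
```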

Every inspection command returns structured data the agent can reason over — snapshot returns an accessibility tree with semantic roles (button, link, textbox) and references (@e1, @e2, …) the agent uses in subsequent click, fill, select commands.
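
The sketch below shows one way the agent could go from a snapshot to follow-up commands. The Node shape, the find_ref helper, and the command argument syntax are all hypothetical; the source only states that snapshot returns roles and references such as @e1.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    """One accessibility-tree node (assumed shape, for illustration only)."""
    role: str
    name: str = ""
    ref: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def find_ref(root: Node, role: str, name: str) -> Optional[str]:
    """Return the ref of the first node matching role and name, if any."""
    for node in walk(root):
        if node.role == role and node.name == name:
            return node.ref
    return None


# A made-up snapshot of a login form.
snapshot = Node("document", children=[
    Node("textbox", "Email", "@e1"),
    Node("textbox", "Password", "@e2"),
    Node("button", "Sign in", "@e3"),
])

# Refs found in the snapshot become the targets of later commands.
commands = [
    f"fill {find_ref(snapshot, 'textbox', 'Email')} user@example.com",
    f"click {find_ref(snapshot, 'button', 'Sign in')}",
]
print(commands)
```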

The Examples page walks through common patterns.

Session ownership

Each browser window is owned by the session that opened it. While the session is processing, nobody else can send commands to it. When the session finishes its turn:

  • The overlay clears.
  • The window stays open unless the agent explicitly closes it.
  • The next message in the same session re-binds ownership automatically.

If the agent in session A opens a browser, and you start talking to session B, session B can't take over that window. Session B can open its own browser independently. There's no hard limit on how many browsers are open at once, but one per session is the usual pattern.
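
The ownership rules can be modelled in a few lines. This is a sketch of the behaviour described in this section, not the actual implementation; the class and method names are hypothetical.

```python
class BrowserWindow:
    """Toy model: a window accepts commands only from its owning session,
    and only while that session is processing a turn."""

    def __init__(self, owner_session: str) -> None:
        self.owner_session = owner_session
        self.bound = False   # True while the owner is processing a turn
        self.is_open = True  # stays True until an explicit close

    def begin_turn(self, session: str) -> None:
        """A new message in the owning session re-binds ownership."""
        if session != self.owner_session:
            raise PermissionError(f"{session} does not own this window")
        self.bound = True

    def end_turn(self) -> None:
        """The turn ends: the binding clears, but the window stays open."""
        self.bound = False

    def send(self, session: str, command: str) -> str:
        if not self.is_open:
            raise RuntimeError("window is closed")
        if session != self.owner_session or not self.bound:
            raise PermissionError(f"{session} cannot drive this window")
        if command == "close":
            self.is_open = False
        return f"{command}: ok"


window = BrowserWindow("session-a")
window.begin_turn("session-a")
print(window.send("session-a", "navigate"))
try:
    window.send("session-b", "click")
except PermissionError as error:
    print("rejected:", error)
window.end_turn()
print("still open after the turn:", window.is_open)
```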

Remote workspaces drive your local browser

When you connect to a remote workspace, the agent runs on the server — but the browser still opens on your desktop. The remote agent's open, click, fill, screenshot, and other commands are bridged over the same connection back to the Chromium window on your machine, so you watch the automation happen locally and screenshots come back to the chat exactly as they do for a local session.

Two commands behave differently over the bridge, for safety:

  • upload is not available — a remote agent can't reach files on your local disk to attach them.
  • evaluate (running arbitrary JavaScript in the page) is off by default. Enable it with the local allow remote evaluate setting if you trust the remote workspace and need it.
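
The two restrictions amount to a small gate in front of the bridge. The sketch below is an assumption about the shape of that check; the function name, the allow_remote_evaluate parameter, and the message strings are hypothetical stand-ins for the setting described above.

```python
from typing import Tuple

BLOCKED_OVER_BRIDGE = {"upload"}   # never available to a remote agent
GATED_OVER_BRIDGE = {"evaluate"}   # available only when the user opts in


def check_remote_command(command: str,
                         allow_remote_evaluate: bool = False) -> Tuple[bool, str]:
    """Decide whether a remote agent's command may cross the bridge."""
    if command in BLOCKED_OVER_BRIDGE:
        return False, f"{command} is not available for remote workspaces"
    if command in GATED_OVER_BRIDGE and not allow_remote_evaluate:
        return False, f"{command} is off by default for remote workspaces"
    return True, "ok"


for name in ("click", "upload", "evaluate"):
    print(name, check_remote_command(name))
print("evaluate (opted in)", check_remote_command("evaluate", allow_remote_evaluate=True))
```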

If a remote agent tries to open a browser while no desktop client capable of running it is connected, the agent gets a clear message telling it to open the workspace from the Fabric Agents desktop app and retry, rather than failing silently.

Browser tabs are per-workspace

Each browser window is stamped with the workspace it was opened in. The tab strip and status badge in a workspace only show that workspace's browser tabs — a window opened by a session in another workspace stays out of view. Tabs you open yourself from the top bar belong to the workspace you opened them in, and a remote agent's tabs appear in the workspace that's connected to that server. Windows still run in parallel under the hood; this is a visibility filter to keep each workspace's tab strip relevant, not a security sandbox.
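
In other words, the tab strip applies a filter over all running windows. A minimal sketch, assuming each tab carries a workspace identifier (the BrowserTab type and its field names are hypothetical):

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BrowserTab:
    title: str
    workspace_id: str  # the workspace the window was opened in


def visible_tabs(tabs: List[BrowserTab], active_workspace: str) -> List[BrowserTab]:
    """Visibility filter only: hidden tabs keep running, they are just not listed."""
    return [tab for tab in tabs if tab.workspace_id == active_workspace]


all_tabs = [
    BrowserTab("Admin panel", "workspace-1"),
    BrowserTab("Status dashboard", "workspace-2"),
    BrowserTab("Docs", "workspace-1"),
]
print([tab.title for tab in visible_tabs(all_tabs, "workspace-1")])
```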

Explore mode

Browser commands are allowed in Explore (safe) mode. They don't touch your filesystem, don't write credentials to disk, and can't escape the sandbox — so they pass the safe-mode check.

That's useful: you can run read-only research on the live web, with the browser at full fidelity, without worrying about the agent doing anything irreversible on your machine.

Writes through the browser (filling and submitting forms that change server state) are still possible in Explore mode, because Fabric Agents treats the browser as non-destructive. If that's a concern for a particular workflow, switch to Ask-to-Edit mode so that every form submit prompts you first.

Persistent state

  • Cookies — kept across sessions. Logging into a site once keeps you logged in.
  • Local storage / IndexedDB — persistent too.
  • Download history — visible via downloads.
  • Console output — the last 500 or so messages are kept in memory for console to surface.
  • Network log — the last 500 requests per browser instance. Request and response bodies are not captured; only the URL, method, status, and timing are kept. See API discovery.
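
A bounded log like the ones above is commonly built as a ring buffer. The sketch below uses Python's collections.deque with maxlen to keep only the most recent 500 entries; the RequestEntry fields mirror the list above, while the class names and the choice of data structure are assumptions rather than a description of the real implementation.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass(frozen=True)
class RequestEntry:
    """Metadata only: no request or response bodies."""
    url: str
    method: str
    status: int
    duration_ms: float


class NetworkLog:
    def __init__(self, capacity: int = 500) -> None:
        # A deque with maxlen silently drops the oldest entry when full.
        self._entries: Deque[RequestEntry] = deque(maxlen=capacity)

    def record(self, entry: RequestEntry) -> None:
        self._entries.append(entry)

    def entries(self) -> List[RequestEntry]:
        """Return the retained entries, oldest first."""
        return list(self._entries)


log = NetworkLog()
for i in range(600):
    log.record(RequestEntry(f"https://example.com/api/items/{i}", "GET", 200, 12.5))

kept = log.entries()
print(len(kept), kept[0].url, kept[-1].url)
```

After 600 requests, only requests 100 through 599 remain, which is why older traffic is no longer visible to the network command in this model.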

To fully reset state, close the browser and open it again; that creates a fresh instance with no cookies.

When you'd use the browser

  • Scrape a site with no API — an admin panel, an internal tool, a legacy app with HTML-only data.
  • Fill multi-step forms — compliance paperwork, travel bookings, account setup flows.
  • Take screenshots for documentation — a specific UI state, a rendered chart, a dashboard snapshot.
  • Verify a web app after a deploy — "navigate to /status, wait for the health block to be green, take a screenshot, attach to the PR".
  • Test a login flow — without stashing credentials in a CI secret manager.

Related pages

  • Examples — practical recipes for the common workflows above.
  • API discovery — let the agent find the page's underlying API and skip the browser entirely when it pays off.
  • Permissions — the mode model the browser sits inside.
  • Sources — for sites that do expose an API, a source is usually better than browser automation.
