Browser examples

Practical recipes for the Fabric Agents browser tool — logging in, filling forms, extracting data, taking screenshots, and waiting for dynamic content.

The agent drives the built-in browser with a small command vocabulary (navigate, snapshot, click, fill, evaluate, screenshot, wait, …). You don't need to learn the exact syntax: you describe what you want, and the agent picks the commands.

This page shows what the conversations actually look like so you know when to reach for the browser, and when something else is a better fit.

Logging in

Tell the agent what you want; it opens the browser, navigates, finds the email and password fields, fills them, and submits.

"Log into the admin panel at https://admin.example.com with my username alex@example.com. I'll paste the password when you ask for it."

The agent's flow typically looks like:

open
navigate https://admin.example.com/login
snapshot reads the accessibility tree and picks out refs for the email field (@e1), the password field (@e2), and the submit button (@e3).
fill @e1 alex@example.com
The agent asks you for the password in chat. You paste it, and the agent runs fill @e2 <password>.
click @e3
wait network-idle — waits for the login redirect to settle.
snapshot — confirms you're on the dashboard.
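The ref-picking in the snapshot step can be pictured as a lookup over the accessibility tree. The sketch below is only illustrative: it assumes a snapshot can be flattened into a list of `{ ref, role, name }` nodes, which is a hypothetical shape rather than the tool's documented output, and `findRef` is a made-up helper.

```javascript
// Hypothetical, flattened snapshot: the real snapshot format may differ.
const snapshot = [
  { ref: '@e1', role: 'textbox', name: 'Email' },
  { ref: '@e2', role: 'textbox', name: 'Password' },
  { ref: '@e3', role: 'button', name: 'Sign in' },
];

// Return the ref of the first node with the given role whose accessible
// name matches the pattern, or null when nothing matches.
function findRef(nodes, role, namePattern) {
  const node = nodes.find((n) => n.role === role && namePattern.test(n.name));
  return node ? node.ref : null;
}

const loginRefs = {
  email: findRef(snapshot, 'textbox', /e-?mail/i),
  password: findRef(snapshot, 'textbox', /password/i),
  submit: findRef(snapshot, 'button', /sign in|log in/i),
};
console.log(loginRefs); // { email: '@e1', password: '@e2', submit: '@e3' }
```

Matching on role and accessible name rather than on position is what lets the same request work even if the page adds or reorders fields.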

Cookies persist. Next time you ask to do something on admin.example.com, you're already logged in.

Tip: anything you paste into the chat, including a password, is recorded in your session transcript. If you don't want a site's password there, log in manually instead: the agent opens the browser, you sign in yourself in the window (the browser's autofill works here), and the agent resumes with snapshot once you're in. This is why the browser window is a real window, not headless.

Filling a form

"Go to our travel booking tool and start a new request for a trip to Berlin next Tuesday through Friday, economy, no hotel."

open, navigate, snapshot.
fill @e5 "Berlin" for the destination.
select @e7 "economy" for the fare class.
fill @e9 "2026-05-05" for the outbound date.
fill @e10 "2026-05-08" for the return date.
click @e12 (uncheck the hotel option).
snapshot — confirms the form is filled right.
Pauses. "Ready to submit?" — you approve.
click @e15 — submit.

In Ask-to-Edit or Explore mode, the agent won't click Submit without asking first. In Execute mode it runs straight through, so stay in a cautious mode for anything with real-world consequences (payments, bookings).

Extracting structured data

The browser is fine for one-off scraping, but for a structured table the combo of snapshot + evaluate is surprisingly clean:

"Pull the open issues assigned to me from our internal tracker at https://issues.internal/mine and give me a data table."

navigate https://issues.internal/mine.
wait selector ".issue-row" — waits for the list to render.

evaluate with a short JavaScript snippet that reads the DOM and returns JSON:

// Read each rendered issue row into a plain object; a missing cell becomes "".
[...document.querySelectorAll('.issue-row')].map(r => {
  const text = sel => r.querySelector(sel)?.textContent.trim() ?? '';
  return {
    id:       text('.issue-id'),
    title:    text('.issue-title'),
    priority: text('.priority'),
    age:      text('.age'),
  };
})

Calls transform_data with the result. You see a sortable, filterable data table.

For anything you want to do repeatedly, though, read the API discovery page — catching the underlying JSON API and hitting that directly is faster and more reliable than DOM scraping.
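When you do stay with DOM scraping, it tends to fail quietly if a class name changes, so it can be worth checking the rows before trusting the table. A small sketch of such a check follows; `findIncompleteRows` is a hypothetical helper, the sample rows are made up, and the field names simply mirror the snippet above.

```javascript
// Return the indexes of rows where any required field is missing or blank.
function findIncompleteRows(rows, requiredKeys) {
  return rows
    .map((row, index) => ({ row, index }))
    .filter(({ row }) =>
      requiredKeys.some((key) => typeof row[key] !== 'string' || row[key].trim() === ''))
    .map(({ index }) => index);
}

// Made-up sample rows in the shape the evaluate snippet returns.
const rows = [
  { id: 'ISS-1', title: 'Login page times out', priority: 'High', age: '3d' },
  { id: 'ISS-2', title: '', priority: 'Low', age: '12d' },
];
const incomplete = findIncompleteRows(rows, ['id', 'title', 'priority', 'age']);
console.log(incomplete); // [ 1 ]
```

A non-empty result is a hint that a selector no longer matches what the page renders.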

Taking screenshots

"Take a screenshot of the staging dashboard at https://staging.example.com/dashboard at 1440 × 900 and attach it to this thread."

open, navigate.
wait network-idle — waits for the dashboard to finish loading.
screenshot --png — a full-page PNG.
Attachment appears in the chat; you drag it into a PR description or a doc.

Variants:

screenshot --annotated — labels every interactive element with its @eN ref. Useful for documentation or for prompting the agent about a specific button later.
screenshot-region --ref @e12 --padding 16 — crops around a specific element with a little padding. Good for "show just the chart".
screenshot-region --selector ".chart-container" — same idea using a CSS selector.
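The --padding option is easy to picture as rectangle arithmetic. The sketch below shows one way a padded crop could be computed from an element's bounding box, clamped so it never extends past the page. It is an illustration under assumptions, not the tool's actual implementation: `paddedCrop` and the `{ x, y, width, height }` shape are hypothetical.

```javascript
// Grow a bounding box by `padding` pixels on every side, then clamp the
// result to the page so the crop never reaches outside it.
function paddedCrop(box, padding, page) {
  const left = Math.max(0, box.x - padding);
  const top = Math.max(0, box.y - padding);
  const right = Math.min(page.width, box.x + box.width + padding);
  const bottom = Math.min(page.height, box.y + box.height + padding);
  return { x: left, y: top, width: right - left, height: bottom - top };
}

// A chart near the left edge of a 1440 x 900 page, padded by 16 px.
const crop = paddedCrop(
  { x: 8, y: 120, width: 400, height: 300 },
  16,
  { width: 1440, height: 900 },
);
console.log(crop); // { x: 0, y: 104, width: 424, height: 332 }
```

Note how the left edge is clamped to 0, so the crop gains only 8 px on that side instead of the full 16.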

Waiting for dynamic content

Modern apps don't render everything up front. Four wait forms cover the common cases:

| Command | Waits until |
| --- | --- |
| `wait network-idle <ms>` | All in-flight requests settle, plus a short quiet period. |
| `wait selector "<css>" <ms>` | A selector appears in the DOM. |
| `wait text "<string>" <ms>` | A given string is visible in the page text. |
| `wait url "<substring>" <ms>` | The current URL contains the substring. |

The <ms> argument is optional; the default timeout is 10 seconds. If the condition isn't met in time, the command errors and the agent either retries with a longer timeout or asks what to do.

"Navigate to /login, log in, and wait until the dashboard is showing."

navigate https://example.com/login.
Fills form, clicks submit.
wait url "/dashboard" 8000 — waits for the redirect.
wait text "Welcome" 5000 — waits for the greeting that confirms the render finished.
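Conceptually, each wait form is a condition that gets re-checked until it holds or the timeout runs out. The sketch below captures just the conditions, evaluated against a simplified, hypothetical page-state object (`pendingRequests`, `url`, `text`, and `selectors` are assumed names, not part of the tool). It leaves out the polling loop and the quiet period that network-idle adds.

```javascript
// Decide whether a wait condition currently holds for a simplified page state.
// `wait` is { kind, value }, mirroring the four forms in the table above.
function isSatisfied(wait, page) {
  switch (wait.kind) {
    case 'network-idle':
      return page.pendingRequests === 0;
    case 'selector':
      // Simplification: `selectors` lists the selectors that currently match.
      return page.selectors.includes(wait.value);
    case 'text':
      return page.text.includes(wait.value);
    case 'url':
      return page.url.includes(wait.value);
    default:
      throw new Error(`Unknown wait kind: ${wait.kind}`);
  }
}

// Hypothetical state right after the login redirect has finished.
const page = {
  pendingRequests: 0,
  url: 'https://example.com/dashboard',
  text: 'Welcome back, Alex',
  selectors: ['.issue-row'],
};
console.log(isSatisfied({ kind: 'url', value: '/dashboard' }, page)); // true
console.log(isSatisfied({ kind: 'text', value: 'Welcome' }, page)); // true
```

Chaining a url wait with a text wait, as in the flow above, checks both that the redirect happened and that the page actually rendered.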

Canvas-level apps (Sheets, Figma, charts)

Some pages have no meaningful accessibility tree. Google Sheets is the classic example: the entire spreadsheet is canvas-painted pixels. For those, the agent falls back to pixel-level interaction:

click-at x y — click at a specific pixel.
drag x1 y1 x2 y2 — drag from one pixel to another.
screenshot — capture for the agent to reason over visually.

Pixel interaction is brittle (zoom levels and window sizes change coordinates) so the agent uses it as a last resort. For Sheets specifically, reading via the Google Sheets API source is almost always better.
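One way to soften that brittleness is to express a target as a fraction of the canvas's bounding box and convert it to pixels just before clicking, so a resized window moves the target along with the canvas. A minimal sketch, assuming the box comes from something like `getBoundingClientRect()` run through evaluate and that click-at takes viewport coordinates (the latter is an assumption); `pointInBox` is a hypothetical helper. It does not account for zoom or scrolling inside the canvas.

```javascript
// Convert a fractional position inside a box (0..1 on each axis) into
// absolute pixel coordinates, rounded to whole pixels.
function pointInBox(box, fx, fy) {
  if (fx < 0 || fx > 1 || fy < 0 || fy > 1) {
    throw new RangeError('fx and fy must be between 0 and 1');
  }
  return {
    x: Math.round(box.left + box.width * fx),
    y: Math.round(box.top + box.height * fy),
  };
}

// Example box in the shape getBoundingClientRect() returns.
const canvasBox = { left: 40, top: 100, width: 1000, height: 600 };
console.log(pointInBox(canvasBox, 0.5, 0.5)); // { x: 540, y: 400 }
console.log(pointInBox(canvasBox, 0.25, 0.1)); // { x: 290, y: 160 }
```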

Handling popups and redirects

Redirects: just use wait url or wait network-idle after a click. The browser follows redirects transparently.
Popups: window.open() popups aren't always accessible. If a workflow requires one, you can often substitute direct navigation and bypass the popup entirely.
Native browser dialogs (alerts, confirms): not currently interceptable. Work around this by using evaluate to override window.alert and window.confirm before the triggering action.
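The dialog workaround could look something like the sketch below: a function that replaces alert and confirm on a window-like object and records what the page tried to show. `stubDialogs` is a hypothetical helper, not part of the tool; in the page you would pass the real `window` via evaluate. The override lives only in the current document, so it would need to be reapplied after a full page load.

```javascript
// Replace alert/confirm with silent stubs and record every call.
// `confirmResult` is the answer every confirm() will return.
function stubDialogs(win, confirmResult = true) {
  const calls = [];
  win.alert = (message) => {
    calls.push({ type: 'alert', message: String(message) });
  };
  win.confirm = (message) => {
    calls.push({ type: 'confirm', message: String(message) });
    return confirmResult;
  };
  return calls;
}

// Outside a browser, a plain object stands in for `window`.
const fakeWindow = {};
const calls = stubDialogs(fakeWindow, true);
fakeWindow.alert('Saved');
const answer = fakeWindow.confirm('Delete this item?');
console.log(answer, calls.length); // true 2
```

Returning the recorded calls makes it possible to check afterwards which dialogs the page would have shown, and with what text.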

When not to use the browser

The service has an API. Add it as an API source — far faster and more reliable than DOM automation.
The service has an MCP server. Even better — add it as an MCP source.
You need to scrape at scale or on a schedule. Use the API, or use the browser's network inspector to find one.

Reach for the browser when you genuinely can't avoid a UI — admin panels, legacy apps, flows that require JavaScript interaction, anything visual like screenshots.

Browser overview — what the browser is and how it's owned per-session.
API discovery — catch the page's internal API and skip the DOM.
Sources — the usually-better alternative for programmatic access.
