Human-in-the-Loop Browser Agents Without Losing State

Add human approvals to browser agents without restarting the session that already holds state.

Keep the original session alive, hand reviewers the exact same viewport, and resume automation in place. Steel's live debugUrl embeds stream the running browser over WebRTC, so approvals happen inside the workflow instead of in a detached screenshot or rebuilt session.

That means no more clearing cookies, replaying logins, or cloning state between tools. You pause the automation, surface the interactive embed with interactive=true&showControls=true, log who touched it, then resume the same session ID that was already authenticated.

Short answer

| If this is true | Do this inside Steel | State impact |
| --- | --- | --- |
| Humans must enter credentials, OTP, or payment info | Pause the session, share the debugUrl with interactive=true so they type directly into the live browser | Cookies, storage, and network stack stay exactly where the automation left them (<1s startup, up to 24h lifespan). |
| An agent needs approval before risky actions (refunds, deletes) | Gate the action on an approval event, keep the session open, and require a reviewer banner + log entry before resuming | You never recycle the session, so DOM state and auth survive the pause. |
| You owe audit evidence after handoff | Save the same session's HLS replay and attach the approval log | Replay shows what the reviewer saw, plus who resumed and when. |

Why approvals usually break state

Most teams still punt approvals to a different tool: send a screenshot to Slack, ask a human to replicate the steps, then rerun the job. That nukes the state three times over: cookies expire, CAPTCHAs reappear, and long forms lose unsaved data. When the workflow touches production accounts, starting a fresh session also means handing real credentials to a person or a vault you don't trust yet.

Steel treats approvals like any other session lifecycle step. Sessions cold start in under a second and can stay alive for 24 hours, so you can halt automation without releasing or recreating the browser. Instead of replaying your way back to the risky step, you stream the live browser to the reviewer and give them temporary control.

Control surfaces that keep humans in the loop

| Surface | Steel primitive | What it solves |
| --- | --- | --- |
| Interactive live view | session.debugUrl + interactive=true (WebRTC, 25 fps) | Reviewer sees the real DOM, not static captures, and can scroll, click, and enter URLs with full fidelity. |
| Guided navigator | showControls=true (legacy headless UI) or your own chrome | Gives reviewers forward/back controls so they don't ask automation to rewind. |
| Scoped takeover window | Application-layer ACL that wraps the unauthenticated debug URL | Anyone with the raw URL can act on the session; wrap it in your auth and track approved_by before enabling interactivity. |
| Evidence capture | GET /v1/sessions/{id}/hls for MP4/HLS replay | Attach the approval to a replay so you can prove what happened later. |
| Read-only fallback | interactive=false when you just need someone to watch | Keeps observers from mutating state when they only need visibility. |

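The interactive and read-only surfaces differ only in the query parameters appended to the debug URL. A minimal sketch of a URL builder follows, assuming the parameters named in the table (interactive, showControls) are all you need. The helper name buildEmbedUrl, the EmbedMode type, and the example.invalid URL are hypothetical placeholders, not part of Steel's API.

```typescript
// Hypothetical helper: not part of the Steel SDK.
type EmbedMode = "interactive" | "read-only";

// Appends the embed parameters without clobbering any query string
// the debug URL may already carry.
function buildEmbedUrl(
  debugUrl: string,
  mode: EmbedMode,
  showControls: boolean = true,
): string {
  const url = new URL(debugUrl);
  url.searchParams.set("interactive", mode === "interactive" ? "true" : "false");
  url.searchParams.set("showControls", String(showControls));
  return url.toString();
}

// Placeholder URL for illustration; a real value comes from session.debugUrl.
const watchOnlyUrl = buildEmbedUrl("https://example.invalid/debug/abc", "read-only");
```

Defaulting observers to "read-only" and switching to "interactive" only after the reviewer confirms identity keeps the takeover window as narrow as possible.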
  1. Detect the checkpoint. Instrument your agent to emit an approval-required event (login, payment, destructive action, CAPTCHA). Do not release the Steel session yet.
  2. Freeze automation, keep the ID. Store the session ID and context in your queue. Steel leaves the browser running (default idle timeout is 5 minutes; send a heartbeat if the review might take longer).
  3. Notify the reviewer with a live embed. Render an iframe like the snippet below, wrapped in your auth. Display a banner so they know they're touching production state.
// Live embed for the reviewer. Render it only behind your own auth:
// the raw debugUrl is unauthenticated.
const ApprovalEmbed = ({ debugUrl }: { debugUrl: string }) => (
  <iframe
    src={`${debugUrl}?interactive=true&showControls=true`}
    title="Steel approval session"
    style={{ width: '100%', height: '640px', border: 'none' }}
    allow="clipboard-read; clipboard-write"
  />
);
  4. Track who took control. Before you flip interactive on, require the reviewer to confirm identity. Save { sessionId, approver, action, timestamp } to your audit log.
  5. Resume or roll back. Once the reviewer clicks “Continue,” your agent reconnects to the same session via Playwright/Puppeteer and finishes the workflow. If they decline, call sessions.release() so credentials don't linger.
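The last two steps can be sketched as a small decision function that writes the audit entry and tells the caller whether to resume or release. This is an illustration under assumptions: the names resolveApproval, PendingApproval, and auditLog are hypothetical, the log is an in-memory array standing in for your real audit store, and no Steel SDK call is made here.

```typescript
// All names below are hypothetical; swap the in-memory array for your audit store.
type Decision = "approved" | "declined";

interface PendingApproval {
  sessionId: string; // the Steel session ID stored when automation paused
  action: string;    // e.g. "refund" or "delete"
}

interface AuditEntry {
  sessionId: string;
  approver: string;
  action: string;
  timestamp: string; // ISO 8601
}

const auditLog: AuditEntry[] = [];

// Records who took control, then returns what the caller should do next:
// "resume"  -> reconnect the agent to the same session ID
// "release" -> release the session so credentials do not linger
function resolveApproval(
  pending: PendingApproval,
  approver: string,
  decision: Decision,
  now: Date = new Date(),
): "resume" | "release" {
  if (approver.trim() === "") {
    throw new Error("Reviewer identity is required before granting control");
  }
  auditLog.push({
    sessionId: pending.sessionId,
    approver,
    action: `${pending.action}:${decision}`,
    timestamp: now.toISOString(),
  });
  return decision === "approved" ? "resume" : "release";
}
```

Writing the log entry before returning means a crash between approval and resume still leaves evidence of who approved what.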

Safeguards and limits

  • Guard the debug URL. It is intentionally unauthenticated for fast embeds. Always proxy it through your app or sign short-lived URLs so only reviewers gain access.
  • Remember idle timeouts. Headful sessions can stay alive for up to 24h, but the idle timer defaults to 5 minutes. Send lightweight actions (page.waitForTimeout(0), heartbeats) if an approval might take longer.
  • Plan for concurrency. Steel Local handles roughly one live session; approvals managed across teams usually need Steel Cloud, which provides 100+ concurrent sessions, the Credentials API, and the Files API.
  • Snapshot before risky edits. Grab an HLS replay pointer or page screenshot before handing off so you can diff what changed.
  • Annotate trust boundaries. Interactive embeds mean another human can run arbitrary navigation. Restrict clipboard permissions, mask sensitive UI, or drop to read-only when the reviewer only needs to watch.
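One way to guard the debug URL is to have your own proxy hand reviewers a short-lived signed token instead of the raw URL. The sketch below assumes a Node.js backend and uses only node:crypto; the token format, the ReviewGrant shape, and the SECRET constant are hypothetical choices for illustration, not something Steel provides.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical grant shape; your proxy resolves sessionId to the real debug URL.
interface ReviewGrant {
  sessionId: string;
  reviewer: string;
  expiresAtMs: number;
}

// Placeholder secret for illustration; load a real one from your secret store.
const SECRET = "replace-with-a-real-secret";

function sign(payload: string): string {
  return createHmac("sha256", SECRET).update(payload).digest("base64url");
}

// Token format (an assumption of this sketch): base64url(JSON grant) + "." + HMAC.
function issueReviewToken(grant: ReviewGrant): string {
  const payload = Buffer.from(JSON.stringify(grant)).toString("base64url");
  return `${payload}.${sign(payload)}`;
}

// Returns the grant if the signature matches and the token has not expired.
function verifyReviewToken(token: string, nowMs: number = Date.now()): ReviewGrant | null {
  const parts = token.split(".");
  if (parts.length !== 2 || !parts[0] || !parts[1]) return null;
  const [payload, signature] = parts;
  const expected = Buffer.from(sign(payload));
  const actual = Buffer.from(signature);
  // Check length first: timingSafeEqual throws on buffers of different sizes.
  if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) return null;
  const grant = JSON.parse(Buffer.from(payload, "base64url").toString("utf8")) as ReviewGrant;
  return grant.expiresAtMs > nowMs ? grant : null;
}
```

Your proxy would verify the token on every request and only then forward to the session's debug URL, so the unauthenticated URL never reaches the reviewer's browser directly.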

When Steel fits and when it does not

Use this pattern when:

  • You already rely on Steel sessions for automation and need occasional human intervention without scrapping context.
  • The workflow mixes bots and humans on sensitive actions (finance ops, legal filings, vendor portals) and you need a single audit trail.
  • You want reviewers to see exactly what the agent sees, including anti-bot prompts or vendor UI bugs.

Look elsewhere or extend it when:

  • Compliance forbids unauthenticated video streams entirely. In that case, run the embed inside your own zero-trust frame, or keep everything on Steel Local inside your VPC until legal approves a managed plan.
  • You need fully custom reviewer tooling (annotations, chat, multi-user co-browsing). Steel streams the browser; collaboration UI is still on you.
  • You cannot afford an operator to babysit approvals; in that case build deterministic policies instead of half-hearted checkpoints.
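For the last case, a deterministic policy replaces the human checkpoint with a rule that always gives the same verdict for the same input. The sketch below is purely illustrative: the action kinds, the 5000-cent refund threshold, and the single-record delete rule are hypothetical values you would replace with your own.

```typescript
// Hypothetical action shapes and thresholds, for illustration only.
type AgentAction =
  | { kind: "read" }
  | { kind: "refund"; amountCents: number }
  | { kind: "delete"; recordCount: number };

type Verdict = "allow" | "deny";

// Same input, same verdict: no reviewer and no pause.
function decide(action: AgentAction): Verdict {
  switch (action.kind) {
    case "read":
      return "allow";
    case "refund":
      // Small refunds go through; anything larger is refused outright.
      return action.amountCents <= 5000 ? "allow" : "deny";
    case "delete":
      // Bulk deletes are refused; single-record deletes are allowed.
      return action.recordCount === 1 ? "allow" : "deny";
  }
}
```

Anything the policy denies simply does not happen, which is stricter than a checkpoint nobody is watching.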
