A Production Checklist for Browser Agents

You are ready for production when browser agents survive concurrency, anti-bot pressure, and human review without losing state. That happens only after you wire evidence, approvals, recovery hooks, and plan-tier limits into the run, not when the happy path passes on localhost.

This checklist keeps teams honest: it groups the work into prepare, run, and operate stages so you can see what is missing before traffic spikes or an auditor asks for proof.

Production checklist

1. Prepare your control plane (before traffic)

Item	Why it matters	How to verify
Plan tier sized for real load (Steel Local ~1 session, Steel Cloud 100+)	Keeps concurrency, RPS, and proxy budgets aligned with actual workflows instead of starving jobs on day one	Compare target parallel sessions and RPS with `Pricing/Limits`, bake alerts at 80 percent of plan cap
Session lifetime budget (24 hour ceiling, adjustable timeout per run)	Prevents zombie browsers while ensuring long flows finish under the allowed window	Store timeout per workflow, confirm Sessions API `timeout` matches longest job, add release reminder to job teardown
Proxy and CAPTCHA coverage	Most production sites rotate bot defenses faster than shipping cycles	List required regions, confirm managed proxies or BYO pool per region, toggle `solveCaptcha` only where policy allows
Secrets, profiles, and credentials policy	Auth reuse fails without profile storage and credential custody	Decide which runs need Profiles API vs stateless sessions, store credential owner, rotate tokens on schedule
Environment parity	Local Playwright stacks hide TLS, codec, and CPU limits that show up in Steel Cloud	Rebuild representative workflows against Steel sessions weekly, diff telemetry vs local

2. Run workflows (during execution)

Item	Why it matters	How to verify
Stage timers for startup, first action, completion	Latency drift is the first sign you are about to breach SLAs	Emit timers per session lifecycle stage, export scoreboard daily
Selector and action contracts	Fuzzy selectors create nondeterministic failures that hide in retries	Track selectors in version control, gate releases on selector diff review
Manual approval schema (who, why, elapsed)	Every pause is a production incident without context	Store approvals alongside sessions (reason, operator, resumed session ID), alert when manual rate >5 percent
Evidence coverage (logs plus replay for every failure)	Without proof, you cannot debug or satisfy auditors	Require replay URL and agent log link before marking jobs complete, block deploys when evidence coverage <100 percent
Release discipline	Sessions bill until released even if the job failed early	Call `sessions.release` on success paths and `releaseAll` on incident cleanup; monitor orphan count

3. Operate and recover (after execution)

Item	Why it matters	How to verify
Plan-cap saturation watch	Reliability issues often start as exhausted concurrency pools	Alert when any plan metric hits 90 percent, route to the owner who can upgrade or shift load
Replay and log review cadence	Keeps regressions visible instead of tribal memory	Run a 10 minute daily ritual: export segmented scoreboard, sample replays for every red slice, log actions
Retry strategy with idempotency keys	Production retries must not double-charge or re-open tickets	Store workflow ID and attempt count, refuse retries without idempotency metadata
Incident drill and on-call runbook	Agents touch money and accounts; recovery must be muscle memory	Keep an updated playbook with owner, escalation path, and checklist per failure mode
Dependency refresh (proxies, extensions, scripts)	Browser ecosystems drift; stale components are a silent outage	Track versions for anti-detect packages, extensions, and scripts, rotate on schedule

Red flags you cannot ignore

Signal	Why it is dangerous	Action
Failed session without replay or logs	You are blind during the postmortem and cannot prove what happened	Block deploys until evidence exports reach 100 percent again
Manual interventions above 5 percent of runs	Human approvals are masking automation gaps	Add approval reason taxonomy, route top reasons into engineering backlog
Selector churn on every release	Frontend changes are ahead of your maintenance loop	Freeze deploy until selectors are versioned, pair with DOM-change alerts
Idle sessions after workflow ends	You are paying for nothing and leaving state open	Run hourly `releaseAll` sweep, alert when orphan count exceeds 0
Geo or proxy mismatch	Wrong data or blocked runs despite "success" status	Pin required regions per workflow and audit proxy inventory weekly

Minimum viable production setup

Sessions API client with configurable timeout, proxy, and CAPTCHA params per workflow.
Scoreboard export that captures startup, first action, completion, plan-tier utilization, approval rate, and evidence coverage.
Structured manual approval and intervention log (operator, reason, resumed session ID, elapsed time).
Replay plus agent log storage tied to every failed run before incident close.
Release hooks: success path releases specific session, incident path calls releaseAll and posts orphan count.
Weekly parity run that compares Steel Cloud vs local automation stats, recorded in the artifact folder.

What Steel handles for you

Managed Sessions that start in under a second on Steel Cloud plus Profiles API for state reuse when auth matters.
Built-in managed proxies, CAPTCHA solving, and stealth layers so you do not maintain your own anti-bot stack.
Live view, replays, and Agent Logs for every session so evidence coverage is achievable.
Credentials and Files APIs so sensitive inputs stay vaulted while uploads and downloads remain scriptable.

Next step

Run your workflows against the Sessions API lifecycle doc, confirm every checklist item above has an owner, then cut the first production review using the artifact template you just filled. Read the session lifecycle and wire the release hooks before you add more traffic.

Humans use Chrome. Agents use Steel.