OpenAI Computer Use With Steel

Pair OpenAI Computer Use with Steel sessions for stable browsers, replay-ready evidence, and managed anti-bot tooling without rewiring your agent loop.

Keep OpenAI's Computer Use loop exactly as it is and move the browser work to Steel. Create a Steel session, hand its Computer API to the computer-preview tool, and you get sub-second startup plus up to 24 hour runtimes without touching your GPT-4o prompts or safety rails.

Steel adds the things OpenAI leaves to you: observability, anti-bot controls, managed proxies, viewer links, and guaranteed cleanup. Instead of babysitting local Chrome or hastily provisioned VMs, hand Computer Use actions to a Steel session so the Responses API keeps reasoning while Steel keeps the Chromium instance reliable, traceable, and shareable with the rest of your team.

What stays the same

Computer Use concernWhat you keepNotes
Task prompts and reasoningSame GPT-4o computer-use-preview calls, same system prompt, same responses.create payloadSteel never touches your OpenAI credentials
Tool contractcomputer-preview tool stays the single tool in your tools listYou only swap the backend implementation for clicks and screenshots
Safety workflowExisting safety acknowledgements and human review gatingKeep auto acknowledgement or manual prompts mapped to OpenAI's pending checks
Hosting and orchestrationYour Python or Node runtime and queues stay intactSteel is just an API client alongside openai

What Steel adds

Steel surfaceWhy it matters for Computer UseHow to wire it
Session lifecycleSub-second startup and 24 hour caps let long reasoning loops finish without relaunching Chromesession = client.sessions.create({dimensions:{width:1280,height:768}}) then client.sessions.release(session.id) in finally
ObservabilityLive viewer URL plus replay keeps every action reviewable and easy to forwardLog session.session_viewer_url with the OpenAI response ID
Computer APIDeterministic mapping for click, type, scroll, wait, and screenshot actionsForward each computer_call action to client.sessions.computer(session.id, body) and return data:image/png;base64,... strings
Anti-bot and proxiesManaged residential proxies and CAPTCHA solving reduce false positives that stall reasoning loopsPass use_proxy, region, and enable the CAPTCHAs API before high friction sites
Human-in-loop evidenceViewers, agent logs, and replay exports keep approvals honestPair critical actions with manual reviewer instructions plus the Steel viewer link

Minimal integration path

  1. Install steel-sdk, openai, and python-dotenv or dotenv depending on runtime, then load STEEL_API_KEY, OPENAI_API_KEY, and TASK from .env.
  2. Create a Steel client and session with the viewport you expect Computer Use to draw against:
    import { Steel } from "steel-sdk";
    const steel = new Steel({ steelAPIKey: process.env.STEEL_API_KEY! });
    const session = await steel.sessions.create({
      dimensions: { width: 1280, height: 768 },
      blockAds: true,
      timeout: 900_000,
    });
    console.log(`Live viewer: ${session.sessionViewerUrl}`);
  3. Define the single Computer Use tool exactly as before:
    const tools = [{
      type: "computer-preview",
      display_width: 1280,
      display_height: 768,
      environment: "browser",
    }];
  4. When the Responses API returns a computer_call, map it to Steel's Computer API and return the screenshot back to OpenAI:
    const screenshot = await steel.sessions.computer(session.id, body);
    return {
      type: "computer_call_output",
      call_id: item.call_id,
      output: {
        type: "input_image",
        image_url: `data:image/png;base64,${screenshot.base64_image}`,
      },
    };
  5. Mirror the same helper structure in Python (identical to the Steel quickstart) if you prefer that runtime:
    steel = Steel(steel_api_key=os.getenv("STEEL_API_KEY"))
    session = steel.sessions.create(dimensions={"width": 1280, "height": 768})
    resp = steel.sessions.computer(session.id, action="click_mouse", coordinates=[640, 360], screenshot=True)
    output = {
        "type": "computer_call_output",
        "call_id": item["call_id"],
        "output": {"type": "input_image", "image_url": f"data:image/png;base64,{resp.base64_image}"},
    }
  6. Always release the session (success or failure) so replays finish uploading and concurrency slots reopen:
    try:
        result = agent.execute_task(TASK)
    finally:
        steel.sessions.release(session.id)

Pair Computer Use with Steel observability

SignalSteel hookWhy it matters
Live viewer linksession.session_viewer_urlWatch the agent in real time, hand it to a human approver mid-run
Replay URLSame viewer URL after releaseShare evidence when the computer-use loop misfires or needs auditing
Agent logsclient.sessions.logs.list(session.id)Store log excerpts next to OpenAI response payloads so you can diff retries
CAPTCHA statusclient.sessions.captchas.status(session.id)Keep the LLM idle while Steel routes a CAPTCHA to a solver, then resume once it clears
Release metricsTrack sessions.release success per jobPrevent orphaned sessions from burning plan caps

Fit and trade-offs

Works best for

  • Teams already on the OpenAI Responses API who just need a managed browser with replay evidence.
  • Queues where GPT-4o handles reasoning but browser automation keeps flaking because Chrome dies mid task.
  • Workflows that require human approvals or auditing; Steel's viewer and replay URL close the loop.

Not yet ideal when

  • You need offline or desktop environments; this path is browser-only.
  • Runs exceed the 24 hour Steel session ceiling or require more than hundreds of concurrent sessions without Enterprise limits raised.
  • Your org cannot grant the computer-use-preview capability in OpenAI yet; Steel cannot substitute for that access.

Go-live checklist

  • .env checked into your secrets manager with valid Steel and OpenAI keys plus TASK default.
  • Logging includes session ID, viewer URL, OpenAI response ID, and a boolean for sessions.release success.
  • Dimensions, proxies, and CAPTCHA flags aligned with your target site list before letting agents loose.
  • Manual reviewers know they can open the viewer link to approve or stop sensitive steps.
  • Quickstarts from docs.steel.dev/integrations/openai-computer-use run end to end at least once in both TS and Python so future edits stay grounded.

Next step: wire one Computer Use task into a Steel session, watch the replay, then add CAPTCHA tooling before you scale the queue. Humans use Chrome. Agents use Steel.