The Cloudflare Blog

Browser Run: now running on Cloudflare Containers, it’s faster and more scalable

Ruskin Constant — Wed, 13 May 2026 13:00:00 GMT

We’ve enabled higher usage limits, faster performance, and better reliability for Browser Run by rebuilding on top of Cloudflare’s Containers.

You can now spin up 60 browsers per minute via the Workers binding and run up to 120 concurrently — 4x the previous limit. Also, Quick Action response times dropped more than 50%. You don't need to change anything: these improvements are live today. On top of that, we’re shipping fixes and new features faster than before. Read on to learn how we did it and see the data.

Remind me: what is Browser Run?

Browser Run enables developers to programmatically control and interact with headless browser instances running on Cloudflare’s global network. That’s useful for end-to-end testing of web applications, securely investigating suspicious URLs, and leveraging how browsers can easily render PDF documents, amongst other quick actions like capturing screenshots and extracting content. More recently, it’s become a critical enabler of AI agents to interact with the web. We’re building Browser Run to be the go-to platform to responsibly utilize automated browsers securely at massive scale.

Outgrowing our bunk bed

Before adopting Cloudflare Containers, we shared infrastructure with Browser Isolation (BISO). While technically similar, BISO’s larger container images slowed startup and development. Crucially, BISO browsers lacked optimal global distribution, compromising resiliency and latency. Additionally, typical BISO users’ long, steady sessions clashed with Browser Run’s short, spiky usage, creating scaling bottlenecks and availability delays.

Thankfully, after much internal development, Cloudflare released Durable Object (DO)-enabled Containers open beta last year, meaning we were ready for a tentative adoption that ultimately benefited both product platforms. Like most successful product platforms, we’re committed to building on our own platform wherever feasible so that we can feel and fix any pain points ahead of any external customers.

The migration: Containers

We started a gradual migration by inserting a Worker in our incoming request paths to provide some Container-powered browsers to a handful of users alongside those from BISO. This dual support during development was key: it allowed us to compare performance, isolate implementation bugs and ultimately gain confidence in the benefits of the Container-driven approach.

Ramping up adoption, we first used the Container browsers for all of our Quick Actions endpoints, then for connections via the Workers browser binding on free accounts, followed by pay-as-you-go accounts in order to validate stability before we rolled it out to all remaining contract customers, ensuring a transition that required no action or existing worker redeployments from our customers.

Challenges: performance and scale bottlenecks

On our end, though, we faced a fresh set of challenges getting familiar with a novel, unstable early-stage Containers platform interface that was light on documentation, light on observability, and light on colleagues in an overlapping timezone. However, our feedback to our own teams as Customer Zero meant that we could provide a tight feedback loop leading to substantial upgrades that benefit our external customers too. Nevertheless, there was a lot of friction to overcome initially, most of which were to be expected for a closed beta in active development. Other hurdles to overcome were intrinsic to the new technical environment.

For example, once our browsers could run globally, our architecture had to adapt. DO-enabled Containers create a Durable Object as close to the incoming request as possible, but the connected Container may spin up on the other side of the world. This works fine for one-shot messages like "start my app," but when you're establishing a WebSocket between them and exchanging dozens of messages for a screenshot request, those extra milliseconds crossing the globe start adding up.

Our solution? Create regional pools of pre-warmed DO-backed browser containers to constrain the max distance (and hence max latency) between DOs and containers. When a request comes in, we pick a DO-container pair closest to the user within that region. This keeps latency low on both hops: user to DO, and DO to container. It adds a few more moving parts to our overall architecture, but we figured that was worthwhile so long as we had observability into the global state of each browser so that we could allocate and re-allocate capacity according to changing demand. A perfect use case for Workers KV…to a point.

Demand for our headless browsers has been ramping up since the beginning of last year. In short, AI agent builders discovered Browser Run and quickly brought request volumes outpacing our existing capacity. We quickly hit the limits of how quickly we could adjust our pool capacity to serve this new demand with a scalable approach. KV’s eventual consistency of around 30 seconds was becoming a bottleneck on our critical request path. You might check KV, see a container as "available," but by the time you route to it (30 seconds later), it's already claimed. That lag creates race conditions and overallocation of browsers, severely limiting how fast we could scale to meet demand spikes.

Migrating from KV to D1 + Queues

We previously stored each container state in KV. This meant that we could keep getting a minute old state due to cache TTL (recently KV changed the minimum cache TTL to 30 seconds, but even so that value is still too high).

We decided to migrate the container state into D1 instances instead. D1's transactional nature is a good fit here. Once we assign a browser to a user, it's exclusively theirs. Browsers are not shared resources. SQLite transactions ensure atomic assignment and prevent race conditions where two requests might claim the same browser simultaneously.

Here’s a simplified version of our browser acquisition query:

WITH candidate_pool AS (
    -- candidate pool logic to pick based on latency and other rules
)
UPDATE containers
SET status = 'picked'
WHERE sessionId IN (
    SELECT sessionId
    FROM candidate_pool
    ORDER BY RANDOM()
    LIMIT ?5
)
RETURNING data

We keep D1 shards per location and given that we may have several thousand containers running, and that each container needs to update its state every 5 seconds, we kept running into a problem: we would overload the database. For instance, if each write takes 1ms we can only write at most 1,000 times, which at one row per write would mean that we could only have 5,000 containers before overloading the database.

However, if we batch those writes, we can get much higher values, because batch writes are not significantly longer than individual ones, so we can increase the throughput in orders of magnitude. In our case, we use 100 row batches, which means we can now update a maximum of 500,000 containers per location. This headroom means capacity planning is no longer a bottleneck.

Currently, our P95 for batch write is 0.1ms!

To batch writes, we use Queues: every 5 seconds, each container computes its own state and adds it to its location queue. We then configure a worker consumer with 100 batch size and 1 second batch timeout:

{
    ...
    "queues": {
        "consumers": [
            {
                "queue": "production-core-containers-queue-weur",
                "max_batch_size": 100,
                "max_batch_timeout": 1,
                "max_retries": 1,
            },
            ...
        ]
        ...
    }
}

With this configuration, we achieve acceptable lag times well below 2 seconds. That said, queue backlogs can still cause stale state. When this happens, each region falls back to a designated backup region until the primary queue catches up.

Additional perks for quick actions

With dedicated infrastructure, we could now make upgrades to the browser container image without unwanted side effects or bloat for other products like BISO. This opened the door to optimize quick actions like screenshots and content extraction. Previously, our workers established a WebSocket to the remote browser and sent instructions one at a time: open a page, navigate to the URL, wait for it to load and take the screenshot. Each step had to be completed before the next could begin.

However, now we send all parameters in a single HTTP request directly to the container, and the entire flow executes internally without any back-and-forth between the worker and browser.

Results: massive performance boost and increased limits

We’ve seen a sharp decrease in average quick-action response time, as users are able to get what they need from a browser session in less time: less time waiting for browsers to be ready and faster processing of their DevTools Protocol messages.

Overcoming our real-time state management at this new scale meant we could spend more time in the playground, discovering and cooking up new features such as our recently launched /crawl endpoint.

Better browser flexibility

We also benefitted from another important perk by leaving behind shared Browser Isolation containers: faster upgrades.

When our browsers ran on shared product infrastructure, upgrading Chrome meant coordinating across multiple teams and products, each with their own roadmap and priorities. However, now that we run our own container image, we can upgrade at a faster tempo. For example, WebGL, a much-requested feature, is now available for browser-based rendering along with WebMCP (Model Context Protocol for the web) which enables new agentic interaction patterns. Both are made possible because we can control the browser version and flags without unwanted side effects in other Cloudflare products.

In a nutshell, we’re just getting started with unleashing the power of browsers at scale, especially for agentic development. We hope you’re diving in too — check out our docs.

Get started

Browser Run is available on all Workers plans. Start with the quick start guide, explore the Quick Actions, or try the /crawl endpoint to deeply extract data from any webpage, following links across the site.

Building AI agents? Check out our Agents SDK with built-in Browser Run support.

Agents have their own computers with Sandboxes GA

Kate Reznykova — Mon, 13 Apr 2026 13:08:35 GMT

When we launched Cloudflare Sandboxes last June, the premise was simple: AI agents need to develop and run code, and they need to do it somewhere safe.

If an agent is acting like a developer, this means cloning repositories, building code in many languages, running development servers, etc. To do these things effectively, they will often need a full computer (and if they don’t, they can reach for something lightweight!).

Many developers are stitching together solutions using VMs or existing container solutions, but there are lots of hard problems to solve:

Burstiness - With each session needing its own sandbox, you often need to spin up many sandboxes quickly, but you don’t want to pay for idle compute on standby.
Quick state restoration - Each session should start quickly and re-start quickly, resuming past state.
Security - Agents need to access services securely, but can’t be trusted with credentials.
Control - It needs to be simple to programmatically control sandbox lifecycle, execute commands, handle files, and more.
Ergonomics - You need to give a simple interface for both humans and agents to do common operations.

We’ve spent time solving these issues so you don’t have to. Since our initial launch we’ve made Sandboxes an even better place to run agents at scale. We’ve worked with our initial partners such as Figma, who run agents in containers with Figma Make:

“Figma Make is built to help builders and makers of all backgrounds go from idea to production, faster. To deliver on that goal, we needed an infrastructure solution that could provide reliable, highly-scalable sandboxes where we could run untrusted agent- and user-authored code. Cloudflare Containers is that solution.”
- Alex Mullans, AI and Developer Platforms at Figma

We want to bring Sandboxes to even more great organizations, so today we are excited to announce that Sandboxes and Cloudflare Containers are both generally available.

Let’s take a look at some of the recent changes to Sandboxes:

Secure credential injection lets you make authenticated calls without the agent ever having credential access
PTY support gives you and your agent a real terminal
Persistent code interpreters give your agent a place to execute stateful Python, JavaScript, and TypeScript out of the box
Background processes and live preview URLs provide a simple way to interact with development servers and verify in-flight changes
Filesystem watching improves iteration speed as agents make changes
Snapshots let you quickly recover an agent's coding session
Higher limits and Active CPU Pricing let you deploy a fleet of agents at scale without paying for unused CPU cycles

Sandboxes 101

Before getting into some of the recent changes, let’s quickly look at the basics.

A Cloudflare Sandbox is a persistent, isolated environment powered by Cloudflare Containers. You ask for a sandbox by name. If it's running, you get it. If it's not, it starts. When it's idle, it sleeps automatically and wakes when it receives a request. It’s easy to programmatically interact with the sandbox using methods like exec, gitClone, writeFile and more.

import { getSandbox } from "@cloudflare/sandbox";
export { Sandbox } from "@cloudflare/sandbox";

export default {
  async fetch(request: Request, env: Env) {
    // Ask for a sandbox by name. It starts on demand.
    const sandbox = getSandbox(env.Sandbox, "agent-session-47");

    // Clone a repository into it.
    await sandbox.gitCheckout("https://github.com/org/repo", {
      targetDir: "/workspace",
      depth: 1,
    });

    // Run the test suite. Stream output back in real time.
    return sandbox.exec("npm", ["test"], { stream: true });
  },
};

As long as you provide the same ID, subsequent requests can get to this same sandbox from anywhere in the world.

Secure credential injection

One of the hardest problems in agentic workloads is authentication. You often need agents to access private services, but you can't fully trust them with raw credentials.

Sandboxes solve this by injecting credentials at the network layer using a programmable egress proxy. This means that sandbox agents never have access to credentials and you can fully customize auth logic as you see fit:

class OpenCodeInABox extends Sandbox {
  static outboundByHost = {
    "my-internal-vcs.dev": (request, env, ctx) => {
      const headersWithAuth = new Headers(request.headers);
      headersWithAuth.set("x-auth-token", env.SECRET);
      return fetch(request, { headers: headersWithAuth });
    }
  }
}

For a deep dive into how this works — including identity-aware credential injection, dynamically modifying rules, and integrating with Workers bindings — read our recent blog post on Sandbox auth.

A real terminal, not a simulation

Early agent systems often modeled shell access as a request-response loop: run a command, wait for output, stuff the transcript back into the prompt, repeat. It works, but it is not how developers actually use a terminal.

Humans run something, watch output stream in, interrupt it, reconnect later, and keep going. Agents benefit from that same feedback loop.

In February, we shipped PTY support. A pseudo-terminal session in a Sandbox, proxied over WebSocket, compatible with xterm.js.

Just call sandbox.terminal to serve the backend:

// Worker: upgrade a WebSocket connection into a live terminal session
export default {
  async fetch(request: Request, env: Env) {
    const url = new URL(request.url);
    if (url.pathname === "/terminal") {
      const sandbox = getSandbox(env.Sandbox, "my-session");
      return sandbox.terminal(request, { cols: 80, rows: 24 });
    }
    return new Response("Not found", { status: 404 });
  },
};

And use xterm addon to call it from the client:

// Browser: connect xterm.js to the sandbox shell
import { Terminal } from "xterm";
import { SandboxAddon } from "@cloudflare/sandbox/xterm";

const term = new Terminal();
const addon = new SandboxAddon({
  getWebSocketUrl: ({ origin }) => `${origin}/terminal`,
});

term.loadAddon(addon);
term.open(document.getElementById("terminal-container")!);
addon.connect({ sandboxId: "my-session" });

This allows agents and developers to use a full PTY to debug those sessions live.

Each terminal session gets its own isolated shell, its own working directory, its own environment. Open as many as you need, just like you would on your own machine. Output is buffered server-side, so reconnecting replays what you missed.

A code interpreter that remembers

For data analysis, scripting, and exploratory workflows, we also ship a higher-level abstraction: a persistent code execution context.

The key word is “persistent.” Many code interpreter implementations run each snippet in isolation, so state disappears between calls. You can't set a variable in one step and read it in the next.

Sandboxes allow you to create “contexts” that persist state. Variables and imports persist across calls the same way they would in a Jupyter notebook:

// Create a Python context. State persists for its lifetime.
const ctx = await sandbox.createCodeContext({ language: "python" });

// First execution: load data
await sandbox.runCode(`
  import pandas as pd
  df = pd.read_csv('/workspace/sales.csv')
  df['margin'] = (df['revenue'] - df['cost']) / df['revenue']
`, { context: ctx });

// Second execution: df is still there
const result = await sandbox.runCode(`
  df.groupby('region')['margin'].mean().sort_values(ascending=False)
`, { context: ctx, onStdout: (line) => console.log(line.text) });

// result contains matplotlib charts, structured json output, and Pandas tables in HTML

Start a server. Get a URL. Ship it.

Agents are more useful when they can build something and show it to the user immediately. Sandboxes support background processes, readiness checks, and preview URLs. This lets an agent start a development server and share a live link without leaving the conversation.

// Start a dev server as a background process
const server = await sandbox.startProcess("npm run dev", {
  cwd: "/workspace",
});

// Wait until the server is actually ready — don't just sleep and hope
await server.waitForLog(/Local:.*localhost:(\d+)/);

// Expose the running service with a public URL
const { url } = await sandbox.exposePort(3000);

// url is a live public URL the agent can share with the user
console.log(`Preview: ${url}`);

With waitForPort() and waitForLog(), agents can sequence work based on real signals from the running program instead of guesswork. This is much nicer than a common alternative, which is usually some version of sleep(2000) followed by hope.

Watch the file system and react immediately

Modern development loops are event-driven. Save a file, rerun the build. Edit a config, restart the server. Change a test, rerun the suite.

We shipped sandbox.watch() in March. It returns an SSE stream backed by native inotify, the kernel mechanism Linux uses for filesystem events.

import { parseSSEStream, type FileWatchSSEEvent } from '@cloudflare/sandbox';

const stream = await sandbox.watch('/workspace/src', {
  recursive: true,
  include: ['*.ts', '*.tsx']
});

for await (const event of parseSSEStream(stream)) {
  if (event.type === 'modify' && event.path.endsWith('.ts')) {
    await sandbox.exec('npx tsc --noEmit', { cwd: '/workspace' });
  }
}

This is one of those primitives that quietly changes what agents can do. An agent that can observe the filesystem in real time can participate in the same feedback loops as a human developer.

Waking up quickly with snapshots

Imagine a (human) developer working on their laptop. They git clone a repo, run npm install, write code, push a PR, then close their laptop while waiting for code review. When it’s time to resume work, they just re-open the laptop and continue where they left off.

If an agent wants to replicate this workflow on a naive container platform, you run into a snag. How do you resume where you left off quickly? You could keep a sandbox running, but then you pay for idle compute. You could start fresh from the container image, but then you have to wait for a long git clone and npm install.

Our answer is snapshots, which will be rolling out in the coming weeks.

A snapshot preserves a container's full disk state, OS config, installed dependencies, modified files, data files and more. Then it lets you quickly restore it later.

You can configure a Sandbox to automatically snapshot when it goes to sleep.

class AgentDevEnvironment extends Sandbox {
  sleepAfter = "5m";
  persistAcrossSessions = {type: "disk"}; // you can also specify individual directories
}

You can also programmatically take a snapshot and manually restore it. This is useful for checkpointing work or forking sessions. For instance, if you wanted to run four instances of an agent in parallel, you could easily boot four sandboxes from the same state.

class AgentDevEnvironment extends Sandbox {}

async forkDevEnvironment(baseId, numberOfForks) {
  const baseInstance = await getSandbox(baseId);
  const snapshotId = await baseInstance.snapshot();

  const forks = Array.from({ length: numberOfForks }, async (_, i) => {
    const newInstance = await getSandbox(`${baseId}-fork-${i}`);
    return newInstance.start({ snapshot: snapshotId });
  });

  await Promise.all(forks);
}

Snapshots are stored in R2 within your account, giving you durability and location-independence. R2's tiered caching system allows for fast restores across all of Region: Earth.

In future releases, live memory state will also be captured, allowing running processes to resume exactly where they left off. A terminal and an editor will reopen in the exact state they were in when last closed.

If you are interested in restoring session state before snapshots go live, you can use the backup and restore methods today. These also persist and restore directories using R2, but are not as performant as true VM-level snapshots. Though they still can lead to considerable speed improvements over naively recreating session state.

^{Booting a sandbox, cloning ‘axios’, and npm installing takes 30 seconds. Restoring from a backup takes two seconds.}

Stay tuned for the official snapshot release.

Higher limits and Active CPU Pricing

Since our initial launch, we’ve been steadily increasing capacity. Users on our standard pricing plan can now run 15,000 concurrent instances of the lite instance type, 6,000 instances of basic, and over 1,000 concurrent larger instances. Reach out to run even more!

We also changed our pricing model to be more cost effective running at scale. Sandboxes now only charge for actively used CPU cycles. This means that you aren’t paying for idle CPU while your agent is waiting for an LLM to respond.

This is what a computer looks like

Nine months ago, we shipped a sandbox that could run commands and access a filesystem. That was enough to prove the concept.

What we have now is different in kind. A Sandbox today is a full development environment: a terminal you can connect a browser to, a code interpreter with persistent state, background processes with live preview URLs, a filesystem that emits change events in real time, egress proxies for secure credential injection, and a snapshot mechanism that makes warm starts nearly instant.

When you build on this, a satisfying pattern emerges: agents that do real engineering work. Clone a repo, install it, run the tests, read the failures, edit the code, run the tests again. The kind of tight feedback loop that makes a human engineer effective — now the agent gets it too.

We're at version 0.8.9 of the SDK. You can get started today:

npm i @cloudflare/sandbox@latest

Dynamic, identity-aware, and secure Sandbox auth

Mike Nomitch — Mon, 13 Apr 2026 13:00:00 GMT

As AI Large Language Models and harnesses like OpenCode and Claude Code become increasingly capable, we see more users kicking off sandboxed agents in response to chat messages, Kanban updates, vibe coding UIs, terminal sessions, GitHub comments, and more.

The sandbox is an important step beyond simple containers, because it gives you a few things:

Security: Any untrusted end user (or a rogue LLM) can run in the sandbox and not compromise the host machine or other sandboxes running alongside it. This is traditionally (but not always) accomplished with a microVM.
Speed: An end user should be able to pick up a new sandbox quickly and restore the state from a previously used one quickly.
Control: The trusted platform needs to be able to take actions within the untrusted domain of the sandbox. This might mean mounting files in the sandbox, or controlling which requests access it, or executing specific commands.

Today, we’re excited to add another key component of control to our Sandboxes and all Containers: outbound Workers. These are programmatic egress proxies that allow users running sandboxes to easily connect to different services, add observability, and, importantly for agents, add flexible and safe authentication.

How it works

Here’s a quick look at adding a secret key to a header using an outbound Worker:

class OpenCodeInABox extends Sandbox {
  static outboundByHost = {
    "github.com": (request, env, ctx) => {
      const headersWithAuth = new Headers(request.headers);
      headersWithAuth.set("x-auth-token", env.SECRET);
      return fetch(request, { headers: headersWithAuth });
    }
  }
}

Any time code running in the sandbox makes a request to “github.com”, the request is proxied through the handler. This allows you to do anything you want on each request, including logging, modifying, or cancelling it. In this case, we’re safely injecting a secret (more on this later). The proxy runs on the same machine as any sandbox, has access to distributed state, and can be easily modified with simple JavaScript.

We’re excited about all the possibilities this adds to Sandboxes, especially around authentication for agents. Before going into details, let’s back up and take a quick tour of traditional forms of auth, and why we think there’s something better.

Common auth for agentic workloads

The core issue with agentic auth is that we can’t fully trust the workload. While our LLMs aren’t nefarious (at least not yet), we still need to be able to apply protections to ensure they don’t use data inappropriately or take actions they shouldn’t.

A few common methods exist to provide auth to agents, and each has downsides:

Standard API tokens are the most basic method of authentication, typically injected into applications via environment variables or in mounted secrets files. This is the arguably simplest method, but least secure. You have to trust that the sandbox won’t somehow be compromised or accidentally exfiltrate the token while making a request. Since you can’t fully trust the agent, you’ll need to set up token expiry and rotation, which can be a hassle.

Workload identity tokens, such as OIDC tokens, can solve some of this pain. Rather than granting the agent a token with general permissions, you can grant it a token that attests its identity. Now, rather than the agent having direct access to some service with a token, it can exchange an identity token for a very short-lived access token. The OIDC token can be invalidated after a specific agent’s workflow completes, and expiry is easier to manage. One of the biggest downsides of workload identity tokens is the potential inflexibility of integrations. Many services don’t have first-class support for OIDC, so in order to get working integrations with upstream services, platforms will need to roll their own token-exchanging services. This makes adoption difficult.

Custom proxies provide maximum flexibility, and can be paired with workload identity tokens. If you can pass some or all of your sandbox egress through a trusted piece of code, you can insert whatever rules you need. Maybe the upstream service your agent is communicating with has a bad RBAC story, and it can’t provide granular permissions. No problem, just write the controls and permissions yourself! This is a great option for agents that you need to lock down with granular controls. However, how do you intercept all of a sandbox’s traffic? How do you set up a proxy that is dynamic and easily programmable? How do you proxy traffic efficiently? These aren’t easy problems to solve.

With those imperfect methods in mind, what does an ideal auth mechanism look like?

Ideally, it is:

Zero trust. No token is ever granted to an untrusted user for any amount of time.
Simple. Easy to author. Doesn’t involve a complex system of minting, rotating, and decrypting tokens.
Flexible. We don’t rely on the upstream system to provide the granular access we need. We can apply whatever rules we want.
Identity-aware. We can identify the sandbox making the call and apply specific rules for it.
Observable. We can easily gather information about what calls are being made.
Performant. We aren’t round-tripping to a centralized or slow source of truth.
Transparent. The sandboxed workload doesn’t have to know about it. Things just work.
Dynamic. We can change rules on the fly.

We believe outbound Workers for Sandboxes fit the bill on all of these. Let’s see how.

Outbound Workers in practice

Basics: restriction and observability

First, we’ll look at a very basic example: logging requests and denying specific actions.

In this case, we’ll use the outbound function, which intercepts all outgoing HTTP requests from the sandbox. With a few lines of JavaScript, it’s easy to ensure only GETs are made and log then deny any disallowed methods.

class MySandboxedApp extends Sandbox {
  static outbound = (req, env, ctx) => {
    // Deny any non-GET action and log
    if (req.method !== 'GET') {
      console.log(`Container making ${req.method} request to: ${req.url}`);
      return new Response('Not Allowed', { status: 405, statusText: 'Method Not Allowed'});
    }

    // Proceed if it is a GET request
    return fetch(req);
  };
}

This proxy runs on Workers and runs on the same machine as the sandboxed VM. Workers were built for quick response times, often sitting in front of cached CDN traffic, so additional latency is extremely minimal.

Because this is running on Workers, we get observability out of the box. You can view logs and outbound requests in the Workers dashboard or export them to your application performance monitoring tool of choice.

Zero trust credential injection

How would we use this to enforce a zero trust environment for our agent? Let’s imagine we want to make a request to a private GitHub instance, but we never want our LLM to access a private token.

We can use outboundByHost to define functions for specific domains or IPs. In this case, we’ll inject a protected credential if the domain is “my-internal-vcs.dev”. The sandboxed agent never has access to these credentials.

class OpenCodeInABox extends Sandbox {
  static outboundByHost = {
    "my-internal-vcs.dev": (request, env, ctx) => {
      const headersWithAuth = new Headers(request.headers);
      headersWithAuth.set("x-auth-token", env.SECRET);
      return fetch(request, { headers: headersWithAuth });
    }
  }
}

It is also easy to conditionalize the response based on the identity of the container. You don’t have to inject the same tokens for every sandbox instance.

 static outboundByHost = {
  "my-internal-vcs.dev": (request, env, ctx) => {
    // note: KV is encrypted at rest and in transit
    const authKey = await env.KEYS.get(ctx.containerId);

    const requestWithAuth = new Request(request);
    requestWithAuth.headers.set("x-auth-token", authKey);
    return fetch(requestWithAuth);
  }
}

Using the Cloudflare Developer Platform

As you may have noticed in the last example, another major advantage of using outbound Workers is that it makes integration into the Workers ecosystem easier. Previously, if a user wanted to access R2, they would have to inject an R2 credential, then make a call from their container to the public R2 API. Same for KV, Agents, other Containers, other Worker services, etc.

Now, you just call any binding from your outbound Workers.

class MySandboxedApp extends Sandbox {
  static outboundByHost = {
    "my.kv": async (req, env, ctx) => {
      const key = keyFromReq(req);
      const myResult = await env.KV.get(key);
      return new Response(myResult);
    },
    "objects.cf": async (req, env, ctx) => {
      const prefix = ctx.containerId
      const path = pathFromRequest(req);
      const object = await env.R2.get(`${prefix}/${path}`);
      const myResult = await env.KV.get(key);
      return new Response(myResult);
    },
  };
}

Rather than parsing tokens and setting up policies, we can easily conditionalize access with code and whatever logic we want. In the R2 example, we also were able to use the sandbox’s ID to further scope access with ease.

Making controls dynamic

Networking control should also be dynamic. On many platforms, config for Container and VM networking is static, looking something like this:

{
  defaultEgress: "block",
  allowedDomains: ["github.com", "npmjs.org"]
}

This is better than nothing, but we can do better. For many sandboxes, we might want to apply a policy on start, but then override it with another once specific operations have been performed.

For instance, we can boot a sandbox, grab our dependencies via NPM and Github, and then lock down egress after that. This ensures that we open up the network for as little time as possible.

To achieve this, we can use outboundHandlers, which allows us to define arbitrary outbound handlers that can be applied programmatically using the setOutboundHandler method. Each of these also takes params, allowing you to customize behavior from code. In this case, we will allow some hostnames with the custom “allowHosts” policy, then turn off HTTP.

class MySandboxedApp extends Sandbox {
  static outboundHandlers = {
    async allowHosts(req, env, { params }) {
     const url = new URL(request.url);
     const allowedHostname = params.allowedHostnames.includes(url.hostname);

      if (allowedHostname) {
        return await fetch(newRequest);
      } else {
        return new Response(null, { status: 403, statusText: "Forbidden" });
      }
    }
    
    async noHttp(req) {
      return new Response(null, { status: 403, statusText: "Forbidden" });
    }
  }
}

async setUpSandboxes(req, env) {
  const sandbox = await env.SANDBOX.getByName(userId);
  await sandbox.setOutboundHandler("allowHosts", {
    allowedHostnames: ["github.com", "npmjs.org"]
  });
  await sandbox.gitClone(userRepoURL)
  await sandbox.exec("npm install")
  await sandbox.setOutboundHandler("noHttp");
}

This could be extended even further. Your agent might ask the end user a question like “Do you want to allow POST requests to cloudflare.com?” based on whatever tools it needs at that time. With dynamic outbound Workers, you can easily modify the sandbox rules on the fly to provide this level of control.

TLS support with MITM Proxying

To do anything useful with requests beyond allowing or denying them, you need to have access to the content. This means that if you’re making HTTPS requests, they need to be decrypted by the Workers proxy.

To achieve this, a unique ephemeral certificate authority (CA) and private key are created for each Sandbox instance, and the CA is placed into the sandbox. By default, sandbox instances will trust this CA, while standard container instances can opt into trusting it, for instance by calling sudo update-ca-certificates.

export class MyContainer extends Container {
  interceptHttps = true;
}

MyContainer.outbound = (req, env, ctx) => {
  // All HTTP(S) requests will trigger this hook.
  return fetch(req);
};

TLS traffic is proxied by a Cloudflare isolated network process by performing a TLS handshake. It creates a leaf CA from an ephemeral and unique private key and uses the SNI extracted in the ClientHello. It will then invoke in the same machine the configured Worker to handle the HTTPS request.

Our ephemeral private key and CA will never leave our container runtime sidecar process, and is never shared across other container sidecar processes.

With this in place, outbound Workers act as a truly transparent proxy. The sandbox doesn't need any awareness of specific protocols or domains — all HTTP and HTTPS traffic flows through the outbound handler for filtering or modification.

Under the hood

To enable the functionality shown above in both Container and Sandbox, we added new methods to the ctx.container object: interceptOutboundHttp and interceptOutboundHttps, which intercept outgoing requests on specific hostnames (with basic glob matching), IP ranges, and it can be used to intercept all outbound requests. These methods are called with a WorkerEntrypoint, which gets set up as the front door to the outbound Worker.

export class MyWorker extends WorkerEntrypoint {
 fetch() {
   return new Response(this.ctx.props.message);
 }
}

// ... inside your container DurableObject ...
this.ctx.container.start({ enableInternet: false });
const outboundWorker = this.ctx.exports.MyWorker({ props: { message: 'hello' } });
await this.ctx.container.interceptOutboundHttp('15.0.0.1:80', outboundWorker);

// From now on, all HTTP requests to 15.0.0.1:80 return "hello"
await this.waitForContainerToBeHealthy();

// You can decide to return another message now...
const secondOutboundWorker = this.ctx.exports.MyWorker({ props: { message: 'switcheroo' } });
await this.ctx.container.interceptOutboundHttp('15.0.0.1:80', secondOutboundWorker);
// all HTTP requests to 15.0.0.1 now show "switcheroo", even on connections that were
// open before this interceptOutboundHttp

// You can even set hostnames, CIDRs, for both IPv4 and IPv6
await this.ctx.container.interceptOutboundHttp('example.com', secondOutboundWorker);
await this.ctx.container.interceptOutboundHttp('*.example.com', secondOutboundWorker);
await this.ctx.container.interceptOutboundHttp('123.123.123.123/23', secoundOutboundWorker);

All proxying to Workers happens locally on the same machine that runs the sandbox VM. Even though communication between container and Worker is “authless”, it is secure.

These methods can be called at any time, before or after starting the container, even while connections are still open. Connections that send multiple HTTP requests will automatically pick up a new entrypoint, so updating outbound Workers will not break existing TCP connections or interrupt HTTP requests.

Local development with wrangler dev also has support for egress interception. To make it possible, we automatically spawn a sidecar process inside the local container’s network namespace. We called this sidecar component proxy-everything. Once proxy-everything is attached, it applies the appropriate TPROXY nftable rules, routing matching traffic from the local Container to workerd, Cloudflare’s open source JavaScript runtime, which runs the outbound Worker. This allows the local development experience to mirror what happens in prod, so testing and development remain simple.

Giving outbound Workers a try

If you haven’t tried Cloudflare Sandboxes, check out the Getting Started guide. If you are a current user of Containers or Sandboxes, start using outbound Workers now by reading the documentation and upgrading to @cloudflare/containers@0.3.0 or @cloudflare/sandbox@0.8.9.

Watch on Cloudflare TV

Introducing Moltworker: a self-hosted personal AI agent, minus the minis

Celso Martinho — Thu, 29 Jan 2026 14:00:00 GMT

Editorial note: As of January 30, 2026, Moltbot has been renamed to OpenClaw.

The Internet woke up this week to a flood of people buying Mac minis to run Moltbot (formerly Clawdbot), an open-source, self-hosted AI agent designed to act as a personal assistant. Moltbot runs in the background on a user's own hardware, has a sizable and growing list of integrations for chat applications, AI models, and other popular tools, and can be controlled remotely. Moltbot can help you with your finances, social media, organize your day — all through your favorite messaging app.

But what if you don’t want to buy new dedicated hardware? And what if you could still run your Moltbot efficiently and securely online? Meet Moltworker, a middleware Worker and adapted scripts that allows running Moltbot on Cloudflare's Sandbox SDK and our Developer Platform APIs.

A personal assistant on Cloudflare — how does that work?

Cloudflare Workers has never been as compatible with Node.js as it is now. Where in the past we had to mock APIs to get some packages running, now those APIs are supported natively by the Workers Runtime.

This has changed how we can build tools on Cloudflare Workers. When we first implemented Playwright, a popular framework for web testing and automation that runs on Browser Rendering, we had to rely on memfs. This was bad because not only is memfs a hack and an external dependency, but it also forced us to drift away from the official Playwright codebase. Thankfully, with more Node.js compatibility, we were able to start using node:fs natively, reducing complexity and maintainability, which makes upgrades to the latest versions of Playwright easy to do.

The list of Node.js APIs we support natively keeps growing. The blog post “A year of improving Node.js compatibility in Cloudflare Workers” provides an overview of where we are and what we’re doing.

We measure this progress, too. We recently ran an experiment where we took the 1,000 most popular NPM packages, installed and let AI loose, to try to run them in Cloudflare Workers, Ralph Wiggum as a "software engineer" style, and the results were surprisingly good. Excluding the packages that are build tools, CLI tools or browser-only and don’t apply, only 15 packages genuinely didn’t work. That's 1.5%.

Here’s a graphic of our Node.js API support over time:

We put together a page with the results of our internal experiment on npm packages support here, so you can check for yourself.

Moltbot doesn’t necessarily require a lot of Workers Node.js compatibility because most of the code runs in a container anyway, but we thought it would be important to highlight how far we got supporting so many packages using native APIs. This is because when starting a new AI agent application from scratch, we can actually run a lot of the logic in Workers, closer to the user.

The other important part of the story is that the list of products and APIs on our Developer Platform has grown to the point where anyone can build and run any kind of application — even the most complex and demanding ones — on Cloudflare. And once launched, every application running on our Developer Platform immediately benefits from our secure and scalable global network.

Those products and services gave us the ingredients we needed to get started. First, we now have Sandboxes, where you can run untrusted code securely in isolated environments, providing a place to run the service. Next, we now have Browser Rendering, where you can programmatically control and interact with headless browser instances. And finally, R2, where you can store objects persistently. With those building blocks available, we could begin work on adapting Moltbot.

How we adapted Moltbot to run on us

Moltbot on Workers, or Moltworker, is a combination of an entrypoint Worker that acts as an API router and a proxy between our APIs and the isolated environment, both protected by Cloudflare Access. It also provides an administration UI and connects to the Sandbox container where the standard Moltbot Gateway runtime and its integrations are running, using R2 for persistent storage.

^{High-level architecture diagram of Moltworker.}

Let's dive in more.

AI Gateway

Cloudflare AI Gateway acts as a proxy between your AI applications and any popular AI provider, and gives our customers centralized visibility and control over the requests going through.

Recently we announced support for Bring Your Own Key (BYOK), where instead of passing your provider secrets in plain text with every request, we centrally manage the secrets for you and can use them with your gateway configuration.

An even better option where you don’t have to manage AI providers' secrets at all end-to-end is to use Unified Billing. In this case you top up your account with credits and use AI Gateway with any of the supported providers directly, Cloudflare gets charged, and we will deduct credits from your account.

To make Moltbot use AI Gateway, first we create a new gateway instance, then we enable the Anthropic provider for it, then we either add our Claude key or purchase credits to use Unified Billing, and then all we need to do is set the ANTHROPIC_BASE_URL environment variable so Moltbot uses the AI Gateway endpoint. That’s it, no code changes necessary.

Once Moltbot starts using AI Gateway, you’ll have full visibility on costs and have access to logs and analytics that will help you understand how your AI agent is using the AI providers.

Note that Anthropic is one option; Moltbot supports other AI providers and so does AI Gateway. The advantage of using AI Gateway is that if a better model comes along from any provider, you don’t have to swap keys in your AI Agent configuration and redeploy — you can simply switch the model in your gateway configuration. And more, you specify model or provider fallbacks to handle request failures and ensure reliability.

Sandboxes

Last year we anticipated the growing need for AI agents to run untrusted code securely in isolated environments, and we announced the Sandbox SDK. This SDK is built on top of Cloudflare Containers, but it provides a simple API for executing commands, managing files, running background processes, and exposing services — all from your Workers applications.

In short, instead of having to deal with the lower-level Container APIs, the Sandbox SDK gives you developer-friendly APIs for secure code execution and handles the complexity of container lifecycle, networking, file systems, and process management — letting you focus on building your application logic with just a few lines of TypeScript. Here’s an example:

import { getSandbox } from '@cloudflare/sandbox';
export { Sandbox } from '@cloudflare/sandbox';

export default {
  async fetch(request: Request, env: Env): Promise {
    const sandbox = getSandbox(env.Sandbox, 'user-123');

    // Create a project structure
    await sandbox.mkdir('/workspace/project/src', { recursive: true });

    // Check node version
    const version = await sandbox.exec('node -v');

    // Run some python code
    const ctx = await sandbox.createCodeContext({ language: 'python' });
    await sandbox.runCode('import math; radius = 5', { context: ctx });
    const result = await sandbox.runCode('math.pi * radius ** 2', { context: ctx });

    return Response.json({ version, result });
  }
};

This fits like a glove for Moltbot. Instead of running Docker in your local Mac mini, we run Docker on Containers, use the Sandbox SDK to issue commands into the isolated environment and use callbacks to our entrypoint Worker, effectively establishing a two-way communication channel between the two systems.

R2 for persistent storage

The good thing about running things in your local computer or VPS is you get persistent storage for free. Containers, however, are inherently ephemeral, meaning data generated within them is lost upon deletion. Fear not, though — the Sandbox SDK provides the sandbox.mountBucket() that you can use to automatically, well, mount your R2 bucket as a filesystem partition when the container starts.

Once we have a local directory that is guaranteed to survive the container lifecycle, we can use that for Moltbot to store session memory files, conversations and other assets that are required to persist.

Browser Rendering for browser automation

AI agents rely heavily on browsing the sometimes not-so-structured web. Moltbot utilizes dedicated Chromium instances to perform actions, navigate the web, fill out forms, take snapshots, and handle tasks that require a web browser. Sure, we can run Chromium on Sandboxes too, but what if we could simplify and use an API instead?

With Cloudflare’s Browser Rendering, you can programmatically control and interact with headless browser instances running at scale in our edge network. We support Puppeteer, Stagehand, Playwright and other popular packages so that developers can onboard with minimal code changes. We even support MCP for AI.

In order to get Browser Rendering to work with Moltbot we do two things:

First we create a thin CDP proxy (CDP is the protocol that allows instrumenting Chromium-based browsers) from the Sandbox container to the Moltbot Worker, back to Browser Rendering using the Puppeteer APIs.
Then we inject a Browser Rendering skill into the runtime when the Sandbox starts.

From the Moltbot runtime perspective, it has a local CDP port it can connect to and perform browser tasks.

Zero Trust Access for authentication policies

Next up we want to protect our APIs and Admin UI from unauthorized access. Doing authentication from scratch is hard, and is typically the kind of wheel you don’t want to reinvent or have to deal with. Zero Trust Access makes it incredibly easy to protect your application by defining specific policies and login methods for the endpoints.

^{Zero Trust Access Login methods configuration for the Moltworker application.}

Once the endpoints are protected, Cloudflare will handle authentication for you and automatically include a JWT token with every request to your origin endpoints. You can then validate that JWT for extra protection, to ensure that the request came from Access and not a malicious third party.

Like with AI Gateway, once all your APIs are behind Access you get great observability on who the users are and what they are doing with your Moltbot instance.

Moltworker in action

Demo time. We’ve put up a Slack instance where we could play with our own instance of Moltbot on Workers. Here are some of the fun things we’ve done with it.

We hate bad news.

Here’s a chat session where we ask Moltbot to find the shortest route between Cloudflare in London and Cloudflare in Lisbon using Google Maps and take a screenshot in a Slack channel. It goes through a sequence of steps using Browser Rendering to navigate Google Maps and does a pretty good job at it. Also look at Moltbot’s memory in action when we ask him the second time.

We’re in the mood for some Asian food today, let’s get Moltbot to work for help.

We eat with our eyes too.

Let’s get more creative and ask Moltbot to create a video where it browses our developer documentation. As you can see, it downloads and runs ffmpeg to generate the video out of the frames it captured in the browser.

Run your own Moltworker

We open-sourced our implementation and made it available at https://github.com/cloudflare/moltworker, so you can deploy and run your own Moltbot on top of Workers today.

The README guides you through the necessary setup steps. You will need a Cloudflare account and a Workers Paid plan to access Sandbox Containers; however, all other products are either entirely free (like AI Gateway) or include generous free tiers that allow you to get started and scale under reasonable limits.

Note that Moltworker is a proof of concept, not a Cloudflare product. Our goal is to showcase some of the most exciting features of our Developer Platform that can be used to run AI agents and unsupervised code efficiently and securely, and get great observability while taking advantage of our global network.

Feel free to contribute to or fork our GitHub repository; we will keep an eye on it for a while for support. We are also considering contributing upstream to the official project with Cloudflare skills in parallel.

Conclusion

We hope you enjoyed this experiment, and we were able to convince you that Cloudflare is the perfect place to run your AI applications and agents. We’ve been working relentlessly trying to anticipate the future and release features like the Agents SDK that you can use to build your first agent in minutes, Sandboxes where you can run arbitrary code in an isolated environment without the complications of the lifecycle of a container, and AI Search, Cloudflare’s managed vector-based search service, to name a few.

Cloudflare now offers a complete toolkit for AI development: inference, storage APIs, databases, durable execution for stateful workflows, and built-in AI capabilities. Together, these building blocks make it possible to build and run even the most demanding AI applications on our global edge network.

If you're excited about AI and want to help us build the next generation of products and APIs, we're hiring.

Deploy your own AI vibe coding platform — in one click!

Ashish Kumar Singh — Tue, 23 Sep 2025 14:00:00 GMT

It’s an exciting time to build applications. With the recent AI-powered "vibe coding" boom, anyone can build a website or application by simply describing what they want in a few sentences. We’re already seeing organizations expose this functionality to both their users and internal employees, empowering anyone to build out what they need.

Today, we’re excited to open-source an AI vibe coding platform, VibeSDK, to enable anyone to run an entire vibe coding platform themselves, end-to-end, with just one click.

Want to see it for yourself? Check out our demo platform that you can use to create and deploy applications. Or better yet, click the button below to deploy your own AI-powered platform, and dive into the repo to learn about how it’s built.

Deploying VibeSDK sets up everything you need to run your own AI-powered development platform:

Integration with LLM models to generate code, build applications, debug errors, and iterate in real-time, powered by Agents SDK.
Isolated development environments that allow users to safely build and preview their applications in secure sandboxes.
Infinite scale that allows you to deploy thousands or even millions of applications that end users deploy, all served on Cloudflare’s global network
Observability and caching across multiple AI providers, giving you insight into costs and performance with built-in caching for popular responses.
Project templates that the LLM can use as a starting point to build common applications and speed up development.
One-click project export to the user’s Cloudflare account or GitHub repo, so users can take their code and continue development on their own.

Building an AI vibe coding platform from start to finish

Step 0: Get started immediately with VibeSDK

We’re seeing companies build their own AI vibe coding platforms to enable both internal and external users. With a vibe coding platform, internal teams like marketing, product, and support can build their own landing pages, prototypes, or internal tools without having to rely on the engineering team. Similarly, SaaS companies can embed this capability into their product to allow users to build their own customizations.

Every platform has unique requirements and specializations. By building your own, you can write custom logic to prompt LLMs for your specific needs, giving your users more relevant results. This also grants you complete control over the development environment and application hosting, giving you a secure platform that keeps your data private and within your control.

We wanted to make it easy for anyone to build this themselves, which is why we built a complete platform that comes with project templates, previews, and project deployment. Developers can repurpose the whole platform, or simply take the components they need and customize them to fit their needs.

Step 1: Finding a safe, isolated environment for running untrusted, AI generated code

AI can now build entire applications, but there's a catch: you need somewhere safe to run this untrusted, AI-generated code. Imagine if an LLM writes an application that needs to install packages, run build commands, and start a development server — you can't just run this directly on your infrastructure where it might affect other users or systems.

With Cloudflare Sandboxes, you don't have to worry about this. Every user gets their own isolated environment where the AI-generated code can do anything a normal development environment can do: install npm packages, run builds, start servers, but it's fully contained in a secure, container-based environment that can't affect anything outside its sandbox.

The platform assigns each user to their own sandbox based on their session, so that if a user comes back, they can continue to access the same container with their files intact:

// Creating a sandbox client for a user session
const sandbox = getSandbox(env.Sandbox, sandboxId);

// Now AI can safely write and execute code in this isolated environment
await sandbox.writeFile('app.js', aiGeneratedCode);
await sandbox.exec('npm install express');
await sandbox.exec('node app.js');

Step 2: Generating the code

Once the sandbox is created, you have a development environment that can bring the code to life. VibeSDK orchestrates the whole workflow from writing the code, installing the necessary packages, and starting the development server. If you ask it to build a to-do app, it will generate the React application, write the component files, run bun install to get the dependencies, and start the server, so you can see the end result.

Once the user submits their request, the AI will generate all the necessary files, whether it's a React app, Node.js API, or full-stack application, and write them directly to the sandbox:

async function generateAndWriteCode(instanceId: string) {
    // AI generates the application structure
    const aiGeneratedFiles = await callAIModel("Create a React todo app");
    
    // Write all generated files to the sandbox
    for (const file of aiGeneratedFiles) {
        await sandbox.writeFile(
            `${instanceId}/${file.path}`,
            file.content
        );
        // User sees: "✓ Created src/App.tsx"
        notifyUser(`✓ Created ${file.path}`);
    }
}

To speed this up even more, we’ve provided a set of templates, stored in an R2 bucket, that the platform can use and quickly customize, instead of generating every file from scratch. This is just an initial set, but you can expand it and add more examples.

Step 3: Getting a preview of your deployment

Once everything is ready, the platform starts the development server and uses the Sandbox SDK to expose it to the internet with a public preview URL which allows users to instantly see their AI-generated application running live:

// Start the development server in the sandbox
const processId = await sandbox.startProcess(
    `bun run dev`, 
    { cwd: instanceId }
);

// Create a public preview URL 
const preview = await sandbox.exposePort(3000, { 
    hostname: 'preview.example.com' 
});

// User instantly gets: "https://my-app-xyz.preview.example.com"
notifyUser(`✓ Preview ready at: ${preview.url}`);

Step 4: Test, log, fix, repeat

But that’s not all! Throughout this process, the platform will capture console output, build logs, and error messages and feed them back to the LLM for automatic fixes. As the platform makes any updates or fixes, the user can see it all happening live — the file editing, installation progress, and error resolution.

Deploying applications: From Sandbox to Region Earth

Once the application is developed, it needs to be deployed. The platform packages everything in the sandbox and then uses a separate specialized "deployment sandbox" to deploy the application to Cloudflare Workers. This deployment sandbox runs wrangler deploy inside the secure environment to publish the application to Cloudflare's global network.

Since the platform may deploy up to thousands or millions of applications, Workers for Platforms is used to deploy the Workers at scale. Although all the Workers are deployed to the same Namespace, they are all isolated from one another by default, ensuring there’s no cross-tenant access. Once deployed, each application receives its own isolated Worker instance with a unique public URL like my-app.vibe-build.example.com.

async function deployToWorkersForPlatforms(instanceId: string) {
    // 1. Package the app from development sandbox
    const devSandbox = getSandbox(env.Sandbox, instanceId);
    const packagedApp = await devSandbox.exec('zip -r app.zip .');
    
    // 2. Transfer to specialized deployment sandbox
    const deploymentSandbox = getSandbox(env.DeployerServiceObject, 'deployer');
    await deploymentSandbox.writeFile('app.zip', packagedApp);
    await deploymentSandbox.exec('unzip app.zip');
    
    // 3. Deploy using Workers for Platforms dispatch namespace
    const deployResult = await deploymentSandbox.exec(`
        bunx wrangler deploy \\\\
        --dispatch-namespace vibe-sdk-build-default-namespace
    `);
    
    // Each app gets its own isolated Worker and unique URL
    // e.g., https://my-app.example.com
    return `https://${instanceId}.example.com`;
}

Exportable Applications

The platform also allows users to export their application to their own Cloudflare account and GitHub repo, so they can continue the development on their own.

Observability, caching, and multi-model support built in!

It's no secret that LLM models have their specialties, which means that when building an AI-powered platform, you may end up using a few different models for different operations. By default, VibeSDK leverages Google’s Gemini models (gemini-2.5-pro, gemini-2.5-flash-lite, gemini-2.5-flash) for project planning, code generation, and debugging.

VibeSDK is automatically set up with AI Gateway, so that by default, the platform is able to:

Use a unified access point to route requests across LLM providers, allowing you to use models from a range of providers (OpenAI, Anthropic, Google, and others)
Cache popular responses, so when someone asks to "build a to-do list app", the gateway can serve a cached response instead of going to the provider (saving inference costs)
Get observability into the requests, tokens used, and response times across all providers in one place
Track costs across models and integrations

Open sourced, so you can build your own Platform!

We're open-sourcing VibeSDK for the same reason Cloudflare open-sourced the Workers runtime — we believe the best development happens in the open. That's why we wanted to make it as easy as possible for anyone to build their own AI coding platform, whether it's for internal company use, for your website builder, or for the next big vibe coding platform. We tied all the pieces together for you, so you can get started with the click of a button instead of spending months figuring out how to connect everything yourself. To learn more, check out our reference architecture for vibe coding platforms.

Partnering with OpenAI to bring their new open models onto Cloudflare Workers AI

Michelle Chen — Tue, 05 Aug 2025 21:05:00 GMT

OpenAI has just announced their latest open-weight models — and we are excited to share that we are working with them as a Day 0 launch partner to make these models available in Cloudflare's Workers AI. Cloudflare developers can now access OpenAI's first open model, leveraging these powerful new capabilities on our platform. The new models are available starting today at @cf/openai/gpt-oss-120b and @cf/openai/gpt-oss-20b.

Workers AI has always been a champion for open models and we’re thrilled to bring OpenAI's new open models to our platform today. Developers who want transparency, customizability, and deployment flexibility can rely on Workers AI as a place to deliver AI services. Enterprises that need the ability to run open models to ensure complete data security and privacy can also deploy with Workers AI. We are excited to join OpenAI in fulfilling their mission of making the benefits of AI broadly accessible to builders of any size.

The technical model specs

The OpenAI models have been released in two sizes: a 120 billion parameter model and a 20 billion parameter model. Both of them are Mixture-of-Experts models – a popular architecture for recent model releases – that allow relevant experts to be called for a query instead of running through all the parameters of the model. Interestingly, these models run natively at an FP4 quantization, which means that they have a smaller GPU memory footprint than a 120 billion parameter model at FP16. Given the quantization and the MoE architecture, the new models are able to run faster and more efficiently than more traditional dense models of that size.

These models are text-only; however, they have reasoning capabilities, tool calling, and two new exciting features with Code Interpreter and Web Search (support coming soon). We’ve implemented Code Interpreter on top of Cloudflare Containers in a novel way that allows for stateful code execution (read on below).

The model on Workers AI

We’re landing these new models with a few tweaks: supporting the new Responses API format as well as the historical Chat Completions API format (coming soon). The Responses API format is recommended by OpenAI to interact with their models, and we’re excited to support that on Workers AI.

If you call the model through:

Workers Binding, it will accept/return Responses API – env.AI.run(“@cf/openai/gpt-oss-120b”)
REST API on /run endpoint, it will accept/return Responses API – https://api.cloudflare.com/client/v4/accounts//ai/run/@cf/openai/gpt-oss-120b
REST API on new /responses endpoint, it will accept/return Responses API – https://api.cloudflare.com/client/v4/accounts//ai/v1/responses
REST API for OpenAI Compatible endpoint, it will return Chat Completions (coming soon)– https://api.cloudflare.com/client/v4/accounts//ai/v1/chat/completions

curl https://api.cloudflare.com/client/v4/accounts//ai/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $CLOUDFLARE_API_KEY" \
  -d '{
    "model": "@cf/openai/gpt-oss-120b",
    "reasoning": {"effort": "medium"},
    "input": [
      {
        "role": "user",
        "content": "What are the benefits of open-source models?"
      }
    ]
  }'

Code Interpreter + Cloudflare Sandboxes = the perfect fit

To effectively answer user queries, Large Language Models (LLMs) often struggle with logical tasks such as mathematics or coding. Instead of attempting to reason through these problems, LLMs typically utilize a tool call to execute AI-generated code that solves these problems. OpenAI's new models are specifically trained for stateful Python code execution and include a built-in feature called Code Interpreter, designed to address this challenge.

We’re particularly excited about this. Cloudflare not only has an inference platform (Workers AI), but we also have an ecosystem of compute and storage products that allow people to build full applications on top of our Developer Platform. This means that we are uniquely suited to support the model’s Code Interpreter capabilities, not only for one-time code execution, but for stateful code execution as well.

We’ve built support for Code Interpreter on top of Cloudflare’s Sandbox product that allows for a secure environment to run AI-generated code. The Sandbox SDK is built on our latest Containers product and Code Interpreter is the perfect use case to bring all these products together. When you use Code Interpreter, we spin up a Sandbox container scoped to your session that stays alive for 20 minutes, so the code can be edited for subsequent queries to the model. We’ve also pre-warmed Sandboxes for Code Interpreter to ensure the fastest start up times.

We’ll be publishing an example of how you can use the gpt-oss model on Workers AI and Sandboxes with the OpenAI SDK to make calls to Code Interpreter on our Developer Docs.

Give it a try!

We’re beyond excited for OpenAI’s new open models, and we hope you are too. Super grateful to our friends from vLLM and HuggingFace for supporting efficient model serving on launch day. Read up on the Developer Docs to learn more about the details and how to get started on building with these new models and capabilities.

Containers are available in public beta for simple, global, and programmable compute

Gabi Villalonga Simón — Tue, 24 Jun 2025 16:00:22 GMT

We’re excited to announce that Cloudflare Containers are now available in beta for all users on paid plans.

You can now run new kinds of applications alongside your Workers. From media and data processing at the edge, to backend services in any language, to CLI tools in batch workloads — Containers open up a world of possibilities.

Containers are tightly integrated with Workers and the rest of the developer platform, which means that:

Your workflow stays simple: just define a Container in a few lines of code, and run wrangler deploy, just like you would with a Worker.
Containers are global: as with Workers, you just deploy to Region:Earth. No need to manage configs across 5 different regions for a global app.
You can use the right tool for the job: routing requests between Workers and Containers is easy. Use a Worker when you need to be ultra light-weight and scalable. Use a Container when you need more power and flexibility.
Containers are programmable: container instances are spun up on-demand and controlled by Workers code. If you need custom logic, just write some JavaScript instead of spending time chaining together API calls or writing Kubernetes operators.

Want to try it today? Deploy your first Container-enabled Worker:

A tour of Containers

Let’s take a deeper look at Containers, using an example use case: code sandboxing.

Let’s imagine that you want to run user-generated (or AI-generated) code as part of a platform you’re building. To do this, you want to spin up containers on demand. Each user needs their own isolated container, the users are distributed globally, and you need to start each container quickly so the users aren’t waiting.

You can set this up easily on Cloudflare Containers.

Configuring a Container

In your Worker, use the Container class and wrangler.jsonc to declare some basic configuration, such as your Container’s default port, a sleep timeout, and which image to use, then route to it via the Worker.

For each unique ID passed to the Container’s binding, Cloudflare will spin up a new Container instance and route requests to it. When a new instance is requested, Cloudflare picks the best location across our global network where we’ve pre-provisioned a ready-to-go container. This means that you can deploy a container close to an end user no matter where they are. And the initial container start takes just a few seconds. You don’t have to worry about routing, provisioning, or scaling.

This example Worker will route requests to a unique container instance for each sandbox ID given at the path /sandbox/ID and will be handled by standard Worker JavaScript otherwise:

export class MyContainer extends Container {
  defaultPort = 8080; // The default port for the container to listen on
  sleepAfter = '5m'; // Sleep the container if no requests are made in this timeframe
}

export default {
  async fetch(request, env) {
    const pathname = new URL(request.url).pathname;

    // handle request with an on-demand container instance
    if (pathname.startsWith('/sandbox/')) {
      const sessionId = pathname.split("/")[2]
      const containerInstance = getContainer(env.CONTAINER_SANDBOX, sessionId)
      return await containerInstance.fetch(request);
    }

    // handle request with my Worker code otherwise
    return myWorkerRequestHandler(request);
  },
};

Familiar and easy development workflow with `wrangler dev`

To configure which container image to use, you can provide an image URL in wrangler config or a path to a local Dockerfile.

This config tells wrangler to use a locally defined image:

"containers": [
  {
    "class_name": "ContainerSandbox",
    "image": "./Dockerfile",
    "max_instances": 80,
    "instance_type": "basic"
  }
]

When developing your application, you just run wrangler dev and the container image will be automatically built and routable via your local Worker. This makes it easy to iterate on container code while making changes to your Worker at the same time. When you want to rebuild your image, just press “R” on your keyboard from your terminal running wrangler dev, and the Container is rebuilt and restarted.

Shipping your Container-enabled Worker to production with `wrangler deploy`

When it’s time to deploy, just run wrangler deploy. Wrangler will push your image to your account, then it will be provisioned in various locations across Cloudflare’s global network.

You don’t have to worry about “artifact management”, or distribution, or auth, or jump through hoops to integrate your container with Workers. You just write your code and deploy it.

Observability is built-in

Once your Container is in production, you have the visibility you need into how things are going, with end-to-end observability.

From the Cloudflare dashboard, you can easily track status and resource usage across Container instances with built-in metrics:

And if you need to dive deeper, you can dig into logs, which will be retained in the Cloudflare UI for seven days or pushed to an external sink of your choice:

Try it yourself

Want to give it a shot? Check out this example Worker for running sandboxed code in a Container, and deploy it with one click.

Or better yet, if you have an image sitting around that you’ve been dying to deploy to Cloudflare, you can get started with our docs here.

A world of possibilities

We’re excited about all the new types of applications that are now possible to build on Workers. We’ve heard many of you tell us over the years that you would love to run your entire application on Cloudflare, if only you could deploy this one piece that needs to run in a container.

Today, you can run libraries that you couldn't run in Workers before. For instance, try this Worker that uses FFmpeg to convert video to a GIF.

Or you can run a container as part of a cron job. Or deploy a static frontend with a containerized backend. Or even run a Cloudflare Agent that uses a Container to run Claude Code on your behalf.

The integration with the rest of the Developer Platform makes Containers even more powerful: use Durable Objects for state management, Workflows, Queues, and Agents to compose complex behaviors, R2 to store Container data or media, and more.

Pricing and packaging

As with the rest of our Cloudflare developer products, we wanted to apply the same principles to our developer platform with transparent pricing that scales up and down with your usage.

Today, you can select from the following instances at launch (and yes, we plan to add larger instances over time):

Name	Memory	CPU	Disk
dev	256 MiB	1/16 vCPU	2 GB
basic	1 GiB	1/4 vCPU	4 GB
standard	4 GiB	1/2 vCPU	4 GB

You only pay for what you use — charges start when a request is sent to the container or when it is manually started. Charges stop after the container instance goes to sleep, which can happen automatically after a timeout. This makes it easy to scale to zero, and allows you to get high utilization even with bursty traffic.

Containers are billed for every 10ms that they are actively running at the following rates, with monthly amounts included in Workers Standard:

Memory: $0.0000025 per GiB-second, with 25 GiB-hours included
CPU: $0.000020 per vCPU-second, with 375 vCPU-minutes included
Disk $0.00000007 per GB-second, with 200 GB-hours included

Egress from Containers is priced at the following rates, with monthly amounts included in Workers Standard:

North America and Europe: $0.025 per GB with 1 TB included
Australia, New Zealand, Taiwan, and Korea: $0.050 per GB with 500 GB included
Everywhere else: $0.040 per GB with 500 GB included

See our previous blog post for a more in-depth look into pricing with an example app.

What’s coming next?

With today’s release, we’ve only just begun to scratch the surface of what Containers will do on Workers. This is the first step of many towards our vision of a simple, global, and highly programmable Container platform.

We’re already thinking about what’s next, and wanted to give you a preview:

Higher limits and larger instances – We currently limit your concurrent instances to 40 total GiB of memory and 40 total vCPU. This is enough for some workloads, but many users will want to go higher — a lot higher. Select customers are already scaling well into the thousands of concurrent containers, and we want to bring this ability to more users. We will be raising our limits over the coming months to allow for more total containers and larger instance sizes.
Global autoscaling and latency-aware routing – Currently, containers are addressed by ID and started on-demand. For many use cases, users want to route to one of many stateless container instances deployed across the globe, then autoscale live instances automatically. Autoscaling will be activated with a single line of code, and will enable easy routing to the nearest ready instance.

class MyBackend extends Container {
  defaultPort = 8080;
  autoscale = true; // global autoscaling on - new instances spin up when memory or CPU utilization is high
}

// routes requests to the nearest ready container and load balance globally
async fetch(request, env) {
  return getContainer(env.MY_BACKEND).fetch(request);
}

More ways to communicate between Containers and Workers – We will be adding more ways for your Worker to communicate with your container and vice versa. We will add an exec command to run shell commands in your instance and handlers for HTTP requests from the container to Workers. This will allow you to more easily extend your containers with functionality from the entire developer platform, reach out to other containers, and programmatically set up each container instance.

class MyContainer extends Container {
  // sets up container-to-worker communication with handlers
  handlers = {
    "example.cf": "handleRequestFromContainer"
  };

  handleRequestFromContainer(req) {
    return new Response("You are responding from Workers to a Container request to a specific hostname")
  }

  // use exec to run commands in your container instance
  async cloneRepo(repoUrl) {
    let command = this.exec(`git clone ${repoUrl}`)
    await command.print()
  }  
}

Further integrations with the Developer Platform – We will continue to integrate with the developer platform with first-party APIs for our various services. We want it to be dead simple to mount R2 buckets, reach Hyperdrive, access KV, and more.

And we are just getting started. Stay tuned for more updates this summer and over the course of the entire year.

Try Containers today

The first step is to deploy your own container. Run npm create cloudflare@latest -- --template=cloudflare/templates/containers-template or click the button below to deploy your first Container to Workers.

We’re excited to see all the ways you will use Containers. From new languages and tools, to simplified Cloudflare-only architectures, to advanced programmatic control over container creation, you now have the ability to do even more on the Developer Platform. It is just a wrangler deploy away.

Simple, scalable, and global: Containers are coming to Cloudflare Workers in June 2025

Mike Nomitch — Fri, 11 Apr 2025 14:00:00 GMT

It is almost the end of Developer Week and we haven’t talked about containers: until now. As some of you may know, we’ve been working on a container platform behind the scenes for some time.

In late June, we plan to release Containers in open beta, and today we’ll give you a sneak peek at what makes it unique.

Workers are the simplest way to ship software around the world with little overhead. But sometimes you need to do more. You might want to:

Run user-generated code in any language
Execute a CLI tool that needs a full Linux environment
Use several gigabytes of memory or multiple CPU cores
Port an existing application from AWS, GCP, or Azure without a major rewrite

Cloudflare Containers let you do all of that while being simple, scalable, and global.

Through a deep integration with Workers and an architecture built on Durable Objects, Workers can be your:

API Gateway: Letting you control routing, authentication, caching, and rate-limiting before requests reach a container
Service Mesh: Creating private connections between containers with a programmable routing layer
Orchestrator: Allowing you to write custom scheduling, scaling, and health checking logic for your containers

Instead of having to deploy new services, write custom Kubernetes operators, or wade through control plane configuration to extend the platform, you just write code.

Let’s see what it looks like.

Deploying different application types

A stateful workload: executing AI-generated code

First, let’s take a look at a stateful example.

Imagine you are building a platform where end-users can run code generated by an LLM. This code is untrusted, so each user needs their own secure sandbox. Additionally, you want users to be able to run multiple requests in sequence, potentially writing to local files or saving in-memory state.

To do this, you need to create a container on-demand for each user session, then route subsequent requests to that container. Here’s how you can accomplish this:

First, you write some basic Wrangler config, then you route requests to containers via your Worker:

import { Container } from "cloudflare:workers";

export default {
  async fetch(request, env) {
    const url = new URL(request.url);

    if (url.pathname.startsWith("/execute-code")) {
      const { sessionId, messages } = await request.json();
      // pass in prompt to get the code from Llama 4
      const codeToExecute = await env.AI.run("@cf/meta/llama-4-scout-17b-16e-instruct", { messages });

      // get a different container for each user session
      const id = env.CODE_EXECUTOR.idFromName(sessionId);
      const sandbox = env.CODE_EXECUTOR.get(id);

      // execute a request on the container
      return sandbox.fetch("/execute-code", { method: "POST", body: codeToExecute });
    }

    // ... rest of Worker ...
  },
};

// define your container using the Container class from cloudflare:workers
export class CodeExecutor extends Container {
  defaultPort = 8080;
  sleepAfter = "1m";
}

Then, deploy your code with a single command: wrangler deploy. This builds your container image, pushes it to Cloudflare’s registry, readies containers to boot quickly across the globe, and deploys your Worker.

$ wrangler deploy

That’s it.

How does it work?

Your Worker creates and starts up containers on-demand. Each time you call env.CODE_EXECUTOR.get(id) with a unique ID, it sends requests to a unique container instance. The container will automatically boot on the first fetch, then put itself to sleep after a configurable timeout, in this case 1 minute. You only pay for the time that the container is actively running.

When you request a new container, we boot one in a Cloudflare location near the incoming request. This means that low-latency workloads are well-served no matter the region. Cloudflare takes care of all the pre-warming and caching so you don’t have to think about it.

This allows each user to run code in their own secure environment.

Stateless and global: FFmpeg everywhere

Stateless and autoscaling applications work equally well on Cloudflare Containers.

Imagine you want to run a container that takes a video file and turns it into an animated GIF using FFmpeg. Unlike the previous example, any container can serve any request, but you still don’t want to send bytes across an ocean and back unnecessarily. So, ideally the app can be deployed everywhere.

To do this, you declare a container in Wrangler config and turn on autoscaling. This specific configuration ensures that one instance is always running and if CPU usage increases beyond 75% of capacity, additional instances are added:

"containers": [
  {
    "class_name": "GifMaker",
    "image": "./Dockerfile", // container source code can be alongside Worker code
    "instance_type": "basic",
    "autoscaling": {
      "minimum_instances": 1,
      "cpu_target": 75,
    }
  }
],
// ...rest of wrangler.jsonc...

To route requests, you just call env.GIF_MAKER.fetch and requests are automatically sent to the closest container:

import { Container } from "cloudflare:workers";

export class GifMaker extends Container {
  defaultPort: 1337,
}

export default {
  async fetch(request, env) {
    const url = new URL(request.url);

    if (url.pathname === "/make-gif") {
      return env.GIF_MAKER.fetch(request)
    }

    // ... rest of Worker ...
  },
};

Going beyond the basics

From the examples above, you can see that getting a basic container service running on Workers just takes a few lines of config and a little Workers code. There’s no need to worry about capacity, artifact registries, regions, or scaling.

For more advanced use, we’ve designed Cloudflare Containers to run on top of Durable Objects and work in tandem with Workers. Let’s take a look at the underlying architecture and see some of the advanced use cases it enables.

Durable Objects as programmable sidecars

Routing to containers is enabled using Durable Objects under the hood. In the examples above, the Container class from cloudflare:workers just wraps a container-enabled Durable Object and provides helper methods for common patterns. In the rest of this post, we’ll look at examples using Durable Objects directly, as this should shed light on the platform’s underlying design.

Each Durable Object acts as a programmable sidecar that can proxy requests to the container and manages its lifecycle. This allows you to control and extend your containers in ways that are hard on other platforms.

You can manually start, stop, and execute commands on a specific container by calling RPC methods on its Durable Object, which now has a new object at this.ctx.container:

class MyContainer extends DurableObject {
  // these RPC methods are callable from a Worker
  async customBoot(entrypoint, envVars) {
    this.ctx.container.start({ entrypoint, env: envVars });
  }

  async stopContainer() {
    const SIGTERM = 15;
    this.ctx.container.signal(SIGTERM);
  }

  async startBackupScript() {
    await this.ctx.container.exec(["./backup"]);
  }
}

You can also monitor your container and run hooks in response to Container status changes.

For instance, say you have a CI job that runs builds in a Container. You want to post a message to a Queue based on the exit status. You can easily program this behavior:

class BuilderContainer extends DurableObject {
  constructor(ctx, env) {
    super(ctx, env)
    async function onContainerExit() {
      await this.env.QUEUE.send({ status: "success", message: "Build Complete" });
    }

    async function onContainerError(err) {
      await this.env.QUEUE.send({ status: "error", message: err});
    }

    this.ctx.container.start();
    this.ctx.container.monitor().then(onContainerExit).catch(onContainerError); 
  }

  async isRunning() { return this.ctx.container.running; }
}

And lastly, if you have state that needs to be loaded into a container each time it runs, you can use status hooks to persist state from the container before it sleeps and to reload state into the container after it starts:

import { startAndWaitForPort } from "./helpers"

class MyContainer extends DurableObject {
  constructor(ctx, env) {
    super(ctx, env)
    this.ctx.blockConcurrencyWhile(async () => {
      this.ctx.storage.sql.exec('CREATE TABLE IF NOT EXISTS state (value TEXT)');
      this.ctx.storage.sql.exec("INSERT INTO state (value) SELECT '' WHERE NOT EXISTS 
(SELECT * FROM state)");
      await startAndWaitForPort(this.ctx.container, 8080);
      await this.setupContainer();
      this.ctx.container.monitor().then(this.onContainerExit); 
    });
  }

  async setupContainer() {
    const initialState = this.ctx.storage.sql.exec('SELECT * FROM state LIMIT 1').one().value;
    return this.ctx.container
      .getTcpPort(8080)
      .fetch("http://container/state", { body: initialState, method: 'POST' });
  }

  async onContainerExit() {
    const response = await this.ctx.container
      .getTcpPort(8080)
      .fetch('http://container/state');
    const newState = await response.text();
    this.ctx.storage.sql.exec('UPDATE state SET value = ?', newState);
  }
}

Building around your Containers with Workers

Not only do Durable Objects allow you to have fine-grained control over the Container lifecycle, the whole Workers platform allows you to extend routing and scheduling behavior as you see fit.

Using Workers as an API gateway

Workers provide programmable ingress logic from over 300 locations around the world. In this sense, they provide similar functionality to an API gateway.

For instance, let’s say you want to route requests to a different version of a container based on information in a header. This is accomplished in a few lines of code:

export default {
  async fetch(request, env) {
    const isExperimental = request.headers.get("x-version") === "experimental";
    
    if (isExperimental) {
      return env.MY_SERVICE_EXPERIMENTAL.fetch(request);
    } else {
      return env.MY_SERVICE_STANDARD.fetch(request);
    }
  },
};

Or you want to rate limit and authenticate requests to the container:

async fetch(request, env) {
  const url = new URL(request.url);

  if (url.pathname.startsWith('/api/')) {
    const token = request.headers.get("token");

    const isAuthenticated = await authenticateRequest(token);
    if (!isAuthenticated) {
      return new Response("Not authenticated", { status: 401 });
    }

    const { withinRateLimit } = await env.MY_RATE_LIMITER.limit({ key: token });
    if (!withinRateLimit) {
      return new Response("Rate limit exceeded for token", { status: 429 });
    }

    return env.MY_APP.fetch(request);
  }
  // ...
}

Using Workers as a service mesh

By default, Containers are private and can only be accessed via Workers, which can connect to one of many container ports. From within the container, you can expose a plain HTTP port, but requests will still be encrypted from the end user to the moment we send the data to the container’s TCP port in the host. Due to the communication being relayed through the Cloudflare network, the container does not need to set up TLS certificates to have secure connections in its open ports.

You can connect to the container through a WebSocket from the client too. See this repository for a full example of using Websockets.

Just as the Durable Object can act as proxy to the container, it can act as a proxy from the container as well. When setting up a container, you can toggle Internet access off and ensure that outgoing requests pass through Workers.

// ... when starting the container...
this.ctx.container.start({ 
  workersAddress: '10.0.0.2:8080',
  enableInternet: false, // 'enableInternet' is false by default
});

// ... container requests to '10.0.0.2:8080' securely route to a different service...
override async onContainerRequest(request: Request) {
  const containerId = this.env.SUB_SERVICE.idFromName(request.headers['X-Account-Id']);
  return this.env.SUB_SERVICE.get(containerId).fetch(request);
}

You can ensure all traffic in and out of your container is secured and encrypted end to end without having to deal with networking yourself.

This allows you to protect and connect containers within Cloudflare’s network… or even when connecting to external private networks.

Using Workers as an orchestrator

You might require custom scheduling and scaling logic that goes beyond what Cloudflare provides out of the box.

We don’t want you having to manage complex chains of API calls or writing an operator to get the logic you need. Just write some Worker code.

For instance, imagine your containers have a long startup period that involves loading data from an external source. You need to pre-warm containers manually, and need control over the specific region to prewarm. Additionally, you need to set up manual health checks that are accessible via Workers. You’re able to achieve this fairly simply with Workers and Durable Objects.

import { Container, DurableObject } from "cloudflare:workers";

// A singleton Durable Object to manage and scale containers

class ContainerManager extends DurableObject {
  scale(region, instanceCount) {
    for (let i = 0; i < instanceCount; i++) {
      const containerId = env.CONTAINER.idFromName(`instance-${region}-${i}`);
      // spawns a new container with a location hint
      await env.CONTAINER.get(containerId, { locationHint: region }).start();
    }
  }

  async setHealthy(containerId, isHealthy) {
    await this.ctx.storage.put(containerId, isHealthy);
  }
}

// A Container class for the underlying compute

class MyContainer extends Container {
  defaultPort = 8080;

  async onContainerStart() {
    // run healthcheck every 500ms
    await this.scheduleEvery(0.5, 'healthcheck');
  }

  async healthcheck() {
    const manager = this.env.MANAGER.get(
      this.env.MANAGER.idFromName("manager")
    );
    const id = this.ctx.id.toString();

    await this.container.fetch("/_health")
      .then(() => manager.setHealthy(id, true))
      .catch(() => manager.setHealthy(id, false));
  }
}

The ContainerManager Durable Object exposes the scale RPC call, which you can call as needed with a region and instanceCount which scales up the number of active Container instances in a given region using a location hint. The this.schedule code executes a manually defined healthcheck method on the Container and tracks its state in the Manager for use by other logic in your system.

These building blocks let users handle complex scheduling logic themselves. For a more detailed example using standard Durable Objects, see this repository.

We are excited to see the patterns you come up with when orchestrating complex applications built with containers, and trust that between Workers and Durable Objects, you’ll have the tools you need.

Integrating with more of Cloudflare’s Developer Platform

Since it is Developer Week 2025, we would be remiss to not talk about Workflows, which just went GA, and Agents, which just got even better.

Let’s finish up by taking a quick look at how you can integrate Containers with these two tools.

Running a short-lived job with Workflows & R2

You need to download a large file from R2, compress it, and upload it. You want to ensure that this succeeds, but don’t want to write retry logic and error handling yourself. Additionally, you don’t want to deal with rotating R2 API tokens or worry about network connections — it should be secure by default.

This is a perfect opportunity for a Workflow using Containers. The container can do the heavy lifting of compressing files, Workers can stream the data to and from R2, and the Workflow can ensure durable execution.

export class EncoderWorkflow extends WorkflowEntrypoint {
  async run(event: WorkflowEvent, step: WorkflowStep) {
    const id = this.env.ENCODER.idFromName(event.instanceId);
    const container = this.env.ENCODER.get(id);

    await step.do('init container', async () => {
      await container.init();
    });

    await step.do('compress the object with zstd', async () => {
      await container.ensureHealthy();
      const object = await this.env.ARTIFACTS.get(event.payload.r2Path);
      const result = await container.fetch('http://encoder/zstd', {
        method: 'POST', body: object.body 
      });
      await this.env.ARTIFACTS.put(`results${event.payload.r2Path}`, result.body);
    });

    await step.do('cleanup container', async () => {
      await container.destroy();
    });
  }
}

Calling a Container from an Agent

Lastly, imagine you have an AI agent that needs to spin up cloud infrastructure (you like to live dangerously). To do this, you want to use Terraform, but since it’s run from the command line, you can’t run it on Workers.

By defining a tool, you can enable your Agent to run the shell commands from a container:

// Make tools that call to a container from an agent

const createExternalResources = tool({
  description: "runs Terraform in a container to create resources",
  parameters: z.object({ sessionId: z.number(), config: z.string() }),
  execute: async ({ sessionId, config }) => {
    return this.env.TERRAFORM_RUNNER.get(sessionId).applyConfig(config);
  },
});

// Expose RPC Methods that call to the container

class TerraformRunner extends DurableObject {
  async applyConfig(config) {
    await this.ctx.container.getTcpPort(8080).fetch(APPLY_URL, {
      method: 'POST',
      body: JSON.stringify({ config }),
    });
  }

  // ...rest of DO...
}

Containers are so much more powerful when combined with other tools. Workers make it easy to do so in a secure and simple way.

Pay for what you use and use the right tool

The deep integration between Workers and Containers also makes it easy to pick the right tool for the job with regards to cost.

With Cloudflare Containers, you only pay for what you use. Charges start when a request is sent to the container or it is manually started. Charges stop after the container goes to sleep, which can happen automatically after a configurable timeout. This makes it easy to scale to zero, and allows you to get high utilization even with highly-variable traffic.

Containers are billed for every 10ms that they are actively running at the following rates:

Memory: $0.0000025 per GB-second
CPU: $0.000020 per vCPU-second
Disk $0.00000007 per GB-second

After 1 TB of free data transfer per month, egress from a Container will be priced per-region. We'll be working out the details between now and the beta, and will be launching with clear, transparent pricing across all dimensions so you know where you stand.

Workers are lighter weight than containers and save you money by not charging when waiting on I/O. This means that if you can, running on a Worker helps you save on cost. Luckily, on Cloudflare it is easy to route requests to the right tool.

Cost comparison

Comparing containers and functions services on paper is always going to be an apples to oranges exercise, and results can vary so much depending on use case. But to share a real example of our own, a year ago when Cloudflare acquired Baselime, Baselime was a heavy user of AWS Lambda. By moving to Cloudflare, they lowered their cloud compute bill by 80%.

Below we wanted to share one representative example that compares costs for an application that uses both containers and serverless functions together. It’d be easy for us to come up with a contrived example that uses containers sub-optimally on another platform, for the wrong types of workloads. We won’t do that here. We know that navigating cloud costs can be challenging, and that cost is a critical part of deciding what type of compute to use for which pieces of your application.

In the example below, we’ll compare Cloudflare Containers + Workers against Google Cloud Run, a very well-regarded container platform that we’ve been impressed by.

Example application

Imagine that you run an application that serves 50 million requests per month, and each request consumes an average 500 ms of wall-time. Requests to this application are not all the same though — half the requests require a container, and the other half can be served just using serverless functions.

Requests per month	Wall-time (duration)	Compute required	Cloudflare	Google Cloud
25 million	500ms	Container + serverless functions	Containers + Workers	Google Cloud Run + Google Cloud Run Functions
25 million	500ms	Serverless functions	Workers	Google Cloud Run Functions

Container pricing

On both Cloud Run and Cloudflare Containers, a container can serve multiple requests. On some platforms, such as AWS Lambda, each container instance is limited to a single request, pushing cost up significantly as request count grows. In this scenario, 50 requests can run simultaneously on a container with 4 GB memory and half of a vCPU. This means that to serve 25 million requests of 500ms each, we need 625,000 seconds worth of compute

In this example, traffic is bursty and we want to avoid paying for idle-time, so we’ll use Cloud Run’s request-based pricing.

	Price per vCPU second	Price per GB-second of memory	Price per 1m requests	Monthly Price for Compute + Requests
Cloudflare Containers	$0.000020	$0.0000025	$0.30	$20.00
Google Cloud Run	$0.000024	$0.0000025	$0.40	$23.75

^{* Comparison does not include free tiers for either provider and uses a single Tier 1 GCP region}

Compute pricing for both platforms are comparable. But as we showed earlier in this post, Containers on Cloudflare run anywhere, on-demand, without configuring and managing regions. Each container has a programmable sidecar with its own database, backed by Durable Objects. It’s the depth of integration with the rest of the platform that makes containers on Cloudflare uniquely programmable.

Function pricing

The other requests can be served with less compute, and code written in JavaScript, TypeScript, Python or Rust, so we’ll use Workers and Cloud Run Functions.

These 25 million requests also run for 500 ms each, and each request spends 480 ms waiting on I/O. This means that Workers will only charge for 20 ms of “CPU-time”, the time that the Worker actually spends using compute. This ratio of low CPU time to high wall time is extremely common when building AI apps that make inference requests, or even when just building REST APIs and other business logic. Most time is spent waiting on I/O. Based on our data, we typically see Workers use less than 5 ms of CPU time per request vs seconds of wall time (waiting on APIs or I/O).

The Cloud Run Function will use an instance with 0.083 vCPU and 128 MB memory and charge on both CPU-s and GiB-s for the full 500 ms of wall-time.

	Total Price for “wall-time”	Total Price for “CPU-time”	Total Price for Compute + Requests
Cloudflare Workers	N/A	$0.83	$8.33
Google Cloud Run Functions	$1.44	N/A	$11.44

^{* Comparison does not include free tiers and uses a single Tier 1 GCP region.}

This comparison assumes you have configured Google Cloud Run Functions with a max of 20 concurrent requests per instance. On Google Cloud Run Functions, the maximum number of concurrent requests an instance can handle varies based on the efficiency of your function, and your own tolerance for tail latency that can be introduced by traffic spikes.

Workers automatically scale horizontally, don’t require you to configure concurrency settings (and hope to get it right), and can run in over 300 locations.

A holistic view of costs

The most important cost metric is the total cost of developing and running an application. And the only way to get the best results is to use the right compute for the job. So the question boils down to friction and integration. How easily can you integrate the ideal building blocks together?

As more and more software makes use of generative AI, and makes inference requests to LLMs, modern applications must communicate and integrate with a myriad of services. Most systems are increasingly real-time and chatty, often holding open long-lived connections, performing tasks in parallel. Running an instance of an application in a VM or container and calling it a day might have worked 10 years ago, but when we talk to developers in 2025, they are most often bringing many forms of compute to the table for particular use cases.

This shows the importance of picking a platform where you can seamlessly shift traffic from one source of compute to another. If you want to rate-limit, serve server-side rendered pages, API responses and static assets, handle authentication and authorization, make inference requests to AI models, run core business logic via Workflows, or ingest streaming data, just handle the request in Workers. Save the heavier compute only for where it is actually the only option. With Cloudflare Workers and Containers, this is as simple as an if-else statement in your Worker. This makes it easy to pick the right tool for the job.

Coming June 2025

We are collecting feedback and putting the finishing touches on our APIs now, and will release the open beta to the public in late June 2025.

From day one of building Cloudflare Workers, it’s been our goal to build an integrated platform, where Cloudflare products work together as a system, rather than just as a collection of separate products. We’ve taken this same approach with Containers, and aim to make Cloudflare not only the best place to deploy containers across the globe, but the best place to deploy the types of complete applications that developers are building, that use containers in tandem with serverless functions, Workflows, Agents, and much more.

We’re excited to get this into your hands soon. Stay on the lookout this summer.