
June 29, 2026 · By Stride Techworks · 11 min read

Run the agent, keep the data: self-hosted sandboxes and MCP tunnels, explained for small teams

The thing blocking your agent from production usually isn't the model — it's that the data can't leave the building. Anthropic's May 19 release splits the agent loop from tool execution and reaches private systems through an outbound-only tunnel. Here's what each piece actually is, whether you need it, and what to do if you don't have an enterprise contract.

agent-systems infrastructure operators security mcp on-prem

For two years the standard reason an agent didn't make it to production was the model. It wasn't reliable enough, it hallucinated a tool call, it couldn't hold the thread. That excuse is mostly gone now.

The reason agents stall in front of me today is almost never the model. It's a sentence that comes up in the first call: "the data can't leave the building."

The customer records, the claims, the case files, the source repo, the internal API that actually does something — those can't go bouncing through a managed sandbox that the security team needs six weeks to clear. So the agent that would obviously help sits in a demo, and the real workflow stays manual.

On May 19, 2026, at Code with Claude in London, Anthropic shipped the architectural answer to that exact objection: self-hosted sandboxes and MCP tunnels for Claude Managed Agents. This post is the builder-to-builder read on what they are, what splits where, whether you actually need them, and what to do if you're a small team without an enterprise agreement — because the pattern matters even if the product doesn't fit you yet.

The one idea underneath both features: separate orchestration from execution

Here's the whole thing in one sentence. An agent has two halves — the loop that decides what to do, and the place where the doing happens — and Anthropic just made it possible to keep the second half inside your walls while they run the first half.

The agent loop is orchestration: reading context, choosing the next tool, recovering from an error, managing the window. That's the part Anthropic is good at and the part that benefits from running on their infrastructure.

Tool execution is the other half: actually running the code, reading the file, hitting the database, building the repo, generating the image. That's the part that touches your data.

For most of agent history those two halves ran in the same place, which meant your data went wherever the loop went. The May release pries them apart. The loop stays on Anthropic; execution moves to you. Once you've internalized that split, both features are just two faces of it — one for running code, one for reaching services.
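To make the split concrete, here is a minimal sketch in plain Python. Everything in it is hypothetical and for illustration only: `decide_next_step` stands in for the hosted loop, `LOCAL_TOOLS` for the execution side, and `PRIVATE_TABLES` for data that never leaves your network. It is not Anthropic's API, just the shape of the boundary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ToolCall:
    name: str
    args: dict


def decide_next_step(history: List[dict]) -> Optional[ToolCall]:
    """Hypothetical stand-in for the hosted loop: it sees tool results, not raw data."""
    if not history:
        return ToolCall("count_records", {"table": "claims"})
    return None  # one step is enough for this sketch


# Hypothetical private data that exists only on the execution side.
PRIVATE_TABLES = {
    "claims": [{"id": 1, "ssn": "000-00-0000"}, {"id": 2, "ssn": "000-00-0001"}],
}


def count_records(table: str) -> dict:
    """Runs inside your perimeter and returns only an aggregate."""
    return {"table": table, "rows": len(PRIVATE_TABLES[table])}


LOCAL_TOOLS: Dict[str, Callable[..., dict]] = {"count_records": count_records}


def run_agent() -> List[dict]:
    history: List[dict] = []
    while (call := decide_next_step(history)) is not None:
        result = LOCAL_TOOLS[call.name](**call.args)  # execution half, on your side
        history.append({"tool": call.name, "result": result})  # what crosses the boundary
    return history
```

Calling `run_agent()` returns a history containing the row count and nothing else. The design choice worth noticing is that the tool decides what leaves the perimeter: the loop only ever sees what a tool chooses to return.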

Self-hosted sandboxes: run the agent's hands inside your perimeter

A self-hosted sandbox is an execution environment you control — your own infrastructure or a managed provider — where the agent's tool calls actually run, while Anthropic keeps handling orchestration, context, and recovery. It's in public beta as of the May 19 launch.

Concretely: when the agent decides to run a build, extract text from a PDF, or call an internal script, that work happens in a sandbox sitting in your network, against your files and repos. Anthropic never holds the execution environment. You get the things a security review actually asks about — network policy, audit logging, runtime images, compute sizing, and data residency — because the environment is yours.
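Two of those review items, network policy and audit logging, are easy to picture in code. The sketch below is an assumption about how you might wrap outbound calls in a sandbox you own. The host names, `ALLOWED_HOSTS`, `AUDIT_LOG`, and `guarded_fetch` are all made up for illustration, and a real deployment would enforce egress at the network layer as well, not only in application code.

```python
import json
import time
from typing import Callable, List
from urllib.parse import urlparse

# Hypothetical allow-list: the only hosts sandboxed code may reach.
ALLOWED_HOSTS = {"git.internal.example", "artifacts.internal.example"}

# In-memory stand-in for an append-only audit log.
AUDIT_LOG: List[str] = []


def audit(event: str, **fields) -> None:
    """Record one structured audit entry as a JSON line."""
    AUDIT_LOG.append(json.dumps({"ts": time.time(), "event": event, **fields}, sort_keys=True))


def egress_allowed(url: str) -> bool:
    """True only when the URL's host is on the allow-list."""
    host = urlparse(url).hostname or ""
    return host in ALLOWED_HOSTS


def guarded_fetch(url: str, fetch: Callable[[str], str]) -> str:
    """Log every attempt, then run the injected fetch only for allowed hosts."""
    if not egress_allowed(url):
        audit("egress_denied", url=url)
        raise PermissionError(f"egress to {url} is not on the allow-list")
    audit("egress_allowed", url=url)
    return fetch(url)
```

The `fetch` callable is injected so the policy can be exercised without touching the network. Denied attempts are logged before the exception is raised, which is the record a reviewer usually wants to see.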

You can host the sandbox yourself or use one of four launch providers, and they're genuinely different tools, not interchangeable logos:

Cloudflare — microVMs, zero-trust networking, and tightly controlled outbound traffic. Reach for this when egress control is the whole point: you want to pin exactly what the agent can phone home to.
Daytona — long-running, stateful environments reachable over SSH or preview URLs. Good when the work isn't a quick function call but a session that persists — an environment the agent comes back to.
Modal — AI-focused workloads with scalable CPU and GPU. This is the one for heavy lifting: long builds, image generation, anything that needs real compute on demand.
Vercel — sandbox isolation plus VPC peering and credential injection at the network boundary. The fit when the agent needs to talk to services already living in your VPC without you minting long-lived secrets into the sandbox.

The practical win is mundane and large: resource-intensive jobs like long builds and image generation stop being a problem you route around, because you size the compute, and the repos and files never leave home.

MCP tunnels: let the agent reach private systems without opening a door

An MCP tunnel lets an agent connect to a private MCP server — an internal database, API, ticketing system, or knowledge base — without exposing that server to the public internet. It's in research preview.

If you've ever tried to give an agent access to something internal, you know the bad options. You either open an inbound firewall rule, and now there's a hole in your perimeter that a security person will rightly hate, or you copy the data out to somewhere the agent can reach, and now your system of record has a shadow twin. Both are how breaches and audit findings happen.

The tunnel kills both. You deploy a lightweight gateway inside your network that opens a single outbound, encrypted connection to Anthropic — and the agent reaches your private MCP server back down that connection. No inbound rules. Nothing listening on the public internet. It's the same shape as a reverse tunnel or a zero-trust connector, applied to agent tooling, and managed from organization settings in the Claude Console.
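The outbound-only shape is easier to trust once you have seen it run. This is a toy sketch on localhost using only Python's standard `socket` module. It is not how Anthropic's gateway is implemented, it skips the encryption and authentication a real tunnel needs, and `private_service`, `gateway`, and `relay_round_trip` are names invented for the demo. The point is only the direction of the connection: the inside dials out, and requests travel back down that same connection.

```python
import socket
import threading


def private_service(request: str) -> str:
    """Stand-in for an internal MCP server. It never listens on any port."""
    return f"tickets for {request}: 3 open"


def gateway(relay_addr) -> None:
    """Runs inside the network: dials OUT, then answers requests sent back down."""
    with socket.create_connection(relay_addr) as conn, \
            conn.makefile("r", encoding="utf-8") as reader, \
            conn.makefile("w", encoding="utf-8") as writer:
        for line in reader:  # ends when the relay closes the connection
            writer.write(private_service(line.strip()) + "\n")
            writer.flush()


def relay_round_trip(server: socket.socket, request: str) -> str:
    """Stand-in for the vendor side: accept the gateway's connection, send one request."""
    conn, _ = server.accept()
    with conn, \
            conn.makefile("r", encoding="utf-8") as reader, \
            conn.makefile("w", encoding="utf-8") as writer:
        writer.write(request + "\n")
        writer.flush()
        return reader.readline().strip()


def demo() -> str:
    # The only listening socket belongs to the relay, never to the private side.
    with socket.create_server(("127.0.0.1", 0)) as server:
        worker = threading.Thread(target=gateway, args=(server.getsockname(),), daemon=True)
        worker.start()
        reply = relay_round_trip(server, "acme-corp")
    worker.join(timeout=2)
    return reply
```

`demo()` returns the private service's answer even though the private side never accepted an inbound connection. That is the property a firewall team cares about.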

That's the difference between "we exposed our ticketing system to an AI vendor" — a sentence that ends a deal — and "the agent reads tickets through an outbound tunnel that our firewall already permits." Same capability, completely different security review.

If you want the longer version of why a naked, internet-exposed MCP server is dangerous, we wrote that up separately — researchers have found hundreds of MCP servers sitting on the internet with zero authentication. See MCP security for small teams. The tunnel is, in part, the official answer to that footgun.

Why this is the unlock, not the model

The most honest line I've read about agents in production came from a developer reacting to this release: the compliance team is the real bottleneck for production agents, not the model. Self-hosted sandboxes and MCP tunnels are the layer that lets agents run inside the customer's perimeter instead of behind a sandbox the security team takes six weeks to clear.

That matches what we see. The teams stuck at the demo-to-production line are rarely stuck on capability. They're stuck on a control question — where does the data live, who can reach it, what's logged — and until now the answer for managed agents was "trust the vendor's environment," which is a non-starter in healthcare, finance, legal, government, and frankly any business with a customer list it doesn't want leaking. This release reframes the answer to "your environment, your logs, your egress rules." That's the version a security review can actually approve.

It's also a tell about where the industry is going: orchestration as a service you rent, execution and data as things you keep. That separation is going to define serious agent architecture for the next few years, and it's worth building toward even if you never touch Anthropic's managed product.

Do you even need this? An honest checklist

Most of the writing about this release assumes you're an enterprise with a Claude Console and a compliance department. A lot of you aren't. So before you go requesting beta access, here's the straight read on whether this is your problem:

Does real, sensitive data flow through the agent's tools? Customer PII, health or financial records, source code you can't expose, internal systems of record. If it's all public or synthetic data, you probably don't need any of this yet.
Is "data can't leave our environment" an actual constraint — contractual, regulatory, or just a hard line from your own leadership? If yes, this architecture is squarely for you.
Are you on Claude Managed Agents (or the Messages API) specifically? Self-hosted sandboxes and MCP tunnels are features of that platform. If you're running your own agent loop with the Agent SDK on your own boxes, you may already have the control these features add — the concepts still apply, the product may not.
Do you have someone who can stand up a gateway and a sandbox provider, and reason about egress? These are infrastructure features. They assume an operator. If that operator doesn't exist on your team yet, that gap is the real first task.

If you answered no to the data question, file this away and move on — you're not behind. If you answered yes and the rest is fuzzy, that fuzziness is the project.
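If it helps to see the checklist as a decision procedure, here is one way to write it down. The field names and return strings are my own shorthand, and the ordering of the checks is an interpretation of the questions above, not an official rubric.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    sensitive_data_in_tools: bool    # PII, records, private source, systems of record
    hard_residency_constraint: bool  # contractual, regulatory, or a leadership hard line
    on_managed_platform: bool        # Claude Managed Agents or the Messages API
    has_operator: bool               # someone who can run a gateway and reason about egress


def next_step(s: Situation) -> str:
    """Map the four checklist answers to a suggested next move."""
    if not s.sensitive_data_in_tools:
        return "file it away"
    if not s.hard_residency_constraint:
        return "pin down the constraint"
    if not s.has_operator:
        return "find an operator first"
    if not s.on_managed_platform:
        return "apply the pattern, skip the product"
    return "evaluate sandboxes and tunnels"
```

For example, `next_step(Situation(True, True, False, True))` returns `"apply the pattern, skip the product"`, which is the case of a team running its own loop on its own boxes.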

What to do if you're a small team without an enterprise contract

The product is enterprise-shaped, but the pattern is for everyone, and there's a version of "keep the data in" you can run at any size:

Adopt the split as a design rule now. Whatever agent you're building, draw the line between the loop and the execution, and decide deliberately where each runs and what data crosses the boundary. Even on a single box, knowing which calls touch sensitive data tells you what to isolate. This is the substrate-first habit we describe in what we mean by an operator stack.
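A lightweight way to practice that rule is to tag each tool with the kinds of data it touches and derive where it must run. The tool names, data classes, and `Placement` enum below are hypothetical examples, not a standard taxonomy.

```python
from enum import Enum
from typing import Dict, Set


class Placement(Enum):
    HOSTED = "hosted"
    INSIDE = "inside perimeter"


# Hypothetical data classes your team treats as sensitive.
SENSITIVE: Set[str] = {"pii", "financial", "source_code"}

# Hypothetical tool inventory: tool name -> kinds of data it touches.
TOOLS: Dict[str, Set[str]] = {
    "web_search": set(),
    "read_claim_file": {"pii"},
    "build_repo": {"source_code"},
}


def placement(tool: str) -> Placement:
    """A tool runs inside the perimeter if it touches any sensitive data class."""
    return Placement.INSIDE if TOOLS[tool] & SENSITIVE else Placement.HOSTED


def isolation_plan() -> Dict[str, str]:
    """Placement for every tool in the inventory, keyed by tool name."""
    return {name: placement(name).value for name in TOOLS}
```

Even on a single box, the output of `isolation_plan()` is a useful artifact: it is the list of calls to isolate first, and it makes any new tool without a tag an explicit decision rather than an accident.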

Keep private systems behind outbound connections, not inbound holes. You don't need Anthropic's tunnel to apply its lesson. A reverse tunnel, a zero-trust connector, or a private network is the right shape for any agent reaching an internal service. The wrong shape — an MCP server on a public IP with a weak token — is the one to delete today.
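A quick first check for the wrong shape is to look at the address a server binds to. The sketch below uses Python's standard `ipaddress` module. The `exposure` function and its verdict strings are invented for illustration, it only accepts literal IP addresses (a hostname raises `ValueError`), and it says nothing about token strength or firewall rules in front of the host.

```python
import ipaddress


def exposure(bind_host: str) -> str:
    """Classify a literal bind address by how widely it can be reached."""
    addr = ipaddress.ip_address(bind_host)
    if addr.is_loopback:
        return "ok: loopback only"
    if addr.is_unspecified:
        return "review: listens on every interface"
    if addr.is_global:
        return "delete today: public address"
    return "ok: private or reserved range"
```

For example, `exposure("127.0.0.1")` reports loopback only, `exposure("0.0.0.0")` flags a server listening on every interface, and `exposure("8.8.8.8")` flags a public address. The unspecified check comes before the public one because `0.0.0.0` is not a routable address itself but often means the server is reachable from wherever the host is.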

Run the genuinely private parts locally. Some data shouldn't transit anyone's cloud, full stop. That's why we build Local Voice for on-prem voice and transcription, and why our AI Systems & Automation work deploys private AI on hardware you own when the data can't leave the building. The same instinct that makes self-hosted sandboxes attractive is the one that makes on-prem the right call for your most sensitive workflow.

Wire the boundary into your operator stack. Once more than one agent is in play, which agent can reach which private system becomes a coordination problem, not just a network one. That is the problem Org-Desk exists to handle. The knowledge those agents read should live in a deliberate layer like Operator Vault rather than scattered across whatever the agent happened to scrape.

Threat-model it before it's in production, not after. Moving fast with AI and then running what you built in production is exactly the situation DFNDR was built for — practical threat modeling and security hardening for small teams, not enterprise security theater.

The bottom line

The headline feature of mid-2026 isn't a smarter model. It's a boundary you can finally draw cleanly: agent loop on one side, your data and execution on the other, connected by a single outbound wire you control. Anthropic productized it for enterprises on May 19. The architecture is available to anyone willing to design for it — and for the workflows where data genuinely can't leave the building, designing for it is the whole job.

If "the data can't leave the building" is the sentence standing between you and an agent that would obviously help, that's a scoping conversation we have most weeks.

FAQ

What is a self-hosted sandbox in Claude Managed Agents? A self-hosted sandbox is an execution environment you control — your own infrastructure or a managed provider like Cloudflare, Daytona, Modal, or Vercel — where the agent's tool calls actually run. Anthropic continues to handle orchestration, context management, and error recovery, but the code execution, file access, and service calls happen inside your perimeter, giving you control over network policy, audit logging, runtime images, and data residency. It entered public beta on May 19, 2026.

What is an MCP tunnel? An MCP tunnel lets a Claude agent (or the Messages API) connect to a private Model Context Protocol server — an internal database, API, ticketing system, or knowledge base — without exposing that server to the public internet. Instead of opening an inbound firewall rule, you deploy a lightweight gateway that makes a single outbound, encrypted connection to Anthropic infrastructure, and the agent reaches your private server back through it. It launched in research preview on May 19, 2026.

What's the difference between the agent loop and tool execution? The agent loop is the orchestration half of an agent: reading context, choosing the next tool, managing the window, and recovering from errors. Tool execution is the other half: actually running the code, reading the file, or hitting the database — the part that touches your data. Self-hosted sandboxes let you keep tool execution inside your environment while the loop runs on Anthropic's infrastructure.

Do I need self-hosted sandboxes or MCP tunnels for my project? You need them if real sensitive data flows through your agent's tools and "data can't leave our environment" is a genuine contractual, regulatory, or internal constraint — and you're using Claude Managed Agents or the Messages API. If your data is public or synthetic, or you run your own agent loop with the Agent SDK on your own infrastructure, you may already have equivalent control and can apply the pattern without the specific product.

How do small teams run AI agents without data leaving the building? Adopt the same split the enterprise features use: decide where the agent loop runs versus where tool execution runs, and keep anything sensitive on your side of the boundary. Reach private systems through outbound connections (a reverse tunnel or zero-trust connector) rather than opening inbound firewall holes, run the most sensitive parts on-premises, and threat-model the boundary before going to production. You don't need an enterprise contract to apply the architecture — you need to design for it.

"The data can't leave the building" shouldn't be the reason a useful agent stays a demo. Stride TechWorks builds agents that run inside your perimeter — private, on-prem where it has to be, hardened before it ships. Start with a Workflow Audit, scope a private automation, or tell us what's blocking you on the contact page. Receipts over slideware.
