Ecosystem · v1 · outside-in

Tenki sits at the intersection of four fast-moving ecosystems. The convergence is the moat.

The four are AI coding agents (the consumers of Sandbox), the CI/runners field (where Runners drops in), the AI code-review pack (where Code Reviewer leads on recall), and the agent-sandbox/compute layer (where Luxor's owned metal is the supply). Tenki's three products (Runners, Code Reviewer, and Sandbox) run on one Firecracker microVM substrate, and no competitor owns the whole loop. Target persona across all four: AI-native builders and the CI owners who pay the bill.

Pillar 1 · AI coding agents & harnesses

The consumers of Sandbox. Win the harness, win the runtime.

Sandbox has no pull without an agent in front of it. Every major coding agent needs somewhere isolated to write and run code — that is exactly what disposable Firecracker microVMs are for. Tenki Sandbox supports Claude Code and Codex today; the rest are the reference-architecture roadmap (Initiative 2).

Supported today

Claude Code (Anthropic)

The flagship reference architecture — "give your agent root without giving it yours." Co-marketed with Anthropic; the same Claude Opus harness Tenki's Code Reviewer benchmark runs on.

Supported today

Codex (OpenAI)

Second first-class Sandbox reference. Codex + Sandbox repo + 5-min video + cost calculator, co-marketed with OpenAI.

Roadmap

Cursor (Anysphere)

Highest-density AI-coding persona. Quickstart for running Cursor's agent inside Sandbox; a featured-placement and co-marketing target (see Partners).

Roadmap

Devin (Cognition)

Autonomous SWE agent that needs heavy isolated compute. Quickstart per harness; Cognition is also a cohort/design-partner target.

Roadmap

Cline

Open-source, VS Code-native agent — a natural ally for an OSS-friendly quickstart and long-tail acquisition.

Roadmap

Aider

Popular terminal-native pair-programmer. Quickstart guide; community-driven adoption surface.

The play. Lock canonical, maintained references for the agents Tenki already supports (Claude Code, Codex), then expand down the list one harness at a time. Each reference is co-marketing leverage and a distribution surface — the agent vendors send their users somewhere to run code safely, and that somewhere should be Tenki.

Pillar 2 · CI & runners landscape

A crowded field of GitHub Actions accelerators. Tenki's edge is owned metal.

GitHub Actions is the incumbent everyone starts on, and the bill everyone wants to cut. A wave of drop-in faster/cheaper runners has emerged, almost all VC-funded and renting cloud capacity at retail margins. Tenki's wedge: the same one-line runs-on migration, but on compute Luxor owns (mining + the GPU/AI build-out), so the cost advantage is structural, not promotional.
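To make the "one-line runs-on migration" concrete, here is a minimal sketch of what such a change does to a workflow file. The runner label `tenki-standard-example` and the helper `swap_runs_on` are hypothetical names chosen for illustration; this document does not state the real label, and this is not the Migration Wizard's actual logic.

```python
import re

# Hypothetical runner label, for illustration only (not taken from the source).
HYPOTHETICAL_TENKI_LABEL = "tenki-standard-example"

# Matches a whole `runs-on: ubuntu-*` line; [ \t]* keeps the match on one line.
RUNS_ON = re.compile(r"^([ \t]*runs-on:[ \t]*)ubuntu-[\w.\-]+[ \t]*$", re.MULTILINE)


def swap_runs_on(workflow_text: str, new_label: str = HYPOTHETICAL_TENKI_LABEL) -> str:
    """Replace GitHub-hosted ubuntu runner labels, leaving every other line untouched."""
    return RUNS_ON.sub(lambda m: m.group(1) + new_label, workflow_text)


workflow = """\
name: ci
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make test
"""

migrated = swap_runs_on(workflow)
print(migrated)
```

Only the `runs-on` value changes; triggers, steps, and actions stay as written, which is what makes this kind of migration low-friction for the CI owner.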

Player | Position | How Tenki differs
GitHub Actions | The incumbent / default; recently cut hosted-runner prices ~39% | Tenki is the drop-in replacement: one runs-on change, 30% faster, up to 60% cheaper, plus a Migration Wizard that opens the PR
Depot | Well-funded pure-play ($10M Series A); fast runners + remote Docker builds; loud brand | Owned-compute cost moat + the 3-product convergence (CI that shares a substrate with review and agent sandboxes)
Blacksmith | Well-funded pure-play ($13.5M, GV); gaming-CPU bare metal; strong mindshare | Same drop-in promise on Luxor's captive metal; honest head-to-head benchmarks (speed + $/run)
Namespace | Fast CI + remote builders + dev environments | Tenki adds AI review and agent Sandbox on the same isolation primitive
WarpBuild | Cheaper/faster hosted runners + cross-platform builds | Compete on cadence + economics; own the evaluator's comparison search
BuildJet | Early low-cost runner play for GitHub Actions | Tenki's owned metal undercuts retail-cloud resale economics
Ubicloud | Open-source cloud; managed runners on bare metal | Tenki bundles CI + review + Sandbox; enterprise-trust inheritance (SOC 1/2 Type II parent)
RunsOn | Self-hosted runners in your own AWS account | Tenki is fully managed on owned compute: no cloud account, no per-minute cloud bill

Where Tenki fits: the drop-in plus owned metal quadrant. Everyone else is either the incumbent (GitHub) or a VC-funded reseller of someone else's cloud. Tenki is the only one that pairs the lowest-friction migration with a structural cost advantage from a profitable parent's captive compute.

Pillar 3 · AI code review

Tenki leads on recall. The precision gap is the brand — not the embarrassment.

AI PR review is the loudest, most-adopted of Tenki's three product categories. CodeRabbit owns adoption mindshare; the model vendors are bundling review into their agents. Tenki's Code Reviewer leads recall at 68.9% (roughly 2× the next best) but trails on precision (29.9%). The honest gap is the most interesting unsolved problem in the category, and the spine of the Benchmark Series (Initiative 1).

68.9% · Tenki review recall · ~2× the next best, first-party benchmark
29.9% · Tenki review precision · the honest gap, and the content engine
7 · Reviewers benchmarked · recall and precision, on bugs not in Tenki's repos
Player | Position | Tenki's wedge
CodeRabbit | Adoption-mindshare leader in AI PR review | Reproducible head-to-heads + "merge decisions, not nitpicks"; win on credibility, not marketing spend
Greptile | Context-heavy, codebase-aware review | Tenki's recall lead on real OSS bugs; published open harness
GitHub Copilot (review) | Bundled into the platform devs already use | Specialist depth + radical honesty on metrics the incumbent won't publish
Cursor Bugbot | Review bundled into the highest-density agent IDE | Stand-alone, harness-agnostic, benchmark-backed
Graphite | Stacked-PR workflow with AI review attached | Tenki competes on detection quality, not workflow lock-in
Devin (review) | Autonomous agent that also reviews | Tenki's review runs on the same substrate as CI + Sandbox, one loop
Ellipsis | Automated PR review + fixes | Transparent recall/precision benchmarks; publish where Tenki loses

Trust, not just recall. The category is full of tools that claim to catch everything and quietly bury you in false positives. Tenki's move is to publish recall and precision — including where it loses — on bugs that aren't in its own benchmark repos. Honesty is literally a Luxor value ("honesty over kindness"), and here it's also the only durable differentiation in a field racing to the same demo.
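For readers who want the arithmetic behind the recall/precision trade-off, the sketch below shows how the two metrics are defined and what the reported figures imply when combined. The counts in `toy` are made-up illustration values, not benchmark data; only the 68.9% and 29.9% figures come from this document, and the F1 score is a standard derived metric that the source itself does not report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewScore:
    true_positives: int   # real bugs the reviewer flagged
    false_positives: int  # flagged comments that were not real bugs
    false_negatives: int  # real bugs the reviewer missed

    @property
    def recall(self) -> float:
        return self.true_positives / (self.true_positives + self.false_negatives)

    @property
    def precision(self) -> float:
        return self.true_positives / (self.true_positives + self.false_positives)


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Toy counts for illustration only (not from the benchmark).
toy = ReviewScore(true_positives=6, false_positives=14, false_negatives=4)
print(f"toy recall={toy.recall:.2f} precision={toy.precision:.2f}")

# Figures reported in this document.
reported_recall, reported_precision = 0.689, 0.299
print(f"F1 from reported figures: {f1(reported_precision, reported_recall):.3f}")
print(f"share of flags that are false positives: {1 - reported_precision:.1%}")
```

Read this way, 29.9% precision means roughly seven in ten flagged issues are not real bugs, which is why publishing both numbers matters more than publishing recall alone.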

Pillar 4 · Agent sandboxes & compute

Two layers, one substrate: the agent sandbox above, Luxor's compute below.

Sandbox competes directly with the agent-runtime pure-plays. Underneath, the whole industry is a compute-supply problem — which is exactly the layer Luxor was built for. Tenki is the only player that owns both the consumer-facing sandbox and the metal it runs on.

The sandbox market in four layers

"Where does untrusted LLM-generated code actually run?" is the single most important infrastructure question for autonomous coding agents — everything else (inference, orchestration, memory, observability) assumes the sandbox layer is correct, fast, and safe. The market stacks into four layers, and which layer you operate at sets your tradeoffs. Tenki Sandbox plays in Layer B — the agent-sandbox platform layer where almost every "build-your-own-harness" buying decision happens.

Layer A · Primitives

The isolation tech

Firecracker, gVisor, Kata Containers, libkrun. The underlying isolation technology — hyperscaler-dominated, stable, open source. Everyone above builds on these. Tenki builds on Firecracker.

Layer B · Tenki competes here

Agent-sandbox platforms

E2B, Contree, Daytona, Modal, Sprites.dev, Blaxel, Runloop, Northflank (and Tenki). Managed services built on Layer A. Where teams building their own agent harness shop for a runtime.

Layer C · Embedded

Inside agent products

Cursor background agents, Devin workspaces, Copilot Workspace, Replit Agent. Sandboxes shipped as a feature of a broader agent product — rarely bought standalone.

Layer D · The disruption

Model-provider managed agents

Claude Managed Agents (Anthropic, launched Apr 2026). Collapses model + sandbox + harness + state into one API. For standard agents, this can eliminate the Layer-B buying decision entirely.

Agent-sandbox vendor landscape (Layer B)

The purpose-built sandboxes designed specifically to run AI-generated code, and the direct competitive set for Tenki Sandbox. All details are from public sources and framed factually. The Tenki row is listed first.

Vendor | Isolation | Persistence | Cold start | GPU | Strength
Tenki (Layer B) | Firecracker microVM (per task) | Persistent volumes + snapshot/restore | <100ms boot | No public GPU | Sandbox-first brand + native ADE desktop app + integrated runners→review loop + owned-compute price
E2B | Firecracker microVM | Ephemeral / pause (beta) | ~150ms | No | SDK-first, SOC 2, 200M+ sandboxes created; the mindshare incumbent
Contree | microVM (on Nebius) | Git-like branching + snapshots | Sub-second | Yes | Git-style fork/rollback, MCP-native, ships 7,000+ SWE-bench environments as tags
Daytona | Docker containers | Stateful, unlimited | ~90ms | Yes | "Fastest creation," GPU support, Computer-Use desktops
Modal | gVisor sandbox | Snapshots | Sub-second | Yes (A100/H100) | 50K+ concurrency, full GPU economics, SOC 2
Sprites.dev | Firecracker microVM | Indefinite + hibernate | Hibernate ~300ms | No | Zero idle cost; per-second billing for always-on agents
Blaxel | Firecracker microVM | Standby snapshots + hibernate | ~25ms resume | No | Agents co-located with sandboxes; SOC 2 / HIPAA / ISO 27001; YC X25 ($7.3M seed)
Runloop | Custom hypervisor | Snapshots | Sub-second | No | SWE-bench focus, 10K+ parallel, VPC deploy, SOC 2
Northflank | microVM / gVisor | Stateful | Sub-second | Yes (H100s) | Enterprise VPC (AWS/GCP/Azure), multi-cloud, SOC 2
Vercel Sandbox | Firecracker microVM | Snapshot resume + hibernate | Sub-second | No | Part of the Vercel Agents stack; dev-server port exposure
CodeSandbox SDK | microVMs | Forking / snapshots | Sub-second | No | SOC 2; owned by Together AI
Microsandbox (OSS) | libkrun microVM | Stateful | Sub-second | No | Self-hosted, network-layer secret injection, YC
OpenSandbox (OSS) | Docker / K8s | Stateful | Sub-second | No | Protocol-driven K8s runtime, Alibaba-backed
Cube Sandbox (OSS) | Docker / microVM | Stateful + snapshots | Sub-second | No | Tencent Cloud's production sandbox stack, open-sourced; one-click deploy
SmolVM (OSS) | Firecracker / Hypervisor.framework | Pause / resume | Sub-200ms | No | Single-executable microVM, Mac/Linux dev parity, pre-installed Claude Code / Codex

Where Tenki fits — and wins — in Layer B. Tenki sits squarely in the agent-sandbox platform layer, and Firecracker is the security sweet spot for untrusted LLM-generated code: hardware-level isolation (a dedicated kernel per sandbox, hypervisor-exploit-only attack surface) with container-like startup speed and cost. That's the right default for code an agent wrote and no human reviewed. The differentiators against the pure-plays:

  • Sandbox-first brand. Most rivals treat the sandbox as an afterthought bolted onto a broader compute or dev-env product. Tenki leads with it — including a native Sandbox ADE desktop app, not just an SDK.
  • Easy local / native setup. The desktop ADE and supported harnesses (Claude Code + Codex) make first-run trivial — versus the hard self-host path of container-first rivals like Daytona.
  • The long-running / self-improving-agent wedge. Most of Layer B optimizes for short-lived, ephemeral sandboxes. Tenki's persistent volumes + snapshot/restore target durable, multi-session agent environments — the part of the market the incumbents underserve.
  • The integrated loop. Runners → review → sandbox on one substrate. No sandbox pure-play also owns CI and code review; the agent writes code, runs it in CI, and gets it reviewed, all in per-job isolation.
  • Owned-compute price. Every Layer-B pure-play rents its metal at retail cloud margins; Tenki runs on compute Luxor owns, so the cost advantage is structural, not promotional.
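As a rough illustration of "a dedicated kernel per sandbox", the sketch below builds a per-task microVM definition in the shape of Firecracker's JSON config file (`boot-source`, `drives`, `machine-config`). It is not Tenki's actual configuration: the helper name `microvm_config`, the paths, the boot arguments, and the sizing are all assumptions made for the example.

```python
import json


def microvm_config(task_id: str, vcpus: int = 2, mem_mib: int = 1024) -> dict:
    """Build a Firecracker-style config for one task (hypothetical paths and sizing)."""
    return {
        "boot-source": {
            # Each microVM boots its own guest kernel rather than sharing the host's.
            "kernel_image_path": "/var/lib/sandboxes/vmlinux",  # hypothetical path
            "boot_args": "console=ttyS0 reboot=k panic=1",
        },
        "drives": [
            {
                "drive_id": "rootfs",
                # Hypothetical per-task root filesystem, so tasks never share a disk.
                "path_on_host": f"/var/lib/sandboxes/{task_id}/rootfs.ext4",
                "is_root_device": True,
                "is_read_only": False,
            }
        ],
        "machine-config": {"vcpu_count": vcpus, "mem_size_mib": mem_mib},
    }


cfg = microvm_config("task-123")
print(json.dumps(cfg, indent=2))
```

The design point is that the isolation boundary is the hypervisor: each task gets its own kernel and root disk, so untrusted code would have to escape the VM, not just a container namespace, to reach the host.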

The Layer D threat — and the response. The biggest disruption isn't another pure-play; it's the model providers moving down the stack. Claude Managed Agents (launched April 2026) bundles the agent loop, tool execution, the sandbox container, and state persistence behind one REST API — collapsing what used to be a stack (model + E2B sandbox + a harness + a memory store) into a single call. For teams building a standard coding or task agent, that can eliminate the Layer-B buying decision entirely: you get the sandbox bundled with your inference. Tenki's answer is to specialize where Layer D is weakest — long-running and self-improving agents, the integrated runners→review→sandbox loop, and owned-compute price — and to stay MCP-native and harness-agnostic so Tenki is the portable, controllable runtime for teams that refuse to hand their harness and their data to a single model vendor.

The Luxor compute side — the supply context

Tenki's dev products are the consumer wedge; Luxor's owned compute is the supply. The /compute GPU marketplace and /hardware are the consumer-facing edge of Luxor's "compute as a commodity" thesis, sitting alongside the neoclouds that supply the broader AI build-out.

Tenki / Luxor

/compute + /hardware

Luxor's GPU marketplace and hardware storefront — the supply side made tradable. The economics that let Runners and Sandbox undercut VC-funded rivals on owned metal.

Supply context

CoreWeave

The flagship GPU neocloud. Context for where AI compute supply concentrates — and what Luxor's owned-metal thesis competes with on economics.

Supply context

Lambda

GPU cloud for AI training/inference. Part of the neocloud supply landscape Tenki's economics narrative references.

Supply context

Nebius

Full-stack AI cloud / neocloud. Supply-side comparable for the "compute is the new digital oil" story. (Disclosed: opencolin currently runs developer demand-gen for Nebius, to be wound down on engagement; see Overview.)

The convergence thesis — Tenki's unique position. Runners, Code Reviewer, and Sandbox are secretly one product: all three run on the same Firecracker microVM substrate, on compute Luxor owns. In the agent era they collapse into a single loop — where an autonomous agent writes code, runs it in CI, and gets it reviewed, all in per-job isolation, on metal with a structural cost advantage. The CI pure-plays don't have review or sandbox. The review pure-plays don't have CI or compute. The sandbox pure-plays rent their metal. No competitor owns the whole loop and the substrate beneath it — that is the entire ecosystem argument, and the spine of every benchmark, reference architecture, and partner pitch.

The synthesis

Four ecosystems, one through-line.

The narrative that ties every post, talk, benchmark, and partnership together — and the reason DevRel is the lever, not more engineering hours.

Consumers

Agents pull Sandbox

Every coding agent needs isolated compute. Own the harness references (Claude Code + Codex today) and Sandbox becomes the default runtime.

Wedge

Runners pay the bills

Drop-in GitHub Actions replacement on owned metal — the cash wedge that funds the agent-era bet and seeds the multi-product North Star.

Trust

Review earns credibility

The recall lead plus radical honesty on precision is the trust engine that converts trials to paid usage across all three products.

Moat

Compute is the floor

Luxor's owned metal turns every pure-play competitor's retail-cloud margin into Tenki's structural advantage. The agent era runs on compute you can trust.
