The ecosystem spans four arenas: AI coding agents (the consumers of Sandbox), the CI/runners field (where Runners drops in), the AI code-review pack (where Code Reviewer leads on recall), and the agent-sandbox/compute layer (where Luxor's owned metal is the supply). All three of Tenki's products run on one Firecracker microVM substrate, and no competitor owns the whole loop. Target persona across all four arenas: AI-native builders and the CI owners who pay the bill.
Sandbox has no pull without an agent in front of it. Every major coding agent needs somewhere isolated to write and run code — that is exactly what disposable Firecracker microVMs are for. Tenki Sandbox supports Claude Code and Codex today; the rest are the reference-architecture roadmap (Initiative 2).
The flagship reference architecture — "give your agent root without giving it yours." Co-marketed with Anthropic; the same Claude Opus harness Tenki's Code Reviewer benchmark runs on.
Second first-class Sandbox reference. Codex + Sandbox repo + 5-min video + cost calculator, co-marketed with OpenAI.
Highest-density AI-coding persona. Quickstart for running Cursor's agent inside Sandbox; a featured-placement and co-marketing target (see Partners).
Autonomous SWE agent that needs heavy isolated compute. Quickstart per harness; Cognition is also a cohort/design-partner target.
Open-source, VS Code-native agent — a natural ally for an OSS-friendly quickstart and long-tail acquisition.
Popular terminal-native pair-programmer. Quickstart guide; community-driven adoption surface.
The play. Lock canonical, maintained references for the agents Tenki already supports (Claude Code, Codex), then expand down the list one harness at a time. Each reference is co-marketing leverage and a distribution surface — the agent vendors send their users somewhere to run code safely, and that somewhere should be Tenki.
GitHub Actions is the incumbent everyone starts on, and the bill everyone wants to cut. A wave of drop-in faster/cheaper runners has emerged, almost all VC-funded and renting cloud capacity at retail margins. Tenki's wedge: the same one-line runs-on migration, but on compute Luxor owns (mining + the GPU/AI build-out), so the cost advantage is structural, not promotional.
| Player | Position | How Tenki differs |
|---|---|---|
| GitHub Actions | The incumbent / default; recently cut hosted-runner prices ~39% | Tenki is the drop-in replacement — one runs-on change, 30% faster, up to 60% cheaper, plus a Migration Wizard that opens the PR |
| Depot | Well-funded pure-play ($10M Series A); fast runners + remote Docker builds; loud brand | Owned-compute cost moat + the 3-product convergence (CI that shares a substrate with review and agent sandboxes) |
| Blacksmith | Well-funded pure-play ($13.5M, GV); gaming-CPU bare metal; strong mindshare | Same drop-in promise on Luxor's captive metal; honest head-to-head benchmarks (speed + $/run) |
| Namespace | Fast CI + remote builders + dev environments | Tenki adds AI review and agent Sandbox on the same isolation primitive |
| WarpBuild | Cheaper/faster hosted runners + cross-platform builds | Compete on cadence + economics; own the evaluator's comparison search |
| BuildJet | Early low-cost runner play for GitHub Actions | Tenki's owned metal undercuts retail-cloud resale economics |
| Ubicloud | Open-source cloud; managed runners on bare metal | Tenki bundles CI + review + Sandbox; enterprise-trust inheritance (SOC 1/2 Type II parent) |
| RunsOn | Self-hosted runners in your own AWS account | Tenki is fully managed on owned compute — no cloud account, no per-minute cloud bill |
Where Tenki fits: the drop-in plus owned metal quadrant. Everyone else is either the incumbent (GitHub) or a VC-funded reseller of someone else's cloud. Tenki is the only one that pairs the lowest-friction migration with a structural cost advantage from a profitable parent's captive compute.
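The one-line migration mentioned above comes down to changing the runs-on value of each workflow job. The Python below is a minimal sketch of that rewrite, not Tenki's actual Migration Wizard. The label `tenki-standard-medium` and all function and constant names are hypothetical placeholders; the real runner label would come from Tenki's documentation.

```python
import re

# Hypothetical runner label, used only for illustration; the real value
# would come from the vendor's documentation.
HYPOTHETICAL_TENKI_LABEL = "tenki-standard-medium"

# Matches a simple scalar line such as `    runs-on: ubuntu-latest`.
# Matrix expressions and list-style labels are deliberately left untouched.
RUNS_ON = re.compile(
    r"^(?P<indent>[ \t]*)runs-on:[ \t]*(?P<label>[\w.-]+)[ \t]*$",
    re.MULTILINE,
)


def swap_runner(workflow_text: str, old_label: str = "ubuntu-latest",
                new_label: str = HYPOTHETICAL_TENKI_LABEL) -> str:
    """Return workflow_text with every `runs-on: old_label` line retargeted."""
    def replace(match: re.Match) -> str:
        if match.group("label") != old_label:
            return match.group(0)  # leave other runners alone
        return f"{match.group('indent')}runs-on: {new_label}"
    return RUNS_ON.sub(replace, workflow_text)


WORKFLOW = """\
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
  mac:
    runs-on: macos-14
"""

MIGRATED = swap_runner(WORKFLOW)
print(MIGRATED)
```

Only jobs on the named label are retargeted, so macOS or self-hosted jobs stay where they are. A real migration tool would more likely parse the YAML than pattern-match lines; the regex keeps the sketch dependency-free.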
AI PR review is the loudest, most-adopted category of the three. CodeRabbit owns adoption mindshare; the model vendors are bundling review into their agents. Tenki's Code Reviewer leads recall at 68.9% — roughly 2× the next best — but trails on precision (29.9%). The honest gap is the most interesting unsolved problem in the category, and the spine of the Benchmark Series (Initiative 1).
| Player | Position | Tenki's wedge |
|---|---|---|
| CodeRabbit | Adoption-mindshare leader in AI PR review | Reproducible head-to-heads + "merge decisions, not nitpicks" — win on credibility, not marketing spend |
| Greptile | Context-heavy, codebase-aware review | Tenki's recall lead on real OSS bugs; published open harness |
| GitHub Copilot (review) | Bundled into the platform devs already use | Specialist depth + radical honesty on metrics the incumbent won't publish |
| Cursor Bugbot | Review bundled into the highest-density agent IDE | Stand-alone, harness-agnostic, benchmark-backed |
| Graphite | Stacked-PR workflow with AI review attached | Tenki competes on detection quality, not workflow lock-in |
| Devin (review) | Autonomous agent that also reviews | Tenki's review runs on the same substrate as CI + Sandbox — one loop |
| Ellipsis | Automated PR review + fixes | Transparent recall/precision benchmarks; publish where Tenki loses |
Trust, not just recall. The category is full of tools that claim to catch everything and quietly bury you in false positives. Tenki's move is to publish recall and precision — including where it loses — on bugs that aren't in its own benchmark repos. Honesty is literally a Luxor value ("honesty over kindness"), and here it's also the only durable differentiation in a field racing to the same demo.
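To make the recall/precision trade-off concrete, the sketch below turns the two figures quoted earlier (68.9% recall, 29.9% precision) into two derived numbers: an F1 score, and the number of false positives a reader sees for each real bug flagged. Both use the standard definitions. This is illustrative arithmetic, not part of Tenki's benchmark harness, and the function names are made up for the example.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def false_positives_per_true_positive(precision: float) -> float:
    """How many incorrect flags accompany each correct one.

    precision = TP / (TP + FP), so FP / TP = (1 - precision) / precision.
    """
    return (1 - precision) / precision


RECALL = 0.689     # figure quoted in the text
PRECISION = 0.299  # figure quoted in the text

print(f"F1: {f1_score(PRECISION, RECALL):.3f}")
print("False positives per real bug flagged: "
      f"{false_positives_per_true_positive(PRECISION):.2f}")
```

At the quoted precision, this works out to a little over two wrong flags for every correct one, which is why publishing precision alongside recall matters for the trust argument.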
Sandbox competes directly with the agent-runtime pure-plays. Underneath, the whole industry is a compute-supply problem — which is exactly the layer Luxor was built for. Tenki is the only player that owns both the consumer-facing sandbox and the metal it runs on.
"Where does untrusted LLM-generated code actually run?" is the single most important infrastructure question for autonomous coding agents — everything else (inference, orchestration, memory, observability) assumes the sandbox layer is correct, fast, and safe. The market stacks into four layers, and which layer you operate at sets your tradeoffs. Tenki Sandbox plays in Layer B — the agent-sandbox platform layer where almost every "build-your-own-harness" buying decision happens.
Layer A. Firecracker, gVisor, Kata Containers, libkrun. The underlying isolation technology: hyperscaler-dominated, stable, open source. Everyone above builds on these. Tenki builds on Firecracker.
Layer B. E2B, Contree, Daytona, Modal, Sprites.dev, Blaxel, Runloop, Northflank (and Tenki). Managed services built on Layer A. Where teams building their own agent harness shop for a runtime.
Layer C. Cursor background agents, Devin workspaces, Copilot Workspace, Replit Agent. Sandboxes shipped as a feature of a broader agent product, rarely bought standalone.
Layer D. Claude Managed Agents (Anthropic, launched Apr 2026). Collapses model + sandbox + harness + state into one API. For standard agents, this can eliminate the Layer-B buying decision entirely.
The purpose-built sandboxes designed specifically to run AI-generated code — the direct competitive set for Tenki Sandbox. All public; framed factually. The Tenki row is highlighted.
| Vendor | Isolation | Persistence | Cold start | GPU | Strength |
|---|---|---|---|---|---|
| Tenki Layer B | Firecracker microVM (per task) | Persistent volumes + snapshot/restore | <100ms boot | No public GPU | Sandbox-first brand + native ADE desktop app + integrated runners→review loop + owned-compute price |
| E2B | Firecracker microVM | Ephemeral / pause (beta) | ~150ms | No | SDK-first, SOC 2, 200M+ sandboxes created; the mindshare incumbent |
| Contree | microVM (on Nebius) | Git-like branching + snapshots | Sub-second | Yes | Git-style fork/rollback, MCP-native, ships 7,000+ SWE-bench environments as tags |
| Daytona | Docker containers | Stateful, unlimited | ~90ms | Yes | "Fastest creation," GPU support, Computer-Use desktops |
| Modal | gVisor sandbox | Snapshots | Sub-second | Yes (A100/H100) | 50K+ concurrency, full GPU economics, SOC 2 |
| Sprites.dev | Firecracker microVM | Indefinite + hibernate | Hibernate ~300ms | No | Zero idle cost; per-second billing for always-on agents |
| Blaxel | Firecracker microVM | Standby snapshots + hibernate | ~25ms resume | No | Agents co-located with sandboxes; SOC 2 / HIPAA / ISO 27001; YC X25 ($7.3M seed) |
| Runloop | Custom hypervisor | Snapshots | Sub-second | No | SWE-bench focus, 10K+ parallel, VPC deploy, SOC 2 |
| Northflank | microVM / gVisor | Stateful | Sub-second | Yes (H100s) | Enterprise VPC (AWS/GCP/Azure), multi-cloud, SOC 2 |
| Vercel Sandbox | Firecracker microVM | Snapshot resume + hibernate | Sub-second | No | Part of the Vercel Agents stack; dev-server port exposure |
| CodeSandbox SDK | microVMs | Forking / snapshots | Sub-second | No | SOC 2; owned by Together AI |
| Microsandbox OSS | libkrun microVM | Stateful | Sub-second | No | Self-hosted, network-layer secret injection, YC |
| OpenSandbox OSS | Docker / K8s | Stateful | Sub-second | No | Protocol-driven K8s runtime, Alibaba-backed |
| Cube Sandbox OSS | Docker / microVM | Stateful + snapshots | Sub-second | No | Tencent Cloud's production sandbox stack, open-sourced; one-click deploy |
| SmolVM OSS | Firecracker / Hypervisor.framework | Pause / resume | Sub-200ms | No | Single-executable microVM, Mac/Linux dev parity, pre-installed Claude Code / Codex |
Where Tenki fits, and wins, in Layer B. Tenki sits squarely in the agent-sandbox platform layer, and Firecracker is the security sweet spot for untrusted LLM-generated code: hardware-level isolation (a dedicated kernel per sandbox, an attack surface limited to hypervisor exploits) with container-like startup speed and cost. That's the right default for code an agent wrote and no human reviewed. The differentiators against the pure-plays are the ones in Tenki's row of the comparison table: a Sandbox-first brand, a native ADE desktop app, the integrated runners→review loop, and owned-compute price.
The Layer D threat — and the response. The biggest disruption isn't another pure-play; it's the model providers moving down the stack. Claude Managed Agents (launched April 2026) bundles the agent loop, tool execution, the sandbox container, and state persistence behind one REST API — collapsing what used to be a stack (model + E2B sandbox + a harness + a memory store) into a single call. For teams building a standard coding or task agent, that can eliminate the Layer-B buying decision entirely: you get the sandbox bundled with your inference. Tenki's answer is to specialize where Layer D is weakest — long-running and self-improving agents, the integrated runners→review→sandbox loop, and owned-compute price — and to stay MCP-native and harness-agnostic so Tenki is the portable, controllable runtime for teams that refuse to hand their harness and their data to a single model vendor.
Tenki's dev products are the consumer wedge; Luxor's owned compute is the supply. The /compute GPU marketplace and /hardware are the consumer-facing edge of Luxor's "compute as a commodity" thesis, sitting alongside the neoclouds that supply the broader AI build-out.
Luxor's GPU marketplace and hardware storefront — the supply side made tradable. The economics that let Runners and Sandbox undercut VC-funded rivals on owned metal.
The flagship GPU neocloud. Context for where AI compute supply concentrates — and what Luxor's owned-metal thesis competes with on economics.
GPU cloud for AI training/inference. Part of the neocloud supply landscape Tenki's economics narrative references.
Full-stack AI cloud / neocloud. Supply-side comparable for the "compute is the new digital oil" story. (Disclosed: opencolin currently runs developer demand-gen for Nebius; to be wound down on engagement. See Overview.)
The convergence thesis — Tenki's unique position. Runners, Code Reviewer, and Sandbox are secretly one product: all three run on the same Firecracker microVM substrate, on compute Luxor owns. In the agent era they collapse into a single loop — where an autonomous agent writes code, runs it in CI, and gets it reviewed, all in per-job isolation, on metal with a structural cost advantage. The CI pure-plays don't have review or sandbox. The review pure-plays don't have CI or compute. The sandbox pure-plays rent their metal. No competitor owns the whole loop and the substrate beneath it — that is the entire ecosystem argument, and the spine of every benchmark, reference architecture, and partner pitch.
The narrative that ties every post, talk, benchmark, and partnership together — and the reason DevRel is the lever, not more engineering hours.
Every coding agent needs isolated compute. Own the harness references (Claude Code + Codex today) and Sandbox becomes the default runtime.
Drop-in GitHub Actions replacement on owned metal — the cash wedge that funds the agent-era bet and seeds the multi-product North Star.
The recall lead plus radical honesty on precision is the trust engine that converts trials to paid usage across all three products.
Luxor's owned metal turns every pure-play competitor's retail-cloud margin into Tenki's structural advantage. The agent era runs on compute you can trust.