The AI Power Struggle
Highlights of AI News for February 28 – March 7, 2026

TL;DR: This week, OpenAI rushed to fill Anthropic's Pentagon seat and shipped two models in four days, Alibaba's 9B model embarrassed 120B rivals, and your phone learned to order dinner.
Watch our video on YouTube or Rumble
The Big Story: The Pentagon-AI Triangle
The most consequential AI story of the week wasn't a model release — it was the fallout from Anthropic's refusal to grant unrestricted military access to Claude.
The backstory: on February 27, the Trump administration banned Anthropic from federal use and the Pentagon designated the company a "supply chain risk" after Anthropic held firm on its red lines — no autonomous weapons, no domestic mass surveillance. The very next day, OpenAI announced it had signed a contract to deploy its models on the Pentagon's classified networks.
The optics were brutal. By March 3, CEO Sam Altman admitted the deal "looked opportunistic and sloppy" and announced amendments adding explicit language that OpenAI systems would not be used for domestic surveillance of U.S. persons. The Pentagon also confirmed intelligence agencies like the NSA would require a separate contract modification to access OpenAI's services. Anthropic CEO Dario Amodei went further, calling OpenAI's public messaging around the deal "straight up lies."
Internally, OpenAI staff weren't happy either. CNN reported that many employees "really respect" Anthropic for standing up to the Pentagon and are frustrated with their own company's handling of the situation. Meanwhile, over 100 Google DeepMind employees signed an internal letter to chief scientist Jeff Dean opposing military applications of Gemini for surveillance or autonomous weapons.
By March 5, Anthropic was back at the negotiating table with the Pentagon in what the Financial Times described as a last-ditch effort to reach acceptable terms. The consumer response was unexpected: Anthropic surged to the top of Apple's App Store download charts, its refusal to comply becoming a trust signal that drove adoption.
The episode crystallized a tension the industry can no longer avoid. As the Center for American Progress argued, the Pentagon's rapid pivot from one AI provider to another — with hastily written contracts and retroactive amendments — demonstrates the need for Congressional oversight of military AI procurement.
Why it matters for practitioners: Enterprise adoption decisions increasingly hinge on a provider's policy posture, not just benchmark scores. OpenAI's rushed contract and subsequent walkback are a case study in how defense partnerships create reputational risk. If your organization is evaluating frontier models, acceptable use policies — and how they hold up under government pressure — are now a procurement consideration.
OpenAI Ships GPT-5.3 Instant and GPT-5.4 in One Week
OpenAI had a prolific week. GPT-5.3 Instant landed on March 3, followed by GPT-5.4 on March 5 — two model releases in four days.
GPT-5.3 Instant addresses two persistent ChatGPT complaints: hallucination (reduced by up to 26.8%) and the tendency to lecture before answering. It's positioned as a faster, more direct conversational model.
GPT-5.4 is the bigger release. It unifies reasoning, coding (inheriting GPT-5.3-Codex capabilities), and agentic workflows into a single frontier model. Key specs: up to 1 million token context window via the API (the largest OpenAI has offered), 33% fewer factual errors per claim compared to GPT-5.2, and improved handling of professional workflows involving documents, spreadsheets, and presentations. GPT-5.4 Thinking replaced GPT-5.2 Thinking for Plus, Team, and Pro subscribers.
The rapid cadence — two models in one week — signals OpenAI is under competitive pressure. As Gizmodo noted, this comes at a moment when Anthropic's Claude and Google's Gemini have been gaining ground in enterprise adoption and developer mindshare.
Why it matters for practitioners: The 1M token context window in GPT-5.4's API is the headline feature for engineering teams. If you're working on long-document analysis, large codebase understanding, or retrieval-augmented generation over big corpora, this substantially expands what's possible in a single API call. The hallucination reduction in GPT-5.3 Instant also matters for production deployments where factual reliability is critical.
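Before sending a big corpus at a 1M-token window, it helps to budget tokens up front. Here is a minimal sketch of that budgeting step, assuming a rough four-characters-per-token heuristic; the constants and helper names are illustrative, and a production system should count tokens with the provider's actual tokenizer.

```python
# Rough token budgeting for a long-context API call.
# ASSUMPTION: ~4 characters per token. This is a heuristic for
# planning only, not the provider's real tokenizer.

CONTEXT_WINDOW = 1_000_000   # GPT-5.4's reported API context limit
RESPONSE_RESERVE = 8_000     # headroom left for the model's reply

def estimate_tokens(text: str) -> int:
    """Crude token estimate; replace with a real tokenizer in production."""
    return len(text) // 4 + 1

def fits_in_context(documents: list[str], prompt: str) -> bool:
    """Check whether prompt + documents fit inside the context window."""
    total = estimate_tokens(prompt) + sum(estimate_tokens(d) for d in documents)
    return total + RESPONSE_RESERVE <= CONTEXT_WINDOW

def select_documents(documents: list[str], prompt: str) -> list[str]:
    """Greedily pack documents until the token budget is exhausted."""
    budget = CONTEXT_WINDOW - RESPONSE_RESERVE - estimate_tokens(prompt)
    packed = []
    for doc in documents:
        cost = estimate_tokens(doc)
        if cost <= budget:
            packed.append(doc)
            budget -= cost
    return packed
```

Even a crude budget like this prevents the most common long-context failure mode: silently truncated inputs when a corpus overshoots the window.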
Qwen 3.5: The Open-Source Model That Keeps Punching Up
Alibaba's Qwen team had a landmark week. The Qwen 3.5 Small Model Series — four variants at 0.8B, 2B, 4B, and 9B parameters — launched on March 1-2, and the benchmarks are turning heads.
The headline result: Qwen3.5-9B outperforms OpenAI's GPT-OSS-120B (a model 13x its size) on multiple benchmarks, including GPQA Diamond (81.7 vs. 71.5), HMMT Feb 2025 (83.2 vs. 76.7), and MMMU-Pro (70.1 vs. 59.7). On video understanding (Video-MME), the 9B scored 84.5, significantly ahead of Gemini 2.5 Flash-Lite's 74.6. The 2B model runs on any recent iPhone in airplane mode, processing both text and images.
What makes this architecturally significant is the move to native multimodality. Starting from the 4B variant, Qwen 3.5 incorporates vision and language tokens within the same latent space from early training stages — not bolted on via adapters after the fact. The result is measurably better spatial reasoning, OCR accuracy, and visual-grounded responses. The underlying architecture combines Gated Delta Networks with sparse Mixture-of-Experts (MoE), a hybrid that's proving efficient across scales.
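The sparse MoE half of that hybrid is what lets a model keep only a small fraction of its parameters active per token. As a rough illustration of the routing idea (not Qwen's actual implementation), here is a minimal top-k router sketch: a gate scores each token against every expert, only the top-k experts run, and their outputs are mixed by softmax weights.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route each token to its top-k experts and mix their outputs.

    x:        (tokens, d_model) activations
    gate_w:   (d_model, n_experts) router weights
    experts:  list of callables, each mapping (d_model,) -> (d_model,)
    k:        number of experts active per token

    Illustrative sketch only; real MoE layers batch this and add
    load-balancing losses.
    """
    logits = x @ gate_w                      # (tokens, n_experts) gate scores
    outputs = np.zeros_like(x)
    for t in range(x.shape[0]):
        top = np.argsort(logits[t])[-k:]     # indices of the top-k experts
        weights = np.exp(logits[t][top])
        weights /= weights.sum()             # softmax over the selected experts
        for w, e in zip(weights, top):
            outputs[t] += w * experts[e](x[t])
    return outputs
```

The efficiency claim falls directly out of this structure: with k experts active out of n, only k/n of the expert parameters do work on any given token, which is how a 397B-parameter model can run with ~17B active.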
This follows the flagship Qwen 3.5 release from mid-February: a 397B-parameter MoE model with just 17B active parameters per forward pass, supporting up to 1M tokens of context. The full model targets native multimodal agentic workflows — not just chat, but browsing, planning, and tool use. The agentic capabilities are concrete: Qwen3.5-9B scores 66.1 on BFCL-V4 (function calling) and 41.8 on OSWorld-Verified (desktop automation).
Why it matters for practitioners: The Qwen 3.5 series is particularly relevant to our work — we use Qwen2.5-3B-Instruct in our self-play notebooks. The jump from Qwen 2.5 to 3.5 represents a generational leap in architecture (native multimodality, MoE, Gated Delta Networks). For anyone running local inference or building on open-source models, the 9B's performance-per-parameter ratio reshapes the calculus of what's possible without cloud APIs. The models are available on Hugging Face and ModelScope under permissive licenses.
Gemini Goes Agentic on Android
Google's March Pixel Drop brought something genuinely new: Gemini can now execute multi-step tasks on your behalf inside real apps.
In "agentic mode," Gemini doesn't just answer questions — it takes actions. It can order food on DoorDash, book an Uber, and place grocery orders on Kroger or Walmart, all by navigating the apps autonomously. The AI agent operates in a virtual environment (not accessing raw device data) and requires your explicit confirmation before any payment or commitment.
The rollout is limited: U.S. and South Korea only, restricted to Pixel 10 and Samsung Galaxy S26 (which Samsung is billing as the first "agentic AI phone" at launch on March 11). The app ecosystem is also narrow at launch — a handful of food delivery, ride-sharing, and grocery services.
But the architecture matters more than the initial scope. This is the first mainstream deployment of an AI agent that interacts with third-party apps through their actual interfaces, not through pre-built API integrations. It's the difference between a chatbot that can call an API and an agent that can use an app the way you would.
Why it matters for practitioners: If you're building mobile apps or services, the agentic model changes the interface contract. Your app may increasingly be "used" by an AI agent rather than a human, which has implications for UI design, authentication flows, and rate limiting. Start thinking about how your product behaves when the user is an LLM.
MWC Barcelona: The Agentic Stack Goes Telecom
MWC 2026 (March 2-5) was dominated by one theme: agentic AI as the organizing principle for everything from network operations to device strategy.
Highlights: NVIDIA announced a 6G coalition with major telecoms (BT, Deutsche Telekom, Ericsson, Nokia, SK Telecom, T-Mobile, and others) committed to building AI-native network infrastructure, and released an open-source 30B-parameter telecom-specific model. Huawei introduced its Agentic BSS platform, where AI agents autonomously design and launch service packages. Samsung positioned the Galaxy S26 as a fundamentally agent-oriented device rather than an app-oriented one.
The shift from "AI as feature" to "AI as architecture" is real in the telecom space. Networks that used to require human operators for configuration changes are moving toward autonomous agent-managed operations.
DeepSeek V4: The Model That Hasn't Dropped Yet
The most anticipated release of the week didn't actually happen — but the shadow it cast was significant.
DeepSeek V4 has been confirmed by multiple outlets (Financial Times, Reuters, The Information) as a trillion-parameter MoE model with ~32B active parameters, native multimodal capabilities, 1M token context, and optimization for Huawei Ascend chips. Internal benchmarks reportedly show it outperforming Claude and ChatGPT on long-context coding tasks. It is reportedly slated for release under an open-source license.
As of March 7, V4 still hadn't dropped, despite reports it was planned for the first week of March to coincide with China's Two Sessions parliamentary meetings. The timing is strategic — demonstrating Chinese AI capability on the national stage while reducing dependence on U.S. semiconductors.
Why it matters for practitioners: If the benchmarks hold, V4 could be the strongest open-source model available for long-context coding workflows. The Huawei Ascend optimization is also significant — it proves frontier models can be trained on non-NVIDIA hardware, which has implications for global chip supply and export control policy.
Figma × Codex: Design-to-Code Gets an MCP Bridge
Figma and OpenAI announced a deep integration connecting Figma's design platform with OpenAI Codex through the Model Context Protocol (MCP). The integration lets teams move bidirectionally between design and code: bring Figma designs into Codex for implementation, or turn running UI code back into editable Figma frames.
This came a week after Figma struck a similar partnership with Anthropic for Claude Code integration. The use of MCP as the interoperability layer is notable — it's becoming the standard protocol for connecting AI coding assistants with external tools and data sources.
Why it matters for practitioners: The MCP-based approach means these integrations are not bespoke one-offs. If you're building developer tools, MCP compatibility is increasingly table-stakes for working with frontier AI coding assistants. The bidirectional design↔code workflow also suggests a future where the boundary between "designing" and "building" continues to blur.
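For a sense of why MCP generalizes across integrations like these, it helps to see the wire format: MCP is JSON-RPC 2.0, with tools discovered via a `tools/list` request and invoked via `tools/call`. The sketch below builds those two envelopes with the standard library; the tool name `export_frame` and its arguments are hypothetical, since real tool names come from whatever the server advertises in its `tools/list` reply.

```python
import json

def mcp_request(req_id, method, params=None):
    """Build a JSON-RPC 2.0 request envelope, as MCP uses on the wire."""
    msg = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)

# 1. Discover what tools the server offers.
list_req = mcp_request(1, "tools/list")

# 2. Invoke a tool by name with structured arguments.
#    "export_frame" and "frame_id" are hypothetical, for illustration.
call_req = mcp_request(2, "tools/call", {
    "name": "export_frame",
    "arguments": {"frame_id": "hero-section"},
})
```

Because every MCP server speaks this same request shape, a coding assistant that can call one tool server can call any of them — which is exactly what makes the Figma integrations reusable rather than bespoke.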
By the Numbers
- $2.52 trillion — Gartner's forecast for worldwide AI spending in 2026, a 44% year-over-year increase. AI infrastructure alone accounts for $1.37 trillion.
- 81.7 vs. 71.5 — Qwen3.5-9B vs. GPT-OSS-120B on GPQA Diamond. A 9B model beating a 120B model on graduate-level reasoning.
- 26.8% — Maximum hallucination reduction in GPT-5.3 Instant compared to its predecessor.
- 1,000,000 — Token context window in GPT-5.4's API, OpenAI's largest ever.
- 397B → 17B — Qwen 3.5 flagship total parameters vs. active parameters per forward pass. MoE efficiency in action.
- $380 billion — Anthropic's reported valuation after its latest $30 billion raise.
- 100+ — Google DeepMind employees who signed an internal letter opposing military AI applications.
What to Watch Next Week
- DeepSeek V4 release — If it lands, expect benchmark comparisons to flood social media within hours.
- Anthropic-Pentagon resolution — Talks have reopened; the outcome will set precedent for the entire industry.
- Samsung Galaxy S26 launch (March 11) — The first mass-market "agentic AI phone" hits retail. Early reviews will reveal whether Gemini's on-device agent mode is genuinely useful or a demo feature.
- Qwen 3.5 independent benchmarks — The official numbers are remarkable; community reproduction will determine if they hold up across diverse tasks.
- Apple's Siri overhaul — Reports suggest the reimagined Siri with Gemini integration is targeted for iOS 26.4, potentially previewed at a spring event.
Stay connected:
- 📧 Subscribe to our newsletter for updates
- 📺 Watch our YouTube channel for AI news and tutorials
- 🐦 Follow us on Twitter for quick updates
- 🎥 Check us on Rumble for video content