Introduction: The End of Turn-Based AI

Thinking Machines Lab has unveiled a research preview of TML-Interaction-Small, a 276B-parameter Mixture-of-Experts model that processes audio, video, and text simultaneously in 200ms chunks. This is not an incremental improvement; it is a structural break from every major voice assistant on the market. The architecture eliminates the need for external voice-activity detection (VAD) and runs two parallel streams: a real-time interaction model for continuous full-duplex exchange, and an asynchronous background model for sustained reasoning and tool use. The result is an AI that listens, thinks, and acts without pausing: a native multimodal collaborator rather than a query-response machine.

Strategic Analysis: Why This Matters Now

The Architectural Advantage

Standard AI assistants operate in turns: user speaks, model processes, model responds. This creates latency, interrupts flow, and limits complex task execution. TML-Interaction-Small’s multi-stream, time-aligned micro-turn architecture processes 200ms chunks of audio, video, and text simultaneously. The real-time interaction model maintains full-duplex exchange while the background model handles reasoning and tool use, sharing full conversation context. This eliminates the cognitive bottleneck of turn-taking, enabling fluid human-AI collaboration.
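Thinking Machines Lab has not published the scheduling details, but the described dual-stream design can be sketched as two cooperating loops over a shared, time-aligned context: a low-latency loop that acknowledges each 200ms micro-turn immediately, and a slower reasoning loop that re-reads the full context as it grows. All names and the asyncio framing below are illustrative assumptions, not TML's implementation:

```python
import asyncio

async def realtime_stream(chunks, ctx, ticks, replies):
    """Full-duplex loop: one shared-context update per 200ms micro-turn."""
    for chunk in chunks:
        ctx.append(chunk)                # time-aligned shared context
        replies.append(f"ack:{chunk}")   # immediate, low-latency response
        await ticks.put("tick")          # signal the background model
    await ticks.put(None)                # end-of-stream sentinel

async def background_model(ctx, ticks, plans):
    """Asynchronous reasoning loop: sees the same context, acts slower."""
    while await ticks.get() is not None:
        # Stand-in for sustained reasoning / tool use over the full context.
        plans.append(f"plan after {len(ctx)} chunks")

async def main():
    ctx, replies, plans = [], [], []
    ticks = asyncio.Queue()
    await asyncio.gather(
        realtime_stream(["hi", "there", "bye"], ctx, ticks, replies),
        background_model(ctx, ticks, plans),
    )
    return replies, plans

replies, plans = asyncio.run(main())
```

The key property the sketch illustrates is that neither loop blocks the other: the real-time stream never waits for reasoning to finish, and the background model always reads the latest shared context rather than a stale turn boundary.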