OpenAI launched a research preview of GPT-5.3-Codex-Spark on Thursday, a new coding model engineered for significantly faster response times and powered by hardware from Cerebras Systems. The release marks OpenAI’s first major use of chip infrastructure outside its long-standing relationship with Nvidia, a move that signals a potential shift in how AI models are deployed.
Codex-Spark is designed for “real-time software development where responsiveness matters as much as intelligence,” according to OpenAI. The model achieves speeds exceeding 1,000 tokens per second on ultra-low latency hardware, enabling near-instant feedback during live coding sessions. This speed is intended to address a key limitation of current AI coding agents – the delays that can disrupt a developer’s workflow.
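To put the throughput claim in concrete terms, a rough back-of-the-envelope sketch follows. The 1,000 tokens-per-second figure is OpenAI's; the response sizes used below are illustrative assumptions, not numbers from the announcement:

```python
# Rough latency arithmetic for the claimed Codex-Spark throughput.
# THROUGHPUT_TPS comes from OpenAI's stated figure; the response
# sizes are illustrative assumptions for typical coding interactions.

THROUGHPUT_TPS = 1_000  # claimed tokens per second


def generation_time(num_tokens: int, tps: float = THROUGHPUT_TPS) -> float:
    """Seconds to stream a response of num_tokens at a given throughput."""
    return num_tokens / tps


# A small inline edit (~150 tokens) vs. a larger refactor (~2,000 tokens):
print(f"{generation_time(150):.2f}s")    # → 0.15s
print(f"{generation_time(2_000):.2f}s")  # → 2.00s
```

At those assumed response sizes, small edits would land in well under a second, which is the kind of turnaround that keeps a live coding session feeling interactive rather than interrupted.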
The partnership with Cerebras, announced in January, sees OpenAI leveraging the Wafer Scale Engine 3, a large-scale processor developed by Cerebras. While OpenAI maintains that GPUs “remain foundational” for its training and broader inference needs, the company emphasized that Cerebras’ technology excels in scenarios demanding extremely low latency. “Cerebras complements that foundation by excelling at workflows that demand extremely low latency, tightening the end-to-end loop so use cases such as real-time coding in Codex feel more responsive as you iterate,” an OpenAI spokesperson stated.
The move to diversify chip suppliers comes amid a complex relationship with Nvidia. Reuters reported that OpenAI had expressed dissatisfaction with the speed of some Nvidia chips for inference tasks, a workload ideally suited for Cerebras’ specialized hardware. While a planned $100 billion infrastructure deal with Nvidia has yet to materialize, Nvidia has since committed to a $20 billion investment.
Codex-Spark represents a smaller, optimized version of OpenAI’s Codex model, prioritizing speed over sheer capability. According to OpenAI, it produces more capable responses than GPT-5.1-Codex-mini while completing tasks in a fraction of the time. The model is particularly adept at making precise edits, revising plans, and answering contextual questions about existing codebases.
The release of Codex-Spark follows a period of rapid iteration within OpenAI’s coding agent development. The company released GPT-5.2 in December, spurred by internal pressure from CEO Sam Altman to address competitive challenges from Google and Anthropic. The competitive landscape has seen a surge in capable coding agents, with Anthropic’s Claude Code also gaining prominence.
While Cerebras has demonstrated even faster token processing speeds on other models – reaching 2,100 tokens per second on Llama 3.1 70B and 3,000 tokens per second on OpenAI’s gpt-oss-120B – the comparatively lower speed of Codex-Spark may reflect the complexity of the model itself.
OpenAI also recently announced a $38 billion cloud computing agreement with Amazon in November 2025 and signed a multi-year deal with AMD in October 2025, further demonstrating its commitment to diversifying its infrastructure partnerships.