AI21 Labs Unveils Jamba Reasoning 3B, a Remarkably Compact LLM with an Extensive Context Window
TEL AVIV, ISRAEL – November 14, 2024 – AI21 Labs today announced Jamba Reasoning 3B, a new large language model (LLM) designed to deliver powerful reasoning capabilities in a remarkably small package. The model, boasting 3 billion parameters, can process a 250,000-token context window – a feat previously unattainable for models of its size – and run directly on standard laptops.
Jamba Reasoning 3B’s hybrid architecture reduces memory and computing requirements while preserving speed. AI21 Labs’ testing demonstrated that the model can process 35 tokens per second on a standard MacBook Pro. According to AI21 Labs’ Sarel Goshen, the model excels at function calling, policy-grounded generation, and tool routing, making it suitable for tasks like creating an agenda from meeting information directly on a device, while more complex reasoning can leverage GPU clusters.
The launch of Jamba Reasoning 3B reflects a growing industry trend toward smaller, more efficient AI models. Meta released its MobileLLM-R1 family of reasoning models – ranging from 140 million to 950 million parameters – in September, designed for math, coding, and scientific reasoning. Google’s Gemma, initially released to run on portable devices, has also been expanded. Even established companies like FICO are developing specialized models, having recently launched FICO Focused Language and FICO Focused Sequence for finance-specific applications.
Goshen emphasized that Jamba Reasoning 3B distinguishes itself through its combination of small size and reasoning capability without sacrificing speed. Benchmark testing confirms its performance, with Jamba Reasoning 3B outperforming models like Qwen 4B, Meta’s Llama 3.2 3B, and Microsoft’s Phi-4-Mini on the IFBench and Humanity’s Last Exam tests, though it placed second to Qwen 4B on MMLU-Pro.
Beyond performance, Goshen highlighted the benefits of smaller models for enterprise applications, including increased steerability and enhanced privacy due to on-device inference. “I do believe there’s a world where you can optimize for the needs and the experience of the customer, and the models that will be kept on devices are a large part of it,” he said.