Barcelona, Spain – Xiaomi unveiled its “Local Copilot,” dubbed Xiaomi Miloco, on Monday, marking a significant step toward integrating artificial intelligence directly into the home environment. The system combines advanced language models with multimodal perception and data analysis, enabling devices to interpret and respond intelligently to household activities, according to a statement released by Xiaomi International communications director Angus Ng.
Miloco represents a shift from device-level intelligence to system-wide intelligence across the home, moving beyond isolated functions and screen-based interactions. The project aims to create a more intuitive and connected technological experience for users, one that extends beyond mobile devices into daily life, work, and the home, Ng explained.
The core of Miloco is built on MiMo-VL-7B, a vision-language backbone with video understanding and instruction-following capabilities. Xiaomi has enhanced this foundation with features designed to recognize everyday activities – including esports, workouts, television viewing, and reading – and interpret common hand gestures such as the V sign, thumbs-up, open palm, OK sign, and the shaka hand sign.
Development involved a two-stage training pipeline. The first stage, Supervised Fine-Tuning (SFT), focused on improving the model’s core capabilities within home scenarios, using chain-of-thought supervision to encourage structured knowledge acquisition and token-budget-aware reasoning to keep responses concise. The second stage employed GRPO-based reinforcement learning, drawing on the Time-R1 data strategy – research accepted at NeurIPS 2025 – to build efficient training datasets across multiple domains.
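For readers unfamiliar with GRPO (Group Relative Policy Optimization), the general idea is that several responses are sampled for the same prompt and each is scored relative to the others in its group, rather than against a separately learned value model. The sketch below illustrates only that group-relative scoring step. It is not Xiaomi's training code: the function name, the reward values, and the use of population standard deviation are assumptions made for illustration, and the report's actual reward design is not described here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Score each reward relative to its own group's mean and spread.

    Hypothetical helper: a generic illustration of GRPO-style
    group normalization, not code from the Miloco release.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    # eps avoids division by zero when every response earns the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four sampled answers to one home-scene prompt
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Responses that score above their group's average receive a positive advantage and are reinforced, while below-average responses receive a negative one. If every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.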
According to a technical report released December 22, 2025, Xiaomi has also released both MiMo-VL-Miloco-7B and its quantized variant, MiMo-VL-Miloco-7B-GGUF, alongside a Gradio demo and integration into the open-source Xiaomi Miloco framework. This release provides a foundation for researchers and developers to explore privacy-preserving, on-device multimodal intelligence in real-world smart-home settings.
The system follows a “keep-it-general” design, specializing in home tasks while preserving broad understanding and language generation capabilities. Xiaomi aims to enable devices to anticipate needs, such as a robot vacuum cleaner recognizing a spill and initiating cleaning, or lighting systems adjusting based on viewing preferences, all without requiring manual configuration.
As of March 3, 2026, Xiaomi has not announced specific timelines for broader consumer availability of Miloco-integrated devices, nor detailed plans for data privacy protocols beyond the stated focus on privacy-preserving on-device intelligence.