Nagoya University Unveils J-Moshi: Advanced Japanese AI Voice Assistant Mimicking Human Conversation
Nagoya, Japan – Researchers at Nagoya University have announced the development of J-Moshi, a sophisticated AI voice assistant designed to replicate natural Japanese human speech. The system builds on Moshi, an existing spoken dialogue system created by Kyutai, an open-source AI research institute. Moshi itself is recognized for its ability to express emotions in real time.
The development of J-Moshi took approximately four months and involved training on extensive Japanese speech datasets. The AI system is noted for its ability to closely mimic human speech patterns, capturing the natural flow of Japanese conversation. A key feature is its capacity to accurately reproduce “backchannel” responses, the short interjections Japanese speakers commonly make during dialogue to show they are listening. A demonstration of J-Moshi’s audio output is available for review, showcasing its natural sound.
A significant portion of the training data for J-Moshi came from J-CHAT, a Japanese dialogue dataset compiled and released by the University of Tokyo. This dataset comprises around 67,000 hours of audio sourced from podcasts and YouTube. The research team also incorporated smaller, high-quality dialogue datasets, including audio recorded in laboratory settings and archival recordings from 20 to 30 years ago. To further augment the training data, the team used a speech synthesis program to convert text chat conversations into artificial voices for the AI to learn from.
J-Moshi is also accessible on the Hugging Face platform, with the model available under the identifier nu-dialogue/j-moshi-ext.
The researchers point out that Japanese speech data is scarce compared with English speech data. This scarcity has historically made it difficult to adapt conventional voice dialogue systems to specific industries. However, the Nagoya University team believes that J-Moshi’s capabilities could be commercially valuable in sectors such as Japanese call centers, healthcare, and customer service.
The research team is led by Professor Ryuichiro Higashinaka, who brings extensive experience from his 19-year tenure as a corporate researcher at NTT. During his time at NTT, Professor Higashinaka contributed to the development of consumer dialogue systems and voice agents, including a project focused on a question-and-answer function for the voice agent service ‘Shabette Concierge’.