Seedance 2.0: Volcengine’s Multimodal Video Generation API Arrives in February

by Rachel Kim – Technology Editor

Volcengine has launched Seedance 2.0 via its Ark Experience Center, with an API release scheduled for the latter half of February. The new version of the “Doubao” video generation model transitions to a multimodal audio-video generation architecture, accepting four input types: text, image, audio, and video.

The company promises a significant quality improvement over version 1.5, particularly in highly dynamic scenes and complex interactions, with enhanced physical fidelity and more controllable execution. According to Volcengine, Seedance 2.0 can process up to nine images, three videos, and three audio tracks as references, for outputs of up to 15 seconds.
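Since the API has not yet launched, its request format is unknown; the payload shape below is purely hypothetical and only the numeric limits (nine images, three videos, three audio tracks, 15-second clips) come from Volcengine's announcement. A sketch of how a client might validate reference inputs against those limits:

```python
# Hypothetical request validator for Seedance 2.0 reference inputs.
# The payload keys are invented for illustration; the numeric limits
# (9 images, 3 videos, 3 audio tracks, 15 s max duration) are the only
# figures taken from Volcengine's announcement.

MAX_IMAGES, MAX_VIDEOS, MAX_AUDIO, MAX_DURATION_S = 9, 3, 3, 15

def validate_request(payload: dict) -> list[str]:
    """Return a list of limit violations (empty list means the request is in bounds)."""
    errors = []
    if len(payload.get("image_refs", [])) > MAX_IMAGES:
        errors.append(f"too many image references (max {MAX_IMAGES})")
    if len(payload.get("video_refs", [])) > MAX_VIDEOS:
        errors.append(f"too many video references (max {MAX_VIDEOS})")
    if len(payload.get("audio_refs", [])) > MAX_AUDIO:
        errors.append(f"too many audio references (max {MAX_AUDIO})")
    if payload.get("duration_s", 0) > MAX_DURATION_S:
        errors.append(f"requested clip exceeds {MAX_DURATION_S} s")
    return errors

# Example mixed-reference request (environment + character images,
# a camera-movement video reference, and a brand audio track).
request = {
    "prompt": "a product shot turning into a short commercial",
    "image_refs": ["environment.png", "character.png"],
    "video_refs": ["camera_move_ref.mp4"],
    "audio_refs": ["brand_jingle.wav"],
    "duration_s": 15,
}
print(validate_request(request))  # → []
```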

Image references allow for precise control over subject, elements, and décor, extending to the reproduction of composition and character details. Video references replicate camera angles, camera movements, complex actions, and sound effects, transforming existing assets – product shots, commercials, brand music, characters – into coherent starting materials. The engine likewise accepts mixed combinations, such as combining environment and character images with a song generated by a music model to produce a music video, with precise synchronization of vocals and percussion to the visuals.

Seedance 2.0 also introduces editing capabilities, including targeted modification of a shot, role, gesture, or narrative segment, and a video extension function to seamlessly chain shots according to user instructions. Volcengine states the visual rendering gains realism, with trajectories conforming to the laws of physics and improved performance in multi-subject scenarios. Examples cited include sports sequences where kinematics respect gravity, inertia, and biomechanics.

The model is also reported to follow longer prompts more faithfully, maintaining subject identity and planning a shot grammar autonomously. It can execute complex scripts – styles, effects, camera movements, temporal sequences – such as a long ink-style take that runs from takeoff to landing after flying over clouds and skimming across the water.

Volcengine positions this iteration as a means of reducing costs and timelines for professional audio-video production, partially replacing complex effects work and shoots. Targeted sectors range from e-commerce and advertising to film/TV creation, short-form content, and online education. According to a December 19, 2025 report by 36kr, Volcengine's Doubao flagship model 1.8 and Seedance 1.5 pro were key products launched at the Force Conference, with daily average token usage of the Doubao large model exceeding 50 trillion.

Seedance 2.0 is currently accessible on the Ark Experience Center, with an integrated media library and sample prompts to accelerate adoption. The API is slated to go live at the end of February, with pricing yet to be announced.
