Bernini: Latent Semantic Planning for Video Diffusion

MLLM Planner · DiT Renderer · Unified Video Generation and Editing

Bernini Team, ByteDance

Bernini is a unified framework for video generation and editing with self-supervised vision-text reasoning. It combines an MLLM-based semantic planner with a DiT-based renderer.

Coming Soon

Demo videos and additional visual results are in preparation.

© 2026 Bernini Team, ByteDance. All rights reserved.

Paper · Code