MLLM Planner · DiT Renderer · Unified Video Generation and Editing
Bernini Team, ByteDance
Bernini is a unified framework for video generation and editing with
self-supervised vision-text reasoning. It combines an MLLM-based
semantic planner with a DiT-based renderer.
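The planner-plus-renderer split described above can be pictured as a two-stage pipeline. The sketch below is only an illustration of that division of labor, not Bernini's actual interface: the `Plan` fields, the function names `plan`, `render`, and `run`, the default of 4 frames, and the string "frames" are all hypothetical placeholders, and no real MLLM or DiT model is involved.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Plan:
    """Hypothetical semantic plan passed from the planner to the renderer."""
    task: str          # "generate" or "edit" (assumed task labels)
    instruction: str   # cleaned-up text instruction
    num_frames: int    # number of frames the renderer should produce


def plan(prompt: str, source_frames: Optional[List[str]] = None) -> Plan:
    """Stand-in for the MLLM-based planner.

    Assumption: a request with source frames is an edit, otherwise a generation.
    """
    task = "edit" if source_frames else "generate"
    num_frames = len(source_frames) if source_frames else 4
    return Plan(task=task, instruction=prompt.strip(), num_frames=num_frames)


def render(p: Plan) -> List[str]:
    """Stand-in for the DiT-based renderer: emits placeholder frame labels."""
    return [f"{p.task}:{p.instruction}:frame{i}" for i in range(p.num_frames)]


def run(prompt: str, source_frames: Optional[List[str]] = None) -> List[str]:
    """One entry point for both generation and editing: plan first, then render."""
    return render(plan(prompt, source_frames))
```

Calling `run("a cat on a beach")` follows the generation path, while `run("make it snowy", ["f0", "f1"])` follows the editing path through the same two stages, which is the sense of "unified" this sketch tries to convey.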