Long-Horizon Action-Conditioned Video Generation: Challenges and Solution. (a) General text-image-to-video (TI2V) generation is single-shot and open-loop, often causing incomplete actions and hallucinated motions. (b) We propose a closed-loop think-act-reflect framework for iterative planning, generation, and verification. (c) We introduce the ActVideoGen-Dataset and Benchmark for task-specific experiments. (d) Our closed-loop design enables self-evolution, continually improving video generation quality.
SPIRAL Overview. (a) Closed-Loop Framework: PlanAgent decomposes abstract goals into atomic plans for action-conditioned video generation, while CriticAgent evaluates the generated videos and triggers dual-level (inner/outer) refinement feedback. (b) Self-Evolving via GRPO: guided by PlanAgent, VideoGenerator produces rollouts and is optimized using CriticAgent rewards.
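As a rough illustration of part (b), GRPO scores each rollout relative to the other rollouts sampled for the same prompt. The sketch below assumes one scalar CriticAgent reward per rollout; the function name, the use of the population standard deviation, and the `eps` term are assumptions for illustration, not details confirmed by the source.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one rollout group: (r - mean) / (std + eps).

    `rewards` holds one scalar reward per rollout of the same prompt
    (assumed here to come from CriticAgent).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Three hypothetical rollouts of the same plan step.
    print(group_relative_advantages([1.0, 2.0, 3.0]))
```

Rollouts that score above the group mean receive positive advantages and are reinforced; rollouts below the mean are pushed down. When all rewards in a group are equal, every advantage is zero and the group contributes no learning signal.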
PlanAgent: Decomposes a high-level goal and visual context into ordered, object-centric action plans with explicit pre-conditions and post-conditions for each generation step.
VideoGenerator: Synthesizes each video segment from the current sub-action and accumulated context, enabling long-horizon generation through step-wise controllable execution.
CriticAgent: Evaluates action-video alignment, detects local failures or global drift, and returns feedback that triggers refinement, regeneration, or replanning.
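One way to picture an ordered plan with explicit pre-conditions and post-conditions is as a list of small records whose conditions can be checked in sequence. The sketch below is only an assumed representation: `PlanStep`, `first_unmet_step`, and the condition strings are hypothetical names, and post-conditions are treated as facts that accumulate and are never retracted, which is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class PlanStep:
    """One atomic action with the facts it requires and the facts it produces."""
    action: str
    preconditions: set = field(default_factory=set)
    postconditions: set = field(default_factory=set)


def first_unmet_step(plan, initial_state):
    """Return the index of the first step whose pre-conditions are not met, or None."""
    state = set(initial_state)
    for i, step in enumerate(plan):
        if not step.preconditions <= state:  # subset test
            return i
        state |= step.postconditions  # facts only accumulate in this sketch
    return None


if __name__ == "__main__":
    plan = [
        PlanStep("Open the Refrigerator", set(), {"fridge_open"}),
        PlanStep("Take Out Tomatoes", {"fridge_open"}, {"tomatoes_out"}),
        PlanStep("Wash the Tomatoes", {"tomatoes_out"}, {"tomatoes_washed"}),
    ]
    print(first_unmet_step(plan, set()))
```

A check of this kind flags a plan that skips a required step, such as washing tomatoes that were never taken out, before any video segment is generated.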
A side-by-side view of the agentic execution process and the final long-horizon video.
End-to-end pipeline of SPIRAL. Given a user goal, PlanAgent decomposes the task into step-wise actions, VideoGenerator synthesizes each segment, and CriticAgent verifies alignment before final long-horizon composition.
A side-by-side view of CriticAgent-triggered local refinement and the corrected video result.
Closed-loop feedback refinement. CriticAgent detects local failures, SPIRAL refines the action instruction, regenerates a corrected segment, and continues the procedure without propagating errors.
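The local refinement loop described above can be sketched as a retry loop around each plan step. This is a minimal illustration under stated assumptions: `run_closed_loop` and its callable parameters (`generate`, `critique`, `refine`) are hypothetical stand-ins for VideoGenerator, CriticAgent, and the instruction-refinement step, and raising an error when retries run out is a placeholder for whatever outer-level replanning the actual system performs.

```python
def run_closed_loop(steps, generate, critique, refine, max_retries=2):
    """Generate one segment per step, regenerating when the critic rejects it.

    generate(instruction, previous_segments) -> segment
    critique(instruction, segment) -> (accepted: bool, feedback: str)
    refine(instruction, feedback) -> revised instruction
    All three callables are assumptions of this sketch.
    """
    segments = []
    for instruction in steps:
        for _ in range(max_retries + 1):
            segment = generate(instruction, segments)
            accepted, feedback = critique(instruction, segment)
            if accepted:
                break
            # Local failure: revise the instruction and regenerate this segment only.
            instruction = refine(instruction, feedback)
        else:
            # Retries exhausted; a full system would escalate to replanning here.
            raise RuntimeError(f"step could not be verified: {instruction!r}")
        segments.append(segment)  # only verified segments become context
    return segments
```

Because only accepted segments are appended to the context passed to later steps, a failed attempt at one step is never used as context for the next.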
Step 1: Open the Back Cover→Step 2: Insert the RAM (Physical Violation)
Step 1: Open the Back Cover→Step 2: Insert the RAM
Step 1: Remove the Gas Cap→Step 2: Insert the Fuel Nozzle (Missing Action)
Step 1: Remove the Gas Cap→Step 2: Insert the Fuel Nozzle
Step 1: Rinse with Water→Step 2: Dry with a Towel (Incomplete Action)
Step 1: Rinse with Water→Step 2: Dry with a Towel
Step 1: Wash the Onion→Step 2: Wash the Green Pepper (Sudden Switch)
Step 1: Wash the Onion→Step 2: Wash the Green Pepper
Step 1: Pour in Hot Tea (Physical Violation)→Step 2: Pour in Milk (Sudden Switch)
Step 1: Pour in Hot Tea→Step 2: Pour in Milk
Step 1: Show a Blank Piece of Paper→Step 2: Fold the Paper to Produce Money (Sudden Switch)
Step 1: Show a Blank Piece of Paper→Step 2: Fold the Paper to Produce Money
Step 1: Pull the Safety Pin (Physical Violation)→Step 2: Spray the Fire
Step 1: Pull the Safety Pin→Step 2: Spray the Fire
Step 1: Spray Cleaner→Step 2: Wipe with a Cloth (Physical Violation)
Step 1: Spray Cleaner→Step 2: Wipe with a Cloth
Step 1: Open the Refrigerator→Step 2: Take Out Tomatoes→Step 3: Wash the Tomatoes (Physical Violation)→Step 4: Cut the Tomatoes (Physical Violation)→Step 5: Seal in a Bag (Incomplete Action)
Step 1: Open the Refrigerator→Step 2: Take Out Tomatoes→Step 3: Close the Right Door→Step 4: Close the Left Door→Step 5: Place on the Cutting Board→Step 6: Wash the Tomatoes→Step 7: Cut the Tomatoes→Step 8: Seal in a Bag
Step 1: Slice Tomatoes & Cucumbers (Missing Actions)→Step 2: Place Tomatoes & Cucumbers in Bowl (Incomplete Action)→Step 3: Get and Pour Salad Dressing (Physical Violation)
Step 1: Slice Cucumbers→Step 2: Place Cucumbers in Plate→Step 3: Slice Tomatoes→Step 4: Get Salad Bowl→Step 5: Place Tomatoes in Bowl→Step 6: Place Cucumbers in Bowl→Step 7: Get Salad Dressing→Step 8: Pour the Dressing→Step 9: Toss with Spoon