Video edited by TV-LiVE

TV-LiVE: Training-Free, Text-Guided Video Editing
via Layer Informed Vitality Exploitation

¹KAIST AI   ²University of Seoul
Under Review

Brief Introduction

We present TV-LiVE, a Training-free, text-guided Video editing framework based on Layer-informed Vitality Exploitation. We empirically identify vital layers within the video generation model, that is, layers that significantly influence the quality of the generated output. Notably, these layers are closely associated with Rotary Position Embeddings (RoPE). Building on this observation, our method supports both object addition and non-rigid video editing by selectively injecting key and value features from the source model into the corresponding layers of the target model, with layer vitality determining which layers receive the injection. For object addition, we further identify prominent layers from which to extract masks of the regions corresponding to the newly added part of the target prompt. We find that the masks extracted from these prominent layers faithfully indicate the region to be edited.
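The two mechanisms described above can be illustrated with a minimal NumPy sketch. It is not the TV-LiVE implementation. It assumes single-head attention, assumes that "injection" means replacing the target branch's keys and values with the source branch's in vital layers, and assumes that a mask is obtained by averaging attention scores over prominent layers and thresholding. The names `edit_attention`, `prompt_mask`, `vital_layers`, and `threshold` are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Single-head scaled dot-product attention.

    q: (n_queries, d), k: (n_keys, d), v: (n_keys, d_v).
    """
    scale = 1.0 / np.sqrt(q.shape[-1])
    return softmax((q @ k.T) * scale) @ v


def edit_attention(layer_idx, q_tgt, k_tgt, v_tgt, k_src, v_src, vital_layers):
    """Attention for one layer of the target (edited) branch.

    Assumption: in a vital layer the target queries attend to the
    source branch's keys and values; every other layer is left as is.
    """
    if layer_idx in vital_layers:
        k_tgt, v_tgt = k_src, v_src
    return attention(q_tgt, k_tgt, v_tgt)


def prompt_mask(attn_maps, threshold=0.5):
    """Binary edit mask from per-layer attention to the new prompt tokens.

    attn_maps: (n_prominent_layers, n_video_tokens) array of scores.
    Assumption: average over layers, min-max normalise, then threshold.
    """
    avg = attn_maps.mean(axis=0)
    norm = (avg - avg.min()) / (avg.max() - avg.min() + 1e-8)
    return norm > threshold
```

In a full pipeline, a function like `edit_attention` would be called once per transformer block and denoising step, with the source features cached from a pass over the source video. Which layers count as vital or prominent is determined empirically in the paper and is not reproduced here.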

Our contributions include:
(1) We propose a training-free, text-guided video editing method for Diffusion Transformer (DiT)-based video generation models.
(2) We analyze the internal layer properties of DiT-based video generation models.
(3) Our method outperforms recent video editing approaches.

Object Addition

Non-rigid Video Editing

Comparison

Object Addition

Non-rigid Video Editing