We present TV-LiVE,
a Training-free and text-guided Video editing framework via Layer-informed Vitality Exploitation.
We empirically identify vital layers within the video generation
model that significantly influence the quality of generated outputs. Notably, these
layers are closely associated with Rotary Position Embeddings (RoPE). Based
on this observation, our method enables both object addition and non-rigid video
editing by selectively injecting key and value features from the source model into
the corresponding layers of the target model guided by the layer vitality. For
object addition, we further identify prominent layers to extract the mask regions
corresponding to the newly added target prompt. We find that the masks extracted
from these prominent layers faithfully indicate the region to be edited.
Our contributions include:
(1) We propose a training-free, text-guided video editing method for DiT-based video generation models.
(2) We analyze the internal layer properties of DiT-based video generation models.
(3) Our method outperforms recent video editing approaches.
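
The selective key/value injection described above can be illustrated with a toy sketch. This is not the TV-LiVE implementation: the single-head attention without learned projections, the function names (`attention`, `run_layers`), and the choice to replace the target keys and values outright are all assumptions made for illustration. The set `inject_layers` stands in for the layers selected by layer vitality; the text above does not specify which layers are chosen, so it is left as a free parameter.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Toy single-head scaled dot-product attention over lists of token vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        weights = softmax(
            [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        )
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values))
             for d in range(len(values[0]))]
        )
    return outputs


def run_layers(source_tokens, target_tokens, num_layers, inject_layers):
    """Run source and target branches side by side through toy attention layers.

    At layers listed in `inject_layers` (hypothetical: the set chosen using
    layer vitality), the target branch keeps its own queries but attends to
    the source branch's keys and values. Elsewhere it uses plain self-attention.
    """
    src, tgt = source_tokens, target_tokens
    for layer in range(num_layers):
        if layer in inject_layers:
            # Assumption: injection replaces the target K/V with the source K/V.
            new_tgt = attention(tgt, src, src)
        else:
            new_tgt = attention(tgt, tgt, tgt)
        # The source branch always runs ordinary self-attention.
        src = attention(src, src, src)
        tgt = new_tgt
    return tgt


source = [[1.0, 0.0], [1.0, 0.0]]   # toy "source video" tokens
target = [[0.0, 1.0], [0.5, 0.5]]   # toy "target video" tokens

edited = run_layers(source, target, num_layers=1, inject_layers={0})
plain = run_layers(source, target, num_layers=1, inject_layers=set())
```

With injection at layer 0, every target token becomes a weighted average of source values, so `edited` carries source content; with an empty `inject_layers`, the target branch never sees the source at all.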