Video edited by TV-LiVE

TV-LiVE: Training-Free, Text-Guided Video Editing
via Layer Informed Vitality Exploitation

Min-Jung Kim¹, Dongjin Kim¹, Seokju Yun², Jaegul Choo¹

¹KAIST AI ²University of Seoul

Brief Introduction

We present TV-LiVE, a Training-free, text-guided Video editing framework via Layer-informed Vitality Exploitation. We empirically identify vital layers within the video generation model that significantly influence the quality of its outputs. Notably, these layers are closely associated with Rotary Position Embeddings (RoPE). Based on this observation, our method enables both object addition and non-rigid video editing by selectively injecting key and value features from the source model into the corresponding layers of the target model, guided by layer vitality. For object addition, we further identify prominent layers from which we extract mask regions corresponding to the newly added part of the target prompt. We find that the masks extracted from these prominent layers faithfully indicate the region to be edited.
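The key/value injection described above can be illustrated with a toy, pure-Python sketch. It is not the TV-LiVE implementation: the function names, the `inject_layers` set (standing in for whichever layers the vitality analysis selects), and the list-based tensors are all assumptions made for illustration. The only idea it shows is that, in selected layers, the target branch's queries attend to keys and values taken from the source branch.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs


def edit_layer(layer_idx, inject_layers, target_q, target_kv, source_kv):
    """Run one attention layer of the target (edited) branch.

    `inject_layers` is a hypothetical set of layer indices chosen from a
    layer-vitality analysis. In those layers the target queries attend to
    the source branch's keys/values; elsewhere the target's own are used.
    """
    keys, values = source_kv if layer_idx in inject_layers else target_kv
    return attention(target_q, keys, values)


# Toy usage: one query token, one key/value token per branch.
target_q = [[1.0, 0.0]]
source_kv = ([[1.0, 0.0]], [[5.0, 5.0]])
target_kv = ([[1.0, 0.0]], [[-1.0, -1.0]])
injected = edit_layer(3, {3}, target_q, target_kv, source_kv)
untouched = edit_layer(2, {3}, target_q, target_kv, source_kv)
```

In the toy usage, layer 3 is in the injection set, so its output is built from the source values, while layer 2 keeps the target values. How the layers are actually chosen and how the features are combined is specific to the method and is not modeled here.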

Our contributions include:
(1) We propose a training-free, text-guided video editing method for DiT-based video generation models.
(2) We analyze the internal layer properties of DiT-based video generation models.
(3) We outperform recent video editing approaches.
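For the object-addition masks mentioned in the introduction, one plausible reading is sketched below in pure Python. It assumes each prominent layer provides an attention map from video tokens to text tokens, and that a mask is obtained by averaging, over those layers, the attention mass on the newly added prompt tokens and then thresholding. The averaging, the threshold value, and every name here are illustrative assumptions, not details taken from the method.

```python
def extract_mask(attn_maps, added_token_ids, threshold=0.5):
    """Threshold the layer-averaged attention to newly added prompt tokens.

    attn_maps: one matrix per (hypothetical) prominent layer, indexed as
        attn_maps[layer][video_token][text_token].
    added_token_ids: indices of text tokens present only in the target prompt.
    Returns one boolean per video token; True marks a token to be edited.
    """
    num_layers = len(attn_maps)
    num_video_tokens = len(attn_maps[0])
    mask = []
    for i in range(num_video_tokens):
        # Sum attention over the added tokens, then average across layers.
        score = sum(
            sum(layer[i][t] for t in added_token_ids) for layer in attn_maps
        ) / num_layers
        mask.append(score > threshold)
    return mask


# Toy usage: two layers, two video tokens, two text tokens (token 1 is new).
toy_maps = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.0, 1.0]],
]
toy_mask = extract_mask(toy_maps, added_token_ids=[1])
```

Here only the second video token attends strongly to the added text token, so only it is marked for editing.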

Object Addition

Non-rigid Video Editing

Comparison

Object Addition

Source (Original): A person standing still in a snowy landscape

Target (Edit): A person standing still in a snowy landscape holding a snowboard

Videos compared: Original, TV-LiVE, CogV2V, CogInv, BIVDiff, RAVE, VidToMe

Source (Original): A side view of a dog sitting on the beach

Target (Edit): A side view of a dog wearing sunglasses sitting on the beach

Videos compared: Original, TV-LiVE, CogV2V, CogInv, BIVDiff, RAVE, VidToMe

Source (Original): A side view of a child painting at an easel

Target (Edit): A side view of a child painting at an easel wearing a colorful apron

Videos compared: Original, TV-LiVE, CogV2V, CogInv, BIVDiff, RAVE, VidToMe

Source (Original): A turtle is walking on the sand

Target (Edit): A turtle with a leaf on its back is walking on the sand

Videos compared: Original, TV-LiVE, CogV2V, CogInv, BIVDiff, RAVE, VidToMe

Source (Original): A child is reading under a tree

Target (Edit): A child with a flashlight is reading under a tree

Videos compared: Original, TV-LiVE, CogV2V, CogInv, BIVDiff, RAVE, VidToMe

Non-rigid Video Editing

Source (Original): A horse standing still in a meadow

Target (Edit): A horse trotting across the field

Videos compared: Original, TV-LiVE, CogV2V, CogInv, BIVDiff, RAVE, VidToMe

Source (Original): A man standing under a tree with falling leaves

Target (Edit): A man kneeling to pick up a leaf from the ground

Videos compared: Original, TV-LiVE, CogV2V, CogInv, BIVDiff, RAVE, VidToMe

Source (Original): A dog running in the rain

Target (Edit): A dog shaking off the rain

Videos compared: Original, TV-LiVE, CogV2V, CogInv, BIVDiff, RAVE, VidToMe

Source (Original): A skier paused on a mountain slope during a light snowfall

Target (Edit): A skier adjusting their goggles on the snowy mountain slope

Videos compared: Original, TV-LiVE, CogV2V, CogInv, BIVDiff, RAVE, VidToMe

Source (Original): A lamb resting in a meadow

Target (Edit): A lamb resting in a meadow while rolling over onto its side

Videos compared: Original, TV-LiVE, CogV2V, CogInv, BIVDiff, RAVE, VidToMe