ECCV 2026

Goku: A Million-Scale Universal Dataset and
Benchmark for Instruction-Based Video Editing

Sen Liang¹,²⋆, Cong Wang²⋆, Zhentao Yu², Fengbin Guan¹, Zhengguang Zhou², Teng Hu², Youliang Zhang², Yuan Zhou², Xin Li¹, Qinglin Lu², Zhibo Chen¹†

¹University of Science and Technology of China ²Tencent Hunyuan

Dataset Part1 Dataset Part2

Important: If you notice temporal inconsistency between the original and edited videos, it is usually caused by webpage loading. Please refresh the page and try again. On Windows systems, playback problems may also occur for reasons that have not yet been identified.

Abstract

Existing instruction-based video editing datasets commonly focus on single-task appearance editing and fail to meet the complex creative demands of real-world scenarios. To bridge this gap, we present Goku, a large-scale dataset of 2 million high-quality, instruction-aligned video editing pairs. It is the first to extend task boundaries from basic appearance editing to multi-task and structural manipulations (e.g., precise control of subject movement). To tackle the data synthesis challenges inherent in these complex tasks, we design an efficient data synthesis pipeline that decomposes complex edits into controllable sub-problems and introduces a progressive filtering system to ensure data reliability throughout the whole process. Furthermore, we explore which network structures work best on Goku and propose Goku-Edit. To deeply comprehend complex editing instructions, Goku-Edit leverages an MLLM as its text encoder and adopts a decoupled dual-branch design: a dedicated mask branch handles structural control, freeing the main branch for appearance rendering. We also propose Goku-Bench, a comprehensive video editing benchmark with 1,000 human-verified test cases and 7 novel editing-specific metrics. Evaluated on Goku-Bench, Goku-Edit achieves an improvement of up to 8% over other open-source models in instruction following.

Video Editing • Dataset • Benchmark • Instruction Following • Multi-Task Editing

Dataset Samples

Additional Model Results

Comparison with Existing Methods