Holistic Video Understanding is a joint project of KU Leuven, the University of Bonn, KIT, ETH, and the HVU team.
In recent years, the ability of computer systems to classify and analyze online videos has improved significantly. Major advances have been made on specific video recognition tasks, such as action and scene recognition. However, the comprehensive understanding of videos, known as holistic video understanding (HVU), has not received the attention it deserves. Current video understanding systems are specialized and focus on narrow tasks. Real-world applications such as video search engines, media monitoring systems, and environment perception for humanoid robots require the integration of state-of-the-art methods across these tasks. To address this need, we are hosting a workshop on HVU, covering the recognition of scenes, objects, actions, attributes, and events in real-world videos. We also introduce our HVU dataset, organized hierarchically under a semantic taxonomy for holistic video understanding. While most existing datasets focus on human action or sport recognition, our dataset broadens the scope and draws attention to the potential of more comprehensive video understanding solutions. The workshop will gather ideas on multi-label and multi-task recognition in real-world videos, using our dataset to evaluate and showcase research efforts.
The primary goal of this workshop is to establish a comprehensive video benchmark that integrates the recognition of all of these semantic concepts, since a single class label per task often falls short of capturing the full content of a video. Engaging with the world's leading experts on this problem will provide invaluable insights and ideas for all participants. We also invite the community to contribute to the expansion of the HVU dataset, which will drive research on video understanding as a multifaceted problem. As organizers, we look forward to constructive feedback from users and the community on how to improve the benchmark.