Fifth International Workshop on Large Scale Holistic Video Understanding

Holistic Video Understanding is a joint project of KU Leuven, the University of Bonn, KIT, ETH, and the HVU team.

ABOUT THE WORKSHOP

In recent years, the ability of computer systems to classify and analyze online videos has greatly improved. Significant advancements have been made in specific video recognition tasks, such as action and scene recognition. However, the comprehensive understanding of videos, known as holistic video understanding (HVU), has not received the attention it deserves. Current video understanding systems are specialized, focusing on narrow tasks. For real-world applications like video search engines, media monitoring systems, and defining a humanoid robot's environment, integrating state-of-the-art methods is essential.

To address this need, we are hosting a workshop focused on HVU. This workshop will cover recognizing scenes, objects, actions, attributes, and events in real-world videos. We are introducing our HVU dataset, organized hierarchically with a semantic taxonomy for holistic video understanding. While many existing datasets focus on human action or sport recognition, our new dataset aims to broaden the scope and draw attention to the potential for more comprehensive video understanding solutions. Our workshop will gather ideas related to multi-label and multi-task recognition in real-world videos, using our dataset to test and showcase research efforts.

WHAT IS OUR GOAL?

The primary goal of this workshop is to create a comprehensive video benchmark that integrates the recognition of all semantic concepts. Single class labels per task often fall short in capturing the full content of a video. Engaging with the world’s leading experts on this issue will provide invaluable insights and ideas for all participants. We also invite the community to contribute to the expansion of the HVU dataset, which will drive research in video understanding as a multifaceted problem. As organizers, we look forward to receiving constructive feedback from users and the community on how to enhance the benchmark.

TOPICS

Large-scale video understanding
Multi-modal learning from videos
Multi-concept recognition from videos
Multi-task deep neural networks for videos
Learning holistic representation from videos
Weakly supervised learning from web videos
Object, scene and event recognition from videos
Unsupervised video visual representation learning
Unsupervised and self-supervised learning with videos

WORKSHOP PROGRAM

Room: Summit 429
June 17, 2024 (AM, in-person)

Time (PDT)	Description	Speaker	Title
08:35	Opening Remarks
08:40	Invited Speaker 1:	Angela Yao	Part I VideoQA in the Era of LLMs
09:15	Invited Speaker 2:	Lu Yuan	Revolutionizing Computer Vision: The Power of Small Vision Language Foundation Models
09:50	Invited Speaker 3:	Cees Snoek	Learning to Generalize in Video Space and Time
10:25	Invited Speaker 4:	Yale Song	Procedural Activity Understanding
11:00	Oral Presentations	Details Below
12:00	Closing Remarks

Papers

1. From Video Generation to Embodied AI; Ruoshi Liu · Carl Vondrick
2. MoReVQA: Exploring Modular Reasoning Models for Video Question Answering; Juhong Min · Shyamal Buch · Arsha Nagrani · Minsu Cho · Cordelia Schmid
3. Learning from One Continuous Video Stream; Joao Carreira · Michael King · Viorica Patraucean · Dilara Gokay · Catalin Ionescu · Yi Yang · Daniel Zoran · Joseph Heyward · Carl Doersch · Yusuf Aytar · Dima Damen · Andrew Zisserman
4. PEEKABOO: Interactive Video Generation via Masked-Diffusion; Yash Jain · Anshul Nasery · Vibhav Vineet · Harkirat Behl
5. Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language; Mark Hamilton · Andrew Zisserman · John Hershey · William Freeman
6. MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding; Bo He · Hengduo Li · Young Kyun Jang · Menglin Jia · Xuefei Cao · Ashish Shah · Abhinav Shrivastava · Ser-Nam Lim
