Seventh International Workshop on Large Scale Holistic Video Understanding

Holistic Video Understanding is a joint project of KU Leuven, University of Bonn, KIT, ETH, and the HVU team.

WHAT IS OUR GOAL?

This workshop aims to advance the field of video understanding by fostering discussions around holistic and generalist video foundation models. Building upon the Holistic Video Understanding (HVU) initiative and dataset introduced in 2019, we have successfully organized eight HVU workshops and tutorials at top-tier venues such as CVPR and ICCV, uniting researchers, practitioners, and students from around the world. These efforts have played a central role in moving the community beyond narrow action recognition tasks toward multi-faceted, semantic, and generalist video understanding.

With the emergence of large-scale foundation models and video large language models (Video-LLMs), the landscape of video understanding is rapidly evolving. These models enable unified reasoning across spatial, temporal, and multimodal dimensions, yet introduce new challenges in scalability, efficiency, interpretability, and responsible deployment.

The HVU Workshop 2025 will provide a platform to explore these frontiers, discussing topics such as multimodal representation learning, long-context reasoning, evaluation of general-purpose video systems, efficient adaptation and scaling laws, and the ethical and societal implications of video AI. Our goal is to bring together a diverse and inclusive community to define the next chapter of holistic, generalist, and responsible video understanding.

ABOUT THE WORKSHOP

In recent years, the rise of large-scale foundation models and video large language models (Video-LLMs) has transformed how we approach video understanding. Instead of task-specific networks, we now see generalist models capable of reasoning over long temporal horizons, aligning multiple modalities (vision, language, and audio), and adapting to a wide range of downstream tasks. These developments open new opportunities but also introduce challenges in scalability, efficiency, evaluation, and responsible deployment. In this edition of the HVU Workshop, we aim to bring together researchers and practitioners to discuss the advances and open questions in this new era of Video Foundation Models. Building on the legacy of holistic video understanding, this workshop focuses on unified architectures, training methodologies, and evaluation paradigms for generalist video models.

TOPICS

Foundation models for video understanding and reasoning
Multimodal and temporal alignment in video-language models
Scaling laws, architectural design, and training strategies for video LLMs
Evaluation and benchmarking of general-purpose video models
Long-context modeling and memory mechanisms for video sequences
Efficient adaptation and transfer learning in large-scale video models
Synthetic data generation and curation for video foundation models
Societal, ethical, and safety implications of video LLMs
Responsible deployment and bias evaluation in multimodal models

WORKSHOP PROGRAM

Room: Don Alberto 3
Monday 1 December 2025 (In-person)

Local Time	Description	Speaker
08:50	Opening Remarks
09:00	Invited Speaker 1:	Cees Snoek
09:30	Invited Speaker 2:	Roozbeh Mottaghi
10:00	Coffee Break
10:30	Invited Speaker 3:	Shiry Ginosar
11:00	Invited Speaker 4:	Hilde Kuehne
11:30	Contributed Oral Presentations
12:30	Closing Remarks

Contributed Oral Talks

1. VideoWeave: A Data-Centric Approach for Efficient Video Understanding
2. Enhancing Temporal Understanding in Video-LLMs through Stacked Temporal Attention in Vision Encoders
3. DynaStride: Dynamic Stride Windowing with MMCoT for Multi-Scene Captioning
4. AdCare-VLM: Towards a Unified and Pre-aligned Latent Representation for Healthcare Video Understanding

CALL FOR PAPERS

Prospective authors are invited to submit a regular paper of previously unpublished work (NeurIPS 2025 paper format) or an extended abstract of a published work. The review process will be double-blind. All submissions will be peer-reviewed by the international program committee. Accepted papers will be presented as posters or contributed talks. Accepted extended abstracts will also be presented at the poster session or as talks.

Workshop

December 1st

Submission

*You can submit papers in two different formats:

We will accept papers that have not been published elsewhere as well as papers that have recently been published elsewhere, including at NeurIPS 2025. Accepted papers will appear in NeurIPS proceedings. For submissions of previously unpublished papers, we will follow a double-blind review process: authors do not know the names of the reviewers of their papers, and reviewers do not know the names of the authors. Authors must follow the NeurIPS 2025 submission policy. Papers are limited to eight pages, including figures and tables, in the NeurIPS style. Additional pages containing only cited references are allowed. Supplementary materials and appendices are allowed but will be considered optional for reviewers. Please refer to the NeurIPS 2025 website for more information. Papers that are not properly anonymized, do not use the template, or have more than eight pages (excluding references) will be rejected without review. Accepted papers must follow the NeurIPS 2025 camera-ready format and must be limited to 4-8 pages, excluding references.
For submissions of papers that have been published or accepted for publication in a recent venue, we will follow a single-blind review process: authors do not know the names of the reviewers of their papers, but reviewers do know the names of the authors. Authors MUST indicate, in the footnote section on the first page of their submission, the venue in which their paper has been or will be published. For example, if the paper will appear at NeurIPS 2025, the submission should include a footnote on the first page stating "To appear at NeurIPS 2025".

Submit Your Work

7th International Workshop on Large Scale Holistic Video Understanding: Toward Video Foundation Models

Header Image: Gobierno CDMX, CC0, via Wikimedia Commons

Paper Submission Starts*

October 23

Paper Submission Deadline

November 4

Notification of Acceptance

November 5th

Camera Ready Submission

November 15
