7th International Workshop on Large Scale Holistic Video Understanding: Toward Video Foundation Models


In Conjunction with NeurIPS 2025
Header Image: Gobierno CDMX, CC0, via Wikimedia Commons

Holistic Video Understanding is a joint project of KU Leuven, the University of Bonn, KIT, ETH, and the HVU team.

WHAT IS OUR GOAL?


This workshop aims to advance the field of video understanding by fostering discussions around holistic and generalist video foundation models. Building upon the Holistic Video Understanding (HVU) initiative and dataset introduced in 2019, we have organized eight HVU workshops and tutorials at top-tier venues such as CVPR and ICCV, uniting researchers, practitioners, and students from around the world. These efforts have played a central role in moving the community beyond narrow action recognition tasks toward multi-faceted, semantic, and generalist video understanding.

With the emergence of large-scale foundation models and video large language models (Video-LLMs), the landscape of video understanding is evolving rapidly. These models enable unified reasoning across spatial, temporal, and multimodal dimensions, yet they introduce new challenges in scalability, efficiency, interpretability, and responsible deployment.

The HVU Workshop 2025 will provide a platform to explore these frontiers, with topics including multimodal representation learning, long-context reasoning, evaluation of general-purpose video systems, efficient adaptation and scaling laws, and the ethical and societal implications of video AI. Our goal is to bring together a diverse and inclusive community to define the next chapter of holistic, generalist, and responsible video understanding.

ABOUT THE WORKSHOP


In recent years, the rise of large-scale foundation models and video large language models (Video-LLMs) has transformed how we approach video understanding. Instead of task-specific networks, we now see generalist models capable of reasoning over long temporal horizons, aligning multiple modalities (vision, language, and audio), and adapting to a wide range of downstream tasks. These developments open new opportunities but also introduce challenges in scalability, efficiency, evaluation, and responsible deployment. In this edition of the HVU Workshop, we aim to bring together researchers and practitioners to discuss the advances and open questions in this new era of Video Foundation Models. Building on the legacy of holistic video understanding, this workshop focuses on unified architectures, training methodologies, and evaluation paradigms for generalist video models.

TOPICS


  • Foundation models for video understanding and reasoning
  • Multimodal and temporal alignment in video-language models
  • Scaling laws, architectural design, and training strategies for video LLMs
  • Evaluation and benchmarking of general-purpose video models
  • Long-context modeling and memory mechanisms for video sequences
  • Efficient adaptation and transfer learning in large-scale video models
  • Synthetic data generation and curation for video foundation models
  • Societal, ethical, and safety implications of video LLMs
  • Responsible deployment and bias evaluation in multimodal models

SPEAKERS


TBA

WORKSHOP PROGRAM


Room: Don Alberto 3
Monday 1 December 2025 (In-person)

Local Time   Description                          Speaker
08:50        Opening Remarks
09:00        Invited Speaker 1                    TBA
09:30        Invited Speaker 2                    TBA
10:00        Poster Session and Coffee Break
10:30        Invited Speaker 3                    TBA
11:00        Invited Speaker 4                    TBA
11:30        Contributed Oral Presentations
12:00        Panel Discussion
12:45        Closing Remarks

CALL FOR PAPERS


Prospective authors are invited to submit either a regular paper presenting previously unpublished work (in the NeurIPS 2025 paper format) or an extended abstract of a published work. All submissions will be peer-reviewed by the international program committee: unpublished work under a double-blind process and previously published work under a single-blind process, as detailed below. Accepted papers and extended abstracts will be presented as posters or contributed talks.

Workshop

December 1st

Submission


You can submit papers in two different formats:

  1. We accept papers that have not been published elsewhere; papers already published or accepted elsewhere, including at NeurIPS 2025, should follow format 2 below. Accepted papers will appear in the NeurIPS proceedings. These submissions follow a double-blind review process: authors do not know the names of their reviewers, and reviewers do not know the names of the authors. Authors must follow the NeurIPS 2025 submission policy. Papers are limited to eight pages, including figures and tables, in the NeurIPS style; additional pages containing only cited references are allowed. Supplementary materials and appendices are allowed, but reviewers are not required to consider them. Please refer to the NeurIPS 2025 website for more information. Papers that are not properly anonymized, do not use the template, or exceed eight pages (excluding references) will be rejected without review. Accepted papers must follow the NeurIPS 2025 camera-ready instructions, with a length of 4 to 8 pages excluding references.
  2. Papers that have been published or accepted for publication at a recent venue follow a single-blind review process: authors do not know the names of their reviewers, but reviewers know the names of the authors. Authors MUST indicate, in a footnote on the first page of their submission, the venue at which their paper has been or will be published. For example, if the paper will appear at NeurIPS 2025, the submission should include a first-page footnote reading "To appear at NeurIPS 2025".

Submit Your Work

ORGANIZERS


CONTACT