First International Workshop on Large Scale Holistic Video Understanding


In Conjunction with ICCV 2019, Seoul, Korea

Holistic Video Understanding is a joint project of KU Leuven, the University of Bonn, KIT, ETH, and the HVU team.

ABOUT THE WORKSHOP


In recent years, we have seen tremendous progress in the ability of computer systems to classify video clips taken from the Internet and to analyze human actions in videos. A large body of work in video recognition focuses on specific video understanding tasks, such as action recognition and scene understanding. While great results have been achieved on these individual tasks, holistic video understanding has received far less attention as a problem in its own right. Current systems are experts in specific sub-fields of the general video understanding problem, yet real-world applications, such as analyzing the multiple concepts present in a video for video search engines and media monitoring systems, or providing an appropriate description of the surrounding environment of a humanoid robot, require a combination of the current state-of-the-art methods.

In this workshop, we therefore introduce holistic video understanding as a new challenge for the video understanding community. This challenge focuses on the recognition of scenes, objects, actions, attributes, and events in real-world user-generated videos. To support this challenge, we also introduce a new dataset, the Holistic Video Understanding (HVU) dataset, which is organized hierarchically in a semantic taxonomy of holistic video understanding. Almost all existing video datasets recorded under real-world conditions target human action or sports recognition, so we hope that our new dataset will draw the vision community's attention to holistic video understanding and encourage more interesting solutions. The workshop is tailored to bringing together ideas around multi-label and multi-task recognition of different semantic concepts in real-world videos, and these research efforts can be evaluated on our new dataset.

WHAT IS OUR GOAL?


The main objective of the workshop is to establish a video benchmark integrating the joint recognition of all semantic concepts, as a single class label per task is often not sufficient to describe the holistic content of a video. The planned panel discussion with the world's leading experts on this problem will provide fruitful input and a source of ideas for all participants. Further, we invite the community to help extend the HVU dataset, which will spur research in video understanding as a comprehensive, multi-faceted problem. As organizers, we expect to receive valuable feedback from users and from the community on how to improve the benchmark.

TOPICS


  • Large scale video understanding
  • Multi-modal learning from videos
  • Multi-concept recognition from videos
  • Multi-task deep neural networks for videos
  • Learning holistic representation from videos
  • Weakly supervised learning from web videos
  • Object, scene and event recognition from videos
  • Unsupervised video visual representation learning
  • Unsupervised and self-supervised learning with videos

SPEAKERS


  • Juan Carlos Niebles
  • Raquel Urtasun
  • Du Tran
  • David Ross

AWARDS


Best Paper Award


CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning

Best Poster Award


Skip-Clip: Self-Supervised Spatiotemporal Representation Learning by Future Clip Order Ranking
Two NVIDIA TITAN RTX GPUs were awarded to the best paper and the best poster.

WORKSHOP PROGRAM


October 27th, half day (PM), Seoul, Korea - COEX Convention Center

Time Description Speaker/Paper ID
13:30 Opening remarks
13:40 Invited Speaker 1: Juan Carlos Niebles
14:10 Invited Speaker 2: Raquel Urtasun
14:40 Coffee and Posters
15:25 4 Oral Talks: 23, 16, 8, 22
16:15 Invited Speaker 3: Du Tran
16:45 Invited Speaker 4: David Ross
17:15 3 Oral Talks: 7, 4, 9
17:50 Conclusion and Awards

Oral Papers


4. Recurrent Convolutions for Causal 3D CNNs; Gurkirt Singh, Fabio Cuzzolin
7. Video Representation Learning by Dense Predictive Coding; Tengda Han, Weidi Xie, Andrew Zisserman
8. Towards Segmenting Anything That Moves; Achal D Dave, Pavel Tokmakov, Deva Ramanan
9. Video-Text Compliance: Activity Verification Based on Natural Language Instructions; Mayoore Jaiswal et al.
16. HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips; Antoine Miech et al.
22. Deep Multimodal Feature Encoding for Video Ordering; Vivek Sharma, Makarand Tapaswi, Rainer Stiefelhagen
23. CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning; Rohit Girdhar, Deva Ramanan

Poster Session


  • Recurrent Convolutions for Causal 3D CNNs; Gurkirt Singh, Fabio Cuzzolin
  • Level Selector Network for Optimizing Accuracy-Specificity Trade-offs; Ahsan Iqbal, Jürgen Gall
  • End-to-End Video Captioning; Silvio Olivastri, Gurkirt Singh, Fabio Cuzzolin
  • Video Representation Learning by Dense Predictive Coding; Tengda Han, Weidi Xie, Andrew Zisserman
  • Towards Segmenting Anything That Moves; Achal D Dave, Pavel Tokmakov, Deva Ramanan
  • Video-Text Compliance: Activity Verification Based on Natural Language Instructions; Mayoore Jaiswal, Frank Liu, Anupama Jagannathan, Anne Gattiker, Inseok Hwang, Jinho Lee, Matt Tong, Sahil Dureja, Soham Shah, Peter Hofstee, Valerie Chen, Suvadip Paul, Rogerio Feris
  • Interpretable Spatio-temporal Attention for Video Action Recognition; Lili Meng
  • Markov Decision Process for Video Generation; Vladyslav Yushchenko, Nikita Araslanov, Stefan Roth
  • Next-flow: Hybrid multi-tasking with next-frame prediction to boost optical-flow estimation in the wild; Nima Sedaghat, Mohammadreza Zolfaghari
  • Use What You Have: Video retrieval using representations from collaborative experts; Yang Liu, Samuel Albanie, Arsha Nagrani, Andrew Zisserman
  • HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips; Antoine Miech, Dimitri Zhukov, Jean-Baptiste Alayrac, Makarand Tapaswi, Ivan Laptev, Josef Sivic
  • Coupled Recurrent Network (CRN); Lin Sun
  • Deep Multimodal Feature Encoding for Video Ordering; Vivek Sharma, Makarand Tapaswi, Rainer Stiefelhagen
  • CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning; Rohit Girdhar, Deva Ramanan
  • Class-Agnostic Object Tracking with a Focus on the Object; Achal D Dave, Pavel Tokmakov, Cordelia Schmid, Deva Ramanan
  • Skip-Clip: Self-Supervised Spatiotemporal Representation Learning by Future Clip Order Ranking; Alaaeldin M El-Nouby, Shuangfei Zhai, Graham Taylor, Josh Susskind

CALL FOR PAPERS


Prospective authors are invited to submit a regular paper of previously unpublished work (ICCV paper format) or an extended abstract of published or ongoing work. The review process will be double blind. All submissions will be peer-reviewed by the international program committee. Accepted papers will be presented as posters or contributed talks; they will be considered non-archival and published as Open Access versions provided by the Computer Vision Foundation. Accepted extended abstracts will be presented at the poster session.

The best paper and the best poster will receive prizes.

Submission


You can submit papers in three different formats:

  1. We will accept papers that have not been published elsewhere, as well as papers that have been recently published elsewhere, including at ICCV 2019. Accepted papers will appear in the ICCV proceedings. Please follow the ICCV 2019 camera-ready format as per the instructions given here, but limit your paper to 4-8 pages excluding references. For these submissions we will follow a double-blind review process, in which authors do not know the names of the reviewers of their papers, and reviewers do not know the names of the authors. The deadline for paper submission is 1st August 2019. Notification to the authors by 15th August 2019.
  2. For submissions of papers that have been published or accepted for publication at a recent venue, we will follow a single-blind review process, in which authors do not know the names of the reviewers of their papers, but reviewers do know the names of the authors. Authors MUST indicate, in a footnote on the first page of their submission, at which venue their paper has been or will be published. For example, if the paper will appear at ICCV 2019, the submission should include a footnote on the first page stating "To appear at 2019 IEEE/CVF International Conference on Computer Vision". The deadline for paper submission is 1st September 2019. Notification to the authors by 15th September 2019.
  3. Authors can also submit a paper of 4-8 pages (excluding references), which will be peer-reviewed. We will follow a single-blind review process. These papers will not be included in the proceedings. Accepted papers will be presented as posters or contributed talks. Authors of accepted papers will be asked to post their submissions on arXiv, and the workshop website will provide links to the accepted papers on arXiv. Accepted papers will be considered non-archival and may be submitted elsewhere (modified or not). The deadline for paper submission is 1st September 2019. Notification to the authors by 15th September 2019.
All papers must be formatted using the ICCV template style, which can be obtained at ICCV style.

Submit Your Work

Program Committee


  • Cees Snoek (UvA)
  • Ivan Laptev (INRIA)
  • Chen Huang (Apple)
  • Mubarak Shah (UCF)
  • Efstratios Gavves (UvA)
  • Noureldien Hussein (UvA)
  • Suman Saha (ETH Zürich)
  • Jan van Gemert (TU Delft)
  • Hamed Pirsiavash (UMBC)
  • Silvia-Laura Pintea (TU Delft)
  • Du Tran (Facebook Research)
  • Dima Damen (University of Bristol)
  • Rohit Girdhar (Facebook Research)
  • Jack Valmadre (University of Oxford)
  • Hilde Kuehne (MIT-IBM Watson Lab)
  • Hakan Bilen (University of Edinburgh)
  • Alexander Richard (University of Bonn)
  • Jakub Tomczak (Qualcomm AI Research)
  • Christoph Feichtenhofer (Facebook Research)
  • Saquib Sarfraz (KIT)
  • Josh Susskind (Apple)
  • Mohammad Sabokrou (IPM)
  • Makarand Tapaswi (INRIA)
  • Andrew Owens (UC Berkeley)
  • Ross Goroshin (Google Brain)
  • Tinne Tuytelaars (KU Leuven)
  • Miguel Angel Bautista (Apple)
  • Chen Sun (Google Research)
  • David Ross (Google Research)
  • Limin Wang (Nanjing University)
  • Yale Song (Microsoft Cloud & AI)
  • Matt Feiszli (Facebook Research)
  • Joao Carreira (Google DeepMind)
  • Philippe Weinzaepfel (NAVER Labs)
  • Sourish Chaudhuri (Google Research)
  • Basura Fernando (A*STAR Singapore)
  • Angela Yao (National University of Singapore)

ORGANIZERS


SPONSORS



Facebook AI Research

Apple


Sensifai

CONTACT