
VBS 2023 : Video Browser Showdown 2023


When Jan 9, 2023 - Jan 12, 2023
Where Bergen, Norway
Submission Deadline Oct 15, 2022
Notification Due Oct 27, 2022
Final Version Due Nov 7, 2022
Categories    multimedia   video retrieval   video search   competition / challenge

Call For Papers

*** Video Browser Showdown 2023 (VBS 2023) ***

January 9-12, 2023 in Bergen, Norway

In conjunction with the 29th International Conference on MultiMedia Modeling (MMM 2023)

The VBS is an international video content search competition, held annually since 2012, that evaluates the state of the art of interactive video retrieval systems. As in previous years, VBS2023 will be part of the International Conference on MultiMedia Modeling 2023 (MMM2023) in Bergen, Norway, and organized as a special side event to the Welcome Reception. It will be a moderated session in which participants solve Known-Item Search (KIS) and Ad-Hoc Video Search (AVS) tasks. Tasks are issued live as presentations of scenes of interest, either as a (randomly selected) visual clip or as a textual description. The goal is to find correct segments as fast as possible (exactly one segment for KIS, many segments for AVS) and to submit the segment description (video id and frame number) to the VBS server (DRES), which evaluates the correctness of submissions. The VBS aims at pushing research on large-scale video retrieval systems that are effective, fast, and easy to use for content search scenarios that are truly relevant in practice.

This year we plan several changes to make the VBS competition even more challenging, since visual content recognition has improved significantly over the last few years. These changes will include shorter target scenes for KIS tasks (e.g., only 3 seconds instead of 20) as well as a dedicated session issuing tasks on a sub-dataset with highly redundant content (from scuba diving).

VBS2023 will use the V3C1+V3C2 dataset (from the Vimeo Creative Commons Collection) in collaboration with NIST, i.e., TRECVID 2022 (with its Ad-Hoc Video Search (AVS) task), as well as a marine video (underwater / scuba diving) dataset. V3C1 consists of 7475 video files, amounting to 1000 hours of video content (1082659 predefined segments) and 1.3 TB in size, and was also used in previous years. V3C2 contains an additional 9760 video files, amounting to 1300 hours of video content (1425454 predefined segments) and 1.6 TB in size. To download the dataset (which is provided by NIST), please complete this data agreement form and send a scan to [address missing] with CC to [address missing] and [address missing]. You will then be provided with a link for downloading the data.
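
As a quick sanity check on the figures above, the following sketch adds up the two parts of the collection and derives the average length of a predefined segment. It uses only the numbers quoted in this call; the hour counts are rounded, so the averages are approximate. The names `Part`, `combined`, and `mean_segment_seconds` are made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    """Figures for one dataset part, copied from the call text."""
    name: str
    videos: int
    hours: int        # rounded value quoted in the call
    segments: int
    terabytes: float


V3C1 = Part("V3C1", 7475, 1000, 1082659, 1.3)
V3C2 = Part("V3C2", 9760, 1300, 1425454, 1.6)


def mean_segment_seconds(part: Part) -> float:
    """Average predefined-segment length in seconds (approximate)."""
    return part.hours * 3600 / part.segments


def combined(parts) -> Part:
    """Sum the per-part figures into one record for the whole collection."""
    return Part(
        name="+".join(p.name for p in parts),
        videos=sum(p.videos for p in parts),
        hours=sum(p.hours for p in parts),
        segments=sum(p.segments for p in parts),
        terabytes=round(sum(p.terabytes for p in parts), 1),
    )


if __name__ == "__main__":
    total = combined([V3C1, V3C2])
    print(total)
    for part in (V3C1, V3C2, total):
        print(f"{part.name}: ~{mean_segment_seconds(part):.1f} s per segment")
```

With these figures the combined collection comes to 17235 videos, about 2300 hours, and 2508113 predefined segments, with an average segment length of a little over 3 seconds.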

**NEW**: The marine video (underwater / scuba diving) dataset has been provided by Prof. Sai-Kit Yeung (many thanks!) and can be downloaded directly from this website (please contact Tan Sang Ha for the username and password).

The VBS consists of an expert session and a novice session. In the expert session, the developers of the systems themselves try to solve different types of content search queries that are issued in an ad-hoc manner. Although the dataset is available to the researchers several months before the actual competition, the queries are unknown in advance and issued on-site. In the novice session, volunteers from the MMM conference audience (without help from the experts) are required to solve another set of tasks. This should ensure that the interactive video retrieval tools improve not only in retrieval performance but also in usability (i.e., ease of use).

For Known-Item Search (KIS) tasks, a single, unique video clip is randomly selected from the dataset and presented visually on the on-site projector. The participants need to find exactly the single instance presented, as fast as possible. A variation of this task is textual KIS, where the searched segment is described only by text instead of being shown. For Ad-hoc Video Search (AVS) tasks, a rather general description of many shots is presented (e.g., "Find all shots showing cars in front of trees") and participants need to find as many correct examples (instances) as fast as possible. Each query has a time limit of 5 minutes and is rewarded on success with a score that depends on several factors: the required search time, the number of false submissions (which are penalized), and, for AVS tasks, the number of different instances found. For AVS tasks, the score also takes into account how many different "ranges" were submitted. For example, many different but temporally close shots in the same video count much less than several different shots from different videos.
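
The call names the factors that enter the score but not the formula. The sketch below only illustrates how such factors could be combined. The point scale, the linear time decay, the fixed penalty per false submission, and the frame gap used to merge shots into a "range" are all assumptions made for this example and are not the official VBS/DRES scoring.

```python
from collections import defaultdict

TIME_LIMIT_S = 5 * 60      # from the call: 5 minutes per query
MAX_POINTS = 100.0         # assumption: illustrative point scale
MIN_POINTS_ON_TIME = 50.0  # assumption: points left right at the time limit
WRONG_PENALTY = 10.0       # assumption: deduction per false submission
RANGE_GAP_FRAMES = 250     # assumption: closer shots share one range


def kis_score(seconds_used: float, wrong_submissions: int, solved: bool) -> float:
    """Hypothetical KIS score: linear decay over time, minus penalties."""
    if not solved or seconds_used > TIME_LIMIT_S:
        return 0.0
    decay = (MAX_POINTS - MIN_POINTS_ON_TIME) * seconds_used / TIME_LIMIT_S
    return max(0.0, MAX_POINTS - decay - WRONG_PENALTY * wrong_submissions)


def count_ranges(correct_submissions) -> int:
    """Count distinct "ranges" among correct AVS submissions.

    Each submission is a (video_id, frame) pair. Within one video, a frame
    that lies within RANGE_GAP_FRAMES of the previous frame is merged into
    the same range; frames in different videos never share a range.
    """
    frames_by_video = defaultdict(list)
    for video_id, frame in correct_submissions:
        frames_by_video[video_id].append(frame)

    ranges = 0
    for frames in frames_by_video.values():
        frames.sort()
        ranges += 1  # the first frame of a video always opens a range
        for prev, cur in zip(frames, frames[1:]):
            if cur - prev > RANGE_GAP_FRAMES:
                ranges += 1
    return ranges
```

Under these assumed constants, `kis_score(150, 1, True)` gives 65.0. Three correct shots at frames 100, 180, and 260 of one video count as a single range, while three shots from three different videos count as three.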

**NEW**: There will be a dedicated session that tests tasks issued for the scuba-diving dataset, so that VBS systems internally may switch to this dataset exclusively (without V3C1+V3C2).

The VBS uses its own VBS Server (the DRES server [1]) to evaluate found segments for correctness. Therefore, all participants need to connect to the server via a dedicated network (typically Ethernet with CAT-5) and submit found segments to the server via a simple HTTP-like protocol (if VBS2023 becomes a virtual/hybrid event, DRES will run as a public service). The server is connected to a projector and presents the current scores of all teams live, in addition to the task descriptions. The server as well as example tasks from previous years are provided here [11].
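
A submission pairs a video id with a frame number and is sent to the server over HTTP. The following sketch only builds such a request URL and does not contact any server. The host name, the `/submit` path, and the parameter names `item`, `frame`, and `session` are placeholders chosen for this illustration and may differ from the real interface, so consult the DRES documentation before relying on them.

```python
from urllib.parse import urlencode


def build_submission_url(base_url: str, video_id: str, frame: int, session: str) -> str:
    """Build a GET URL for one submission (hypothetical path and parameters)."""
    if frame < 0:
        raise ValueError("frame number must be non-negative")
    # Parameter names are assumptions for this sketch, not a confirmed API.
    query = urlencode({"item": video_id, "frame": frame, "session": session})
    return f"{base_url.rstrip('/')}/submit?{query}"


if __name__ == "__main__":
    # "dres.example" and "team-token" are made-up values.
    print(build_submission_url("http://dres.example/", "07475", 1234, "team-token"))
```

Keeping URL construction separate from the network call makes it easy to log and test submissions before the competition starts.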

Anyone with an exploratory video search tool that allows for retrieval, interactive browsing, and exploration of a video collection may participate. There are no restrictions in terms of allowed features, except that recording the presentation screen during the competition is disallowed. That means that, in addition to interactive content search, you can use any automatic content search as well.

To give new teams an easy entry, we provide results of content analysis to all teams. The V3C1 and V3C2 datasets already come with segmentation information and include shot boundaries as well as keyframes. Moreover, we provide resulting data for V3C1 and V3C2 from different content analysis steps (e.g., color, faces, text, detected ImageNet classes, etc.). The analysis data is available at [2] and described here [3] and, for V3C2, here [4]. The ASR data has also been released here [5] (many thanks to Luca Rossetto et al.)! Moreover, the SIRET team shared their shot detection network too [6] (many thanks to Jakub Lokoc and his team)!

If you want to join the VBS competition but do not have enough resources to build a new system from scratch, you can start with and extend a simple lightweight version of SOMHunter, the winning system at VBS 2020. The system is provided with all the necessary metadata for the V3C1 dataset [7].
vitrivr (overall winner of VBS 2021) is a modular open-source multimedia retrieval stack that has been participating in VBS for several years and provides a solid basis for research and development in the area of multimedia management and retrieval. Its flexible architecture allows it to serve as a platform for the development of new retrieval approaches. The entire stack is available at [8].

Moreover, the SIRET team provides a state-of-the-art shot boundary detection network TransNet V2 in [9] (see paper here: [10]).

To participate, please submit an extended demo paper (4-6 pages in Springer LNCS format) by the deadline (**October 15, 2022**, extended from October 8) via the MMM 2023 Submission System (please select the "Video Browser Showdown" track). The submission should include a detailed description of the video search tool (including a screenshot of the tool) and describe how it supports interactive search in video data. Submissions will be peer-reviewed to ensure maximum quality. Accepted papers will be published in the proceedings of the MMM conference and should also be presented as posters during the VBS session.

We plan to write a joint journal paper after the VBS competition, to which each participating team should contribute. The winning team will have the honor of leading the journal paper (as main author).

Paper submission: October 15, 2022 (extended)
Notification of acceptance: October 27, 2022
Camera-ready and author registration: November 7, 2022

- Klaus Schoeffmann, Klagenfurt University, Austria
- Werner Bailer, Joanneum Research, Austria
- Jakub Lokoc, Charles University, Czech Republic
- Cathal Gurrin, Dublin City University, Ireland

