FG 2020 : IEEE FG 2020 Chalearn Looking at People Workshop and Challenge on Identity-preserved Human Detection (IPHD)
Call For Papers
IEEE FG 2020 Chalearn Looking at People Challenge on Identity-preserved Human Detection (IPHD) at
The competition is now up and running!
For the competition, we ask participants to perform human detection in depth and/or thermal images. Human detection in images and video is a challenging computer vision problem with applications in human-computer interaction, patient monitoring, surveillance, and autonomous driving, to mention just a few. In some applications, however, preserving people's privacy is a major concern for both the users and the companies or institutions involved. Most notably, the unintended revelation of subjects' identities is perhaps the greatest peril. While video data from RGB cameras are massively available to train powerful detection models, the nature of these data may also allow unauthorized third parties who gain access to them to try to identify the observed subjects. We argue that moving away from visual sensors that capture identity information in the first place is the safest bet. However, the lack of data from these more privacy-safe sensors limits the ability to train large deep-learning models, which in turn negatively affects the popularity of these sensors.
For this competition, we offer a freshly recorded multimodal image dataset consisting of over 100K spatiotemporally aligned depth-thermal images of different people recorded in public and private spaces: street, university (cloister, hallways, and rooms), a research center, libraries, and private houses. In particular, we used a RealSense D435 for depth and a FLIR Lepton v3 for thermal. Given the noisy nature of such commercial depth cameras and the low resolution of the thermal images, the subjects are hardly identifiable. The dataset contains a mix of close-range in-the-wild pedestrian scenes and indoor scenes with people performing scripted scenarios, thus covering a larger space of poses, clothing, illumination, background clutter, and occlusions. The scripted scenarios include basic actions such as sitting on the sofa, lying on the floor, interacting with kitchen appliances, cooking, eating, working on the computer, talking on the phone, and so on. The camera is not necessarily static; it is sometimes held by a person. The data were originally collected as videos of different durations (from seconds to hours), skipping frames in which no movement was observed. The frame ordering has been removed to make it an image dataset (the only information provided will be the video ID).
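Since the video ID is the only grouping information provided, participants who build their own validation split may want to keep all frames of a video on the same side of the split, so that near-identical frames do not appear in both sets. The following is a minimal sketch of that idea, not part of the official challenge tooling. The function name `split_by_video`, the `(frame_name, video_id)` pair layout, and the 20% validation fraction are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_by_video(frames, val_fraction=0.2, seed=0):
    """Split (frame_name, video_id) pairs so that no video is in both sets.

    `frames` is a hypothetical list of pairs; the real dataset layout
    may differ.
    """
    # Group frame names under the video they came from.
    by_video = defaultdict(list)
    for frame_name, video_id in frames:
        by_video[video_id].append(frame_name)

    # Shuffle the video IDs (not the frames) with a fixed seed.
    video_ids = sorted(by_video)
    random.Random(seed).shuffle(video_ids)

    # Hold out a fraction of the videos, at least one.
    n_val = max(1, round(len(video_ids) * val_fraction))
    val_ids = set(video_ids[:n_val])

    train = [f for v in video_ids if v not in val_ids for f in by_video[v]]
    val = [f for v in video_ids if v in val_ids for f in by_video[v]]
    return train, val, val_ids
```

Splitting on video IDs rather than on individual frames is a design choice: it gives a more pessimistic, and usually more realistic, estimate of how a detector behaves on unseen scenes.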
There are three tracks associated with this contest:
1. Depth track. Given the provided depth frames (and bounding box groundtruth annotations), the participants will be asked to develop their depth-based human detection method. Depth cameras are cost-effective devices that provide geometric information of the scene at a resolution and frame acquisition speed comparable to those of RGB cameras. The downside is their noisiness at large distances. The method developed by the participants will need to output, for each frame, a list of bounding boxes (and their confidence scores), one for each person in the frame. The performance on depth image-based human detection will be evaluated.
2. Thermal track. Given the provided thermal frames (and bounding box groundtruth annotations), the participants will be asked to develop their thermal-based human detection method. Thermal cameras provide temperature readings from the scene. They are less noisy than depth cameras, but at a comparable price they offer a much lower image resolution. The method developed by the participants will need to output, for each frame, a list of bounding boxes (and their confidence scores), one for each person in the frame. The performance on thermal image-based human detection will be evaluated.
3. Depth-Thermal Fusion track. Given the provided aligned depth-thermal frames (and bounding box groundtruth annotations), the participants will be asked to develop their multimodal (depth and thermal) human detection method. Both modalities have been temporally and spatially aligned; hence, the participants will try to exploit their potential complementarity with a proper fusion strategy. The method developed by the participants will need to output, for each frame, a list of bounding boxes (and their confidence scores), one for each person in the frame. The performance on multimodal (depth-thermal) human detection will be evaluated.
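All three tracks ask for the same kind of output: a list of bounding boxes with confidence scores for each frame. As an illustration only, the sketch below shows one way to represent such an output and to compare it against groundtruth boxes using intersection over union (IoU). The official submission format and evaluation metric are defined by the scripts on the competition platform and are not stated here; the `Box` and `Detection` types, the corner-coordinate convention, the greedy matching, and the 0.5 IoU threshold are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    # Assumed convention: pixel coordinates of two opposite corners.
    x_min: float
    y_min: float
    x_max: float
    y_max: float


@dataclass(frozen=True)
class Detection:
    box: Box
    score: float  # confidence, assumed to lie in [0, 1]


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a.x_max, b.x_max) - max(a.x_min, b.x_min))
    inter_h = max(0.0, min(a.y_max, b.y_max) - max(a.y_min, b.y_min))
    inter = inter_w * inter_h
    area_a = (a.x_max - a.x_min) * (a.y_max - a.y_min)
    area_b = (b.x_max - b.x_min) * (b.y_max - b.y_min)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_frame(detections, groundtruth, iou_threshold=0.5):
    """Greedily match one frame's detections to groundtruth boxes.

    Detections are visited from highest to lowest score; each groundtruth
    box can be matched at most once. Returns (score, is_true_positive)
    pairs in that order.
    """
    unmatched = list(groundtruth)
    results = []
    for det in sorted(detections, key=lambda d: d.score, reverse=True):
        best = max(unmatched, key=lambda g: iou(det.box, g), default=None)
        if best is not None and iou(det.box, best) >= iou_threshold:
            unmatched.remove(best)
            results.append((det.score, True))
        else:
            results.append((det.score, False))
    return results
```

The `(score, is_true_positive)` pairs collected over all frames are the usual starting point for a precision-recall curve, which is how detection challenges commonly rank methods; whether this challenge does so should be checked against its evaluation scripts.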
The competition will be run on the CodaLab platform. The participants will register through the platform, where they will be able to access the different tracks (corresponding data, evaluation scripts, leaderboard, etc.).
The CodaLab can be found at:
The participants will be invited to submit their papers to the associated event:
IEEE FG 2020 Workshop on Privacy-aware Computer Vision,
Accepted papers will be published within IEEE FG 2020 proceedings.
- Start of the competition: November 19th, 2019
- Release of encrypted test data and validation groundtruth: January 22nd, 2020
- Start of test phase: January 25th, 2020
- End of the quantitative competition: February 4th, 2020
- Fact sheets and material submission: February 8th, 2020
- Verification of results: February 8th, 2020
(Optionally, for those participants submitting papers to the associated workshop)
- Paper submission deadline: February 15th, 2020
- Notification to authors: February 23rd, 2020
- Camera-ready submission deadline: February 27th, 2020
ORGANIZATION & SPONSORS
Albert Clapés, Computer Vision Center at Universitat Autònoma de Barcelona
Carla Morral, Universitat de Barcelona
Julio C.S. Jacques Junior, Computer Vision Center at Universitat Autònoma de Barcelona & Universitat Oberta de Catalunya
Sergio Escalera, Computer Vision Center at Universitat Autònoma de Barcelona & Universitat de Barcelona
This event is sponsored by Chalearn.