ACCVW-MLAC 2016 : ACCV'16 Workshop on Multi-view Lip-reading & Audio-visual Challenges


When Nov 20, 2016 - Nov 20, 2016
Where Taipei, Taiwan
Submission Deadline Aug 20, 2016
Notification Due Sep 10, 2016
Final Version Due Sep 17, 2016
Categories    speech   machine learning   computer science

Call For Papers

There is clear evidence that visual cues play an important role in automatic speech recognition, either when audio is seriously corrupted by noise, through audiovisual speech recognition (AVSR), or when audio is inaccessible, through automatic lip-reading (ALR).

This workshop aims to challenge researchers to deal with the large variation caused by camera-view changes in the context of ALR/AVSR. To this end, we have collected a multi-view audiovisual database, named 'OuluVS2', which includes 52 speakers uttering both discrete and continuous utterances, simultaneously recorded by 5 cameras from 5 different viewpoints. To assist participants, we have pre-processed most of the data to extract the regions of interest, that is, a rectangular area including the talking mouth.

The database is suitable for research concerning visual speech as well as for other machine learning problems such as multi-view learning and transfer learning.

Researchers are invited to tackle problems including, but not limited to, the following:
- Single-view ALR/AVSR
- Multiple-view ALR/AVSR
- Cross-view ALR/AVSR



Dr. Ziheng Zhou (University of Oulu, Finland)
Prof. Guoying Zhao (University of Oulu, Finland)
Prof. Richard Bowden (University of Surrey)
Prof. Takeshi Saitoh (Kyushu Institute of Technology)

