Cross-Media Lecture Retrieval System for Lecture Videos
Abstract: This research proposes a cross-media lecture-on-demand system that allows users to selectively view specific segments of lecture videos by submitting text queries. Users can easily formulate queries by using the textbook associated with a target lecture, even if they cannot come up with effective keywords. The system extracts the audio track from a target lecture video, generates a transcription using large vocabulary continuous speech recognition, and produces a text index. Experimental results show that adapting speech recognition to the topic of the lecture increases recognition accuracy and improves retrieval accuracy to a level comparable with human transcription.
Research Question: How can a cross-media lecture-on-demand system be designed to allow users to retrieve relevant video/audio passages in response to text queries, improving the efficiency of information retrieval from lecture videos?
Methodology: The proposed system consists of an offline process and an online process. In the offline process, the audio track is extracted from a target lecture video and segmented into passages. A speech recognition system transcribes each passage, and the transcribed passages are indexed for efficient retrieval. To adapt speech recognition to a specific lecturer, unsupervised speaker adaptation is performed using an initial speech recognition result (i.e., an initial transcription). In the online process, users submit text queries to retrieve relevant video/audio passages.
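The offline indexing step and the online query step can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the summary does not name a retrieval model, so TF-IDF weighting with cosine similarity is assumed here, and the function names (`build_index`, `retrieve`) and the sample passages are hypothetical. The passages stand in for speech recognition output, and the query stands in for a sentence copied from a textbook.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(passages):
    """Offline step: turn each transcribed passage into a TF-IDF vector."""
    docs = [Counter(tokenize(p)) for p in passages]
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(doc.keys())
    # Smoothed inverse document frequency (an assumed weighting scheme).
    idf = {t: math.log((n_docs + 1) / (df + 1)) + 1.0 for t, df in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, idf, vectors, top_k=3):
    """Online step: rank passages against a text query.

    Returns (passage_index, score) pairs, best first, omitting zero scores.
    """
    q_tf = Counter(tokenize(query))
    # Query terms that never occur in the transcripts are ignored.
    q_vec = {t: tf * idf[t] for t, tf in q_tf.items() if t in idf}
    scored = [(cosine(q_vec, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(i, score) for score, i in scored[:top_k] if score > 0.0]


# Hypothetical transcribed passages (stand-ins for recognizer output).
passages = [
    "today we introduce hidden markov models for speech recognition",
    "the language model assigns probabilities to word sequences",
    "an inverted index maps each term to the passages that contain it",
]
idf, vectors = build_index(passages)

# A textbook-style sentence used directly as the query.
for idx, score in retrieve("Language models estimate the probability of word sequences", idf, vectors):
    print(idx, round(score, 3), passages[idx])
```

Because a full sentence works as a query in this sketch, a user could paste a textbook passage instead of choosing keywords. In a real system each passage index would map back to a start and end time in the lecture video.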
Results: The experimental results demonstrate that adapting speech recognition to the topic of the lecture increases recognition accuracy and improves retrieval accuracy to a level comparable with human transcription. This indicates that the proposed system effectively retrieves relevant video/audio passages in response to text queries.
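The summary does not state which metric was used to measure recognition accuracy. A common choice for comparing a recognizer's output against a human transcription is word error rate (WER), so the sketch below assumes that metric; the function name `word_error_rate` and the sample sentences are illustrative only.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length.

    Counts substitutions, deletions, and insertions needed to turn the
    hypothesis (recognizer output) into the reference (human transcript).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one word dropped out of six.
human = "the cat sat on the mat"
recognized = "the cat sat on mat"
print(word_error_rate(human, recognized))
```

A lower WER after adaptation would correspond to the increase in recognition accuracy described above. Retrieval accuracy is a separate measurement and would need relevance judgments for each query.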
Implications: The research highlights the potential of cross-media systems to improve the efficiency of information retrieval from multimedia content. By adapting speech recognition to the topic of the lecture, the system achieves high recognition and retrieval accuracy, making it a promising approach for other multimedia retrieval applications. Additionally, the use of textbooks to formulate queries provides a user-friendly interface, allowing users to retrieve relevant information more easily and efficiently.
Link to Article: https://arxiv.org/abs/0309021v1
Authors:
arXiv ID: 0309021v1