Title: Building a Test Collection for Speech-Driven Web Retrieval
Abstract: This research created a test collection, a type of benchmark data set, for speech-driven web retrieval systems. The collection was produced for a subtask of the NTCIR-3 Web retrieval main task, which was modeled on TREC-style evaluations. The goal was to give researchers a reusable, publicly available test collection, together with tools for developing and operating speech recognition systems, so that they can experiment with methods similar to those described in the paper.
Research Question: How can a test collection be effectively created for speech-driven web retrieval to improve the performance of retrieval systems?
Methodology: The research team produced spoken queries and language models for speech recognition, deriving both from the search topics and document collection of the Web retrieval main task. The language models were built so that they can be used with existing speech recognition engines. All data, including the spoken queries, language models, and document collections, were included in the NTCIR-3 Web retrieval test collection.
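As a rough illustration of what building a language model from a document collection involves, the sketch below trains a vocabulary-limited bigram model on a toy corpus. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the function names (`build_vocab`, `train_bigram`, `bigram_logprob`), the whitespace tokenization, the add-one smoothing, and the toy documents are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def build_vocab(docs, size):
    """Return the `size` most frequent whitespace tokens across `docs`."""
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    # Sort by descending frequency, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word for word, _ in ranked[:size]}


def tokenize(text, vocab):
    """Lowercase, split on whitespace, map unknown words to <unk>, add boundaries."""
    words = [t if t in vocab else "<unk>" for t in text.lower().split()]
    return ["<s>"] + words + ["</s>"]


def train_bigram(docs, vocab):
    """Count history tokens and bigrams over the (hypothetical) target documents."""
    histories = Counter()
    bigrams = Counter()
    for doc in docs:
        toks = tokenize(doc, vocab)
        histories.update(toks[:-1])          # every token except </s> is a history
        bigrams.update(zip(toks, toks[1:]))  # adjacent token pairs
    return histories, bigrams


def bigram_logprob(text, vocab, histories, bigrams):
    """Log-probability of `text` under the add-one-smoothed bigram model."""
    toks = tokenize(text, vocab)
    # Possible next tokens: the vocabulary plus <unk> and </s>.
    num_outcomes = len(vocab) + 2
    total = 0.0
    for prev, cur in zip(toks, toks[1:]):
        prob = (bigrams[(prev, cur)] + 1) / (histories[prev] + num_outcomes)
        total += math.log(prob)
    return total


# Toy stand-in for a target document collection (assumption for illustration).
DOCS = [
    "speech driven web retrieval",
    "web retrieval test collection",
    "speech recognition for web retrieval",
]
```

With this toy corpus, a word order that occurs in the documents (`"web retrieval"`) scores higher than the reversed order, which is the kind of preference a recognizer draws on when it picks between competing transcriptions.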
Results: The experimental results showed that building the language model from the target documents and increasing the vocabulary size used in speech recognition both improved system performance.
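One plausible reason vocabulary size matters is that a recognizer cannot output a word that is missing from its vocabulary. The sketch below measures the share of query words that fall outside a top-k vocabulary drawn from a document collection. It is an illustrative assumption, not a measurement from the paper: the helper names (`top_k_vocab`, `oov_rate`), the whitespace tokenization, and the toy data are hypothetical.

```python
from collections import Counter


def top_k_vocab(docs, k):
    """Return the k most frequent whitespace tokens in `docs`."""
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    # Sort by descending frequency, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word for word, _ in ranked[:k]}


def oov_rate(queries, vocab):
    """Fraction of query tokens that are not in `vocab` (0.0 for empty input)."""
    tokens = [tok for query in queries for tok in query.lower().split()]
    if not tokens:
        return 0.0
    missing = sum(1 for tok in tokens if tok not in vocab)
    return missing / len(tokens)


# Toy documents and queries (assumptions for illustration only).
DOCS = [
    "speech driven web retrieval",
    "web retrieval test collection",
    "speech recognition for web retrieval",
]
QUERIES = ["web speech recognition", "test collection retrieval"]

small_vocab = top_k_vocab(DOCS, 3)   # only the three most frequent words
full_vocab = top_k_vocab(DOCS, 8)    # all eight distinct words in the toy corpus

small_rate = oov_rate(QUERIES, small_vocab)
full_rate = oov_rate(QUERIES, full_vocab)
```

On this toy data the out-of-vocabulary rate falls as the vocabulary grows, since words such as "recognition" become recognizable only once the vocabulary is large enough to include them.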
Implications: The creation of this test collection provides researchers with a valuable tool to develop and evaluate speech-driven web retrieval systems. It also encourages collaboration between the information retrieval and speech processing communities. The research highlights the importance of creating effective test collections to improve the performance of retrieval systems.
Link to Article: https://arxiv.org/abs/0309019v1
Authors:
arXiv ID: 0309019v1