Improving Language Models for Speech Recognition with Data-Oriented Parsing

Title: Improving Language Models for Speech Recognition with Data-Oriented Parsing

Abstract: This research aims to improve language models for speech recognition by incorporating data-oriented parsing (DOP) techniques. DOP constructs a stochastic tree-substitution grammar (STSG) from a treebank, which allows the model to capture both headword and non-headword dependencies. The study investigates the effectiveness of DOP as a language model and proposes a maximum likelihood training approach to improve its performance. The results show that DOP outperforms traditional 3-gram models, especially when non-headword dependencies are taken into account. This research contributes to the field of speech recognition by introducing a new approach to language modeling that can significantly reduce word error rates.
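
To make the grammar-extraction step concrete, here is a minimal sketch of how an STSG can be read off a treebank in the spirit of DOP, assuming parse trees are encoded as nested tuples such as ("S", ("NP", "we"), ("VP", ("V", "leave"))); the function names and the relative-frequency estimator below are illustrative, not taken from the paper.

from itertools import product

def subtrees(tree):
    """Yield every DOP fragment rooted at this node: keep the root label
    and, for each child, either cut it off at its label (leaving a
    substitution site) or expand it with one of its own fragments."""
    if isinstance(tree, str):          # a bare word has no fragments
        return
    label, children = tree[0], tree[1:]
    options = []
    for child in children:
        if isinstance(child, str):
            options.append([child])                          # lexical leaf
        else:
            options.append([child[0]] + list(subtrees(child)))
    for combo in product(*options):
        yield (label,) + combo
    for child in children:             # fragments rooted at descendants
        if not isinstance(child, str):
            yield from subtrees(child)

def stsg_probabilities(treebank):
    """DOP1-style relative-frequency estimate: count each fragment and
    normalize by the total count of fragments with the same root label."""
    counts, root_totals = {}, {}
    for tree in treebank:
        for frag in subtrees(tree):
            counts[frag] = counts.get(frag, 0) + 1
            root_totals[frag[0]] = root_totals.get(frag[0], 0) + 1
    return {f: c / root_totals[f[0]] for f, c in counts.items()}

For the example tree above, subtrees yields ten fragments, ranging from single rules such as ("S", "NP", "VP") to the fully lexicalized tree; keeping the larger fragments is what lets the grammar memorize multi-word dependencies.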

Main Research Question: Can data-oriented parsing techniques improve language models for speech recognition, and how can these models be further enhanced through maximum likelihood training?

Methodology: The study uses the OVIS spoken-language corpus, which consists of 10,000 syntactically and semantically annotated sentences with corresponding word-graphs. The research proposes DOP as a language model for speech recognition and tests it against traditional 3-gram models. The DOP model is trained with a maximum likelihood procedure that iteratively reestimates the subtree probabilities from the observed data.
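
The reestimation step can be pictured as expectation-maximization over derivations: a parse tree usually has many derivations built from different subtrees, so each subtree's count is weighted by the posterior probability of the derivations it appears in and then renormalized. Below is a hedged sketch under the simplifying assumption that all derivations of a training parse can be enumerated, which is only feasible for small trees; the paper's actual procedure may differ in detail, and the tree encoding and names follow the illustrative sketch above.

from itertools import product

def match(frag, tree):
    """If `frag` fits the top of `tree`, return the subtrees left at its
    substitution sites; otherwise return None."""
    if isinstance(frag, str):
        if isinstance(tree, str):                      # lexical leaf
            return [] if frag == tree else None
        return [tree] if frag == tree[0] else None     # substitution site
    if isinstance(tree, str) or frag[0] != tree[0] or len(frag) != len(tree):
        return None
    sites = []
    for f, t in zip(frag[1:], tree[1:]):
        sub = match(f, t)
        if sub is None:
            return None
        sites.extend(sub)
    return sites

def derivations(tree, probs):
    """Yield (probability, fragments) for every derivation of `tree`."""
    for frag, p in probs.items():
        sites = match(frag, tree)
        if sites is None:
            continue
        if not sites:
            yield p, [frag]
        else:
            for parts in product(*(list(derivations(s, probs)) for s in sites)):
                q, frags = p, [frag]
                for sub_p, sub_frags in parts:
                    q *= sub_p
                    frags = frags + sub_frags
                yield q, frags

def reestimate(treebank, probs):
    """One maximum likelihood reestimation step: expected fragment counts
    from posterior-weighted derivations, renormalized per root label."""
    counts, root_totals = {}, {}
    for tree in treebank:
        ders = list(derivations(tree, probs))
        total = sum(p for p, _ in ders)
        for p, frags in ders:
            w = p / total
            for f in frags:
                counts[f] = counts.get(f, 0.0) + w
                root_totals[f[0]] = root_totals.get(f[0], 0.0) + w
    return {f: c / root_totals[f[0]] for f, c in counts.items()}

Starting from the relative-frequency estimate and iterating reestimate shifts probability mass toward subtrees that explain the training parses well, which is the intended effect of the maximum likelihood training.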

Results: The experimental evaluation of the various language models on the OVIS corpus reveals that the DOP model outperforms the 3-gram model. Moreover, eliminating subtrees that contain two or more non-headwords significantly worsens the word error rate, which underlines the importance of capturing non-headword dependencies for accurate language modeling.
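
To make the ablated constraint concrete, the sketch below marks each frontier word of a fragment as a headword or non-headword and rejects fragments with two or more non-headwords. It assumes an illustrative head-percolation table (head_child) and the convention that words are lowercase while category labels are uppercase; none of these names come from the paper, and since the experiments show this elimination hurts accuracy, the filter illustrates the tested restriction rather than a recommended step.

def frontier_words(frag, head_child, on_head_path=True):
    """Yield (word, is_headword) for each lexical leaf; a word counts as
    a headword if every branch from the fragment root down to it goes
    through the head child."""
    label, children = frag[0], frag[1:]
    head_idx = head_child.get(label, 0)
    for i, child in enumerate(children):
        child_on_path = on_head_path and i == head_idx
        if isinstance(child, str):
            if child.islower():          # word, not a substitution site
                yield child, child_on_path
        else:
            yield from frontier_words(child, head_child, child_on_path)

def keep_fragment(frag, head_child):
    """Reject fragments whose frontier has two or more non-headwords."""
    non_heads = sum(1 for _, is_head in frontier_words(frag, head_child)
                    if not is_head)
    return non_heads < 2

# e.g. with head_child = {"VP": 0, "NP": 1}, the fragment
# ("VP", ("V", "take"), ("NP", "the", "train")) has headword "take" and
# non-headwords "the" and "train", so it would be discarded.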

Implications: The research demonstrates that data-oriented parsing techniques can serve as effective language models for speech recognition, and that maximum likelihood training further improves the DOP model, significantly reducing word error rates. The study thus contributes a new approach to language modeling that improves the accuracy of automatic speech recognition systems.

Link to Article: https://arxiv.org/abs/0110051v1

arXiv ID: 0110051v1