Discovery in Transcribed Speech
Research Question: Can a statistical model be used to segment and discover words in continuous speech without relying on explicit cues?
Methodology: The study presents a statistical model for word discovery in transcribed speech. It describes an incremental unsupervised learning algorithm that infers word boundaries based on this model. The algorithm's performance is evaluated and compared to other models that have been used for similar tasks.
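To make the idea of incremental, unsupervised boundary inference concrete, here is a minimal sketch. It is not the paper's model. It assumes a plain unigram word model, a uniform distribution over phonemes for unseen words, and a simple "one escape count per word type" discount. The class name, `alphabet_size`, and `max_word_len` are hypothetical choices made for illustration only. Each utterance is segmented with the lexicon learned so far, and the resulting words are then added to that lexicon.

```python
import math
from collections import Counter


class IncrementalSegmenter:
    """Toy unigram segmenter that grows its lexicon one utterance at a time.

    Illustrative only: the probability model below is an assumption,
    not the model described in the paper.
    """

    def __init__(self, alphabet_size=50, max_word_len=8):
        self.counts = Counter()             # word -> frequency so far
        self.total = 0                      # word tokens seen so far
        self.alphabet_size = alphabet_size  # assumed phoneme inventory size
        self.max_word_len = max_word_len    # longest candidate word considered

    def log_prob(self, word):
        types = len(self.counts)
        # Reserve one "escape" count per known type (plus one) for novel words,
        # so known-word and novel-word mass together sum to 1.
        denom = self.total + types + 1
        if word in self.counts:
            return math.log(self.counts[word] / denom)
        escape = (types + 1) / denom
        # Novel word: uniform phoneme model with an end-of-word symbol.
        per_symbol = 1.0 / (self.alphabet_size + 1)
        return math.log(escape) + (len(word) + 1) * math.log(per_symbol)

    def segment(self, utterance):
        """Viterbi search for the most probable split under the current lexicon."""
        n = len(utterance)
        best = [0.0] + [-math.inf] * n  # best[i]: best log-prob of utterance[:i]
        back = [0] * (n + 1)            # back[i]: start of the last word ending at i
        for end in range(1, n + 1):
            for start in range(max(0, end - self.max_word_len), end):
                score = best[start] + self.log_prob(utterance[start:end])
                if score > best[end]:
                    best[end] = score
                    back[end] = start
        words = []
        end = n
        while end > 0:
            words.append(utterance[back[end]:end])
            end = back[end]
        return words[::-1]

    def learn(self, utterance):
        """Segment with the current lexicon, then update counts (incremental step)."""
        words = self.segment(utterance)
        self.counts.update(words)
        self.total += len(words)
        return words


segmenter = IncrementalSegmenter()
for _ in range(3):
    segmenter.learn("dog")          # an unfamiliar utterance is kept whole
print(segmenter.learn("thedog"))    # ['the', 'dog']
```

With an empty lexicon the whole utterance is cheapest as a single novel word, so early input is stored unsegmented. Once "dog" is familiar, splitting it off from "thedog" costs less than treating the full string as new, which is how boundaries emerge without any explicit cue in this toy setup.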
Results: Empirical tests showed that the algorithm is competitive with other models used for similar tasks. The study also extends previous work to higher-order n-grams and discusses the findings in light of those higher-order models. In addition, it reports the results of experiments suggested in Brent (1999) on different ways of estimating phoneme probabilities.
Implications: The research suggests that a conservative, traditional approach can be competitive in tasks where nontraditional approaches have been proposed. It also extends previous work and provides insights into the performance of unsupervised learning algorithms in this domain.
Conclusion: The unsupervised learning algorithm built on the proposed statistical model is competitive with other models in empirical tests. This suggests that a bare-bones language model can still be useful for word discovery and can shed light on how different cues perform in that task.
Link to Article: https://arxiv.org/abs/0111065v1
Authors:
arXiv ID: 0111065v1