Using Content Models to Improve Information Ordering and Extractive Summarization
Abstract: This research investigates the use of content models, based on Hidden Markov Models (HMMs), to improve information ordering and extractive summarization. Content models represent the topics typical of a specific domain and the order in which those topics tend to appear, which makes it possible to organize information the way texts in that domain usually do. The study shows that these models outperform existing methods on both tasks, making them a promising tool for other text processing tasks.
Main Research Question: Can content models, based on Hidden Markov Models, be used to improve information ordering and extractive summarization in text processing tasks?
Methodology: The study uses a knowledge-lean method to learn content models directly from unannotated documents. In these models, HMM states represent types of information characteristic of the domain, and state transitions capture the orderings in which that information can be presented. The models are then applied to two tasks: information ordering and extractive summarization.
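To make the ordering idea concrete, here is a minimal sketch of how an HMM-style content model could rank candidate sentence orderings. It is not the study's implementation: the two topic states, all probabilities, the unigram emission model, and the function names (`sentence_logprob`, `ordering_logprob`, `best_ordering`) are hypothetical values chosen for illustration, and a real model would be learned from unannotated documents rather than written by hand.

```python
import itertools
import math

# Hypothetical toy content model: two topic states with unigram emissions.
STATES = ["location", "casualties"]
START = {"location": 0.9, "casualties": 0.1}
TRANS = {
    "location": {"location": 0.3, "casualties": 0.7},
    "casualties": {"location": 0.2, "casualties": 0.8},
}
EMIT = {
    "location": {"earthquake": 0.4, "struck": 0.3, "city": 0.3},
    "casualties": {"people": 0.4, "injured": 0.3, "killed": 0.3},
}
UNSEEN = 1e-3  # crude probability floor for words a state has not seen


def sentence_logprob(state, sentence):
    """Log-probability of a whitespace-tokenized sentence under one state."""
    return sum(math.log(EMIT[state].get(w, UNSEEN)) for w in sentence.split())


def ordering_logprob(sentences):
    """Viterbi score: log-probability of the best state path for this ordering.

    Assumes a non-empty sequence of sentences.
    """
    best = {
        s: math.log(START[s]) + sentence_logprob(s, sentences[0]) for s in STATES
    }
    for sent in sentences[1:]:
        best = {
            s: max(best[p] + math.log(TRANS[p][s]) for p in STATES)
            + sentence_logprob(s, sent)
            for s in STATES
        }
    return max(best.values())


def best_ordering(sentences):
    """Exhaustively score every permutation and return the most likely one."""
    return max(itertools.permutations(sentences), key=ordering_logprob)


if __name__ == "__main__":
    print(best_ordering(["people injured", "earthquake struck city"]))
```

Exhaustive enumeration of permutations is only practical for a handful of sentences; it is used here to keep the sketch short, not because the source prescribes it.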
Results: Content models outperform existing methods for information ordering by a wide margin. For extractive summarization, the study develops a new learning algorithm for sentence selection. The resulting summaries yield an 88% match with human-written output, significantly better than the standard "leading n sentences" baseline.
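The comparison with the "leading n sentences" baseline can be illustrated with a small sketch. The summary above does not describe the sentence selection algorithm in detail, so the selector below rests on an assumption: each sentence has already been assigned a topic state, and a hypothetical table `summary_prob` gives a score for how likely each state is to contribute to a summary. The function names and the `match_rate` measure are likewise illustrative, not the study's evaluation procedure.

```python
def lead_n(sentences, n):
    """Baseline: the first n sentences of the document."""
    return sentences[:n]


def select_by_state(sentences, states, summary_prob, n):
    """Pick the n sentences whose topic states score highest.

    Ties go to the earlier sentence; the result is returned in document order.
    """
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: summary_prob.get(states[i], 0.0),
        reverse=True,
    )
    chosen = sorted(ranked[:n])
    return [sentences[i] for i in chosen]


def match_rate(extracted, reference):
    """Fraction of extracted sentences that also appear in the reference extract."""
    if not extracted:
        return 0.0
    return sum(s in reference for s in extracted) / len(extracted)


if __name__ == "__main__":
    doc = ["a", "b", "c", "d"]
    doc_states = ["intro", "detail", "casualties", "location"]
    # Hypothetical scores; in practice these would be estimated from training data.
    probs = {"casualties": 0.9, "location": 0.8, "intro": 0.3, "detail": 0.1}
    print(select_by_state(doc, doc_states, probs, 2))
    print(lead_n(doc, 2))
```

Under these assumptions, the state-based selector can pull sentences from anywhere in the document, whereas the baseline is tied to position alone.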
Implications: The success of content models in these two tasks demonstrates their flexibility and effectiveness, which suggests that the formalism can prove useful in a broader range of text processing applications. The formalism is also conceptually intuitive and efficiently learnable from raw document collections.
Link to Article: https://arxiv.org/abs/0405039v1
Authors:
arXiv ID: 0405039v1