Maximal Parse Accuracy?

Title: Maximal Parse Accuracy?

Authors: Rens Bod

Abstract: This research asks what minimal set of fragments achieves maximal parse accuracy in Data Oriented Parsing. Using the Penn Wall Street Journal (WSJ) treebank, the study investigates several strategies for constraining the set of subtrees. The results show that neither an upper bound on the number of words in the subtree frontiers nor an upper bound on the depth of unlexicalized subtrees decreases parse accuracy. The study also finds that counts of subtrees with several nonheadwords are important, and the resulting parser improves on the accuracy of previous parsers tested on the WSJ.

Main Research Question: What is the minimal set of fragments that achieves maximal parse accuracy in Data Oriented Parsing?

Methodology: The study uses the Penn Wall Street Journal treebank, a large collection of parsed sentences. Its starting point is the Data Oriented Parsing (DOP) model, which works with a very large and extremely redundant set of subtrees. The research investigates several strategies for constraining this set of subtrees.
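To give a feel for why the subtree set is so large and redundant, the following is a minimal sketch of fragment enumeration for a single parse tree. It is an illustration under stated assumptions, not the paper's implementation: the nested-tuple tree encoding, the example sentence, and the function names `fragments_at` and `all_fragments` are all hypothetical. A fragment here is rooted at any node, and below each expanded node every nonterminal child is either expanded further or cut off and left as a bare frontier label.

```python
from itertools import product

# Assumed encoding: a node is a tuple (label, child, child, ...);
# a word is a plain string; a cut-off frontier nonterminal is (label,).

def fragments_at(node):
    """Return all fragments rooted at `node` with the root itself expanded."""
    label, children = node[0], node[1:]
    options = []
    for child in children:
        if isinstance(child, str):
            # A word under an expanded node is always kept.
            options.append([child])
        else:
            # Either cut here (bare label) or expand the child in every possible way.
            options.append([(child[0],)] + fragments_at(child))
    # Sister nodes stay together: pick one option per child.
    return [(label,) + combo for combo in product(*options)]

def all_fragments(tree):
    """Collect the fragments rooted at every nonterminal node of `tree`."""
    result = []

    def visit(node):
        result.extend(fragments_at(node))
        for child in node[1:]:
            if not isinstance(child, str):
                visit(child)

    visit(tree)
    return result

# Hypothetical three-word example tree.
tree = ("S", ("NP", "she"), ("VP", ("V", "saw"), ("NP", "it")))
print(len(all_fragments(tree)))
```

Even this three-word tree yields more than a dozen fragments, and the count grows multiplicatively with tree size, which is the motivation for constraining the set.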

Results: The study finds that neither an upper bound on the number of words in the subtree frontiers nor an upper bound on the depth of unlexicalized subtrees decreases parse accuracy. Furthermore, counts of subtrees with several nonheadwords are found to be important, resulting in improved parse accuracy.

Implications: This research suggests that constraints can be imposed on the subtrees used in the DOP model without reducing parse accuracy. It also highlights the importance of considering counts of subtrees with several nonheadwords in achieving maximal parse accuracy.

Conclusion: The set of DOP subtrees can be restricted by bounding the number of words in the subtree frontiers and the depth of unlexicalized subtrees without loss of parse accuracy. Counts of subtrees with several nonheadwords, by contrast, should be retained, since they contribute to maximal parse accuracy.

Link to Article: https://arxiv.org/abs/0110050v1 (arXiv ID: 0110050v1)