V. Kromer
Title: V. Kromer
Research Question: Can a mathematical model be created to accurately predict the average word length in a text, taking into account the distribution of word lengths?
Methodology: The author proposed a new mathematical model for word length distribution. This model is based on the Čebanov-Fucks distribution with uniform parameter distribution. The model has two parameters, λ1 and λ2, which represent the average word length at the beginning and end of the text, respectively. The author used the chi-square criterion to determine the values of λ1 and λ2, with the total number of words in each text normalized.
Results: The author applied this model to 13 languages, both ancient and modern. The correspondence between the theoretical and empirical distributions was satisfactory for most languages, and the model was considered fit for describing word structure in languages such as Latin, Czech, and Quechua. The author also noted an interesting case in which λ1 was not equal to λ2 and the correspondence was nevertheless satisfactory. A linear relationship between λ0 and λ1 was also found and was approximated by a function.
Implications: This new model provides a more accurate way to predict the average word length in a text. It also offers insights into the distribution of word lengths within a text and how this distribution changes with the average word length. This can be useful for linguists and researchers studying language structure and text analysis.
Link to Article: https://arxiv.org/abs/0102026v1
Authors: V. Kromer
arXiv ID: 0102026v1