ReaderBench: multilevel analysis of Russian text characteristics
Exploring a new open source version of the ReaderBench platform. Calculation of indexes of complexity of texts of various levels according to the Common European scale. Evaluation of the complexity of the perception of the text in various fields.
Subject | Programming, computers and cybernetics |
Type | article |
Language | English |
Date added | 16.08.2023 |
File size | 589.0 K |
When considering semantics, text cohesion does not differ much between the two categories, although the difference is statistically significant for cohesion between in-between sentences within a paragraph. This was an expected result: text cohesion measures how well ideas relate to one another and flow throughout the text, and texts written by language experts should be cohesive regardless of their difficulty level.
Our statistical analysis pinpointed that the difficulty of Russian texts comes from the usage of more descriptive passages that include phrases rich in nouns and adjectives. Other characteristics, such as the number of (unique) words, are logical implications of the previous idea. Given that the considered corpus was developed by language experts and can be considered of reference for the Russian educational system, our findings can further support the design of new materials for L2 education. In addition, ReaderBench can be used in other experiments or domains where textual complexity is an important factor, as it can be used to quantify the differences between B and C language level texts, between manuals from two different grade levels, or to estimate the difficulty of science, politics, or law texts.
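The finding above — that difficulty is driven by noun- and adjective-rich descriptive passages — can be illustrated with a minimal sketch. This is not ReaderBench code: the sentences and their part-of-speech tags are hard-coded for illustration (a real pipeline would obtain the tags from a POS tagger such as spaCy's Russian model), and `density` is a hypothetical helper.

```python
from statistics import mean

# Toy (token, POS) pairs; a real pipeline would obtain the tags from a
# POS tagger (e.g., spaCy's Russian model) -- they are hard-coded here.
tagged_sentences = [
    [("Старый", "ADJ"), ("дом", "NOUN"), ("стоит", "VERB"),
     ("на", "ADP"), ("холме", "NOUN")],
    [("Он", "PRON"), ("спит", "VERB")],
]

def density(sentence, pos_tags=frozenset({"NOUN", "ADJ"})):
    """Share of nouns and adjectives among the tokens of one sentence."""
    return sum(1 for _, pos in sentence if pos in pos_tags) / len(sentence)

# Per-sentence densities and their document-level mean: descriptive,
# noun/adjective-rich sentences score higher.
scores = [density(s) for s in tagged_sentences]
doc_density = mean(scores)
```

Under this toy tagging, the descriptive first sentence scores 0.6 while the short verbal one scores 0.0, mirroring the gap the statistical analysis found between B/C-level and A-level texts.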
This paper introduced the adaptation of the open-source ReaderBench framework to support multilevel analyses of the Russian language, identifying text characteristics reflective of its difficulty. Numerous improvements were made, starting with code refactoring, followed by the addition of new indices (e.g., adjacent cohesion for sentences and for paragraphs, inter-paragraph cohesion) and of the maximum aggregation function, the integration of the BERT language model as input for building the CNA graph, and the usage of the MUSE version of word2vec, which provides multilingual word embeddings.
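Adjacent cohesion indices of the kind listed above are typically computed as the similarity of consecutive sentence representations. Below is a sketch under the assumption that each sentence has already been embedded as a vector (in ReaderBench, e.g., averaged MUSE word2vec vectors or BERT embeddings); the vectors are invented toy values, and `cosine` is a plain re-implementation rather than a ReaderBench API.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Invented sentence embeddings; in practice these would come from a model
# such as MUSE word2vec (averaged word vectors) or BERT.
sentence_vecs = [
    [0.9, 0.1, 0.0],
    [0.8, 0.3, 0.1],
    [0.0, 0.2, 0.9],
]

# Adjacent sentence cohesion: similarity of each consecutive pair,
# then the mean over all pairs as a paragraph-level index.
adjacent = [cosine(sentence_vecs[i], sentence_vecs[i + 1])
            for i in range(len(sentence_vecs) - 1)]
adjacent_cohesion = sum(adjacent) / len(adjacent)
```

With these toy vectors the first pair is highly similar and the second is not, so the aggregate index reflects a cohesion break at the topic shift.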
The ReaderBench textual complexity indices, together with BERT contextualized embeddings, were used as inputs to predict the language level of texts from two classes: A (Basic User) and B (Independent User). Both approaches, namely neural network architectures and statistical analyses using the Kruskal-Wallis test, confirmed that the complexity indices from ReaderBench are reliable predictors of text difficulty. The best neural network configuration, using both handcrafted features and BERT embeddings, achieved 92.36% accuracy under leave-one-text-out cross-validation, thus arguing for the model's capability to distinguish between texts of various difficulties.
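As a reminder of what the statistical test computes, here is a small, self-contained re-implementation of the Kruskal-Wallis H statistic, applied to hypothetical per-text index values (e.g., mean syllables per word) for A-level and B-level texts; the numbers are made up and do not come from the paper's corpus.

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic over k groups (average ranks for ties;
    the variance tie-correction factor is omitted for brevity)."""
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    rank_of = {}
    i = 0
    while i < n:                                # assign average ranks to ties
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2    # mean of 1-based ranks i+1..j
        i = j
    h = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n * (n + 1)) * h - 3 * (n + 1)

# Hypothetical mean-syllables-per-word values for A-level and B-level texts.
a_texts = [1.5, 1.7, 1.6, 1.8, 1.7]
b_texts = [1.9, 2.1, 1.8, 2.2, 2.0]

# H is compared against a chi-square distribution with k-1 degrees of freedom.
h = kruskal_wallis_h(a_texts, b_texts)
```

For two groups the statistic has one degree of freedom, so H above the 3.84 critical value (as here) indicates a significant A/B difference at p < .05, which is how the per-index χ²(1) values in the Appendix should be read.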
ReaderBench can be used to assess the complexity of Russian texts in different domains, including law, science, or politics. In addition, our framework can be employed by designers and developers of educational materials to evaluate and rank learning materials.
In terms of future work, we want to further extend the list of Russian textual complexity indices available in ReaderBench, including discourse markers and the Russian WordNet, which is currently not aligned with the Open Multilingual Wordnet format. In addition, we envision additional studies on the complexity of Russian texts, focusing on textbooks used in the Russian educational system, as well as multilingual analyses highlighting language specificities.
References
1. Abadi, Martin. 2016. Tensorflow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 265-283. Savannah, GA, USA: USENIX Association.
2. Akhtiamov, Raouf B. 2019. Dictionary of abstract and concrete words of the Russian language: A methodology for creation and application. Journal of Research in Applied Linguistics. Saint Petersburg, Russia: Springer. 218-230.
3. Bansal, S. 2014. Textstat. URL: https://github.com/shivam5992/textstat (accessed 26.05.2022).
4. Blei, David M., Andrew Y. Ng & Michael I. Jordan. 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research 3 (4-5). 993-1022.
5. BNC Consortium. 2007. British National Corpus. Oxford Text Archive Core Collection.
Boguslavsky, Igor, Leonid Iomdin & Victor Sizov. 2004. Multilinguality in ETAP-3: Reuse of lexical resources. In Proceedings of the Workshop on Multilingual Linguistic Resources, 1-8. Geneva, Switzerland: COLING.
6. Brysbaert, Marc, Boris New & Emmanuel Keuleers. 2012. Adding part-of-speech information to the SUBTLEX-US word frequencies. Behavior Research Methods 44 (4). 991-997.
Brysbaert, Marc, Amy Beth Warriner & Victor Kuperman. 2014. Concreteness ratings for 40 thousand generally known English word lemmas. Behavior Research Methods 46 (3). 904-911.
Choi, Joon Suh & Scott A. Crossley. 2020. ARTE: Automatic Readability Tool for English. NLP Tools for the Social Sciences. URL: https://www.linguisticanalysistools.org/arte.html (accessed 26.05.2022).
Churunina, Anna A., Ehl'zara Gizzatullina-Gafiyatova, Artem Zaikin & Marina I. Solnyshkina. 2020. Lexical features of text complexity: The case of Russian academic texts. In SHS Web of Conferences. Nizhny Novgorod, Russia: EDP Sciences.
7. Coltheart, Max. 1981. The MRC psycholinguistic database. The Quarterly Journal of Experimental Psychology Section A 33 (4). 497-505.
8. Conneau, Alexis, Guillaume Lample, Marc'Aurelio Ranzato, Ludovic Denoyer & Herve Jegou. 2018. Word translation without parallel data. In 6th International Conference on Learning Representations. Vancouver, BC, Canada: OpenReview.net.
9. Crossley, Scott A., Franklin Bradfield & Analynn Bustamante. 2019. Using human judgments to examine the validity of automated grammar, syntax, and mechanical errors in writing. Journal of Writing Research 11 (2). 251-270.
10. Crossley, Scott A., Kristopher Kyle, Jodi Davenport & Danielle S. McNamara. 2016. Automatic assessment of constructed response data in a Chemistry Tutor. In International Conference on Educational Data Mining, 336-340. Raleigh, North Carolina, USA: International Educational Data Mining Society.
11. Dale, Edgar & Jeanne S. Chall. 1948. A formula for predicting readability: Instructions. Educational Research Bulletin 27 (1). 37-54.
12. Dascalu, Mihai. 2014. Analyzing Discourse and Text Complexity for Learning and Collaborating, Studies in Computational Intelligence (534). Switzerland: Springer.
Dascalu, Mihai, Philippe Dessus, Stefan Trausan-Matu & Maryse Bianco. 2013. ReaderBench, an environment for analyzing text complexity and reading strategies. In H. Chad Lane, Kalina Yacef, Jack Mostow & Philip Pavlik (eds.), 16th Int. Conf. on Artificial Intelligence in Education (AIED 2013), 379-388. Memphis, TN, USA: Springer.
13. Dascalu, Mihai, Danielle S. McNamara, Stefan Trausan-Matu & Laura K. Allen. 2018. Cohesion Network Analysis of CSCL Participation. Behavior Research Methods 50 (2). 604-619. https://doi.org/10.3758/s13428-017-0888-4
14. Dascalu, Mihai, Lucia Larise Stavarache, Stefan Trausan-Matu & Philippe Dessus. 2014. Reflecting comprehension through French textual complexity factors. In 26th Int. Conf on Tools with Artificial Intelligence (ICTAI2014). 615-619. Limassol, Cyprus: IEEE.
15. Dascalu, Mihai, Wim Westera, Stefan Ruseti, Stefan Trausan-Matu & Hub J. Kurvers. 2017. ReaderBench learns Dutch: Building a comprehensive automated essay scoring system for Dutch. In Anne E. Baker, Xiangen Hu, Ma. Mercedes T. Rodrigo, Benedict du Boulay, Ryan Baker (eds.), 18th Int. Conf. on Artificial Intelligence in Education (AIED 2017), 52-63. Wuhan, China: Springer.
16. Davies, Mark. 2010. The corpus of contemporary American English as the first reliable monitor corpus of English. Literary and Linguistic Computing 25 (4). 447-464.
17. Devlin, Jacob, Ming-Wei Chang, Kenton Lee & Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 4171-4186. Minneapolis, MN, USA: Association for Computational Linguistics.
18. Flesch, Rudolf F. 1949. The Art of Readable Writing. New York: Harper.
19. Gabitov, Azat, Marina Solnyshkina, Liliya Shayakhmetova, Liliya Ilyasova & Saida Adobarova. 2017. Text complexity in Russian textbooks on social studies. Revista Publicando 4 (13 (2)). 597-606.
20. Gifu, Daniela, Mihai Dascalu, Stefan Trausan-Matu & Laura K. Allen. 2016. Time evolution of writing styles in Romanian language. In 28th Int. Conf. on Tools with Artificial Intelligence (ICTAI 2016). San Jose, CA: IEEE. 1048-1054.
21. Graesser, Arthur C., Danielle S. McNamara, Max M. Louwerse & Zhiqiang Cai. 2004. Coh-Metrix: Analysis of text on cohesion and language. Behavior Research Methods, Instruments, & Computers 36 (2). 193-202.
22. Guryanov, Igor, Iskander Yarmakeev, Aleksandr Kiselnikov & Iena Harkova. 2017. Text complexity: Periods of study in Russian linguistics. Revista Publicando 4 (13 (2)). 616-625.
23. Gutu-Robu, Gabriel, Maria-Dorinela Sirbu, Ionut Cristian Paraschiv, Mihai Dascalu, Philippe Dessus & Stefan Trausan-Matu. 2018. Liftoff - ReaderBench introduces new online functionalities. Romanian Journal of Human-Computer Interaction 11 (1). 76-91.
24. Honnibal, Matthew & Ines Montani. 2017. spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing.
25. Hopkins, Kenneth D. & Douglas L. Weeks. 1990. Tests for normality and measures of skewness and kurtosis: Their place in research reporting. Educational and Psychological Measurement 50 (4). 717-729.
26. Kincaid, J. Peter, Robert P. Fishburne Jr., Richard L. Rogers & Brad S. Chissom. 1975. Derivation of New Readability Formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy Enlisted Personnel. Naval Air Station Memphis: Chief of Naval Technical Training.
27. Kozea. 2016. Pyphen. URL: https://pyphen.org/ (accessed 20.05.2022).
28. Kruskal, William H. & Allen W. Wallis. 1952. Use of ranks in one-criterion variance analysis. Journal of the American Statistical Association 47 (260). 583-621.
29. Kuperman, Victor, Hans Stadthagen-Gonzalez & Marc Brysbaert. 2012. Age-of-acquisition ratings for 30,000 English words. Behavior Research Methods 44 (4). 978-990.
30. Kuratov, Yuri & Mikhail Arkhipov. 2019. Adaptation of deep bidirectional multilingual transformers for Russian language. arXiv preprint arXiv:1905.07213.
31. Kyle, Kristopher. 2016. Measuring Syntactic Development in L2 Writing: Fine Grained Indices of Syntactic Complexity and Usage-based Indices of Syntactic Sophistication. PhD dissertation, Georgia State University.
32. Kyle, Kristopher, Scott A. Crossley & Cynthia Berger. 2018. The tool for the automatic analysis of lexical sophistication (TAALES): Version 2.0. Behavior Research Methods 50 (3). 1030-1046.
33. Kyle, Kristopher, Scott A. Crossley & Scott Jarvis. 2021. Assessing the validity of lexical diversity indices using direct judgements. Language Assessment Quarterly 18 (2). 154-170.
34. Kyle, Kristopher, Scott A. Crossley & Youjin J. Kim. 2015. Native language identification and writing proficiency. International Journal of Learner Corpus Research 1 (2). 187-209.
35. Landauer, Thomas K., Peter W. Foltz & Darrell Laham. 1998. An introduction to Latent Semantic Analysis. Discourse Processes 25 (2/3). 259-284.
36. LanguageTool. 2021. LanguageTool. URL: https://languagetool.org/ (accessed 20.05.2022).
37. Loukachevitch, Natalia V., G. Lashevich, Anastasia A. Gerasimova, Vyacheslav V. Ivanov & Boris V. Dobrov. 2016. Creating Russian WordNet by conversion. In Computational Linguistics and Intellectual Technologies: Annual Conference Dialogue 2016, 405-415. Moscow, Russia.
38. Mc Laughlin, G.H. 1969. SMOG grading-a new readability formula. Journal of Reading 12 (8). 639-646.
39. McCarthy, Kathryn S., Danielle S. McNamara, Marina I. Solnyshkina, Fanuza Kh. Tarasova & Roman V. Kupriyanov. 2019. The Russian language test: Towards assessing text comprehension. Vestnik Volgogradskogo Gosudarstvennogo Universiteta. Seriya 2: Yazykoznanie 18 (4). 231-247.
40. Mikolov, Tomas, Kai Chen, Greg Corrado & Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. In Workshop at ICLR. Scottsdale, AZ.
41. Myint. 2014. language-check. URL: https://github.com/myint/language-check (accessed 23.05.2022).
42. Pearson, Karl. 1895. VII. Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London 58. 240-242.
43. Pedregosa, Fabian, Gael Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot & Edouard Duchesnay. 2011. Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research 12. 2825-2830.
44. Quispesaravia, Andre, Walter Perez, Marco Sobrevilla Cabezudo & Fernando Alva-Manchego. 2016. Coh-Metrix-Esp: A complexity analysis tool for documents written in Spanish. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16). 4694-4698.
45. Rehurek, Radim & Petr Sojka. 2010. Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks. Valletta, Malta: ELRA. 45-50.
46. Roscoe, Rod, Laura K. Allen, Jennifer L. Weston & Scott A. Crossley. 2014. The Writing Pal intelligent tutoring system: Usability testing and development. Computers and Composition 34. 39-59.
47. Sadoski, Mark, Ernest T. Goetz & Maximo Rodriguez. 2000. Engaging texts: Effects of concreteness on comprehensibility, interest, and recall in four text types. Journal of Educational Psychology 92 (1). 85.
48. Sakhovskiy, Andrey, Valery D. Solovyev & Marina Solnyshkina. 2020. Topic modeling for assessment of text complexity in Russian textbooks. In 2020 Ivannikov Ispras Open Conference (ISPRAS). Moscow, Russia: IEEE. 102-108.
49. Schmid, Helmut, Marco Baroni, Erika Zanchetta & Achim Stein. 2007. Il sistema `tree-tagger arricchito' - The enriched TreeTagger system. IA Contributi Scientifici 4 (2). 22-23.
50. Senter, R.J. & E.A. Smith. 1967. Automated Readability Index. Cincinnati, OH: Cincinnati University.
51. Shannon, Claude E. 1948. A mathematical theory of communication. The Bell System Technical Journal 27 (3). 379-423.
52. Shapiro, S.S. & M.B. Wilk. 1965. An analysis of variance test for normality (complete samples). Biometrika 52 (3/4). 591-611.
53. Sharoff, Serge, Elena Umanskaya & James Wilson. 2014. A Frequency Dictionary of Russian: Core Vocabulary for Learners. Routledge.
54. Solnyshkina, Marina I., Valery Solovyev, Vladimir Ivanov & Andrey Danilov. 2018. Studying text complexity in Russian academic corpus with multi-level annotation. In CEUR Workshop Proceedings: Proceedings of the Computational Models in Language and Speech Workshop, co-located with the 15th TEL International Conference on Computational and Cognitive Linguistics, TEL 2018.
55. Solovyev, Valery, Marina Solnyshkina, Mariia Andreeva, Andrey Danilov & Radif Zamaletdinov. 2020. Text complexity and abstractness: Tools for the Russian language. In International Conference «Internet and Modern Society» (IMS-2020). St. Petersburg, Russia: CEUR Proceedings. 75-87.
56. Solovyev, Valery, Marina I. Solnyshkina & Vladimir Ivanov. 2018. Complexity of Russian academic texts as the function of syntactic parameters. In 19th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing. Hanoi, Vietnam: Springer Lecture Notes in Computer Science.
57. Spearman, Carl. 1987. The proof and measurement of association between two things. The American Journal of Psychology 100 (3/4). 441-471.
58. Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser & Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, 5998-6008. Long Beach, CA, USA: Curran Associates, Inc.
59. Vorontsov, Konstantin & Anna Potapenko. 2015. Additive regularization of topic models. Machine Learning 101 (1). 303-323.
Appendix
Statistically significant ReaderBench indices
Index | A: M (SD) | B: M (SD) | χ²(1) | p
Max (Dep_nmod / Sent) | 0.49 (0.84) | 1.24 (1.33) | 84.48 | <.001
M (POS_noun / Sent) | 1.50 (1.25) | 2.58 (1.84) | 84.31 | <.001
M (Dep_nmod / Sent) | 0.22 (0.41) | 0.64 (0.79) | 83.55 | <.001
Max (POS_noun / Sent) | 2.27 (2.12) | 3.82 (2.63) | 82.50 | <.001
M (Dep_nmod / Par) | 1.10 (3.57) | 2.14 (2.67) | 81.24 | <.001
M (UnqPOS_noun / Sent) | 1.50 (1.24) | 2.51 (1.76) | 79.97 | <.001
Max (UnqPOS_noun / Sent) | 2.26 (2.09) | 3.73 (2.54) | 78.43 | <.001
Max (NgramEntr_2 / Word) | 2.05 (0.34) | 2.20 (0.45) | 76.74 | <.001
M (Chars / Word) | 3.97 (0.98) | 4.43 (1.12) | 76.03 | <.001
Max (Chars / Word) | 9.21 (2.46) | 10.76 (3.39) | 74.83 | <.001
M (POS_noun / Par) | 6.96 (18.39) | 8.81 (8.13) | 73.83 | <.001
M (Dep_amod / Par) | 1.91 (6.70) | 2.41 (2.72) | 73.77 | <.001
M (Syllab / Word) | 1.73 (0.32) | 1.89 (0.46) | 73.08 | <.001
Max (Syllab / Word) | 3.66 (1.04) | 4.27 (1.39) | 72.10 | <.001
M (UnqPOS_noun / Par) | 5.80 (13.29) | 7.96 (7.10) | 69.45 | <.001
M (POS_adj / Par) | 2.65 (8.85) | 3.28 (3.39) | 69.28 | <.001
M (UnqPOS_adj / Par) | 2.42 (7.52) | 3.18 (3.25) | 69.05 | <.001
Max (ParseDepth / Sent) | 4.06 (1.61) | 5.09 (2.06) | 66.62 | <.001
Max (Dep_amod / Sent) | 0.70 (1.12) | 1.29 (1.23) | 66.28 | <.001
M (Dep_amod / Sent) | 0.35 (0.57) | 0.73 (0.82) | 65.66 | <.001
SD (Dep_nmod / Sent) | 0.18 (0.36) | 0.47 (0.61) | 63.21 | <.001
Max (POS_adj / Sent) | 0.97 (1.30) | 1.68 (1.46) | 62.27 | <.001
Max (UnqPOS_adj / Sent) | 0.97 (1.29) | 1.66 (1.43) | 62.25 | <.001
SD (Syllab / Word) | 0.88 (0.28) | 1.01 (0.41) | 62.20 | <.001
M (NgramEntr_2 / Word) | 0.88 (0.27) | 0.98 (0.27) | 62.15 | <.001
M (POS_adj / Sent) | 0.51 (0.67) | 0.97 (0.98) | 60.85 | <.001
M (UnqPOS_adj / Sent) | 0.51 (0.66) | 0.96 (0.96) | 60.81 | <.001
SD (Chars / Word) | 2.71 (0.71) | 3.03 (0.94) | 58.24 | <.001
M (ParseDepth / Sent) | 3.43 (1.00) | 4.07 (1.47) | 53.99 | <.001
SD (Dep_amod / Sent) | 0.23 (0.41) | 0.46 (0.53) | 47.14 | <.001
M (Dep_case / Par) | 3.03 (7.92) | 3.45 (3.74) | 39.03 | <.001
SD (POS_noun / Sent) | 0.61 (0.83) | 1.07 (1.15) | 38.47 | <.001
SD (POS_adj / Sent) | 0.34 (0.52) | 0.58 (0.62) | 38.30 | <.001
SD (UnqPOS_adj / Sent) | 0.34 (0.52) | 0.58 (0.61) | 37.76 | <.001
SD (UnqPOS_noun / Sent) | 0.61 (0.82) | 1.04 (1.11) | 36.88 | <.001
SD (ParseDepth / Sent) | 0.54 (0.69) | 0.85 (0.81) | 34.05 | <.001
M (UnqWd / Par) | 27.37 (48.36) | 31.79 (24.87) | 32.74 | <.001
SD (NgramEntr_2 / Word) | 0.78 (0.18) | 0.82 (0.21) | 31.88 | <.001
Max (UnqWd / Sent) | 11.72 (7.00) | 14.93 (8.52) | 31.81 | <.001
M (WdEntr / Par) | 2.58 (0.93) | 2.84 (1.06) | 31.78 | <.001
M (Wd / Par) | 39.37 (93.19) | 41.26 (36.14) | 31.65 | <.001
Max (WdEntr / Sent) | 2.23 (0.67) | 2.4 (0.82) | 31.30 | <.001
Max (Wd / Sent) | 12.76 (8.31) | 16.56 (10.35) | 29.49 | <.001
Max (Dep_case / Sent) | 1.17 (1.36) | 1.69 (1.50) | 29.39 | <.001
SD (Dep_acl / Sent) | 0.03 (0.11) | 0.09 (0.20) | 29.16 | <.001
M (Pron_indef / Par) | 1.34 (3.69) | 1.65 (2.08) | 28.83 | <.001
M (Dep_case / Sent) | 0.66 (0.78) | 0.94 (0.85) | 28.77 | <.001
SD (Pron_indef / Sent) | 0.24 (0.38) | 0.4 (0.48) | 28.44 | <.001
M (Dep_obl / Par) | 2.66 (7.21) | 2.72 (3.06) | 27.91 | <.001
M (Dep_det / Par) | 0.88 (2.34) | 1.22 (1.72) | 27.26 | <.001
Max (Dep_det / Sent) | 0.45 (0.74) | 0.77 (1.01) | 27.20 | <.001
M (Dep_xcomp / Sent) | 0.11 (0.26) | 0.21 (0.35) | 26.88 | <.001
SD (Dep_case / Sent) | 0.39 (0.54) | 0.64 (0.70) | 26.82 | <.001
M (Dep_xcomp / Par) | 0.60 (1.89) | 0.76 (1.21) | 26.80 | <.001
Max (Dep_xcomp / Sent) | 0.29 (0.55) | 0.51 (0.74) | 26.43 | <.001
SD (Wd / Sent) | 2.47 (3.25) | 3.86 (3.92) | 26.32 | <.001
Max (Pron_indef / Sent) | 0.61 (0.85) | 0.92 (0.99) | 26.19 | <.001
M (MidEndCoh / Par) | 0.23 (0.32) | 0.35 (0.35) | 25.89 | <.001
Max (LemmaDiff / Word) | 1.31 (0.87) | 1.63 (0.99) | 25.80 | <.001
M (Dep_acl / Sent) | 0.02 (0.12) | 0.06 (0.14) | 24.68 | <.001
SD (Dep_det / Sent) | 0.19 (0.36) | 0.33 (0.45) | 24.57 | <.001
SD (UnqWd / Sent) | 2.17 (2.77) | 3.31 (3.28) | 24.31 | <.001
M (Dep_acl / Par) | 0.17 (1.02) | 0.26 (0.61) | 24.23 | <.001
Max (Dep_acl / Sent) | 0.08 (0.30) | 0.20 (0.43) | 24.18 | <.001
M (Dep_nummod / Sent) | 0.04 (0.22) | 0.10 (0.30) | 23.29 | <.001
M (Dep_det / Sent) | 0.19 (0.33) | 0.33 (0.62) | 22.94 | <.001
Max (Dep_nummod / Sent) | 0.12 (0.41) | 0.29 (0.64) | 22.90 | <.001
M (Dep_nummod / Par) | 0.16 (0.62) | 0.35 (0.86) | 22.10 | <.001
Max (Dep_obl / Sent) | 1.02 (1.25) | 1.38 (1.27) | 21.88 | <.001
M (UnqWd / Sent) | 9.02 (4.39) | 10.94 (6.11) | 21.57 | <.001
M (Pron_indef / Sent) | 0.29 (0.43) | 0.43 (0.56) | 21.20 | <.001
SD (Dep_xcomp / Sent) | 0.12 (0.24) | 0.23 (0.35) | 21.19 | <.001
SD (Dep_obl / Sent) | 0.36 (0.51) | 0.54 (0.60) | 20.91 | <.001
SD (Dep_nummod / Sent) | 0.05 (0.18) | 0.13 (0.30) | 20.37 | <.001
M (Sent / Par) | 3.58 (7.30) | 3.32 (2.59) | 20.15 | <.001
M (Wd / Sent) | 9.57 (4.95) | 11.84 (7.39) | 19.55 | <.001
SD (Repetitions / Sent) | 0.34 (0.68) | 0.53 (0.82) | 18.27 | <.001
M (Dep_conj / Par) | 2.18 (5.95) | 2.08 (2.48) | 18.14 | <.001
M (Dep_obl / Sent) | 0.54 (0.68) | 0.73 (0.71) | 17.79 | <.001
M (Commas / Par) | 3.01 (7.55) | 3.12 (3.43) | 16.97 | <.001
M (Dep_appos / Sent) | 0.09 (0.27) | 0.18 (0.41) | 16.95 | <.001
M (WdEntr / Sent) | 1.99 (0.57) | 2.09 (0.72) | 16.82 | <.001
SD (POS_adv / Sent) | 0.37 (0.52) | 0.51 (0.57) | 16.19 | <.001
M (StartMidCoh / Par) | 0.23 (0.32) | 0.32 (0.34) | 16.09 | <.001
SD (UnqPOS_adv / Sent) | 0.36 (0.52) | 0.50 (0.55) | 15.72 | <.001
M (Dep_obj / Par) | 1.74 (4.35) | 1.89 (2.43) | 15.49 | <.001
SD (Dep_cc / Sent) | 0.27 (0.42) | 0.39 (0.46) | 15.46 | <.001
Max (Dep_appos / Sent) | 0.21 (0.51) | 0.36 (0.66) | 15.25 | <.001
Max (Dep_conj / Sent) | 0.94 (1.33) | 1.23 (1.34) | 15.03 | <.001
SD (Dep_advmod / Sent) | 0.40 (0.56) | 0.57 (0.66) | 14.70 | <.001
M (Dep_appos / Par) | 0.33 (1.07) | 0.42 (0.86) | 14.34 | <.001
M (UnqPOS_adv / Par) | 1.94 (3.90) | 2.21 (2.50) | 13.75 | <.001
Max (UnqPOS_adv / Par) | 1.94 (3.90) | 2.21 (2.50) | 13.75 | <.001
SD (Pron_int / Sent) | 0.15 (0.28) | 0.24 (0.34) | 13.58 | <.001
M (Pron_int / Par) | 0.57 (1.42) | 0.80 (1.22) | 13.47 | <.001
M (POS_adv / Par) | 2.19 (4.85) | 2.33 (2.72) | 13.44 | <.001
M (Punct / Par) | 8.34 (18.53) | 7.74 (6.6) | 13.39 | <.001
M (Dep_mark / Par) | 0.61 (1.54) | 0.84 (1.41) | 13.22 | <.001
Max (Dep_obj / Sent) | 0.75 (0.91) | 0.99 (0.99) | 12.82 | <.001
SD (Dep_conj / Sent) | 0.35 (0.56) | 0.48 (0.61) | 12.76 | <.001
M (Dep_nsubj / Par) | 4.18 (10.36) | 3.77 (3.7) | 12.74 | <.001
SD (Commas / Sent) | 0.45 (0.61) | 0.60 (0.66) | 12.74 | <.001
Max (POS_adv / Sent) | 1.01 (1.20) | 1.29 (1.27) | 12.66 | <.001
SD (Dep_mark / Sent) | 0.15 (0.29) | 0.23 (0.34) | 12.62 | <.001
Max (Dep_mark / Sent) | 0.35 (0.59) | 0.53 (0.73) | 12.61 | <.001
SD (WdEntr / Sent) | 0.23 (0.29) | 0.30 (0.29) | 12.49 | <.001
Max (Commas / Sent) | 1.32 (1.41) | 1.68 (1.53) | 12.41 | <.001
Max (UnqPOS_adv / Sent) | 1.00 (1.18) | 1.27 (1.22) | 12.32 | <.001
Max (Pron_int / Sent) | 0.36 (0.57) | 0.55 (0.73) | 12.32 | <.001
SD (Dep_obj / Sent) | 0.29 (0.41) | 0.40 (0.45) | 12.25 | <.001
M (Dep_cc / Par) | 1.73 (4.53) | 1.71 (2.16) | 12.14 | <.001
M (Repetitions / Par) | 1.85 (5.64) | 1.96 (3.06) | 11.87 | <.001
M (SentAdjCoh / Par) | 0.35 (0.33) | 0.43 (0.33) | 11.81 | <.001
M (Dep_mark / Sent) | 0.15 (0.31) | 0.22 (0.39) | 11.66 | <.001
M (Dep_advmod / Par) | 2.65 (5.95) | 2.72 (3.19) | 11.36 | <.001
M (StartEndCoh / Par) | 0.35 (0.34) | 0.42 (0.35) | 11.32 | <.001
M (POS_verb / Par) | 5.65 (13.16) | 5.16 (5.15) | 11.19 | <.001
M (UnqPOS_verb / Par) | 5.19 (11.26) | 4.92 (4.83) | 10.90 | <.001
SD (NmdEnt_loc / Sent) | 0.09 (0.30) | 0.15 (0.35) | 10.82 | .001
SD (POS_verb / Sent) | 0.54 (0.67) | 0.70 (0.72) | 10.74 | .001
SD (Punct / Sent) | 0.66 (0.84) | 0.87 (0.97) | 10.32 | .001
Max (Repetitions / Sent) | 0.98 (1.78) | 1.37 (2.13) | 10.11 | .001
SD (UnqPOS_verb / Sent) | 0.54 (0.67) | 0.68 (0.71) | 9.91 | .002
M (Pron_int / Sent) | 0.15 (0.28) | 0.22 (0.39) | 9.69 | .002
Max (Dep_advmod / Sent) | 1.18 (1.33) | 1.48 (1.44) | 9.67 | .002
Max (Dep_cc / Sent) | 0.73 (0.94) | 0.93 (0.99) | 9.59 | .002
M (Dep_fixed / Sent) | 0.03 (0.13) | 0.08 (0.24) | 9.51 | .002
SD (LemmaDiff / Word) | 0.39 (0.22) | 0.42 (0.23) | 9.35 | .002
M (NmdEnt_org / Sent) | 0.01 (0.12) | 0.06 (0.29) | 9.05 | .003
Max (NmdEnt_org / Sent) | 0.06 (0.51) | 0.12 (0.5) | 8.91 | .003
M (NmdEnt_org / Par) | 0.10 (0.98) | 0.14 (0.66) | 8.87 | .003
SD (Dep_expl / Sent) | 0.00 (0.05) | 0.02 (0.1) | 8.62 | .003
M (NmdEnt_loc / Sent) | 0.08 (0.29) | 0.13 (0.34) | 8.60 | .003
M (Dep_conj / Sent) | 0.47 (0.64) | 0.62 (0.86) | 8.50 | .004
M (NmdEnt_loc / Par) | 0.40 (2.53) | 0.51 (1.26) | 8.42 | .004
SD (Dep_fixed / Sent) | 0.05 (0.18) | 0.12 (0.32) | 8.24 | .004
M (Dep_obj / Sent) | 0.39 (0.49) | 0.50 (0.56) | 8.17 | .004
Max (Dep_expl / Sent) | 0.01 (0.10) | 0.04 (0.2) | 8.16 | .004
M (Dep_expl / Sent) | 0.00 (0.06) | 0.01 (0.08) | 8.14 | .004
M (Dep_expl / Par) | 0.01 (0.13) | 0.05 (0.22) | 8.13 | .004
Max (Dep_fixed / Sent) | 0.15 (0.47) | 0.25 (0.58) | 7.93 | .005
SD (Dep_appos / Sent) | 0.07 (0.21) | 0.13 (0.30) | 7.76 | .005
Max (NmdEnt_loc / Sent) | 0.23 (0.72) | 0.33 (0.73) | 7.76 | .005
M (Dep_fixed / Par) | 0.19 (0.63) | 0.27 (0.65) | 7.67 | .006
SD (Dep_iobj / Sent) | 0.11 (0.25) | 0.15 (0.26) | 7.34 | .007
SD (POS_pron / Sent) | 0.44 (0.58) | 0.56 (0.67) | 7.02 | .008
SD (Dep_advcl / Sent) | 0.09 (0.21) | 0.13 (0.24) | 6.67 | .010
SD (Dep_nsubj / Sent) | 0.36 (0.47) | 0.45 (0.52) | 6.62 | .010
M (Commas / Sent) | 0.74 (0.80) | 0.93 (1.01) | 6.50 | .011
SD (UnqPOS_pron / Sent) | 0.42 (0.56) | 0.52 (0.60) | 6.11 | .013
M (Repetitions / Sent) | 0.43 (0.73) | 0.64 (1.52) | 5.74 | .017
M (POS_adv / Sent) | 0.55 (0.74) | 0.65 (0.74) | 5.72 | .017
SD (Pron_snd / Sent) | 0.06 (0.20) | 0.11 (0.27) | 5.51 | .019
M (UnqPOS_adv / Sent) | 0.55 (0.73) | 0.65 (0.73) | 5.49 | .019
SD (NmdEnt_org / Sent) | 0.01 (0.12) | 0.05 (0.22) | 5.39 | .020
SD (Dep_csubj / Sent) | 0.04 (0.15) | 0.07 (0.18) | 5.07 | .024
SD (Dep_ccomp / Sent) | 0.07 (0.19) | 0.11 (0.23) | 5.01 | .025
Max (Dep_csubj / Sent) | 0.10 (0.32) | 0.16 (0.39) | 4.85 | .028
M (Dep_csubj / Sent) | 0.04 (0.14) | 0.05 (0.18) | 4.80 | .029
M (Dep_ccomp / Par) | 0.27 (0.84) | 0.36 (0.80) | 4.55 | .033
Max (Dep_ccomp / Sent) | 0.19 (0.43) | 0.26 (0.50) | 4.35 | .037
M (Dep_csubj / Par) | 0.16 (0.59) | 0.18 (0.46) | 4.34 | .037
Max (Dep_iobj / Sent) | 0.28 (0.54) | 0.35 (0.56) | 4.28 | .039
M (Dep_ccomp / Sent) | 0.08 (0.23) | 0.10 (0.23) | 4.24 | .039
M (Dep_iobj / Par) | 0.50 (1.43) | 0.47 (0.91) | 3.95 | .047
M (Dep_cc / Sent) | 0.38 (0.50) | 0.45 (0.55) | 3.89 | .049
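The index names in the table combine a base measure (e.g., Dep_nmod, POS_noun) with an aggregation over sentences or paragraphs: M (mean), SD (standard deviation), and Max. A minimal sketch of that aggregation step follows, using invented per-sentence counts; whether ReaderBench uses the population or sample standard deviation is not stated here, so `pstdev` is an assumption.

```python
from statistics import mean, pstdev

# Invented per-sentence counts of one base measure, e.g. nominal
# modifiers (Dep_nmod); real counts would come from a dependency parse.
nmod_per_sentence = [0, 1, 2, 1, 3]

# The three aggregation functions appearing in the index names above.
aggregated = {
    "M": mean(nmod_per_sentence),     # M (Dep_nmod / Sent)
    "SD": pstdev(nmod_per_sentence),  # SD (Dep_nmod / Sent)
    "Max": max(nmod_per_sentence),    # Max (Dep_nmod / Sent)
}
```

Each text thus contributes one M, SD, and Max value per base measure, and it is these per-text values that the Kruskal-Wallis test compares between the A and B groups.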