Anonymous Vs. Attributed: Cluster Analysis of Tolstovskii Sbornik Texts and Its Interpretation in Terms of Cultural Heritage

Lexico-semantic dominants: markers that distinguish the texts of medieval anthologies from one another. Analysis of the statistical distance between anonymous and attributed texts. Differences between the anonymous Parable of Wisdom and Cyril of Turov's sermon.

The pronoun convergence presented above appears to contradict the grouping of subclusters shown in the dendrogram in Figure 7. The distribution of pronouns in «The Life of Basil the Great» is closer to that in the apocryphal «The Abgar Legend» than to that in the anonymous «Parable of Wisdom»: 9 and 7 units, respectively. The contradiction is even more tangible when Life_Bas is juxtaposed with the anonymous «Legend of Aphroditian»: there are 12 matches, although the two texts belong to different subclusters.

We should also note that Life_Bas and Aphr show more similarities in nouns and verbs, while there are no similarities among adjectives (cf. Table 11).

This contradiction can certainly be smoothed out by the closer rank values of the matching tokens. It thus reveals a specific feature of measuring the convergence and divergence of texts quantitatively: the measurement rests on the rank status of the corresponding tokens. Another explanation for the contradiction could be a greater proximity of Life_Bas and An_Wisd in function words: prepositions, conjunctions, and particles. A comparison of these, however, again gives the opposite result (cf. Table 12).
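The difference between counting matches and weighing their ranks can be sketched as follows. This is only a minimal illustration, not the measure used in the study: the function names, the mean absolute rank gap, and the placeholder tokens are all assumptions introduced here, and the toy lists are not taken from Tables 11 or 12.

```python
def rank_map(tokens):
    """Map each token to its 1-based rank in a frequency-ordered list.

    Assumes every token occurs once in the list.
    """
    return {tok: i for i, tok in enumerate(tokens, start=1)}


def convergence(ranked_a, ranked_b):
    """Return (number of shared tokens, mean absolute rank gap among them).

    The rank gap is a hypothetical stand-in for 'proximity of rank values';
    it is None when the two lists share no tokens.
    """
    ra, rb = rank_map(ranked_a), rank_map(ranked_b)
    shared = set(ra) & set(rb)
    if not shared:
        return 0, None
    gaps = [abs(ra[t] - rb[t]) for t in shared]
    return len(shared), sum(gaps) / len(gaps)


# Placeholder tokens only; these are not real word forms from the corpus.
text_x = ["p1", "p2", "p3", "p4", "p5", "p6"]
text_y = ["p1", "p2", "p3", "q1", "q2", "q3"]  # fewer matches, identical ranks
text_z = ["p6", "p5", "p4", "p3", "q1", "p1"]  # more matches, scattered ranks

print(convergence(text_x, text_y))
print(convergence(text_x, text_z))
```

In this toy case text_z shares more tokens with text_x than text_y does, yet its shared tokens sit at more distant ranks, so a rank-sensitive distance could still place text_y closer to text_x.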

Here again, the similarities between Life_Bas and Abg prevail: the convergence between An_Wisd and Life_Bas is 11 units, whereas that between Life_Bas and Abg is 15 units.


The linguistic analysis of convergence and divergence in the Tolstovskii Sbornik texts as a whole confirmed the effectiveness of the statistical methods applied. The statistical analysis made it possible to identify several thematic keys crucial for Cyril of Turov's homilies, as well as to establish the significance of role deixis and of the diverse use of other types of pronouns for his preaching discourse. The great importance of role deixis in Cyril of Turov's homilies provides a basis for evaluating the involvement of the dramaturgical mode in other texts. It is, moreover, an essential indicator of original preaching discourse.

The analysis demonstrated that the variety and frequency of pronoun use can determine the proximity and distance of texts to a much greater extent than the use of nouns, adjectives, and verbs. In quantitative and statistical terms, the level of convergence may not coincide completely with lexical and syntactic parallels, since the statistical measure emphasizes the rank status of the existing convergences. At the same time, we should take into account that the special diagnostic significance of pronouns is mostly explained by their universally high frequency in texts of various types; it testifies primarily to the discursive and pragmatic organization of texts, not to their thematic affinity. In the future, an accurate evaluation of the thematic proximity of texts will require lemmatization, which, however, should not be applied in all cases. When verb forms are lemmatized, there is a risk of losing valuable information related to the deictic categories of person, tense, and taxis.
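The risk mentioned for verb forms can be illustrated with a small sketch. The annotated forms below are Latin examples chosen purely for illustration and are not drawn from the Tolstovskii Sbornik; the tuple layout and the function names are likewise assumptions made here.

```python
from collections import Counter

# Hypothetical hand-annotated tokens: (surface form, lemma, person, tense).
# Illustrative Latin forms only; not data from the corpus under study.
FORMS = [
    ("dico", "dicere", 1, "present"),
    ("dicis", "dicere", 2, "present"),
    ("dixi", "dicere", 1, "perfect"),
    ("dixit", "dicere", 3, "perfect"),
    ("dicis", "dicere", 2, "present"),
]


def count_by_form(forms):
    """Frequency list over surface forms (no lemmatization)."""
    return Counter(surface for surface, _, _, _ in forms)


def count_by_lemma(forms):
    """Frequency list after lemmatization: all forms collapse to the lemma."""
    return Counter(lemma for _, lemma, _, _ in forms)


def count_by_person(forms):
    """Distribution of grammatical person, recoverable only before lemmatization."""
    return Counter(person for _, _, person, _ in forms)


print(count_by_form(FORMS))
print(count_by_lemma(FORMS))
print(count_by_person(FORMS))
```

After lemmatization the five tokens fall under a single entry, which suits thematic comparison, while the contrast between first, second, and third person, which matters for role deixis, is no longer visible in the frequency list.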

Thus, the linguistic and statistical analysis confirmed the sharp difference between the anonymous Parable of Wisdom and Cyril of Turov's homilies. This is further evidence that the Parable belongs to another, unknown author. At the same time, the statistical data allowed us to detect peculiar similarities between the anonymous sermons on the 5th Sunday after Easter and on Pentecost and Cyril of Turov's homilies. Nevertheless, the level of convergence in this case contrasts sharply with the level of convergence among Cyril of Turov's homilies themselves, and this shows that the reasons for the convergence are not connected with common authorship. Indeed, the similarities are limited to the use of units with universally high frequency, namely pronouns of different lexical and semantic categories. The dynamics of the distribution of function words will require a separate study in the future, since such words have the highest frequency in texts of any type.

Sources and abbreviations

Abg - The Abgar Legend («The Holy Mandylion Transference»)

An_East - (an anonymous) sermon on the 5th Sunday after Easter

An_Wisd - (an anonymous) Parable of Wisdom

An_Pent - (an anonymous) sermon on Pentecost

Aphr - Legend of Aphroditian

Life_Bas - Life of St. Basil the Great

Chrys_Nat - John Chrysostom's Nativity sermon

CJ1 - the 1st catechetical lecture of Cyril of Jerusalem

CJ2 - the 2nd catechetical lecture of Cyril of Jerusalem

CJ3 - the 3rd catechetical lecture of Cyril of Jerusalem

CJ13 - the 13th catechetical lecture of Cyril of Jerusalem

CJ14 - the 14th catechetical lecture of Cyril of Jerusalem

CJ15 - the 15th catechetical lecture of Cyril of Jerusalem

CJ16 - the 16th catechetical lecture of Cyril of Jerusalem

CT_Asc - Cyril of Turov's sermon «On Ascension of the Lord»

CT_Paral - Cyril of Turov's sermon «On Sunday of the Paralytic»

CT_Fath - Cyril of Turov's sermon «On the Nicaea Council Fathers»

CT_Blind - Cyril of Turov's sermon «On Sunday of the Blind Man»

CT_Desc - Cyril of Turov's sermon «On Descent from the Cross»

CT_Thom - Cyril of Turov's sermon «On Sunday of St. Thomas the Apostle»

Kazan Collection - Kazan collection of Slavic-Russian written sources from the 12th - 14th centuries. Kazan Federal University, the laboratory of palaeoslavistics, with the support of IAS «Manuscript», 2007-2020, available at: (accessed 15 May 2020).

SbTol - Sermons and teachings collection («Tolstovskii Sbornik»), the 2nd half of the 13th century (Russian National Library, F.p.I. 39), 184 ff. [Online resource] / Oleg Zholobov et al.; «Manuscript» project, available at: http:// (accessed 15 May 2020).


