Buryat language and ethnic identity among the Buryats in the web space

Features of computer-mediated communication, digital diasporas. Minority languages of the Russian Federation on the Internet. Definition of Buryat language practices on the Internet. Discussion topics and analysis of users of the Buryat language.

Category: Foreign languages and linguistics
Type: graduation thesis
Language: English
Date added: 18.07.2020
File size: 1.0 M

Posted on http://www.Allbest.Ru/

Federal state autonomous educational institution for higher professional education

National research university higher school of economics

St. Petersburg School of Social Sciences and Area Studies
Field of study: 39.03.01 Sociology
Degree programme: Sociology and Social Informatics
BACHELOR'S PROJECT
Buryat language and ethnic identity among the Buryats in the web space
Lkhasaranova I.A.
Supervisor: Baranova V.V.
Candidate of Sciences (PhD)
Saint Petersburg 2020
Table of Contents

Introduction

  • Chapter 1: Theoretical framework of study
    • 1.1 Features of Computer-Mediated Communication
    • 1.2 Digital diasporas
    • 1.3 Minority languages of the Russian Federation on the Internet
    • 1.4 Language situation in Republic of Buryatia
  • Chapter 2: Methodology and methods of research
    • 2.1 Hypotheses
    • 2.2 Selection of a data for analysis
    • 2.3 Cleaning and preprocessing of database
    • 2.4 Text mining
    • 2.5 Topic modelling
    • 2.6 Regression analysis
  • Chapter 3: Results of analysis
    • 3.1 Text mining
    • 3.2 Topic modeling
    • 3.3 Descriptive statistics
    • 3.4 Regression model

Conclusion

References

Appendices

Introduction

Languages such as English, Chinese and Spanish are becoming dominant in modern society, while minority languages are being forgotten. This can lead to the disappearance of the cultures and knowledge of the ethnic groups who speak these languages. According to the UNESCO Atlas of the World's Languages in Danger, if no measures are taken, approximately 6,000 languages will disappear by the end of the 21st century, and in Russia the number of endangered languages is 131. At the same time, issues of ethnicity and ethnic identity, and the desire to revive or preserve native languages, are gradually gaining attention in our society (Mustafina et al., 2018). The development of the Internet and social networks is digitalizing communication and changing the way people speak and write, including in minority communities. The relevance of our study therefore lies in the fact that the Internet has become a global platform for communication and self-identification in modern society.

Thus, the main goal of this study is to determine the language practices of Buryat people on the Internet. We also find out which topics are most often discussed in the Buryat language and who uses the Buryat language on the Internet. Note, however, that the title does not fully correspond to the content of the work: we studied the online language practices of Buryat people in Russia.

The data source is the collection of the 2016 Computer Linguistics project of the HSE School of Linguistics, which gathered Vkontakte messages in the minority languages of Russia.

To achieve the goal, we need to complete the following objectives:

1. To identify what topics are most often discussed in the Buryat language;

2. To compare how frequently different population groups use the Buryat language in Vkontakte groups.

To achieve the goal and objectives, the study uses mixed methods of data analysis: text mining, topic modeling and regression analysis, all performed in R Studio.

The theoretical significance of this work lies in the fact that there has been no research on computer-mediated communication in the Buryat language in the Russian Federation.

The theoretical framework of the study consists of the works of S. Herring and J. Androutsopoulos on Computer-Mediated Communication (CMC), the works of J. Androutsopoulos and M. Brinkerhoff on digital diasporas, and the works of C. Pischlöger, Y. Adzhigitova and A. Gladkova on minority languages on the Internet in Russia.

The work consists of an introduction, three chapters and a conclusion. In the first chapter, we present the theoretical framework of the study: we define the basic concepts used in this work, language and identity on the Internet, and then review studies of digital diasporas, the functions of minority languages in general, and statistical data on the Republic of Buryatia. The second chapter presents the methodology and empirical data of our study, and the third chapter presents the results of the data analysis.

Chapter 1: Theoretical framework of study

In this chapter, we consider the main theoretical approaches to the study of Computer-Mediated Communication (CMC), the functioning of ethnic groups, language revitalization and minority languages in cyberspace, as well as the language situation in the region where the Buryat language has the status of a state language (the Republic of Buryatia).

1.1 Features of Computer-Mediated Communication

Since we use data from Vkontakte, where people use computers, mobile phones or other devices to communicate their thoughts, opinions and ideas to each other, we should consider some characteristics of Computer-Mediated Communication.

Initially, Computer-Mediated Communication was text-based and accessed through stand-alone clients. However, with the development of technology and the Internet, textual CMC has come to include graphic, audio and video materials, and researchers analyze this discourse with approaches such as computer-mediated discourse analysis (CMDA). Nonetheless, CMDA was designed for text analysis, so it is not suitable for analyzing the visual aspects of online discourse. Herring tries to solve this problem by extending CMDA to non-textual communication, since scholars had previously assumed that communication takes place only in text format (Herring et al. 2013, Herring 2019). Herring (2019, p.30) identifies three historical stages of CMC: Pre-Web (stand-alone text clients), Web 1.0 (personal websites, publishing and so on) and Web 2.0 (blogging, Wikipedia and so on). Finally, Herring (2019) proposes a theory of multimodal CMC that sets a new direction for CMDA. This theory covers graphic materials such as memes, avatar-mediated communication, and robot-mediated communication involving telepresence robot avatars in physical space (Herring 2019). Each of these phenomena mediates communication between people, supports social interaction and involves several modes or channels of communication. Thus, the very definition of CMC can be expanded.

Androutsopoulos introduces the concept of “networked multilingualism” to study multilingual online practices that connect users to one another and are embedded in the global web. He defines it as “a cover term for multilingual practices that are shaped by two interrelated processes: being networked, i.e. digitally connected to other individuals and groups, and being in the network, i.e. embedded in the global digital mediascape of the web” (Androutsopoulos 2015, p. 188). In this work, Androutsopoulos (2015) studies Facebook users, their linguistic repertoires, their language choices for genres of self-presentation and dialogic exchange, and their performance of multilingual talk online. The results indicate that multilingual practices are complex because they are “individualized”, “genre-shaped”, and based on a “wide” range of repertoires (Androutsopoulos 2015, p. 185).

1.2 Digital diasporas

As already mentioned, networked technologies create new spaces where people can share information and communicate with each other. Boyd (2011) calls them “networked publics”, in which users' identities and interests can be formed. Brinkerhoff (2009) studies diasporas that use the Internet to maintain connections with their home countries. She claims that such “digital diasporas” can be considered as physical communities that encourage members to share personal stories, discuss sensitive topics and create groups that display hybrid identities (Brinkerhoff 2009, p.2). “Digital diasporas” not only improve migrants' quality of life but also help to prevent marginalization and ethnic conflicts (Brinkerhoff 2009, Everett 2009). Moreover, diasporas can have a significant impact on politics and human rights: for example, when women of the African diaspora were outraged by the lack of news about the Million Woman March, they used social networks to share news and events with each other (Everett 2009). A digital diaspora can also be defined as a social network of migrants formed with the help of digital technologies such as mobile communications and the Internet (Ponzanesi 2020). Thus, the Internet is considered an alternative meeting place for people (Brinkerhoff 2009, Everett 2009).

We now consider several studies of the linguistic practices of ethnic minorities on the Internet. Androutsopoulos (2006) focuses on blogs and forums of diasporas in Germany. He examines how ethnic identity influences language use and notes that although most of these websites are in German, minority languages such as Arabic, Greek or Hindi dominate in some forums (Androutsopoulos 2006). At the same time, Androutsopoulos (2006) claims that native languages are transformed online, for instance through Romanized transliteration. More generally, human communication via technology is full of multilingualism and online code-switching (Androutsopoulos 2013). In a study of Mongolian Facebook users and their online linguistic practices, Dovchin points out that although these users actively use English, Russian and other languages in their online communication, they do not simply copy words and language practices from these languages: they move these practices into the Mongolian context and create new terms and expressions that refer to locally significant principles (Dovchin 2016, p.17).

1.3 Minority languages of the Russian Federation on the Internet

Russian is the official language of the Russian Federation, with a large number of speakers and government support. Nevertheless, the minority languages of Russia are present on the Internet, and the Internet changes the lives of minority communities and the way they communicate. One of the core problems is the presence of minority languages on the Internet and how the global network can be used to preserve these languages and support their needs.

Nowadays digital devices, the Internet and social media are used for linguistic and cultural revitalization (Androutsopoulos 2007, Suleymanova 2018). There are 96 minority languages in Russia, and researchers have identified 49 of them on the Internet (Orekhov et al. 2016). The most represented languages on the Internet are Bashkir with 74 domains, Tatar with 59, Yakut with 52, Chuvash with 20 and Buryat with 19 (Orekhov et al. 2016). Meanwhile, in her work on the representation of minority languages on the Russian Internet, Krylova (2016) names Bashkir, Tatar, Yakut, Udmurt and Meadow-Eastern Mari as the most common. The gap between the representation of the Bashkir and the Buryat languages is notable. Khilkhanova (2019) concludes that the Internet reflects the current situation of minority languages in the real world. In other words, languages that are well represented offline, and whose speakers show a high level of linguistic activism and national identity, are also common on the global web (Khilkhanova 2019).

The limited number of speakers of ethnic languages on the Internet is one of the core problems. Pischlöger studies the degree to which the Udmurt language is used on three typical Internet resources (blogs, Twitter and Wikipedia) and concludes that only a few activists and journalists dominate among the users and groups on these resources (Pischlöger 2010, Pischlöger 2016). Suleymanova reaches similar results, highlighting the significant role of activism and initiative in revitalizing and preserving a minority language (Suleymanova 2018).

There are various quantitative and qualitative studies of the use of minority languages on the Internet. Qualitative analysis can be found in the works of Christian Pischlöger, who studies speakers of the Udmurt language and how they use it on the Internet (Pischlöger 2010, 2016). He highlights that Udmurt is one of the most active and popular minority languages on social network sites (SNS) compared with other minority languages of Russia (Pischlöger 2010). Pischlöger (2016) also points out that Udmurt speakers do not follow language purism either online or in face-to-face conversation: they may ignore the rules of standard Udmurt and communicate in a more natural style, including code-switching and code-mixing (“suro-puzho”). Moreover, he concludes that social networks have a significant impact on preserving the Udmurt language in the context of globalization, because they provide an opportunity to create content and share it with others (Pischlöger 2010, 2013a, 2013b). Verschik (2016) reaches a similar conclusion about the impact of Estonian on Russian in lexis, semantics and other areas.

Gladkova (2015) also highlights the role of the Internet and SNS in preserving linguistic and cultural pluralism in the Russian web space. She studies 124 websites in Tatar, Chuvash, Bashkir and Chechen. To achieve this aim, Gladkova identifies several necessary steps:

1. To develop the Internet access in remote locations of Russian Federation;

2. To educate people to be active Internet users;

3. To motivate them to discuss their values, needs and interests on the Internet space (Gladkova 2015, p. 35).

Adzhigitova (2018) studies the Chuvash language using quantitative methods and tries to identify the role language plays in constructing Chuvash identity on the Internet. She concludes that knowledge of the Chuvash language cannot be considered a criterion for self-identification as Chuvash on the Internet, but it can be a factor that enhances ethnic self-identification online (Adzhigitova 2018). Additionally, Adzhigitova (2018) points out that the use of Chuvash and Russian on Vkontakte correlates with the topic of discussion: for instance, Chuvash speakers prefer Russian when they talk about hobbies and spending time together.

According to Zaydelman, who studies minority languages on Vkontakte, a typical native speaker of a minority language in Russia is aged 19 to 31 and lives in the titular region of his or her language, but speaks Russian much better and more often, and usually takes an active part in the life of only one community that speaks the minority language (Zaydelman 2016). Shirobokova (2011) reaches a similar conclusion: she claims that there is a language shift in favor of Russian, especially in urban environments and among young people. However, her study does not concern Internet users, so the results for the online environment may differ.

1.4 Language situation in Republic of Buryatia

In 2002, UNESCO included the Buryat language in the Atlas of the World's Languages in Danger, where Buryat is classified as “severely endangered”, that is, in danger of extinction. This means that in most cases the Buryat language is used by the older generation, while the younger generation practically does not speak it. This is despite the fact that in 1992 the law “About the languages of the peoples of the Republic of Buryatia” (“О языках народов Республики Бурятия”) was adopted, giving the Buryat language, along with Russian, the status of a state language of the Republic of Buryatia.

According to the 2010 All-Russian Population Census, approximately 190 nationalities are registered in the Republic of Buryatia. Russians make up about 66.1% of the republic's population, Buryats almost 30%, and other nationalities 3.8%. Of the respondents who indicated their knowledge of languages in the census, 99.6% are fluent in Russian, 97.7% use it in everyday life, and 85% consider Russian their native language. By contrast, 18.9% of respondents speak the Buryat language, 15.7% use it in everyday life, and 21.3% consider Buryat their native language. Among Buryats, 92% use Russian in everyday life, 55.9% use Buryat, and 48% use both languages.

Moreover, there is no official television channel in the Buryat language, and in Ulan-Ude there are only 2 schools that teach the Buryat language from the first to the eleventh grade. Banners, posters and signs (except on state and municipal buildings) and the names of streets and squares are not translated into Buryat.

Thus, the linguistic situation in the Republic of Buryatia can be described as unbalanced, with Russian dominant (Evseeva et al., 2019). Researchers of the Buryat language note that its role has been relegated to the background, giving way to Russian, the state language of the Russian Federation (Egodurova 2012, Osinsky 1994, Khandaeva 2016). Researchers also point out that the measures taken by the local government to preserve the Buryat language are ineffective (Ivashchenko 2017, Khandaeva 2016).

Figure 1. Number of websites in minority languages in Russia

As for the Buryat language on the Internet, Figure 1 shows that the most represented languages are Bashkir, Tatar, Yakut and Udmurt, with 74, 59, 52 and 38 websites respectively, while there are 19 websites in Buryat (Orekhov 2017).

According to Figure 2, there are 60 Vkontakte communities that support the Buryat language, compared with 288 groups for Udmurt, 259 for Bashkir and 263 for Yakut (Orekhov 2017). So, the Buryat language occupies a middle position in terms of Internet presence compared with other minority languages of the Russian Federation.

Figure 2. Number of minority-language communities on VKontakte

Chapter 2: Methods and methodology

To achieve the goal and objectives, the study uses mixed methods of data analysis: text mining, topic modeling to identify the topics of VKontakte communities, and regression analysis to identify associations between variables. All analyses were performed in R Studio.

2.1 Hypotheses

The works reviewed above allow us to formulate the following hypotheses:

1. The Buryat language is used more to maintain online conversations on everyday topics and less in the fields of science, politics and economics;

2. The Buryat language is used more, relative to Russian, in the countryside;

3. The Buryat language is used more, relative to Russian, among older people.

2.2 Selection of a data for analysis

To answer the research question and complete the tasks, we need a large database that fully represents conversations in the Buryat language. Vkontakte is one of the most popular and active social media in Russia, and people use it daily to communicate with each other on different topics, so we analyze messages collected from Vkontakte. Specifically, we use the data obtained during the HSE School of Linguistics project “Russian Languages on the Internet” led by B. Orekhov, which is publicly available (http://web-corpora.net/wsgi3/minorlangs/view). This database contains texts in various minority languages of Russia, including Buryat. From the 60 Vkontakte groups in which the Buryat language was used for communication at least once, 68,860 messages were collected. In Salganik's (2017) terms, these data are “big”, “always-on”, “nonreactive” and “dirty” (containing spam or advertisements).

The database is in JSON format. Using the R programming language, we created a table containing the following variables: user ID, user's gender, user's city of residence, user's birthdate, number of characters, message type (post or comment), group ID, group name, number of group members, and number of messages in the group.
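The thesis built this table in R. The Python sketch below shows the same flattening step: one row per message, with the group's metadata copied onto each row. The JSON layout and every field name in it are hypothetical, because the text does not describe the corpus schema.

```python
import json

# Hypothetical JSON layout: the field names below are illustrative assumptions,
# not the actual schema of the corpus.
SAMPLE = """
[
  {"group_id": 1, "group_name": "Example group", "members": 1200,
   "messages": [
     {"type": "post", "text": "Сайн байна",
      "user": {"id": 10, "sex": "female", "city": "Ulan-Ude", "bdate": "1.2.1995"}},
     {"type": "comment", "text": "Привет",
      "user": {"id": 11, "sex": "male", "city": null, "bdate": null}}
   ]}
]
"""


def flatten(raw: str) -> list:
    """Turn nested group/message JSON into one flat row per message."""
    rows = []
    for group in json.loads(raw):
        n_messages = len(group["messages"])
        for msg in group["messages"]:
            user = msg.get("user") or {}
            rows.append({
                "user_id": user.get("id"),
                "sex": user.get("sex"),
                "city": user.get("city"),
                "birthdate": user.get("bdate"),
                "n_chars": len(msg["text"]),
                "kind": msg["type"],  # "post" or "comment"
                "group_id": group["group_id"],
                "group_name": group["group_name"],
                "group_members": group["members"],
                "group_messages": n_messages,
            })
    return rows


rows = flatten(SAMPLE)
```

The resulting list of dictionaries can be written to CSV with `csv.DictWriter` for inspection in a spreadsheet.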

2.3 Cleaning and preprocessing of database

As mentioned above, there are approximately 70,000 messages in the dataset. However, some of them are spam and advertisements, and most of them are too short to be analyzed. So, we need to clean the database and reduce all words in the messages to their initial (dictionary) form. To remove spam and advertisements, we used Microsoft Excel together with phrases or words typical of spam, such as “проголосуйте пожалуйста” (“please vote”) or “не проходите мимо” (“don't pass by”).
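The authors filtered spam in Microsoft Excel. An equivalent keyword filter could be sketched in Python as follows. Only the two phrases quoted above come from the text, and a real marker list would need to be much longer. The sample messages are invented.

```python
# Spam markers quoted in the text; a real list would be longer (assumption).
SPAM_MARKERS = ["проголосуйте пожалуйста", "не проходите мимо"]


def is_spam(text: str, markers=SPAM_MARKERS) -> bool:
    """Flag a message if it contains any spam marker, ignoring case."""
    lowered = text.lower()
    return any(marker in lowered for marker in markers)


# Invented sample messages for illustration.
messages = [
    "Проголосуйте пожалуйста за наш конкурс!",
    "Мүнөөдэр буряад хэлэн дээрэ хөөрэлдэе",
    "Друзья, не проходите мимо, ставьте лайк",
]
clean = [m for m in messages if not is_spam(m)]
```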

Another problem is the lack of an automatic lemmatizer for the Buryat language. To solve this problem, we compiled a dictionary of Buryat words. Using the “rvest” package in R Studio, we downloaded data from the websites “Burlang.Toli” (https://buryat-lang.ru/) and BURYATIA.ORG, which contain electronic Buryat-Russian and Russian-Buryat dictionaries. This automated extraction of information from web sources is called web scraping. As a result, we obtained a dataset of 10,056 lemmas, that is, words in their initial form.
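The authors scraped the dictionaries with rvest in R. The Python sketch below covers only the parsing half of web scraping and runs on an invented HTML snippet. The markup (`div.entry`, `span.word`, `span.tr`) is a hypothetical stand-in for the real pages, and the code that fetches those pages is omitted.

```python
from html.parser import HTMLParser

# Invented markup for illustration only; the real dictionary pages differ.
PAGE = """
<div class="entry"><span class="word">нютаг</span><span class="tr">родина</span></div>
<div class="entry"><span class="word">хэлэн</span><span class="tr">язык</span></div>
"""


class DictionaryParser(HTMLParser):
    """Collect (Buryat headword, Russian translation) pairs from entry blocks."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._field = None  # "word" or "tr" while inside such a span
        self._current = {}

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("word", "tr"):
            self._field = css_class

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div" and self._current:
            # one dictionary entry is complete
            self.pairs.append((self._current.get("word"), self._current.get("tr")))
            self._current = {}

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()


parser = DictionaryParser()
parser.feed(PAGE)
parser.close()
dictionary = dict(parser.pairs)
```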

The next step was to create a lemmatizer and a message-language detector (Russian or Buryat) based on our dictionary. For all Buryat words containing the special characters ө, ү and һ, we also recorded variants spelled with their standard Cyrillic analogues.
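The text does not describe how the lemmatizer and language detector work internally. The sketch below rests on three assumptions, all hypothetical choices:

* The Buryat letters ө, ү and һ are folded to о, у and х.
* Lemmatization is approximated by taking the longest dictionary lemma that is a prefix of the token. This is a crude stand-in for stripping Buryat suffixes.
* A message is labelled Buryat when at least half of its tokens match the dictionary.

```python
import re

# Assumed mapping of the Buryat-specific letters to standard Cyrillic analogues.
TO_CYRILLIC = str.maketrans({"ө": "о", "ү": "у", "һ": "х"})


def normalize(word: str) -> str:
    """Lower-case a word and replace Buryat-specific letters with Cyrillic analogues."""
    return word.lower().translate(TO_CYRILLIC)


def tokenize(text: str) -> list:
    """Split text into lower-case Cyrillic word tokens (including ё, ө, ү, һ)."""
    return re.findall(r"[а-яёөүһ]+", text.lower())


def lemmatize(token: str, lemmas: set, min_stem: int = 3):
    """Return the longest dictionary lemma that is a prefix of the token, or None.
    Prefix matching is a crude stand-in for suffix stripping (an assumption)."""
    t = normalize(token)
    for end in range(len(t), min_stem - 1, -1):
        if t[:end] in lemmas:
            return t[:end]
    return None


def detect_language(text: str, lemmas: set, threshold: float = 0.5) -> str:
    """Label a message Buryat if at least `threshold` of its tokens match the dictionary."""
    tokens = tokenize(text)
    if not tokens:
        return "unknown"
    hits = sum(lemmatize(t, lemmas) is not None for t in tokens)
    return "buryat" if hits / len(tokens) >= threshold else "russian"


# Tiny illustrative lemma set, stored in normalized form.
LEMMAS = {normalize(w) for w in ["мүнөөдэр", "буряад", "хэлэн", "үдэр"]}
```

Storing lemmas in normalized form means that a word typed with о or у instead of ө or ү still matches.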

After lemmatization and removal of stop words such as pronouns, numerals and some common words (“to be”, “to appear”, “must” and so on), we translated the Buryat words into Russian using automatic translation. We checked the quality of the translation using Microsoft Excel and its functions. If we could not find a translation of a word, we removed it from the database. For data preprocessing such as tokenization, normalization and noise removal, we used R Studio, in particular the “tm” library, which provides most of the necessary functions.
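The thesis used automatic translation and checked the results in Excel. As a simpler illustration, the sketch below swaps in plain dictionary lookup, for example against the scraped Buryat-Russian dictionary. It removes stop words, translates the remaining lemmas, and drops any lemma without a translation. The stop-word set (a few pronouns and the verb байха, "to be") and the dictionary entries are hypothetical samples.

```python
# Hypothetical sample data: a few Buryat stop words and Buryat-to-Russian
# dictionary entries. Keys must be in the same form the lemmatizer outputs.
BURYAT_STOPWORDS = {"би", "ши", "тэрэ", "байха"}
BUR_RU = {"хэлэн": "язык", "буряад": "бурятский", "үдэр": "день"}


def translate_lemmas(lemmas, stopwords, dictionary):
    """Drop stop words, translate the remaining lemmas into Russian,
    and silently drop lemmas that have no dictionary translation."""
    out = []
    for lemma in lemmas:
        if lemma in stopwords:
            continue
        translation = dictionary.get(lemma)
        if translation is not None:
            out.append(translation)
    return out
```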

2.4 Text mining

Before analyzing the topics of Vkontakte messages, we identified the most frequent and most important words in the dataset. We split the dataset into two parts (Russian and Buryat messages) and analyzed them separately. To visualize the most common words, we used word clouds, in which the larger a word appears, the more often it occurs in the text. To keep the figures readable, we set a minimum frequency threshold of 5 occurrences. We then used Term Frequency and Inverse Document Frequency (tf-idf) to determine the words most important for the content of the data. Tf-idf decreases the weight of commonly used words and increases the weight of words that are used rarely across texts.
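The two measures described above can be sketched in Python. The tf-idf below uses one common variant: term count divided by document length, times log(N / document frequency). The text does not say which variant its R tools used.

```python
import math
from collections import Counter


def frequent_words(docs, min_freq=5):
    """Word counts across all documents, keeping only words seen at least min_freq times."""
    counts = Counter(word for doc in docs for word in doc)
    return {w: c for w, c in counts.items() if c >= min_freq}


def tf_idf(docs):
    """Per-document tf-idf: tf = count / doc length, idf = log(N / document frequency)."""
    n_docs = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1
        scores.append({w: (c / length) * math.log(n_docs / df[w])
                       for w, c in counts.items()})
    return scores


# Toy tokenized documents (already lemmatized and translated).
docs = [["язык", "бурятский", "язык"], ["язык", "урожай"], ["песня", "язык"]]
```

A word that appears in every document, such as "язык" here, gets an idf of log(1) = 0 and therefore a tf-idf of 0. This is how tf-idf down-weights ubiquitous words. The toy data is too small for the thesis threshold of 5, so the tests use a threshold of 3.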

2.5 Topic modelling

To test the first hypothesis and determine the core functions of the Buryat language in web space, in particular on Vkontakte, we used topic modeling. Y. Adzhigitova used this method of classifying texts by their words in her work on the role of language in the self-determination of the Chuvash people. She points out that the use of Chuvash and Russian on Vkontakte correlates with the topic of discussion: for instance, Chuvash speakers prefer Russian when they talk about hobbies and spending time together (2018).

For topic modeling, we used Latent Dirichlet Allocation (LDA) in R Studio. In LDA, each topic is a probability distribution over words, and each document is a mixture of topics. Topics are formed from words that often co-occur in documents, and a document's association with a topic is estimated from the topic assignments of the words it contains. To improve the quality of the analysis, we used only texts longer than 8 tokens. This left 1,935 texts in Russian and 2,932 texts in Buryat (4,867 in total).
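The thesis fitted LDA in R and does not say which package or inference method it used. As a self-contained illustration, the sketch below implements collapsed Gibbs sampling for LDA in plain Python, together with the length filter described above. The hyperparameters `alpha` and `beta` and the iteration count are illustrative defaults, not the study's settings.

```python
import random


def keep_long_texts(texts, more_than=8):
    """Keep only tokenized texts longer than `more_than` tokens (8 in the study)."""
    return [t for t in texts if len(t) > more_than]


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    Returns theta (per-document topic proportions) and phi (per-topic word
    probabilities). alpha, beta and iters are illustrative defaults.
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    w_id = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    ndk = [[0] * k for _ in docs]            # tokens of topic j in document d
    nkw = [[0] * n_vocab for _ in range(k)]  # occurrences of word v in topic j
    nk = [0] * k                             # total tokens assigned to topic j
    z = []                                   # current topic of every token
    for d, doc in enumerate(docs):           # random initial assignment
        zd = []
        for w in doc:
            t = rng.randrange(k)
            zd.append(t)
            ndk[d][t] += 1
            nkw[t][w_id[w]] += 1
            nk[t] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t, v = z[d][i], w_id[w]
                ndk[d][t] -= 1               # remove the token's current assignment
                nkw[t][v] -= 1
                nk[t] -= 1
                # full conditional p(z = j | all other assignments), unnormalized
                weights = [(ndk[d][j] + alpha) * (nkw[j][v] + beta) / (nk[j] + n_vocab * beta)
                           for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                ndk[d][t] += 1
                nkw[t][v] += 1
                nk[t] += 1
    theta = [[(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
             for d, doc in enumerate(docs)]
    phi = [{vocab[v]: (nkw[j][v] + beta) / (nk[j] + n_vocab * beta) for v in range(n_vocab)}
           for j in range(k)]
    return theta, phi
```

Each row of `theta` and each topic in `phi` is a proper probability distribution. The top words of a topic are the keys of `phi[j]` with the highest values.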

We then identified the optimal number of topics for LDA by building several LDA models and choosing the one with the highest coherence value. Using this method, we identified 10 stable topics.
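The text does not name the coherence measure. The sketch below assumes the UMass measure, one common choice. It scores each topic by how often its top words co-occur in the same documents, then picks the number of topics whose topics have the highest mean score. The `fit_topics` argument is a hypothetical callable standing in for fitting a k-topic model.

```python
import math


def umass_coherence(top_words, docs):
    """UMass coherence of one topic.

    top_words is ordered from most to least probable; the score is the sum
    over i > j of log((D(w_i, w_j) + 1) / D(w_j)), where D counts the
    documents containing all given words. Top words are assumed to occur
    in at least one document.
    """
    doc_sets = [set(d) for d in docs]

    def doc_count(*words):
        return sum(all(w in s for w in words) for s in doc_sets)

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            score += math.log((doc_count(top_words[i], top_words[j]) + 1)
                              / doc_count(top_words[j]))
    return score


def best_k(candidate_ks, fit_topics, docs):
    """Return the k whose topics have the highest mean UMass coherence.

    fit_topics(k) is a hypothetical callable that fits a k-topic model and
    returns a list of top-word lists, one per topic.
    """
    def mean_coherence(k):
        topics = fit_topics(k)
        return sum(umass_coherence(t, docs) for t in topics) / len(topics)
    return max(candidate_ks, key=mean_coherence)
```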

2.6 Regression analysis

To test the second and third hypotheses and compare how frequently different population groups use the Buryat language in Vkontakte groups, we used multinomial logistic regression in R Studio. Our aim was to identify the association between the outcome (the language of the message) and predictor variables such as sex, age and place of residence (Table 1).

Table 1

Variables for regression analysis

Variable           | Code                                     | Number of observations
Sex                | 0 - male                                 | 2698
                   | 1 - female                               | 6446
Age                | -                                        | 9144
Place of residence | City in the Republic of Buryatia         | 6097
                   | Village in the Republic of Buryatia      | 623
                   | City outside the Republic of Buryatia    | 1694
                   | Village outside the Republic of Buryatia | 326
                   | Foreign city                             | 404
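The regression described above has a two-category outcome (Russian = 0, Buryat = 1). With only two categories, a multinomial logit reduces to an ordinary binary logit. The sketch below shows how the Table 1 variables could be dummy-coded, with a city in the Republic of Buryatia as the reference category, and how such a model could be fitted by gradient ascent in plain Python. It is an illustration under these assumptions, not the author's R code. R's `glm` fits the same model by iteratively reweighted least squares. With plain gradient ascent, age should be centered or scaled first, or convergence becomes very slow.

```python
import math

PLACES = ["city_rb", "village_rb", "city_outside", "village_outside", "foreign_city"]
REFERENCE = "city_rb"  # reference category for place of residence


def design_row(sex, age, place):
    """Intercept, sex (0 male / 1 female), age, and one dummy per non-reference place."""
    return [1.0, float(sex), float(age)] + [1.0 if place == p else 0.0
                                            for p in PLACES if p != REFERENCE]


def fit_logistic(X, y, lr=1.0, iters=5000):
    """Maximize the mean log-likelihood of a binary logit model by gradient ascent."""
    n, k = len(X), len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        for row, target in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row))))
            for j in range(k):
                grad[j] += (target - p) * row[j] / n
        beta = [b + lr * g for b, g in zip(beta, grad)]
    return beta


def odds_ratios(beta):
    """Exponentiated coefficients: multiplicative change in the odds of Buryat."""
    return [math.exp(b) for b in beta]
```

In a quick check with one binary predictor, the fitted odds ratio equals the ratio of the group odds in the underlying 2x2 table, which is the expected maximum-likelihood result.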

Chapter 3: Results of analysis

3.1 Text mining

We analyzed the most frequent and most important words in the dataset, presenting the results for Russian and Buryat messages separately. The most common words are visualized with word clouds. To keep the figures readable, we set a minimum frequency threshold of 5 occurrences.

Figure 3. The most common words in Buryat texts

According to Figure 3, the most typical words in Buryat messages are “Buryat”, “today”, “to speak”, “an assignment”, “to improve”, “inside”, “a harvest”, “nature” and “an adult”.

As for Russian messages, the most frequent words are “Buryat”, “Buryat people”, “Buryatia”, “language”, “a girl”, “happiness”, “China”, “a harvest”, “a song” and “a friend” (Figure 4). The two lists clearly differ.

We then used tf-idf to determine the most important words for the content of the data. Figure 5 shows the results for Buryat messages. According to tf-idf, the core words are “happiness”, “today”, “humanity”, “a harvest”, “to improve”, “Buryat”, “nature”, “inside”, “an adult” and “an assignment”.

Figure 4. The most common words in Russian texts

Figure 5. The highest tf-idf of Buryat words

Figure 6. The highest tf-idf of Russian words

As for Russian messages, the main words are “the south”, “respected”, “popularity”, “a yard”, “to speak”, “a mirror”, “good”, “a lake”, “a face” and “distant” (Figure 6). These results also differ from those for Buryat, which suggests that Russian and Buryat are used for different kinds of communication.

3.2 Topic modeling

Using the coherence value method, we identified 10 stable topics. We then conducted a content analysis to determine their meaning and name them. Figure 7 shows the words most strongly associated with each topic. The first topic contains words such as “an assignment”, “to speak”, “today”, “travel”, “to locate” and “small”, while the second topic consists of words like “Buryat”, “other”, “language”, “today” and “gold”. The words “nature”, “the sky”, “weather”, “a star” and “a lake” are generated from the third topic. The fourth topic contains words such as “inside”, “a harvest”, “milk” and “a robbery”, while the fifth topic consists of words like “an adult”, “people”, “autocracy”, “today” and “Buryat”. The words “an assignment”, “a child”, “time” and “a pencil” are generated from the sixth topic. The seventh topic contains words such as “to improve”, “Buryat”, “to think”, “distant” and “a mirror”, while the eighth topic consists of words like “humanity”, “happiness”, “wisdom”, “nature” and “sacred”. The words “skillful”, “folk”, “a place”, “respected” and “other” are generated from the ninth topic.

The tenth topic contains words such as “to throw”, “a face”, “nice”, “a funnel” and “beautiful”. We then examined, for each topic, the top 20 words with the highest probability of being generated from it (Appendix 1) and the top 20 texts. This allowed us to determine the core themes of the Vkontakte groups that use the Buryat language: “Education”, “Language”, “Policy”, “Kitchen”, “Nature and weather”, “Hobbies”, “Religion”, “Culture”, “Family” and “Famous people”.

Figure 7. The highest word probabilities for each topic

Using these text and word probabilities for each topic, we determined which topics Buryat users cover to a greater extent and which to a lesser extent. According to Figure 8, the most represented topics are “Hobbies” and “Language”, while the least covered topics are “Kitchen” and “Nature and weather”. Based on these data, we can say that Buryat users mostly treat Vkontakte groups as places to discuss common interests, spend time together and translate words.

Figure 8. Top 10 topics

As we can see, in Vkontakte groups the Buryat language is used less than Russian in conversations about hobbies, policy, education and family. In other areas (language, culture, famous people, religion, nature and kitchen), both languages are used equally or Buryat slightly predominates (Figure 8).

To sum up, the first hypothesis is confirmed: the Buryat language is used more to maintain online conversations on everyday topics such as language, culture, famous people, religion, nature and kitchen.

3.3 Descriptive statistics

Before fitting the regression model, we present descriptive statistics of the variables.

Table 9. Distribution of age

As Table 9 shows, the distribution of the variable “age” is not normal: it is right-skewed (positive skewness). The majority of users are 20 to 30 years old.

Table 10. Distribution of sex

According to Table 10, most users are female. Female users outnumber male users by more than two to one (6,446 vs 2,698 observations, Table 1).

Table 11. Distribution of place of residence

As Table 11 shows, the most common places of residence are a city in the Republic of Buryatia and a city outside the Republic of Buryatia, while the least common is a village outside the Republic of Buryatia.

3.4 Regression model

The next stage of our work was the construction of a multinomial logistic regression model. We analyzed 9,144 observations. The dependent variable was the language of the message (“language”, where Russian = 0 and Buryat = 1), with Russian as the reference category.

According to Table 2, the p-value of the variable “age” is 0.2294. This coefficient is therefore not statistically significant at the 5% level, and age is not a significant predictor. However, this may be explained by a non-representative sample. The digital divide (lack of Internet access and technical devices) may make our data unreliable, so we cannot be sure that there is no association between the language of an online message and the age of its author.

The next variable is “sex”. A one-unit increase in “sex” is associated with odds of using the Buryat language about 1.92 times higher, which is a 92% increase in the odds (table 2). The associated p-value is 0.0004, so this coefficient is statistically significant at the 5% level. This means that women are more likely to use the Buryat language on the Internet.
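An odds ratio multiplies the odds. The implied percentage change is therefore (OR - 1) * 100; the OR itself is not a percentage. The small check below uses odds ratios taken from table 2. The helper name is hypothetical.

```python
def percent_change_in_odds(odds_ratio):
    """Percentage change in the odds implied by an odds ratio: (OR - 1) * 100."""
    return (odds_ratio - 1) * 100


# Odds ratios taken from table 2
for name, odds_ratio in [("Sex", 1.9238), ("Village in RB", 1.8805), ("Age", 1.0226)]:
    print(f"{name}: {percent_change_in_odds(odds_ratio):+.1f}% change in the odds")
```

For “age”, the change applies per additional year, and that coefficient is not significant.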

The last predictor is the user's place of residence, with cities in the Republic of Buryatia as the reference category. Compared with residents of cities in the Republic of Buryatia, residents of villages, both in the republic and in other regions, are more likely to write online in their native language. Their odds of using Buryat are 1.88 and 1.80 times higher, respectively. For people from other cities of the Russian Federation, the odds ratio is 1.54. The variable “foreign city”, however, is not significant (p = 0.7810). Overall, users from villages are more likely to use the Buryat language on the Internet.
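The residence variable enters the model as a set of dummies. Each odds ratio in table 2 compares one group with the reference category, cities in the Republic of Buryatia. The sketch below shows this treatment coding. The category labels mirror the rows of table 2. The name "City in RB" for the reference category is a hypothetical label.

```python
# Hypothetical labels for the residence groups of table 2; the reference
# category (cities in the Republic of Buryatia) gets no indicator column.
REFERENCE = "City in RB"
DUMMIES = ["City outside RB", "Foreign city", "Village in RB", "Village outside RB"]


def encode_residence(category):
    """Treatment (dummy) coding: one 0/1 indicator per non-reference category."""
    if category != REFERENCE and category not in DUMMIES:
        raise ValueError(f"unknown place of residence: {category!r}")
    return [1 if category == dummy else 0 for dummy in DUMMIES]


print(encode_residence("Village in RB"))  # → [0, 0, 1, 0]
print(encode_residence("City in RB"))     # → [0, 0, 0, 0]
```

Under this coding, the 1.54 for “City outside RB” compares those users with cities in the Republic of Buryatia, not with villages.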

Table 2. Coefficients of the regression model (p-values in parentheses)

Independent variable    Coefficient
Age                     1.0226 (0.2294)
Sex                     1.9238*** (0.0004)
City outside RB         1.5359* (0.0183)
Foreign city            4.3658 (0.7810)
Village in RB           1.8805* (0.0325)
Village outside RB      1.8048*** (0.0002)
Constant                0.2313
Observations            9144
Log likelihood          -2229.2587

Note: *p<0.05; **p<0.01; ***p<0.001
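The significance marks follow directly from the thresholds in the note. The sketch below reproduces them from the p-values reported in table 2. The function name is hypothetical.

```python
def significance_stars(p):
    """Stars following the note to table 2: *p<0.05, **p<0.01, ***p<0.001."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# p-values as reported in table 2
p_values = {"Age": 0.2294, "Sex": 0.0004, "City outside RB": 0.0183,
            "Foreign city": 0.7810, "Village in RB": 0.0325, "Village outside RB": 0.0002}
for name, p in p_values.items():
    print(f"{name:<20}{p:.4f}{significance_stars(p)}")
```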

In the end, the second hypothesis is supported: the use of the Buryat language is associated with the place of residence of the message's author. Rural users, both in the Republic of Buryatia and in other regions, use Buryat in Internet communication more than users from cities of the Republic of Buryatia. The final hypothesis is not supported: the variable “age” is not a statistically significant predictor of the use of the Buryat language.

Conclusion

We analyzed a database collected in the HSE School of Linguistics project “Russian Languages on the Internet” led by B. Orekhov (68,860 messages up to 05/09/2015). For translation, we compiled a dictionary of 10,056 unique Buryat lemmas.
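The translation step can be pictured as a lookup of lemmatized Buryat words in a bilingual dictionary. The sketch below assumes a one-to-one lemma lookup and keeps unknown lemmas unchanged. The thesis does not describe the procedure at this level of detail. The dictionary entries are illustrative only, and the function name is hypothetical.

```python
def translate_lemmas(lemmas, dictionary):
    """Replace each Buryat lemma with its Russian equivalent; unknown lemmas are kept unchanged.

    Returns the translated list and the share of lemmas found in the dictionary.
    """
    translated = [dictionary.get(lemma, lemma) for lemma in lemmas]
    found = sum(1 for lemma in lemmas if lemma in dictionary)
    coverage = found / len(lemmas) if lemmas else 0.0
    return translated, coverage


# Illustrative entries only; the study's dictionary contains 10,056 lemmas
bur_rus = {"хэлэн": "язык", "эжы": "мать", "гэр": "дом"}
words, coverage = translate_lemmas(["эжы", "гэр", "нютаг"], bur_rus)
print(words, round(coverage, 2))  # → ['мать', 'дом', 'нютаг'] 0.67
```

Tracking coverage shows how much of a text remains untranslated when a lemma is missing from the dictionary.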

For topic modeling, we used a subset of 4,867 messages or posts that were at least 8 words long (1,935 texts in Russian and 2,932 texts in Buryat). We obtained 10 stable, interpretable topics: “Education”, “Language”, “Policy”, “Kitchen”, “Nature and weather”, “Hobbies”, “Religion”, “Culture”, “Family” and “Famous people”.
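The length filter can be sketched as follows. The sketch assumes that words are whitespace-separated tokens; the thesis's exact tokenization is not stated here. The function names and sample messages are hypothetical.

```python
from collections import Counter

MIN_WORDS = 8  # length threshold stated in the text


def long_enough(text, min_words=MIN_WORDS):
    """True if the message has at least min_words whitespace-separated tokens."""
    return len(text.split()) >= min_words


def topic_model_corpus(messages):
    """Keep (language, text) pairs with at least MIN_WORDS words and count them per language."""
    kept = [(lang, text) for lang, text in messages if long_enough(text)]
    return kept, Counter(lang for lang, _ in kept)


# Hypothetical messages; "w" tokens stand in for real words
sample = [
    ("russian", " ".join(["w"] * 8)),
    ("buryat", "сайн байна"),
    ("buryat", " ".join(["w"] * 12)),
    ("russian", " ".join(["w"] * 7)),
]
kept, per_language = topic_model_corpus(sample)
print(len(kept), dict(per_language))  # → 2 {'russian': 1, 'buryat': 1}
```

On the study's data, the per-language counts add up to the subset size: 1,935 + 2,932 = 4,867.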

Based on the results of topic modeling, we conclude that in VKontakte groups the Buryat language is used less than Russian for conversations about hobbies, policy, education and family. In other areas, such as language, culture, famous people, religion, nature and cooking, both languages are used equally or Buryat slightly predominates. These results agree with the conclusions of Adzhigitova (2018): the Chuvash language is used more in online conversations on everyday topics and less in science, politics, and economics. VKontakte groups can also be seen as an alternative meeting place for keeping in touch, where people discuss ways to spend time together, hobbies, or personal stories. This is one of the principles of digital diasporas (Brinkerhoff 2009, Everett 2009).

The next step in the analysis was to compare the use of the Buryat and Russian languages across population groups. For the regression analysis, we used a database of 9,144 messages or posts and looked for associations between the language of a message and the user's sex, age, and place of residence. We found no statistically significant association between the language of the message and age. Given the signs of digital inequality in the data, however, this result should be treated with caution. Female users are more likely than male users to use the native language on the Internet. The use of the Buryat language is also associated with the user's place of residence. Users from cities of the Republic of Buryatia and other Russian cities used Buryat in online communication less than residents of villages in the Republic of Buryatia and other regions of Russia. The estimate for foreign cities was not statistically significant.

References

1. Androutsopoulos, J. (2006). Multilingualism, diaspora, and the Internet: Codes and identities on German-based diaspora websites. Journal of Sociolinguistics.

2. Androutsopoulos, J. (2007). Bilingualism in the mass media and on the Internet. In M. Heller (Ed.), Bilingualism: A Social Approach (Palgrave Advances in Linguistics). London: Palgrave Macmillan.

3. Androutsopoulos, J. (2013). Code-switching in computer-mediated communication. In S. C. Herring, D. Stein, & T. Virtanen (Eds.), Pragmatics of computer-mediated communication (pp. 659-686). Berlin, Germany & New York, NY: Mouton de Gruyter.

4. Androutsopoulos, J. (2015). Networked multilingualism: Some language practices on Facebook and their implications. International Journal of Bilingualism, 19(2), 185-205.

5. Boyd, D. (2011). Social network sites as networked publics: Affordances, dynamics, and implications. In Z. Papacharissi (Ed.), A Networked Self: Identity, Community, and Culture on Social Network Sites (pp. 39-58).

6. Brinkerhoff, J.M. (2009). Digital diasporas: identity and transnational engagement. Cambridge University Press.

7. Dovchin, S. (2016). Multilingual wordplays amongst Facebook users in Mongolia (pp. 97-112).

8. Everett, A. (2009). Digital Diaspora: A Race for Cyberspace. Albany: SUNY Press.

9. Evseeva, I.V., & Kulekhova, A.M. (2019). Modern language situation in the Irkutsk Region: The reasons for the language shift (based on the material of Russian and Buryat languages). Nauchnyy Dialog, 10, 128-143.

10. Gladkova, A. (2015). Linguistic and cultural diversity in Russian cyberspace: Examining four ethnic groups online. Journal of Multicultural Discourses, 49-66.

11. Herring, S.C. (2019). The coevolution of computer-mediated communication and computer-mediated discourse analysis. In P. Bou-Franch & P. Garcés-Conejos Blitvich (Eds.), Analyzing Digital Discourse (pp. 25-67). Springer International Publishing.

12. Herring, S.C., Stein, D., & Virtanen, T. (Eds.). (2013). Pragmatics of Computer-Mediated Communication. Berlin: De Gruyter Mouton.

13. Khandaeva, A.A. (2016). The question of language situation in the Republic of Buryatia. Meždunarodnyj naučno-issledovatel'skij žurnal (International Research Journal), No. 3 (45), Part 4: 100. 22 March 2016.

14. Lkhasaranova, Y. (2020). Buryat language and ethnic identity among the Buryats in the web space. Project proposal.

15. Mustafina, J., Nurutdinova, N., Slavina, L., & Mustafina, L. (2018). Regional languages of the Russian Federation in mass media: Legislative support.

16. Orekhov, B., Krylova, I., Popov, I., Stepanova, E., & Zaydelman, L. Russian minority languages on the web: Descriptive statistics.

17. Pischlöger, C. (2016). Udmurt on social network sites: A comparison with the Welsh case. In Linguistic Genocide or Superdiversity? New and Old Language Diversities, Vol. 14, pp. 108-132.

18. Pischlöger, C. (2013a). ['Buranovskiye Babushki' on YouTube: The role of social networking sites in maintaining the Udmurt identity and language under the circumstances of globalization]. In [Questions of Maintaining the Non-material Cultural Heritage Under the Circumstances of Globalization] (pp. 161-165).

19. Pischlöger, C. (2013b). [Udmurt and Besermyan on social networking sites]. In [Science, Enlightenment and Art of the Province in the Socio-Cultural Space: Ninth Korolenko Readings: Proceedings of the International Conference Devoted to the 160th Birthday of V.G. Korolenko (28 October 2013)] (pp. 187-190).

20. Pischlöger, C. (2010). Udmurtness in Web 2.0: Urban Udmurts resisting language shift. Finnisch-Ugrische Mitteilungen, No. 38, pp. 143-162.

21. Ponzanesi, S. (2020). Digital diasporas: Postcoloniality, media and affect. Interventions.

22. Salganik, M.J. (2017). Bit by Bit: Social Research in the Digital Age. Princeton, NJ: Princeton University Press. Open review edition.

23. Shirobokova, L. (2011). Udmurt-Russian bilingualism (Udmurt Republic, Sarkan region, Muvyr village). Doctoral dissertation. Budapest.

24. Suleymanova, D. (2018). Creative cultural production and ethnocultural revitalization among minority groups in Russia. Cultural Studies, 32(5), 825-851.

25. Verschik, A. (2016). Mixed Copying in Blogs: Evidence from Estonian-Russian Language Contacts.

26. UNESCO Atlas of the World's Languages in Danger

27. Adzhigitova, Yu. (2018). [Language in the self-identification of the Chuvash (based on discussions on forums and in VKontakte groups)].

28. VPN 2010 [All-Russian Population Census 2010]. Institute of Demography, National Research University Higher School of Economics.

29. Egodurova, V.M. (2012). [And yet the Russian language… (factors in the choice of the language of communication in multi-ethnic Buryatia)]. Uchenye zapiski ZabGU [Scholarly Notes of Transbaikal State University]. Series: Philology, History, Oriental Studies.

30. Zaydelman, I. (2016). [Modelling the speech behaviour of a speaker of a minority language of the Russian Federation on a social network].

31. Ivashchenko, L. (2017). [Trends in the development of national policy in the Republic of Buryatia]. Ogarev-Online, No. 12 (101).

32. Krylova, I. (2016). [Minority languages of the Russian Federation on the Internet: Quantitative description and data analysis].

33. Orekhov, B.V. (2017). [Languages of Russia on the Internet].

34. Osinsky, P.I. (1994). [The ethnopolitical situation in Buryatia in the context of the reform of Russian federal statehood]. Obshchestvennye nauki i sovremennost', No. 3, pp. 121-130.

35. Khilkhanova, E.V. (2019). [The Internet and minority languages of Russia: Symbolic presence or a tool of revitalization? (The case of the Buryat language)]. Mongolovedenie, (4), 967-988.

Appendix 1

Topics of VKontakte groups

Topic 1. Hobbies

Topic 2. Language

Topic 3. Nature and weather

Topic 4. Kitchen

Topic 5. Policy

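Top terms (English glosses in parentheses):

сегодня (today), путешествие (travel), говорить (speak), находится (is located), мелкий (small), нужный (necessary), темный (dark), дом (home), знать (know), петь (sing), дорога (road), удача (luck), пойти (go), ладно (okay), ехать (ride), писать (write), смотреть (watch), играть (play), говорить (speak), бинокль (binoculars), ум (mind), бурятский остальной (Buryat, remaining), язык (language), сегодня (today), письменный слово (written, word), золото (gold), русский (Russian), читать (read), весть (news), бумага письменность (paper, writing system), буряты (Buryats), сегодня (today), петь (sing), чернила (ink), родственник (relative), читать (read), стихотворение (poem), письмо (letter), народ (people), родной (native), природа (nature), небо (sky), погода (weather), озеро (lake), звезда месячный (star, monthly), солнечный (sunny), земля (earth), цветок (flower), гора (mountain), сильный (strong), брать (take), байкал (Baikal), роза (rose), падение (fall), крылья (wings), вершина (summit), мудрость (wisdom), республика (republic), мысль (thought), буряты (Buryats), внутренность (innards), урожай (harvest), грабёж (robbery), плесень (mould), зеленоватый (greenish), печь (oven; to bake), молоко (milk), тесто (dough), бытовой (household), мука (flour), мясо (meat), республика (republic), мудрость (wisdom), мысль (thought), буряты (Buryats), золото (gold), ум (mind), бурятский (Buryat), сегодня (today), искусный (skilful), народный (folk), место (place), взрослый народ самодержавие сегодня сторона республика государство (adult, people, autocracy, today, side, republic, state), деловой (business), долг муниципалитет (debt, municipality), бурятский (Buryat), россия (Russia), буряты (Buryats), улучшать (improve), мысль (thought), обычай (custom), запах (smell), день (day), рука (hand), культура (culture)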

Topic 6. Education

Topic 7. Culture

Topic 8. Religion

Topic 9. Famous people

Topic 10. Family

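Top terms (English glosses in parentheses):

задание (task), карандаш (pencil), мелкий (small), время (time), чистый (clean), образование (education), качество (quality), умный (smart), уши (ears), ум (mind), юный (young), детя (children), учитель (teacher), уроки (lessons), старший (senior), учить (teach), школьный (school), класс (class), дорога (road), интернат (boarding school), октябрьский (October), рука (hand), фонтан (fountain), улучшать (improve), бурятский (Buryat), мысль (thought), зеркало (mirror), запах (smell), день (day), республика (republic), рука (hand), культура (culture), национальность (nationality), вплотную (closely), ясный (clear), як (yak), фасад (facade), мероприятие (event), дальний (distant), горло (throat), буряты (Buryats), самодержавие (autocracy), дальний (distant), человечность (humanity), счастье (happiness), мудрость (wisdom), естество (nature), священный (sacred), надежда (hope), аршан (arshan, sacred spring), лошадь (horse), добродетель (virtue), ум (mind), молитва (prayer), верующий (believer), чистый (pure), мысль (thought), республика (republic), буряты (Buryats), сегодня (today), золото (gold), бурятский (Buryat), искусный (skilful), юг (south), искусный (skilful), народный (folk), место (place), причина (reason), другой (other), история (history), буряты (Buryats), репутация (reputation), книга (book), музыка (music), будущее (future), известный (famous), деятельный (active), поэт (poet), жар (heat), биография (biography), золото (gold), мудрость (wisdom), республика (republic), мысль (thought), уважаемый (respected), бросать (throw), лицо (face), красивый (beautiful), любящий (loving), мама (mum), родственник (relative), муж (husband), папа (dad), род (clan), парень (young man), девушка (young woman), свадьба (wedding), отец (father), юный (young), шум (noise), утро (morning), родственный (kindred), лицо (face), мудрость (wisdom), республика (republic), мысль (thought)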
