Buryat language and ethnic identity among the Buryats in the web space
Features of computer-mediated communication, digital diasporas. Minority languages of the Russian Federation on the Internet. Definition of Buryat language practices on the Internet. Discussion topics and analysis of users of the Buryat language.
Category | Foreign languages and linguistics |
Type | diploma thesis |
Language | English |
Date added | 18.07.2020 |
File size | 1.0 M |
Posted on http://www.Allbest.Ru/
Federal state autonomous educational institution for higher professional education
National research university higher school of economics
St. Petersburg School of Social Sciences and Area Studies
Field of study: 39.03.01 Sociology
Degree programme: Sociology and Social Informatics
BACHELOR'S PROJECT
Buryat language and ethnic identity among the Buryats in the web space
Lkhasaranova I.A.
Supervisor: Baranova V.V.
Candidate of Sciences (PhD)
Saint Petersburg 2020
Table of Contents
Introduction
- Chapter 1: Theoretical framework of study
- 1.1 Features of Computer-Mediated Communication
- 1.2 Digital diasporas
- 1.3 Minority languages of the Russian Federation on the Internet
- 1.4 Language situation in Republic of Buryatia
- Chapter 2: Methodology and methods of research
- 2.1 Hypotheses
- 2.2 Selection of data for analysis
- 2.3 Cleaning and preprocessing of database
- 2.4 Text mining
- 2.5 Topic modelling
- 2.6 Regression analysis
- Chapter 3: Results of analysis
- 3.1 Text mining
- 3.2 Topic modeling
- 3.3 Descriptive statistics
- 3.4 Regression model
Conclusion
References
Appendices
Introduction
Languages such as English, Chinese and Spanish have become dominant in modern society, while minority languages are being forgotten. This can lead to the disappearance of the cultures and knowledge of the ethnic groups who speak these languages. According to the UNESCO Atlas of the World's Languages in Danger, if measures are not taken, approximately 6,000 languages will disappear by the end of the 21st century; in Russia, 131 languages are endangered. At the same time, questions of ethnicity and ethnic identity and the desire to revive or preserve native languages are gradually gaining attention in society (Mustafina et al., 2018). Meanwhile, the development of the Internet and social networks has digitalized communication and changed the way people speak and write, including in minority communities. The relevance of our study therefore lies in the fact that the Internet has become a global platform for communication and self-identification in modern society.
Thus, the main goal of this study is to determine the language practices of Buryat people on the Internet. We will also find out which topics are most often discussed in the Buryat language and who uses the Buryat language on the Internet. The title, however, does not correspond exactly to the content of the work: we studied online language practices among Buryat people in Russia.
The data source is the results of the 2016 Computer Linguistics project of the HSE School of Linguistics, which collected messages from Vkontakte in the minority languages of Russia.
To achieve the goal, we need to complete the following objectives:
1. To identify what topics are most often discussed in the Buryat language;
2. To compare the frequency of using the Buryat language in Vkontakte groups by different population groups.
To achieve the goal and objectives, the study uses mixed methods of data analysis: text mining, topic modeling and regression analysis, all carried out in R Studio.
The theoretical significance of this work is connected with the fact that there is no research on computer-mediated communication in the Buryat language in the Russian Federation.
The theoretical framework of the study is composed of the works of S. Herring and J. Androutsopoulos on Computer-Mediated Communication (CMC), the works of J. Androutsopoulos and J. M. Brinkerhoff on digital diasporas, and the works of C. Pischlöger, Y. Adzhigitova and A. Gladkova on minority languages on the Internet in Russia.
The work consists of an introduction, three chapters and a conclusion. In the first chapter, we present the theoretical framework of the study and define the basic concepts used in this work: language and identity on the Internet. We then consider the results of studies on digital diasporas, the functions of minority languages in general, and statistical data on the Republic of Buryatia. The second chapter presents the methodology and empirical data of our study. The third chapter presents the analysis of the data and the results of the study.
Chapter 1: Theoretical framework of study
In this chapter, we will consider the main theoretical approaches to the study of Computer-Mediated Communication (CMC), the functioning of ethnic groups, language revitalization and minority languages in cyberspace, as well as the language situation in the region where the Buryat language is considered as the state language (the Republic of Buryatia).
1.1 Features of Computer-Mediated Communication
Since we use data from Vkontakte, where people use computers, mobile phones or other devices to communicate their thoughts, opinions and ideas to each other, we should consider some characteristics of Computer-Mediated Communication.
Initially, Computer-Mediated Communication was text-based and accessed through stand-alone clients. However, with the development of technology and the Internet, textual CMC has come to include graphic, audio and video materials, and researchers analyze this discourse with frameworks such as computer-mediated discourse analysis (CMDA). CMDA, however, was designed for text analysis, so it is not suited to the visual aspects of online discourse. Herring addresses this problem by expanding CMDA to cover non-textual communication, since researchers had previously assumed that communication takes place only in text format (Herring et al. 2013, Herring 2019). Herring (2019, p. 30) identifies three historical stages of CMC: Pre-Web (stand-alone text clients), Web 1.0 (personal websites, publishing and so on) and Web 2.0 (blogging, Wikipedia and so on). Finally, Herring (2019) proposes a theory of multimodal CMC that provides a new direction for CMDA. The theory covers graphic materials such as memes, avatar-mediated communication, and robot-mediated communication involving telepresence robots in physical space (Herring 2019). Each of these phenomena mediates communication between people, supports social interaction and includes several modes or channels of communication. Thus, the very definition of CMC can be expanded.
Androutsopoulos introduces the concept of “networked multilingualism” to study multilingual online practices that are connected to others and embedded in the global web. He defines it as “a cover term for multilingual practices that are shaped by two interrelated processes: being networked, i.e. digitally connected to other individuals and groups, and being in the network, i.e. embedded in the global digital mediascape of the web” (Androutsopoulos 2015, p. 188). In this work, Androutsopoulos (2015) studies Facebook users: their linguistic repertoires, their language choices for genres of self-presentation and dialogic exchange, and the performance of multilingual talk online. The results indicate that multilingual practices are complex because they are “individualized”, “genre-shaped”, and based on a “wide” range of repertoires (Androutsopoulos 2015, p. 185).
1.2 Digital diasporas
As already mentioned, networked technologies create new spaces where people can share information and communicate with each other. Boyd (2011) defines these spaces as “networked publics”, in which the identities and interests of users can be formed. Brinkerhoff (2009) studies diasporas that use the Internet to maintain connections with their native countries. She claims that such “digital diasporas” can be considered as physical communities that encourage members to share personal stories, discuss sensitive topics and create groups that display hybrid identities (Brinkerhoff 2009, p.2). “Digital diasporas” not only improve migrants' quality of life but also help to prevent marginalization and ethnic conflicts (Brinkerhoff 2009, Everett 2009). Moreover, diasporas can have a significant impact on politics and human rights: for example, when women in the African diaspora were outraged by the lack of news about the Million Woman March, they used social networks to share news and events with each other (Everett 2009). A digital diaspora can also be defined as a social network of migrants formed with the help of digital technologies such as mobile communications and the Internet (Ponzanesi 2020). Thus, the Internet is considered an alternative meeting place for people (Brinkerhoff 2009, Everett 2009).
We now consider several examples of work on the linguistic practices of ethnic minorities on the Internet. Androutsopoulos (2006) focuses on the blogs and forums of diasporas in Germany. He tries to determine how ethnic identity influences language use and highlights that, although most of the websites are in German, minority languages such as Arabic, Greek or Hindi dominate in some forums (Androutsopoulos 2006). However, Androutsopoulos (2006) notes that native languages are transformed online, for instance through Romanized transliteration. Technology-mediated communication between people is full of multilingualism and code-switching (Androutsopoulos 2013). Dovchin, in a study of Facebook users in Mongolia and their online linguistic practices, points out that Mongolian users actively use English, Russian and other languages in their online communications; rather than simply borrowing words and practices from these languages, they move them into the Mongolian context and create new terms and expressions that refer to locally significant principles (Dovchin 2016, p.17).
1.3 Minority languages of the Russian Federation on the Internet
Russian is the official language of the Russian Federation, with a large number of speakers and support from the government. Nevertheless, minority languages of Russia are also present on the Internet, and the Internet changes the lives of minority communities and the way they communicate. One of the core questions is how well minority languages are represented on the Internet and how the global network can be used to preserve these languages and support their needs.
Nowadays digital devices, the Internet and social media are used for linguistic and cultural revitalization (Androutsopoulos 2007, Suleymanova 2018). However, of the 96 minority languages in Russia, researchers could identify only 49 on the Internet (Orekhov et al. 2016). The most represented languages on the Internet are Bashkir with 74 domains, Tatar with 59 domains, Yakut with 52 domains, Chuvash with 20 domains and Buryat with 19 domains (Orekhov et al. 2016). Meanwhile, in her work on the representation of minority languages on the Russian Internet, Krylova (2016) highlights Bashkir, Tatar, Yakut, Udmurt and Meadow-Eastern Mari as the most common. The gap between the representation of Bashkir and Buryat is notable. Khilkhanova (2019) concludes that the Internet reflects the current situation of minority languages in the real world: languages that are well represented offline and whose speakers show a high level of linguistic activism and national identity are also common on the global web (Khilkhanova 2019).
The limited number of speakers of ethnic languages on the Internet is one of the core problems. Pischlöger studies the degree to which the Udmurt language is used on three typical Internet resources (blogs, Twitter and Wikipedia) and concludes that only a few activists and journalists dominate among users and groups on these resources (Pischlöger 2010, Pischlöger 2016). Suleymanova reaches similar results, highlighting the significant role of activism and initiative in the revitalization and preservation of a minority language (Suleymanova 2018).
There are various quantitative and qualitative works on the use of minority languages on the Internet. Qualitative analysis can be found in the works of Christian Pischlöger. For instance, Pischlöger (2010, 2016) studies speakers of the Udmurt language and how they use it on the Internet. He highlights that Udmurt is one of the most active and popular minority languages on SNS (Social Network Sites) compared with other minority languages in Russia (Pischlöger, 2010). Pischlöger (2016) also points out that speakers of Udmurt do not follow language purism either on the Internet or in face-to-face conversation: they can ignore the rules of standard Udmurt and communicate in a more natural style that includes code-switching and code-mixing (“suro-puzho”). Moreover, he concludes that social networks have a significant impact on preserving the Udmurt language in the context of globalization, because they provide an opportunity to create content and share it with others (Pischlöger 2010, 2013a, 2013b). Verschik (2016) reaches a similar conclusion about the impact of the Estonian language on the Russian language in lexis, semantics and so on.
Gladkova (2015) also highlights the role of the Internet and SNS in preserving linguistic and cultural pluralism in the Russian web space. She studies 124 websites in Tatar, Chuvash, Bashkir and Chechen. To achieve this aim, Gladkova distinguishes several factors:
1. To develop the Internet access in remote locations of Russian Federation;
2. To educate people to be active Internet users;
3. To motivate them to discuss their values, needs and interests on the Internet space (Gladkova 2015, p. 35).
Adzhigitova (2018) studies the Chuvash language using quantitative methods of analysis and tries to identify what role language plays in constructing Chuvash identity on the Internet. She concludes that knowledge of the Chuvash language cannot be considered a criterion for self-identification as Chuvash on the Internet, but it can enhance ethnic self-identification (Adzhigitova 2018). Additionally, Adzhigitova (2018) points out that the use of Chuvash and Russian in Vkontakte is correlated with the topic: for instance, Chuvash speakers prefer Russian when they talk about hobbies and spending time together.
According to Zaydelman, who studies minority languages in Vkontakte, a typical native speaker of a minority language in Russia is 19 to 31 years old, lives in the titular region of the language, speaks Russian much better and more often, and usually takes an active part in the life of only one community speaking a minority language (Zaydelman 2016). Shirobokova (2011) reaches a similar conclusion: she claims that there is a language shift in favor of Russian, especially in the urban environment and among young people. Her study, however, is not about Internet users, so the results may differ.
1.4 Language situation in Republic of Buryatia
In 2002, UNESCO included the Buryat language in the Atlas of the World's Languages in Danger, where Buryat is classified as “severely endangered”, that is, in danger of extinction. This means that in most cases the Buryat language is used by the older generation, while the younger generation practically does not speak it. This is despite the fact that in 1992 the law “About the languages of the peoples of the Republic of Buryatia” (“О языках народов Республики Бурятия”) was adopted and the Buryat language, along with Russian, was given the status of a state language in the Republic of Buryatia.
According to the 2010 All-Russian Population Census, approximately 190 nationalities are registered in the Republic of Buryatia. The census shows that about 66.1% of the republic's population are Russians, almost 30% are Buryats, and the remaining 3.8% belong to other groups. Of the respondents who indicated their knowledge of languages in the census, 99.6% are fluent in Russian, 97.7% use it in everyday life, and 85% consider Russian their native language. By contrast, 18.9% of respondents use the Buryat language, 15.7% use it in everyday life, and 21.3% consider Buryat their native language. Among Buryats, 92% use Russian in everyday life, 55.9% use Buryat, and 48% use both languages.
Moreover, there is no official television channel in the Buryat language, and only 2 schools in Ulan-Ude teach the Buryat language from the first to the eleventh grade. In addition, banners, posters, signs (except those on state and municipal buildings) and the names of streets and squares are not translated into the Buryat language.
Thus, the linguistic situation in the Republic of Buryatia can be described as unbalanced, with Russian as the dominant language (Evseeva et al., 2019). Researchers of the Buryat language note that its role is relegated to the background and gives way to the state language, Russian (Yegodurova 2012, Osinsky 1994, Khandaeva 2016). Researchers also point out that the measures taken by the local government to preserve the Buryat language are ineffective (Ivashchenko 2017, Khandaeva 2016).
Figure 1. Number of websites in minority languages in Russia
As for the Buryat language on the Internet, Figure 1 shows that the dominant languages are Bashkir, Tatar, Yakut and Udmurt, with 74, 59, 52 and 38 websites respectively, while there are 19 websites in the Buryat language (Orekhov 2017).
According to Figure 2, there are 60 communities in Vkontakte that support the Buryat language, compared with 288 groups for Udmurt, 259 groups for Bashkir and 263 communities for Yakut (Orekhov 2017). So, we can say that the Buryat language occupies a middle position in terms of presence on the Internet compared with other minority languages of the Russian Federation.
Figure 2. Number of minority communities in the VKontakte
Chapter 2: Methodology and methods of research
To achieve the goal and objectives, the study uses mixed methods of data analysis in R Studio: text mining, topic modeling to find out the topics of VKontakte communities, and regression analysis to find out the association between variables.
2.1 Hypotheses
The above works help us to identify the following hypotheses that we need to test:
1. The Buryat language is more used to maintain online conversations on everyday topics and to a lesser extent - in the field of science, politics and economics;
2. There is language predominance in favor of the Buryat language, especially in the countryside;
3. There is language predominance in favor of the Buryat language, especially among old people.
2.2 Selection of data for analysis
To answer the research question and carry out the tasks, we need a large database that can fully represent conversations in the Buryat language. Vkontakte is one of the most popular and active social media platforms in Russia, and people use it daily to communicate with each other on different topics, so the analysis is based on messages collected from Vkontakte. In this work we use the data obtained during the HSE School of Linguistics project “Russian Languages on the Internet” led by B. Orekhov, which are publicly available (http://web-corpora.net/wsgi3/minorlangs/view). The database covers various minority languages of Russia, including Buryat. From the 60 Vkontakte groups in which the Buryat language was used for communication at least once, 68860 messages were collected. In the terms of Salganik (2017), these data are “big”, “always-on”, “nonreactive” and “dirty” (they contain spam and advertisements).
The database is in JSON format. Using the R programming language, we created a table that contains the following variables: user's id, user's gender, user's city of residence, user's birthdate, number of characters, post or comment, id of a group, name of a group, number of group members, number of messages in a group.
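The flattening step can be illustrated with a minimal sketch. It is written in Python rather than the R used in the study, and the field names (`group_id`, `members`, `user`, `bdate` and so on) and the sample record are hypothetical, because the actual schema of the corpus is not described here.

```python
import json

# Hypothetical sample; the real corpus schema is not described in the text.
RAW = '''
[{"group_id": 1, "group_name": "Buryad helen", "members": 1200,
  "messages": [
    {"type": "post", "text": "Sain baina",
     "user": {"id": 10, "sex": "female", "city": "Ulan-Ude", "bdate": "1.2.1990"}},
    {"type": "comment", "text": "Mendee",
     "user": {"id": 11, "sex": "male", "city": null, "bdate": null}}
  ]}]
'''

def flatten(groups):
    """Turn nested group/message records into one flat row per message."""
    rows = []
    for group in groups:
        messages = group.get("messages", [])
        for msg in messages:
            user = msg.get("user", {})
            rows.append({
                "user_id": user.get("id"),
                "sex": user.get("sex"),
                "city": user.get("city"),
                "birthdate": user.get("bdate"),
                "n_chars": len(msg.get("text", "")),   # number of characters
                "kind": msg.get("type"),                # post or comment
                "group_id": group.get("group_id"),
                "group_name": group.get("group_name"),
                "group_members": group.get("members"),
                "group_messages": len(messages),        # messages in the group
            })
    return rows

rows = flatten(json.loads(RAW))
```

Each row carries the ten variables listed above, so later steps can filter and aggregate messages without returning to the nested source.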
2.3 Cleaning and preprocessing of database
As already mentioned, there are approximately 70000 messages in the dataset. However, some of them are spam or advertisements, and most of them are too short to be analyzed. So, we need to clean the database and bring all the words in the messages to their initial form. To remove spam and advertisements from the dataset, we used Microsoft Excel and searched for phrases or words that are typical of spam, such as “проголосуйте пожалуйста” (“please vote”) or “не проходите мимо” (“do not pass by”).
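The same filtering idea can be sketched in code. The study did this step in Microsoft Excel; the Python version below is only an illustration. The two marker phrases come from the text, while the `min_tokens` threshold and the sample messages are illustrative assumptions.

```python
# Spam marker phrases named in the text.
SPAM_MARKERS = ("проголосуйте пожалуйста", "не проходите мимо")

def is_spam(message, markers=SPAM_MARKERS):
    """Return True if the message contains any spam marker phrase."""
    text = message.casefold()
    return any(marker in text for marker in markers)

def clean(messages, min_tokens=2):
    """Drop spam and very short messages (min_tokens is an assumed threshold)."""
    return [m for m in messages
            if not is_spam(m) and len(m.split()) >= min_tokens]

# Hypothetical sample messages for illustration only.
sample = [
    "Проголосуйте пожалуйста за нашу группу!",
    "Сайн байна, нүхэд",
    "ok",
]
kept = clean(sample)
```

Matching on case-folded text keeps the filter insensitive to capitalization, which matters because spam phrases often open a message.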
Another problem is the lack of automatic lemmatization for the Buryat language. To solve this problem, we compiled a dictionary of Buryat words. Using the “rvest” package in R Studio, we downloaded data from the sites “Burlang.Toli” (https://buryat-lang.ru/) and BURYATIA.ORG, which contain electronic Buryat-Russian and Russian-Buryat dictionaries. This process of automated information extraction from web sources is called web scraping. In the end we had a dataset of 10056 lemmas, that is, words in their initial form.
The next step was to create a lemmatizer and a detector of the message language (Russian or Buryat) based on our dictionary. For all Buryat words with special characters such as ү, ө and һ, we also recorded variants written with standard Russian Cyrillic characters.
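A dictionary-based lemmatizer and language detector of this kind might look like the following minimal Python sketch (the study itself worked in R). The toy dictionary entries, the letter substitutions in `ANALOGUES` and the 0.5 threshold are all assumptions made for illustration; they are not taken from the study's dictionary.

```python
# Assumed substitutions for Buryat-specific letters typed on a Russian keyboard.
ANALOGUES = str.maketrans({"ү": "у", "ө": "о", "һ": "х"})

# Toy dictionary (word form -> lemma); hypothetical entries for illustration.
BURYAT_LEMMAS = {
    "буряад": "буряад",
    "хэлэн": "хэлэн",
    "хэлэнэй": "хэлэн",
    "үдэр": "үдэр",
    "үдэрэй": "үдэр",
}

def normalize(token):
    """Lower-case a token and strip surrounding punctuation."""
    return token.casefold().strip(".,!?;:()\"'«»")

def build_lookup(lemmas):
    """Index every form both as written and with plain-Cyrillic analogues."""
    lookup = {}
    for form, lemma in lemmas.items():
        lookup[form] = lemma
        lookup[form.translate(ANALOGUES)] = lemma
    return lookup

def lemmatize(text, lookup):
    """Return the lemmas of all tokens found in the dictionary."""
    tokens = (normalize(t) for t in text.split())
    return [lookup[t] for t in tokens if t in lookup]

def detect_language(text, lookup, threshold=0.5):
    """Label a message Buryat if enough tokens are dictionary words."""
    tokens = [t for t in (normalize(t) for t in text.split()) if t]
    if not tokens:
        return "unknown"
    share = sum(t in lookup for t in tokens) / len(tokens)
    return "buryat" if share >= threshold else "russian"

lookup = build_lookup(BURYAT_LEMMAS)
lemmas = lemmatize("Буряад хэлэнэй удэр!", lookup)
```

Indexing both spellings means a word typed without the special letters (here “удэр” for “үдэр”) still resolves to the same lemma.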
After lemmatization and removal of stop-words such as pronouns, numerals and some common words (“to be”, “to appear”, “must” and so on), we translated the Buryat words into Russian using automatic translation. We also checked the quality of the translation using Microsoft Excel and its functions. If we could not find a translation of a word, we removed it from the database. For data preprocessing such as tokenization, normalization and noise removal we used R Studio, for instance the “tm” library, which provides almost all of the required functions.
2.4 Text mining
Before analyzing the topics of messages from Vkontakte, we analyzed the most frequent and most important words in the dataset. We decided to present the results for messages in Russian and in Buryat separately, so we split the dataset into two parts (Russian and Buryat) and analyzed each on its own. To visualize the most common words we used a word cloud: the larger a word appears in the cloud, the more often it occurs in the text. To keep the figure readable, we set a minimum frequency of 5 occurrences. Then we used term frequency-inverse document frequency (tf-idf) to determine the most important words for the content of the data, by decreasing the weight of commonly used words and increasing the weight of words that are not used commonly in the texts.
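The weighting described above can be made concrete with a short Python sketch. It uses one common definition of tf-idf (relative term frequency multiplied by the natural logarithm of the inverse document frequency); the exact variant used in the study is not stated, and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document (documents are token lists)."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))          # count each term once per document
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical tokenized messages.
docs = [
    ["buryat", "language", "today"],
    ["buryat", "harvest"],
    ["buryat", "language", "lake"],
]
w = tf_idf(docs)
```

A word that occurs in every document, such as “buryat” here, receives a weight of zero, while a word confined to one document receives the largest weight.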
2.5 Topic modelling
To check the first hypothesis and determine the core functions of the Buryat language in the web space, especially in Vkontakte, we used topic modeling. This method of classifying texts by their words was used by Y. Adzhigitova (2018) in her work on the role of language in the self-determination of the Chuvash. She points out that the use of Chuvash and Russian in Vkontakte is correlated with the topic: for instance, Chuvash speakers prefer Russian when they talk about hobbies and spending time together.
For topic modeling we used Latent Dirichlet Allocation (LDA) in R Studio. In this approach, topics are formed from words that often appear in documents; each word has a probability for each topic, and the probability that a text is associated with a topic is the sum of the probabilities of the words in it. To improve the quality of the analysis, we used only texts longer than 8 tokens (words). This left 1935 texts in Russian and 2932 texts in Buryat (4867 in total).
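The length filter and the simplified scoring rule described above can be sketched as follows. This Python fragment is not an LDA implementation: it only shows how a text could be scored once word-topic probabilities are available, and the probabilities in `WORD_TOPIC_PROB` are hypothetical.

```python
def long_enough(tokens, min_len=8):
    """Keep only texts with more than min_len tokens."""
    return len(tokens) > min_len

def topic_scores(tokens, word_topic_prob, n_topics):
    """Sum per-word topic probabilities and normalize them to shares."""
    scores = [0.0] * n_topics
    for token in tokens:
        probs = word_topic_prob.get(token)
        if probs is None:
            continue                      # word unknown to the model
        for k in range(n_topics):
            scores[k] += probs[k]
    total = sum(scores)
    return [s / total for s in scores] if total else scores

# Hypothetical word-topic probabilities for two topics.
WORD_TOPIC_PROB = {"lake": [0.1, 0.9], "school": [0.8, 0.2]}
shares = topic_scores(["lake", "lake", "school"], WORD_TOPIC_PROB, 2)
best_topic = shares.index(max(shares))
```

In a fitted LDA model the document-topic distribution is inferred jointly with the word-topic distribution, so this sum is only a reading aid for the description above.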
Then we identified the optimal number of topics for LDA: we built several LDA models and chose the one with the highest coherence value. Using this method, we identified 10 stable topics.
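Selecting the number of topics by coherence can be illustrated with a small Python sketch. The text does not say which coherence measure was used; the example assumes a UMass-style measure averaged over word pairs, and the documents and candidate topic word lists are hypothetical stand-ins for the output of fitted models.

```python
import math

def umass_coherence(top_words, documents):
    """UMass-style coherence of one topic, averaged over word pairs."""
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all the given words.
        return sum(all(w in s for w in words) for s in doc_sets)

    total, pairs = 0.0, 0
    for i in range(1, len(top_words)):
        for j in range(i):
            base = doc_freq(top_words[j])
            if base == 0:
                continue                  # skip words absent from the corpus
            joint = doc_freq(top_words[i], top_words[j])
            total += math.log((joint + 1) / base)
            pairs += 1
    return total / pairs if pairs else float("-inf")

def best_topic_number(candidates, documents):
    """Pick the number of topics whose topics are most coherent on average."""
    def mean_coherence(k):
        topics = candidates[k]
        return sum(umass_coherence(t, documents) for t in topics) / len(topics)
    return max(candidates, key=mean_coherence)

# Hypothetical corpus and top-word lists for models with 2 and 3 topics.
docs = [["lake", "sky", "star"], ["lake", "sky"],
        ["school", "child", "pencil"], ["school", "child"]]
candidates = {
    2: [["lake", "sky"], ["school", "child"]],
    3: [["lake", "school"], ["sky", "child"], ["star", "pencil"]],
}
best_k = best_topic_number(candidates, docs)
```

Topics whose top words co-occur in the same documents score higher, so the two-topic candidate wins in this toy example.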
2.6 Regression analysis
To check the second and third hypotheses and compare the frequency of use of the Buryat language in Vkontakte groups by different population groups, we used multinomial logistic regression in R Studio. We wanted to identify the association between the outcome (the language of the message) and the predictor variables: sex, age and place of residence (Table 1).
Table 1
Variables for regression analysis
Variable | Code | Number of observations |
Sex | 0 - male | 2698 |
Sex | 1 - female | 6446 |
Age | - | 9144 |
Place of residence | City in the Republic of Buryatia | 6097 |
Place of residence | Village in the Republic of Buryatia | 623 |
Place of residence | City outside the Republic of Buryatia | 1694 |
Place of residence | Village outside the Republic of Buryatia | 326 |
Place of residence | Foreign city | 404 |
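The coding of the predictors in Table 1 can be sketched as dummy variables. The Python fragment below (the study used R) takes the city in the Republic of Buryatia as the base category, as stated in the results section; the short level labels and the sample user are hypothetical.

```python
# Levels of "place of residence" from Table 1 (labels shortened for illustration).
PLACE_LEVELS = [
    "City in RB",
    "Village in RB",
    "City outside RB",
    "Village outside RB",
    "Foreign city",
]
BASELINE = "City in RB"   # base category of the regression

def encode(user):
    """Encode one user as numeric predictors: age, sex (0/1) and place dummies."""
    row = {
        "age": user["age"],
        "sex": 1 if user["sex"] == "female" else 0,   # 0 - male, 1 - female
    }
    for level in PLACE_LEVELS:
        if level != BASELINE:
            row[level] = 1 if user["place"] == level else 0
    return row

# Hypothetical user record.
encoded = encode({"age": 25, "sex": "female", "place": "Village in RB"})
```

The base category gets no column of its own, so every place coefficient is read as a comparison with residents of cities in the Republic of Buryatia.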
Chapter 3: Results of analysis
3.1 Text mining
We analyzed the most frequent and most important words in the dataset and present the results for messages in Russian and in Buryat separately. To visualize the most common words we used a word cloud. To keep the figure readable, we set a minimum frequency of 5 occurrences.
Figure 3. The most common words in Buryat texts
According to Figure 3, the most typical words for messages in Buryat are “Buryat”, “today”, “to speak”, “an assignment”, “to improve”, “inside”, “a harvest”, “a nature” and “an adult”.
As for messages in Russian, the most common words are “Buryat”, “Buryat people”, “Buryatia”, “language”, “a girl”, “happiness”, “China”, “a harvest”, “a song” and “a friend” (Figure 4). As we can see, the results are not the same.
Then we used term frequency-inverse document frequency (tf-idf) to determine the most important words for the content of the data. Figure 5 shows the results for messages in Buryat. According to tf-idf, the core words are “a happiness”, “today”, “a humanity”, “a harvest”, “to improve”, “Buryat”, “a nature”, “inside”, “an adult” and “an assignment”.
Figure 4. The most common words in Russian texts
Figure 5. The highest tf-idf of Buryat words
Figure 6. The highest tf-idf of Russian words
As for messages in Russian, the main words are “a south”, “respected”, “a popularity”, “a yard”, “to speak”, “a mirror”, “good”, “a lake”, “a face” and “distant” (Figure 6). Again, the results are not the same. So, the Russian and Buryat languages are used for communication in different ways.
3.2 Topic modeling
Using the coherence value method, we identified 10 stable topics. Then we conducted a content analysis to determine their meaning and names. Figure 7 shows the top 5 words with the highest association for each topic. The first topic contains words such as “an assignment”, “to speak”, “today”, “a travel”, “to locate” and “small”, while the second topic consists of words like “Buryat”, “other”, “a language”, “today” and “a gold”. The words “a nature”, “a sky”, “a weather”, “a star” and “a lake” are generated from the third topic. The fourth topic contains words such as “inside”, “a harvest”, “a molk” and “a robbery”, while the fifth topic consists of words like “an adult”, “people”, “autocracy”, “today” and “Buryat”. The words “an assignment”, “a child”, “a time” and “pencil” are generated from the sixth topic. The seventh topic contains words such as “to improve”, “Buryat”, “a think”, “distant” and “a mirror”, while the eighth topic consists of words like “a humanity”, “a happiness”, “a wisdom”, “a nature” and “sacred”. The words “skillful”, “folk”, “a place”, “respected” and “other” are generated from the ninth topic.
As for the tenth topic, it contains words such as “to throw”, “a face”, “nice”, “a funnel” and “beautiful”. We then studied the top 20 words (Appendix 1) with the highest probability of being generated from each topic, as well as the top 20 texts of each topic, which allowed us to determine the core themes of Vkontakte groups that use the Buryat language: “Education”, “Language”, “Policy”, “Kitchen”, “Nature and weather”, “Hobbies”, “Religion”, “Culture”, “Family” and “Famous people”.
Figure 7. The highest word probabilities for each topic
Using these probabilities of the texts and words for each topic, we found out which topics are covered by Buryat people on the Internet to a greater extent and which to a lesser extent. According to Figure 8, the most represented topics are “Hobbies” and “Language”, while the least covered topics are “Kitchen” and “Nature and weather”. Based on these data, we can say that in most cases Buryat people use Vkontakte groups as places where they can discuss common interests, spending time together and the translation of some words.
Figure 8. Top 10 topics
As we can see, the Buryat language in Vkontakte groups is used to a lesser extent than the Russian language for conversations about hobbies, policy, education and family, while in other areas such as language, culture, famous people, religion, nature and kitchen both languages are used equally or the Buryat language is slightly predominant (Figure 8).
To sum up, the first hypothesis is confirmed: the Buryat language is used more to maintain online conversations on everyday topics such as language, culture, famous people, religion, nature and kitchen.
3.3 Descriptive statistics
Before testing the regression model, we provide descriptive statistics of the variables.
Table 9. Distribution of age
As we can see, the distribution of the variable “age” is not normal (Table 9): it is right-skewed (positive skewness). The majority of users are from 20 to 30 years old.
Table 10. Distribution of sex
According to Table 10, most users are female. Their number exceeds the number of male users by almost 2 times.
Table 11. Distribution of place of residence
As we can see from Table 11, the most common places of residence of users are a city in the Republic of Buryatia and a city outside the Republic of Buryatia, while the least common place of residence is a village outside the Republic of Buryatia.
3.4 Regression model
The next stage of our work was the construction of multinomial logistic regression. We analyzed about 9500 observations, taking the Russian language as the dependent variable (“language” where Russian - 0, Buryat - 1).
According to Table 2, the p-value associated with the variable “age” is 0.2294. This coefficient is therefore not statistically significant at the 5% level, and “age” is not a significant predictor. However, this may be explained by the lack of a representative sample. Because of the digital divide (lack of Internet access and technical devices), our data may not be reliable, so we cannot be sure that there is no association between the language of an online message and the age of its author.
The next variable is “sex”: a one-unit increase in “sex” multiplies the odds of using the Buryat language by 1.92, that is, increases them by about 92% (Table 2). With an associated p-value of 0.0004, this coefficient is statistically significant at the 5% level. It means that women are more likely to use the Buryat language on the Internet.
The last predictor is the user's place of residence. Cities in the Republic of Buryatia were taken as the reference category. Compared with residents of cities in the Republic of Buryatia, residents of villages both in the republic and in other regions have higher odds of writing online in their native language (1.88 and 1.80 times higher, respectively). For people from other cities of the Russian Federation, the odds of using the Buryat language are 1.54 times higher. The variable “foreign city”, however, is not significant, with an associated p-value of 0.7810. Overall, users from villages are more likely to use the Buryat language on the Internet.
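The coefficients in Table 2 are odds ratios, so they are read multiplicatively. The short sketch below converts the significant odds ratios from Table 2 into a percentage change in the odds and back into log-odds coefficients; the function names are our own and purely illustrative.

```python
import math

# Odds ratios copied from Table 2 (reference category: city in the Republic of Buryatia).
ODDS_RATIOS = {
    "Sex": 1.9238,
    "City outside RB": 1.5359,
    "Village in RB": 1.8805,
    "Village outside RB": 1.8048,
}

def percent_change_in_odds(odds_ratio):
    """Percentage change in the odds implied by an odds ratio."""
    return (odds_ratio - 1.0) * 100.0

def log_odds_coefficient(odds_ratio):
    """Raw logistic regression coefficient (log-odds scale) behind an odds ratio."""
    return math.log(odds_ratio)

for name, ratio in ODDS_RATIOS.items():
    print(f"{name}: odds x{ratio:.2f} "
          f"({percent_change_in_odds(ratio):+.1f}%), "
          f"log-odds {log_odds_coefficient(ratio):.3f}")
```

An odds ratio of 1.92 thus corresponds to odds that are roughly 92% higher, not 1.92% higher, and it describes odds rather than probabilities.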
Table 2. Coefficients of regression model
Independent variables | Coefficients
Age | 1.0226 (0.2294)
Sex | 1.9238*** (0.0004)
City outside RB | 1.5359* (0.0183)
Foreign city | 4.3658 (0.7810)
Village in RB | 1.8805* (0.0325)
Village outside RB | 1.8048*** (0.0002)
Constant | 0.2313
Observations | 9144
Log Likelihood | -2229.2587
Note: *p<0.05; **p<0.01; ***p<0.001; p-values in parentheses
In the end, the second hypothesis is confirmed: there is an association between the use of the Buryat language and the place of residence of the author of an online message, as the non-urban population of the Republic of Buryatia uses the Buryat language in Internet communication to a greater extent than users from the cities. The final hypothesis is not confirmed: the variable “age” is not a statistically significant predictor of the use of the Buryat language.
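The significance marks in Table 2 follow the thresholds given in its note. As a small consistency check, the sketch below maps each reported p-value to its star label; the function name is illustrative and not part of the original analysis.

```python
def significance_stars(p_value):
    """Star label for the thresholds in the note to Table 2."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""

# p-values as reported in Table 2.
P_VALUES = {
    "Age": 0.2294,
    "Sex": 0.0004,
    "City outside RB": 0.0183,
    "Foreign city": 0.7810,
    "Village in RB": 0.0325,
    "Village outside RB": 0.0002,
}

for name, p in P_VALUES.items():
    label = significance_stars(p) or "not significant"
    print(f"{name}: p = {p:.4f} ({label})")
```

Under these thresholds, “age” and “foreign city” receive no stars, which agrees with the interpretation given in the text.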
Conclusion
We analyzed a database collected by the HSE School of Linguistics project “Russian Languages on the Internet” led by B. Orekhov (68860 messages posted up to 05/09/2015). For translation, we compiled a dictionary of 10056 unique Buryat lemmas.
For topic modeling we used a database of 4867 messages or posts with a length of 8 or more words (1935 texts in Russian and 2932 texts in Buryat). We obtained 10 stable, interpretable topics: “Education”, “Language”, “Policy”, “Kitchen”, “Nature and weather”, “Hobbies”, “Religion”, “Culture”, “Family” and “Famous people”.
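The length criterion can be expressed as a simple filter. The sketch below keeps only texts with at least 8 whitespace-separated words; the example messages are hypothetical, and splitting on whitespace is an assumption, since the exact tokenization used for the corpus may differ.

```python
MIN_WORDS = 8  # threshold stated in the text

def long_enough(message, min_words=MIN_WORDS):
    """True if the message has at least `min_words` whitespace-separated tokens."""
    return len(message.split()) >= min_words

# Hypothetical messages (assumption for illustration only).
hypothetical_messages = [
    "one two three four five six seven eight",  # 8 words: kept
    "one two three four five six seven",        # 7 words: dropped
    "too short",                                # 2 words: dropped
]

kept = [m for m in hypothetical_messages if long_enough(m)]
print(f"kept {len(kept)} of {len(hypothetical_messages)} messages")
```

Very short texts carry too few words for a topic model to assign them reliably, which is the usual motivation for such a threshold.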
Based on the results of topic modeling, we can conclude that in Vkontakte groups the Buryat language is used less than Russian in conversations about hobbies, policy, education and family, while on other topics (language, culture, famous people, religion, nature and kitchen) both languages are used equally or Buryat slightly predominates. These results correspond to the conclusions of Adzhigitova (2018) that the Chuvash language is used more to maintain online conversations on everyday topics and less in the fields of science, politics, and economics. We can also say that Vkontakte groups can be considered an alternative meeting place for keeping in touch, where people can discuss ways to spend time together, hobbies or personal stories; this is one of the principles of digital diasporas (Brinkerhoff 2009, Everett 2009).
The next step in the analysis was to compare the use of the Buryat and Russian languages by different population groups. For the regression analysis we used a database of 9144 messages or posts. We wanted to identify the association between the language of a message and the user's sex, age, and place of residence. We found signs of digital inequality and did not find a statistically significant association between the language of a message and the user's age. We also found that female users are more likely than male users to use their native language on the Internet. In addition, the use of the Buryat language was associated with the user's place of residence: users from cities, both in the Republic of Buryatia and elsewhere in Russia, used the Buryat language in online communication to a lesser extent than residents of villages in the Republic of Buryatia and in other Russian regions, while the estimate for foreign cities was not statistically significant.
References
1. Androutsopoulos, J. (2006). Multilingualism, diaspora, and the Internet: Codes and identities on German based diaspora websites. Journal of Sociolinguistics.
2. Androutsopoulos J. (2007) Bilingualism in the Mass Media and on the Internet. In: Heller M. (eds) Bilingualism: A Social Approach. Palgrave Advances in Linguistics. Palgrave Macmillan, London
3. Androutsopoulos, J. (2013). Code-switching in computer-mediated communication. In S. C. Herring, D. Stein, & T. Virtanen (Eds.), Pragmatics of computer-mediated communication (pp. 659-686). Berlin, Germany & New York, NY: Mouton de Gruyter.
4. Androutsopoulos, J. (2015). Networked multilingualism: Some language practices on Facebook and their implications. International Journal of Bilingualism, 19(2), 185-205
5. Boyd, D. (2011). Social network sites as networked publics: Affordances, dynamics, and implications. Z. Papacharissi (Ed.), A networked self. Identity, community, and culture on social network sites (pp. 39-58).
6. Brinkerhoff, J.M. (2009). Digital diasporas: identity and transnational engagement. Cambridge University Press.
7. Dovchin, S. (2016). Multilingual Wordplays amongst Facebook Users in Mongolia Sender, pp. 97-112 (18 pages)
8. Everett, A. (2009). Digital Diaspora: A Race for Cyberspace. Albany: SUNY Press.
9. Evseeva, I.V., Kulekhova, A.M., & Federal State Budgetary Educational Institution of Higher Education “Kemerovo State University.” (2019). Modern Language Situation in the Irkutsk Region: The Reasons for the Language Shift (based on the Material of Russian and Buryat Languages). Nauchnyy Dialog, 10, 128-143.
10. Gladkova A., (2015) Linguistic and cultural diversity in Russian cyberspace: examining four ethnic groups online, Journal of Multicultural Discourses, 49-66
11. Herring, S.C. (2019). The Coevolution of Computer-Mediated Communication and Computer-Mediated Discourse Analysis. In P. Bou-Franch & P. Garcés-Conejos Blitvich (Eds.), Analyzing Digital Discourse (pp. 25-67). Springer International Publishing
12. Herring, Susan C., Dieter Stein, and Tuija Virtanen, eds. 2013. Pragmatics of Computer-Mediated Communication. Berlin: De Gruyter Mouton.
13. Khandaeva, A.A. (2016). The question of language situation in the Republic of Buryatia. Meždunarodnyj naučno-issledovatel'skij žurnal (International Research Journal), No. 3 (45), Part 4: 100. 22 March 2016.
14. Lkhasaranova, Y. (2020). Buryat language and ethnic identity among the Buryats in the web space. Project proposal.
15. Mustafina, J., Nurutdinova, N., Slavina, L., & Mustafina, L. (2018). Regional languages of the Russian Federation in mass media: legislative support.
16. Orekhov, B., Krylova, I., Popov, I., Stepanova, E., Zaydelman, L. Russian Minority Languages on the Web: Descriptive Statistics.
17. Pischlöger, C. (2016). Udmurt on Social Network Sites: A Comparison with the Welsh Case. In Linguistic Genocide Or Superdiversity? New and Old Language Diversities. Vol. 14, pp. 108-132.
18. Pischlöger, C. (2013a) ['Buranovskiye Babushki' on YouTube: The role of social networking sites in maintaining the Udmurt identity and language under the circumstances of globalization] In [Questions of Maintaining the Non-material Cultural Heritage Under the Circumstances of Globalization], pp. 161-165.
19. Pischlöger, C. (2013b) [Udmurt and Besermyan on social networking sites], pp. 187-190. [Science, Enlightenment and Art of the Province in the Socio-Cultural Space: Ninth Korolenko Readings: Proceedings of the International Conference, Devoted to the 160th Birthday of V.G. Korolenko (28 October 2013)]
20. Pischlöger, C. (2010). Udmurtness in Web 2.0: Urban Udmurts Resisting Language Shift. Finnisch-Ugrische Mitteilungen, No. 38, pp. 143-162.
21. Ponzanesi, S. (2020). Digital Diasporas: Postcoloniality, Media and Affect. Interventions.
22. Salganik, Matthew J. (2017). Bit by Bit: Social Research in the Digital Age. Princeton, NJ: Princeton University Press. Open review edition.
23. Shirobokova, L. (2011). Udmurt-Russian bilingualism (Udmurt Republic, Sarkan region, Muvyr village). Doctoral dissertation. Budapest
24. Suleymanova, D. (2018) Creative cultural production and ethnocultural revitalization among minority groups in Russia, Cultural Studies, 32:5, 825-851
25. Verschik, A. (2016). Mixed Copying in Blogs: Evidence from Estonian-Russian Language Contacts.
26. UNESCO Atlas of the World's Languages in Danger
27. Adzhigitova, Yu. (2018). Language in the self-identification of the Chuvash (based on discussions on forums and in Vkontakte groups). (In Russian)
28. VPN 2010 - All-Russian Population Census 2010. Institute of Demography, NRU HSE. (In Russian)
29. Egodurova, V.M. (2012). And yet the Russian language … (factors in the choice of the language of communication in multi-ethnic Buryatia). Uchenye zapiski ZabGU. Series: Philology, History, Oriental Studies. (In Russian)
30. Zaydelman, I. (2016). Modelling the speech behaviour of a speaker of a minority language of the Russian Federation on a social network. (In Russian)
31. Ivashchenko, L. (2017). Trends in the development of ethnic policy in the Republic of Buryatia. Ogarev-Online, No. 12 (101). (In Russian)
32. Krylova, I. (2016). Minority languages of the Russian Federation on the Internet: quantitative description and data analysis. (In Russian)
33. Orekhov, B.V. (2017). Languages of Russia on the Internet. (In Russian)
34. Osinsky, P.I. (1994). The ethnopolitical situation in Buryatia in the context of the reform of Russian federal statehood. Obshchestvennye nauki i sovremennost, No. 3, pp. 121-130. (In Russian)
35. Khilkhanova, E.V. (2019). The Internet and minority languages of Russia: symbolic presence or a tool of revitalization? (the case of the Buryat language). Mongolovedenie, (4), 967-988. (In Russian)
Appendix 1
Topics of Vkontakte groups
Topic | Top words (in English translation)
Topic 1. Hobbies | today, travel, speak, is located, small, necessary, dark, house, know, sing, road, luck, go, okay, ride, write, watch, play, speak, binoculars, mind
Topic 2. Language | Buryat, remaining, language, today, written, word, gold, Russian, read, news, paper, writing system, Buryats, today, sing, ink, relative, read, poem, letter, people, native
Topic 3. Nature and weather | nature, sky, weather, lake, star, monthly, sunny, earth, flower, mountain, strong, take, Baikal, rose, fall, wings, summit, wisdom, republic, thought, Buryats
Topic 4. Kitchen | innards, harvest, robbery, mould, greenish, stove, milk, dough, household, flour, meat, republic, wisdom, thought, Buryats, gold, mind, Buryat, today, skilful, folk, place
Topic 5. Policy | adult, people, autocracy, today, side, republic, state, business, duty, municipality, Buryat, Russia, Buryats, improve, thought, custom, smell, day, hand, culture
Topic 6. Education | assignment, pencil, small, time, clean, education, quality, clever, ears, mind, young, child, teacher, lessons, senior, teach, school, class, road, boarding school, October, hand, fountain
Topic 7. Culture | improve, Buryat, thought, mirror, smell, day, republic, hand, culture, nationality, closely, clear, yak, facade, event, distant, throat, Buryats, autocracy, distant
Topic 8. Religion | humaneness, happiness, wisdom, essence, sacred, hope, arshan, horse, virtue, mind, prayer, believer, pure, thought, republic, Buryats, today, gold, Buryat, skilful, south
Topic 9. Famous people | skilful, folk, place, reason, other, history, Buryats, reputation, book, music, future, famous, active, poet, heat, biography, gold, wisdom, republic, thought, respected
Topic 10. Family | throw, face, beautiful, loving, mom, relative, husband, dad, clan, boy, girl, wedding, father, young, noise, morning, kindred, face, wisdom, republic, thought