Automated assessment of discourse coherence in schizophrenia and schizoaffective disorder
Analysis of discourse coherence in a set of spoken narratives by people with schizophrenia or schizoaffective disorder and by neurotypical speakers of Russian. Approximating cluster number and positioning. Key formulae for the coherence metrics.
Category | Foreign languages and linguistics |
Type | diploma thesis |
Language | English |
Date added | 24.08.2020 |
Posted at http://www.allbest.ru/
Automated Assessment of Discourse Coherence in Schizophrenia and Schizoaffective Disorder
Table of contents
1. Abstract
2. Introduction
2.1 Thought Disorder and Discourse Coherence
2.2 Automated Analysis of Clinical Discourse
2.3 Present Study
3. Literature Review
3.1 Language in Schizophrenia
3.2 Automated Analysis of Clinical Texts
3.3 Loose Associations
3.4 Discourse Coherence
3.5 Automated Discourse Analysis of Schizophrenic Speech
4. Methods
4.1 Study Structure
4.2 Participants
4.3 Psychiatric Assessment
5. Part 1: Loose Associations (Verbal Fluency)
5.1 Method and material
5.1.1 Participants
5.1.2 Procedure
5.2 Analysis
5.2.1 Preprocessing
5.2.2 Manual annotation
5.2.3 Automated clustering
5.3 Results
5.3.1 PCA
5.3.2 Quantitative Analysis
5.3.3 Approximating Cluster Number
5.3.4 Approximating Cluster Positioning
5.4 Discussion
6. Part 2: Discourse Coherence
6.1 Method and material
6.1.1 Participants
6.1.2 Procedure
6.2 Analysis
6.2.1 Preprocessing
6.2.2 Vectorization
6.2.3 Automated annotation
6.3 Results
6.3.1 Procedural Discourse (Chair Task)
6.3.2 Personal Story (Gift Task)
6.3.3 Picture Description (Child Task)
6.3.4 Picture Description (Suit Task)
6.4 Discussion
6.5 Group Differences
6.6 PANSS Scores
6.6.1 Connection between Group Differences and PANSS
7. Intra-Experimental Analysis
8. Limitations
9. Conclusion
10. References
11. Appendix
11.1 SCL-90-R
11.2 Elicitation Material
11.2.1 Verbal Fluency
11.2.2 Procedural Discourse
11.2.3 Personal Story
11.2.4 Picture Description: Child
11.2.5 Picture Description: Suit
11.3 Formulae for the Coherence Metrics
11.3.1 Vectorization Methods
11.3.2 Coherence Assessment
11.4 PCA Biplots for Coherence Metrics
11.4.1 Procedural Discourse
11.4.2 Personal Story
11.4.3 Picture Description: Child
11.4.4 Picture Description: Suit
1. Abstract
discourse coherence schizophrenia schizoaffective
Disorganized, or incoherent, speech is one of the key criteria for diagnosing schizophrenia. However, there is still no objective method for measuring speech coherence. Automated discourse analysis is a possible solution to this problem. I analyzed discourse coherence in a set of spoken narratives by people with schizophrenia or schizoaffective disorder (n = 20) and by neurotypical speakers of Russian (n = 21). All narratives were automatically rated for local and global coherence using vector semantics methods. The discourse coherence was compared to psychiatric judgment as well as to the automatically measured performance on a verbal fluency task. People with higher psychosis symptom scores showed lower coherence scores. Lower discourse coherence was also found to be associated with worse performance on the verbal fluency task.
Note: This work is a collaboration between the author and Tatiana Szyszkowska, a psychiatrist from the Russian Mental Health Research Center (MHRC), who collected all the psychiatric data and the linguistic data in the TD group. The annotation was partially performed by a psychology student, Anastasia Shlyakhova, under the author's supervision. This work would not have been possible without this contribution.
2. Introduction
2.1 Thought Disorder and Discourse Coherence
Schizophrenia is a severe mental illness characterized by fundamental distortions of thinking and perception, such as delusions and hallucinations. Schizophrenic affects are typically inappropriate or blunted (World Health Organization, 2016; https://icd.who.int/browse10/2016/en#/F20). Schizoaffective disorder features both affective (manic or depressive) and schizophrenic symptoms (World Health Organization, 2016; https://icd.who.int/browse10/2016/en#/F25). Formal thought disorder (FTD) is a set of specific disturbances in thought, speech, and communication that is typical of psychosis (Hart & Lewine, 2017). Positive thought disorder is a subset of FTD that includes positive symptoms such as flight of ideas, neologisms, loose associations, and tangentiality. Negative thought disorder includes negative symptoms of FTD, such as poverty of speech, alogia, and perseverations (Andreasen, 1986).
Incoherent or disordered speech is one of the key characteristics of thought disorder (TD) and an important diagnostic criterion in the two major diagnostic manuals, namely the International Classification of Diseases, 10th revision (ICD-10), and the Diagnostic and Statistical Manual of Mental Disorders, fifth edition (DSM-5). Incoherent speech is believed to reflect disruptions in normal thought processes (such as the ones that arise in FTD; see Hart & Lewine, 2017). Disordered speech has also been linked with communicative difficulties that may contribute to decreased social functioning (Salzinger et al., 1964).
Loose associations are a common sign of positive thought disorder. Some researchers believe them to be connected to disordered speech, while others attribute the incoherence to negative thought disorder (see Ditman & Kuperberg, 2010 for a review). Loose associations are difficult to measure quickly and reliably. One possible solution is to automatically measure the typicality of the patient's associations in a verbal fluency task or a one-word association task. A correlation of loose associations with higher discourse disorganization can be regarded as an argument for the theory of positive thought disorder as the origin of incoherent speech.
To date, there is no universal set of guidelines for identifying disordered speech. Existing techniques for quantifying speech incoherence are quite subjective, as they rely on the judgment and experience of an individual psychiatrist. “Disorganized speech” as a symptom lacks linguistic insight (Cohen et al., 2017), as this terminology fails to reflect the fact that language has multiple interdependent levels of organization (phonetics, morphology, syntax, discourse, pragmatics, and interactional markers).
Automating psychiatric assessment of speech disorganization would increase its objectivity, as well as make it faster and easier to perform. That, in turn, could decrease the workload on clinicians.
2.2 Automated Analysis of Clinical Discourse
Discourse coherence is the semantic connectedness of speech beyond the level of individual sentences. It is maintained simultaneously on many levels, including lexical connectors, intonation, reference, and logical structure of a text.
Automated discourse analysis is a set of computational linguistic techniques used to assess discourse coherence. The tasks range from shallow, low-level parsing, such as coreference (or anaphora) resolution, to very high-level ones that seek to approximate the overall structure of discourse. Automated discourse analysis is used in a variety of real-life applications (e.g. named entity recognition).
However, existing studies on automating coherence assessment in psychotic speech are scarce and contradictory. While some report successful classification (Bedi et al., 2015; Elvevåg et al., 2007; Iter et al., 2018), others fail to find significant differences between psychotic speech and control texts (Just et al., 2019; Kořánová, 2017), while still others find the pattern to be the opposite of what one might expect (Panicheva & Litvinova, 2019).
2.3 Present Study
The object of this study is the discourse incoherence seen in psychotic speech in schizophrenia and schizoaffective disorder.
This study aims to discover how psychiatric measures of formal thought disorder (especially positive thought disorder) relate to automated measures of discourse coherence and loose associations.
To achieve this goal, I use automated analysis of spoken texts from psychiatric in-patients and healthy controls. I use a psychiatric rating scale in addition to binary diagnosis, which allows for a more detailed analysis of trends. To the best of my knowledge, this is the first study where local and global coherence measures (Bedi et al., 2015; Elvevåg et al., 2007) are applied to Russian material. I use the models that were indicated as best-performing by previous research (Iter et al., 2018; Just et al., 2019) and adopt the practices that were shown to improve the performance of the measures (Just et al., 2019). I also use a more recent type of embeddings, namely context-dependent ones. Moreover, I control for the overall text length. Unlike the existing research on the Russian language (Panicheva & Litvinova, 2019), I apply the above-mentioned metrics to spoken discourse rather than written text. Finally, I investigate the relationship between the measures of discourse coherence and the loose association measures, as well as psychiatric scales of positive symptoms.
The results of this study are as follows. First, I present a set of tools that can be further used for analyzing coherence automatically. Second, this study can be regarded as a proof-of-concept for the newly introduced measures, as well as of the applicability of the methods to Russian material. Third, this study has certain theoretical implications for the theory behind the incoherence of psychotic speech that can be drawn from the statistical analysis. Namely, in this study, I test whether people with loose associations show more disorganization in their speech. The results of this study support the theory of positive thought disorder as the origin of speech disorganization in psychosis, rather than negative thought disorder.
The structure of the present paper is as follows. In section 3 I provide a literature review on language phenomena in schizophrenia (3.1, 3.3), discourse coherence measures (3.4), as well as the methods of automated analysis of clinical texts in general (3.2), and in schizophrenia in particular (3.5). Section 4 contains information on study structure and methods that are shared between the discourse coherence experiments and the loose associations experiment. This study is separated into two parts. Loose associations analysis (Part 1) is covered in section 5 and the analysis of discourse coherence (Part 2) can be found in section 6. Both parts have separate sections for methods, analysis, results, and discussion. Section 7 contains the analysis of the interaction between the measures from the two experiments. Section 8 is dedicated to the limitations of the present paper. Section 9 concludes the study, and section 10 contains the bibliography.
3. Literature Review
3.1 Language in Schizophrenia
Introducing the term “schizophrenia”, Bleuler emphasized thought fragmentation, which he believed to be essential to the disorder (Bleuler, 1911). Kraepelin (1919), defining psychotic disorders, mentioned that derailment, loose associations, and incoherence of thought manifest themselves through disordered speech, which is characteristic of “dementia praecox”, or “premature dementia”. Nowadays all these features of schizophrenic speech are covered by the term “positive thought disorder”. A general overview of language in schizophrenia can be found in Kuperberg (2010a, b), while Ditman & Kuperberg (2010) provide a more focused overview of discourse coherence in schizophrenia.
There are two main types of theoretical frameworks explaining the origins of discourse incoherence observed in schizophrenia: executive dysfunction theories (also known as impaired cognition theories) and loose association theories (see Ditman & Kuperberg, 2010 for a review). The former attribute the incoherence to a lack of control over the process of thinking, which is typical of negative thought disorder. The latter, on the other hand, explain the incoherence in terms of tangentiality and loose associations that are characteristic of positive thought disorder. However, there is no definitive evidence for either theory behind the speech incoherence seen in schizophrenia. This lack of clarity is partially due to the interdependency between processes involved in speech production, partially due to the multiplicity of processes reflected in speech, and partially due to the impossibility of separating meaning from its context (Cohen et al., 2017). One way of resolving this problem is to explore whether the loose associations are correlated with the discourse incoherence seen in schizophrenia, which is the objective of this paper.
3.2 Automated Analysis of Clinical Texts
In recent years, there has been a growing interest in applying the methods of automated discourse analysis to the texts produced by patients with various mental disorders with the aim of adding an objective measure to psychiatric speech-based diagnosis (see Abbe et al., 2016; Cohen & Elvevåg, 2014; He, 2013 for reviews).
One of the key methods used for automated discourse analysis is distributional semantics. Distributional, or vector, semantics is a family of algorithms that represent words as vectors in a multi-dimensional space. Such representations are called word embeddings; words that occur in similar contexts are represented by similar vectors. The proximity of two word vectors is usually measured by cosine similarity, that is, the cosine of the angle between them.
There are two types of embeddings. They can be context-independent, meaning every word always has the same vector. Such models include Latent Semantic Analysis or LSA (Foltz et al., 1998), word2vec or w2v (Mikolov et al., 2013), and GloVe (Pennington et al., 2014). The other, more recent type is context-dependent embeddings, in which the vector of a given word depends on the surrounding words in the sentence. This method is more complicated, but it represents homonyms and polysemous words with different vectors, depending on the meaning in context. These include ELMo (Peters et al., 2018) and BERT (Devlin et al., 2018). All these methods can be used to numerically represent sentences, rather than single words (see, for example, the SIF method; Arora et al., 2017).
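As a minimal illustration of the proximity measure described above, the following sketch computes cosine similarity for plain Python lists. The three-dimensional vectors and the function name are made up for the example; real embeddings typically have hundreds of dimensions and come from a trained model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Toy "embeddings", invented for illustration only.
cat = [0.9, 0.1, 0.3]
dog = [0.8, 0.2, 0.4]
car = [0.1, 0.9, 0.0]

# Words assumed to occur in similar contexts get a higher similarity.
print(cosine_similarity(cat, dog) > cosine_similarity(cat, car))
```

The value ranges from -1 to 1 and does not depend on vector length, only on direction.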
3.3 Loose Associations
The verbal fluency task is a standardized screening tool used in psychiatry, neurology, and clinical linguistics. The task consists of naming as many words from one category as possible under a time limit (usually one minute). Words can belong either to a semantic category (e.g. animals) or a phonetic category (starting with a particular letter). Traditionally, the measure of performance on this task has been the number of correct words named. The task is simple and sensitive to language disorders and thus it is included in many cognitive batteries (e.g. Keefe et al., 2004 for schizophrenia; Mioshi et al., 2004 for dementia). People with schizophrenia are known to produce fewer words on the task than controls do (Bokat et al., 2003; Juhasz et al., 2012).
Many researchers have noticed that the responses to this task contain significantly more information than is captured by merely counting the number of words. For the English language, Troyer and colleagues (1997) created guidelines for performing associative clustering of the words in a response. The number of clusters can be seen as a measure of the diversity of the output in terms of associations. Psychotic loosening of associations might affect the number of clusters, and thus this measure can be used to quantify the loosening of associations. However, it is important to keep in mind that the number of words produced was shown to be sensitive to executive function (as measured in dementia; Kim et al., 2019).
One approach to measuring the looseness of associations on the verbal fluency task is averaging the cosine similarities of adjacent words in a response. Elvevåg and colleagues (2007) have shown that this measure is correlated with thought disorder scores in patients with schizophrenia. The same was true for single-word associations. Holshausen and colleagues (2013) found this measure to be differentially predictive of a disorganized speech measure, overall verbal fluency, and adaptive functioning in older inpatients with schizophrenia. Nicodemus and colleagues (2014) used a candidate gene approach and found the average cosine to be predictive of the diagnosis and connected with several genes previously associated with processing speed. Rosenstein and colleagues (2015) also found this measure to correlate with the diagnosis for Norwegian-speaking patients with schizophrenia and bipolar disorder.
Another approach used in Rosenstein et al. (2015) is computing a cluster fraction. A cluster is defined as a set of words where adjacent words have a cosine similarity above some threshold. A cluster fraction is a ratio of the number of clusters to the number of words. This measure was found to be predictive of the diagnosis.
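The two measures above can be sketched as follows, starting from a list of cosine similarities between adjacent words in one response. This is an illustrative sketch rather than the implementation used in the cited studies: the function names, the example values, and the decision to count a single isolated word as a cluster of its own are assumptions.

```python
def mean_adjacent_similarity(sims):
    """Average cosine similarity over adjacent word pairs in one response."""
    return sum(sims) / len(sims)

def cluster_fraction(sims, threshold):
    """Number of clusters divided by the number of words.

    sims[i] is the similarity between word i and word i + 1, so the response
    has len(sims) + 1 words. Adjacent words stay in one cluster only while
    their similarity is above the threshold (assumption: isolated words
    count as one-word clusters).
    """
    n_words = len(sims) + 1
    n_clusters = 1 + sum(1 for s in sims if not s > threshold)
    return n_clusters / n_words

# Hypothetical similarities for a five-word response.
sims = [0.8, 0.7, 0.2, 0.6]
print(mean_adjacent_similarity(sims))
print(cluster_fraction(sims, threshold=0.5))  # two clusters over five words
```

A higher cluster fraction means that the response is split into more, shorter runs of related words.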
It is important to keep in mind that transferring these methods from one language to another can pose additional challenges. The typical associations are different for different languages, which might not be perfectly captured by the cosine similarity metric. Another problem is the size of the training data available. Kim and colleagues (2019) discuss these problems in detail for adapting the clustering method across domains and languages.
3.4 Discourse Coherence
In linguistics, the term “discourse” is used to refer to any linguistic unit larger than a sentence (Merriam-Webster, n.d.; https://www.merriam-webster.com/dictionary/discourse). Discourse coherence refers to the connectedness of the speech beyond the level of individual sentences, which involves topicality, reference, and thematic structure of a text (Jucker, 1997). Discourse coherence is maintained on many different levels: intonational, lexical, syntactic, logical, etc. It is present locally, connecting sentences and their parts, as well as globally, as the overall topic of speech. There are many approaches to describing discourse coherence, and here I only review the ones used or discussed in this study.
The measures of coherence rely on separating the discourse into units, such as sentences. Spoken discourse can be separated into clauses, which in this case can be thought of as analogs of sentences. Each clause is defined as having exactly one predicate.
- Tangentiality is a dialogue phenomenon in which the speaker replies to a question in an oblique or irrelevant manner. The term was previously used as roughly equivalent to loose associations or derailment. Later, the concept of tangentiality was partially redefined to cover only irrelevant responses and to exclude transitions in spontaneous speech (Andreasen, 1986). This measure of discourse incoherence was successfully automated in Elvevåg et al. (2007), which is discussed in greater detail below. However, this term is limited in scope, as it only applies to dialogues or interviews, not monologues or retellings.
- Local coherence is the similarity in content and logical connectedness of two adjacent clauses. It is maintained by means of continuation, elaboration, repetition, subordination, or coordination (Coelho & Flewellyn, 2003). Bedi et al. (2015) automated this measure and successfully classified schizophrenic patients versus controls.
- Global coherence is the relationship of every clause to the overall topic of the text. Global coherence is a complex concept mainly concerning topic continuity in discourse. This measure is usually assessed on a Likert scale introduced in Glosser & Deser (1990) for aphasic speech. In this form, it has not yet been applied to schizophrenic speech.
3.5 Automated Discourse Analysis of Schizophrenic Speech
The main approaches to discourse coherence that have been successfully automated are tangentiality, as well as global and local coherence. Elvevåg et al. (2007) were the first to introduce an automated approach to measuring tangentiality. They found that high tangentiality correlated with high thought disorder scores. However, two subsequent studies were unable to replicate these results (see Kořánová, 2017 for the German language; and Panicheva & Litvinova, 2019 for Russian). Moreover, Panicheva & Litvinova (2019) found the highest minimum tangentiality in patients rather than controls, which contradicts the initial findings by Elvevåg et al. (2007).
Another method used by Elvevåg et al. (2007) is somewhat reminiscent of global coherence. The patients were asked to tell or retell a story, and their response was compared to the centroid of all other responses, which can be taken as a representation of the overall theme. This measure was able to differentiate between high and low TD patients and patients versus controls better than diagnosis-blind psychiatrists.
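The centroid comparison described above can be sketched as follows. The sketch assumes that each response has already been reduced to a single vector (how that vector is obtained is left open here), and the toy two-dimensional vectors and function names are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(component) / len(vectors) for component in zip(*vectors)]

def similarity_to_other_responses(responses):
    """For each response, cosine similarity to the centroid of all OTHER responses."""
    scores = []
    for i, vec in enumerate(responses):
        others = responses[:i] + responses[i + 1:]
        scores.append(cosine(vec, centroid(others)))
    return scores

# Hypothetical response vectors: the third one drifts away from the shared theme.
responses = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0]]
scores = similarity_to_other_responses(responses)
print(scores.index(min(scores)))  # index of the least typical response
```

Leaving the scored response out of the centroid keeps a response from inflating its own score, which matters most in small samples.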
Bedi and colleagues (2015) proposed measuring local coherence as cosine similarity between average vectors of adjacent phrases. The classifier employing this method as a feature (among others) was able to differentiate patients from controls with 100% accuracy, better than a classifier based on psychiatric measures.
Iter and colleagues (2018) used tangentiality and local coherence methods with various embeddings and sentence-averaging methods. Four out of the 20 models tested were able to differentiate patients from controls. On the other hand, Just and colleagues (2019) were largely unable to reproduce this result for a larger sample of German speakers after adjusting for the length of each text. Controlling for text length is very important, as in most of these studies controls produce more words than patients with schizophrenia.
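The local coherence measure described above can be illustrated with a short sketch: each phrase is represented by the unweighted mean of its word vectors, and consecutive phrase vectors are compared by cosine similarity. The toy vectors and function names are assumptions for the example and do not reproduce the setup of the cited study.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mean_vector(word_vectors):
    """Unweighted average of the word vectors in one phrase."""
    return [sum(component) / len(word_vectors) for component in zip(*word_vectors)]

def local_coherence(phrases):
    """Cosine similarity between mean vectors of each pair of adjacent phrases."""
    phrase_vectors = [mean_vector(p) for p in phrases]
    return [cosine(a, b) for a, b in zip(phrase_vectors, phrase_vectors[1:])]

# Three hypothetical phrases, each given as a list of toy word vectors.
phrases = [
    [[1.0, 0.0], [0.8, 0.2]],
    [[0.9, 0.1]],
    [[0.0, 1.0], [0.2, 0.8]],
]
print(local_coherence(phrases))  # the second transition is less coherent
```

The resulting list has one value per transition and can then be summarized per text, for example by its mean or minimum.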
To sum up, despite the progress in the area, the results are mixed, and the questions about the best vector semantics measures to assess coherence remain open. The aim of this paper is to resolve some of the contradictions, while also adopting the best practices from previous research.
4. Methods
4.1 Study Structure
The general outline of the study is as follows. First, we recruited eligible participants (controls and TD patients) and conducted a series of psychiatric interviews with each participant. After that, we recorded several texts from each participant. The texts were transcribed and used as the input to the experimental system for automated coherence assessment. Statistical analysis was then used to reveal the relationships between psychiatric and automated measures of TD, as well as binary diagnosis.
Each participant was asked to perform five tasks that can be viewed as two separate experiments:
- Verbal fluency task;
- Discourse elicitation tasks:
  - Procedural discourse (Chair task);
  - Personal story (Gift task);
  - Picture description: child (Child task);
  - Picture description: suit (Suit task).
4.2 Participants
The TD group (N = 20; 15 females) was recruited at the Russian Mental Health Research Center (MHRC), and its members were independently diagnosed by two psychiatrists with schizophrenia (15 people) or schizoaffective disorder (5 people). The information about the participants is presented in Table 1.
The control group was recruited online and matched the TD group in size, age, and gender (N = 21; 18 females). The primary requirements for the participants were: Russian as the native language, age above 18 and under 45, and no history of psychiatric disorders or use of narcotic substances. The participants were asked to fill in a general mental health questionnaire, the Symptom Checklist-90-Revised (SCL-90-R; Derogatis & Savitz, 2000), and people with above-threshold overall scores and scores for psychosis were excluded from the study. The questionnaire and the threshold values were selected by the expert psychiatrist based on Kioseva (2016). Out of 45 eligible people screened, only 28 fulfilled all the criteria, and 8 dropped out after filling in the form. The data on the people screened and their eligibility can be found in the Appendix (11.1).
Table 1. Sociolinguistic variation
Group | Female | Male | Age, mean (SD) | Years of education, mean (SD) | Total |
TD group | 15 | 5 | 27.9 (6.95) | 13.4 (2.28) | 20 |
Control group | 18 | 3 | 24.1 (4.37) | 15.9 (2.11) | 21 |
Both groups | | | | | 41 |
After assessment by a psychiatrist, five people in the control group were evaluated as having some schizo-spectrum tendencies in their thought patterns. As they comprised a significant portion of the control group, their data was analyzed in the same way as other control group data but marked with a different color for visualization purposes. Sub-threshold schizo-spectrum tendencies in thought patterns are an interesting topic for further research.
Unfortunately, not every patient in the TD group was able to finish all the tasks, and thus the participant data for each task is described in respective sections.
4.3 Psychiatric Assessment
A psychiatric scale was selected by the expert psychiatrist to determine the levels of psychotic, negative, and positive symptoms typically present in schizophrenia and schizoaffective disorder. The selected tool was the Positive and Negative Syndrome Scale (PANSS; Kay et al., 1987), developed specifically for measuring positive and negative symptoms in schizophrenia and psychotic states. The scale relies on assessment by a psychiatrist rather than self-report. To avoid inter-rater agreement problems, all the psychiatric tests were conducted by the same expert. PANSS was used to assess TD levels in both the TD group and the control group.
Table 2 contains data on psychiatric scale scores in both groups. PANSS general refers to the sub-scales of general psychopathological symptoms, such as anxiety or disorientation. PANSS positive refers to the sub-scales of positive symptoms, such as delusions or hallucinations. PANSS negative refers to the sub-scales of negative symptoms, such as apathy or blunted affect. PANSS TD refers to the sub-scales of symptoms typically present in positive thought disorder, namely P2 (disorganized thinking), P3 (hallucinations), P5 (grandiose delusions), and G9 (unusual thought content). Every sub-scale ranges from 1 (absent) to 7 (very prominent). The results are the sums of all the sub-scales in each domain.
Table 2. Psychiatric Scale Scores (all scores are given as mean (SD))
Group | Subgroup | PANSS, total | PANSS, general | PANSS, positive | PANSS, negative | PANSS, TD | Total |
TD group | All | 71.95 (16.6) | 30.5 (9.25) | 17.1 (4.4) | 23.85 (8.7) | 11.1 (3.09) | 21 |
TD group | Schizophrenia | 73.93 (14.62) | 29.2 (6.12) | 18.13 (4.2) | 25.93 (7.84) | 11.13 (3.20) | 15 |
TD group | Schizoaffective disorder | 65.33 (23.69) | 34.17 (17.8) | 14 (4.83) | 17.17 (5.5) | 11.16 (3.28) | 5 |
Control group | All | 31.9 (2.53) | 16.52 (1.08) | 7.81 (1.03) | 7.7 (1.11) | 5.1 (1.6) | 20 |
Control group | Schizo-spectrum tendencies | 35.2 (1.92) | 16.6 (1.67) | 9 (0.71) | 8.6 (1.67) | 6.8 (1.1) | 5 |
The groups differ significantly in PANSS scores in all domains, as shown by a t-test with 39 degrees of freedom, even after applying Bonferroni correction for multiple testing (Table 3). Bonferroni correction for multiple testing is used throughout this paper where applicable. Despite our best efforts to balance the groups, there is also a significant, although slight, difference in age and years of education.
Table 3. Psychiatric Scale Scores (independent t-tests)
Statistic | PANSS, total | PANSS, general | PANSS, positive | PANSS, negative | PANSS, TD | Years of education | Age |
t-score | -14.11 | -15.27 | -14 | -9.78 | -13.28 | -37.1 | -27.43 |
p-value | < 1e-10 | < 1e-10 | < 1e-10 | < 1e-10 | < 1e-10 | < 1e-10 | < 1e-10 |
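The Bonferroni correction used for these comparisons can be sketched in a few lines: with m tests, each p-value is compared against alpha / m instead of alpha. The p-values, test names, and the default alpha of 0.05 below are hypothetical and are not taken from the study.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return, for each named test, whether it survives Bonferroni correction."""
    threshold = alpha / len(p_values)  # stricter per-test threshold
    return {name: p < threshold for name, p in p_values.items()}

# Hypothetical p-values for three comparisons.
example = {"measure_a": 0.004, "measure_b": 0.02, "measure_c": 0.2}
print(bonferroni_significant(example))
```

With three tests the per-test threshold is about 0.0167, so a p-value of 0.02 that would pass an uncorrected 0.05 threshold no longer counts as significant.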
PANSS scores, as well as age, were not normally distributed (D'Agostino and Pearson's normality test, all p < 0.05), and thus non-parametric tests were used throughout this work for testing correlation with them.
Age did not correlate (p > 0.05) with any of the PANSS scores as measured by Spearman's test.
Education correlates negatively (p < 0.01) with all PANSS subscales except for the negative subscale. The effect sizes (Spearman's rho) range from -0.51 to -0.48.
Another confounding factor is sex. As shown by a t-test with 39 degrees of freedom, all PANSS subscales, as well as age and education, differ significantly (all p < 1e-10) between male and female participants. This is in line with the general pattern that male patients with schizophrenia show more symptoms, have earlier onset, and poorer functioning (Ochoa et al., 2012). As for the neurotypical population, male subjects have been shown to score higher than female subjects in negative and disorganized-like symptoms (Bora & Baysan Arabaci, 2009). As gender imbalance is very similar in the groups, this should not pose problems for further analysis.
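For readers unfamiliar with Spearman's rho, the following sketch shows how it can be computed: both variables are replaced by their ranks (ties receive the average rank), and the Pearson correlation of the ranks is taken. The function names and the example values are hypothetical, and the sketch omits the p-value computation.

```python
import math

def average_ranks(values):
    """Ranks from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are zero-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Spearman's rank correlation coefficient."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical values: years of education and a symptom score.
education = [11, 12, 13, 15, 16]
score = [80, 75, 60, 40, 35]
print(spearman_rho(education, score))  # perfectly monotone decreasing relation
```

Because only ranks enter the computation, the coefficient does not require normally distributed data, which is why it suits the PANSS scores here.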
5. Part 1: Loose Associations (Verbal Fluency)
5.1 Method and material
5.1.1 Participants
Table 4 contains the data on sociolinguistic variation for the participants who were able to complete the verbal fluency task.
Table 4. Sociolinguistic variation for verbal fluency task
Group | Female | Male | Age, mean (SD) | Years of education, mean (SD) | Total |
TD group | 10 | 5 | 28.46 (7.41) | 13.33 (2.27) | 15 |
Control group | 18 | 3 | 24.1 (4.37) | 15.9 (2.11) | 21 |
Both groups | | | | | 36 |
5.1.2 Procedure
The material used for elicitation can be found in the Appendix (11.1). All responses were audio-recorded with the participants' permission.
The participants were asked to name as many animals as they can in one minute.
5.2 Analysis
5.2.1 Preprocessing
All audio recordings were manually transcribed using ELAN (https://tla.mpi.nl/tools/tla-tools/elan/), a linguistic annotation tool developed at the Max Planck Institute for Psycholinguistics (Wittenburg et al., 2006). For the verbal fluency task, all non-animal words were excluded, and all animal names were lemmatized. Immediate repetitions were omitted from the analysis after counting the total number of repetitions, as the cosine similarity of a vector with itself is equal to 1, which would skew the results in favor of responses with many immediate repetitions.
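The handling of immediate repetitions can be sketched as below. This is an illustrative sketch, not the code from the thesis repository: the function name is hypothetical, English words stand in for lemmatized Russian animal names, and only immediate repetitions are counted here.

```python
def collapse_immediate_repetitions(words):
    """Drop words that repeat the previous word; return (cleaned list, count dropped)."""
    cleaned = []
    repetitions = 0
    for word in words:
        if cleaned and cleaned[-1] == word:
            repetitions += 1  # counted, then left out of the similarity analysis
        else:
            cleaned.append(word)
    return cleaned, repetitions

response = ["cat", "cat", "dog", "cat", "wolf", "wolf", "wolf"]
print(collapse_immediate_repetitions(response))
```

Non-adjacent repeats such as the second "cat" are kept, since they do not produce a pair of identical neighbouring vectors.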
5.2.2 Manual annotation
The psychiatrist, the author, and the co-annotator performed independent subjective manual clustering of the transcribed responses to the verbal fluency task by semantic groups (fish, birds, exotic animals, etc.). There was substantial inter-rater agreement (Cohen's kappa = 0.7, ranging from 0.67 to 0.75). The gold standard clustering was defined as the set of all cluster boundaries that at least two annotators agreed on.
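The gold standard rule can be illustrated with a small voting sketch. It assumes that each annotator's clustering is stored as a set of boundary positions, where position i means a boundary between word i and word i + 1; this representation and the function name are assumptions made for the example.

```python
from collections import Counter

def gold_boundaries(annotations, min_votes=2):
    """Keep every boundary position marked by at least min_votes annotators."""
    votes = Counter()
    for boundaries in annotations:
        votes.update(set(boundaries))  # each annotator votes once per position
    return sorted(pos for pos, n in votes.items() if n >= min_votes)

# Hypothetical boundary sets from three annotators for one response.
annotator_a = {2, 5}
annotator_b = {2, 4}
annotator_c = {5, 7}
print(gold_boundaries([annotator_a, annotator_b, annotator_c]))
```

Positions 4 and 7 are dropped because only one annotator marked each of them.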
5.2.3 Automated clustering
The code for analyzing verbal fluency can be found in the GitHub repository (https://github.com/flying-bear/thesis/tree/master/verbal%20fluency). I used the tayga_none_fasttextcbow_300_10_2019 fastText model from the RusVectōrēs website (https://rusvectores.org/ru/models/; Kutuzov & Kuzmenko, 2017). This model was selected because it is the largest character-based model and does not require POS tagging; being character-based, it can handle most out-of-vocabulary words.
Cosine similarity was measured for each pair of adjacent words in the verbal fluency task transcript. Because the model is character-based, out-of-vocabulary issues do not arise. I use several measures derived from the list of pairwise cosine similarities:
- The average cosine similarity across a response, which is commonly used in automated verbal fluency scoring (see section э3.3 above).
- The cluster fraction (number of clusters in relation to the number of words produced). The number of clusters can be calculated using several various threshold values on cosine similarity, as described in Kim et al. (2019).
- Threshold cutoff. The cluster boundary is set if the cosine similarity value is below a certain threshold.
- Splitting at the global mean cosine similarity;
- Splitting at the global median cosine similarity;
- Splitting at the global 0.25 percentile cosine similarity;
- Splitting at the mean cosine similarity for this participant (i.e. local mean);
- Sharp change clustering. The cluster boundary is set if the cosine similarity is lower than average cosine similarity in the current cluster (by a certain factor). Factors used were (1.05, 1.005, 1.00001, 0.95, 0.8, 05).
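The boundary rules above can be sketched as follows. This is an illustrative reading, not the repository code. Function names are hypothetical. For sharp change clustering, two details are assumptions: the factor is applied by multiplication, and no boundary can be set right after a new cluster starts, because that cluster has no within-cluster similarity yet.

```python
import numpy as np

def adjacent_cosine(vectors):
    """Cosine similarity for each pair of adjacent word vectors."""
    v = np.asarray(vectors, dtype=float)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    return np.sum(v[:-1] * v[1:], axis=1)

def threshold_boundaries(sims, threshold):
    """Boundary (1) wherever the similarity falls below the threshold."""
    return [int(s < threshold) for s in sims]

def sharp_change_boundaries(sims, factor):
    """Boundary wherever similarity < factor * mean similarity in the current cluster."""
    boundaries, current = [], []
    for s in sims:
        if current and s < factor * (sum(current) / len(current)):
            boundaries.append(1)
            current = []          # the next word opens a new cluster
        else:
            boundaries.append(0)  # assumption: no boundary while the cluster is empty
            current.append(s)
    return boundaries

def cluster_fraction(boundaries):
    """Number of clusters divided by the number of words."""
    return (sum(boundaries) + 1) / (len(boundaries) + 1)

# Toy similarities between six adjacent words of one response.
sims = [0.8, 0.7, 0.2, 0.9, 0.1]
local_mean = sum(sims) / len(sims)
by_threshold = threshold_boundaries(sims, local_mean)
by_sharp_change = sharp_change_boundaries(sims, factor=0.8)
fraction = cluster_fraction(by_threshold)
```

The global variants differ only in the threshold passed in: the mean, median, or 25th percentile is computed over the similarities of all participants pooled together rather than over one response.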
The goal of automated clustering is to find the metric that best approximates the number of clusters and the position of cluster boundaries as defined by manual clustering, so that further studies do not require labor-intensive and subjective manual clustering.
5.3 Results
Verbal fluency task results included traditional measures (the number of unique words and the number of repetitions), clustering measures (average, minimal and maximal cluster length), and automated cosine measures.
5.3.1 PCA
The experiment was first analyzed qualitatively using principal component analysis (PCA), which is commonly computed via singular value decomposition. PCA is a mathematical technique that, given a collection of points in a multidimensional space, finds an orthogonal basis for that space such that each successive basis vector minimizes the average squared distance from the points to it. Every basis vector is called a principal component (PC) and can be thought of as a combination of the initial basis vectors that maximizes the variance explained by it. The PCs are ordered by the amount of variance explained, so that the first principal component explains the most variance. I used the exploratory method of plotting the projections of all the metrics in order to understand which of them hold the highest explanatory potential. Every metric was z-scaled before PCA. I selected metrics for further statistical analysis based on the PCA biplot.
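As an illustration of the procedure, PCA on z-scaled metrics can be computed directly from a singular value decomposition. The sketch below uses random toy data and hypothetical names; it is not the analysis code of this study.

```python
import numpy as np

def pca(data):
    """PCA of a (participants x metrics) matrix via SVD of the z-scaled data."""
    x = np.asarray(data, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)  # z-scale every metric
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # share of variance per component, descending
    scores = u * s                   # participant coordinates (points of a biplot)
    loadings = vt.T                  # metric directions (arrows of a biplot)
    return scores, loadings, explained

# Toy data: 20 hypothetical participants, 4 hypothetical metrics.
rng = np.random.default_rng(0)
toy = rng.normal(size=(20, 4))
scores, loadings, explained = pca(toy)
```

The first two columns of `scores` give the positions of participants on a biplot, and the first two columns of `loadings` give the directions of the metric arrows.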
Figure 1. PCA biplot on verbal fluency task including PANSS scores and sociolinguistic data. Controls are shown in red, controls with schizo-spectrum tendencies are shown in purple, patients with schizophrenia are shown in green, patients with schizoaffective disorder are shown in cyan. PANSS_Total - total PANSS score, PANSS_O - PANSS score on general psychopathology subscale, PANSS_P - PANSS score on positive symptoms subscale, PANSS_N - PANSS score on negative symptoms subscale, TD - PANSS score on thought disorder; unique_num - unique word count, repeat_num - number of repetitions, mean_cluster_len - mean cluster length, min_cluster_len - minimal cluster length, max_cluster_len - maximal cluster length; mean_cos_sim - mean cosine similarity between adjacent words, min_cos_sim - minimal cosine similarity between adjacent words, max_cos_sim - maximal cosine similarity between adjacent words. The arrows indicate projections of the respective metric.
Figure 1 above contains a PCA plot that includes PANSS scores as well as age, sex, and years of education. The first principal component (x-axis) consists of PANSS scores (positive direction) and the unique word count (negative direction).
The first principal component explains 32.04% of the variance. The TD group and the control group are separated in this dimension at about -0.05 (everything below being control). Maximal cluster length and education are projected as almost the same vector, and they are close to the unique word count in direction.
The second principal component (y-axis) is negatively aligned with mean cosine similarity and partially positively aligned with being male (which implies male participants produced lower mean cosine similarity). The second principal component explains 17.78% of the variance, almost none of which is accounted for by PANSS scores (as they are orthogonal to this principal component).
Age, minimal cosine similarity, minimal cluster length, and the number of repetitions are all somewhat aligned, being slightly positive on the first PC and quite negative on the second.
Mean cluster length and maximal cosine similarity are somewhat aligned and both lie in the third quadrant. Mean cluster length is almost exactly perpendicular to sex.
It is worth noting that most controls with schizo-spectrum tendencies are closer to the TD group than other controls.
From this plot, one might expect the unique word count to have particularly high explanatory power, as well as a strong negative correlation between PANSS scores and the unique word count. It is also reasonable to expect maximal cluster length to be predictive of the diagnosis, and possibly mean cluster length. However, other metrics seem to be explaining some psychiatrically irrelevant variance in the data.
5.3.2 Quantitative Analysis
Table 5 contains the results of independent-samples t-tests (34 degrees of freedom) for the effect of group. The p-values are corrected for multiple testing. All measures except two (mean cosine similarity and the number of repetitions) differed significantly between the groups.
Table 5.
Independent t-test results for verbal fluency task for group effect. * - significant at 0.05, ** - significant at 0.01, *** - significant at 0.001
metric group | metric name | t | p-value
traditional | unique word count | -16.62 | p < 1e-16 ***
traditional | number of repetitions | -1.07 | p > 0.05
clustering | mean cluster length | -14.32 | p < 1e-20 ***
clustering | minimal cluster length | -6.52 | p < 1e-6 ***
clustering | maximal cluster length | -16.58 | p < 1e-18 ***
cosine | mean cosine similarity | -0.66 | p > 0.05
cosine | minimal cosine similarity | 3.1 | p < 0.05 *
cosine | maximal cosine similarity | -4.36 | p < 0.001 **
However, none of the metrics except the unique word count was correlated with PANSS scores after correcting for multiple testing. The unique word count, as expected, was strongly correlated with all PANSS scores. Spearman's rho and p-values are presented in Table 6 below.
Table 6.
Correlation of unique word count on verbal fluency task with PANSS scores. * - significant at 0.05, ** - significant at 0.01, *** - significant at 0.001
statistic | PANSS, total | PANSS, general | PANSS, positive | PANSS, negative | PANSS, TD
rho | -0.68 | -0.6 | -0.65 | -0.66 | -0.64
p-value | p < 0.0001 *** | p < 0.005 ** | p < 0.0005 *** | p < 0.0005 *** | p < 0.0005 ***
5.3.3 Approximating Cluster Number
Spearman's correlation was used to assess how well each clustering method approximates the number of clusters. The results of this test are presented in Table 7. All threshold methods, as well as sharp change at 0.5, showed a strong positive correlation (rho > 0.6) with the correct cluster number as defined by manual clustering. The strength of correlation was comparable to that between annotators. Sharp change clustering methods with factors of 0.8, 0.95, 1.00001, and 1.05 were moderately correlated with the correct cluster number (rho > 0.5). The factor of 1.005 showed only a weak correlation (rho = 0.2).
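For reference, Spearman's rho is the Pearson correlation computed on ranks, with tied values sharing their average rank. The sketch below shows this on toy cluster counts; the function names and the data are hypothetical, and it does not compute p-values.

```python
from statistics import mean

def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Toy data: cluster counts for five responses (manual vs. automated).
manual_counts = [3, 5, 4, 6, 2]
automated_counts = [4, 6, 4, 7, 3]
rho = spearman_rho(manual_counts, automated_counts)
```

Because only ranks matter, a method that systematically overestimates the number of clusters can still reach a high rho, as long as it orders the responses in the same way as the manual annotation.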
Table 7.
Metric performance on approximating cluster number. * - significant at 0.05, ** - significant at 0.01, *** - significant at 0.001
metric group | factor | rho | p-value
threshold | mean | 0.75 | p < 1e-05 ***
threshold | local mean | 0.72 | p < 1e-05 ***
threshold | median | 0.73 | p < 1e-05 ***
threshold | 25th percentile | 0.65 | p < 0.005 **
sharp change | 0.5 | 0.64 | p < 0.005 **
sharp change | 0.8 | 0.55 | p < 0.005 **
sharp change | 0.95 | 0.59 | p < 0.005 **
sharp change | 1.00001 | 0.51 | p < 0.05 *
sharp change | 1.005 | 0.2 | p < 0.05 *
sharp change | 1.05 | 0.55 | p < 0.005 **
5.3.4 Approximating Cluster Positioning
To assess the quality of cluster positioning, standard machine learning classification metrics were used (accuracy, precision, recall, f1-measure). Indeed, positioning the cluster boundaries can be regarded as a binary classification task, as between every two adjacent words there either is a cluster boundary (1) or there is not (0). These metrics rely on the counts of correctly and incorrectly classified elements (the confusion matrix). In our case true positives (TP) are correctly identified cluster boundaries; false positives (FP) are cluster boundaries added by the model where there are no boundaries in the gold standard; true negatives (TN) are correctly identified cases of absence of cluster boundaries; and false negatives (FN) are cluster boundaries missed by the model.
Accuracy is the number of correct cases (both positive and negative) divided by the number of all classification decisions, $\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$. Accuracy can be a misleading metric if the classes are imbalanced (one class is more prevalent than the other).
Precision is the number of correctly identified cluster boundaries divided by the number of identified cluster boundaries. Precision measures how many of the identified cases were correct (the positive predictive value of the model), $\text{precision} = \frac{TP}{TP + FP}$.
Recall is the number of correctly identified cluster boundaries divided by the number of true cluster boundaries. Recall measures how many of the true cluster boundaries were found, or the model sensitivity, $\text{recall} = \frac{TP}{TP + FN}$.
F1-measure is a combined measure of model performance, which is the harmonic mean of precision and recall, $F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$. F1-measure is often used as the final assessment metric.
All metrics range from 0 to 1, with 0 indicating total misclassification and 1 indicating perfect classification.
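The four metrics can be computed from two boundary sequences as in the sketch below. The function name and the toy data are hypothetical; returning 0 when a denominator is empty is a convention chosen here, not something specified in the study.

```python
def boundary_scores(gold, predicted):
    """Accuracy, precision, recall and F1 for binary boundary labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == 1 and p == 1 for g, p in pairs)  # boundaries found
    fp = sum(g == 0 and p == 1 for g, p in pairs)  # boundaries invented
    fn = sum(g == 1 and p == 0 for g, p in pairs)  # boundaries missed
    tn = sum(g == 0 and p == 0 for g, p in pairs)  # non-boundaries kept
    accuracy = (tp + tn) / len(pairs)
    # Convention (assumption): a metric with an empty denominator is set to 0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy data: eight gaps between nine words.
gold = [0, 1, 0, 0, 1, 0, 1, 0]
predicted = [0, 1, 1, 0, 1, 0, 0, 0]
scores = boundary_scores(gold, predicted)
```

In this toy case one boundary is invented and one is missed, so precision, recall, and F1 all equal 2/3, while accuracy is higher (0.75) because most gaps contain no boundary. This is the class imbalance effect mentioned above.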
Table 8 contains the quality metrics for cluster boundary positioning. The threshold methods with splits at the mean, median, and local mean show very high accuracy (0.9 or above), moderate precision (around 0.7), and high recall (around 0.8). As a result, this group has the highest f1-measure scores (above 0.7), with the mean threshold being the best method. Setting the threshold at the 25th percentile results in the highest precision (0.78) and the lowest recall (0.44), with an unsurprisingly low f1-measure (0.55). Sharp change methods with factors below 1 show very high accuracy (around 0.9) and high recall (from 0.67 to 0.96), while having relatively low precision (from 0.47 to 0.65). Sharp change methods with factors above 1 show high accuracy (around 0.8) and precision (around 0.7). The resulting f1-measure for all sharp change methods is around 0.6.
Table 8.
Metric performance on approximating cluster positioning. The largest values per column are 0.98 (accuracy), 0.78 (precision), 0.96 (recall), and 0.73 (f1-measure)
metric group | factor | accuracy | precision | recall | f1-measure
threshold | mean | 0.91 | 0.69 | 0.8 | 0.73
threshold | local mean | 0.9 | 0.68 | 0.79 | 0.72
threshold | median | 0.90 | 0.68 | 0.77 | 0.72
threshold | 25th percentile | 0.75 | 0.78 | 0.44 | 0.55
sharp change | 0.5 | 0.98 | 0.47 | 0.96 | 0.62
sharp change | 0.8 | 0.9 | 0.59 | 0.78 | 0.65
sharp change | 0.95 | 0.85 | 0.65 | 0.67 | 0.64
sharp change | 1.00001 | 0.8 | 0.7 | 0.56 | 0.61
sharp change | 1.005 | 0.8 | 0.71 | 0.56 | 0.61
sharp change | 1.05 | 0.79 | 0.72 | 0.54 | 0.6
5.4 Discussion
Confirming previous findings (Bokat et al., 2003; Juhasz et al., 2012), the number of unique words produced on the verbal fluency task was a good predictor of the diagnosis and was strongly negatively correlated with all PANSS subscales.
Surprisingly, clustering measures differed significantly between the groups yet were not correlated with any PANSS subscale. The same pattern was observed for the automated measures, namely, minimal and maximal cosine similarity (semantic similarity between adjacent words).
As for the automated approximation of clustering, it proved to be not only possible but very successful. The best methods were the threshold-based ones, as they performed as well as or better than inter-rater agreement both for cluster number and for cluster positioning. This means that the threshold-based methods, especially splitting at the mean, can be gradually introduced as a reliable assessment tool for the verbal fluency task.
6. Part 2: Discourse Coherence
6.1 Method and material
Discourse coherence was measured separately on four texts: a procedural discourse (Chair task), a personal story (Gift task), and two picture descriptions (Child task and Suit task). Each task is described in greater detail below.
6.1.1 Participants
As not every patient in the TD group was able to finish all the tasks, the participant data are shown in separate tables for each task.
Table 9 contains the data on sociolinguistic variation for the procedural discourse task (giving instructions on assembling an IKEA chair).
Table 9.
Sociolinguistic variation for Chair task
Group | Sex: female | Sex: male | Age, mean (SD) | Years of education, mean (SD) | Total (35)
TD group | 10 | 4 | 29.89 (4.55) | 13.4 (1.26) | 14
Control group | 18 | 3 | 24.1 (4.37) | 15.9 (2.11) | 21
Table 10 contains the data on sociolinguistic variation for the personal story task (describing the most memorable gift one received).
Table 10.
Sociolinguistic variation for Gift task.
Group | Sex: female | Sex: male | Age, mean (SD) | Years of education, mean (SD) | Total (35)
TD group | 10 | 4 | 29.89 (4.55) | 13.4 (1.26) | 14
Control group | 18 | 3 | 24.1 (4.37) | 15.9 (2.11) | 21
Table 11 contains the data on sociolinguistic variation for the picture description task about the child story.
Table 11.
Sociolinguistic variation for Child task
Group | Sex: female | Sex: male | Age, mean (SD) | Years of education, mean (SD) | Total (41)
TD group | 15 | 5 | 27.9 (6.95) | 13.4 (2.28) | 20
Control group | 18 | 3 | 24.1 (4.37) | 15.9 (2.11) | 21
Table 12 contains the data on sociolinguistic variation for the picture description task about the suit story.
Table 12.
Sociolinguistic variation for Suit task.
Group | Sex: female | Sex: male | Age, mean (SD) | Years of education, mean (SD) | Total (37)
TD group | 11 | 5 | 29.48 (4.82) | 13.41 (1.45) | 16
Control group | 18 | 3 | 24.1 (4.37) | 15.9 (2.11) | 21
6.1.2 Procedure
The material used for elicitation can be found in the Appendix (section 11.1). All responses were audio-recorded with the participants' permission.
The procedural discourse (Chair Task) was obtained by asking the participant to give instructions, based on an IKEA chair brochure, to a person who could not see the picture.
Free storytelling (Gift Task) was elicited by asking the participant to tell a story about the best or the most memorable gift they had received.
For the picture-elicited storytelling, two comics by Bidstrup were used, namely, a story about a child and a story about a man getting a new suit, hence the names of the tasks: Child Task and Suit Task.
6.2 Analysis
The code for coherence analysis can be found at the GitHub repository https://github.com/flying-bear/thesis/tree/master/coherence.
6.2.1 Preprocessing
All audio recordings were manually transcribed using ELAN. After that, following the recommendation of Iter and colleagues (2018), filler words and hesitation pauses were excluded from the transcripts. Additionally, false starts and perseverations were removed. All words in the transcripts were lemmatized. Transcripts were separated into clauses.
6.2.2 Vectorization
To vectorize words and sentences, I applied two models from the RusVectōrēs website and a model from the DeepPavlov website (http://docs.deeppavlov.ai/en/master/features/pretrained_vectors.html).
First, I applied the tayga_none_fasttextcbow_300_10_2019 word2vec model from RusVectōrēs, the same model that was used for the verbal fluency task. For embedding sentences with word2vec, I used smooth inverse frequency (SIF) averaging suggested by Arora et al. (2017).
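A minimal sketch of SIF averaging as described by Arora et al. (2017) is given below: each word vector is weighted by a / (a + p(w)), the weighted vectors are averaged per sentence, and the projection onto the first principal component of all sentence vectors is removed. The function name, the toy vectors, the word probabilities, and the value of a are illustrative assumptions; whether the analysis in this study applied the component removal step is not stated here.

```python
import numpy as np

def sif_embeddings(sentences, vectors, word_prob, a=1e-3, remove_common=True):
    """SIF sentence embeddings for a list of tokenized sentences.

    vectors:   dict word -> 1-D numpy array (hypothetical embedding lookup)
    word_prob: dict word -> unigram probability of the word in a reference corpus
    """
    rows = []
    for words in sentences:
        weights = np.array([a / (a + word_prob[w]) for w in words])
        matrix = np.array([vectors[w] for w in words])
        rows.append(weights @ matrix / len(words))  # weighted average of word vectors
    emb = np.vstack(rows)
    if remove_common:
        # First right singular vector = direction shared by all sentence vectors.
        pc = np.linalg.svd(emb, full_matrices=False)[2][0]
        emb = emb - np.outer(emb @ pc, pc)
    return emb

# Toy 3-dimensional vectors and probabilities (made up for illustration).
vectors = {
    "take": np.array([0.5, 0.5, 0.7]),
    "chair": np.array([1.0, 0.0, 0.0]),
    "leg": np.array([0.8, 0.6, 0.0]),
    "screw": np.array([0.0, 1.0, 0.0]),
}
word_prob = {"take": 0.05, "chair": 0.001, "leg": 0.002, "screw": 0.0005}
clauses = [["take", "chair"], ["take", "leg", "screw"], ["screw", "leg"]]

raw = sif_embeddings(clauses, vectors, word_prob, remove_common=False)
out = sif_embeddings(clauses, vectors, word_prob)
```

The weighting makes frequent words such as "take" contribute much less than rare content words, which is the main difference from a plain average of word vectors.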
Second, I applied ELMo from RusVectōrēs, a context-dependent model that is trained on a large Internet corpus and is reported to perform relatively well on similarity tasks. I used the largest ELMo model currently freely available, tayga_lemmas_elmo_2048_2019. As the model has three layers of representation, I summed the outputs across them. For embedding sentences with ELMo, I simply averaged the word vectors, as they are already context-dependent.
Last, I applied RuBERT mo...