Analysis of mental health research scientific research group
The possibility of a revolutionary breakthrough in the field of health care and medicine when using artificial intelligence. The use of complex statistical and mathematical tools and multidimensional data, which increases the probability of errors.
Category | Programming, computers and cybernetics |
Type | article |
Language | English |
Date added | 19.03.2024 |
National Technical University «Kharkiv Polytechnic Institute»
Babkova Nadiia Candidate of Technical Sciences, Associate Professor, Department of Intelligent Computer Systems
Huliieva Dina Candidate of Philological Sciences, Associate Professor, Department of Intelligent Computer Systems
Kochuieva Zoia Ph.D. of Technical Sciences, Associate Professor, Department of Intelligent Computer Systems
Ugolnikova Nataliia Candidate of Philological Sciences, Associate Professor, Department of Intelligent Computer Systems
Ukraine
Summary
In 2022, more than 150 million people in Europe alone had mental health problems. The availability of services for the population has decreased, and deteriorating economic conditions, stress, and military conflict make mental health more vulnerable. At the same time, the use of artificial intelligence (AI) makes revolutionary breakthroughs in healthcare and medicine possible. AI technologies are being considered as a new tool for planning, monitoring, and identifying health services at the level of populations and individuals. AI-powered tools could use digitized healthcare data, including electronic records, images, and handwritten notes, to automate tasks, make clinicians' jobs easier, and understand the causes of complex diseases. However, AI-based solutions often involve complex statistical and mathematical tools and multi-dimensional data, which raises the possibility of errors and misinterpretation of results: researchers sometimes tend to overtrust AI. There is also concern about the lack of open reporting on the use of AI models, which limits the ability to replicate results. The study showed that the data and models used are most often not publicly available, and that there is little collaboration between researchers.
Keywords: health research, mental health, natural language processing, machine learning method, artificial intelligence
Modern methods of computational linguistics and of automatic text processing and analysis are used today in almost all spheres of life and science. Machine translation, text generation (poems, weather forecasts, news headlines), and “smart search” are only a small part of the modern capabilities of computational linguistics. As machine learning methods and neural networks develop, these algorithms are increasingly used instead of, or in combination with, traditional methods: rules, dictionaries, and manually selected features.
The capabilities of computational linguistics are often used in interdisciplinary research and projects in literature, history, media studies, medicine, and law, that is, wherever there are texts. One of the most striking and important examples of such research is the study of the mental health of the population. According to psychological and psychiatric observations, people with certain diseases and mental disorders tend to use certain words more frequently, show some specificity in their choice of conversation topics and speech means, and have peculiarities of intonation and speech. For example, people with mental disorders, in particular those suffering from depression, often use the first person singular pronoun, constantly trying to draw attention to themselves and their experiences [1]. Some features of patients' speech can be detected only by ear, but a significant part of them is also visible in texts. Therefore, methods of computational linguistics are increasingly being used to improve the quality of diagnosis, to prevent critical stages of disease, and to assess the mental state in general.
The following features characterize how the topic of mental health is treated in work on computational linguistics and machine learning:
Most publications are devoted to identifying the signs of depression as the most common disease. Special attention is also paid to predicting the risk of suicide. Anxiety is studied less frequently; other disorders, such as post-traumatic stress disorder and bipolar disorder, are also studied.
The research involves two types of data: Big Data, typically from Twitter or Reddit, and data obtained from special studies and surveys. The first type of data tends to be larger and provides many opportunities for machine learning. It is also fairly easy to obtain, as Twitter and Reddit have easy-to-use APIs. However, there is always a possibility that such data is not entirely reliable.
The experimental design in most studies is the same: the authors try to identify features on which a classifier can be successfully trained to determine the presence or absence of a disease and, in some cases, its type.
The main tool used in almost all studies is the Linguistic Inquiry and Word Count (LIWC) program, developed for the analysis of language data. It was created to count the use of words from different categories (including emotional ones) in texts. However, it cannot be fully used for the Ukrainian language: the Ukrainian-language dictionaries in LIWC are direct translations of the English dictionary, not the result of expert work. The same problem applies to other languages and is partly responsible for the fact that this type of research most often relies on data in English.
Technical characteristics (text readability metrics, part-of-speech composition, average length of words and sentences, etc.) and other specially compiled dictionaries are sometimes used to analyze texts and serve as training features. The main direction associated with this area is sentiment analysis.
As in other areas of natural language processing, authors of such studies try various machine learning algorithms and neural networks. Good results are obtained on narrow tasks, for example, when training is done on clinical records of patients who subsequently committed suicide. In studies devoted to identifying the characteristics of several disorders at once, it is not always possible to draw a clear line between different diseases, and as a result, classifiers show low accuracy.
When discussing how mental illness and experiences are expressed in text, it is always worth keeping in mind not only age, gender, and demographic characteristics, but also cultural tradition and the characteristics of the language. In most societies, mental disorders are still rather stigmatized: it is not customary to talk about them, and psychotherapy is not yet considered a general practice everywhere.
In psychology, there is a special direction associated with identifying the characteristics of vocabulary in various mental states, from anxiety to insight and from despair to delight. Based on these studies, special lexical dictionaries are compiled, which, along with dictionaries from LIWC, are used in works on the analysis of mental states based on texts.
One of the pioneers in the study of mental disorders using data from the Internet is Munmun De Choudhury, who is developing computational methods for tracking the mental state of the population on social networks, in particular, identifying depression. She is best known for her research based on data from the microblogging service Twitter.
A special section, “Computational Linguistics and Clinical Psychology”, at the NAACL (North American Chapter of the Association for Computational Linguistics) conference is devoted to the problem of mental health and its study using natural language processing methods. The University of Pennsylvania is home to the World Well-Being Project (WWBP), a collaboration between psychologists, computer scientists, and statisticians designed to study the psychosocial processes associated with health and happiness and their expression through social media.
The article [2], devoted to the development of LIWC, can be considered one of the most important and fundamental works in this area. The authors described the main methods of text analysis and explained how LIWC, a program that counts words belonging to various semantic and emotional categories, was created and evaluated. The authors claim that with the help of this program it is possible to detect the main ideas of a text and its emotional component, and to highlight social relationships, thinking styles, and individual differences between people.
Speaking about the idea of creating an analyzer, the authors refer back to Sigmund Freud, who was one of the first to suggest that words could be a marker of mental states and disorders. Rorschach tests are also based on linguistic examination and observation of how people describe their feelings and states. But more serious research in this area appeared only in the 1950s, along with the advent of content analysis and speech analysis. Walter Weintraub was the first to pay attention to people's use of function words, such as articles and pronouns, and came to the conclusion that frequent use of the first person singular may be a marker of depression.
The authors were prompted to create LIWC by the results of a psychological study in which people were asked to write about some emotional things in their lives: it turned out that in the stories they described primarily their psychological state.
This laid the foundation for the creation of expert dictionaries characterizing various moods and mental states.
LIWC consists of two main parts: the data processing algorithm and the dictionaries. The program opens text files of any genre (essays, poems, blogs, etc.) and goes through each file word by word. Each word is compared with the words in the dictionaries. LIWC then calculates the proportion of words from each category.
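The word-by-word counting described above can be illustrated with a minimal sketch. It is not the LIWC implementation: the two-category dictionary, the simple tokenizer, and the function name `category_proportions` are hypothetical stand-ins, and real LIWC dictionaries are far larger and expert-built.

```python
import re
from collections import Counter

# Hypothetical toy dictionary: category name -> set of words.
# Real LIWC dictionaries are much larger and compiled by experts.
TOY_DICTIONARY = {
    "first_person_singular": {"i", "me", "my", "mine", "myself"},
    "negative_emotion": {"sad", "afraid", "worried", "hopeless"},
}


def category_proportions(text, dictionary=TOY_DICTIONARY):
    """Return the share of words in `text` that belong to each category."""
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        # Compare each word with every category vocabulary.
        for category, vocabulary in dictionary.items():
            if word in vocabulary:
                counts[category] += 1
    total = len(words)
    # Empty input yields 0.0 for every category instead of dividing by zero.
    return {
        category: (counts[category] / total if total else 0.0)
        for category in dictionary
    }


if __name__ == "__main__":
    sample = "I am worried that my plans will fail and I feel sad"
    print(category_proportions(sample))
```

Because every word is looked up in isolation, a sketch like this cannot take the surrounding context of a word into account.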
One of the important problems of LIWC, and of similar libraries and programs, in addition to the difficulty of adapting it to other languages, is that it ignores context and cannot detect irony, sarcasm, or idiomatic expressions.
The authors of the article [3] believe that the physical and mental state of people can be determined by the words they use - along with the publications described above, they were among the first to seriously engage in research in this area. The main challenge facing the researchers was the need to extract relevant information from behavioral patterns reflected in the text, and then turn the unstructured text into a structured dataset.
The authors proposed a new text processing method for studying the condition of patients with post-traumatic stress disorder (PTSD) using lexical features and using texts written in the first person as data. The idea of the experiment was to ask people with PTSD to write down stories about traumatic events and PTSD symptoms, that is, instead of standard interviews with a doctor and questionnaires, texts written in the first person were used, where people shared their impressions.
From a collection of 300 texts, they extracted representative keywords and used them to create a model to determine the presence or absence of PTSD. The results of the algorithm revealed a high degree of agreement between the psychiatrists who marked the same texts and the computer.
De Choudhury's most famous work [4] is related to the study of social networks as a source of information about the mental state of different populations. Together with her colleagues, she collected a large corpus of Twitter posts written by users with clinical depression and built a probabilistic model (trained an SVM classifier) that could determine whether a tweet contained signs of depression. The model, in particular, assessed signs of social activity (mentioning another user), emotions, and language. The classifier detected depression with an accuracy of more than 80%. The result of the study was an index of depression in social networks, which can be used to assess the level of depression in the population. De Choudhury's subsequent research also focused on Twitter and ways to detect depression.
After the work of Munmun De Choudhury, many studies appeared based on the same model. For example, the authors of the article [5] focused their research on the analysis of texts by people with diagnosed disorders from the following list: post-traumatic stress disorder; depression; bipolar disorder; seasonal affective disorder.
Using the API provided by Twitter, the authors found users who openly stated that they had one of the mental disorders on the list, and analyzed their tweets using language models, paying particular attention to the vocabulary and lifestyle features that can be extracted from the data obtained. Based on this, they trained a classifier that is able to separate texts by mentally healthy people (more precisely, those who do not suffer from the disorders mentioned) from texts by those who have a mental illness from the list.
The authors admit that their method has a number of disadvantages. In particular, it detects only a part of the sick or potentially sick, and these data cannot be used as a real picture of the mental state of the population. The authors of the tweets may also be insincere and lie about their diagnoses. On the other hand, since mental disorders are still stigmatized, it is unlikely that anyone would publicly announce a disease that they do not actually have. In addition, the control group could also include people with mental illness.
Patterns of user behavior and their analysis form a separate class of features. This means studying how correlations and observations described in the medical literature on a given topic are reflected in social media data, if they can be found at all. Such signs are not easy to discern from Twitter data. For example, to assess social activity, the authors used parameters such as the number of mentions of a user in other people's tweets, reactions to the user's posts, and the share of popular messages in the user's microblog. In addition, the authors tried to find signs of insomnia and recorded the number of tweets written between 0:00 and 04:00 (taking the time zone into account).
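The night-time activity feature can be sketched as follows. This is only an illustration under stated assumptions: the function name `count_night_posts`, the use of a fixed UTC offset to represent the user's time zone, and the half-open interval (00:00 inclusive, 04:00 exclusive) are hypothetical choices, not details taken from the study.

```python
from datetime import datetime, timedelta, timezone


def count_night_posts(utc_timestamps, utc_offset_hours, start_hour=0, end_hour=4):
    """Count posts whose local hour falls in [start_hour, end_hour).

    `utc_timestamps` must be timezone-aware datetimes.
    `utc_offset_hours` is an assumed fixed offset for the user's time zone.
    """
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    night_posts = 0
    for stamp in utc_timestamps:
        # Convert the post time to the user's local time before checking the hour.
        local_time = stamp.astimezone(local_zone)
        if start_hour <= local_time.hour < end_hour:
            night_posts += 1
    return night_posts


if __name__ == "__main__":
    posts = [
        datetime(2015, 1, 1, 23, 30, tzinfo=timezone.utc),
        datetime(2015, 1, 2, 1, 15, tzinfo=timezone.utc),
        datetime(2015, 1, 2, 12, 0, tzinfo=timezone.utc),
    ]
    # The same posts give different counts depending on the user's time zone.
    print(count_night_posts(posts, utc_offset_hours=2))
    print(count_night_posts(posts, utc_offset_hours=0))
```

The example shows why the time zone matters: a post at 23:30 UTC counts as a night post for a user at UTC+2 but not for a user at UTC.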
Analysis using LIWC showed that the tone of the control group's tweets was indeed significantly different from the tweets of people with mental disorders. In general, each of the described methods shows significant differences between classes.
Twitter provides great scope for research: it makes it possible to study not only the characteristics of publications by people with mental illness, but also various regional characteristics, not to mention political and sociological research. In general, Twitter is a large, open, and accessible source of data, but it has its limitations, in particular the reliability of the data, the limited length of messages, and the general difficulty of recognizing irony.
In 2015, one of the tasks at NAACL was to analyze tweets for signs of depression and post-traumatic stress disorder. The author of one of the articles on this topic [6] tried to solve this problem using reference lists and N-grams obtained from the training corpus of tweets. The author notes that the data from this task are not universal and that results obtained with them are difficult to transfer into clinical practice, since the methods are strongly tied to the specifics of Twitter. However, although the method and the structure of the reference lists are described in detail, the author does not provide detailed results of testing the algorithms.
The next year, researchers tried to solve the same problem, but with slightly different data. Article [7] describes the development of a system that identifies users at risk. To create it, the author relied on sentiment analysis of the text and the results of the #BellLetsTalk campaign, organized by Canadian doctors to overcome the stigma of mental illness and raise awareness about it. The final data set included two groups of users: "depressed" and "non-depressed". The author cleared the data of neutral words, divided the remaining words into positive and negative ones, and on this basis tried to train an SVM classifier to solve a binary classification problem (whether a person is at risk or not). For training, the author used features such as "polar" words (negative and positive words from the AFINN list), markers of depression, and the use of first and second person pronouns. Using this data, the classifier had to learn to identify depressive statements.
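The kinds of features listed above can be sketched as simple word counts. This is a hedged illustration, not the system from the article: the miniature word lists are hypothetical stand-ins (the study itself used the AFINN list for polar words), and the function name `tweet_features` is invented for the example. The resulting counts are the sort of numeric vector that could then be passed to a classifier such as an SVM.

```python
import re

# Hypothetical miniature word lists, for illustration only.
# The study used the AFINN list for polar words; these entries are stand-ins.
POSITIVE_WORDS = {"good", "happy", "love"}
NEGATIVE_WORDS = {"bad", "sad", "hate", "tired"}
DEPRESSION_MARKERS = {"depressed", "hopeless", "worthless"}
FIRST_PERSON = {"i", "me", "my", "we", "us", "our"}
SECOND_PERSON = {"you", "your", "yours"}


def tweet_features(tweet):
    """Return counts of words from each illustrative list for one tweet."""
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", tweet.lower())

    def count(vocabulary):
        return sum(1 for token in tokens if token in vocabulary)

    return {
        "positive": count(POSITIVE_WORDS),
        "negative": count(NEGATIVE_WORDS),
        "depression_markers": count(DEPRESSION_MARKERS),
        "first_person": count(FIRST_PERSON),
        "second_person": count(SECOND_PERSON),
    }


if __name__ == "__main__":
    print(tweet_features("I feel so sad and hopeless, I hate my life"))
```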
For the tweet classifier, the results were not good: it showed low precision with high recall. For the user classifier, however, precision was 0.7083 and recall was 0.85.
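Assuming the two figures reported for the user classifier are precision and recall, they can be combined into a single F1 score, the harmonic mean of the two. The F1 value is not reported in the text; the short sketch below only derives it from the two quoted numbers, and the function name `f1_score` is chosen for the example.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Values quoted in the text for the user-level classifier.
    print(round(f1_score(0.7083, 0.85), 4))  # prints 0.7727
```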
The authors of the article [8] develop the idea of deep learning as applied to the analysis of tweets. First of all, they propose to optimize the creation of embeddings for classification using the example of determining depression on the CLPsych2015 dataset and Bell Lets Talk data. The article also provides a detailed analysis of some deep learning architectures that are commonly used in natural language processing tasks.
Another platform and, in a sense, a social network that is used as a source of data is Reddit. In some ways, it is even more convenient than Twitter: it has special thematic subsections (subreddits) dedicated to specific areas - cinema, entertainment, sports, science, etc. Accordingly, there are also subreddits dedicated to discussing mental illness, particularly anxiety. In addition, posts and comments on Reddit are much longer than on Twitter, which means that statements from there have a greater number of characteristics. Like Twitter, Reddit has an API that allows you to easily download all the texts of interest.
The authors of [9] decided to use data from a thematic subreddit to train a classifier. Having collected a dataset of posts dedicated to anxiety and built embeddings from them, they solved a binary classification problem (anxiety vs. non-anxiety) with very high accuracy: 91% and 98% for different classifiers. For the first experiment, the authors collected 28.8 thousand posts related to anxiety, including control anxiety, panic attacks, social anxiety, and health anxiety.
To generate features, the authors tried several approaches, looking for the best one:
Word2Vec and Doc2Vec
A topic model based on the Dirichlet distribution
LIWC features
N-grams, 4 cases for unigrams and bigrams.
High results were achieved using all four types of features.
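Of the feature types listed above, N-grams are the simplest to illustrate. The sketch below counts unigrams and bigrams in a text; the whitespace tokenizer and the function names `ngrams` and `ngram_counts` are assumptions made for the example, and the specific "4 cases" used in the study are not reproduced here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, n_values=(1, 2)):
    """Count unigrams and bigrams (by default) in a lowercased text."""
    # Assumed tokenizer: split on whitespace after lowercasing.
    tokens = text.lower().split()
    counts = Counter()
    for n in n_values:
        counts.update(ngrams(tokens, n))
    return counts


if __name__ == "__main__":
    counts = ngram_counts("panic attack after panic attack")
    print(counts[("panic",)], counts[("panic", "attack")])
```

Counts of this kind can be used directly as sparse features, with one dimension per distinct N-gram seen in the training corpus.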
Another study [10] also uses data from thematic sections of Reddit. The authors' goal was to study how people suffering from stress and anxiety communicate with each other. The specificity of the study was that, according to the authors, it is not customary to talk about nervousness and anxiety publicly, and therefore people suffering from these conditions rarely verbalize their experiences.
The main research tool was again LIWC. To determine the "anxiety" of a text, the authors built a decision tree that correctly determined the class of the text in 68% of cases. It turned out that when talking about anxiety, people actively use both common words and words related directly to anxiety (describing fear and worry), and use the latter more often than in "non-anxious" texts. In addition, the authors looked at how users who are active participants in subreddits about anxiety write in neutral subreddits: they use more common words and conjunctions. Even outside thematic forums, they use words that indicate anxiety, show less social activity, ask fewer questions, and thank other users less often.
The authors of the article [11] developed a machine learning classifier that uses N-grams, syntactic patterns, lexical features, and word embeddings to determine a patient's emotional state from the texts the patient has written.
Another study [12] notes that the way people talk about depression is largely determined by culture and language. To test these differences, the authors took data from the 7 Cups of Tea mental health project and applied LIWC, topic modeling, data visualization, and other techniques to it. They compared data for different populations: African Americans and Africans, Hispanics and Latinos, and Asians or Pacific Islanders. As a result of the analysis, it turned out that there are indeed a number of differences for different groups.
In general, works on this topic represent a wide range of experiments and descriptions of various combinations of features. Every year, technologies and algorithms improve, and more and more data appears. Probably the best option for such research would be to obtain a large corpus of text responses to special questions or essays on a given topic. Such data would be more reliable and would allow algorithms to be trained more robustly, since good results on small corpora in some of the above articles may be a consequence of overfitting and bias due to the small number of texts. That is, even if such algorithms work well in practice within, for example, special clinical texts, it is still difficult to consider them a full-fledged solution to the problem.
References
1. Corcoran, et al. 2018 -- C. M. Corcoran, F. Carrillo, D. Fernandez-Slezak, G. Bedi, C. Klim, D. C. Javitt, C. E. Bearden, G. A. Cecchi. Prediction of psychosis across protocols and risk cohorts using automated language analysis // World Psychiatry, 17(1), 2018. P. 67-75.
2. Tausczik, Pennebaker 2010 -- Y. R. Tausczik, J. W. Pennebaker. The psychological meaning of words: LIWC and computerized text analysis methods // Journal of Language and Social Psychology, 29(1), 2010. P. 24-54.
3. He, Veldkamp, de Vries 2012 -- Q. He, B. P. Veldkamp, T. de Vries. Screening for posttraumatic stress disorder using verbal features in self narratives: A text mining approach // Psychiatry Research, 198(3), 2012. P. 441-447.
4. De Choudhury, Counts, Horvitz 2013b -- M. De Choudhury, S. Counts, E. Horvitz. Social media as a measurement tool of depression in populations // Proceedings of the Annual ACM Web Science Conference. Paris, 2013. P. 47-56.
5. Coppersmith, et al. 2015 -- G. Coppersmith, M. Dredze, C. Harman, K. Hollingshead. From ADHD to SAD: Analyzing the Language of Mental Health on Twitter through Self-Reported Diagnoses // Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality. Denver, Colorado, June 5, 2015. P. 1-10.
6. Asgari, Nasiriany, Mofrad 2016 -- E. Asgari, S. Nasiriany, M. R. K. Mofrad. Text Analysis and Automatic Triage of Posts in a Mental Health Forum // Proceedings of the 3rd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality. San Diego, California, June 16, 2016. P. 153-157.
7. Pedersen 2015 -- T. Pedersen. Screening Twitter Users for Depression and PTSD with Lexical Decision Lists // Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality. Denver, Colorado, June 5, 2015. P. 46-53.
8. Jamil, Inkpen, Buddhitha 2017 -- Z. Jamil, D. Inkpen, P. Buddhitha. Monitoring Tweets for Depression to Detect At-risk Users // Proceedings of the Fourth Workshop on Computational Linguistics and Clinical Psychology. Vancouver, Canada, August 3, 2017. P. 32-40.
9. Orabi, et al. 2018 -- A. H. Orabi, P. Buddhitha, M. H. Orabi, D. Inkpen. Deep Learning for Depression Detection of Twitter Users // Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic. New Orleans, Louisiana, June 5, 2018. P. 88-97.
10. Shen, Rudzicz 2017 -- J. H. Shen, F. Rudzicz. Detecting anxiety on Reddit // Proceedings of the Fourth Workshop on Computational Linguistics and Clinical Psychology. Vancouver, Canada, August 3, 2017. P. 58-65.
11. Ireland, Iserman 2018 -- M. E. Ireland, M. Iserman. Within and Between-Person Differences in Language Used Across Anxiety Support and Neutral Reddit Communities // Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic. New Orleans, Louisiana, June 5, 2018. P. 182-193.
12. Shickel, et al. 2016 -- B. Shickel, M. Heesacker, S. Benton, A. Ebadi, P. Nickerson, P. Rashidi. Self-Reflective Sentiment Analysis // Proceedings of the 3rd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality. San Diego, California, June 16, 2016. P. 23-32.