Automated image analysis as a method of selecting visual stimuli in lexical typology research
Questionnaires are the tools used in lexical typology to collect data for low-resourced languages, for which there are no large corpora or detailed dictionaries. An algorithm for automatically collecting visual stimuli for lexico-typological research.
Category | Programming, computers and cybernetics |
Type | diploma thesis |
Language | English |
Date added | 04.12.2019 |
File size | 1.1 M |
Posted on http://www.allbest.ru
Introduction
Lexical and typological studies require the creation of a large number of questionnaires. Different types of questionnaires, such as wordlists, checklists, translation-based questionnaires and sets of extralinguistic stimuli (pictures, video, audio clips, etc.), require different approaches to their creation. New methods and tools for the automatic analysis of texts and images are being developed, and they can be used, among other things, to automate the process of designing questionnaires. In particular, tools such as parallel corpora (http://www.ruscorpora.ru/search-para-en.html), Google Ngrams (https://books.google.com/ngrams/info) and distributional models (https://scikit-learn.org/0.21/modules/classes.html#module-sklearn.svm) have already been used for the automatic development of contextual questionnaires.
Although several studies (Orekhov and Reznikova, 2015; Ryzhova & Paperno, 2019) have already proposed various algorithms for the automatic creation of textual stimuli for lexico-typological questionnaires, the challenges of finding new methods and developing existing ones still need to be addressed.
Thus, the following study consists of two parts.
In the first part (section 5.1) I will develop a tool that will allow us to see which typologically relevant situations can be distinguished on the basis of automated image processing. In turn, this will permit us to design questionnaires with visual stimuli and to expand the horizons of future research. Since each semantic field requires a new questionnaire, the entire process of creating a questionnaire should be based on one particular semantic domain. In this study, the proposed questionnaire will be designed for the semantic field of the verbs of falling. This semantic domain was chosen because this type of verb is well studied. In particular, the verbs of falling were discussed in (Kuzmenko & Mustakimova, 2015). Building on these studies, I will automate the process of collecting images for questionnaires on the verbs of falling. Specifically, I will use the urllib.request library to download the images returned by google.com for the verb "fall" in different languages. This will create an image dataset consisting of illustrations for the verbs of falling. Then I will apply a clustering algorithm from the Scikit-Learn module (https://scikit-learn.org/0.21/modules/classes.html#module-sklearn.svm) to automatically group all the images into clusters (see sections 5.1.1 and 5.1.2 for details). Next, I will analyze the resulting clusters and draw conclusions about which typologically relevant situations are reflected in the images (section 5.1.3). This will allow us to make a theoretical generalization about image processing in lexical typology and to improve the representative power of the resulting questionnaires.
Not only will it help lexical typologists to create questionnaires with extralinguistic stimuli, but it will also open up new opportunities for the use of images in lexico-typological studies. This tool could also be helpful for lexicographers who wish to illustrate dictionaries with images automatically. For example, it can be useful for words that are very difficult to describe but easy to show.
In the second part (section 5.2) I am going to modify an existing algorithm for creating a contextual questionnaire. This part of the work will be based on the data of a previous study (Ryzhova & Paperno, 2019). Namely, I will consider lists of the most frequent bigrams for the verb падать `to fall' and the set of vectors for these bigrams. Using these bigrams as queries on google.com, I will create another database of images. After the downloaded images are converted into vectors, I will select the central vector and combine it with the word-based vector (see sections 5.2.1 and 5.2.2 for details). Finally, I will apply the clustering algorithm to the resulting vectors and compare the results with those obtained only for the bigrams in previous research (section 5.2.3). Thus, I will present a modified algorithm that works not only with texts, but also with images. Since automatic image processing has never been used within this domain, this will open up prospects for further work in this direction.
The current version of the Python code for the algorithm mentioned above is available at https://github.com/mimimizerova/Thesis. All the required files can be found at https://github.com/Thesis, https://drive.google.com/drive/folders/13fZpUQwhCIfHS6QlT4pOZvYtgNhZ9QFK.
1. Background
Questionnaires are one of the most important tools in lexical typology. They allow us to do cross-linguistic research and collect data for low-resourced languages, for which there are no large corpora or detailed dictionaries. There are different types of lexico-typological questionnaires. In my study, I will consider questionnaires that use extralinguistic stimuli and context-based questionnaires. Questionnaires consisting primarily of extralinguistic stimuli (such as pictures, smell samples, video clips) are widely used by the research group of the Max Planck Institute for Psycholinguistics in Nijmegen (Majid, Bowerman (eds.) 2007, Majid, Levinson (eds.) 2011, Kopecka, Narasimhan (eds.) 2012). The pros and cons of such questionnaires were described in detail by Orekhov and Reznikova (2015). On the one hand, questionnaires with extralinguistic stimuli allow us to refer to the signifier, not only the signified, and thus to obtain confirmation that the meanings that we distinguish are correct and that our question is correctly interpreted. On the other hand, for many concepts, such as emotions or pain, it is very difficult to find a suitable perceptual stimulus.
Contextual questionnaires are based on the assumption that lexical meanings can be studied and reconstructed by observing a word's "surroundings" or, primarily, its collocations (Rakhilina & Reznikova, 2016). This is the frame approach, the advantages of which have been described by Rakhilina and Reznikova. This method, which was developed and tested by the Moscow Lexical Typology Group, has proven to be effective in many studies (Ryzhova & Paperno, 2019). One of its main advantages is that it is applicable to a large number of semantic areas. In addition, the frame approach allows us to analyze both direct and figurative meanings of words fully and in detail (Ryzhova & Paperno, 2019).
Both extralinguistic and context-based questionnaires are usually prepared manually, which is time- and labour-consuming. Automatic questionnaire development is an important area that is starting to gain the attention of researchers in the field. A number of articles have been written on this topic (Orekhov and Reznikova, 2015; Ryzhova & Paperno, 2019), and different methods have been suggested for this task. For example, Orekhov and Reznikova (2015) used Google NGrams. In the article on the automatic generation of questionnaires (Ryzhova & Paperno, 2019), distributional semantic methodology (Baroni et al. 2014) was applied. That research used data from the Russian National Corpus (URL: http://www.ruscorpora.ru). All contexts were grouped automatically, and the frames selected using the Distributional Semantic Models framework were used to create the questionnaire. According to this paper (Ryzhova & Paperno, 2019), the quantitative evaluation of the correctness and clustering purity of the grouping by frames gave sufficiently high results. However, this algorithm was tested only on adjectives, so the question remains whether Machine Learning clustering methods can be applied to cluster verb contexts.
Moreover, as shown in recent articles (Gella, Lapata & Keller, 2016, Silberer & Pinkal, n.d.), Machine Learning can be used for clustering images and selecting the corresponding frames in them. In particular, visual attributes of the images were used to solve these problems in (Silberer, Ferrari, & Lapata, n.d). On this basis, the images were divided into semantically consistent groups. Another study (Gella, Lapata & Keller) developed an algorithm for Visual Sense Disambiguation and achieved excellent results in the field of action recognition. The whole study was based on specially labeled datasets such as COCO (http://cocodataset.org/#home) and TUHOI (http://disi.unitn.it/~dle/dataset/TUHOI.html).
In my research, I will use the results achieved in clustering images by their meanings (Silberer, & Pinkal, n.d.) and apply them to the automatic creation of questionnaires. By combining the distributional vector method presented by Ryzhova and Paperno (2019) with algorithms created for images, I will offer a tool for automatically adding images to questionnaires and for creating questionnaires that use both visual stimuli and textual data. This will permit us to extend the methodology and create an exemplary algorithm for working with visual stimuli in lexical questionnaires.
2. Research question
The problem of the automatic generation of lexical-typological questionnaires has been discussed and addressed in many studies (Ryzhova & Paperno, 2019; Orekhov and Reznikova, 2015). Progress has been made in such areas as the automatic generation of text questionnaires. However, for questionnaires containing pictures, this problem has not yet been solved or even addressed. Thus, my study is an attempt to take the results obtained in previous research and apply them to images. Automatic image processing, in turn, has been discussed in detail in several articles (Silberer, Ferrari, & Lapata (n.d.), Gella, Lapata & Keller, 2016). For example, in a recent study (Silberer & Pinkal, n.d.), the processing of images depended on the context in which they were used, which is remarkably well suited to the frame approach. However, these results have not yet been used to solve such problems as the creation of lexico-typological questionnaires. Hence, future research includes the exploration of the model developed in this study. In my paper, I propose to combine the two areas, analyze the results and develop an algorithm that can serve as a basis for future research. This will allow us to design a methodology and understand how to work with pictures for lexical questionnaires. Firstly, from the point of view of lexical typology, it will make the process of compiling lexical-typological questionnaires less time-consuming. Moreover, from a theoretical point of view, it will bring us closer to understanding which types of situations are easier to visualize and which are more difficult. In addition, it will allow us to see which visual characteristics of the situations of falling may be important for lexicalization in this area. Secondly, from the point of view of lexicography, it will provide additional data to accompany dictionary entries that already have images.
3. Methods
Exploring datasets.
Before working with images and designing an algorithm, let us turn to the existing studies on image processing and consider in detail the data used there.
For example, the article (Gella, Lapata & Keller, 2016) presents such datasets as COCO (http://cocodataset.org/#home) and TUHOI (http://disi.unitn.it/~dle/dataset/TUHOI.html).
Both of these datasets are object-oriented. More specifically, COCO is made for the task of object detection. Each picture shows one specific object. TUHOI, in turn, is dedicated to human interactions with objects and consists of images of humans. Since the objects in the first case were not considered in terms of the type of their fall, and the humans in the second case were not in a fall situation either, these datasets do not quite fit the purpose of this research.
Silberer, Ferrari and Lapata (C. Silberer, V. Ferrari, & Lapata, M., (n.d.)) used ImageNet (http://www.image-net.org). ImageNet is one of the largest databases of images. All the pictures are organized in accordance with the WordNet system, i.e., they are divided into groups by their lexical meaning. A significant advantage of ImageNet is that it is annotated manually, and each picture is rigorously described in terms of its semantic properties. However, for the task of this study, unlabeled images are much more suitable - thus, we will avoid bias and will be able to evaluate pictures only by their visual characteristics. In addition, this research is based on a specific type of verbs - verbs of falling. Unfortunately, even the most extensive databases contain few images of this subject.
That is why, for the most complete and effective solution of the research problem, it was decided to create an entirely new database of images. However, following the examples of the databases discussed earlier, we will be able to provide pictures with several semantic attributes that are most suitable for lexicographic questionnaires.
Collecting data.
In order to collect a dataset of images dedicated to the verbs of falling, it was decided to use google.com. Each search query was processed using the urllib.request module from Python's standard library (https://docs.python.org/3/library/urllib.request.html). Then all the links in the HTML code of the page were found automatically using the Beautiful Soup library (https://www.crummy.com/software/BeautifulSoup/bs4/doc/). For each request, exactly 20 images were downloaded, which corresponds to the results on the first page of the search. A list of all the verbs used in the search queries can be found in sections 5.1.1 and 5.1.2, Tables N and N. The final image dataset can be found at https://drive.google.com/drive/folders/13fZpUQwhCIfHS6QlT4pOZvYtgNhZ9QFK
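The collection step can be illustrated with a minimal sketch. It is not the code from the thesis repository: to stay dependency-free it uses the standard-library html.parser in place of Beautiful Soup, and the names ImageLinkParser, extract_image_links and fetch_search_page, the base_url value and the User-Agent header are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import quote_plus
from urllib.request import Request, urlopen


class ImageLinkParser(HTMLParser):
    """Collect the src attribute of every <img> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            # Keep absolute links only; relative ones are usually page icons.
            if src and src.startswith("http"):
                self.links.append(src)


def extract_image_links(html, limit=20):
    """Return at most `limit` absolute image URLs found in `html`."""
    parser = ImageLinkParser()
    parser.feed(html)
    return parser.links[:limit]


def fetch_search_page(query, base_url="https://www.google.com/search?tbm=isch&q="):
    """Download the result page for one query (base_url is an assumption)."""
    request = Request(base_url + quote_plus(query),
                      headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")


# Offline demonstration on a literal HTML snippet (no network access needed).
sample_html = ('<html><body><img src="https://example.org/a.jpg">'
               '<img src="/logo.png"></body></html>')
print(extract_image_links(sample_html))
```

In a real run, the string returned by fetch_search_page would be passed to extract_image_links, and each resulting link would then be downloaded. The limit of 20 mirrors the number of images per request mentioned above.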
4. Experiment with images
Clustering algorithm.
To select the clustering method that is most suitable for the task of this study, a test sample of images was used. The test sample consisted of 209 images collected automatically from search requests on google.com. The requests consisted of the verb "fall" in different languages. Below (Table 1) are the details regarding the languages and verbs used in the requests:
Table 1. Verbs that were used as the requests
language | request |
Russian | падать, упасть, рухнуть |
English | to fall, to fall down, to drop |
French | tomber, trebucher |
German | herunterfallen |
Spanish | caer |
Polish | padać, spadać |
The resulting dataset consisted mainly of images of people.
Each picture was read using the imread method from the matplotlib.pyplot library. Then each picture was reduced to a common size using the imresize method from the scipy.misc library (https://docs.scipy.org/doc/scipy-0.19.0/reference/generated/scipy.misc.imread.html). The resulting array was flattened into a vector of fixed length using the numpy ravel method. Some images were not included in the test sample because they did not fit in size and could not be reshaped.
Since the number of clusters to be obtained is unknown, it was decided to use clustering algorithms that determine the number of clusters automatically. Three such algorithms from the sklearn library were considered: Affinity Propagation, DBSCAN and Mean-Shift. All three algorithms were tested on image vectors from the test sample. Affinity Propagation produced the largest number of clusters (namely 10), while DBSCAN and Mean-Shift distributed the test images into 4 clusters. Since for the purposes of this research it is important to obtain the best quality of clustering and the largest number of clusters (in order to better filter out unsuitable images and distribute images across frames more accurately), it was decided to use Affinity Propagation in this study. Below (Figure 1) is an example of images that were grouped into the same cluster.
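The preprocessing can be sketched as follows. This is only an illustration under stated assumptions: scipy.misc.imresize has been removed from recent SciPy releases, so a simple nearest-neighbour resize written in numpy stands in for it; random arrays stand in for pictures read from disk; and the rule for discarding images (anything that is not a three-channel array) is a guess at why some pictures "did not fit", not the author's documented criterion. The names resize_nearest and image_to_vector and the 64 x 64 target size are hypothetical.

```python
import numpy as np


def resize_nearest(image, size=(64, 64)):
    """Nearest-neighbour resize of an H x W x C array to `size` (rows, cols)."""
    rows = np.arange(size[0]) * image.shape[0] // size[0]
    cols = np.arange(size[1]) * image.shape[1] // size[1]
    return image[rows][:, cols]


def image_to_vector(image, size=(64, 64)):
    """Resize an RGB image and flatten it; return None if it is not H x W x 3."""
    if image.ndim != 3 or image.shape[2] != 3:
        return None  # assumed reason for dropping an image from the sample
    return resize_nearest(image, size).ravel()


rng = np.random.default_rng(0)
# Two RGB stand-ins of different sizes and one single-channel stand-in.
images = [rng.random((120, 80, 3)), rng.random((50, 200, 3)), rng.random((30, 30))]
vectors = [v for v in map(image_to_vector, images) if v is not None]
X = np.vstack(vectors)
print(X.shape)  # (2, 12288)
```

Every retained image ends up as a vector of the same length (64 * 64 * 3 values here), which is what the clustering algorithms require.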
Figure 1. An example of images that were grouped into the same cluster
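A comparison of the three algorithms might be set up as in the sketch below. Synthetic blobs stand in for the image vectors, and the DBSCAN parameters (eps, min_samples) are arbitrary assumptions, so the cluster counts it prints say nothing about the counts reported for the test sample.

```python
from sklearn.cluster import AffinityPropagation, DBSCAN, MeanShift
from sklearn.datasets import make_blobs

# Synthetic stand-in for image vectors: 60 points in 5 dimensions.
X, _ = make_blobs(n_samples=60, centers=3, n_features=5,
                  cluster_std=0.5, random_state=0)

models = {
    "AffinityPropagation": AffinityPropagation(random_state=0),
    "DBSCAN": DBSCAN(eps=2.0, min_samples=3),  # assumed parameter values
    "MeanShift": MeanShift(),
}

cluster_counts = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    # The label -1 marks noise (DBSCAN) or non-convergence; it is not a cluster.
    cluster_counts[name] = len(set(labels) - {-1})

print(cluster_counts)
```

None of the three models is given a target number of clusters, which is the property that motivated their selection.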
After that, all the images from the test sample were distributed into folders according to their cluster. Cluster centers and the nearest elements were found. For this purpose, an affinity matrix of the size 209*209 was used. In this matrix, the Euclidean distance was calculated for each pair of image vectors.
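A minimal sketch of this step is given below, with random vectors standing in for the images. It assumes that the cluster centers are the exemplars returned by scikit-learn's AffinityPropagation (cluster_centers_indices_) and that the "nearest element" is the cluster member with the smallest Euclidean distance to its exemplar; the variable nearest_to_center is a hypothetical name.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)
# Stand-in for flattened image vectors: three loose groups of 10 points each.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(10, 8)) for c in (0.0, 3.0, 6.0)])

# n x n matrix with the Euclidean distance for each pair of vectors.
distances = pairwise_distances(X, metric="euclidean")

model = AffinityPropagation(random_state=0).fit(X)
labels = model.labels_
center_indices = model.cluster_centers_indices_

# For every exemplar, find the other member of its cluster that lies closest.
nearest_to_center = {}
for cluster_id, center in enumerate(center_indices):
    members = np.flatnonzero(labels == cluster_id)
    members = members[members != center]
    if members.size:
        closest = members[np.argmin(distances[center, members])]
        nearest_to_center[int(center)] = int(closest)

print(distances.shape, nearest_to_center)
```

For the test sample described above, the same call would produce a 209 x 209 matrix.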
The analysis of the obtained clusters allowed us to draw several conclusions about the quality of clustering.
First, the quality of clustering was affected by the imbalance of the test sample: most of the pictures contained images of people. Clusters also turned out very uneven (see Table 2 below):
Table 2. Size of the clusters
cluster | number of images |
1 | 65 |
2 | 9 |
3 | 13 |
4 | 12 |
5 | 5 |
6 | 3 |
7 | 4 |
8 | 45 |
9 | 26 |
10 | 27 |
Secondly, since the test sample was overloaded with images of people, several clusters were dedicated to the fall of people. In particular, the following clusters were obtained: people falling "in the air", the initial moment of a person's fall and the final moment of the fall (when a person is already lying on the surface). In addition, several images of people falling down stairs were placed in a separate cluster.
In general, the results of clustering met our expectations, and the quality of clustering can be considered good. For instance, images that are not related to falling were placed in a different cluster. Moreover, as we know from typological studies, the initial and final moments of falling are typologically relevant. The fact that these features also stand out as a result of image clustering is important for future research. Also, the program was able to allocate a separate cluster for the falling of water (due to the Polish collocation padać deszcz `to rain'), as well as a cluster for falling bridges (due to the Russian verb рухнуть `to collapse'). Thus, the tested algorithm can be applied to the main sample.
Clustering the main sample.
After the best clustering method was selected and the algorithm was tested on the test sample, the main sample was collected. There were 760 images in the main sample. This time we used a variety of verbs denoting different types of falling, including verbs such as сыпаться `to pour, but only for dry, granular objects', бултыхнуться `to fall into the liquid with a characteristic sound', izpadati `fall out (usually used for hair and teeth)', etc. The full list of verbs can be found in Table 3 below:
Table 3. Verbs that were used as requests for collecting main sample
language | request |
Russian | падать, упасть, рухнуть, выпасть, опасть, завалиться, повалиться, отвалиться, обвалиться, сорваться, отскочить, соскочить, обрушиться, сыпаться, осыпаться, шлепнуться, шмякнуться, бултыхнуться |
English | to fall, to fall down, to drop |
French | tomber, trebucher |
German | herunterfallen |
Hungarian | beszakadni, kiszóródni |
Slovenian | padati, deževati, izpadati, krušiti se, naletavati, odleteti, odpadati, snežiti, udirati se |
Spanish | caer, caerse |
Polish | padać, spadać, wypadać |
The verbs were chosen according to the articles (Kuzmenko & Mustakimova, 2015; Kashkin et al., 2015; Kolešova, 2016). Each image from the sample was processed using the same algorithm as in section 5.1.1. Some pictures were not included in the final sample because they did not fit in size. As a result, 760 image vectors were obtained.
Clustering was performed on these vectors using the Affinity Propagation method. As a result of clustering, 23 clusters were formed. All clusters are available at https://drive.google.com/drive/folders/13fZpUQwhCIfHS6QlT4pOZvYtgNhZ9QFK.
Compared to the clusters obtained for the test sample, the clusters from the main sample were more balanced in size. However, the final groups of images are less uniform in what they depict. Namely, people falling down stairs are found in two clusters and are mixed with falls from a bike. The initial moment of the fall, the final moment of the fall and the fall in weightlessness still stand out as clusters, but are mixed with images of graphs, machines and hair falling out. One cluster consisted fairly clearly of images of the sea shore, but another combined forest, asphalt and the sea. All the pictures of bright yellow shampoo, which are not related to the verbs of falling, ended up in a separate cluster. Thus, clustering helps to separate out the images that got into the dataset by accident.
The analysis of the clusters showed that the colors of the images also affect clustering. In some groups, the images shared a pronounced color scheme, as in the case of the bright yellow pictures of shampoo. It was therefore decided to convert the images to black and white and to cluster the black-and-white images.
Clustering the main sample in black and white.
The color scheme can both help and hinder clustering. On the one hand, the colors in the pictures allow the algorithm to group objects that are always of the same color more reliably. For example, the sky and the sea will often be blue, apples will often be red and trees green. On the other hand, the algorithm will place different objects of the same color in the same cluster. If, by contrast, all the images are converted to black and white, the algorithm can focus more on the shape of the object than on its color.
To convert the existing dataset of 760 images to black and white, code from http://www.cyberforum.ru/python/thread2206610.html was used.
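The listing itself is not reproduced in this text. As a rough illustration only, and not the code from the forum thread, a conversion of this kind can be sketched in numpy, assuming the images are H x W x 3 RGB arrays and that the standard ITU-R BT.601 luma weights are an acceptable choice; the function name to_grayscale is hypothetical.

```python
import numpy as np


def to_grayscale(image):
    """Convert an H x W x 3 RGB array to an H x W luminance array."""
    weights = np.array([0.299, 0.587, 0.114])  # ITU-R BT.601 luma weights
    return image[..., :3] @ weights


rgb = np.zeros((2, 2, 3))
rgb[0, 0] = [1.0, 1.0, 1.0]   # white pixel
rgb[1, 1] = [1.0, 0.0, 0.0]   # pure red pixel
gray = to_grayscale(rgb)
print(gray.shape)  # (2, 2)
```

After this step each image has a single channel, so its flattened vector is three times shorter than that of the color version and carries no hue information.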
After that, following the algorithm described in section 5.1.1, vectors of the black-and-white images were obtained, and clustering was carried out on the basis of these vectors. Compared with the clustering of color images, the number of clusters increased: a total of 26 clusters were created.
As expected, compared to the results obtained in section 5.1.2, the algorithm was better able to distinguish the shapes of the objects depicted in the pictures. For example, falling from a ladder and falling from a bicycle formed two different clusters. As for falling people, the images within a cluster are more similar in the angle of the fall. The machines that the algorithm in 5.1.2 could not separate were placed in a single cluster. However, the quality of clustering did not only improve, but in some respects also deteriorated. For example, the algorithm was not able to put the bright yellow shampoos in a separate cluster, unlike the algorithm in 5.1.2. As with color images, there were many mixed clusters.
Overall, we can conclude that the resulting image dataset was difficult to cluster. First, the sample includes images that are not really related to the verbs of falling. Second, some images were overloaded with subjects and objects. Despite this, the clustering algorithm managed to highlight some interesting clusters: the initial moment of the fall, the final moment of the fall, the fall from the stairs, the fall of bridges. Images that depicted only one object were better suited to clustering. Objects and images of a particular bright color also stand out well and appear easy to cluster.
Experiment with bigrams and text-based vectors.
For this experiment, the data of the previous study (Ryzhova & Paperno, 2019) were used. Specifically, these data included:
a list of the most frequent bigrams for the verb падать `to fall'. This list was obtained as follows: nouns that occurred at a distance of 1 or -1 from the verb падать `to fall' were extracted from the main subcorpus of the Russian National Corpus (http://www.ruscorpora.ru/). Lemmas, not word forms, were taken into account when calculating the frequency. This is the main difference from the list of bigrams offered by the Russian National Corpus.
a list of the vectors for these bigrams, obtained from matrices of the frequency of occurrence for each word. To obtain the vector of a phrase, an additive model of composition (i.e. a simple sum) was used.
a list of the vectors for the same bigrams, reduced using the SVD method
It was decided to conduct the experiment in two parts. In the first part (sections 5.2.1 and 5.2.2) a text vector was attached to each of the image vectors. The text vector in this case serves as a text attribute, while the focus of the experiment is on the image. The aim of the experiment was to find out whether adding text would improve image clustering.
In the second part (section 5.2.3), on the contrary, each text vector was combined with the image vector closest to the mean vector (the sum of all image vectors divided by their number). Thus, this part focuses on text vectors. The purpose of this experiment was to find out whether the addition of images would increase the accuracy of the clustering of the phrases.
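The additive composition and the SVD reduction described in this list can be sketched as follows. The word vectors here are random Poisson counts standing in for the real co-occurrence frequencies, the vocabulary is a small hypothetical subset, and scikit-learn's TruncatedSVD with 3 components is an assumed stand-in for whatever SVD implementation and dimensionality the original data used.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
# Hypothetical co-occurrence counts: one 50-dimensional row per lemma.
vocabulary = ["падать", "снег", "свет", "звезда", "цена", "лист"]
word_vectors = {w: rng.poisson(3.0, size=50).astype(float) for w in vocabulary}

verb = "падать"
nouns = [w for w in vocabulary if w != verb]
# Additive composition: the bigram vector is the sum of its two word vectors.
bigram_matrix = np.vstack([word_vectors[verb] + word_vectors[n] for n in nouns])

# Reduce the bigram vectors to a few dimensions with truncated SVD.
svd = TruncatedSVD(n_components=3, random_state=0)
reduced = svd.fit_transform(bigram_matrix)
print(bigram_matrix.shape, reduced.shape)  # (5, 50) (5, 3)
```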
Combining images and texts on the test sample.
To check the quality of clustering on a more balanced sample, another test set of images, based on bigrams, was collected. This time, the requests were constructed so that the subject of the falling situation varied. For instance, the requests looked like this: a plane falls, a star falls, a drop falls, etc. In total, 12 subjects were used, taken from the 12 most common bigrams for the verb падать `to fall' in the Russian National Corpus. Below is the list of these bigrams sorted by frequency:
1. снег `snow'
2. свет `light'
3. звезда `star'
4. цена `price'
5. лист `leaf'
6. человек `human'
7. температура `temperature'
8. уровень `level'
9. самолет `plane'
10. капля `drop'
11. тень `shadow'
12. скорость `speed'
For each request, exactly 20 images were downloaded. Since some images were discarded as unsuitable in size, the test sample contained a total of 226 images. As a first experiment, it was decided to cluster the images without adding text vectors. This clustering (with the same method, i.e. Affinity Propagation) produced 13 clusters. Some of these clusters were well differentiated: a cluster of falling leaves, a cluster of graphs, a cluster of falling stars and a cluster of planes. Other clusters were less well recognized: droplets appeared in all clusters, and people did not end up in the same group. Below (Figure 2) is an example of a group of images that were assigned to the same class:
Figure 2. Example of the images that were placed in the same cluster
To find out whether clustering would show better results if images were combined with texts, text vectors from another study were used. The vector of the corresponding bigram was added to each of the 20 images downloaded for that bigram. For example, if the images were downloaded for the request plane falls, the text vector obtained from the bigram plane falls was added to each of these 20 images. After the vectors were combined using the concatenate method from the numpy library (https://docs.scipy.org/doc/numpy/reference/generated/numpy.concatenate.html), clustering was performed. All images were automatically grouped into 12 clusters. As with image-only clustering, some clusters were still well recognized. For example, the star cluster, the graph cluster and the aircraft cluster were still distinguished by the algorithm. However, we can also observe a decrease in the quality of clustering. For instance, although all the leaves were still in the same cluster, planes, drops and graphs fell into one cluster together. Overall, the clusters were more balanced in size, but less uniform in terms of colours and, in some cases, even in terms of the shapes of objects.
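The combination step amounts to appending the same text vector to every image vector of a query. A minimal sketch, with random arrays standing in for the real vectors and arbitrary assumed dimensions (300 for images, 40 for text):

```python
import numpy as np

rng = np.random.default_rng(0)
image_vectors = rng.random((20, 300))   # 20 images downloaded for one query
text_vector = rng.random(40)            # vector of the corresponding bigram

# Repeat the text vector once per image and append it column-wise.
repeated_text = np.tile(text_vector, (len(image_vectors), 1))
combined = np.concatenate([image_vectors, repeated_text], axis=1)
print(combined.shape)  # (20, 340)
```

Because the image part is usually much longer than the text part, the relative scale of the two blocks determines how much the text can influence the distances between combined vectors.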
Adding text to images on the main sample.
After the experiments were carried out on the test sample, the main sample was collected. The same list of frequent phrases from the Russian National Corpus was used. The first hundred bigrams from this list were used as search queries. After all the images were processed and converted into vectors, a sample of 1745 vectors was obtained. First, all vectors from this sample were clustered without any textual data added. This clustering produced 46 image clusters.
As in section 5.2.1, some of the clusters were more clearly defined than others. These are the clusters with barometers, glasses, leaves, drops, the sun, stars and graphs. It is worth noting that the fall of a glass really does stand out typologically, as the glass breaks when it falls. In addition, stars, meteorites and the sun do fall differently, as their fall is not a fall to the ground. The drop on a chart is also different from all other falls.
Other clusters were mixed. For example, a fall from a horse appeared in many clusters, but, unfortunately, was not allocated to a separate cluster. Falling snow and snowflakes were also found in many clusters. All resulting clusters, as well as the original images used in clustering, can be found at https://drive.google.com/drive/folders/13fZpUQwhCIfHS6QlT4pOZvYtgNhZ9QFK.
Now let us take a look at the clustering results with the addition of text vectors. Clustering the combined vectors produced 47 clusters. Analysis of the clusters showed that they were more diverse and less similar in color. Some clusters contained several clearly defined groups. For example, the barometer, which in clustering without text data was assigned to a separate cluster, is now in the same cluster as the charts and money. Snowflakes and snow were put in the same cluster as well, as were drops, bombs and a meteorite. The stars, as in the previous step, were placed separately.
Several conclusions can be drawn from the results of these two clusterings. First, clustering that relies only on images recognizes the color of images better. Second, clustering with text vectors added places several groups of images in a single cluster. These groups of images can be united by a common shape (as in the case of drops and bombs falling from the sky) or by common semantics (as in the case of snow and snowflakes). Overall, however, clustering performed on the combined vectors gave much more mixed and complicated clusters.
Adding images to texts.
To collect the main sample of images, the 100 most frequent words used with the verb падать `to fall' were chosen. These words were used in search queries on google.com. Each request, as in section 5.2.2, was made in accordance with the bigrams. In total, 2000 images were collected, i.e. exactly 20 images for each request. Then, for each group of 20 images, the average vector was calculated as the arithmetic mean. Only the image closest to the average vector was added to the text vector. To estimate the proximity of vectors, the cosine measure was used (the scipy.spatial.distance module, https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html). Thus, the entire sample was reduced to 100 vectors. Due to the small size of the text vectors, all image vectors were compressed to size 100. To accomplish this, the SVD method from the sklearn library was applied (https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD).
Finally, text and visual vectors were combined. The combined vectors were clustered using the Affinity Propagation tool. As a result, 7 clusters were obtained. Below is a table (Table 4) with the resulting clusters:
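The selection of one representative image per query and the subsequent compression can be sketched as below. The sizes are scaled down (12 queries and 10 SVD components instead of 100 and 100) so that the example runs quickly, random arrays stand in for image vectors, and the helper name central_image is hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.decomposition import TruncatedSVD


def central_image(vectors):
    """Return the index of the vector closest (by cosine) to the group mean."""
    mean_vector = vectors.mean(axis=0, keepdims=True)
    distances = cdist(vectors, mean_vector, metric="cosine").ravel()
    return int(np.argmin(distances))


rng = np.random.default_rng(0)
n_queries, per_query, dim = 12, 20, 500
groups = rng.random((n_queries, per_query, dim))  # stand-in image vectors

# Keep one representative image vector per query.
representatives = np.vstack([g[central_image(g)] for g in groups])

# Compress the representatives so their size is comparable to the text vectors.
svd = TruncatedSVD(n_components=10, random_state=0)
compressed = svd.fit_transform(representatives)
print(representatives.shape, compressed.shape)  # (12, 500) (12, 10)
```

Choosing an actual image rather than the mean vector itself keeps the representative interpretable: it corresponds to a picture that can be inspected or placed in a questionnaire.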
Table 4. Results of clustering the combined text and image vectors
Cluster | Words
1 | снег свет доллар дождь волос качество башня кирпич дитя бутерброд девушка ребенок
2 | лист человек зрение выбор доход зрение недвижимость часть рука барометр
3 | уровень самолет капля дерево ракета стена занавес
4 | яблоко слово метеорит производство наоми лошадь
5 | солнце рубль грохот солдат
6 | звезда подозрение птица небо
7 | цена температура скорость давление тень луч рождаемость бомба взгляд вода спрос вес звук снежинка рынок тело камень слеза ударение настроение продажа производительность оборот количество снаряд кривая стоимость эффективность доля объем курс напряжение хлопья самооценка голова мощность популярность способность активность предмет индекс акция сила число сердце темп женщина снежок шум песня показатель поток энергия вес мужик способность раз
At first glance, the clustering may seem to have produced poor results. However, comparing the results with the pictures reveals a rationale for this distribution. For example, cluster 7 corresponds to images that show various graphs. Cluster 6 combines images that depict the sky in some way. Cluster 1 consists primarily of images of girls (a girl's hair, a girl in the rain, a girl in the snow, etc.). Cluster 3 groups objects of similar shape that also share a similar distribution of light and shadow.
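The combination and clustering step can be illustrated with a short sketch. The text does not state how the text and image vectors were combined, so the concatenation used here is an assumption, as are the L2 normalization of each part (added so that neither modality dominates), the function name `combine_and_cluster` and the `seed` parameter. `AffinityPropagation` is the scikit-learn implementation of the algorithm named in the text.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.preprocessing import normalize


def combine_and_cluster(text_vecs, image_vecs, seed=0):
    """Concatenate text and image vectors row by row and cluster the result.

    Both inputs have one row per word; row i of text_vecs and row i of
    image_vecs are assumed to describe the same word.
    """
    # Assumption: scale each modality to unit length before concatenation.
    combined = np.hstack([normalize(text_vecs), normalize(image_vecs)])
    model = AffinityPropagation(random_state=seed)
    labels = model.fit_predict(combined)
    return combined, labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    text_vecs = rng.normal(size=(30, 4))    # toy text vectors
    image_vecs = rng.normal(size=(30, 6))   # toy image vectors
    combined, labels = combine_and_cluster(text_vecs, image_vecs)
    print(combined.shape, len(set(labels)))
```

Affinity Propagation chooses the number of clusters itself, which is consistent with the cluster counts in the text being reported as results rather than set in advance.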
Let us compare the results with those obtained by clustering only text vectors, without adding images. The table below (Table 5) shows the results of this clustering:
Table 5. Results of clustering the text vectors only
Cluster | Words
1 | температура скорость давление вес кривая напряжение поток энергия
2 | звук грохот шум песня
3 | зрение подозрение настроение наоми
4 | дерево камень башня стена кирпич
5 | ударение слово раз
6 | цена доллар продажа доход стоимость недвижимость курс рубль акция
7 | самолет бомба снаряд ракета метеорит
8 | звезда свет тень луч дождь солнце птица небо барометр
9 | уровень рождаемость спрос рынок качество выбор производительность оборот количество эффективность часть доля объем мощность индекс производство число темп показатель
10 | снег лист капля вода яблоко снежинка хлопья снежок бутерброд
11 | взгляд волос тело голова рука
12 | зрение самооценка популярность способность активность предмет сила способность
13 | человек слеза сердце дитя женщина девушка ребенок
14 | вес занавес
15 | мужик солдат лошадь
As we can see, the number of clusters has increased significantly compared to the previous outcome: 15 groups of contexts were obtained. Let us consider these groups carefully. Almost all of them are organized by the meaning of the words. For instance, cluster 2 consists of words denoting sounds, cluster 13 is associated with people, and cluster 11 contains parts of the body.
Thus, clusters derived from images better reflect the visual aspects of falling, and clusters derived from texts better reflect the semantics.
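The comparison of the two tables can be made a little more systematic by counting shared words. The sketch below is not part of the study's method: the helper names `group_by_cluster` and `best_overlap` are hypothetical, and the word lists are small excerpts from Tables 4 and 5.

```python
from collections import defaultdict


def group_by_cluster(words, labels):
    """Turn parallel lists of words and cluster labels into {label: [words]}."""
    clusters = defaultdict(list)
    for word, label in zip(words, labels):
        clusters[label].append(word)
    return dict(clusters)


def best_overlap(cluster, other_clusters):
    """Find the cluster in other_clusters that shares the most words.

    Returns the label of that cluster and the sorted list of shared words.
    """
    target = set(cluster)
    label = max(other_clusters,
                key=lambda k: len(target & set(other_clusters[k])))
    return label, sorted(target & set(other_clusters[label]))


if __name__ == "__main__":
    # Cluster 6 of Table 4 (combined vectors).
    combined_6 = ["звезда", "подозрение", "птица", "небо"]
    # Clusters 3 and 8 of Table 5 (text vectors only).
    text_clusters = {
        3: ["зрение", "подозрение", "настроение", "наоми"],
        8: ["звезда", "свет", "тень", "луч", "дождь",
            "солнце", "птица", "небо", "барометр"],
    }
    print(best_overlap(combined_6, text_clusters))
```

For this excerpt, three of the four words in the combined cluster fall into the text cluster 8, while подозрение 'suspicion' belongs to a different text cluster.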
Conclusion
This study was devoted to the selection of visual stimuli for lexico-typological questionnaires. A total of 3,160 images were automatically collected and analyzed. Various tests have been conducted to improve image processing and explore their visual attributes.
Namely, experiments were carried out with different meanings of the verbs of falling in different languages, with color and black-and-white pictures, and with the addition of text vectors.
Important theoretical generalizations were made. First, it was concluded that the initial and final moments of the fall are clearly visible in the images and are important features from a typological point of view. Moreover, snow and water are associated with the meaning of falling in many pictures and form a single cluster. Second, it was found that contexts rely more on semantics and common sense, whereas images can have something in common regardless of meaning, for instance the shape of the object, the color scheme or the slope of the fall. More precisely, clustering black-and-white pictures groups the images more accurately by object, whereas clustering color images better brings out pictures and objects that are similar in color.
To solve the research problem, we have developed an algorithm.
The algorithm that we propose for automatically collecting visual stimuli for lexico-typological research consists of the following steps:
1. collecting a set of images,
2. computing a vector representation for every image,
3. clustering the distributed space of image vectors,
4. combining text and image vectors,
5. comparing the obtained clusters with the clusters obtained in the study of textual data.
This algorithm can be used in future studies to explore the visual properties of the images for a particular frame, as well as to group them by object or by color scheme. The Python code that implements the algorithm can be found at https://github.com/mimimizerova/Thesis. All the images collected during this research, as well as the resulting text vectors, are available at the following addresses (https://drive.google.com/drive/folders/13fZpUQwhCIfHS6QlT4pOZvYtgNhZ9QFK, https://github.com/mimimizerova/Thesis).
References
1. Baroni, M., Bernardi, R., & Zamparelli, R. (2014). Frege in Space: A Program for Compositional Distributional Semantics. Linguistic Issues in Language Technologies, 9, 241-346.
2. Gella, S., Lapata, M., & Keller, F. (2016). Unsupervised Visual Sense Disambiguation for Verbs using Multimodal Embeddings. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 182-192). San Diego, California: Association for Computational Linguistics.
3. Kolešova, M.L. (2016). ГЛАГОЛЫ ПАДЕНИЯ В СЛОВЕНСКОМ ЯЗЫКЕ: ЛЕКСИКО-ТИПОЛОГИЧЕСКИЙ АСПЕКТ. Filološke studije, 14 (2), 188-201. Retrieved from https://hrcak.srce.hr/203731
4. Kopecka, A., & Narasimhan, B. (Eds.). (2012). Events of putting and taking: A crosslinguistic perspective (pp. 21-36). Amsterdam: Benjamins.
5. Majid, A., & Bowerman, M. (Eds.). (2007). Cutting and breaking events: A crosslinguistic perspective [Special Issue]. Cognitive Linguistics, 18(2).
6. Majid, A., & Levinson, S. C. (Eds.). (2011). The senses in language and culture [Special Issue]. The Senses & Society, 6(1).
7. Rakhilina, E., & Reznikova, T. (2016). 4. A Frame-based methodology for lexical typology. In P. Juvonen & M. Koptjevskaja-Tamm (Eds.), The Lexical Typology of Semantic Shifts. Berlin, Boston: De Gruyter.
8. Ryzhova, D., & Paperno, D. (2019). Constructing typological questionnaire with distributional semantic models. In E. Rakhilina & T. Reznikova (Eds.), The Typology of Physical Qualities. John Benjamins Publishing Company.
9. Silberer, C., Ferrari, V., & Lapata, M. (n.d.). Models of Semantic Representation with Visual Attributes, 11.
10. Silberer, C., & Pinkal, M. (n.d.). Grounding Semantic Roles in Images, 11.
11. Кашкин, Е.В., Жорник, Д.О., Закирова, А.Н., Кожемякина, А.Д., & Плешак, П.С. (2015). К лексической типологии глаголов падения: данные уральских языков. ББК 81.2 Д23, 41.
12. Кузьменко, Е.А., Мустакимова, Э.Г. (2015). Глаголы падения в лексикотипологической перспективе. In Типология морфосинтаксических параметров (pp. 149-160).
13. Орехов, Б.В., Резникова, Т.И. (2015). Компьютерные перспективы лексико-типологических исследований. Вестник Воронежского государственного университета. Серия: Лингвистика и межкультурная коммуникация, (3). С. 17-23.