Automatic extraction of the phonemic inventory in Russian Sign Language

Handshape inventories in sign languages. An analysis of the handshape inventory and its interaction with handedness. The phonetic and phonemic inventories of handshapes and the formation of a phonemic inventory for Russian Sign Language.

Category: Foreign languages and linguistics
Type: diploma thesis
Language: English
Date added: 01.12.2019
File size: 4.6 M


In the present work, I follow Nyst (2007) in annotating handshapes with HamNoSys (Hanke 2004). Also following Nyst (2007), I look separately at the phonetic handshapes of the active and the secondary hand, and then at phonemic handshapes. In addition, I compare the RSL findings with NGT and AdaSL on both the phonetic and the phonemic handshape inventories, and with Kata Kolok, YSL, and IUR on the phonetic handshape inventories only.

3. Methods

This section describes the methodology of this research. The structure is as follows. Firstly, I downloaded all videos for RSL from the Spreadthesign online dictionary with the help of the lingcorpora API (Moroz et al. 2018). As a second step, these videos were manually cleaned of compounds, dactyl signs, numbers, and phrases. Thirdly, I wrote a program for extracting hold positions from the RSL dataset on the basis of Börstell's (2018) script. I made two variations of the hold-extraction algorithm; they are compared below, and their advantages and disadvantages are discussed in detail. Then the accuracy of the program is estimated by manually checking the quality of hold extraction. As the fourth step, hold positions are annotated for phonological features, namely handshape and handedness. The annotation is fully based on HamNoSys (Hanke 2004) transcription rules.

3.1 Dataset

The dataset for this research was composed of the Spreadthesign dictionary videos for RSL. This dictionary has a number of limitations. Firstly, Spreadthesign provides approximately the same list of meanings for all sign languages in its sample, and those lists are based on the corresponding spoken languages, not on the sign languages themselves. Therefore, it is often the case that a sign is repeated more than once with different translations into the corresponding spoken language. Secondly, some signs might be absent from Spreadthesign, again because of the limited set of meanings used or because there is no exact translation into the spoken language. Therefore, we do not assume that the list of signs for RSL in Spreadthesign is exhaustive.

To start with, I extracted all meanings from all semantic fields for RSL. This step resulted in a dataset of 14 875 meanings. The list of meanings was then cleaned with respect to a number of parameters. First of all, some meanings appear more than once in the list because they belong to more than one semantic field, so the duplicates were deleted. Secondly, many meanings are translated into the sign language with compounds. Compounds were deleted too, because their parts already appear in the dataset, and the phonology of compounds is not taken into account in this research. Thirdly, phrases were deleted from the dataset: all meanings which consist of two or more words are considered phrases. Fourthly, there is a set of signs derived from dactyl signs or from number signs. Those signs have almost the same phonological features as the corresponding dactyl or number signs, which makes them irrelevant to describe, since the main interest here is to find the inventory of distinct features.

Furthermore, there is the problem of disyllabic signs: they have two sequential movements, whereas the algorithm used here aims at signs with one movement only. The general idea is that the method described and used in this work can only decompose separate “words” into phonological features, such as holds. Although the majority of signs is monosyllabic in all sign languages (and RSL is not an exception here), disyllabic signs still occur and should be taken into account in further analyses. In the data for RSL in this research, disyllabic signs constitute less than 0.5% of all signs.

As the next step, in order to clean out all dactyl- and number-based signs, I went through the dataset manually. At the end of this preprocessing and deletion, the dataset was reduced to 3727 meanings. One might think that 3727 meanings cannot possibly cover all non-compound, non-dactyl- and non-number-based unique signs in a sign language. Nevertheless, this is considered a normal dictionary size for a sign language: for example, van der Kooij (2002) uses only 3084 signs as a normal-sized dictionary of NGT.

3.2 Hold extraction

At the hold-extraction step of the analysis, I tried two different algorithms of the same nature and evaluated which of them works better for the aforementioned purposes.

In order to check how well the code works, I picked out 13 different RSL signs: `absolutely nothing', `to adapt', `to fall in love', `choice', `to breathe out', `hygienic pad', `rage', `debt', `daughter', `comet', `love', `sun', and `flower' (see images of the signs in Appendix 3). These signs differ from each other significantly. absolutely-nothing stands out because it has a very rapid movement and the change appears only in aperture (i.e. there is only local movement). Path movement is in general more noticeable than local movement; this, together with the speed of the movement itself, might affect the performance of the code. The to-adapt sign has path movement and local movement simultaneously, which can also be an obstacle. to-fall-in-love is, on the contrary, a sign with a simple movement: a change in orientation that is clearly visible in 2D. choice is in most ways like absolutely-nothing; however, the video itself is very short, with no pause either at the beginning, before the signing, or at the end, after it. to-breathe-out is again a sign with a simple movement, namely path movement; however, this path movement goes along the z-axis and is consequently almost invisible in 2D. hygienic-pad has an aperture change, but it also goes along the z-axis and is very subtle in 2D. The sign rage also has a very rapid movement compared to the majority of signs. debt has a repeated movement, so it is interesting how many holds can be retrieved automatically here. daughter has a rather frequent pattern of path movement along the y-axis, which should be distinguished very clearly in 2D. The sign comet has a very long diagonal path movement, longer than, for instance, the path movement in the sign daughter. The sign sun is similar to comet in that it also has a diagonal movement; in addition, however, it has an aperture change.
The sign love has only one path movement, clearly visible in 2D, and in general poses no obstacle for hold extraction. Finally, the sign flower exhibits the trill movement feature, for which it is both very hard and irrelevant for the analysis to distinguish more than one hold position. Although the sample is more intuitive than systematically selected, it covers the most frequent obstacles for hold extraction while also containing a few regular, very simple cases.

The first code

As mentioned above, I implemented two codes with different algorithms. The first code, which is discussed in this section, relies heavily on Börstell's (2018) code. However, since its accuracy was not high enough, I had to implement a second code with slightly different core logic. Both algorithms are discussed here with their main advantages and disadvantages. The functionality of the second algorithm is described in more detail in Section 6.1 of the Results.

The first code, like Börstell's (2018) code, builds a graph of frame difference over frame number for each video. Firstly, it calculates an individual histogram for each frame of the video, counting the number of pixels at each pixel value in that frame (see Figure 15 above with the result of the calcHist() function). Börstell's (2018) code then uses the correlation metric to calculate the difference between two histograms. The graph of frame differences over frame number is simply this difference on the y-axis against frame number on the x-axis. In my version of the program I use a different metric, namely the Bhattacharyya distance.
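For illustration, the frame-difference signal can be sketched in plain NumPy. This is a minimal sketch, not the thesis code: it assumes grayscale frames given as NumPy arrays, and the distance formula follows OpenCV's definition of the Bhattacharyya comparison (what cv2.compareHist computes with the HISTCMP_BHATTACHARYYA flag).

```python
import numpy as np

def bhattacharyya(h1, h2):
    """Bhattacharyya distance between two histograms, in OpenCV's
    formulation: 0.0 for identical histograms, values toward 1.0
    for very different ones."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    n = h1.size
    denom = np.sqrt(h1.mean() * h2.mean()) * n
    bc = np.sum(np.sqrt(h1 * h2)) / denom  # Bhattacharyya coefficient
    return float(np.sqrt(max(0.0, 1.0 - bc)))

def frame_signal(frames, bins=256):
    """The 'frame difference over frame number' signal: the distance
    between the grayscale histograms of each pair of consecutive frames."""
    hists = [np.histogram(f, bins=bins, range=(0, 256))[0] for f in frames]
    return [bhattacharyya(hists[i], hists[i + 1])
            for i in range(len(hists) - 1)]
```

Identical consecutive frames yield a distance of 0, and a large scene change yields a value near 1, which is exactly the contrast the hold-extraction step relies on.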

In order to explore which distance metric works best on my dataset, I explored a couple of examples (link to code). I took two random RSL signs from my dataset, sun and fall-in-love, and for each of them took three frames to compare for similarity. Furthermore, the first picture is compared to itself: this tests whether a metric returns 100% similarity. For the fall-in-love sign, the first two pictures are almost the same, but they are different frames, i.e. different moments in time, while the third picture shows a different hold position. Since the goal here is to differentiate between the holds of a sign, this picture has to come out as very different from the first two according to our distance metric. The sign sun plays a slightly different role in this experiment: with its pictures, we want to avoid registering irrelevant movements like eyeblinks as important changes in the video, i.e. as holds. So, in the first picture the signer's eyes are closed mid-eyeblink, while in the second picture the eyes are open. These two pictures should be quite close to each other in distance, while the third picture is supposed to be far in distance from both. This is what my experimental sample looks like.

Figure 20. Experimental pictures from the sign fall-in-love

Figure 21. Experimental pictures from the sign sun

The OpenCV module for the Python programming language offers four built-in metrics to measure distances between pictures. The correlation metric takes values between `0' and `1', where `1' stands for two identical pictures. The chi-squared metric takes values from `0' up to +infinity, where `0' stands for identical pictures. The higher the intersection metric, the more similar the pictures. The Bhattacharyya metric takes values between `0' and `1', where `0' stands for identical pictures. In addition to experimenting with the metrics, I also experiment with normalizing the histograms before comparing them.

The results of the experiment are presented in Tables 3-6 below. In the end, I prefer a more interpretable metric, like correlation or Bhattacharyya, because such metrics have a fixed range of values, which makes the results easier to interpret. Normalization changes nothing when applied with these two metrics; it is only useful with the chi-squared metric, which gives wrong results without it. So, I am not going to use normalization, as it is not a game changer with the interpretable metrics, correlation and Bhattacharyya. Furthermore, one can notice that correlation and Bhattacharyya give rather similar results. They both work perfectly well with the sign sun; thus, they do not produce false positives for eyeblinks. However, with the sign fall-in-love, the difference between the Base-Test (1) and Base-Test (2) distances is bigger for the Bhattacharyya distance than for the correlation metric (0.0296 compared to 0.008 respectively). Since I need a sharper difference between hold positions, in the further analysis I use the Bhattacharyya metric.

Table 3. sun with normalization

Metric          Same picture   Base-Test (1)   Base-Test (2)   Comments
Correlation     1.0            0.9908          0.9200          + the same as without normalization
Chi-squared     0.0            1.3133          12.3889         + the difference is big enough
Intersection    59.3509        53.3232         35.6612         + the difference is big enough
Bhattacharyya   0.0            0.0474          0.1418          + the same as without normalization

Table 4. fall-in-love with normalization

Metric          Same picture   Base-Test (1)   Base-Test (2)   Comments
Correlation     1.0            0.9842          0.9786          - too small a difference between Base-Test (1) and Base-Test (2)
Chi-squared     0.0            2.4601          4.0901          - too small a difference between Base-Test (1) and Base-Test (2)
Intersection    59.1680        51.5206         47.8485         + the difference is big enough
Bhattacharyya   0.0            0.0672          0.0968          + the difference is big enough

Table 5. sun without normalization

Metric          Same picture   Base-Test (1)   Base-Test (2)   Comments
Correlation     1.0            0.9908          0.9200          + the difference is big enough
Chi-squared     0.0            1343.9209       11841.3266      - wrong result
Intersection    76800.0        73408.0         64931.0         + the difference is big enough
Bhattacharyya   0.0            0.0474          0.1418          + the difference is big enough

Table 6. fall-in-love without normalization

Metric          Same picture   Base-Test (1)   Base-Test (2)   Comments
Correlation     1.0            0.9842          0.9786          - too small a difference between Base-Test (1) and Base-Test (2)
Chi-squared     0.0            2958.5602       4112.5016       - wrong result
Intersection    76800.0        71844.0         71166.0         - too small a difference between Base-Test (1) and Base-Test (2)
Bhattacharyya   0.0            0.0660          0.0968          + the difference is big enough

As the next step, Börstell (2018) uses the continuous wavelet transform (CWT). CWT is usually applied when the aim is to extract relevant peaks from very noisy data. It was initially developed in bioinformatics for working with mass spectrometry data, which happens to have a lot of noise (Pan Du et al. 2006), and was later applied in phonetic analysis in linguistics (Vainio et al. 2013). In essence, CWT is very similar to the fast Fourier transform: it also goes through the signal with a window of adjustable size, but in CWT the window is itself a small signal of a particular shape, the wavelet. The shape of the wavelet should reflect the expected shape of the relevant peaks; the wavelet can then locate them with a very low false positive rate, i.e. omitting all the noise. Python's Scipy module implements peak finding with CWT in the scipy.signal.find_peaks_cwt() function. This function is used in Börstell's (2018) code which, as the reader may recall, works with SSL and NGT datasets from a different corpus. The most important parameter of find_peaks_cwt() is the width, which needs to be adjusted to each dataset individually, as Börstell (2018) also points out. At each value of width from the set range of widths, a convolution is performed between the wavelet and the signal in its window. Scipy makes it possible to pass a whole range of widths. Ordinarily a widths range starts with some small value, because this helps to localize the peaks better, and ends with values big enough to find the relevant peaks on the dataset in question; this range of widths is always hardcoded. After a long process of tuning these figures on my dataset, it turned out that this method is not suitable for my videos, probably because they have a rather inconsistent structure.
Spreadthesign videos differ from each other in when the sign is performed: sometimes right at the beginning of a video; sometimes there is a gap at the beginning where the signer just stands and blinks and the sign is performed at the end; and sometimes there are no gaps at all. Moreover, some signs are performed very fast and some very slow. I hypothesize that this is why CWT does not work well on my data: the noise, like eyeblinks and other irrelevant movements, occurs in different parts of different videos, so it is not consistently spread over the signal, or it occurs, for instance, only at the beginning of a video. In addition, CWT might not be the best choice in general, given what the algorithm was originally designed to do, because in sign language videos there is no noise that overlaps with the signing, i.e. with the relevant peaks of the signal. What can be considered noise in this type of signal are eyeblinks, very distinct mouthing, slight accidental movements of the signer's body, and very distinct non-manual components. Of these, only very distinct mouthing can significantly affect the signal: since the algorithm builds the signal by measuring differences between pictures, and the signer's mouth takes up a very small number of pixels, the mouth has to open rather wide to register in the signal. The RSL dataset in the current research does not contain much distinct mouthing, nor much noise in the signals of its videos.
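For reference, a minimal use of scipy.signal.find_peaks_cwt on a synthetic signal; the signal shape and the widths range are invented for illustration, since, as the text notes, the widths must be tuned per dataset.

```python
import numpy as np
from scipy.signal import find_peaks_cwt

# Synthetic frame-difference signal: two broad "movement" peaks
# (roughly 10 samples wide) plus a little noise.
x = np.linspace(0, 1, 200)
rng = np.random.default_rng(0)
sig = (np.exp(-((x - 0.3) / 0.05) ** 2)
       + np.exp(-((x - 0.7) / 0.05) ** 2)
       + 0.05 * rng.standard_normal(200))

# The hardcoded widths range: small values localize peaks,
# larger values match the expected peak width.
peaks = find_peaks_cwt(sig, widths=np.arange(5, 20))
```

On data of consistent structure, the two broad peaks around samples 60 and 140 are recovered; the thesis's point is that Spreadthesign videos vary too much for any one widths range to work across the whole dataset.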

Having elaborated on the CWT idea, let us turn to what was used instead of CWT in the first algorithm. The Scipy module also has a function which estimates the prominence of peaks in a given signal (signal.peak_prominences()). After looking into how well each of the parameters - peak heights, peak widths, peak prominences - describes the relevant peaks in the data, it turned out that prominences coincide exactly with holds when we take only those peaks whose prominences are higher than 3 standard deviations. That number was established experimentally: I looked at X signs and built for each of them a graph of the signal with estimated peaks. When I took prominences higher than only one or two standard deviations, the result contained plenty of irrelevant peaks, whereas 3 standard deviations gave an optimal number of peaks for each sign in the test set.
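The prominence-based selection can be sketched as follows. One assumption is labeled explicitly: the text does not specify what the "3 standard deviations" are computed over, so this sketch uses the standard deviation of the signal itself.

```python
import numpy as np
from scipy.signal import find_peaks, peak_prominences

def holds_by_prominence(signal, n_std=3.0):
    """Keep only peaks whose prominence exceeds n_std standard
    deviations. Assumption: the deviations are those of the signal
    itself (the thesis does not say what they are taken over)."""
    signal = np.asarray(signal, dtype=float)
    peaks, _ = find_peaks(signal)
    prominences = peak_prominences(signal, peaks)[0]
    return peaks[prominences > n_std * signal.std()]
```

With a signal containing one dominant peak and a few small ripples, only the dominant peak survives the threshold, which is the behavior described for the test signs.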

Figure 22 shows the result of searching for relevant peaks in the sign fall-in-love from the dataset. There are two peaks, which refer to two holds. The two frames which contain these holds are depicted in Figure 23. The result correctly reflects the only two hold positions of this sign, which has a single simple movement (an orientation change).

Figure 22. Key peaks for a sign fall-in-love, RSL

Figure 23. Automatically estimated holds for the sign fall-in-love, RSL

As a last step here, I evaluated the accuracy of this algorithm by estimating how well it defined holds in the first 493 signs. Since the list was organized in the alphabetical order of the translations, the sample of the first 493 is random with respect to the phonology of RSL. I was not looking for a precise frame; the way to decide whether the holds were estimated correctly was the following: if I can imagine what the sign looks like with movement, and this coincides with the real video, then the holds are estimated correctly. While this evaluation technique is rather subjective, it is almost the only accessible one for estimating accuracy on a large number of signs. Another possibility is to have two or more independent annotators with at least basic command of RSL annotate the videos for the precise moments in time where a hold position occurs. This technique is more reliable; however, a hold position is not a dot in time or a very precise interval of the video. It is rather unclear how to estimate the borders of hold positions, so the annotators' results would likely not match. The result of the simpler technique is shown in Table 7 below.

Table 7. Accuracy of the first code

works           140
does not work   353
accuracy        140/493 = 28.4 %

The accuracy of this algorithm turned out to be only 28.4%, which is obviously not enough. So, the first algorithm worked very well on the test sample, but on the larger dataset it turned out to be not effective enough.

The second code

Since the first code does not have a high accuracy, I came up with another algorithm which works better on my dataset.

The first idea is to cut out frames in which the hands are not moving. This can be done with the help of simple movement tracking; the idea is taken from a pyimagesearch tutorial about movement detection (Rosebrock 2015). Firstly, we assume that the very first frame of the video does not have any movement. In most cases the first frame is a setting image, in which the signer stands still with the hands not moving; a couple of frames later the hands enter the frame and start moving. In order to detect hand movement, we calculate the absolute difference (the absdiff() function) between the first frame and every subsequent frame. After that we search for the contours of moving objects in the video with the help of the findContours() function, then grab these contours (imutils.grab_contours()) so that we can iterate through them and decide which contours describe hand objects and which are too small and irrelevant. The minimal size of a contour which describes hand objects (the contourArea() function) is hardcoded on the data; in our case the contour area should exceed 9000. Iterating through the frames, we get two possible outputs: either the frame has no moving object (there is no contour bigger than the minimal contour area), i.e. the frame is not occupied, or the frame has a moving object, i.e. it is occupied. An example of how a moving object, i.e. a hand, is detected via a contour larger than the minimal contour size is shown in Figure 24 below.
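A simplified sketch of the occupied-frame check, assuming grayscale frames as NumPy arrays. It replaces the cv2 contour pipeline (absdiff, findContours, contourArea) with a raw changed-pixel count, so the 9000 threshold is reused here in a slightly different role than in the original code.

```python
import numpy as np

MIN_AREA = 9000  # minimal contour area from the text, reused here as
                 # a minimal count of changed pixels (a simplification)

def occupied(first_frame, frame, pixel_thresh=25):
    """Rough stand-in for the absdiff() + findContours() pipeline:
    a frame counts as occupied if enough pixels differ from the first,
    motionless frame. The real code measures cv2.contourArea() of the
    detected moving object instead of a raw changed-pixel count."""
    diff = np.abs(frame.astype(int) - first_frame.astype(int))
    return int(np.count_nonzero(diff > pixel_thresh)) >= MIN_AREA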

Figure 24. One of the occupied frames for a sign sun with a moving hand in a contour

Unoccupied frames cannot contain hold positions, because no hands are moving, so we cut out all unoccupied frames. In 31 videos out of 517, our occupied() function does not perceive any frames as occupied, because the signing starts right away; in that case no frames are cut from the video, and all frames are considered occupied. In addition, in order to preserve the last peak, we add two more frames at the end, so that the last peak does not disappear because of border effects. All in all, this eliminates the possibility of setting images being picked as key frames with relevant peaks.

After cutting out all unoccupied frames, I again make histograms for each (occupied!) frame and then measure the distances between these histograms, again with the Bhattacharyya distance. As a result, I get a slightly shorter frame-difference-over-frame signal (see Figure 25 on the left). However, these signals have many rapid peaks and a large dispersion, so it is reasonable to smooth a signal before looking for key peaks in it. To do so I use a moving average: a smoothing method which rolls through the signal with a window of a set size and calculates an average at each iteration, thereby convolving the signal.
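The moving average itself is a one-line convolution; a sketch assuming the signal is a 1-D sequence of frame differences:

```python
import numpy as np

def smooth(signal, window=2):
    """Moving average: convolve the signal with a uniform window.
    The text settles on window size 2 for this dataset; 'valid' mode
    shortens the signal by window - 1 samples."""
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode='valid')
```

For example, smooth([0, 1, 0, 1], window=2) averages each adjacent pair, flattening the rapid alternation into a constant 0.5.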

However, before applying the moving average we need to decide on the window size. I experiment with values from 1 to 5 (because the signal is very short) and see how many peaks can be detected with each of these values, and whether the number of peaks corresponds to the real number of holds in a particular sign. These are so-called elbow graphs. As can be seen from Figure 26, most of the signs have two peaks regardless of the window size of the moving average. The sign to-adapt in reality has two holds; two peaks for this sign are found only with a window of size 2. The sign choice has only one hold; one peak for this sign is established with a window of size 2 or 4. The sign to-breathe-out in reality has two holds; two peaks are estimated for this sign with a window of size 2 or 3. Thus, the window of size 2 is the most suitable choice.
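The data behind such elbow graphs can be produced with a short function. This is a sketch: scipy.signal.find_peaks stands in here for whatever peak detector produced Figure 26, which the text does not name.

```python
import numpy as np
from scipy.signal import find_peaks

def peaks_per_window(signal, windows=range(1, 6)):
    """Count detected peaks for each candidate moving-average window
    size (1..5, as in the text): the data behind an 'elbow' graph."""
    signal = np.asarray(signal, dtype=float)
    counts = {}
    for w in windows:
        smoothed = np.convolve(signal, np.ones(w) / w, mode='valid')
        counts[w] = len(find_peaks(smoothed)[0])
    return counts
```

Plotting counts against window size for each test sign, and picking the size whose peak counts match the known numbers of holds, reproduces the reasoning that selects window size 2.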

If we apply the moving average with a window of size 2, we get a signal like the one in Figure 25 on the right. As we can see, the dispersion of the signal is much smaller after the smoothing.

Figure 25. Frame differences over frame number for fall-in-love sign before (left) and after (right) smoothing with moving average

Figure 26. Number of peaks over the window size. (Left to right downwards: `absolutely nothing', `to adapt', `to fall in love', `choice', `to breathe out', `hygienic pad', `rage', `debt', `daughter', `comet', `love', `sun', `flower')

After smoothing the signals, we proceed to retrieving the relevant peaks. The signal here is too short and has almost no noise at all; sometimes there is even only one peak in the whole signal. Thus, it is pointless to use CWT with this kind of signal, and prominences, as in the first code, do not show proper results either. Because all irrelevant frames are cut out of the signal and it is well smoothed with the moving average, it is enough to simply take the two highest peaks. As data exploration has shown, disyllabic signs are very rare, only about 0.5% of cases (estimated on the first 493 signs), and monosyllabic signs usually have no more than two holds, more rarely only one. The only case where a monosyllabic sign can have more than two holds is when there is a complex movement consisting of path movement plus something else. Consequently, this algorithm simply takes the two highest peaks. When there is only one peak in the whole signal, it takes only that one. When a very short video is smoothed, sometimes there are no peaks at all; if this happens, we just take one frame from the middle as a peak.
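The selection rule with its fallbacks can be sketched as follows, assuming a smoothed 1-D signal; scipy.signal.find_peaks stands in for the peak detector, which is an assumption about the implementation.

```python
import numpy as np
from scipy.signal import find_peaks

def hold_frames(signal):
    """Pick hold frames as described in the text: the two highest
    peaks of the smoothed signal; just one if only one peak exists;
    the middle frame if smoothing has flattened the signal entirely."""
    signal = np.asarray(signal, dtype=float)
    peaks, _ = find_peaks(signal)
    if len(peaks) == 0:
        return [len(signal) // 2]          # fallback: middle frame
    top_two = sorted(peaks, key=lambda p: signal[p], reverse=True)[:2]
    return sorted(int(p) for p in top_two)  # chronological order
```

The returned indices are then mapped back to video frames (offset by the frames cut at the start) to extract the hold images.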

At the end of this process, I manually evaluated whether the holds were estimated correctly for the first 493 signs. Again, as with the first code, they were organized alphabetically, so the sample of signs is random. In 378 cases the holds were estimated correctly, which gives an accuracy of 76.7%.

Table 8. Accuracy of the second code

works           378
does not work   115
accuracy        378/493 = 76.7 %

This algorithm was used to extract holds for all 3727 signs in the dataset. After that I went through all pictures in order to delete the ones which were obviously not hold positions in the corresponding signs. This resulted in a sample of 5189 pictures with hold positions.

3.3 Annotation

As was said before, after running the program and manually cleaning erroneous holds, I obtained 5189 images with holds. All of these images were annotated for handshape. In order not to annotate each handshape parameter of the Prosodic model separately, I used the HamNoSys notation of handshapes (Hanke 2004). In Appendix 4.1 the reader can find a table with images and numbers of all handshapes used (Hanke 2010). The list by Hanke (2010) is not exhaustive, so I added 11 more handshapes to it (see Appendix 4.2); there, the HamNoSys notations and their numbers are provided without pictures. When describing them in the Results section, I provide examples of real RSL signs with these handshapes.

In terms of annotation, I follow Nyst's (2007) description of AdaSL phonology. However, unlike Nyst (2007), I did not take into account the tense vs. lax property of handshapes; van der Kooij (2002) does not account for this distinction for NGT either. For RSL, it might be reasonable to take this distinction into account when describing flat, all-fingers-extended handshapes (see Section 6.4). In addition, the annotation is, of course, divided into active and secondary hand, so that we can account for handedness.

Furthermore, the first 493 signs (alphabetically first, hence a random sample) were additionally annotated for movement type, following Mak & Tang's (2011) research. Both hands were annotated separately for a number of binary parameters: path, orientation, aperture, full repeat (the movement has a particular end state), return, and trill (which does not combine with any other type). For instance, the sign fall-in-love has a movement in the wrist (see Figure 23 above), i.e. an orientation movement; therefore, it receives `1' in the orientation column and `0's in all other columns. Each sign was also annotated for whether the movement is complex or simple, and whether it is two-handed or one-handed. In addition, there is a column with the number of frames between holds, the so-called length of movement. It was essential to annotate at least a subset of the data in order to see whether this length of movement interacts with the type of movement; annotating the whole dataset was not necessary for this purpose.

4. Results

This section is devoted to the discussion of the results and the analysis of the data output by the hold-extraction program. In 6.1 I elaborate on the algorithm's functionality, discuss its advantages and disadvantages, and how it can be improved further. In 6.2 I show that there turned out to be no statistically significant interaction between the type of the movement and the length of the movement in frames. Section 6.3 is dedicated to the description of the phonetic handshapes that occur in RSL. In 6.4 I group phonetic handshapes into phonemic handshapes following van der Kooij's (2002) analysis. Section 6.5 compares the phonemic inventory of RSL with NGT and AdaSL and compares the main conclusions from Nyst (2007) and van der Kooij (2002) with the results for RSL. In 6.6 the applicability of phonological frameworks, in particular the Prosodic model, to RSL is discussed.

4.1 Algorithm functionality

The hold-extraction algorithm has its advantages and limitations, which are described in this section.

Starting with the advantages, this program has an accuracy of 76.7%, as shown above. It can work with separate videos of monosyllabic signs, but the videos should not be too short: it is preferable to have a couple of frames before and after the actual signing. The background of the videos should be still, because the algorithm detects all moving objects in the video.

There are a number of drawbacks, which can be addressed in the future. First of all, the algorithm accounts for neither disyllabic signs nor compounds. There are not many disyllabic signs in RSL in general (less than 0.5% of the whole lexicon), whereas compounds constitute a large group of signs (at least according to Spreadthesign) and are rather frequent. Secondly, it sometimes picks a wrong frame as a hold, which leads to mistakes in the annotation, because some middle frame is annotated instead of the real hold; however, this appears to be rather rare and occurs only in signs with complex movement. Thirdly, when there is only one hold, it sometimes perceives the raising of the hand to the initial signing position as a hold too. Finally, when a sign with one hold is too short, the moving average can smooth the signal so much that no peak remains at all.

One of the ways to solve most of these issues is to use the OpenPose module for Python. OpenPose builds a skeleton of the human body, hands, and all finger joints, and this skeleton moves throughout the video as the person moves. With the help of OpenPose we could register complex movements; the problem of retrieving wrong holds would be eliminated, because we would not be retrieving holds anymore; it would not matter if the signal were too short, because we would neither work with the signal nor smooth it; and disyllabic signs and compounds could be taken into consideration too. However, this method raises a number of different issues. There is still a need to implement some movement tracking to distinguish between different holds and potentially between different signs, and the method leaves open the question of segmenting the signing stream into separate signs. In addition, it is computationally expensive and takes a lot of time and memory, because it is based on machine learning. Therefore, for now, the signal-processing method seems more reasonable, because it can extract more linguistic information in a more time- and memory-efficient way.

The next sections are devoted to the statistical analysis of the annotated holds, which were extracted with the help of this algorithm.

4.2 No interaction between movement type and length of signing

In the previous sections I hypothesized that some movement types might be determined by the length of the movement (expressed as the number of frames). To test this hypothesis, I annotated 493 signs for movement type according to Mak & Tang's (2011) movement type classification, and 510 signs for whether the movement is simple or complex.

To begin with, I describe the frequency of each movement type. 71% of signs have complex movement, meaning that there is more than one movement component. For example, the sign sun has complex movement: a change of aperture (from closed to open) plus a path movement. The sign fall-in-love, by contrast, has simple movement: an orientation change. Complexity of movement does not interact with handedness or any other parameter (see Table 9). Both one-handed and two-handed signs show a complex movement pattern in about 70% of cases (70% for one-handed and 71% for two-handed, to be precise), and the same holds for asymmetric (69%) and symmetric (72%) two-handed signs.

The table additionally gives us the distribution of RSL signs by handedness. There are more two-handed signs in RSL than one-handed ones, 67% vs. 33%. Among two-handed signs, symmetric signs outnumber asymmetric ones (64% vs. 36%).

Table 9. Movement complexity is independent of handedness

|                  | one-handed | all two-handed | asymmetric two-handed | symmetric two-handed |
|------------------|------------|----------------|-----------------------|----------------------|
| simple movement  | 49 (30%)   | 98 (29%)       | 38 (31%)              | 60 (28%)             |
| complex movement | 112 (70%)  | 234 (71%)      | 83 (69%)              | 151 (72%)            |
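The independence claim can be sanity-checked with a chi-squared test of independence over the one-handed, asymmetric, and symmetric counts in Table 9 (the thesis itself does not report such a test; this is only an illustrative check, in pure Python, using the closed-form p-value that holds for 2 degrees of freedom).

```python
# Chi-squared test of independence on Table 9's counts.
import math

observed = [[49, 38, 60],      # simple movement
            [112, 83, 151]]    # complex movement

col_totals = [sum(col) for col in zip(*observed)]
row_totals = [sum(row) for row in observed]
grand = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand
        chi2 += (obs - expected) ** 2 / expected

# For a 2x3 table, df = (2-1)*(3-1) = 2, and the chi2 survival function
# with 2 degrees of freedom is exactly exp(-chi2 / 2).
p_value = math.exp(-chi2 / 2)
print(round(chi2, 3), round(p_value, 3))  # small chi2, large p: no interaction
```

The tiny chi-squared statistic and large p-value are consistent with the claim that movement complexity does not depend on handedness.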

The other columns are analyzed conjointly. I concatenated the path, orientation, aperture, full-repeat, return, and trill columns for both hands and counted how many signs show each combination of values. This yields 74 movement types; here I elaborate only on the top 15 types, since they cover 62% of the whole dataset. Table 10 presents these 15 types. It encodes handedness in the following way: if a sign has `1' in the `Second hand' column but the values for the second hand are absent, it is a two-handed symmetric sign; if the values for the second hand are present, it is a two-handed asymmetric sign; and, finally, if the `Second hand' value is `0', it is a one-handed sign.
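The concatenate-and-count step can be sketched as follows (the rows below are made up for illustration; in the real data each row comes from the annotation table, with empty second-hand values marking symmetric signs).

```python
# Counting movement-type combinations by collapsing each sign's movement
# columns into one hashable tuple.
from collections import Counter

# Columns: path, orientation, aperture, full_repeat, return, trill,
# second_hand, then the same six features for the second hand
# ("" = value absent, i.e. a symmetric two-handed sign).
signs = [
    (1, 0, 0, 0, 0, 0, 1, "", "", "", "", "", ""),   # symmetric, path
    (1, 0, 0, 0, 0, 0, 1, "", "", "", "", "", ""),   # symmetric, path
    (0, 0, 0, 0, 0, 1, 0, "", "", "", "", "", ""),   # one-handed, trill
    (1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0),         # asymmetric, active path
]

movement_types = Counter(signs)
for pattern, freq in movement_types.most_common():
    print(freq, pattern)
```

`Counter.most_common()` directly yields the frequency-ordered listing that Table 10 summarizes for the real 493-sign dataset.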

The first thing that stands out in this table is a gap in frequency after the first six types, after which the frequency declines steadily. Moreover, the first six types cover 39% of the whole dataset (of 493 signs), so they appear to be the most common movement types, and I describe them first. The most frequent movement pattern is a path movement with symmetric hands (see the sign inspiration). Second place is taken by two-handed symmetric signs with a trill movement (see the sign sparkle). Then comes the two-handed asymmetric pattern in which the active hand performs a path movement while the secondary hand does not move at all (see the sign alternative). The fourth movement type is a one-handed sign with a trill movement (see the sign indifference). The fifth by frequency is again two-handed symmetric, but now with a complex movement: a path movement plus a full repeat plus a return (see the sign swimming-pool). And the sixth is a one-handed sign with a path movement (see the sign mother).

The remaining types, from the seventh to the 15th, represent the following movement patterns: 7) two-handed symmetric sign with a path movement plus an aperture change (see the sign sun); 8) one-handed sign with a combination of path movement, full repeat, and return (see the sign broth); 9) two-handed symmetric sign with an orientation change (see the sign fall-in-love); 10) two-handed asymmetric sign, where the secondary hand does not move and the active hand has a path movement, full repeat, and return (see the sign to-worry); 11) two-handed symmetric sign with a path movement and an orientation change (see the sign trousers); 12) two-handed symmetric sign with an orientation change and a full repeat (see the sign drum); 13) two-handed asymmetric sign, where the secondary hand does not move at all and the active hand shows a path movement and an aperture change (see the sign to-rent); 14) two-handed symmetric sign with a path movement and a full repeat pattern (see the sign run); 15) one-handed sign with an orientation change (see the sign instinctive).

Table 10. Signs by movement type, ordered by frequency ("–" marks absent second-hand values: the second-hand columns are not filled for symmetric two-handed and one-handed signs)

| Path | Orientation | Aperture | Full repeat | Return | Trill | Second hand | Path2 | Orientation2 | Aperture2 | Full repeat2 | Return2 | Trill2 | Frequency |
|------|-------------|----------|-------------|--------|-------|-------------|-------|--------------|-----------|--------------|---------|--------|-----------|
| 1    | 0           | 0        | 0           | 0      | 0     | 1           | –     | –            | –         | –            | –       | –      | 41        |
| 0    | 0           | 0        | 0           | 0      | 1     | 1           | –     | –            | –         | –            | –       | –      | 33        |
| 1    | 0           | 0        | 0           | 0      | 0     | 1           | 0     | 0            | 0         | 0            | 0       | 0      | 30        |
| 0    | 0           | 0        | 0           | 0      | 1     | 0           | –     | –            | –         | –            | –       | –      | 30        |
| 1    | 0           | 0        | 1           | 1      | 0     | 1           | –     | –            | –         | –            | –       | –      | 29        |
| 1    | 0           | 0        | 0           | 0      | 0     | 0           | –     | –            | –         | –            | –       | –      | 28        |
| 1    | 0           | 1        | 0           | 0      | 0     | 1           | –     | –            | –         | –            | –       | –      | 17        |
| 1    | 0           | 0        | 1           | 1      | 0     | 0           | –     | –            | –         | –            | –       | –      | 16        |
| 0    | 1           | 0        | 0           | 0      | 0     | 1           | –     | –            | –         | –            | –       | –      | 14        |
| 1    | 0           | 0        | 1           | 1      | 0     | 1           | 0     | 0            | 0         | 0            | 0       | 0      | 13        |
| 1    | 1           | 0        | 0           | 0      | 0     | 1           | –     | –            | –         | –            | –       | –      | 13        |
| 0    | 1           | 0        | 1           | 0      | 0     | 1           | –     | –            | –         | –            | –       | –      | 10        |
| 1    | 0           | 1        | 0           | 0      | 0     | 1           | 0     | 0            | 0         | 0            | 0       | 0      | 10        |
| 1    | 0           | 0        | 1           | 0      | 0     | 1           | –     | –            | –         | –            | –       | –      | 10        |
| 0    | 1           | 0        | 0           | 0      | 0     | 0           | –     | –            | –         | –            | –       | –      | 10        |

Now, after discussing the basic statistics of the movement types, we can turn to the length hypothesis. Table 11 shows the mean length for each type of movement. Note that the overall mean length is 10.88 frames, and lengths range from 0 to 42 frames. As the table shows, no group mean deviates from the overall mean by more than about 2.5 frames, so no single movement feature determines length on its own. Furthermore, if we use the concatenated columns and compute the correlation ratio between concatenated movement type and length, the correlation is only 0.43. To check that this value is not inflated by rare types, I repeated the procedure with only the 15 most frequent types (from Table 10 above) and then with only the 6 most frequent: the results were 0.31 and 0.18 respectively. Since the correlation ratio declines when only the most representative categories of movement type are used, I conclude that there is no meaningful correlation between movement type and length of movement.
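The correlation ratio used above can be computed in one common way, as the square root of the ratio of between-group to total sum of squares (the thesis does not spell out its exact computation, so this is an assumption; the category labels and lengths below are invented for illustration).

```python
# Correlation ratio (eta) between a categorical variable (movement type)
# and a numeric one (length in frames). Values near 0 mean the category
# tells us little about length.

def correlation_ratio(categories, values):
    """eta = sqrt(between-group sum of squares / total sum of squares)."""
    grand_mean = sum(values) / len(values)
    groups = {}
    for c, v in zip(categories, values):
        groups.setdefault(c, []).append(v)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups.values())
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    return (ss_between / ss_total) ** 0.5 if ss_total else 0.0

types   = ["path", "path", "trill", "trill", "orientation", "orientation"]
lengths = [12, 11, 9, 10, 10, 11]

print(round(correlation_ratio(types, lengths), 3))
```

On the real data the same function would take the 493 concatenated movement-type tuples as categories and the frame counts as values.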

Table 11. Mean length (in frames) for each movement type

| movement feature     | with feature | without feature |
|----------------------|--------------|-----------------|
| complex (vs. simple) | 11.41        | 9.65            |
| path                 | 11.03        | 10.63           |
| orientation          | 9.44         | 11.39           |
| aperture             | 12.25        | 10.60           |
| full repeat          | 11.67        | 10.54           |
| return               | 13.20        | 10.29           |
| trill                | 11.36        | 10.73           |

4.3 Phonetic inventory of handshapes

All pictures of HamNoSys handshapes used in Sections 4.3 and 4.4 are taken from Hanke (2010).

The analysis of the current dataset has shown that there are 116 phonetic handshapes in total: 115 of them occur on the active hand, while only 103 occur on the secondary hand. Only one handshape occurs exclusively on the secondary hand and never on the active hand: handshape 106 (see Figure 27). Thirteen handshapes occur on the active hand but never on the secondary hand: 128, 5, 138, 19, 48, 27, 161, 165, 39, 170, 60, 120, and 127 (see Figure 28).
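These active-only and secondary-only counts fall out of simple set arithmetic over the annotated handshape IDs; a toy sketch (with a handful of IDs standing in for the full 116-shape inventory):

```python
# Set arithmetic over handshape IDs annotated per hand (toy subset).
active    = {14, 71, 69, 128, 5, 138}     # shapes seen on the active hand
secondary = {14, 71, 69, 106}             # shapes seen on the secondary hand

inventory      = active | secondary       # the full phonetic inventory
active_only    = active - secondary       # shapes never on the secondary hand
secondary_only = secondary - active       # e.g. 106, as in the real data

print(sorted(active_only), sorted(secondary_only))
# [5, 128, 138] [106]
```

On the real annotation this yields |inventory| = 116, |active| = 115, |secondary| = 103, 13 active-only shapes, and the single secondary-only shape 106.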

Figure 27. Handshape 106

Figure 28. Handshapes that occur only on the active hand (left to right, downwards: 128, 5, 138, 19, 48, 27, 161, 165, 39, 170, 60, 120, and 127)

As the next step of the analysis, I compare the top-15 handshapes for each type of sign by handedness, for the active hand and for the secondary hand. The active-hand top-15 handshapes are computed separately for one-handed signs, for two-handed signs in general, for two-handed asymmetric signs, and for two-handed symmetric signs; the secondary-hand top-15 handshapes are computed over all two-handed signs. As can be seen from Table 12, the active-hand top-15 handshapes are quite similar across all types of signs. The most frequent handshape is 14 (i.e. the "1"-handshape), and the second most frequent is 71 (i.e. the tense "B"-handshape with thumb extended); these first two positions hold for all four types of active hands. Handshapes 69 (i.e. the "B"-handshape) and 87 (i.e. the "5"-handshape) take places three to five in all types of active hands. In addition, handshape 1 (i.e. the "A"-handshape) is in the top 6 for all types of active hands, and handshape 24 is in the top 8. Then there are some slight differences. The active hand in asymmetric two-handed signs shows more preference for handshape 3 (a fist with an extended thumb), which takes third place by frequency there; it also lacks handshape 144 (i.e. the "O"-handshape) in its top 15, whereas the other three types of active hand have it. Some other handshapes occur very frequently on the active hand in more than one sign type: 18 (i.e. one finger selected and flattened), 2 (i.e. fist with thumb covering the other four fingers), 75 (i.e. four fingers non-spread flattened, thumb extended), 97 (i.e. all fingers spread and bent, thumb opposed), 103 (i.e. all fingers spread and middle finger flattened), 137 (i.e. all fingers non-spread flattened with thumb opposition, the thumb touching the other fingers), 34 (i.e. the "U"-handshape), 96 (i.e. all fingers spread and bent, thumb extended), and 13 (i.e. index finger selected and extended) (see pictures of these handshapes in Appendix 4).
The secondary hand differs from the active hand overall in that it shows more handshapes with all fingers extended. The most frequent handshapes of this kind are: 69 (i.e. the "B"-handshape), 71 (i.e. the tense "B"-handshape with thumb extended), 87 (i.e. the "5"-handshape), 96 (i.e. all fingers spread and bent, thumb extended), and 70 (i.e. the "B"-handshape with thumb on the palm).

Table 12. Top-15 frequent handshapes with respect to handedness

| rank                               | 1  | 2  | 3  | 4  | 5   | 6   | 7  | 8  | 9  | 10 | 11  | 12  | 13  | 14 | 15  |
|------------------------------------|----|----|----|----|-----|-----|----|----|----|----|-----|-----|-----|----|-----|
| one-handed, active hand            | 14 | 71 | 69 | 87 | 144 | 1   | 24 | 21 | 51 | 3  | 34  | 13  | 18  | 97 | 103 |
| two-handed, active hand            | 14 | 71 | 87 | 69 | 1   | 144 | 24 | 34 | 3  | 2  | 97  | 137 | 75  | 96 | 18  |
| two-handed asymmetric, active hand | 14 | 71 | 3  | 69 | 87  | 1   | 34 | 24 | 75 | 59 | 137 | 97  | 103 | 73 | 2   |
| two-handed symmetric, active hand  | 14 | 71 | 87 | 69 | 1   | 144 | 24 | 2  | 96 | 34 | 18  | 97  | 13  | 3  | 137 |
| secondary hand                     | 69 | 71 | 14 | 87 | 1   | 144 | 24 | 2  | 3  | 13 | 34  | 96  | 70  | 18 | 137 |

All in all, the top-5 handshapes across both types of hands are presented in Table 13 below. Together, the five most frequent handshapes account for 42.8% of the whole dataset. The most frequent handshape is 71 (i.e. the tense "B"-handshape with thumb extended).

Table 13. Top-5 frequent handshapes

| rank      | 1   | 2     | 3    | 4    | 5    |
|-----------|-----|-------|------|------|------|
| handshape | 71  | 69    | 14   | 87   | 1    |
| frequency | 11% | 10.1% | 9.8% | 7.3% | 4.6% |

4.4 Phonemic inventory of handshapes

After establishing the phonetic inventory, I apply van der Kooij's (2002) methodology to define the phonemic handshapes. Recall the two main rules for establishing phonemic handshapes in van der Kooij's (2002) model: 1) a phonemic handshape cannot be predicted from phonetics; 2) a phonemic handshape cannot be iconic. If a handshape is iconic or phonetically predictable in some, but not all, cases, it still counts as phonemic. Applying these rules to the RSL inventory of phonetic handshapes resulted in 23 phonemic handshapes. The full list for RSL, in order of decreasing frequency, is (equivalent images and/or HamNoSys formulas can be found in Appendix 4): 14, 87, 71, 1, 144, 3, 34, 21, 51, 75, 137, 59, 30, 68, 110, 125, 111, 136, 79, 29, 61, 53, and 83.
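Van der Kooij's two filters can be read as a simple predicate over per-handshape annotations; a toy sketch (the flag values below are invented for illustration, not the real RSL annotation):

```python
# A handshape is kept as phonemic only if it is neither always iconic nor
# always phonetically predictable; being iconic/predictable in only some
# cases does not disqualify it.
handshapes = [
    {"id": 14, "always_iconic": False, "always_predicted": False},
    {"id": 18, "always_iconic": False, "always_predicted": True},   # allophone
    {"id": 15, "always_iconic": True,  "always_predicted": False},  # allophone
    {"id": 87, "always_iconic": False, "always_predicted": False},
]

phonemic = [h["id"] for h in handshapes
            if not h["always_iconic"] and not h["always_predicted"]]
print(phonemic)  # [14, 87]
```

In practice the "always" flags have to be established by hand, sign by sign, as the following discussion of individual phonemes does.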

The most frequent phoneme is 14 (i.e. the "1"-handshape). It has four allophones: handshapes 13, 18, 15, and 19. The only difference between 14 and 13 is the position of the thumb: in 13 it is neutral, on the side of the three non-selected fingers, while in 14 the thumb covers the non-selected fingers. I hypothesize that the position of the thumb in realizations of this phoneme is either phonetically motivated or dictated by a signer's preference. To check this, more videos of the same signs by different signers would have to be recorded; for now there is only one video per sign, so signers' preferences cannot be accounted for. The thumb plays a different role compared to the other fingers: for instance, it is not treated as a selected finger. Furthermore, its position is not phonologically distinctive in many other sign languages besides RSL, for example in IUR (Schuit 2014). As for the other allophones, handshape 18 is the same as 14 but with the index finger flattened. This handshape is phonetically motivated: consider the sign voice (see Figure 29 below), where the signer points at her throat, and pointing at the throat with handshape 14 requires more joint articulation than with 18. In addition, there are the allophones 15 and 19, which differ from the others in having an extended thumb; 19 differs from 15 in that its selected finger is flattened. Handshape 15 is almost always iconic (e.g. in the sign to-threaten it refers to a gun). It can also appear in initialized signs whose spoken Russian equivalents start with the letter "Г": warranty, truck, etc. Handshape 19 occurs in only one sign, geometry, and is phonetically predictable: it is basically handshape 15 pointing down, and it is easier to point down by simply flattening the selected finger, without involving farther joints. All in all, the phoneme "1"-handshape together with its allophones has a total frequency of 23%.

Figure 29. voice, RSL

Handshape 71 (i.e. the "B"-handshape) is phonemic; handshapes 69, 70, and 72 are its allophones. Handshapes 71 and 69 are the most frequent phonetic handshapes overall: 71 occurs in 11% of the signs in the dataset, 69 in 10%. I treat 71 as the phoneme and 69 as its phonetic realization only because 71 is slightly more frequent. Allophones 69 and 71 differ from each other only in the position of the thumb, yet in many cases neither handshape is iconically motivated or predictable from phonetics; the variation is explained only by individual differences. The dataset contains the two signs to-balance and balance (Figures 30-31). The difference between nouns and verbs in sign languages is usually expressed through reduplication or movement patterns; this morphological derivation does not imply a change in handshape. Nevertheless, here we find handshape 69 in the verb to-balance and handshape 71 in the noun balance. Notice that these signs are produced by different signers, so the difference can be explained by individual preference, and on the basis of this pair I propose that handshapes 69 and 71 represent the same phoneme. Handshapes 70 and 72 are also allophones of this phoneme. Handshape 72 is predictable from phonetics (in the sign credit the passive hand holds the finger in front of the palm, because an extended finger would hinder the movement of the active hand; see Figure 32), while handshape 70 is iconically motivated, referring to some flat object (e.g. the long flat ears of a donkey in the sign donkey).

Figure 30. to-balance, RSL Figure 31. balance, RSL

Figure 32. credit, RSL

As mentioned in Section 5, this dataset has not been annotated for the tense/lax distinction of handshapes, although this distinction seems to play an important role for "B"-handshapes. I propose that the lax "B"-handshape (i.e. lax 71) is an allophone of the "B"-handshape (i.e. 71). Interestingly, in AdaSL (Nyst 2007) the lax "B"-handshape is not an allophone of the tense "B"-handshape but a separate phoneme; in IUR (Schuit 2014), by contrast, tenseness is not phonemic, just as in RSL. The lax "B"-handshape is found in two types of signs: in iconic signs, and in two-handed asymmetric signs on the secondary hand, which does not move and serves as the setting for the movement of the active hand. Since the lax "B"-handshape is only ever iconically motivated or restricted to a certain type of sign, it cannot be phonemic; therefore it is an allophone of handshape 71. The first type of sign with lax "B" is shown in Figure 33, the sign wind. Handshape 69 is lax here because it represents a gusty flow of wind, and a flow of wind has no strict shape in space, so the handshape is somewhat abstract. By comparison, the sign hall (Figure 34) (almost a minimal pair; only the movement differs) has a tense 69 handshape, which is also iconic, because a hall has a strict shape, with straight, parallel walls. The sign to-earn (Figure 35) belongs to the second group, asymmetric two-handed signs with a lax "B"-handshape on the secondary hand: it has a lax 69 handshape on its secondary hand, and this hand does not move. All in all, tenseness in RSL appears to create allophones.

Figure 33. wind, RSL Figure 34. hall, RSL

Figure 35. to-earn, RSL

Handshape 87 (i.e. “5”-handshape) is also a very frequent one. It has two iconically motivated allophones: 85 (thumb non-spread) and 86 (thumb on th...

