Automated error detection and correction system for learner essays in English produced by native Russian-speaking students



Federal State Autonomous Educational Institution of Higher Education

National Research University Higher School of Economics

Faculty of Humanities

Educational programme "Fundamental and Computational Linguistics"

TERM PAPER

Automated error detection and correction system for learner essays in English produced by native Russian-speaking students

Author: I.S. Torubarov, 4th-year student, group БКЛ162

Supervisor: O.I. Vinogradova, Associate Professor

Moscow, 2020

Table of contents

  • Preface
  • 1. The substance of the learner text correction task
  • 2. Computational linguistics approach to the problem
  • 3. Data collection and processing
  • 4. Model training procedure
  • 5. Results of the final model execution
  • 6. Discussion
  • Conclusion
  • References

Preface

It is no secret that the world is in the midst of rapid, ongoing globalization. Even though certain events may hinder the physical interconnection of the world, they do not affect communication, as most means of getting in contact are as effective online as they are offline. With English being the language that connects communities across different parts of the world and offering opportunities to those with a good command of it, the importance of finding new ways to teach, study and analyze English is obvious. There is yet another global trend that supports and amplifies the ever-increasing need for software solutions for learning English: the steady digitalization of the world. One may argue that in the coming years we will see unprecedented growth in the market of online teaching solutions and other digital services; with demand for such solutions increasing, one field well placed to meet it with what it has in stock is computational linguistics, in particular language corpora and natural language processing (NLP).

As we have noted, English is the language of global communication and, naturally, it has the largest amount of data available and the most advanced automated solutions on offer, with new ground-breaking technology being introduced on a weekly basis (see [GPT-3]). The recent development of natural language processing systems is characterised by steady progress on a vast number of complex tasks. There have been notable breakthroughs in a number of difficult machine learning tasks over the past two years, so it is logical to turn to natural language processing in search of any methods it could contribute to such a relevant problem. Given all that, what does machine learning have in stock for English learners? The answer is that there are numerous learner-text-related solutions and competitions, which together form the domain of language learner text analysis and automated error correction.

While there is a variety of tools on the market suited to helping teachers and learners, on the one hand, and a significant number of rather efficient solutions to some quite challenging learner-related NLP problems, on the other, the two are not as well interconnected as they could be. For instance, a number of prominent tools promising to improve one's English are generally statistic- and rule-based, even though it has been shown that alternative approaches consistently outperform such solutions. This is not to say that the situation persists without good reason: the inference time of larger models is rather long, and their computational complexity is too high for them to remain consistently operational on personal computers, let alone smartphones. Nevertheless, this is a known problem, and a number of the newer NLP solutions are in fact published with the means of scaling them to at least a stable server-side application already included.

Then there is yet another problem directly related to erroneous text correction: there are no scalable and reliable statistic-based solutions for it. Producing a sequence is, logically, a more sophisticated task than simply choosing one of several options (including binary ones such as "does this word fall under an error pattern: yes/no"), as it leads to a significantly larger number of possible solutions. Furthermore, sequence prediction is more difficult to score: while there are robust statistic-based metrics for tasks such as classification and regression, assessing how well the computer has predicted a sequence is in fact a more challenging task.

1. The substance of the learner text correction task

Given all that we have already discussed, we argue that error correction is a relevant and interesting problem, and it is the topic of our prolonged research. There are a number of reasons why learner texts are specific from a natural language processing perspective. Firstly, they contain an increased number of spelling errors: while this usually does not affect human understanding of the text, it does have an impact on general language analysis routines, as traditional part-of-speech taggers are often unable to correctly process the affected words. These routines are further disrupted by irregular syntax patterns or grammatically misaligned n-grams, which are also typical of learner texts: in fact, these are more likely to be found in a non-native English speaker's exercise than in a poorly written text by a native speaker []. What is common to both, yet still challenging for automated analysis, is missing sentence-delimiting punctuation, as it affects the standard tools for sentence splitting. As a logical conclusion, when dealing with learner texts we have to either address the described problems by tuning and adapting the existing statistical and vocabulary-based tools, or use less straightforward algorithms that do not rely so heavily on prescribed language patterns.

The problem is further complicated by the fact that learners with different language backgrounds exhibit error patterns characteristic of their native languages. This has been shown by a number of machine learning competitions built around the specific task of detecting the language background of a learner text's author. Our focus here is on the English learner texts of native Russian speakers. In terms of natural language processing, this means that we may want to amplify the factor of language-specific error detection and correction, given that we know which native language our target learners speak.

Finally, the errors themselves contribute to the problem, as they come in different types with distinct features such as domain, interrelatedness, or impact on comprehension. Even the research on distinct types of errors varies a lot: while, for instance, grammatical error correction is the most developed task of its kind in natural language processing, with a number of contests and established task-specific corpora, spelling correction is generally viewed from a more grounded, algorithm-based perspective, with the problem of L2 learner spelling addressed in far fewer papers. Perhaps the most challenging task is lexical/semantic error correction; therefore, we either have to find specific ways to work with each error type or, in analogy to our own language perception, employ a really efficient language model with a broad general representation of the written word. We chose the latter method: since we are interested in correcting errors in all domains of the text, we need a model that is capable of efficiently processing context and, consequently, fairly competent in the other spheres of language command. In the following parts of the paper we provide more support for this claim.

2. Computational linguistics approach to the problem

The language correction problem is a sequence-to-sequence (seq2seq) task in machine learning terms. We essentially treat it as a machine translation task, but instead of translating, e.g., English to French, we translate from erroneous English to correct English. What may seem an illogical comparison to a human is in fact not very different from the computer's point of view, if different at all: the computer representation of every language is a model, a grid of weighted connections between vectors representing word tokens, and the computer performs essentially the same operations whether translating between languages or rendering a same-language enhanced version of a flawed text.

A seq2seq task can also be viewed as a classification task with numerous add-ons on top: namely, we limit our output to a vocabulary of tokens and then choose the appropriate output from that vocabulary, token by token. Nevertheless, a specific feature of seq2seq routines is that there may be no one-to-one token correspondence between input and output: as illustrated by a translation task, a sentence such as `Моя мама сейчас спит' should be translated as `My mom is sleeping now', which not only alters the order of words but also renders the single word `спит' as the two words `is sleeping'. Hence a language model or a suitable equivalent is needed to perform this and related seq2seq tasks, which complicates the model architecture.
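
To make the framing concrete, below is a minimal sketch of how parallel data for such a task can be arranged; the (erroneous, corrected) pairs and the "correct:" task prefix are illustrative conventions of ours, not prescribed by any particular model.

```python
# A minimal sketch of framing error correction as "translation":
# each training example is a pair (erroneous text, corrected text),
# exactly like a source/target pair in machine translation.
# The "correct:" task prefix is an illustrative convention here.

train_pairs = [
    ("My mother sleep now.", "My mother is sleeping now."),
    ("He go to school every days.", "He goes to school every day."),
]

def to_seq2seq_example(source: str, target: str) -> dict:
    """Wrap a parallel pair in a text-to-text format for training."""
    return {"inputs": "correct: " + source, "targets": target}

examples = [to_seq2seq_example(src, tgt) for src, tgt in train_pairs]
print(examples[0])
# {'inputs': 'correct: My mother sleep now.', 'targets': 'My mother is sleeping now.'}
```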

When trying to match one of the existing architectures to the given task, one may notice that the past few years of state-of-the-art natural language processing saw a shift from LSTM-based neural networks, including Bi-LSTM, to Transformer-based architectures. Following contextual models such as ELMo and pioneered by BERT, transformer-based deep neural networks have proven to be very effective at solving fairly difficult NLP tasks, reaching previously unimaginable marks on a number of classic NLP benchmarks. On some of the tasks, such as assessing semantic acceptability, resolving coreference or performing logical inference, the transformer-based derivatives of BERT have not only beaten LSTM-based solutions, yielding improvements of up to 30%, but even outperformed general human baselines. Given that, it is somewhat surprising that the underlying structural elements of a Transformer network are in fact more basic than those of an LSTM: while the latter is essentially an improved version of a recurrent neural network, the former is, in similar terms, just a large number of linear networks sophisticatedly stacked upon one another. The complexity then comes not from the building blocks of the model but from its composition: the linear networks form the inner layers, which consist of different parts such as general word representations and attention heads. These layers are then stacked into a structure called the encoder, which represents the input sentence in vector space, while a similar mirroring structure called the decoder applies the necessary operations to convert the inner representation of the input sentence into the specified output format (in our case, from an erroneous sentence to its corrected version).

Then comes the Text-to-Text Transfer Transformer, or T5, a remarkable achievement by (Raffel 2019) and in fact the model we would use exclusively for error correction. Barring the high computational cost, the model showed breakthrough results on the GLUE benchmark datasets, claiming the top of the leaderboard at the time of its release. Moreover, even though T5 has since been outperformed on GLUE, at the time of writing it holds the best performance record on SuperGLUE, a self-described "new set of more difficult language understanding tasks" with "improved resources", trailing only the human baseline, by some 0.5%. These facts are all the more impressive considering that T5 is a purely sequence-to-sequence model: it takes a textual input and produces a textual output. In fact, it has been specifically trained to solve different tasks using the same checkpoint, unlike its competitors, which are typically tuned to predict numerical values in an array, in itself a much simpler setup leaving far fewer possibilities for errors and confusion.

As lexical errors are purely context-based, even though one may devise vocabulary-based detection rules for them that work to some degree, a context-processing neural network at the current state of development should consistently outperform such a solution. It is also worth noting that taking context into account helps to detect errors in some of the aforementioned cases: some misspelled words coincide with other words that are in the dictionary, and some erroneous grammatical and syntactic patterns are encoded in inter-word relations. Again, the transformer architecture, which underlies all of the most advanced NLP solutions, is renowned for its processing of context; with such a powerful tool as T5 available, perfectly suited to our need for a deep sequence-to-sequence context-aware neural network, we naturally employed it for the task.
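
For illustration, here is a minimal inference sketch using the Hugging Face port of T5. This is an assumption of ours for demonstration purposes (our training runs used the code released alongside the model), and both the checkpoint name and the "correct:" prefix are placeholders.

```python
# A minimal T5 inference sketch via the Hugging Face `transformers` port.
# "t5-small" and the "correct:" prefix are illustrative placeholders,
# not the checkpoint or task declaration used in our experiments.

from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

text = "correct: He go to school every days."
input_ids = tokenizer(text, return_tensors="pt").input_ids

# Generate the corrected sequence token by token (beam decoding).
output_ids = model.generate(input_ids, max_length=64, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```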

3. Data collection and processing

We set the goal of retrieving as much training data as possible from all the appropriate datasets we could find. First, we collected the aforementioned standard datasets for grammatical error correction tasks, FCE and W&I + LOCNESS; then, based on our prior knowledge and some searching through lists of learner corpora available online, we added the following corpora to our set: NUCLE, ICNALE, REALEC and EFCamDat. Below we describe the sources used in more detail.

FCE (Yannakoudakis 2011) is an openly distributed subset of the paid Cambridge Learner Corpus (CLC), produced by learners taking the First Certificate in English exam. It provides three subsets, train, dev and test, which we use for training, validation and testing (scoring), respectively. We expect fairly clean results from the FCE dataset, with corrections for practically every text, so here we simply set up our dataset analysis procedure and see how things should look for a model dataset.

W&I + LOCNESS (Bryant 2019; Granger 1998) is another model dataset: it was used at the BEA-2019 shared task on grammatical error correction and was introduced to become a new standard dataset for the task. It consists mainly of a sample of texts from the Cambridge English Write & Improve (W&I) platform, with the addition, in the development section, of some texts from LOCNESS, a corpus of texts produced by native English speakers. There is no test subset here: it is instead distributed in the form of raw text, with the corrections and metrics computed within the internal contest evaluation system.

The NUS Corpus of Learner English, or NUCLE (Dahlmeier 2013), consists of student essays produced by English learners at the National University of Singapore and is also widely used in GEC tasks. We consider it well annotated and not in need of any cleanup. Nevertheless, NUCLE is less of a model for us, as it is somewhat further from the IELTS essay format we are pursuing.

We also used a subset of corrected texts from the International Corpus Network of Asian Learners of English, or ICNALE (Ishikawa 2013), which contains a number of fairly well-annotated essays on general topics produced by Japanese learners.

The Russian Error-Annotated Learner English Corpus, or REALEC (Vinogradova 2019), while having some known under-annotated parts, has the second-largest text count among the corpora we were able to obtain and is also the dataset we wish to align our models with most closely. Collected at the National Research University Higher School of Economics, it consists of Russian students' argumentative essays and graph descriptions in IELTS format; given that these are the closest to the texts of the potential users of our product, we naturally want our predictions to align with the annotations in the REALEC corpus rather than in any other corpus on our list.

Lastly, we used EFCamDat, which provided the main share of our training data, more than 90% of the resulting set. As such, we were obviously concerned about it skewing our model towards EFCamDat-style annotations. Specifically, the texts in EFCamDat are generally shorter than an argumentative essay, and the command of language in many EFCamDat entries is lower than in the other corpora we used, as texts produced by speakers at the lower CEFR levels (A1-A2) form the major part of this particular corpus. Nevertheless, we did not consider omitting the EFCamDat data completely, as its sheer size is its ultimate quality. So, living up to the data science motto "the more data the better" (the single exception being "don't use data that is too dirty", which made us abstain from using the LANG-8 dataset), we went on to process the obtained corpora.

After collecting the corpora, we turned to the next step: train, validation and test splitting. The general principle we had to consider here is that in error correction, validation is practically a numerical testing procedure, as opposed to a means of model self-adjustment during the training routine. More specifically, while the validation set is still withheld from model training and reserved for keeping the model from bias and overfitting, it is not used for any adjustment of model weights during training, as is the case for some other neural network setups. As such, while our validation set is still sufficiently large in absolute terms and we account for diversity, there is no need to divert too much data, e.g. 10% of EFCamDat, to validation: we should prioritise the structure of the subset instead. As the mean text length and CEFR level of an EFCamDat entry are generally lower than those of the other datasets, and of our model's target texts, we opted to balance our validation set evenly between EFCamDat and the other corpora. In doing so, after collecting the validation data from all the corpora except EFCamDat, we added the same amount of validation data from EFCamDat (to make use of its diversity).
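
A minimal sketch of this balanced split, under assumed variable names and an illustrative per-corpus sample size, might look as follows (the fixed random seed anticipates the reproducibility measure described below):

```python
# A sketch of the balanced validation split. `corpora` maps a corpus
# name to its list of (source, target) pairs; the per-corpus size and
# the seed value are illustrative assumptions, but the 50/50 balance
# between EFCamDat and the rest follows the procedure described above.

import random

random.seed(42)  # manual seed so every shuffle is reproducible

def balanced_validation(corpora: dict, per_corpus: int = 200) -> list:
    """Draw validation data from every corpus except EFCamDat,
    then add an equally sized sample from EFCamDat itself."""
    val = []
    for name, texts in corpora.items():
        if name == "EFCamDat":
            continue
        random.shuffle(texts)
        val.extend(texts[:per_corpus])
    ef = list(corpora["EFCamDat"])
    random.shuffle(ef)
    val.extend(ef[:len(val)])  # 50/50: as much EFCamDat as everything else
    return val
```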

To facilitate the reproduction of our research, we set a manual random seed so that the datasets were shuffled in the same way each time. We then repeated this procedure at every subsequent step where it was applicable. As for scoring, each configuration is scored on the combination of the FCE-dev, W&I + LOCNESS dev and REALEC-gold subsets.

While FCE and W&I + LOCNESS are reference corpora and should be fairly clean, there are some non-annotated texts present in REALEC and EFCamDat. With that knowledge, we performed some analysis: the logic is that in most cases the absence of any edits does not mean the given text is itself clean, but rather that in all likelihood it simply has not been annotated. If we left these texts as they are, we would be teaching the model not to correct some erroneous texts, which is obviously contrary to what we are looking for. To overcome this, we assumed that every text with no edits in the listed corpora is unannotated and simply dropped those entries from the set. Furthermore, knowing that REALEC also contains under-annotated texts, where only one or two errors are corrected over the whole body of the text, we plotted some relative frequency graphs and applied arbitrary pruning thresholds in some cases (see Figure A). Specifically, we dropped texts with q <= 0.025 in REALEC and, somewhat surprisingly, we also had to drop texts with q <= 0.01 from NUCLE, as the distribution there turned out to be significantly positively skewed with the mode around 0.01.
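
The pruning itself reduces to a simple filter. In the sketch below we assume q to be the number of annotated edits per token of the text, which is how we read the relative frequencies above:

```python
# A sketch of edit-density pruning. We assume q = edits / tokens;
# the thresholds are the values arrived at in the analysis above.

THRESHOLDS = {"REALEC": 0.025, "NUCLE": 0.01}

def edit_density(n_edits: int, n_tokens: int) -> float:
    """Relative edit frequency q for a single text."""
    return n_edits / max(n_tokens, 1)

def prune(entries: list, corpus: str) -> list:
    """Drop unannotated (q == 0) and under-annotated texts."""
    threshold = THRESHOLDS.get(corpus, 0.0)
    return [e for e in entries
            if edit_density(e["n_edits"], e["n_tokens"]) > threshold]
```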

For the next step we applied spell-checking, as we were looking to determine whether using it in text preprocessing would be beneficial: a spell-checker is expected to correct slightly misspelled words better than the broad model, but it is also possible for it to fail too often on more heavily misspelled words, thus affecting the quality of the overall system. As we were looking for a trade-off between processing speed and quality, we turned to jamspell. We had already tested jamspell in our prior research, concluding that it demonstrated the best alignment of corrections with our manual spelling annotations in REALEC, a result that was confirmed again this year.
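
Its usage in preprocessing is straightforward; the model file name below is a placeholder for the pre-trained English language model that jamspell loads at startup:

```python
# A minimal jamspell preprocessing sketch. "en.bin" is a placeholder
# for the pre-trained English model, which is distributed separately.

import jamspell

corrector = jamspell.TSpellCorrector()
corrector.LoadLangModel("en.bin")  # path to the downloaded English model

def spellcheck(text: str) -> str:
    """Correct isolated misspellings, leaving the rest of the text intact."""
    return corrector.FixFragment(text)

print(spellcheck("I am the begest person in the wrold"))
```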

Finally, we did the preprocessing necessary for the texts to fit into the T5 pipeline. In this instance, we had to adopt a strategy for splitting the texts, as our correction model has a hard cap of 512 input tokens, while a number of entries in the datasets exceed this limit. The problem here is that there are no pre-set splitting points in the parallel texts, so we can never be sure whether the texts are aligned correctly after any programmatic split. Paragraphs turned out to be the best solution: we argue that they are the best-suited comprehensive delimiter for parallel texts. The logic is that, firstly, they are mostly consistent between the text and its corrected version and, secondly, the logical ties between sentences are stronger than those between paragraphs, so we lose less context by splitting the text by paragraphs than with any other division strategy (except for no division). As a precaution, we set the maximum entry limit not to 512 but to 510 tokens, as we would still have to declare the task in the input data later on. We then discarded the entries where the numbers of paragraphs in the original and corrected texts did not match; for the rest, we assume the paragraphs are aligned correctly and split the texts by them. We also discarded the paragraphs that contained more than 512 tokens and, in contrast, concatenated the remaining paragraphs of an entry with a <br> marker, so that the model was able to learn the line break as a certain kind of punctuation mark wherever it saw fit.
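
The splitting logic can be sketched as follows; the whitespace token count stands in for the model's actual subword tokenizer, and the function names are our own:

```python
# A sketch of paragraph-based splitting of parallel texts.
# count_tokens() is a whitespace stand-in for the real subword
# tokenizer; MAX_TOKENS leaves room for the task declaration.

MAX_TOKENS = 510

def count_tokens(text: str) -> int:
    return len(text.split())

def paragraph_pairs(source: str, target: str) -> list:
    """Split a parallel (erroneous, corrected) entry by paragraphs."""
    src_pars = [p.strip() for p in source.split("\n") if p.strip()]
    tgt_pars = [p.strip() for p in target.split("\n") if p.strip()]
    if len(src_pars) != len(tgt_pars):
        return []  # paragraph counts differ: alignment cannot be trusted
    pairs = []
    for s, t in zip(src_pars, tgt_pars):
        if count_tokens(s) > MAX_TOKENS or count_tokens(t) > MAX_TOKENS:
            continue  # overlong paragraphs are dropped entirely
        pairs.append((s, t))
    # Short consecutive paragraphs can then be re-joined with a "<br>"
    # marker up to the token limit, so the model sees line breaks too.
    return pairs
```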

4. Model training procedure

At first we considered performing a grid search over the hyperparameters of the model. However, we soon discovered that the model has no native support for hyperparameter search, which is perhaps unsurprising given its complexity. We ultimately decided against manipulating the code of the model and implementing cross-validation folds, since the authors reported good results on the majority of the tasks with the same parameters. More importantly, we wanted to try different configurations of training data and to test model parallelism (the alignment of batch size and related settings so that models of different complexities go through a consistent training procedure), so we opted to perform what we call a naïve parameter search: we performed a series of runs of similar models with differing parameters on the same data to see whether there is any significant difference.
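
In outline, the search is just a loop over otherwise identical runs; train_and_eval below is a hypothetical stand-in for the model's training and evaluation entry point, stubbed here so the sketch is self-contained:

```python
# A sketch of the naive parameter search: a series of otherwise
# identical runs differing only in one hyperparameter.
# `train_and_eval` is a hypothetical wrapper, stubbed for runnability.

def train_and_eval(train_data: str, learning_rate: float, seed: int) -> dict:
    """Stub: in the real pipeline this launches a full training run
    and returns the best validation scores."""
    return {"BLEU": 0.0, "ROUGE-L": 0.0}

LEARNING_RATES = [0.002, 0.003, 0.0045]

results = {
    lr: train_and_eval(train_data="subset_1pct", learning_rate=lr, seed=42)
    for lr in LEARNING_RATES  # same data and seed, so only lr varies
}

# Pick the learning rate with the best BLEU on the validation mix.
best_lr = max(results, key=lambda lr: results[lr]["BLEU"])
print(best_lr, results[best_lr])
```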

To report the accuracy of each of our runs we use BLEU and ROUGE-L scores, as calculated by the code provided alongside the model. We could not use accuracy or any other traditional scoring metrics here: accuracy, for example, implies a perfect match between the model prediction and the target sequence, which naturally produces unreliable results (see Appendix X), if only because there is often a multitude of valid ways to correct an error. BLEU and ROUGE were proposed specifically to assess the quality of machine translation and similar tasks. Roughly, the former is closest to precision (how correct the model's predictions were), while the latter is most similar to recall (how many of the entries that needed correcting the model actually corrected). Please note that the generally high scores are explained by the fact that, naturally, the majority of words are identical in the original and the corrected text, which the scores attribute to good prediction, so it is in fact the small differences that matter here; later we provide test scores on some standard grammatical error correction tasks, such as CoNLL-2014 and BEA-2019, as well as some manual metrics, to make the objective outcomes of our research more comprehensive.
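
As an illustration, the two scores can be computed with common open-source implementations; this sketch uses the sacrebleu and rouge_score packages, our assumption for demonstration rather than the exact scoring code shipped with the model:

```python
# A scoring sketch with common libraries (an assumption; the reported
# numbers were produced by the code released alongside the model).

import sacrebleu
from rouge_score import rouge_scorer

predictions = ["My mother is sleeping now."]
references = ["My mom is sleeping now."]

# Corpus-level BLEU over the whole prediction set.
bleu = sacrebleu.corpus_bleu(predictions, [references])
print(f"BLEU: {bleu.score:.2f}")

# ROUGE-L, computed per pair.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
rouge_l = scorer.score(references[0], predictions[0])["rougeL"]
print(f"ROUGE-L F1: {rouge_l.fmeasure:.4f}")
```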

We started with a small model and a dataset consisting of roughly 1% of our training data. As is the case with our overall training set, this subset was dominated by EFCamDat entries (see Table 1 for details). In terms of model parameters, we tested three configurations differing only in learning rate: the first model with lr = 0.002, the second with lr = 0.003 (matching the exemplary network provided by the original researchers) and the third with lr = 0.0045.

Table 1

First parameter search results

Model #   Learning rate   Best step   BLEU    ROUGE-L
1-1       0.002           21000       79.69   92.96
1-2       0.003           29000       79.78   92.97
1-3       0.0045          29000       79.56   92.88

We conclude that lr = 0.0045 is not optimal, while lr = 0.003 seems slightly better than lr = 0.002, although the difference lies within the margin of error and the results are inconclusive. Still, we opted to test the same three configurations in a second run, as we wanted to make sure that the first results were not an unlikely coincidence and that the behaviour of the models would not change much with different training data.

For the second test, we selected a collection of essays from all the datasets except EFCamDat and tested the models in this different environment:

Table 2

Second parameter search results

Model #   Learning rate   Best step   BLEU    ROUGE-L
2-1       0.002           34000       80.77   92.65
2-2       0.003           34000       80.72   92.66
2-3       0.0045          39000       80.77   92.74

Given that, we could not decide for sure whether lr = 0.002 or lr = 0.003 was the better option, so in the last test before full-scale training we again tested these two learning rates against each other. However, there was also a set of other training features that we tested here. First, we tried another optimizer function for adjusting the model weights with respect to the loss. Second, we checked whether halving the iterations-per-loop variable, from 100 to 50, would have any effect. Last but not least, we tried adding automated spell-checking, as discussed earlier, as a preprocessing step. Here we used a bigger training dataset, balanced between EFCamDat and the other corpora. Note that we adopted a compromise of lr = 0.0025 for the latter three comparisons.

Table 3

Third parameter search results, with a larger model size

Model #   Learning rate   Best step   BLEU    ROUGE-L
3-lr2     0.002           42000       82.27   93.78
3-lr3     0.003           36000       82.14   93.8
3-ipl50   0.0025          36000       82.2    93.81
3-SpC     0.0025          31000       81.85   93.4
3-adam    0.0025          42000       80.83   93.02

It turned out that the default optimizer is significantly better, as the run with Adam optimization showed the poorest results. The second-lowest performance in terms of BLEU and ROUGE-L was demonstrated by the model trained on spell-checked data, so, as it stands, spell-checking does not actually improve model performance. We would still argue, though, that judging by the textual results, as opposed to the numerical metrics, the application of spell-checking is beneficial in some cases: while there are examples of an incorrect guess by the spell checker turning into a wrong prediction by the model, there are also cases where the spell checker in fact corrects an error that the model by itself could not process correctly. Moreover, the difference in scores is fairly minuscule; as such, despite the lower score, we did not drop spelling correction entirely.

The three remaining runs yielded practically identical scores, so we were free to choose the combination of parameters for the final model as we saw fit. Given that the model with lr = 0.0025 had a slight margin in the scores and represents a compromise between the two best-performing learning rates, we decided to go with this learning rate in the final run. We did not, however, apply 50 iterations per loop, as it seemed to have no effect on either the training time or the score, so in this case we went with the parameters recommended by the authors, which had shown consistently positive performance in the previous test runs.

5. Results of the final model execution

Table 4

Final performance results

Model #   Learning rate   Best step   BLEU    ROUGE-L
big       0.002           76500       80.77   92.65
SpC       0.003           76500       86.96   94.83
fuse      0.0045          50000       86.08   94.6

We trained the model on the general set (big), on a fully spell-checked dataset (SpC), and on a set fusing spell-checked and non-spell-checked data 1:1 (fuse). The latter two runs resulted in the best performance, and the fuse model was successfully fitted to the Russian native speaker data.

6. Discussion

As we discussed earlier, the task of error correction in learner texts is decidedly non-trivial. We argue that for error correction purposes the text should be viewed as a hierarchically layered structure, where errors produced in the lower layers of language, such as spelling, are more quantized (word- or token-centered) and more amenable to statistical analysis, while errors produced in the higher layers of language, such as syntax, lexis and ultimately discourse, can only be perceived in relation not just to other words, but also to the context and sometimes even to implicit meanings and general knowledge. While a spelling error can be detected on a standalone word basis if the form does not fit the vocabulary, or at most within a 5-gram if a homonym of the misspelled form is found in the vocabulary, the effective span of a higher-level error is in most cases larger, and a reliable algorithm able to correctly process such errors requires some kind of context understanding under the hood.

The algorithm produced in our research effectively meets the aforementioned criteria, as it can handle errors from different layers of language simultaneously. For instance, let us consider some excerpts from the model corrections produced by the system; the full version can be found in Appendix 2.

Conclusion

We emphasize that the main product of our research is an effective error detection and correction system able to simultaneously process different types of errors and produce a text which is generally more coherent and less erroneous than the input. Until the last couple of years, such systems were virtually unable to suggest corrections of this precision or demonstrate this level of versatility. As of now, our research shows that the modern state-of-the-art solutions are competent enough to efficiently solve complex context-related language processing tasks involving all the layers of language.

Our goals for future research are the maintenance and development of our service, as well as improving the options for applying it to narrower custom tasks. On the maintenance side, we plan to refactor our server so that it can effectively handle multiple concurrent requests and provide faster inference without any significant drop in quality. Among the options here, we can apply static elements and/or reconfigure parts of the model while preparing it for serving. In fact, we may turn to the lightest T5 architecture if we find a way to transfer the knowledge of a bigger model into it.

However, in terms of research our main focus is making the outputs of the model more meaningful. This includes providing comments for the corrections performed by the model, based on labeling the erroneous parts of the original text. We acknowledge that while a pure correction of an erroneous text may be impressive, it is of little instructive value, as no constructive feedback comes from a system that provides bare corrections alone. The task of attaching meaningful insight for the user is in itself an interesting problem from a linguistic perspective. We view it as a classification task which can be performed by a separately trained T5 model or any other algorithm, provided its results are acceptably high-scoring and meaningful. While we have yet to decide whether parallel token-wise multi-label error classification, labeling of the error spans detected by the presented algorithm (multi-label or seq2seq-based), or a combination of the two would be most appropriate, we argue that should we arrive at an efficient model for the task, it will be transformer-based, such as the T5 model we used or other descendants of BERT such as RoBERTa, XLNet, XLM and DistilBERT. This research is yet another proof of the prominence and competence of this recently suggested approach.

Still, we consider the obtained results to mark an important milestone in learner text processing and in natural language processing in general, as well as a fairly good contribution to society. We hope that the path set by this paper and related research will lead us not into a future where new research is produced by a next generation of state-of-the-art language models requiring zero intellectual input from humans, but into one where humanity uses such technology to open new ways of enlightenment, or at least lets future systems chart such ways for themselves.

References

1. Callison-Burch, C., Osborne, M., & Koehn, P. (2006). Re-evaluating the role of BLEU in machine translation research. In 11th Conference of the European Chapter of the Association for Computational Linguistics.

2. Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

3. Grammatical Error Correction. NLP-progress.

4. Hoover, B., Lytvyn, M., & Shevchenko, O. (2015). U.S. Patent No. 9,002,700. Washington, DC: U.S. Patent and Trademark Office.

5. Kuzmenko, E., & Kutuzov, A. Russian Error-Annotated Learner English Corpus.

6. Papineni, K., Roukos, S., Ward, T., & Zhu, W. J. (2002). BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (pp. 311-318).

7. Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2019). Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683.

8. Tang, R., Lu, Y., Liu, L., Mou, L., Vechtomova, O., & Lin, J. (2019). Distilling task-specific knowledge from BERT into simple neural networks. arXiv preprint arXiv:1903.12136.

9. Torubarov, I. (2019). Automated detection of errors in English learner essays using BERT. Higher School of Economics course paper.

10. Vinogradova, O., Ershova, E., Sergienko, A., Overnikova, D., & Buzanov, A. (2019). Chaos is merely order waiting to be deciphered: Corpus-based study of word order errors of Russian learners of English. Learner Corpus Research 2019, Warsaw, 12-14 September, 113.

11. Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., & Bowman, S. R. (2018). GLUE: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461.

12. Wang, A., Pruksachatkun, Y., Nangia, N., Singh, A., Michael, J., Hill, F., ... & Bowman, S. (2019). SuperGLUE: A stickier benchmark for general-purpose language understanding systems. In Advances in Neural Information Processing Systems (pp. 3261-3275).

13. Yannakoudakis, H., Briscoe, T., & Medlock, B. (2011). A new dataset and method for automatically grading ESOL texts. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (pp. 180-189).
