Lost in machine translation: contextual linguistic uncertainty

Current problems of machine translation: semantic, grammatical, stylistic and technical difficulties. A comparison of the main translation methods (rule-based, corpus-based, neural and hybrid), with their advantages and disadvantages.

Category: Foreign languages and linguistics
Type: article
Language: English
Date added: 14.02.2022

Posted on http://www.allbest.ru/

Anton V. Sukhoverkhov, Kuban State Agrarian University, Krasnodar

Dorothy DeWitt, University of Malaya, Kuala Lumpur

Ioannis I. Manasidi, Kuban State Agrarian University, Krasnodar

Keiko Nitta, College of Arts, Rikkyo University, Tokyo

Vladimir Krstic, University of Auckland, Auckland

Abstract

The article considers the semantic, grammatical, stylistic and technical difficulties currently present in machine translation and compares its four main approaches: Rule-based (RBMT), Corpora-based (CBMT), Neural (NMT), and Hybrid (HMT). It also examines some «open systems», which allow users themselves to correct or augment content («crowdsourced translation»). The authors of the article, native speakers representing different countries (Russia, Greece, Malaysia, Japan and Serbia), tested the translation quality of the most representative phrases from the English, Russian, Greek, Malay and Japanese languages using different machine translation systems: PROMT (RBMT), Yandex.Translate (HMT) and Google Translate (NMT). The test results show a low «comprehension level» of the semantic, linguistic and pragmatic contexts of translated texts, mistranslations of rare and culture-specific words, unnecessary translation of proper names, and a low rate of idiomatic phrase and metaphor recognition. It is argued that the development of machine translation requires the incorporation of literal, conceptual, and content-and-contextual forms of meaning processing into text translation, the expansion of metaphor corpora and contextological dictionaries, and the implementation of different types and styles of translation that take into account gender peculiarities and the specific dialects and idiolects of users. The problem of the untranslatability (`linguistic relativity') of concepts unique to a particular culture is reviewed from the perspective of machine translation. It is also shown that the translation of booming Internet slang, in which national languages merge with English, is almost impossible without human correction.

Key words: machine translation, untranslatability, contextual translation, linguistic relativity, lexical ambiguity, syntactic ambiguity.

Abstract (Russian version)

Difficulties of machine translation: contextual linguistic uncertainty

Anton Vladimirovich Sukhoverkhov

Kuban State Agrarian University, Krasnodar, Russia

Dorothy DeWitt

University of Malaya, Kuala Lumpur, Malaysia

Ioannis Igorevich Manasidi

Kuban State Agrarian University, Krasnodar, Russia

Keiko Nitta

College of Arts, Rikkyo University, Toshima, Tokyo, Japan

Vladimir Krstic

University of Auckland, Auckland, New Zealand

The article examines current problems related to the semantic, grammatical, stylistic and technical difficulties of machine translation and compares the four main methods of such translation: 1) rule-based (RBMT); 2) corpora-based (CBMT); 3) neural (NMT); 4) hybrid (HMT). It describes some «open» translation systems that allow users themselves to correct or supplement the translation («crowdsourced», or «collective», translation). The authors, native speakers from different countries (Russia, Greece, Malaysia, Japan and Serbia), tested the translation quality of the most illustrative phrases in English, Russian, Greek, Malay and Japanese using different machine translation systems: PROMT (RBMT), Yandex.Translate (HMT) and Google Translate (NMT). The testing revealed insufficient attention to the semantic, linguistic and pragmatic contexts of the translated text (A. Sukhoverkhov), incorrect translation of rare or language-specific vocabulary (K. Nitta), translation of proper names by their meaning (I. Manasidi), and poor recognition of idiomatic expressions and metaphors (D. DeWitt). The authors show that improving modern machine translation systems requires combining literal, conceptual, and content-and-contextual forms of meaning processing, improving metaphor corpora and contextological dictionaries (D. DeWitt), and developing different types and styles of translation that cover the specific dialects and idiolects of users as well as gender features of language (K. Nitta). Drawing on Serbian material, V. Krstic reconsiders from the perspective of machine translation the problem of the untranslatability («linguistic relativity») of concepts unique to a particular culture. I. Manasidi shows that the translation of rapidly developing Internet slang, characterised by the mixing of national languages with English, is impossible without human participation.

Key words: machine translation, untranslatability, contextual translation, linguistic relativity, lexical ambiguity, syntactic ambiguity.

Main part

The language barrier and machine translation

Natural languages per se are hybrid, dynamic, context-sensitive and ecological [Sukhoverkhov, 2014; 2015; Steffensen, Fill, 2014; Sukhoverkhov, Fowler, 2015]. Each has its own syntax, multiple word meanings, idioms, innuendos, intertextualities, and ecological and cultural embeddedness, which sometimes do and sometimes do not coincide with those of other languages. Although analytic philosophy, generative linguistic theory, the Russian formalists, the French structuralists and others have all contributed to language formalisation in recent years, the ecological, process and system approaches to the nature of language have questioned the possibility and effectiveness of such formalisation. For example, ecolinguistics, distributed language theory, the dynamic and adaptive systems approaches to language, systemic functional linguistics, and cognitive linguistics show that the same language has a potentially infinite variety of meanings and structures and that, by its nature, it is dynamic, interactive, situated, and ecologically and culturally embedded [De Bot, Lowie, Verspoor, 2007; Fowler, Hodges, 2011; Verspoor, De Bot, Lowie, eds, 2011]. Because natural languages are developed, distributed and situated within various systems of activity, they cannot be completely formalised, and the process of translation sui generis is likewise approximate and constantly developing.

Machine translation programs can effectively produce «verbum pro verbo» translations, but metaphorical, metonymic, and idiomatic expressions are not captured in most cases [Abd Rahman, Md Norwawi, 2013; Yusoff, Jamaludin, Yusoff, 2016]. Human translation, however, is not a simple rendering according to denotation per se; it requires capturing the concept of a word or phrase (sentence) and the general idea of the whole message (text). The gap is even more noticeable when the languages involved are seldom used or belong to different language families. For instance, the metaphorical expression «The wheels are falling apart» and the idiomatic phrase «let's call it a day!» cannot be translated literally, because they express, respectively, a problem in a human relationship and a need to rest. Knowledge of the relevant culture is also crucial for correct translation. For instance, Bahasa Malaysia or Malay (the national language of Malaysia) has a significant variety of idioms that have been used as a tool of socialization and have contributed to transferring the values and thoughts of Malay culture [Muhammad, 2006]. A simple idiom such as «hitam manis» in Malay (directly translated as «black sweet») is used to refer to a pretty lady (but never to a boy) with a dark complexion. Context sensitivity is another problem for both human and machine translation. For example, the Malay word «geram», when used in reference to an adorable child, conveys «love and fondness», as in the expression «Geram melihat anak comel ini», but the same word in another context can denote anger or disappointment.

Furthermore, in some languages, including Japanese, homophones cause an analogous problem. The numerous homophones in Japanese can be readily distinguished from one another when they are written in the correct Chinese characters (i.e., ideograms) or pronounced with conventional intonation. For instance, a speaker of Japanese would never seriously confuse kuma («bear») with kuma («dark circles under the eyes»), even without a strict context. However, when human or machine translators lack sufficient knowledge of Chinese characters, they cannot correctly translate even the simple sentence «Tsukare te kuma ga deki-ta», meaning `I've got dark circles under my eyes due to fatigue'. Indeed, an actual trial with Google Translate, which automatically detects the Japanese sentence merely as alphabetical syllables, ends up with `I got tired and made a bear'. Obviously, there are obstacles even to performing an elementary «literal translation» in the general sense. The case demonstrates that multiple layers of both polysemy and literariness are a critical issue in translation.

The complexity and multidimensionality of translation presuppose an interaction between the understanding of the general content and context of an utterance and its particular concepts or components. Yu. Marchuk shows that modern machine translation systems cannot even solve the basic task of choosing correctly between the variants of polysemantic words in one phrase. For example, the above-mentioned Google Translate, which is based on the latest and most advanced Neural Machine Translation system, now correctly translates the phrase «technical support system» into Russian [Marchuk, 2016, p. 29], yet fails to identify the meaning of the Sydney Trains announcement «Doors closing, please stand clear» or the sceptical response «I don't buy it!».

In contrast to a human translator, a machine does not possess the language mastery and cultural background needed to create a trustworthy translation unless a set of rules is explicitly predefined in it. These rules have been seen as the result of linguistic formalisation and are based on both culturally idiosyncratic and universal aspects [Wierzbicka, 1992, p. 26]. In comparison with previous years [Kotov, Marchuk, Nelyubin, 1983; Novozhilova, 2014], translation methods and technologies have been greatly improved and diversified, thus effectively lowering the language barrier between speakers of different languages. However, many problems, issues and technical challenges related to machine translation remain. In this paper, we review and test the latest machine translation systems for their accuracy in translating idioms, rare words, proper names, phrasal verbs and the general content of phrases [Marchuk, 2016; Nguyen, Chiang, 2017] and for their ability to keep track of fast-developing and chaotic online communication, the so-called «netspeak» [O'Curran, 2014; Lim, Cosley, Fussell, 2018; Lohar, Afli, Way, 2018].

To this end, we review existing translation algorithms and compare the outcomes of the most popular machine translation systems (PROMT, Yandex and Google), which use, respectively, Rule-based (RBMT), Hybrid (HMT) and Neural (NMT) algorithms. By translating between various languages (English, Greek, Russian, Malay and Japanese), we test the ability of these systems to understand concepts, metaphorical expressions and sentence structure, and we suggest possible linguistic and technical solutions to the problems detected. Another purpose of this article is therefore to identify the unavoidable limitations of machine translation, to show how these limits are predetermined by and correlated with the systemic and dynamic nature of languages, and to propose some solutions for coping with this linguistic dynamism and fuzziness.

The main approaches to machine translation

Machine Translation, a subfield of computational linguistics that investigates the use of computer software for the translation of text or speech, has four main approaches at its current stage of development: Rule-based (RBMT), Corpora-based (CBMT), Neural (NMT) and Hybrid (HMT). In this chapter, the theoretical and technical premises of these approaches are reviewed. To compare these methods and analyse their effectiveness, we evaluate translation quality using popular machine translation systems: PROMT (RBMT), Yandex.Translate (HMT) and Google Translate (NMT). The results obtained are used to examine the properties and complexities of the tested languages that these systems have yet to handle.

Rule-based Machine Translation (RBMT). Rule-based Machine Translation is a translation approach which uses dictionaries to determine the corresponding words, syntax and grammar between the target and the source language. After receiving the message, the machine uses the dictionaries to construct an equivalent message in the target language, which it then outputs. Examples of such systems are Apertium, GramTrans and PROMT, while new systems are being elaborated for Uralic languages [Riahovskaya, 2017; Wiechetek, 2008; Johnson et al., 2017b].
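The dictionary lookup described above can be pictured with a minimal sketch. The toy English-to-Russian lexicon and the function name `translate_word_for_word` are hypothetical and are not taken from PROMT or any other system; the sketch assumes only that each source word maps to at most one target word and that no grammar rules are applied.

```python
# Hypothetical toy lexicon (English -> Russian). A real RBMT system would also
# carry morphological and syntactic rules, which this sketch omits.
LEXICON = {
    "this": "это",
    "is": "",       # assumption: the present-tense copula is simply dropped
    "not": "не",
    "my": "моя",
    "cup": "чашка",
    "of": "",       # assumption: "of" is absorbed by the hard-coded genitive below
    "tea": "чая",
}


def translate_word_for_word(sentence: str) -> str:
    """Look each token up in the lexicon; unknown words pass through unchanged."""
    tokens = sentence.lower().replace(",", " ").replace(".", " ").split()
    translated = [LEXICON.get(token, token) for token in tokens]
    return " ".join(word for word in translated if word)


print(translate_word_for_word("This is not my cup of tea."))
```

Under these assumptions the idiom comes out word for word, because nothing in a plain lexicon marks «not my cup of tea» as a single unit of meaning.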

Even though Rule-based Machine Translation seems like a neat solution to our problem, it comes with inflexibilities that can make it unsuitable in a variety of situations. To begin with, in order to create an accurate RBMT system, all grammatical rules of both languages, as well as the relations between them, have to be explicitly defined programmatically, including grammatical exceptions. This greatly increases the time, effort and funds needed to construct such a system. In addition, the word dictionaries (lexicons) are hard to build because, on the one hand, the total number of existing words differs from language to language and, on the other hand, this number is constantly increasing by leaps and bounds. For instance, the Global Language Monitor reports that the English language has 1,052,010.5 words (as of March 2019) and that a new word is created every 98 minutes, averaging about 14.7 words per day (http://www.languagemonitor.com/global-english/no-of-words). However, such a difference in vocabulary may be compensated for, for example, by adapting or borrowing words from another language without any translation [Koltan, 2017; Cui, 2012].

Therefore, having only a set of strictly defined rules and a list of corresponding words may lead to false and untrustworthy results, especially when idioms or literary texts are involved [Riahovskaya, 2017]. Furthermore, because natural languages are constantly evolving, with new meanings being added quite frequently, keeping the corpora up to date can be just as inefficient as creating them, especially if a major change in a language system takes place. Take, for example, the transition from Katharevousa to Demotic Greek, which took place in the 1980s and put an end to the diglossia between the written (Katharevousa) and spoken (Demotic) language in favour of the latter. Were a change like that to happen to a modern language handled by Rule-based Translation, its dictionaries would instantly be rendered obsolete.

Corpora-based Machine Translation (CBMT). Corpora-based Machine Translation, contrary to RBMT, does not strictly depend on defined lexicons and grammar rules but instead bases its acquisition of «language knowledge» (training) on the analysis of parallel corpora between two languages. This way, the task of manually creating and maintaining rules or word correspondences is delegated to an algorithm, solving RBMT's inflexible dictionary problem.

Information theory and probability theory underlie one of the most popular and effective CBMT methods: Statistical Machine Translation (SMT), which, as its name suggests, translates texts on the basis of probability values between the source and the target language. Its core assumption is that every word in the target language is a possible translation of a word in the source language, with a certain probability of being correct. The word with the highest probability is then selected and substituted for the source word. For metaphors or idioms, SMT systems can use phrases instead of words to deliver results. The probability values can be determined in a number of ways, of which we list two: 1) by analysing the provided parallel corpora and calculating probabilities based on word or phrase equivalence between the source and target languages; or 2) by identifying the words that are more likely to appear after other words [Wang et al., 2017; Babhulgaonkar, Bharad, 2017].
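As a rough illustration of the first way of obtaining probabilities, the sketch below estimates P(target | source) by relative frequency and then selects the most probable candidate. The aligned word pairs are invented for the example (a real system would derive them from a parallel corpus through an alignment step), and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical aligned (source, target) word pairs, standing in for what an
# alignment step would extract from a parallel English-Russian corpus.
ALIGNED_PAIRS = [
    ("bank", "банк"), ("bank", "банк"), ("bank", "берег"),
    ("support", "поддержка"), ("support", "поддержка"), ("support", "опора"),
]


def estimate_probabilities(pairs):
    """Relative-frequency estimate of P(target | source)."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    return {
        source: {t: c / sum(targets.values()) for t, c in targets.items()}
        for source, targets in counts.items()
    }


def best_translation(word, table):
    """Return the most probable target word, or the word itself if unseen."""
    candidates = table.get(word)
    if not candidates:
        return word
    return max(candidates, key=candidates.get)


table = estimate_probabilities(ALIGNED_PAIRS)
print(best_translation("bank", table), table["bank"])
```

The choice is made per word and ignores context, which is why phrase-based variants and the second, sequence-based kind of probability are needed in practice.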

The main disadvantage of Corpora-based Machine Translation, however, is its ineffectiveness when presented with text of a kind it was not trained on. If, for example, the parallel corpora were based on distinct terminology (for a specific brand or domain), the system will struggle to translate text written in an everyday, casual style. Moreover, if casual texts are added to the specialised training set, some specific translations could be overridden by casual ones, as their probability of appearing would be higher. Consequently, it is important to exercise caution when selecting the parallel corpora, depending on the material that is going to be translated.
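The overriding effect can be shown with invented counts. In the sketch below, the word pairs and their frequencies are purely hypothetical; the only assumption is that the system picks the most frequent translation seen in training.

```python
from collections import Counter


def most_frequent_translation(pairs, word):
    """Pick the target word most often aligned with `word` in the training pairs."""
    counts = Counter(target for source, target in pairs if source == word)
    return counts.most_common(1)[0][0]


# Hypothetical training data: a software-domain corpus and a casual corpus.
specialised = [("driver", "драйвер")] * 3
casual = [("driver", "водитель")] * 5

print(most_frequent_translation(specialised, "driver"))
print(most_frequent_translation(specialised + casual, "driver"))
```

Once the casual pairs outnumber the specialised ones, the domain-specific rendering is no longer selected, even though nothing in the specialised data has changed.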

Hybrid Machine Translation (HMT). In recent years, various hybrid approaches have been actively developed [Costa-Jussa, Fonollosa, 2015]. Some of them combine the statistical method with the rule-based approach and are applied to popular and rare languages alike [Oladosu et al., 2016]. According to this research, the hybrid approach competes with the base machine translation methods and provides the best translated output in each language. On standard translation quality metrics, this approach scored 0.8963 with the National Institute of Standards and Technology (NIST) method and 0.7923 with the Bilingual Evaluation Understudy (BLEU) algorithm, where a value close to one indicates high similarity of the machine translation to a reference text, usually a human translation [Oladosu et al., 2016, p. 123]. An example of a Hybrid Machine Translation system is Yandex.Translate, which combines the Neural (Russian to English) and Statistical (all languages) methods and uses another system (CatBoost) to select the better of the two results (https://www.bbc.com/russian/features-41086998).
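To make the idea of «similarity to a reference text» concrete, here is a simplified, BLEU-style score. It is a sketch under stated assumptions (one reference, unigrams and bigrams only, no smoothing, whitespace tokenisation) and is not the exact NIST or BLEU implementation used in the cited study; the function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate: str, reference: str, max_n: int = 2) -> float:
    """Geometric mean of clipped n-gram precisions times a brevity penalty."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(overlap / total))
    # Penalise candidates that are not longer than the reference.
    brevity_penalty = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


print(simple_bleu("he is in seventh heaven", "he is on cloud nine"))
```

A score of this kind rewards surface overlap only, so an idiomatic rendering that shares few words with the reference can score lower than a literal one.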

Neural machine translation (NMT). As of 2016-2017, Google, Yandex, Omniscien Technologies, SDL and many others have announced the deployment of neural machine translation. Generally, a neural translation system is based on encoder-decoder architecture. The encoder takes in a sentence in the source language and formalizes its semantics, outputting a sequence of numbers that represent its meaning.

This technology, in contrast to other methods of machine translation, does not «memorize» phrase-to-phrase correspondences or rules between languages, but instead tries to encode the semantics of a sentence and saves them for future reference. In order to represent linguistic (sequentially dependent) information, a more complex type of neural network is used: for example, recurrent neural networks, along with their specific architectures that can «remember» the words used in a sentence (LSTM, GRU) [Zaremba, Sutskever, Vinyals, 2014].
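The way a recurrent network folds a sentence into a sequence of numbers can be sketched in a few lines. This is a plain recurrent update, not an LSTM or GRU, and the word vectors and weights below are arbitrary hypothetical values rather than trained parameters.

```python
import math

# Hypothetical 2-dimensional word vectors; a trained system would learn these.
EMBEDDINGS = {"dog": [1.0, 0.0], "bites": [0.0, 1.0], "man": [1.0, 1.0]}

# Hypothetical fixed weights: input-to-hidden and hidden-to-hidden.
W_IN = [[0.5, -0.3], [0.2, 0.8]]
W_REC = [[0.1, 0.4], [-0.6, 0.3]]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def encode(words):
    """Fold a word sequence into one hidden-state vector, one word at a time."""
    hidden = [0.0, 0.0]
    for word in words:
        from_input = matvec(W_IN, EMBEDDINGS[word])
        from_memory = matvec(W_REC, hidden)
        hidden = [math.tanh(a + b) for a, b in zip(from_input, from_memory)]
    return hidden


print(encode(["dog", "bites", "man"]))
print(encode(["man", "bites", "dog"]))
```

Because each step mixes the current word with the previous hidden state, the same three words in a different order yield a different vector, which is the sense in which the network «remembers» earlier words.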

Neural networks represent each word and the whole meaning of a sentence through numerical values. These values are passed through different mathematical functions and are influenced by coefficients that hold the «language knowledge» of the system, producing a prediction of what the translated text should be. The coefficients are usually represented as (N × M)-dimensional matrices and are adjusted with the goal of minimizing the system's error value, i.e. how wrong the system was in its translations (e.g. with the backpropagation algorithm).
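The adjustment of coefficients towards a smaller error can be illustrated on the smallest possible case: a single coefficient fitted by gradient descent. This is a deliberately reduced sketch with invented data; backpropagation in a real network applies the same kind of update to every entry of every weight matrix.

```python
def mean_squared_error(w, samples):
    """Average squared difference between the prediction w * x and the target y."""
    return sum((w * x - y) ** 2 for x, y in samples) / len(samples)


def train(samples, learning_rate=0.1, epochs=200):
    """Fit y ≈ w * x by repeatedly stepping against the error gradient."""
    w = 0.0
    for _ in range(epochs):
        gradient = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
        w -= learning_rate * gradient
    return w


# Hypothetical training pairs generated by the rule y = 2 * x.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = train(samples)
print(w, mean_squared_error(w, samples))
```

The error starts high with the initial coefficient and shrinks as the coefficient approaches the value that generated the data.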

In the meantime, Google has developed an approach that allows an NMT system to generalize each language's accumulated semantics. This allows for «zero-shot translation», meaning that the system can translate between language pairs (correspondences) that were not explicitly included in the training set [Johnson et al., 2017a].

Despite its increased accuracy, NMT also has its problems: it is comparatively expensive to train in computational terms and, at translation (inference) time, has difficulty with rare words. It can «over-translate» or «under-translate» (overfitting/underfitting data) and may give wrong results where the meaning of the source sentence is ambiguous [Wu et al., 2016; Wang et al., 2017]. Taking these flaws into account, developers and researchers have proposed hybrid models based on the integration of statistical and neural machine translation technologies.

Table 1. Russian language

Phrase: Он на седьмом небе от счастья

Human translation: He's on cloud nine.

To English:
PROMT (RBMT): He is in the seventh heaven
Yandex.Translate (Hybrid, NMT): He's over the moon.
Google Translate (NMT): He is in seventh heaven

To Greek:
PROMT (RBMT): Αυτός είναι στον έβδομο ουρανό (He is on the seventh sky).
Yandex.Translate (Hybrid, NMT): Είναι στον έβδομο ουρανό από ευτυχία (He* is on seventh the sky from happiness). *Pronoun with no gender difference.
Google Translate (NMT): Είναι στον έβδομο ουρανό (He* is on the seventh sky). *Pronoun with no gender difference.

To Malay:
PROMT (RBMT): Not available.
Yandex.Translate (Hybrid, NMT): Dia ke Bulan (He has gone to the moon).
Google Translate (NMT): Dia berada di langit ketujuh (He is in the seventh Sky).

To Japanese:
PROMT (RBMT): Kare wa, mujo no kofuku de imasu (He is in the supreme happiness).
Yandex.Translate (Hybrid, NMT): Kare wa tsuki no ue da (He is on the moon).
Google Translate (NMT): Kare wa dai-nana tengoku ni iru (He is in the seventh heaven).

Comments: 1) The Malay translation «he has gone to the moon» shows that Yandex renders from Russian into English and only afterwards into Malay.

2) Malaysia also has the tradition of seven layers of heaven.

3) Hybrid MT and NMT translated into Greek in a gender-neutral form, while RBMT explicitly used a male pronoun (Αυτός), as in the original text.

4) There is no idea of the seventh heaven in Japanese culture, either religious or secular. Yet PROMT conveys the sense of the phrase with descriptive accuracy, even though its syntax is somewhat awkward owing to the choice of the particle de instead of ni.

Table 2. English language

Phrase: This is, straight up, not my cup of tea.

Human translation: Это, честно, не в моих интересах (Honestly, I am not interested in this).

To Russian:
PROMT (RBMT): Это, прямо, не моя чашка чая (This, straight, not my cup of tea).
Yandex.Translate (Hybrid, CBMT): Это, прямо вверх, не моя чашка чая (This is, straight upward, not my cup of tea).
Google Translate (NMT): Это, прямо, а не моя чашка чая (This, straight, but not my cup of tea).

To Greek:
PROMT (RBMT): Αυτό είναι, στρέιτ*, δεν μου φλιτζάνι του τσαγιού (This is, straight*, not to me cup of tea). *Word not translated, written in Greek letters.
Yandex.Translate (Hybrid, CBMT): Αυτό δεν είναι το τσάι μου (This is not my tea).
Google Translate (NMT): Αυτό είναι, κατ' ευθείαν, δεν είναι το φλιτζάνι τσάι μου (This is, right away, is not my tea cup).

To Malay:
PROMT (RBMT): Not available.
Yandex.Translate (Hybrid, CBMT): Ini adalah, yang lurus ke atas, bukan cangkir teh saya (This is the straight upwards, not my tea cup).
Google Translate (NMT): Ini, lurus, bukan cawan teh saya (This, straight, is not my tea cup).

To Japanese:
PROMT (RBMT): Kore wa, massugu ni ue e, o-cha no watashi no kappu de wa arima-sen (This is, straight upwards, not my tea cup).
Yandex.Translate (Hybrid, CBMT): Kore wa, massugu de wa naku, o-cha no watashi no kappu desu (This is, not straight, and my tea cup).
Google Translate (NMT): Kore wa massugu de, watashi no o-cha de wa arima-sen (This is straight, and not my tea).

Comments: 1) The Malay words cangkir and cawan mean the same thing, a `cup'.

2) Hybrid MT failed to convey the meaning in Greek, i.e. even if the user knew both English and Greek, it would be impossible to translate the Greek output back into English manually.

3) All three translations into Japanese fail both to convey the meaning and to compose natural phrases with conventional collocations. In particular, o-cha no watashi no kappu literally means `tea's my cup', even though it can be guessed to mean `my teacup'. `[M]y tea' in Google Translate can be considered to connote `my cup of tea' only by omitting the container in accordance with the grammatical convention of Japanese.

Table 3. Greek language

Phrase: Περίμενε με ανυπομονησία το μέλλον

Human translation: He was looking forward to the future.

To Russian:
PROMT (RBMT): Он с нетерпением ожидали в будущем (He with impatience [they] awaited in the future).
Yandex.Translate (Hybrid, CBMT): Ждал с нетерпением будущее (He waited with impatience for the future).
Google Translate (NMT): Он с нетерпением ждал будущего (He with impatience waited for the future).

To English:
PROMT (RBMT): He waited impatiently for the future.
Yandex.Translate (Hybrid, CBMT): Wait, looking forward to the future.
Google Translate (NMT): He was looking forward to the future.

To Malay:
PROMT (RBMT): Not available.
Yandex.Translate (Hybrid, CBMT): Tunggu, sabar untuk masa depan (Wait, patience for the future).
Google Translate (NMT): Dia menanti masa depan (He waits for the future).

To Japanese:
PROMT (RBMT): Not available.
Yandex.Translate (Hybrid, CBMT): Mirai wo tanoshimi ni shite matte (Looking forward to the future).
Google Translate (NMT): Watashi-tachi no te no todoku tokoro ni (Within our reach).

Comments: 1) RBMT provided an inaccurate translation, invalid in both grammar and meaning.

2) Hybrid MT provided a gender-neutral Russian translation, but confused the gender-neutral verb with its imperative form when translating into English (`Wait,…'). The other two translation systems used a male pronoun in the Russian translation.

3) The imperative reading is still possible (`[Then,] look forward to the future'), but it is secondary and less likely to match the user's needs. Furthermore, the translation results for this variant are not grammatically correct.

4) The two available translations into Japanese are both incomplete as sentences: Hybrid MT fails to translate the subject/noun, while Google Translate provides a sentence lacking both verb and object, besides mistranslating he as `we'.

Table 4. Malay language

Phrase: Timah*, dengan warna kulit kuning langsat, dikenali dengan kejelitaannya.

Human translation: Timah, with her fair skin, is renowned for her beauty.

To Russian:
PROMT (RBMT): Not available.
Yandex.Translate (Hybrid, CBMT): Свинец, цвета кожи желтая кожа светлая, выявленных kejelitaannya (Lead, skin color yellow skin light, identified kejelitaannya).
Google Translate (NMT): Жесть с желтым цветом кожи известна своей красотой (The tin, with a yellow skin color, is known for its beauty).

To English:
PROMT (RBMT): Not available.
Yandex.Translate (Hybrid, CBMT): Tin, with the color of the skin yellow complexioned, known by the kejelitaannya
Google Translate (NMT): The tin, with a yellow skin color, is known for its beauty.

To Greek:
PROMT (RBMT): Not available.
Yandex.Translate (Hybrid, CBMT): Μόλυβδος, το χρώμα του δέρματος κίτρινο complexioned, που προσδιορίζονται από kejelitaannya (Lead, the colour of the skin yellow complexioned, that [are] defined from kejelitaannya).
Google Translate (NMT): Ο κασσίτερος, με κίτρινο χρώμα δέρματος, είναι γνωστός για την ομορφιά του (The tin*, with yellow skin colour, is known for its beauty). *The periodic element (Sn).

To Japanese:
PROMT (RBMT): Not available.
Yandex.Translate (Hybrid, CBMT): Hifu no iro to suzu, kejelitaannya ni yotte shirarete-iru, kaoiro no kiiro (Skin color and tin, known by kejelitaannya, yellow in the complexion).
Google Translate (NMT): Kiiroi hada-iro no suzu wa, sono utsukushi-sa de shirarete imasu (Yellow complexion is known for its beauty).

Comments: 1) Kuning langsat is used in Malay to refer to fair skin or a fair complexion.

2) *Timah is a colloquial shortened form of the name Fatimah and is translated erroneously as `tin'.

3) Both Hybrid MT and NMT output an illogical, absurd translation in all target languages, failing to provide the proper context in their result and using source words without any changes.

4) The two available translations into Japanese also fail to recognize the proper noun Timah and automatically drop the word from the translation, while Hybrid MT leaves kejelitaannya as it is.

Table 5. Japanese language

Phrase: (Ano kafe no amazake wa oishii-ne.)

Human translation: Amazake [sweet non-alcohol rice drink] at that cafe is tasty, isn't it?

To Russian:
PROMT (RBMT): No translation.
Yandex.Translate (Hybrid, CBMT): Вот именно (It is true).
Google Translate (NMT): Вкус Акафу восхитителен (The taste of Akafu is delicious).

To Greek:
PROMT (RBMT): Not available.
Yandex.Translate (Hybrid, CBMT): Σωστά (Right).
Google Translate (NMT): Η γεύση του Akafu είναι υπέροχη (The taste of Akafu is wonderful).

To Malay:
PROMT (RBMT): Not available.
Yandex.Translate (Hybrid, CBMT): Itu benar (It is true).
Google Translate (NMT): Rasanya Akafu lazat (Akafu tastes delicious).

To English:
PROMT (RBMT): No translation.
Yandex.Translate (Hybrid, CBMT): That's right.
Google Translate (NMT): The taste of Akafu is delicious

Comments: 1) [K]afe is a loan word meaning `cafe'. No system can identify the word, though humans can perhaps guess its sense by analogy with the original word.

2) -ne at the tail of the sentence is a unique binding particle used to ask for confirmation rhetorically. The function of the construction is thus analogous to that of the tag question in English. Again, no system succeeds in translating the structure.

3) Amazake is a culturally specific soft drink, produced at an intermediate stage in the process of brewing sake, Japanese rice wine.

Accuracy testing of main machine translation approaches. In order to examine the different types of machine translation methods, we used a number of phrases from Russian, English, Greek, Malay, and Japanese. The Google (NMT) and Yandex (HMT) translation services showed the highest degree of accuracy compared with PROMT (RBMT). In several cases, however, PROMT produced better results than Google and Yandex.

Below are the samples of the most illustrative results of our online machine translation tests (see Tables 1-5).

These online translation examples illustrate the high degree of literal translation that persists in machine translation systems. When machine translation is set against human translation, it can be seen that the online translation procedure lacks conceptual meaning, even though semantic and syntactic systems are integrated into it. Surprisingly, our testing indicated that all translation systems have problems recognising both full and shortened names. In Table 4, Timah, a shortened name, was translated as `tin', and the Greek name Ειρήνη (Irene in English) was likewise not recognised as a name and was translated literally by its meaning (`peace') in another tested phrase, «Να είχε δίκιο η Ειρήνη τελικά».

In most cases, idiomatic phrases were not detected and were translated literally by the systems (see Tables 1, 2). This pair of examples also demonstrates that the accuracy of a translation depends on the syntactic complexity of the source phrase. Whereas the outcomes in the first example basically maintain minimum readability, those in the second example are broken in terms of sentence structure. All three systems are obviously weak in translating adverbial syntax; the adverbial phrase straight up seems to confuse them particularly, and this results in poor performance. Likewise, the additional information «with her fair skin» in Table 4 and the simple modifier today in Table 5 cause the same type of mistranslation. Colloquialisms are also misrepresented: the phrase «Geram melihat anak comel ini» should convey a feeling of affection on seeing the child and means `What a cute child!'. With Google Translate, however, it loses its meaning in the given context, becoming `Greedy saw this cute kid'.

Ambiguity problems, which a human can easily resolve, contribute largely to wrong output, as is the case with ανυπομονησία, the Greek word describing the feelings of impatience and excitement caused by an unknown situation (Table 3). In the same example, the translation systems fail to identify the gender of the subject and even change the verb into an imperative form. The Japanese cases are even more complicated: in one case the third-person singular pronoun is replaced by the first-person plural we, and in the other the subject is dropped, as seen in the participial construction. This instance suggests that ambiguity about the action can lead to misplacement of the verb, producing a wrong form as well as a fragmentary phrase that never resolves into a complete sentence.

As mentioned above, the Malay language is full of idiomatic expressions which reflect the various cultural aspects of the language. This feature has led to many inconsistencies in machine translations. An earlier accuracy analysis of the translation of 200 Malay sentences containing proverbs into English showed that more than half (55.0%) of them were wrongly translated by Google Translate, 34.0% were correct, and only 9.6% were translated accurately into similar idioms [Abd Rahman, 2013]. The challenges encountered during machine translation were mostly rooted in the use of affixes in words and of additional stopwords in phrases (both of which reflect the grammatical structure of the language), as well as in the use of different words with the same meaning [Abd Rahman, 2013]. As to the first issue, the example memilih kasih can be translated once the affix is removed: «pilih kasih». Hence stemming, that is, detecting and filtering the proverb to exclude affixes, needs to be done before translation [Abd Rahman, 2013]. Secondly, proverbs may contain stopwords, as in the following variants: «sedikit-sedikit lama-lama jadi bukit» and «sedikit-sedikit lama-lama akan jadi bukit». Removing superfluous words such as akan would therefore enhance the accuracy of the translation [Kwee, Tsai, Tang, 2009; Abd Rahman, 2013]. Thirdly, different words may be used to represent the same proverbial meaning: «Ada angin, ada pokoknya» is similar to «Ada angin, ada pohonnya» (meaning that anything that happens has a cause) [Abd Rahman, 2013]. In «bagai kera mendapat bunga», said of someone who does not appreciate the value of a gift, beruk and monyet can replace kera without changing the meaning.
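The three preprocessing steps described above (stemming, stopword removal and merging of variant wordings) can be pictured with a minimal Python sketch. It is not the implementation of [Abd Rahman, 2013]: the tables `TOY_STEMS`, `STOPWORDS`, `VARIANTS` and `PROVERBS` and both function names are hypothetical, a real system would need a full Malay stemmer and a proverb corpus, and the glosses are simply the ones quoted in the text.

```python
import re
from typing import Optional

# Toy resources; every table here is an illustrative assumption.
TOY_STEMS = {"memilih": "pilih"}   # stand-in for a real Malay stemmer
STOPWORDS = {"akan"}               # words that can be dropped from a proverb
VARIANTS = {                       # interchangeable words -> canonical word
    "pohonnya": "pokoknya",
    "beruk": "kera",
    "monyet": "kera",
}

# Canonical proverb -> gloss (glosses taken from the examples in the text).
PROVERBS = {
    "ada angin ada pokoknya": "anything that happens has a cause",
    "bagai kera mendapat bunga":
        "someone who does not appreciate the value of a gift",
}


def normalize_proverb(phrase: str) -> str:
    """Lower-case, drop punctuation and stopwords, stem, merge variants."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", phrase.lower())
    result = []
    for token in tokens:
        if token in STOPWORDS:
            continue                          # step 2: stopword removal
        token = TOY_STEMS.get(token, token)   # step 1: (toy) stemming
        token = VARIANTS.get(token, token)    # step 3: variant merging
        result.append(token)
    return " ".join(result)


def lookup_proverb(phrase: str) -> Optional[str]:
    """Return the gloss of a known proverb, or None if it is not listed."""
    return PROVERBS.get(normalize_proverb(phrase))


print(normalize_proverb("memilih kasih"))
print(lookup_proverb("Ada angin, ada pohonnya"))
```

With these toy tables, «memilih kasih» is reduced to «pilih kasih», both variants of the «bukit» proverb are reduced to the same string, and the variants with pohonnya, beruk or monyet resolve to the same entries as the canonical wordings.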

Contemporary machine translation systems have yet to solve the above-mentioned problems [Yusoff, Jamaludin, Yusoff, 2016]. As language does not exist in isolation but is part of a society and culture, one needs to be familiar with Malay to be able to translate the rich and colourful cultural contexts of the language. Nevertheless, there are studies of semantic-based translation using n-grams that could deal with ambiguous sentences by identifying words with multiple (ambiguous) meanings [Yusoff, Jamaludin, Yusoff, 2016].
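The general idea of using n-grams to resolve an ambiguous word can be sketched as follows. This is only an illustration of bigram-based sense selection, not the method of [Yusoff, Jamaludin, Yusoff, 2016]: the counts in `BIGRAM_COUNTS` are invented, the sense labels and the function `pick_sense` are hypothetical, and the English word bank is used purely for convenience.

```python
from collections import Counter
from typing import Sequence

# Invented bigram counts, as if collected from a sense-tagged corpus.
BIGRAM_COUNTS = Counter({
    ("river", "bank/shore"): 12,
    ("bank/shore", "erosion"): 5,
    ("central", "bank/finance"): 15,
    ("bank/finance", "account"): 20,
})


def pick_sense(left: str, candidates: Sequence[str], right: str) -> str:
    """Choose the candidate sense that co-occurs most with its neighbours."""
    def score(sense: str) -> int:
        # A Counter returns 0 for bigrams that were never observed.
        return BIGRAM_COUNTS[(left, sense)] + BIGRAM_COUNTS[(sense, right)]
    # On a tie, max() keeps the first candidate in the list.
    return max(candidates, key=score)


SENSES = ["bank/shore", "bank/finance"]
print(pick_sense("river", SENSES, "erosion"))
print(pick_sense("central", SENSES, "account"))
```

A practical system would need smoothing for unseen contexts and far larger counts; the sketch only shows why the neighbouring words carry the information that a word-by-word translation ignores.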

Therefore, our analyses and the results of previous works in this field show that, for machine translation to be effective, it first needs to incorporate three levels of meaning processing: the literal, the conceptual, and the contextual. Secondly, a corpus of metaphors, idiomatic phrases and proverbs with equivalents from different cultures needs to be constructed. Finally, the recognition of proper names and their shortened versions should be improved.
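One simple way to keep known names out of the translation step is to mask them with placeholders and restore them afterwards. The sketch below assumes a hand-made list of names and a translation engine that leaves the placeholders untouched; `protect_names`, `restore_names`, `translate_with_names` and the stand-in `toy_translate` (which only reproduces the `tin' error reported above) are all hypothetical and do not describe how any of the tested systems works.

```python
import re
from typing import Callable, Dict, Iterable, Tuple


def protect_names(text: str, names: Iterable[str]) -> Tuple[str, Dict[str, str]]:
    """Replace each known name with a placeholder such as __NAME0__."""
    mapping: Dict[str, str] = {}
    # Longer names first, so that a short name never splits a longer one.
    for i, name in enumerate(sorted(names, key=len, reverse=True)):
        pattern = re.compile(rf"(?<!\w){re.escape(name)}(?!\w)")
        placeholder = f"__NAME{i}__"
        if pattern.search(text):
            text = pattern.sub(placeholder, text)
            mapping[placeholder] = name
    return text, mapping


def restore_names(text: str, mapping: Dict[str, str]) -> str:
    """Put the original names back in place of the placeholders."""
    for placeholder, name in mapping.items():
        text = text.replace(placeholder, name)
    return text


def translate_with_names(text: str, names: Iterable[str],
                         translate: Callable[[str], str]) -> str:
    masked, mapping = protect_names(text, names)
    return restore_names(translate(masked), mapping)


def toy_translate(text: str) -> str:
    """Stand-in engine: it only knows that 'timah' is the noun 'tin'."""
    return re.sub(r"\btimah\b", "tin", text, flags=re.IGNORECASE)


print(toy_translate("Timah membeli timah"))
print(translate_with_names("Timah membeli timah", {"Timah"}, toy_translate))
```

Matching is case-sensitive, so the capitalised name is protected while the common noun is still translated. A sentence-initial common noun would be protected by mistake, which is one reason why a real system needs context and not just a name list.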

Machine translation and linguistic relativity theory

The act of translating from one language to another, apart from being a fairly complex problem when implemented by machines, can pose difficulties even for a human translator. A good example of this fact is the book «English As She Is Spoke: the new guide of the conversation in Portuguese and English» by Pedro Carolino [Da Fonseca, Carolino, 2002] which is full of grammatical and stylistic mistakes that are surprisingly similar to the ones made by the machine translation systems.

The linguistic relativity theory and theories similar to it explain some of these translational difficulties. They show that languages differ in the number of words they have and that some words describe unique feelings, personal characteristics, professional jargon, and many other realities specific to a culture [Whorf, 1956; Kovecses, 2005; Deutscher, 2010]. The surrounding environment, natural resources, and specific activities of a region also give rise to vocabulary or slang that has no equivalents in other languages [Wierzbicka, 1992; Durdureanu, 2011; Sanders, 2014]. Because of this cultural and geographical specificity, many such words can be translated only with the help of a contextual explanation rather than with a single word.

A good example is the word sevdah, whose root comes from Turkish and which is commonly used by people living in Bosnia. The translations usually offered - «melancholy», «lovesickness», «yearning for love» - do not really capture the essence of sevdah. A more precise translation would be `enjoying your state of sorrow as a very special (sorrow-ish) kind of «pleasure»'. Perhaps the English expression «wallowing in your sorrow» would not be a bad way to understand sevdah, but only because a better concept does not exist. This specific kind of «emotion» simply seems to be «reserved» for people from the Balkans, who would find a sevdah kind of happiness in singing songs that describe and glorify the heroic death of their most loved ones. A possible explanation is that, in the most difficult times, when they had little to look forward to, the people of the Balkans simply learned how to enjoy their sorrow. Today, mainly owing to changed historical circumstances, even many young people from the Balkans would struggle to understand the word sevdah itself and the state of being in sevdah.

Such words have no counterparts in other languages, and this kind of untranslatability very often leads to borrowing: we adopt words from one language (the donor language) and incorporate them into another with minor or no modifications. Sometimes this reaches a critical cultural level, because borrowings tend to flood national languages around the globe (e.g. economic and computer terms such as market, poster, billboard, slogan, hashtag, etc.). In France, this has even caused cultural resistance [Styblo, 2007; Caruso, 2012]. However, it remains debatable whether simply introducing a foreign word into a language also introduces the relevant concept into that language. Suppose we incorporate the word sevdah into English in the way we can incorporate poster into Serbo-Croatian: it is unlikely that the former borrowing will yield the same result as the latter.

Difficulties in translating unique words could also be overcome by «adaptation» or «free translation», in which the social or cultural reality (idea) of the source language is replaced with realities that are closer and more natural to the audience of the target language. Such «domestication» [Lawrence, 1995] of the source text is highly artistic and vulnerable to criticism, and for the moment it cannot be implemented by machine translation because of its creative complexity. The opposite approach in the art of translation - «foreignization» - strives to preserve the source language and culture and to translate the text into the target language with minimal changes, using, for instance, comments and explanations about the original realities. However, many researchers agree that such methods are sometimes ridiculous, as in the case of verbally expressed humour based on wordplay, because additional comments and explanations destroy the amusement [Low, 2011; Hoffman, 2012]. Humour comprehension requires implicit and explicit knowledge of specific cultural and linguistic realities, and explanations of them could be too long or inappropriate for a translation. For example, the joke «A priest, a rabbi, and a nun all walk into a bar, and the bartender says, `What is this, some kind of joke?'» requires knowledge of jokes that begin with «A, B and C walk into a bar…».

In this regard, we can see that the problem of «how to translate» is a unique and disputable task that can be solved differently by different translators and with a variety of methods. This complexity and relativity of languages, cultures and methods makes current machine translation systems merely an auxiliary tool. However, the more successful, socially accepted or standardised examples of human translation we have, the more data can be borrowed, formalised and used by specialists and technical systems for «neutral translation» [Razlogova, 2017]. Thus, despite variability in the rendering of the same text, some formal invariants or typical examples can be extracted and used in practice in both machine and human translation.

Lost in Internet translations

The ongoing technology boom has created a further problem for language translation. Internet slang, acronyms, hidden meanings, letter and number combinations in words, and intentional mistakes are a generally accepted way of communicating online.

Additionally, foreign languages are heavily influenced by English in this field, and many hybrid words are coined by mixing two languages together. Examples include incorporating English into German to create Denglish or into Malay to create Manglish, and writing Greek in Latin characters (Greeklish) so as not to switch constantly between keyboard layouts. In such cases, translation needs to process the literal, the conceptual, and the content-and-contextual meanings, drawing on contextological dictionaries of users' styles and idiolects.

Combined with the many abbreviations used online, this can make it hard for learners of a language to understand such compressed foreign messages even if they understand colloquial speech. For instance, the number 55 is read as `go go' in Japanese and is used to convey the English meaning. In Malaysia, fuyoh is commonly used online and may be equivalent to OMG in Internet slang. An example in Manglish: «Fuyoh, so cheap!»
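How a translation front end might pre-normalise such messages before translating them can be sketched in a few lines of Python. The lexicons below contain only the two examples from the text, the vocabulary `TOY_VOCAB` is invented, and the function names are hypothetical; a usable system would need community-specific lexicons that are continuously updated.

```python
import re
from typing import Dict, List, Set

# Per-community lexicons (only the two examples given in the text).
MANGLISH_LEXICON: Dict[str, str] = {"fuyoh": "OMG"}
JAPANESE_NET_LEXICON: Dict[str, str] = {"55": "go go"}

# Tiny stand-in for a dictionary of standard word forms.
TOY_VOCAB: Set[str] = {"so", "cheap", "cool"}


def squeeze(token: str, vocab: Set[str]) -> str:
    """Undo deliberate letter elongation ('sooo' -> 'so', 'coool' -> 'cool')."""
    for keep in (2, 1):
        # Shrink every run of 3+ identical characters to `keep` characters.
        candidate = re.sub(r"(.)\1{2,}", lambda m: m.group(1) * keep, token)
        if candidate in vocab:
            return candidate
    return token  # no known form found: leave the token unchanged


def normalize_netspeak(text: str, lexicon: Dict[str, str],
                       vocab: Set[str]) -> List[str]:
    """Split text into tokens and map slang and elongations to standard forms."""
    result = []
    for token in re.findall(r"\w+|[^\w\s]", text):
        key = token.lower()
        if key in lexicon:
            result.append(lexicon[key])        # slang or number code
        elif key in vocab or not key.isalpha():
            result.append(token)               # known word, digits, punctuation
        else:
            result.append(squeeze(key, vocab))
    return result


print(normalize_netspeak("Fuyoh, sooo cheap!", MANGLISH_LEXICON, TOY_VOCAB))
print(normalize_netspeak("55", JAPANESE_NET_LEXICON, TOY_VOCAB))
```

The lexicon is passed in as a parameter because the same string can mean different things in different communities: 55 is a number in most contexts and `go go' only in the Japanese one.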

The difficulty and novelty of this language style or idiolect are such that a reader outside a given network community and culture would struggle to understand anything. Even YouTube comments are becoming more and more outlandish for «strangers». This is why researchers locate the origins of «netspeak» and of «digital natives» - competent communicators in cyber contexts - in modern culture or in the so-called «generation Z» [Crystal, 2001; Pasfield-Neofitou, 2012; Sharifian, 2017, p. 108].

Therefore, «netspeak» is a prime example of a language system that is untranslatable by machine translation alone, because the online translation tool needs to be trained to recognise style, deliberate typos, abbreviations and acronyms. In addition, community-specific vocabulary depends predominantly on the contexts in which it is used. This social reality of exchanges between interlocutors directly affects translation programs. It is essential to train machines on corpora that include words, concepts, and content. However, this problem has been partly solved by the emergence of a new branch of translation called «crowdsourced translation». Its two main forms are: 1) non-professional community-based systems, such as the Google Translate community, which corrects automated translations, or Luis von Ahn's «Duolingo» approach, which uses a learning platform where people translate websites as part of the learning process; and 2) crowdsourced translation service platforms such as TM-Town, Gengo, Smartling and others, on which professional translators provide their services (https://www.morningtrans.com/crowdsourced-translation-does-it-work/).

Because of the international boom of social networks and the need to translate user-generated content with its slang variations and informal language, Facebook and Twitter have also launched crowdsourced translation platforms in order to create multilingual posts. It must be noted, however, that crowdsourcing methods are not restricted to translation problems. For instance, the reCAPTCHA project uses the words typed or images selected by users to bring old books into the digital realm and to gather data for artificial intelligence and improve the accuracy of maps. The project works by requiring users to decipher distorted words or to identify particular pictures online in order to complete a registration. Crowdsourced verification of machine translations may be the way to achieve accurate translation for languages in which words and phrases rely heavily on culture and context, such as Malay. The difficulty, however, may lie in getting a sufficiently dedicated and sufficiently informed crowd to contribute to the translations.

Conclusion

In retrospect, looking at the problems discussed in previous works [Kotov, Marchuk, Nelyubin, 1983; Novozhilova, 2014; Arestova, 2015; Dulov, Shmeleva, Boronkinova, 2017], we still see that:

1) At the present stage and in the near future, it is impossible to exclude the human editor from machine translation. Even the latest and most advanced Neural Machine Translation still needs human correction, even for simple tasks [Marchuk, 2016; Nguyen, Chiang, 2017]. Therefore, linguists and other scholars still have to contribute to the further development of such systems, for example through the construction of metaphor corpora and contextological dictionaries that could be used for the translation (interpretation) of the most difficult literary texts.

For the further development of machine translation, it is also crucial to include the various pragmatic aspects of language in the process of computer-based translation. For the moment, «crowdsourced translation» could be a solution to that problem. It may provide resources for the current problem of translating fast-growing «netspeak». An attempt has also been made by a recent project called «SenseTrans», a tool that adds contextual information to posts in social media using AI analytics [Lim, 2018; Lim, Cosley, Fussell, 2018].

The idea of mobilizing a crowd of translators of a variety of texts - both amateurs and professionals - into a sort of collective wisdom for trans-linguistic communication also seems to amount to a computational materialization of «translation norms» or «the social reality of correctness notions» [Bartsch, 1987, p. xii]. According to two pioneering contributors to the theory, G. Toury and Th. Hermans, each individual's interaction with the multi-layered socio-cultural norms circumscribing her/his verbal operations informs the sense of accuracy and quality of a translation result [Toury, 1995; Hermans, 1996]. Crowdsourcing processes, in tandem with platforms of professional translators able to verify them, can potentially construct such a norm, which would help make machine translation a usable apparatus.

2) We still do not have an integrated typology of the various types and styles of translation. Indeed, translations of technical texts differ from translations of newspaper texts or informal online conversations. Among the systems tested in our research, only PROMT can, within some limits, translate with pre-set writing styles, but it still cannot detect such styles or eliminate stylistic mistakes. However, some grammar-checking software (such as Grammarly, StyleWriter, WhiteSmoke, etc.) has addressed this problem, flagging, for instance, colloquialisms in documents written in an academic style.

Many studies show that there are cultural and aesthetic differences between men and women in the use of vocabulary, syntax, and communication [Na, 2016; Okamoto, 2013]. In the Japanese language, «onnarashii hanashikata» (feminine ways of speech) and «otokorashii kotobazukai» (masculine ways of speech) are common. These kinds of speech reflect specific forms of politeness and cultural norms represented by language. Although there is no masculine or feminine way of speaking in the Malay language, specific words may be used to describe a feminine or masculine trait. «Hitam manis» is always used of women, whereas men are described as «berkulit gelap», or having a dark complexion. In such contexts, a wrong word choice could lead to ridiculous or impolite results in machine translation.

Nowadays, Google Search takes into account the location from which a search query originates, as well as the user's previous requests. In the future, machine translation will probably also become less abstract and universal, and more personalised and situated, by constantly learning from the interests and idiolects of its users.

3) Modern ecolinguistics, the distributed language theory, the dynamic and adaptive systems approaches to language, systemic functional linguistics, and cognitive linguistics have shown that language (or the process of «languaging») is dynamic, interactive, situated, and ecologically and culturally embedded. All these aspects of language complicate its formalisation and machine translation. However, it may still be possible to find a common, formalizable core («universal grammar») for a group of languages by means of theoretical (linguistic, mathematical) and machine analysis of languages. Furthermore, programmers and linguists can design dynamic, learning and evolving models (software) that could adapt machine translation to the dynamic nature or evolvability of natural language and to the dialects/idiolects of its users.

References

1. Abd Rahman K., Md Norwawi N., 2013. The Challenges of Handling Proverbs in Malay-English Machine Translation. 14th International Conference on Translation 2013. Penang, University Sains Malaysia, pp. 27-29. URL: www.scribd.com/doc/163669571/Khirulnizam-The-Challenges-of-Automated-Detection-and-Translation-of-Malay-Proverb.

2. Arestova A.A., 2015. Sravnitelnyy analiz sistem mashinnogo perevoda [Comparative Analysis of Machine Translation Systems]. Vestnik Volgogradskogo gosudarstvennogo universiteta. Seriya 9, Issledovaniya molodykh uchenykh [Science Journal of Volgograd State University. Young Scientists' Research], no. 13, pp. 105-109.

3. Babhulgaonkar A.R., Bharad S.V., 2017. Statistical Machine Translation. Intelligent Systems and Information Management. 1st International Conference on Intelligent Systems and Information Management (ICISIM), Aurangabad, pp. 62-67.

4. Bartsch R., 1987. Norms of Language: Theoretical and Practical Aspects. London, New York, Longman. 348 p.

...
