Corpus linguistics and language study

Characteristics of the practical application of language corpora. The composition and form of the British National Corpus. Results of an investigation of the semantically related words small/little. Distinctive features of corpus linguistics as a field of study.

Category: Foreign languages and linguistics
Type: diploma thesis
Language: English
Date added: 02.10.2015
File size: 461.1 K


Parallel corpora are also beginning to be harnessed for a form of language teaching which focuses especially on the problems that speakers of a given language face when learning another. For example, at Chemnitz University of Technology work is under way on an internet grammar of English aimed particularly at German-speaking learners. An example of the sort of issue that this focused grammar will highlight is aspect, an important feature of English grammar but one which is completely missing from the grammar of German. The topic will be introduced on the basis of relatively universal principles (reference time, speech time and event time) and the students will be helped to see how various combinations of these are encoded differently in the two languages. The grammar will make use of a German-English parallel corpus to present the material within an explicitly contrastive framework. The students will also be able to explore grammatical phenomena for themselves in the corpus as well as working with interactive online exercises based upon it.

It is now a commonplace in linguistics that texts contain the traces of the social conditions of their production. But it is only relatively recently that the role of a corpus in telling us about culture has really begun to be explored. After the completion of the LOB corpus of British English, one of the earliest pieces of work to be carried out was a comparison of its vocabulary with that of the earlier parallel American Brown corpus (Hofland and Johansson 1982). This revealed interesting differences which went beyond the purely linguistic ones such as spelling (e.g. colour/color) or morphology (e.g. got/gotten). Roger Fallon, in association with Geoffrey Leech, has picked up on the potential of corpora in the study of culture. Leech and Fallon (1992) used as their initial data the results of the earlier British and American frequency comparisons, along with KWIC concordances of the two corpora to check the senses in which words were being used. They then grouped those differences which were found to be statistically significant into fifteen broad domain categories. The frequencies of concepts within these categories in the British and American corpora revealed findings which were suggestive not primarily of linguistic differences between the two countries but of cultural differences. For example, words in the domains of crime and the military were more common in the American data and, in the crime category, 'violent' crime was more strongly represented in American English than in British English, perhaps suggestive of the American 'gun culture'. In general, the findings seemed to suggest a picture of American culture at the time the two corpora were sampled (1961) that was more macho and dynamic than British culture.
Although such work is still in its infancy and requires methodological refinement, it seems an interesting and promising line which, pedagogically, could also more closely integrate work in language learning with that in national cultural studies [McEnery 2005].

2. Investigation of Semantically-Related Words “Small/Little” in the British National Corpus

2.1 The British National Corpus: Structure and Composition

This chapter describes and illustrates how the British National Corpus can be used for linguistic investigation, beginning with a closer look at the corpus itself. As mentioned earlier, the British National Corpus (BNC) is a 100-million-word text corpus of samples of written and spoken English from a wide range of sources. It was compiled as a general corpus (collection of texts) in the field of corpus linguistics. The corpus covers British English of the late twentieth century from a wide variety of genres, with the intention that it be a representative sample of spoken and written British English of that time.

The written part of the BNC (90%) includes, for example, extracts from regional and national newspapers, specialist periodicals and journals for all ages and interests, academic books and popular fiction, published and unpublished letters and memoranda, and school and university essays, among many other kinds of text. The spoken part (10%) consists of orthographic transcriptions of unscripted informal conversations (recorded by volunteers selected from different ages, regions and social classes in a demographically balanced way) and spoken language collected in different contexts, ranging from formal business or government meetings to radio shows and phone-ins [Gvishiani 2008: 65]. The British National Corpus can be characterised by four design features:

Monolingual: It deals with modern British English, not other languages used in Britain. However, non-British English and foreign-language words do occur in the corpus.

Synchronic: It covers British English of the late twentieth century, rather than the historical development which produced it.

General: It includes many different styles and varieties, and is not limited to any particular subject field, genre or register. In particular, it contains examples of both spoken and written language.

Sample: For written sources, samples of 45,000 words are taken from various parts of single-author texts. Shorter texts up to a maximum of 45,000 words, or multi-author texts such as magazines and newspapers, are included in full. Sampling allows for a wider coverage of texts within the 100 million limit, and avoids over-representing idiosyncratic texts.

Concordance software applied to the British National Corpus lets you gain better insight into e-texts and analyse language objectively and in depth. It lets you count words and make word lists, word frequency lists, and indexes. It has widespread applications in content analysis, language engineering, linguistics, data mining, lexicography, translation, and numerous commercial areas and academic disciplines. It can make full concordances of publishable quality showing every word in its context, handling texts of almost any size. It can make fast concordances, picking your selection of words from text to facilitate targeted analysis. You can view a full word list, a concordance, and your original text simultaneously, and browse through the original text and click on any word to see every occurrence of that word in its context. A user-definable reference system lets you identify which section of a text each word comes from or which logical category of your choice it belongs to.

You can select and sort words in many ways, search for phrases, do proximity searches, sample words, and do regular expression searches.
The British National Corpus was produced by a consortium of British publishers led by Oxford University Press, whereas the Bank of English resulted from a long-term effort by the Collins Cobuild Group. To build the concordance figures we used KWIC ("Key Word in Context") Concordance, a corpus-analysis tool for making word frequency lists, concordances and collocation tables from electronic files.
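
The idea behind a key-word-in-context display can be illustrated with a few lines of Python. This is only a minimal sketch, not the KWIC Concordance program itself: the function name `kwic`, the `width` parameter and the sample sentence are all assumptions made for illustration, and the sample is a toy text rather than BNC data.

```python
import re

def kwic(text, keyword, width=30):
    """Return one display line per whole-word, case-insensitive hit of `keyword`.

    Each line shows up to `width` characters of left and right context,
    with the keyword itself marked by square brackets.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so that the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines

# Hypothetical sample text, not taken from the corpus.
sample = ("She trembled a little. A small group met in a pretty little house, "
          "and a very small car waited outside.")

for line in kwic(sample, "little"):
    print(line)
```

Aligning the keyword in a single column is what makes recurrent left and right neighbours easy to spot by eye, which is the main reason concordance lines are laid out this way.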

2.2 The Results of Investigation of Linguistic Peculiarities of Semantically-related Words “small/little” in the British National Corpus

The domain of linguistics that has arguably been studied most from a corpus-linguistic perspective is lexical, or even lexicographical, semantics. The early work of pioneers such as Sinclair paved the way for the study of lexical items, their distribution, and what their distribution reveals about their semantics and discourse functions. A particularly fruitful area has been the study of semantically related words: probably every corpus linguist has come across the general approach of studying synonyms on the basis of their distributional characteristics. Synonymy is arguably the lexical relation most frequently studied with corpus-linguistic methods.

It is generally accepted that English is one of the languages most widely used around the world as a lingua franca. With English as an international language, people from different countries who speak different native languages are able to communicate with one another [Kirkpatrick 2007]. The language enables them to understand their interlocutor's speech and to impart information to others. Given its long history and wide use, it is not surprising that English has a very large vocabulary. According to several studies, English tends to have a larger number of words than many other languages, if not the largest [Crystal 2003]. Some of these words have been borrowed from other languages [Finegan 2004].

Many learners of English notice that there are a number of words, known as synonyms, which share similar senses or semantic features, e.g. big and large. The concept of synonymy plays an important role in English, and learners who wish to improve their English need to be aware of synonyms and master them. However, not all synonyms can be used interchangeably in every context. One has to be used in a particular context, whereas another is appropriate for other situations. Some synonyms differ in the connotations they express, and some differ in the regions in which they are used [Trudgill 2003].

Computerized corpora are therefore useful to dictionary makers and others in establishing patterns of language that are not apparent from introspection. Such patterns can be very helpful in highlighting meanings, parts of speech, and words that co-occur with some frequency. Further, while it may appear that synonymous words can be used in place of one another, corpora can show that it is not in fact common for words to be readily substitutable [Finegan 2004].

To see how speakers actually distinguish the semantically related words “small/little”, we may turn to the BNC data.

Using the text browser (Screenshots 1 and 2), it can be seen that in the British National Corpus the word `little' (51,928 occurrences) is more frequent than the word `small' (43,118 occurrences). The difference is not large: 8,810 occurrences, or about 9% of the two words' combined frequency (a proportion of roughly 0.09).
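
The two raw counts can also be put on a common scale. The short Python sketch below is an illustration only: it takes the corpus size to be the rounded 100 million words given earlier (an assumption, since the exact token count is not stated here), and the names `per_million`, `share_of_combined` and `share_of_corpus` are introduced purely for the example.

```python
# Assumed corpus size: the rounded figure of 100 million words used in the text.
CORPUS_SIZE = 100_000_000

# Raw frequencies reported for the two words.
counts = {"little": 51928, "small": 43118}

def per_million(count, corpus_size=CORPUS_SIZE):
    """Normalised frequency: occurrences per million words of running text."""
    return count / corpus_size * 1_000_000

difference = counts["little"] - counts["small"]

# The difference expressed relative to two different baselines.
share_of_combined = difference / sum(counts.values())
share_of_corpus = difference / CORPUS_SIZE

for word, count in counts.items():
    print(f"{word}: {count} occurrences, {per_million(count):.2f} per million words")
print(f"difference: {difference} occurrences")
print(f"as a share of both words combined: {share_of_combined:.2%}")
print(f"as a share of the whole corpus: {share_of_corpus:.4%}")
```

Stating the baseline explicitly matters because the same difference looks very different depending on whether it is measured against the two words' combined frequency or against the corpus as a whole.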

Screenshot 1

Screenshot 2

Analyzing the definitions in the Macmillan English Dictionary, we see that `little' can be used as a determiner, a pronoun, an adverb and an adjective. For instance, as a determiner it is often used in the meaning `small amount or degree', `not much': little choice; little progress; or `hardly any of something': there's little point in discussing it any further; there's little or no hope. In addition, `little' is often used with the article `a', meaning `some, but not a lot': a little time left. In spoken English it is more usual to say `a bit of' or `a little bit': We knew a little bit more; a bit of money. As an adverb, little/a little can be used in the meaning `slightly': She trembled a little; a little irritated [http://www.macmillandictionary.com].

Using the KWIC (Key Word in Context) option, we singled out a number of instances for analysis, which can be interpreted as follows.

First, Figure 1 shows a selection of entries for the word “little”, and Figure 2 shows a selection of entries for “small”. In Figure 1 quite a few of the sentences would not tolerate the substitution of small for little, for example 2, 3, 5, 6, 9, 10, 11, 15, 16, 17 and 21. Taking 3 as an example, English does not permit “not a small irritated”. Of those instances where the substitution is possible, several would sound very odd or convey a different connotation, such as 1, 4 and 8.

`Little' is usually more absolute in its application than `small', and it is preferred to `small' when the intent is to convey a hint of narrowness, pettiness or unimportance: silly little jokes; little mind.

Second, from the analysis of a random sample of instances we can observe that the adjective `small' applies more frequently than the adjective `little' to things whose magnitude is determined by number, size, value or significance: a small group; a small house; a small income. `Small' is also used with the words `quantity', `amount', `size', `quite' and `very', e.g.: a small quantity of sugar; a very small car (not a very little car).
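
Observations of this kind rest on counting which words follow each adjective. The following Python sketch shows one simple way to do that; it is not the procedure used in this study. The function name `right_collocates` is hypothetical, and the toy sample is assembled from example phrases quoted in this section rather than drawn from the BNC.

```python
import re
from collections import Counter

def right_collocates(text, node):
    """Count the words that occur immediately to the right of `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == node)

# Toy sample built from phrases quoted in this section (not corpus data).
sample = ("a small group; a small house; a small income; a small quantity of sugar; "
          "a little old lady; poor little thing; a pretty little house; silly little jokes")

small = right_collocates(sample, "small")
little = right_collocates(sample, "little")

print("small  ->", sorted(small))
print("little ->", sorted(little))
print("shared ->", sorted(set(small) & set(little)))
```

On a real corpus the same counts would be taken over thousands of concordance lines, and the nouns shared by both adjectives would mark the contexts in which substitution is at least formally possible.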

A characteristic feature of `little', in our view, is that it can express either positive or negative connotations, depending on the context. `Little' is appropriate when the context carries connotations of sympathy, tenderness or affection, as in the following examples: a little old lady; poor little thing; a pretty little house. It is also used in a negative way to refer to somebody or something you dislike: You little scoundrel!; a boring little man.

As the examples in Figure 2 show, little is more readily substitutable for small; part of the reason is that in its use as an adjective little does in fact carry denotations and connotations much like those of most uses of small. However, we observe from Figure 1 that the opposite is not true. This is because little is not only an adjective meaning `small' but also part of an adverb, as in the expressions `a little ruffled', `a little dispirited' and `a little open', where it modifies an adjective.

Conclusion

In this research work I have tried to explore the notion of corpus linguistics and its application. In the first chapter I described the role of corpora, their main characteristics and their use for different purposes. In the second chapter I investigated the linguistic peculiarities of the semantically related words small/little using the BYU-BNC. During the investigation I faced difficulties working with computer software: many of the programs are not freely available on the Internet, and others require a good understanding of corpus-linguistic terms and analysis. It is evident that corpus linguistics is fast becoming an important subset of applied linguistics as a result of the rise of computers. Computer tools can count the occurrences of linguistic items in texts with tremendous speed and accuracy. They permit the researcher to work with collections of data too large to handle manually and to search readily for patterns in order to arrive at generalizations about language use that go beyond mere intuitions. Corpus-based analysis is therefore not only an extremely useful technological tool but also an approach that makes possible new types of investigation and research on a scale previously unfeasible. Without computer-based corpora and computer programs it would be impossible to carry out this lexical investigation objectively, accurately and efficiently, or to answer the research questions successfully.

Corpora have a number of features which make them important as sources of data for empirical linguistic research. We have seen a number of these exemplified in several areas of language study in which corpora have been, and may be, used. In brief, the main advantages of corpora are:

1. Ease of access. Using a corpus means that it is not necessary to go through a process of data collection: all the issues of sampling, collection and encoding have been dealt with by someone else. The majority of corpora are readily available, and once a corpus has been obtained it is also easy to access the data within it: because the corpus is in machine-readable form, a concordance program can quickly extract frequency lists and indices of various words or other items within it.

2. Enriched data. Many corpora are now available already enriched with additional interpretive linguistic information such as part-of-speech annotation, grammatical parsing and prosodic transcription. Hence data retrieval from the corpus can be easier and more specific than with unannotated data.

3. Naturalistic data. Corpus data are largely naturalistic, unmonitored and the product of real social contexts. Thus the corpus provides the most reliable source of data on language as it is actually used.

4. Because corpus linguistics is a methodology, all linguists could in principle use corpora in their studies of language: creating dictionaries, studying language change and variation, understanding the process of language acquisition, and improving foreign- and second-language instruction.

5. The results of this research are interesting in that the adjectives `small' and `little' carry different denotations and connotations despite being defined as synonyms by many dictionaries. The corpus-based approach to the semantically related words small/little proved helpful for language learning and yielded findings that would have been difficult to obtain otherwise.

In conclusion, I regard corpus-analytic techniques as multi-purpose strategies with an immense potential to enhance all sorts of textual analysis and to confirm or contradict our intuitions about patterns and meanings in literary and non-literary language. However, it is important to be familiar with modern computer software in order to investigate corpus data successfully. I see corpus analysis as a key skill that ought to reach all parts of the discipline, as it can be applied equally to linguistics, literary studies, and language teaching. John Sinclair once said: “Language cannot be invented; it can only be captured.” And it can be captured by a corpus.

Bibliography

1. Gvishiani N. B. English on Computer: A Tutorial in Corpus Linguistics. Moscow: Vysshaya Shkola, 2008.

2. Crystal D. The Cambridge Encyclopedia of the English Language. Cambridge University Press, 2003.

3. Finegan E. Language: Its Structure and Use. Heinle, 2004.

4. Hunston S., Francis G. “Verbs Observed: A Corpus-driven Pedagogic Grammar.” Applied Linguistics 19(1), 1998.

5. Johansson S., Stenström A. English Computer Corpora: Selected Papers and Research Guide. Walter de Gruyter, 1991.

6. Kennedy G. An Introduction to Corpus Linguistics. Harold Somers, Centre for Computational Linguistics, UMIST, Manchester, U.K., 2000.

7. Kirkpatrick A. World Englishes: Implications for International Communication and English Language Teaching. 2007.

8. Leech G. Corpora and Theories in Linguistic Performance. Berlin: Mouton de Gruyter, 1992.

9. McEnery T., Wilson A. Corpus Linguistics. Edinburgh University Press, 2005.

10. Meyer Ch. English Corpus Linguistics: An Introduction. Cambridge University Press, 2002.

11. Mindt D. “Syntactic Evidence for Semantic Distinctions in English.” In: Aijmer K., Altenberg B., 1991.

12. Sinclair J. Introduction to How to Use Language Corpora in Language Teaching. Amsterdam: Benjamins, 2004.

13. Sinclair J. Corpus, Concordance, Collocation. Oxford: Oxford University Press, 1991.

14. Renouf A., Kehoe A. The Changing Face of Corpus Linguistics. 2003.

15. Trudgill P. The Handbook of Language Variation and Change. Wiley-Blackwell, 2003.

16. Wallis S., Nelson G. “Knowledge Discovery in Grammatically Analysed Corpora.” Data Mining and Knowledge Discovery, 2001.

Posted on Allbest.ru
