Linguistic studies based on the interpretation of the text of the explanatory dictionary of the Spanish language as lexicographic system
Creation of linguistic tools for the virtual lexicographic laboratory of the Spanish explanatory dictionary (VLL DLE 23). The problem of integrating the two main components of the language (vocabulary and grammar) into a single digital environment.
Category | Foreign languages and linguistics |
Type | article |
Language | English |
Date added | 23.02.2021 |
File size | 190.8 K |
National Technical University "Kharkiv Polytechnic Institute"
Ukrainian Lingua-Information Fund of the National Academy of Sciences of Ukraine
Linguistic studies based on the interpretation of the text of the explanatory dictionary of the Spanish language as lexicographic system
Kupriianov Yevhen, Ostapova Iryna, Shyrokov Volodymyr
Summary
This article addresses the creation of linguistic tools for the virtual lexicographic laboratory of the Spanish explanatory dictionary (VLL DLE 23). The goal of the research is to consider issues related to the development of these tools; the object of study is VLL DLE 23, currently under development. To achieve this goal, the dictionary was analyzed with respect to the way it represents linguistic facts, its structure and its metalanguage. On the basis of this analysis, a formal model of DLE 23 was developed, and its main components and their relationships, which the linguistic tools are to make accessible, were determined. The range of research activities to be performed with the linguistic tools was outlined.
Unabridged monolingual dictionaries in digital format, DLE 23 among them, are found to be a powerful research environment that facilitates navigation, access to their structural elements and the integration of language facts in one object. This can be achieved by formalizing the structure of the dictionary text in the form of p-structures and c-links so that the dictionary can be converted into a database.
The prospect of our research is related to the problem of integrating the two main components of the language, vocabulary and grammar, in one digital environment. This article highlights the main problems in creating the linguistic tools of the virtual lexicographic laboratory of the explanatory dictionary of the Spanish language (VLL DLE 23). To this end, we analyzed how the dictionary represents various linguistic facts and revealed its structure and the features of its metalanguage. A formal model of DLE 23 has been developed, and its elements and the possible links between them, which should be accessible through the linguistic tools, are described. A range of research tasks based on the dictionary is outlined, whose implementation the linguistic tools should support.
Key words: computer lexicography, virtual lexicographic laboratory, digital environments, electronic dictionaries.
Introduction
The Explanatory Dictionary of the Spanish Language, like other large explanatory lexicons, is a carrier of deep systemic regularities of language which, although almost hidden from the reader, play an important role in revealing the linguistic and informational potential of the language.
In general, the systemic properties of language, and their manifestations in large lexicons of the explanatory type, have been studied in many works, including those of the authors of this article [4; 6; 7; 14]. We believe that such systemic effects are best represented in monolingual explanatory dictionaries. We mean large, mostly multi-volume lexicons that contain the major part of the national lexicon and phraseology and are characterized by a detailed description of the lexical-grammatical and lexical-semantic systems of the language. Owing to their great size, elaborate structure and completeness of lexicographic description, such dictionaries carry a huge number of implicitly defined linguistic, cognitive, logical and other relationships (mostly uncontrolled), which makes these extensive lexicographic systems a kind of "thing-in-itself" [9].
This raises the question of developing a methodology and technology for creating such lexicographic objects, as well as the question of studying the variety of effects that operate in them explicitly or implicitly. From the outset we are dealing with the methods of computational linguistics because, as noted in the book "Computer lexicography" [9], it is physically impossible to perform such studies by traditional methods. The first problem, therefore, is to create digital analogues of the corresponding traditional lexicographic works, or to convert them into digital form, and then to make the underlying systemic linguistic regularities explicit. At the Ukrainian Lingua-Information Fund, where the studies behind this work were conducted, we developed a universal theoretical basis aimed at constructing lexicographic objects of almost unlimited size and complexity and at the in-depth study of language systems: the theory of lexicographic systems and the theory of semantic states. Using these theories, the Fund designed and built lexicographic systems of the explanatory type for the Ukrainian, Russian and Turkish languages [8; 15] and an etymological lexicographic system for the Ukrainian language [31], all on the scope and scale of academic lexicography. We also built grammatical lexicographic systems for these languages.
Using these tools (and partly even before they were created, but with approaches based on the theory of lexicographic systems), a number of studies were conducted that yielded a series of fundamentally new linguistic results for the Ukrainian language. Among these results we should mention the study and establishment of the formal structure of the so-called equisemantic series of the Ukrainian verb and noun [4; 10; 17; 18].
The experience gained in these works leads to certain generalizations and at the same time raises new problems and challenges.
In our opinion, one of the most important problems is that of the possible limits of applicability of the methods of lexicographic systems and semantic states. In other words, how universal are these methods, and what is their linguistic "coverage"? This question motivates us to bring new fundamental lexicographic objects into the orbit of our methodology in order to verify the efficiency of the general theory and its ability to reproduce adequately the individual characteristics of new languages and, accordingly, of their lexicographic traditions.
The next, equally important question concerns the capacity of the theories we have created to solve the problem of coherently combining phenomena that are heterogeneous in their linguistic nature into a single integrated lexicographic object. These two sets of questions define the range of problems to which this work is dedicated, as applied to the Spanish language. In addition, we hope (although this is not our main focus) that our approach to the lexicographic description of language systems will inspire other researchers to apply it to the objects of their own research.
Concept and Method
The conceptual basis of our research is the theory of lexicographical systems and the theory of semantic states. Let's briefly describe the content of these theories and start with the second one.
The theory of the semantic states
The concept of the state of a language unit was introduced into linguistics in 1957 by the outstanding Russian mathematician A. Kolmogorov, when he attempted to give a formal definition of case in the Russian language. Kolmogorov never published his philological works, but his student Vladimir Uspensky (who attended the seminar on mathematical linguistics at Moscow State University where Kolmogorov gave a report on this question) recorded his teacher's ideas and published them as the article "On the definition of case according to A. Kolmogorov" in the collection "Bulletin of the Association for Machine Translation Problems" [19].
According to Kolmogorov, objects (in fact, the words in speech that denote objects) exist in certain "states" that depend on the contexts in which they function. On the set of these states Kolmogorov defined an equivalence relation such that its equivalence classes correspond to the noun cases of the Russian language. A detailed analysis of Kolmogorov's approach can be found in the book "Language. Information. System", section "Theory of Linguistic States" [14]. Note that these ideas were not taken up by the linguistic community at the time. Only A. Zaliznyak, in his book "Russian Nominal Inflection" (1967), noted that he was using the concept of case roughly in the sense in which Kolmogorov had used it in the above-cited summary by V. Uspensky (we refer to Zaliznyak's statement as given in the 2002 republication of his book together with other works on this subject [1]). Later, independently of Kolmogorov and for other reasons, one of the authors of this work (V. Sh.) "rediscovered" the concept of the state of a language unit and applied it to the description of various aspects of the language system [5; 6; 15].
The main postulates of the theory of semantic states are as follows. Based on a phenomenological approach to the description of the language system [5; 11], we postulate that the conceptual objects of linguistics should be not language units or grammatical and semantic categories and their meanings, but objects that are "intermediate" with respect to the language, whose phenomenological correlates are the psycho-physical states and processes that occur in the human language-and-mind apparatus. The concept of a state carries a range of meanings close to that of the corresponding concept in quantum mechanics. The analogy is, in our opinion, so close that it allowed one of the authors of this article (V. Sh.) to use the metaphor "quantum linguistics" [11; 12].
Thus emerges the fundamental difference between classical and nonclassical ("quantum") approaches to the description of the linguistic world, which could be schematically represented as follows
Therefore, the state of an object acquires the status of a basic concept of the theory and moves to the forefront of modeling and describing a certain class of phenomena of the language system under the "quantum" approach. The semantic states are formalized as follows. We postulate that there is a correspondence between a linguistic unit and its state:
$$s\colon X \to s(X), \qquad (1)$$
where $X$ is a particular unit of language, $s$ is the correspondence between $X$ and $s(X)$, and $s(X)$ is a formal object that represents a semantic state of the unit $X$, whose determinants are the elements of the means of material expression of semantics. For any unit $X$, its semantic states form a set, which we assume to be finite but not bounded, and we denote it by $\{s(X)\}$. The class of specific units of a language $L$ is denoted by $W(L)$, or simply $W$ if only one specific language is considered; the membership of $X$ in $W$ is written $X \in W$; and the set of all semantic states of all $X \in W$ is the union $\bigcup_{X \in W} \{s(X)\}$.
The formalism of semantic states makes it possible to introduce a fundamentally new phenomenon into linguistic theory, namely superposition. This means that if a linguistic unit $X$ can be in the states $s_1(X)$ and $s_2(X)$, then (unless special reservations apply) there is a state $s(X)$ such that
$$s(X) = \alpha\, s_1(X) + \beta\, s_2(X), \qquad (2)$$
where $\alpha$ and $\beta$ are the weight coefficients with which the states $s_1(X)$ and $s_2(X)$ enter the superposition (2). At the level of meaning, superposition means that the language unit has the properties of both the state $s_1(X)$ and the state $s_2(X)$, manifested with an intensity determined by the ratio of the weight coefficients $\alpha$ and $\beta$. Superposition makes it possible to represent in linguistic theory, and to simulate in practice, many effects of linguistic ambiguity. Some examples of the superposition of semantic states in Spanish are given in [3].
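As a purely illustrative sketch, and not the authors' formalism, the superposition (2) can be mimicked numerically if one assumes that each semantic state is a vector of feature weights. The feature names, the helper `superpose` and the sample weights are hypothetical.

```python
def superpose(s1, s2, alpha, beta):
    """Return alpha*s1 + beta*s2 for states given as feature -> weight dicts.

    Assumption: a semantic state is modeled as a sparse feature vector;
    the article itself does not prescribe this representation.
    """
    features = sorted(set(s1) | set(s2))
    return {f: alpha * s1.get(f, 0.0) + beta * s2.get(f, 0.0) for f in features}


# Hypothetical states of one unit: a purely adjectival and a purely nominal use.
s_adj = {"adjective": 1.0}
s_noun = {"noun": 1.0}

# The ratio alpha : beta sets the intensity with which each state shows up.
mixed = superpose(s_adj, s_noun, alpha=0.75, beta=0.25)
print(mixed)
```

The only point of the sketch is that the resulting state carries the properties of both component states in the proportion given by the weight coefficients.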
Theory of lexicographic systems
The theory of lexicographic systems is based on a phenomenological principle that we call the lexicographic effect in information systems. This principle was first formulated in the book [4, 48-56]; its most complete description is given in the book [14, 39-73], section "Phenomenology of language and the language picture of the world". The lexicographic effect is universal: it applies not only to natural language but to all objects in which information processes occur, and therefore to almost everything, since information is a universal quality of things. The essence of this effect is as follows.
Observing and generalizing the behavior of different systems, we conclude that in the course of the evolution (dynamics, self-development ...) of a system of any nature, and in the course of its interaction with (observation by) some object, a subsystem of discrete entities (a "subsystem of order") arises in its structure. These entities play the role of elementary information units, so that all other phenomena appear as combinations of these units organized in a certain way.
This subsystem has properties that are in some way related to the properties of the lexical system of natural language: it "generates" in its structure something like a thesaurus and a grammar with the inherent properties of signs, meaning, content, polymorphism, etc.; it is a bearer of an "expression plane" as well as a "content plane". This circumstance explains the use of the term "lexicographic effect". Sets of elementary information units, like other aggregations produced by objective processes, are characterized by "substantiality"; as a result, these aggregates, as a rule, have relatively stable characteristics, which ensures their localization in the corresponding regions of the system parameters. We also conclude that any lexicographic effect develops in the environment of "subject-object" relations, and the manifestations of subjectivity vary extremely widely, from the universal property of reflection, peculiar to all things, to the psychic and cognitive reactions and messages of intellectual entities. The set of phenomena described above constitutes the content of the lexicographic effect in information systems.
The concept of a "lexicographic system" is formalized as follows. First, as a result of the reception by the subject $S$ of the lexicographic effect $Q$ within the object $D$, the discrete class of elementary information units (EIO) $I^Q(D)$ is determined. The next step is to construct $V(I^Q(D))$, a description of the class $I^Q(D)$, where the notion of description is used in the same sense as in A. Kolmogorov's definition of algorithmic information [2].
The object $V(I^Q(D))$ is a word (text) over some finite alphabet of symbols $A = \{a_1, a_2, \ldots, a_t\}$. This text is a complete description of the class $I^Q(D)$, from which the class and the properties of all its elements can be reconstructed unambiguously. If $I^Q(D) = \{X_1, X_2, \ldots, X_n, \ldots\}$, we denote the restriction of $V(I^Q(D))$ to $X_i$ by $V(X_i) = V(I^Q(D))|_{X_i}$, and the union of all $V(X_i)$ by $V_0(I^Q(D))$:
$$V_0(I^Q(D)) = \bigcup_i V(X_i); \qquad V_0(I^Q(D)) \subset V(I^Q(D)). \qquad (3)$$
In the dictionary interpretation, the class $I^Q(D) = \{X_1, X_2, \ldots, X_n, \ldots\}$ is interpreted as the class of dictionary register units; then $V(X_i)$ is the text of the dictionary entry with the register unit $X_i$.
Remark 1. The dictionary interpretation is not obligatory for the definition of a lexicographic system, but any dictionary necessarily has the structure of a lexicographic system.
Remark 2. A lexicographic system defined by a single lexicographic effect $Q$ will be called an elementary lexicographic system. Further, the terms "L-system" and "elementary L-system" will be used as synonyms where this does not lead to confusion.
One of the main aspects in the definition of an L-system, considered as an information system of a certain type, is the concept of its architecture. We use the ANSI/X3/SPARC architecture (or ANSI/SPARC) [21], which consists of three levels of data representation, conceptual, internal and external, interpreted as follows. The conceptual model (conceptual level of representation) of the subject area is a semiotic and semantic model in which the views of various specialists on the subject area are integrated in an unambiguous, finite and consistent manner. The internal model (internal level of representation) determines the types, structures and formats of data representation, storage and manipulation, as well as the algorithmic base and the operational software environment in which the conceptual model is "immersed" during its implementation. The external model (external level of representation) reflects the views of end users (and therefore of application programmers) on the subject area. It is a set of tools that allows the user to perform the permitted operations on the data represented at the internal level. One conceptual model may correspond to several internal and external ones.
Thus, the architecture ARCH of a lexicographic system has the following elements: $\mathrm{ARCH} = \{CM, EXM, INM; \Phi, \Psi, \Xi\}$, where $CM$ is the conceptual model of the L-system; $EXM = \{exM\}$ is the set of its external models that correspond to the conceptual model $CM$; $INM = \{inM\}$ is the corresponding set of its internal models; and the symbols $\Phi = \{\varphi\}$, $\Psi = \{\psi\}$, $\Xi = \{\xi\}$ denote the mappings that link $CM$, $inM$ and $exM$ in the following commutative diagram:
The conceptual model CM of any L-system has a certain standard structure that follows from the information approach to its modeling. Since the only source of meaningful representations of an L-system is the description $V(I^Q(D))$, which is a word (text) over some alphabet $A$, the only source of the structure of the L-system is certain invariantly defined components of this text. Following the section "Theory of lexicographic systems" of the book [9, 103-136], we denote the set of these structural elements by $P = P[V(I^Q(D))] = \{P_1, P_2, \ldots, P_q\}$. Each element $P_j$ depends on $X_1, X_2, \ldots, X_n, \ldots$ and takes a certain value $P_j(X_i)$ on each element $X_i$ of the class $I^Q(D)$; some of the values $P_j(X_i)$ may be empty. The set of structural elements $P_j(X_i)$, $j = 1, 2, \ldots, q$; $i = 1, 2, \ldots, n, \ldots$, defines the fundamental structure of the L-system. The question arises of how this structure can be built for a specific subject area that is an object of lexicography. There are two ways. The first is to build such structures from "first principles", i.e. from a linguistic theory that adequately describes the object of lexicography. The second is to analyze the text of an existing dictionary that gives a lexicographic description of a certain area of the language. It is assumed that a dictionary is compiled when there is a theory developed well enough for the relevant material to be presented in dictionary form. Then, by analyzing the text of the dictionary, which in our terminology plays the role of a particular external model of the L-system underlying this dictionary, the researcher abstracts from it a set of structural elements and summarizes it in the structure $P_j(X_i)$, $j = 1, 2, \ldots, q$; $i = 1, 2, \ldots, n, \ldots$. The structure $P$ is the first structure of the lexicographic system. Once $P$ has been built, it can be considered completely independently of the dictionary from which it was abstracted.
The structure $P$ has the following obligatory elements (this is a distinguishing feature of all lexicographic systems): $\Lambda(I^Q(D))$ and $P(I^Q(D))$, which are the carriers of the form and the content of the class of EIO $I^Q(D)$ and therefore represent the "form-content" relationship, which is the backbone of any lexicographic system.
From the structural elements $P_j(X_i)$ a second lexicographic structure $\sigma[P]$ is constructed, which is defined on $P$ and, therefore, on $V(I^Q(D))$. Hereafter we refer to $\sigma[P]$ as the macrostructure of $V(I^Q(D))$; the restriction of $\sigma[P]$ to $V(x)$, $\sigma[P]|_{V(x)} \equiv \sigma(x)$, generates the microstructure of $V(x)$. The active formulation of this fact is to establish a procedure (operator, process ...) $\sigma$ which generates on $P$ the structure $\sigma[P]$:
$$\sigma\colon P \to \sigma[P]. \qquad (4)$$
The elements of the structure $\sigma[P]$ represent the regularities (including implicit ones) of the subject area that is the subject of lexicography. Determining them is in each case a specific scientific study, sometimes quite complex and refined, which depends on the researcher's experience.
Thus, the general structure of an L-system is presented in the following form:
$$\{D, S, Q, I^Q(D), V(I^Q(D)), P, \sigma[P], \mathrm{Red}[V(I^Q(D))]; \mathrm{ARCH}\}, \qquad (5)$$
where all elements except $\mathrm{Red}[V(I^Q(D))]$ have been defined above. The symbol $\mathrm{Red}[V(I^Q(D))]$ denotes the process of recursive reduction of the lexicographic system, the essence of which is as follows. The universality of the lexicographic effect makes it possible to consider $\Lambda(I^Q(D))$ and $P(I^Q(D))$ as separate, autonomous elementary L-systems, and this makes the following construction possible.
Note the change of the type of lexicographic effect on the second level: instead of $Q$ we now have $Q_1$ and $Q_2$, respectively. So we come to a network of objects $I^{Q_1}(D)$ and $I^{Q_2}(D)$. Continuing this process, we obtain a recursive development of the lexicographic system $V(I^Q(D))$.
We will call this process a recursive reduction of the lexicographic system. It resembles a kind of information "microscope" that reveals all the subtle details of the structure of the lexicographic system and induces a structure similar to a fractal.
Modeling of the lexicographic structure of the Spanish language based on the text of Diccionario de la lengua española
Diccionario de la lengua española [], which according to our method is considered an external model of the explanatory-type L-system of the Spanish language, provides the necessary material for the reconstruction of the corresponding conceptual model. This can be done as follows.
In the structure of dictionary entries, we distinguish the set of register (head) lexical units $W = \{x\}$, which serve as the identifiers of the corresponding dictionary entries $V(x)$. The dictionary register includes words and morphemes, certain phrases and abbreviations. Representing morphemes, phrases and abbreviations as headwords is not typical of most explanatory dictionaries. For convenience, all language units that act as a headword will be called headwords.
In the structure of each dictionary entry $V(x)$ there is a "left side" $L(x)$, which consists of certain headword parameters, and a "right side" $P(x)$, in which the lexicographic representation of the semantics of $x$ is given.
In the case of an explanatory dictionary, we distinguish two types of language units: units of the lexical level and phrases that include the headword and have an idiomatic status in the language (the so-called "after-rhomb-sign zone"). It is therefore natural to present the structure of the dictionary entry $V(x)$ as a combination of the descriptions (dictionary entries) of structural units of both types:
$$V(x) = V^{Lex}(x) \cup \bigcup_{i=1}^{n(x)} \bigcup_{j=1}^{m(i)} V^{Fras}_{ij}(x),$$
where $V^{Lex}(x)$ is the lexicographic description of the headword $x$; $V^{Fras}_{ij}(x)$ is the description of the $j$-th phrase of the $i$-th type; $m(i)$ is the number of phrases of the $i$-th type; and $n(x)$ is the number of phrase types in the dictionary entry $V(x)$. Authors of different dictionaries take different approaches to forming the "after-rhomb-sign zone". In SUM (Dictionary of the Ukrainian Language) there are four types of phrases (free phrases, terminological phrases, so-called equivalents of the word, and phraseological units proper). In the OED (Oxford English Dictionary), the "after-rhomb-sign zone" is divided into phraseological units (phrases), free phrases (compounds) and derivatives. In the Dictionary of the Spanish Language, the "after-rhomb-sign zone" is divided into two blocks: "noun + adjective" phrases (agua bendita, agua blanca, agua corriente) and others: a) verb phrases: agarrar alguien un agua, bañarse en agua rosada, beber agua un buque, etc.; b) adverb phrases: como agua de mayo, como agua para chocolate, etc.; c) adjective phrases: de agua y lana, de arte y ensayo, etc. The type of a particular phrase is determined at the formal level: the "other" phrases are marked in the dictionary entry with the labels "loc." (locución, locution) and "expr." (expresión, expression).
Each lexicographic description $V^{Lex}(x)$ and $V^{Fras}_{ij}(x)$ corresponds to the basic structure:
In the case of $V = V^{Lex}(x)$, the headword of the dictionary entry with its parameters (which we shall call the parameters of the headword) acts as $L_0$. For $V^{Fras}_{ij}(x)$, $L_0$ is the phrase in its dictionary register form plus the parameters of the head unit. The structure of the right side $P_0$ is the same for a lexical unit and for a phrase. Arrows hereinafter indicate an attachment relationship. Text analysis of the dictionary entries showed that the structures $V(x)$ are almost identical for phrases and headwords. Hereafter we analyze the structure of the dictionary entry of a headword, which can be adapted to phrases.
Depending on the type of linguistic information, we introduce six types of parameters (level $L_0$, parameters of the headword): 1. RR (register row); 2. DUPL (doublets); 3. ETYM (etymology); 4. MORPHO (morphology); 5. ORTHO (orthography); 6. UNCRT (uncertain). We also introduce the parameter PARHWi (the i-th parameter of a headword) for any parameter regardless of its type. Each parameter is represented in our model by a text line (the authentic original text) and a code of the structural element type.
The UNCRT type refers to a parameter whose type cannot be determined from the given formal features but which satisfies the formal features of a headword parameter. Only RR is mandatory; in the extreme case, it is just the headword (or a linguistic unit that acts as a headword). In our model, we do not limit the number of parameters. Moreover, we assume that an entry may include several parameters of the same type. Each parameter is a character string (text) of a certain structure, which is determined by the type of linguistic information. At this stage of modeling, we are interested only in the structure of RR (register row) and DUPL (doublets).
Here are examples of $L_0$ texts and their division into the structural elements defined above (texts are given in the font markup of the online version of the dictionary).
Example 1.
RR
bikini.
DUPL
Tb. biquini.
ETYM
Del ingl. bikini, y este de Bikini, nombre de un atolón de las Islas Marshall, con infl. de bi- 'bi-', por alus. a las dos piezas.
Example 2.
RR
cómico, ca.
ETYM
Del lat. comicus, y este del gr. κωμικός komikos.
Example 3.
RR
poeta, tisa.
ETYM
Del lat. poeta, y este del gr. ποιητής poietis; para la forma f., cf. fr. mediev. poetisse.
MORPHO
Para el f., u. t. la forma poeta.
Example 4.
RR
inmaculado, da.
ETYM
Del lat. immaculatus.
ORTHO
Escr. con may. inicial en acep. 2.
Example 5.
RR
preterir.
ETYM
Del lat. praeterire 'pasar adelante'.
MORPHO
Conjug. c. pedir.
MORPHO
U. solo las formas cuya desinencia empieza por -i.
Example 6.
RR
ab initio.
ETYM
Loc. lat.
Example 7.
RR
ONG.
UNCRT
Sigla de organización no gubernamental.
Example 8.
RR
bici.
UNCRT
Acort.
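The segmentation shown in Examples 1-8 can be sketched as a small classifier. This is a minimal sketch under a strong assumption: the cue prefixes below are inferred only from the eight examples above and are not the dictionary's complete set of formal features; the names `CUES`, `classify_parameter` and `segment_left_side` are hypothetical.

```python
# Assumed formal cues: leading abbreviations observed in Examples 1-8 only.
CUES = [
    ("DUPL", ("Tb. ",)),
    ("ETYM", ("Del ", "Loc. lat.")),
    ("MORPHO", ("Conjug. ", "Para el f.", "U. solo ")),
    ("ORTHO", ("Escr. ",)),
]


def classify_parameter(text):
    """Return the structural element code for one headword parameter line."""
    for code, prefixes in CUES:
        if text.startswith(prefixes):
            return code
    # No known cue matched: the parameter type stays uncertain.
    return "UNCRT"


def segment_left_side(lines):
    """Treat the first line as RR and classify every further line by its cue."""
    if not lines:
        raise ValueError("the left side needs at least the register row")
    return [("RR", lines[0])] + [(classify_parameter(t), t) for t in lines[1:]]


example_5 = [
    "preterir.",
    "Del lat. praeterire 'pasar adelante'.",
    "Conjug. c. pedir.",
    "U. solo las formas cuya desinencia empieza por -i.",
]
for code, text in segment_left_side(example_5):
    print(code, "|", text)
```

Anything that matches no cue falls into UNCRT, which mirrors the role this type plays in the model.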
Register rows define the set of register units of the dictionary $W = \{x\}$. In terms of form, the dictionary register includes words, morphemes (formants), phrases and abbreviations. The register includes words that 1) belong to the Spanish vocabulary proper or 2) are borrowed from other languages and have not yet been adapted to the orthographic and phonological system of Spanish. The former are shown in normal font and the latter in italics: gato, kilobyte, ojo, software. The dictionary also records frequently used abbreviations (DNA, ONU) and acronyms (radar, láser), which are represented as full words. In addition, the dictionary contains word-formation units, such as prefixes (a-, pre-, contra-, pro-, etc.), suffixes (-aico, -ino, -ivo, etc.), and formants of Greek and Latin origin (archi-, hidro-, -ónimo).
Here are some examples that illustrate the structure of the text of register rows: cómico, ca; vis1; vis2; champaña; poeta, tisa; combi; recentísimo, ma; punción; kilobyte; ab initio; pre-; -aico, ca; hidro-; ONG.
We introduce the parameter St(RR), the structure type of the register row: St(RR) = 1 if RR has a single component, and St(RR) = 2 otherwise; a comma (,) in the text of the register row is the formal sign of a multicomponent row. One-component RR: vis1; vis2; champaña; combi; punción; kilobyte; ab initio; pre-; hidro-; ONG. Two-component RR: cómico, ca; poeta, tisa; recentísimo, ma; -aico, ca.
In a two-component unit, the first component is the headword and the second is a formant for constructing the second register word. Let's introduce the parameter $r_i$, the $i$-th register word (clearly, $r_1$ is the headword of the dictionary entry). The second register word is formed according to the formal scheme:
cómico, ca    r1 = cómico    r2 = cómica
poeta, tisa    r1 = poeta    r2 = poetisa
-aico, ca    r1 = -aico    r2 = -aica
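The comma test for St(RR) and the formation of the second register word can be sketched as follows. The replacement rule used here is an assumption chosen because it reproduces the three rows of the scheme: the formant replaces the tail of the headword starting at the last occurrence of the formant's first letter. The function name `split_register_row` is hypothetical.

```python
def split_register_row(rr):
    """Return (St(RR), register words) for a register row such as 'cómico, ca'."""
    parts = [p.strip() for p in rr.rstrip(".").split(",")]
    if len(parts) == 1:
        return 1, [parts[0]]  # one-component row: the headword only
    head, formant = parts[0], parts[1]
    # Assumed heuristic: cut the headword at the last occurrence of the
    # formant's first letter and append the formant there.
    cut = head.rfind(formant[0])
    if cut == -1:
        raise ValueError(f"formant {formant!r} cannot be attached to {head!r}")
    return 2, [head, head[:cut] + formant]


for row in ["cómico, ca", "poeta, tisa", "-aico, ca", "ab initio"]:
    print(row, "->", split_register_row(row))
```

Rows with more than two components, if they occur, are outside the scope of this sketch.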
Let's introduce another parameter describing the register row, F(RR), the form of the register unit:
F(RR) = 1 (Latin phrase);
F(RR) = 2 (borrowed);
F(RR) = 3 (prefix/suffix);
F(RR) = 4 (abbreviation);
F(RR) = 0 (for all other cases).
Clearly, these values are assigned on formal grounds, and the classification can be extended.
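A minimal sketch of how F(RR) might be assigned from formal signs is given below. The tests themselves are assumptions: plain text carries no font markup, so the caller has to supply the `italic` flag; the Latin-phrase test relies on the "Loc. lat." text seen in Example 6; and all-capitals is taken as the sign of an abbreviation. The function name is hypothetical.

```python
def form_of_register_unit(rr, italic=False, etym=""):
    """Return F(RR): 1 Latin phrase, 2 borrowed, 3 prefix/suffix,
    4 abbreviation, 0 all other cases."""
    unit = rr.split(",")[0].strip().rstrip(".")
    if etym.startswith("Loc. lat."):      # assumed cue, cf. Example 6
        return 1
    if unit.startswith("-") or unit.endswith("-"):
        return 3                          # hyphen marks a formant
    if unit.isupper():
        return 4                          # assumed cue: all capitals
    if italic:
        return 2                          # unadapted borrowing, set in italics
    return 0


print(form_of_register_unit("ab initio.", etym="Loc. lat."))
print(form_of_register_unit("-aico, ca"))
print(form_of_register_unit("ONG."))
print(form_of_register_unit("kilobyte", italic=True))
print(form_of_register_unit("gato"))
```

The order of the checks is itself a design choice: the first matching sign wins, so extending the classification means inserting a new test at the appropriate priority.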
Doublets are another source of register units of the dictionary. Let's introduce the parameter $d_i$, the $i$-th doublet (a word or phrase), which is part of the structural unit DUPL. This parameter can be identified by a formal sign in the text line that represents the DUPL parameter (bold type):
Tb. substancia    d1 = substancia
Tb. jienense, giennense    d1 = jienense    d2 = giennense
Tb. ozonosfera, Am.    d1 = ozonosfera
d1 = salvavida
Tb. en hora buena en aceps. 2-4    d1 = en hora buena
Tb. chavola, p. us.    d1 = chavola
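When the bold markup is not available, the doublets can be approximated from the plain text. The sketch below is such an approximation, not the formal sign the model relies on: it assumes a small, non-exhaustive list of usage labels (`LABELS`) and a trailing "en acep(s)." restriction, both taken from the rows above; `parse_doublets` is a hypothetical name.

```python
import re

# Assumed, non-exhaustive list of labels that may follow a doublet.
LABELS = {"Am.", "p. us."}


def parse_doublets(dupl):
    """Return the list [d1, d2, ...] extracted from a DUPL text line."""
    text = dupl.strip()
    if text.startswith("Tb."):
        text = text[3:].strip()
    # Drop a trailing restriction to particular senses, e.g. "en aceps. 2-4".
    text = re.sub(r"\s+en aceps?\.\s.*$", "", text)
    items = [p.strip() for p in text.split(",")]
    # Remove usage labels, then the sentence-final period of a doublet.
    return [p.rstrip(".") for p in items if p and p not in LABELS]


for line in [
    "Tb. substancia",
    "Tb. jienense, giennense",
    "Tb. ozonosfera, Am.",
    "Tb. en hora buena en aceps. 2-4",
    "Tb. chavola, p. us.",
]:
    print(line, "->", parse_doublets(line))
```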
In generalized form, the left side of the dictionary entry may be represented as follows:
HW (= r1) headword
RR
    St(RR)
    F(RR)
    r1
    r2
DUPL
    d1
    d2
ETYM
MORPHO
ORTHO
UNCRT
Minimal structure of the left side:
HW (= r1) headword
RR
    F(RR)
    r1
If this structure is mirrored in the structure of a computer database, it becomes possible to search for the presence or absence of any structural element (or list of elements). A search by structure can be combined with a search for a given substring in the text of a structural element.
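The kind of query described here can be sketched with an in-memory stand-in for the database. The `LeftSide` record, the `search` function and their parameters are hypothetical; the sample data are the left sides of Examples 4, 5, 6 and 8.

```python
from dataclasses import dataclass, field


@dataclass
class LeftSide:
    headword: str
    # Structural element code -> list of text lines of that type.
    elements: dict = field(default_factory=dict)


def search(entries, present=(), absent=(), substring=None):
    """Select headwords by presence/absence of elements and, optionally,
    by a (element_code, text) substring condition."""
    hits = []
    for e in entries:
        if any(code not in e.elements for code in present):
            continue
        if any(code in e.elements for code in absent):
            continue
        if substring is not None:
            code, needle = substring
            if not any(needle in t for t in e.elements.get(code, [])):
                continue
        hits.append(e.headword)
    return hits


entries = [
    LeftSide("inmaculado", {"ETYM": ["Del lat. immaculatus."],
                            "ORTHO": ["Escr. con may. inicial en acep. 2."]}),
    LeftSide("preterir", {"ETYM": ["Del lat. praeterire 'pasar adelante'."],
                          "MORPHO": ["Conjug. c. pedir."]}),
    LeftSide("ab initio", {"ETYM": ["Loc. lat."]}),
    LeftSide("bici", {"UNCRT": ["Acort."]}),
]

# Entries that have an etymology but no morphological note.
print(search(entries, present=["ETYM"], absent=["MORPHO"]))
# Structural search combined with a substring search inside ETYM.
print(search(entries, present=["ETYM"], substring=("ETYM", "Del lat.")))
```

In a real database the same conditions would become existence tests and a substring filter on the table of structural elements; the list-based version only shows the logic.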
The right side ($P_0$) is a lexicographic description of the semantics of the headword $x$. In any explanatory dictionary it is a sequence of meanings (interpretations), but the structure and content of the text of an interpretation are determined by the scientific school and the theoretical positions of the dictionary's authors.
The analysis of the texts yields the following structural elements of the text block of a meaning: MNG№ (number of the meaning); REM (block of remarks); DEF (definition); ED (encyclopedic data); COMM (comment); IL (illustration).
Let us analyze in more detail the structure and content of the text of the remark block (REM). The text is divided sequentially into zones, each of which represents a remark of a certain type: REM-GR (grammar); REM-PR (pragmatics); REM-ST (stylistics); REM-SF (subject field); REM-REG (geographic region); REM-WHU (where used). The grammar remark REM-GR is mandatory; it always comes first and determines the part of speech of the headword. At this stage of the study, we have not found an algorithm that we consider effective for distributing the text of REM among the corresponding zones.
The minimum structure of a dictionary entry: MNG№ = 1 REM = REM-GR DEF
As a rule, lexical semantics is described in the text of the structural component DEF, and grammatical and pragmatic semantics in COMM. Comments correlate with definitions. Each definition and each comment can be accompanied by its own illustrations. We do not limit the number of structural elements DEF, COMM and IL, but we establish the following order of attachment: DEF → IL; DEF → COMM; COMM → IL.
Here is an example of an interpretation structure with all structural elements:
MNG№
REM
DEF
  IL
  IL
  COMM
    IL
    IL
  COMM
    IL
    IL
DEF
  IL
  COMM
    IL
As already noted, implementing this structure as a computer database allows us to build collections of dictionary entries according to their structural profile, for example: entries with a certain number of meanings; entries without illustrations; entries with or without comments; entries with two (or more) definitions; meanings with two (or more) comments, etc.
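A structural profile of this kind can be sketched in a few lines. The sketch assumes that a meaning arrives as a flat sequence of element tags and nests them according to the attachment order given above (an illustration goes to the latest comment of the current definition if there is one, otherwise to the definition itself). The function names and the dictionary layout are hypothetical.

```python
def build_meaning(tags):
    """Nest a flat tag sequence into definitions with their comments and illustrations."""
    defs = []
    for tag in tags:
        if tag in ("MNG№", "REM"):
            continue                                  # header elements, not nested here
        if tag == "DEF":
            defs.append({"il": 0, "comm": []})        # 'comm' holds one IL count per comment
        elif tag == "COMM":
            if not defs:
                raise ValueError("COMM must follow a DEF")
            defs[-1]["comm"].append(0)
        elif tag == "IL":
            if not defs:
                raise ValueError("IL must follow a DEF or a COMM")
            if defs[-1]["comm"]:
                defs[-1]["comm"][-1] += 1             # attach to the latest comment
            else:
                defs[-1]["il"] += 1                   # attach to the definition
        else:
            raise ValueError(f"unknown tag: {tag}")
    return defs


def profile(defs):
    """Count the structural elements of one meaning."""
    return {
        "definitions": len(defs),
        "comments": sum(len(d["comm"]) for d in defs),
        "illustrations": sum(d["il"] + sum(d["comm"]) for d in defs),
    }


FULL = ["MNG№", "REM", "DEF", "IL", "IL", "COMM", "IL", "IL", "COMM", "IL", "IL",
        "DEF", "IL", "COMM", "IL"]
print(profile(build_meaning(FULL)))
```

Selecting, say, meanings with two or more comments then reduces to filtering on `profile(...)["comments"] >= 2`.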
Let us illustrate how the text of the meanings is divided into structural elements, using the dictionary entry with the headword cómico.
Example 10.
1. adj. Que divierte y hace reír. Situación cómica.
MNG№
1
REM
adj.
DEF
Que divierte y hace reír.
IL
Situación cómica.
2. adj. Perteneciente o relativo a la comedia.
MNG№
2
REM
adj.
DEF
Perteneciente o relativo a la comedia.
3. adj. Dicho de un actor: Que representa papeles cómicos. U. t. c. s.
MNG№
3
REM
adj.
DEF
Dicho de un actor: Que representa papeles cómicos.
COMM
U. t. c. s.
The dictionary provides the following types of definitions:
generic definition: it is constructed by reference to the generic concept to which the lemma belongs and a list of its distinguishing characteristics:
barco ... Embarcación de estructura cóncava y, generalmente, de grandes dimensiones.
definition by synonym: the lexical meaning of the headword is conveyed by a word that is close or identical in meaning:
fachento ... jactancioso.
agua ... marea (movimiento periódico de las aguas del mar)
explanatory definition: this type of definition is intended to answer the questions "what is expressed?", "what is it used for?", "how is it used?":
clic ... U. para reproducir ciertos sonidos, como el que se produce al apretar el gatillo de un arma, pulsar un interruptor.
contextual definition: it indicates the context in which the headword is used rather than disclosing its meaning:
agrio ... Dicho de un metal: Frágil, quebradizo, no dúctil ni maleable.
other definitions, in particular enumerative ones, which explain the word by naming the objects (concepts) or characteristics it denotes:
arte ... Lógica, física y metafísica.
divino ... Muy excelente, extraordinariamente primoroso.
Formally, we cannot yet clearly determine the type of each definition from its text (with the exception of synonym and explanatory definitions), especially because there can be more types. However, we consider it necessary to introduce a parameter TDEF (type of definition): a code identifying the type of definition (including other types) is assigned to each definition. As the system evolves, the number of types will grow.
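As a rough illustration of how a TDEF code might be proposed automatically for the two types that are recognizable from the text, consider the sketch below. The code values, the function name and both heuristics (a definition that opens with "U. " is treated as explanatory; a single word with an optional parenthesized gloss is treated as a synonym) are assumptions derived only from the examples quoted above; everything else is left unassigned for manual coding.

```python
import re

# Hypothetical TDEF codes; the source only states that each type receives a code.
TDEF_SYNONYM = "SYN"
TDEF_EXPLANATORY = "EXPL"
TDEF_UNASSIGNED = "UNASSIGNED"

# One word, optionally followed by a parenthesized gloss, e.g. "marea (movimiento ...)".
_SYNONYM = re.compile(r"\w+\.?(\s*\(.*\))?\.?")


def guess_tdef(definition: str) -> str:
    """Propose a TDEF code from the definition text alone (heuristic)."""
    text = definition.strip()
    if text.startswith("U. "):
        return TDEF_EXPLANATORY
    if _SYNONYM.fullmatch(text):
        return TDEF_SYNONYM
    return TDEF_UNASSIGNED


print(guess_tdef("jactancioso."))
print(guess_tdef("U. para reproducir ciertos sonidos."))
print(guess_tdef("Muy excelente, extraordinariamente primoroso."))
```

Keeping an explicit "unassigned" value makes it easy to add new types later without recoding the entries already classified.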
Some lexical units that denote concepts in mathematics, chemistry, physics, etc. are defined with the help of extra-linguistic means: chemical formulas and symbolic designations of physical and mathematical quantities:
ácido carboxílico ... Compuesto orgánico que contiene en su molécula uno o más grupos carboxilo. (Fórm. COOH).
hercio ... Unidad de frecuencia del sistema internacional, que equivale a 1 ciclo por segundo. (Símb. Hz).
número pi ... Mat. número trascendente 3,141592..., que expresa el cociente entre la longitud de la circunferencia y la de su diámetro. (Símb. π).
To bring this information into the scope of research, we introduce the parameter ED (encyclopedic data): DEF → ED.
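Because the encyclopedic data in these examples is set off by a parenthesized formula or symbol mark at the end of the definition, separating ED from DEF can be sketched with a single pattern. The function name `split_ed` is hypothetical, and the pattern is an assumption based only on the three examples above; it accepts the marks with or without their accents.

```python
import re

# Trailing "(Form. X)" / "(Smb. X)" block, accented or unaccented spelling of the mark.
ED_PATTERN = re.compile(r"\s*\((F[oó]rm|S[ií]?mb)\.\s*([^)]+)\)\.?\s*$")


def split_ed(text: str):
    """Split a definition text into (DEF, ED); ED is (mark, value) or None."""
    m = ED_PATTERN.search(text)
    if m is None:
        return text.strip(), None
    return text[:m.start()].strip(), (m.group(1), m.group(2).strip())


definition, ed = split_ed(
    "Unidad de frecuencia del sistema internacional, "
    "que equivale a 1 ciclo por segundo. (Smb. Hz)."
)
print(definition)
print(ed)
```

Storing the mark together with the value keeps formulas and symbols distinguishable in later queries.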
Almost all texts that serve the dictionary should be structured in such a way that they can be used in the database. The dictionary contains a list of abbreviations (part of the dictionary metalanguage), which we divide into the following categories: 1) grammatical, 2) stylistic, 3) evaluative-aesthetic, 4) pragmatic, 5) statistical, 6) industry-specific, 7) geographic, 8) language names (in the etymological zone), 9) language processes (in the etymological zone), 10) additional remarks for the etymological zone, 11) remarks for Latin cases, 12) service remarks in the left side, and 13) service remarks in the right side. These «words» will be used to form search patterns and to identify structural units. Let us consider the contents of the selected categories in detail.
Grammatical remarks include abbreviations indicating the part of speech (adj., adv., art.) and grammatical categories (def., m., f., advers.) of the headword:
adj. | adjetivo | adjective
adv. | adverbio; adverbial | adverb; adverbial
advers. | adversativo | adversative
art. | artículo | article
def. | definido | definite
f. | femenino | feminine (gender)
m. | masculino | masculine (gender)
Stylistic remarks point out the stylistic features of headword usage in both oral and written speech. They indicate style (coloq., cult., cient.) and social groups (estud., infant.):
cient. | científico | scientific
coloq. | coloquial | colloquial
cult. | culto | bookish
infant. | infantil | children's language
Remarks indicating the evaluative-aesthetic and pragmatic properties of the headword form a small group. There are three such remarks in each group:
- evaluative-aesthetic:
malson. | malsonante | rude
vulg. | vulgar | vulgar
eufem. | eufemismo | euphemism
- pragmatic:
irón. | irónico | ironic
despect. | despectivo | contemptuous
fest. | festivo | humorous
Industry-specific remarks denote the names of natural, humanitarian, social and technical sciences and the types of industrial and social activities to which register units or their meanings belong:
Acús. | acústica | acoustics
Cinem. | cinematografía | cinema
Econ. | economía | economics
Electr. | electricidad; electrónica | electrical engineering; electronics
Fon. | fonética; fonología | phonetics; phonology
Ling. | lingüística | linguistics
Sociol. | sociología | sociology
Transp. | transportes | transport
Geographical remarks mark the area in which a word or one of its meanings is used. In other words, they indicate that the word or its meaning belongs to the linguistic baggage of another society living in a particular territory (province, city, Spanish-speaking country, geographical area):
And. | Andalucía | Andalusia (province)
Cat. | Cataluña | Catalonia (province)
Cád. | Cádiz | Cádiz (city)
Vall. | Valladolid | Valladolid (city)
Arg. | Argentina | Argentina (country)
Am. Cent. | América Central | Central America (region)
Language remarks give the names of the languages (the parent language and/or an intermediary language) that are the sources of lemmas; they are used only in the etymological zone:
aim. | aimara | Aymara
arag. | aragonés | Aragonese
celtolat. | celtolatino | Celto-Latin
Forming method is a category that includes remarks describing the linguistic processes that resulted in the formation of the headword:
acort. | acortamiento | shortening
abrev. | abreviación | abbreviation
afér. | aféresis | apheresis
Additional remarks for the etymological zone do not contain linguistic information proper; they serve only as indicators of the headword's origin:
desc. | desconocido | unknown
disc. | discutido | disputed
m. or. | mismo origen | the same origin
Remarks indicating Latin cases are used when the lexicographers consider it necessary to give the form of the etymon in one of the six cases:
abl. | ablativo | ablative
acus. | acusativo | accusative
dat. | dativo | dative
genit. | genitivo | genitive
vocat. | vocativo | vocative
Service remarks in the left side function primarily as formal markers of the information elements of the left side of the dictionary entry. They carry no linguistic information of their own:
conjug. | conjugación | conjugation
escr. | escrito | written
may. | mayúscula | capital letter
reg. | regular | regular (inflection)
Tb. | también | also (indicates the presence of a doublet)
The same can be said about service remarks in the right side. In addition, they are used to construct additional comments on the definitions:
s. | sustantivo | noun
sent. | sentido | meaning
Símb. | símbolo | symbol
[u.] t. | [usado] también | also used
Each abbreviation is assigned a code (TABBR, type of abbreviation) in accordance with the categories above; this code determines not only the type of the abbreviation but also the structural element of the text in which it is used.
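A TABBR table of this kind can be sketched as a simple lookup. The category numbers follow the thirteen categories listed earlier; the assignment of each category to a structural element (for example, geographic remarks to REM-REG) and the names `Tabbr`, `TABBR` and `abbreviations_in` are assumptions made for illustration only.

```python
from typing import NamedTuple


class Tabbr(NamedTuple):
    category: int    # 1-13, numbering of the categories listed in the text
    element: str     # structural element where the abbreviation is expected (assumed)
    expansion: str


# A small illustrative subset of the abbreviation list.
TABBR = {
    "adj.": Tabbr(1, "REM-GR", "adjetivo"),
    "coloq.": Tabbr(2, "REM-ST", "coloquial"),
    "irón.": Tabbr(4, "REM-PR", "irónico"),
    "Acús.": Tabbr(6, "REM-SF", "acústica"),
    "Arg.": Tabbr(7, "REM-REG", "Argentina"),
    "acus.": Tabbr(11, "ETYM", "acusativo"),
    "Tb.": Tabbr(12, "DUPL", "también"),
}


def abbreviations_in(element: str) -> list:
    """List the abbreviations that may occur in a given structural element."""
    return sorted(abbr for abbr, t in TABBR.items() if t.element == element)


# Similar-looking marks resolve to different structural elements.
print(TABBR["Acús."].element, TABBR["acus."].element)
print(abbreviations_in("ETYM"))
```

The pair of marks for acoustics and for the accusative shows why the code has to carry the structural element: a search pattern built from the abbreviation alone would otherwise confuse a subject-field remark with an etymological one.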
Conclusions
One of the main tasks of modern computer lexicography is to update and maintain fundamental lexicons (large paper explanatory dictionaries) in a digital environment. Computer lexicography solves this task by combining many years of traditional lexicographic experience with the latest computer technologies. Virtual lexicographic laboratories (VLLs) are the result of this combination: systems that support both work with dictionary material and a range of linguistic studies.
To date, the Ukrainian Lingua-Information Fund (ULIF) has developed a set of such VLLs, in particular for the explanatory dictionary of the modern Ukrainian language (20 volumes) and for grammatical, synonym, antonym and etymological dictionaries. Similar laboratories have been created for explanatory dictionaries of Russian and Turkish. All these systems are working tools; they are hosted on the web site of the Ukrainian Lingua-Information Fund https://lcorp.ulif.org.ua/, where they operate in corporate mode. Equally important is the creation of a VLL for the Spanish explanatory dictionary (Diccionario de la lengua española. Edición del Tricentenario). Its importance also stems from the need to further develop and test the theoretical base developed by ULIF on an explanatory dictionary built on other principles. In addition, the developments outlined in our work form the basis for tools that, in turn, allow users to create their own tools for solving linguistic problems on the basis of the explanatory dictionary.
Proceeding from the above, the purpose of this work is to consider some questions of developing linguistic tools for VLL DLE 23, which is currently in the technical implementation phase. To do this, it is necessary: 1) to identify the linguistic features of the language units represented in DLE 23; 2) to analyze its p-structures and the o-bonds between them according to the methodology of the theory of lexicographic systems; 3) to describe the linguistic tools and identify the research tasks they should support. The object of the investigation is DLE 23, and the subject is the creation of linguistic tools that provide a wide range of opportunities for studying grammatical, semantic, pragmatic and other features of Spanish linguistic units. Unlike digital dictionaries, including online ones, a VLL offers a software interface for:
access administration functions: user authorization and identification; adding and removing users; access control (read-only, or reading and editing of the dictionary);
lexicographic work: editing dictionary entries; creating derivative dictionaries on the basis of the explanatory dictionary; representing dictionary entries in any format;
research work: research at a particular language level represented in the explanatory dictionary (grammar, including derivation; lexis, including semantics; pragmatics); research at the interface of language levels: grammar and semantics, word formation and semantics, semantics and pragmatics, etc.