The active learner's construction-combinatory thesaurus: user-driven principles of compiling (a cognitive linguistic approach)

The article discusses the design of a new type of dictionary, the Active Learner's Construction-Combinatory Thesaurus, intended for adult learners of a second (foreign) language. The article proposes a cognitive profile of the dictionary's target user.

Date added: 23.06.2022

Svitlana Zhabotynska

(Bohdan Khmelnitsky National University of Cherkasy, Cherkasy, Ukraine)

Yevhenii Plakhotniuk

(Kyiv National Linguistic University, Cherkasy, Ukraine)

S. Zhabotynska, Ye. Plakhotniuk. The Active Learner's Construction-Combinatory Thesaurus: user-driven principles of compiling (a cognitive linguistic approach). This article discusses the design of a new type of dictionary, the Active Learner's Construction-Combinatory Thesaurus (ALCCT), intended for adult learners of a second (foreign) language. The ALCCT is an ideographic dictionary where phrases, understood as instantiations of constructions, are arranged in accordance with the cognitive ontology of a particular conceptual thematic field. As such, the ALCCT is a project compatible with cognitive lexicography, a contemporary branch of dictionary-making that adopts the findings of cognitive science, cognitive linguistics in particular. The article proposes a cognitive profile of the dictionary's target user and makes it the point of departure in elaborating the principles of compiling the ALCCT. They are defined as the principles of data selection, arrangement, and application. Data selection concerns the thematic and formal coherence of the data, their authenticity, and their prominence, or frequency. Data arrangement implies their relational coherence and their elaboration. Relational coherence is realized through the lexicographic code, or the dictionary's overall design, which develops at three hierarchical levels: those of macrostructure (a conceptual ontology of the theme), mediostructure (the key words evolving into phrasal sets), and microstructure (description of phrasal lemmas). Elaboration of the data is provided via the overarching structure, which is mapped onto the three hierarchical structures of the lexicographic code and is concerned with etymological, cultural, grammatical, and phraseological (metaphorical) extensions. Data application reaches out to the communicative situations in which the ALCCT's resources can be used.

Key words: cognitive lexicography, Active Learner's Construction-Combinatory Thesaurus (ALCCT), user's cognitive profile, principles of compiling, data selection, data arrangement, data application.

S. Zhabotynska, Ye. Plakhotniuk. The Active Learner's Construction-Combinatory Thesaurus: principles of compiling with regard to the user (a cognitive linguistic approach). The article proposes a project for a new type of dictionary, the Active Learner's Construction-Combinatory Thesaurus (ALCCT), intended for adult users who study a foreign language. The ALCCT is an ideographic dictionary in which word combinations, treated as instantiations of constructions, are arranged on the basis of the cognitive ontology of a particular thematically defined conceptual field. The ALCCT is thus a project consistent with cognitive lexicography, a contemporary branch of dictionary-making that draws on the achievements of cognitive science, cognitive linguistics in particular. In the article, the potential dictionary user is presented through a cognitive profile that motivates the principles of compiling the ALCCT. These are defined as the principles of data selection, arrangement, and application. Data selection takes into account their thematic and formal coherence, authenticity, and prominence, or frequency. Data arrangement presupposes their relational coherence and their elaboration. Relational coherence is embodied in the lexicographic code, the integral hierarchical design of the dictionary represented by the macrostructure (a conceptual ontology of the theme), the mediostructure (key words within phrasal sets), and the microstructure (the description of a phrasal lemma). Elaboration of the data involves an overarching structure that is mapped onto the three hierarchical structures of the lexicographic code and provides etymological, cultural, grammatical, and phraseological (metaphorical) extension of the dictionary material. The principle of data application relates to the communicative situations that require the linguistic support provided in the ALCCT.

Key words: cognitive lexicography, Active Learner's Construction-Combinatory Thesaurus (ALCCT), principles of compiling, data selection, data arrangement, data application.

S. Zhabotynska, Ye. Plakhotniuk. The Active Learner's Construction-Combinatory Thesaurus: principles of compiling with the user in view (a cognitive linguistic approach). The article proposes a project for a new type of dictionary, the Active Learner's Construction-Combinatory Thesaurus (ALCCT), intended for adult users who study a foreign language. The ALCCT is an ideographic dictionary in which word combinations, interpreted as instantiations of constructions, are ordered on the basis of the cognitive ontology of a particular thematically defined conceptual field. The ALCCT project thereby agrees with cognitive lexicography, a contemporary branch of dictionary compiling that uses the findings of cognitive science, cognitive linguistics in particular. In the article, the potential dictionary user is presented through a cognitive profile that motivates the principles of compiling the ALCCT. These are defined as the principles of data selection, arrangement, and use. Data selection takes into account their thematic and formal coherence, authenticity, and prominence, or frequency. Data arrangement provides for their relational coherence and their elaboration. Relational coherence is realized in the lexicographic code, the integral hierarchical design of the dictionary represented by the macrostructure (a conceptual ontology of the theme), the mediostructure (key words within phrasal sets), and the microstructure (the description of a phrasal lemma). Elaboration of the data is carried out by means of an overarching structure that is projected onto the three hierarchical structures of the lexicographic code and provides etymological, cultural, grammatical, and phraseological (metaphorical) extension of the dictionary material. The principle of data use presupposes an outlet into the communicative situations that require the linguistic support presented in the ALCCT.

Key words: cognitive lexicography, Active Learner's Construction-Combinatory Thesaurus (ALCCT), principles of compiling, data selection, data arrangement, data use.


The anthropocentric orientation of present-day linguistics is echoed in the 'user perspective', or 'user-driven' approach, in theoretical and practical lexicography (Tarp, 2008; 2011). This approach stresses the need to make the dictionary more 'user-friendly', which is achieved by devising a 'target user profile' that guides the dictionary design (Tarp, 2008). A 'user-driven' approach is particularly important for compiling bilingual dictionaries intended for learners of a second / foreign language (L2). The available 'profiles' of dictionary users are mostly concerned with the “common sense” that users employ in decoding a dictionary entry (Tarp, 2008, pp. 41-43, 82-85). The respective strategies of compiling dictionaries tend to introduce modest amendments to the text format or to simplify the definiens. Most such L2 dictionaries, consulted only sporadically, are de facto passive, i.e. alphabetically structured and focused on the systemic properties of linguistic expressions. The user's cognitive capacities regularly employed in L2 acquisition and speech production remain under-addressed, since traditionally oriented dictionary-makers do not consider “mental processes in the brain” to be a matter of lexicography (Tarp, 2008, p. 132). Meanwhile, lexicography may benefit from the ideas of cognitive linguistics, which dovetails mental and linguistic phenomena and may contribute to developing active dictionaries (Apresyan, 2010; Fuentes-Olivera & Bergenholtz, 2018) that aim to give practical assistance in L2 learning and teaching.

This article discusses the principles of compiling a particular kind of active dictionary, the Active Learner's Construction-Combinatory Thesaurus (ALCCT) (Plakhotniuk, 2020b), which is an updated version of the combinatory thesaurus grounded on a conceptual ontology (Zhabotynska, 2010; 2019). The design of the ALCCT is in line with the emergent field of cognitive lexicography (Ostermann, 2015), which bridges the theory and practice of dictionary-making with cognitive linguistics and the broader field of cognitive science. In cognitive lexicography, the dictionary is expected to represent schematic patterns of cognition that are tracked in various linguistic data and are thus relevant for language acquisition and speech production.

The discussion below focuses on a cognitive profile of the ALCCT user, a cognitive linguistic background of the dictionary design, the ALCCT's semiotic interpretation that provides a systemic approach to the user's needs, and the principles of compiling the ALCCT that satisfy these needs. The concluding discussion outlines the theoretical and practical implications of this study.

Cognitive profile of the ALCCT user

A 'user-friendly' dictionary should be compiled with regard to the cognitive capacities that the user employs in processing linguistic information. Since the ALCCT is addressed to adult learners, these capacities are the ones inherent in the adult mind / brain. Its properties, explored by different branches of cognitive science, define the ALCCT's general objectives. In our research, these objectives are further specified in a pilot survey on the needs and expectations of the ALCCT's potential adult users.

Neurolinguistics argues that language acquisition depends on both nature and nurture, i.e. language evolves at the intersection of biological and societal factors. Biological factors relate to the language faculty existing in the human mind, and societal factors are represented by the linguistic environment which activates this faculty (Zhabotynska, 2020, pp. 102-103). The language faculty, as a natural endowment, can be properly activated only at a particular age. For L1, this age (up to 8-12) is called the critical period, because a child not exposed to any language during this time will not be able to achieve adequate proficiency in speaking and thinking. For L2, the same period is defined as sensitive, because L2, similarly to L1, is acquired unconsciously, with the assistance of procedural memory. The period after the age of 8-12 is called post-sensitive. In this period, L2 is learned consciously, with the assistance of declarative memory (Lenneberg, 1967; McWhinney, 2005; Paradis, 2005; Zhabotynska, 2020, p. 103). As the ALCCT is intended for adults who learn L2 in the post-sensitive period, a profile of the ALCCT's user may incorporate the findings of cognitive and andragogic research on the workings of the brain / mind in adulthood.

According to andragogic studies, adult learners differ from children in such aspects as previous experience, internal motivation, the need to fulfil social roles, awareness of their ongoing rational cognitive activities, and immediate application of new knowledge (Knowles, 1984). Studies of the post-sensitive period in language acquisition, as well as those concerned with the ageing brain and adult learning, provide evidence for the neurocognitive basis of this difference (for a review, see Zhabotynska & Plakhotniuk, 2016). The processes involved in adult learning are automated due to lateralization, myelination, development of the prefrontal cortex (the age of 14-21) and of the default neural network, particularly in the medial prefrontal cortex (the age of 21-31). Adult learning becomes more efficient in terms of neural connectivity (Fair et al., 2008, p. 4030) and conscious conceptualization, or abstract thinking, as well as retrospective and creative use of information (Fair et al., 2008, pp. 4028-4029). Learning per se changes the language-related areas of the adult brain both functionally and structurally (Martensson et al., 2012). Comprehensive, enriched and meaningful input that stimulates learning-based neuroplasticity in adulthood seems to play a crucial role, which is emphasized throughout the literature (Caine & Caine, 1994, pp. 30-33; Valipour & Asl, 2014).

Accordingly, verbal and non-verbal mental representations are not chaotic. Instead, researchers report stable patterns of self-organization of information at the conceptual and linguistic levels, i.e. embeddedness and interconnectedness (Caine & Caine, 1994, p. 39), which provides evidence for a certain degree of iconic motivation between the external / formal linguistic patterns and the internal / conceptual patterns (Perniss, Thompson, & Vigliocco, 2010). For instance, neural activations triggered by listening to audio-texts reveal semantic grouping throughout the cerebral cortex. Researchers associate such grouping with a certain interdependence between symbolic representations and bodily (perceptual) experience in mental schemata (Huth, de Heer, Griffiths, Theunissen, & Gallant, 2016). Hence, information available in the ALCCT intended for adult learners should be provided in a systematic, structured, integrative and pragmatically driven way that is isomorphic to the way in which the adult brain / mind processes linguistic and conceptual information.

The results of our pilot survey on the expectations of potential adult users of the ALCCT are compatible with the conclusions of cognitive and andragogic studies on the workings of the adult brain / mind. The interviewed participants turned out to favor (i) a thematic arrangement of the dictionary instead of an alphabetical one, (ii) exposure to the key words of the theme, to their synonyms, and to the phrases in which they are used, (iii) the presence of syntactic patterns according to which these phrases can be transformed, and (iv) the availability of instruction on combining the phrases into sentences that make up a text applicable in communication (Plakhotniuk, 2020a).

The above preferences of adult users are reflected in the ALCCT as a dictionary type: (a) it is an ideographic (onomasiological) thesaurus featuring a thematically homogeneous conceptual field; (b) it is a combinatory thesaurus: its units (lemmas) are phrases / word-combinations with the key words of the thematic field; (c) it is a construction-combinatory thesaurus: it provides the patterns of phrases as constitutive elements of sentences; (d) it is an active learner's thesaurus: its design actively assists the learners in L2 acquisition and speech production (Plakhotniuk, 2020b). The ALCCT differs from the existing lexicographic projects of active dictionaries (see the overview in Fuentes-Olivera & Bergenholtz, 2018). The ALCCT is to meet the user's primary, secondary and tertiary needs (Tarp, 2011, p. 283), so named with regard to the order in which they are satisfied. The primary needs are concerned with the type of lexicographic data. The secondary needs are associated with the appropriateness of the dictionary's design, which has to be compatible with the user's neuro-cognitive profile, thus facilitating L2 acquisition at the lexical and syntactic levels. The tertiary needs imply the assistance of this dictionary in speech production, or in developing thematically relevant texts and communicative skills.

Compiling the ALCCT according to principles consistent with the ways in which the user's mind / brain processes linguistic and non-linguistic information presents a significant challenge for dictionary-makers. An attempt to meet this challenge is made in the cognitive linguistic conception termed Semantics of Lingual Networks (see the recent version in Zhabotynska, 2018), which underpins the combinatory thesaurus grounded on a conceptual ontology (Zhabotynska, 2010, 2015, 2019; Brovchenko, 2011; Radchenko, 2012, 2019). The ALCCT, which is an updated version of this thesaurus, has the same theoretical background, which is briefly described below.

Cognitive linguistic background of the ALCCT

Semantics of Lingual Networks (SLN) has six theoretical statements based on the analysis of various linguistic phenomena. Four of these statements are immediately relevant for compiling the ALCCT (their further description corresponds to Zhabotynska, 2018).

Conceptual structures that arrange the meanings of linguistic expressions are constituted by basic propositional schemas (BPS) which represent the most abstract conceptual categories and their relations. The BPSs are thematically grouped into five types: being, action, possession, identification, and comparison schemas.

- Being schemas include the quantitative (X is THAT MANY-Qn), qualitative (X is SUCH-Ql), locative (X exists THERE / LC-locative), temporative (X exists THEN / TM-temporative), and mode of being (X exists SO / MD-mode) schemas.

- Action schemas comprise the state/process (AG-agent acts), contact (AG-agent acts upon PT-patient / AF-affected), and causation (CR-causer makes FT-factitive) schemas.

- Possession schemas are represented by the part-whole (WH-whole has PR-part), inclusive (CR-container has CT-content / CT-content has CR-container), and ownership (OW-owner has OD-owned / OD-owned has OW-owner) schemas.

- Identification schemas are particularised as the classification (ID-identified = individual or kind is CL-classifier = kind or type), characterization (ID-identified = individual is CH-characteriser), and personification (ID-identified = individual is PS-personifier = a proper name) schemas. In English, CL is manifested with the indefinite article, and CH with the definite article.

- Comparison schemas include the identity / metamorphosis (CV-comparative is [as] MS-correlate = another category of the same entity), similarity / analogy (CV-comparative is as AN-correlate = an entity from the same category), and likeness / metaphor (CV-comparative is as if MT-correlate = an entity from a different category) schemas.

The BPSs may get extension with additional argument roles: SC-circumstant (attendant, aid, counter-agent, instrument, mediator, means, and mode), ST-stimulus (cause and goal), PQ-prerequisite (condition and concession), RC-recipient (addressor, benefactor, and malefactor), LC-locative, and TM-temporative.

The BPSs integrate into an operational network employed in processing information about the objects of the experienced world. The number of BPSs is limited, but, arranged in various configurations, they structure an unlimited number of conceptual networks.
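The inventory of schemas described above can be read as a small lookup table. The Python sketch below is illustrative only: the dictionary layout and the names `BPS_CATALOG`, `schemas_of` and `total_schemas` are assumptions of this sketch rather than part of SLN, while the schema labels and formulas are copied from the list above.

```python
# Illustrative catalog of basic propositional schemas (BPSs) grouped by type.
# Labels and formulas follow the text; the data layout is an assumption.
BPS_CATALOG = {
    "being": {
        "quantitative": "X is THAT MANY",
        "qualitative": "X is SUCH",
        "locative": "X exists THERE",
        "temporative": "X exists THEN",
        "mode of being": "X exists SO",
    },
    "action": {
        "state/process": "AG acts",
        "contact": "AG acts upon PT / AF",
        "causation": "CR makes FT",
    },
    "possession": {
        "part-whole": "WH has PR",
        "inclusive": "CR has CT / CT has CR",
        "ownership": "OW has OD / OD has OW",
    },
    "identification": {
        "classification": "ID is CL",
        "characterization": "ID is CH",
        "personification": "ID is PS",
    },
    "comparison": {
        "identity / metamorphosis": "CV is [as] MS",
        "similarity / analogy": "CV is as AN",
        "likeness / metaphor": "CV is as if MT",
    },
}


def schemas_of(schema_type):
    """Return the alphabetically sorted schema names of one BPS type."""
    return sorted(BPS_CATALOG[schema_type])


def total_schemas():
    """Count the schemas across all five types."""
    return sum(len(group) for group in BPS_CATALOG.values())
```

Such a table only records the limited stock of schemas; the unlimited variety of conceptual networks arises from the ways in which these schemas are configured.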

A conceptual network may be transformed into a conceptual matrix, if the links between its nodes remain implicit.

A conceptual network or matrix may be built at one level or several levels. In the latter case, the information evolves in depth, being structured as 'networks-in-networks' or 'matrixes-in-matrixes'. The hierarchical conceptual levels are: a conceptual thematic field (all the information that is structured), the domains (focuses of the thematic field), parcels (focuses of the domains), and concepts, which constitute parcels and are structured as sets of properties. At all conceptual levels, the networks or matrixes are built with the BPSs, which thus exhibit fractal properties. Thematically coherent information arranged with a conceptual network or matrix is defined as a conceptual ontology.
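The four conceptual levels can be pictured as nested containers. The following Python sketch is a simplified illustration under stated assumptions: the class names, the `concept_paths` helper, and the sample domain and parcel names (INSTITUTION, TYPE) are hypothetical, and the BPS links between nodes are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    name: str
    properties: list = field(default_factory=list)  # a concept is a set of properties


@dataclass
class Parcel:
    name: str
    concepts: list = field(default_factory=list)


@dataclass
class Domain:
    name: str
    parcels: list = field(default_factory=list)


@dataclass
class ThematicField:
    name: str
    domains: list = field(default_factory=list)

    def concept_paths(self):
        """List every concept as a (field, domain, parcel, concept) path."""
        return [
            (self.name, domain.name, parcel.name, concept.name)
            for domain in self.domains
            for parcel in domain.parcels
            for concept in parcel.concepts
        ]


# Hypothetical fragment of a SCHOOL ontology (domain and parcel names invented).
school_field = ThematicField(
    name="SCHOOL",
    domains=[
        Domain(
            name="INSTITUTION",
            parcels=[
                Parcel(
                    name="TYPE",
                    concepts=[Concept("school"), Concept("gymnasium")],
                )
            ],
        )
    ],
)
```

Calling `school_field.concept_paths()` walks the hierarchy from the thematic field down to the individual concepts.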

The BPSs and their clusters are the meanings of syntactic constructions. In construction grammar, constructions are interpreted as schematic, or generalized, linguistic forms that have their own schematic meanings existing independently of the words that fill out these forms (Goldberg, 1995; Ostman & Fried, 2004; Lyngfelt, Borin, Ohara, & Torrent, 2018, among others). The constructions whose meanings are represented by the BPSs are employed for both categorization and re-categorization of linguistic information. In the case of re-categorization, the schematic meaning rendered by a BPS is manifested not by its own schematic form, but by the schematic form of some other BPS. For instance, the schematic form NP2 of NP1, the inherent meanings of which are represented by the possessive BPSs (the page of a book, students of the group, a car of this owner), can be used to explicate other BPSs: qualitative (beauty of the girl, a girl of beauty), contact (invitation of the student), classification (a game of chess), likeness / metaphor (a devil of a boy), etc. In this case, the non-possessive propositional schemas are re-formatted as possessive, and their blended meaning integrates into the semiotic (syntactic) category of possession.

The above theoretical statements pairing linguistic and conceptual structures are relevant for developing user-driven principles of compiling the ALCCT. The system of these principles is prompted by the ALCCT's semiotic interpretation.

Semiotic interpretation of the ALCCT

Any dictionary is a text. Since the text can be viewed as a 'macrosign' (Vorobyova, 1993, p. 41), it agrees with the semiotic definition of a sign as the unity of a material form, the meaning which it evokes in the mind, and the function which it performs (Fig. 1).

Figure 1. Semiotic interpretation of the ALCCT

In the dictionary, the MATERIAL FORM is the lexicographic object, or the data represented by particular linguistic expressions. In the ALCCT, they are phrases with the key words (nouns) that feature a particular conceptual thematic field identified as the dictionary's MEANING. In the ALCCT, which is an ideographic dictionary, the signified thematic field is arranged in accordance with a conceptual ontology that becomes a lexicographic structure providing a thematic and structural arrangement of the phrasal data. The latter obtain a lexicographic description that has its own design. Together, the lexicographic structure and lexicographic description make up a lexicographic code which is pivotal for the compiler. Lexicographic code is a system of methods employed for processing the lexicographic data of L2. The ALCCT's FUNCTION is assistance to users in L2 acquisition and speech production. The dictionary's multimodal (multisemiotic) text can be presented on a paper or digital carrier.

The ALCCT's semiotic interpretation prompts the system of principles applied in the dictionary design. These principles will be specified below.

Principles of compiling the ALCCT

Compiling the ALCCT with regard to its semiotic aspects is guided by (a) the principles of data selection (concerned with the lexicographic object), (b) the principles of data arrangement (concerned with the lexicographic code), and (c) the principles of data application (concerned with the dictionary purpose).

The principles of data selection are represented by their thematic and formal coherence, their authenticity, and their prominence.

Thematic coherence of the data means that they name a particular thematic field relevant for everyday or professional communication (e.g. SCHOOL, TRAVELLING, AIRPORT, COURT, MARKETING, etc.). The data include not only the key words (nouns) of the thematic domain, but also the synonyms of these words. Each key word evolves into a set of phrases that specify the schematic content of the BPSs and their extensions. Formal coherence of the data means that the phrases belong to particular structural types (e.g. Adj N1: prestigious school; N2 N1: boy school; N1 N2: school teacher; Prep N1: at school; N1 V N2: school admits students; N2 V N1: students attend school, etc.) that are formal correspondences of particular BPSs and their extensions. Thematic and formal coherence of the dictionary's data agree with what text linguistics calls “referential coherence of the text”, or the continuing reference to the same entities figuring in the text (Dirven & Verspoor, 2004, p. 186).
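Formal coherence amounts to sorting the phrases of a key word by structural type. A minimal Python sketch of such a grouping is given below; the pattern labels and phrases are those cited above, while the function name `group_by_pattern` and the list-of-pairs input format are assumptions of this sketch.

```python
from collections import defaultdict

# (structural type, phrase) pairs copied from the examples in the text.
PHRASES = [
    ("Adj N1", "prestigious school"),
    ("N2 N1", "boy school"),
    ("N1 N2", "school teacher"),
    ("Prep N1", "at school"),
    ("N1 V N2", "school admits students"),
    ("N2 V N1", "students attend school"),
]


def group_by_pattern(pairs):
    """Collect phrases under their structural type, keeping input order."""
    grouped = defaultdict(list)
    for pattern, phrase in pairs:
        grouped[pattern].append(phrase)
    return dict(grouped)
```

In the sample data each structural type is represented by a single phrase; a real phrasal set would hold many phrases per type.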

Authenticity of the data means that the thematically and structurally coherent phrases are retrieved from a corpus of authentic L2 texts describing a particular theme, or topic. Thus, the ALCCT, which features language used in speech, represents a “usage-based model of language” bridging linguistic competence and performance (Tomasello, 2003; Boyland, 2009). In learning L2, authenticity is of particular importance, since the combinability of words in phrases tends to be language-specific. That is, a phrase in L2 may not be a word-for-word translation of its L1 counterpart, a fact which L2 learners tend to ignore.

Prominence of the data means that their retrieval from a specialized corpus considers a frequency factor that defines the learning priorities. The existing research shows that nearly 10% of the total lexicon is composed of words that are most frequent, deeply entrenched, and applicable in defining the rest of the word-stock. This part of the lexicon, first acquired in ontogenesis, becomes the so-called “minimum grounding set” (Vincent-Lamarre et al., 2016, pp. 636-637). Accounting for frequency effects, prototypicality and associative connections between lemmas, as well as the use of highly productive lexical and syntactic patterns, might enhance L2 acquisition by adults (Tomasello, 2003; Frost, Siegelman, Narkiss, & Afek, 2013). The ALCCT defines three frequency groups of phrasal data (differentiated with colors) which correspond to the users' proficiency levels in L2 within a particular theme and facilitate the choice of learning priorities (cf. Frost, Siegelman, Narkiss, & Afek, 2013). Reference to data frequency allows the users to prioritize their lexical and syntactic choices, protects them from information overload, and makes prototypical L2 expressions visible.
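How the three frequency groups are delimited is not specified here, so the Python sketch below rests on assumptions: it splits the phrases into three equally sized groups by frequency rank, the labels "high", "medium" and "low" stand in for the colors, and the corpus counts are invented for illustration.

```python
def frequency_groups(counts):
    """Assign each phrase to one of three groups by descending frequency rank.

    Equal-sized rank groups are an assumption of this sketch, not the
    ALCCT's documented procedure.
    """
    # Sort by descending count; ties are broken alphabetically.
    ranked = sorted(counts, key=lambda phrase: (-counts[phrase], phrase))
    size = -(-len(ranked) // 3)  # ceiling division: phrases per group
    labels = ["high", "medium", "low"]
    return {
        phrase: labels[min(index // size, 2)]
        for index, phrase in enumerate(ranked)
    }


# Invented corpus counts for phrases cited in the text.
sample_counts = {
    "at school": 90,
    "school teacher": 80,
    "students attend school": 40,
    "prestigious school": 35,
    "boy school": 5,
    "reputable school": 3,
}
sample_groups = frequency_groups(sample_counts)
```

With these invented counts, the two most frequent phrases fall into the "high" group and the two least frequent into the "low" group.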

The principles of data arrangement include their relational coherence, and their elaboration.

Relational coherence of the data agrees with what text linguistics defines as “relational coherence of the text”, or comprehensive links between its referents (Dirven & Verspoor, 2004, p. 189). In the ALCCT, relational coherence provides conceptual and linguistic congruence of the lexicographic code, or the dictionary's overall design. This design develops at three major hierarchical levels: those of macrostructure, mediostructure, and microstructure. The first two correspond to the 'lexicographic structure' per se, while the third level corresponds to the 'lexicographic description'.

The ALCCT's macrostructure demonstrates the arrangement of the entire conceptual thematic field signified by the lexicographic data. Here, the key concepts are linked within parcels, and the latter are linked within domains which constitute the thematic field (Figure 2). The relations between the concepts within a parcel, between the parcels within a domain, and between the domains within the thematic field are represented by BPSs that comply with the structured content. Hence, the ALCCT's macrostructure is a networks-in-the-network ontology that may be converted into a matrixes-in-the-matrix or networks-in-the-matrix ontology. A conceptual ontology that arranges the ALCCT's data provides a natural correspondence between the linguistic and conceptual fields. As Caine and Caine (1994) note, no lexicographic text is a self-sufficient source of meaning in itself; it is rather a form that is meant to activate and foster a definite knowledge structure. That is why processing of information in an active dictionary should be meaningful, or addressed to an inherently meaning-tuned user (Caine & Caine, 1994, pp. 100-101). The ALCCT's macrostructure meets this requirement, which is relevant for an ideographic (onomasiological) dictionary that demonstrates the 'meaning -> form' perspective.

Figure 2. Macrostructure of the ALCCT (Zhabotynska, 2010, p. 81)

The ALCCT's mediostructure, iterated throughout the dictionary, arranges information about the key concepts as constituents of parcels in the macrostructure of the thematic field. The key concepts may exhibit variations that are linguistically captured by synonyms. In a synonymous group, the key word (a lexical lemma) that names the key concept and the most frequent synonym(s) of the key word develop into phrases (phrasal lemmas). The two types of lemmas have their own patterns of representation. Lexical lemmas are described with regard to differential senses in the meanings of synonyms. Such senses are distinguished via the properties registered in the BPSs (e.g. contact BPS + MD-mode 'X teaches WHAT + HOW': school teaches all disciplines equally, whereas gymnasium teaches selected disciplines in depth). Phrasal lemmas are arranged in phrasal sets (Figure 3).

The structure of phrasal sets, being constitutive of the ALCCT's lexicographic code, accounts for its definition as a 'construction-combinatory thesaurus'. Here, word-combinations are considered as instantiations of constructions, or abstract syntactic forms that have their own schematic meaning.

Figure 3. Mediostructure of the ALCCT: arrangement of a phrasal set (Zhabotynska, 2019, p. 21)

In the ALCCT, a phrasal set has its thematic tuning, i.e. the phrases subsumed by a particular construction are thematically grouped. For instance, the phrases which instantiate the qualitative BPS with the key word SCHOOL (SCHOOL is SUCH > SUCH SCHOOL) further split into those where the logical predicate represents (1) the students' age, (2) the taught subjects and the student body, (3) the students' sex, (4) the way of funding, (5) evaluation, etc. (Figure 4).

Figure 4. Tuning of phrasal sets in the ALCCT (a fragment of the phrasal set in Zhabotynska 2019, p. 23)

A tuned phrasal set arranged around a key word creates a construction-combinatory portrait of this word. Since such a portrait retains multiple instantiations of the same constructions as schematic form-meaning correspondences, it may foster the required automation of rote memorization and of learning based on pattern recognition and reproduction (see Frost, Siegelman, Narkiss, & Afek, 2013).

The ALCCT's microstructure arranges information in the dictionary entries that describe phrasal lemmas. This description includes (a) translation of the phrase into the native language of dictionary users, (b) examples of sentences with this phrase, and (c) transformations (TRF) of this phrase caused by re-categorization of the respective construction. For example, reputable school, TRF: school (that) has a good reputation, school with a good reputation. A problem arises here in weighing the prominence of the syntactic form inherent to a particular BPS against the prominence of the syntactic form(s) which this BPS has adopted due to re-categorization. An adopted syntactic form may become more entrenched, which is demonstrated by the frequency of its use in speech. In this case, the adopted syntactic form becomes the phrasal lemma, and the initial syntactic form is listed among the transformations of this lemma. For example, boy school, TRF: school teaches only boys, school for boys only.
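The lemma-selection rule described above (the more entrenched form heads the entry, the other forms are listed as transformations) can be sketched in Python as follows. The `PhrasalEntry` fields mirror components (a) to (c) of the description; the class and function names and the frequency counts are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PhrasalEntry:
    lemma: str
    translation: str = ""                                  # (a) native-language translation
    examples: list = field(default_factory=list)           # (b) example sentences
    transformations: list = field(default_factory=list)    # (c) TRF forms


def build_entry(form_counts, translation=""):
    """Make the most frequent form the lemma; list the rest as transformations."""
    if not form_counts:
        raise ValueError("at least one syntactic form is required")
    # Sort by descending frequency; ties are broken alphabetically.
    ranked = sorted(form_counts, key=lambda form: (-form_counts[form], form))
    return PhrasalEntry(
        lemma=ranked[0],
        translation=translation,
        transformations=ranked[1:],
    )


# Invented frequency counts for the forms cited in the text.
boy_school_entry = build_entry(
    {"boy school": 12, "school teaches only boys": 2, "school for boys only": 5}
)
```

With these invented counts, `boy school` heads the entry, and the two remaining forms are listed as its transformations in descending order of frequency.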

Elaboration of the data takes place at the fourth, additional level of the ALCCT's design. This level represents the overarching structure that maps onto the three core structures of the lexicographic code and provides additional information relevant for the constituents of these structures. The additional information for lexical lemmas concerns their etymology. Phrasal lemmas may require culture-specific comments (e.g. specific kinds of schools typical of Great Britain and the USA). Phrasal sets are supplied with a syntactic constructor containing guidelines on combining phrases into sentences of different types and different degrees of complexity (see the instances of exercises in Zhabotynska, 2015, pp. 50-52; 2019, p. 25; Plakhotniuk, 2015, pp. 59-70). Some domains in the conceptual ontology that structures the ALCCT's entire thematic field may be employed in conceptual metaphors, where they become either the metaphorical target (e.g. SCHOOL is as if X) or the metaphorical source (e.g. X is as if SCHOOL). The phraseological expressions, or idioms, brought under particular conceptual metaphors are represented in the 'metaphorical repository' included in the overarching structure of the ALCCT.

The principles of data application reach out to the communicative situations where the ALCCT's data can be used. The dictionary has a system of ‘Let's talk’ assignments targeted at individual phrasal sets and at clusters of phrasal sets (a) within one and the same parcel, (b) within one and the same domain, and (c) within several domains of the entire conceptual ontology of the theme. The ‘Let's talk’ assignments invite dictionary users to employ the expressions from one or several phrasal sets in various simulated communicative practices relevant for a thematically focused interaction. The communicative assignments are to be undertaken after the user's work with the syntactic constructor. The number of phrasal sets involved depends on the complexity of the communicative task. The latter may also extend into the field of creative writing, where the user may employ the ALCCT's metaphorical repository. To devise the communicative assignments, the ALCCT's compiler should consult experts in the field for which the thesaurus is intended.

The principles of compiling the ALCCT suggest multimodality of the resultant text: besides the verbal part, it has visual constituents, namely conceptual graphics and pictorial illustrations. Provided the text carrier is digital, the ALCCT may be supplied with videos and other Internet resources. Conceptual graphics are used to represent the ALCCT's ontology and to visualise the arrangement of phrasal sets, which is compatible with the formal arrangement of a sentence (see the examples in Zhabotynska, 2010, 2015, 2019; Brovchenko, 2011; Radchenko, 2012, 2019). Pictures may illustrate some lexical and phrasal lemmas, especially those that are culture-specific. Videos are of particular help for scaffolding the communicative situations. Converging evidence suggests that visual perception improves comprehension and boosts learning. Visualization of the inherent conceptual properties of linguistic expressions or extralinguistic objects stimulates associative memorization, helps in understanding complex ideas, and increases the mind's productivity and creativity (Hay, Kinchin, & Lygo-Baker, 2008; Li & Jeong, 2020, p. 2). Therefore, a multimodal format makes the ALCCT more ‘user-friendly’.

Concluding discussion

The ALCCT project is consistent with the contemporary theoretical conceptions of natural language generation, and practical approaches to L2 teaching and learning.

The models of natural language generation distinguish several aspects in ‘the speaker's blueprint’ (Levelt, 1998): (1) conceptual planning (CONCEPTUALIZER), i.e. conceptualizing the event and forming a preverbal message; (2) grammatical encoding (FORMULATOR), i.e. mapping the preverbal message onto the lexicon and syntactic structures; and (3) morpho-phonological and phonetic encoding (ARTICULATOR), i.e. formalization and verbalization of a linear message. Guhe (2003) views conceptualization as an incremental process that reduces the complexity of computation (pp. 31, 54) through parallel processing of information that involves: (a) construction / segmentation, i.e. mapping what is perceived onto concepts from long-term memory; (b) selection of the events that are to be verbalized (macroplanning); (c) linearization, i.e. ordering the selected events in a way appropriate to the goal of the discourse (macroplanning); and (d) generation / structuring of a preverbal message, i.e. mapping the conceptual representation that has been handled so far onto the semantic content that can interface with the linguistic formulator (microplanning) (p. 31). The information models at this stage involve semantically underspecified “referential nets” of incremental elements. The activation value assigned to each element determines its salience (p. 110).

Thus, the models of natural language generation emphasize the role of conceptualization, or processing the information that is to be manifested with linguistic expressions. Meanwhile, the ways in which this information is processed remain unspecified. In the ALCCT, information processing is effected via constructions that integrate pre-verbal conceptual schemas, or BPSs, with their formal manifestations, or syntactic schemas. Besides, the BPSs are involved in developing a conceptual ontology that arranges the total scope of information in the ALCCT. While constructions contribute to the exposure of linguistic information (HOW to say), a conceptual ontology is beneficial for the exposure of non-linguistic information (WHAT to say), which is especially important when the scope of information is substantial and/or the information is new to the learners. Cognitive studies argue that, in the brain / mind, information recall and the connection of working memory with long-term memory are based on associative, map-like activation (Caine & Caine, 1994, pp. 42-44). A conceptual ontology that arranges information in the ALCCT makes the associative activation structured and thus facilitates comprehension and memorization of the intended content. Hence, the theoretical framework employed in compiling the ALCCT may contribute to understanding the nature of relations between conceptual representations and their linguistic manifestations. Conversely, this framework may benefit from new findings in the field of natural language processing.

The ALCCT, which represents both content and language, agrees with CLIL as one of the most popular contemporary approaches in language teaching and learning. CLIL (Content and Language Integrated Learning) emerged in the USA and Europe around the 1990s as the continual teaching of curricular content through the medium of a foreign language, and of a foreign language through content (Cenoz, 2015, p. 12; Castellano-Risco, Alejo-Gonzalez & Piquer-Piriz, p. 6). Although the balance at any one time may vary, the assumption is that overall, a CLIL program will equally focus on content and language and will be referenced to both a foreign language and a content subject curriculum (Kiely, 2011). It is reasonable, therefore, to accept that the language aspect of a CLIL program will also be content driven, in that it will be generated from the specific needs of the particular subject taught and will assist students in better dealing with the requirements of the subject (Ioannou-Georgiou, 2012, pp. 498-499). Meanwhile, critical remarks address, on the one hand, insufficiencies in language teaching. These are caused by the lack of linguistic expertise among non-native language instructors, the relatively late age of the learners (Dalton-Puffer, 2011, pp. 183-184), and, in general, the lack of the systematic “content-driven language aspect of CLIL programs” that is assumed (Ioannou-Georgiou, 2012, pp. 498-499). On the other hand, critical remarks also address insufficiencies in content teaching. Scholars note that published materials intended for CLIL teachers sometimes water down the content subject and treat it in a FL-oriented manner. So, if specific guidelines are not given, CLIL risks becoming a time-consuming, ineffective, and frustrating experience (Ioannou-Georgiou, 2012, pp. 497-498).

Presumably, the ALCCT employed in CLIL may become the ‘missing link’ that balances language-and-content learning and teaching. The ALCCT provides phrasal coverage of particular coherent content which, being rich, demonstrates an algorithmic arrangement consonant with the mind's natural logic. Language teachers who are not fully knowledgeable in a specific professional domain may use the ALCCT as a source of structured subject-related data. Content teachers without an adequate linguistic background may use the ALCCT as a source of linguistic data (lexical, grammatical, and communicative) for teaching the language employed in their professional field. If the field has no ready-made ALCCT yet, the teacher can compile one using the principles set out in this article. Such a dictionary may provide linguistic scaffolding for a particular class or for the entire topic taught in L2.


