Spoken production assessment criteria for research university bachelor students: design and piloting

Speaking ability as an important component of language proficiency. The difference between spoken and written discourse. English for Academic Purposes in the context of higher education. Recommendations for developing speaking tests and criteria.

Category: Pedagogy
Type: diploma thesis
Language: English
Date added: 10.12.2019
File size: 1.2 MB


Introduction

Speaking ability is an important component of language proficiency. It plays a pivotal role in people's ability to communicate effectively. Speaking, being a complex phenomenon, is often seen as the most difficult skill to assess. Nevertheless, the demand for teaching and learning languages communicatively pushes educators to improve methods of speaking assessment. As a result, speaking assessment is a major area of interest within the field of language testing, researched by prominent scholars who strive to propose effective approaches to it (Bachman, 1990; Fulcher, 2003; Luoma, 2004; Taylor, 2011).

Speaking is a compound skill and needs to be assessed validly and reliably. It is therefore important to design criteria that suit the context and aim of learning. This paper discusses the design and piloting of criteria for assessing the spoken production of research university bachelor students. The terms “rating scale” and “(assessment) criteria” are used interchangeably. Although there has been a significant amount of research in the field, little of it has focused on designing spoken production assessment criteria for academic purposes. This work provides an important opportunity to advance the understanding of research university students' spoken production assessment.

The thesis will examine the way in which speaking assessment criteria for bachelor students of linguistics are designed. The aim of the paper is to design two rating scales - for the assessment of monologic and dialogic speech - for the 3rd year students of the HSE programme “Foreign languages and cross-cultural communication” in order to optimize the assessment process. These rating scales are intended for the final speaking assessment at the end of the two units.

A full discussion of speaking assessment lies beyond the scope of the study. The reader should bear in mind that the work is based on the experience of teaching linguist bachelor students at the research university, and the designed criteria may not completely suit other educational contexts.

The thesis is composed of two chapters. The first one deals with the theoretical background of speaking assessment criteria design. Firstly, I will describe the fundamental features of the speaking activity. Secondly, I will examine speaking subskills. Finally, approaches to rating scale design will be explored, and several challenges will be discussed. The second chapter deals with the design of the criteria for the assessment of HSE bachelor students. It begins with the analysis of the aims of the “Speech practice” course and proceeds with the design of two sets of speaking assessment criteria. After the criteria are piloted, the results of interviews with practicing teachers are presented, and the two sets of criteria are improved using the feedback. At the end of the chapter I also provide recommendations for developing speaking tests and criteria as part of English for Academic Purposes (EAP) assessment.

The theoretical value of the work lies in the analysis of a number of significant sources on speaking assessment, as well as in the contribution to the analysis of the practice of spoken production assessment in higher education. The final versions of the monologic and dialogic speech assessment criteria are expected to be usable as part of the Speech practice course, providing grounds for the improvement of the system of assessment.

1. Theoretical part. Literature review

The theoretical part of the thesis is based on a critical analysis of a number of prominent works on speaking assessment, as well as newer studies in the field. It starts with a discussion of the unique features of speaking and proceeds to an analysis of the speaking subskills as the basis for criteria design. Then the advanced level (C1-C2) speaking tasks are enumerated and analyzed, the types of rating scales are described, and the process of criteria development is examined in detail. At the end of the section, I look closely at the CEFR specifications and rating scales and note their implications for the rating scales I designed.

Let me begin by stating that speaking has not always been the center of attention in foreign language teaching. With the rise of the Communicative Language Teaching (CLT) approach, speaking became the focal point of learning, and the idea of meaningful interaction has become central to foreign language classes all over the world. Students are thus encouraged to engage in a substantial amount of spoken activity in the classroom; the center of the class shifts from the teacher to the students, resulting in lower teacher talking time (TTT) and higher levels of student autonomy and engagement. To this end, greater output from students is promoted, and the teacher becomes a facilitator and negotiator rather than a strict dominant authority. The basic assumption is that language is better approached as action and interaction, not as a set of rules. However, the CLT approach is not ubiquitous in Russia: our educational system has embraced some of its principles, yet there is a lot to be done. This thesis adopts the view that speaking is a vital part of the language learning process and requires much practice and effort.

Unique features of speaking

Before proceeding with the discussion of speaking criteria design, it is vital to note the features that make the speaking process special in terms of teaching and assessing.

Whenever people refer to a person who knows a language, they call him or her a “speaker” of that language. This alone signals the importance of the speaking activity, fixed in the language itself. Indeed, when two or more people meet, they communicate via speaking, which is used in everyday life far more often than writing, for instance. Though writing is crucial in the academic world, it is speaking that everyone employs daily. There is therefore a significant demand for teaching speaking as part of teaching a language, and “many, if not most, learners are mainly interested in learning to communicate orally” (Ur, 2012). To meet this demand, a variety of speaking activities is incorporated in English language courses. It is noteworthy that the speaking taught as part of different courses varies greatly. For instance, English for daily use relies on basic words (elementary and intermediate level) and informal vocabulary to meet the learner's needs; everyday conversations and their imitation are emphasized. If English for academic and research purposes is in focus, that educational purpose presupposes the choice of formal presentations and discussions of various issues as the means of developing the speaking skill.

Speaking, being an integral part of people's lives, is a complex phenomenon. The ability to speak is exclusive to humans. Although other animals are capable of producing separate sounds or sound groups, only human beings can express an infinite variety of meanings with the help of the spoken language (Hughes, 2011; Romaine, 2000; Thornbury, 2005).

Generally, what people say and how they say it is influenced by the language means available (the language as a whole), the prosodic features (stress, intonation and rhythm), and the purpose and register of speech. The ability to produce spoken utterances is impossible without the language itself. Language is actualized through speech, and the situational factor is a vital element of this process (Fetzer, 2013; Levelt, 1989). In other words, the situation in which communication takes place influences what is being said, as well as how and to whom. Consequently, the context of speech defines the speech and its properties. C. Kramsch describes five dimensions of context, namely linguistic, situational, interactional, cultural, and intertextual (Kramsch, 1993), and notes that context "is shaped by people in dialogue with one another in a variety of roles and statuses" (Kramsch, 1993, p. 67). Viewed in this way, contexts are by no means stable, being constantly reconstructed by the interlocutors.

As can be seen, a large vocabulary is not enough to speak, to understand and to be understood. Knowledge of the cultural background and norms of behaviour is something no form of communication, especially oral, can do without (Harmer, 2007).

Apart from the sociocultural competence, the strategic component (as seen in the speech production model by D. Douglas) plays an important role in the process of speech production, incorporating metacognitive, language and cognitive strategies (Douglas, 1997). Metacognitive strategies are responsible for performance in situations which do not require language. Four types of strategies operate at this level: assessing the context, setting goals for cognition and behaviour, cognitive planning, and control of execution and attention. In turn, language strategies are directly connected to the communication process; they encompass assessing the discourse context, setting communicative goals, linguistic planning and control of linguistic execution. Cognitive strategies are those used to solve problems, including various types of reasoning and planning.

Speaking is regarded as meaningful interaction, which presupposes that meaning is transmitted from the sender to the recipient through a certain channel. Consequently, spoken production is a complex process. It incorporates four different levels of decision making (which can also be treated as four stages of production): discourse modeling, message conceptualization, message formulation, and message articulation (Bygate, 2010), all of which need to happen at a substantial rate. The automaticity of this process is constrained by the learner's control of the language structure, their range of vocabulary, the complexity of the message, and the degree of familiarity with the topic, among other factors.

In addition, speaking events may be classified according to purpose, participation and planning (Thornbury, 2005). Speaking genres can be transactional (with the main goal of conveying information) or interpersonal (with the aim of maintaining relationships between people). Moreover, speaking can be interactive (dialogue and polylogue) or non-interactive (monologue). Finally, depending on the time available for preparation (or its absence), planned, partly planned and unplanned speaking events can be singled out.

Though speech, in contrast to written forms of communication, has long been perceived as an intangible, ethereal matter (see Figure 1), advancements in technology have made it easier to capture and analyze (Isaacs, 2016). For instance, speech can be recorded, visualized, analyzed, stored for an indefinite period of time, as well as altered or modified. As Isaacs puts it, “advancements in technology have made it possible to turn the auditory, dynamic, and ephemeral phenomenon of speech into a visual, static medium” (Isaacs, 2016, p. 133).

Figure 1. The difference between spoken and written discourse

To summarize, a number of features make speaking a peculiar process: its exclusivity to humans, its frequency of occurrence, the focus on meaning, the role of the situation in which communication takes place, and the vitality of cultural knowledge and strategic competence. These traits and components make speaking difficult to assess.

Speaking subskills

In order to facilitate speaking assessment, it is possible to dissect the complex speaking skill into smaller ones, called subskills. Generally, ten subskills are singled out: fluency; accuracy with words and pronunciation; using functional language; appropriacy; turn-taking skills; the ability to speak at relevant length; responding and initiating; repair and repetition; use of a range of words and grammar; and use of discourse markers (Lackman, 2010).

A different model, proposed by H. Brown and P. Abeywickrama, distinguishes micro- and macroskills (Brown & Abeywickrama, 2010, p. 186). Speaking microskills include the production of smaller chunks, such as phonemes, words and collocations, while macroskills presuppose a focus on fluency, cohesion, discourse, etc. The main differences from K. Lackman's classification lie in the differentiation of the production of various chunks (phonemes, reduced word forms, etc.) and in the presence of a macroskill connected with non-verbal means of communication (facial expressions and body language). Everything considered, I believe the ten subskills proposed by K. Lackman are more suitable as a reference point for this piece of research, because they present a clear vision of what components constitute our speech.

Having discussed the classification of the subskills, it is essential to state that the speaking subskills form the basis for speaking assessment criteria, which in turn form an essential part of test-based speaking assessment. This becomes evident when considering a rating scale of any kind, particularly an analytic scale.

For instance, in the University of Cambridge ESOL Examinations Speaking assessment criteria (UCLES, 2011), five parameters of speech are assessed: grammatical resource, lexical resource, discourse management, pronunciation and interactive communication. These components incorporate almost all of the subskills, except for speaking at relevant length. The latter is not assessed directly, although it matters for the overall performance: the test taker has to speak within the given time limit and, failing to manage it, is left with no time to finish expressing their ideas.

In conclusion, the speaking skill is usually divided into smaller subskills, which form the basis for rating scale design. I have chosen K. Lackman's (Lackman, 2010) classification of the speaking subskills for further reference in this thesis.

Advanced level test tasks

Having elaborated on the speaking subskills, it is necessary to focus on the tasks that are often employed to test students' level of language knowledge. Although there are three distinct modes of speaking assessment, namely direct, semi-direct and indirect (Isaacs, 2016), I will focus on the tasks typical of direct assessment. Direct speaking assessment is based on face-to-face contact between the test taker and the interviewer, or interlocutor. This contrasts with the semi-direct mode (computer-mediated: the test taker produces a response which is recorded, and a human rater then evaluates the performance) and the indirect mode (the test taker does not actually produce any utterances; for instance, a multiple choice test is used to predict the speaking ability of the examinee (O'Loughlin, 2001)). Direct speaking assessment is the mode widely used in the Russian educational system in general, and in higher education in particular. It is mostly the mode used for testing speaking because, given the scope of testing described in the practical part of the thesis, it is less time and effort consuming than the other modes, while remaining a reliable tool for speaking assessment. Sometimes it may be combined with semi-direct methods.

Having stated that I will focus mainly on the direct method of assessing speaking, let me enumerate the most frequently used advanced (C1-C2) level speaking test tasks. They include:

Conversations (dialogues). While it is sometimes difficult to incorporate a naturally flowing conversation in the learning process, it becomes more feasible when a set of topics is introduced as part of the course. Introducing useful vocabulary, as well as communicative strategies, beforehand helps the students come to the task prepared. During such dialogues, it is necessary to evaluate not only the individual performance of the students, but also the interaction strategies they use and whether they jointly achieve the required goal.

Discussions (debates). Such interactions should be thoroughly planned. It is vital for the students to be familiar with the format and the rules of debates. The topic should be controversial enough for the opponents to have a genuine discussion. Moreover, the learners should be reasonably familiar with the topic and have some background knowledge.

Academic presentations. This is one of the most useful test tasks in the context of higher education, especially if the focus is on Academic English. The task is supposed to be very close to a real scientific presentation; therefore, the format and the genre should be thoroughly discussed with the students. Furthermore, watching sample presentations can be helpful during preparation. This type of task aims to prepare students for presenting their paper (or thesis), as well as to acquaint them with the format of a formal presentation and the respective register.

Live monologues (partly planned). The student is usually given a topic and a set amount of time to prepare and deliver a talk. As mentioned before, a set of broad topics is usually introduced during the course according to the syllabus. A variation of the task may include providing prompts for the student, such as pictures, graphs, infographics or a text on which to base their talk. In the case of a text, the student is usually asked to summarize it and to state their opinion on the problem raised in it.

These are the main tasks that are employed to test the advanced level students' knowledge.

As can be seen, they are aimed at testing speaking as part of events of different kinds, some of them planned or partly planned, interactive or non-interactive. Moreover, the tasks are chosen to elicit a speech sample that will allow the testers to draw strong inferences about the task construct. Hence, the speech sample should be sufficient, both in linguistic and in conceptual content, to support such inferences.

Generally, the tasks focus on various contexts of the student's academic and professional life to test their ability to take part efficiently in different types of communication. The components of the task that should be taken into account are: the goal of the task, the situation in which the communication takes place, the “active participation” (Bachman & Palmer, 1996, p. 44) of the learner, the setting of the test (the number of test tasks, the participants, the time available for preparation, the number of assessors, etc.) and the expected outcomes. Together, these components constitute the essence of the situation in which the test taker produces the speech sample. As opposed to closed tasks, where the answer is almost always pre-determined, the open task types listed above elicit a wide range of responses employing various strategies and linguistic means.

As for the topics for oral assessment, they are chosen according to the course syllabus. The discussion of the influence of students' background knowledge on performance during speaking assessment has been quite controversial. While some scholars suggest that alterations in task conditions do not “automatically translate into changes in test score” (Fulcher, 2003, p. 64), newer studies (Khabbazbashi, 2015) demonstrate a significant effect of background knowledge (BK), “where low levels of BK posed the greatest level of challenge for test takers while high levels of BK were shown to have a facilitative effect on performance” (Khabbazbashi, 2015, p. 43). The study conducted by C. Krekeler showed somewhat similar results: the background knowledge effect was strong, and all the students, regardless of their level of proficiency, were able to make use of their BK (Krekeler, 2006). So, the influence of background knowledge is not always consistent; however, it is present. Thus, the topics for academic presentations, debates, conversations and talks are chosen to fit the supposed level of background knowledge of the students. Having studied multiple disciplines in different areas (professional disciplines, including the basics of language teaching, translation and interpreting, and intercultural communication; Cultural studies; World literature, etc.), the students are expected to have deeper knowledge of those areas and are encouraged to expand it and to acquire a significant amount of information in English. The latter assists their ability to communicate their own and other people's ideas in the target language.

To sum up, several types of speaking test tasks for direct assessment are employed at the advanced level. Conversations, discussions, academic presentations and live monologues are intended to test speaking during events of different types. The topics for such test tasks are chosen to fit the students' expected level of background knowledge and the course syllabus.

English for Academic Purposes in the context of higher education

It is crucial to note that, apart from focusing on advanced (C1-C2) level speaking tasks, this thesis emphasizes tasks and criteria for higher education in the context of English for Academic Purposes (EAP). Therefore, I will outline the notion of EAP, its main characteristics and the ways it fits the purposes of higher education.

English for Academic Purposes is a part of the broader field of English for Specific Purposes (ESP), “defined by its focus on teaching English specifically to facilitate learners' study or research through the medium of English. EAP is differentiated from ESP by this focus on academic contexts” (Hamp-Lyons, 2011, p. 89). In addition, EAP is an eclectic and pragmatic discipline, as a wide range of topics can be viewed from the EAP perspective.

English has become the dominant language of the circulation of academic knowledge, as most areas of study have followed globalization patterns and steadily shifted from publishing in journals in their own language to publishing in journals in English. Alongside this tendency, the notion of “advanced EAP” (Hamp-Lyons & Hyland, 2002) has appeared, such as English for research publication purposes. This notion is closely connected to the aim that underlies the Speech practice course at the Department of foreign languages of the Higher School of Economics: the course targets English for Academic Purposes (the use of English for other professional disciplines), including the research purpose. EAP teaching has expanded in the past 10-20 years, encompassing many sub-approaches to the special needs of learners and steadily improving its methodology. As a result, “EAP teachers are more qualified and more committed than ever” (Hamp-Lyons, 2011, p. 92), or at least they are expected to be. Testing and assessment in EAP contexts “has traditionally been carried out on the basis of a needs analysis of learners or a content analysis of courses” (Fulcher, 1999, p. 221).

Speaking for academic purposes is “an overall term used to describe spoken language in various academic settings” (Jordan, 1997, p. 193). Such types of language:

are normally neutral or formal;

follow the genre/activity conventions;

usually include participation in discussions, oral presentations, answering questions, presenting data, etc.

Now it is essential to consider the state of things in Russia. On the one hand, according to the federal standards for language studies developed by the Ministry of Education and Science, and given the EFL (English as a Foreign Language) teaching traditions in higher education, EAP is “merely one of the modules and usually not the most significant and lengthy in the university EFL curriculum” (Dvoretskaya, 2016, p. 148). On the other hand, there is a high demand for developing students' cognitive academic language proficiency (CALP), since many of the students are expected to carry out research in English. Even for those expected to do research in Russian, the trend mentioned earlier in this thesis, namely that the majority of scientific data is available in English rather than in Russian, strengthens the need for EAP. There have been attempts to extend the EAP courses, which is particularly understandable given the range of skills such courses develop (see Figure 2).

Figure 2. Skills that are developed as a part of the English for Academic purpose

Having described the commonly used advanced speaking test tasks and the state of EAP development in Russia, it is vital to focus on rating scales, namely on their types and advantages.

Normally, the aim of a speaking test is to evaluate and assess the spoken production of the examinee. Speaking scores reflect “how well the examinees can speak the language being tested” (Luoma, 2004, p. 59). They are often represented by a set of numbers (in our case they range from 1 to 10, 1 being the lowest, which corresponds to the general HSE rating scale used across all disciplines). However, sometimes verbal categories are used, such as “good”, “excellent”, etc. Another important part is the descriptors, which specify the skills and behaviour being assessed. Ultimately, the scores and their descriptors form rating scales (or criteria for speaking assessment). A rating scale “provides an operational definition of a linguistic construct such as proficiency” (Davies et al., 1999, p. 153), as it focuses on the mastery of linguistic features as well as the functional aspect of speaking. Consequently, scales are “descriptions of groups of typically occurring behaviours” (Davies et al., 1999, p. 154) which reflect the level of language proficiency demonstrated by the examinee.
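The relationship between band scores and descriptors can be sketched as a simple mapping from score to descriptor. The bands and the descriptor wording below are invented for illustration only; they do not reproduce the actual HSE scale or any cited instrument.

```python
# A rating scale maps each band score to a descriptor of typical behaviour.
# The bands and wording below are hypothetical, for illustration only.
rating_scale = {
    10: "Fully effective communication; wide, precise range of language.",
    8: "Effective communication; occasional inaccuracies do not impede meaning.",
    6: "Adequate communication; noticeable errors and limited range.",
    4: "Communication often breaks down; frequent errors.",
    2: "Very limited communication; isolated words and phrases.",
}

def descriptor_for(score: int) -> str:
    """Return the descriptor of the highest band not exceeding the score."""
    band = max(b for b in rating_scale if b <= score)
    return rating_scale[band]

# A score of 7 falls into the band-6 descriptor.
print(descriptor_for(7))
```

The point of the sketch is simply that a scale pairs each score with an observable description of performance, so that a rater's numeric judgement is anchored in behaviour rather than intuition.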

The “systematic fashion” (Ginther, 2013, p. 2) in which rating scales assign numbers (scores) to various speech characteristics makes them a reliable tool for assessing speaking performance. When appropriately designed and tested rating scales are used, they make the whole testing and rating procedure more coherent and feasible both for the raters and the examinees, as well as more transparent with respect to the marks the examinees receive. The latter is especially vital for large-scale speaking assessment, but it remains important in any assessment. Moreover, evaluation criteria, or scoring rubrics, allow “for both quantitative and qualitative analysis of student performance” (Allen & Tanner, 2006). The use of such standards for assessment is especially important when the students are novices with respect to a particular task (Anderson, Bresciani, & Zelna, 2004).

Figure 3. Advantages of using evaluation criteria

Having discussed the aims of speaking assessment, the definition of the rating scale and its advantages, let us elaborate on the types of criteria. Several types can be singled out according to three main parameters: orientation, scoring and focus (Fulcher, 2003).

Firstly, the orientation parameter is the basis for the differentiation between user-oriented, assessor-oriented and constructor-oriented scales. User-oriented scales are designed to describe the typical or likely behaviours of a test taker at a given level. Assessor-oriented scales are made to guide the rating process, focusing on the expected quality of the performance. Constructor-oriented scales are produced to help the test constructor select tasks for inclusion in the test (Alderson, 1991). In this thesis I mainly consider the design of assessor-oriented speaking criteria, because the purpose of the scales designed as part of this piece of research is to guide the rating process.

Secondly, depending on the scoring method, holistic and analytic criteria can be identified. Holistic criteria ask the rater to match the performance of the examinee with one of the descriptors, which are based on certain constructs. Analytic criteria also comprise a number of parameters for speech assessment; however, they require the rater to treat the speaking subskills in a more precise way. In other words, holistic rating is a qualitative summary of the examinee's performance, while analytic rating is of a quantitative nature.
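The contrast between the two scoring methods can be shown in a minimal sketch: an analytic rater scores each subskill separately and aggregates the results, while a holistic rater assigns a single global band. The subskill names and the equal-weight average below are assumptions made for this example, not part of any cited scale.

```python
# Analytic scoring: one score per subskill, aggregated into a final mark.
# Subskill names and the equal-weight average are illustrative assumptions.
analytic_ratings = {
    "fluency": 8,
    "accuracy": 6,
    "range": 7,
    "discourse management": 7,
    "interaction": 8,
}

def analytic_score(ratings: dict) -> float:
    """Equal-weight average of the subskill scores."""
    return sum(ratings.values()) / len(ratings)

# Holistic scoring: the rater picks a single band for the whole performance,
# with no per-subskill breakdown to aggregate.
holistic_score = 7

print(analytic_score(analytic_ratings))  # 7.2
```

In practice the weights need not be equal: a scale designer may weight, say, interaction more heavily for dialogic tasks, which is one way analytic criteria are tuned to a particular construct.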

Finally, the focus of the criteria can be either on the test construct or the “real world”. The latter is true if the rating scale “contains references to the types of task that the test taker is able to undertake outside the test situation” (Fulcher, 2003, p. 200).

Construct, validity and reliability

It is crucial to define the task construct before entering the sphere of speaking task and criteria design. The construct is the type of speaking that needs to be tested and assessed. There are three main frameworks for defining the construct: linguistically oriented, communication-oriented and situation-based (Luoma, 2004).

The linguistically oriented approach focuses on such categories as vocabulary, grammar and pronunciation, sometimes accompanied by functional language. The communication-oriented framework is aimed at the features that make a particular type of communication successful. For instance, successfully expressing an opinion may entail asking the test taker to state their opinion, support it with relevant evidence, give a contrasting point of view and explain why it is not valid, or not convincing. The situation-based approach is usually used when testing language for specific purposes and professional education, and the focus may be on a great range of tasks, including giving talks in the target language, simulating some aspects of professional life, etc.

Although these three construct definition frameworks are quite distinct, educators combine them in their work (Luoma, 2004). This allows for a variety of task and criteria types with different focuses. The pivotal point is that the choice of framework depends on the educational context in which the test tasks and criteria are used, as well as on the purpose of the test.

Having discussed the construct definition, it is essential to highlight that while designing a test, it is important to bear in mind three main components: reliability, validity, and practicality (Nation & Newton, 2008).

Reliability, or score consistency, means that the test shows roughly the same results even when the conditions (day, time, people who administer the test) change. Reliability matters because test results should be dependable, so that they can be used to make decisions (Luoma, 2004). Several features make a speaking test more reliable. Firstly, a number of tasks (of different types) provide a fuller picture of the student's abilities. Secondly, evaluation based on several subskills rather than one ensures greater objectivity. Thirdly, a test is considered more reliable if the students are acquainted with the task format and have preferably practiced it before. What is more, rater training sessions also contribute to the reliability of the whole testing process, as the raters become acquainted with the format, the rating procedure, etc. Sometimes experts are able to distinguish and describe qualitatively different performances using a common set of criteria with no guidance (Brown, Iwashita, & McNamara, 2005); in other cases, rater training is essential. Finally, a reliable test may or may not be valid; however, without reliability there is generally no validity (Nation & Newton, 2008, p. 167).
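One common way to check score consistency across raters is to correlate the marks two raters give to the same set of performances; a coefficient close to 1 suggests the raters rank candidates consistently. The scores below are invented for illustration, and Pearson correlation is only one of several possible inter-rater reliability measures (others, such as Cohen's kappa, account for chance agreement).

```python
import statistics

# Hypothetical scores given by two raters to the same eight candidates.
rater_a = [7, 8, 5, 9, 6, 7, 8, 4]
rater_b = [6, 8, 5, 9, 7, 7, 7, 4]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Values close to 1 indicate consistent ranking across the two raters.
print(round(pearson_r(rater_a, rater_b), 2))  # 0.92
```

A figure this high would suggest the two raters apply the criteria similarly; a low figure would signal the need for the rater training sessions discussed above.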

With reference to validity, a test is valid "if it measures what it is supposed to measure and when it is used for the purpose for which it is designed" (Nation & Newton, 2008, p. 167). Scholars usually differentiate between face validity and content validity. Face validity reflects whether the people taking and administering the test, as well as others affected by it, perceive the test as reliable and fair. However, "good face validity is not a guarantee of reliability or other kinds of validity" (Nation & Newton, 2008, p. 168). Content validity reflects whether all the components of the tested skill, course or language are present in the test task and criteria. If the aim is to test speaking in the form of giving an academic presentation, it is necessary to start by enumerating the components of this skill. For example, it may include the ability to structure ideas, to present them coherently using formal-register vocabulary, to use a variety of discourse markers, and to give full yet concise answers to the listeners' questions. If the test task includes the majority (or all) of the skill, language, or course components, it can be viewed as having high content validity. The concept of validity affects all test users because "accepted practices of test validation are critical to decisions about what constitutes a good language test for a particular situation" (Chapelle, 1999).

Finally, practicality is another vital aspect of a test. While designing the test tasks and criteria, the available resources should be managed properly. For instance, a speaking test should not be time-consuming, meaning it is usually short. Moreover, it should not require much paper or equipment, and it should be easy to administer and mark, so that the results are easy to interpret.

Criteria (rating scale) development

The speaking skill is thought to be inseparable from other skills, particularly listening (Douglas, 1997). For communication to be successful, both partners have to switch between the roles of sender and receiver, employing both speaking and listening skills. If the task is to deliver a talk or a presentation, the auditory channel of transmitting information is vital, as the instructions are given orally by the examiner. Speaking should not be completely separated from reading either. Being able to grasp the essence of the speaking test task, which is often presented in written form, is of paramount importance for the test taker. When the task is not understood, the examinee will most likely score fewer points than they are capable of. As for writing, it may also facilitate speaking. For instance, note-taking may assist the examinee while preparing for any type of speaking task, provided there is in fact time available for preparation. Hence, in most contexts, speaking abilities should not be separated from other skills.

As far as the assessment of oral proficiency is concerned, B. O'Sullivan (2012, p. 234) maintains that it is "commonly believed that tests of spoken language ability are the most difficult to develop and administer". Y. Chuang (2009) claims that assessing oral performance seems to be one of the most difficult tasks to carry out because there are numerous external and internal factors that affect assessors.

In itself, the process of criteria design is of a complex nature. This stems from a number of issues and concerns. Firstly, the fact that speaking is a uniquely human activity with endless variations makes it difficult to analyze and evaluate, even against set criteria. Moreover, criteria design represents an attempt to categorize the language skill, to dissect it into smaller, more measurable elements, which is rarely an easy task. Furthermore, the sum of these elements is never equivalent to the speech itself: a learner's speech is a complex product of their internal abilities, the influence of the situation (the environment of the test, the topic, the task format, the rater, etc.) and the decision making that takes place before and during the speech. It is also noteworthy that rating scales reflect a vision of what makes a good speaker, what constitutes language knowledge and what, as a result, makes a performance weak or strong (Fulcher, 2003; Luoma, 2004), and this vision may itself be subjective. Finally, as the procedure of speaking assessment (in direct and semi-direct assessment modes) involves human raters, the criteria should be proven valid and reliable to avoid inconsistencies caused by the human factor.

Now that I have mentioned the possible challenges occurring in the process of speaking criteria design, let me focus on the rating scale design approaches. According to G. Fulcher, all the approaches are divided into two groups: intuitive methods and empirical methods (Fulcher, 2003). Each group is further subdivided into more distinct approaches.

Hence, intuitive methods incorporate:

The expert judgement method, when an experienced teacher or language tester designs the criteria and decides on the levels and the wording of the descriptors on the basis of needs analysis, the course syllabus and existing rating scales, possibly getting feedback on the usefulness of the scales they design;

The committee method, which is similar to the expert judgement method except for being based on the collaboration of a group of experts, who have considerable experience in teaching students of different levels of knowledge, as well as developing educational materials for them;

The experiential method, which starts with the design of the rating scale either by an expert or a group of experts and proceeds with the raters improving the scale in the process of employing it (Fulcher, 2003).

Typically, the intuitive approaches to speaking rating scale design rest on certain principles. Firstly, the concept of native speaker ability as the ultimate goal of learning a language has been prominent in many proficiency scales, including the ILR (Interagency Language Roundtable) scale and the ACTFL (American Council on the Teaching of Foreign Languages) scale. It can be observed in the use of such words as "native" and "native-like". This principle has been criticized from various angles (Bachman & Savignon, 1986; Davies, 1991; Lantolf & Frawley, 1985). These scholars emphasize that it is not known who a native speaker is, what skills they possess and what their level of knowledge in different areas is. Thus, the notion of an "educated native speaker" started to emerge and became part of some rating scale descriptors. Still, most modern speaking scales have tried not to reference native speaker ability explicitly; this change can be seen, for example, in the modifications to the CEFR (Common European Framework of Reference for Languages) descriptors compared to the 2001 version (Council of Europe, 2018, p. 223). The next principle of the intuitive approaches is the preciseness of the wording of the scale descriptors, which has to embody the test construct. Finally, because the terminology in intuitive scales may be vague, emphasis is put on continuous rater training, which ensures that the raters share a common understanding of the scale.

Now that the intuitive approaches to criteria design and their main principles have been discussed, it is necessary to focus on the empirical methods. Such methods encompass:

The data-based (data-driven) scale development method, which is grounded in the analysis of test takers' performance on a range of tasks and in the descriptions of the features of that performance. This is how the speaking criteria for the International English Language Testing System (IELTS) test were designed and developed;

The empirically derived, binary-choice, boundary-definition scales (EBBs) method, where a group of experts analyze a range of speaking task performances, rank them from the weakest to the strongest and justify their choices; these justifications are then used to formulate a list of binary questions that future raters answer in order to arrive at a score;

The scaling descriptors method, which is also based on the expert judgement of a group of testers who sequence the descriptors (taken in isolation from their scales), ranking them by difficulty (Fulcher, 2003).

This whole group, comprising the data-based (data-driven) scale development method, the EBB method and the scaling descriptors method, is what S. Luoma refers to as the qualitative methods of approaching criteria design (Luoma, 2004). She also singles out the quantitative methods for developing speaking scales, which require a significant amount of statistical expertise. These methods are more feasible for large-scale tests or for research institutions that can collect large-scale data to develop a scale. An example of a scale designed using the quantitative approach is the fluency scale by Fulcher (Fulcher, 1996), whose descriptors were developed after an extensive discourse analysis of speech samples in terms of their fluency features.

Unlike the intuitive methods, the data-based and the empirically derived methods are based on the performance of the examinees or on the analysis of scaling descriptors rather than on preconceived criteria. This allows test developers to produce more objective criteria for speech evaluation and assessment, which contributes to both the validity and the reliability of the test.

Moving from the methods of scale development to the process of criteria design, I have composed a scheme (Figure 4) for test task and criteria analysis that I will use in the second (practical) part of this research.

Figure 4. Test task and criteria analysis steps

To begin with, the target language use (TLU) situation is analyzed. Generally, if the syllabus of the course has already been developed, this component is described in the syllabus. When designing criteria for speaking assessment, it is of paramount importance to look at the purpose of the test: the test purpose defines the speech and language parameters that are assessed, as well as the type of rating scale. As was already mentioned in this thesis, there are numerous speaking subskills forming the basis for speaking assessment criteria, and the test purpose influences which features of spoken production the criteria focus on. The construct of the test also shapes the criteria, whether they are chosen from a set of available ones, as is common in international exams, or designed by the teacher. After the construct is defined during the analysis, it is vital to demonstrate how it is actually implemented in the tasks and reflected in the speaking criteria.

After analyzing the test, it is possible to move on to designing the criteria. Another scheme (Figure 5) shows the decisions that are made during this process.

Figure 5. Speaking criteria design steps

During the process of criteria design, the number of levels in the scale needs to be agreed on. It is important to bear in mind that the more levels there are, the more detailed the descriptors and the feedback should be. Moreover, rater consistency must be ensured so that raters agree on the marks they award. Another central question in scale development is the number of parameters that are assessed if the criteria are analytic. It is generally agreed that if this number is seven or more, it is very difficult for the raters to process the speech sample and evaluate it; hence, the optimal number of parameters is five or six. Finally, the descriptors should be concrete, precise and detailed, while remaining practical, meaning that they should not be too long. Thus, significant emphasis is put on the wording of the level descriptors, which should be revised multiple times to ensure there is no ambiguity and all the terminology is familiar to the raters or clarified in the footnotes.
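These decisions, an analytic scale with around five parameters that are combined into one overall mark, can be sketched as a simple data structure. The parameter names and weights below are assumptions chosen for illustration only; they are not the criteria designed in this thesis.

```python
# Hypothetical analytic rubric: five parameters (the optimal number
# suggested above), each with an illustrative weight summing to 1.0.
RUBRIC_WEIGHTS = {
    "fluency": 0.25,
    "accuracy": 0.20,
    "range": 0.20,
    "coherence": 0.20,
    "pronunciation": 0.15,
}

def weighted_band(scores, weights=RUBRIC_WEIGHTS):
    """Combine per-parameter band scores (e.g. 1-5) into one weighted mark."""
    assert set(scores) == set(weights), "score every parameter exactly once"
    return round(sum(scores[p] * weights[p] for p in weights), 2)

# One rater's hypothetical scores for a single speech sample.
sample = {"fluency": 4, "accuracy": 3, "range": 4,
          "coherence": 5, "pronunciation": 3}
print(weighted_band(sample))  # prints 3.85
```

Making the weights explicit in this way forces the scale designer to state which subskills matter most for the given test purpose, which is exactly the decision the criteria design process requires.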

To sum up, speaking criteria design is a complex process, and the complexity of the speaking activity, as well as the presence of human raters, creates certain issues when it comes to rating scale development. Generally, intuitive and empirical methods of criteria design are singled out. Most importantly, the rating scales should be based on the purpose of the assessment and take into account the context of learning.

CEFR speaking descriptors analysis

CEFR (Common European Framework of Reference for Languages) is a resource used by teachers, assessors and students in language education; they not only get a descriptive scheme of language proficiency and its implications for the classroom and testing, but are also presented with a set of common reference levels (A1-C2). Among other useful materials, the CEFR contains a range of "illustrative descriptors" of language competence (speaking included). Although the criteria are not designed for any particular test (i.e. they are written for general purposes), they can be adapted to specific needs and various educational contexts. Since the descriptors reflect what the learner can do, they constitute a behavioural rating scale, including both analytic and holistic scales that are both user- and assessor-oriented, with a focus on real-world language use (Council of Europe, 2018). The main function of the descriptors is "to help align curriculum, teaching and assessment" (Council of Europe, 2018, p. 42).

The CEFR rating scales were developed through the scaling of descriptors: the developers took a large number of separate descriptors, regardless of the scales they came from, and re-arranged them into new scales employing multi-faceted Rasch analysis. After extensive testing, feedback was received, and each descriptor was either proven successful, rejected, or marked as needing revision, which gave a starting point for the improvement of the scales. The main principles of the CEFR and its philosophy are presented in Figure 6.
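The full multi-faceted Rasch analysis used in the CEFR development is beyond the scope of a short example, but its starting point can be illustrated in a simplified form: a descriptor's difficulty can be estimated from the proportion of learners judged able to do what it describes, via the logit transform. The descriptor texts and proportions below are invented for demonstration.

```python
import math

def difficulty_logit(p_endorsed):
    """Logit difficulty: higher value = harder descriptor
    (a smaller proportion of learners can do it)."""
    return math.log((1 - p_endorsed) / p_endorsed)

# Invented descriptors with the hypothetical proportion of learners
# judged to "can do" each one.
descriptors = {
    "Can give simple directions": 0.90,
    "Can narrate a story in some detail": 0.55,
    "Can develop a nuanced argument": 0.15,
}

# Order the descriptors from easiest to hardest on the logit scale.
for text, p in sorted(descriptors.items(),
                      key=lambda kv: difficulty_logit(kv[1])):
    print(f"{difficulty_logit(p):+.2f}  {text}")
```

Placing descriptors on one such scale, regardless of which original rating scale they came from, is what allows them to be re-arranged into coherent new level bands.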

Figure 6. CEFR's philosophy

Moving to the speaking assessment descriptors, the CEFR has a general, brief holistic scale for spoken production (Council of Europe, 2018, p. 69). The main parameters it describes are the progression of the ability to cover various topics and types of speaking, ranging from the lowest to the highest level of difficulty, and speech coherence. One more set of scaling descriptors (p. 83) covers fluency, effortlessness, linguistic range and, again, the ability to cover various topics and types of speaking.

As for an extended variant of an analytic scale for speaking ("Qualitative features of spoken language (expanded with phonology)"), it can be found on pp. 171-172. In K. Lackman's classification, ten speaking subskills are singled out, as has been discussed earlier in this thesis: fluency, accuracy with words and pronunciation, using functional language, appropriacy, turn-taking skills, the ability to speak at relevant length, responding and initiating, repair and repetition, use of a range of words and grammar, and use of discourse markers (Lackman, 2010). It is noteworthy that all these subskills are mentioned in the CEFR description, making the descriptors a wide-scope analysis tool. These descriptors are quite universal and were created for general purposes, meaning that such categories as range, accuracy, fluency, interaction, coherence and phonology are described in correspondence with the common language proficiency levels (A1-C2). Hence, the criteria can be adapted to various needs and educational contexts, and different "weight" can be assigned to the mentioned parameters. This brings me back to the discussion of the speaking subskills that form the basis for speaking scale descriptors: it is the teacher's/assessor's choice which subskills play the more important role and, hence, are more prominent in the rating scale.

To conclude, the process of speaking assessment criteria design is a complex matter. This stems from the uniqueness of speaking as a human activity, which also relies on other skills (such as listening, reading and writing). Speaking is regarded as meaningful interaction and should be assessed with regard to the context of communication. To make the evaluation process more feasible, speaking is dissected into smaller units, subskills, which form the foundation for designing the rating scales. Moreover, speaking assessment should be carried out validly and reliably, taking into account the students' needs and the target language use (TLU) situation. Finally, the CEFR framework for language assessment provided the starting points for the criteria I designed, and it is this part of the research that I will proceed with in the following chapter.

2. Practical part

The practical part of the thesis is devoted to the design of two sets of speaking assessment criteria, for the evaluation of both monological and dialogical speech. In this part I will describe the research methods that were used, analyze the target language use (TLU) situation according to the Speech Practice course syllabus, outline the structure of the test tasks that were used to test the new criteria, detail the feedback received from practicing teachers and report on the improved version of the criteria. I also provide recommendations that can be used by teachers or test developers in the process of designing test tasks and criteria for speaking assessment.

Methods

In the thesis I employ the method of critical analysis of a number of prominent works on speaking assessment and speaking criteria design. This is done in order to explore previously conducted research in the given area and to foresee any difficulties I may come across while testing and improving the designed criteria. It is also done to be aware of the features of a good speaking test, namely validity, reliability, real-world orientation and the capability to measure communicative competence.

...
