
IATEFL TEASIG

Best of TEASIG Vol. 2


BEST OF TEASIG 2, 2003-2011
© IATEFL Testing, Evaluation and Assessment SIG 2021

Copyright for whole issue IATEFL 2021. IATEFL retains the right to reproduce part or all of this publication in other publications, including retail and online editions as well as on our websites. The articles in this collection are reprinted from the IATEFL TEASIG publication archives from 2003 to 2011. Articles from a joint publication with the English for Speakers of Other Languages SIG (Winter 2007) and a joint publication with the Teacher Training and Education SIG (November 2008) are included. Some articles have been edited for reprinting. The views expressed in this book are those of the authors and do not necessarily reflect those of IATEFL, its staff or trustees, TEASIG or any other SIG unless expressly stated as such. Contributions to this publication remain the intellectual property of the authors. Any request to reproduce a particular article should be sent to the relevant contributor and not to IATEFL or TEASIG. Articles which have first appeared in IATEFL TEASIG publications must acknowledge the IATEFL TEASIG publication as the original source of the article if reprinted elsewhere.

ISBN 978-1-912588-34-3

All rights reserved.

No part of this book may be reproduced in any form or by any electronic or mechanical means, including information storage and retrieval systems, without written permission from the author, except for the use of brief quotations.

Published by IATEFL, 2-3 The Foundry, Seager Road, Faversham, ME13 7FD, UK www.iatefl.org

For more information about TEASIG, visit https://tea.iatefl.org/

People involved in this publication
Editor: Maggi Lussi Bell
Proof-readers: Judith Mader, Eleanor Baynham
Design and layout: elc – European Language Competence



This collection is dedicated to all IATEFL TEASIG members, both past and present, with special thanks to those who have contributed to this book and other TEASIG publications.

IATEFL TEASIG Committee 2020

The articles included in this collection were chosen by the TEASIG Committee members. (Articles written by current or past committee members were not chosen by themselves.)

Mehvar Ergun Turkkan TEASIG Coordinator

Saeede Ide Haghi TEASIG Social Media Coordinator

Neil Bullock Joint TEASIG Webinar Coordinator (until December 2020)

Anna Soltyska Joint TEASIG Webinar Coordinator

Mehtap Ince TEASIG Events Coordinator

Thom Kiddle TEASIG Webmaster

Maggi Lussi Bell TEASIG Editor

Ceyda Mutlu Joint TEASIG Coordinator (until April 2020)

Sharon Hartle Joint TEASIG Webinar Coordinator (until April 2020)

Dave Allan TEASIG Committee Member without Portfolio (until May 2020)



Contents

Foreword – Judith Mader – 6
Editorial – Maggi Lussi Bell – 7

2003
Testing oral presentations in an academic context – Zeynep Ürkün – 9

2004
Current issues in performance assessment – Kari Smith – 17

2005-2008
The case for authentic language assessment – Peter Davidson – 25
The IELTS speaking test: analysing cultural bias – Rubina Khan – 29
Putting learners in their proper place – Dave Allan – 45
A brief history of CEFR implementation ...finding a common language – Maria K. Norton – 49
Assessing the speaking skills of ESOL learners – James Simpson – 54
Test preparation: a joint teacher/student responsibility – Mashael Al-Hamly & Christine Coombe – 60
The CEFR – the four big issues – Keith Morrow – 64
The Occupational English Test: Testing English proficiency for professional purposes through contextualised communication – John Pill – 66
The candidate, the essay, the test: examination writing reassessed – Cathy Taylor – 71
Testing intercultural communicative competence in English – Judith Mader & Rudi Camerer – 73



2009
Tests of language for specific purposes – Bart Deygers – 78
Spots, camera, action!!! Learner on the stage for evaluation – Dönercan Dönük & – 83
Paired speaking tests: an approach grounded in theory and practice – Evelina D. Galaczi – 88
Let's assess life skills: here are the criteria! – Meral Guceri – 96

2010
Developing a placement test: a case study of the British Council ILA project – Barry O'Sullivan – 104
Alternative assessment: the use of portfolios around the globe – Dina Tsagari – 110
Setting real standards using authentic assessment in an EAP context – Peter Davidson & Christine Coombe – 118
Beating cheating – a compendium of causes and precautions – Greg Grimaldi – 124
Assessment literacy for the English language classroom – Glenn Fulcher – 130
A cognitive processing approach towards defining reading comprehension – Cyril Weir & Hanan Khalifa – 135
Is language testing a profession? – Harold Ormsby L. – 148
The testing of intercultural competence: seven theses – Rudi Camerer – 151
Listen and touch – Thom Kiddle – 154

2011
How should teachers and testers work together? What can they learn from each other? – Judith Mader – 158
The impact of rubrics on assessment attitudes in literature studies – Carel Burghout – 161
Teacher and student perceptions on oral exam standardization – İdil Güneş Ertugan & – 169



Foreword

This volume is the second of its kind and again is for all TEASIG members as well as for the wider TEA community. It follows on from Best of TEASIG Volume 1, published in 2018. Language testing is still just as much of a hot topic as it was in the early days of TEASIG, so it is well worth making the insights from the entire period of TEASIG's existence accessible to all those interested in testing, evaluation and assessment. Assessment literacy is something which concerns teachers, test developers, curriculum planners, teacher trainers and, undoubtedly, all members of TEASIG.

This volume, like the first of its kind, includes some of the best articles to have been written for and published by TEASIG between the years 2003 and 2011. I was pleased and honoured to be asked by the Editor to write the foreword to Best of TEASIG Volume 2, which covers the period when I was TEASIG Coordinator.

I followed a line of extremely illustrious Coordinators when I was elected to the position in 2013, taking over officially at the TEASIG Conference in Siena in November of that year. Before being elected to the post of TEASIG Coordinator by the TEASIG Committee of the time, I had been the initiator of the TEASIG Email Discussion List and its Moderator, and later TEASIG Newsletter Editor, two roles I enjoyed very much. The TEASIG Email Discussion List was abandoned after a while in favour of social media communication and a new position created for this.

The TEASIG Newsletter was renamed in 2019 as Testing, Evaluation and Assessment Today, to reflect exactly the same changes in communication with and between TEASIG members. News was being communicated less and less in printed form, and the new name was found to be more appropriate. The publication has gone from strength to strength with the current Editor of Testing, Evaluation and Assessment Today and these volumes of selected articles. Dave Allan said in his Foreword to Best of TEASIG Volume 1 that "a lot of great new thinking was bubbling..." below the surface, and this can also be seen in the articles chosen for this volume. I became a Trustee of IATEFL in 2018 and took on the role of SIG Representative, representing all the 16 IATEFL SIGs, so stepped down as TEASIG Coordinator. I was succeeded by Neil Bullock, Ceyda Mutlu and Mehvar Ergun Turkkan, who have continued to lead TEASIG onward and upward.

Although it can hardly be considered a positive development in any general way, the global pandemic of 2020 (and, no doubt, 2021) will have its effect on testing practices just as more favourable developments, like the Internet, did. I am convinced that something of value will come out of the pandemic, and that this will be reflected in testing practices and affect the way we see the main considerations of TEASIG in the years to come. I am convinced that these advances will be evident in the articles selected for the next volume of Best of TEASIG.

The articles chosen by the TEASIG Committee for Best of TEASIG Volume 2 reflect the developments in the field and the hope is that they will continue to inspire those who read them. The topics covered are as fundamental today as they were when the articles were written, although much has changed – in testing, evaluation and assessment, and in the world we live in.

Judith Mader
IATEFL Trustee / SIG Representative
TEASIG Committee member and Coordinator (2005-2018)
December 2020



Editorial

Welcome to Best of Testing, Evaluation and Assessment Volume 2, which follows on from the first volume, published in January 2019, and spans the years 2003 to 2011. And what an exciting time for testing that was! There were certainly many changes and innovations in the air, and this collection of articles with its different perspectives and various themes is a worthwhile testimony to that.

Between them, the 28 articles address a number of issues that have influenced testing, evaluation and assessment practice, sparking interest among testers and teachers, and giving rise to analysis and debate. A quick glance through the Contents on page four will show the broad scope of these topics, which include oral presentations, portfolios, placement testing, authentic assessment, testing intercultural competence, the use of touch-screen technology, and cultural bias in testing, to name but a few. From my viewpoint, none of these issues can be 'ticked off' our list of enquiry and all are well worth revisiting in the light of testing practices today.

Sincere thanks go to everyone who took the time to gather their thoughts, examine their data, reflect on their experience, and share their ideas with the TEASIG community during those years, and not only to those whose articles have been included in this collection. As with Best of Testing, Evaluation and Assessment Volume 1, it was not possible to contact all the authors whose contributions had been selected for consideration by the committee. Nevertheless, the final 28 articles included in this volume are an illustrative sample of the TEASIG Newsletters and Conference Selections from 2003 to 2011. My thanks also go to the TEASIG Committee for their invaluable help in making this selection, not an easy task with a restriction on the number of articles that could be chosen and such a wealth of high-quality material in the TEASIG archives. Finally, I would like to express my gratitude to IATEFL Head Office for their much-appreciated support.

Putting together this interesting collection of articles has been an enriching experience for me and an opportunity to reflect on the extent to which testing has changed over the past two decades as well as perceive issues that are as relevant to us today as when the articles pertaining to them were first published. It has also highlighted the role of both TEASIG activities and publications in assisting and enabling its members to explore and develop their own testing practices and to disseminate their ideas to a wider audience. Whether you choose to read the Best of Testing, Evaluation and Assessment Volume 2 from cover to cover or to pick out the topics most relevant to your situation and current interests, I am sure it will serve not only as a reminder of what was happening in the testing, evaluation and assessment field from 2003 to 2011, but also provide ample food for thought and inspiration for the future.

Maggi Lussi Bell
IATEFL TEASIG Editor



2003



Testing oral presentations in an academic context
Zeynep Ürkün, Kadir Has University, School of Foreign Languages, Istanbul, Turkey
Original publication date: IATEFL TEASIG Newsletter June 2003
This article was the basis of a talk at the IATEFL Brighton Annual Conference, 23 April 2003.

It is a daunting task to find a person who enjoys the idea of being assessed. This holds true for any test, including being tested on one's spoken performance. However, it is also a widely accepted fact that the assessment of speaking is somewhat more problematic than assessing or being assessed on, say, reading or writing. It is therefore worth looking first at what makes the assessment of speaking particularly difficult.

Difficulties of assessing spoken performance

Speaking is probably the most difficult skill to test. It involves a combination of skills that may have no correlation with each other, and which do not lend themselves well to objective testing. (Kitao & Kitao, 1996)

• Speaking, unlike writing, reading or listening, has an obvious 'fleeting' nature. This makes the skill of speaking especially hard to 'catch'. Unless the performance is captured on tape or video, it cannot be checked later.

• Providing test takers with a meaningful enough context to determine their communicative language ability is hard to attain.

• Mirroring the real-life activity as much as possible while assessing the oral performance also proves to be difficult.

• There is a definite need to standardize markers in order to ensure marker reliability, and attaining this reliability is especially difficult in the assessment of speaking competence.

• Clear-cut assessment criteria need to be developed and trialled.

• Close moderation of test tasks and marking schemes is needed, again because these will contribute to reliability.

• No matter which speaking assessment task(s) we select, we end up assessing only a very small 'sample' of the speaking tasks that could actually be used.

In spite of the difficulties inherent in testing speaking, a speaking test can be a source of beneficial backwash: if speaking is tested, this encourages the teaching of speaking in English language classes. If we consider how much of our time is really spent using the four skills (reading, writing, listening and speaking), an average person probably spends about 45% of their time listening, possibly just slightly less responding, five or six per cent reading, and just a fraction of it writing (Dellar, 2002).

Why is it essential to test oral presentations in an academic context?

The more direct we can make a test and the more we can incorporate contextual and interactional features of real-life activity into our tests, the more confidently we can extrapolate and make statements about what candidates should be able to do in that real-life context. (Weir, 1995)

Oral presentations provide a definite true-life activity which students working in academic contexts will be required to do. Here are some apparent qualities of oral presentations which make it worthwhile to teach and assess them:

• Oral presentations help students to improve key, transferable skills which are highly valued by tutors, employers and organisations worldwide, such as effective communication, independent research, collaborating with team members to achieve a result, etc. (Assessing Oral Presentations and Group Work, 2002).

• Oral presentations challenge students to approach subject learning in a fresh way, and communicate their learning and ideas orally with their peers (ibid.).

• Oral presentations give students a meaningful context to work with, and context is one of the most important elements that determine communicative language ability.

• They may allow opportunities for students' confidence to build up – speaking in front of a group of people does improve with practice.

• If there is good rapport with the audience, then very useful question-answer sessions may follow, which will enhance spontaneous use of spoken English – a skill we would like students in academic environments to develop.

• If self-assessment and peer-assessment are part of the assessment system, students can reflect on their own performance and assess both themselves and their peers more effectively. This will, in turn, lead to much better awareness of what needs to be done in oral presentations.

• If oral presentations are videoed, this gives assessors a chance to give very effective feedback on the students' performance.

• Being able to speak convincingly and authoritatively is a useful career skill for students. One of the best ways of helping them develop such skills is to involve them in giving assessed presentations.

As a result of the above considerations, it was decided at Sabancı University (the institute I work at) to assess our students' oral presentation skills and to make that a part of the formal assessment. Certainly, before assessing them, we had to teach them. At this point, I find it necessary to provide some background information about our teaching environment so that the rationale behind our choice to assess oral presentations can be explained later.

Ours is a university where the medium of instruction is English. There are only two main faculties (Faculty of Engineering, and Faculty of Arts and Social Sciences), and there is also the Graduate School of Management. All students who manage to get high enough grades from the central university exam to be eligible for our university have to take a language assessment exam, based on skills testing, before they start their studies in their faculty. If they fail the exam, they have to have at least 32 weeks of English instruction and then take the language assessment exam again. In our Foundations Development Programme, we aim to improve students' language competence, as well as ensure a smooth transition to the academic skills they need to possess once they start in their departments. Our assessment system consists of a combination of on-going assessment and test-based assessment, as well as self- and peer assessment. 40% of a student's overall score is determined by on-going assessment and 60% consists of test-based assessment. Depending on the level, on-going assessment may take different forms and may contain several different components such as process writing, vocabulary and learning journals, graded readers, writing portfolios, oral presentations and group discussions.

After carrying out several interviews with the university lecturers, it was found that students would definitely be required to do quite a few oral presentations during their academic life at university, not to mention the fact that most would be required to do them after they started working. So it was decided to teach and assess them but, first of all, the disadvantages as well as the advantages of assessing them had to be taken into consideration in order to ensure a fair assessment scheme. The following are some essential points that deserve attention before one sets out to assess oral presentations.

Are there disadvantages to assessing oral presentations?

• If students need to spend a lot of time outside class to prepare for their presentations, they may be difficult to organize.

• If user-friendly marking criteria haven't been established, they can be quite difficult to assess.

• Marker reliability is hard to achieve – markers need training to assess in a standardized manner, which means there will have to be videoed sessions which can be used for this purpose.

• Again, in order to achieve reliability, more than one assessor may be needed, and this can be difficult to organize.

• Some students may feel stressed at the idea of having to present in front of an audience.

• If the class size is too large, it may be difficult to assess everyone.

• Some students may feel unmotivated and this will affect the preparation and presentation stages, especially if they are working in groups.

• Some students may be unfamiliar with the use of visual aids such as PowerPoint, overhead sheets or slides.

• Quite a bit of class time is required. Students need to have a clear idea of the requirements, or there is a danger that they may try to cover more than is possible within the time available and in sufficient depth.

• Topic selection may be problematic. Students should be required to work on topics which are related to the academic environment in which they may expect to work, in order to make the task meaningful.

• In order to make the assessment natural and to ensure full student involvement, it is a good idea to include peer and self-assessment, but some students may feel slightly uncomfortable at this idea.

Considering the advantages and the disadvantages, and after several meetings with colleagues, we decided to do the following in our assessment of oral presentations:

• Clarify the purpose of oral presentations, firstly for ourselves and then for our students as well. We decided that our aim had to be to develop students' skills at giving presentations, as well as making them do research and reading and improve their subject knowledge, because these are exactly the type of skills and sub-skills they will be required to utilize in their faculty.

• Make the criteria for assessment of presentations clear from the very beginning. We decided that there had to be complete transparency in order to minimize the stress students may be feeling. This way, students would not be working in a vacuum, the mystery that surrounds testing would be dispelled, and they would know exactly what was expected of them.

• We decided to make students part of the assessment system in order to increase their ownership of it. Making them assess themselves and their peers would clarify the expectations of the assessment scheme and they would feel more comfortable. This would also increase their awareness of why they had received a certain grade. Also, when given the chance to assess each other's presentations, they take them more seriously and learn from the experience. Students merely watching each other's presentations can get bored and switch off mentally (Assessing Oral Presentations and Group Work, 2002). If they are evaluating each presentation using an agreed set of criteria, they tend to engage themselves more fully in the process, and in doing so learn more from the content of each presentation.

We also decided to familiarize students with the criteria and teach them how to use them. We had to ensure that students understood the weighting of the criteria. We had to explain it well to them and show whether the most important aspects of their presentations were to do with the way they delivered their contributions (voice, clarity of expression, articulation, body language, use of audio-visual aids and so on) or the content of their presentations (evidence of research, originality of ideas, effectiveness of argument, ability to answer questions, and so on) (ibid.).

In order to be fair, we decided to provide some practice at assessing and being assessed on oral presentations. For this reason, we decided to include a practice run before the actual assessment took place. Another alternative would be to watch a sample performance with the students and ask them to assess it with the criteria they would soon be assessed with. We thought it would be a good idea to have a discussion afterwards to clarify the criteria further. We decided to work on the criteria and make them as user-friendly as possible, since both the students and the assessors would be using the same marking grid.

We realized that we had to set a realistic time limit and expect students to perform within that limit in order to avoid the presentation becoming too long or boring, or containing repetition of unnecessary information, etc. There were also practical considerations such as the number of students in each class and the class time we could spare for oral presentations. It was obvious that most presentations would over-run and we had 15 students in each class. There was also the fact that students – the audience – would find it quite difficult to concentrate after an hour or two of presentations. So we decided to assess students in groups of two or three, depending on the class sizes. We felt that with larger groups it would be harder to function or prepare for the presentation. We also decided that we would allow 4-6 minutes delivery time for each group member. To avoid 'favouritism' we decided that the groups would be formed by the teacher; to ensure full attendance and full attention, we decided that the order of presentations would be decided on the day of the presentation by lottery.

In order to allow the presenters themselves the opportunity to review their own performance later on, and to create material we could use for training the assessors, we decided to video-tape the sessions. This meant we would have videoed sessions ready to standardise assessors the following year, and also to show to students and standardize their peer and self-assessment. We also wanted our students to get used to the idea of using notecards rather than scripts. This would also minimize 'reading' from a script and maximize their free speech. In order to make their presentations effective, we decided to spend some time teaching them how to make use of visual aids and how to use PowerPoint.

Our main teaching materials are the books we produced ourselves, which adhere to the specific needs of our institution. They work on the principle of teaching the skills and language required for academic study. Our books are made up of modules and, in order to give students a meaningful context to work with, we decided that the oral presentation topics would come from the modules. Our teaching semester is 16 weeks, and we felt it would be fair to do at least one practice oral presentation and two assessed ones in that period. We also decided that the oral presentations would make up 10% of the overall grade and be part of the on-going assessment. In other words, the oral presentations would have a 10% weighting in determining the pass/fail of a student.

Below is a step-by-step explanation of how oral presentations were taught and assessed at the intermediate level at Sabancı University.

1. The first step was research. In order to teach students the essential components of doing an oral presentation, we designed a self-research activity on the web. The aim of this activity was to ask students to do the initial research on what makes an effective oral presentation on their own.



2. At our university, all students are given laptops. As a result, they are expected to hand in any type of written work in computerised format – handwritten work is accepted only in emergencies. Since most of our classrooms are equipped with a computer and a projector, it is also expected that our students know how to use these facilities. We decided to look at general guidelines for using PowerPoint with students so that they could make use of this in their oral presentations.

3. The next step was to give them a brief account of how they could make use of visual aids to enhance their presentation, and we spent some time practising the use of PowerPoint with them.

4. The general guidelines for giving effective oral presentations were provided to the students and were dealt with in class.

5. Marking criteria were selected and a mark sheet designed. Sufficient class time was spared to explain the criteria to students, followed by the students watching a sample performance, which was marked by everybody in class.

After completing the above steps, the practice oral presentation was administered in week 6 of the intermediate level course, in order to give students time to improve their language enough to be ready for it. The practice oral presentation was module-related, i.e. related to the work they had been carrying out in class. Students were grouped according to who could work well with whom; attention was paid to pairing shy students with more outgoing ones, and to which students were more familiar with PowerPoint and could help out other group members, etc. The topic was to analyse the current energy needs and consumption of Turkey and decide whether Turkey had to invest more heavily in oil production, natural gas production or electricity. Students did a good job with good research; since that had been the module topic, they were very familiar with it and with the resources they could use to carry out their research. As they were listening to their friends' presentations, they assessed their peers, as well as assessing themselves. Surprisingly, in many cases they gave lower scores than I gave them. It was obvious from the class discussions that the criteria had indeed been internalised and students were very much aware of what they were being assessed on. They could evaluate their own performance as well as their friends' very effectively. However, there were a few problems, as described in the following paragraphs.

Students were extremely repetitive: "I am going to talk about … and my friend is going to mention…". It was obvious that we would have to show them 'alternative' approaches, for which we designed a short activity for use in class. The activity required them to record their own voice, making an opening to an oral presentation. Listening to their own voice and pronunciation and recording themselves on their laptop until they felt satisfied with their own performance was very effective. They were asked to email their recorded voice to the class teacher and individual feedback was provided to everyone.

There were several problems with the way they 'jumped' from one idea to the next. It was essential to work on effective transitions and show them alternative ways of moving from one idea to the next in order to make their oral presentations more coherent. Finally, in order to raise their self-awareness, we decided it would be a good idea for them to record their own voice one more time, this time making an introduction and checking their intonation, pausing, etc. A special task was designed: students first listened to some example introductions which had been recorded by the teachers and then prepared their own introduction and recorded it. Once again, they were asked to email their recorded voice to the class teacher, and individual feedback was provided to everyone.

Then it was time for their first assessed oral presentation, in week 9 of the course. The topic of the relevant module of the week was 'social change' and, under that topic, cause/effect analysis was one of the objectives that had to be taught. Students were given a choice between the two following topics: either present the internal and external reasons for a particular social change in Turkey, or the social changes brought about by an important figure. The requirements of this first officially assessed oral presentation were:

• Length: Each presenter would speak for 4-6 minutes. Thus, a presentation delivered by two students would be a total of 8-10 minutes and by three students a total of 12-16 minutes. Marks would be deducted for presentations that were not within the time specified.

• Order of presentations: The order of presentations would be decided on the day of the delivery by lottery. A student who was not present when her or his name was drawn would have her or his mark reduced by 50%.

• Visual support: Students would use PowerPoint as visual support for their presentations.

• Notecards: Students would use notecards that had their presentations written in note form. Scripts would not be used. Students should attempt to speak freely and should refrain from reading and/or memorizing their presentation.

• Marking criteria: Marks would be based on delivery, language, content and organisation.
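For readers who find it helpful to see such rules written out operationally, here is a minimal sketch of how the timing ranges and the attendance penalty above could be checked. It is an illustration only: the total-time ranges and the 50% reduction come from the requirements list, but the function and variable names are invented, and the size of any deduction for running over or under time is not specified in the original, so the sketch simply flags it.

# Hypothetical helpers for the presentation requirements listed above (Python).
ALLOWED_MINUTES = {2: (8, 10), 3: (12, 16)}   # group size -> (min, max) total minutes

def check_timing(group_size, total_minutes):
    """Return True if the presentation stayed within the specified total time."""
    low, high = ALLOWED_MINUTES[group_size]
    return low <= total_minutes <= high

def adjust_mark(mark, present_at_draw):
    """Apply the 50% reduction for a student absent when her or his name was drawn."""
    return mark if present_at_draw else mark * 0.5

# Example: a pair who spoke for 11 minutes ran over time, so a deduction would apply
# (its size was left to the markers, not fixed in the requirements).
print(check_timing(2, 11))       # False
print(adjust_mark(72, False))    # 36.0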

There was peer assessment as the presentations took place. At the end of each presentation, first the speakers assessed their own presentation. Then, classmates compared their assessments with the grades the speakers had given themselves, and the class discussed the performance together. Finally, the results were mailed to every student one by one, in order to protect the privacy of each student. The performances were generally around and above average (which is 70% at our university), although there were also a few very good ones. The first oral presentation was a very valuable learning experience. Although some students had performed just about average or slightly above, the general impression was very positive in that they now had a clear understanding of their strengths and weaknesses. They were determined to do better next time.

The second oral presentation took place in week 16, the last week of the course, 7 weeks after the first assessed presentation, and during that period we had done another practice oral presentation. Finally, it was time for their last assessed oral presentation. The task was again module-related – 'news and the media' (the relevant module of the week). This time students were grouped with different classmates from the ones they had worked with in the previous oral presentations. The topic was to select a recent piece of news which had led to a lot of debate and made a great impact on the public, and to talk about the underlying causes and the current effects, as well as the potential future effects on society they could predict. Once again, students assessed their peers, as well as assessing themselves. It was extremely pleasant to see that there was a clear improvement in the performance of ALL of the 15 students in the class, which reflected itself in the grades awarded. Moreover, this satisfaction was experienced by all the students since they were now able to assess their own performance, as well as their friends', very effectively. Several mentioned that the whole process of learning how to do an effective oral presentation had been extremely useful and were grateful for the fact that they had acquired this essential skill so early in their academic life.

Conclusion

In spite of all the difficulties, assessing speaking, and assessing oral presentations in particular, should be an indispensable component of the assessment schemes of academic environments, as the benefits of this practice most definitely outweigh the hardships. Making oral presentations part of the official assessment scheme can be a source of beneficial backwash, as oral presentations require the mastering of academic skills such as effective communication, independent research and collaborating with team members to achieve a result – skills which students will be required to display in academic and work settings. Oral presentations give students a meaningful context to work with and they may provide opportunities for students' confidence to build up. Especially if self-assessment and peer-assessment are part of the assessment system, students can reflect on their own performance and assess both themselves and their peers more effectively, and thus will be better aware of what is expected of them in their academic work and in their future careers. Our experience of teaching and making oral presentations part of the formal assessment has been extremely worthwhile indeed.



References

University of Southampton. Assessing Oral Presentations and Group Work. 25 October 2002. http:// ssing_oralp.htm
Bachman, L. F. (1990). Fundamental Considerations in Language Testing. Oxford: OUP.
Dellar, H. General English is Spoken English. 25 October 2002. http://www.disal.com.br/nroutes/nr11/pgnr11_12.htm
Gibbs, G., Habeshaw, S. and Habeshaw, T. (1993). 53 Interesting Ways to Assess Your Students. Bristol: Technical and Educational Services.
McNamara, T. (2000). Language Testing. Oxford: OUP.
Kitao, S. K. and Kitao, K. (1996). Testing Speaking. 5 November 2002.
Weir, C. (1995). Understanding and Developing Language Tests. New York: Phoenix ELT.



2004



Current issues in performance assessment
Kari Smith, Department of Teacher Education, Norwegian University of Science and Technology, Trondheim, Norway
Original publication date: IATEFL TEASIG Newsletter January 2004
This article is based on the talk Kari Smith gave at IATEFL Slovenia in September 2003.

I would like to start this paper by defining assessment as I see it, to make it quite clear to the reader that the paper deals neither with the evaluation of educational projects and programmes nor with psychometric testing of huge numbers of testees. The definition of assessment referred to in this paper is: assessment is a set of processes based on which we learn and make inferences about a learner's learning process, progress and product. The learners' learning is the focus of assessment, which includes various processes applying a number of tools. Assessment is not absolute, it is not measurement; it consists of conclusions drawn from inferences made about evidence (performances) of learning.

Gipps (1994) claims there are three main purposes for assessment, each of which requires different assessment tools and approaches. Assessment of learning is carried out for accountability purposes: to see if schools and teachers do their jobs, and if learners reach required standards. This kind of assessment is often initiated by policymakers, decision makers external to the educational setting itself. A frequently used tool is (a set of) standardized examinations, designed by testing experts, which often reflect a national curriculum. The examinations are given to all pupils or to a sample of pupils, and the findings are used to check if the system meets external demands.

A second function of assessment is for certification purposes; learners are certified at the end of a learning process or course, and the assessment, the grade they get on the certificate, informs others of the extent to which learning meets specific standards of a learning process or course. This kind of assessment often forms a combination of external examinations and internal teacher assessment. In Israel, for example, the teacher's mark is 50% of the mark on the certificate, and the other 50% is the mark on an external examination. The teacher's voice is heard, meaning that the mark represents the process of learning as well as the product of learning.

The third function of assessment is assessment for learning, in contrast to assessment of learning. This function of assessment allows for multiple assessors: teacher, peer and self-assessment. Furthermore, the importance of the assessment does not lie in the mark, but in the information learners receive about the performance. This is what we call the formative purpose of assessment, as feedback is fed into the learning process with the intention of improving it. Summative assessment, as practiced in the two other purposes discussed above, accountability and certification, is given at the end of the learning process and is usually expressed in a mark.

The three functions discussed above fit into what Mabry (1999) calls the three assessment paradigms: the psychometric, the contextual, and the personal assessment paradigm. The psychometric paradigm represents objective, standardized testing, usually in a multiple-choice format. There is one correct response to each test item; thus, the test is objective and reliable. The psychometric paradigm is used when a huge number of candidates need to be tested for selection purposes and, in my opinion, it is not suitable in an educational setting, to be used in schools. The purpose of the examination is decision-making; thus, it takes on a summative form.

The contextual paradigm, on the other hand, advocates assessment approaches and tools which have been developed for a certain context, a certain class, for example. The teacher is responsible for developing the tool, and the purpose of the assessment is formative as well as summative. The learners' voice is heard in planning the assessment instruments as well as during the correction and marking process.

The personal (ipsative) paradigm focuses on the individual learner from an intra-learner perspective. The assessment deals with the personal progress of the individual learner in relation to personal standards and does not concern itself with external and common standards. The assessment items are mainly subjective in character, and the learner is engaged in the assessment process and takes on a responsible role. The main function of this paradigm is to enhance learning and growth. It is time-consuming and might be found difficult to practice in large classes and in settings where the teaching is dictated by an external curriculum.

Whatever function we need assessment for, whatever paradigm we choose to use, it is important to be modest about the assessment, to be aware of the fact that assessment is, after all, an intelligent guess! Assessment is meant to describe a learner's present competence in the target language. Competence is, however, hidden within the learner, and cannot be seen or heard. What we see and hear is the performance of the competence, what the learner chooses or is able to perform when triggered or asked to do so. The only thing that can be assessed is the performance, and the more samples of performances we collect and the greater the variety of performances, the more reliable the evidence on which we base the assessment will be. Moreover, the more intelligent our guess is, the more reliable the assessment becomes. If several people assess the same performances using the same criteria, we are likely to increase the reliability of the assessment. Good assessment strives to be as valid and reliable as possible; however, we need to be constantly aware that no assessment can ever be absolute. It is all relative, which means that we have to be modest about what we are able to say about a person's language competence, about the assessment.

When discussing assessment in educational settings, the main focus should be assessment in relation to learning, in our case language learning. Learning is not, however, only the outcome or product of learning. Learning is a process, learning means progress, and learning has a product. I call these the three Ps of learning: Process, Progress, and Product. All three have to be represented in the assessment of learning, but not necessarily to an equal extent. If we work with young learners, the focus of assessment would be the process of learning, as a major aim is to help learners develop proper learning strategies. When working with special education pupils, the focus might very well be on the individual progress in learning, to see how the learner progresses in relation to her/himself. In an exam class, the last year of school, the emphasis is often on the product of learning, which is examined by external examinations. However, whatever the setting is, we need to keep in mind that all three Ps need to be included in the assessment, and decide on the weight of each P in the respective context. Learners and parents have the right to be informed about the assessment focus and the weight of each P.

Before we can answer the question of whether language performance can be measured, it is necessary to examine what language performance consists of. Parts of language performance lend themselves to measurement, such as spelling (we can count the number of spelling mistakes), grammar mistakes, speed of reading, speed of speaking, recognition of facts in a reading passage, and so forth. However, there are also elements in language performance which cannot be measured – they cannot be counted. How is it possible to measure comprehension, aural as well as written? Can we possibly measure communication – how much communication has been made? Communication is the mirror of comprehension: what one communicates is what the interlocutor or reader comprehends. Both concepts are uncountable, and cannot be measured, yet they can be assessed. If we accept the above theory, it means that we cannot measure language performance as a whole, only parts of it. Therefore, we need to accept the fact that language performance is to be assessed and not measured, and this includes a great deal of modesty about the (hopefully) intelligent guess we make.

Traditionally, language performance has been divided into the four skills: listening, speaking, reading and writing. This division has historically been sub-categorized as active and passive skills, receptive and productive skills, internal and external skills, among others. Recently there has also been much discussion about the integration of skills: we talk about things we have heard or read, a phone message needs to be written down, we write notes from lectures, etc. But the division into the four skills has always been the prominent way to look at language. However, the time has come to seek other ways of looking at language and at the ways in which language is used. In Israel, a new syllabus for the teaching of English in schools has recently been published which presents language in terms of domains instead of skills. There are five domains in which language is used:

• Social interaction

• Getting access to information

• Presenting information

• Appreciation of language and literature

• Awareness of language

Each domain requires different teaching and assessment activities which break down the traditional division of the four language skills (Israeli Ministry of Education, Culture and Sport, 2001).

Social interaction is still oral interaction and communication, but it is also written interaction. Today's youngsters' social interaction is mainly carried out with the aid of electronic messages, chats and forums. Electronic social interaction has developed without the support of language teachers, and often to the horror of some, into a language of its own, which is a combination of written and oral language, of formal and informal language. It pays no respect to traditional spelling and grammar rules. Today this has become the prominent version of English used by teenagers, and even by some English teachers. Can the ELT world afford to ignore this form of social interaction in teaching and assessing language? A suitable assessment activity in this domain is to ask learners to join a forum and to bring a printout of their contribution to the forum for assessment purposes. The main criteria would be: Did the learner understand the other messages on the forum? Did the learner get her/his own message across? These questions deal with the core of social interaction.

Getting access to information means knowing how to find information in a written text, in spoken texts, and even how to elicit tacit knowledge from other people. This domain includes readings of different kinds, finding information on the internet, and even carrying out interviews with people or writing questionnaires to elicit opinions and attitudes. Knowing how to get access to information is one of the most important learning strategies of today. Moreover, people of tomorrow need to function in a world with a constant flow of new information, in a world in which what we know today may become irrelevant tomorrow. Assessment activities focus on tasks in which learners are required to find information about a certain topic from a variety of sources: books, journals, newspapers, the internet, people, television and radio (which is still relevant for some).

The domain of presenting information deals with how to present collected information and personal knowledge to others. This is done in the form of written papers, lectures, PowerPoint presentations, exhibitions, diagrams and tables, among others. Learners need to know how to be coherent, brief and concise in their presentation of information, to be able to catch and maintain the interest of the receiver(s) of the information by using a variety of means. It is natural to integrate the assessment task of getting access to information with the domain of presenting information, as these are closely related to each other. We cannot assess if learners have found information and understood it unless they present this information in a variety of forms reflecting their personal interpretation of the information.

These days, when technology takes on such a prominent role, it becomes important that the art of language use and the beauty found in literature are not overlooked in the English classroom. Literature is an integrated part of language classrooms in which teachers realize that language represents a culture, and that using language well and appropriately is an art. Art is appreciated if it speaks to the individual, and the individual learner must be given tools and courage to express her/his appreciation of arts presented by English, through English. Therefore, learners benefit from being given basic tools for how to analyse literature. Appreciation is, however, personal and individual, and teachers would be wise to provide learners with space during the learning process and in the assessment task to express personal opinions, including criticism. The assessment should focus on how personal appreciation is expressed and not what it says.

The final domain to be discussed is language awareness, which can be seen as the meta-cognitive use of language. Learners are taught how language works, its construction and its many various forms. This includes structure, lexis, phonology and language register, for example, and goes far beyond the traditional teaching of grammar. Teaching focuses on understanding language, aiming at enabling the learners to explain why a certain form of language is used in a specific situation. This is very much related to the why-question, which should also take a prominent place in assessment activities.

Assessment becomes an integrated part of the learning/teaching process, especially formative assessment and, as such, we need a variety of assessment tools which highlight different aspects of language learning. Traditionally, tests and examinations served as the main assessment tool, whereas today English teachers have a wide repertoire of assessment tools from which to choose. The main criteria for selecting appropriate tools are that they promote learning and provide evidence of language performance, of performance that can be counted as well as language performance which is assessed impressionistically. Alternative assessment is a commonly used term these days, denoting alternatives to traditional tests and examinations. Personally, I prefer to use the term 'complementary assessment' as I do not want to get rid of tests and examinations. We need, however, to apply a variety of assessment tools to complement the documentation of performance and base our assessment on more solid evidence. We create a profile of a student's learning, and not only a snapshot taken with the help of a test.

• Portfolio assessment

A portfolio is a purposeful collection of student work collected over a period of time. There is a working portfolio and a presentation or assessment portfolio. The working portfolio is a neat collection of worksheets and assignments the learners worked on during a course. What makes a working portfolio differ from a regular notebook or file is that the learners are requested to revise their first version of the assignment, so there are at least two versions of each assignment included in the portfolio: the first draft and the revised version. A second difference is the reflection attached to each assignment. The learners are requested to reflect on the learning process as well as on the product of learning. The presentation portfolio, on the other hand, is a selective file which includes assignments used for assessment and not only for feedback. The teacher might want to include some core assignments in the presentation portfolio, to make sure that there is some standardisation in the assessment. The learners are invited to choose a fixed number of assignments they find best represent their present stage of learning. The learners choose what evidence of their competence will be presented and assessed (Smith, 2002).

• Self-assessment

Self-assessment is a necessary tool for students aiming at becoming independent life-long learners. When engaged in self-assessment, learners are asked to give value to their own work, process as well as product, according to specified criteria and standards. Most learners benefit from being trained in self-assessment, and this can be done by asking them to assess homework according to an answer key. They can also be asked to discuss homework jointly, or even tests in groups, to reach an answer key, which is presented and corrected by the teacher if necessary. As a second step, individual learners are asked to check their own test based on the answer key developed. When learners are asked to work on open learning tasks, such as essay writing or open questions on reading material, or are even invited to express personal opinion, it is useful to let the learners work in groups to develop criteria for assessing the product of the open task and to present their criteria to the whole class, and then the final rubric for marking is developed with the help of the teacher. In order to be accurate self-assessors, learners should be fully informed about assessment criteria, and this is best done by involving them in the development process. Moreover, much learning takes place when developing criteria because learners get a deeper understanding of the task. Self-assessment does not mean, however, that there is no teacher assessment. Self-assessment is most useful when it is monitored, and therefore it is important that teachers assess the same tasks. The two assessments are then compared, and the average score is calculated and given to the learner, unless there is a discrepancy of 10 percent or more. If this is the case, the final score is decided in a meeting between the teacher and the learner. Teachers who work with self-assessment in their classrooms help learners develop a skill which is not only essential for independent learning, but also for acting as self-critical mature human beings.
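As a concrete illustration of the monitored self-assessment rule just described, the sketch below averages a teacher mark and a learner's self-assessed mark, unless the two differ by 10 or more, in which case the final score is left to a teacher-learner meeting. It is a minimal sketch only, assuming marks on a 0-100 scale so that the "10 percent" discrepancy is treated as 10 points; the function name and the None return value are invented for this example.

def combine_marks(teacher_mark, self_mark, threshold=10):
    """Average the two marks, or return None when the discrepancy is 10
    points or more and the final score must be agreed in a meeting."""
    if abs(teacher_mark - self_mark) >= threshold:
        return None   # large discrepancy: teacher and learner decide the final score together
    return (teacher_mark + self_mark) / 2

print(combine_marks(78, 74))   # 76.0 - the marks agree closely, so the average stands
print(combine_marks(78, 60))   # None - to be resolved in a teacher-learner meeting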

• Peer-assessment

Where performance assessment is concerned, peer-assessment becomes a useful assessment tool. As in the case of self-assessment, learners need to have a clear understanding of the assessment criteria, and this is best done by involving them in developing the criteria. From my own experience I have learned that peer-assessment is best applied for formative purposes, to provide useful information to the learner, and is less popular when used summatively and presented in the form of a score or mark. Learners do not like to give a mark to their friends, whereas they have been found to be happy to provide informative feedback.

• Group assessment

Many teachers avoid giving group tasks for assessment purposes because they do not know how to assess these and are often unhappy about giving a group assessment instead of an individual assessment. A solution to this dilemma is to divide the final assessment of group projects into three parts. Part one is the assessment of the group product, the same for all students. The second part is peer-assessment of the contributions to the group project. This means that group members assess each other's contribution to the final product. The final part is an individual reflective essay describing the learner's individual learning process with the task. Thus, the final assessment becomes individual, even though the task is dealt with in groups.
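To show how the three parts described above could be brought together into one individual mark, here is a rough sketch. The division into group product, peer-assessed contribution and reflective essay comes from the paragraph above; the equal weighting of the three parts is an assumption made purely for this illustration (the article does not prescribe weights), and the function name is invented.

def individual_group_mark(group_product, peer_contribution, reflective_essay,
                          weights=(1/3, 1/3, 1/3)):
    """Combine the shared group-product mark with the two individual parts.
    Equal weights are assumed here only for illustration."""
    parts = (group_product, peer_contribution, reflective_essay)
    return sum(w * p for w, p in zip(weights, parts))

# Example: the group product (same for everyone) is 80; this learner's peers
# rated her contribution 70 and her reflective essay was marked 85.
print(round(individual_group_mark(80, 70, 85), 1))   # 78.3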

• Tests dressed differently

Testing still plays a prominent role in performance assessment, and this paper does not argue in favour of getting rid of tests. Testing is, however, only one assessment tool, and the teacher is wise to apply a variety of tools. Furthermore, there is still room for creativity in test design and administration. Tests can be given in pairs, for example. This allows for intensive peer learning prior to and during test taking, and the score is the same for the two learners. Another approach is to give the learners five minutes to discuss the test questions after the test has been handed out, before they sit down and take it individually as a traditional test. The five minutes often help to ease tension and do not really make any difference to the test result. Finally, a test that is popular with my students is one where they are allowed to bring one page of A4 paper into the test situation. They can write whatever they want on this paper and are free to use it as they want. This has proved to help learners prepare for the test, deciding what is important enough to write down, and it reduces much of the test anxiety. Besides, the legitimate piece of paper has made all the illegitimate pieces of paper (cheating notes) irrelevant.

Conclusions

Current issues in performance assessment as presented in this paper promote new ways of defining assessment as a set of processes instead of a one-time test. Assessment serves various functions and different assessment paradigms suit specific functions. Furthermore, it is suggested that teachers focus on progress and process of learning and not only on the product. The traditional division of language into the four skills has been questioned, especially in light of new technologies, and an alternative framework of domains has been proposed and described. The final part of the paper presents in brief a variety of assessment tools to be used in the language classroom. The purpose of writing this paper has been two-fold: to offer a different view on language performance in terms of teaching and assessment, and to suggest complementary practical ideas to traditional practice in language teaching and assessment. It is up to the individual teacher to decide on the approach to assessment in her/his context of teaching.






2005



The case for authentic language assessment
Peter Davidson, Zayed University, United Arab Emirates
Original publication date: IATEFL TEASIG Newsletter October 2005

The purpose of this article is to highlight the benefits that authentic assessment has over other more traditional forms of assessment, in the hope that test writers will devote more time to developing authentic assessment. It is argued that this will result in assessment that more accurately measures students' skills and abilities, and has a more positive washback effect than traditional types of assessment.

As with much of the terminology used in language teaching, authentic assessment has different connotations for different people. In their book Authentic Assessment for English Language Learners, O'Malley & Valdez Pierce (1996: 4) define authentic assessment as "the multiple forms of assessment that reflect student learning, achievement, motivation, and attitudes on instructionally-relevant classroom activities". They cite performance assessment, portfolios, and student self-assessment as examples of authentic assessment. However, these types of assessment, as well as task-based assessment, which is also often used synonymously with authentic assessment, may not necessarily be authentic. The key factor that differentiates authentic assessment from other kinds of assessment is that in an authentic assessment the test task replicates as closely as possible the type of task that the test-taker would be required to perform in the target situation (Wiggins, 1990; Campbell, 2000; Mueller, 2003; Davidson, 2005). As noted by Lund (1997: 25), "Authentic assessments require the presentation of worthwhile and/or meaningful tasks that are designed to be representative of performance in the field ... and approximate something the person would actually be required to do in a given setting." A more suitable term for authentic assessment than those mentioned above may well be 'real-life assessment'.

For the purposes of this article, traditional assessment refers to large-scale, high-stakes, standardized assessment that utilizes test task types such as multiple-choice questions (MCQs), true/false questions, cloze tests, short answer questions, vocabulary matching, etc. Essentially the tests used in traditional assessment derive their constructs from a particular model of language, such as Canale & Swain (1980), Bachman & Palmer (1982), or the pervasive Bachman (1990) model. In such models, language is subdivided and broken down into categories and subcategories which test writers use as the basis for the constructs that their tests aim to measure.

However, despite the value of such theoretical models, from a pragmatic perspective it is readily apparent that it is almost impossible to generate an authoritative model of something as complex as language. Consequently, subsequent models of language and test-taker performance have tended to be less detailed and definitive, and more general and tentative (e.g. Bachman & Palmer, 1996; O'Sullivan & Porter, 1996). Much of what constitutes traditional language assessment, therefore, could be viewed as the testing of artificial constructs in artificial situations. This may in part explain why traditional assessments frequently employ unusual test task types. Over the years many test writers and language teachers have become desensitized to the inherent strangeness of the types of tests that we often write and give to our students.



Take, for example, the ubiquitous MCQ. MCQs are used so often in language and psychological tests, and have become so familiar, that few teachers would raise concerns when test writers produce a test that includes this type of item. But try to think of an instance outside the language testing environment when you have to answer a question in which you are given four options and have to choose the best answer. It has been argued, perhaps somewhat cynically, that the only construct MCQs assess is the test takers' ability to answer MCQs. Equally problematic is the cloze test. What a cloze test actually measures is the subject of considerable debate. The only practical application of a cloze test I can think of is if you happened to be reading outside, it started to rain, and every nth word was deleted. Perhaps the most bizarre of all test tasks is "Find a word in paragraph X that means Y." I cannot think of any situation in which you would be required to do this in real life.

Another problem with traditional assessment is that, in an attempt to keep tight control over the testing environment and ensure the reliability of a particular test, test administrators often impose limitations that are contrary to real-life situations. For example, in tests of writing that I have been involved with, students were not permitted to use computers to word-process their essays, nor were they allowed to consult a dictionary or a thesaurus. The fact that typing and using a dictionary and a thesaurus were actually part of the curriculum made such a decision even more puzzling. What I am concerned about is that we are getting to the stage where we almost expect language tests to bear little resemblance to the types of activities that students will be required to do once they leave the surreal world of the language classroom. For example, did anybody think that the recent word-associates vocabulary test or the Yes/No vocabulary tests were a little offbeat?

What, then, are the benefits of authentic assessment? Firstly, authentic assessment has the potential to have greater construct validity than traditional assessment because it utilizes tasks that are more genuine than those found in traditional assessment. Consequently, authentic assessment can more accurately measure students' skills and abilities because it measures these skills and abilities in a more direct way. Reading a text under timed conditions and then answering a series of multiple-choice questions about that text, for example, is an activity that is seldom done outside of the testing situation. The indirect nature of the test tasks used in traditional assessment inevitably results in construct-irrelevant variance – that is, the contamination of the measurement due to the testing of constructs that were not intended to be measured. For example, the construct validity of a test of listening ability may be undermined by questions that students are required to read, resulting in a test that is also measuring students' reading skills. The nature of authentic assessment allows the test writer to broaden the type of constructs being measured to more accurately represent the true complexity of language. Secondly, authentic assessment usually has more content validity than traditional assessment because it more comprehensively measures the content of what has been taught during a particular course. Rather than assessing only a sample of constructs, as traditional assessments do, authentic assessment has the potential to measure a greater range of constructs. As such, traditional assessments are usually relatively short, whereas an authentic assessment is likely to take much longer as it requires students to demonstrate a range of skills and abilities. Authentic assessment more effectively lends itself to assessing integrated skills at the macro level, while traditional assessment is better suited to measuring discrete skills at the micro level. Furthermore, the limited number of question types that traditional assessments use, such as MCQs and true/false questions, invariably limits the number of constructs that can actually be assessed. Finally, whereas traditional assessment is usually summative in nature, authentic assessment is often formative as it is carried out throughout a course rather than at the end of a course.



Thirdly, there are a number of differences regarding the scoring validity of traditional and authentic assessment. With traditional assessment the testing environment is tightly controlled to avoid differences in test scores arising from variables outside of the test-takers' influence. Great effort is made to ensure that the marking of the test is as objective as possible, and automated scoring is often employed to reduce human error. Because traditional assessment only measures a sample of constructs, test writers employ an array of psychometric measures to determine the reliability and validity of a test before making inferences about what the test results actually mean. Authentic assessment, on the other hand, may produce lower scoring reliability than traditional tests because of varied environmental test conditions and the subjectivity of human raters. In order to ensure valid and reliable authentic assessment, greater emphasis is placed on the writing of criteria, rater training, calibration, and the monitoring of raters. Perhaps the most important difference between traditional and authentic assessment has to do with consequential validity. Whilst we are all aware that the purpose of a test is to measure how well a student has met a particular set of learning outcomes, we often forget that an additional, and equally important, purpose of testing is to facilitate learning (Biggs, 1998; Black & Wiliam, 1998; Dwyer, 1998). Authentic assessment has a significantly more positive washback effect on the classroom than traditional tests. Rather than focusing on scores and passing a test, authentic assessment focuses more on completing a realistic task that will have some future relevance for the students. When utilizing authentic assessment, the great transgression of 'teaching to the test' is not as problematic, as the test embodies the learning outcomes of the course anyway. As a consequence of the authenticity, relevance and comprehensiveness of authentic assessment, many teachers and students feel that this type of assessment has higher face validity than traditional assessment. Finally, because authentic assessment requires students to demonstrate their skills and abilities on tasks that simulate the types of activities that they will be required to carry out in the future, authentic assessment is a much better predictor of future success than traditional assessment.
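As an illustration of the rater monitoring mentioned above (this sketch is not from the original article), the Python fragment below compares two raters' scores on the same set of performances using two very simple indicators: the proportion of exact agreement and the correlation between the two score sets. The scores are invented, and operational rating programmes rely on more elaborate statistics, but the idea of routinely comparing raters is the same.

    import statistics

    # Hypothetical band scores awarded by two trained raters to the same ten performances.
    rater_a = [5.0, 6.0, 6.5, 7.0, 5.5, 6.0, 7.5, 5.0, 6.5, 6.0]
    rater_b = [5.5, 6.0, 6.5, 6.5, 5.5, 6.5, 7.5, 5.0, 6.0, 6.0]

    # Proportion of performances on which the two raters agree exactly.
    exact_agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

    # Pearson correlation between the two sets of scores
    # (statistics.correlation is available from Python 3.10 onwards).
    score_correlation = statistics.correlation(rater_a, rater_b)

    print(f"Exact agreement: {exact_agreement:.0%}")
    print(f"Score correlation: {score_correlation:.2f}")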

The purpose of this article is not to argue against the use of traditional forms of assessment, which are clearly necessary when we need to assess large numbers of students in a short period of time. What I am suggesting, however, is that rather than putting effort into designing more artificial and increasingly strange tests, test writers would be better off putting more time and effort into more authentic types of assessment that are broader and more precise measurements of a candidate's performance and have a much more positive impact on the teaching and learning environment than many of the traditional types of assessment. If you are still not convinced by my case for authentic assessment, then answer this question: If you were having brain surgery, would you rather your surgeon had

B. scored 100% on an MCQ and cloze test about brain surgery.

C. beautiful green eyes.

D. demonstrated that they could actually perform brain surgery.



Jones, W. (Eds.). Assessment in the Arab World,

Physical Education, Recreation and Dance,



The IELTS speaking test: analysing cultural bias
Rubina Khan, University of Dhaka, Bangladesh
Original publication date: IATEFL TEASIG Newsletter October 2005

IELTS is an internationally recognized English language proficiency test. Despite this, the test is perceived to have embedded cultural biases in its structure. This paper will focus on the speaking module of the IELTS and highlight some of these features. The purpose is to raise awareness of such issues amongst test designers.

IELTS, the International English Language Testing System, aims to assess the language proficiency of non-native speakers of English and tests their listening, reading, writing and speaking skills for academic and vocational purposes. This paper examines the speaking module of the IELTS test. With my background in test administration, combined with my experience as an IELTS examiner, I have observed that there are subtle cultural biases in the speaking test content towards Western culture and norms of behaviour. A small-scale study explores and analyses the cultural bias inherent in the terminology, vocabulary, topics and question patterns of the speaking test. It is assumed that the presence of these unfamiliar features makes the test confusing and difficult for non-native and local candidates who have had no exposure to foreign cultural norms of speaking, exchange and interaction. This has a negative impact on the candidates, and their oral performance is affected. This in turn has implications for test designers and is an issue which needs attention. The paper is presented in four sections. Section 1 presents the context and background of the study. It also describes the IELTS speaking test module. Section 2 describes the study and draws attention to its limitations. Views on this issue were elicited from both native and non-native examiners from the British Council and IDP Australia centres in Dhaka, Bangladesh. Section 3 presents the findings of the study, followed by a brief discussion. The final section highlights the implications for IELTS oral testing.

In Bangladesh IELTS testing is conducted by both the British Council and IDP Education Australia (IELTS Australia) centres, and these are based in the capital city of Dhaka. Off-site testing is also conducted by these centres in different parts of Bangladesh, e.g. Chittagong, Sylhet, Rajshahi and Comilla. Annually in Bangladesh, approximately 12,000 candidates from both centres sit for the IELTS test. Academic Module (AC) candidates make up between 75 and 80 per cent (approx.) and General Training (GT) module candidates between 15 and 18 per cent (approx.). 95% of the candidates are male and 5% female. Bangladesh is in 8th position for Academic Module candidates among the top 25 locations in which IELTS was taken and 22nd position for General Training Module candidates. IELTS is considered an important and crucial test for Bangladeshi candidates because it gives them access to a world of opportunities. It is a pre-requisite for admission to university, further and higher education, employment and immigration. In Bangladesh the main IELTS user groups can be divided into four categories:



1. IELTS for postgraduate studies/higher education
Candidates who have completed an undergraduate programme of studies in the country and are eager to pursue a postgraduate course, in other words those who wish to obtain a Master's degree from a foreign university. Professionals such as doctors and lawyers require IELTS for professional registration. Doctors, for example, need IELTS to sit for the PLAB test and for application to the UK Medical Council. Candidates who want to do a PhD in any discipline also need to sit for the IELTS test. This is the first category of IELTS users in Bangladesh. The majority of candidates in Bangladesh sit for the IELTS test prior to undertaking postgraduate studies or pursuing higher education. Approximately 90% of the candidates are Academic Module candidates seeking postgraduate degrees or professional registration. Within this category some candidates sit for the IELTS test because they have been awarded a scholarship and IELTS is a requirement. Others, who finance themselves, also need the IELTS score as an admission pre-requisite.

Candidates who have passed their Higher Secondary Certificate exam (HSC) or Ordinary/Advanced levels (O & A levels) and want to seek admission to an undergraduate course/degree course abroad.
The second category of IELTS users in Bangladesh seeks admission to undergraduate courses at foreign universities. Currently in Bangladesh there is a marked preference for going to Australia for undergraduate studies, partly due to low tuition fees and a reasonable cost of living, and partly due to immigration facilities. IELTS is not only an entry requirement for admission to the university, but also a visa requirement for going to Australia.

Candidates who want to immigrate to countries like Australia, Canada and New Zealand.
The third category of IELTS users in Bangladesh takes the test for immigration purposes. Candidates sit for the IELTS test in order to emigrate to countries like Australia, the UK, Canada, New Zealand and the USA. In 1998 the Australian government decided to use the IELTS test for immigration purposes, and ever since there has been an increased demand for this very reason. Some candidates are also interested in seeking jobs abroad.

Candidates who are interested in sitting for the test and are keen to assess their level of English.
Finally, the last category of users needs IELTS for personal reasons, for self-evaluation. A very small percentage of candidates in Bangladesh are interested in their level of English and take the IELTS test to assess it. English in Bangladesh has a foreign/second-language status. The majority of the candidates who opt for the IELTS test have had 12 years of formal English instruction at school. Despite this long acquaintance with English, their general proficiency level is low. A study undertaken by the British Council (Raynor, 1995) indicated that the competence level of university entrants in Bangladesh is Band 3 (Restricted) on the English Speaking Union Scale, compared with the target of Band 6 (Competent). In particular, students' speaking skills are not developed at all because speaking skills are never taught or tested, and are not part of the curriculum. Moreover, significant differences exist in the English performance of candidates from rural and urban areas. Students living in the rural areas and remote parts of Bangladesh have very little or no exposure to real-life practical English. The level of schooling is low and people are generally not well off. They are linguistically incompetent and as a result unable to comprehend and produce proper English. Hence their performance in any oral test is not sufficient.



It is to be kept in mind that education in Bangladesh, as in much of Asia, operates in a "transmissional, teacher-centred and examination-oriented teaching culture" (Barnes & Shemilt, 1974, cited in Pennington, 1995: 707). Cortazzi & Jin (1999), commenting on the role of culture in Asian contexts, maintain that Chinese children are socialized into a culture of learning in which there is a strong emphasis on memory, imitation and repetitive practice. This also serves as a description of the Bangladeshi culture of learning. Ballard & Clanchy (1991: 8) describe a similar situation (which can be applied to the Bangladeshi educational scenario) and contend that "in many Asian and South Asian countries, learning strategies often entail memorization and imitation, resulting in an approach to knowledge that encourages the conservation of existing knowledge". On the other hand, they maintain that many Western cultures encourage a speculative or critical approach to learning, resulting in an extension of existing knowledge. Because of this, in many Western cultures writers are encouraged to present their opinions on a particular topic as they speculate about the various possibilities contained in the issue. This is important information and can help to partly explain why learners in Bangladesh in general, and IELTS candidates in particular, are unable to speculate or give their opinion successfully on a topic in the speaking test. In Bangladesh the teacher is the authority figure in the classroom and the teacher-student relationship is formal. The teaching and learning pattern is traditional and teacher-centred. The Western learner-centred and speculative approach is alien to Bangladeshi learners. Students in Bangladesh have not learned to evaluate or deliver opinions or critical judgment because their teacher's opinion is regarded as final and respected as the ultimate authority. Students are not familiar with the Western concept of knowledge being constantly open to extension, revision and change. Moreover, Western interaction patterns are different from Asian modes of interaction and hence alien to Bangladeshi learners.

Literature review

I would like to embed my study within the parameters of insights drawn from a number of fields. To be specific, I would like to develop this paper from a sociolinguistic and intercultural perspective, and base my literature review on the work of Coleman and Holliday. My point of departure for this paper is the concept of culture. I would like to begin with a definition of culture and use it as a starting point for further discussion. The word 'culture' in its traditional sense is seen as the "social milieu that provides a group with a shared construction of reality, a tradition and a recipe for action. The rules of culture are transmitted through learning processes either tacit or conscious" (Berry et al., 1992). This definition implies that a group needs to have a shared construction of reality. Turning to the IELTS process of testing, we find that two major groups are involved and both groups have their own construction of reality. These two groups are:
1. the test designers
2. the candidates/test takers.
The test designers have a shared construction of reality and the candidates have a shared construction of reality. The question to be answered is: do these two shared constructions meet or is there a mismatch? To understand this, we turn to Coleman (1984), who draws on Brian Street's distinction between autonomous and ideological attitudes. He observes that the ideological approach is culturally embedded, and recognizes the significance of the socialization process in the construction of the meaning of literacy for members of the society. Coleman argues for a non-universal and ideological approach, and contends that "an ideological approach contrasted with the autonomous approach allows us to consider the possibility that every society creates its own meaning". This approach is broad and flexible, and recognizes the extraordinary diversity of human behaviour and human achievement.



Coleman (1984) observes that pervasive social attitudes to knowledge, authority and tradition distinguish one culture from another and favour particular styles of learning. He goes on to comment that the classroom reflects the values of society in many subtle ways. He adds that in a society that emphasizes respect for the past and for the authority of the teacher, the behaviour of both teachers and students will mirror these values. A society that rewards independence and individuality will produce a very different classroom etiquette. Mary Muchiri (in Coleman, 1996: 129-130) points out that "the institutional and national cultures are important in not just fostering certain attitudes in students but also in the actual interpretation of questions in the examination". Holliday (1994) identifies two basic contexts:

• English language education technologies based in Britain, Australasia and North America (BANA)
• Tertiary, secondary and primary (TESEP) local contexts of the rest of the world

According to Holliday (1994) there are two basic types of professional academic culture in teacher groups – collectionism and integrationism (Holliday, 1994; see Bernstein, 1971). Holliday argues that the BANA technologies are instrumentally oriented and integrationist, e.g. skills-based, discovery-oriented, collaborative pedagogy. The TESEP technologies are collectionist, e.g. didactic, content-based pedagogy. He observes that the BANA group is commercial in its approach, is based abroad and has used integrationism as a means of expansion into TESEP territory. The test designers belong to the BANA group, which takes an essentially integrationist-collaborative approach to education. The candidates in the local contexts fall under the umbrella of the TESEP group. The professional academic culture of the TESEP teacher group is collectionist (Holliday, 1994) and, as mentioned earlier, has a didactic approach to education. The question that again arises is: is there a mismatch between these two groups? If so, does this affect the validity of the test? Bangladeshi candidates are a product of the TESEP culture, i.e. the traditional ethos. No creative lateral thinking has been cultivated – strategies consist of memorization and regurgitation. These learners, when they sit for the test, are at a loss to make sense of Western values and customs, and as a result do not perform in a satisfactory fashion. Cultural knowledge is important, and Weir (1990) points out that successful performance in a speaking test depends on a number of factors, of which background or cultural knowledge is one. My argument is that there is a gap between the cultures of the two groups, and there is a perceived mismatch. Since the members of the BANA group design the test, there may be culturally sensitive areas which may inadvertently have been overlooked by them. This oversight disadvantages the candidates in the local context. If IELTS is labelled as an international test, these grey areas need to be considered. We need to consider our students, their needs, their level and their cultural and educational background, and make the test user-friendly and less threatening.

The revised speaking test format for IELTS was introduced in July 2001. The duration of the IELTS speaking test is between 11 and 14 minutes, and it consists of an oral interview between the candidate and an examiner. All interviews are recorded. The speaking test is taken, at the discretion of the test centre, seven days before or after the other three modules. There are three parts to the speaking test. In Part 1 candidates are asked questions on general and familiar topics. For example, they have to answer questions about themselves, their homes/families, hobbies/interests, their jobs/studies and a range of similar topics. The first part lasts from 4 to 5 minutes (IELTS Information for Candidates, 2005).



In the second part candidates are given a verbal prompt on a task card and have to speak on one topic for 1 to 2 minutes. They have one minute of preparation time. Part 2 lasts from 3 to 4 minutes. Part 3 is a two-way discussion between the examiner and the candidate, and lasts for 4 to 5 minutes. In this section candidates have to speak on 2 to 3 questions related to one topic. Candidates are usually asked to give opinions or suggestions about a topic. They are also asked to compare, contrast, evaluate and speculate about a topic. Candidates are assessed on the speaking module on four assessment criteria: fluency and coherence, lexical resource, grammatical ability and pronunciation.

The aim of the study was to investigate and assess whether the IELTS speaking test has subtle cultural biases embedded in its structure, vocabulary patterns and methodology. The purpose of this study is not to criticize Cambridge, but to make test designers aware of some of the issues and to provide feedback from a local context. Feedback from test users should not be overlooked because important information may be picked up (Alderson, Clapham & Wall, 1995: 221).

This study was motivated by the conviction that very little research has been conducted to analyse cultural bias in the speaking test and that the findings would provide important information to test designers. As far as I know, the study is unique in the sense that no previous study has been undertaken to assess the cultural bias in the IELTS speaking test in the Bangladeshi context.

Initially it was decided that data would be gathered from IELTS examiners in Bangladesh through discussions. Later on, after informal discussion with IELTS administrators of both centres, and considering the time constraints and the busy schedule of the examiners, it was decided that examiners would fill out questionnaires and that follow-up focus group discussions would be arranged. Examiners who had some free time and were willing would be interviewed. The items or questions on the questionnaire were discussed with two expatriate examiners and one local examiner during the preparation stage. Some minor changes and additions were made based on their comments. A qualitative approach was used to analyse the findings.

This is a small-scale, initial and descriptive study, and it has a number of limitations. Firstly, it was difficult to gather data due to the time constraints and busy schedules of the examiners, and the political climate of the country was not favourable. Secondly, many British Council examiners could not be contacted personally due to unscheduled holidays. The number of questions in the questionnaire was limited by the issue of security. The range of topics and open discussion of topics was also constrained for security reasons. It was difficult to camouflage the topics and discuss them overtly in writing.

Data were elicited mainly through questionnaires. Follow-up interviews and focus group discussions could be conducted with only a limited number of examiners. Moreover, it should be noted that feedback was sought from only one category of test users, i.e. data was collected only from examiners.



It would be interesting to know what test designers and administrators had to say on this issue. It would also be useful to get the candidates' perspectives on this matter. In addition, questions from the full range of speaking test folders were not discussed, once again due to security reasons. Finally, the sample size is admittedly small and perhaps the results cannot be generalised, but the study does raise important questions which need to be taken seriously by testing authorities and test designers. Questionnaires were distributed to examiners at both IELTS testing centres (BC and IDP) in Dhaka. Completed questionnaires were collected from 16 examiners and strict confidentiality was maintained with the data collected. Table 1 gives the breakdown of the examiner participants involved in the study:

Table 1: Number of examiners who

Examiners were asked to respond to ten questions about the IELTS speaking test. The focus of the questions was on the following items:

• Questions they preferred and questions they avoided.

• Questions they liked to rephrase/substitute to accommodate local customs and language.

• Vocabulary items they perceived posed difficulties for the candidates.

• If they left out parts of any question and used some questions more than others.

• If they consciously kept the local culture in mind while selecting tasks from Part 2 of the speaking test.

• The most challenging tasks for Bangladeshi candidates in Part 2.

• Main difficulties for candidates in the discussion section (Part 3).

• Situations in which candidates have been embarrassed or felt awkward.

• Questions which they felt were culturally inappropriate, culturally appropriate, exhibited knowledge of local culture, needed religious considerations, required understanding of a specific topic, or posed problems of terminology and vocabulary.

Examiner responses were collected, collated and analysed qualitatively to see what the major impressions on this issue were. The findings are summarized briefly below:



The majority of the respondents stated that they chose the topics which they considered would be easy for candidates to understand and respond to. They reported that they chose those topics which they felt:

• require less rephrasing

• candidates have more to say about

• candidates can identify with and relate to

• were perceived to be interesting to candidates.

Most examiners reported that the selection of questions in Part 1 depended on the age of the candidate. In general, examiners preferred asking questions from Frame 2 because the topics seem to elicit more ready and detailed responses. They commented that, depending on the age of the candidates, the questions about their job are probably more interesting. They added that Frame 1 apparently seems to be more accessible but proves to be confusing: candidates become confused and cannot make out whether they should talk about their hometown or about the place where they are taking the test. One expatriate examiner commented that she chose topics based on the gender of the candidates. She reported, "I find men and women differ in their comfort levels with certain questions". The most preferred topics were the ones related to family, spending time, personal decisions, and giving things to other people. Some examiners stated that the sub-questions related to Part 3 topics are selected based on candidates' level of fluency, because the questions become increasingly more difficult in that part.

Topics avoided

In general, examiners commented that they avoided certain topics for the following reasons:

• poor language proficiency level of the candidates

• they perceived candidates did not understand certain topics

• some topics were not common in Bangladeshi culture

• difficult topics

• confusing topics

According to examiners, the topics (from the first and second parts of the interview) most frequently and repeatedly avoided were those related to a Western way of life, 'health', 'outings' and 'humour'. The topic 'animals' is apparently a very easy one, but most examiners stated that they avoid it. It is assumed that the reason is partly cultural: although we do have animals around us (e.g. cows, goats, buffaloes, cats and dogs), and we use them for agricultural purposes, people in general do not attach any special (emotional) importance to animals. In fact, people are not generally interested in pets; Bangladeshis hardly keep pets, and even stray dogs and cats do not draw sympathy. There is no culture of friendship with animals and, in general, people are not sensitive towards them as people are in Western culture. So candidates find it hard to talk about this topic. In addition, since the majority in the country are not financially well off and struggle to feed themselves and their families, they cannot even think of keeping a pet and feeding an extra mouth. One expatriate examiner commented that Bangladeshis do not make the distinction between 'holidays' and 'weekends'.



For Bangladeshis 'holidays' are Fridays, the weekly holiday in Bangladesh. The average Bangladeshi cannot afford to and does not travel much, and hence either avoiding the question altogether or substituting 'vacation' for 'holiday' makes some sense given the question content. Local examiners and one expatriate examiner objected to the topic of 'physical activities' as being an alien topic, although Cambridge has been careful enough to explain and add 'doing exercise' to help candidates understand the notion of 'keeping fit'. Examiners complained that the terminology used is very unfamiliar and commented that there is very little awareness of the fitness culture, which is a relatively new concept for Bangladeshis. This is a Western concept which has been borrowed; currently, many people are not aware of it and hence candidates struggle to grasp the question. They observe that candidates from rural parts of Bangladesh cannot relate to the topic. They therefore give a blank look when asked to talk about the 'physical fitness activities available in their area'. One has to be aware that in an underdeveloped country such as Bangladesh there are more pressing matters than keeping fit. One examiner notes that one particular form of recreation, 'music', is also not a familiar topic for most Bangladeshis because many people don't have music in their lives. The majority are struggling to keep body and soul together, so in general candidates have very little material to talk about. The concept of 'solitude' is also abstract and alien so far as the context is concerned. Bangladeshi city and village dwellers are never alone. Therefore, they cannot relate to this concept. Another topic, related to the abstract concept of 'peace', has also been identified as difficult by most examiners. Examiners noted that candidates really struggle to say something on this topic. The notion of 'peaceful' is significant when used in a scenario of bustle, stress and traffic. This is rather difficult for many who live within narrow constraints and may never have thought of 'peace' as a relative alternative. All local examiners and one expatriate examiner commented that the topic related to 'humour' is an impossible topic and they avoid it. One examiner exclaimed, "I feel one minute preparation time is not enough for candidates to gather their thoughts on such an unusual and difficult topic". Moreover, when laughter is absent from the lives of the majority of people because their lives are steeped in poverty, reflecting and expanding on these kinds of topics is perceived to be a joke. Further, candidates in our top-down society have not been taught to form opinions or evaluate on the spot, and as a result they feel tongue-tied and perform poorly. One local examiner commented that she consciously avoided some of the above-mentioned topics with candidates who came from the rural parts of the country because she perceived that they would have difficulty handling these topics, as they are not common in our culture, and the candidates have had no exposure to the norms and conventions of Western culture. Candidates from urban areas have had more exposure to the target language and hence are in a better position to answer these questions. Local examiners pointed out that some topics are too abstract and unfamiliar for lower proficiency groups. In Part 1, questions about 'sense of time' and, in Part 2, particularly questions about 'distant time' are not easy to handle and talk about.

Two local examiners reported that they avoid asking candidates about 'distant time' because experience has shown that candidates cannot say much about this topic. Another expatriate examiner remarked, "I don't usually ask candidates about 'activities related to distant time' in general, because a lot of younger candidates don't understand or see the point of looking back". One expatriate examiner commented that she avoided topics related to 'sports' because she perceived it to be sexist in the Bangladeshi context and a topic which her experience told her "most women can't discuss thoroughly". The same examiner observed that the topic asking candidates to speak about 'manufactured commodities' "seems too Western to me".



The vocabulary items which most frequently posed difficulties for candidates were the following:

Difficult words: weekends, wedding, wearing a watch, keep-fit activities, gift, events, urban noise, time (too fast/too slow), souvenirs, ageing population

Most of the vocabulary items identified by examiners are alien to our cultural background. For example, let us take the lexical item 'ageing population'. One expatriate examiner commented that in Bangladesh old people are not a separate group, so candidates are lost and puzzled when asked to talk about it. The average Bangladeshi has hardly been out of the country and therefore cannot relate to the Western concept of souvenirs. The word 'souvenir' is, to the majority of the candidates, especially those from remote parts of Bangladesh, an alien and unfamiliar concept. The majority of the examiners reported that they substitute an easy or familiar word for a difficult one.

The following list shows items frequently substituted by examiners:

wedding → marriage ceremony
career → profession, job
extended family → combined, joint family
gift → present
keep-fit → physical exercise
going out in the evenings → going outside
nuclear family → single family
watch → wristwatch
careers advice
holiday → vacation

Respondents commented that on a routine basis they rephrased or substituted the original lexical items because most candidates gave a blank look or asked for repetition or rephrasing. Some would clearly say that they did not understand the question.

Most of the examiners reported that they omitted the second part of a particular question because they found it to be confusing for the candidates (e.g. 'Are you happy to do that?'; 'Is that alright?'). One examiner commented that the word 'happy' completely throws off the candidate unless he/she is linguistically highly proficient and conversant with Western norms of politeness: "I normally avoid this bit. Culturally, I feel our candidates do not need this bit of Western politeness (which, instead of putting them at ease, actually makes them feel that they are being asked another question which they must answer) and they tend to say, 'Yes, I am happy' without comprehending and understanding."



Another local examiner offers a further explanation for candidates not relating to the second part of the question. She draws attention to the culture of the Bangladeshi classroom: "We have a traditional teacher-centred classroom where the teacher is on a high pedestal and there is formal distance between the teacher and the learner. This sudden informal and friendly question 'Are you happy to do that?' in an exam setting confuses the candidate, and he/she misunderstands the whole pragmatic force of the utterance and looks bewildered and confused." And so the majority of the local examiners reported that they left out the supplementary questions (e.g. 'Are you happy to do that?', 'Is it alright?') because they posed difficulties for the candidates. One expatriate examiner specifically informed us that she usually left out the part 'Is that alright?'. Her argument is that this part appeals to Western sensitivity about the possibility of personal problems. However, in Bangladesh it simply throws the candidate off, and they ask you to repeat that part of the question because they are confused. Most local examiners reported that they chose questions keeping in mind not only what they perceived to be easier topics but also the cultural nuances that might impede performance in an already stressful situation. One expatriate examiner commented that she leaves out one or two questions of a topic in Phase 1 because she finds some questions ridiculous and some too sophisticated for the average Bangladeshi candidate. Another examiner believes that comparison questions on certain topics in Phase 3 are difficult for Bangladeshi candidates to understand and handle because of the underlying cultural assumptions. For example, a comparison of the way parents supported adult children in the past with how this issue is resolved in the present would be relevant in the West, but the extended family culture in Bangladesh and the relationships between parents and children, however old they are, hardly change. Hence it is confusing and not culturally appropriate. In general, the impression is that some topics and concepts are too biased, abstract and unfamiliar for lower proficiency level candidates.

70% of the examiners said that they used questions in Phase 2 based on a number of criteria. They stated:

• Selection depends on the type of candidate, e.g. strong or weak.

• Familiar and common topics are chosen.

• Task cards are usually chosen on the basis of the linguistic ability and cognitive level of the candidates.

The topics/task cards that appeared to be the most frequently used by examiners in Phase 2 were the ones related to:

• personal experiences

• events that actually take place in candidates' worlds.

Examiners generally reported that they kept in mind the level of the candidate when choosing a task card. They would choose a task card depending on what they thought the candidate would be able to answer, but selection sometimes also applied to sex and culture, as there are some topics which are easier for Bangladeshis than others, or easier for men to answer than for women. One expatriate examiner reported: "I keep in mind the local culture but also consider age, gender, experience and background. The context has to be kept in mind. Some topics work better than others. Personal topics seem to go well." The majority of examiners agreed that the topic of 'childhood' is universal and seems to work well. The overall comment seems to be that, since Bangladeshi lives are bound up in home and family, candidates enjoy talking about these things. Some examiners showed their bias towards gender, age and profession by choosing particular topics.



One examiner categorically mentioned that she uses a certain topic only with older groups; some commented that they use serious topics with professionals and working groups, topics related to physical events with males, and topics associated with family matters with older females. Examiners commented that since the aim is to make candidates speak, there is no point trapping them with difficult questions and making them feel uncomfortable. That is why they avoid unfamiliar questions and try to repeatedly choose common topics which they feel candidates are familiar with and will be comfortable answering.

According to examiner responses, the tasks related to 'equipment' and 'architecture', and topics related to the abstract concepts of 'humour', 'peace' and 'relics', appear to be the most challenging for Bangladeshi candidates. Examiners commented that it is challenging for candidates to speak at length on these topics with just a minute to prepare. They contend that time constraints and unfamiliar topics, as well as insufficient background information regarding these topics, make them difficult to tackle.

Answering questions calling for opinions, analysis or philosophical perspectives, as opposed to facts, often proves tough for local candidates. It requires higher-level thinking and the expression of complex thoughts, interpreting information and dealing with nuances, which – given many candidates' educational background/training and exposure to English – can be stressful and challenging. Not understanding certain lexis, e.g. 'holiday', 'present' and so on, makes candidates go astray. It is felt that candidates find it extremely difficult to extend, extrapolate and give examples in the third part of the test. This is because they have not learned to do so in their school or college years, and therefore they are not familiar with these types of question. If candidates are at a low level, they fail in Phase 3 because the questions are too difficult. "Some questions have a cultural bias and are too Eurocentric; think of topics such as censorship, marketing or parental roles," says one expatriate examiner. Another examiner commented that "in general, forming opinions and evaluating are difficult in our traditional top-down society." Candidates seem to find it quite difficult to express ideas coherently on more general and abstract topics. This is because they rarely have the opportunity to express their own opinions in the course of their formal education. They have little experience of expressing opinions. They are restricted by their view that you shouldn't disagree with a teacher (or examiner).

Some comments were elicited from examiners about awkward or embarrassing situations faced during the test. Examiner responses are provided below:

• "Once a candidate broke down in tears because she was talking of a person who had died."

• "When they cannot answer the question."

• "When a girl was wearing a burqa and I asked her about clothes. She did not feel confident or comfortable."

• "I asked a boy about music. He came from a devout family household where they don't listen to music."

• "When they are really stuck, despite my repeated prompting, and can't continue."

• "Sometimes housewives are reluctant to speak."

• "They only appear awkward when they can't answer the question because they don't understand it."



Some of the responses quoted above reveal that candidates are in a dilemma. They probably could have answered the questions, but fail to do so because of their cultural background. For instance, in the case of the 'reluctant housewives', this seems to be clearly culturally based. The answers seem to be there, but the candidates are unable to articulate their views.

Examiners were asked to identify which topics raised cultural issues for them and to indicate which topics were culturally appropriate (CA) or culturally inappropriate (CI), whether the topic required religious considerations (RC), required knowledge of local culture (LC), required understanding of the topic (UT), or had to do with understanding of vocabulary and terminology (T/V). Analysis reveals the following:

CA   CI   LC   RC   UT   T/V
 7    8    5    3    5    7

Seven of the topics were considered to be culturally appropriate (CA). Eight of the topics were regarded as culturally inappropriate (CI). Seven items were regarded as difficult with regard to the vocabulary/terminology used (T/V). Five were considered to require knowledge of local culture (LC). Three required taking religious considerations into account (RC). Five posed difficulties in understanding the topic (UT). It was perceived that cultural issues were significant, and that difficult and unfamiliar vocabulary is problematic.

Discussion

The analysis of responses from examiners reveals that candidates in Bangladesh have difficulty with certain questions and tasks in the IELTS speaking test which assume background knowledge and vocabulary beyond their range of experience and exposure. The findings show that there are a certain number of topics and vocabulary items in the speaking test which pose difficulties for Bangladeshi candidates. Some of the topics on the test have been identified as unusual, uncommon and unfamiliar. It is argued that these vocabulary items and topics reflect Western concepts and patterns of interaction, and are not culturally appropriate for local candidates. This makes the task difficult for them and affects their performance on the test. Skehan (2004) identifies positive and negative factors in any oral performance. According to him, familiar tasks achieve greater accuracy. On the other hand, "task difficulty relates to a number of factors including abstract or unfamiliar information and complex retrieval" (p. 17). In this study respondents have identified topics which are abstract and related to unfamiliar information. On the issue of "complex retrieval", it may be said that the existing schemata of the majority of candidates in Bangladesh are set within different and fixed parameters, and therefore retrieval becomes difficult. The findings of the study disclose that the majority of local examiners perceive that the average candidate in Bangladesh does not have the necessary background schema and therefore lacks the imagination, information or language to talk about these topics. These topics are also perceived to be culturally not very appropriate.



Vocabulary is clearly an important aspect of the construct of language proficiency and is an aspect which Cambridge ESOL is interested in testing (Schmitt, 2004). The list of vocabulary items identified by examiners also proves to be problematic for local candidates. Most candidates have trouble understanding these vocabulary items as they do not exist in their linguistic and cultural repertoire. For example, as mentioned earlier, the words 'souvenir' and 'holiday' are difficult to talk about. If candidates were taking a vocabulary quiz, they could perhaps match the word 'holiday' with 'time off work'. But the task is much more complex because they need a schema to handle the question efficiently. They fail to relate to and expand on these terms because there is a very limited culture of taking holidays and buying souvenirs in Bangladesh. The concept is largely missing. People in Bangladesh do not travel much and there is no tourism culture for Bangladeshis. Hence there is no terminology to accompany such activities. Only the elite section of society, maybe 5% of the total population, engages in these activities. We should not be oblivious to the fact that "culture is embedded in the language itself, particularly in the semantics of language" (Mackay, 2000: 100). From the above findings it is assumed that some of the language used in the test is not connected to the culture of some of the centres where English is used as an L2. As mentioned in the section on findings, the questions in the first part of the test, e.g. 'Are you happy to do that?', have been criticized by the majority of the local examiners as not only redundant but confusing. They reported that they avoid this bit because there is formal distance between the teacher and the student in Asian educational contexts, and there are set beliefs about the traditional roles of teachers and students. These questions reflect an approach to learning which has reference to Western cultural norms (e.g. teacher as friend). It is felt that the test needs to consider the local culture of teaching and learning. The sample questions from the IELTS speaking test discussed above reflect a certain amount of cultural bias in the test. Some of the questions draw heavily on the target culture. When candidates face a question in the speaking test which does not represent their cultural expectations and norms, they are intimidated and lost. As pointed out by examiners, some of the international target-culture topics used in the test confuse the candidates. They are puzzled by the questions and have difficulty comprehending them because they do not have access to the additional information needed to explain some of the cultural information and nuances. We sometimes find that quite fluent and mature speakers, particularly government officials or bureaucrats, perform quite badly because of the cultural nuances underlying some of the speaking test tasks. IELTS is an international test, so let us not test candidates on the cultural norms of interaction of one particular language.

In Bangladesh, candidates are not familiar with the discourse conventions. The main difficulty is the idea of continuity in the discourse. Candidates find it hard to deal with sub-questions as separate items and answer them in an isolated manner. As mentioned earlier, often they cannot form and convey an opinion. There are comparison-and-contrast questions in the third part, which they fail to handle properly. Comparisons may be treated cognitively, but the grammatical forms and lexis of comparison are not used. The greatest problem is in the area of speculation, and candidates usually answer through suggestions and/or recommendations (should/must do). They cannot focus on the questions nor answer to the point. The majority of the examiners felt that speculating about changes in the future was the most challenging question type in Part 3 for local candidates. The findings show that there are a number of culturally inappropriate topics, vocabulary items and phrases that are unfamiliar and alien, and tend to confuse the candidates. This creates a stressful situation and adds to the burden of being tested. If we relate these findings to the literature reviewed earlier regarding the culture of teaching and learning, I would state that Bangladeshi IELTS candidates are at a disadvantage at present. Examiners reported that they avoided certain topics and refrained from asking certain questions. Reflecting on this issue, if examiners are restricting and limiting the questions on the test, we may well ask whether the test is effective. The local examiners want to meet Cambridge requirements and want to be fair to the candidates too.



Section 4: Implications for the IELTS speaking test

Some possible implications for the IELTS speaking test are highlighted below:

• Further systematic studies need to examine the impact of the IELTS speaking test in various contexts. An important issue to consider is: does the cultural bias affect the validity of the test?

• The testing body might need to reconsider its use of some of the speaking test items. It would be worth considering ways of testing the same type of knowledge that did not disadvantage any group. Some suggestions could be to modify the language and make the questions on the test a little more neutral. Examiners during the test are tied to frames, and it is understandable that this uniformity and control is required for reliability purposes. However, it is perceived that these frames should not be too rigid. Testers need to look at the issue of cultural bias more seriously. Perhaps they should set out to determine which questions disadvantage non-Western candidates.

Section 5: Conclusion

Alderson, C., Clapham, C. and Wall, D. (1995). Language Test Construc on and Evalua on. Cambridge: Cambridge University Press.

Berry, J. W., Poor nga, Y. H., Segall, M. H. and Dasen, P. R.



Raynor, J. (1995). Introduction of Compulsory English Language at Tertiary Level. A report on behalf of the University Grants Commission, Bangladesh.



2006



Putting learners in their proper place
Dave Allan, NILE (Norwich Institute for Language Education), UK
Original publication date: IATEFL TEASIG Newsletter Spring 2006









A brief history of CEFR implementation … finding a common language
Maria K. Norton, British Council, Italy
Original publication date: IATEFL TEASIG Newsletter Summer 2006

Overview
Recent work carried out at the British Council Teaching Centre (TC) Milan provides the basis for this article. Having established the CEFR as our benchmark for level testing and placing our students, the next challenge was to connect the learning aims of our Adult General English courses to can-dos. Consultations with all the principal stakeholders, namely teachers and students, as well as the academic and customer care teams, were undertaken. We wanted to ensure that the CEFR elements introduced here were conducive to enhancing our students' learning experience.

Introduction
The British Council Teaching Centre (TC) Milan had wholeheartedly embraced the 3 strands of the CEFR: implementing the language levels – to which our courses were mapped; the Portfolio – selected pages of which were included in our Student Guide; and the can-do statements, which had been adapted, renamed Learning Aims, and turned into syllabi (see Manasseh, 2004). This turnabout of our course design and delivery had been well-planned and included support for teachers. An increase in student numbers was registered the following year, yet teacher focus groups held in spring 2005 revealed that most teachers were unhappy with the 3-page syllabi and we were not, in fact, providing the homogeneous product envisioned by the project leaders. At the heart of this lay a number of inconsistencies and issues which contributed to our TC can-do conundrum. The areas obstructing can-do coherence are detailed below:

• Can-do descriptors were conceived of as benchmarks for performance/language ability and so lent themselves to assessment, yet we still had student grades riding on end-of-year grammar and vocabulary tests!

• lack of teacher consultation

• insufficient pedagogical support and training to provide a forum for trouble-shooting

• confusion over combining textbook selection with a can-do based syllabus, i.e. teachers did not perceive a need or value in this

• a methodology statement inconsistent with our aims to send out a coherent message to both internal and external customers, i.e. teachers and students

• little support of learning strategy training plus insufficient promotion of out-of-class resources to nurture learner autonomy.



Assessment and teacher involvement
In order to address the issue of inconsistencies between course aims and assessment, we

• replaced the end-of-course test with a continuous assessment system, and

• set up level files housing tailor-made tasks exploiting can-do statements specified for those levels.

These steps produced a number of benefits:

• tasks resembled what teachers do in the classroom, since they were made by a team of teachers, and so were consistent with what both students and teachers do and expect.

• a backwash effect was created, motivating students to focus on reviewing and practising language.

• coherence between the 'making informed choices' aspect of the CEFR and our project.

Another aspect of this process involved the provision and communication of the can-do based syllabi to students:

• a syllabus document listing around 15-18 can-do statements stemming from the 4 skills was written for each level, selected on the basis of which can-dos teachers would most probably cover per level.

• strategy can-dos were included at each level so as to begin the process of supporting teaching that drew on learner-training techniques (see Appendix A).

• the addition of an introductory paragraph drawing students' attention to the fact that the full list of Council of Europe can-do competencies was easily accessible through a link on our website homepage.

The benefits of this approach were principally:

• replacement of the unyielding nature of the 3-page syllabus documents that had gone before with a one-page document on which teacher feedback, solicited in the creation stage of these new documents, was very positive.

• the first steps towards enhancing learner training had been taken and were further supported by INSETTs by specialised teachers sharing best practice.

A further element of this document transformation involved adapting to the local culture by:

• the inclusion of a list of grammar structures to be covered; between 10-15 grammar structures supportive of the relevant can-dos were selected at each level (see Appendix B).

Student focus groups carried out throughout 2005 had recorded student interest in grammar, noting frequent requests for more! Therefore, this list was added to the reverse of the one-page document.

Resources
A textbook may constitute the backbone of curriculum support but must not become a straitjacket! New editions of one series of ELT textbooks had recently come out and so four of our levels adopted them, since the learner training element had been enhanced and they supported can-do statements through their task-based content too.



Courses at other levels kept their tried-and-tested coursebooks, often exam preparation textbooks. A teacher version of the syllabus document was produced, cross-referencing the can-do statements to alternative resource references and supplementary materials.

Teacher ownership
The new syllabus documents and how they were linked to can-dos were presented to teachers so as to increase teacher involvement and to communicate the direction the TC was taking. Teacher ownership was encouraged through workshop sessions where small groups piloted some structured tasks to explore the procedure of setting up can-do based continuous assessment. Teachers were able to air concerns as this was a forum to explore such queries. They were also encouraged to create further assessment tasks to add to the bank of can-do materials. It was suggested that in a 90-hour course between 8-12 pieces of coursework should be set. So far so good; teachers seemed happy and there was a buzz in the staffroom. The project was implemented in Term 1, with teachers being made aware that we would be consulting them for further feedback at the end of term – as this was a pilot project, opportunities for review needed to be offered.

Student and teacher feedback
One important factor built into this course design project was that of seeking feedback for ongoing evaluation. On an individual basis, teachers were supported by their mentor, usually a senior teacher or a coordinator. Any issues could then be channelled into academic team meetings. The last day of term was set aside as a training day and this is where teachers were put into level pools with the task of assessing a new selection of coursework pieces as well as sharing best practice, commenting on their term's work. This was followed by an INSETT session on learner training. Structured focus groups were then held to gather even further teacher comments on all aspects of the project so far and to inform its development. Feedback was constructive with many positive comments made, in sharp contrast to the focus groups held 8 months earlier.

Examination preparation courses
With these particular courses there was a dichotomy, however, since teachers contested the validity of working with can-do assessment when they understood their remit to be that of getting students through a Cambridge exam in June. This had been raised at the level pool session in December and one suggestion consisted of providing more explicit cross-referencing to the exam in the next series of assessment tasks as well as the syllabus documents. The additional support of further coursework tasks supplementing the level files was provided in the winter term, responding to teacher suggestions from the December training day. A small team of specialist teachers then produced assessment pieces for courses at B2, C1 and C2 levels, linking them to a part of the Cambridge exam whilst still being can-do based. The CEFR supports learner autonomy and in presenting this project to the teachers I emphasised this concept in order to ensure that teachers perceived the value of learner autonomy in exam preparation courses. In addition, assessment and counselling are obligatory for all courses; the challenge is to make this meaningful in an exam course context by relating the procedures to can-do statements. This can-do basis provided a common language between the two principal stakeholders, namely teachers and students. This area is, in fact, still under review.



Evaluation
At this point in the academic year, indications are that far more teachers than ever before are using the CEFR's can-dos in their teaching for the British Council TC in Milan. All adult students on general English courses have performed a number of assessment tasks and have been counselled by their teachers on their progress and how to enhance their performance. The difference is that the assessment system makes counselling more meaningful as it helps to provide evidence for speaking to students. Student focus groups carried out in November/December reported great satisfaction with their course package and more focus groups are planned for April/May. More will be made of the materials available in our library and computer laboratory, which students are encouraged to use for self-study. We are currently working on a Pathways project which will draw on all these materials to provide out-of-class support and encourage learner autonomy. Our first target has been reached – that of making our courses share the common denominator of can-do based assessment to chart progress. Now further steps are being taken in order to support the learner training element – Learner Pathways are being put together for September. I have written a methodology statement for inclusion in our student guide-cum-notebook, which reflects all of this work and promises courses even better able to support the CEFR can-do perspective.

Successes
The successes of this project so far are:

• courses mapped to the CEFR in a way which is meaningful to all stakeholders

• a user-friendly continuous assessment system based on can-do statements

• more meaningful student counselling procedures

• teacher involvement in the Teaching Centre's academic direction

• a clear methodology statement

• better student information.

References
Manasseh, A. (2004). Using the CEF to develop English courses for teenagers at the British Council Milan. In Morrow, K. (Ed.). Insights from the Common European Framework. Oxford: OUP.

Appendix A: A2 Syllabus document 2005/6 – student version

Welcome to your course! Here is a summary of the key elements of your course that you will cover in class. A full list of learning aims for your level can be found at www.britishcouncil.it. You, your classmates and your teacher will negotiate further elements of the course and you will have the opportunity to develop further by using the Learning Zone and participating in the Speaking Club. We hope you will enjoy learning English at the British Council.

Your Teacher



Speaking
I can describe past experiences and personal experiences (e.g. last weekend, my last holiday).
I can describe myself, my family and other people.
I can ask for and give directions referring to a map or a plan.
I can say what I like and dislike.

Listening
I can catch the main point in short, clear, simple messages and announcements.
I can understand phrases, words and expressions related to areas of most immediate priority (e.g. very basic personal and family information, shopping, local area, employment).

Reading
I can understand a simple personal letter in which the writer tells or asks me about aspects of everyday life.
I can find the most important information on leisure time activities, exhibitions etc. in information leaflets.
I can skim small advertisements in newspapers, locate the heading or column I want and identify the most important pieces of information (price and size of flats, cars, computers etc.).

Writing
I can write short, simple notes and messages.
I can give short, basic descriptions of events.
I can describe my hobbies and interests in a simple way.

Strategies
I can indicate when I am following.
I can very simply ask somebody to repeat what they said.
I can ask for attention.

Appendix B: Grammar list on flipside of A2 syllabus document 2005/6 – student version

Grammar
• ways of expressing the future
• questions, question tags and short answers
• using articles
• describing events which happened in the past and continue to the present
• talking about past experiences
• expressing quantity
• asking for descriptions – "What ................ like?"
• simple expressions using the gerund or infinitive
• "used to" – describing past circumstances
• "like doing / like to do"
• comparing and contrasting
• expressing obligation
• hypothesising about the present and the future
• passive and active voice



Assessing the speaking skills of ESOL learners
James Simpson, University of Leeds, UK
Original publication date: IATEFL TEASIG Newsletter Summer 2006

Introduction
This article concerns the assessment of the speaking skills of adult learners of English for Speakers of Other Languages (ESOL). ESOL refers to the English learned and taught to adult migrants living in countries where English is the dominant language. I first describe some characteristics of ESOL learners, and the study upon which this work is based. I then outline how participants might view speaking tests as conversations or as interviews rather than simply as tests, before examining one particular ESOL learner's test-taking experience. Finally, I draw implications for testing the speaking skills of ESOL learners.

There are about 1-1.5 million migrant adults in the UK who have a need to improve their English. About one third of these are currently taking ESOL classes. They come from a diverse range of geographical and social backgrounds, and include asylum seekers and refugees, people from more settled communities, so-called economic migrants escaping poverty in their home countries, people joining their spouses and family members, and – in the UK and the Republic of Ireland – EU nationals from the new accession countries. Many ESOL learners have low levels of educational attainment, and little experience of school. It follows that they also have little previous experience of formal testing situations of the type which they may have to undergo in their current learning environments. But these learners need to take English language tests for a variety of purposes. For example, all students on government-funded ESOL courses in the UK have to take the new Skills for Life tests from the very first level, ESOL Entry Level 1. Progression to the next class depends on success in these tests. Learners looking for work are also obliged by many employers to hold language qualifications. And success in an English test is used to satisfy the language requirement for citizenship. So testing is becoming a major feature of life even for beginner ESOL learners.

The ESOL Effective Practice Project assessment
The paper is based on an analysis of recordings of tests collected as part of the NRDC ESOL Effective Practice Project (EEPP), a major study of Adult ESOL teaching in the UK, carried out between 2003 and 2006 (Baynham et al, 2007; Simpson, 2006). The test is a paired format speaking test administered to learners in Entry Level 1 and 2 (i.e. beginners) classes to generate a 'before and after' measurement of progress. It was developed in partnership with Cambridge ESOL and its design is based on the KET speaking test. While we were carrying out the assessments for the project, we noticed that learners who had had little or no schooling as children seemed to be having more difficulty performing in the test than other students. Were there identifiable factors which hindered these test-takers' performances? When I took a close look at the recordings and transcripts of the tests, one notable pattern which emerged was that some test-takers behaved very differently inside and outside the actual test. The remainder of this paper describes my attempt to account for this. I first outline the ambiguous status of the speaking test. What nature of speech event is it? A conversation? A type of interview? Or something else? Then, drawing on this discussion, and with reference to extracts from one learner's test experience, I provide a number of possible explanations why ESOL learners may not perform to their ability during a speaking test.



The speaking test as conversation
What does a speaking test actually test? In other words, what is the construct? This is a fundamental question for language testers, which I will reformulate in two ways: (1) If an assessment is testing conversation, how conversation-like is it? And (2) if it is not testing conversation, then what is it testing?

Test-takers often assume that they are being tested on their ability to have a conversation. Yet a comparison of conversation with speaking test discourse suggests that they are quite different in nature. Spoken casual conversation is defined by Eggins and Slade (1997) as talk which, in contrast to other speech events such as interviews and service encounters, is not motivated by any clear pragmatic purpose. Within this definition, Eggins and Slade summarise certain differences between casual conversation and pragmatically-oriented interaction: in terms of number of participants (often there are more than two people in a conversation); whether or not a pragmatic goal is achieved (this is not the aim of casual conversation); length (pragmatic interactions tend to be short); and level of politeness and formality (casual conversation often displays informality and humour). By these criteria at least, formal speaking assessments are clearly not conversations.

But how conversation-like are they? This is a question which has exercised testers of spoken language since the publication of van Lier's classic paper (1989), in which he questioned the extent to which an oral proficiency interview (OPI) was actually an example of conversational language use. van Lier's analysis of language test data claimed to demonstrate that it was not conversation-like; rather it showed many of the features of formal interviews, for example asymmetry and interviewer control. Yet the OPI is only one kind of speaking test. In response to concerns over the asymmetrical nature of participation in the OPI, testing organisations have developed other test formats which attempt to elicit a range of responses beyond simple question-and-answer. These include the paired format test developed by Cambridge ESOL. At higher levels in particular, this allows for an extended long turn and peer discussion. Even so, from an outside perspective, in their asymmetry, power discrepancy, and inbuilt pragmatic intention, many if not most formal speaking assessments correspond to a lay definition of interviews, and can be viewed as interview-like events. I next ask: how do participants themselves view the test event?

Learners' perceptions of the speaking test
How test-takers themselves perceive the speech event called a speaking test affects their expectations towards it. In our case, an assumption is made that the learners involved, because of their lack of educational background, do not have a well-developed knowledge of the speaking test event. They thus have a limited understanding of its aim of producing a large and extensive enough sample of language by which to assess them. Interviews with learners themselves reveal a lack of understanding of the purpose of the speaking test. In this quotation, taken from an interview for the ESOL Effective Practice Project, a learner in an Entry Level 1 ESOL class in Leeds mentions the speaking test she has recently undergone:

When I did the interview with Mr John and Mr James they asked, "Do you like English?" I said yes. They asked why. A strange question. You need it when you go out. (Translated from Arabic; words originally spoken in English are in bold print.)

The learner uses the word 'interview' in preference to 'test', and reports that she takes the intention of the question 'Why?' at face value ('A strange question.'). The extract shows that ESOL learners with little educational background may be approaching the speaking test event without background knowledge founded upon prior experience of language assessment of this kind – that is, they do not fully understand the test as a test.



Underelaboration in a speaking test
In the following analysis, I talk about how learners may have an expectation of the speaking test which differs from that of the test designers. The two extracts of data below show contrasting examples of a learner behaving differently in and outside the test event. Learners sometimes say very little in the test itself, but when it is over, they become quite talkative. This is the case with Tam, a Vietnamese ESOL student who only had one year of schooling in her home country and who is now learning ESOL in a Jobcentre Plus class in Leeds. Extract 1 takes place in the first part of the test, when she is answering the test interlocutor's general questions about personal information.

1  I: Tam where do you come from
2  T: um (2) Vietnam
--> 3  I: and which town are you from in Vietnam (2) which city
4  T: (.) ( )
5  I: are are you from Saigon?
6  T: (xxx)
--> 7  I: ok thank you (.) do you work in Leeds (3) do you have a job
8  T: no
9  I: are you a student
10 T: yes ( )
--> 11 I: what do you study (4) what subject (1) do you study English
12 T: yes I
13 I: do you like it
14 T: yes
15 I: why
16 T: (2) I need er I need (.) I need company I (xxx)

Extract 1: Tam (T), talking to the interlocutor (I) during the test

There are instances (indicated by arrows) where the interlocutor has to use the back-up cues, part of the interlocutor script, to elicit a response from Tam. This reticence on the part of Tam is in sharp contrast to the episode immediately following the test (Extract 2):

1  A: thank you very much indeed
2  I: relax now
3  T: because I have problem stomach (.) yes
4  I: (xxx)
--> 5  T: yeah illness long years ago begin I am in Vietnam (3) I'm er come to UK twenty second May 1999 I can see doctor (.) tab- tablets now ( ) and when and when (2) mmm usually I am very tired yeah er (2) I need tablets is yeah I am er (.) sleeps er less sleep late (xxx) yeah sleeps very late er (.) one or two in the morning yeah very tired before I am um (2) learning English (.) Roundhay Road (?) because student in the last (xxx) yeah (.) I am ( ) teacher ( ) change class in there (.) yeah (.) um (.) before (2) my learning English ( ) Tuesday and Thursday er (2) ten in the morning come here (.) finish (.) and afternoon three o'clock begin (.) yeah (.) I go back full time (.) very tired
6  A: yeah
7  T: yeah really yeah
8  A: and you're tired now?
--> 9  T: tired yeah (3) before I am problem (.) go to hospital yeah er check in inside (1) yeah problem with stomach is (9) about er (2) two year (.) er three year yeah (4) about (2) 2000 and 2001 go to hospital check in stomachs yeah take tablets I can see doctor take tablets eat now and ( )
10 A: which make you tired
11 T: yeah

Extract 2: Tam (T), talking to the interlocutor (I) and the assessor (A) after the test

It is clear that Tam is quite able to embark on a long turn, trying to get across a complex message, but did not do so in the test itself. Ross (1998) calls the phenomenon of saying very little in the test itself "underelaboration". What possible factors explain the contrast between Tam's underelaboration in the test itself and her expansiveness afterwards? The rest of this paper suggests possible answers to this question.

Accounting for difficulties during a speaking test

Lack of pragmatic competence
Ross (1998) suggests that underelaboration occurs when learners might not possess the pragmatic competence to tackle or answer the question. Pragmatic competence in Bachman's model (1990: 88) encompasses illocutionary competence ("knowledge of the pragmatic conventions for performing acceptable language functions") and sociolinguistic competence ("knowledge of the sociolinguistic conventions for performing language functions appropriately in a given context"). Pragmatic competence is clearly an overarching set of knowledge and ability; much is encompassed within this definition. Because of this breadth, it is an account which overlaps with many of the following factors.

Differing expectations of verbal behaviour
It is possible that the learner, Tam, brings to the test a knowledge of a formal speaking experience, for example an interview, which she equates with the experience of the speaking test. This requires her to produce minimal responses until the formal aspect of the test is complete (signalled in Extract 2, turns 1 and 2). At that point, outside the bounds of the test, and when all participants are engaged in an informal chat, she is able to produce the long turns (Extract 2, turns 5 and 9).

Accuracy over fluency
Though her expectation of a speaking test may not match that of the test designers, like most learners Tam possesses a notion of correctness. Despite her lack of schooling she comes to the test with some knowledge, possibly based on knowledge of the overall dominant educational culture in Vietnam or of her prior learning experience in the UK, that encourages her to focus on getting the answer right rather than demonstrating her range of ability at the risk of making mistakes. This, coupled with her lack of experience of the formal testing situation, may have prompted her to feel that it is better to say little in the test itself than to produce incorrect utterances.

Power relations
With the shift from 'test/interview' to 'conversation' comes a corresponding shift in the power relations between Tam and the testers. When Tam no longer feels that she is the subordinate partner in an unequal interaction, she is able to expand her responses. This may relate to her previous experience of other interview situations, for example with immigration officials or at the Job Centre, where the questions were similar to those of the first part of the test, but the role and status of the interviewers was very different.

Communicative stress
Speaking test candidates in general are undoubtedly under an amount of what Brown and Yule (1983) term communicative stress, where they are in the presence of unfamiliar listeners, and where it is not entirely clear what they are expected to produce in terms of length and complexity of utterance.

Cultural expectations of conversational style
Tam's test experience may exemplify the different cultural expectations of conversational style. The insight from the ethnography of speaking is that descriptors in a speaking test may well describe aspects of conversational style valued by a particular speech community and not by others (Hymes, 1974). As Young (1995: 6) summarises: "... viewed from the perspective of the ethnography of speaking, it is clear that the descriptions of speaking in LPI [Language Proficiency Interview] rating scales are, in effect, summaries of features of conversational style that are considered desirable by native speakers of English." There is therefore a norm in tests based on a model of communication in English-dominant countries which all learners are obliged to aspire to.

Intrusion into private matters
Underelaborate answers might "mark the boundaries of what are considered by the candidate as private matters" (Ross, 1998: 345). In the case of Tam, the personal nature of her extended response after the test suggests that this is not the case here.



Implications for testing in ESOL
Most of the above factors could hold for all speaking test candidates, not just adult migrants. They are, however, of special concern to migrants at a time when ESOL learners are linked in media and political discourse to issues of immigration, asylum and social cohesion. It is in this current climate that national tests have recently been introduced in the UK. The obvious implication is that test-takers should be apprised of the test format and trained thoroughly before embarking on the test. Given the high-stakes nature of tests which are designed to satisfy a language requirement for naturalisation or citizenship, as well as the introduction of national tests for all ESOL learners, test-taking training is likely to become an integral part of ESOL lessons even at the very lowest levels. And if all candidates for the tests are working with knowledge in common, validity is strengthened. If everyone, testers and test-takers alike, understands the test event as a test, and behaves accordingly, then the test has both increased reliability and validity as a test. Yet many learners in low-level ESOL classes lack basic schooling, which means they also lack experience of what is expected in formal teaching and learning situations, rendering the teaching of test-taking techniques difficult. Ultimately, we may question whether it is fair to expect migrant learners with little or no previous educational experience to possess appropriate and adequate understanding of the speaking test event as a test. If not, alternative assessment approaches may have to be explored.

Transcription conventions
(.)   unfilled pause of less than one second
(3)   unfilled pause, indicating length in seconds
?     rising intonation
-->   (arrow) a feature of interest
-     (dash) a cut-off
( )   unintelligible speech
(?)   plausible guess at unclear speech

References
Bachman, L. (1990). Fundamental Considerations in Language Testing. Oxford: Oxford University Press.
Baynham, M., Roberts, C., Cooke, M., Simpson, J. and Ananiadou, K. (2007). The ESOL Effective Practice Project. London: NRDC.
Brown, G. and Yule, G. (1983). Teaching the Spoken Language. Cambridge: Cambridge University Press.
Eggins, S. and Slade, D. (1997). Analysing Casual Conversation. London: Continuum.
Hymes, D. (1974). Foundations in Sociolinguistics: An Ethnographic Approach. Philadelphia, PA: University of Pennsylvania Press.
Ross, S. (1998). Divergent frame interpretations in language proficiency interview interaction. In Young, R. and He, A. W. (Eds.) Talking and Testing: Discourse Approaches to the Assessment of Oral Proficiency. Studies in Bilingualism 14, 333-353. Amsterdam and Philadelphia: John Benjamins Publishing Company.
Simpson, J. (2006). Differing expectations in the assessment of the speaking skills of ESOL learners. Linguistics and Education 17, 40-55.
van Lier, L. (1989). Reeling, Writhing, Drawling, Stretching, and Fainting in Coils: Oral Proficiency Interviews as conversation. TESOL Quarterly 23/3, 489-508.
Young, R. (1995). Conversational styles in language proficiency interviews. Language Learning 45, 3-42.



2007



Test preparation: a joint teacher/student responsibility
Mashael Al-Hamly, Kuwait University, Kuwait and Christine Coombe, Dubai Men's College, United Arab Emirates
Original publication date: IATEFL TEASIG Newsletter Summer 2007

Introduction
The prospect of a test can be extremely stressful for both teachers and students. One of the best ways to reduce stress levels surrounding tests is for teachers and students to work together on test-taking strategies. In a review of the literature on test preparation and test-taking strategies, most educators offer specific strategies that students can employ before the test, at the start of the test, during the test, and after the test. We feel that many of the strategies in these areas fall specifically within the language teacher's domain; others remain the students' responsibility, and a large number of articles address these strategies in great detail. A surprising number of test preparation strategies, however, are the responsibility of both teachers and students working together. The purpose of this article is to provide teachers with some specific test preparation strategies that they can collaborate on with their students.

Learn about the exam
Forewarned is forearmed. Part of a teacher's job responsibility is to learn about assessment and how their students are to be assessed. Teachers should find out how the system of assessment works in their institution and know in advance what their options are. Often teachers are the only link that students have with assessment. Students should also be proactive where assessment is concerned. They should find out about their test schedule early and plan accordingly. Preparation for tests requires knowing how many and what types of tests will be administered, when they will take place, and the criteria that will be used to assess performance. Students also need to carry this one step further by finding out how their results are processed and inquire about what happens once their papers have been marked and the results of their exam and assignments are added together. They must know the weighting of an exam and how the individual sections pertain to the exam as a whole.

Train students in good review techniques
Teachers need to encourage students to plan ahead, scheduling review periods well in advance, keeping them short and doing them often. If students make a semester study plan and follow it, preparing for exams should only really be a matter of reviewing materials.

Provide guidance in the formation of study groups
Another way that teachers can help empower students in test-taking skills is to help them form study groups. One of the major advantages of study groups is that members share an academic goal and provide support and encouragement for one another. When forming study groups, it is recommended that 5-6 dedicated students get together initially to discuss common goals and procedures. The teacher can facilitate at this initial meeting. For subsequent sessions, agendas should be set to avoid time-wasting. The material to be reviewed should be decided upon in advance so that group members can come prepared. Work with your students to develop agendas and a format for these sessions.



Create review tools in class
Another good idea is to work with your students to create review tools such as outlines, flashcards and summaries for use in their out-of-class study sessions. This helps students organize and remember information as well as condense material to a manageable size. A useful tool is the study checklist, a list of everything students need to know about the exam. As students begin reviewing, they can cross off items as they review them or already know them. An added advantage is that teachers can use this document as a reminder of the materials they need to include on a test.

Look at past exams together
The idea of 'studying' from past exams is a controversial area in test preparation. In some cultures, students see this as an opportunity to memorize past exam content for use on later exams. Some teachers aren't in favour of this because many want to reuse exams verbatim year after year. However, if a teacher is doing his/her job, past exams are revised and improved upon after every administration, so this practice does not lead to a security breach. By looking over past exams or doing practice tests, students will have an idea of the tasks/activities that they could encounter in the actual exam. They will also know point allocations for each section. This information can help them plan their study time wisely. An additional advantage of this practice is that by becoming familiar with what the test 'looks like', anxiety surrounding the test will decrease. In the event that an institution does not allow students access to past exam papers, providing students with sample questions from past exams also does the trick.

Work together on an 'exam plan'
It is the teacher's responsibility to help students make what is known as an 'exam plan'. This plan is basically a 'test-attack strategy' that students can employ at the start of or during the test. The recommended areas to cover in an exam plan are:

• Preview the test paper before answering anything

This practice gets students thinking about the material. Point values for each question and exam section should be noted so that students can effectively budget their time. Students should be encouraged to allocate their time proportional to the value of each exam section and to allow time to review their work.

• Read test directions carefully

Students often think that reading directions carefully takes time away from the exam. This is not true; reading directions saves, not wastes, time. It is crucial that students are trained to read and listen to all directions carefully. One of the most important test-taking skills students can have is the ability to follow directions. Some students are so anxious to get started on the test that they skip the directions altogether; this is often a costly mistake (Lane, 2001; Coombe & Hubley, 1998).

• Test troubleshooting techniques

Teachers need to provide students with trouble-shooting techniques for tests. One such problem is the questions that students can't answer or 'go blank' on. If students get stuck on a question, train them to try to remember a related fact or retrieve information from their short-term memory. This can be accomplished by going from the general to the specific. Another strategy that teachers can help students with is to encourage them to look for answers or memory triggers in other sections of the test. Whatever happens, if students get stuck on a question, they should be trained to not spend too much time trying to answer it. Instead, they should move on to another question as they can always go back to problem questions should time allow (Loulou, 1995).

• Make educated guesses

One of the areas where teacher/student collaboration can be especially useful is that of guessing strategies.



When uncertain of the correct answer to a test item, it is important to encourage students to make reasonable guesses. If done intelligently, guessing is a good strategy. One of the strategies that we've found useful is explaining the concept of the 'monkey score' (Hilke & Wadden, 2000) to students. The monkey score refers to the score that a monkey would get on an item and it exemplifies the random guess. For example, on a multiple-choice question with four response options, a student's monkey score would be 25%. That means the student has a 25% chance of getting the item correct should he/she randomly guess at the item. Students can improve on their monkey score through the elimination of response options that they know to not be correct. The monkey score is a popular analogy with students. Teachers can provide students with specific strategies (i.e. process of elimination, etc.) to help them increase their monkey score on objective test items.
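The arithmetic behind the monkey score can be set out explicitly. The following is a small illustrative sketch (not taken from Hilke & Wadden) of the expected score from randomly guessing on a single item with k response options, and of how it improves when m options known to be wrong are eliminated first:

\[
E[\text{score}] = \frac{1}{k}, \qquad E[\text{score after eliminating } m \text{ options}] = \frac{1}{k-m}
\]

With k = 4, a blind guess gives 1/4 = 25%; eliminating two implausible distractors raises the chance of a correct guess to 1/2 = 50%.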

• Use strategies appropriate to the skill area

Teachers should train students in effective strategies for the various language skill areas to be tested (Coombe & Hubley, 1998). If the test includes short answer, essay and multiple-choice questions, encourage students to fill out the multiple-choice part first. Train students to read all options carefully and eliminate those that are clearly wrong, thus reducing choices and increasing the monkey score (Forster & Karn, 1998). Answering objective questions first will help students remember the material and make connections between concepts. The questions may also contain information that can be used to answer essay questions.

• Ignore distractions

At all costs, encourage students to ignore distractions during the test. Students tend to compare their test behaviour to that of others. This practice sometimes works against them. A common phenomenon during a test is that students become nervous if they feel they are working more slowly than others. Observing that other students are farther along on the test should not cause students to change their strategy or exam plan (Gall & Gall, 1993).

Learn from the test experience
Each test should be a learning experience for both the teacher and the students. Teachers should go over test results with students in a timely manner, making a note of specific students' strengths and weaknesses. The analyses that teachers receive after exams provide them with invaluable information that can be used to give students the necessary remediation they need for future exams. Students need to use the feedback they get from test results to master material that they didn't do so well on. In our experience in the Arabian Gulf, when students get their papers back, their primary concern is checking the teacher's calculations and/or negotiating for more points. This practice is often disheartening for the classroom teacher as he/she often spends a lot of time marking the exam and providing written feedback. Remind students that in addition to double-checking the math in the calculation of their grade, it is also important to learn from their mistakes so these mistakes won't cost them points on future exams.

Evaluate your performance
Teachers and students should first celebrate the fact that they've taken and survived another test (whether they have done well or not). They should then critically evaluate their study and exam plan together, making changes and improvements where needed.

Conclusion
Teaching test-taking skills is supported in the literature and is a valuable aspect of any language program. This article has provided suggestions for long-term successful learning techniques and test-taking strategies, not quick 'tricks'. Certain responsibilities fall within the exclusive domain of the student. However, it is necessary for teachers to be proactive in certain areas to better prepare their students for tests. It is important to stress that one idea common to all those who write on the topic of test preparation is that despite all the test-taking skills a student possesses, there is no substitute for content knowledge. A combination of content knowledge plus the use of appropriate test-taking skills is a winning combination for successful test-taking.

References
Coombe, C. and Hubley, N. (1998). Empowering students in test-taking strategies. Paper presented at TESOL Arabia Conference, Al Ain, UAE.
Forster, D. and Karn, R. (1998). Teaching TOEIC/TOEFL Strategies. Paper presented at the Annual Meeting of Teachers of English to Speakers of Other Languages, Seattle, WA, March 17-21.
Gall, M. and Gall, J. (1993). Making the Grade. Rocklin, CA: Prime Publishing.
Hilke, R. and Wadden, P. (2000). Taking the TOEFL test. Minimax Test Prep Series. Hong Kong: Asia Pacific Press.
Lane, P. (2001). How to take tests? A three-step approach. Khaleej Times Weekend, August 31, 2001.
Loulou, D. (1995). Making the A: How to study for tests. ERIC/AE Digest Series.



The CEFR – the four big issues
Keith Morrow, Editor of ELT Journal, UK
Original publication date: IATEFL TEASIG Conference Selections 2007

When they start to look at the CEFR, or to work with it, a lot of people very quickly get bogged down in detail. This is not surprising, because there is an enormous amount of detail in the Framework. What I wanted to do in my talk at Opatija was to remind us of the big picture, to focus on the wood rather than the trees. In order to do this, I identified four areas where I feel that the Framework has really important things to say and explored them in relation to the theme of the conference. I realised as I was doing this that I had probably mistitled my talk (and this report of it) since the use of the word "issues" implies areas where problems arise and need to be resolved. This was not what I had in mind, although clearly any implementation of the ideas from the Framework in any specific context will always raise problems which need to be addressed. In my session, however, I wanted to focus on the ideas themselves, and I wanted to kindle (or rekindle) for the participants the spark of the enthusiasm and excitement that I personally feel about these ideas and their application to language teaching and testing. So here in summary are what I think are the four big ideas of the CEFR, and how I see them relating to the work of people who are members of IATEFL TEASIG.

1. Focus on the learner
This is an antidote to the view of learners which permeates some discussions of testing. In these, test-takers are 'candidates' or (even worse) 'subjects'. The CEFR is based on the premise that learners are people, and that they are human – with human strengths, weaknesses, characteristics and foibles. The essential message of this is that learners are different, and the implication is that a 'sausage-machine approach' to assessment, forcing all learners to jump through the same hoops in the name of reliability, is likely to misrepresent the actual abilities that learners possess. The section of the Framework which is of most relevance here is Chapter 5 – 'The user/learner's competences'. This sets out the wide range of 'general competences' and 'communicative language competences' which learners may possess to varying degrees, and provides an essential starting point for anybody interested in the validity of testing and assessment procedures. How many of these competences does the assessment draw on? Why these? Why not others? The tension between reliability and validity in assessment is not news any more, but the emphasis in the Framework on the multi-faceted nature of language users' competences strengthens the arguments of those who resist moves towards simple models of testing which claim utility simply on the basis of high reliability. This section reminds us that different styles of assessment may be required for different learners (with different general competences) at different stages of their language learning (as they develop a wider range of communicative language competences).

2. The language
Ever since the early days of communicative testing in the late 1970s and early 80s, researchers have been attempting to describe 'language in use' in order to establish what content would be appropriate for a communicative test. Over the years such descriptions have become steadily more complex, but Chapter 4 of the CEFR, 'Language use and the language user/learner', pulls together lots of the elements in a (relatively!) accessible compilation, exploring:

• different elements of the contexts in which language is used;
• different communication themes;
• how different purposes in communicating lead to different communication tasks;
• the communicative activities language users may take part in, and the different strategies they need to use;
• the processes involved in using language for communication;
• the types of text that language users need to produce and understand.

This is an extensive list, but an interesting feature of the presentation (as elsewhere in the Framework) is the constant challenge to relate these ideas to your own context. What sort of activities do your learners need to take part in? What strategies will they need? What sorts of text will they need to produce or understand? This is not an easy task, but it offers the chance for course designers and designers of assessment procedures to work from the same 'script'. This can potentially have dramatic effects in terms of the content validity of assessment, and its washback into the classroom.

3. Levels of performance
For many people, these represent the 'big idea' of the CEFR. The definition of 'Can do' statements at six levels, both in terms of general performance and in terms of a range of competences and activities, is the cornerstone of the Framework. Inevitably there have been difficulties in practice as politicians, coursebook publishers, and exam boards have hi-jacked the levels for their own purposes, but from an assessment perspective, these level definitions are priceless. Firstly, they have a psychological reality (the A, B, C macro levels matching the traditional and intuitive distinctions between 'beginner', 'intermediate', and 'advanced' levels) that gives them a powerful role in establishing the validity of test specifications. But perhaps most significantly, the existence of these specifications, and all the supporting sub-specifications, gives us the best chance yet of agreeing within and across communities of interest exactly what we mean by 'an advanced piece of writing' or an 'intermediate listening task'. In other words, they are an essential aid to standard setting in assessment, and the basis for consistent (reliable) assessment of performance in oral or written tasks. Those of us who were involved in the early days of communicative testing know just how much we suffered from the lack of a supporting framework in both these areas.

4. Insights into assessment
This section is brief because it is so obvious. It is simply this. If you are interested in the assessment of language performance, then read Chapter 9 of the CEFR. It's all there.

As foreshadowed in the introduction, my presentation was not concerned with the nitty-gritty detail of implementation. It was intended to give a glimpse of the big picture, which is sometimes blurred by the difficulties of battling through the text of the CEFR. Next time I give a similar talk, I've got just the song to go with it: "Don't Give Up" by Peter Gabriel and Kate Bush.

References
Council of Europe. (2001). Common European Framework of Reference for Languages: Learning, Teaching, Assessment. Cambridge: Cambridge University Press. Accessed 17 May 2020: https://rm.coe.int/16802fc1bf



The Occupational English Test: Testing English proficiency for professional purposes through contextualised communication
John Pill, OET Centre, Melbourne, Australia
Original publication date: IATEFL ESOL and TEASIG Joint Newsletter Winter 2007

The purpose of the Occupational English Test (OET) is to evaluate the English-language competence of non-native speakers of English who have trained as health professionals in their country of origin and wish to practise their profession in an English-speaking context. Particularly in the health professions, there are clear risks in having any miscommunication in the workplace. Any language test needs to seek to ensure that candidates are prepared, in language terms, for the world of work.

Background
In the 1980s, the Australian federal government saw the need for a better measure of the English language proficiency of the growing numbers of medically trained migrants to Australia. They contracted Tim McNamara to prepare a report on this issue; the research carried out became Prof McNamara's doctoral thesis, in which he describes in detail the methodology behind the development of the test format and content specifications, and how the test is assessed. In his research, Prof McNamara undertook a detailed 'job analysis' to find out what language skills were needed by medical professionals from non-English-speaking backgrounds. The product of this analysis is a test of English for specific purposes with a focus on the use of language in the workplace of medical and health professionals. The test is also a model in terms of good practice in assessment.

The research investigated the kinds of interactions overseas-trained professionals (OTPs) were most frequently involved in and which they found most difficult. Informants were the OTPs themselves, professional educators and language support staff involved with OTPs. In the original study the subjects were doctors and nurses. The research found that interaction with patients was the most frequent interaction and was also viewed as difficult – issues such as explaining medical ideas in simple language for patients and understanding colloquial language used by patients came high on the list. The findings of this detailed job analysis directly informed the test specification. Prof McNamara sought to find common areas to test that covered the areas of difficulty and the areas which the OTPs had to deal with frequently. From a review of these commonalities, it was then recognised that the test could be expanded to other groups of health professions sharing similar language skill areas. These areas are discussed in more detail below.

Recognition and administration
The OET is currently recognised by the state- and/or federal-level regulatory authorities for 12 medical and health professions in Australia, and by the Australian Department of Immigration and Citizenship. It is also recognised by regulatory authorities in New Zealand and Singapore. The test is offered six times a year at over forty venues around the world. Candidates register online and the test is administered from the OET Centre in Melbourne. All test materials are returned to Melbourne for assessment.



The 12 professions currently tested by the OET
At the time of writing, the following professions are tested by the OET: dentistry, dietetics, medicine, nursing, optometry, occupational therapy, pharmacy, physiotherapy, podiatry, radiography, speech pathology, veterinary science.

Test format

Sub-test

Length

Common v. specific

Format

Ques on types

Listening

45-60 mins

Common to all candidates

Part A: consulta on Part B: seminar/talk

Note-taking, sentence & table comple on

Reading

60 mins

Common to all candidates

2 texts on general medical topics

20-24 mul plechoice ques ons

Wri ng

45 mins (including 5 mins’ reading me

Profession-specific

S mulus material (e.g. case notes) with wri ng task

LeNer of referral or dis- charge/ transfer to inform

Speaking

20-25 mins

Profession-specific

2 roleplays: pa ent/ carer & professional

Contextualised & directed interac on

The OET is made up of four sub-tests, defined in simple terms using the four macro-skills. The focus is on the communica ve use of language skills. The importance of interac on with the pa ent, which came up a great deal in the job analysis described above, is seen in the Speaking and Listening sub-tests: the Speaking interview includes two roleplays in which the candidate takes his/her role as a health professional while the interviewer plays a pa ent (or a rela ve/carer); the first part of the Listening sub-test is a medical consulta on, usually between a GP and a pa ent, for which the candidate takes notes under given headings, simula ng taking case notes for pa ent records. The recep ve skills of Listening and Reading also have a focus on language skills for professional development and work-related research situa ons: the second part of the Listening sub-test is a seminar on a medical topic with candidates taking notes, comple ng tables, etc., while the texts for the Reading sub-test are similar to academic or professional journal ar cles. The Reading sub-test is in MCQ format. Taking the Reading sub-test as an example, the test specifica on picks out the linguis c demands of dealing with academic ar cles, and the ques ons are designed to test candidates’ ability to, for example, dis nguish fact from opinion, follow a complex argument, establish what is and is not stated, determine the rela onship between ideas, and recognise inferences made in the text. These are high-level language skills required of professionals working in a context where mistakes due to misunderstanding a text could be costly. Likewise, the Listening sub-test places high demands on the candidates in terms of listening, then understanding and recording appropriate case notes or reformula ng the informa on to suit the prompts on the ques on paper. This part of the test has very strong reliability and can be seen to be tes ng directly the skills candidates need in the workplace. The Wri ng task is usually a leNer of referral. This is common to most of the professions which use the OET, while some varia on is included for par cular professions (e.g., leNers of transfer or discharge, leNers to advise or

IATEFL TEASIG

67

Best of TEASIG Vol. 2


inform pa ents or their carers). Candidates are given s mulus material, oPen case notes, and the wri ng task they are to complete. Candidates are expected to demonstrate the ability to communicate clearly and appropriately to the intended reader based on an accurate assimila on of the case notes provided (through selec ng relevant informa on and presen ng it with a suitable level of detail), and to write appropriately polite and formal content for the context and audience. What the OET tests cover The OET is intended as a test of English language skills through a professional context, while the regulatory authori es for each profession are responsible for tes ng professional skills. It is important to be aware of these areas of overlap; this is a balance all tests of English for specific purposes have to find. Language skills also overlap with communica on skills and intercultural knowledge. The OET is not directly tes ng candidates’ personali es or their ability to work with other people or to adapt to a different working culture (e.g., where pa ents don’t always accept what they are told by a prac oner, or where a different hierarchy exists among health professionals). However, the OET does expect a candidate to have the language skills to deal with difficult situa ons. For example, in the Speaking roleplays, the context is not a friendly conversa on with the interviewer. The roleplays are designed to introduce an element of tension which the candidate has to cope with (e.g., anxiety, confusion or anger). The candidate is expected to take their professional role and lead in the roleplay scenario, demonstra ng they have the language skills to do this confidently: dealing professionally with ‘difficult’ people and unexpected situa ons. Test development Materials for the Listening and Reading sub-tests are created and trialled for the OET Centre by staff at the Language Tes ng Research Centre (LTRC) at the University of Melbourne, world-renowned experts in language tes ng. The profession-specific materials for the Speaking and Wri ng sub-tests are developed by writers at the OET Centre working with professional experts nominated by the regulatory authori es who review and give feedback to ensure the currency and accuracy of the material and that newly developed materials are appropriate for use in a language test. Materials writers work to create realis c tasks that elicit a varied and appropriate sample of language from the candidate, e.g., a roleplay might require the candidate to ques on and probe, then to explain and persuade. Assessment The reliability of the OET is greatly supported by the prac ce of rou ne double-marking for the Wri ng and Speaking sub-tests: each candidate script or recording is assessed independently by two assessors, giving two perspec ves. Assessors grade on five criteria, each with six level descriptors. These scores are then analysed for the OET Centre by staff at the LTRC using mul -faceted Rasch analysis (FACETS) to provide a final ‘fair score’ for each candidate performance. Misfi8ng scripts/recordings are third-marked during this process. Criteria for Wri ng assessment • • • •

• Overall task fulfilment
• Appropriateness of language
• Comprehension of stimulus
• Linguistic features (grammar and cohesion)
• Presentation features (spelling, punctuation and layout)


Criteria for Speaking assessment

• Overall communicative effectiveness
• Intelligibility
• Fluency
• Appropriateness
• Resources of grammar and expression
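By way of illustration only, the following is a minimal sketch of the double-marking and 'fair score' idea described above: two assessors each award band scores on the five criteria, the two perspectives are combined into a provisional average, and scripts where the assessors disagree markedly are flagged for third-marking. This is not the OET Centre's actual FACETS procedure; the criterion names, scores and the disagreement threshold are illustrative assumptions.

```python
# Toy illustration of double-marking: averages two assessors' criterion scores
# and flags large disagreements for third-marking. This is NOT multi-faceted
# Rasch analysis (FACETS); it is only a simplified stand-in for the idea.

from statistics import mean

CRITERIA = [  # five Writing criteria, each rated on a 1-6 band scale (assumed)
    "task_fulfilment", "appropriateness", "comprehension_of_stimulus",
    "linguistic_features", "presentation_features",
]
DISAGREEMENT_THRESHOLD = 1.0  # assumed flagging rule, not an OET figure

def provisional_fair_score(rater_a: dict, rater_b: dict) -> dict:
    """Average the two assessors' scores per criterion and overall."""
    per_criterion = {c: mean([rater_a[c], rater_b[c]]) for c in CRITERIA}
    return {"per_criterion": per_criterion,
            "overall": mean(per_criterion.values())}

def needs_third_marking(rater_a: dict, rater_b: dict) -> bool:
    """Flag a script if the assessors differ by more than the threshold
    on their overall averages - a crude proxy for a 'misfitting' script."""
    gap = abs(mean(rater_a[c] for c in CRITERIA) -
              mean(rater_b[c] for c in CRITERIA))
    return gap > DISAGREEMENT_THRESHOLD

if __name__ == "__main__":
    a = dict(zip(CRITERIA, [5, 4, 5, 4, 4]))  # invented assessor A scores
    b = dict(zip(CRITERIA, [4, 4, 4, 3, 4]))  # invented assessor B scores
    print(provisional_fair_score(a, b))
    print("third marking needed:", needs_third_marking(a, b))
```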



A proportion of Listening scripts are also second-marked to confirm consistency of marking, while the Reading sub-test MCQ responses are computer-scanned. Item analysis then indicates whether any particular item needs to be deleted because it is not performing reliably. All OET assessors are experienced language professionals who have received initial training and then annual standardisation to check that they are assessing according to the benchmarks set. The statistical analysis done for every administration of the test also produces rater reliability information for each assessor. This allows assessors to fine-tune their grading at subsequent administrations; it also gives assessors confidence that they are maintaining the standard satisfactorily.

Summary
The Occupational English Test is contextualised and profession-specific, and therefore a suitable test for a high-risk context where language skills and clear communication are paramount. There is a direct link between test content and the reality of the workplace through the guidance from nominated experts. The OET Centre also benefits greatly from on-going collaboration with the LTRC on test materials development, test analysis, and research projects. Candidates prefer the OET as a measure of their language competence in a professional context – a test that recognises their professional abilities and the demands that are made of them in their working lives and that is relevant to their training background and previous experience. The OET Centre is currently working to extend recognition of the OET to other countries and contexts where English-language skills are required, and to expand the test to other professional fields.

Reading
McNamara, T. (1996). Measuring Second Language Performance. London: Longman.



2008



The candidate, the essay, the test: examination writing reassessed
Cathy Taylor, Trinity College, London, UK
Original publication date: IATEFL TTEdSIG and TEASIG Joint Newsletter November 2008

What inspires us to develop and take exams

Or are they? What happens when we don't get the results we want? What happens when we fail? Our levels of self-confidence take a battering and many candidates probably don't feel like trying again or having to repeat the course. The motivational washback from examinations is significant. This washback motivation is more intrinsic and less easy to assess, but in many ways more important than extrinsic as the effects will be felt in areas beyond passing and failing, for example in terms of self-confidence, attitudes to learning, reaction to pressure and deadlines, to mention but a few.



controlled written. It is the portfolio component which addresses the issues a controlled written assessment does not. The suite has 5 levels, ISE 0 - ISE IV, ranging from A2 - C2 on the CEFR. All levels include all three elements.

competence in dealing with various text types. The sections cover text types of increasing difficulty. In preparing the portfolio the candidate is expected to use all the resources available to them, for example, the Internet, grammar reference books, encyclopaedias, and word processing packages with all the referencing and formatting facilities these provide. Candidates are strongly

To some, the theoretical construct of the portfolio is sound, but there are fears that it is more vulnerable to plagiarism than more

is expected to be answered, with word counts strictly adhered to plus a high level of accuracy. Given all of the above it is difficult to plagiarise successfully.



Testing intercultural communicative competence in English
Judith Mader and Rudi Camerer, elc-European Language Competence, Frankfurt, Germany
Original publication date: IATEFL TEASIG Conference Selections – Dublin 2008

There can be no doubt about the increasing importance of intercultural competence in Europe and the world today. Many, if not most, language courses include an intercultural component, although the same cannot be said for courses which claim to teach or train intercultural competence, as it is rare that these include a language element. That the two cannot be separated and that, in fact, it is the language component which may even be the more important, is something that we have been convinced of for some time.

Training in intercultural communication is often based on the assumption that it is countries or cultures which are seemingly very different from each other which have the most problems in communication. What is often neglected (although in fact it is common knowledge) is that bordering countries often consider themselves to be quite different from each other and have intercultural problems which may be equal in magnitude to the parties concerned as those between countries and cultures very distant from each other. These problems are by no means lessened by the use of a common language, English. That attitudes play an important part in communication is undoubtedly true; however, it is the performance which in the final instance determines the success or failure of the encounter. We approached the question of testing intercultural communicative competence with this in mind and used approaches generally employed in language testing in the development of the testing procedure dealt with in the conference workshop and outlined in the following.

The importance of intercultural competence was recently highlighted by two interesting surveys and made clear that solutions to the problem are a key need in many areas. In December 2007, the German weekly WirtschaftsWoche asked 247 German managers which cultures they found it most difficult to get on with in a business context. These were their answers1:

China             33.7%
France            29.7%
USA               24.8%
Japan             23.6%
Russia            19.5%
India             17.5%
Great Britain     15.9%
The Netherlands   13.8%
Saudi Arabia      11.8%
Turkey            10.2%

There is little doubt that a similar enquiry carried out in other European countries would produce similar results. It is clear that there is an urgent need in Europe for intercultural competence and by no means only for dealing with partners in the Far East or the Gulf countries. Proximity does not by any means lead to greater understanding and better communication; in fact, the opposite may well be the case. A second survey, ELAN (Effects on the European Economy of Shortages in Foreign Language Skills in Enterprise), was published by the European Commission in December 2006 and involved managers from more than 2000 small and medium-sized enterprises (SMEs) in 29 European countries. Among the questions asked was: Is there any possibility that your company ever missed an opportunity of winning an export contract due to lack of foreign language skills?

The actual loss reported was between €8 million and €3 million, the potential loss between €16 million and €25 million.2 Bearing in mind that the number of SMEs in Europe is something like 2 million, it seems clear that shortcomings in intercultural communicative skills imply a high economic risk both for individual enterprises and for the economy of Europe as a whole. It is against this background that intercultural competence has become a major topic both on the political agenda and in the corporate sector.



Today the number of training sessions in intercultural competence offered within and outside companies as well as in universities and schools has reached an unprecedented level. Although it is difficult to obtain hard data, there is strong evidence that the majority of these training sessions focus on:

• cognitive and sensitivity training,
• intercultural simulations, and
• business simulations (intercultural on-the-job situations).3

Although there is an element of performance in much of the training for which details are available, an equally large, if not larger, part is played by cognitive approaches, sensitivity and knowledge of cultural theory and the cultures concerned. Whether this really leads to better communication between cultures is questionable, as no amount of sensitivity and understanding can compensate for a lack of the means to communicate, i.e. language and the ability to use this appropriately in the intercultural encounter. It is therefore puzzling that language plays such an insignificant role in most of the sessions. It almost seems as though intercultural competence is 'language-free'. The content of courses in intercultural competence has traditionally been kept separate from language training – in other words, language competence has been considered in some way 'culture-free'. There is clearly an urgent need to combine the two, i.e. to bring language into training in intercultural competence and intercultural competence into language training. This was our aim in developing the training and testing concept Intercultural Communicative Competence in English, currently being developed by elc-European Language Competence and German chambers of commerce.

Based on a definition of competence focussing on practical abilities rather than merely on cognitive content4, a concept of intercultural competence as Intercultural Communicative Competence became the central pivot.5 This implied a clear decision not to use psychometric test procedures or personality profiling. None of the test criteria applied by these, which range from 'behavioural flexibility' to 'emotional resilience', 'perceptual acuity', 'tolerance of ambiguity', 'adaptation', etc., are based on an expert consensus. In fact, it would be hard to find a handful of experts anywhere who would agree on the meaning of any of the terms used above. This makes the construct validity of such tests highly questionable. For language testers, indicators of competent performance are easier to identify and to validate if the premise that intercultural communicative competence can be measured using observation of performance as a basis is accepted. For this reason, we have opted for assessment procedures which observe and evaluate individual performance in order to reach conclusions about what a candidate knows or can do. Examining the Common European Framework of Reference and the descriptors for intercultural competence in it, as well as taking into account relevant contributions to the world-wide debate on intercultural competence6, we have defined the following criteria for intercultural communicative competence (ICC):

1. Knowledge about the processes and institutions of socialisation in one's own and in one's interlocutor's country. This knowledge often plays an important part in the ability to perform appropriately in a different culture to one's own. Similarly,
2. knowledge of the types of cause and process of misunderstanding between interlocutors of different cultural origin is important. Although this knowledge does not have to be demonstrated explicitly in everyday performance, possessing it can make a great deal of difference to performance in intercultural encounters. Likewise,
3. the ability to engage with otherness in a relationship of equality (including the ability to question the values and presuppositions in cultural practices and products in one's own environment) is necessary.



This is closely linked to
4. the ability to engage with politeness conventions and rites of verbal and non-verbal communication and interaction. These vary from culture to culture, whichever language is being used, and the success of the intercultural encounter may hinge on this ability, as well as
5. the ability to use salient conventions of oral communication and to identify register shifts. As so much communication takes place in writing and much of electronic communication is liable to misunderstanding even between members of the same culture, equally important is
6. the ability to use salient conventions of written communication and to identify register shifts. Finally, two of the components we consider crucial to intercultural communication and, in particular, to dealing with misunderstandings, which in spite of training are bound to arise, are both
7. the ability to elicit from an interlocutor the concepts and values of documents or events (i.e. metacommunication), and
8. the ability to mediate between conflicting interpretations of phenomena.

The second decision made was as to the language in which intercultural competence should be trained and tested. We chose English in its two major variations: Anglo-American ('Mid-Atlantic') and International English or English as a Lingua Franca. The test format developed and piloted includes 6 written and 3 oral parts. Based on experience gained from piloting the test in Intercultural Communicative Competence in English with over 300 candidates, we have been able to specify certain particular competences which should be included in a curriculum for intercultural communicative competence in English. The videos made during the piloting of oral tests demonstrate clearly and convincingly the distinction between language competence and intercultural competence. For further information on ICC, see www.elc-consult.com

Notes and references
1 LAB Managerpanel & WirtschaftsWoche: Internationale Benimmregeln – einfach ignorieren? Dez. 2007 / Jan. 2008. https://www.wiwo.de/erfolg/knigge-die-wichtigsten-benimm-regeln-fuer-13-laender/5494642.html (N = 246)
2 https://ec.europa.eu/assets/eac/languages/policy/strategic-framework/documents/elan_en.pdf
3 Jürgen Bolten (2003), Yvonne Knoll (2006)
4 cf. the OECD definition: "A competency is more than just knowledge and skills. It involves the ability to meet complex demands, by drawing on and mobilising psychosocial resources (including skills and attitudes) in a particular context. For example, the ability to communicate effectively is a competency that may draw on an individual's knowledge of language, practical IT skills and attitudes towards those with whom he or she is communicating" (OECD, 2003).
5 cf. the following definition: "Intercultural Competence is the necessary precondition for an adequate, successful and mutually satisfactory communication, encounter and cooperation between people from different cultures" (Alexander Thomas, Development of Intercultural Competence – Contributions of Psychology, 1996).
6 cf. (among others):



Council of Europe (2001). Common European Framework of Reference for Languages. Cambridge: Cambridge University Press.
Byram, M. (1997). Teaching and Assessing Intercultural Communicative Competence. Clevedon: Multilingual Matters Ltd.
Beneke, J. (2000). Intercultural competence. In: Bliesener, U. (Ed.), Training the Trainers. Theory and Practice of Foreign Language Teacher Education. Köln: Carl-Duisberg-Verl.



2009



Tests of language for specific purposes
Bart Deygers, Ghent University, Belgium
Original publication date: IATEFL TEASIG Newsletter June 2009

Introduction
At Ghent University, there has been a steady growth of courses of languages for specific (academic) purposes (LSP), which aim at offering students the necessary skills and knowledge to be able to function efficiently in a specific academic or professional context. Over the past decade, language tutors have steadily been developing not only course material, but also specific-purpose tests. To alleviate an apparent need in test construction, Ghent University is now funding a research project which aims at (1) reviewing existing LSP tests qualitatively and quantitatively, (2) developing a good practice procedure which provides a clear manual to guide the test development process, (3) producing digital test templates which will facilitate future electronic test development, and (4) creating assessment awareness and competence among language tutors.

English for veterinary sciences
Out of the various tests that were reviewed, the test of English for Veterinary Sciences (EVS) was singled out for more extensive qualitative and quantitative research. We decided to focus on this course because the specific-purpose field of veterinary sciences is easier to narrow down than those of humanities, economics or engineering. Narrowing down the professional field was vital in our research process since we were interested in determining the extent to which subject specialists (i.e. people working in the specific-purpose field) felt the test represented professional reality and the extent to which they felt it should represent professional reality. Based on an initial qualitative and quantitative analysis, we rewrote the first paper-based test of English for Veterinary Sciences (Paper-based 1). This rewritten version (Paper-based 2) became the basis for the final computerised test (Computerised), which will be reviewed quantitatively during the summer of 2008. It is important to stress here that the test of EVS is the final test of a specific-purpose language course. This implies that any decision made at test level had its implications at course level.


Table 1


Table 1 outlines the changes that were made to each test. After each transition phase we analysed the test both quantitatively and qualitatively. The paragraphs below outline the methods used and the outcomes drawn from them.

Quantitative methods and outcomes
On a quantitative level, we used the results of 32 students1 to determine the facility value2 and the discrimination index3 of the items.

Table 2

                             Paper-based 1    Paper-based 2    Computerised
Mean Facility Value (FV)     70.7%            60.5%            to be analysed in summer 2008
Discrimination Index (DI)    47% flawed       15% flawed
Split-half reliability       .12              .77

The initial quantitative results (see Table 2) show a lower FV for Paper-based 2 than for Paper-based 1, indicating that the adjustments made to the test led to increased difficulty. The measures taken in improving the discrimination index of Paper-based 1 have paid off as well, although 15% flawed items is still too many; we hope to see this problem solved by the final analysis of the computerised test with a larger respondent population in the summer of 2008. Finally, the measures taken in the first transition also bring about a more homogeneous test in terms of difficulty. A rather striking observation to be made here, when aligning the quantitative data with the respondents' opinions regarding level of difficulty, is that the test takers do not always correctly estimate the level of difficulty of test items. The results seem to indicate that the perceived difficulty of a task is linked to increased task authenticity. However, this preliminary hypothesis is yet to be confirmed or refuted by additional research.
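As an illustration of the item statistics reported in Table 2, the following is a minimal sketch, assuming a simple matrix of dichotomous item scores; it is not the analysis code used in the Ghent project. The facility value and discrimination index follow the textbook definitions given in the notes at the end of this article, while the 'flawed item' threshold of 0.2 and the odd-even split are illustrative conventions rather than figures from the study.

```python
# Minimal sketch of classical item analysis for a dichotomously scored test.
# Rows = students, columns = items, 1 = correct, 0 = incorrect.

import numpy as np

def facility_values(scores: np.ndarray) -> np.ndarray:
    """Facility value per item: proportion of students answering it correctly."""
    return scores.mean(axis=0)

def discrimination_indices(scores: np.ndarray, group_frac: float = 1/3) -> np.ndarray:
    """Classical discrimination index per item: facility value in the top-scoring
    group minus facility value in the bottom-scoring group, the groups being
    defined by total test score (the 1/3 split is a common convention)."""
    totals = scores.sum(axis=1)
    order = np.argsort(totals)
    n_group = max(1, int(len(totals) * group_frac))
    low, high = scores[order[:n_group]], scores[order[-n_group:]]
    return high.mean(axis=0) - low.mean(axis=0)

def split_half_reliability(scores: np.ndarray) -> float:
    """Odd-even split-half correlation, stepped up with the Spearman-Brown
    formula to estimate full-test reliability."""
    odd = scores[:, 0::2].sum(axis=1)
    even = scores[:, 1::2].sum(axis=1)
    r_half = np.corrcoef(odd, even)[0, 1]
    return 2 * r_half / (1 + r_half)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo = (rng.random((32, 40)) > 0.4).astype(int)  # 32 students, 40 invented items
    print("mean FV:", round(float(facility_values(demo).mean()), 3))
    print("flawed items (DI < 0.2):", int((discrimination_indices(demo) < 0.2).sum()))
    print("split-half reliability:", round(split_half_reliability(demo), 2))
```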

Qualitative methods and outcomes
In order to determine the perceived authenticity and face validity4 of the test, we interviewed 2 subject specialists, 6 PhD students and 11 Master of Veterinary Sciences students. Since "professionals typically call upon a rich inventory of tacitly known criteria in order to determine whether and to what extent some particular performance is competent or falls short of the mark" (Jacoby & McNamara, 1999), we wanted our respondents:

• to point out subject-specific conventions. Or, as one respondent (a PhD student) pointed out when commenting on a task on conditionals: "We don't use if-clauses. If you know something, you write about it. If you don't know for sure, you don't mention it."
• to comment on task authenticity. Or, as another PhD student remarked when asked about his opinion of a writing task based on a graph: "You don't write about a graph. That's why you use a graph, not to have to write about the numbers."
• to supply us with subject-specific knowledge. Or, as one student said about a task which talks about not having saved a dog from a "tumour in the left frontal lobe of the cerebrum" because a brain scan had not been taken: "The dog would have been dead anyway".

Since we wanted to grasp our respondents' intuitive reaction to the tasks, we also asked them to partake in a think-aloud protocol, which together with the ensuing usability analysis allowed us to list the questions that worked and the ones that needed refinement.



Facial expressions, sighs and other non-verbal cues provided us with further input to elaborate on during the interview. In the interviews one of our main concerns was the perceived authenticity of the tasks and the extent to which our respondents felt that authenticity was required. We will now consecutively deal with the authenticity of the input material, the subject-specific authenticity, and the task authenticity.

Authenticity of the input material
Concerning the perceived authenticity of the input material, there is 100% consensus among all respondents (subject specialists, students and PhD students) that topic-specific input is not necessary, but "nice". Our respondents took Alan Davies' statement that "for face validity reasons, the stimuli in [LSP] tests will be field related" (Davies, 2001: 143) one step further, claiming that "You could even use fiction, but you might as well use scientific texts while you're at it".

Subject-specific authenticity
In the aforementioned article, Davies points at an important LSP issue, i.e. the boundary problem. As no scientific discipline ends where another begins, each specialisation meanders into different other fields. In other words, English for veterinary sciences takes in English for chemists, English for anatomy, and quite a few other sub-varieties. The idea that "[LSP] Varieties exhibit a lack of both discreteness and of coherence" (Davies, 2001: 137) is backed by Douglas, who states that "[there is] no way of determining how "specific" specific needs to be" (Douglas, 2001). Another voice in the specificity debate, Ken Hyland, contests Douglas and Davies, boldly asserting that "[…] effective language teaching in the universities involves taking specificity seriously. It means that we must go as far as we can" (Hyland, 2002: 9). Regarding specificity, we again found a 100% consensus among the respondents we interviewed. They all stated that specificity in tests of English for veterinary sciences is an unattainable goal. The broad field of biomedical sciences was perceived as specific enough. Indeed, what's specific for one subject specialist will be incomprehensible for another, one respondent remarked, thereby agreeing with Douglas' reasoning that, ultimately, specificity implies one test for one person.

Task authenticity
When researching the reactions to task authenticity in the test of English for Veterinary Sciences, we held the view of Wu and Stansfield, who in their 2001 article deem it "clear that if [LSP tests] are to be considered as valid, the authenticity of their language and tasks must be verified" and state that the test task must "be the closest equivalent of what one encounters in specific work". Considering task authenticity one of the key components of LSP tests, we were surprised to find that the respondents only seemed to spontaneously make authenticity-related remarks when authenticity was absent. Not only did authenticity never come up as a positive criterion, we were surprised to see non-authentic tasks (i.e. vocabulary-related cloze tasks) labelled authentic. Most likely this has got more to do with the students' expectations of the tasks a language test should contain than with their full grasp of the concept at hand. Even more striking, quite a few respondents seemed to complain about an "authenticity overload". Indeed, all respondents agree on the need for explicit vocabulary and verb testing, which corresponds with Clapham's observation:



"An ideal EAP [English for Academic Purposes; a label which also fits this test] test would, therefore, contain both separable grammatical tasks, and 'authentic' (Fulcher, 2000) university materials and tasks" (Clapham, 2000). When discussing the revisions made in the computerised test, one student remarked that he liked the test, but felt that it could do with a few "drill tasks". Even if the above might hint towards a respondent tendency to downplay the importance of task authenticity, their choice of writing assignment does offer an argument in favour of task authenticity. When asked what kind of writing assignment they preferred, six out of seven PhD respondents opted for a scientific article instead of a popularising one. The reason for choosing the scientific paper was the same among all respondents: popular writing does not correspond to what is perceived as a real-life task. One respondent remarked that he never writes "a popularising thing" and claimed that it would not feel natural to do so. On the other hand, six out of seven Master's students chose to write the popular article, claiming that it would be a welcome change and that they, in their future practice, would get much further using a popularising approach with clients.

Conclusion
As a preliminary conclusion to our research we can already claim that the compatibility of our qualitative and quantitative methods is beyond any doubt. The quantitative data offered us interesting insights, which we then used in the interviews. Pitting the quantitative data against the qualitative results has also proven to be an interesting experiment. In terms of direct project outcomes, we are also sure that we will have created a better LSP test by the end of summer 2008. More importantly, however, we have now gained further insights into the importance of authenticity in LSP testing. Although the respondents appreciate authentic tasks, they do not consider subject-specific input material a necessary precondition for a good LSP test. Rather, they describe subject-specific input as being "nice" and "fun". We also observed a need for non-authentic test tasks. In future research we would like to determine whether this is a genuine need or rather an expectancy bred by previous experience with more traditional language tests. Contrasting with the above, we did observe a clear test-taker tendency to choose the most authentic writing assignment. Students picked the alternative which most closely related to their current or future professional activities. In future research we would like to work on the perceived difficulty of authentic tasks versus non-authentic ones. We would also like to perform similar research on larger sample populations within different LSP fields.

Notes
1. A higher number of respondents was not available, since this was the number of students attending the course.
2. Facility value: the percentage of students that get an item right (Alderson, Clapham & Wall, 2005: 80-81).
3. Discrimination index: the extent to which an item discriminates between students with a basic level of proficiency versus those with an advanced level (Alderson, Clapham & Wall, 2005: 81-87).
4. The extent to which test takers perceive a test to be an acceptable measure of the ability they wish to measure.



Bibliography
Alderson, C., Clapham, C. and Wall, D. (2006). Language Test Construction and Evaluation. Cambridge: Cambridge University Press.
Bachman, L. F. (2000). Modern language testing at the turn of the century: assuring that what we count counts. Language Testing, 17(1).
Clapham, C. (2000). Assessment for academic purposes: where next? System, 28(4).
Council of Europe. (2001). Common European Framework of Reference for Languages: Learning, Teaching, Assessment. Cambridge: Cambridge University Press.
Davies, A. (2001). The logic of testing Languages for Specific Purposes. Language Testing, 18(133).
Douglas, D. (2000). Assessing Languages for Specific Purposes. Cambridge: Cambridge University Press.
Douglas, D. (2001). Language for Specific Purposes assessment criteria: where do they come from? Language Testing, 18(171).
Dovey, T. (2006). What purposes, specifically? Re-thinking purposes and specificity in the context of the 'new vocationalism'. English for Specific Purposes, 25(4).
Elder, C. et al. (Ed.). (2001). Studies in Language Testing 11. Experimenting with Uncertainty. Essays in Honour of Alan Davies. Cambridge: Cambridge University Press.
Figueras, N., North, B., Takala, S., Verhelst, N. and Van Avermaet, P. (2005). Relating examinations to the Common European Framework: a manual. Language Testing, 22(261).
Hyland, K. and Hamp-Lyons, L. (2002). EAP: issues and directions. Journal of English for Academic Purposes, 1(1).
Hyland, K. (2002). Specificity revisited: how far should we go now? English for Specific Purposes, 21(4).
Lee, H. and Anderson, C. (2007). Validity and topic generality of a writing performance. Language Testing, 24(307).
Lewkowicz, J. A. (2000). Authenticity in language testing: some outstanding questions. Language Testing, 17(43).
Parkinson, J. et al. (2004). The use of popular science articles in teaching scientific literacy. English for Specific Purposes, 23(4).
Shin, S. (2005). Did they take the same test? Examinee language proficiency and the structure of language tests. Language Testing, 22(31).
Weigle, S. C. (2007). Assessing Writing. Cambridge: Cambridge University Press.
Wu, W. M. and Stansfield, C. W. (2001). Towards authenticity of task in test development. Language Testing, 18(187).



Spots, camera, action!!! Learner on the stage for evaluation
Dönercan Dönük and Vildan Özdemir, Mersin University, Turkey
Original publication date: IATEFL TEASIG Conference Selections – Cyprus 2009

The study we have carried out in the classroom environment was based on the cognitive processes of the learner through the discovery of himself. The theoretical background of the research is the constructivist approach, and the final outcome is the portfolio the learner has formed as an end product of formative testing and self-evaluation with the help of the teacher.

1. Introduction
Language abilities are not easy to measure, for they do not rely on concrete measurements as in physical sciences. When it comes to oral proficiency tests, the difficulty is doubled since these tests do not rely on such concrete tools as paper and pencil. Being a productive skill, oral proficiency is rated high in a language proficiency exam. However, learners' true abilities are not always reflected in the scores that they obtain in these tests, which are applied as one shot. Testing the ability to speak is a most important aspect of language testing, as success in communication often depends as much on the listener as on the speaker. Therefore, it is of vital importance to establish a standard during the procedure and application of oral proficiency skills both for the period of testing and for the reliability and validity issues of the test. Besides the effect of testing on teaching and learning, harmful or beneficial backwash is determined by means of these established standards. It is common practice for most teachers to test what is easy to test rather than what is important to test (Hughes, 1991). In this way, harmful backwash becomes inevitable for the construct validity and, relatedly, content and criterion-related validity. To be on the safe side, oral production tests should be well designed on the basis of content validation and classroom procedures. To Hughes (1991), in examining any test, it is first necessary to investigate its reliability or its consistency in rating. Because the tasks and questions of oral proficiency interviews by design are tailored for different examinees, the most common way to report test reliability is through an investigation of inter-rater reliability or the correlation between the ratings of two or more raters assigned to the same test (a minimal illustration of such a correlation is sketched after this section).

2. What makes an oral proficiency test inaccurate?
"The scoring of the oral interview is highly subjective, and it has low reliability, for the performance may not accurately reflect the true ability of the testers at a specific occasion" (Heaton, 1990). The key terms in testing are reliability and validity. For oral proficiency tests to be reliable, they need to be applied on the same basis in all cases, which means there must be consistency in terms of the application of the test. Tests can produce inaccurate results unless some criteria in terms of test techniques are considered. There is a need for a more planned process and a larger portion of time for these tests to be more accurate. Assessment is formative when teachers use the test to check on the progress of their students, to see how far they have mastered what they should have learnt, and then use this information to modify their future teaching plans. In formative testing, informal tests, quizzes and portfolios can be used, and encouraging the students to carry out self-assessment on the work they do can be applied in classroom procedures.
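As a small illustration of inter-rater reliability understood as a correlation between raters' scores, here is a minimal, hypothetical sketch: the rater names and band scores are invented, and Pearson correlation is only one of several coefficients commonly used for this purpose.

```python
# Toy inter-rater reliability check: Pearson correlation between two raters'
# scores for the same set of candidates. All scores below are invented.

from statistics import correlation  # requires Python 3.10 or later

rater_1 = [4, 5, 3, 4, 2, 5, 3, 4]  # band scores awarded by the first rater
rater_2 = [4, 4, 3, 5, 2, 5, 2, 4]  # band scores awarded by the second rater

r = correlation(rater_1, rater_2)
print(f"inter-rater correlation: {r:.2f}")  # values near 1.0 suggest consistent rating
```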



3. The aim of the study
This study aims to set the standard for the application and assessment of oral proficiency tests. The basic principle the study relies on is alternative testing systems through tasks, for the large-scale standardized tests tend to be one-shot, decontextualized and norm-referenced, and foster extrinsic motivation.

3.1 Alternative assessment tools
To Brown (2004), tests are formal procedures, usually administered within strict time limits to sample the performance of a test taker in a specific domain. This traditional notion of testing now seems to have been replaced by alternative assessment systems, which have additional elements such as portfolios, journals, observations, self-assessments, etc. in order to triangulate data about students. These tools use real-world contexts and simulations, and require students to perform, create, and produce. They are non-intrusive in that they extend the day-to-day classroom activities and allow students to be assessed on what they normally do in class every day. According to Brown and Hudson (1998), the alternative assessment tools focus on processes as well as products. They tap into higher-level thinking and problem-solving skills, and provide information about both the strengths and weaknesses of the students. Besides, they are multiculturally sensitive when properly administered. In alternative assessment, the learners are in the centre of learning, the teacher acting as a mediator. Being responsible for their own learning, the learners choose the topic for the conversation of their own accord, effectively participate in the decision-making processes, and make both self-evaluation and peer evaluation.

3.2 Performance-based assessment
There has been an objection towards giving standard tests recently and, as a result, the tests given to learners are more process-oriented as well as product-oriented, more formative, and with authenticity features. The tasks given need to bear some elements of performance, and out of these tasks, portfolios emerge. According to Brown (2004), portfolios as alternative tools foster intrinsic motivation, promote student-teacher interaction with the teacher as a facilitator, individualize learning and celebrate the uniqueness of each student, facilitate critical thinking, self-assessment and revision processes, offer opportunities for collaborative work with peers, and permit the assessment of multiple dimensions of language learning. In the alternative assessment of this study, the teacher firstly makes a needs analysis by using a data collection tool (a semi-structured sheet) to decide on where to start. As a second step, she verifies the data obtained through mini-interviews, also pinpointing the strengths and weaknesses of the participants. Thirdly, she goes through the scores of the participants in the other speaking classes they have taken before, if available. Taking all the data into consideration, she makes an action plan based on the potential problematic areas of learning on the side of the students and determines representative tasks in line with the syllabus items such as individual presentations, discussion, etc. She also specifies all possible content with the test constructs and decides on the operations (directing, describing, expressing, eliciting, narrating, reporting) and the skills (informational skills, interactional skills, skills in managing interactions). As a next step, she arranges the classroom for the use of technology as well as arranging the flow of tasks.
She makes an arrangement with each student for self-evaluation after each performance. She also arranges the conduct and delivery of the portfolio items, and finally she evaluates the learner on the basis of his performance as well as his other contributions. The student in the alternative assessment responds to the needs analysis, cooperates with the teacher, uses technology in order to detect the weaknesses and strengths of each performance, keeps a portfolio including



the items each task requires (peer evaluation sheets, self-evaluation sheets, voice recording files, authentic scripts prepared by the learner, a mistake data file, etc.). As a result, he actively participates in each class.

4. The procedure of the study
The students are asked to carry out the following tasks in order to equip themselves with the essential skills.

a) Individual presentations
Supporting individual seminar studies and developing academic presentation skills, presentations enable the students to transfer knowledge, increase awareness of body language, acquire the skills to control stress and manage the self, as well as become familiarized with the assessment and evaluation forms.

b) Semi-structured conversations in the context of a situation
These conversations develop the skill of responding to natural linguistic input. The teacher candidates are given a series of situations and are asked to make up conversations relying on the patterns coded by the examiner.

c) Mini drama practices
These improve the interactional skills and construct a critical look at oral proficiency skills. In mini drama practices, the teacher candidates improve their skills of role-play or conversation simulations in a context.

d) A role model study: 'What would you do if you were……?'
This task improves the learner's response to a hypothetical situation and reveals his intellectual maturity and his skills in decision-making processes; it also furthers free conversational skills.

e) Seminar studies
These studies enable the learner to have a closer look at the criteria required by academic speech, make up an academic speech, and present it within the criteria of register and style.

f) Descriptive studies: a new insight
Descriptive studies allow both for space in the vision of the learner for broadening the horizon and for the improvement of the skills required for more elaboration of things such as artefacts or people, etc.

g) A social project: field work in the world of the others
The project builds up the awareness of cultural heritage and facilitates a more intimate bond with other people through empathy.

h) Debate: 'But, to my mind…..'
The debate enables the formation of such linguistic skills as questions and requests for elaboration, appearing not to understand, invitations to ask questions, interruption, abrupt change of topic, turn-taking, etc. Moreover, it builds up the power of supporting an idea and persuasive speech.

i) Free topic presentation
This provides the learner with the overall competence of making a speech before an audience regardless of the type of text.

5. Technology matters …
Since spoken language is not easy to keep in mind in fine detail, without a recorder it is impossible to apply the procedure of marking as is done with compositions, where the examiner can go back and make the assessment at leisure. In addition, recordings help the learner to obtain the opportunity to go back to the moment of speech and discover potential mistakes. The internet is used for comments on the performance in addition to individual face-to-face meetings. The computer is used both for peer-evaluation and self-evaluation, and for the storage of the portfolio items.



Each student is evaluated on the basis of:

• the e-portfolio
• classroom presentations
• classroom participation
• internet collaboration
• extra-curricular activities as in the social studies.

The e-portfolio includes:

• scripts of presentations
• self-evaluation reports
• peer evaluation reports
• a good collection of mistakes
• voice recordings
• videos/podcasts
• pictures or other visual materials
• comments on the nature of the lesson and personal suggestions.

6. Conclusion
Improving the oral communication skills of teacher candidates turns out to require a different model for the oral communication skills. The learner-based model which has been adopted has naturally brought the necessity for an alternative assessment system which is more process-oriented and more formative. Using alternative assessment tools such as portfolios has facilitated the learners' critical thinking, development of individual learning strategies, and self-assessment, which will be important for their future teaching career. All the tasks and activities carried out during the courses have been supported by means of technology. The teacher, taking all the available data and alternative assessment tools into consideration, can manage to create real-world contexts in which learners have performed, created, produced, and been responsible for their own learning and evaluation.

References
Bachman, L. F. (2000). Some construct validity issues in interpreting scores from performance assessments of language ability. In Cooper, R. L., Shohamy, E. and Walters, J. (Eds), New Perspectives and Issues in Educational Language Policy: A Festschrift for Bernard Dov Spolsky. Amsterdam: John Benjamins Publishing Company.
Bachman, L. F. (2007). Language assessment: Opportunities and challenges. Paper presented at the Annual Meeting of the American Association for Applied Linguistics, Costa Mesa, CA.
Bailey, A., Bachman, L. F., Griffin, N., Herman, J. L. and Wolf, M. K. (2008). Recommendations for Assessing English Language Learners: English Language Proficiency Measures and Accommodation Uses. CRESST Report 737.
Brown, H. D. (2004). Language Assessment: Principles and Classroom Practice. New York: Pearson Education, Inc.
Gottlieb, M. (1995). Nurturing Students Learning Through Portfolios. TESOL Journal, 5(1), 12–14.



Heaton, J. B. (1988). Writing English Language Tests. New York: Longman Inc.
Hughes, A. (1989). Testing for Teachers. Cambridge and New York: Cambridge University Press.
Johnston, J. (2007). Assessment of Language Learning in English-Speaking Children. In Encyclopedia of Language and Literacy Development (pp. 1-9). London and Ontario: Canadian Language and Literacy Research Network.
Kim, H. J. (2006). Issues of Rating Scales in Speaking Performance Assessment. Columbia University Working Papers in TESOL & Applied Linguistics, Vol. 6, No. 2.
Liao, Y. F. (2004). Issues of Validity and Reliability in Second Language Performance Assessment. Teachers College, Columbia University Working Papers in TESOL & Applied Linguistics, Vol. 4, No. 2.
O'Sullivan, B. (2008). Notes on Assessing Speaking. Cornell University Language Resource Center.
Saville, N. and Hargreaves, P. (1999). Assessing Speaking in the Revised FCE. ELT Journal, Vol. 53/1.
Weir, C. J. (1990). Communicative Language Testing. London: Prentice Hall.



Paired speaking tests: an approach grounded in theory and practice
Evelina D. Galaczi, Cambridge Assessment, Cambridge, UK
Original publication date: IATEFL TEASIG Conference Selections – Cyprus 2009

Introduction
What does it mean to test speaking ability? He and Young's (1998) common-sense answer that the best way to test a learner's speaking ability is to "get him or her to speak" hides a multitude of factors which interact in complex and sometimes unpredictable ways to affect the assessment of speaking. Since the first use of oral proficiency tests by Cambridge ESOL in 1913, the assessment community has engaged in ongoing debates about fundamental issues in speaking assessment. Many empirical endeavours over the last 25 years have highlighted the complexity of performance assessment and the multitude of factors which cooperate to produce a final score (Chalhoub-Deville, 2003; Deville & Chalhoub-Deville, 2006; Luoma, 2004; McNamara, 1997; Swain, 2001; Taylor, 2000, 2001, 2003; Taylor & Wigglesworth, 2009). Milanovic and Saville (1996) provide a useful overview of these factors, or 'facets', and propose a conceptual framework which illustrates their close inter-relationship and suggests avenues for research.

Figure 1: Milanovic and Saville's (1996) framework of performance assessment.

This paper is situated within this broad framework of performance assessment and will focus on two widely used speaking test formats – direct and semi-direct – and will also explore the benefits and caveats associated with one specific direct test format – the paired speaking test, which has been a feature of Cambridge ESOL speaking tests for more than a decade. Finally, implications for the communicative classroom will be briefly overviewed.



Direct and semi-direct tests of speaking
The two most widespread test formats in the assessment of speaking are 'direct', which involve interaction with a live examiner, and semi-direct, in which test taker speech is elicited by a machine. In a face-to-face/phone test of speaking, the test taker is required to interact with another person (either an examiner or another test taker, or both); in a computer/tape-mediated test the test taker is required to respond to a series of prompts delivered by either audio/video tape or through a computer. The main characteristic of the direct face-to-face channel is that interaction in it is bi- or multi-directional and jointly achieved by the participants. It is, in other words, 'co-constructed' and reciprocal, with the interlocutors accommodating their contributions to the evolving interaction. The construct assessed here is clearly related to spoken interaction, which is integral to most construct definitions of oral proficiency. Computer/tape-based testing, in contrast, is uni-directional and lacks the element of co-construction, since the test takers are responding to a machine. In a semi-direct speaking test, the construct is defined with an emphasis on its cognitive dimension and on production.

The vast majority of speaking is characterised by reciprocity and joint co-construction of interaction. Weir (2005) contends that "clearly, if we want to test spoken interaction, a valid test must include reciprocity conditions" (p. 72). The author is referring to conditions where interlocutors share responsibility for making communication work, and where they show flexibility and adjust their message to take the listener's contributions into account. Direct tests of speaking, such as the majority of Cambridge ESOL's tests, are based on a socio-cognitive model with an emphasis not just on the knowledge and processing dimension, but also on the social dimension (Taylor, 2003). The semi-direct test format lacks the advantages of an interactionist definition of the construct, which results in narrower construct definitions of this form of assessment, and has, indeed, been a point of critique (see, for example, Chun, 2006). Its main advantage is practicality, uniformity and standardisation of the input the test takers are receiving.

The choice of channel – direct or semi-direct – has implications for the type of tasks which can be used in the test. The tasks in computer/tape-based tests are often monologic, where one speaker produces a turn as a response to the prompt. The turn can vary in length from brief one-word responses to longer responses lasting approximately a minute (e.g., the Pearson Test of English Academic or iBT TOEFL). In contrast, interactive direct tests allow for a broader range of response formats where the interlocutors vary, and so do the task types (e.g., the Cambridge ESOL Main Suite tests or the Trinity General English Speaking Tests) and, in addition to monologic tasks, also involve interactive tasks shared by the examiner and test taker(s). The variety of tasks and response formats which a direct test contains allows for a wider range of language to be elicited, and so provides broader evidence of the underlying abilities tested and consequently contributes to the exam's validity. It is important to note that there isn't just one way of testing speaking, or one 'best' way to do it. Both the direct and semi-direct formats offer their unique advantages and disadvantages which have to be addressed.
The fundamental question is not whether a specific speaking test is direct or semi-direct, but whether the test is valid for its purpose. A semi-direct test would be suitable, for example, for the purpose of providing 'snapshots' of language ability which would be valuable for institutional screening purposes. In contrast, a direct test of speaking would be more suitable in cases where evidence of breadth and depth of language is needed.

Singleton and paired speaking tests
The traditional approach to direct assessment of speaking has typically involved a singleton face-to-face, direct format which involves an examiner/rater and a test taker who are typically engaged in a spontaneous or structured question and answer session. It has now been generally recognized that one of the weaknesses of the singleton format is that the range of tasks and types of interaction are more restricted, with an unequal distribution of rights and responsibilities between the examiner and candidate.



In other types of interactions, such as peer-peer discussions or conversations, the conversational rights and responsibilities of the participants are more balanced, and a wider spectrum of functional competence is sampled. The paired test format (and the group oral test), which creates opportunities for peer-peer interaction, is a natural alternative to the singleton test format. The last decade has seen a growth in the use of paired assessment, largely as a response to the move towards a more communicative approach in language teaching, and as a reaction to some of the limitations of the singleton oral test format. Paired speaking assessment has been taken up by large-scale test providers as well and is, as previously mentioned, a distinguishing feature of Cambridge ESOL speaking tests. One of the main strengths of the paired format is the strong relationship between teaching and testing, and the fact that paired tests have positive washback (impact) on teaching as they encourage more pair work in class, while at the same time reflecting what is already happening in the classroom. Theoretical advances have played a role as well, and current theories of communicative language ability (Bachman, 1990; Bachman & Palmer, 1996; Canale & Swain, 1980) have also informed the design of paired tests. Such frameworks include a conversation management component and presuppose the need for oral tests to provide opportunities for test takers to display a fuller range of their conversational competence. The inclusion of more types of talk in paired tests through a wider variety of tasks, therefore, broadens the evidence gathered about the examinees' skills and supports the validity of the test. Finally, interviewing pairs is more practical and economical as it reduces the amount of examiner talk needed for conducting the tests (Swain, 2001; Taylor & Wigglesworth, 2009). Such theoretical and pedagogical considerations, as well as dissatisfaction with the inability of the interview oral test to elicit various speech samples, have played a key role in the widespread use of the paired speaking test format. While the benefits of the paired format have been widely acknowledged, there are also a number of challenges, which have led certain researchers to question the fairness of this test format (Foot, 1999; Norton, 2005). This is a point which we shall return to later in the paper. The available body of literature offers many valuable empirically-based insights about the benefits and caveats associated with the paired format. The research on the paired format can broadly be divided into four areas which focus on different issues: features of the interaction in paired tests/tasks, the effects of background variables, the raters' perspective and the test takers' perspective. I will now focus on each of these areas in turn.

Features of the interaction in paired speaking tests
The empirical work on the paired format has shown that, unlike the conventional singleton interview test format, which results in tasks with asymmetric participation, paired oral tasks are more symmetrical in the interaction possibilities they create (Egyud & Glover, 2001; Galaczi, 2008; Iwashita, 1998; Kormos, 1999; Lazaraton, 2002; Taylor, 2001); they elicit a wider sample of learner performance with a bigger range of speech functions (Ffrench, 2003) and provide more opportunities for test takers to display their conversational management skills (Kormos, 1999; Brooks, 2009).
Brooks (2009), for example, compared interaction in two tests of oral proficiency – the individual format and the paired format – and found more complex interaction between participants in the paired configurations. More specifically, she observed more prompting, elaboration, finishing sentences, referring to a partner's ideas, and paraphrasing in the paired format. The available literature has confirmed, therefore, Skehan's (2001) contention that paired test tasks "enable a wider range of language functions and roles to be engineered to provide a better basis for oral language sampling with less asymmetry between participants" (p. 169). This, in turn, allows for better inferences to be made about a candidate's proficiency in wider real-life contexts, and provides evidence for the validity of paired tasks. A fundamental concern inherent in paired tasks relates to the issue of interpreting individual scores based on jointly constructed interaction distributed among participants (see McNamara, 1997; Swain, 2001).



The fundamental dependence of the two test takers in terms of the interaction they produce has led to some researchers arguing for the awarding of shared scores for interactional competence in paired tasks. For example, May (2009) focused on pairs who oriented to an asymmetric pattern of interaction and investigated the reactions of trained raters to their assessment. One of the main findings to emerge was that the raters in the study questioned the separability of the candidates' contributions, since they perceived key features of the paired interaction as mutual and shared achievements. The author concluded with the thought-provoking suggestion that test takers in a paired test should be given a shared score for interactional competence. This assertion echoes Taylor and Wigglesworth's (2009) contention that we may have to design and use different assessment scales and criteria, some aimed at the assessment of individual performance, and some aimed at joint performance. This is a question which future research endeavours and academic discussions will need to address and shed light on.

The raters' perspective
There has been surprisingly little research carried out on how raters form judgements about paired interaction. The exception is an excellent recent study by Ducasse and Brown (2009), which has advanced our understanding of how raters view and approach the assessment of paired interaction through considering the raters' view of what they perceive to be important in the performances they have to evaluate. The authors asked the question, "What do raters actually focus on?" and found that raters oriented to three sets of categories when making rating decisions: (a) non-verbal interpersonal communication, which includes gaze and body language; (b) interactive listening, which encompasses indicating comprehension and supportive listening in the form of giving supportive audible feedback; and (c) interactional management skills, which involve management of topics and turns both 'horizontally' (i.e., between adjacent turns) and 'vertically' (i.e., across topics). These findings clearly show that raters draw on a broad understanding of the underlying construct of interactional competence in paired talk and consider, in the words of Taylor and Wigglesworth (2009), "communication-oriented as well as purely linguistic skills" (p. 331), which are, in addition, often culture-dependent.

Effect of background variables
It is now widely recognised that one of the challenges associated with speaking tests is the interlocutor effect, i.e., the systematic effect on performance of variables pertaining to the examiner or the peer interlocutor, and that the test takers' talk is inevitably influenced by the other participant (Luoma, 2004; O'Sullivan, 2002). Among the variables which have been shown to potentially play a role are the proficiency level of the paired participants, their personality, and acquaintanceship. The effect of personality was the focus of Berry's (1993) work, which examined the interaction between the test taker personality and the discourse produced, and reported that the discourse varied according to personality: the extroverts performed better in a paired task when paired up with another extrovert, as opposed to an introvert. No differences were reported for introverts. In a later study, Berry (2004) investigated the discourse of extrovert and introvert students in a group oral test.
In contrast to the previous study, she found that the degree of extroversion had no effect on the scores of the extroverts, but the effect was present for introverts. The focus on personality was picked up by Nakatsuhara (2009), who concluded that the extraversion level of test takers had some influence on test performance, but it was closely related to task type. The author found that introverts preferred structured, highly-prompted tasks, whereas extroverts preferred a higher degree of freedom, which echoes Berry's (2004) assertion that the effect of personality could be dominant when "either extreme is placed in their least favoured situation" (Berry, 2004: 502). The effect of the peer interlocutor's proficiency level has received empirical attention as well, with studies mostly lending support to the effectiveness of paired format tests even when proficiency levels within pairs differ to some
extent (this, of course, excludes cases of wide divergence in proficiency levels). Iwashita's (1998) study indicated that while the proficiency of the interlocutor had some impact on the amount of talk (being matched with a higher-proficiency partner generally resulted in more talk), there was little difference in test scores based on this variable. Talking more, therefore, did not necessarily lead to higher scores. Nakatsuhara (2006) found more similarities than differences in conversational styles between same-proficiency and different-proficiency pairs. Csépes (2002) and Davis (2009) found no significant differences based on proficiency level. Davis reported that the majority of pairs in his study oriented to a collaborative style of interaction and concluded that, given appropriate constraints, a difference in proficiency level need not preclude the use of the paired format. While the above studies have found minimal effects for proficiency level, Norton (2005) reported that test takers paired with a higher-proficiency partner can gain some advantage from exposure to better quality language. O'Sullivan (2002) focused on the effect of learner familiarity on performance in paired tasks. The author found evidence of an acquaintanceship effect and reported that interviewees achieved higher scores when working with a friend. The findings were more complex, however, and O'Sullivan also found a 'sex-of-interlocutor' effect and further speculated that the effects of interlocutor familiarity and gender may be culturally specific. Taken together, the abovementioned studies on the role of interlocutor variables in paired tests suggest that some background variables can potentially impact on the discourse co-constructed in a speaking test. However, the available studies do not support any simple linear relationship between these variables and test discourse and scores, a point also made in the context of gender-related effects by Brown and McNamara (2004: 533), who argued against "any simple deterministic idea that gender categories will have a direct and predictable impact on test processes and test outcomes". The same argument can be extended beyond gender to the whole range of interlocutor variables. These variables "compete in the context of an individual's social identity" (Brown & McNamara, 2004: 533), and no linear, clear-cut behaviours based on background characteristics can be claimed, as Taylor and Wigglesworth (2009) also argue. The key questions, therefore, shift from the role of background variables (we know they play a role) to what test developers should do about such construct-external factors: should they try to eliminate such variability altogether, or how should they control for it? Swain (cited in Fox, 2004: 240) wisely argues that variability related to the different characteristics of conversational partners is "all that happens in the real world. And so they are things we should be interested in testing". She further contends that eliminating all variability in speaking assessment is "washing out … variability which is what human nature and language is all about" (Swain, cited in Fox, 2004: 240). Coping successfully with such real-life interaction demands, therefore, becomes part of the construct of interactional competence. The empirical studies have highlighted test providers' ethical responsibility to construct tests which are fair and do not provide (intentionally or unintentionally) differential and unequal treatment of candidates based on background variables.
The use of multi-part tests which tap into different types of talk (e.g., candidate-examiner, candidate-candidate, candidate-candidate-examiner, candidate only) is fundamental, as it allows a test to elicit a broader range of language through tasks which involve interviewer-candidate talk, candidate-only talk and candidate-candidate talk. Multi-part tests which include some paired tasks would optimise the benefits of the paired format while controlling for its possible limitations, and such is the practice adopted by Cambridge ESOL.

Test taker perceptions
While relatively little research has been published focusing on test takers' perceptions of paired tests, we can draw on the available research on group oral tests. In terms of paired tests, Egyud and Glover (2001) note that students like working in pairs. In the group-oral test context, Nakatsuhara (2009) found that group oral tests are viewed positively, and test takers see the possibility of positive classroom washback. In a large-scale study, Van Moere (2006) rejected the objection that candidates may not be able to get equal opportunities to participate because of
other dominant group members. Two earlier studies have also indicated that students experience less anxiety with the group oral test (Ockey, 2001) and that students perceive the group discussion task as generating the most natural discourse and creating the least pre-test anxiety (Fulcher, 1996).

Conclusion
The broader implications of the overviewed research are multi-layered, and the available research has provided insights into:
• defining interactional competence more comprehensively and precisely,
• designing tests which optimise the benefits of the paired format while at the same time addressing its limitations, and
• developing assessment scales which allow for the reliable and valid assessment of paired performances (for a fuller discussion, see Taylor and Wigglesworth, 2009).

More specifically, the available research has indicated that paired tasks allow for the assessment of a broader range of interactional skills than the more traditional interview-format tests, which is a strong argument for their validity. The available research has also indicated that the richer and more complex language which is co-constructed in a paired test has implications for the assessment of that performance, since it may be difficult to meaningfully disentangle the two candidates' contributions. Furthermore, raters bring in a whole host of criteria in assessing paired performance in addition to the classic turn-taking concepts which usually inform current assessment scales of interactional competence. Criteria such as non-verbal communication, body language and gaze, as well as supportive listening and the indication of comprehension, are influential in the decision-making process of raters. It would be important, therefore, to design assessment categories and descriptors which address the broader range of behaviours observed by raters. Finally, the available research has also indicated that interlocutor variables could play a potential role, but it is not possible to fully generalise the direction or magnitude of the effects. All of these are insights and lessons which have informed the work on speaking tests at Cambridge ESOL.

Finally, I would like to briefly explore the implications of this paper for the communicative classroom. The close relationship between paired tests and communicative classrooms has been made clear. An obvious conclusion is that learners need to develop appropriate interactional skills for effective collaborative interaction. Classroom practice is needed which gives students experience with task-based learning in groups and pairs, and in building collaborative interactional skills. The available empirical endeavours which have been overviewed in this short paper have provided us with valuable insights about the conceptualisation of interaction in paired tasks. The overviewed research has indicated that collaborative interactional skills entail:
• Topic development skills, which involve the expansion of one's own topics and others' topics through topic recycling and topic development
• Turn-taking skills, which include smooth transition between turns, strong cohesion between turns, and appropriate shifts between topics
• Active listening skills and the ability to provide listener support
• Equality and mutuality in interaction
• Non-verbal support, which includes gaze and body language.

We have come a long way in furthering our understanding of the complexity of speaking assessment and the paired speaking test format. We still have a long way to go. And on this journey, we should never lose sight of the fundamental questions in language assessment, questions about fitness for purpose, fairness, and impact. This paper has hopefully demonstrated the value of direct assessment of speaking and of the paired format and, in Brooks' (2008) words, the fact that the latter brings "challenges, but mostly opportunities".



References
Berry, V. (2004). Personality characteristics as a potential source of language test bias. In Huhta, A., Sajavaara, K. and Takala, S. (Eds.), Language testing: New openings (pp. 115-124). Jyväskylä, Finland: Institute for Educational Research.
Brooks, L. (2008). Paired oral proficiency testing: Challenges but mostly opportunities. Paper presented at AILA, Germany, 26/08/2008.
Brooks, L. (2009). Interacting in pairs in a test of oral proficiency: Co-constructing a better performance. Language Testing, 26(3), 325-340.
Brown, A. and McNamara, T. (2004). 'The devil is in the detail': Researching gender issues in language assessment. TESOL Quarterly, 38(3), 524-538.
Canale, M. and Swain, M. (1980). Theoretical bases of communicative approaches to second language teaching and testing. Applied Linguistics, 1, 1-47.
Chalhoub-Deville, M. (2003). Second language interaction: Current perspectives and future trends. Language Testing, 20(4), 369-383.
Chun, C. W. (2006). An analysis of a language test for employment: The authenticity of the PhonePass test. Language Assessment Quarterly, 3(3), 295-306.
Csépes, I. (2002). Is testing speaking in pairs disadvantageous for students? Effects on oral test scores. novelty, 9(1), 22-45.
Davis, L. (2009). The influence of interlocutor proficiency in a paired oral assessment. Language Testing, 26(3), 341-366.
Deville, C. and Chalhoub-Deville, M. (2006). Old and new thoughts on test score variability: Implications for reliability and validity. In Chalhoub-Deville, M., Chapelle, C. A. and Duff, P. (Eds.), Inference and generalizability in applied linguistics: Multiple perspectives (pp. 9-25). Amsterdam: John Benjamins.
Ducasse, A. M. and Brown, A. (2009). Assessing paired orals: Raters' orientation to interaction. Language Testing, 26(3), 423-444.
Egyud, G. and Glover, P. (2001). Oral testing in pairs: A secondary school perspective. ELT Journal, 55(1), 70-76.
Ffrench, A. (2003). The change process at the paper level. Paper 5, Speaking. In Weir, C. and Milanovic, M. (Eds.), Continuity and innovation: Revising the Cambridge Proficiency in English examination 1913-2002 (pp. 367-446). Cambridge: Cambridge University Press.
Foot, M. (1999). Relaxing in pairs. ELT Journal, 53(1), 36-41.
Fox, J. (2004). Biasing for the best in language testing and learning: An interview with Merrill Swain. Language Assessment Quarterly, 1(4), 235-251.
Fulcher, G. (1996). Testing tasks: Issues in task design and the group oral. Language Testing, 13, 23-51.
Galaczi, E. (2008). Peer-peer interaction in a speaking test: The case of the First Certificate in English examination. Language Assessment Quarterly, 5(2), 89-119.
Iwashita, N. (1998). The validity of the paired interview in oral performance assessment. Melbourne Papers in Language Testing, 5(2), 51-65.
Lazaraton, A. (1992). The structural organization of a language interview: A conversation analytic perspective. System, 20(3), 373-386.
Luoma, S. (2004). Assessing speaking. Cambridge: Cambridge University Press.
May, L. (2009). Co-constructed interaction in a paired speaking test: The rater's perspective. Language Testing, 26(3), 397-422.
McNamara, T. (1997). 'Interaction' in second language performance assessment: Whose performance? Applied Linguistics, 18(4), 446-466.
Milanovic, M. and Saville, N. (1996). Introduction. In Milanovic, M. and Saville, N. (Eds.), Studies in Language Testing 3: Performance testing, cognition and assessment: Selected papers from the 15th Language Testing Research Colloquium, Cambridge and Arnhem. Cambridge: Cambridge University Press.
Nakatsuhara, F. (2006). The impact of proficiency level on conversational styles in paired speaking tests. Cambridge ESOL Research Notes, 25, 15-20.
Nakatsuhara, F. (2009). Conversational styles in group oral tests: How is the conversation constructed? Unpublished PhD thesis, University of Essex.
Norton, J. (2005). The paired format in the Cambridge Speaking tests. ELT Journal, 59(4), 287-297.
Ockey, G. (2001). Is the oral interview superior to the group oral? Working Papers, International University of Japan, 11, 22-40.
O'Sullivan, B. (2002). Learner acquaintanceship and oral proficiency test pair-task performance. Language Testing, 19(3), 277-295.
Skehan, P. (2001). Tasks and language performance assessment. In Bygate, M., Skehan, P. and Swain, M. (Eds.), Researching pedagogic tasks (pp. 167-185). London: Longman.
Swain, M. (2001). Examining dialogue: Another approach to content specification and to validating inferences drawn from test scores. Language Testing, 18(3), 275-302.
Taylor, L. (2000). Investigating the paired speaking test format. Cambridge ESOL Research Notes, 2, 14-15.
Taylor, L. (2001). The paired speaking test format: Recent studies. Cambridge ESOL Research Notes, 6, 15-17.
Taylor, L. (2003). The Cambridge approach to speaking assessment. Cambridge ESOL Research Notes, 13, 2-4.
Taylor, L. and Wigglesworth, G. (2009). Are two heads better than one? Pair work in L2 assessment contexts. Language Testing, 26(3), 325-339.
Van Lier, L. (1989). Reeling, writhing, drawling, stretching, and fainting in coils: Oral proficiency interviews as conversations. TESOL Quarterly, 23, 480-508.
Van Moere, A. (2006). Validity evidence in a university group oral test. Language Testing, 23(4), 411-440.
Weir, C. (2005). Language testing and validation. New York: Palgrave Macmillan.
Young, R. and He, A. (Eds.). (1998). Talking and testing: Discourse approaches to the assessment of oral proficiency. Amsterdam: John Benjamins.



Let's assess life skills: here are the criteria!
Meral Guceri (†), Sabancı University, School of Languages, Turkey
Original publication date: IATEFL TEASIG Conference Selections – Cyprus 2009

Oral assessment, which involves not only various speaking tasks but also assessment criteria, is an indispensable component of evaluation. However, there is an ongoing debate as to whether holistic or analytical criteria should be preferred when evaluating learner performance. This paper discusses whether both holistic and analytical criteria could be used to assess learner performance, by involving learners themselves in the evaluation scheme. One of the aims of the Freshman English Program (ENG 101 and 102, 3-credit university courses at Sabancı University in Istanbul) is to develop the oral presentation skills of learners. Making an oral presentation (OP for short) requires a thorough understanding of the task, in-depth research, careful preparation, content relevance (that is, focusing on an issue and designing the content accordingly), attention to the flow of ideas, and the provision of relevant examples. Furthermore, attention to the needs of the audience and the design of relevant slides are to be considered as well. Learners vary in their speaking ability, and most people get nervous when speaking in public. Going through the dos and don'ts of OPs and viewing several OP videos in the classroom usually raise awareness. Moreover, if the assessment criteria are focused on by giving learners a chance to evaluate a couple of recorded OPs, learners appreciate it a lot, as it helps them identify skills to integrate into their own presentations. The next step is how OPs are designed and delivered in the classrooms. First, learners are grouped (four students per group), and then they are assigned OP tasks on various topics related to the course theme. In addition, they are given sources to make use of, but are also free to add other sources of their choice. Learners prepare in groups, providing support to each other by pre-viewing each other's slides and making recommendations for improvement. Groups make an appointment and share their outlines with the instructor before they deliver their OPs. Either analytical or holistic criteria are used by Freshman English instructors at different times. However, this paper argues that it is possible to involve learners in OP assessment. That is, the students who are not doing an oral presentation on a particular OP day are split into groups to evaluate the performance of their classmates. Each group is assigned a chunk of the analytical criteria below (content, organization, delivery, persuasion), while the instructor evaluates presenters using the attached holistic criteria. Presenters should identify a social problem and propose solutions. New information should be integrated, a handout should be designed, and a soft copy of it sent to the audience as soon as the presentation is over. Every member of the audience is placed in one of the following three groups to provide feedback: content, organization and delivery. The groups rotate so that audience members have the chance to participate in each group. At the end of each presentation, audience members give feedback on the specific criteria displayed in Figures 1, 2 and 3 below. The feedback is then sent to the presenter(s). Content feedback should be based not only on the clarity of information, but should also evaluate the use of a variety of sources. Therefore, student assessors are provided with the five-point content criteria in Figure 1 to fill in. Moreover, in order not to waste paper and to send instant feedback to the presenters as soon as an OP is over, they fill in soft copies of the forms and email them to the presenters. It is worth mentioning that learner evaluations are highly
objective, and there is a correlation between the assessment that the instructor carries out using the holistic criteria and the assessment that student assessors carry out using the analytical criteria.

Oral Presentation Peer Feedback
Group 1 — Content
You will be giving your classmate feedback on the content of his/her presentation. Pay special attention to what the person says, the arguments given, and the solutions offered.
Rating (1 = needs improvement, 5 = fantastic)

Criteria
• Is it clear that the speaker researched the topic and made use of source information? (1 2 3 4 5)
• Does the speaker make it clear why the social problem is important to know about? (1 2 3 4 5)
• Does the speaker offer useful and interesting solutions to the problem? (1 2 3 4 5)
• Are the solutions convincing? (1 2 3 4 5)
• Does the audience learn something from the presentation? (1 2 3 4 5)
• Did the speaker include a handout that had the main points of the presentation and references? (1 2 3 4 5)
• Did the speaker use reliable resources? (1 2 3 4 5)
• Did the speaker seem to have used large sections of words that were directly from the source material? (1 2 3 4 5)

Comments
The handout was (circle all that apply): helpful / informative / visually interesting / creative / vague / confusing / visually boring / complete / distracting
My favorite thing about this presentation was ...
The areas to be concerned with are ...

Figure 1: Content feedback.


Organization feedback deals with the introduction, development and conclusion of the ideas presented. The amount of information should be well judged, given the limited human attention span; for this reason, presenters are advised to avoid details, as these will not be remembered anyway. A presenter should start with a clear-cut introduction which attracts the attention of the audience. To what extent the problem is identified and highlighted, and whether solutions are offered, also need to be considered. Providing an eye-catching conclusion is as important as an attractive introduction, and a good concluding slide with the main ideas that are to be remembered will have a memorable impact. (See Figure 2 for the organization feedback form.)

Oral Presentation Peer Feedback
Group 2 — Organization
You will be giving your classmate feedback on his or her organization. Pay special attention to whether the speaker maintained interest while having clear organization of his or her argument.
Rating (1 = very poor, 5 = very good)

Criteria
• Did the speaker begin with an introduction that caught your attention? (1 2 3 4 5)
• Did the speaker use phrases that focused your attention on what was coming next? (1 2 3 4 5)
• Did the speaker clearly explain the problem before offering solutions? (1 2 3 4 5)
• Did the speaker conclude the presentation in an effective way? (1 2 3 4 5)
• Did the speaker memorize his or her presentation? (1 2 3 4 5)

Comments
The overall organization was (circle all that apply): clear / easy to follow / confusing / so-so / problematic
Other: ……………………………………………………………………………………….
My favorite thing about this presentation was ...
The areas to be concerned with are ...

Figure 2: Organization feedback.


Delivery of the speech is as important as its content and organization. Awareness of the needs of the audience and adjusting the content to those needs is an invaluable skill. Starting with awareness of the material, putting what is to be said in a logical sequence, and ensuring the speech will captivate the audience all require effective delivery skills. When presenting in front of an audience, there is a desired image which requires the presenter to be enthusiastic, confident, pleasant and calm, with a relaxed appearance. Speaking slowly, accurately and clearly, and showing emotion and feeling appropriate to the topic, are the essence of public speaking. Furthermore, a presenter should establish rapport with the audience and adjust the tone, level and pitch of their voice to ensure it is loud enough to project to the back of the room. Keeping eye contact with the audience enables a presenter to gauge audience reaction and adjust accordingly. Maintaining eye contact may be hard for shy people; one suggested way to begin is to look slightly above the level of the audience's eyes, but nothing is better than genuine eye contact. (See Figure 3 for the analytical criteria provided for student assessors who give delivery feedback.)

Oral Presentation Peer Feedback
Group 3 — Delivery
You will be giving your classmate feedback on his or her delivery. Pay special attention to how the presenter uses physical gestures and eye contact, as well as how well you can understand what she or he is trying to say.
Rating (1 = very poor, 5 = very good)

Criteria
• Did the speaker use adequate eye contact? (1 2 3 4 5)
• Were the speaker's posture and physical gestures distracting? (1 2 3 4 5)
• Did the speaker refer to the handout in an effective way? (1 2 3 4 5)
• Could you understand what the speaker was saying? (1 2 3 4 5)
• Was the speaker's tone of voice, intonation and volume good? (1 2 3 4 5)
• Did the speaker stay within the given time limit? (1 2 3 4 5)

Comments
The overall delivery gave me the feeling that the speaker was (circle all that apply): nervous / calm / prepared / not taking things seriously / shy / confident / unprepared / managing nerves well
Other: ……………………………………………………………………………………….
My favorite thing about this presentation was ...
The areas to be concerned with are ...

Figure 3: Delivery feedback.
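The three feedback forms above are simple 1-5 scales, so collating them is straightforward. The short sketch below is purely illustrative and is not part of the procedure described in the article: the criterion labels and sample ratings are invented, and it simply shows one way the soft-copy peer ratings might be averaged per criterion before being emailed to a presenter.

```python
# Illustrative only: averaging 1-5 peer ratings per criterion for one presenter.
# The criterion labels and sample scores are invented for this sketch.
from statistics import mean

peer_ratings = {
    "Content: use of source information": [4, 5, 3],             # Group 1 assessors
    "Organization: attention-catching introduction": [5, 4, 4],  # Group 2 assessors
    "Delivery: adequate eye contact": [3, 4, 4],                  # Group 3 assessors
}

for criterion, scores in peer_ratings.items():
    print(f"{criterion}: {mean(scores):.1f} / 5")
```

Averages of this kind could then sit alongside the instructor's holistic grade rather than replace it, in line with the article's argument that the two types of criteria complement each other.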



While the presenters give their presentations using slides with bullet points, visuals or media extracts, the audience (the student assessors) provides peer feedback using the above-mentioned analytical criteria, and the class teacher gives feedback using the holistic criteria, which have recently been revised by the Sabancı University Freshman English team. The holistic criteria have both numerical and letter grades, with descriptors which highlight content, organization, delivery, quality of slides, language, preparation and research (see Appendix 1). Practice and rehearsal of the speech prior to the presentation can be done comfortably and at ease, in front of a mirror, family or friends. Using a tape recorder and listening to oneself is recommended, and video-taping a presentation and analyzing it is a useful tool for self-development: this practice raises awareness of strong points and of weak points to improve. This paper has highlighted the crucial role of analytical and holistic criteria in providing feedback and evaluating performance in oral assessment. Analytical criteria are used not only to provide feedback to peers, but also to raise the feedback-providers' awareness of the areas that need to be taken into consideration when preparing and giving an OP. Both types of criteria (see Appendix 1) are used simultaneously, and the presenter's self-evaluation should also be added to this process.

(† Deceased May 2019) Editor's note: This article has been reproduced by permission of Irem Guceri.

Appendix 1
Holistic criteria (band descriptors; only the summary statement of each band survives in the archived copy):
• … presentation exhibits a very high level of preparation and research, understanding of content, organisation, audience awareness, and presentation skills.
• … presentation shows a good level of preparation and research, understanding of content, organisation, audience awareness, and presentation skills.
• … presentation shows a fair level of preparation and research, understanding of content, organisation, audience awareness, and presentation skills.
• … preparation and research, understanding of content, organisation, audience awareness, and presentation skills.
• … preparation and research, understanding of content, organisation, audience awareness, and presentation skills.


2010



Developing a placement test: a case study of the British Council ILA project
Barry O'Sullivan, Roehampton University, London, UK (now British Council, UK)
Original publication date: IATEFL TEASIG Newsletter March 2010

Introduction
In this paper I will outline the process by which a major placement test was developed. The client for this project was the British Council, and the test was developed jointly by the Centre for Language Assessment Research (CLARe) at Roehampton University and the Centre for Research in English Language Learning and Assessment (CRELLA) at the University of Bedfordshire. The paper will begin with the requirements of the tender and the potential problems associated with these, then move on to briefly outline how the development team went about the whole process.

The tender
When the tender for this project was made public, we decided after some serious debate to consider submitting a proposal to the British Council. The debate occurred because we had some real worries over the requirements outlined in the tender document, which can be summarised as:
• A written level test(s) which is appropriate for the network of teaching centres and which places students with a suitable degree of accuracy.
• The test should be appropriate for all levels from pre-A1 to C2.
• The test(s) should be a positive experience for the customer and offer an appropriate level of challenge.
• The test should take a maximum of 40 minutes.
• Test(s) will be rolled out to the network of teaching centres in September 2008 (the start date was to be August 2007).

Of all these, the demand that the test should test across all six CEFR levels was the biggest hurdle, with the 40-minute time limit running a close second.

The problem with having to test across six levels (seven, if we count the additional request that a pre-A1 level should be included) is that all tests suffer from a bottoming-off and a topping-off effect. This means that instead of getting a nice clean straight-line graph of score versus ability, with score rising in step with ability, we get an 's'-shaped graph. In Figure 1, we can see the problem as soon as we try to place the six boundaries for cut-scores. The test might work very well at levels A2, B1 and B2, less well at levels A1 and C1, and not at all at levels A0 and C2. I should stress here that level A0 does not actually exist on the CEFR scale; it is just the term we used to signify the pre-A1 level.
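Figure 1 itself is not reproduced in this reprint, but the 's' shape the author describes can be sketched with a standard item response theory formulation (given here as general background, not as a formula used in the ILA project). The expected total score $T$ on a test of $I$ dichotomously scored items can be written as a sum of logistic item curves:

$$
T(\theta) \;=\; \sum_{i=1}^{I} \frac{1}{1 + e^{-a_i(\theta - b_i)}}
$$

where $\theta$ is the candidate's ability, $b_i$ is the difficulty of item $i$, and $a_i$ its discrimination. When $\theta$ lies well below the easiest items or well above the hardest ones, $T(\theta)$ flattens towards $0$ or $I$, so cut-scores placed near the extremes (pre-A1 or C2) separate candidates far less reliably than those placed in the middle of the range (A2 to B2), which is exactly the bottoming-off and topping-off effect described above.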



The time issue was problematic, as it is well known that, for a placement test to be successful, we need the maximum possible number of items. This ensures that there are sufficient data points to allow us to make clear and meaningful distinctions between levels. If we had been able to take the decision to create a grammar-based 80-item MCQ test or set of tests, things would have been easier. However, another of the limitations of the tender was that this could not be done, as it would negatively impact on the customer experience.

Figure 1: Problem with using a single test to cover all levels.

The planning stage
With all this in mind, we set up a project planning team consisting of members from both CLARe and CRELLA. After some discussion, we decided:
• that a single test would not work, and
• to include grammar, vocabulary and reading sections.
The design we came up with was for three linked tests, each covering three levels, and between them covering all seven levels.
We also decided that prospective test takers should first take a filter test or task to decide which of the three papers they might sit. The exact nature of this task was not clear at this stage, but it was decided that some measure of flexibility would be needed to satisfy the needs of the wide range of British Council international Teaching Centres.

Figure 2: The planned system.

At this point, we also decided to give the test a name. It was to be known as the British Council International Language Assessment, or ILA. We also created a project management group (PMG), consisting of individuals from the development teams at CLARe and CRELLA as well as representatives from the British Council. This group was to meet regularly during the project, and each sub-group (CLARe for grammar and vocabulary, and CRELLA for reading) was to formally report to the PMG on a monthly basis.

The design stage
Before thinking about the contents of the test itself, we first needed to gather information about the prospective candidates and about the teaching centres. To do this we asked the British Council to help. Following a series of conversations with staff at the Head Office in London and with individual senior teachers internationally, we identified the following key features:
• range of levels, ages and purposes for studying English
• 58 teaching centres, which means the broadest range of language and cultural backgrounds.
These suggested that we should:
• use a number of different task/item types
• test a range of knowledge and skills
• ensure that topics and language are appropriate for the candidature
• model progression in all skills/knowledge sections.

The next stage was to look to the testing context for clues as to how the test might be structured. We recognised the fact that the project teams in the UK provided the assessment expertise, while local knowledge and expertise was provided by senior teachers. With this in mind, we decided to base our later decisions on the feedback we received from these local experts through a series of questionnaires (all delivered via the web). Thanks to these individuals, we were able to identify appropriate task and item types while at the same time clarifying what should be tested. Limitations of space mean that we cannot go into the details of all stages of the process here, so I will briefly outline what we did for the grammar paper. A review of the assessment literature suggests that grammar is the best predictor of both reading and listening ability (Shiotsu, 2003; Joyce, 2008). However, there is no description of grammatical progression in the CEFR. We therefore looked to any available published work on grammatical progression (looking to examination boards and publishers). We found that the lists provided by City & Guilds at all six CEFR levels were particularly useful, and obtained permission from them to take their lists as a starting point in our process. Following a series of discussions with UK-based teachers and grammarians, we finally decided on a list of almost 120 forms divided into 16 sections. The sections were:
• simple sentences
• articles
• main verb forms
• determiners
• other verb forms
• adjectives
• modals
• adverbs
• nouns
• intensifiers
• pronouns
• punctuation
• possessives
• spelling
• prepositions and prepositional phrases
• discourse

Armed with this list, we then developed a detailed questionnaire for British Council teachers. When the feedback from this questionnaire was analysed (we used multi-faceted Rasch analysis to place all of the forms on a single scale), we found that there was a high level of agreement among the teachers as to the likely progression of learners through the forms. We therefore decided to base our specification for the papers on this high-quality feedback.
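For readers unfamiliar with the technique, one common formulation of the many-facet Rasch model (given here as general background; the article does not specify the exact model used) expresses the log-odds of a response falling in rating category $k$ rather than $k-1$ as

$$
\log\!\left(\frac{P_{nik}}{P_{ni(k-1)}}\right) \;=\; B_n - D_i - F_k
$$

where $B_n$ is the position (e.g. severity or leniency) of teacher $n$, $D_i$ is the estimated difficulty of grammatical form $i$, and $F_k$ is the threshold for category $k$. Fitting a model of this kind is what allows all the forms to be calibrated on a single scale, from which an agreed order of progression can be read off.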



The development stage
Preliminary working specifications were first drawn up; there were to be a total of nine versions of these specifications before the end of the project. Each of the three sections (grammar, vocabulary and reading) was looked at separately, as all three needed some major decisions. Briefly, these were:
Grammar – Format: we opted for 3-option MCQs, based on feedback from British Council teachers.
Vocabulary – Progression based on word frequency: the format at the lower levels was to be simple word recognition and definition, while at the higher levels it was to be a matching task with two elements, one looking at synonyms and the other at collocation. This proved to be a very tricky item to write, though very successful in discriminating weaker from stronger students.
Reading – No modern reliable model of reading progression existed, so the CRELLA team worked to update the model suggested by Urquhart & Weir (1998). Item types were to start at word and phrase level, and finish (at C2) with a complex two-text integration task, which proved difficult even for native speakers of English.
Teams of item writers were commissioned to write items, which were trialled by the team on well over 1,000 learners at the different levels. All responses were analysed, and some minor changes to the design proved necessary: for example, with the grammar, it was clear that the discourse-related forms were proving to be quite difficult, and so minor changes were made to the specifications in which the expected level of the forms was defined. Following a period of trialling and analysis, the final test versions were prepared for administration. At the same time, we were devising a series of pre-test tasks that could be used as filters. After a period of consultation with teaching centres, we finally decided to create a total of five tasks. Figure 3 shows how these would help to decide which paper each learner should sit. The idea was that senior staff working at a teaching centre could decide which task was most suitable for that centre. Centres might use one or more tasks, or supplement a task with a task of their own, provided it could be demonstrated to produce consistent results.

Figure 3: The filter test system.
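To make the routing idea in Figure 3 concrete, here is a minimal sketch of how a filter-task score might direct a learner to one of the three linked papers. The function name, the maximum score and the cut-off proportions are all invented for illustration; they are not the filter tasks or thresholds actually used in the ILA.

```python
# Illustrative only: routing a learner to one of the three linked ILA papers.
# The maximum score and the cut-off proportions below are invented for this
# sketch; the real filter tasks and boundaries were set within the project.

def choose_paper(filter_score: int, max_score: int = 30) -> str:
    """Return the paper a learner should sit, based on a filter-task score."""
    proportion = filter_score / max_score
    if proportion < 0.35:
        return "Paper 1 (lowest three levels)"
    if proportion < 0.70:
        return "Paper 2 (middle levels)"
    return "Paper 3 (highest levels)"

print(choose_paper(8))    # Paper 1 (lowest three levels)
print(choose_paper(25))   # Paper 3 (highest levels)
```

In practice, as the article notes, a centre could use one or more of the five filter tasks, or a task of its own, as long as the routing it produced was demonstrably consistent.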



The administration stage
Three complete versions of the ILA were handed over to the British Council on schedule in August 2009. The following six months were then used to roll out the test to the network of teaching centres. Roll-out events were held in London, Kuala Lumpur and Cairo, at which senior teachers from across the network were introduced to the test and to the rationale that lay behind it. Even at this late stage, we were very keen to hear their feelings about and comments on the test. Some minor changes were made to the specifications and to the filter tasks at this point in response to comments made by these groups of teachers. In fact, one of the filter tasks was developed and introduced based on feedback from one of the three groups. Another change was made to the answer sheets, so as to make them easier to score. Teaching centres were encouraged to forward all completed answer sheets to CLARe so that the tests could be carefully analysed and a clear picture built up of how well (or not) the test was working. At the time of writing, this roll-out period is drawing to a close and the final few centres are now adapting the ILA.

The monitoring stage
As can be seen from the above section, the monitoring phase has already begun. It is not at all unusual for a test developer to start the monitoring process from the earliest stages of the development cycle, so including this monitoring section here is a bit of a fudge, since monitoring in fact takes place throughout the cycle. Monitoring started with the review of all planning decisions by the PMG and continued through to the item-writing, trialling and analysis phases. The more formal monitoring phase began with the administration of the ILA and will continue throughout 2009 and into early 2010. Information about the administration of the test and about the performance of the test items and test sections is being routinely gathered so that a comprehensive report can be drawn up for the British Council in the spring of 2010. So far, things are going well, with over 4,000 answer sheets now gathered. This number is expected to increase dramatically before the final deadline in early 2010, so that the final analyses can be based on a significant dataset. In the autumn of 2009, we plan to develop a web-delivered questionnaire for administration across the teaching network in order to gather feedback from as broad a range of staff as possible and, if feasible, from students as well. This will allow us to learn more about the predictive qualities of the test and to explore the impact the test is having on the teaching centres and on the learners who take it.

Conclusion
Like just about any test development project, the ILA project has been a real learning experience for all concerned. The British Council, in their original tender, made a series of demands that, at first sight, seemed too challenging to be met. The proposal we came up with challenged the people in the British Council to rethink the nature of expertise and essentially to look to a partnership to develop and deliver the test. This is, in point of fact, a very important issue. There is, in my opinion, a mistaken understanding of the notion of test expertise. It was very clear to us on this project that the type of expertise brought to the table by people like myself and Cyril Weir (the Director of CRELLA) differed markedly from that brought by the British Council team, and even more dramatically from that brought by the senior teachers who participated in the development project. While we 'international experts' were able to bring our knowledge of test development theory and practice to bear, we would have been very hard pushed to create a meaningful test without the level of context and domain knowledge provided by our British Council partners.



During the project we also learnt the value of (amongst other things):
Project management – A project of this size needs a competent and realistic project manager to ensure that the project plan is adhered to. (We were very lucky to have had the services of such a person in Michelle Flinn of Roehampton University.)
Problem-solving – All problems can be solved as long as the personnel involved are willing to listen and to accept constructive criticism of their ideas (and that criticism is not taken personally).
Taking time – This was a demanding project. We were under constant time pressure from day one to completion, and the whole team were expected to work long and unsocial hours, particularly at key times.
Teamwork – We were fortunate to have a pair of established teams with a long history of working on projects together, both within and between the teams. Without the teamwork and camaraderie, a project of this size and with such tight deadlines could not have been delivered. We also built up a strong working relationship with our British Council colleagues, which was vital.
Rigour – The British Council wanted a world-leading placement test for its teaching network. In order to deliver this, every element of the test was based on rigorous background research and feedback. This ranged from item and task formats to delivery and scoring systems. We believe that this test has met the expectations of the British Council, but we recognise the degree of hard work that was required for this to be the case.

References
Joyce, P. (2008). Componentiality in tests of listening comprehension. Unpublished PhD thesis submitted to Roehampton University, London, UK.
Shiotsu, T. (2003). Vocabulary breadth, syntactic knowledge, and word recognition efficiency in EFL reading. Unpublished PhD thesis submitted to The University of Reading, UK.
Urquhart, S. and Weir, C. (1998). Reading in a second language: Process, product and practice. London: Longman.



Alternative assessment: the use of portfolios around the globe
Dina Tsagari, Oslo Metropolitan University, Oslo
Original publication date: IATEFL TEASIG Newsletter March 2010

The paper discusses how current theories of language learning and teaching have focused attention on 'alternatives in assessment', the movement that has recently made its appearance in the field of language testing and assessment, and defines their basic characteristics and types. In the sections that follow, the paper presents one of these methods in detail, namely portfolios, and provides a detailed description of how these are used in three different contexts: the USA, Australia and Europe. It concludes with a discussion of a number of issues in the hope that they will serve as a springboard for further research and experimentation in the field.

Introduction
Language testing, generally associated with formal assessment procedures such as tests and examinations, is a vital component of instructional language programmes throughout the world. However, educators and critics from various backgrounds have expressed their dissatisfaction with the use of tests and exams as the primary measure of student achievement. More specifically, they argue that:
a) Information about the products and, more importantly, about the process of learning and the ongoing measurement of student growth needed for formative evaluation and for planning instructional strategies cannot be gathered by conventional testing methods (Wiggins, 1994; Wolf et al., 1991).
b) Tests and exams, especially high-stakes ones, are said to have negative 'washback effects' (Alderson & Wall, 1993) experienced on a number of levels, e.g.:
   I. Curricular level. Tests of this kind are said to be responsible for narrowing the school curriculum by directing teachers to focus only on those subjects and skills that are tested (Madaus, 1988; Shepard, 1991).
   II. Educational level. Critics also point out that high-stakes examinations affect the methodology teachers use in the classroom (Shepard, 1990; Wall & Alderson, 1993); the range, scope and types of instructional materials teachers use (Hamp-Lyons, 1998; Tsagari, 2007); and students' learning and studying practices (Black & Wiliam, 1998).
   III. Psychological level. Tests are also said to have undesirable effects on students' psychology (Paris et al., 1991; Tsagari, 2007) and on teachers' psychology (Gipps, 1994; Johnstone et al., 1995).
c) It is also believed that teacher-made tests, if used as the sole indicators of the ability and/or growth of students in the classroom, are likely to generate faulty results which cannot monitor student progress in the school curriculum (Barootchi & Keshavarz, 2002; O'Malley & Valdez Pierce, 1992), tend to overemphasise the grading function more than the learning function of the language learning process, and create competition between pupils rather than personal improvement, leading to demotivation and making students lose confidence in their own capacity to learn (Black, 1993; Crooks, 1988).

Other than the above, interest groups representing both linguistically and culturally diverse students as well as students with special education needs have called for a change to approaches to assessment that are more
sensitive and free of the normative, linguistic and cultural biases found in traditional testing, in order to ensure equity in educational opportunities and achieve educational excellence for all students (Hamayan, 1995; Huerta-Macias, 1995; Soodak, 2000).

What is 'alternative assessment'?
As a consequence of all the above criticism, a shift in practice made its appearance that has come to be known as the 'alternative assessment' movement (Clapham, 2000; Gipps & Stobart, 2003; Shohamy, 1998; Smith, 1999). However, there is no single definition of 'alternative assessment' in the relevant literature. For some educators, alternative assessment is a term adopted to contrast with standardised assessment, e.g. professionally-prepared objective tests consisting mostly of multiple-choice items, especially in the US tradition (Huerta-Macias, 1995). Hamayan sees that alternative assessment "refers to procedures and techniques which can be used within the context of instruction and can be easily incorporated into the daily activities of the school or classroom" (1995: 213). In a more recent publication, Alderson and Banerjee (2001: 228) provide the following definition: 'Alternative assessment' is usually taken to mean assessment procedures which are less formal than traditional testing, which are gathered over a period of time rather than being taken at one point in time, which are usually formative rather than summative in function, are often low-stakes in terms of consequences, and are claimed to have beneficial washback effects.

Types of alternative assessment
The following are some of the most commonly used types of alternative assessment: conferences, debates, diaries/journals, dramatizations, games, observations, portfolios, projects, self-/peer-assessment, think-alouds, demonstrations, etc. (based on Brown, 1998; Cohen, 1994; Genesee & Upshur, 1996; Ioannou-Georgiou & Pavlou, 2003; O'Malley & Valdez Pierce, 1996). In the following sections I will discuss how portfolios, as one method of alternative assessment, are used in the classroom and beyond.

The nature of portfolio assessment
Emanating from the world of business, 'portfolios' or 'process-folios' (Gardner, 1993) have been used in many fields for quite some time (Jongsma, 1989). Architects, photographers, painters, graphic designers, artists, journalists, etc. have used portfolios to record and demonstrate their craft and display their best work for employment purposes. Portfolios have been transferred to education quite recently, that is, in the 1990s, and have been used in a variety of educational fields, e.g. in chemistry (Shay, 1997), science (Williams, 2000) and general education (Tillema, 1997). In foreign and second language teacher education, professional portfolios have been used as a documentation of staff development, e.g. in training principals and managers (Tillema, 1998), as well as in teacher education (Bailey et al., 1998). As a tool for assessing students' language performance, portfolios have been hailed as a major innovation in language teaching and learning. As such, portfolios have been used with various age groups, e.g. from school to university (Baack, 1997; Smith, 1996), and for various purposes, e.g. assessing general language skills (Alderson, 2000; Brown, 1998; Weigle, 2002) as well as assessing English for specific purposes (Douglas, 2000).

What is an assessment portfolio?
Ever since the appearance of portfolios on the educational assessment scene, various definitions have been proposed in the international literature which focus on different aspects.
For McLean (1990), for instance, a portfolio is a systematic and cumulative folder of learned material. Caudery (1998) comments that in portfolio assessment, a student's achievement is assessed not on the basis of a single performance, as in a conventional
examination, but through a number of separate pieces of work produced over an extended period of time. Finally, Tsagari (2000) defines portfolios used for classroom assessment purposes as a collection of a student's work, usually constructed by selection from a larger corpus and often presented with a reflective piece written by the student to justify the selection. She also stresses that the involvement of the student in reviewing, selecting and reflecting on their work is central to the compilation of their portfolios.

What are the basic characteristics of portfolios?
Portfolios are considered to serve the following two functions in the total learning process:
• A pedagogic function – Portfolios are seen as a tool for self-organised language learning: learners learn to collect authentic data of their own work, record it in suitable ways, and reflect on their language learning experiences, thus enhancing learning and assisting learners to develop their learning skills.
• A reporting function – Portfolios are seen as a tool for documenting and reporting language learning outcomes to a variety of relevant stakeholders (teachers, institutions, parents, administrators, etc.) and for a variety of purposes, such as giving marks in schools or institutions, applying to a higher education institution, or documenting language skills when applying for a job, etc.

Strengths and weaknesses of portfolio assessment
The advantages of using portfolio assessment in classroom-based assessment have been widely discussed in the EFL/ESL literature. For example, Hamp-Lyons (1996) and Caudery (1998) note that portfolio assessment can:
• remove the time constraint of formal testing;
• allow learners to display their overall performance rather than their performance at a particular time on a particular day;
• offer the learner richer feedback and opportunities to respond to such feedback;
• have good face validity, as the assessment conducted has a clear relationship to what has been taught;
• have a positive washback effect by increasing student involvement, awareness and motivation throughout the course;
• allow student involvement in assessing their own progress, especially through discussion of their work with the teacher during the course;
• offer an improved basis for assessment – students are assessed on a sample of their work which is both larger and usually more representative than that obtained in a single examination; and
• remove the stress associated with formal examinations, which can of course affect the work produced.

Portfolios as an assessment tool have been implemented in various contexts. Case studies of portfolio assessment in EFL/ESL classes have been reported in Greece (see Tsagari, 2001 & 2005), Iran (Barootchi & Keshavarz, 2002), Israel (Smith, 1996 & 2002) and Lebanon (Shaaban, 2000), among others. However, attempts have also been made to standardise portfolios at a national and transnational level. The following sections briefly report on portfolio assessment in the USA, Australia and Europe.

Portfolio assessment in North America
Portfolio assessment has been a major focus of attention in the American literature and was originally used as a system of informal classroom-based assessment (Smolen et al., 1995). However, over the years, portfolios evolved
from being a system of informal, classroom-based assessment into being a formal means of evaluation, supplementing or even replacing traditional standardised tests in many states (Hamp-Lyons & Condon, 2000). Various attempts have been made to use portfolios as an assessment tool at district level (Callahan, 1999; Gomez, 2000; Wolfe et al., 1999). What characterised most of these attempts was a degree of variability and a lack of standards across schools and districts. This comparability issue has brought with it the need for a more careful evaluation of portfolio assessment, in particular closer scrutiny to see whether portfolio assessment has sufficient rigour to contribute to summative assessment and/or public certification. Hamp-Lyons (1996: 156-158) mentioned problems concerning the variability, transparency, costs, validity and reliability of portfolio assessment, based on her experience with writing portfolios in various American states. Furthermore, Hamayan (1995) pointed out that alternative assessment procedures in the US context, such as portfolios, have yet to "come of age", not only in terms of demonstrating beyond doubt their usefulness in measurement terms, but also in terms of being implemented in mainstream assessment, rather than in informal classroom-based assessment. She argued that consistency in the application of such alternative assessment methods is still a problem, that mechanisms for their thorough self-criticism and evaluation are lacking, that some degree of standardisation will be needed, and that their financial and logistic viability remains to be demonstrated if they are to be used for large-scale assessment.

Portfolio assessment in Australia
In Australia, portfolio assessment, especially for primary-aged ESL students, took a much tighter approach (Mincham, 1995). This resulted in statements giving clear guidelines for the numbers and types of activity-based samples of writing and speaking that should be collected and included in students' portfolios each year (see Tsagari, 2004 for samples). The samples all exemplified certain genres (Martin, 1984), which were described in great detail so that all teachers not only have a description of each genre, but the same description as well. Reading and listening were assessed through teachers' observations and records (the work of the 'Australian School' is summarised in detail by Kay and Dudley-Evans, 1998). It is not only the framework of the portfolio which is more structured in the Australian approach; marking is also more standardised. Accompanying each genre description was a corresponding 'assessment proforma' setting out the language features which are to be assessed and making explicit the criteria to be used. The proformas were accompanied by 'moderated samples' of students' spoken and written language (see Tsagari, 2004 for examples). Mincham (1995: 87-88) provides anecdotal evidence of the advantages of the Australian approach to portfolio assessment, with its emphasis on specified genres and clear criteria of assessment, and emphasises that the Australian system has characteristics which American approaches to portfolio assessment generally lack – the procedures are explicit, criterion-based, standardised, relevant and task-based. However, the Australian approach remains controversial. It is often said to be prescriptive and rigid, and Kay and Dudley-Evans (1998: 311) argue that such an approach "… disempowers rather than empowers…".
They add, however, that "… benefits realised at later and higher levels of writing [through the] transfer of knowledge of genre outweigh the possible fossilization of writing formats, models and conventions and apparent prescriptivism."

Portfolio assessment in Europe
The European Language Portfolio (ELP), the latest in a long series of practical tools produced by the Language Policy Division of the Council of Europe (Council of Europe, 2001), was launched in 2001. Unlike the American or Australian portfolios, the ELP is intended to:
facilitate mobility in Europe by presen ng language qualifica ons in clear and interna onally comparable ways;

encourage the learning of foreign languages;

emphasise the value of mul -lingualism and mul -culturalism, and contribute to mutual understanding in Europe;

promote autonomous learning and the ability to assess oneself.

What is the ELP?
The Council of Europe sees a portfolio as a document in which those who are learning or have learned a European language – whether at school or outside school – can record and reflect on their language learning and cultural experiences (Christ, 1998; Little, 2002).

Three parts of an ELP
Despite variations in its format across European member states, the ELP has been developed for different sectors of education/training, e.g. primary education (10-12 years), secondary education (11-15/16 years), and post-school education/training (15/16 years upward). It is divided into the following three sections:

a) Language Passport: This section provides an overview of the individual’s proficiency in different languages at a given point in time and contains the following ‘hard pages’: a profile of language skills – a self-assessment grid – a brief presentation of lingual and inter-cultural experiences – certificates and diplomas.

b) Language Biography: This section facilitates learners’ involvement in planning, reflecting upon and assessing their learning process and progress. It consists of the following elements: the foreign languages learnt outside the school curriculum – personal strategies for developing the skill ‘I learn how to learn’ – instructions for filling in the self-assessment pages – self-assessment pages – description of lingual and inter-cultural experiences.

c) Dossier: This part of the ELP most closely corresponds to the artist’s portfolio. It is a collection of pieces of personal work of various kinds which clearly show what one has achieved in different languages, and it documents and illustrates achievements or experiences recorded in the Language Biography or Passport. This part of the ELP can contain: a summary of contents with descriptions of the pieces of work included and a selection of items of personal work, e.g. outcomes of project work, sample letters, memoranda, brief reports, audio or video cassettes – diplomas, certifications, certificates – quizzes, written tests – exercises, activities – tape-recorded or videotaped activities, etc.
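To make the three-part structure above easier to picture, here is a minimal data-model sketch in Python. The class and field names are illustrative assumptions drawn from the descriptions of the Passport, Biography and Dossier, not an official Council of Europe schema.

from dataclasses import dataclass, field
from typing import Dict, List

# Illustrative data model only: names and fields are assumptions based on the
# three-part description above, not an official ELP specification.

@dataclass
class LanguagePassport:
    skills_profile: Dict[str, Dict[str, str]]   # e.g. {"French": {"listening": "B1", "speaking": "A2"}}
    self_assessment_grid: Dict[str, str]        # CEFR self-ratings per skill
    intercultural_summary: str                  # brief presentation of lingual/inter-cultural experiences
    certificates: List[str] = field(default_factory=list)

@dataclass
class LanguageBiography:
    languages_outside_school: List[str] = field(default_factory=list)
    learning_strategies: List[str] = field(default_factory=list)      # 'I learn how to learn'
    self_assessment_pages: List[str] = field(default_factory=list)
    intercultural_experiences: List[str] = field(default_factory=list)

@dataclass
class DossierItem:
    description: str    # entry in the summary of contents
    kind: str           # e.g. "project work", "sample letter", "recorded activity"

@dataclass
class ELP:
    owner: str
    passport: LanguagePassport
    biography: LanguageBiography
    dossier: List[DossierItem] = field(default_factory=list)

An electronic ELP of the kind discussed below would, at a minimum, be a store of such records plus an interface for learners to edit and share them.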

Electronic ELPs
During the first years of the ELP project, attempts to create electronic ELPs were made (Stoks, 1999), but no fully-fledged electronic ELPs have been created so far, due to cost and difficulties in creating satisfactory electronic implementations. However, having a completely electronic, web-based ELP is an idea that is being investigated further. Websites are now being designed for the distribution of the contents of an ELP, as well as auxiliary materials.

Future goals
In order to achieve its full potential and goals, the ELP has a long way to go. To embed the ELP well into its environment and gain enough recognition within educational systems and among employers, various types of supporting measures are necessary, such as collaboration with authorities, establishing links between the ELP and the relevant examinations and diplomas, combining the ELP with other innovations, involvement of teacher associations or unions, involvement of groups of users in the development of an ELP, conducting initial teacher training and in-service teacher training, etc.



Conclusion
Even though the international literature recognises that portfolios are becoming a vital assessment strategy within the field of education, it should be kept in mind that portfolios need to be augmented by other formal and informal methods of assessment and not be used as the sole assessment approach. Information gathered from portfolios, coupled with information collected from classroom tests and other methods of classroom-based assessment, can better equip students, parents and professionals to make informed decisions regarding educational goals and instructional objectives (Barootchi & Keshavarz, 2002; Antonopoulou & Manoli, 2003). However, further theoretical and empirical work needs to be done to examine portfolio use and other alternative assessment practices in depth. For example, we need to understand how the various aspects of portfolio assessment are actually accomplished in classroom interaction, and to develop appropriate theory and research methods for the study of such highly complex and dynamic teaching-learning-assessing interfaces, before any definite conclusions about their positive effects on teaching and learning are drawn. The present paper therefore makes an urgent appeal to future researchers with an interest in the area to conduct empirical research in this exciting field within EFL settings.

References
Alderson, J. C. and Banerjee, J. (2001). Language testing and assessment (Part 1). Language Teaching, 34: 213-236.
Alderson, J. C. (2000). Portfolio Assessment. In Alderson, J. C. Assessing Reading. Cambridge: Cambridge University Press.
Alderson, J. C. and Wall, D. (1993). Does washback exist? Applied Linguistics, 14: 115-129.
Antonopoulou, N. and Manoli, P. (2003). Portfolios: An Alternative Method of Assessment. In New Directions in Applied Linguistics, Proceedings of the 13th International Conference of the Greek Applied Linguistics Association (GALA), School of Philosophy, Aristotle University of Thessaloniki, Thessaloniki, Greece. Volume 9: 423-433.
Baack, E. (1997). Portfolio Development: An Introduction. English Teaching Forum, 35/2.
Bailey, K. M. (1998). ‘Alternative’ assessments: Performance tests and portfolios. In Bailey, K. M. Learning about Language Assessment: Dilemmas, Decisions, and Directions. USA: Heinle & Heinle Publications.
Barootchi, N. and Keshavarz, M. H. (2002). Assessment of achievement through portfolio and teacher-made tests. Educational Research, 44: 279-288.
Black, P. J. (1993). Formative and summative assessment by teachers. Studies in Science Education, 21: 49-97.
Black, P. and Wiliam, D. (1998). Assessment and Classroom Learning. Assessment in Education: Principles, Policy and Practice, 5: 7-74.
Brown, J. B. (Ed.) (1998). New Ways of Classroom Assessment. USA: TESOL.
Callahan, S. (1999). All done with the best of intentions: One Kentucky high school after six years of state portfolio tests. Assessing Writing, 6: 5-40.
Caudery, T. (1998). Portfolio assessment: A viable option in Denmark? Sprogforum, 11: 51-54.
Christ, I. (1998). European Language Portfolio. Language Teaching, 31: 214-189.
Clapham, C. (2000). Assessment and Testing. Annual Review of Applied Linguistics, 20: 147-161.
Cohen, A. D. (1994). Assessing language ability in the classroom. Boston, MA: Heinle and Heinle.
Council of Europe (2001). Common European Framework of Reference for Languages: Learning, teaching, assessment. Cambridge: Cambridge University Press.
Crooks, T. J. (1988). The impact of classroom evaluation practices on students. Review of Educational Research, 58: 438-481.



Gardner, H. (1993). Assessment in context: The alternative to standardized testing. In Gifford, R. and O’Connor, M. C. (Eds.) Changing assessments: Alternative views of aptitude, achievement and instruction. Massachusetts, USA: Kluwer Academic Press.
Genesee, F. and Upshur, J. (1996). Classroom-based Evaluation in Second Language Education. Cambridge: Cambridge University Press.
Gipps, C. and Stobbart, G. (2003). Alternative Assessment. In Kellaghan, T., Stufflebeam, D. L. and Wingate, L. A. (Eds.) International Handbook of Educational Evaluation. Dordrecht: Kluwer Academic Publishers.
Gipps, C. V. (1994). Beyond Testing: Towards a theory of educational assessment. London: Falmer Press.
Gomez, E. (2000). Assessment portfolios: Including English language learners in large scale assessment. In ERIC Digest (ED447725). ERIC Clearinghouse on Languages and Linguistics, Washington, DC.
Hamayan, E. V. (1995). Approaches to alternative assessment. Annual Review of Applied Linguistics, 15: 212-226.
Hamp-Lyons, L. (1996). Applying ethical standards to portfolio assessments in writing in English as a second language. In Milanovic, M. and Saville, N. (Eds.) Performance Testing, Cognition and Assessment. Cambridge: Cambridge University Press.
Hamp-Lyons, L. (1998). Ethical Test Preparation Practice: The Case of the TOEFL. TESOL Quarterly, 32: 329-337.
Hamp-Lyons, L. and Condon, W. (2000). Assessing the Portfolio. Cresskill, New Jersey: Hampton Press Inc.
Huerta-Macias, A. (1995). Alternative assessment: Responses to commonly asked questions. TESOL Journal, 5: 8-11.
Ioannou-Georgiou, S. and Pavlou, P. (2003). Assessing Young Learners. Oxford: Oxford University Press.
Jongsma, K. (1989). Portfolio assessment. The Reading Teacher, 43: 264-265.
Johnstone, P., Guice, S., Baker, K., Malone, J. and Michelson, N. (1995). Assessment of teaching and learning in literature-based classrooms. Teaching and Teacher Education, 11: 359-371.
Kay, H. and Dudley-Evans, T. (1998). Genre: What teachers think. ELTJ, 52: 308-314.
Little, D. (2002). The European Language Portfolio: Structure, origins, implementation and challenges. Language Teaching, 35: 182-189.
Madaus, G. F. (1988). The influence of Testing on the Curriculum. In Tanner, L. N. (Ed.) Critical Issues in Curriculum: 87th Yearbook for the National Society for the Study of Education. Chicago: University of Chicago Press.
Martin, J. R. (1984). Language, register and genre. In Christie, F. (Ed.) Language Studies: Children’s Writing: Reader. Geelong: Deakin University Press.
Mincham, L. (1995). ESL student needs procedures: An approach to language assessment in primary and secondary contexts. In Brindley, G. (Ed.) Language Assessment in Action. Sydney: National Centre for Language Teaching & Research, Macquarie University.
O’Malley, M. and Valdez Pierce, L. (1996). Authentic assessment for English language learners. New York: Addison-Wesley.
Paris, S. G., Lawton, T. A., Turner, J. C. and Roth, J. L. (1991). A developmental perspective on standardized achievement testing. Educational Researcher, 20: 12-20.
Shaaban, K. (2000). Assessment of young learners’ achievement in ESL classes in the Lebanon. Language, Culture and Curriculum, 13: 306-317.
Shay, S. (1997). Portfolio assessment: A catalyst for staff and curricular reform. Assessing Writing, 4: 29-51.
Shepard, L. A. (1990). Inflated test score gains: Is the problem old norms or teaching the test? Educational Measurement: Issues and Practice, 9: 15-22.
Shepard, L. A. (1991). Will national tests improve student learning? CSE Technical Report 342, CRESST, University of Colorado, Boulder.



Shohamy, E. (1998). Alternative assessment in language testing: Applying a multiplism approach. In Li, E. and James, G. (Eds.) Testing and Evaluation in Second Language Education. Hong Kong: Language Centre, Hong Kong University of Science and Technology.
Smith, K. (2002). Learner Portfolios. English Teaching Professional, 22: 39-41.
Smith, K. (1999). Language Testing: Alternative Methods. In Spolsky, B. (Ed.) Concise Encyclopedia of Educational Linguistics. Amsterdam: Elsevier.
Smith, K. (1996). Action research on the use of portfolio for assessing foreign language learners. IATEFL Testing Newsletter, 7: 17-24.
Smolen, L., Newman, C., Wathen, T. and Lee, D. (1995). Developing student self-assessment strategies. TESOL Journal, 5: 22-27.
Soodak, L. C. (2000). Performance assessment: Exploring issues of equity and fairness. Reading and Writing Quarterly, 16: 175-178.
Stoks, G. (1999). The Dutch portfolio project. Babylonia, 1: 54-56.
Tillema, H. (1998). Design and validity of a portfolio instrument for professional training. Studies in Educational Evaluation, 24: 263-278.
Tillema, H. (1997). Assessment before development. Lifelong Learning in Europe, 1: 37.
Tsagari, K. (2000). Using Alternative Assessment in Class: The case of Portfolio Assessment. ASPECTS, 61: 6-24.
Tsagari, K. (2001). The implementation of ‘Portfolio’ assessment in English classes in Greek state schools: Teachers’ perceptions. In The Contribution of Language Teaching and Learning to the Promotion of a Peace Culture, Proceedings of the 12th International Conference of the Greek Applied Linguistics Association (GALA), School of Philosophy, Aristotle University of Thessaloniki, Thessaloniki, Greece. Volume 8: 517-527.
Tsagari, K. (2004). Alternative Methods of Assessment. In West, R. and Tsagari, K. Testing and Assessment in Language Learning, AGG65, Vol 3. Hellenic Open University: Patras.
Tsagari, K. (2005). Portfolio assessment with EFL young learners in Greek State schools. In Pavlou, P. and Smith, K. (Eds.) Serving TEA to Young Learners: Proceedings of the Conference on Testing Young Learners. IATEFL and CyTEA: Israel: ORANIM – Academic College of Education.
Tsagari, K. (2007). Investigating the Washback Effect of a High-Stakes EFL Exam in the Greek context: Participants’ Perceptions, Material Design and Classroom Applications. Unpublished PhD thesis, Department of Linguistics and English Language, Lancaster University, UK.
Wall, D. and Alderson, J. C. (1993). Examining Washback: The Sri Lankan Impact Study. Language Testing, 10: 41-69.
Weigle, S. (2002). Portfolio Assessment. In Weigle, S. Assessing Writing. Cambridge: Cambridge University Press.
Wiggins, G. (1994). Toward more authentic assessment of language performances. In Hancock, C. R. (Ed.) Teaching, Testing and Assessment: Making the Connection. Lincolnwood, Illinois, USA: National Textbook Company.
Williams, J. (2000). Implementing portfolios and student-led conferences. In ENC Focus, 7/2.
Wolf, D., Bixby, J., Glenn, J. and Gardner, H. (1991). To use their minds well: Investigating new forms of student assessment. Review of Research in Education, 17: 31-74.
Wolfe, E. W., Chiu, C. W. T. and Reckase, M. D. (1999). Changes in secondary teachers’ perceptions of barriers to portfolio assessment. Assessing Writing, 6: 85-105.



Setting real standards using authentic assessment in an EAP context
Peter Davidson, Zayed University, United Arab Emirates, and Christine Coombe, Dubai Men’s College, United Arab Emirates
Original publication date: IATEFL TEASIG Conference Proceedings – Dubai 2010

Introduction
In order to determine ESL/EFL students’ readiness to study at an English-medium university or college, most tertiary institutions rely solely on students attaining a certain score on an international benchmark test such as IELTS or the TOEFL. However, because language proficiency on its own is not a good predictor of success at university, there is a need for assessment protocols that measure other academic competencies to ensure that students have the skills required to study at university. This paper outlines a series of assessment tasks that integrate reading, writing and speaking into a series of authentic tests that have proved successful in determining whether or not EFL students have the necessary academic skills and competencies they will need in order to cope with the demands of baccalaureate study at an English-medium university.

Defining ‘authentic assessment’
In recent times the term ‘authentic assessment’ has been used almost synonymously with performance-based or task-based assessment (Lloyd & Davidson, 2005). However, while an authentic assessment is likely to require the test candidates to carry out some kind of performance or task, a performance-based or task-based assessment may not necessarily be authentic. The key requirement of an authentic assessment is that it requires the test-taker to demonstrate that they are able to complete a particular task or tasks that resemble something that they are likely to have to do in the target situation (Wiggins, 1990; O’Malley & Valdez Pierce, 1996; Mueller, 2003). As noted by Lund (1997: 25), “Authentic assessments require the presentation of worthwhile and/or meaningful tasks that are designed to be representative of performance in the field … and approximate something the person would actually be required to do in a given setting.” An authentic assessment for a lifeguard, for example, would require them to demonstrate that they could actually rescue somebody from the water.

Traditional versus authentic assessment
Davidson (2005: 49) has identified how traditional and authentic assessments differ in terms of construct validity, content validity, scoring validity, and consequential validity (see Table 1). Authentic assessment has the potential to have greater construct validity than traditional tests because it utilizes tasks that are more genuine than those in traditional tests. For example, a traditional writing test typically requires candidates to write an essay on an unfamiliar topic under timed conditions. However, as noted by Perelman (2005), “Nowhere except on examinations such as the SAT does an individual have to write so quickly on an unfamiliar topic … most college writing assignments involve planning, writing and rigorous revising.” Consequently, authentic assessment can be a more accurate measure of students’ skills and abilities because it measures these skills and abilities in a more realistic and direct way. Reading a text under timed conditions and then answering a series of multiple-choice questions about that text, for example, is an activity that is seldom done outside of the testing situation. The indirect nature of traditional-type tests inevitably results in construct irrelevance variance, or the contamination of the measurement due to the testing of constructs that were not intended to be measured.



For example, the construct validity of a test of listening ability may be undermined by questions that students are required to read, resulting in a test that is also measuring students’ reading skills. The nature of authentic assessment, on the other hand, allows the test writer to broaden the constructs to be measured.

Table 1: Contrasting traditional and authentic assessment.

                        | Traditional assessment               | Authentic assessment
Construct validity      | limited construct validity           | expanded construct validity
                        | contrived                            | genuine
                        | indirect                             | direct
                        | recall / recognition                 | construction / application
                        | construct irrelevance variance       | construct relevance consistency
Content validity        | lower content validity               | higher content validity
                        | assesses a sample of constructs      | assesses the whole construct
                        | short                                | time-consuming
                        | selecting a response                 | performing a task
                        | discrete skills                      | integrated skills
                        | summative                            | formative
Scoring validity        | high scoring reliability             | low scoring reliability
                        | controlled testing environment       | varied environmental test conditions
                        | objective                            | subjective
                        | automated scoring                    | human raters
                        | psychometric measures                | criteria and standardization
                        | norm-referenced                      | criterion-referenced
Consequential validity  | negative washback                    | positive washback
                        | score / grade                        | description of performance
                        | emphasis on passing a test           | emphasis on completing a task
                        | lower face validity                  | higher face validity
                        | poor predictor of future performance | good predictor of future performance


Authentic assessment usually has more content validity than traditional tests because authentic assessment more comprehensively measures the content of what has been taught during a particular course. Rather than assessing the sample of constructs which traditional tests measure, authentic assessment has the potential to measure a greater range of constructs. As such, traditional tests are usually relatively short, whereas authentic assessments are likely to take much longer as they require students to demonstrate a range of skills and abilities. Authentic assessment more effectively lends itself to assessing integrated skills at the macro level, while traditional assessment is better suited to measuring discrete skills at the micro level. In other words, traditional tests are suitable for measuring lower-level cognitive skills, whereas authentic assessments lend themselves better to measuring higher-order thinking skills. Furthermore, the limited number of question types that traditional assessment uses, such as multiple-choice and true/false questions, invariably limits the number of constructs that can actually be assessed. In contrast, authentic assessments are likely to use a range of measures which provide students with opportunities to showcase what they can actually do. Finally, whereas traditional assessment is usually summative in nature, authentic assessment is often formative, as it is carried out throughout the duration of a course rather than at the end of a course.

There are also a number of differences regarding the scoring validity of traditional and authentic assessment. With traditional assessment the testing environment is tightly controlled to avoid differences in test scores arising from variables outside of the test-takers’ influence. Great effort is made to ensure that the marking of the test is as objective as possible, and often automated scoring is employed to reduce human error. Because traditional assessment only measures a sample of constructs, test writers employ an array of psychometric measures to determine the reliability and validity of a test before making inferences about what the test results actually mean. Traditional tests are often norm-referenced, as a student’s performance on a test is compared to how other students performed on that test. Authentic assessment, on the other hand, may produce lower scoring reliability than traditional-type tests because of varied environmental test conditions and the subjectivity of human raters. As with task-based assessment, when interpreting the results of authentic assessment it is sometimes difficult to extrapolate exactly what the results mean. In order to ensure valid and reliable authentic assessment, greater emphasis is placed on the writing of criteria, rater training, calibration, and the monitoring of raters, and student performance is compared to the criteria rather than to the performance of other students.

Perhaps the most important difference between traditional and authentic assessment has to do with consequential validity. Whilst we are all aware that the purpose of a test is to measure how well a student has met a particular set of learning outcomes, we often forget that an additional, and vitally important, purpose of testing is to facilitate learning (Biggs, 1998; Black & Wiliam, 1998). Authentic assessment has a significantly more positive washback effect than traditional tests. Rather than focusing on scores and passing a test, authentic assessment focuses more on completing a realistic task that will have some future relevance for the students.
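The distinction drawn above (and in Table 1) between norm-referenced and criterion-referenced interpretation can be made concrete with a short sketch. The names, scores and cut-offs below are invented for illustration and do not come from the article.

from statistics import mean

# Invented scores for a small group of test-takers (illustration only).
scores = {"Amal": 62, "Badr": 71, "Carla": 55, "Dina": 80, "Emre": 67}

def norm_referenced_report(name: str) -> str:
    """Interpret a score relative to the rest of the group (rank against peers)."""
    ranked = sorted(scores.values(), reverse=True)
    rank = ranked.index(scores[name]) + 1
    return f"{name}: ranked {rank} of {len(scores)} (group mean {mean(scores.values()):.1f})"

def criterion_referenced_report(name: str, criteria: dict) -> str:
    """Interpret a score against fixed band descriptors, ignoring the group."""
    for band, cut_off in sorted(criteria.items(), key=lambda kv: kv[1], reverse=True):
        if scores[name] >= cut_off:
            return f"{name}: meets the '{band}' criterion (score {scores[name]} >= {cut_off})"
    return f"{name}: does not yet meet the lowest criterion"

# Hypothetical band descriptors with cut scores (not from the article).
bands = {"ready for baccalaureate study": 70, "borderline": 60}

print(norm_referenced_report("Emre"))              # Emre compared with peers
print(criterion_referenced_report("Emre", bands))  # Emre compared with fixed criteria

The point of the contrast is that the criterion-referenced report stays meaningful even if the rest of the cohort changes, which is why authentic assessment leans on explicit criteria, rater training and calibration rather than on comparisons with other candidates.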
When utilizing authentic assessment, the great transgression of ‘teaching to the test’ is not as problematic, since the test embodies the learning outcomes of the course anyway. As a consequence of the authenticity, relevance and comprehensiveness of authentic assessment, many teachers and students feel that this type of assessment has higher face validity than traditional assessment. Finally, because authentic assessment requires students to demonstrate their skills and abilities on tasks that simulate the types of activities that they will be required to carry out in the future, authentic assessment is a much better predictor of future success than traditional assessment.

A sample authentic test
When designing authentic assessment to determine students’ readiness to study in a baccalaureate program, it is important to consider what tasks the students will be required to perform once they are in the target situation, that is, the baccalaureate program. An examination of needs analyses conducted at numerous tertiary institutions around the world reveals that in order to be successful, baccalaureate students need competence in the following seven basic academic areas:



• understanding lectures

• understanding academic texts

• taking notes from lectures and academic texts

• participating in academic discussions

• conducting basic research

• giving oral presentations

• writing academic essays

As a result, at Zayed University we developed a series of multi-dimensional assessment tasks designed to determine whether or not a student was ready to meet the demands of studying in the university’s baccalaureate program (Davidson and Dalton, 2003).

Listening to, and retelling, a lecture
In this task, students listen to a 15-17 minute lecture, take notes, and write a ‘retell’ of the lecture based on these notes. In order to make the test authentic, we aim to replicate the baccalaureate learning context as much as possible. Therefore, students are told the theme of the lecture the day before the lecture is given, and the lecture is delivered on a video cassette. It is of a similar level to a lecture given at the beginning of a GE course, and at the end of the lecture students are encouraged to ask questions for up to ten minutes.

Reading and Writing
This assessment task requires students to read three to four short semi-academic texts and take notes, and write a 500-600 word essay based on these notes. In order to replicate the type of reading and writing tasks that students will need to do in the baccalaureate program, the texts are distributed to students four days before they are required to write their essay. The texts, which are usually adapted slightly, come from a range of sources including textbooks, journals, magazines, newspapers and the Internet, and they may include non-linear information. The rhetorical structure of the academic essay is either problem-solution or argumentative. Students have one and a half hours to write their essay, and they are allowed to refer to the texts and their notes when they are writing it. Students are also encouraged to use their laptops to write their essay. This assessment task is authentic because it replicates what students will be expected to do in the baccalaureate program.

Information literacy
This assessment task ensures that students are equipped with the necessary information literacy skills that they will need in the baccalaureate program, an area that is often overlooked. Information literacy is the ability to “recognize when information is needed and have the ability to locate, evaluate, and use effectively the needed information” (American Library Association, 1989: 1). It is crucial that students become information literate in order to cope with the wealth of information that is available in the information age (Kuhlthau, 1999). In this task students write a 1200-1500 word research paper supporting an argument through independent research, and defend this research in a 10-15 minute oral defense or viva. In order to make the task authentic, students decide on a topic, write a research proposal, select 5-10 electronic and print sources, take notes from these sources, conduct research using questionnaires, surveys and interviews, and write up their research and submit a portfolio containing all their work. Students have two weeks to complete their research paper, and they are given only minimal help from faculty. This is a highly authentic assessment as students are required to demonstrate their ability to carry out a research project, something that they will be required to do once they are in the baccalaureate program.



Academic discussion
Students are also required to participate in a 10-12 minute academic discussion with four other students on a topic that has previously been studied (Davidson & Hobbs, 2010). In order to replicate the type of speaking tasks that students will need to do in the baccalaureate program, and in order to prepare them for the academic discussion, four to five semi-academic texts are distributed to students four days before they are required to participate in the academic discussion. The texts come from a range of sources including textbooks, journals, magazines, newspapers and the Internet, and they are usually adapted to ensure that they are not as long or as difficult as the type of reading texts students get in the Reading and Writing task. The texts typically present arguments for or against a particular topic, and on the day of the discussion a point of view to be argued is assigned to each candidate ten minutes before the discussion begins. Students can use the texts and their notes while they are preparing for the discussion and during the discussion itself. It could be argued that assigning students a point of view to argue is not very authentic, but it was a necessary concession to ensure that students actually engaged in an academic discussion.

Conclusion
Whilst some institutions rely solely on a specific score on an internationally recognized exam to determine ESL/EFL students’ entry into an English-medium university, it has been argued in this article that additional assessment methods are necessary. In order to comprehensively gauge a student’s readiness for effective participation in a baccalaureate-level program, multi-dimensional authentic assessment tasks like those outlined above are needed to ensure that students not only have sufficient language, but also the necessary academic study skills, information literacy skills, and information technology skills they will need to cope with the demands of baccalaureate study. Curriculum designers and administrators can be assured that once a student has demonstrated competency in mastering the tasks inherent in the type of authentic assessment outlined above, they will be ready for baccalaureate study. The main advantages of authentic assessment are that it has the potential to produce broader and more precise measurements of student performance, and that it can facilitate learning through a beneficial washback effect and the use of diagnostic feedback. This is not to argue against the use of traditional assessment, but rather to promote the use of authentic assessment to augment, rather than totally replace, traditional assessment practices.

References
American Library Association Presidential Committee on Information Literacy. (1989). Final Report. Chicago: Author. (ED 316 074)
Biggs, J. (1998). Assessment and Classroom Learning: a role for summative assessment? Assessment in Education, 5/1: 103-110.
Black, P. and Wiliam, D. (1998). Assessment and Classroom Learning. Assessment in Education, 5/1: 7-74.
Davidson, P. (2005). Using authentic assessment to inform decisions on ESL/EFL students’ entry into English-medium university. In Davidson, P., Coombe, C. and Jones, W. (Eds). Assessment in the Arab World, (pp. 47-59). Dubai: TESOL Arabia.
Davidson, P. and Dalton, D. (2003). Multiple-measures assessment: Using ‘visas’ to assess students’ achievement of learning outcomes. In Coombe, C. A. and Hubley, N. (Eds). Case Studies in TESOL Practice Series: Assessment Practices, (pp. 121-134). Virginia: TESOL.
Davidson, P. and Hobbs, A. (2010). Authentic academic speaking assessment. In Jendli, A., Coombe, C. and Miled, N. (Eds). Developing Oral Skills in English: Theory, Research and Pedagogy (pp. 223-248). Dubai: TESOL Arabia.
Fitzpatrick, K. A. (1995). Leadership Challenges of Outcome-Based Reform. Education Digest, 60/5: 13-17.



Kuhlthau, C. C. (1999). Literacy and Learning for the Information Age. In Stripling, B. K. (Ed). Learning and Libraries in an Information Age: Principles and Practice, (pp. 3-21). Englewood: Libraries Unlimited.
Lloyd, D. and Davidson, P. (2005). Task-based integrated-skills assessment. In Lloyd, D., Davidson, P. and Coombe, C. (Eds). The Fundamentals of Language Assessment: A Practical Guide for Teachers in the Gulf, (pp. 157-166). Dubai: TESOL Arabia.
Lund, J. (1997). Authentic assessment: Its development and applications. Journal of Physical Education, Recreation and Dance, 68/7: 25-28 and 40.
Mueller, J. (2003). Authentic Assessment Toolbox. Retrieved on 5 June 2004 from: http://jonathan.mueller.faculty.noctrl.edu/toolbox/index.htm
O’Malley, J. M. and Valdez Pierce, L. (1996). Authentic Assessment for English Teachers: Practical Approaches for Teachers. London: Addison-Wesley.
Perelman, L. (2005). New SAT: Write long, badly and prosper. LA Times, May 29, 2005.
Wiggins, G. (1990). The case for authentic assessment. ERIC Digest 328611. Retrieved on 5 June 2004 from: http://www.ericfacility.net/databases/ERIC_Digests/ed328611.html



Beating cheating – a compendium of causes and precautions
Greg Grimaldi, Sabancı University School of Languages, Turkey (now Cardiff University ELP, Wales)
Original publication date: IATEFL TEASIG Conference Proceedings – Dubai 2010

Copying and other forms of cheating in exams, and plagiarism and fabrication in coursework, appear from media coverage to be growing problems. How do they occur? Why? And what are effective deterrents? This talk referred to news articles to represent trends in cheating before outlining the results of a teacher/student survey on why students may cheat and a checklist for judging and minimising the scale of the problems.

Trends
Mobile phones have recently exacerbated cheating in exams. Concerns abound that students may use drugs such as Ritalin to optimise exam performance, just as athletes may use steroids to enhance their physique. However, notes remain the most common tool of exam cheats. Attempts to counter cheating include technological innovations such as security-tagged exam papers and others containing microtext. Anti-plagiarism software is available, and some schools even install cameras in exam rooms to review suspect test-takers. Penalties for failing to follow exam procedures range in Britain from a warning to being barred from taking any more exams that year. Fines for cheating in Kenya were raised fiftyfold in 2007. Parents and teachers have been jailed in China for conspiring to aid students, and an examinations board head in Africa received a 14-year jail term for distributing exam papers. British police point out the obvious dangers inherent in illicitly aiding students in the written part of their driving test.

Plagiarism has problematised the increase in non-exam-based assessment over the last few decades and has attracted more attention with the arrival of the internet, which provides a vast and growing number of resources on which to draw. It also seems to be encouraged by a lack of adequate supervision and support as much as by students’ attitudes themselves. Thirdly, technology for storing and reproducing works may have led to a relaxation of cultural attitudes towards copyright infringement, with a consequent diminution in the seriousness with which plagiarism is viewed: notable accusations of plagiarism have been recorded in recent years in the fields of literature, journalism, politics, and music. Schools are attempting to deter its occurrence by revising policies on academic integrity, better supervision (or, in the case of lecturers, peer review) and expulsion. Finally, fabrication of research findings has hit the headlines in recent years, with stem cell and cancer research notably afflicted. In addition, an Iranian minister was dismissed last year for faking an honorary doctorate from Oxford University.

Reasons
Apart from the role of technology in facilitating information-gathering and storage as well as communications, intensifying competition for school and university places and jobs may be one reason for the apparent swelling in media interest in cheating. A university education for a sizeable minority of the population may be an explicit government objective, linking education to economic success, whereas in previous decades skilled workforces may have possessed a range of levels of education. Impossible-seeming odds may encourage desperation:



775,000 exam entrants battled for 13,500 government jobs in China last year: 57 test-takers for each position. The stakes are also higher: disparities in income are widening, and graduates of the top universities are far more likely than anyone else to gain the best-paid and most influential jobs. In Turkey, a university graduate is also currently four times less likely to be unemployed than someone who did not attend university. The perception that only a top education will do is being reflected further and further down the school system. Schools may be held more accountable for their results by governments for whom education is a political battleground, and children are given national tests at young ages (7 in England), on the basis of which parents may choose schools. Primary-school children are loaded with homework and sent for extra classes to prep for exams from the age of 10 by parents anxious to get them into the best secondary schools, in order to give them the best opportunity to enter the best universities. In their attempts to get the best for their children, parents may put children under pressure to achieve at all costs.

A cultural trend towards challenging authority and norms can also suggest why cheating may be becoming more acceptable. Rather than simply ‘toeing the line’, children and parents want to hear justifications for procedures, and are likely not to respect them if rationales are unforthcoming or incompatible with their personal needs. In individualistic societies, operating on the limits of legitimacy in order to find a competitive edge may be preferable to fair play on a level playing field. Additionally, the upsurge in fee-paying pupils inevitably leads to demand for increased customer satisfaction, so that some parents may mistakenly believe they are entitled to a qualification when purchasing an education. However, if tests are used in ways they were not designed for, to support an apparently unfair system, attempts at cheating may be more easily comprehensible. A third of the first 100 finishers in a Chinese marathon last year were found to have cheated by accepting lifts, using fake IDs, etc. Their aim? To benefit from a points bonus awarded to promising athletes by the higher education admissions system to boost their chances of applying to top universities there.

These challenges to authority may be escalated by modern mass cultural intermingling. Non-natives can face rules and norms which seem illogical or needless; one not brought up within a culture or system is often less oblivious to its irrationalities and inconsistencies. With significant influxes of immigrants, expatriates, and their offspring living in different countries, they or their opinions, methods and influence may more frequently challenge the norms in the countries they have adopted.

Survey results
My (largely multiple-choice) survey was completed anonymously by 45 students (at approximately CEFR A2 level) and 12 English language instructors in English preparation classes at a private university in Istanbul, Turkey. The students were all Turkish, and around half had only ever been in state education before university; very few had been educated entirely privately. The teachers were also mostly Turkish (a few Westerners participated), with a similar (mostly state) educational background, and had spent the majority of their careers teaching in Turkey. Although the numbers were limited, some trends could be discerned (others, however, remained hazy):

1. Student respondents had less nuanced views than teachers on why students would cheat. Teachers found plausible suggested causes such as ‘to compensate for lack of academic ability’, ‘the assessment is simply too important not to fail in’, ‘others seem to be doing it’, ‘not understanding or accepting why it is unethical’, ‘to get a passing rather than a failing grade’ and ‘pressure from family to pass’, whereas only the latter two reasons were significant to students.

2. Students believed slightly more than teachers that students had positive reasons for acceptable academic conduct (such as feeling that ‘academic dishonesty would be wrong or unfair’), rather than being only deterred by punishments.



3. Some forms of unfair conduct, such as feigning illness or handing in work late without penalty, appeared more acceptable to students than to teachers, perhaps because it is harder to perceive a direct, obvious effect on marks from these than from, say, copying a better student’s paper.

4. The students, on the whole, appeared to view academic dishonesty more benignly than the teachers, with 66% feeling ‘sorry’ for a student who was caught; 73% said that they would view it as a ‘mistake’ rather than as evidence of ‘dishonesty’ (43%), whereas around half the teachers selected each of these terms. The number of students who stated that they would feel nothing at all was the same as the proportion who imagined only short-lived resentment, and together they numbered twice as many as those claiming they would confront their classmates. Only 1 in 9 claimed they would inform a teacher, although 1 in 4 did say they would resent the examiners for not catching the cheat.
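For readers who find headcounts easier to interpret than proportions, the figures above can be converted into approximate numbers of the 45 student respondents. The conversion below is only a rough illustration and assumes that each reported proportion refers to the full student sample.

# Rough conversion of the reported proportions into approximate headcounts.
# Assumes each figure refers to all 45 student respondents (an illustrative simplification).
students = 45

reported = {
    "felt 'sorry' for a student who was caught (66%)": 0.66,
    "would view it as a 'mistake' (73%)": 0.73,
    "would view it as 'dishonesty' (43%)": 0.43,
    "would inform a teacher (1 in 9)": 1 / 9,
    "would resent the examiners (1 in 4)": 1 / 4,
}

for label, proportion in reported.items():
    print(f"{label}: about {round(students * proportion)} of {students} students")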

Precautions
I compiled a checklist of issues which may affect academic integrity in language institutions. Categories of factors which may compromise academic integrity include: culture; the national education system; the parent institution; aspects of the language school or programme itself – including the director, the examiner-teachers, the assessment system, and particular assessments; parents; and students. There is little that most of us can do about the first two. To summarise the others: institutions, directors, and teachers probably need to be at least largely independent of their students’ success (financially, psychologically and in terms of performance evaluations) to successfully battle grade warping. I shall concentrate for the remainder of this article on two areas of importance.

Invigilation
Because invigilation appears so simple, it may sound patronising to say so, but teachers need training in how to invigilate. Many students are more experienced in taking exams than many teachers are in invigilating them. The ingenuity of determined cheats should not be underestimated. Requests for clarification, toilet breaks, the time, ventilation, etc. are all potentially suspicious. In addition, some teachers resent invigilation, which is seen as an additional burden rather than as part of their workload, and/or believe that a quick glance up from an exam paper, book, or computer is enough to maintain control. Invigilation is a hideously tedious activity, and all but the most disciplined minds will, consciously or otherwise, wander off in search of alternative ways to pass the time. And it is not simply a case of the more the better: deploying too many invigilators risks all those present feeling surplus to requirements, so that the quality of supervision may in fact decrease: for nearly five minutes of one listening exam I was kindly permitted to observe recently, not one of the eight instructors present (three more had popped out temporarily) was watching the 130 test-takers in the room.

Justification
Finally, we need to communicate clearly to students the reasons why they are not allowed to copy, plagiarise, or fabricate, rather than relying on our authority to discourage potential miscreants – or simply labelling them as such, resignedly, if they are discovered. The new generation of students is unlikely to be persuaded by clichés such as the following:

• “Exam cheats are only cheating themselves.” (Clearly, if they remain uncaught, it’s exactly the opposite which is true.)

• “You have to do it by yourself.” (Says who? Isn’t teamwork and social-mindedness normally encouraged? In many cultures, helping a brother or sister in difficulty is far more important than abiding by such a demand.)

• “It’s not fair and it’s not right.” (Life’s not fair, and don’t most people in many countries of our globalising world know it. Beating the system may be a cause for celebration rather than condemnation.)



• “Academic theft is still theft.” (Well, not really: with most theft the person stolen from loses something, whereas academic theft – especially by students – rarely relieves or deprives the original ‘owner’ of anything.)

Instead, we could try couching the necessity for academic integrity in the following terms:

• “How would you like it?” – Depict students’ losing out to unfairly advantaged classmates, and invite them to support your preventing this.

• “Your future depends on it.” – If they don’t gain the skills tested, they will be ill-equipped for the future; if they pass without them, they will only fail later.

• “Our future depends on it.” – Your school’s reputation depends on its standards: if students cheat in assessments, the qualifications gained lose value in the eyes of prospective employers, further education institutions, etc.

• “Others’ futures depend on it.” – Sometimes, a qualification indicates expertise in areas of life-or-death decision-making: medical qualifications, engineering, drivers’ and pilots’ licences, and the like. The case is harder to make for language qualifications, but the principle can be established.

In summary, technological developments, together with economic and societal changes, render academic integrity an ongoing concern. Institutions can combat threats to it by educating students and teachers, investing in resources and training, admitting the issue exists, attempting to understand it, and deterring it in proactive yet productive ways, rather than simply blaming its occurrence on incorrigible defects in the characters of those they catch.

References
Andalo, D. (2007). Watchdog expresses concern over exam cheats. The Guardian [online]. Retrieved on 3 November 2020 from: https://www.theguardian.com/education/2007/mar/16/schools.uk
BBC (2008a). Egypt school exam cheats jailed. BBC News [online]. Retrieved on 3 November 2020 from: http://news.bbc.co.uk/1/hi/world/middle_east/7606062.stm
BBC (2008b). Kenya exam cheats face big fines. BBC News [online]. Retrieved on 3 November 2020 from: http://news.bbc.co.uk/1/hi/7625189.stm
Branigan, T. (2009). Boom in exam cheats battling for China’s top jobs. The Guardian [online]. Retrieved on 3 November 2020 from: https://www.theguardian.com/world/2009/jan/21/china-exam-cheating-top-jobs
Branigan, T. (2010). Thirty runners disqualified from Chinese marathon for cheating. The Guardian [online]. Retrieved on 3 November 2020 from: https://www.theguardian.com/world/2010/jan/21/disqualified-chinese-marathon
Cauvin, H. E. (2000). Cancer Researcher in South Africa Who Falsified Data Is Fired. The New York Times [online]. Retrieved on 3 November 2020 from: https://www.nytimes.com/2000/03/11/us/cancer-researcher-in-south-africa-who-falsified-data-is-fired.html
Corbyn, Z. (2010). Newcastle goes back to basics to avoid plagiarism. Times Higher Education [online]. Retrieved on 3 November 2020 from: https://www.timeshighereducation.com/news/newcastle-goes-back-to-basics-to-avoid-plagiarism/410267.article



Finder, A. (2007). 34 Duke Business Students Face Discipline for Cheating. The New York Times [online]. Retrieved on 3 November 2020 from: https://www.nytimes.com/2007/05/01/us/01duke.html
Foster, P. (2009). Chinese parents jailed in hi-tech cheating scam. The Telegraph [online]. Retrieved on 3 November 2020 from: https://www.telegraph.co.uk/news/worldnews/asia/china/5099593/Chinese-parents-jailed-in-hi-tech-cheating-scam.html
Garner, R. (2008). Exam chiefs turn to Bond-style gadgets to defeat the cheats. The Independent [online]. Retrieved on 3 November 2020 from: https://www.independent.co.uk/news/education/education-news/exam-chiefs-turn-to-bond-style-gadgets-to-defeat-the-cheats-834590.html
Goss, P. (2007). How technology killed the pub quiz. MSN UK [online]. No longer available.
McShannon, J. (2002). Advanced-level complaints. The Guardian [online]. Retrieved on 3 November 2020 from: https://www.theguardian.com/education/2002/nov/05/tefl.furthereducation
NY Times (2002). Plagiarism Investigation Ends at Virginia. The New York Times [online]. Retrieved on 3 November 2020 from: https://www.nytimes.com/2002/11/26/us/plagiarism-investigation-ends-at-virginia.html
Obi, I. and Ewuzie, K. (2010). Research by proxy in Nigeria’s Ivory Towers. BusinessDay [online]. No longer available.
Parker, Q. (2008). Iranian minister sacked over fake degree. The Guardian [online]. Retrieved on 3 November 2020 from: https://www.theguardian.com/education/2008/nov/04/oxforduniversity-highereducation-iran
Paton, G. (2010). More pupils cheating in exams, says Ofqual. The Telegraph [online]. Retrieved on 3 November 2020 from: https://www.telegraph.co.uk/education/educationnews/7139336/More-pupils-cheating-in-exams-says-Ofqual.html
People’s Daily (2006). University head says China’s academic ethics at rock bottom. China Digital Times [online]. Retrieved on 3 November 2020 from: https://chinadigitaltimes.net/2006/07/university-head-says-chinas-academic-ethics-at-rock-bottom/
Reuters (2007). British students warned: Exam cheats will be caught. Reuters Life! [online]. Retrieved on 3 November 2020 from: https://www.reuters.com/article/us-britain-exams-cheats-idUSL1158468020070511
Sample, I. (2008). Exam cheating alert over brain drugs. The Guardian [online]. Retrieved on 3 November 2020 from: https://www.theguardian.com/science/2008/may/22/drugs.medicalresearch
Sang-Hun, C. (2009). Disgraced Cloning Expert Convicted in South Korea. The New York Times [online]. Retrieved on 3 November 2020 from: https://www.nytimes.com/2009/10/27/world/asia/27clone.html
Shepherd, J. (2008). Universities review plagiarism policies to catch Facebook cheats. The Guardian [online]. Retrieved on 3 November 2020 from: https://www.theguardian.com/education/2008/oct/31/facebook-cheating-plagiarism-cambridge-varsity-wikipedia
Smith, J. (2009). Sneaking notes still exam cheats’ most common ruse. NZ Herald [online]. Retrieved on 3 November 2020 from: https://www.nzherald.co.nz/nz/news/article.cfm?c_id=1&objectid=10586077
Telegraph (2007). Why exam cheats must try harder. The Telegraph [online]. Retrieved on 3 November 2020 from: https://www.telegraph.co.uk/news/uknews/1556219/Why-exam-cheats-must-try-harder.html
Townsend, M. and Hudson, M. (2004). Universities declare war on the copycat exam cheats. The Observer [online]. Retrieved on 3 November 2020 from: https://www.theguardian.com/uk/2004/jun/20/administration.highereducation
Turner, D. (2010). School exam cheats increase. Financial Times [online]. Retrieved on 3 November 2020 from: https://www.ft.com/content/f56fcc3e-10d0-11df-975e-00144feab49a
Woolcock, N. (2003). Man jailed for helping driving test cheats. The Telegraph [online].
Retrieved on 3 November 2020 from: https://www.telegraph.co.uk/news/uknews/1437782/Man-jailed-for-helping-driving-test-cheats.html
Yahoo! (2006a). Bosnian clampdown on cheats sparks student complaints. Yahoo! UK [online]. No longer available.
Yahoo! (2006b). Students who cheat could be taken to court in China. Yahoo! UK [online]. No longer available.
Zaman (2009). 1.3 million students sit for university entrance exam. Today’s Zaman [online]. No longer available.

Copies of the articles no longer available online are available from the author (GrimaldiG1@cardiff.ac.uk) on request.



Assessment literacy for the English language classroom
Glenn Fulcher, University of Leicester, UK
Original publication date: IATEFL TEASIG Newsletter September 2010

New assessment needs
The first decade of the 21st century has seen a phenomenal increase in the testing and assessment responsibilities placed upon language teachers. This is primarily due to the increased use of tests and assessments for the purposes of accountability. Malone (2008) identifies the No Child Left Behind legislation in the United States and the widespread implementation of the Common European Framework of Reference (CEFR) in Europe as major change factors. This is not to say that tests have not always been tools by which policymakers seek to control education. However, nowadays we are more aware than ever of the power of tests (Shohamy, 2001), and of the way in which they are used in political systems to manipulate the behaviour of teachers and hold them accountable for much wider policy goals (Fulcher, 2009), including national economic development. Brindley (2008) notes that it is the externally mandated nature of tests that makes them such attractive political tools. The term ‘assessment literacy’ (Stiggins, 1991) has become accepted as referring to the range of skills and knowledge that a range of stakeholders need in order to deal with the new world of assessment into which we have been thrust. Yet there is little agreement on what ‘assessment literacy’ might comprise.

Issues in assessment literacy
The range and number of stakeholders who require a level of assessment literacy has grown. Taylor (2009) includes university admissions officers, policymakers and government departments, and teachers. Yet there are few textbooks and training materials available for non-specialists. Echoing Brindley (2001: 127), Taylor (2009: 23) argues that most available textbooks are “...highly technical or too specialized for language educators seeking to understand basic principles and practice in assessment”. Davies (2008) calls the older texts a ‘skills + knowledge’ approach to assessment literacy. ‘Skills’ refers to the practical know-how in test analysis and construction, and ‘knowledge’ to the “relevant background in measurement and language description” (ibid.: 328). He argues that what is missing is a focus upon ‘principles’.

The earliest attempt to define assessment literacy for teachers was produced by the American Federation of Teachers (1990), although the term ‘assessment literacy’ was not in use at the time. The competencies included selecting assessments, developing assessments for the classroom, administering and scoring tests, using scores to aid instructional decisions, communicating results to stakeholders, and being aware of inappropriate and unethical uses of tests. It can be seen that Davies’ notion of ‘principles’ was present in this early document, although there is little evidence of it having impacted upon the teaching of language testing, either in the textbooks of the time or in the courses surveyed by Bailey and Brown (1996). Brindley (2001) was the first language tester to address the topic of assessment literacy. He argued for a focus on ‘curriculum-related assessment’. Perhaps most importantly, Brindley recognizes that teachers work within time and resource constraints, and urges testing educators to recognize that they must develop flexible approaches to teachers’ assessment needs. These recommendations prefigure Davies’ concern for an expansion of the traditional content of books on language testing to meet the emerging needs of the 21st century.



Research
Bailey and Brown (1996) and Brown and Bailey (2008) looked at the content of language testing programmes and the textbooks used, discovering that little had changed over the decade of the research. Plake and Impara (1996) report on a survey of assessment literacy in the United States, designed to measure teachers’ knowledge of the components of the American Federation of Teachers Standards (1990). Using a 35-item test, the researchers discovered that on average teachers were responding correctly to just 23.2 items, which they argue shows a low level of literacy. Hasselgreen, Carlsen and Helness (2004) and Huhta, Hirvala and Banerjee (2005) conducted a survey at the European level that was designed to uncover the assessment training needs of teachers. The research uncovered the following needs: portfolio assessment, preparing classroom tests, peer and self-assessment, interpreting test results, continuous assessment, giving feedback on work, validity, reliability, statistics, item writing and item statistics, interviewing and rating. There is a key problem with existing studies of assessment literacy. Even when they have focused directly upon teachers, they have used closed-response items that tend to elicit similar responses from all teachers, and do not give them the opportunity to voice needs in their own words.

The current study
Funded by the Leverhulme Trust, a survey instrument was developed in early 2009, designed to overcome some of the methodological problems in previous studies. The questionnaire (see Appendix) was delivered online using Lime Survey software (http://www.limesurvey.org/). Data was collected between June and September 2009, and in total there were 278 responses. Descriptive data showed that these were both geographically and linguistically diverse, representing a reasonable spread of language teachers in North America, Europe, Australia and Asia.
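Question 4 of the instrument (see the Appendix) collects five-point ratings for each topic. As noted in the findings below, this closed-response data proved less informative than the open-ended items, but for concreteness here is a minimal sketch of how such ratings might be tallied; the topics shown and the response values are invented and are not the project’s data.

from statistics import mean

# Invented five-point ratings (5 = essential ... 1 = unimportant) for three Q4 topics.
responses = {
    "Writing test tasks and items": [5, 4, 5, 4, 5],
    "Use of statistics":            [3, 4, 2, 5, 4],
    "Washback on the classroom":    [4, 5, 4, 4, 5],
}

for topic, ratings in responses.items():
    share_important = sum(r >= 4 for r in ratings) / len(ratings)
    print(f"{topic}: mean {mean(ratings):.2f}, "
          f"rated important or essential by {share_important:.0%} of respondents")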

Findings and discussion
Statistical data showed that respondents generally thought that all the listed subjects were important for assessment literacy, and so the statistics are not presented here. The quantitative data does not provide useful information, as also appears to be largely the case with the European study (Hasselgreen et al.). However, the present study included a large number of open-ended items that elicited interesting qualitative data. Questions 2 and 3 were asked before the respondents were exposed to the closed-response options, and so were likely to generate the most open, unaffected responses from teachers. These two questions were designed to get teachers to compare their experience of language testing training with their perception of what they still need in order to effectively use assessment in their current posts. Responses to Question 2 from teachers reflected the flipside of the findings of Brown and Bailey (2008): irrespective of where they are in the world, teachers report having studied the same language testing topics that teacher-trainers report teaching, with a particular emphasis upon critiquing language tests. However, the responses to Question 3 revealed a different perspective. The most frequently mentioned topic was that of statistics. The emphasis, however, was not upon the simple calculation of basic test statistics, but upon developing a conceptual understanding of the statistics: Why are we doing this? What do we assume about the way languages are learned? In short, statistics need to be embedded within a larger narrative that relates them to their historical context and to a philosophy of language and measurement.

Question 5 invited teachers to critique a language testing textbook which they had last used, but the responses are not presented here. Of more importance are the responses to Question 6, which was designed to capitalize on the reflections elicited in Question 5 to discover if teachers had ideas for the content of training materials that were not provided by existing textbooks.



What is required, many argued, is a text that explains that process and how it is followed through in both standardized and classroom assessment contexts. The second area of interest was the social impact of testing, particularly high-stakes testing, which impacts upon the lives of teachers. Many respondents wanted to see a treatment of the politics and economics of testing, particularly a critique of the role of test providers. This was associated with the purpose of testing – why, how and when testing and assessment should (or should not) take place, and the ethical issues surrounding the use of test scores. A historical context was also seen by some as highly relevant to understanding the emergence of many testing practices.

Conclusion
This survey has shown that language teachers are very much aware of a variety of assessment needs that are not currently catered for in existing materials designed to improve assessment literacy. Of particular importance is the finding that so many of the respondents recommended that principles be embedded and elucidated within a procedural approach to dealing with the practical nuts-and-bolts matters of building and delivering language tests and assessments. Furthermore, it was recommended that this procedural approach should treat large-scale and classroom-based assessment in a much more balanced way, avoiding the tendency to focus upon the former. The deliverables from the project are a new text for the teaching of language testing (Fulcher, 2010) and an evolving electronic resource to which it is linked (https://languagetesting.info/).

Acknowledgements
I am grateful to the Leverhulme Trust for the award of a Leverhulme Research Fellowship during 2009 to conduct the research described in this brief report. I am equally grateful to the University of Leicester, which granted me a sabbatical.

Editor's note: in the original publication the author offered to provide references on request.

Appendix
Q1. [Designed to confirm that respondents are, or have been, language teachers]

Q2. When you last studied language testing, which parts of your course did you think were most relevant to your needs?
Q3. Are there any skills that you still need?

Q4. Please look at each of the following topics in language testing. For each one please decide whether you think this is a topic that should be included in a training course on language testing. Indicate your response as follows: 5 = essential, 4 = important, 3 = fairly important, 2 = not very important, 1 = unimportant



• History of language testing 1 2 3 4 5
• Procedures in language test design 1 2 3 4 5
• Deciding what to test 1 2 3 4 5
• Writing test specifications/blueprints 1 2 3 4 5
• Writing test tasks and items 1 2 3 4 5
• Evaluating language tests 1 2 3 4 5
• Interpreting scores 1 2 3 4 5
• Test analysis 1 2 3 4 5
• Selecting tests for your own use 1 2 3 4 5
• Reliability 1 2 3 4 5
• Validation 1 2 3 4 5
• Use of statistics 1 2 3 4 5
• Rating performance tests (speaking/writing) 1 2 3 4 5
• Scoring closed-response items 1 2 3 4 5
• Classroom assessment 1 2 3 4 5
• Large-scale testing 1 2 3 4 5
• Standard setting 1 2 3 4 5
• Preparing learners to take tests 1 2 3 4 5
• Washback on the classroom 1 2 3 4 5
• Test administration 1 2 3 4 5
• Ethical considerations in testing 1 2 3 4 5
• The uses of tests in society 1 2 3 4 5
• Principles of educational measurement 1 2 3 4 5

Q5. Which was the last language testing book you studied or used in class? What did you like about the book? What did you dislike about the book?
Q6. What do you think are essential topics in a book on practical language testing?
Q7. What other features (e.g. glossary/activities etc.) would you most like to see in a book on practical language testing?

Q8. Do you have any other comments that will help me to understand your needs in a book on practical language testing?



Q9. How would you rate your knowledge and understanding of language testing? 5 = very good, 4 = good, 3 = average, 2 = poor, 1 = very poor
Q10. And now, just a few quick questions about you. Are you male or female? Female | Male
Q11. What is your age range?

Under 20

21 - 25

26 - 30

31 - 35

36 - 40

41 - 45

46 - 50

51 - 55

56 - 60

61 - 65

Above 65

Q12. Please select your current educational level.

High School Graduate

BA degree

MA degree

Doctorate

Other

Q13. Which is your home country? Which country do you currently live or study in?
Q14. Which language do you consider your first language? What other languages do you speak?

Thank you for completing this survey.



A cognitive processing approach towards defining reading comprehension
Cyril Weir (†), University of Bedfordshire, UK and Hanan Khalifa, Cambridge Assessment, University of Cambridge, UK
Original publication date: IATEFL TEASIG Newsletter September 2010

Acknowledgment: This is based on an original article published in Cambridge ESOL's Research Notes issue 31, pp 210, available online at https://www.cambridgeenglish.org/Images/23150-research-notes-31.pdf

Introduction
In this article we focus on a cognitive processing approach as a theoretical basis for evaluating the cognitive validity of reading tests. This approach is concerned with the mental processes readers actually use in comprehending texts when engaging in different types of real-life reading. First, however, we briefly review other approaches that have attempted to establish what reading comprehension really involves.

A factorial approach to defining reading comprehension
From the 1960s onwards there has been a strong interest in the issue of the divisibility of reading for testing purposes. In pursuit of this divisibility, hypothesis-testing researchers often adopted a purely quantitative approach to establishing what reading is by a post hoc, factorial analysis of candidate performances in reading tests. This methodology tells us whether the different reading items we have included in our tests load on the same factor. Davis (1968) provides an early example of empirical research into the factors contributing to successful test performance. He employed eight subtests designed to measure distinct operations. When factor analysis was applied, five factors showed appreciable percentages of unique variance and were consistent across the test forms, which led him to argue that "comprehension among mature readers is not a unitary mental operation" (Davis, 1968: 542). The factors were: recalling word meanings, drawing inferences, recognising a writer's purpose/attitude/tone, finding answers to explicit questions, and following the structure of a passage. Using factor analysis on engineers' reading performance, Guthrie and Kirsch (1987) identified two factors: reading to comprehend, which involves reading carefully to understand explicitly stated ideas, was clearly differentiated from tasks involving reading to locate information, which required selective sampling of text (see Weir et al., 2000 for similar findings). However, these findings in favour of the divisibility of the reading construct are not shared by other researchers. Rosenshine's (1980) review of factor analytic empirical studies suggests that different analyses yielded different unique skills. Even though some skills emerged as separate, the results were not consistent across the studies, which led him to conclude that "at this point, there is simply no clear evidence to support the naming of discrete skills in reading comprehension" (Rosenshine, 1980: 552). Schedl et al. (1996) looked at the dimensionality of TOEFL reading items specifically in relation to "reasoning" (analogy, extrapolation, organisation and logic, and author's purpose/attitude) as against other types (primarily items testing vocabulary, syntax and explicitly stated information). Their study did not support the hypothesis that the "reasoning" items measured a separable ability factor.
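For readers unfamiliar with the technique, the sketch below illustrates the general shape of such a post hoc factorial analysis on a candidate-by-item score matrix. It uses synthetic data and scikit-learn's FactorAnalysis purely for illustration; it is not the procedure used in any of the studies cited above.

```python
# Illustrative sketch of the post hoc factorial approach described above,
# applied to synthetic data; not the analysis from any study cited here.
# Rows are candidates, columns are (dichotomously scored) reading-test items.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
scores = rng.integers(0, 2, size=(500, 40)).astype(float)  # 500 candidates, 40 items

fa = FactorAnalysis(n_components=3, random_state=0)
fa.fit(scores)

# Loadings show which items pattern together statistically -- but, as the
# article argues, they carry no conceptual labels in themselves (and on
# random data like this they are, of course, uninterpretable).
loadings = fa.components_.T  # shape: (items, factors)
for item, row in enumerate(loadings[:5]):
    print(f"item {item + 1}: " + " ".join(f"{value:+.2f}" for value in row))
```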



Limitations of the factorial approach
The factorial approach focuses on the separability of the capabilities that a reader is assumed to need in order to tackle certain test items, rather than on the actual processes which a reader might be expected to apply in real-world reading. The concern in this psychometrically driven approach is thus not with the actual components of the reading process that are necessary for comprehension, but with the factors which can be shown statistically to contribute to successful performance. The approach might be described as focusing upon a product in the form of the outcome of a test rather than upon the process which gave rise to it. Many of these post hoc quantitative studies are limited to the extent that they do not test the range of types of reading (careful and expeditious), nor do they consider the need to shape reading to the reader's goals, or the level of cognitive demand imposed on the reader by a particular task. Given the aim of evaluating the cognitive validity of reading tests, an approach premised solely on a post hoc factorial analysis of reading tests seems problematic. Weir (2005: 18) cautions against relying on this procedure for construct validation as "statistical data do not in themselves generate conceptual labels". Field (in preparation) echoes this position, pointing to the "dangers of relying exclusively on an approach that attempts to track back from a product or outcome to the process that gave rise to it." Such analyses by their nature tell us little about what is actually happening when a reader processes text under test conditions. We need to go deeper and examine, as far as is possible, the nature of the reading activities in which we engage during a test in such a way as to enable comparison with activities occurring during non-test reading. We argue below that, in respect of cognitive validity, we need to establish clearly the types of reading we wish to include.

Informed intuition: a subskills approach to defining reading comprehension
The kind of factorial approach described above emerged during a time when the climate of opinion in the methodology of teaching reading strongly favoured what was termed a subskills approach. Like the factorial approach employed by researchers, it assumed that reading might be subdivided into the competencies which the skilled reader is believed to have. In L2 pedagogy, the development of the subskills movement (Grellet, 1987; Munby, 1978; Nuttall, 1996) aimed to break reading down into its constituent competencies. This arose in large part from the need to develop communicatively-oriented pedagogical syllabuses and the need felt by teachers to provide more focused practice in the skill as an alternative to relying on more and more general reading. The approach has mainly been based on informed intuition rather than empirical research, but has been found useful by a generation of teachers. As a result, it has figured prominently in EFL reading materials for teaching purposes and in test specification (see Williams & Moran, 1989). It became accepted pedagogical practice to break the reading process down and to address component skills separately. In the field of testing, the subskills approach has given rise to the notion that it is possible to link particular item or task types to specific subskills that they are said to tap into. A growing body of literature (e.g. Bachman et al., 1988; Lumley, 1993; Weir & Porter, 1994) suggests that it is possible, with clear specification of terms and appropriate methodology, for testers to reach closer agreement on what skills are being tested. Similarly, Alderson (2005: 125-137), in the DIALANG project, noted that individual items are now viewed as testing identifiable skills. However, the value of this subskills approach for testing is contentious. The jury is still out on whether it is possible for expert judges to be convincingly accurate in their predictions about what competencies individual items in a test are assessing. Test developers may be better served if they attempt to design the overall spread of items in a test in such a way as to cover the reading construct that is appropriate to reading purpose and target level of processing difficulty; if they identify which types of reading are most appropriate to different proficiency levels and attempt to ensure that the cognitive processing demands needed to complete such tasks are commensurate with the skilled reading process as evidenced by research in cognitive psychology.



Limitations of the informed intuition approach
Informed intuitive approaches have been helpful in advancing our conceptualisation of what is involved in reading, both for pedagogical and assessment purposes. The problem is that they were more organisationally than theoretically driven; they often only represent the views of expert materials designers as to what is actually being tested in terms of reading types. More importantly, the central role of the test taker has been largely overlooked. So far little reference has been made to the cognitive processing that might be necessary for second language (L2) candidates to achieve the various types of reading initiated by the reading test tasks employed. To clearly establish the trait that has been measured, we need to investigate the processing necessary for task fulfilment, which is the focus of the next section.

A cognitive processing approach to defining reading comprehension
In attempting to understand what is involved in the process of reading comprehension, researchers have proposed various theories and models of reading (e.g. Birch, 2007; Cohen & Upton, 2006; Goodman, 1967; Gough, 1972; Just & Carpenter, 1980; LaBerge & Samuels, 1974; Kintsch & van Dijk, 1978; Perfetti, 1999; Rayner & Pollatsek, 1989). These theorists all recognise the reading process as combining 'bottom-up' visual information with the 'top-down' world knowledge that the reader brings to the task, but they diverge in their accounts of the importance accorded to each and of the ways in which the two sources of information are combined by the reader. In bottom-up processing, linguistic knowledge is employed to build smaller units into larger ones through several levels of processing: the orthographic, phonological, lexical and syntactic features of a text, then sentence meaning, through to a representation of the whole text. In top-down processing, larger units affect the way smaller units are perceived. Sources of information include context, where general and domain-specific knowledge is used to enrich propositional meaning, and/or the developing meaning representation of the text so far, created in the act of reading a text. There are two distinct uses for context: one to enrich propositional meaning extracted from a decoded text, and the other to support decoding where it is inadequate. Stanovich (1980) argues that an interactive compensatory mechanism enables unskilled readers to resort to top-down processing, using contextual clues to compensate for slower lexical access due to inaccurate decoding. He suggests that skilled L1 readers employ context for enriching understanding rather than for supplementing partial or incomplete information, as is the case for the poor reader. Jenkins et al. (2003) note research suggesting that skilled readers rarely depend on top-down prediction to identify words in context because their rapid word identification skills outstrip the rather slower, hypothesis-forming top-down processes. The opposite is true for less-skilled readers, as their bottom-up processing of print is slower than top-down word prediction processes. The currently accepted view, however, is that we process at different levels simultaneously and draw on both bottom-up and top-down processes in establishing meaning. The cognitive validity of a reading task is a measure of how closely it elicits the cognitive processing involved in contexts beyond the test itself, i.e. in performing reading tasks in real life. We have drawn on the work of authors working within the field of cognitive psychology in order to devise a model of the L1 reading process – supported by empirical evidence – which can be treated as the goal towards which the L2 reader aspires. There will of course be some individual variation in cognitive processing, but we need to consider whether there are any generic processes that we would want to sample in our reading tests which would bring the process of comprehending in a test closer to that in real life. The generic cognitive processes contributing to reading that we have identified from the literature are represented in Figure 1 and explained in the subsequent text.



Figure 1: A model of reading.



In discussing these cognitive processes, we will start with a brief description of the metacognitive activity of a goal setter (see left-hand column) because, in deciding what type of reading to employ when faced with a text, critical decisions are taken on the level(s) of processing to be activated in the central core of our model. The various elements of this processing core (see middle column) which might be initiated by decisions taken in the goal setter are then described individually. A discussion of the monitor then follows, as this can be applied to each of the levels of processing that is activated in response to the goal setter's instructions. We then return to discuss in more detail the types of reading we have listed under the goal setter and relate them to appropriate elements from the central processing core.

The goal setter
The goal setter is critical in that the decisions taken on the purpose for the reading activity will determine the relative importance of some of the processes in the central core of the model. Urquhart and Weir (1998) provide an overview of the goals that are open to the reader and characterise reading as being either careful or expeditious and as taking place at the local or global level. Global comprehension refers to the understanding of propositions beyond the level of microstructure, that is, any macro-propositions, including main ideas, the links between those macro-propositions, and the way in which the micro-propositions elaborate upon them. At the macro-structure level of the text, the main concern is with the relationships between ideas represented in complexes of propositions, which tend to be logical or rhetorical. Individual components of these complexes are often marked out by the writer through the use of paragraphs. This kind of process is important in careful global reading operations, where the reader is trying to identify the main idea(s) by establishing the macro-structure of a text. It is also related to search reading, where the reader is normally trying to identify macro-propositions, but through short cuts due to time pressure. Global comprehension is also related to the top structure level of the text, where the reader, through skimming, is trying to establish the macro-structure and the discourse topic, and, in careful global reading, to determine how the ideas in the whole text relate to each other and to the author's purpose. Local comprehension refers to the understanding of propositions at the level of microstructure, i.e. the sentence and the clause. Cohen and Upton (2006: 17) suggest that local comprehension is strongly associated with linguistic knowledge. Alderson (2000: 87) makes the connection between local comprehension and test items which focus on understanding explicit information. In textually explicit questions, the information used in the question and the information required for the answer are usually in the same sentence. In our model above, local comprehension sits at the levels of decoding (word recognition, lexical access and syntactic parsing) and establishing propositional meaning at the sentence and clause level. Careful reading is intended to extract complete meanings from presented material. This can take place at a local or a global level, i.e. within or beyond the sentence, right up to the level of the complete text. The approach to reading is based on slow, careful, linear, incremental reading for comprehension. It should be noted that models of reading have usually been developed with careful reading in mind and have little to tell us about how skilled readers cope with other reading behaviours such as skimming for gist (Rayner & Pollatsek, 1989: 477-478). Expeditious reading involves quick, selective and efficient reading to access desired information in a text. It includes skimming, scanning and search reading. Skimming is generally defined as reading to obtain the gist, general impression and/or superordinate main idea of a text; accordingly, it takes place at the global text level. Scanning involves reading selectively, at the local word level, to achieve very specific reading goals, e.g. looking for specific items in an index. Search reading, however, can take place at both the local and global levels.
Where the desired information can be located within a single sentence, it would be classified as local, and where information has to be put together across sentences, it would be seen as global. In both cases the search is for words in the same semantic field as the target information, unlike scanning, where exact word matches are sought.
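The classification just outlined can be summarised compactly. The sketch below simply restates the Urquhart and Weir (1998) distinctions as a small lookup table; it is an organisational aid for the reader, not part of the model itself.

```python
# A compact restatement of the careful/expeditious and local/global
# classification of reading types discussed above. Purely a summary device.
READING_TYPES = {
    "scanning":        {"speed": "expeditious", "level": "local"},
    "search reading":  {"speed": "expeditious", "level": "local or global"},
    "skimming":        {"speed": "expeditious", "level": "global"},
    "careful reading": {"speed": "careful",     "level": "local or global"},
}

for name, dims in READING_TYPES.items():
    print(f"{name:15s} -> {dims['speed']:12s} | {dims['level']}")
```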



Once we have discussed the central processing core of the model from the bottom level upwards, we will return to these purposes for reading in order to examine the relationships between the intended purpose and the processing activity it elicits in this central core.

Central processing core
The processes described here attempt to characterise the reading behaviours available to the competent L1 reader, which the L2 reader might be expected to approximate to progressively as their proficiency level in L2 improves. The knowledge base on the right-hand side of the model is drawn upon by the central processing core in line with the intended purpose and the performance conditions established by the task.

Word recognition
Word recognition is concerned with matching the form of a word in a written text with a mental representation of the orthographic forms of the language. In the case of the less experienced L2 reader, the matching process is complicated by a more limited sight vocabulary in the target language, and by the fact that the reader does not make the kind of automatic connection between written word and mental representation that an experienced reader would. Field (2004: 234) cites Coltheart's (1978) dual route model of decoding, which suggests that we process written words in two ways. A lexical route enables us to match whole words, while a sub-lexical route permits us to identify words by means of grapheme-phoneme correspondence. All languages appear to use both routes. The problem for the L2 reader of English is that it is much more difficult to match an unfamiliar written form to a known spoken one by means of the sub-lexical route, or to internalise the spoken forms of written words. Much of the matching during the acquisition of L1 reading skills in English relies quite heavily on analogies between words with similar written forms (light – fight – right). L2 learners, with limited vocabulary and less automatic pattern recognition, are less able to apply these analogies. The opaque orthography of English may result in greater dependence on the lexical route and thereby increase the difficulty when unskilled L2 readers meet words in a text which they have never encountered before in a written form. This may mean that test developers need to ensure that at lower levels of proficiency the number of unknown words in a text is controlled and that the texts these candidates are exposed to are shorter than those for skilled readers. L2 readers with L1 language backgrounds in which the orthographies are very dissimilar to that of English, e.g. in the script or direction of reading, will face additional problems at the decoding level (see Birch, 2007). Jenkins et al. (2003) note that less skilled readers are constrained by inefficient word recognition, which requires attentional resources and uses up available working memory capacity that might otherwise be used for comprehension. In the skilled reader, efficient word recognition frees up attentional resources, thereby increasing the capacity in working memory available for more complex operations. Accuracy and automaticity of word recognition is critical for the skilled reader (see Grabe, 2004; Perfetti, 1997; Wagner & Stanovich, 1996). Automaticity is the result of increasing experience in decoding and of the mind's orientation towards creating processes which are undemanding upon attention. Those readers who can decode accurately and automatically will backtrack less often and have more attentional capacity available in working memory for comprehension,
e.g. establishing propositional meaning, inferencing and building a mental model, and integration of information across sentences.

Lexical access
Field (2004: 151) describes this as the "retrieval of a lexical entry from the lexicon, containing stored information about a word's form and its meaning". The form includes orthographic and phonological mental representations of a lexical item and possibly information on its morphology. The lemma (the meaning-related part of the lexical entry) includes information on word class and the syntactic structures in which the item can appear, and on the range of possible senses for the word. The orthographic form plays a part in what was described in the previous section as word recognition.



Some accounts describe sets of visually similar words in the reader's mental vocabulary as being in competition with each other. Individual words are activated in relation to the extent to which they do or do not resemble a target word on the page. Finally, a point is reached where one word accumulates so much evidence that it is selected as the correct match. Frequent words appear to be identified more quickly than infrequent ones because, according to serial models of lexical access, words are stored on this principle. Other theories, such as parallel access, suggest that words are activated in accordance with their frequency and the closest match to context (Field, 2004: 117, 151). This suggests that test developers need to ensure that there is a suitable progression in terms of lexis, from frequent words to those with less frequent coverage, as one moves up the levels of proficiency in L2 reading examinations.

Syntactic parsing
Fluency in syntactic parsing is regarded as important in the comprehension process by a number of authorities (Perfetti, 1997). Once the meaning of words is accessed, the reader has to group words into phrases, and into larger units at the clause and sentence level, to understand the message of the text. Cromer (1970) illustrates the importance of competence in the syntax of the target language for deriving meaning from text. He demonstrates that good comprehenders use sentence structure as well as word identification to comprehend text (see also Haynes & Carr, 1990). It is therefore important that test developers ensure that the syntactic categories appearing in texts employed at each proficiency level are appropriate to the candidate's level of development.

Establishing propositional (core) meaning at the clause or sentence level
Propositional meaning is a literal interpretation of what is on the page. The reader has to add external knowledge to it to turn it into a message that relates to the context in which it occurred.

Inferencing
Inferencing is necessary so that the reader can go beyond explicitly stated ideas, as the links between ideas in a passage are often left implicit (Oakhill & Garnham, 1988: 22). Inferencing in this sense is a creative process whereby the brain adds information which is not stated in a text in order to impose coherence. A text cannot include all the information that is necessary in order to make sense of it. Texts usually leave out knowledge that readers can be trusted to add for themselves. Problems may of course arise where the assumed knowledge relates to that of the L1 host culture and such inferences are not possible for the L2 learner who lacks this knowledge. Inferencing may also take place at word level, when a word is referring to an entity, as in the case of pronouns, or is ambiguous in its context, or is a homograph. It may also involve guessing the meaning of unknown words in context. Hughes (1993) argues that we should replicate real-life processes and attempt to sample all types of inferencing ability in our tests, with the caveat of being able to select texts which are close to the background and experience of the candidature. However, he admits that pragmatic inferencing questions (where the reader not only makes use of information in the text, but also refers to their own world knowledge) are problematic where candidates have very different knowledge, experience and opinions. Pragmatic evaluative inferences are particularly difficult to include in tests because of the marking problems associated with potential variability in answers.
Even though the evidence available to readers from the text is given, they will come to it with different perspectives and expectations (see Chikalanga, 1991 & 1992). Test developers need to be conscious of this at the item-writing stage to avoid penalising candidates who may lack particular world knowledge.



Building a mental model
Field (2004: 241) notes that "incoming information has to be related to what has gone before, so as to ensure that it contributes to the developing representation of the text in a way that is consistent, meaningful and relevant. This process entails an ability to identify main ideas, to relate them to previous ideas, distinguish between major and minor propositions and to impose a hierarchical structure on the information in the text." Ongoing meaning representation is provisional and liable to revision, as well as to updating with new information from the text. Selection may occur whereby stored information is reduced to what is relevant or important. According to Kintsch and van Dijk (1978: 374), the propositions representing the meaning of a text are linked together, usually by argument overlap, to form a hierarchical text base. Microstructures are processed, converted into semantic propositions and stored in working memory, while the cohesion between them is established. As the process moves on, a macro-structure is built up. Background knowledge, stored in long-term memory, is utilised to supply an appropriate schema for the macro-structure, as well as to aid coherence detection in the construction of the micro-structure. Crucial information tends to be at the top levels of this hierarchy, while detailed information is at the lower levels. As we discuss below, while building a mental model there is a need to monitor comprehension to check the viability of the ongoing interpretation. Monitoring chiefly checks the consistency of incoming information against the meaning representation established so far. If the two conflict, the reader regresses to check. This type of monitoring is especially absent in weaker readers. World knowledge in the form of schemata in long-term memory plays an important part in judging the coherence and consistency of what has been understood when it is integrated into the ongoing meaning representation.

Creating a text-level structure
At a final stage of processing, a discourse-level structure is created for the text as a whole. The skilled reader is able to recognise the hierarchical structure of the whole text and determines which items of information are central to its meaning. The skilled reader determines how the different parts of the text fit together and which parts of the text are important to the writer or to the reader's purpose. The development of an accurate and reasonably complete text model of comprehension would therefore seem to involve understanding of discourse structure and the ability to identify macro-level relationships between ideas. It also involves understanding which propositions are central to the goals of the text and which are of secondary importance.

The monitor
Thus far we have looked at each of the levels of processing that may be brought into play as a result of metacognitive decisions taken at the goal-setting stage. A further metacognitive activity may take place after activation of each level of the processing core: test takers are likely to check the effectiveness of their understanding (Sticht & James, 1984). The monitor is the mechanism that provides the reader with feedback about the success of the particular reading process. Self-monitoring is a complex operation which may occur at different stages of the reading process and may relate to different levels of analysis. In decoding text, monitoring involves checking word recognition, lexical access and syntactic parsing.
Within meaning building, it can involve determining the success with which the reader can extract the writer's intentions or the argument structure of the text. Researchers like Perfetti (1999) or Oakhill and Garnham (1988) have argued that the unskilled L1 reader often fails to monitor comprehension, or at least makes less use of monitoring strategies, particularly at the comprehension level. Studies have also shown that one of the hallmarks of a good reader is the ability to check the meaning representation for consistency. Skilled readers, on failing to understand a part of a text, will take action such as rereading to deal with the problem (see Hyona & Nurminen, 2006).



The components of goal setter and monitor can be viewed as metacognitive mechanisms that mediate among the different processing skills and knowledge sources available to a reader. Urquhart and Weir (1998) provide detailed explanations of how these metacognitive mechanisms enable a reader to activate different levels of strategies and skills to cope with different reading purposes. The reader may choose to skim, search read, scan or read carefully in response to the perceived demands of the test task. The level of processing required by the activity will also relate closely to the demands set by the test task.

Relating reading types to the central processing core
The goal setter part of the model is critical in that the decision taken on the purpose for the reading activity will determine the processing that is activated, dependent, of course, on the limitations imposed by the L2 reader's linguistic and pragmatic knowledge, and the extent of the reader's strategic competence. Rothkopf (1982) illustrates how the purpose for reading a text determines what and how much the reader takes away from it. Once readers have a clear idea of what they will be reading for, they can choose the most appropriate process(es) for extracting the information they need from the text (see Pressley and Afflerbach, 1995 for a comprehensive review of planning processes). The goal setter determines the overall goal of the reading, and also selects the type of reading which is likely to achieve that goal. Below we describe the three expeditious reading types – skimming, search reading and scanning – and then careful reading.

Expeditious reading: skimming
For Urquhart and Weir (1998) the defining characteristics of skimming are that the reading is selective and that an attempt is made to build up a macro-structure (the gist) on the basis of as few details from the text as possible. Skimming is selective: depending on how much information readers decide to process, they may access words and possibly process entire sentences. The reader will allocate his attention, focusing full attention on propositions that seem to be macro-propositional and reducing attention on others. He uses knowledge of text and genre which indicates likely positions for macro-propositions, e.g. the first sentence of a paragraph. Presumably, the monitor checks whether the material surveyed is appropriate or not; in this case, the amount processed may be quite substantial. The reader will pause at appropriate points to semantically process words, phrases and clauses. If skimming is equivalent to gist extraction, then presumably propositions are committed to long-term memory on the hypothesis that they represent the macro-structure. That is, the process of debating whether a proposition is part of the macro-structure or not, which is assumed to take place during careful reading, is here replaced by a guess that is usually supported by general knowledge of the world or domain knowledge. The reader is trying to build up a macro-structure of the whole text (the gist) based on careful reading of as little of the text as possible. This is why skimming does not lend itself to the construction of numerous test items. A study of samples of EAP reading tests such as TEEP or IELTS (see Weir et al., 2000) reveals that skimming rarely features in items in those tests and, when it does, it is realised in only a single item asking a question such as 'What is the main idea of this passage?' (see TOEFL, 1991 and UETESOL, 1996).
Skimming requires the creation of a skeletal text-level structure and, in particular, a decision as to the superordinate macro-proposition (Kintsch & van Dijk, 1978) that encapsulates the meaning of a text. However, because of the rapid and selective nature of the processing involved, it is unlikely to result in a detailed meaning representation of the whole text, a meaning representation that includes the relationships between all the macro-propositions and their relative importance. To arrive at a comprehensive and accurate text-level structure, careful, rather than merely expeditious, global reading is necessary. One can locate macro-propositions in two ways:
• by selective eye movements which attempt to locate sentences within the text stating the major issues, and
• by evaluating propositions as they are read, in order to identify those which start a new meaning structure rather than those that are subservient to or supportive of an ongoing structure.
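As a toy illustration of the heuristic mentioned above – that skimmers exploit likely positions of macro-propositions such as the first sentence of a paragraph – the sketch below samples only those positions. It is a crude surface heuristic for exposition, not a model of the cognitive process itself.

```python
# Toy heuristic only: it illustrates the idea that a skimmer samples likely
# macro-propositional positions (here, the first sentence of each paragraph)
# rather than processing the whole text.
def skim(text: str) -> list[str]:
    """Return one candidate macro-proposition per paragraph."""
    candidates = []
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        first_sentence = paragraph.split(". ")[0]
        candidates.append(first_sentence)
    return candidates

sample = "Testing matters. It affects lives.\n\nValidity is central. So is reliability."
print(skim(sample))  # ['Testing matters', 'Validity is central']
```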



Search reading
In search reading, the reader is sampling the text – which can be words, topic sentences or important paragraphs – to extract information on a predetermined topic. The reader may draw on formal knowledge of text structure to assist in this search for information on pre-specified macro-propositions (see Trabasso & Bouchard, 2002; Urquhart & Weir, 1998). Pugh (1978: 53) states that in search reading:

the reader is attempting to locate information on a topic when he is not certain of the precise form in which the information will appear... the reader is not pursuing a simple visual matching task (as in scanning), but rather needs to remain alert to various words in a similar semantic field to the topic in which he is interested. It is true that the visual activity involved is similar to scanning in many ways. However, the periods of close attention to the text tend to be more frequent and of longer duration and, since information is more deeply embedded in the text, there is more observance of the way in which the author structures his subject matter and, hence, of the linearity and sequencing.

Information about the structure of the text may be used to assist in the search. For Urquhart and Weir (1998) search reading involves locating information on predetermined topics, so the reader does not have to establish an overall representation of the whole of the text, as in skimming. The reader wants only the relevant information necessary to answer the set questions on a text. In cognitive terms, it represents a shift from generalised attention to more focused attention. The start of the process is to look for related vocabulary in the semantic field indicated by the task or item. Once the required information to answer a question has been quickly and selectively located, careful reading will take over; this may involve establishing propositional meaning at the sentence level and enriching propositions through inferencing, and it may require the reader to integrate information across sentences. In the test situation the wording of the questions does not usually allow the candidate simply to match question prompts to text, and so lexical access is more demanding than in scanning tasks. Search reading involves the different aspects of meaning construction up to and including the level of building a mental model, but it does not require the creation of a text-level structure. The relative importance of the information in the text (micro- versus macro-proposition) is not an issue: all that matters is that the information has a bearing on the knowledge that is sought.

Scanning
Scanning involves reading selectively to achieve very specific reading goals. It may involve looking for specific words or phrases, figures/percentages, names, dates, or other specific items at the local word level. It is a perceptual recognition process which is form-based and relies on accurate decoding of a word or string of words. Rosenshine (1980) defines it as involving recognition and matching. The main feature of scanning is that any part of the text which does not contain the pre-selected word, symbol or group of words is passed over. A low level of attention is accorded until a match or approximate match is made. The reader will not necessarily observe the author's sequencing by following the text in a linear way. Here, very few components of our model are involved. Suppose, at the lowest level, the goal has been set as scanning a text to find a reference to a particular author. In fact, it is arguable that only a limited amount of lexical access is required.
Presumably little or no syntactic processing needs to be involved, no checking of coherence, and no attempt to build a macro-structure. There is usually no need to complete the reading of the sentence, or to integrate the word into the structure of preceding text. As a result, scanning involves none of the different aspects of meaning-building that we have identified in our model.
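The contrast drawn here between scanning and search reading can be made concrete in a toy sketch: scanning as exact, form-based matching, search reading as matching against a semantic field. The word list and semantic field below are invented for illustration only.

```python
# Toy contrast between the two expeditious strategies characterised above:
# scanning matches an exact form, while search reading stays alert to any
# word in the relevant semantic field. The example data is hypothetical.
def scan(words: list[str], target: str) -> list[int]:
    """Form-based matching: positions where the exact target form occurs."""
    return [i for i, w in enumerate(words) if w.lower() == target.lower()]

def search_read(words: list[str], semantic_field: set[str]) -> list[int]:
    """Semantic-field matching: positions of any word related to the topic."""
    field = {w.lower() for w in semantic_field}
    return [i for i, w in enumerate(words) if w.lower() in field]

tokens = "Exam fees rose sharply while test results improved".split()
print(scan(tokens, "test"))                                 # [5]
print(search_read(tokens, {"exam", "test", "assessment"}))  # [0, 5]
```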



Careful reading
We will focus here on the different aspects of meaning-building upon which this reading type depends, and will distinguish between processing that takes place at the local and at the global level. Careful local reading involves processing at the decoding level until the basic meaning of a proposition is established. Some local inferencing might be required to build a mental model at the enriched sentence level. However, it does not entail integrating each new piece of local information into a larger meaning representation. The defining features of careful global reading are that the reader attempts to handle the majority of information in the text; the reader accepts the writer's organisation and attempts to build up a macro-structure on the basis of the majority of the information received. Careful global reading draws upon all the components of the model. The reader decides to read the text with a relatively high level of attention as, for example, for the study of a core textbook at undergraduate level. The goal setter sets this attention level not just for the reading operation, but also for the monitoring that accompanies it. The reader would normally begin at the beginning of the text and continue through to the end, employing the processes detailed in the central core of the model above: integrating new information into a mental model and perhaps finally creating a discourse-level structure for the text where appropriate to the reader's purpose. A more demanding level of processing in careful reading would be required when establishing how ideas and details relate to each other in a whole text. The reader not only has to understand the macro- and micro-propositions, but also how they are interconnected. This will require close and careful reading, and perhaps even a rereading of the whole text, or at least of those parts of it relevant to the purpose in hand. Most likely, all of the processing components listed in the central core of our model above will be required in this 'reading to learn' activity, where there is new as well as given information to be understood. Cohen and Upton (2006: 17) describe this 'reading to learn' process. With reference to the new TOEFL iBT, they state that:

…according to the task specifications (ETS 2003) reading to learn is seen as requiring additional abilities beyond those required for basic comprehension. Reading to learn questions assess specific abilities that contribute to learning, including the ability to recognise the organisation and purpose of a text, to distinguish major from minor ideas and essential from non-essential information, to conceptualise and organise text information into a mental framework, and to understand rhetorical functions such as cause-effect relationships, compare-contrast relationships, arguments, and so on…

In the real world, the reader sometimes has to combine and collate macro-propositional information from more than one text. The likelihood is that the process would be similar to that for a single-text representation model, but that, after reading one text, the knowledge (and perhaps linguistic) base will have been expanded as a result of the final meaning representation of the text being stored in long-term memory. The need to combine rhetorical and contextual information across texts would seem to place the greatest demands on processing (Enright et al., 2000: 4-7).
Perfetti (1997: 346) argues that this purpose requires an integration of information in a text model with that in a situation model in what he terms a documents model, which consists of "An Intertext Model that links texts in terms of their rhetorical relations to each other and a Situations Model that represents situations described in one or more texts with links to the texts". This would require more demanding processing than the other reading activities described above, including a greater level of global inferencing and text-level structure building, and perhaps necessitating regression across whole texts. Leaving aside for the moment the cognitive load imposed by the complexity of the text employed in the test, one might argue that difficulty in processing is in large part a function of how many levels of processing in our model are required by a particular type of reading. This is an issue on which we do not have empirical evidence.



However, we might hypothesise that, mutatis mutandis, the following order of difficulty might well obtain across reading types. Starting with the easiest and ending with the most difficult, our best guess would be:
1. Scanning/searching for local information
2. Careful local reading
3. Skimming for gist
4. Careful global reading for comprehending main idea(s)
5. Search reading for global information
6. Careful global reading to comprehend a text
7. Careful global reading to comprehend texts.
As with most scales, we can be reasonably confident of the positioning at the two extremes (2 is more difficult than 1, and 7 more difficult than 6 in the scale above). The middle three types of reading (3, 4 and 5) are a closer call, and it is likely that attention to contextual parameters might be necessary to establish clear water between these levels. Ashton (2003), using six subtests of CAE Reading, demonstrated that items 6 and 7 on the scale above were consistently more challenging than item 5. Rose (2006) conducted research during the review of FCE and CAE into the amount of time needed for candidates to complete a careful global reading multiple-choice (MC) item and an expeditious local multiple-matching item. She found that a careful reading MC item needed more time to answer than an expeditious multiple-matching item, thus indicating that it was worth more marks.
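One rough way to make the hypothesis explicit is to encode, for each reading type, the processing levels it is said to engage and to order the types by that count. The mapping below is one possible gloss of the discussion in this article, not empirical data, and it only loosely reproduces the seven-point scale above (it collapses, for example, the distinction between careful global reading of one text and of several).

```python
# A rough gloss of the hypothesis above: difficulty rises with the number of
# processing levels a reading type engages. The selections are one reading
# of the discussion in this article, not an empirical or definitive mapping.
CORE_LEVELS = [
    "word recognition", "lexical access", "syntactic parsing",
    "establishing propositional meaning", "inferencing",
    "building a mental model", "creating a text-level structure",
]

ENGAGED = {
    "scanning":               CORE_LEVELS[:2],                     # decoding only, and only partially
    "careful local reading":  CORE_LEVELS[:5],                     # up to enriched sentence meaning
    "skimming for gist":      CORE_LEVELS[:5] + [CORE_LEVELS[6]],  # plus a skeletal text-level structure
    "search reading":         CORE_LEVELS[:6],                     # up to the mental model
    "careful global reading": CORE_LEVELS,                         # all components of the model
}

for name, levels in sorted(ENGAGED.items(), key=lambda kv: len(kv[1])):
    print(f"{name:24s} engages {len(levels)} of {len(CORE_LEVELS)} levels")
```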

Conclusion
In this article we have presented and argued for a cognitive processing approach as the most tenable and productive theoretical basis for establishing what reading comprehension really involves. This is especially so given the discussed limitations of the factorial approach tradition and those of the reading subskills approach – an approach based largely on 'expert judgement' that takes little account of the cognitive processes test takers actually employ. We hope that this model and account of cognitive processing will provide a useful basis for establishing the cognitive validity of reading tests, i.e. the extent to which the tasks that test developers employ will elicit the cognitive processing involved in reading contexts beyond the test itself.

(† Deceased September 2018)
Editor's notes: This article was reproduced by permission of Shigeko Weir and the co-author, Hanan Khalifa. In the original publication the authors offered to provide references and further reading on request.



2011



Is language testing a profession?
Harold Ormsby L., National Autonomous University of Mexico, Mexico
Original publication date: IATEFL TEASIG Newsletter March 2011

This article is based on a contribution on 15 April 2010 to a discussion on LTEST-L¹ and is a response to the view that 'Weak professions, such as language testing, do not enjoy the legal backing that strong professions have'. That means that anyone can claim to be a 'professional language tester'. While I agree with this, I don't think it's the most important point. Each profession is unique – that's one element that makes it a profession; others are having the group's own gathering contexts (colleges, associations, virtual spaces etc.) and a bibliography. But that means that we need to pay close attention to language testing's uniqueness before we go around looking for models in other professions. (I'm going to ignore the fact that language testing has lots in common with educational and psychological testing – that can be looked at on another occasion.)

There are three social groups involved in language testing: users (including meta-users, i.e. those who order someone else to use test scores to make decisions), test-makers (a mixed bag), and test-takers.

1. Users
Users provide the justification, motivation and often (some of) the funding for making a test. Importantly, a user defines the use(s) to which a test's scores and their interpretations are to be put. (Indeed, sometimes test-makers put together a product to see if users can be found, but that has to be based on a stereotypified user, so there's no contradiction with the foregoing.)

2. Test-makers
I'll get back to us in a bit.

3. Test-takers
If a user decides not to pay for every aspect of a test forever, the user retains the power to make test-takers pay. It is possible that a test's results might benefit test-takers somehow, but again that is largely a user's affirmation, not something anyone else actually determines empirically.

Of course, one of language testing's unique characteristics is that anyone can be a test user. A concept that might be similar to our 'user' in medicine and law has to be defined very differently; civil engineering has something like usership but, again, the differences are significant. The point is that it's silly to postulate the creation of the Universal Association of Test Users to which all new-borns in the world shall be subscribed 'just in case'. It's no less silly to postulate a universal association of test-takers. Anyway, this leaves us with test-makers. Unless they resort to the fantasized user to motivate a new product, test-makers just sit around looking for users. Comparisons with many street corners around the globe and certain shop windows in Amsterdam are neither inaccurate nor unkind. But once the longed-for user comes through the door, a mighty team of test-making workers has to be pulled together. Problems from the point of view of regulating and policing can be seen as follows:



1. Very few people are actually full-time test-makers for their entire working life. While one could propose something like actors' unions, where dues only come out of real pay checks, even that seems like a very unsatisfactory model for a test-makers' association. (For one thing, it would involve a binding contract with users and meta-users – see above.)
2. Test-makers are a mixed bag of specialties (not unlike the movie industry). An association that requires certain qualifications for membership would undoubtedly have to have sections for specialties and for languages (there are potentially thousands). It would be an unimaginably huge, complex and expensive bureaucracy.
3. Being hemmed in by predetermined professional qualifications is bothersome to many users (and all meta-users); because their function in the game thus empowers them, they can freely go out to any street corner and pick up unemployed people to be cheap test-making team members.

However, it is, I think, very much to all test-makers' benefit, one way or another, that their profession not be comparable to a cross between a 19th-century Wild West show and a cesspool, as it currently is. Undoubtedly, the Codes of Ethics and Practice (e.g. the ILTA code, see https://www.iltaonline.com/page/CodeofEthics) have been very worthwhile, a good beginning. Nonetheless, as I think back on the processes of writing the ones that I took part in, I think we all had something like the Hippocratic Oath (HO) in mind. (Indeed, it may have been mentioned in discussions.) The problem with the model of the HO is that it lays obligations only on those who choose to take the oath – and, by the way, it excludes certain practitioners of the medical arts, midwives and battlefield surgeons among them – but it in no way serves to 'clean up' the ancient medical industry. It expressly enjoins against fraud (e.g. charging, but not giving service) and creepiness (messing with a householder's children, servants and women), which were probably fairly common among non-oath-takers in those times, but it cannot exclude those who do those things from the broader industry. Yes, the Codes are directed at 'language testing practitioners', who are all individuals, but not all those individuals are users, who are the most powerful players in the game. Users can and, I'm sure, do override Codes with impunity, if not also gay abandon. And, in the end, there's really nothing test-making team members can do about it because they want to keep getting paid (or, worse, because they've signed contracts that prevent them from leaving in a huff). And so, I propose that we (the profession) need to regulate and police projects, not tests. At the project level, when all the players are in place and either getting ready to do their thing or actually doing it, the quality of test-making activities, products and, indeed, social outcomes can be investigated, evaluated and commented on independently and on an ongoing basis. Positive comments are good; negative ones are undesirable; transparency has to be the watchword. While some sort of (international) association of test-makers (all specialties, languages etc.) is needed to do this, there's no advantage to making membership qualifications-based – it should be open to all who want to enter and pay (reasonable, low) dues. Who is chosen to do the investigating, evaluating and commenting about any given project will depend on the nature of the project in question, especially on what language the test(s) is (are) (going to be) in.
Who does the choosing can be determined by an association's membership in normal, qualifications-based ways – the fact that an association is open to all doesn't mean that it can't respect and honour professional qualifications, and give those who have them power over those who don't (a relative status, not to be discussed here).



While no user can be required to request investigation, evaluation and commenting, test-makers can at least make an attempt at convincing them to do so. Creating a class of test-makers who have investigated, evaluated and commented on other colleagues' projects will be a plus for the profession. Let me add here that, personally, I have no interest whatsoever in seeing any international test-makers' association pay, necessarily, any particular attention to the doings of The Great Test Factories, especially in EFL but also in other (kinds of) language(s). Some of these places do what seems to be an excellent job; others have manoeuvred themselves into positions where the power structure they work in uncritically avoids defining their products as low-quality. If they want to have their projects investigated, evaluated and commented on, they certainly have the wherewithal to pay for it, so too much thinking about them is wasted time. The projects that concern me, personally and professionally, are the thousands or even tens of thousands of small, medium and sometimes large testing projects in all sorts of languages (L1, L2, FL etc.) that are done largely in isolation, under users (or meta-users) who have no visible interest in quality, and whose test-makers have no practical way of getting (professionally powerful) help from a counterweight to these users. These are the colleagues from, as I've often called it, the swamp – many, in the end, victims of the language testing profession's neglect. These kinds of projects' needs prompted me to propose support for funding from a consortium of foundations, which would be the topic for another piece. So, I believe that the overall situation tells us that something must be done. I believe that practical ways can be found to do it, and that those ways can reflect language testing's uniqueness. We are, I think, left with long-ago Rabbi Hillel's ever-haunting question, "And if not now, when?"

1 A brief description of LTEST-L can be found at http://languagetesting.info/ltest-l.html.



The testing of intercultural competence: seven theses
Rudi Camerer, elc-European Language Competence, Frankfurt, Germany
Original publication date: IATEFL TEASIG Newsletter August 2011

The following text was submitted to a meeting of the SIETAR Regional Group Rhein-Main, Germany and was intended to trigger a discussion about the compilation of 53 tests of intercultural competence as currently listed on the SIETAR-Europe website. SIETAR is the Society for Intercultural Education, Training and Research and operates globally. It is important to note that there is no perceivable tradition within SIETAR of seeing intercultural competence as a communicative competence, i.e. connecting it with the active use of a (native or foreign) language. The methods applied in testing intercultural competence therefore build on psychological constructs rather than on practical communicative performance. It is the purely psychological approach to the testing of intercultural competence which I wanted to address in this discussion.

1. Tests may be meaningful, useful and helpful procedures, provided they comply with basic quality criteria, adhere to principles of fairness, and do not serve unethical purposes. The quality criteria at issue for tests are validity, reliability and objectivity.

Testing is a universal feature of social life. Throughout history people have been put to the test to prove their capabilities or to establish their credentials … Tests to see how a person performs, particularly in relation to a threshold of performance, have become important social institutions… (McNamara, 2000)

2. A variety of test types is currently available (diagnostic, personality, proficiency, placement, achievement, progress tests …), the significance of each of which relates to the context of development and use. The same quality criteria are applicable to all test types.

Getting it right, ensuring test fairness, is a necessity not an ideal for testing. In developing assessment tools a decision must be taken on what is criterial in the particular domain under review, and this decision and the test measures used for operationalizing it must be ethically defensible. Test developers must be made accountable for their products. (Weir, 2005)

3. Personality tests and performance tests are two basic test types, often known as 'psychometric tests'. There is, however, an important distinction, which is that personality tests are based on self-evaluation procedures, whereas performance tests require the observation and standardised rating of the active performance of one or more candidates.

The use of psychometric testing in selection is now well-established, and it can be used to provide objective information about different areas for candidates' skills, for example the extent of their knowledge, motivations, personality and potential. (Carter & Russell, 2001)



4. Some authors of personality tests avoid using the word 'test' and prefer terms such as assessment scale, competence scale, sensitivity scale, profile, inventory, indicator etc. Regardless of the names used, the procedures are frequently employed for personnel selection and are seen as tests by HR managers as well as many other users.

cf. SIETAR Europa Online Documentation Centre 3.3 Assessments & instruments, http://www.sietareuropa.org/SIETARproject/Assessments&instruments.html#Topic26 (accessed 1 August 2011)

5. The number of personality tests based on self-assessment which are available is astounding. A list published annually by the University of Trier names no less than 6220 psychometric tests of this type. However, it is the case for all of them that their validity is difficult to establish. So far, no empirical proof exists that there is a significant relationship between a candidate's test results and his/her performance outside the test situation.

There is considerable evidence to suggest that when predictive validation studies are conducted with actual job applicants where independent criterion measures are collected, observed (uncorrected) validity is very low and often close to zero. This is a consistent and uncontroversial conclusion. (Morgeson et al., 2007)

6. Flexibility, Emphatic communication, Emotional stability, Tolerance of ambiguity, Personal autonomy, Emotional intelligence, Reflected awareness, Inner-referenced vs. outer-referenced, Adaptation, Integration… Based on criteria like these, established test procedures in the intercultural training field claim to identify the degree of intercultural competence acquired. Not one of these criteria can be regarded as being supported by academic consensus. In other words, the construct validity of each of the terms is weak. So far, no empirical proof exists that a significant relationship exists between a candidate's test results and his/her performance outside the test situation.

The construction of a psychometric test begins with the compilation of a list of empirically verifiable phenomena which exhibit, more or less concealed, the property concerned and its characteristics. Theoretical assumptions, personal bias and convictions play an important role in this. To avoid succumbing to subjective factors such as these, the decision on which phenomena should be recognised as indicating a certain property should be based on the consensus of those dealing with the property methodically and scientifically. Academic psychology has a long way to go to reach this consensus. The confused muddle of contents typical for academic psychology and observable even with basic properties like intelligence, attraction, competence etc. will necessarily follow. (Meyer, 2004)

7. A valid test of intercultural competence must focus on a person's communicative performance in (potentially) critical intercultural encounters; in other words, the active use of language should not be ignored. If these tests are intended to provide a forecast of a candidate's communicative performance outside the test situation, the test procedures must be based on standardised procedures for observation and rating. The Common European Framework of Reference for Languages may provide a basis for a test construct supported by widespread academic consensus.

In fact, the importance of this one single document in language circles has been unsurpassed up to now. (Mader & Camerer, 2010)

Every test makes an impact and has consequences, even if these may not be immediately recognisable.

Before you think about giving a test, whether a test you have set yourself or a published test, think about all the consequences the test may have, on the learners, on you, on your teaching, on the institution, and on all the stake-holders involved. You should always have a good reason for doing a test and be aware of its effects. (Mader, 2011)



References

Carter, P. and Russell, K. (2001). Psychometric Testing. Chichester: John Wiley.
Mader, J. (2011). Testing and Assessment in Business English. Berlin: Cornelsen.
Mader, J. and Camerer, R. (2010). International English and the Training of Intercultural Competence. interculture journal, 9(12), 97-116.
McNamara, T. (2000). Language Testing. Oxford: OUP.
Meyer, H. (2004). Theorie und Qualitätsbeurteilung psychometrischer Tests. Stuttgart: Kohlhammer. (Author's translation.)
Morgeson, F. P., Campion, M. A., Dipboye, R. L., Hollenbeck, J. R., Murphy, K. and Schmitt, N. (2007). Are we Getting Fooled Again? Coming to Terms with Limitations in the Use of Personality Tests for Personnel Selection. Personnel Psychology, 60, 1029-1049.
Weir, C. J. (2005). Language Testing and Validation. Basingstoke: Palgrave Macmillan.



Listen and touch
Thom Kiddle, NILE (Norwich Institute for Language Education), UK
Original publication date: IATEFL TEASIG Newsletter August 2011

Will touch-screen technologies herald a revolution in how listening comprehension is assessed?

If there's one thing that's sure to date fast in today's world, it's an article about new technologies. So, please read this fast, as what may be new and interesting at the top of the page could well be old hat by the time you reach the bottom! In full consciousness of that danger, I'd like to suggest that the biggest revolution in personal computing in the last five years has been the advent of touch-screen technology. The possibilities offered to the testing world have not been fully explored as yet, and it is worth looking at the potential changes this technology could bring to the testing of listening, and in particular the use of non-linguistic response items.

As recently as 2006, Chapelle and Douglas were able to state with confidence that candidate responses in language testing took the following form: "… written responses are typically keyed or clicked in, although handwriting and speaking are increasingly used as well" (Chapelle & Douglas, 2006: 24). We may soon be able to add to these the concept of entering item or task responses by touching, moving, dragging or otherwise manipulating content on the screen with our fingers, and therefore introduce the idea of a response to a text or other form of input that does not depend on linguistic output, but rather on the completion of a visual or spatial task.

The idea of non-linguistic response to linguistic input may seem anathema to language testing, particularly with the popularity of integrated tests, e.g. a combined listening and summary writing item such as in the Pearson Test of English (Academic), but it may actually be more in tune with what assessment of a 'receptive skill' is really trying to get at. Bachman and Palmer (1996) argue that it is much more useful to see language use being realized while learners are performing specific tasks. They state:

We would thus not consider language skills to be part of language ability at all, but to be the contextualized realization of the ability to use language in the performance of specific language use tasks. We would therefore argue that it is not useful to think in terms of 'skills', but to think in terms of specific activities or tasks in which language is used purposefully. (Bachman & Palmer, 1996: 75-76)

If we take the approach that 'language use' in terms of listening is successfully processing spoken language to achieve comprehension, then demonstration of that comprehension in a listening test need no more be expressed in successful output of language or the ability to read and choose a multiple-choice option than it would be in 'real life'. How strange it would be when someone asked you to open a door if you responded by saying 'I am going to open the door for you', or held up a card chosen from four in your hand with the word 'door' on it. Wouldn't it be a much more realistic realization of successful comprehension of this request to actually open the door, by sliding it open on screen with your finger, just as we'd respond to the real request by opening the door in question?

This may seem a somewhat simplistic example of a listening item and a non-linguistic form of response, but these types of item may offer other important benefits to test developers and users. One of the trickiest jobs in designing listening items is establishing the relationship between text difficulty and question difficulty, and in fact, in all linguistic response listening items, there is constant interplay between the two.
As John de Jong noted in the recent IATEFL TEASIG Pre-Conference Event workshops in Brighton, even the most complex text can be used in a low-level item if we ask a question such as 'How many people are speaking?' or 'What language are the people speaking?'. When we look at the text itself, we can evaluate such things as speed, accent, background noise, type-token ratio, structural complexity, topic and rhetorical organization to determine difficulty. Some of these can be objectively analysed, such as speed and type-token ratio, and, to a certain extent, structural complexity (use of complex sentences, length of pre-modifying noun phrases, etc.), whereas some can only be analysed subjectively – the effect of background noise, accent effect, topic and rhetorical organization, for example. Taking the question in isolation, we have only the possibility of subjective analysis of its difficulty – that is, the topic focused on, the task imposed, and the level of language used in the question. When we put the text and the question together to make an item, we can study how the two combine to determine item difficulty, often with empirical analysis of candidate performance based on candidates of 'known' ability taking the item.

It is this complex and, to a certain extent, permanently elusive interaction between text difficulty and question difficulty which can be simplified and pinned down with non-linguistic response items, as we need only be concerned with text difficulty in terms of language. It may also be much easier to align non-linguistic, task-based response items with grading scales based on competencies. For example, the C1 descriptor for Listening to Announcements and Instructions states:

Can extract specific information from poor quality, audibly distorted public announcements, e.g. in a station, sports stadium etc. Can understand complex technical information, such as operating instructions, specifications for familiar products and services. (Council of Europe, 2001: 67)

Here it is easy to see how a non-linguistic response item based on the realization of a task such as putting operating instructions into practice by carrying out a process on screen could be a much more effective assessment of listening comprehension than responding to a multiple-choice question. A lower-level task could be the following (see Figure 1):

"Welcome to the registration process for your new XC1000 mobile phone. In order to register your phone with the company's Fire ID security system, please complete the following steps. On the first screen enter your four-digit pin code in both boxes and then press the 'Done' button. For your phone this is 4...7...3...1. On the next screen you will see a range of security options. The options are one star for low-level security, two stars for medium-level security and three stars for high-level security. It is very important that you select the high-level security option for your new company phone. Thank you. The registration process is now complete."

Figure 1: Task involving registration of a new mobile phone.
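By way of illustration only – the sketch below is not part of Kiddle's article – a task like the one in Figure 1 could be represented as a machine-scorable item in which the candidate's touch actions, rather than any language they produce, are compared with the actions the audio asks for. All names in the sketch (NonLinguisticItem, RegistrationResponse, scoreRegistrationItem) are hypothetical, invented for this example.

```typescript
// Illustrative sketch only: one way a non-linguistic response item like the
// registration task in Figure 1 might be modelled and scored automatically.

interface RegistrationResponse {
  pinEntered: string;             // digits the candidate typed on the first screen
  pinConfirmed: string;           // digits typed in the confirmation box
  securityLevelChosen: 1 | 2 | 3; // stars selected on the second screen
}

interface NonLinguisticItem {
  expectedPin: string;
  expectedSecurityLevel: 1 | 2 | 3;
}

// Dichotomous scoring: the candidate either completed the whole process as
// instructed in the audio, or did not. Partial credit per step would be an
// alternative design decision for the test developer.
function scoreRegistrationItem(
  item: NonLinguisticItem,
  response: RegistrationResponse
): number {
  const pinCorrect =
    response.pinEntered === item.expectedPin &&
    response.pinConfirmed === item.expectedPin;
  const securityCorrect =
    response.securityLevelChosen === item.expectedSecurityLevel;
  return pinCorrect && securityCorrect ? 1 : 0;
}

// Example: a candidate who followed the spoken instructions correctly.
const item: NonLinguisticItem = { expectedPin: "4731", expectedSecurityLevel: 3 };
const response: RegistrationResponse = {
  pinEntered: "4731",
  pinConfirmed: "4731",
  securityLevelChosen: 3,
};
console.log(scoreRegistrationItem(item, response)); // 1
```

Because the scoring key here is the sequence of actions itself, the item's difficulty depends only on the difficulty of the spoken text, which is exactly the simplification argued for above.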



A higher-level task could, for example, be the manipulation of objects in a room according to housemates' conversations (see Figure 2), or following instructions by mobile phone in order to complete the weekly shopping trip.

Figure 2: Task involving manipulation of objects in a room.

It may be that these non-linguistic response item types add to the Messickian (1996) concept of construct validity through authenticity of task type, offering "…representative coverage of the content and processes of the construct domain" (Messick, 1996: 20), particularly in ESP testing situations, for nurses or air-traffic controllers, for instance. They certainly seem to offer more authenticity in terms of task type than simultaneous reading and listening items, or the ubiquitous multiple-choice item, for example.

A quick survey of online language tests from the major developers (Pearson PTE Academic, TOEFL iBT & Cambridge BULATS online) shows no evidence of this type of question reaching the test public as yet, with the only discernible approach to non-linguistic response being picture selection in multiple-choice items. One reason for this may be a fear of the construct-irrelevant variance which may be introduced through unfamiliar item types, that is, the technological competence required to respond to items, and of course the need for touch-screen technology to be available for the candidate (though all of the above example items could conceivably be taken with a conventional mouse and cursor interface).

There are other reasons why we may need to be wary of such non-linguistic response items in language testing. From the test developer's perspective, there are the problems of programming such items and the difficulties of ensuring that items are available cross-platform – the fact that items delivered using Flash Player will not work on touch-screen devices such as the iPad or iPhone, for example. However, there may well come a time in the near future when for the major test publishers it is more cost-effective to equip test centres with portable, reusable, internet-ready touch-screen, secure test-response devices, rather than rely on sending test papers and answer sheets, or dealing with local computer network issues. The same concerns apply for those of us who develop tests on a smaller scale, of course, and the difficulty of providing an equivalent pen & paper backup test version is rather more complex with non-linguistic response items for obvious reasons. However, as with many options for online creation and delivery of tests, it may be possible to provide commercial platforms with Wizards which allow for the creation of non-linguistic response items, as is currently possible to a certain extent with Questionmark, for example.

From another angle, it may be much more difficult to create non-linguistic items which assess listening comprehension in terms of attitude or emotion, if for no other reason than because non-linguistic responses to different attitudes and emotions may be purely subjective on the part of the responder and impossible to define as 'correct' or 'incorrect'. There is also the danger that, in trying to develop higher-level non-linguistic response items, there may be the unintended consequence of simultaneously increasing the cognitive complexity of the tasks to be performed, thereby introducing a further area of construct-irrelevant variance into the difficulty equation. This makes a listening comprehension task more akin to a jigsaw puzzle than a demonstration of listening proficiency. Similarly, it may be that these item types are more suited to lower-level listening assessment, in which less weight is given to abstract topics and more to concrete use of language. As Wagner (2005) reports, Hansen and Jensen (1994) compared the results of scores (based on varying question types) between different level learners.

They found that lower-level learners had more difficulty in comparison to higher-level learners on broad, global questions (representing the need for top-down processing), than they did with detail questions (representing bottom-up processing). The researchers found evidence that lower-level listeners relied on verbatim responses in answering questions. This worked well for detail questions, but was less effective for global questions. (Wagner, 2005: 4)

However, as with most concepts in testing, the potential advantages and disadvantages of embracing touch-screen technology to create non-linguistic response items will only be able to be weighed and evaluated with extensive trialling and testing of items, platforms and texts. The benefits which the developments in technology may bring to us in the testing world are too important to ignore and may also allow us to interact with listening texts in a way closer to real life, and to make redundant the debate about whether candidates should listen to a text once or twice. We could have such features as a paraphrase button, which rephrases the last point made by the speaker in other words, or a number of back-channelling buttons which allow checking of understanding – such as 'Sorry?' – and invite clarification on the part of the speaker... However, these are topics for another article, as this one will surely be out-of-date if it carries on much longer!

References

Bachman, L. and Palmer, A. (1996). Language Testing in Practice. Oxford: Oxford University Press.
Chapelle, C. A. and Douglas, D. (2006). Assessing Language through Computer Technology. New York: Cambridge University Press.
Council of Europe. (2001). Common European Framework of Reference, English version. Strasbourg: Language Policy Division, Council of Europe.
Hansen, C. and Jensen, C. (1994). Evaluating Lecture Comprehension. In J. Flowerdew (Ed.), Academic Listening (pp. 241-268). New York: Cambridge University Press.
Messick, S. (1996). Validity and Washback in Language Testing. Language Testing, 13(3), 241-256.
Wagner, E. (2005). Video Listening Tests: A Pilot Study. Teachers College, Columbia University. Retrieved on 11 November 2020 from https://academiccommons.columbia.edu/doi/10.7916/D8WQ037Q.



How should teachers and testers work together? What can they learn from each other?
Judith Mader, elc-European Language Competence, Frankfurt, Germany
Original publication date: IATEFL TEASIG Newsletter August 2011

The talk given during the TEASIG stream at the IATEFL Conference 2011 in Brighton described some of what follows here. The article summarises some of the insights gained during the streamlining of the assessment procedure at a business school in Germany. The streamlining process was headed by a coordinator and involved the participation of instructors for task-setting, development of criteria, and trialling of assessment tasks. During the process, the various strengths and weaknesses of the two groups (teachers and testers) and how these could be used to best advantage became clear. This resulted in a clearer definition of what the various groups involved in the process require: the institution, the teachers, the test-setters, and those acting as examiners.

The streamlining project was based on existing procedures which had been developed by highly experienced teachers with, however, little formal knowledge or experience of testing theory and test development. The project coordinator not only had wide experience of teaching, but had also worked as a test developer for several years. What became clear during the project was that the interests of the two groups (teachers and testers) are often contradictory, and that a collision of these interests may lead to deficits in assessment procedures. Both groups must learn to set aside their particular 'hat' and put on, if only temporarily, the other 'hat' for the duration of the development.

The interests of the institution were easier to define and ultimately easier to cater for. The overwhelming interest of the institution was in having clear, definable standards to be adhered to in teaching and assessment. These range from course descriptions to standardised assessment procedures, including test format, criteria and invigilation instructions. The content and details of these were left up to the language experts involved. The standards should be easy to understand for all involved in the process, non-idiosyncratic and internationally relevant.

Internationally, there has come to be increasing demand for qualifications based on CEFR levels. Unsurprisingly for a European institution, the business school had made the decision to base its language courses on the CEFR and implement the use of the CEFR throughout all courses, in particular English courses, as part of the process of standardisation. It was therefore necessary that teachers first be informed of what the CEFR is and what statements about levels can mean. For English, the admission level required was set at B2 and the top exit level at C1. The admission procedure was redesigned to cater for this. The next step was the familiarisation of teachers with the CEFR and the discussion of some misconceptions.

The first misconception to be dealt with was that concerning the nature of the CEFR itself. Although some teachers were aware of the existence of the CEFR and used the names of the levels fairly freely, they were often not aware of several aspects of the CEFR. These ranged from ignorance of the existence of an actual document with descriptions of the levels to the origins of the CEFR ("The CEFR comes from Brussels" is a fairly common misconception). Several teachers thought that they could define levels themselves, as they had done in the past with the terms intermediate, advanced, etc. Other misconceptions were:

"C2 is only used to describe native speakers."
"No one at B1 should make grammatical mistakes."



"No-one really knows what the levels mean."
"The CEFR is theoretical and teachers can't understand it."
"The descriptors are mostly linguistic."

Although not directly connected with the actual assessment procedure, clearing up these misconceptions was an important part of the project, as only with a clear definition of the issues involved and the relevant terms to be used could a set of assessment procedures be designed. Although the need for standardisation, most importantly of assessment, but also of teaching materials and methods, had been established for all courses, many teachers felt that this standardisation was not necessarily in their interests, and might restrict their own creativity and individuality and put these at risk. Test-setting is not usually one of teachers' main interests, although they are often expected or obliged to do it as part of their work. In this case, the quality of the teaching and the standard of the material and methods used in all English courses are extremely high. Requirements for employment for teachers are fairly strict and this is reflected in the teaching. Students were, and continue to be, extremely satisfied with their language courses.

Although teachers and testers represent separate sections of the same profession, rather like doctors and surgeons, there is no reason why it should always be the same individuals doing both jobs. In several cases, teachers were happy to surrender the work of test-setting to colleagues who enjoy doing it and may be better at it. In other cases, teachers see the necessity for learning more about testing before embarking on it. Most of those involved with language testing have been language teachers at one time or another and it is likely that this will remain the case. It seems unlikely that there is, or will be, a group of test-setters who have no practical experience of the teaching process. In fact, teaching experience and knowledge and experience of learning processes appear essential for the development of assessment procedures and the setting of tests. However, paradoxically, it is this very experience which may also prove a hindrance to teachers when faced with the task of setting tests for their own learners at their own institutions.

Although individuality of teaching style was not part of the discussion and not to be restricted by the standardisation procedure, the overlap and conflict between teaching and testing became increasingly clear during discussions. Up to the start of the project, assessment had been carried out using semi-standardised tasks and criteria, but leaving teachers with a great deal of freedom to interpret these as they felt suitable. This led, for example, to teachers allowing students to give assessed presentations of different lengths, to various writing tasks in exams (ranging from essays to letters of application), and to the interpretation of marking criteria (such as language accuracy) in different ways. Examination procedure was also often different from one teacher to another.

Good teachers are those who are able to adapt to their learners' needs, be flexible in their choice of classroom tasks, possibly taking topics and events of current interest into account, as well as focus on different learners' interests, strengths and weaknesses when giving feedback. Good teachers are creative in their design of classroom activities, and are willing and able to help learners in difficulties.
Good teachers do not always stick rigidly to instructions in teachers' books but adapt these to suit their own and their learners' particular circumstances. If this description of some of the characteristics of a good teacher is acceptable, it is worth thinking about which of these characteristics are necessary or even desirable in a test-setter or examiner. It is these very qualities in teachers (adaptability, flexibility, creativity, keeping up-to-date, willingness to help, etc.) which can lead to the following situations (given as examples). These inevitably raise questions in the mind of a test-setter or tester.

Each candidate may get a slightly different task in an oral exam, depending on his/her interests. The examiner sometimes helps if the candidate has difficulties. However, if test tasks are adapted to suit each learner, how can it be ensured that a valid and objective assessment decision is reached? How far should learners' strengths be taken into account and allowed to compensate for their weaknesses? Will doing this provide an assessment which will be useful to employers or other institutions, and reflect a level of the CEFR? How creative should a test-setter be and how creative is an examiner allowed to be when giving a test, for instance, of speaking? If candidates are given different amounts of help during an exam, won't this affect the reliability of the results? Very topical texts are used in written exams, which often means that an exam can only be set at the last minute. However, how can a good test be set by one person in a hurry? Surely someone else should read it, try it, and edit it? Shouldn't it be piloted? How topical does the content of a test need to be?

Teachers' reactions to these questions may range from unawareness of the issues involved to rejection of these issues and to the feeling that their own and their learners' teaching and learning needs are not being taken into account. A test-setter's approach may be quite different. Assuming that specifications exist for the tests involved, the test-setter must be able to work with these (perhaps creatively) without compromising the standards set in them. This is quite different from the approach used by a teacher when planning a lesson. The stakeholders (candidates, institutions and other users of the test) must be able to rely on the test or exam conforming to the specifications and guidelines. Test-setters must be able to find the answers in the specifications to any questions they may have while constructing items. They will not expect to have to think of these answers themselves, as teachers might. Similarly, examiners will require guidelines for test procedure and marking criteria which are easy to follow and lead to consistent scoring.

The different answers given by an institution, a test-setter and a teacher to the following questions show how essential it is to discuss all the relevant issues thoroughly before implementing an assessment programme. The differences in the answers given reflect the various interests at stake. These are by no means irreconcilable in terms of practical work and can be used effectively to enable all the groups involved to learn from each other and so cater for all interests. It is not a question of right or wrong, but rather of using the strengths of the different parties involved to reach the best solution to address as many of the issues arising as possible when implementing an assessment programme. Here are just some of the questions (a fairly random selection) which have been discussed by the different groups in the programme referred to here:

• What should the content of a final exam be?
• Can topical texts be used in an exam?
• How many criteria do we want?
• Should we mark our own students?
• Should students be allowed to use dictionaries in exams?

The questions remaining for discussion are:

1. Are all good teachers necessarily good testers or examiners?
2. Which useful characteristics of good teachers may be less useful or even counter-productive in tests?
3. How can teachers be convinced of the need for assessment standards?

Editor's note: In the original publication, readers were encouraged to send their answers to these questions to the TEASIG Newsletter and/or the TEASIG Discussion List.



The impact of rubrics on assessment attitudes in literature studies
Carel Burghout, Fontys University of Applied Sciences, College of Education, Tilburg, The Netherlands
Original publication date: IATEFL TEASIG Conference Proceedings – Innsbruck 2011

Introduction

A limited survey on the impact of rubrics is tied in with a much larger project initiated by a number of colleges of education in the Netherlands in order to raise awareness of contemporary assessment and to hone the skills of teacher-trainers and educators. The method used in this part of the research was to introduce rubrics in courses in which no rubrics had been used before. The introduction of rubrics was proposed as a pilot project in both cases. The students agreed to participate in one of the two pilot cases, and the use of a rubric was introduced as part of a request by students to change the form of assessment of the course. At the end of the course, students were asked to fill in a questionnaire to measure the impact of the rubrics on their work and also on their perception of how they were assessed. The response rate in the first group was around 65%, in the second 75%. The outcome of the two surveys is presented in the appendices and makes clear what may be (carefully) concluded and how this will lead to further work.

The context

The lectorate for testing and assessment at Fontys College of Applied Sciences wants to improve the standards of testing and assessment at teacher-training colleges. We have, in the past four years, developed national standards for B Ed and M Ed courses. These are minimum standards, although mostly quite ambitious. For the B Ed we are also working on national tests, but not for the M Ed. We want to improve the standards at our colleges by starting a debate with teacher-trainers and students about what is good practice in testing and assessment. This includes asking teacher-trainers to consider their perceptions of testing and assessment, and also to reflect on their practical and theoretical knowledge of testing, assessment and evaluation. The partners involved are four colleges of education in the southern half of the Netherlands and an important college in Amsterdam.

The work done so far has been to look for the right tool to make teacher-trainers more aware of the standards in testing and assessment which they are supposed to meet, and to see how more limited classroom research may be included in the overall study. For the larger project we will opt for a computer adaptive test in which teacher-trainers (and students and teachers in schools in future) may select an area of interest and test their knowledge of that area. Most importantly, the feedback on their answers and scores will lead to links under which articles and sites relating to that area of testing and assessment can be found.
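The article does not spell out how the planned computer adaptive test would work, so the sketch below is offered purely as an illustration of the general principle, not as the project's actual design; all names in it (Item, pickNextItem, updateEstimate) are invented. The idea is that each answer adjusts an ability estimate, and the next item in the selected area is chosen to match that estimate, so that a teacher-trainer is routed quickly to questions – and ultimately to feedback links – at an appropriate level.

```typescript
// Purely illustrative sketch of computer adaptive item selection.

interface Item {
  id: string;
  area: string;       // e.g. "rubrics", "reliability"
  difficulty: number; // on some agreed scale, e.g. -3 (easy) to +3 (hard)
}

// Choose the unanswered item in the selected area whose difficulty is closest
// to the current ability estimate.
function pickNextItem(
  pool: Item[],
  area: string,
  answeredIds: Set<string>,
  ability: number
): Item | undefined {
  const candidates = pool.filter(
    item => item.area === area && !answeredIds.has(item.id)
  );
  return candidates.reduce<Item | undefined>(
    (best, item) =>
      best === undefined ||
      Math.abs(item.difficulty - ability) < Math.abs(best.difficulty - ability)
        ? item
        : best,
    undefined
  );
}

// A deliberately crude update rule: move the estimate towards the item's difficulty,
// upwards after a correct answer and downwards after an incorrect one.
function updateEstimate(ability: number, item: Item, correct: boolean, step = 0.5): number {
  return correct
    ? Math.max(ability, item.difficulty) + step
    : Math.min(ability, item.difficulty) - step;
}

// Example of one step of the loop.
const pool: Item[] = [
  { id: "q1", area: "rubrics", difficulty: -1 },
  { id: "q2", area: "rubrics", difficulty: 0 },
  { id: "q3", area: "rubrics", difficulty: 1.5 },
];
let ability = 0;
const answered = new Set<string>();
const next = pickNextItem(pool, "rubrics", answered, ability); // q2, closest to the estimate
if (next) {
  answered.add(next.id);
  ability = updateEstimate(ability, next, true); // candidate answered correctly
}
```

Operational adaptive tests would normally use a proper item response theory model for both selection and estimation; the crude update rule above is only there to keep the example short.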



The method

I was allowed to limit my research to my own teaching practice, specifically the teaching of literature and cultural studies and didactics. These tend to run into each other and blend in my courses. I suppose I am preaching to the converted when I state that traditional tests do not do justice to many aspects of teaching literature and culture. They may make clear that work has been read, lectures have been digested and even, to a degree, that the student has gained some insight. But when one's appreciation of literature has to be translated or transformed into teaching literature at secondary level, other aspects come into play. This is the reason why, in my courses, the test is always only a part of the assessment; an essay on one of the works and a didactic assignment (materials development) complement the test.

When I read Paran and Sercu's book Testing the Untestable in Language Education, I found that although the combining of various instruments to assess literature and cultural studies is in itself a sound approach, I was forced to ask myself whether what I regarded as 'quality' in essays and didactic assignments was sufficiently transparent to me, and to my students. I wondered if rubrics would help, as these were mentioned in the book, and decided to introduce them in two courses I was teaching in the last quarter of the year. These courses had actually already been running for a number of weeks.

Although rubrics are used in our curriculum occasionally, they were not utilised in literature courses. One such course, a first-year survey of British literature, is linked to learning how to write academically and is assessed through four essays, two short and two long. It became clear that after half a year, many students did not yet have a clue what was actually expected of them when it came to academic writing. They submitted essays that demanded so much feedback that in some cases the teacher thought he was writing the essays, and the students seemed to think the 'game' was to elicit such feedback, work it in without thinking critically, and to repeat the process the next time round. It was clear that some instrument was needed to make the students reflect on their work and the standards of quality they were supposed to meet, and a rubric fitted the bill. In this way we could make the students see the feedback as assessment for learning, i.e. feedback leading to better learning, rather than as something to be worked in and then forgotten.

I introduced a rubric in this course as an aid for the students. The attributes of this rubric, i.e. the contents of the grid that formed the analytical rubric, were not debated with the group, just clarified. I asked the students to check their work using the rubric to see if they met requirements before handing the work in. Later, when I had handed back the essays, I asked them to use the rubric once more to see if, through using the rubric, they could gain a better understanding of what the feedback was based on (i.e. the attributes in the rubric). In both cases the rubric proved to be a good support for the students' writing remarkably quickly.

Response of the first group

Two thirds of the group responded to the questionnaire (see Appendix A). There were no scores below the neutral or middle score of '3'. Interestingly, planning the essay scored a bit higher than understanding the teacher's expectations.
The best score supports the idea of the use of rubrics in assessment for learning, as is shown in the response to Question 4: How (if at all) did the rubric help you to reflect on your work? I realise that any conclusions drawn from such a limited survey are only valid in my own backyard, so to speak, but still the response in the case of the course on Academic Writing is promising in that the rubric was accepted, used and proved to be of help to the students. Introducing the rubric at the beginning of the course on academic writing in the coming academic year may lead to more positive results, certainly when the feedback given on the essays is more strongly related to the rubric from the very start. In this pilot, both the students and I had to get used to the instrument. I was pleasantly struck by the degree to which the students were willing to use rubrics in their own assessment of similar work in schools.



The second case

The next case started at the same point in time in a second-year course, but for a different reason. Students in this course were supposed to create a digital learning object on one of the books we discussed in class. A digital learning object combines elements of presentations such as PowerPoint, Prezi or a WebQuest, and approaches a literary topic using audio and visual media. The students thought this approach was repetitive as they had made such objects before. They wanted a different form of assessment, albeit still in the form of materials design for classroom use in secondary schools. I suggested worksheets to assess reading as the new form. I presented the students with a number of attributes that good worksheets ought to have, such as an inviting, attractive design, a clear focus and the use of scaffolding, the help given in the design to allow the learner to reach the next stage in his or her learning.

A number of students in this class had worked with me on worksheets for teen novels in one of their B Ed courses. There we had included peer feedback in class, so I suggested we would include peer feedback in this course as well, and then move on to peer feedback online in a digital learning environment so students could add comments from home as well. From defining attributes for a good worksheet to fitting them into a rubric is only a small step. I told my students about my reading-up on rubrics and suggested we create a rubric together. Putting a rubric together would force the students to think about the standards they needed to meet in materials design and about standards in general. I dedicated a good part of a lesson to explaining what rubrics are and do.

The next step was to make the students think about what this particular rubric should contain and how we were going to organize the feedback. I presented them with 7 steps in creating a rubric, taken from my reading (Arter & Chappuis, 2007). The steps can be found in Wikipedia, at http://en.wikipedia.org/wiki/Rubric_(academic)#Steps_to_create_a_rubric. Finally, I gave the students the set-up for a rubric, in which I had already filled in the attributes for worksheets in the left-hand column. I told them nothing was holy, that they could comment on the set-up, even add or delete, but in any case they should come up with two more columns, the insufficient and the excellent.

The result of the students' work over the next week led to a discussion on one of the attributes, the bottom one about worksheets needing to be transferable or not. Many students felt that this was optional, and so not a requirement in the rubric. Instead they came up with the very good attribute 'easy to rate' to replace it. More classroom discussion led to the adoption of the finalised rubric for the course (see Appendix C).

Responses of the second group

I had expected the response to this rubric and its use to be very positive, as the students had been involved. The work on the construction of the rubric showed me that many students took the work very seriously. But the scores from the questionnaire (see Appendix B) showed that the rubric had had less positive impact than the one used for academic writing. Some students felt that working with the rubric had usurped precious time which could have been spent discussing novels. Although I think these students may have forgotten that the course was supposed to have a didactic focus as well, I can see where their comments came from as, again, we had started working on and with a rubric when the course was already well underway.
It is clear that the rubric itself must be reconsidered. Some students did not think the entries in the rubric were transparent enough and felt that it did not work well as an analytic rubric. The rubric actually worked against the students in some cases, as they felt all the requirements brought forward in the grid were daunting. I need to think (and maybe discuss with this group) about either fine-tuning the requirements or else trying out a holistic rubric for such work.
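To make the analytic/holistic distinction raised here concrete, the sketch below models the general shape of an analytic rubric as a data structure: attributes in the left-hand column, with band descriptors such as 'insufficient' and 'excellent' as further columns (a holistic rubric would collapse this to a single set of band descriptors for the whole piece of work). This is not the rubric actually adopted – that is in Appendix C; the attribute names are taken from the discussion above, but the descriptor texts and the simple scoring function are invented for illustration.

```typescript
// Illustrative sketch only: an analytic rubric as attributes x bands.

type Band = "insufficient" | "sufficient" | "excellent";

interface RubricAttribute {
  name: string;                      // e.g. "inviting, attractive design"
  descriptors: Record<Band, string>; // what each band looks like for this attribute
}

interface AnalyticRubric {
  attributes: RubricAttribute[];
}

const worksheetRubric: AnalyticRubric = {
  attributes: [
    {
      name: "inviting, attractive design",
      descriptors: {
        insufficient: "Layout is cluttered; pupils are unlikely to want to start.",
        sufficient: "Layout is clear and reasonably appealing.",
        excellent: "Layout immediately invites pupils to engage with the tasks.",
      },
    },
    {
      name: "clear focus",
      descriptors: {
        insufficient: "Aim of the worksheet is not recognisable.",
        sufficient: "Aim is recognisable but not consistently followed.",
        excellent: "Every task visibly serves one clearly stated aim.",
      },
    },
    {
      name: "easy to rate",
      descriptors: {
        insufficient: "A teacher cannot tell from the answers how well pupils did.",
        sufficient: "Most answers can be rated without extra effort.",
        excellent: "All answers can be rated quickly and consistently.",
      },
    },
  ],
};

// One possible way to turn band judgements into a score: average the band values.
// Assumes one judgement per attribute, in the same order as the attributes.
const bandValue: Record<Band, number> = { insufficient: 0, sufficient: 1, excellent: 2 };

function scoreWithRubric(rubric: AnalyticRubric, judgements: Band[]): number {
  const total = judgements.reduce((sum, band) => sum + bandValue[band], 0);
  return total / rubric.attributes.length; // 0 = all insufficient, 2 = all excellent
}

console.log(scoreWithRubric(worksheetRubric, ["excellent", "sufficient", "excellent"])); // ≈ 1.67
```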



There are a number of positive remarks about the rubric for worksheets. In every group there are the forerunners. It appears that these students, who were most taken by the idea and uses of a rubric, were inclined to use the rubric as it was meant to be used (and therefore to think in a structured way about the feedback they gave). Again, taking time right at the beginning of the course to make clear what the added value is, and firmly establishing rules about how peer feedback also counts as part of the students' assessment (something I could not and cannot enforce after a course has already started), may help to obtain better results with the use of a rubric the next time this course or a similar course is run.

What next?

The course on Academic Writing started again in September 2011 and its rubric was introduced at the beginning of the course. The first written work is scheduled to be handed in in November, so at the time of writing I cannot tell how the new group of students use the rubric and how they perceive its benefits. The course on Modern Novels is scheduled for next year. In the meantime, I aim to introduce the idea of making a rubric in a course that has a similar set-up, a course on Modern Poetry. To avoid the pitfalls mentioned above, I will make using the rubric and giving peer feedback more attractive by allowing these to form a larger part of the overall assessment of the course. I will also try out having a holistic rubric and an analytical rubric side by side so students can choose which one they prefer to use. At the same time, I am currently (November 2011) broadening my research by doing a survey of the use of rubrics in courses on all subjects taught at our College of Education. The data will hopefully help to put my own efforts into perspective and yield insight into what impact the use of rubrics usually has in this setting.

References

Arter, J. and Chappuis, J. (2007). Creating and Recognizing Quality Rubrics. Upper Saddle River, NJ: Prentice Hall.
Paran, A. and Sercu, L. (Eds.) (2010). Testing the Untestable in Language Education. Bristol: Multilingual Matters.



Appendix A
QUESTIONNAIRE ON THE RUBRIC FOR ACADEMIC WRITING (with responses)

The questionnaire runs from score 1 (not at all) to score 5 (very much). Out of a group of 16 students writing essays at that point in the course, 11 responded.

1. How (if at all) did the rubric add to your understanding of your teacher's expectations?
2 x neutral, 6 x score 4, 3 x score 5

2. How (if at all) did the rubric help you plan how to approach your essays?
2 x neutral, 5 x score 4, 4 x score 5

3. How (if at all) did you use the rubric in the process of completing your essays?
2 x neutral, 5 x score 4, 4 x score 5

4. How (if at all) did the rubric help you to reflect on your work?
6 x score 4, 5 x score 5

5. To what extent (if at all) do you think the rubric will influence your marks?
3 x neutral, 3 x score 4, 5 x score 5

6. Did using the rubric make a difference to the quality of your work?
3 x neutral, 6 x score 4, 2 x score 5

7. Did the rubric help you understand the feedback on your essays?
4 x neutral, 4 x score 4, 3 x score 5

8. Are there any elements which should be added to the rubric?
• Possibly an 'overall impression' somewhere
• More room for creativity in essay writing

9. Do you think you will use rubrics in your classroom when you teach? Why or why not?
• Yes, possibly.
• Already do so.
• Yes, but in a simplified form.
• Yes, as it is a good guideline.
• Yes, but only as an aid, not to base marks on.
• Yes, it is very helpful; yes, it is helpful if you use it to be specific about what you expect.

10. Anything you would like to add:
• Why didn't you introduce it at the beginning of the course?
• I used earlier feedback to make my own framework and used the rubric as a final check before submitting my work.
• It was an essential tool in this course. I wrote and compared my work with the rubric and then fine-tuned it.
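Purely as an illustration of how the raw counts above could be summarised (this analysis is not part of the article), a mean can be computed for each scored question by treating 'neutral' as 3; the helper names below are invented for the example.

```typescript
// Illustrative only: summarising the response counts for Questions 1-7 as means.

type ResponseCounts = Record<number, number>; // score (1-5) -> number of students

function meanScore(counts: ResponseCounts): number {
  let sum = 0;
  let n = 0;
  for (const [score, count] of Object.entries(counts)) {
    sum += Number(score) * count;
    n += count;
  }
  return sum / n;
}

// Question 1: 2 x neutral (3), 6 x score 4, 3 x score 5
console.log(meanScore({ 3: 2, 4: 6, 5: 3 }).toFixed(2)); // "4.09"

// Question 4: 6 x score 4, 5 x score 5
console.log(meanScore({ 4: 6, 5: 5 }).toFixed(2)); // "4.45"
```

Computed this way, Question 4 does indeed come out highest (4.45), with Question 2 (4.18) a little above Question 1 (4.09), matching the observations in the discussion above.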



Appendix B
QUESTIONNAIRE ON THE RUBRIC FOR WORKSHEETS MODERN NOVEL (with responses)

Out of 18 students working on worksheets, 14 responded. Some respondents did not answer questions about feedback because, at the time they were asked to fill in the questionnaire, they had not yet put up worksheets, although in some cases they had been designing them. These students opted to work for their exams first and do the materials development later. Time pressures and a sense of having too much to do may have influenced the responses. A number of students commented in the course evaluation that they thought working on the rubric had taken up too much time.



Appendix C
FINAL RUBRIC TO ASSESS WORKSHEETS



Teacher and student perceptions on oral exam standardization
İdil Güneş Ertugan and Pınar Gündüz, Sabancı University, Turkey
Original publication date: IATEFL TEASIG Conference Proceedings – Innsbruck 2011

Introduction

The testing of academic reading, listening and writing is given a lot of importance, and there is comprehensive research on the validation of these tests to increase standards. However, oral assessment and ensuring its standards have not attracted as much attention, and oral skills are not tested at many institutions. In our institution, which is an English-medium university, students need good communication skills to cope with the demands of baccalaureate study and to attain academic competencies such as those needed for seminar discussions or debates. Thus, oral assessment has an important place.

This paper briefly outlines the stages of the oral assessment procedures at an English-medium university preparation school, and describes a research study carried out to find out how students and teachers perceive the oral exam standards. The research consisted of surveys given to teachers (n=25) and students (n=35), focus group discussions with students, and action research with a sample oral assessment video shown to students and teachers. The results will be discussed under three main headings, namely standardizing administration, grading and feedback. For each, we will discuss the methods and steps we take to ensure the highest possible standards as well as give teacher and student reactions and comments.

The task and procedures

In each course, students have two assessments, the first being a pair discussion and the second a group discussion. In both cases, students are provided with a general topic taken from their coursebook, with 4-5 guiding questions to help them organize and plan their discussion. After the predetermined preparation time, students start their discussion.

Standardizing administration

In order to standardize administration, all oral assessment topics reflect the content of the coursebook. The tasks are carefully chosen from the main themes so that all students are equally familiar with the topics. In this way, we try to ensure that all students have enough information to discuss the topic. The tasks also reflect the content-based nature of the instruction at the institution. All themes in the coursebooks are intended to prepare students for their general requirement courses in the first two years in their faculties in terms of vocabulary and general background knowledge. As Luoma states, "Authentic assessments require the presentation of worthwhile and/or meaningful tasks that are designed to be representative of performance in the field … and approximate something the person would actually be required to do in a given setting" (1997: 25). Seminar-style discussions frequently take place in faculties, where students are expected to voice their opinions and discuss their ideas on the course topic. Therefore, the task mimics what students will be expected to do in their future studies. The survey results indicated that a majority of the students prepared for the oral assessment by revising the content covered in their books as well as their vocabulary lists for the units. This is in line with the expectations of the institution, as we are aiming to prepare students for their faculty courses in terms of content and vocabulary. All tasks are reviewed and edited by teachers to make sure each topic is covered equally in all classes.

During the assessment, students are not allowed to interact with the assessor. The assessor is merely responsible for welcoming the students and setting the task. This is mainly in order to ensure a standard approach in every exam room. As Hughes states, "… even where quite specific tasks are set, the 'interlocutor' can have a considerable influence on the content of an oral test" (2003: 117). Thus, even when the interlocutors have strict scripted instructions, the decisions interlocutors make may still interfere with the performance. What's more, it makes it difficult for the assessors to focus on students' performance and ask questions at the same time. From the students' point of view, "An advantage of having candidates interacting with each other is that it should elicit language that is appropriate to exchanges between equals. It may also elicit better performance, inasmuch as the candidates may feel more confident than when dealing with a dominant, seemingly omniscient interviewer" (Hughes, 2003: 121).

Student-to-student interaction has its problems as well. For instance, there is the risk that one of the students may dominate the discussion. What's more, as Weir states (2005: 153), "… if there is a large difference in proficiency between the two, this may influence performance and the judgements made on it". For this reason, the class teacher matches up the students in their classes instead of pairing/grouping students randomly. While doing this, teachers consider the personalities of the students and their level of English in order to make sure that student performances are not affected by these factors. However, the focus group discussions and the surveys revealed that 4 students preferred to talk to an interlocutor instead of having a pair discussion. 3 teachers also supported the idea of using interlocutors. Although the literature supports the idea that student-to-student interaction is more advantageous than students talking to an interlocutor, based on our findings we piloted the use of an interlocutor in an unassessed speaking task that mimicked actual exam conditions. However, as the literature suggests, students tended to start talking to the interlocutor and asking questions once the interlocutor stepped in, which meant that the nature of the task changed from a seminar-style discussion to an interview. Moreover, teachers also stated that it was quite difficult for them to take notes on performance as well as select and ask questions.

A week before the assessment, the assessors send out the procedures for teachers and the student information sheet. The procedures for teachers give information about the time and the place the assessment will take place in, how to use the criteria, and detailed step-by-step instructions on how to carry out the assessment to ensure uniform conditions of administration. Moreover, possible problems and guidelines on how to deal with them (e.g. a student coming late) are also explained in detail. The student information sheet includes information on timing, which units in the book they might be getting questions from, and the criteria. These help students to be familiar with the assessment method and the criteria. The procedures for the teachers enable teachers to familiarize themselves with the process in advance.
It also ensures that all assessors follow the same steps and act in the same way when a problem occurs. As Weir proposes (2005: 199), "with an adequate marking scheme and sufficient standardization of examiners … a high standard of inter-rater and intra-rater reliability should be possible".
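The article does not report any reliability statistics, but as a purely illustrative aside, this is the kind of check that double-rated oral exams make possible: exact agreement and Cohen's kappa between two raters' band decisions. All data and names below are invented for the example.

```typescript
// Illustrative only: inter-rater agreement between two raters who each assign a band
// (e.g. a CEFR-style level) to the same candidates.

function exactAgreement(rater1: string[], rater2: string[]): number {
  const agreed = rater1.filter((band, i) => band === rater2[i]).length;
  return agreed / rater1.length;
}

// Cohen's kappa corrects the observed agreement for the agreement expected by chance.
function cohensKappa(rater1: string[], rater2: string[]): number {
  const n = rater1.length;
  const categories = Array.from(new Set([...rater1, ...rater2]));
  const observed = exactAgreement(rater1, rater2);
  // Expected agreement: sum over categories of (rater 1 marginal) x (rater 2 marginal).
  const expected = categories.reduce((sum, cat) => {
    const p1 = rater1.filter(b => b === cat).length / n;
    const p2 = rater2.filter(b => b === cat).length / n;
    return sum + p1 * p2;
  }, 0);
  return (observed - expected) / (1 - expected);
}

// Hypothetical bands given by two raters to ten candidates.
const rater1 = ["B1", "B2", "B2", "C1", "B2", "B1", "B2", "C1", "B2", "B1"];
const rater2 = ["B1", "B2", "B1", "C1", "B2", "B1", "B2", "B2", "B2", "B1"];

console.log(exactAgreement(rater1, rater2).toFixed(2)); // "0.80"
console.log(cohensKappa(rater1, rater2).toFixed(2));    // "0.67"
```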

Standardizing grading

To standardize grading and to be fair to all students, it is important that raters are trained before each assessment. Therefore, prior to the test, teachers have a standardization meeting that all raters at that level have to attend. Weir defines standardization sessions as “… the process of ensuring that markers adhere to an agreed procedure and apply rating scales in an appropriate way” (2005: 198). During the standardization meeting, the marking criteria and scheme are discussed to ensure raters fully understand the procedures and are familiar with the criteria. Next, raters watch at least two sample videos of students performing the same task on a similar topic. The raters are required to take notes using the standard scoring sheets while watching the video. After watching, raters give an individual grade using the global and/or the detailed criteria. Following this, teachers form groups of three or four in which they discuss and justify their grades and come to an agreement. A whole-group discussion follows. During this final discussion, the level assessor acts as a facilitator, drawing attention to points not mentioned and asking questions to guide the discussion. Each student in the samples is discussed in detail even if the whole group seems to agree on the grade. Strengths and problems in students’ performance are discussed and, with reference to the criteria, the grade is justified. If there is disagreement among groups, teachers may refer to the tapescript provided by the assessor and/or re-watch specific sections of the sample.

In the teacher survey, we asked teachers about their perceptions of an effective standardization session (see Table 1).

Table 1: Teachers’ perceptions of effective standardization sessions

                                            Helpful    Not very helpful
Having the tapescript                          17              8
Discussing the criteria                        22              3
Using global and detailed criteria             20              5
Discussing grades in groups                    24              1
Agreeing on a grade as a whole group           24              1
Having quality/audible sample videos           25              0
Watching samples from different bands          18              7
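
The individual grading stage of such a session also lends itself to a simple numerical check. The sketch below is a hypothetical illustration only, not part of the procedure described in this article: it summarizes the grades raters award individually to each sample video so that the facilitator can see at a glance where judgements diverge most, while every sample is still discussed in full as described above. The sample names, grades and band scale are assumptions.

# Hypothetical sketch: summarizing raters' individual grades per sample video
# before the group discussion of a standardization session. All data invented.

from statistics import mean

# Grades given individually by each rater to each sample video (assumed 1-10 bands).
individual_grades = {
    "Sample video 1": [6, 7, 6, 7, 5, 6],
    "Sample video 2": [8, 8, 9, 7, 8, 8],
}

for sample, grades in individual_grades.items():
    spread = max(grades) - min(grades)  # how far apart the most lenient and strictest raters are
    print(f"{sample}: mean {mean(grades):.1f}, range {min(grades)}-{max(grades)}, spread {spread} band(s)")

In practice this information emerges from the group discussion itself; the sketch simply shows how it could be made visible to the facilitator in advance.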

Students were asked to comment on the extent to which they were aware of how teachers prepare for the grading of student performance. The survey revealed that very few students were aware that teachers needed to attend standardization sessions. As an institution, we value making our procedures, criteria, and other information transparent to students. The results therefore show that teachers need to inform students better about the procedures, as this would help students understand that their grades do not depend on the personal judgment of a single assessor.

Having clearly defined and user-friendly criteria that cover all areas intended to be assessed is crucial to increasing reliability and standards. In our institution, we use two types of criteria, global and detailed. The criteria are based on the interaction strategies component of the CEFR. During assessment, teachers are required to refer first to the global criteria, mainly due to time constraints. If assessors cannot agree on a grade, they refer to the detailed criteria. The teacher survey indicated that teachers found it useful to have two types of criteria and stated that the global criteria save a lot of time. They were also happy that the criteria reflected CEFR expectations for the levels, as this meant that the criteria reflected course objectives and content. Although both sets of criteria are available to students on the online learning support software, 20 students stated that they were not aware of the different components of the criteria. In other words, they were not aware of what exactly they were graded on. Some students did not know that the teachers used criteria to grade students at all. The focus group discussion produced similar results. This shows that more time should be spent in class going over the criteria and the scoring grids as a teaching tool. When two classes were shown a video sample and asked to grade it using the criteria, students stated that they were better informed about what they were required to do, and that they felt more confident about the exam.

To increase reliability, all students are assessed by two different raters. During assessment, raters need to carry out multiple tasks. For instance, they welcome the students, set the task, keep track of the preparation time and the actual speaking time, and take notes on student performance. Inevitably, the task requires the rater to exercise his or her own judgment in applying the criteria in order to give a grade. It is therefore important that students are graded independently by two raters. This not only makes the scoring more objective, but also shares the duties and helps avoid mistakes. As a result, both raters and students perceive the scoring to be more standardized and fairer. This was supported in our questionnaires: 24 teachers stated that having two assessors helped them increase standardization and fairness, and 32 students stated that they would like to be assessed by two raters. When pairing up the raters, the level assessors try to take several factors into consideration. As far as possible, male raters are paired with female, native speakers with non-native speakers, and experienced raters with inexperienced ones, to avoid external factors that may affect student performance and scores. More importantly, to avoid bias, class teachers cannot assess their own students. In the questionnaire, 23 teachers agreed that they could grade students they did not know more fairly (see Appendix 1). However, 30 students stated that they would like their own teachers to assess them. From an affective point of view, students might perform slightly better with a teacher they know but, considering that a substantial majority of the teachers feel they would not be as fair with students they know, we believe that teachers not assessing their own students is necessary to ensure high standards and fairness. Additionally, students will need to attend interviews or talk to people they are not familiar with in their faculties and in their non-university lives, which makes the arrangement more authentic.

Scoring sheets, or rating forms, are standard documents that each rater uses to record notes about student performance. The scoring sheets mainly guide teachers to take notes on range, accuracy, and the other components of the criteria. The information teachers write on the scoring sheet helps them give a grade after the discussion is over. They use this information to give examples of student language use and other components to justify their grades during the discussion. The information is then summarized on the student feedback sheet so that the exam has a positive washback effect. The teacher questionnaire also showed that 21 teachers found scoring sheets very useful (see Appendix 1). The literature also supports the use of scoring sheets, with Luoma (2004) highlighting the benefits of rater forms, as they:
• help structure the process,
• speed the scoring process up,
• make the scoring consistent,
• ensure that the intended features of talk are assessed,
• “help the rater focus on the performance in hand and compare it against the criteria rather than using other examinees’ performances as comparison points” (Luoma, 2004: 172).

Standardizing feedback

As mentioned before, raters make notes on the feedback sheets based on their notes on the scoring sheets. Wiggins (in Luoma, 2004: 174) states that “… feedback that supports learning describes the performance in concrete terms, relating it to the task instructions and descriptions of expectations about good or acceptable task performance”. The feedback sheets provide students with information on which areas they did well on, and areas they can further improve. This process turns the assessment into a useful teaching and learning tool.

During the focus group discussion and also in the student questionnaire, the majority of students (n=21) stated that they found the feedback sheets useful and that they really benefited from the detailed feedback. Students commented that examples of their mistakes and teacher suggestions in particular helped them understand what they needed to improve.

It is very important to have high standards in all exams, and this is probably most challenging in speaking, where the performance is usually not recorded. In the survey, 7 teachers thought that it would be a good idea to record student performance to ensure higher standards. 12 students also expressed a preference for their performance to be recorded, especially as they could then ask for their performance to be re-evaluated if they believed they had not been graded fairly. Striving to ensure the highest possible standards and to be fair to all students, the institution is currently looking into how this could be realized. As this is a matter of financial resources, it might be put into practice when funds become available. When necessary, this would give us the opportunity to check disparate ratings more effectively.
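
The article does not specify how disparate ratings would be identified, but recorded performances would make a simple check of this kind possible. The following is a minimal, hypothetical sketch under assumed conditions: the paired grades, the one-band tolerance, and averaging as the way of combining the two raters’ grades (as one student assumed in Appendix 1) are illustrative choices, not the institution’s documented practice.

# Hypothetical sketch: flagging performances where the two raters' grades
# differ by more than an agreed tolerance, so that the recording (if one
# exists) can be reviewed. All data and thresholds are invented.

TOLERANCE = 1  # assumed maximum acceptable difference, in bands, between the two raters

paired_ratings = [
    # (student id, rater 1 grade, rater 2 grade) - invented data
    ("S001", 7, 7),
    ("S002", 6, 8),
    ("S003", 5, 6),
]

def is_disparate(grade1: int, grade2: int, tolerance: int = TOLERANCE) -> bool:
    """Return True when the two raters' grades differ by more than the tolerance."""
    return abs(grade1 - grade2) > tolerance

for student, g1, g2 in paired_ratings:
    if is_disparate(g1, g2):
        # A disparity this large would trigger a review of the recording.
        print(f"{student}: grades {g1} and {g2} differ by {abs(g1 - g2)} bands - review recording")
    else:
        final = (g1 + g2) / 2  # averaging is an assumption, not the documented procedure
        print(f"{student}: final grade {final}")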

References
Hughes, A. (2003). Testing for Language Teachers. Cambridge: CUP.
Luoma, S. (2004). Assessing Speaking. Cambridge: CUP.
Weir, C. (2005). Language Testing and Validation. New York: Palgrave Macmillan.

Appendix 1: Quotations from teachers and students

On assessing students from different classes
Teacher comments:
• I prefer grading students I don’t know so I can’t be biased. I may also be affected by what I know about their classroom performance and confuse this one with their one-off performance.
• As we are human beings and eager to be emotional, it is hard not to be subjective towards someone we know.
Student comments:
• I feel uncomfortable speaking in front of a different teacher.
• With my own teacher, I’d be less stressed.

On having two assessors
Teacher comments:
• Definitely. Although there is criteria, it is usual for examiners to rate the same candidate quite differently and award substantially different score. It, in a way, provides balance.
• Yes, most of the time. It is more reliable than one person assessing the students. And a huge responsibility, too!
• It is better than one and administratively more practical than three or more.

Student comments:
• I think they take the average grade. It is better because the second teacher can make the grade higher.
• It is more fair, I think. It is good that they check each others’ grade.

On scoring sheets
Teacher comments:
• It helps me be more analytical and pay attention to all components. Otherwise, I think I would tend to judge according to use of language only.
• It reflects the complexity of speaking and reflects the importance we give to different features of speaking such as interaction.

On recording performances
Teacher comments:
• Is it possible to record performances? This way, we can justify the grades.
• St discussions can be audio-recorded.
• I would like us to record the oral assessments. I have had students approach me after the oral exam and want to know why I gave them the mark I did because they and their friends thought they should have got much higher. Unless there are detailed notes on what they said, it is often hard to 'justify' the mark.
Student comments:
• I think camera or mobile phone or anything else have to use to record our conversation because I want to prove that I spoke well if I have bad grades.

On whether students’ oral performance in class matches their exam grades
Teacher comments:
• Sometimes some of them get better grades than we expect and this is usually the case, not the opposite.
• Other than a couple of occasions where I thought the students could have performed better, I think their performances in class do match with their OA grades.
• There are usually one or two who you expect to do quite well, who bomb… but they often do better than expected. Some students perform badly compared to their speaking in class because it's an exam – not because of the system or procedures, it's unavoidable if you test speaking.

On the effectiveness and clarity of the feedback on feedback sheets
Student comments:
• Sometimes I don’t understand why I got a 7 although everything looks ‘good’. Teachers should always write clear comments and examples.
• Some teachers write more detailed feedback than others. Some teachers give examples of my mistakes. This helps me understand what I need to improve better.

On which set of criteria is most frequently preferred
Teacher comments:
• Global unless I am unsure.
• Global easier to refer to.
• Global usually, I tend to use the detailed criteria when there is a wide proficiency difference between the 4 areas assessed.
• First global, then detailed if I cannot decide.
• I use the short one during the exam and only look at the detailed one when necessary.
• Global but when I have questions in my mind. When it is difficult to decide I refer to the detailed criteria.
• Global. When I have a lot of question marks about the performance revealed, I need to find proof in the detailed criteria to see where that piece fits, and then I try to decide accordingly to be fair.
• Global. After listening carefully for 10 minutes, it is usually possible to match what you have heard to the set of 4-6 sentences which best represent it. Diving into the detailed criteria risks losing the overall view, which is what we are asked for (a single mark).

IATEFL is the International Association of Teachers of English as a Foreign Language, which has the aim of linking, developing and supporting English language teaching professionals worldwide. IATEFL is a charity organization, founded in 1967 by the late Dr Bill Lee. IATEFL now has over 4,000 members in 100 countries and associate members from more than 120 countries. There are many different types of IATEFL membership to suit all ELT professionals. Find out why you should join IATEFL and how you can benefit at https://members.iatefl.org/

IATEFL has 16 SIGs (Special Interest Groups), including TEASIG. Find out more at https://tea.iatefl.org/

IATEFL TEASIG (Testing, Evaluation and Assessment Special Interest Group)
TEASIG was established in 1986 and has played an active role in fostering assessment literacy in the areas of testing, evaluation and assessment. TEASIG’s mission is to share, spread, support and enhance knowledge of assessment and assessment literacy among ELT professionals internationally through webinars, conferences, sessions, discussions, publications and networks for its members and others.

TEASIG
• organises events around the world
• organises a pre-conference event and runs a TEASIG Showcase at the IATEFL annual conference
• organises regular webinars
• produces newsletters and publications on TEA-related issues

The name, Testing, Evaluation and Assessment, reflects the wide area of interests represented. TEASIG’s goal is to reflect these various interests, whether they concern classroom assessment and tests, external standardized examinations, large-scale testing, evaluation of individuals, courses, teaching, programmes or institutions.

TEASIG members
• are part of a large network of like-minded colleagues
• pay reduced fees at all IATEFL and TEASIG conferences
• have access to recordings of TEASIG webinars
• enjoy the opportunity to meet face-to-face with experts in the field
• pay reduced fees at other conferences around the world through IATEFL associate members
• receive all TEASIG publications
• obtain regular updates on TEASIG and TEA-related events.

Join TEASIG Social Media – Facebook https://www.facebook.com/Teasig, Twitter https://twitter.com/iatefl_teasig and LinkedIn https://www.linkedin.com/groups/8369406/ – to participate in discussions about TEA, meet fellow testers and TEA professionals, and share ideas.
