A TENTATIVE STUDY OF RELIABILITY AND VALIDITY CHARACTERISTICS OF THE SELF-TAUGHT ENGLISH ORAL EXAMINATION
Zhang Yanli
Abstract
It is well over a decade since the English Self-taught Examinations (ESTE) were first administered across China. However, no report has thus far been published to show that adequate attention has been paid to describing and validating the test in its own right. The writer therefore takes a close look at one of the course examinations that self-taught students must pass to complete a degree programme: the Oral Test (OT). This paper contains two major parts: part one examines the limitations of the test in the light of current trends in testing theory, and part two offers some suggestions with a view to enhancing the reliability and validity of the test.
Key words
Self-taught Examination; oral test; validity; reliability
I. Introduction
It is well over a decade since the English Self-taught Examinations (ESTE) were first administered across China to people wishing to be certificated through self-taught learning. As a supplement to the formal tertiary education granted to high school graduates on the basis of the College Entrance Examinations, ESTE have served the purpose of providing an alternative form of assessment for English language learners to whom a college-based curriculum is not available. With social and economic development, English is becoming an overwhelmingly important requisite for professional life, and ESTE have therefore gained popularity among more and more people from a variety of backgrounds: high school graduates who failed the College Entrance Examinations, experienced working staff who seek advancement, and even college students who are non-English majors and want to be better prepared for future work.
In a sense the examinations provide an incentive for language learners to improve their language proficiency in order to further their personal and professional development.
The examinations have undergone several modifications in test design, requirements, format and administration. However, no report has thus far been published to show that attention has been paid to describing and validating the test in its own right. As a teacher of English for quite some time, and as someone interested in testing theory and its pedagogical application, the writer intends to take a close look at one of the course examinations that self-taught students must pass to complete a degree programme, the Oral Test, with a view to making some useful suggestions for its validation. Some of its limitations will be examined in the light of current trends in testing theory and of the experience the writer has acquired as an examiner in the Oral Test over a couple of years.
Owing to the unavailability of first-hand statistical data drawn from students' actual performance, the paper relies on qualitative analysis only, which may therefore involve subjective judgement and estimation.
II. Limitations of the Oral Test (OT)
1. Description of the OT
It should be made clear that the Oral Test under discussion here refers to the one adopted in Shanghai, as different oral tests are implemented on a provincial basis in different areas across the country. The OT is at present taken by thousands of people in April every year. The whole test population is divided according to test code numbers into two batches, with one batch taking the OT in the morning and the other in the afternoon.
Usually four sets of the test, considered parallel forms, are used at each administration: two for the morning session and two for the afternoon session. The two sets for each session alternate with each other, and each student takes only one.
The OT in present use is carried out by means of printed and taped stimuli. In the testing room, an examiner simply hands the candidate a sheet of paper with a reading task, and the candidate is given about 4 minutes to prepare. When the time is up, the candidate sits face-to-face with two examiners, with a tape recorder on a desk between them. As soon as the candidate begins his/her oral tasks, the next candidate is let in and starts preparing in the same classroom. All the instructions for the oral tasks, including timing, are taped, and the candidate is expected to follow the procedures recorded on the tape. He/she is scored on the spot by one of the two raters, and scores are given for grammar, pronunciation, fluency and overall comprehensibility. There is no interaction between the examiners and the test taker. The whole procedure lasts about four minutes, after which the next candidate takes his/her place and goes through a second parallel set in the same manner. This time the other rater is responsible for marking.
The OT in current use contains three parts, as shown in the following table:
Item                             Weighting   Time length
Part I     Reading aloud            30%      about 40 seconds
Part II    Answering questions      40%      about 2 min.
Part III   Free talk                30%      about 1 min.
The maximum score is 100 marks, with the cut-off point set at 60, and the overall score is obtained by adding the three separate holistic scores for the three parts.
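The scoring rule just described can be made explicit. The following is a minimal Python sketch, assuming (this is not stated in the paper) that each part is marked holistically out of a maximum equal to its weighting, i.e. 30, 40 and 30 marks; the dictionary keys and function names are hypothetical and chosen only for illustration.

```python
from typing import Mapping

# Assumed part maxima, derived from the 30% / 40% / 30% weightings in the
# table; the actual OT mark sheet may allocate marks differently.
PART_MAXIMA = {"reading_aloud": 30, "answering_questions": 40, "free_talk": 30}
CUT_OFF = 60  # pass mark stated in the paper (maximum 100)


def total_score(part_scores: Mapping[str, float]) -> float:
    """Add the three holistic part scores after checking each is within range."""
    for part, maximum in PART_MAXIMA.items():
        score = part_scores[part]
        if not 0 <= score <= maximum:
            raise ValueError(f"{part} score {score} is outside 0-{maximum}")
    return sum(part_scores[part] for part in PART_MAXIMA)


def passes(part_scores: Mapping[str, float]) -> bool:
    """Criterion-referenced decision: pass if the total reaches the cut-off."""
    return total_score(part_scores) >= CUT_OFF


if __name__ == "__main__":
    candidate = {"reading_aloud": 22, "answering_questions": 23, "free_talk": 18}
    print(total_score(candidate), passes(candidate))  # 63 True
```

Because the total is a plain sum, a strong Part II (worth 40 marks under this assumption) can offset a weak Part I or Part III; the pass/fail decision depends only on whether the sum reaches 60.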
Previous records show that the pass rate varies from year to year, averaging less than 30%, which suggests that the Oral Test is the most difficult of all the course examinations.
There are basically two types of tests: norm-referenced tests and criterion-referenced tests. Criterion-referenced tests may broadly be defined as tests designed to measure well-defined objectives and to produce scores that are interpreted in absolute terms, usually as the percentage of the objectives learned. Such tests are typically associated with criterional levels of performance such as pass-fail cut points. The OT clearly falls into this category.
2. Test syllabus
Derived from a test's specification, a test syllabus is considered a central and crucial part of the test construction and evaluation process. It differs from a test specification in that a specification is intended for internal purposes only and is followed by test designers, whereas a test syllabus is a simplified public document describing what the test contains, directed more at teachers and students who wish to prepare for the test.
The only document concerning the oral test syllabus that applies to all the oral tests in different areas was published in 1998 by Foreign Language Teaching and Research Press (see Appendix 1). These "nationwide general guidelines" state that any oral test should define its coverage and requirements in line with the oral course objectives, which can be defined broadly as training English oral ability and communicative competence (see Appendix 2).
CELEA Journal p71
The Oral Test in question is designed and conducted in Shanghai, with its own characteristics and a distinctive format slightly different from what is stipulated in the national general test syllabus. However, as far as the writer is aware, no oral test syllabus has been specified for the Shanghai version. The nationwide objectives for the oral course are therefore taken, in a broad sense, as what is to be measured in the OT.
3. Analysis of the OT
Before analysing the test itself, attention needs to be paid to the test syllabus. As indicated above, no specific syllabus for the OT in Shanghai is available, which may simply mean that test takers have only a vague idea of what the test aims at, what content is to be covered and what methods are to be used. This lack of information will not only affect test takers' performance but also undermine the test's construct validity.
When it comes to the test itself, validity and reliability, as the core qualities of any test, invite the most concern. It is generally acknowledged that oral proficiency is one of the weakest areas of measurement in a testing battery, as both its validity and its reliability are open to doubt. Even so, research continues in the hope of maximizing validity without sacrificing reliability, or vice versa.
After all, if the test misrepresents a testee's real oral proficiency, it might unjustly cost him/her not only a year, as this oral test is administered only once a year, but also a possible chance to build his/her career or future life. High-stakes tests like this should therefore be as accurate and fair as possible and must be scrutinized carefully.
Although the two terms are phrased differently by different authors, validity refers to "the extent to which the results of the procedure serve the uses for which they were intended" (Hatch & Farhady 1982: 250), or, put more simply, "a test is said to be valid if it measures accurately what it is intended to measure" (Hughes 1989: 22). Reliability can be defined as "the extent to which a test produces consistent results when administered under similar conditions" (Hatch & Farhady 1982: 244); in other words, a reliable test provides consistent, replicable information about student performance. A good test can in essence be characterized as having a high degree of both reliability and validity, above other qualities such as practicality. Reliability is a requirement for validity, and the investigation of reliability and validity can be viewed as complementary aspects of identifying, estimating and interpreting different sources of variance in test scores (Bachman 1990: 160-162).
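Bachman's framing of reliability in terms of sources of variance can be illustrated with classical test theory, a standard model the paper does not itself invoke: an observed score X is modelled as a true score T plus random error E, and reliability is the share of observed-score variance that is true-score variance. The minimal Python sketch below simulates two parallel measurements of the same candidates; the number of candidates, the mean of 60 and both standard deviations are illustrative assumptions, not OT data.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def simulate_parallel_forms(n=5000, true_sd=10.0, error_sd=5.0, seed=1):
    """Simulate two parallel measurements X = T + E of the same candidates.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(60.0, true_sd) for _ in range(n)]
    form_a = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    form_b = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    return true_scores, form_a, form_b


TRUE_SD, ERROR_SD = 10.0, 5.0
# Reliability = true-score variance / observed-score variance = 100 / 125.
theoretical = TRUE_SD**2 / (TRUE_SD**2 + ERROR_SD**2)
_, a, b = simulate_parallel_forms(true_sd=TRUE_SD, error_sd=ERROR_SD)
# Under the model, the correlation between parallel forms estimates reliability.
estimated = pearson_r(a, b)
print(f"theoretical {theoretical:.2f}, estimated {estimated:.3f}")
```

Raising the error standard deviation, for instance to mimic inconsistent raters, lowers both figures: with an error SD of 10 the theoretical reliability falls to 0.5. Note that the model says nothing about validity: a highly reliable score can still measure the wrong construct.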
A reliable test does not necessarily have high validity, whereas a valid test must first of all be reliable.
(1) Reliability
As reliability is a necessary condition for validity, care should be taken in investigating it. Empirical studies show that scores may vary as a result of differences in both the speaker's performance and the rater's performance, which may be explained by the fact that reliability actually has two dimensions: the test itself, which determines the candidate's performance, and the markers, whose scoring may be subject to individual differences.
a) Test reliability
Two aspects of test reliability are of concern here: the reliability coefficient (internal consistency) of a single test and the correlation coefficient between parallel sets of the test, both of which are statistical concepts. However, for lack of statistical data on students' performance, no reliability figure based on mathematical calculation can be obtained. What can be done, therefore, is to compare how homogeneous the different sets are in measuring the same trait or traits, or to what extent the parallel forms agree, as it is often noted that truly equivalent alternative forms are simply not available.
Parallelism of items typically means statistical equivalence plus categorical similarity. Against this standard, a comparison was made between parallel versions used in past years, and it was found that the materials had not been selected on the basis of comparable difficulty. Take the test papers used in the year 2000 as an example.
Compared with Set 2, Set 4 was much more demanding for the test takers. The reading text adopted in Set 4 was about hospital noise, with more difficult vocabulary (e.g. serenity, resultant), more long sentences (a 326-word text divided into 10 sentences, an average sentence length of 32.6 words), complex structures (compound, complex or compound-complex sentences accounting for 70% of the ten long sentences) and involved concepts; it might therefore take candidates longer to process the text and store it in memory. The reading text in Set 2, by contrast, was relatively easy in vocabulary, syntax and semantic reference (a 253-word text divided into 20 sentences, an average sentence length of 12.7 words, with simple sentences accounting for 85% of the 20 sentences) and thus required less effort to comprehend and memorize. Furthermore, the topics chosen for Part III Free Talk were not of a similar nature either. The one used in Set 2, "Why do you enjoy English self-taught programmes", was more familiar and relevant to all the students, who might therefore be motivated to express their opinions effectively. In contrast, the topic in Set 4, "In what way do you think many noises affect the patients in a hospital?", might draw on test takers' prior knowledge of the content area "hospital". Those unfamiliar with hospital noise might not have much to talk about, and this unfamiliarity and irrelevance might constrain their language performance. In other words, this topic was biased against some testees and in favor of others.
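The difficulty comparison above rests on simple descriptive counts. A minimal Python sketch of how such figures might be computed follows; the regex-based sentence splitter and the whitespace word count are simplifying assumptions (an abbreviation such as "e.g." followed by a space is treated as a sentence end), so careful hand counts like the paper's remain the reference. The toy sentence at the end is invented and does not come from the test papers.

```python
import re
from dataclasses import dataclass


@dataclass
class TextProfile:
    words: int
    sentences: int

    @property
    def mean_sentence_length(self) -> float:
        """Average number of words per sentence."""
        return self.words / self.sentences


def profile(text: str) -> TextProfile:
    """Naively count words and sentences in a reading text.

    A sentence ends at '.', '?' or '!' followed by whitespace or the end of
    the text; words are whitespace-separated tokens.
    """
    sentences = [s for s in re.split(r"[.?!]+(?:\s+|$)", text.strip()) if s]
    return TextProfile(words=len(text.split()), sentences=len(sentences))


# Counts reported in the paper for the two year-2000 reading texts.
set_4 = TextProfile(words=326, sentences=10)
set_2 = TextProfile(words=253, sentences=20)
print(f"Set 4: {set_4.mean_sentence_length:.2f} words per sentence")  # 32.60
print(f"Set 2: {set_2.mean_sentence_length:.2f} words per sentence")  # 12.65

# A toy text, not taken from the test papers.
print(profile("Noise is everywhere. Patients need rest! Do they get it?"))
```

For Set 2, 253 / 20 gives 12.65, which the paper reports as 12.7. Averages alone do not capture the syntactic complexity the paper also points to, so a fuller comparison would add counts of compound and complex sentences.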
As a result, the differences, or lack of homogeneity, between items may lower the reliability of the parallel forms.
The question that clearly arises for test reliability is whether test takers' chances are prejudiced by the version they take, and therefore whether scores obtained on the parallel forms are comparable. If these questions cannot be answered satisfactorily, that is, if the rulers used for measurement are not of the same length, then any conclusion or decision based on the results would be questionable and even absurd.
b) Scorer reliability
The other component of test reliability involves the raters' scoring, which is especially problematic when the test is a subjective one in which "a degree of judgement is called for on the part of the scorer" (Hughes 1989: 36). In the Oral Test under discussion, the scorers need to exercise some subjectivity in holistic scoring, which raises the issue of scoring reliability: understanding of the marking criterion may vary from person to person, and the same marker may apply the criterion differently from one occasion to another.
Scoring rubrics or marking schemes may be used to guide raters in making their judgements according to certain criteria; however, empirical evidence suggests that even with such guidelines, raters may well arrive at similar ratings for quite different reasons. In other words, judges applying the same criterion do not necessarily agree on what it means.
In a criterion-referenced test like the OT, establishing well-defined scoring criteria may help to reduce subjectivity and thus improve the objectivity, and hence the reliability, of marking.
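Since no second marking takes place in the OT, the paper cannot report rater-agreement figures. If two examiners scored the same recorded performances, for instance during the practice on past recordings that the paper later recommends, their consistency could be summarized along the following lines. This is a minimal Python sketch: the score lists are invented, the helper names are hypothetical, and only the 60-mark cut-off comes from the paper.

```python
from math import sqrt
from typing import Sequence

CUT_OFF = 60  # pass mark for the OT total, as stated in the paper


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Consistency of rank ordering between two raters (or two parallel forms)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired score lists of equal length (>= 2)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def agreement_rate(x, y, tolerance=0.0):
    """Share of candidates whose two scores differ by at most `tolerance` marks."""
    return sum(abs(a - b) <= tolerance for a, b in zip(x, y)) / len(x)


def decision_agreement(x, y, cut_off=CUT_OFF):
    """Share of candidates given the same pass/fail decision by both raters."""
    return sum((a >= cut_off) == (b >= cut_off) for a, b in zip(x, y)) / len(x)


# Hypothetical totals awarded by two raters to the same five recordings.
rater_a = [58, 63, 71, 45, 66]
rater_b = [61, 62, 68, 47, 59]

print(f"r = {pearson_r(rater_a, rater_b):.2f}")
print(f"within 3 marks: {agreement_rate(rater_a, rater_b, 3):.0%}")
print(f"same pass/fail: {decision_agreement(rater_a, rater_b):.0%}")
```

For these invented scores the correlation is about 0.93, yet the two raters reach the same pass/fail decision for only three of the five candidates. For a criterion-referenced test like the OT, consistency of decisions at the cut-off therefore matters as much as a high correlation. The same `pearson_r` function would yield a parallel-forms coefficient if a pilot group took two sets; operational candidates, who take only one set each, cannot supply such data.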
However, even though the criteria for the Oral Test have been broken down into several categories (see Appendix 3), one is still confronted with subjective interpretation within each category, as the evaluation criteria are too vague. Because the examiners received no special instruction beforehand in evaluating speech, they had serious difficulty determining what kind of language use should count as meeting the "criteria" against which testees were evaluated. For example, the marking criterion for Part I states that "Reading aloud should be conducted coherently and fluently with correct pronunciation at normal speed". Given the same criterion, raters might interpret differently what constitutes "correct pronunciation", or unconsciously assign different weights to various features of speaking ability (cohesion, pronunciation, fluency, speed, etc.), so that scorers working on different grounds apply higher or lower score thresholds and award different scores.
Furthermore, this criterion-referenced assessment is generally done by a single rater working alone, with no second marking, and the lack of second marking may also be seen as a potential threat to the reliability of the test. How can we ensure that one assessor is rating candidates in his/her room on the same basis as another assessor next door, and that each assessor applies the same criterion consistently? In other words, how to guarantee both inter-rater and intra-rater reliability is worth further investigation. Unless we can be sure that the scores do in fact accurately and consistently reflect learner attributes, the interpretation of the scores may be seriously flawed.
(2) Validity
Validity is a matter of degree rather than an all-or-nothing trait.
A test can be highly valid for one purpose but not for another. Like reliability, validity has a number of aspects. Basically there are four types: face validity, content validity, criterion-related validity and construct validity. The following discussion focuses on content validity and construct validity; face validity is dismissed as superficial and irrelevant (Stevenson 1985: 43). Criterion-related validity is left out because no independent, highly valid assessment that could serve as the criterion measure against which to validate the test is either practicable or available in our situation.
a) Content validity
Content validity is "the extent to which a test measures a representative sample of the subject matter content" (Hatch & Farhady 1982: 251). To satisfy content validity, then, the materials or topics selected for the OT should be of a kind likely to elicit a continuous flow of speech and enable us to gain a relatively complete picture of how English learners cope with oral tasks.
Looking at the format of the OT, we may find that all three parts are designed to test contributory oral subskills or oral production, at least at face level: Part I is intended as a subtest of pronunciation and intonation; Part II is set to elicit oral responses with the reading passage as stimulus; and Part III is adopted for testing oral production. The problems, however, lie in material selection and topical domain. For example, the reading text used in Part II of Set 4 (see Appendix 5) was more appropriate for testing reading comprehension than as input to activate or stimulate oral expression; it was therefore not a representative sample for testing speaking ability.
Similarly, the topic used in 1999 Set 2 (see Appendix 4), "Do you like Chinese traditional medicine? Why?", might be more suitable for assessing knowledge of medicine than oral proficiency, as it might cause difficulty for people lacking the relevant knowledge even in their mother tongue, leaving them little room to show their oral potential.
Because of the constrained nature of the Oral Test, its representativeness as a sample of a range of communicative skills is questionable, which may be viewed as an inadequacy in the test's content validity.
Moreover, reading aloud and prepared monologue are two of the techniques not recommended by some testing experts (Hughes 1989: 109-110).
b) Construct validity
Whereas content and criterion-related validity are concerned with some specific practical use of test results, construct validity is concerned with more general, theoretically hypothesized language abilities or psychological traits. A test can be said to have construct validity if it can be demonstrated that it measures just the ability it is supposed to measure.
According to Hughes (1989), a construct "refers to any underlying ability (or trait) which is hypothesised in a theory of language ability" (p. 26). Derived from the course objectives, the "construct" or "traits" that the Oral Test is supposed to measure should be oral ability and communicative competence. Such oral ability, according to Weir & Bygate (1992), consists of routine skills, improvisation skills and micro-linguistic skills. However, if we examine the OT paper, we find little evidence that the test actually taps these skills.
Part I is a test of reading aloud lasting about a minute. Reading aloud was a method of testing used in previous centuries.
It was, however, discredited as a testing method after World War II because it was not considered a test of normal communicative ability. Given that the construct of speaking ability consists of a wide range of contributory subskills or knowledge such as pronunciation, fluency, syntax, vocabulary, rhetorical organization, cohesion, pragmatic function, register, paraphrase, conversational adjustment and turn-taking, we may see the point of including "reading aloud", though it hardly replicates a real-life situation. The problem is whether allocating as much as 30% of the weighting to this mechanical rather than communicative skill is appropriate today, when a more realistic view of language as a means of communication has been accepted.
Part II is the most problematic section as far as the construct is concerned. In Part II, students listen to taped questions about the reading material they are supposed to have prepared and then give quick responses, derived from their comprehension of the material, without looking at it. At least two variables thus seem to be involved in giving correct answers orally: the ability to comprehend a written passage and listening ability, both of which are actually tested in other course examinations (such as Comprehensive Reading Course Tests I and II, and the Listening Course Test). What impedes students may not be a lack of oral ability but rather failure to understand a certain word in the passage, or inability to understand the taped questions.
Therefore even effective communicative ability provides no guarantee of reaching the correct answers, as there is no close link between oral ability and a fixed set of responses. Moreover, some of the questions focusing on details of the passage mainly test memorization and thus constrain the oral potential of the test takers, which may yield a measure of a slightly different construct from the one stipulated in the test syllabus. All this suggests that the OT is not highly valid for the purpose of testing oral proficiency, the construct it is meant to measure.
III. Suggestions
Clearly, then, some problems arise in the OT, whether in its pre-test design or in its while-test administration, but on the whole the OT is not invalidated by them. The writer's purpose is to supplement the test, retaining its strengths while making the interpretation of its scores more valid and reliable. To address this issue adequately, test takers' results need to be analyzed through statistical procedures at the post-test stage; after all, both reliability and validity are statistical concepts.
In the light of the present observations and discussion, however, how can reliability and validity be raised to a higher level while practicality is taken into consideration? Many scholars and researchers have put forward various solutions.
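One concrete post-test procedure is the internal-consistency coefficient mentioned under "Test reliability". A common estimate, though not one the paper prescribes, is Cronbach's alpha, treating the three part scores as items: alpha = k / (k - 1) × (1 - sum of part variances / variance of total scores), where k is the number of parts. The minimal Python sketch below uses invented scores; with only three parts measuring deliberately different subskills, alpha should be read as a rough indicator rather than a definitive reliability figure.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a candidates-by-parts score matrix.

    Each row holds one candidate's part scores, e.g. [Part I, Part II, Part III].
    """
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("each candidate needs the same number (>= 2) of part scores")
    # Sample variance of each part across candidates.
    part_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the candidates' total scores.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(part_variances) / total_variance)


# Invented part scores (out of an assumed 30, 40 and 30) for six candidates.
scores = [
    [20, 25, 18],
    [25, 32, 24],
    [15, 18, 14],
    [27, 35, 26],
    [18, 22, 17],
    [22, 28, 20],
]
print(f"alpha = {cronbach_alpha(scores):.2f}")
```

A high value here only means the three parts rank candidates similarly; it does not show that they measure communicative competence, which remains a question of construct validity.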
For the OT in particular, the writer would highlight the following implications:
(1) A test syllabus specified for the Shanghai version should be issued so that test takers are as clear as possible about the OT from the outset.
(2) With respect to reliability:
a) Markers should be standardized and trained. As Brown states, "Inter-rater and intra-rater reliabilities are necessary conditions for investigating the validity of the oral test, therefore special emphasis should be placed on the background and training of the raters" (1989: 98). To this end, scorers can be instructed to practise on several recordings of past tests in order to stabilize their judgement before beginning actual marking, and new markers should have their work checked by experienced raters before being allowed to score.
b) Criterional levels serving as rating scales should be precise enough for markers to apply. More emphasis may be placed on dimensions other than so-called "correct" pronunciation in assessing spoken English, and the weights given to the various features of speaking ability should be better balanced.
c) Different sets of the test used as parallel forms need to be as equivalent as possible; statistical analyses such as correlation coefficients across the parallel sets should therefore be conducted before the tests are used.
(3) From the standpoint of validity:
a) Great importance must be attached to the selection of materials and topics.
b) Emphasis should be laid on the ability to produce utterances, so items within the oral test that depend too heavily on reading and listening ability should be avoided.
Instead, a more direct measurement involving more interaction is called for.
c) Where conditions permit, criterion-related validity should be explored to investigate how successfully the examination fulfils its intention of indicating or predicting language abilities.
IV. Conclusion
In the course of this study, the writer often found that more questions had been raised than resolved. The administration of an oral test is subjective in nature; what, then, is the impact of this subjectivity on the validity of such a procedure? Is there a need for a more standardized test model? What selection criteria should be used to qualify oral examiners? How can we develop a means of obtaining richer information on communicative competence quickly and economically without requiring highly trained testers? More research and analysis therefore need to be undertaken, as testing is only a means to an end, not an end in itself.
Administering a valid and reliable oral test is undoubtedly complex, as it involves so many students, administrative centres, office staff and examiners, but the cost and effort are worthwhile: tests can yield both negative and positive washback, and good tests encourage beneficial teaching-learning processes. Only by making such efforts can we verify, to the greatest possible extent, the quality of a test whose impact can bring positive washback to test takers.
References
Bachman, L. 1990. Fundamental Considerations in Language Testing. New York & Oxford: Oxford University Press.
Brown, J. D. 1989. Short-cut Estimates of Criterion-referenced Test Reliability. Paper presented at the 11th Annual Language Testing Research Colloquium, San Antonio, March 1989.
Brown, J. D. 1998. Understanding Research in Second Language Learning. Cambridge: Cambridge University Press.
Hatch, E. M. & H. Farhady. 1982.
Research Design and Statistics for Applied Linguistics. Rowley, Mass.: Newbury House Publishers, Inc.
Hughes, A. 1989. Testing for Language Teachers. Cambridge: Cambridge University Press.
Stevenson, D. K. 1985. Authenticity, validity and a tea party. Language Testing 2/1: 41-47.
Weir, C. J. & M. Bygate. 1992. Meeting the criteria of communicativeness in a spoken language test. Journal of English and Foreign Language, Nos. 10-11 (3): 27-43.
Appendix 1 The National Test Syllabus
Appendix 2 The Oral Course Objectives
Appendix 3 Marking Criteria
Appendix 4 Test Paper for 1999
Appendix 5 Test Paper for 2000
Appendix 6 Test Paper for 1998
(Omitted)
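In recent years, as learner autonomy has become one of the research foci of foreign language teaching, self-assessment and peer assessment, as important manifestations of this concept in the field of assessment and testing, have attracted increasing attention from researchers. However, studies of Chinese learners of English in the Chinese English-teaching context remain rare. This paper investigates the rater reliability of Chinese university students' self-assessment and peer assessment of L2 writing. In a survey of self-assessment and peer assessment of L2 writing by 52 first-year non-English-major students at the University of Science and Technology of China, the author found that the reliability of self-assessment was 0.432 and that of peer assessment 0.202; that self-assessors tended to overestimate while peer assessors tended to underestimate; and that both self-assessors and peer assessors tended to overestimate low performance and underestimate high performance.
Key words: self-assessment; peer assessment; learner autonomy
A Tentative Study of the Reliability and Validity of the Higher Education Self-taught Examination (Shanghai) English Oral Test
Zhang Yanli, School of English, Shanghai International Studies University
The Higher Education Self-taught Examination was introduced to meet the needs of self-taught learners in society and is designed around the characteristics of self-taught study. Over more than ten years, however, analyses of the test papers themselves have been rare, so the author analyzes the Self-taught Examination oral test held in Shanghai every April. The paper has two main parts: the first analyzes the reliability and validity of the test in the light of current language testing theory and the author's experience as an examiner for the test; the second offers suggestions on how to improve its reliability and validity.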
Key words: Self-taught Examination; oral test; reliability; validity
A Study of the Washback Effect of the College English Test Band 4 (CET-4) on College English Teaching
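This study takes college English teachers as its subjects and investigates their views on how CET-4 has affected their teaching. Data were collected through questionnaires and interviews and analyzed with correlational methods. The results show that CET-4 does have an important impact on teaching, but the impact is rather superficial: it affects what teachers teach rather than how they teach. The study adds empirical evidence to research on washback and further explores the relationship between CET-4 reform and college English teaching.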
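Key words: washback; CET-4; college English teaching
Teacher Autonomy in Second Language Education
In second (foreign) language education, as research on learner autonomy has deepened, teacher autonomy has also gradually become a research focus over the past decade. Based on a comprehensive and in-depth review of the literature, this paper first discusses the meaning of teacher autonomy and then analyzes the relationship between teacher autonomy and learner autonomy. The discussion of this relationship centres on teachers' responsibilities, attitudes and abilities in developing learner autonomy. Finally, the paper examines three main approaches to promoting teacher autonomy and teacher development: the familiar action research and reflective teaching, and the more recently emerged exploratory practice.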
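Key words: teacher autonomy; learner autonomy; second (foreign) language education
Implicit Causes of Teaching Style: Reflections on a Regional Teacher Survey
Shanghai Jiao Tong University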
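Although language teachers interpret most teaching methods and content in their own ways, discussions of foreign language education in China have mostly focused on external behaviour rather than on teachers' internal characteristics and personal styles. Building on a clarification of related concepts and a regional teacher survey, this paper analyzes the internal factors of teaching style through systematic reflection on the differences between teachers' conceptions and their behaviour. It calls for attention, beyond methodological theory, to teachers' own abilities and conditions, alongside greater recognition of the internal teaching styles of individual teachers in China.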
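Key words: teaching style; internal factors; external factors; teacher development
A Social Constructivist Approach to Teaching Reading
University of Canberra, Australia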
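Constructivist learning theory rooted in the Vygotskian school is becoming increasingly important in second or foreign language learning, yet to front-line teachers it often seems remote or even irrelevant. This paper explores a constructivist approach to teaching reading in English as a foreign language, explaining how this dialogic approach engages readers in constructing meaning with the text or author instead of leaving them as silent outsiders to the reading process. This shift in the reader's position means that readers need to adopt a strategic approach to reading, and that teachers need to "scaffold" students in acquiring effective, independent reading strategies. The paper also discusses a number of specific forms of such scaffolding, to help turn this theoretical shift in constructivist discourse into practice for front-line classroom teachers.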
Key words: constructivism; reading; strategies
Chinese Abstracts, p. 126