Primož Jurko. Target Language Corpus as an Encoding Tool: Collocations in Slovene-English Translator Training

DOI: /elope

Primož Jurko
University of Ljubljana
Slovenia

Target Language Corpus as an Encoding Tool: Collocations in Slovene-English Translator Training

Summary

The opening part of the article discusses reasons for the lukewarm reception of language corpora in the language teaching community. The first reason is the complex syntax and rudimentary user interface of early corpora accessible in the 1990s. The second reason why corpora have witnessed a relatively slow start in language teaching is the fear of the unknown and of an unruly linguistic reality that is often at odds with rules taught at school. The practical part of the article presents a survey conducted among Slovene university students of translation. The survey focused on the effect of using a target language corpus in the course of Slovene into English translation in terms of English collocation. It found that the number of collocation errors in translation can be greatly reduced by competent use of an L2 corpus, which yields a translation with a higher level of idiomaticity.

Key words: translation into L2, collocation, corpus, target language corpus, translator training

Vpliv ciljno-jezičnega korpusa na prevajanje kolokacij pri študentih prevajanja iz slovenščine v angleščino

Povzetek

Članek se v začetku osredotoča na razloge za zadržan sprejem jezikovnih korpusov pri poučevanju jezikov. Kot prvi razlog navaja zapleteno skladnjo in nerodne uporabniške vmesnike zgodnjih korpusov iz 90-ih let. Kot drugi verjeten razlog vidi strah pred neznano jezikovno stvarnostjo, ki je pogosto skregana s šolskimi pravili. Praktični del članka predstavlja raziskavo med slovenskimi študenti prevajanja, ki razgrinja vpliv uporabe angleškega korpusa pri prevajanju v angleščino. Rezultati kažejo, da lahko ustrezna uporaba sodobnega korpusa bistveno zmanjša število napak pri angleških kolokacijah in tako pripomore k višji kvaliteti in idiomatičnosti prevoda.
Ključne besede: prevajanje v tuj jezik, kolokacija, korpus, ciljno-jezični korpus, izobraževanje prevajalcev

UDK 81 25: :378(497.4)

TRANSLATION STUDIES

1. Corpora past

The advent of corpus linguistics has changed, in practically every way, how we as linguists both deal with and look at language. Access to large quantities of computer-processed texts has proven invaluable in all disciplines of linguistics: rather than merely one of several branches of linguistics, corpus linguistics is seen as a methodology which can be applied to any sphere of linguistics (McEnery and Wilson 2001, 2). The first and perhaps most obvious linguistic discipline to profit from corpus linguistics was lexicography, with the publication of the revolutionary Collins COBUILD EFL dictionary in 1987, edited by J.M. Sinclair, one of the pioneers of corpus linguistics. Other publishers of linguistic reference works soon followed suit, and within less than a decade virtually all contemporary dictionaries were corpus-based, with EFL dictionaries leading the way (e.g., the 4th edition of the Oxford Advanced Learner's Dictionary of Current English (Cowie, 1989), the 1st edition of the Cambridge International Dictionary of English (Procter, 1995) and the 3rd edition of the Longman Dictionary of Contemporary English (Summers, 1995), to name but the most widely used).

The most recent linguistic field to tap into the rapidly growing sphere of corpus linguistics is language teaching. Römer (2009, 84) finds that corpus linguistics can make a difference and that it has immense potential to improve pedagogy, but puts it to corpus linguists that they have so far focused on other, arguably higher priority tasks.
In her text Römer also noted that corpus linguists have yet to come up with an interface of research and practice that will be sufficiently user-friendly to open the door to a wider acceptance and recognition of corpora as a viable and valuable language teaching tool. Recent developments have shown a marked improvement in precisely this direction, which is attested by the high numbers of users of contemporary corpora: the Corpus of Contemporary American English (aka COCA, Davies 2008) is currently accessed by a community of some 40,000 unique users each month. The marked surge in numbers of corpus users, from hundreds in the early period of corpus history (late 1980s and 1990s) to several tens of thousands today, is largely attributable to two developments: one is the growing shift of linguistic focus to lexicological matters, and the other is improved ease of access. The latter seems to be of particularly high importance, and great effort is directed towards bridging the gap between the wealth and complexity of information stored in corpora, on the one hand, and the needs and expectations of users, on the other. Recent surveys of corpora and corpus tools have shown that a modern user interface has become more Google-like (Kilgarriff and Kosem 2012, 49): text-input box, drop-down menus and practically instant results. Such an interface enables the user to gain access to the wide array of information available in the corpus without spending a substantial amount of time just to come to grips with anything beyond the most basic queries.

This user-friendliness is a relatively new feature and stands in stark contrast to what corpus users had to live with only a decade or two ago. Users of the early corpora in the 1990s will no doubt remember the painful experience of learning the ropes of the British National Corpus (BNC, featuring the old interface; see Fig. 1) or its Slovene counterpart FIDA (since superseded by FidaPLUS): the first few days and even weeks of using it were very puzzling, to say the least. The whole enterprise of learning the query syntax of the corpus required one to invest substantial effort and time.

Figure 1: Sample page of hits on the BNC for the query coca

Just how much effort and time was enough depended in principle on two factors: one was the users' needs and expectations, and the other their previous exposure to computers in their raw form, in a manner of speaking. The range of early corpus users extended from highly proficient, trained lexicographers to "just curious what it is" students of linguistics or languages, with several layers of mid-experts in between.

When it comes to classroom activities, it is far from surprising, then, that the use of early corpora witnessed a slow start. While many (or even most) university teachers were thrilled at the possibility of studying authentic language and discovering language patterns that had up to then been invisible, they were faced with a huge drawback in the form of the highly complex query syntax. For anything beyond the simplest queries, mastering the internal rules of a corpus was a prerequisite. Consider the following example of a relatively simple query in the FidaPLUS corpus of Slovene:

#1dež/1-3#1sneg

This particular query returned concordances containing the noun lemma dež (rain) appearing anywhere in the span from 1 to 3 words before or after the lemma sneg (snow); so, while it is far from complicated as corpus queries go, it is still highly structured and very cryptic to the untrained eye.
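What such a span query asks of the corpus can be illustrated with a minimal Python sketch. It is not the FidaPLUS implementation: the function name `within_span`, the toy lemma list and the reading of "1 to 3 words" as a token distance of 1 to 3 are all assumptions made for illustration only.

```python
def within_span(lemmas, first, second, min_gap=1, max_gap=3):
    """Return (i, j) index pairs where `first` (at i) and `second` (at j)
    occur between min_gap and max_gap tokens apart, in either order."""
    hits = []
    for i, lemma in enumerate(lemmas):
        if lemma != first:
            continue
        # Look at the window of max_gap tokens on both sides of position i.
        lo = max(0, i - max_gap)
        hi = min(len(lemmas), i + max_gap + 1)
        for j in range(lo, hi):
            if lemmas[j] == second and min_gap <= abs(i - j) <= max_gap:
                hits.append((i, j))
    return hits


# Toy, already-lemmatised token list (hypothetical data, not from FidaPLUS).
sample = ["včeraj", "biti", "dež", "in", "sneg", "povsod"]
print(within_span(sample, "dež", "sneg"))
```

A real corpus engine would run this kind of match over an index of millions of lemmatised tokens and return the surrounding concordance lines rather than bare positions.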
This means that early corpora required a level of expertise that was simply too high for many, and even though the broadened horizons of corpus-driven research appealed to teachers and students alike, few of them actually ventured to try more than getting to know the basics of what corpora are and, more importantly, how they can affect methods of teaching and learning.

Complex syntax and a rough interface are not, however, the only cause of the sluggish start by language corpora in classroom use. There is another aspect of corpora in general that puts people off, and this one has to do with the nature of language itself. Or perhaps more to the point, it is related to the way we teach language and the way language is, which are two very different concepts. Teaching language is invariably subject to simplification and generalization. While it is true that both simplification and generalization are applied to a varying degree on all levels of education in general, there can also be no doubt that without these mechanisms the study of language in full scale would turn into an insurmountable task. Indeed, starting with children's mother tongue acquisition, parents behave like innate language teachers (Bolinger 1981, 165), and simplification in communicating with offspring is practically programmed in our minds. Simplification as a methodology of teaching has strong roots, then, and it is only natural that even at the higher education level we tend to see language as a system of rules. The rules may apply to morphology, syntax, phonology, etc., and although they may change over time, teachers pass them on and students soak them in as eternal truths. However, what corpora reveal is authentic language usage, which often does not play by the rules. Linguistic reality has always been much more intricate than any textbook would have us believe, and modern corpora give it to us (more or less) just as it is.
The image of real language is frequently in many ways a distortion of what we learn at school, and distortion leads to discomfort. Linguistic reality is hard to accommodate for both teachers and students: the former may feel threatened by questions from some inquisitive student about a usage that does not comply with the rules taught at school but that s/he found in the corpus. Students, on the other hand, are at first likely to be puzzled by the discovery of data that do not fit into the neatly organized categories of their linguistic knowledge. The discrepancy between corpus-revealed real usage and school-acquired regularity appears to have a particularly strong impact on foreign language learners (Granath 2009, 49), who in L2 cannot rely on their language instinct as self-assuredly as they do in their mother tongue.

Figure 2: COCA interface with POS menu pulled down

2. and present

However, as this paper will show, with proper training and encouragement, students have much to gain from the use of contemporary L2 corpora. As a case in point, a survey based on classroom/home implementation of COCA will be presented. COCA is a corpus that, since its introduction in 2008, has seen explosive growth in terms of importance and number of users, which has made it arguably the most popular and widely used English corpus. A detailed presentation of COCA and its features would fall beyond the scope of this paper; instead, the main features that facilitate translation of collocations into English are briefly presented below. Suffice it to say that it is freely available on the internet to registered users; once registered, COCA saves the entire history of every user's queries.
The first feature allows the user to look for collocates of the keyword within a given span: the interface lets you choose between searching for an exact word form, a lemma or any part of speech (POS). The latter is particularly useful for EFL users in searching for acceptable collocations (like, say, adjectives that can precede a given noun, or prepositions used after adjectives).

One of the greatest lexical problems in translation is dealing with instances of divergent polysemy. The worst case scenario in divergent polysemy for a translator develops when a polysemous lexeme in the source language is rendered by a multitude of lexemes in the target language (for a full account, see Gabrovšek 2005, 120). For instance, let's take the Slovene verb začeti, which is in most contexts translatable into English as either begin or start. But are they interchangeable, i.e. are they full synonyms? In choosing one or the other, most EFL users play it by ear, in a manner of speaking, but how do we get to the bottom of it? In principle, whenever EFL users are of two minds about words with a similar meaning, the first reference work that one thinks of is a thesaurus with meaning discrimination or a dictionary of synonyms. However, with begin and start, even the most recent one, the Oxford Learner's Thesaurus (2008), seems to be at a loss for differences in denotation, since one is defined in terms of the other: start means "to begin doing sth" (see Fig. 3).

Figure 3: begin vs. start in the Oxford Learner's Thesaurus

The problem of distinguishing between two words with meanings as close as those of the verbs start and begin frequently boils down to their context, i.e. their collocational behavior. The note at the bottom of Fig. 3 hints at precisely this important difference between the verbs: what objects they can or cannot take. This is where another feature of COCA is invaluable: the comparison. It is intended to compare two words with similar meaning.
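The idea behind comparing two near-synonyms by their collocates can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not COCA's method: the helper names `following_words` and `compare` are hypothetical, the token list is invented toy data, and the sketch matches exact word forms without any lemmatisation or POS filtering.

```python
from collections import Counter


def following_words(tokens, keyword):
    """Count the words that immediately follow each occurrence of keyword."""
    return Counter(
        tokens[i + 1] for i in range(len(tokens) - 1) if tokens[i] == keyword
    )


def compare(tokens, word1, word2):
    """Map each following word to (count after word1, count after word2)."""
    c1 = following_words(tokens, word1)
    c2 = following_words(tokens, word2)
    # A Counter returns 0 for words it has never seen.
    return {w: (c1[w], c2[w]) for w in sorted(set(c1) | set(c2))}


# Invented toy token list; the counts say nothing about real English usage.
tokens = "start engine start engine begin sentence start work begin work".split()
for word, (after_start, after_begin) in compare(tokens, "start", "begin").items():
    print(word, after_start, after_begin)
```

Collocates with a count on one side only are candidates for combinations exclusive to one of the two words, while those with counts on both sides are shared ground.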
Selecting the option COMPARE in the top left part of the screen, which offers various display options, opens up two comparison boxes, into which the observed words are entered. As above, the COMPARE feature also has the option to look for collocates of the two words. The results are displayed with the help of color graphics, which allow the user to evaluate the results with a quick glance at the screen: bright green for word combinations where the use of either word 1 or word 2 is exclusive of the other, pale green for combinations where one of the words is markedly preferred, and white for combinations where both words are possible, i.e. for neutral ground, so to speak. So, in order to determine possible contexts for the verbs begin and start, we looked at the direct objects of the two verbs: a query was built to list nouns that immediately follow the respective verbs (see Fig. 4).

Figure 4: COCA - comparison of nouns following start and begin

A quick check of the results reveals that there are indeed considerable differences between the two verbs in question: many things can only be started and fewer can only be begun. Even advanced EFL users are likely to find in the results something they did not previously know. The comparison feature of COCA, then, is particularly valuable in highlighting the differences in collocators of two words that are regarded as quasi-synonyms. As will be shown below, students of translation who used COCA in their home assignments were able to show marked improvement in their English collocation output, as compared to their colleagues who did not.

3. COCA and students: the survey

To assess the level of improvement in L2 translation of collocations that can be achieved through the use of a target language corpus, a survey was conducted among Slovene university students of translation.
The sample group did not access a corpus of English in preparing their translations, while the control group was free to use COCA. Here are the technical details:

all subjects were 3rd year undergraduate students (6th semester)
A language (mother tongue): Slovene
B language (first foreign language): English
C language (second foreign language): German, French or Italian
sample group size: 19 students
control group size: 96 students (in 4 groups of 23, 19, 27 and 27 members)

The survey was carried out as part of the A-B (Slovene into English) translation class. The class took place twice a week and students were required to prepare one translation for each session as a home assignment. The source language texts averaged 1,600 characters in length and varied in difficulty, but were not genre or otherwise marked.

All students received basic information about corpora in general and about COCA in particular in the course of their preliminary study. However, that study involved only limited hands-on experience with L2 corpora in the classroom, which is why they were given a brief account of the most useful features of COCA. Particular emphasis was put on distinguishing between googling for word combinations and using COCA. The use of popular search engines like Google, Yahoo or Microsoft's Bing (whichever is set as default in their web browsers) in searching for acceptable English collocations seems to be a perennial favorite among students; indeed, in many it has grown into a habit that they find difficult to kick even after having learned the benefits of using a well-structured corpus. For most people the search engine of choice these days is apparently Google, and students seem to be persuaded by the exorbitant numbers of hits for a given search string that this particular word combination is widely used in a given language.
What they do not know, however, is that Google does not really search all of the World Wide Web in a fraction of a second, but rather performs a calculation and gives you an estimate of how many hits appear to be out there. And what you are given is a VERY rough estimate. In all fairness, every Google user can also get the exact number of hits, but only if they log into their account and change the Google Instant Predictions setting to "Never show Instant results". Also, you must set the number of results displayed to 100. What this means in practice is something you can easily try out for yourself: enter the string "to the best of my knowledge" into Google's search box, and it will probably give you anywhere between 150,000,000 and 300,000,000 hits. If you repeat the same search without Instant predictions, you will get the same number, but here comes the truth: say you want to check results further down the list and you click on the appropriate link at the bottom of the page: you cannot see the requested results, because Google runs out of results at 632. What is more, if you repeat the search a few minutes later, you are very likely to find that the number has changed; in my case the number of hits was reduced to a mere 414. This is a mere technicality, and there are far more convincing reasons why search engines should not be considered a quick and handy replacement for a corpus (for a fuller account, visit byu.edu/coca/help/google_e.asp).

The survey was carried out by comparing students' translations of ten different Slovene texts. The students from the sample group (translation done without using COCA) handed in three translations for evaluation at every session, while the control group (translation done with access to COCA) handed in six translations over the course of two years.
Of course, since the translations were made as a home assignment, this leaves some room for speculation as to whether the students stuck to the agreement of refraining from COCA or not. However, the results featured a very consistent array of differences between the two groups, which we believe is largely attributable to the (un-)availability of a target language corpus in the process of translation.

4. Survey results

The data table below shows the distribution of collocation errors (CE) in the translations of the sample group compared to the control group. Data is sorted in descending order according to the 3rd column, i.e. average number of collocation errors per translation in the sample group.

Columns: Source language text; SAMPLE (translation WITHOUT L2 corpus access): No. of CE in translations, Average no. of CE per translation; CONTROL (translation WITH L2 corpus access): No. of CE in translations, Average no. of CE per translation; Ratio CE CTRL/SMPL

1. JSKD 4,4, ,3,2,2,1,4,0,1,2,3,1,2, :
Terorist 5,2,5 4 1,3,3,2,1,2,0,1,2,3,1,2, :
DSKP 3,2, ,2,0,0,1,0,0,0,1,1,1, :
SG 1,4, ,0,2,0,0,3,0,1,1,0,0, : 4
5. NM 2,5,2 3 0,1,0,0,2,0,0,1,1,0,0, : 6
6. Ajdovščina 3,1, ,1,0,0,0,1,0,0,0,2,0, : 8
7. Izola 2,3, ,1,2,2,3,1,0,2,1,0,0, :
MK 60 2,4, ,3,2
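The "average no. of CE per translation" column is read here as the arithmetic mean of the per-translation error counts. The sketch below checks that reading against the two sample-group rows whose counts and averages both survive in this transcript (Terorist: 5,2,5 with average 4; NM: 2,5,2 with average 3). The names `sample_ce` and `average_ce` are hypothetical, and no control-group figures are used because those rows are incomplete here.

```python
# Sample-group CE counts for the two rows that are intact in the transcript.
sample_ce = {
    "Terorist": [5, 2, 5],
    "NM": [2, 5, 2],
}


def average_ce(counts):
    """Arithmetic mean of collocation errors per translation."""
    return sum(counts) / len(counts)


for text, counts in sample_ce.items():
    print(text, average_ce(counts))
```

Both values agree with the averages printed in the table, which supports the plain-mean reading for the rows shown.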