Towards a Unified Approach to Memory- and Statistical-Based Machine Translation

Daniel Marcu
Information Sciences Institute and Department of Computer Science
University of Southern California
4676 Admiralty Way, Suite 1001
Marina del Rey, CA 90292
[email protected]
Abstract

We present a set of algorithms that enable us to translate natural language sentences by exploiting both a translation memory and a statistical-based translation model. Our results show that an automatically derived translation memory can be used within a statistical framework to often find translations of higher probability than those found using solely a statistical model. The translations produced using both the translation memory and the statistical model are significantly better than translations produced by two commercial systems: our hybrid system translated perfectly 58% of the 505 sentences in a test collection, while the commercial systems translated perfectly only 40-42% of them.

1 Introduction

Over the last decade, much progress has been made in the fields of example-based (EBMT) and statistical machine translation (SMT). EBMT systems work by modifying existing, human-produced translation instances, which are stored in a translation memory (TMEM). Many methods have been proposed for storing translation pairs in a TMEM, finding translation examples that are relevant for translating unseen sentences, and modifying and integrating translation fragments to produce correct outputs. Sato (1992), for example, stores complete parse trees in the TMEM and selects and generates new translations by performing similarity matchings on these trees. Veale and Way (1997) store complete sentences; new translations are generated by modifying the TMEM translation that is most similar to the input sentence. Others store phrases; new translations are produced by optimally partitioning the input into phrases that match examples from the TMEM (Maruyana and Watanabe, 1992), or by finding all partial matches and then choosing the best possible translation using a multi-engine translation system (Brown, 1999).

With a few exceptions (Wu and Wong, 1998), most SMT systems are couched in the noisy-channel framework (see Figure 1). In this framework, the source language, let's say English, is assumed to be generated by a noisy probabilistic source.[1] Most of the current statistical MT systems treat this source as a sequence of words (Brown et al., 1993). (Alternative approaches exist, in which the source is taken to be, for example, a sequence of aligned templates/phrases (Wang, 1998; Och et al., 1999) or a syntactic tree (Yamada and Knight, 2001).) In the noisy-channel framework, a monolingual corpus is used to derive a statistical language model that assigns a probability to a sequence of words or phrases, thus enabling one to distinguish between sequences of words that are grammatically correct and sequences that are not.

[1] For the rest of this paper, we use the terms source and target language according to the jargon specific to the noisy-channel framework. In this framework, the source language is the language into which the machine translation system translates.
Report Documentation Page Form Approved
OMB No. 0704-0188
Public reporting burden for the collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and
maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information,
including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington
VA 22202-4302. Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to a penalty for failing to comply with a collection of information if it
does not display a currently valid OMB control number.
1. REPORT DATE 3. DATES COVERED
2001 2. REPORT TYPE 00-00-2001 to 00-00-2001
4. TITLE AND SUBTITLE 5a. CONTRACT NUMBER
Towards a Unified Approach to Memory- and Statistical-Based Machine
5b. GRANT NUMBER
Translation
5c. PROGRAM ELEMENT NUMBER
6. AUTHOR(S) 5d. PROJECT NUMBER
5e. TASK NUMBER
5f. WORK UNIT NUMBER
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) 8. PERFORMING ORGANIZATION
University of California,Information Sciences Institute ,4676 Admiralty REPORT NUMBER
Way,Marina del Rey,CA,90292
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES) 10. SPONSOR/MONITOR’S ACRONYM(S)
11. SPONSOR/MONITOR’S REPORT
NUMBER(S)
12. DISTRIBUTION/AVAILABILITY STATEMENT
Approved for public release; distribution unlimited
13. SUPPLEMENTARY NOTES
14. ABSTRACT
15. SUBJECT TERMS
16. SECURITY CLASSIFICATION OF: 17. LIMITATION OF 18. NUMBER 19a. NAME OF
ABSTRACT OF PAGES RESPONSIBLE PERSON
a. REPORT b. ABSTRACT c. THIS PAGE 8
unclassified unclassified unclassified
Standard Form 298 (Rev. 8-98)
Prescribed by ANSI Std Z39-18
A sentence-aligned parallel corpus is then used in order to build a probabilistic translation model that explains how the source can be turned into the target and that assigns a probability to every way in which a source e can be mapped into a target f. Once the parameters of the language and translation models are estimated using traditional maximum likelihood and EM techniques (Dempster et al., 1977), one can take as input any string in the target language f and find the source e of highest probability that could have generated the target, a process called decoding (see Figure 1).

[Figure 1: The noisy channel model. A source generates e with probability P(e); the channel turns it into the observed f with probability P(f | e); the decoder recovers the best e as argmax_e P(e | f) = argmax_e P(f | e) P(e).]

It is clear that EBMT and SMT systems have different strengths and weaknesses. If a sentence to be translated, or a very similar one, can be found in the TMEM, an EBMT system has a good chance of producing a good translation. However, if the sentence to be translated has no close matches in the TMEM, then an EBMT system is less likely to succeed. In contrast, an SMT system may be able to produce perfect translations even when the sentence given as input does not resemble any sentence from the training corpus. However, such a system may be unable to generate translations that use idioms and phrases that reflect long-distance dependencies and contexts, which are usually not captured by current translation models.

This paper advances the state of the art in two respects. First, we show how one can use an existing statistical translation model (Brown et al., 1993) in order to automatically derive a statistical TMEM. Second, we adapt a decoding algorithm so that it can exploit information specific both to the statistical TMEM and to the translation model. Our experiments show that the automatically derived translation memory can be used within the statistical framework to often find translations of higher probability than those found using solely the statistical model. The translations produced using both the translation memory and the statistical model are significantly better than translations produced by two commercial systems.

2 The IBM Model 4

For the work described in this paper we used a modified version of the statistical machine translation tool developed in the context of the 1999 Johns Hopkins Summer Workshop (Al-Onaizan et al., 1999), which implements IBM translation model 4 (Brown et al., 1993).

IBM model 4 revolves around the notion of word alignment over a pair of sentences (see Figure 2). The word alignment is a graphical representation of a hypothetical stochastic process by which a source string e is converted into a target string f. The probability of a given alignment a and target sentence f, given a source sentence e, is given by

P(a, f | e) = prod_{i=1}^{l} n(phi_i | e_i)
            x prod_{i=1}^{l} prod_{k=1}^{phi_i} t(tau_{ik} | e_i)
            x prod_{i=1, phi_i>0}^{l} d_1(pi_{i1} - c_{rho_i} | class(e_{rho_i}), class(tau_{i1}))
            x prod_{i=1}^{l} prod_{k=2}^{phi_i} d_{>1}(pi_{ik} - pi_{i(k-1)} | class(tau_{ik}))
            x binom(m - phi_0, phi_0) (1 - p_1)^{m - 2 phi_0} p_1^{phi_0}
            x prod_{k=1}^{phi_0} t(tau_{0k} | NULL)

where the factors delineated by the x symbols correspond to hypothetical steps in the following generative process:

- Each English word e_i is assigned, with probability n(phi_i | e_i), a fertility phi_i, which corresponds to the number of French words into which e_i is going to be translated.

- Each English word e_i is then translated, with probability t(tau_{ik} | e_i), into a French word tau_{ik}, where k ranges from 1 to the number of words phi_i (the fertility of e_i) into which e_i is translated. For example, the English word "no" in Figure 2 is a word of fertility 2 that is translated into "aucun" and "ne".
- The rest of the factors denote distortion probabilities (d), which capture the probability that words change their position when translated from one language into another; the probability of some French words being generated from an invisible English NULL element (p_1); etc. See Brown et al. (1993) or Germann et al. (2001) for a detailed discussion of this translation model and a description of its parameters.

[Figure 2: Example of Viterbi alignment produced by IBM model 4, for the sentence pair "there is no one union involved ." / "aucun syndicat particulier ne est en cause .".]
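To make the factors above concrete, the sketch below scores an alignment using only the fertility, lexical translation, and NULL-generation terms of this decomposition; the distortion factors d_1 and d_{>1} are omitted for brevity, and the parameter tables are assumed to be plain Python dictionaries. It is an illustrative simplification, not the tool used in our experiments.

from math import comb

def alignment_probability(e, f, a, n, t, p1):
    """Score P(a, f | e) under a simplified Model-4-style decomposition
    (fertility, lexical translation, and NULL generation only).

    e  : English words, with e[0] = "NULL" (the invisible NULL word)
    f  : French words
    a  : a[j] = index of the English word that generates f[j] (0 = NULL)
    n  : fertility table, n[(phi, e_word)] -> probability
    t  : translation table, t[(f_word, e_word)] -> probability
    p1 : probability of generating a spurious French word from NULL
    """
    m = len(f)
    phi = [0] * len(e)                      # fertilities implied by the alignment
    for j in range(m):
        phi[a[j]] += 1

    prob = 1.0
    for i in range(1, len(e)):              # fertility factors for real English words
        prob *= n[(phi[i], e[i])]
    for j in range(m):                      # lexical translation factors
        prob *= t[(f[j], e[a[j]])]

    phi0 = phi[0]                           # number of NULL-generated French words
    prob *= comb(m - phi0, phi0) * (1 - p1) ** (m - 2 * phi0) * p1 ** phi0
    return prob

For the alignment in Figure 2, for instance, the factors would include n(2 | no), t(aucun | no), and t(ne | no).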
3 Building a statistical translation memory

Companies that specialize in producing high-quality human translations of documentation and news often rely on translation memory tools to increase their productivity (Sprung, 2000). Building a high-quality TMEM is an expensive process that requires many person-years of work. Since we are not in the fortunate position of having access to an existing TMEM, we decided to build one automatically.

We trained IBM translation model 4 on 500,000 English-French sentence pairs from the Hansard corpus. We then used the Viterbi alignment of each sentence, i.e., the alignment of highest probability, to extract tuples of the form ⟨e_i, e_{i+1}, ..., e_{i+k}; f_j, f_{j+1}, ..., f_{j+l}; a_j, a_{j+1}, ..., a_{j+l}⟩, where e_i, e_{i+1}, ..., e_{i+k} represents a contiguous English phrase, f_j, f_{j+1}, ..., f_{j+l} represents a contiguous French phrase, and a_j, a_{j+1}, ..., a_{j+l} represents the Viterbi alignment between the two phrases. We selected only "contiguous" alignments, i.e., alignments in which the words in the English phrase generated only words in the French phrase and each word in the French phrase was generated either by the NULL word or by a word from the English phrase. We extracted only tuples in which the English and French phrases contained at least two words. For example, in the Viterbi alignment of the two sentences in Figure 2, which was produced automatically, "there" and "." are words of fertility 0, NULL generates the French lexeme ".", "is" generates "est", "no" generates "aucun" and "ne", and so on. From this alignment we extracted the six tuples shown in Table 1, because they were the only ones that satisfied all the conditions mentioned above. For example, the pair ⟨no one; aucun syndicat particulier ne⟩ does not occur in the translation memory because the French word "syndicat" is generated by the word "union", which does not occur in the English phrase "no one".

By extracting all tuples of this form from the training corpus, we ended up with many duplicates and with French phrases that were paired with multiple English translations. We chose for each French phrase only one possible English translation equivalent. We tried out two distinct methods for choosing a translation equivalent, thus constructing two different probabilistic TMEMs:

- The Frequency-based Translation MEMory (FTMEM) was created by associating with each French phrase the English equivalent that occurred most often in the collection of phrases that we extracted.

- The Probability-based Translation MEMory (PTMEM) was created by associating with each French phrase the English equivalent that corresponded to the alignment of highest probability.

In contrast to other TMEMs, our TMEMs explicitly encode not only the mutual translation pairs but also their corresponding word-level alignments, which are derived according to a certain translation model (in our case, IBM model 4). The mutual translations can be anywhere from two words long to complete sentences. Both methods yielded translation memories that contained around 11.8 million word-aligned translation pairs. Due to efficiency considerations and memory limitations (the software we wrote loads a complete TMEM into memory), we used in our experiments only a fraction of the TMEMs: those that contained phrases at most 10 words long.
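The tuple extraction described above amounts to enumerating candidate phrase pairs and checking the contiguity condition against the Viterbi alignment links. The sketch below is one way to express that check; the representation of the alignment as a set of (i, j) links and the names used here are assumptions made for illustration, and the 10-word cap used in our experiments is folded in as a parameter.

def extract_contiguous_pairs(e_words, f_words, alignment, max_len=10):
    """Extract (English phrase, French phrase, alignment) tuples such that the
    English words generate only words inside the French span and every French
    word in the span is generated either by NULL or by a word in the English span.

    alignment: set of (i, j) links meaning English word i generates French word j;
               French words generated by NULL simply have no incoming link.
    Only phrases of at least two words on both sides are kept.
    """
    pairs = []
    n_e, n_f = len(e_words), len(f_words)
    for i1 in range(n_e):
        for i2 in range(i1 + 1, min(n_e, i1 + max_len)):          # >= 2 English words
            for j1 in range(n_f):
                for j2 in range(j1 + 1, min(n_f, j1 + max_len)):  # >= 2 French words
                    links = [(i, j) for (i, j) in alignment
                             if (i1 <= i <= i2) or (j1 <= j <= j2)]
                    if not links:
                        continue
                    # contiguity: every link touching either span must lie inside both
                    if all(i1 <= i <= i2 and j1 <= j <= j2 for (i, j) in links):
                        pairs.append((e_words[i1:i2 + 1],
                                      f_words[j1:j2 + 1],
                                      links))
    return pairs

An implementation run over 500,000 sentence pairs would prune the candidate spans rather than enumerate all of them, but the acceptance test is the same one that produced the entries in Table 1.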
English | French | Alignment
one union | syndicat particulier | one → particulier ; union → syndicat
no one union | aucun syndicat particulier ne | no → aucun, ne ; one → particulier ; union → syndicat
is no one union | aucun syndicat particulier ne est | is → est ; no → aucun, ne ; one → particulier ; union → syndicat
there is no one union | aucun syndicat particulier ne est | is → est ; no → aucun, ne ; one → particulier ; union → syndicat
is no one union involved | aucun syndicat particulier ne est en cause | is → est ; no → aucun, ne ; one → particulier ; union → syndicat ; involved → en cause
there is no one union involved | aucun syndicat particulier ne est en cause | is → est ; no → aucun, ne ; one → particulier ; union → syndicat ; involved → en cause
there is no one union involved . | aucun syndicat particulier ne est en cause . | is → est ; no → aucun, ne ; one → particulier ; union → syndicat ; involved → en cause ; NULL → .

Table 1: Examples of automatically constructed statistical translation memory entries.
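Collapsing the extracted tuples into the two memories then reduces to choosing which English equivalent to keep for each French phrase. The sketch below shows one way to implement the two selection strategies; the assumption that every tuple carries the probability of the Viterbi alignment it was extracted from is made only for illustration.

from collections import Counter, defaultdict

def build_tmems(tuples):
    """Collapse (french, english, alignment, viterbi_prob) tuples so that each
    French phrase keeps exactly one English equivalent.

    french/english are tuples of words; viterbi_prob is the probability of the
    Viterbi alignment the pair was extracted from (any comparable score would do).
    Returns (ftmem, ptmem), each mapping french -> (english, alignment).
    """
    counts = defaultdict(Counter)   # french -> how often each english equivalent occurred
    best = {}                       # french -> (prob, english, alignment) of best alignment
    example = {}                    # (french, english) -> a representative alignment

    for french, english, alignment, prob in tuples:
        counts[french][english] += 1
        example[(french, english)] = alignment
        if french not in best or prob > best[french][0]:
            best[french] = (prob, english, alignment)

    ftmem = {}
    for french, counter in counts.items():
        english, _ = counter.most_common(1)[0]          # most frequent equivalent
        ftmem[french] = (english, example[(french, english)])

    ptmem = {french: (english, alignment)               # highest-probability equivalent
             for french, (prob, english, alignment) in best.items()}
    return ftmem, ptmem

The two dictionaries differ only in the selection criterion, matching the FTMEM and PTMEM definitions above.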
This yielded a working FTMEM of 4.1 million and a PTMEM of 5.7 million phrase translation pairs aligned at the word level using IBM statistical model 4.

To evaluate the quality of both TMEMs we built, we extracted randomly 200 phrase pairs from each TMEM. These phrases were judged by a bilingual speaker as:

- perfect translations if she could imagine contexts in which the aligned phrases could be mutual translations of each other;

- almost perfect translations if the aligned phrases were mutual translations of each other and one phrase contained one single word with no equivalent in the other language[2];

- incorrect translations if the judge could not imagine any contexts in which the aligned phrases could be mutual translations of each other.

[2] For example, the translation pair "final , le secrétaire de" and "final act , the secretary of" was labeled as almost perfect because the English word "act" has no French equivalent.

TMEM | Perfect | Almost perfect | Incorrect | Unable to judge
FTMEM | 62.5% | 8.5% | 27.0% | 2.0%
PTMEM | 57.5% | 7.5% | 33.5% | 1.5%

Table 2: Accuracy of automatically constructed TMEMs.

The results of the evaluation are shown in Table 2. A visual inspection of the phrases in our TMEMs and the judgments made by the evaluator suggests that many of the translations labeled as incorrect make sense when assessed in a larger context. For example, "autres régions de le pays que" and "other parts of Canada than" were judged as incorrect. However, when considered in a context in which it is clear that "Canada" and "pays" corefer, it would be reasonable to assume that the translation is correct. Table 3 shows a few examples of phrases from our FTMEM and their corresponding correctness judgments.

Although we found our evaluation to be extremely conservative, we decided nevertheless to stick to it, as it adequately reflects constraints specific to high-standard translation environments in which TMEMs are built manually and constantly checked for quality by specialized teams (Sprung, 2000).

4 Statistical decoding using both a statistical TMEM and a statistical translation model

The results in Table 2 show that about 70% of the entries in our translation memory are correct or almost correct (very easy to fix). It is, though, an empirical question to what extent such TMEMs can be used to improve the performance of current translation systems. To determine this, we modified an existing decoding algorithm so that it can exploit information specific both to a statistical translation model and a statistical TMEM.
English | French | Judgment
, but I cannot say | , mais je ne puis dire | correct
how did this all come about? | comment est-ce arrivée? | correct
but, I humbly believe | mais, à mon humble avis | correct
final act, the secretary of | final, le secrétaire de | almost correct
other parts of Canada than | autres régions de le pays que | incorrect
what is the total amount accumulated | à combien se élève la | incorrect
that party present this | ce parti présent aujourd'hui | incorrect
the aircraft company to present further studies | de autre études | incorrect

Table 3: Examples of TMEM entries with correctness judgments.
The decoding algorithm that we use is a greedy one; see Germann et al. (2001) for details. The decoder first guesses an English translation for the French sentence given as input and then attempts to improve it by greedily exploring alternative translations from the immediate translation space. We modified the greedy decoder described by Germann et al. (2001) so that it attempts to find a good translation starting from two distinct points in the space of possible translations: one point corresponds to a word-for-word "gloss" of the French input; the other point corresponds to a translation that most closely resembles translations stored in the TMEM.

As discussed by Germann et al. (2001), the word-for-word gloss is constructed by aligning each French word f_j with its most likely English translation e_{f_j} (e_{f_j} = argmax_e t(e | f_j)). For example, in translating the French sentence "Bien entendu , il parle de une belle victoire .", the greedy decoder initially assumes that a good translation of it is "Well heard , it talking a beautiful victory" because the best translation of "bien" is "well", the best translation of "entendu" is "heard", and so on. A word-for-word gloss results (at best) in English words written in French word order.

The translation that most closely resembles translations stored in the TMEM is constructed by deriving a "cover" for the input sentence using phrases from the TMEM. The derivation attempts to cover with translation pairs from the TMEM as much of the input sentence as possible, using the longest phrases in the TMEM. The words in the input that are not part of any phrase extracted from the TMEM are glossed. For example, this approach may start the translation process from the phrase "well , he is talking a beautiful victory" if the TMEM contains the pairs ⟨well ,; bien entendu ,⟩ and ⟨he is talking; il parle⟩ but no pair with the French phrase "belle victoire".

If the input sentence is found "as is" in the translation memory, its translation is simply returned and there is no further processing. Otherwise, once an initial alignment is created, the greedy decoder tries to improve it, i.e., it tries to find an alignment (and implicitly a translation) of higher probability by locally modifying the initial alignment. The decoder attempts to find alignments and translations of higher probability by employing a set of simple operations, such as changing the translation of one or two words in the alignment under consideration, inserting into or deleting from the alignment words of fertility zero, and swapping words or segments.

In a stepwise fashion, starting from the initial gloss or initial cover, the greedy decoder iterates exhaustively over all alignments that are one such simple operation away from the alignment under consideration. At every step, the decoder chooses the alignment of highest probability, until the probability of the current alignment can no longer be improved.
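Both starting points admit a compact sketch. The gloss seed follows the argmax rule above; the cover seed shown here matches TMEM phrases greedily from left to right, longest first, which is one simple way to realize the covering strategy just described (the actual decoder may derive the cover differently). Here inv_t[f] is assumed to map English words to t(e | f), and tmem maps French phrases to word-aligned English phrases.

def gloss_seed(french, inv_t):
    """Word-for-word gloss: each French word is replaced by its most likely
    English translation, e_f = argmax_e t(e | f)."""
    return [max(inv_t[f], key=inv_t[f].get) for f in french]

def cover_seed(french, tmem, inv_t):
    """TMEM cover: greedily match the longest TMEM phrase starting at each
    position, left to right; uncovered words are glossed.
    tmem maps a tuple of French words to (English phrase, word-level alignment)."""
    english, j = [], 0
    while j < len(french):
        match = None
        for k in range(len(french), j + 1, -1):   # try the longest span first
            phrase = tuple(french[j:k])
            if phrase in tmem:
                match = (k, tmem[phrase][0])
                break
        if match is not None:
            j, e_phrase = match                   # jump past the covered span
            english.extend(e_phrase)
        else:
            english.append(max(inv_t[french[j]], key=inv_t[french[j]].get))
            j += 1
    return english

For "Bien entendu , il parle de une belle victoire .", the cover seed would combine the TMEM translations of "bien entendu ," and "il parle" with word-for-word glosses of the remaining words, as in the example above.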
5 Evaluation

We extracted from the test corpus a collection of 505 French sentences, uniformly distributed across the lengths 6, 7, 8, 9, and 10. For each French sentence, we had access to the human-generated English translation in the test corpus, and to translations generated by two commercial systems. We produced translations using three versions of the greedy decoder: one used only the statistical translation model, one used the translation model and the FTMEM, and one used the translation model and the PTMEM.

We initially assessed how often the translations obtained from TMEM seeds had higher probability
than the translations obtained from simple glosses. Tables 4 and 5 show that the translation memories significantly help the decoder find translations of high probability. In about 30% of the cases, the translations are simply copied from a TMEM, and in about 13% of the cases the translations obtained from a TMEM seed have higher probability than the best translations obtained from a simple gloss. In 40% of the cases both seeds (the TMEM and the gloss) yield the same translation. Only in about 15-18% of the cases are the translations obtained from the gloss better than the translations obtained from the TMEM seeds. It appears that both TMEMs help the decoder find translations of higher probability consistently, across all sentence lengths.

Sent. length | Found in FTMEM | Higher prob. from FTMEM | Same result | Higher prob. from gloss
6 | 33 | 9 | 43 | 16
7 | 27 | 9 | 48 | 17
8 | 29 | 16 | 42 | 14
9 | 31 | 15 | 28 | 27
10 | 31 | 9 | 43 | 18
All (%) | 30% | 12% | 40% | 18%

Table 4: The utility of the FTMEM.

Sent. length | Found in PTMEM | Higher prob. from PTMEM | Same result | Higher prob. from gloss
6 | 33 | 9 | 43 | 16
7 | 27 | 10 | 50 | 14
8 | 30 | 16 | 41 | 14
9 | 31 | 15 | 36 | 19
10 | 31 | 15 | 31 | 13
All (%) | 31% | 13% | 41% | 15%

Table 5: The utility of the PTMEM.

In a second experiment, a bilingual judge scored the human translations extracted from the automatically aligned test corpus; the translations produced by a greedy decoder that uses both TMEM and gloss seeds; the translations produced by a greedy decoder that uses only the statistical model and the gloss seed; and translations produced by two commercial systems (A and B).

- If an English translation had the very same meaning as the French original, it was considered semantically correct. If the meaning was just a little different, the translation was considered semantically incorrect. For example, "this is rather provision disturbing" was judged as a semantically correct translation of "voilà une disposition plutôt inquiétante", but "this disposal is rather disturbing" was judged as incorrect.

- If a translation was perfect from a grammatical perspective, it was considered to be grammatical. Otherwise, it was considered incorrect. For example, "this is rather provision disturbing" was judged as ungrammatical, although one may very easily make sense of it.

We decided to use such harsh evaluation criteria because, in previous experiments, we repeatedly found that harsh criteria can be applied consistently. To ensure consistency during evaluation, the judge used a specialized interface: once the correctness of a translation produced by a system S was judged, the same judgment was automatically recorded with respect to the other systems as well. This way, it became impossible for a translation to be judged as correct when produced by one system and incorrect when produced by another.

Table 6, which summarizes the results, displays the percentage of perfect translations (both semantically and grammatically) produced by a variety of systems. Table 6 shows that translations produced using both TMEM and gloss seeds are much better than translations that do not use TMEMs. The translation systems that use both a TMEM and the statistical model significantly outperform the two commercial systems. The figures in Table 6 also reflect the harshness of our evaluation metric: only 82% of the human translations extracted from the test corpus were considered perfect translations. A few of the errors were genuine and could be explained by failures of the sentence alignment program that was used to create the corpus (Melamed, 1999). Most of the errors were judged as semantic, reflecting directly the harshness of our evaluation metric.
Sentence length | Humans | Greedy with FTMEM | Greedy with PTMEM | Greedy without TMEM | Commercial system A | Commercial system B
6 | 92 | 72 | 70 | 52 | 55 | 59
7 | 73 | 58 | 52 | 37 | 42 | 43
8 | 80 | 53 | 52 | 30 | 38 | 29
9 | 84 | 53 | 53 | 37 | 40 | 35
10 | 85 | 57 | 60 | 36 | 40 | 37
All (%) | 82% | 58% | 57% | 38% | 42% | 40%

Table 6: Percent of perfect translations produced by various translation systems and algorithms.
6 Discussion

The approach to translation described in this paper is quite general. It can be applied in conjunction with other statistical translation models. And it can be applied in conjunction with existing translation memories. To do this, one would simply have to train the statistical model on the translation memory provided as input, determine the Viterbi alignments, and enhance the existing translation memory with word-level alignments as produced by the statistical translation model. We suspect that using manually produced TMEMs can only increase performance, as such TMEMs undergo periodic checks for quality assurance.

The work that comes closest to using a statistical TMEM similar to the one we propose here is that of Vogel and Ney (2000), who automatically derive a hierarchical TMEM from a parallel corpus. The hierarchical TMEM consists of a set of transducers that encode a simple grammar. The transducers are automatically constructed: they reflect common patterns of usage at levels of abstraction that are higher than the words. Vogel and Ney (2000) do not evaluate their TMEM-based system, so it is difficult to empirically compare their approach with ours. From a theoretical perspective, it appears though that the two approaches are complementary: Vogel and Ney (2000) identify abstract patterns of usage and then use them during translation. This may address the data sparseness problem that is characteristic of any statistical modeling effort and produce better translation parameters.

In contrast, our approach attempts to steer the statistical decoding process into directions that are difficult to reach when one relies only on the parameters of a particular translation model. For example, the two phrases "il est mort" and "he kicked the bucket" may appear in only one sentence in an arbitrarily large corpus. The parameters learned from the entire corpus will very likely assign very low probability to the words "kicked" and "bucket" being translated into "est" and "mort". Because of this, a statistical-based MT system will have trouble producing a translation that uses the phrase "kick the bucket", no matter what decoding technique it employs. However, if the two phrases are stored in the TMEM, producing such a translation becomes feasible.

If optimal decoding algorithms capable of exhaustively searching the space of all possible translations existed, using TMEMs in the style presented in this paper would never improve the performance of a system. Our approach works because it biases the decoder to search in subspaces that are likely to yield translations of high probability, subspaces which otherwise may not be explored. The bias introduced by TMEMs is a practical alternative to finding optimal translations, which is NP-complete (Knight, 1999).

It is clear that one of the main strengths of the TMEM is its ability to encode contextual, long-distance dependencies that are incongruous with the parameters learned by current context-poor, reductionist channel models. Unfortunately, the criterion used by the decoder in order to choose between a translation produced starting from a gloss and one produced starting from a TMEM is biased in favor of the gloss-based translation. It is possible for the decoder to produce a perfect translation using phrases from the TMEM and yet discard that perfect translation in favor of an incorrect translation of higher probability that was obtained from a gloss (or from the TMEM). It would be desirable to develop alternative ranking techniques that would permit one to prefer, in some instances, a TMEM-based translation even though that translation is not the best according to the probabilistic channel model. The examples in Table 7 show, though, that this is not trivial: it is not always the case that the translation of highest probability is the perfect one.
Translations | Does this translation use TMEM phrases? | Is this translation correct? | Is this the translation of highest probability?
French input: monsieur le président, je aimerais savoir.
mr. speaker, i would like to know. | yes | yes | yes
mr. speaker, i would like to know. | no | yes | yes
French input: je ne peux vous entendre, brian.
i cannot hear you, brian. | yes | yes | yes
i can you listen, brian. | no | no | no
French input: alors, je termine là-dessus.
therefore, i will conclude my remarks. | yes | yes | no
therefore, i conclude - over. | no | no | yes

Table 7: Examples of system outputs, obtained with or without TMEM help.
The first French sentence in Table 7 is correctly translated with or without help from the translation memory. The second sentence is correctly translated only when the system uses a TMEM seed; and fortunately, the translation of highest probability is the one obtained using the TMEM seed. The translation obtained from the TMEM seed is also correct for the third sentence. But unfortunately, in this case, the TMEM-based translation is not the most probable.

Acknowledgments. This work was supported by DARPA-ITO grant N66001-00-1-9814.

References

Yaser Al-Onaizan, Jan Curin, Michael Jahr, Kevin Knight, John Lafferty, Dan Melamed, Franz-Josef Och, David Purdy, Noah A. Smith, and David Yarowsky. 1999. Statistical machine translation. Final Report, JHU Summer Workshop.

Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2):263-311.

Ralph D. Brown. 1999. Adding linguistic knowledge to a lexical example-based translation system. In Proceedings of TMI'99, pages 22-32, Chester, England.

A. P. Dempster, N. M. Laird, and D. B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39(Ser B):1-38.

Ulrich Germann, Mike Jahr, Kevin Knight, Daniel Marcu, and Kenji Yamada. 2001. Fast decoding and optimal decoding for machine translation. In Proceedings of ACL'01, Toulouse, France.

Kevin Knight. 1999. Decoding complexity in word-replacement translation models. Computational Linguistics, 25(4).

H. Maruyana and H. Watanabe. 1992. Tree cover search algorithm for example-based translation. In Proceedings of TMI'92, pages 173-184.

Dan Melamed. 1999. Bitext maps and alignment via pattern recognition. Computational Linguistics, 25(1):107-130.

Franz Josef Och, Christoph Tillmann, and Hermann Ney. 1999. Improved alignment models for statistical machine translation. In Proceedings of the EMNLP and VLC, pages 20-28, University of Maryland, Maryland.

S. Sato. 1992. CTM: an example-based translation aid system using the character-based match retrieval method. In Proceedings of the 14th International Conference on Computational Linguistics (COLING'92), Nantes, France.

Robert C. Sprung, editor. 2000. Translating Into Success: Cutting-Edge Strategies For Going Multilingual In A Global Age. John Benjamins Publishers.

Tony Veale and Andy Way. 1997. Gaijin: A template-based bootstrapping approach to example-based machine translation. In Proceedings of "New Methods in Natural Language Processing", Sofia, Bulgaria.

S. Vogel and Hermann Ney. 2000. Construction of a hierarchical translation memory. In Proceedings of COLING'00, pages 1131-1135, Saarbrücken, Germany.

Ye-Yi Wang. 1998. Grammar Inference and Statistical Machine Translation. Ph.D. thesis, Carnegie Mellon University. Also available as CMU-LTI Technical Report 98-160.

Dekai Wu and Hongsing Wong. 1998. Machine translation with a stochastic grammatical channel. In Proceedings of ACL'98, pages 1408-1414, Montreal, Canada.

Kenji Yamada and Kevin Knight. 2001. A syntax-based statistical translation model. In Proceedings of ACL'01, Toulouse, France.