SYSTRAN

CMPE 590 Machine Translation

Spring 2006

SYSTRAN

Research Project

Işık Barış Fidaner


2005702532


SYSTRAN

Some fields are based mainly on theory, and others focus only on utility. Machine translation (MT) lies in between: some MT systems are grounded in linguistics and computer science, while others take the most direct route to results. Among the wide range of approaches and systems in MT, one was SYSTRAN.

Before SYSTRAN, a US government-supported MT project was under way at Georgetown University. Its system translated Russian scientific texts into English. After the (in)famous ALPAC report, the government cut its support and the Georgetown project ended, but one of its staff later started his own MT system, SYSTRAN.

SYSTRAN is one of the first commercial MT systems. It was based mainly on its ancestor, the Georgetown Automatic Translation system (GAT, which some sources expand as General Analysis Technique), but it went through several improvements. It was designed for several language pairs involving Russian, English, French, Spanish and many more languages, and it was used by large institutions in different parts of the world. It was the first widely known translation service on the Internet. But because it was not based on a specific linguistic theory, beyond a certain point it became very hard to improve.

This document explains SYSTRAN as well as its ancestor GAT, covering both their histories and their technical structure.

Georgetown MT research history

SYSTRAN's ancestor was a university machine translation project funded by the US government. The Georgetown project aimed at automatic translation of Russian scientific texts into English, covering atomic physics and other areas.

In 1954, IBM and Georgetown University gave a public demonstration of their machine translation system. Forty-nine Russian sentences in the field of chemistry were carefully selected for the experiment to illustrate a variety of problems and solutions in MT. The vocabulary included only 250 words, and the grammar only six rules. The demonstration had no scientific value, but it nevertheless convinced many people that the problem of machine translation had been solved. In the following decade, the US government invested 20 million dollars in 17 different MT projects.

In 1964, GAT came into use at the Atomic Energy Commission's Oak Ridge National Laboratory (until 1979) and at EURATOM in Ispra, Italy (until 1976). In the same year, the National Science Foundation set up the Automatic Language Processing Advisory Committee (ALPAC), whose later report would strongly affect MT research. In 1965, the IBM Mark II system using GAT was installed at the US Air Force Foreign Technology Division. But in 1966, ALPAC reported that current MT systems (including GAT) were more expensive than human translation and of lower quality. It also claimed that no improvement of MT systems could be predicted. After this report, MT funding was cut and MT efforts were reduced in general.

Inside GAT

GAT was a typical direct MT system, and in fact one of the successful ones. In direct MT systems, the analysis of the input text depends on the target language. The result of the analysis resolves word ambiguities and gives the word order needed in the target language, so in this sense analysis also serves as the synthesis process.

A direct MT system such as GAT

GAT's designers considered translation to involve two basic types of process: selection and manipulation. In practice, the process was word-by-word translation followed by reordering of the words. Later, the notion of a word was widened to include certain word groups and idioms. Syntactic analysis was limited to parts of speech and served only to resolve homographs (e.g. a word such as "control" that can be either a noun or a verb). Semantic analysis was limited or absent. There was no clear distinction between analysis of the source language text and synthesis of the target language text. A matrix was built for the whole process.

Matrix structure in GAT

Russian and English grammar rules were hard-coded in the program, which made modification very difficult. The output was not comparable to human translation, but it was adequate for the institutions that used it: they could quickly scan through Russian physics texts and find the papers they needed without human translators.

In the time of GAT, linguistics had practically no impact on machine translation. The designers did not base their algorithms on a truly scientific linguistic theory; their general approach was pragmatic rather than scientific. GAT was developed to work for a given input, then tested on a larger body of input and updated accordingly. Development was driven mainly by application needs.

Because GAT had no underlying general theory, either of linguistics or of the algorithms applied, it became a very complex system that did not allow further development. As a matter of fact, no significant changes were made to the program after its delivery to American and European science centers. It gave low-quality output and could not be developed further, yet the institutions used it for a very long time. This gives an indication of the demand for MT in those years.

Above is the design of the operation that was executed if the Russian word ended with "E".

History of SYSTRAN

During the GAT project, one of its staff, Peter Toma, had a company for MT research, Computer Concepts Inc., based in Los Angeles. In 1963, he reported that AUTOTRAN had been developed as a “fast, efficient and accurate” MT system for Russian-English translation, with 100,000 stem entries in its dictionary covering atomic energy and medicine. It was programmed on the IBM 7090.

First steps in Germany

In 1964, Toma moved to Germany to continue his research with the support of the Deutsche Forschungsgemeinschaft (DFG). He started developing SYSTRAN at Bonn University. SYSTRAN's first design documents show an obvious similarity to GAT. Three years later, DFG proposed developing a Russian-German version at the University of Saarland. The group there evaluated a prototype but decided to develop its own translation system.

SYSTRAN in USA

In 1968, Toma founded his company Latsec Inc. in La Jolla, California. Within a year of its foundation, SYSTRAN was tested at the Wright-Patterson Air Force Base in Ohio. In 1970, SYSTRAN replaced the previous IBM Mark II system at the US Air Force Foreign Technology Division (USAF FTD). Four years later, in 1974, NASA used SYSTRAN to translate texts for the joint Apollo-Soyuz project. Latsec Inc. was also working on other versions: a Chinese-English prototype was demonstrated to the American government in 1975, and a German-English version was proposed to the US Army.

Meanwhile, development of the Russian-English system continued. In the period 1970-75, the number of homograph routines doubled and stem entries increased by 30,000. In time, the USAF FTD dictionaries had over a million entries for Russian-English translation. This enormous dictionary was more than the SYSTRAN system could handle. Improvements started to cause unexpected degradations for input texts that had previously been unproblematic. According to a 1978 study, for every 7 additional inputs that a dictionary improvement made translatable, 3 inputs that had previously been translated were lost. Because of this problem, every proposed improvement was checked against a library of Russian input texts of 50 million words. This prevented large-scale changes and allowed only “fine tuning” of the system.

SYSTRAN in Europe

Work on an English-French version of SYSTRAN started in 1973, and it was demonstrated to representatives of the Commission of the European Communities (CEC) in 1975. They signed a contract to develop translation systems between the languages of the European Communities. In 1976, SYSTRAN replaced GAT at EURATOM. In February of the same year, CEC purchased an English-French version of SYSTRAN. For CEC, translation quality mattered much more than in the science centers, where translations were used only for scanning through many papers. CEC was not satisfied with the translation quality at first but hoped for better results with an improved dictionary.

Latsec Inc. later became the World Translation Center (WTC). In 1977, it started to develop a French-English version of SYSTRAN. SYSTRAN translations were not good enough for CEC translators as they stood, but with post-editing they could speed up the translation process. Once the French dictionary reached nearly a hundred thousand entries, dozens of CEC translators started to use SYSTRAN from their own computers. The French-English and English-Italian versions went into operation at CEC in 1978 and 1979. The French-English version was also used at the Centre for Nuclear Research in Karlsruhe. Aérospatiale in Paris bought both the French-English and the English-French versions.

CEC not only used SYSTRAN but also developed it. Most of the development of the French and Italian versions of SYSTRAN was carried out in the CEC Translation Department in Luxembourg. CEC planned to reprogram SYSTRAN completely, but instead it sponsored Margaret Masterman at the Cambridge Language Research Unit to write a program that automatically annotated the SYSTRAN macro-assembler code, so that CEC staff could read through the code and correct any problems.

To see the improvement, we can look at the results of two evaluations of SYSTRAN, the first in October 1976 and the second in June 1978. Intelligibility scores for unedited raw MT output increased from 45% to 78%. SYSTRAN output was either fully revised to give a good-quality translation no worse than the work of a human translator, or minimally post-edited to give low-quality but readable text. This second option was used for French-English informative texts. In 1978, EUROTRA was initiated to produce advanced and more flexible alternatives to first generation systems, including SYSTRAN.

When CEC first bought the English-French system in 1976, it had a dictionary of 6,000 entries. In 1984, there were 150,000 entries for each of three language pairs. The analysis and synthesis programs also grew from 30,000 lines to 100,000 lines. By 1983, an average of 12 professional staff had been working full time on the project for about eight years. A total of 1,250 pages were translated in 1981 and 3,150 pages in 1982. In 1983 this number leaped to 40,000. In the first two months of 1983, 50% of the English-Italian translations and 25% of the French-English translations were made with the help of SYSTRAN. This increase was partly due to new word processors that were linked directly to the IBM mainframe that ran SYSTRAN.

In 1998, SYSTRAN had 17 language pairs: from English into French, Italian, German, Dutch, Spanish, Portuguese and Greek; from French into English, German, Dutch, Italian and Spanish; from German into English and French; from Spanish into English and French; and from Greek into French.

Other SYSTRAN users

In 1976, General Motors of Canada started to use SYSTRAN to translate product manuals into French. In 1978, WTC Canada started to market SYSTRAN II in the US, Canada and some parts of Europe. This was an integrated system combining the machine translator with word processing and photocomposition systems. In 1979, the Systran Institut was founded in Munich. This institution provided translations for companies that could not purchase their own system. General Motors had its own dictionary of nearly 130,000 entries by 1981. An English-Spanish version followed. SYSTRAN sped up GM's translation process by a factor of 3 or 4.

After 1981, the use of SYSTRAN grew steadily and development continued. In 1982, CEC began to work on English-German and French-German SYSTRAN systems. Xerox was another user of SYSTRAN. It used a special manual style and a special input language (Multinational Customized English) that made input texts easier to analyze. SYSTRAN could translate this input into several languages (French, Italian, Spanish, German and Portuguese), and translation time was reduced to one fifth. In 1997, SYSTRAN launched the first widely used free online translation service through the search engine AltaVista.

Inside SYSTRAN

SYSTRAN general structure

SYSTRAN was the greatly improved descendant of GAT. The system evolved to have a modular design. SYSTRAN was composed of three parts:

• System programs. These are language-independent utilities used for control, dictionary lookup etc. Dictionary lookup used a finite-automaton method whose search time did not depend on the size of the dictionary.

• Translation programs. These applied the successive steps in the translation of a text. Every step had its own module. These programs were language dependent and were written by the designers for every language pair.

• Dictionaries. These were the most important and most massive part of the system. In the course of development, the dictionaries were given too much flexibility, and they became very complex and unmaintainable.
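The dictionary lookup mentioned in the first item can be illustrated with a trie, one simple finite-automaton structure. This is only a sketch and not SYSTRAN's actual implementation: the class name `StemTrie`, its methods and the sample entries are assumptions made for illustration. What it shows is that the number of steps in a lookup grows with the length of the word, not with the number of entries stored.

```python
class StemTrie:
    """Toy finite-automaton dictionary: one state per stored prefix."""

    def __init__(self):
        self.root = {}

    def insert(self, word, entry):
        node = self.root
        for ch in word:
            # Follow the transition for this character, creating it if needed.
            node = node.setdefault(ch, {})
        node[""] = entry  # the empty key marks an accepting state

    def lookup(self, word):
        node = self.root
        for ch in word:
            node = node.get(ch)
            if node is None:
                return None  # no transition: the word is not in the dictionary
        return node.get("")  # None if the word is only a prefix of an entry


# Hypothetical sample entries, for illustration only.
trie = StemTrie()
trie.insert("control", {"pos": ["N", "V"]})
trie.insert("computer", {"pos": ["N"]})
```

A call such as `trie.lookup("control")` follows seven transitions however many other words have been inserted.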

New language pairs were developed by the following processes:

• The necessary improvements for the new language pair were added to the analysis and synthesis programs.

• New bilingual dictionaries for the new source and target languages were compiled. In this process, previous dictionaries were used as a starting point.

Despite its enhancements, SYSTRAN is often considered ad hoc, because new semantic features were added only to solve particular cases, and development did not rely on a general linguistic theory. There were no linguistic improvements over GAT, but the programming of SYSTRAN is highly modular in comparison. This allows parts of the program to be modified without affecting other parts. Computational processes are strictly separated from linguistic data, which is stored in dictionaries and not hard-coded as in GAT.

Structure of a transfer system

If we call GAT and other direct translation systems "first generation systems", the second generation systems fall into two approaches: transfer systems and interlingual systems. In interlingual systems, the analysed text is represented in a language-independent structure. In transfer systems, the structure of the analysed text depends only on the source language.

In SYSTRAN, the analysis and synthesis components are separated, but the linguistic procedures are designed for a particular pair of source and target languages. SYSTRAN is therefore still a direct translation system, because analysis and synthesis are rewritten for every new language pair. But as SYSTRAN developed, it acquired some features of a transfer MT system, in the sense that the analysis, transfer and synthesis phases are separated from each other. These features made SYSTRAN a hybrid ‘direct-transfer’ system.

In a transfer system, analysis must depend only on the source language, and synthesis only on the target language. In SYSTRAN, analysis depends heavily on the source language but also on the target language, and likewise synthesis depends mainly on the target language but also on the source language. For example, the analysis program of the English-French translator cannot be used directly in an English-Italian system, although it can easily be adapted to one. In the analysis step where compound nouns are found, target language forms are also determined, and this in turn affects the other analysis stages.

In SYSTRAN, both the analysis and the synthesis parts can use any information about the source or the target language. For example, in the Russian-English system, Russian and English semantic information are mixed together to insert definite or indefinite articles. Likewise, lexical information used in the analysis of a Russian word may include a code for English synthesis: the lexical item for the Russian word “esli” included a code for converting an infinitive to a finite verb form in the English translation. This made SYSTRAN a non-uniform, inconsistent system in which coverage and quality are uneven, and modifications to one part of the dictionary often caused unexpected problems.

Stages of translation in SYSTRAN

1) Input: The input text and the dictionaries are loaded in this stage. The program then checks each word against a High Frequency (HF) dictionary.

2) Main dictionary lookup: The remaining words are sorted alphabetically and searched for in the Main Stem (MS) dictionary. The HF and MS dictionaries give grammatical information, some semantic data, and potential translations in the target language.

3) The words are morphologically analyzed by looking up stems and endings in the dictionary. This step is important for languages like French and Russian but is skipped when analysing English.

Analysis: This stage is composed of seven steps

1) Homographs are resolved: is the word in its noun form or its verb form? This is determined by looking at the grammatical categories of adjacent words. For example, “to control” is a verb form and “the control” is a noun form. 83 different types of homograph have been identified in English.

2) Compound nouns, e.g. “blast furnace”, are identified. These are looked up in a Limited Semantics Dictionary.

3) The sentence is divided into phrase groups by looking at punctuation marks, conjunctions etc.

4) Primary syntactic relations are determined by a right-to-left scan of the sentence:

• Adjective-noun congruence

• Noun-verb government

• Noun-noun apposition

5) Coordinate structures, such as conjoined adjectives or nouns modifying a noun, are identified in phrases. For example, in the phrase “water and air pollution”, “water” must be coordinated with “air”, not with “pollution”. This matching is done with the help of semantic markers.

6) Subjects and predicates are determined by searching for finite verbs and then finding a preceding noun not already marked as an ‘object’ or a ‘modifier’. In passive sentences, the deep subject and deep object are also found.

7) Prepositional phrases are found by a right-to-left search for a preposition, followed by a left-to-right search for its dependent noun phrase.

Transfer: This program is composed of three parts

1) Conditional idioms are found. These are words or word groups that take on a single meaning together under certain conditions. For example, the English word “agree” has different French translations in the active and the passive voice. This information is found in the Conditional Limited Semantics Dictionary.

2) Prepositions are translated with the help of semantic information extracted from the words that govern or are governed by those prepositions. This is necessary because every language has a different set of prepositions used with different meanings.

3) Remaining ambiguities are resolved by applying the assertions written in the dictionaries for particular words or expressions.

Synthesis

Systran stages of translation

1) Every word is translated into the target language, and verb forms and adjective endings are applied in the target language.

2) Words and phrases are reordered to form a sentence in the target language. For example, an English adjective-noun sequence becomes a French noun-adjective sequence.

3) An additional routine was made for French synthesis. When translating the English pronoun “it”, the referent must be known in order to choose between “il” and “elle” in French.

Example analysis: The texts were translated by a computer

Sentence
    Predicate: verb, past passive ........ translate
        Deep subject: ........ computer
        Deep object: ........ texts
    Subject: noun ........ texts
        Determiner: def.art ........ the
    Prep. phrase 1: preposition ........ by
        Noun phrase: noun ........ computer
            Determiner: indef.art ........ a

Example analysis: the translation of texts by computer

Sentence
    (Subject): verbal noun ........ translation
        Deep subject: ........ computer
        Deep object: ........ texts
        Determiner: def.art ........ the
        Prep. phrase 1: preposition ........ of
            Noun phrase: noun ........ texts
        Prep. phrase 2: preposition ........ by
            Noun phrase: noun ........ computer
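The analysis steps that find subjects, predicates and prepositional phrases can be sketched for the first example sentence. The code below is a hypothetical simplification, not SYSTRAN's algorithm: it assumes the words are already tagged with parts of speech, handles a single clause, does not reduce words to their stems (so the predicate comes out as "translated" rather than "translate"), and the function name `analyse_clause` and the tag names are invented for illustration.

```python
def analyse_clause(tagged):
    """tagged: list of (word, tag) pairs; tags used: DET, N, AUX, V, VPP, PREP."""
    result = {}
    # Locate the main verb: a finite verb (V) or a past participle (VPP).
    verb_idx = next(i for i, (_, tag) in enumerate(tagged) if tag in ("V", "VPP"))
    result["predicate"] = tagged[verb_idx][0]
    # Passive: a past participle directly preceded by an auxiliary.
    passive = (tagged[verb_idx][1] == "VPP"
               and verb_idx > 0
               and tagged[verb_idx - 1][1] == "AUX")

    # Subject: the nearest noun before the verb.
    subject = None
    for word, tag in reversed(tagged[:verb_idx]):
        if tag == "N":
            subject = word
            break
    result["subject"] = subject

    # Prepositional phrases: right-to-left search for a preposition,
    # then left-to-right search for the noun it governs.
    prep_phrases = {}
    for i in range(len(tagged) - 1, -1, -1):
        if tagged[i][1] == "PREP":
            for word, tag in tagged[i + 1:]:
                if tag == "N":
                    prep_phrases[tagged[i][0]] = word
                    break
    result["prep_phrases"] = prep_phrases

    # In a passive clause the surface subject is the deep object,
    # and the noun of the "by" phrase is the deep subject.
    if passive:
        result["deep_object"] = subject
        result["deep_subject"] = prep_phrases.get("by")
    else:
        result["deep_subject"] = subject
    return result


sentence = [("The", "DET"), ("texts", "N"), ("were", "AUX"),
            ("translated", "VPP"), ("by", "PREP"), ("a", "DET"),
            ("computer", "N")]
analysis = analyse_clause(sentence)
```

For this sentence the sketch marks "texts" as both subject and deep object and "computer" as deep subject, in line with the example analysis.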

Example analysis output: How many apples would he eat if he was at home?

<clause subject=[239] predicate=[240]>
(236) How many [<syntax type="DET">+functional_pos=funcadjright+pronoun_type=interrogative+type=pronoun+modifies_right=[237]</syntax>]
(237) apples [apple<syntax type="N">+functional_pos=funcnoun+object_of_verb=[240]+dirobj_of=[240]+concrete+countable+number=plural+type=commonnoun+modified_by_adj_left=[236]</syntax>]
(238) would [will<syntax type="V">+auxiliary+tense=past+mood=question+type_aux=aux_conditional</syntax>]
(239) he [he<syntax type="PRO">+functional_pos=funcnoun+human+number=singular+type=personal+perspro_type=subject+person=3+agent_of_verb=[240]</syntax>]

(240) eat [eat<syntax type="V">+functional_pos=funcverb+conditional+tense=past+mood=question+object_of_action=[237]+direct_object=[237]+agent=[239]</syntax>]
</clause>
<clause subject=[242] predicate=[243]>
(241) if [if<syntax type="CONJ">+type=subordinate</syntax>]
(242) he [he<syntax type="PRO">+functional_pos=funcnoun+human+number=singular+type=personal+perspro_type=subject+person=3+agent_of_verb=[243]</syntax>]
(243) was [be<syntax type="V">+meaningid=m1+functional_pos=funcverb+tense=past+modified_by_adv=[244]+agent=[242]</syntax>]
(244) at home [<syntax type="ADV">+functional_pos=funcadv+subject_of_clause=[242]+type=simple+modifies_verb=[243]</syntax>]
</clause>
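Each token in the output above carries a string of features separated by "+". As a small illustration of how such a string could be read, the sketch below turns one into a dictionary. It assumes that every feature is either a bare flag (such as "human") or a key=value pair; the function name `parse_features` is hypothetical and is not part of SYSTRAN.

```python
def parse_features(feature_string):
    """Split a '+'-separated feature string into a dict; bare flags map to True."""
    features = {}
    for item in feature_string.split("+"):
        if not item:
            continue  # skip empty pieces produced by a leading or doubled '+'
        key, sep, value = item.partition("=")
        features[key] = value if sep else True
    return features


# Feature string modelled on token (242) "he" in the output above.
sample = "+functional_pos=funcnoun+human+number=singular+person=3+agent_of_verb=[243]"
parsed = parse_features(sample)
```

Values are kept as plain strings, so a reference such as "[243]" would still have to be resolved to the token it points at.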

An addition to SYSTRAN by CEC staff was a routine to deal with unknown words. This routine looked for certain suffixes such as -ology, -ologist or -meter, which were translated into the corresponding French suffixes (-ologue etc.). Semantic data were also assigned to these words, indicating for example a science, a person of some profession, or a device. Such words had been left untranslated in the Russian-English system used by USAF. Another addition involved tense changes in phrases like “the day after”.

The development team in Luxembourg solved grammatical problems through additions to the dictionary, which resulted in a very complex dictionary. For example, French “éviter” means “prevent”, and its translation must take the form “prevent doing something” instead of “prevent that something has been done”. Even this is coded in the entry for “éviter” in the French-English dictionary.

SYSTRAN dictionaries

SYSTRAN translation quality depended heavily on the dictionaries. These did not contain only passive data: a large part of the dictionaries consisted of algorithms that were invoked at some stage of translation.

At first there were two bilingual Stem dictionaries for every language pair, one for single-word entries and one for multiword expressions. When the number of language pairs increased, a mono-source/multi-target approach was preferred. The Stem dictionary of each source language contained basic syntactic information on a word, information required for homograph resolution, syntactic and semantic codes, and its translations in all target languages of the system.

The other dictionaries were:

• High Frequency Dictionary, including prepositions, conjunctions, first words of idiomatic expressions, irregular verb forms etc.

• Limited Semantics Dictionary, for idioms and compound nouns. The most frequent entries in this dictionary were idiom replaces and straight limited semantics (SLS). Idiom replaces were prepositional, conjunctive or adverbial phrases such as "with respect to" or "in order that"; these word groups were treated as one word in translation. SLS items generally included technical terms such as "power station" that were translated as a unit.

• Conditional Limited Semantics (CLS) Dictionary, for dealing with semantic compatibilities and valencies. CLS rules allowed users to define the exact context in which a rule should apply.

• Main dictionary, which was divided into stems and endings.

An example transfer rule:

"allergic"->"allergiás";
$1~<syntax type="A">+modified_by_prep1=$2</syntax>;
$2="to";
$2~<syntax type="PREP">+direct_object=$3</syntax>;
$3~<syntax type="N"></syntax>;
$3-><syntax type="N">+case=sub</syntax>;
$2->$$$.

Every dictionary entry included:

• Morphological class

• Part of speech

• Government and valency

• Agreement

• Transitivity

• Noun type (animate, mass, abstract etc.)

• Semantic markers (physical property, food product…; 450 different markers existed)

• TL preposition governed by the item

• TL equivalents with morphological and syntactic information for synthesis
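One way to picture such an entry is as a record with one field per item in the list above. The sketch is hypothetical: the field names, the types and the sample values for "pump" are invented for illustration and do not reproduce SYSTRAN's actual encoding.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DictionaryEntry:
    stem: str
    morphological_class: str
    part_of_speech: str
    government: str = ""                  # government and valency codes
    agreement: str = ""
    transitive: Optional[bool] = None     # None when not applicable (e.g. nouns)
    noun_type: str = ""                   # animate, mass, abstract, ...
    semantic_markers: List[str] = field(default_factory=list)
    tl_preposition: str = ""              # target-language preposition governed
    tl_equivalents: Dict[str, str] = field(default_factory=dict)


# Invented sample entry; DEV is the "device" code mentioned in the text.
entry = DictionaryEntry(
    stem="pump",
    morphological_class="regular",
    part_of_speech="N",
    noun_type="concrete",
    semantic_markers=["DEV"],
    tl_equivalents={"fr": "pompe"},
)
```

Keeping all target-language equivalents in one record mirrors the mono-source/multi-target organisation of the Stem dictionary.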

In the dictionary, the most general meaning was coded as the default translation in the target language (TL). The most general meaning was found by checking frequency listings of words in texts that had been processed by the system. Other translations were also kept in the dictionary, and SYSTRAN needed a mechanism to select these alternative meanings.

Topical glossaries (TGs) were introduced to solve the problem of disambiguating scientific terms and jargon. Translators submitting input texts could select from a set of topical glossaries, which can be considered a semi-interactive translation process. The glossaries affected word selection in the disambiguation phase of translation. Topical glossaries were applied by means of semantic markers for more than twenty different fields: AGPRO (agriculture), ANTEC (analysis), PRAVIA (aviation), PRIO (biology), PRCH (chemistry), PRCR (creative), PREL (electrical), etc. These markers were attached to the translations to be selected for texts on that topic. The most important disadvantage of TGs was that once the user specified a topical glossary and a word in the text had a meaning in that glossary, this meaning would always be used throughout the text being translated. Coding something in a TG meant losing the flexibility of translating a word differently within one text. Therefore, topical glossaries did not work.

A simpler classification was created instead, e.g. DEV (device, tool, instrument), CONTR (container), MATER (material or substance used for production or operation). This classification helped to relate words. For example, in translating the phrase “faulty equipment and office management”, the adjective “faulty” is linked to “equipment”, which is semantically coded DEV, instead of “office”. The most widely used semantic codes were HUMANS and GROUPS. Other examples are CITIES, MONTHS, ANIMAL, PROPTY (property), COLOUR, etc.

Another solution for disambiguation was the idiom dictionary. For example, the French word "poste" has the default English meaning "post", but "poste de travail" means "work station". If this expression is in the idiom dictionary, it is translated directly as "work station" and the ambiguity is resolved. But if other words are inserted within the word group, this method will not work. This is where the conditional semantic dictionary comes in: even if words are not adjacent, word relations can be used to define translations of certain noun phrases. In addition, semantic rules can be asserted on the text, and the default meaning may be rejected if it contradicts these rules. For example, the default French translation of "work" is "fonctionner", but a CLS rule states that it must be translated as "travailler" if the subject is human. These rules could contain conditions that used semantic as well as relational and even positional information about words.

There is also a special semantic code, KEEPMN, meaning that the program should keep the meaning of a word group in the following sentences even if only part of it is used. For example, "poste" was translated as "station" instead of "post" if "poste de travail" had a KEEPMN code.

There were also lexical routines that were executed at the transfer stage. These were written at first for exceptional words but were later used heavily to resolve many ambiguities.

Semantic categories did not follow a general logic; they were introduced only to translate a particular input. For example, one Russian preposition was translated as either “up to” or “down to” according to the semantic category of the preceding word (“increase” or “decrease”). Another preposition was translated as “along” or “over” according to the category of the following noun (“linear” or “nonlinear”). Other example semantic categories for Russian words are: composition, mathematics, measure, motion, optics, quality… Similar problems existed in the English-French version used by the CEC. These categories had nothing in common; they were ad hoc analysis solutions that depended on the target language.

As at USAF, updating the dictionaries was very difficult, because an addition in one place could cause a loss in another. Changes could only be made in small steps, since SYSTRAN did not allow general structural changes to the dictionaries. The lexicon was very irregular and the dictionaries were updated by trial and error, so every amendment to SYSTRAN required extensive trials and testing.
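The interplay between default translations and conditional rules described in this section can be sketched with the "work" example. The code is a hypothetical simplification: the rule format, the function `translate_verb` and the `subject_codes` argument are assumptions made for illustration and do not reproduce SYSTRAN's rule syntax.

```python
# Default (most general) translations, as stored in the stem dictionary.
DEFAULTS = {"work": "fonctionner"}

# Conditional rules: (source word, semantic code required on the subject, translation).
CLS_RULES = [
    ("work", "HUMANS", "travailler"),
]


def translate_verb(verb, subject_codes):
    """Return the rule-based translation if a condition matches, else the default."""
    for source, required_code, target in CLS_RULES:
        if source == verb and required_code in subject_codes:
            return target
    return DEFAULTS[verb]


human_case = translate_verb("work", {"HUMANS"})   # e.g. "the engineer works"
device_case = translate_verb("work", {"DEV"})     # e.g. "the pump works"
```

Because every rule is consulted before the default, adding a rule for one context can change the output for inputs that were previously handled by the default, which is one way to see why dictionary updates needed such extensive testing.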

Işık Barış Fidaner

2005702532


CMPE 590 Machine Translation

Spring 2006

SYSTRAN translation quality

In 1971, two Russian-English systems were systematically tested: the 1964 Mark II version and the 1971 SYSTRAN version for the same language pair. The proportion of words left untranslated increased from 1.2% to 2.3%, showing little or no improvement. Similarly, the proportion of words given alternative translations decreased only from 6.3% to 5.3%.

According to Pigott (1984), morphological analysis and synthesis of French is 100% successful, the success rate falls to only 90% in the resolution of homographs, and synthesis of French text is generally unproblematic.

The following texts are example translations from the SYSTRAN French-English version in 1983. The first is taken from the articles of an agreement, and the second from a technical report. They show that MT output is not perfect, but very useful and informative.

The application of these methods to the definition of a management and possible valorization policy of waste requires knowledge a variability of the behaviour of the coproducts according to their origin (nature of the methods and manufactures) and to their time to production; indeed, if variability is low it will be possible to define general elimination rules and in the contrary case, it will be necessary to organize the catch counts some and the follow-up of waste on the level even the producing factories.

The detection of the gamma rays requires their interaction with a matter. It results from this interaction either an electron accompanied by the emission by a photon by lower energy (Compton effect), or a electron-positron pair, dominant phenomenon beyond some MeV. In both cases the produced charged particules take away certain information concerning the direction and the energy of the incidental gamma photon.

Example translation from Russian by the Mark II system:

THE SWISS PUBLIC IS WORRIED, THE BASEL NEWSPAPER “NATIONALZEITUNG” WRITES IN ONE OF THE LAST ISSUES. RECENT AMERICAN STATEMENTS ABOUT THE FACT THAT THE USA CAN USE FORCE IN THE NEAR EAST, THE NEWSPAPER EMPHASIZES, CAUSE ALARM ALL OVER THE WORLD. AS CONCERNS SWITZERLAND, THEN, IF THIS COURSE CONTINUES, IT WILL EXAMINE THE QUESTION CONCERNING AN EXIT FROM THE RECENTLY CREATED ON THE INSISTENCE OF WASHINGTON INTERNATIONAL ENERGY AGENCY, WHICH UNITES A NUMBER OF THE CAPITALIST COUNTRIES – THE GREATEST USERS OF OIL.

Example technical translation, which is better:

HELICOPTER, A FLIGHT VEHICLE HEAVIER THAN AIR WITH VERTICAL BY TAKEOFF AND LANDING, LIFT IN WHICH IS CREATED ONE OR BY SEVERAL (MORE FREQUENT THAN TWO) ROTORS... A HELICOPTER TAKES OFF UPWARD VERTICALLY WITHOUT A TAKEOFF AND IT ACCOMPLISHES VERTICAL FITTING WITHOUT A PATH, MOTIONLESSLY “WILL HANG” ABOVE ONE PLACE, ALLOWING ROTATION AROUND A VERTICAL AXIS TO ANY SIDE, FLIGHT IN ANY DIRECTION AT SPEEDS IS PRODUCED FROM ZERO TO THE MAXIMUM...

Conclusion

SYSTRAN is a fully automatic machine translation system that can be considered a hybrid 'direct-transfer' system within the rule-based paradigm. At first it was intended for technical texts, but with its huge dictionaries it became able to translate texts from any domain. No sublanguage was used (except in the Xerox case); the aim was to cover whole languages. SYSTRAN versions were developed mostly for European languages, but versions for Chinese, Russian, Japanese, Korean and Arabic were also developed. Every version translated in one direction, but the opposite direction was usually developed as well. The grammar rules in the dictionaries did not depend on a general linguistic theory; they were practical, ad hoc solutions to problems in translating particular input texts. SYSTRAN output for technical texts was used directly, but as the domain was extended, post-editing became necessary.

SYSTRAN was one of the first and most successful commercial MT systems. This was partly due to its superiority over its ancestor GAT, which was already a complex and successful MT system for its time. Another reason was that SYSTRAN was continually developed by both WTC and CEC staff, giving it dictionaries with more and more complex items. This increased the quality of SYSTRAN translations. But SYSTRAN had an inherent flaw: it was not based on a general scientific linguistic theory. This limitation became apparent when the dictionaries grew extraordinarily complex and unmaintainable. Perhaps this was the inescapable result of a direct, pragmatic approach to MT.

References

1) Antonopoulou, Katerina. 1998. Resolving Ambiguities in SYSTRAN.

2) Hutchins, W. J. 1986. Machine Translation: Past, Present, Future. Chichester/New York: Ellis Horwood/Wiley.

3) Senellart, Jean, Peter Dienes, Tamas Varadi. 2001a. “New generation SYSTRAN translation system”. MT Summit VIII, Santiago de Compostela, Spain, pp. 311-316.

4) Hutchins, W. J. 1981. The Evolution of Machine Translation Systems. In Lawson 1982: 21-37.

5) Raby, Christian. 1998. System Demonstration: SYSTRAN Enterprise. AMTA 1998: 498-500.

6) Yang, Jin, Elke D. Lange. 1998. SYSTRAN on AltaVista: A User Study on Real-Time Machine Translation on the Internet. Proceedings of the Third Conference of the Association for Machine Translation in the Americas on Machine Translation and the Information Soup, October 28-31, 1998, pp. 275-285.

7) Slocum, Jonathan. 1985. A survey of machine translation: its history, current status and future perspectives. CL, Vol. 11, No. 1, pp. 1-17.

8) Zarechnak, M. 1959. Three Levels of Linguistic Analysis in Machine Translation. J. ACM 6, 1 (Jan. 1959), 24-32.

9) Slocum, J. 1980. "An Experiment in Machine Translation". Proceedings of the 18th Annual Meeting of the Association for Computational Linguistics, Philadelphia, 19-22 June 1980, pp. 163-167.
