Machine Translation and Evaluation: Online Systems

The Internet has exerted a powerful influence on the development of machine translation (MT). The enormous quantity of multilingual documents online is a valuable resource for MT and has supported the development of online MT systems, many of which are free of charge. For a user to select the most suitable system, MT evaluation plays a critical role, yet most users do not understand MT well enough to evaluate it, which leads to inappropriate evaluation practices and subjective judgments about a system’s usefulness. In recent years, a number of quantitative translation evaluation metrics have been developed. By scoring translation quality in a language-independent way against human translations as gold standards, they allow the output of different MT systems to be compared fairly and reliably, so that translation performance can be evaluated objectively and comparably while still correlating highly with human judgment.

In this research, a number of representative online MT systems are evaluated from the user’s perspective, using quantitative metrics to measure their performance in different languages. Legal texts represent one of the most difficult genres to translate, yet they are conventionally considered suitable for MT, and were therefore selected as the test data for this evaluation. After surveying current MT development, the proper use of online MT for legal translation is discussed and a user-oriented MT evaluation metric is proposed. This metric is comparable with the other MT evaluation metrics and presents translation quality in an intuitive and highly readable way. The reliability and usability of these evaluation metrics in the context of legal translation are then examined.

A horizontal comparison of the translation performance of a number of popular online MT systems is carried out with these metrics on a large-scale corpus of legal texts, showing the relative strengths and weaknesses of the different systems. The results of this automatic quantitative scoring on real legal texts provide an objective view of the suitability of each system for legal translation in different language pairs. Overall, the evaluation shows that no single system outperforms the others across all language pairs; different systems have different strengths and weaknesses for different pairs. These results should help users select the most suitable online MT system for a particular translation task.
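To make the reference-based scoring idea concrete, the following Python sketch implements a minimal BLEU-style metric (modified n-gram precision with a brevity penalty, after Papineni et al., 2002), one of the metrics covered in Chapter 3. It is an illustration only, not the thesis’s own implementation; the function names and sample sentences are invented for the example.

import math
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-grams of the token list, with counts.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    # Sentence-level BLEU of a candidate translation against one
    # human reference translation (the "gold standard").
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # "Modified" precision: clip each candidate n-gram count by its
        # count in the reference, so repeated words cannot inflate the score.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        precisions.append(matched / total if total else 0.0)
    if min(precisions) == 0:
        return 0.0  # geometric mean collapses to 0 if any precision is 0
    # Brevity penalty: candidates shorter than the reference are penalised.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(bleu("the contract shall be governed by the laws of hong kong",
           "the contract shall be governed by the law of hong kong"))

Scores range from 0 to 1. A real evaluation averages such scores over a whole corpus and, as examined in Chapter 4, validates the metric by its correlation with human judgment.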

Contents

Chapter 1  Introduction
1.1 Overview
1.2 The problem
1.3 Methodology
1.4 Overview of the thesis
Chapter 2  Overview of Current MT Development
2.1 MT approaches
2.1.1 Rule-based MT
2.1.2 Example-based MT
2.1.2.1 Matching
2.1.2.2 Adaptation and recombination
2.1.2.3 What is Example-based MT?
2.1.3 Statistical MT
2.1.3.1 Foundation of statistical MT
2.1.3.2 Phrase-based model
2.1.3.3 Collection of bilingual phrase pairs
2.1.3.4 Advantages of phrase-based model
2.1.3.5 Training algorithms
2.1.3.6 Decoding algorithm
2.1.3.7 Log-linear model combinations
2.1.4 Hybridity in MT
2.1.4.1 Example + Rule
2.1.4.2 Statistics + Rule
2.1.4.3 Example + Statistics
2.1.4.4 A holistic perspective on MT paradigms
2.2 MT resources
2.2.1 Background
2.2.2 Plausibility of the Web as a corpus
2.2.3 Approaches to utilizing the Web as an MT resource
2.2.3.1 Retrieval of word occurrence frequencies
2.2.3.2 Retrieval of bitexts
2.2.3.3 A holistic strategy for online bitext retrieval
2.2.3.4 Retrieval of bitexts from specific websites
2.2.4 Other sources for bitexts
2.3 MT on the Internet
2.3.1 Development of online MT
2.3.2 Use of online MT
Chapter 3  MT Evaluation
3.1 Background
3.2 Human MT evaluation
3.2.1 Language perspective
3.2.1.1 Quality assessment
3.2.1.2 Error analysis
3.2.1.3 Round-trip translation
3.2.2 Usability perspective
3.2.3 Problems and difficulties
3.3 Automatic MT evaluation
3.3.1 Evaluation metrics
3.3.1.1 BLEU and NIST
3.3.1.2 METEOR
3.3.1.3 Other evaluation metrics
3.3.2 Performance of evaluation metrics
Chapter 4  Evaluation of Automatic MT Evaluation Metrics
4.1 Background
4.2 Towards a user-oriented evaluation metric
4.3 Evaluation
4.3.1 Data
4.3.2 Systems
4.3.3 Evaluation metrics
4.3.4 Participants
4.3.5 Procedures
4.3.6 Issues to be examined
4.4 Evaluation results and discussion
4.4.1 Consistency between evaluators
4.4.2 Judgment differences in various evaluation settings
4.4.3 Correlation between evaluation metrics and human judgment
4.5 Conclusion
Chapter 5  Evaluation of Online MT Systems
5.1 Background
5.2 Proper use of MT in legal translation
5.3 Methodology
5.3.1 Evaluation data
5.3.2 MT systems for evaluation
5.3.3 Evaluation metrics
5.3.4 Issues to be examined
5.3.5 Evaluation procedure
5.4 Evaluation results
5.4.1 Language coverage
5.4.2 Translation quality for language pairs
5.4.3 Translation quality by MT system
5.5 Conclusion
Chapter 6  Conclusion
Bibliography
Appendix 1: Evaluators’ guidelines for the human evaluation
Appendix 2: A sample set of human evaluation questions

Author: Wong, Tak Ming

Source: City University of Hong Kong
