Evaluation of two word alignment systems

In recent years, translation studies and corpus-based machine translation, including Statistical Machine Translation, have received increasing attention.

This project evaluates two systems that generate word alignments on English-Swedish data: the Giza++ system, which can generate a variety of statistical translation models, and the I*Trix system, developed at IDA/NLPLab, which generates word pairs with frequencies…


1. Introduction
1.1 Goal and scope of project
1.2 Overview of report
2. Background
2.1 Parallel corpora
2.2 Sentence alignment
2.3 Word alignment
2.3.1 Association approaches
2.3.2 Estimation approaches
2.4 Evaluation
2.4.1 Measuring Methods
2.4.2 Gold standard
2.5 Summary
3. The main systems used in this project
3.1 Giza++ system
3.1.1 Running Giza++ in IDA
3.1.2 Input file formats in Giza++
3.1.3 Output file formats in Giza++
3.2 I*Trix system
3.2.1 Interface of I*Trix
3.2.2 Input file formats in I*Trix
3.2.3 Output file formats in I*Trix
3.3 Evaluation tool – I*Eval
4. The evaluation environment
4.1 Symmetrization
4.1.1 Inputs symmetrization
4.1.2 Outputs symmetrization
4.1.3 Resources used
4.2 Participating Corpora
4.2.1 Blocks Corpus
4.2.2 Access XP 97 sentences Corpus
4.2.3 Access XP 5000 sentences Corpus
4.2.4 Corpora Summary
5. Analysis results
5.1 Things that may influence the results
5.1.1 In Giza++ system
5.1.2 In I*Trix system
5.1.3 In I*Eval
5.2 Comparing the results from the two systems
5.2.1 Parameter Setting in I*Trix
5.2.2 Word classes in Giza++
5.2.3 Different sizes of corpora
5.2.4 Repeated corpora
5.2.5 Monolingual corpora
5.3 Speed of two systems
6. Summary and Conclusions
6.1 Summary
6.2 Evaluation results related to parameter setting
6.3 Evaluation results related to corpora
6.4 Strengths and weaknesses
6.5 Future work
7. References

Author: Wang, Xiaoyang

Source: Linköping University

