Research on Confusion Network Combination Methods in Machine Translation (机器翻译中混淆网络融合方法研究)
宿建军
Subtype: Master's
Thesis Advisor: 李晓
2011-05-30
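Degree Grantor: Graduate University of Chinese Academy of Sciences (中国科学院研究生院)
Place of Conferral: Beijing
Degree Discipline: Computer Application Technology
Keyword: statistical machine translation; system combination; word alignment; confusion network decoding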
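Abstract: In recent years, system combination has received growing attention in machine translation. By drawing on the strengths of different translation systems, system combination can alleviate data sparseness, select the best translation, and thereby improve translation quality. The Uyghur-Chinese parallel corpus is small, and basic research on lexical and syntactic analysis is not yet mature. The two languages also differ greatly in morphology, so translations show word-form errors and disordered word order. These factors severely constrain the development of Uyghur-Chinese machine translation. This thesis builds on word-level system combination and makes the following contributions.

In Uyghur-Chinese machine translation, some alignment systems align content words well but function words less well, while others align function words well but content words less well. To address this, the thesis proposes combining multiple alignment systems to improve alignment accuracy. Specifically, the alignments produced by GIZA++ are first combined with those produced by the TER alignment method, a confusion network is then built from the combined alignment, and finally the best translation is decoded from the network.

In Uyghur-Chinese word-level combination, system combination with a single confusion network depends too heavily on the reference sentence and has limited reordering ability. To address this, the thesis proposes rescoring and minimum Bayes-risk decoding over multiple confusion networks, so that a better combination result can be found by searching several networks. The word order of the reference sentence determines the basic word order of the confusion network, so a poorly chosen reference leads to a poor combination result. The multiple-confusion-network method no longer selects a single reference from the translation outputs. Instead, it treats each translation output in turn as the reference and aligns the other outputs to it.

Finally, the thesis analyzes and summarizes the experimental results. The experiments show that alignment system combination effectively improves word alignment accuracy, and that decoding over multiple confusion networks reduces the uncertainty in the choice of reference sentence, alleviates data sparseness, and ultimately improves machine translation quality.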
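The single-network pipeline summarized in the abstract (align the other translation outputs to a reference sentence, build a confusion network, decode the best output) can be sketched roughly as follows. This is an illustrative sketch only, not the thesis's implementation: plain word-level edit-distance alignment stands in for the combined GIZA++/TER alignment, unweighted majority voting stands in for the actual decoder, words that a hypothesis inserts relative to the reference are simply dropped, and the names `align`, `build_network`, `decode` and `EPS` are hypothetical.

```python
from collections import Counter

EPS = ""  # empty-word placeholder used when a hypothesis skips a reference word


def align(skeleton, hyp):
    """Align hyp to skeleton by word-level edit distance.

    Returns a list of (skeleton_index_or_None, hyp_word_or_EPS) pairs.
    """
    n, m = len(skeleton), len(hyp)
    # dp[i][j] = edit distance between skeleton[:i] and hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = dp[i - 1][j - 1] + (skeleton[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (skeleton[i - 1] != hyp[j - 1]):
            pairs.append((i - 1, hyp[j - 1]))  # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            pairs.append((i - 1, EPS))  # hypothesis has no word for this slot
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))  # extra hypothesis word
            j -= 1
    pairs.reverse()
    return pairs


def build_network(skeleton, hyps):
    """One slot per reference word; each slot counts the candidate words."""
    slots = [Counter({word: 1}) for word in skeleton]
    for hyp in hyps:
        for idx, word in align(skeleton, hyp):
            if idx is not None:  # simplification: drop hypothesis insertions
                slots[idx][word] += 1
    return slots


def decode(slots):
    """Pick the most frequent word in every slot and skip empty words."""
    best = (slot.most_common(1)[0][0] for slot in slots)
    return [word for word in best if word != EPS]


skeleton = ["he", "go", "to", "school"]
hyps = [["he", "goes", "to", "school"], ["he", "goes", "to", "school"]]
print(" ".join(decode(build_network(skeleton, hyps))))  # -> he goes to school
```

Because every slot is anchored to a word of the reference sentence, the output can never depart from the reference's word order, which illustrates the dependence on the reference that the abstract describes.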
Other Abstract: In recent years, system combination has attracted growing attention in machine translation. By combining the outputs of different machine translation systems, it can alleviate data sparseness, select the best output, and improve translation performance. The Uyghur-Chinese parallel corpus is small, and lexical and syntactic analysis is not yet mature. The two languages also differ greatly in morphology, so translation results suffer from morphological errors and word-order errors. These factors greatly limit research on Uyghur-Chinese machine translation. This thesis builds on word-level system combination, and its main contributions are listed below.

Some alignment systems align notional words well but form words less satisfactorily, while other alignment systems align form words well but notional words less satisfactorily. We therefore propose a multi-alignment combination method to improve word-level alignment accuracy. Specifically, the word alignments produced by GIZA++ and by the TER method are combined, a confusion network is built from the combined alignment, and the best output is then decoded from the confusion network.

To address over-reliance on the reference sentence and weak reordering ability, we propose decoding over multiple confusion networks, based on rescoring and minimum Bayes-risk decoding, for Uyghur-Chinese translation system combination. Better results can be found by searching multiple confusion networks. Because the word order of the reference determines the word order of the confusion network, an unsuitable reference leads to worse results. By contrast, the multiple-confusion-network method treats each translation result in turn as the reference, and the other translations are aligned to it.

Finally, the experimental results are analyzed and summarized. Experiments show that alignment system combination effectively improves alignment accuracy, and that the multiple-confusion-network decoding method reduces the uncertainty in the choice of reference, alleviates data sparseness, and improves translation results.
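As a rough illustration of the minimum Bayes-risk step named in the abstract, the sketch below picks, from a list of candidate translations, the one with the lowest expected loss against all the others. The assumptions here are not taken from the thesis: the candidates stand for the outputs decoded from the individual confusion networks, the loss is plain word-level edit distance, the posterior is uniform unless weights are given, and the names `edit_distance` and `mbr_select` are hypothetical.

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, 1):
        cur = [i]
        for j, word_b in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                        # delete word_a
                           cur[j - 1] + 1,                     # insert word_b
                           prev[j - 1] + (word_a != word_b)))  # match or substitute
        prev = cur
    return prev[-1]


def mbr_select(candidates, weights=None):
    """Return the candidate with minimum expected edit distance to the rest."""
    if weights is None:
        # Assumption: uniform posterior over candidates.
        weights = [1.0 / len(candidates)] * len(candidates)

    def risk(candidate):
        return sum(w * edit_distance(candidate, other)
                   for other, w in zip(candidates, weights))

    return min(candidates, key=risk)


candidates = [
    ["he", "goes", "to", "school"],
    ["he", "go", "to", "school"],
    ["he", "goes", "to", "the", "school"],
]
print(" ".join(mbr_select(candidates)))  # -> he goes to school
```

The selected candidate is the one closest on average to all alternatives, so no single candidate has to be trusted as the reference in advance.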
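Document Type: Thesis
Identifier: http://ir.xjipc.cas.cn/handle/365002/4414
Collection: Multilingual Information Technology Research Laboratory (多语种信息技术研究室)
Affiliation: Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences (中国科学院新疆理化技术研究所)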
Recommended Citation
GB/T 7714
宿建军. 机器翻译中混淆网络融合方法研究[D]. 北京. 中国科学院研究生院,2011.
Files in This Item:
File Name/Size | DocType | Version | Access | License
机器翻译中混淆网络融合方法研究.pdf (1208KB) | Thesis | | Open Access | CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.