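Research and Implementation of Decoding for Phrase-Based Chinese-Uyghur Machine Translation (基于短语汉维机器翻译解码的研究及实现)
杨世勤 (Yang Shiqin)
Thesis Advisor: 王磊 (Wang Lei)
2018-05-25
Degree Grantor: University of Chinese Academy of Sciences (中国科学院大学)
Place of Conferral: Beijing
Degree Name: Master's
Degree Discipline: Computer Technology
Keyword: Chinese-Uyghur statistical machine translation; decoding; feature model; beam search; word vectors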
Abstract

Decoding theory for phrase-based statistical machine translation has been relatively successful on language pairs with similar morphology, such as Chinese-English and French-English. Language pairs differ to varying degrees, however, and Chinese and Uyghur differ considerably: Uyghur has complex morphology, and its word order differs substantially from that of Chinese. Both properties seriously interfere with the decoding search. This thesis explores decoding optimizations for phrase-based machine translation that suit the Chinese-Uyghur language pair.

In phrase-based machine translation, decoding touches many areas of statistical machine translation research. Improving decoder performance requires considering how well the scoring feature models combine and how efficient the search algorithm is. It also requires analyzing why the differences between the two languages limit decoding, so that targeted optimizations can be proposed. The work in this thesis has three main parts:

1. The decoding theory of phrase-based machine translation is summarized in depth and a preliminary decoder design is formed. The design adopts a log-linear model as its framework. The basic theory of the relevant scoring feature models is analyzed first, then the beam search decoding process is studied in detail, and the processing of each decoder submodule is designed in light of how the scoring feature models behave in practice.

2. Building on the preliminary design and on the practical characteristics of Chinese-Uyghur translation, further feature information and constraint rules that can effectively guide decoding are added to optimize Chinese-Uyghur decoding. The optimizations target the decoding limitations caused by the complex morphology of Uyghur and the different syntactic structures of Chinese and Uyghur. Word vectors are used to exploit implicit grammatical and semantic relations within the input sentence, so that target phrases more faithful to the current sentence are selected and the table of candidate phrase pairs is filtered. Word vectors are also introduced into language model scoring to compute the similarity between phrases, which adds the implicit semantic relations between Uyghur words to the decoding score. The reordering distance limit is adjusted, and a method that restricts the expansion of source phrases during decoding is designed, so that reordering better matches Chinese-Uyghur translation in practice.

3. The decoder is implemented according to the designed schemes. The experimental results of the decoders built under each scheme are analyzed and compared to verify the effectiveness of the decoding optimizations and to test the performance of the best decoder.

The experimental results show that the optimizations proposed in this thesis are effective for Chinese-Uyghur machine translation decoding, and that the implemented decoder can effectively support Chinese-Uyghur translation.
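The abstract names three standard building blocks of phrase-based decoding: a log-linear scoring framework, beam search, and a reordering distance limit. The sketch below illustrates the textbook form of each in Python. It is not the thesis implementation: the feature names, the weight values, and the function names are all hypothetical, and the thesis's adjusted reordering limit and word-vector features are not reproduced here.

```python
import heapq

# Hypothetical feature weights (lambda_i); real weights would come from tuning.
WEIGHTS = {"tm": 1.0, "lm": 0.5, "distortion": 0.3, "word_penalty": -0.1}


def loglinear_score(features, weights=WEIGHTS):
    """Log-linear model score: sum over i of lambda_i * h_i."""
    return sum(weights[name] * value for name, value in features.items())


def distortion_distance(prev_end, next_start):
    """Jump size between the last translated source phrase (ending at
    position prev_end) and the next one (starting at next_start).
    A monotone continuation has distance 0."""
    return abs(next_start - prev_end - 1)


def within_reordering_limit(prev_end, next_start, limit):
    """Standard hard limit: reject expansions that jump too far."""
    return distortion_distance(prev_end, next_start) <= limit


def prune_stack(hypotheses, beam_size):
    """Beam (histogram) pruning: keep the beam_size best hypotheses.
    Each hypothesis is a (score, payload) tuple; higher score is better."""
    return heapq.nlargest(beam_size, hypotheses, key=lambda h: h[0])
```

In a full decoder, each expansion of a partial hypothesis would be checked with `within_reordering_limit`, scored with `loglinear_score`, and placed on a stack that is pruned with `prune_stack`.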
Pages: 59
Document Type: Thesis
Identifier: http://ir.xjipc.cas.cn/handle/365002/5453
Collection: Multilingual Information Technology Research Laboratory (多语种信息技术研究室)
Recommended Citation
GB/T 7714
杨世勤. 基于短语汉维机器翻译解码的研究及实现[D]. 北京. 中国科学院大学,2018.
Files in This Item:
File Name/Size: 基于短语汉维机器翻译解码的研究及实现.p (1779KB)
DocType: Thesis
Access: Open Access
License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.