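Thesis Advisor: 周俊林
Degree Grantor: Graduate University of the Chinese Academy of Sciences (中国科学院研究生院)
Place of Conferral: Beijing
Degree Name: Doctoral
Degree Discipline: Computer Application Technology
Keywords: partial hypothesis sharing; multi-model collaborative decoding; Uyghur-Chinese/Chinese-Uyghur statistical machine translation; online multilingual machine translation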

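In recent years, statistical machine translation has made great progress, moving from word-based models to phrase-based models and then to a variety of syntax-based models. Although syntax-based models have many advantages, such as handling long-distance reordering, they are not perfect and each has its own flaws: the hierarchical phrase-based model may rely heavily on "glue rules" during decoding, and the MEBTG (maximum-entropy-based bracketing transduction grammar) model still uses strict string matching during decoding.

Universities and research institutions have studied statistical machine translation between major languages in depth, for example English and Chinese or English and Arabic, but translation between Chinese and the minority languages of China, such as Uyghur, has received little in-depth study. Because of the linguistic characteristics involved, translation quality between Uyghur and Chinese depends on many factors.

The main work and results of this thesis are as follows:

1. We propose and implement a multi-model collaborative decoding system based on partial translation hypothesis sharing. Each member model in the system can share the search space of the other member models, which greatly expands the decoding space of the overall model. Partial translation hypotheses generated by different member models compete during decoding, which confines the overall search to a better region of the search space; this region may be drawn from parts of each member model's search space. The overall model absorbs the strengths of the member models and removes their weaknesses. For example, a maximum entropy reordering model can replace the glue rules in the hierarchical phrase-based model and the tree-based model, and the combination also gives the overall model generalization ability and makes the generated translation hypotheses more consistent with linguistic knowledge.
2. We discuss and analyze in depth the factors that affect Uyghur-Chinese translation quality, and propose and validate several solutions, covering the word alignment problem in Chinese-Uyghur and Uyghur-Chinese translation, the OOV (out-of-vocabulary) problem in Uyghur-Chinese translation, and the dependency relation problem in Chinese-Uyghur translation.
3. We design and implement an online multilingual machine translation framework using techniques such as multithreading and load balancing.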

Other Abstract

In recent years, statistical machine translation has made great progress, from word-based translation models to phrase-based models and a variety of syntax-based models. Syntax-based models have many advantages, for instance the ability to handle long-distance reordering, but they are not perfect and each has its own flaws: the hierarchical phrase model may make extensive use of glue rules, and MEBTG still relies on strict string matching in decoding. Some universities and scientific research institutions have already conducted in-depth research on statistical machine translation for major language pairs such as English-Chinese and English-Arabic, but few researchers have carried out in-depth investigation of statistical machine translation between minority languages, e.g., Uyghur, and Chinese. Because of the linguistic characteristics involved, translation quality between Uyghur and Chinese depends on many factors.

The main work and contributions are listed as follows:

1. Proposed and implemented a multi-model collaborative decoding system based on partial translation hypothesis sharing. Each member model in the system can share the search space of the other member models, which greatly extends the search space of the whole model. Partial translation hypotheses generated by different member models participate in decoding by competing with one another, which constrains the search space of the whole model to a better region that may be made up of parts of each member model's search space. The whole model absorbs the advantages of each member model and removes their shortcomings; for example, a maximum entropy reordering model can replace the glue rules in the hierarchical phrase model and the tree-based model, while their combination gives the whole model generalization ability and makes the translation hypotheses more consistent with linguistic knowledge.
2. Discussed and analyzed in depth the factors that affect translation performance, and proposed and verified some solutions to word alignment problems in Uyghur-Chinese and Chinese-Uyghur statistical machine translation, OOV problems in Uyghur-Chinese statistical machine translation, and dependency problems in Chinese-Uyghur statistical machine translation.
3. Designed and implemented an online multilingual machine translation framework using multithreading and load balancing.
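
The idea of partial hypothesis sharing in the first contribution can be illustrated with a toy chart decoder. The sketch below is not the thesis implementation: the `Hypothesis` class, the `collaborative_decode` function, the member models, and all scores and penalties are hypothetical names and values chosen for illustration. It only shows the general pattern the abstract describes, in which every member model reads from one shared chart, proposes partial hypotheses for each span, and the pooled proposals compete for a limited number of beam slots.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]  # half-open source span [start, end)


@dataclass(frozen=True)
class Hypothesis:
    span: Span
    text: str
    score: float  # higher is better (toy log-score)
    model: str    # member model that proposed this hypothesis


Chart = Dict[Span, List[Hypothesis]]
Proposer = Callable[[Span, Chart], List[Hypothesis]]


def collaborative_decode(n_words: int, members: Dict[str, Proposer],
                         beam_size: int = 2) -> Chart:
    """Fill a shared chart bottom-up; all members compete in every cell."""
    chart: Chart = {}
    for length in range(1, n_words + 1):
        for start in range(0, n_words - length + 1):
            span = (start, start + length)
            pool: List[Hypothesis] = []
            for propose in members.values():
                # Each member sees hypotheses produced by every other member.
                pool.extend(propose(span, chart))
            # Competition: only the best pooled hypotheses survive.
            chart[span] = heapq.nlargest(beam_size, pool, key=lambda h: h.score)
    return chart


def make_lexical(source: List[str], table: Dict[str, List[Tuple[str, float]]]) -> Proposer:
    """Toy member that translates single words from a hypothetical table."""
    def propose(span: Span, chart: Chart) -> List[Hypothesis]:
        start, end = span
        if end - start != 1:
            return []
        return [Hypothesis(span, text, score, "lexical")
                for text, score in table.get(source[start], [])]
    return propose


def make_combiner(name: str, inverted: bool, penalty: float) -> Proposer:
    """Toy member that joins two adjacent sub-spans in straight or inverted order."""
    def propose(span: Span, chart: Chart) -> List[Hypothesis]:
        start, end = span
        out: List[Hypothesis] = []
        for mid in range(start + 1, end):
            for left in chart.get((start, mid), []):
                for right in chart.get((mid, end), []):
                    parts = (right.text, left.text) if inverted else (left.text, right.text)
                    out.append(Hypothesis(span, " ".join(parts),
                                          left.score + right.score - penalty, name))
        return out
    return propose


source = ["x", "y"]
table = {"x": [("X", -1.0)], "y": [("Y", -1.0), ("Z", -3.0)]}
members = {
    "lexical": make_lexical(source, table),
    "monotone": make_combiner("monotone", inverted=False, penalty=0.5),
    "inverted": make_combiner("inverted", inverted=True, penalty=0.1),
}
chart = collaborative_decode(len(source), members, beam_size=2)
best = chart[(0, len(source))][0]
```

In this toy run the best full-span hypothesis comes from the `inverted` member, yet it is built from partial hypotheses that the `lexical` member placed in the shared chart, and the `monotone` member's proposal survives in the same cell as a competitor. A real system would replace the fixed penalties with model scores, for example from a reordering model.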

Document Type: Thesis
Recommended Citation
GB/T 7714
董兴华. 基于部分假设共享的多模型协同解码研究[D]. 北京. 中国科学院研究生院,2012.
Files in This Item:
File Name/Size: 基于部分假设共享的多模型协同解码研究.p (1523KB)
DocType: Thesis
Access: Open Access
License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.