Research on Uyghur Morphological Analysis for Machine Translation (面向机器翻译的维吾尔语形态分析研究)
艾孜孜·吐尔逊
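Subtype: Doctoral
Thesis Advisor: 周俊林
Date: 2017-05-21
Degree Grantor: University of Chinese Academy of Sciences
Place of Conferral: Beijing
Degree Discipline: Computer Application Technology
Keyword: Uyghur; morphological analysis; tagging; machine translation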
Abstract

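Morphological analysis is an important research focus in natural language processing. For an agglutinative language such as Uyghur in particular, it is an essential prerequisite for lexical analysis, syntactic parsing, machine translation, natural language understanding, and related research. Uyghur belongs to the Turkic branch of the Altaic language family. It is an agglutinative language with a very rich inventory of derivational and inflectional affixes, and these affixes supply a word with rich information about its meaning, part of speech, number, case, tense, and so on. Morphological analysis of Uyghur is therefore of considerable importance in natural language processing. In recent years, Uyghur natural language processing has attracted considerable academic attention, and some results have been achieved in speech recognition, speech synthesis, and machine translation. However, because research on Uyghur natural language processing started late and reasonably complete annotated corpora are lacking, higher-level, comprehensive research has remained constrained. Starting from the morphological characteristics of Uyghur, and building on a fairly comprehensive analysis of Uyghur morphological variation and inflectional affixes, this thesis takes the scarcity of annotated corpus resources as its working condition and addresses the segmentation of Uyghur inflectional affixes and the automatic tagging of morphological components. It investigates techniques and methods for Uyghur morphological analysis based on a hybrid strategy, in the following respects:
1. Drawing on the morphological characteristics of Uyghur, it analyzes the problems facing automatic morphological analysis of Uyghur and proposes the approach to, and the specific content of, a grammatical information dictionary of Uyghur morphological features oriented to natural language processing.
2. It studies methods and related models for automatic morphological analysis of Uyghur when no manually annotated corpus is available, based on a morphological dictionary and unsupervised methods.
3. It studies a hybrid strategy that combines a small annotated corpus, a dictionary, and rules to improve the performance of morphological analysis. With very limited training data, and taking all evaluation metrics into account, the method achieved an accuracy of 92.58% and a stemming accuracy of 97%.
4. Motivated by the need to improve machine translation quality, it studies the effect of Uyghur's complex morphological features on the quality of Uyghur-Chinese statistical machine translation, and proposes and validates a strategy for improving Uyghur-Chinese statistical machine translation quality through morphological analysis. Relative to the baseline system, the Uyghur morphological analysis method studied in this thesis raised the BLEU score by 1.33% on average.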

Other Abstract

Morphological analysis is one of the key components of natural language processing. For an agglutinative language such as Uyghur, it is an important prerequisite for lexical analysis, syntactic parsing, machine translation, natural language understanding, and related tasks. Uyghur belongs to the Turkic branch of the Altaic language family and is an agglutinative, morphologically rich language whose inflectional suffixes encode extensive information about the meaning, part of speech, number, case, and tense of a root word. Morphological analysis of Uyghur is therefore of great significance in natural language processing. In recent years, natural language processing for Uyghur has attracted considerable academic attention, and significant progress has been made in speech recognition, speech synthesis, and machine translation. However, because research on Uyghur natural language processing began late and manually annotated corpora are scarce, higher-level, comprehensive research has remained limited. Starting from the morphological characteristics of Uyghur, this thesis attempts a comprehensive analysis of its inflectional suffixes and morphological features, taking the shortage of manually annotated corpora into account, in order to address the automatic segmentation and tagging of inflectional suffixes.
The thesis investigates a hybrid approach to Uyghur morphological analysis, as follows:
1. Based on the morphological features of Uyghur, it puts forward an approach to building a morphological grammar information dictionary, with the aim of solving the problems of automatic morphological analysis.
2. It studies a method and related model for Uyghur morphological analysis that relies on a morphological dictionary and unsupervised learning, without a manually annotated corpus.
3. It studies a semi-supervised hybrid approach that combines very limited annotated training data with a morphological dictionary and rules in order to improve the performance of morphological analysis. This approach achieves 92.58% overall accuracy and 97% stemming accuracy.
4. It studies the effects of Uyghur's complex morphological features on the quality of Uyghur-Chinese statistical machine translation and proposes a method for improving translation quality through Uyghur morphological analysis. Compared with the baseline system, the morphological analysis method improves the BLEU score by 1.33% on average.

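Document Type: Thesis
Identifier: http://ir.xjipc.cas.cn/handle/365002/4931
Collection: 多语种信息技术研究室 (Multilingual Information Technology Research Laboratory)
Affiliation: Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences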
Recommended Citation
GB/T 7714
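艾孜孜·吐尔逊. Research on Uyghur Morphological Analysis for Machine Translation (面向机器翻译的维吾尔语形态分析研究)[D]. Beijing: University of Chinese Academy of Sciences, 2017.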
Files in This Item:
File Name/Size: 面向机器翻译的维吾尔语形态分析研究.pdf (2496KB)
DocType: Thesis
Access: Open Access
License: CC BY-NC-SA
File name: 面向机器翻译的维吾尔语形态分析研究.pdf
Format: Adobe PDF
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.