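Research on Uyghur Morphological Analysis and Its Application in Neural Network Language Models
Xu Chun (徐春)
Degree type: Doctoral
Advisor: Jiang Tonghai (蒋同海)
2018-05-25
Degree-granting institution: University of Chinese Academy of Sciences
Place of conferral: Beijing
Degree discipline: Computer Application Technology
Abstract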

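Natural language processing (NLP) is one of the most active research directions in artificial intelligence. NLP research on Uyghur has been carried out in Xinjiang for many years. Morphological analysis of Uyghur is a fundamental, key research topic: it is analogous to the "word segmentation" task in Chinese information processing, underlies all kinds of NLP tasks, and directly affects the efficiency and success of application systems such as Uyghur information retrieval, text classification, and speech recognition. The language model is likewise a foundation of NLP research: its task is to estimate the probability of a given word sequence, and it plays a major role in tasks such as part-of-speech tagging and machine translation. This thesis studies these two foundational problems together. First, it studies Uyghur morphological analysis, performing word segmentation and improving its accuracy. Second, it feeds the segmented morphemes (stems + affixes) into a previously constructed neural network language model to improve the language model's performance. Building on previous work, the research and contributions of this thesis lie mainly in the following three aspects: 1. Starting from Uyghur morphological analysis, with the goal of improving word segmentation accuracy, a machine translation model is first studied in which the unsegmented Uyghur word is treated as the source side and the segmented morphemes or part-of-speech tags as the target side; word segmentation accuracy reaches 82.42%. 2. To further improve word segmentation accuracy, a graph-based modeling method for Uyghur morphological analysis is proposed that considers not only the dependencies among the morphological components within a word but also the constraints between the morphological components of adjacent words, raising word segmentation accuracy to 95.67%. 3. A recurrent neural network language model based on Uyghur morphological analysis is built: the segmented stems and affixes are treated as morphemes and, together with words, are used to generate word vectors and morpheme vectors with Word2vec, which are fed into the neural network language model to improve its capability. The methods proposed in this thesis can also be applied to other agglutinative languages (such as Kazakh). In future work, an attention mechanism could be introduced into the language model to capture the relationship between history words and the current word in Uyghur sentences; alternatively, targeting the internal structure of Uyghur words, algorithms could be studied that automatically learn Uyghur word-formation patterns in depth, with a suitable convolutional neural network architecture designed to mine local correlations between stems and affixes, further strengthening the Uyghur neural network language model.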

Other Abstract

Natural language processing is one of the most popular research directions in artificial intelligence. Research on natural language processing for Uyghur has been carried out in Xinjiang for many years. Uyghur morphological analysis is a key research topic: it is similar to the word segmentation task in Chinese information processing, it is the foundation of many natural language processing tasks, and it directly affects the efficiency and the success or failure of application systems such as Uyghur information retrieval, text classification, and speech recognition. The language model is also a basis of natural language processing research: given a word sequence, its task is to estimate the probability that the sequence occurs, and it plays an important role in tasks such as part-of-speech tagging and machine translation. In this thesis, these two basic lines of work are studied together. First, the morphological analysis of Uyghur is studied and word segmentation is carried out, with the aim of improving its accuracy. Second, the segmented morphemes (stem + affix) are fed into an already constructed neural network language model to improve the performance of the language model. On the basis of previous studies, the research work and innovations of this thesis are mainly reflected in the following three aspects: 1. Starting from Uyghur morphological analysis, with improved word segmentation accuracy as the goal, a machine translation model is studied first, with the Uyghur word before segmentation as the source side and the segmented morphemes or part-of-speech tags as the target side; word segmentation accuracy reaches 82.42%. 2. To further improve word segmentation accuracy, a graph modeling method for Uyghur morphological analysis is put forward, which considers not only the correlation between the morphological components inside a word but also the constraints between the morphological components of adjacent words; word segmentation accuracy further improves to 95.67%. 3. A recurrent neural network language model based on Uyghur morphological analysis is built: the segmented stems and affixes are treated as morphemes and, together with words, word vectors and morpheme vectors are generated with Word2vec and input to the neural network language model, improving the ability of the language model. The method put forward in this thesis can also be applied to other agglutinative languages (such as Kazakh). Subsequently, an attention mechanism could be introduced into the language model to mine the relationship between the history words and the current word in Uyghur sentences; or, in view of the internal structure of Uyghur words, algorithms could be studied that automatically and deeply learn Uyghur word formation rules, with a suitable convolutional neural network structure designed to mine local correlation between stems and affixes, further strengthening the Uyghur neural network language model.
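
As a rough illustration of two ideas in the abstract (a language model assigns a probability to a sequence, and that sequence can be built from morphemes rather than whole words), the sketch below trains a tiny add-one-smoothed bigram model over stem and affix tokens. It is not the thesis's method: the thesis describes a recurrent neural network language model with Word2vec vectors, and a count-based bigram model is swapped in here only to keep the example small. The toy corpus, the Latin-script tokens, the "+" affix marker, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus: every word is already segmented into a stem
# followed by affixes. The "+" prefix on affixes is an illustrative
# convention, not a format taken from the thesis.
corpus = [
    [["kitab", "+lar", "+ni"], ["oqu", "+di"]],
    [["kitab", "+ni"], ["oqu", "+di"]],
    [["mektep", "+ler"], ["kitab", "+lar"]],
]

def flatten(sentence):
    """Turn a segmented sentence into one morpheme sequence with boundary markers."""
    tokens = ["<s>"]
    for word in sentence:
        tokens.extend(word)
    tokens.append("</s>")
    return tokens

def train_bigram(sentences):
    """Count context tokens and adjacent token pairs, and collect the vocabulary."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = flatten(sentence)
        vocab.update(tokens)
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # (previous, current) pairs
    return contexts, bigrams, vocab

def cond_prob(prev, cur, contexts, bigrams, vocab):
    """Add-one-smoothed estimate of P(cur | prev)."""
    return (bigrams[(prev, cur)] + 1) / (contexts[prev] + len(vocab))

def log_prob(sentence, contexts, bigrams, vocab):
    """Chain-rule log-probability of a segmented sentence under the bigram model."""
    tokens = flatten(sentence)
    return sum(
        math.log(cond_prob(prev, cur, contexts, bigrams, vocab))
        for prev, cur in zip(tokens, tokens[1:])
    )

if __name__ == "__main__":
    contexts, bigrams, vocab = train_bigram(corpus)
    in_order = [["kitab", "+ni"], ["oqu", "+di"]]
    scrambled = [["+di", "oqu"], ["+ni", "kitab"]]
    print("in order :", log_prob(in_order, contexts, bigrams, vocab))
    print("scrambled:", log_prob(scrambled, contexts, bigrams, vocab))
```

Because affixes are shared across many stems, a morpheme-level vocabulary stays much smaller than a word-level one for an agglutinative language, which is one plausible reason segmentation quality matters for the language model built on top of it.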

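Pages: 84
Document type: Thesis
Item identifier: http://ir.xjipc.cas.cn/handle/365002/5638
Collection: Multilingual Information Technology Research Laboratory
Recommended citation
GB/T 7714
Xu Chun. Research on Uyghur Morphological Analysis and Its Application in Neural Network Language Models [D]. Beijing. University of Chinese Academy of Sciences, 2018.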
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.