Thesis Advisor: Jiang Tonghai (蒋同海)
Degree Grantor: University of Chinese Academy of Sciences (中国科学院大学)
Place of Conferral: Beijing
Degree Discipline: Computer Application Technology


Other Abstract

Natural language processing is one of the most active research directions in artificial intelligence, and research on natural language processing for Uyghur has been carried out in Xinjiang for many years. Uyghur morphological analysis is a key research topic. It is similar to the word segmentation task in Chinese information processing and is the foundation of many natural language processing tasks: it directly affects Uyghur text classification, information retrieval, and speech recognition, as well as the efficiency and the success or failure of application software systems. The language model is likewise basic to natural language processing research. Its task is to estimate the probability that a given word sequence occurs, and it plays an important role in tasks such as part-of-speech tagging and machine translation.

This thesis combines these two basic research topics. First, Uyghur morphological analysis is studied with the aim of improving word segmentation accuracy. Second, the performance of the language model is further improved by incorporating morphemes (stem + affix) into a neural network language model. Building on previous studies, the work and contributions of this thesis are reflected mainly in the following three aspects:

1. Starting from Uyghur morphological analysis, with the goal of improving word segmentation accuracy, a machine translation model is studied first. Unsegmented Uyghur words form the source side, and the segmented morphemes or part-of-speech tags form the target side. Word segmentation accuracy reaches 82.42%.
2. To further improve word segmentation accuracy, a graph-based modeling method for Uyghur morphological analysis is proposed. It considers not only the correlation among the morphological components inside a word, but also the relationship between the morphological components of adjacent words. Word segmentation accuracy improves further, to 95.67%.
3. A recurrent neural network language model based on Uyghur morphological analysis is built. The segmented stems and affixes are treated as morphemes and used together with words, and the morpheme vectors and word vectors generated by Word2vec are fed into the neural network language model, which improves the capability of the language model.

The methods proposed in this thesis can also be applied to other agglutinative languages, such as Kazakh. In future work, an attention mechanism could be introduced into the language model to mine the relationship between the preceding words and the current word in a Uyghur sentence. Alternatively, targeting the internal structure of Uyghur words, an algorithm could automatically learn Uyghur word formation rules, with a suitably designed convolutional neural network structure mining the local information between stems and affixes, to further strengthen the Uyghur neural network language model.
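
As an illustration of the language modeling task described in the abstract (estimating the probability of a sequence), the following is a minimal sketch over morpheme units. It is not the thesis's method: the thesis uses a recurrent neural network, while this sketch substitutes a much simpler add-one-smoothed bigram model to show the chain-rule factorization. The toy corpus, the `+` separator marking stem/affix boundaries, and all function names are assumptions made for this example only.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"

def to_morphemes(sentence):
    """Flatten pre-segmented words ('stem+affix') into one morpheme sequence."""
    units = [BOS]
    for word in sentence.split():
        units.extend(word.split("+"))  # '+' is an assumed boundary marker
    units.append(EOS)
    return units

def train_bigram(corpus):
    """Count contexts and bigrams over morpheme sequences."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        units = to_morphemes(sentence)
        vocab.update(units)
        context_counts.update(units[:-1])
        bigram_counts.update(zip(units, units[1:]))
    return context_counts, bigram_counts, vocab

def log_prob(sentence, model):
    """Chain rule: log P(m_1..m_n) = sum_i log P(m_i | m_{i-1}), add-one smoothed."""
    context_counts, bigram_counts, vocab = model
    units = to_morphemes(sentence)
    total = 0.0
    for prev, cur in zip(units, units[1:]):
        numerator = bigram_counts[(prev, cur)] + 1
        denominator = context_counts[prev] + len(vocab)
        total += math.log(numerator / denominator)
    return total

# Toy, transliterated, hand-segmented examples (illustrative only).
corpus = ["kitab+lar yaxshi", "mektep+ler yaxshi", "kitab yaxshi"]
model = train_bigram(corpus)
seen = log_prob("kitab+lar yaxshi", model)
scrambled = log_prob("yaxshi lar+kitab", model)
print(seen > scrambled)
```

Scoring stems and affixes as separate units lets the model share statistics across the many surface forms of a stem, which is the motivation the abstract gives for adding morphemes to the language model.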

Document Type: Thesis
Recommended Citation
GB/T 7714
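Xu Chun (徐春). 维吾尔语形态分析及其在神经网络语言模型中的应用研究 [Research on Uyghur Morphological Analysis and Its Application in Neural Network Language Models][D]. Beijing: University of Chinese Academy of Sciences, 2018.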

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.