Uyghur word segmentation using a combination of rules and statistics
Xue, Huajian; Yang, Yong; Turghun, Osman; Li, Xiao; Zhang, Ronghui
2011
Source Publication: Advances in Information Sciences and Service Sciences
ISSN: 1976-3700
Volume: 3, Issue: 11
Abstract

The rich morphology of Uyghur produces a large number of word forms and leads to high out-of-vocabulary (OOV) rates, which can cause many errors in Uyghur natural language processing (NLP). Morphological word segmentation is an important component for overcoming this problem. This paper describes some morphological rules, derived from an analysis of the general structure of Uyghur words, and presents a partly supervised word segmentation method. In this method, a suffix corpus is used to generate all possible morphological segmentations of a word, from which the optimal segmentation is selected by a MAP-based (maximum a posteriori) model. In addition, a cascaded language model is used to improve segmentation accuracy. A test set of 5000 words was collected and segmented by hand. Experimental results on this test set show that the proposed method is more effective.
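The two steps the abstract names, enumerating candidate segmentations from a suffix inventory and then selecting one by a MAP criterion, can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the suffix list, the probabilities, and the names `candidates`, `log_prior`, and `map_segment` are all hypothetical, and the cascaded language model is not modelled. Because every candidate concatenates back to the input word exactly, P(word | segmentation) = 1, so the MAP choice reduces to maximising the prior P(segmentation), which is assumed here to factor into independent stem and suffix terms.

```python
import math

# Hypothetical toy suffix inventory (Latin transliteration), for illustration
# only; it is NOT the suffix corpus used in the paper.
SUFFIXES = ("da", "lar", "ler", "ni", "ning")

# Hypothetical probabilities; a real system would estimate these from data.
STEM_LOGP = {"kitab": math.log(0.5)}
UNKNOWN_STEM_LOGP = math.log(1e-6)
SUFFIX_LOGP = {s: math.log(0.2) for s in SUFFIXES}


def candidates(word, min_stem=2):
    """Return every (stem, suffixes) split obtainable by stripping known suffixes."""
    results = [(word, ())]  # the unsegmented word is always a candidate

    def strip(stem, suffixes):
        for s in SUFFIXES:
            if stem.endswith(s) and len(stem) - len(s) >= min_stem:
                shorter = stem[: -len(s)]
                found = (s,) + suffixes
                results.append((shorter, found))
                strip(shorter, found)  # keep peeling suffixes off the new stem

    strip(word, ())
    return results


def log_prior(stem, suffixes):
    """Log P(segmentation), assuming stem and suffixes are independent."""
    score = STEM_LOGP.get(stem, UNKNOWN_STEM_LOGP)
    return score + sum(SUFFIX_LOGP[s] for s in suffixes)


def map_segment(word):
    """Pick the candidate with the highest prior, i.e. the MAP segmentation."""
    return max(candidates(word), key=lambda c: log_prior(*c))


best = map_segment("kitablarni")
print(best)  # ('kitab', ('lar', 'ni'))
```

With this toy data the word "kitablarni" yields three candidates (unsegmented, "kitablar" + "ni", and "kitab" + "lar" + "ni"), and the last one wins because only "kitab" is a known stem.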

DOI: 10.4156/AISS.vol3.issue11.13
Indexed By: EI
Document Type: Journal article
Identifier: http://ir.xjipc.cas.cn/handle/365002/4160
Collection: Multilingual Information Technology Research Laboratory
Affiliation: Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, China
Recommended Citation
GB/T 7714
Xue, Huajian, Yang, Yong, Turghun, Osman, et al. Uyghur word segmentation using a combination of rules and statistics[J]. Advances in Information Sciences and Service Sciences, 2011, 3(11).
APA Xue, Huajian, Yang, Yong, Turghun, Osman, Li, Xiao, & Zhang, Ronghui. (2011). Uyghur word segmentation using a combination of rules and statistics. Advances in Information Sciences and Service Sciences, 3(11).
MLA Xue, Huajian, et al. "Uyghur word segmentation using a combination of rules and statistics". Advances in Information Sciences and Service Sciences 3.11 (2011).
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.