A Semisupervised Tag-Transition-Based Markovian Model for Uyghur Morphology Analysis
Tursun, E (Tursun, Eziz); Ganguly, D (Ganguly, Debasis); Osman, T (Osman, Turghun); Yang, YT (Yang, Ya-Ting); Abdukerim, G (Abdukerim, Ghalip); Zhou, JL (Zhou, Jun-Lin); Liu, Q (Liu, Qun)
2016
Source Publication: ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING
ISSN: 2375-4699
Volume: 16, Issue: 2, Pages: 8-23
Abstract

Morphological analysis, which includes part-of-speech (POS) tagging, stemming, and morpheme segmentation, is one of the key components in natural language processing (NLP), particularly for agglutinative languages. In this article, we investigate the morphological analysis of the Uyghur language, which is the native language of the people in the Xinjiang Uyghur Autonomous Region of western China. Morphological analysis of Uyghur is challenging primarily because of (1) ambiguities arising when multiple POS tags can be associated with a word stem or multiple functional tags with a word suffix, (2) ambiguous morpheme boundaries, and (3) the complex morphophonology of the language. Further, the unavailability of a manually annotated training set for Uyghur word segmentation makes Uyghur morphological analysis more difficult. We address these challenges with a semisupervised approach that learns a Markov model, with the help of a manually constructed dictionary of "suffix to tag" mappings, in order to predict the most likely tag transitions in the Uyghur morpheme sequence. Due to the linguistic characteristics of Uyghur, we incorporate a prior belief in our model that favors word segmentations with a lower number of morpheme units. Empirical evaluation of our proposed model shows an accuracy of about 82%. We further improve the effectiveness of the tag transition model with an active learning paradigm. In particular, we manually investigated a subset of words for which the model prediction ambiguity was within the top 20%. Manually incorporating rules to handle these erroneous cases resulted in an overall accuracy of 93.81%.
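
The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the transliterated toy word, the suffix dictionary entries, the tag names, the transition probabilities, the `UNSEEN` floor, and the `LAMBDA` prior weight are all assumptions made up for illustration, and the transitions are hard-coded here rather than learned in a semisupervised way. Each candidate segmentation is scored by the log-probability of its tag path plus a penalty that grows with the number of morphemes.

```python
import math

# Toy "suffix to tag" dictionary (illustrative entries, not from the paper).
SUFFIX_TAGS = {"lar": ["PL"], "ar": ["AOR"], "da": ["LOC"]}

# Hand-set tag-transition probabilities P(next | prev). The paper learns
# its transitions from data; these numbers are invented for the sketch.
TRANS = {
    ("STEM", "END"): 0.05, ("STEM", "PL"): 0.5, ("STEM", "AOR"): 0.2,
    ("STEM", "LOC"): 0.1, ("PL", "LOC"): 0.5, ("PL", "END"): 0.4,
    ("AOR", "LOC"): 0.05, ("AOR", "END"): 0.8, ("LOC", "END"): 0.9,
}
UNSEEN = 1e-4   # assumed floor for transitions missing from the table
LAMBDA = 0.5    # assumed strength of the prior favoring fewer morphemes


def segmentations(word, min_stem=2):
    """Yield (stem, [(suffix, tag), ...]) for every dictionary-licensed split."""
    def suffix_chains(rest):
        if not rest:
            yield []
            return
        for i in range(1, len(rest) + 1):
            for tag in SUFFIX_TAGS.get(rest[:i], []):
                for tail in suffix_chains(rest[i:]):
                    yield [(rest[:i], tag)] + tail

    for cut in range(min_stem, len(word) + 1):
        for chain in suffix_chains(word[cut:]):
            yield word[:cut], chain


def score(chain):
    """Log-probability of the tag path plus the morpheme-count prior."""
    tags = ["STEM"] + [tag for _, tag in chain] + ["END"]
    logp = sum(math.log(TRANS.get(pair, UNSEEN)) for pair in zip(tags, tags[1:]))
    return logp - LAMBDA * (len(chain) + 1)


def analyze(word):
    """Return all candidate analyses sorted from best to worst score."""
    return sorted(segmentations(word), key=lambda c: score(c[1]), reverse=True)


if __name__ == "__main__":
    for stem, chain in analyze("kitablarda"):
        print(stem, chain, round(score(chain), 3))
```

Under these toy numbers, `analyze("kitablarda")` ranks the split `kitab` + `lar` (PL) + `da` (LOC) first, ahead of the two shorter candidates, because its tag path is much more probable; raising `LAMBDA` shifts the preference toward analyses with fewer morphemes.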

Keyword: Uyghur Morphological Analysis Markov Model
Indexed By: EI
Document Type: Journal article
Identifier: http://ir.xjipc.cas.cn/handle/365002/4716
Collection: Multilingual Information Technology Research Laboratory
Affiliation:
1. Univ Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Chinese Acad Sci, Inst Math & Informat, Hotan Teachers Coll, Beijing, Peoples R China
2. Dublin City Univ, ADAPT Ctr, Sch Comp, Dublin 9, Ireland
3. Univ Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Chinese Acad Sci, Beijing, Peoples R China
4. Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Beijing 100864, Peoples R China
5. Chinese Acad Sci, Xinjiang Branch, Beijing 100864, Peoples R China
Recommended Citation
GB/T 7714: Tursun, E, Ganguly, D, Osman, T, et al. A Semisupervised Tag-Transition-Based Markovian Model for Uyghur Morphology Analysis[J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2016, 16(2): 8-23.
APA: Tursun, E., Ganguly, D., Osman, T., Yang, YT., Abdukerim, G., ... & Liu, Q. (2016). A Semisupervised Tag-Transition-Based Markovian Model for Uyghur Morphology Analysis. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 16(2), 8-23.
MLA: Tursun, E, et al. "A Semisupervised Tag-Transition-Based Markovian Model for Uyghur Morphology Analysis". ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING 16.2 (2016): 8-23.
Files in This Item:
File Name/Size | DocType | Version | Access | License
A Semisupervised Tag (1325KB) | Journal article | Author's accepted manuscript | Open access | CC BY-NC-SA
File name: A Semisupervised Tag-Transition-Based Markovian Model for Uyghur Morphology Analysis.pdf
Format: Adobe PDF
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.