A Semisupervised Tag-Transition-Based Markovian Model for Uyghur Morphology Analysis
Tursun, E (Tursun, Eziz); Ganguly, D (Ganguly, Debasis); Osman, T (Osman, Turghun); Yang, YT (Yang, Ya-Ting); Abdukerim, G (Abdukerim, Ghalip); Zhou, JL (Zhou, Jun-Lin); Liu, Q (Liu, Qun)
2016
Journal: ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING
Volume: 16, Issue: 2
Abstract: Morphological analysis, which includes part-of-speech (POS) tagging, stemming, and morpheme segmentation, is one of the key components of natural language processing (NLP), particularly for agglutinative languages. In this article, we investigate the morphological analysis of the Uyghur language, which is the native language of the people in the Xinjiang Uyghur autonomous region of western China. Morphological analysis of Uyghur is challenging primarily because of factors such as (1) ambiguities arising from the likelihood that multiple POS tags are associated with a word stem, or multiple functional tags with a word suffix, (2) ambiguous morpheme boundaries, and (3) the complex morphophonology of the language. Further, the unavailability of a manually annotated training set in the Uyghur language for the purpose of word segmentation makes Uyghur morphological analysis more difficult. In our proposed work, we address these challenges by undertaking a semisupervised approach of learning a Markov model with the help of a manually constructed dictionary of "suffix to tag" mappings in order to predict the most likely tag transitions in the Uyghur morpheme sequence. Due to the linguistic characteristics of Uyghur, we incorporate a prior belief in our model that favors word segmentations with fewer morpheme units. Empirical evaluation of our proposed model shows an accuracy of about 82%. We further improve the effectiveness of the tag transition model with an active learning paradigm. In particular, we manually investigated a subset of words for which the model prediction ambiguity was within the top 20%. Manually incorporating rules to handle these erroneous cases resulted in an overall accuracy of 93.81%.
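The idea in the abstract, scoring candidate segmentations by tag-transition probabilities while penalizing longer morpheme sequences, can be illustrated with a minimal sketch. This is not the authors' implementation: the suffix dictionary, the tag names, the probabilities, the `FLOOR` value, and the exponential form of the prior (a per-morpheme `penalty` in log space) are all assumptions made for illustration only.

```python
import math
from itertools import product

# Hypothetical "suffix -> tags" dictionary (toy entries in Latin transliteration).
# An ambiguous suffix would simply list more than one tag.
SUFFIX_TAGS = {"lar": ["PL"], "ni": ["ACC"], "da": ["LOC"]}

# Hypothetical tag-transition probabilities P(next | prev); toy values only.
TRANSITIONS = {
    ("STEM", "PL"): 0.5, ("STEM", "ACC"): 0.2,
    ("STEM", "LOC"): 0.2, ("STEM", "END"): 0.1,
    ("PL", "ACC"): 0.5, ("PL", "LOC"): 0.2, ("PL", "END"): 0.3,
    ("ACC", "END"): 0.9, ("LOC", "END"): 0.9,
}
FLOOR = 1e-4  # assumed probability for transitions missing from the table


def suffix_splits(tail):
    """Yield every way to split `tail` into a sequence of dictionary suffixes."""
    if not tail:
        yield []
        return
    for i in range(1, len(tail) + 1):
        head = tail[:i]
        if head in SUFFIX_TAGS:
            for rest in suffix_splits(tail[i:]):
                yield [head] + rest


def score(tags, n_morphemes, penalty):
    """Log transition probability of STEM -> tags -> END, minus a length penalty."""
    path = ["STEM"] + tags + ["END"]
    log_p = sum(math.log(TRANSITIONS.get(pair, FLOOR))
                for pair in zip(path, path[1:]))
    return log_p - penalty * n_morphemes


def analyze(word, penalty):
    """Return (stem, suffixes, tags) for the highest-scoring segmentation."""
    best_score, best = None, None
    for cut in range(1, len(word) + 1):
        stem, tail = word[:cut], word[cut:]
        for suffixes in suffix_splits(tail):
            # Try every tag assignment allowed by the suffix dictionary.
            for tags in product(*(SUFFIX_TAGS[s] for s in suffixes)):
                s = score(list(tags), 1 + len(suffixes), penalty)
                if best_score is None or s > best_score:
                    best_score, best = s, (stem, suffixes, list(tags))
    return best
```

With these toy numbers, `analyze("kitablarni", 0.0)` selects the three-morpheme analysis `kitab + lar + ni`, while a strong prior such as `analyze("kitablarni", 2.0)` leaves the word unsegmented, which shows how the penalty trades transition likelihood against the number of morpheme units.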
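Keywords: Uyghur Morphological Analysis Markov Model
Indexed by: EI
Document type: Journal article
Item identifier: http://ir.xjipc.cas.cn/handle/365002/4716
Collection: Multilingual Information Technology Research Laboratory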
Author affiliations:
1. Univ Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Chinese Acad Sci, Inst Math & Informat, Hotan Teachers Coll, Beijing, Peoples R China
2. Dublin City Univ, ADAPT Ctr, Sch Comp, Dublin 9, Ireland
3. Univ Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Chinese Acad Sci, Beijing, Peoples R China
4. Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Beijing 100864, Peoples R China
5. Chinese Acad Sci, Xinjiang Branch, Beijing 100864, Peoples R China
Recommended citation:
GB/T 7714: Tursun, E, Ganguly, D, Osman, T, et al. A Semisupervised Tag-Transition-Based Markovian Model for Uyghur Morphology Analysis[J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2016, 16(2).
APA: Tursun, E., Ganguly, D., Osman, T., Yang, YT., Abdukerim, G., ... & Liu, Q. (2016). A Semisupervised Tag-Transition-Based Markovian Model for Uyghur Morphology Analysis. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 16(2).
MLA: Tursun, E, et al. "A Semisupervised Tag-Transition-Based Markovian Model for Uyghur Morphology Analysis". ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING 16.2 (2016).
Files in this item:
File name/size: A Semisupervised Tag (1325KB); Document type: Journal article; Version: Author accepted manuscript; Access: Open access; License: CC BY-NC-SA
File name: A Semisupervised Tag-Transition-Based Markovian Model for Uyghur Morphology Analysis.pdf
Format: Adobe PDF