Institutional Repository of the Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences
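Title: Recognition and Translation of Uyghur Named Entities for Chinese-Uyghur Machine Translation (面向汉维机器翻译的维语命名实体的识别与翻译)
Author: Zhang Lei (张磊)
Defense date: 2014-05-21
Advisor: Li Xiao (李晓)
Major: Computer Application Technology
Degree-granting institution: University of Chinese Academy of Sciences
Place of conferral: Beijing
Degree: Master's
Keywords: numeral class; named entity; Uyghur-Chinese machine translation; rule-based; statistics-based
Abstract: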

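Uyghur named entity recognition and translation is a foundational task for Uyghur-Chinese machine translation and also underpins technologies such as information retrieval, information extraction, and question answering. The main subject of this thesis is the study and implementation of an effective Uyghur named entity recognition and translation system. Uyghur named entities comprise a numeral class and an entity class: the numeral class covers times, dates, currency amounts, and percentages, and the entity class covers place names, organization names, and person names. Named entity recognition and translation for Chinese and English has already reached practically usable quality, whereas work on Uyghur named entities is still at an early stage. Uyghur named entities have distinctive syntactic and semantic characteristics, so the theories, models, and systems widely used for Chinese and English cannot simply be ported over; they must be adapted to the properties of the language. The work in this thesis has three parts. (1) Recognition and translation of Uyghur numeral-class named entities based on a Uyghur-Chinese parallel corpus: basic Uyghur numerals are recognized and translated by a finite automaton combined with trigger words, translation templates are extracted automatically from the parallel corpus, and translation is performed by template matching. Experiments show that the translation F-score for Uyghur numeral-class named entities reaches 91%. (2) Rule-based recognition and translation of Uyghur place names: the internal structural features of Uyghur place names and adjacent-word information are summarized, a place-name dictionary, a first-word lexicon, a middle-word lexicon, and a tail-word lexicon are built by hand, and a Uyghur place-name recognition algorithm is implemented. Experiments show that the translation F-score for Uyghur place names reaches 76%. (3) Statistical recognition of Uyghur organization names: organization-name recognition is cast as a sequence labeling problem, and a conditional random field model exploits contextual information and external features to recognize organization names. Experiments show that the recognition F-score for Uyghur organization names reaches 82%.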
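The first part of the abstract (a finite automaton combined with trigger words for basic numerals) can be illustrated with a minimal sketch. This is not the thesis's implementation. The romanized numeral forms, the trigger words `pirsent` and `yüen`, their position after the numeral, the state names, and the entity labels are all assumptions chosen for illustration, and the automaton only covers numbers below 1000.

```python
# Assumed romanized Uyghur numerals (small illustrative subset, not a full lexicon).
UNITS = {"bir": 1, "ikki": 2, "üch": 3, "töt": 4, "besh": 5}
TENS = {"on": 10, "yigirme": 20, "ottuz": 30}
HUNDRED = "yüz"
# Hypothetical trigger words that follow a numeral and decide its entity type.
TRIGGERS = {"pirsent": "percent", "yüen": "money"}

# DFA transitions: (state, token class) -> next state. Every state except
# "start" is accepting.
TRANSITIONS = {
    ("start", "unit"): "unit",
    ("start", "tens"): "tens",
    ("start", "hundred"): "hundred",
    ("unit", "hundred"): "hundred",
    ("hundred", "tens"): "tens",
    ("hundred", "unit"): "last_unit",
    ("tens", "unit"): "last_unit",
}


def token_class(tok):
    """Map a token to the automaton's input alphabet, or None if not a numeral."""
    if tok in UNITS:
        return "unit"
    if tok in TENS:
        return "tens"
    if tok == HUNDRED:
        return "hundred"
    return None


def scan_number(tokens, i):
    """Run the DFA from position i; return (value, next_index) or None."""
    state, value, j = "start", 0, i
    while j < len(tokens):
        cls = token_class(tokens[j])
        nxt = TRANSITIONS.get((state, cls))
        if nxt is None:
            break
        if cls == "hundred":
            value = max(value, 1) * 100  # a bare "yüz" counts as 100
        elif cls == "unit":
            value += UNITS[tokens[j]]
        else:
            value += TENS[tokens[j]]
        state, j = nxt, j + 1
    return (value, j) if state != "start" else None


def recognize(tokens):
    """Return (entity type, value) pairs for every numeral found in tokens."""
    found, i = [], 0
    while i < len(tokens):
        hit = scan_number(tokens, i)
        if hit is None:
            i += 1
            continue
        value, j = hit
        kind = TRIGGERS.get(tokens[j], "number") if j < len(tokens) else "number"
        found.append((kind, value))
        i = j + 1 if kind != "number" else j  # also skip a consumed trigger word
    return found
```

Under these assumptions, `recognize("yigirme besh pirsent".split())` would tag the numeral as a percentage with value 25. A translation step could then render the typed value with a target-side template, which is where the templates extracted from the parallel corpus would come in.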

English abstract:
Uyghur named entity recognition and translation is a prerequisite and foundation for Uyghur-Chinese machine translation and for applications such as information retrieval and question answering. Uyghur named entities have their own syntactic and semantic characteristics, so techniques widely used for English and Chinese named entity recognition cannot simply be transplanted. A finite automaton combined with trigger words was used for basic numeral recognition and translation; translation templates with weights were then automatically extracted from a Uyghur-Chinese parallel corpus; finally, translation was performed with a shortest-path optimization algorithm. The F-score of recognition was 91%. A rule-based method for recognizing place names in text was studied. Based on the internal structural features of Uyghur place names, a Xinjiang place-name dictionary, a first-word dictionary, a middle-word dictionary, and a special-word dictionary were established. The method was tested on large-scale text containing place names, using the internal structure of place names and adjacent-word information. The F-score of recognition was 73%. Based on the syntactic and semantic characteristics of Uyghur organization names, the construction rules of simple and complex organization names were summarized, and effective recognition rules, corresponding knowledge bases, and an efficient recognition algorithm based on state transitions and keyword matching were designed. The F-score of recognition was 84%.
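The dictionary-driven place-name rules described in the abstract can be sketched roughly as follows. This is a simplified illustration, not the thesis's algorithm. It uses only a place-name dictionary and a tail-word lexicon, and the romanized entries in `GAZETTEER` and `TAIL_WORDS` are assumed samples that a real system would replace with hand-built resources.

```python
# Assumed sample of a place-name dictionary (romanized; illustrative only).
GAZETTEER = {"Ürümchi", "Qeshqer", "Xoten"}
# Assumed tail words marking administrative units (city, county, prefecture).
TAIL_WORDS = {"shehiri", "nahiyisi", "wilayiti"}


def find_place_names(tokens):
    """Return (start, end) token spans, end exclusive, for candidate place names.

    Rule 1 (adjacent-word information): any token directly followed by a
    tail word forms a two-token place name, even if it is not in the dictionary.
    Rule 2 (dictionary lookup): a token listed in the dictionary is a place
    name on its own.
    """
    spans, i = [], 0
    while i < len(tokens):
        has_tail = i + 1 < len(tokens) and tokens[i + 1] in TAIL_WORDS
        if has_tail:
            spans.append((i, i + 2))
            i += 2
        elif tokens[i] in GAZETTEER:
            spans.append((i, i + 1))
            i += 1
        else:
            i += 1
    return spans
```

Checking the tail-word rule before the dictionary lookup keeps the longer span when both apply, so a dictionary entry followed by a tail word is returned as one two-token name. First-word and middle-word lexicons could be added as further rules in the same loop.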
Content type: Thesis
URI: http://ir.xjipc.cas.cn/handle/365002/3448
Appears in Collections: Multilingual Information Technology Laboratory_Theses

Files in This Item:
File Name / File Size | Content Type | Version | Access
张磊.pdf (1733 KB) | Thesis | -- | Not yet open (contact the repository for the full text)

Author affiliation: Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences

Recommended Citation:
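Zhang Lei. Recognition and Translation of Uyghur Named Entities for Chinese-Uyghur Machine Translation [D]. Beijing: University of Chinese Academy of Sciences, 2014.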
File name: 张磊.pdf
Format: Adobe PDF
Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.

 

 
