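Research on the Construction of a Uyghur Noun Lexical Semantic Network (维吾尔语名词词汇语义网构建研究)
Author: 阿力木·木拉提
Degree type: Doctoral
Supervisor: Li Xiao
Date: 2017-05-21
Degree-granting institution: University of Chinese Academy of Sciences
Place of conferral: Beijing
Major: Computer Application Technology
Keywords: Uyghur; noun semantic network; semantic relation database; word vectors; graph data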
Abstract

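The Uyghur noun lexical semantic network is a lexical semantic knowledge base whose objects of description are the synonym sets (concepts) formed by noun words and whose links are the latent semantic associations among those nouns; through this organization and interconnection, it is structured along two axes, word senses and semantic relations. At present, NLP-oriented research on Uyghur semantic theory, together with the construction of related knowledge bases, is the foundation for wider use of Uyghur on computers. Computational research on Uyghur, however, has so far been limited to describing, analyzing, and processing grammatical problems such as morphology and shallow syntactic parsing, and lacks any representation of semantics. This falls far short of the knowledge that Uyghur information processing requires: for computers to understand and process Uyghur text better, they must be given rich linguistic knowledge at the lexical, syntactic, and semantic levels. This thesis carries out in-depth research in four areas: building a lexical semantic corpus that strictly follows the noun semantic annotation specification for modern Uyghur in information processing, automatic semantic annotation of nouns, construction of a noun semantic relation database, and construction of the noun lexical semantic network. Drawing on the theories and methods of WordNet and Chinese lexical semantic network research and the semantic relations they define, taking into account the current state of Uyghur information processing and the characteristics of the language itself, and making full use of other related lexical semantic resources, the thesis designs and implements a Uyghur noun lexical semantic network. The network is built by combining automatic processing with human interaction, is intended for application systems such as Uyghur lexical semantic analysis, machine translation, and information retrieval, and provides basic semantic information for computer understanding of Uyghur. The specific research contents are as follows.

(1) Addressing the bottleneck of scarce Uyghur lexical semantic knowledge resources, and taking modern Uyghur nouns as the data, the thesis proposes a scheme of basic semantic resources for building the Uyghur noun lexical semantic network. On this basis, it first builds a corpus of noun affixes organized by semantic function and a Uyghur noun lexicon annotated with semantic information. Second, to obtain Uyghur nouns from large volumes of raw text and so supply the raw vocabulary needed for noun semantic annotation, it proposes a Uyghur noun recognition model that combines predefined rules with statistical methods, and integrates this model into the Uyghur noun semantic annotation system. Finally, using the basic lexical semantic knowledge base and the automatic noun recognition model as data and technical support, it annotates Uyghur nouns semantically according to a new semantic classification scheme for Uyghur nouns, and designs and implements a Uyghur noun semantic annotation system that supplies semantically tagged nouns to the subsequent construction of the semantic relation database.

(2) The thesis focuses on methods for constructing the Uyghur noun semantic relation database. Building on the framework of the part-of-speech tagging specification for Uyghur in information processing, it proposes a hierarchical conceptual structure for Uyghur noun semantics. By mapping this structure onto the independent top-level concepts (unique beginners) of WordNet's noun hierarchy, it designs and implements a hierarchical concept tree for Uyghur nouns, then merges the tree's nodes with the semantically tagged vocabulary to construct semantic relations among Uyghur nouns, including synonymy, antonymy, hypernymy/hyponymy, and part-whole relations. To further enlarge the relation database, it proposes a word-vector-based method for automatically mining synonyms and semantically related words on top of the existing noun semantic relations. To address the shortage of corpus data needed for training word vectors, it studies Uyghur web page identification and resource acquisition, and builds a model for automatically acquiring Uyghur text for word-vector training based on an improved N-gram method and a common-word method. Experimental results on word similarity computation and synonym expansion with the word-vector model show that the method can effectively acquire semantically related nouns automatically.

(3) To meet the need for querying noun semantic relations visually within the lexical semantic network, the thesis builds the Uyghur noun semantic network. Taking the semantic relations stored in the relational database of noun semantics as data, it converts them into graph data using graph-based visualization techniques, designs and implements a query system for the Uyghur noun lexical semantic network, and analyzes semantic query results through application examples. The system displays the hierarchical structure of semantic relations on a two-dimensional plane, thereby establishing a highly interconnected network of noun semantics.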
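The word-vector step in item (2) ranks candidate synonyms by how close their embeddings are. The abstract does not state the exact scoring used, so the sketch below assumes the common choice of cosine similarity over already-trained vectors. The words (adem, kishi, insan "person/human"; öy "house"; kitab "book"), the toy three-dimensional vectors, and the parameters `k` and `threshold` are hypothetical illustrations, not data or settings from the thesis.

```python
import math

# Toy, hypothetical 3-dimensional vectors; real embeddings would come from
# training (e.g. a word2vec-style model) on a large Uyghur corpus.
VECTORS = {
    "adem": [0.9, 0.1, 0.0],
    "kishi": [0.85, 0.15, 0.05],
    "insan": [0.8, 0.2, 0.1],
    "öy": [0.1, 0.9, 0.2],
    "kitab": [0.0, 0.2, 0.95],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_synonyms(query, vectors, k=2, threshold=0.8):
    """Return up to k (word, score) pairs most similar to query.

    Only words scoring at least `threshold` are kept; both k and threshold
    are hypothetical defaults chosen for this illustration.
    """
    query_vec = vectors[query]
    scored = [
        (word, cosine(query_vec, vec))
        for word, vec in vectors.items()
        if word != query
    ]
    candidates = [(w, s) for w, s in scored if s >= threshold]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:k]


if __name__ == "__main__":
    for word, score in expand_synonyms("adem", VECTORS):
        print(f"{word}\t{score:.3f}")
```

With these toy vectors, querying "adem" returns kishi and insan as candidates, while öy and kitab fall below the threshold. In a real pipeline the candidates would still go to human review before entering the relation database, in line with the combined automatic and manual approach described above.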

Other Abstract

The Uyghur Noun Lexical Semantic Network (UNLSN) is a lexical semantic knowledge base structured around synonym concepts: noun vocabulary serves as its object of description, and the latent semantic associations between nouns serve as its links. Through this organization and interconnection, semantic relations are established on the basis of lexical semantic knowledge. Research on Uyghur semantic theory and the construction of related knowledge bases are the basis of Uyghur natural language applications, yet information processing research on Uyghur is currently limited to describing and analyzing lexical, shallow syntactic, and other grammatical problems, and lacks a representation of semantics. For computers to understand and handle Uyghur semantics better, the knowledge available for Uyghur information processing is still far from sufficient and requires in-depth study; computers must therefore be given rich linguistic knowledge at the lexical, syntactic, and semantic levels. This thesis conducts research in four main areas: construction of a lexical semantic corpus following the semantic annotation specification for modern Uyghur nouns in information processing, automatic semantic annotation of nouns, construction of a noun semantic relation library, and construction of the noun lexical semantic network. Following the theory and methods of WordNet and Chinese lexical semantic network research and the semantic relations they use, considering the current state of Uyghur information processing and the characteristics of the language itself, and making full use of other related lexical semantic resources, this thesis implements a Uyghur noun lexical semantic network. The UNLSN is constructed through a combination of automatic processing and human interaction, and can be applied to Uyghur lexical semantic analysis, machine translation, and information retrieval. It provides basic semantic information for computer understanding of the Uyghur language. The specific contents of this thesis are as follows.

(1) The thesis first discusses the bottleneck of scarce Uyghur lexical semantic knowledge resources and presents a scheme for constructing the semantic resources of the UNLSN, with modern Uyghur nouns as the data object. On this basis, a noun affix corpus organized by semantic function and a Uyghur noun vocabulary resource library based on semantic information are built. Second, to obtain Uyghur nouns from large amounts of raw text and supply the raw vocabulary needed for semantic annotation, a Uyghur noun recognition model combining predefined rules and statistical methods is proposed and integrated into the Uyghur noun semantic annotation system. Finally, with the Uyghur noun lexical semantic resource knowledge base and the automatic noun recognition model as data and technical support, Uyghur nouns are semantically annotated according to a new semantic classification system for Uyghur nouns, and a Uyghur noun semantic annotation system is designed and implemented, providing semantically tagged vocabulary for the subsequent construction of the semantic relation library.

(2) The thesis focuses on the construction of the Uyghur noun semantic relation library. A conceptual structure for the Uyghur noun semantic hierarchy is proposed. By mapping this structure onto WordNet's independent top-level noun concepts, a semantic hierarchy concept tree for Uyghur nouns is constructed, and semantic relations between Uyghur nouns are built by combining the nodes of the tree with the semantically tagged vocabulary, including synonymy, antonymy, hypernymy and hyponymy, and holonymy and meronymy. To further expand the semantic relation library, an automatic method for mining synonyms and semantically related words based on word vectors is proposed. To address the lack of corpus resources needed for word-vector training, we studied Uyghur web page recognition and resource acquisition and established an automatic Uyghur corpus acquisition model based on an improved N-gram method and a common-word method. Experimental results on word similarity calculation and synonym expansion based on the word-vector model show that the proposed approach can effectively obtain semantically related nouns.

(3) To meet the demand for query applications that present noun semantic relations visually, the UNLSN is built. Taking the semantic relational database as the data object, graph-based visualization techniques are used to transform it into graph data, and a query system for the Uyghur noun lexical semantic network is designed and implemented. Semantic query results are analyzed through application examples. The system can represent the hierarchical structure of semantic relations on a two-dimensional plane, thereby constructing a highly interconnected noun semantic network.
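Item (3) turns relation records held in a relational database into graph data. A minimal sketch of that conversion is shown below, assuming a hypothetical row schema `(source, relation, target)`, hypothetical relation labels, and illustrative Uyghur nouns (it "dog", mushuk "cat", haywan "animal", janliq "living thing", quyruq "tail"). It builds an adjacency map with inverse edges so the hypernym hierarchy can be walked in either direction; it does not reproduce the thesis's storage or visualization layer.

```python
from collections import defaultdict

# Hypothetical rows as they might be exported from a relational table with
# columns (source, relation, target); relation labels are illustrative.
ROWS = [
    ("it", "hypernym", "haywan"),      # dog -> animal
    ("mushuk", "hypernym", "haywan"),  # cat -> animal
    ("haywan", "hypernym", "janliq"),  # animal -> living thing
    ("quyruq", "part_of", "it"),       # tail is part of dog
]

# Inverse labels so each stored relation can be traversed in both directions.
INVERSE = {"hypernym": "hyponym", "part_of": "has_part"}


def build_graph(rows):
    """Convert (source, relation, target) rows into an adjacency map.

    Each node maps to a list of (relation, neighbor) edges; an inverse edge
    is added for every relation that has an entry in INVERSE.
    """
    graph = defaultdict(list)
    for src, rel, dst in rows:
        graph[src].append((rel, dst))
        if rel in INVERSE:
            graph[dst].append((INVERSE[rel], src))
    return graph


def hypernym_chain(graph, word):
    """Follow the first 'hypernym' edge upward until none remains or a cycle appears."""
    chain = [word]
    seen = {word}
    current = word
    while True:
        parents = [dst for rel, dst in graph.get(current, []) if rel == "hypernym"]
        if not parents or parents[0] in seen:
            return chain
        current = parents[0]
        chain.append(current)
        seen.add(current)


if __name__ == "__main__":
    graph = build_graph(ROWS)
    print(" -> ".join(hypernym_chain(graph, "it")))
    print(graph["haywan"])
```

Storing inverse edges explicitly keeps hyponym and part lookups to a single list scan per node, at the cost of doubling the edge count; a drawing tool for the two-dimensional display would take this adjacency map (or an equivalent edge list) as its input.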

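Document type: Dissertation
Item identifier: http://ir.xjipc.cas.cn/handle/365002/4969
Collection: Multilingual Information Technology Laboratory
Author affiliation: Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences
Recommended citation
GB/T 7714
阿力木·木拉提. Research on the Construction of a Uyghur Noun Lexical Semantic Network[D]. Beijing: University of Chinese Academy of Sciences, 2017.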
Files in this item
File name/size | Document type | Version type | Access type | License
维吾尔语名词词汇语义网构建研究.pdf (12550 KB) | Dissertation |  | Open access | CC BY-NC-SA
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.