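Research on Acoustic Modeling and Multi-Pronunciation Dictionary Adaptation for Uyghur Dialect Spoken Language Speech Recognition
Yang Yating (杨雅婷)
Degree type: Doctoral
Advisor: Li Xiao (李晓)
2012-04
Degree-granting institution: Graduate University of Chinese Academy of Sciences
Place of conferral: Beijing
Degree discipline: Computer Application Technology
Keywords: speech recognition; corpus; pronunciation variation; multi-pronunciation dictionary; Uyghur dialect accent
Abstract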

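As informatization advances and international exchange grows more frequent, demand for speech recognition is rising sharply worldwide. Recognition of dialectal spoken language is a difficult research problem both in China and abroad; dialect speech recognition for minority languages has received little study, and Uyghur dialect spoken language recognition is still in its infancy. This thesis focuses on acoustic feature extraction and model training for Uyghur, and on pronunciation variation in dialect accents and methods for multi-pronunciation dictionary adaptation. Three strategies are used to optimize acoustic feature extraction and model training. To address the low recognition rate that a system built on standard speech achieves on dialect-accented data with pronunciation variation, a method is proposed that adapts a multi-pronunciation dictionary starting from the standard Uyghur pronunciation dictionary. Combining knowledge-based and data-driven methods, the thesis analyzes the pronunciation variation rules of Uyghur dialect accents, builds a pronunciation confusion matrix and mines fuzzy pronunciation mapping pairs to construct a set of pronunciation variants, and generates an initial multi-pronunciation dictionary. A pruning algorithm and a threshold are then applied to obtain a compact multi-pronunciation dictionary from dialect spoken training data, ultimately improving the recognition rate for Uyghur dialect spoken language.

At present, many automatic speech recognition (ASR) systems in China perform well for speakers with standard pronunciation, but their performance drops sharply on natural spoken language with a dialect accent. Uyghur dialect speech recognition in particular has not been studied in depth. This thesis targets the recognition of natural Uyghur speech with dialect accents and explores acoustic modeling for this task further. It attempts to resolve the severe recognition difficulties caused by dialect accents from the perspectives of acoustic modeling and multi-pronunciation dictionary adaptation, proposes new ideas, demonstrates their effectiveness experimentally, and accumulates experience for subsequent in-depth research.

The main work and contributions of this thesis are as follows:

(1) A corpus of Uyghur dialect spoken language with acoustic features is built and refined, and those features are analyzed systematically. This provides a real and reliable basis for later work on promoting the standard pronunciation of the language, speech teaching, speech communication, and speech recognition.

(2) Speech recognition technology for the minority languages of Xinjiang is still at an early stage and lacks accumulated results. In response, the thesis proposes speech preprocessing methods such as endpoint detection based on sub-band spectral density, designs a clustering question set suited to Uyghur, and studies acoustic model training and optimization strategies.

(3) Knowledge of Uyghur phonetic characteristics is integrated with technical innovation, and a pronunciation dictionary adaptation (PDA) framework for Uyghur dialect speech recognition is proposed and implemented. Pronunciation variation in dialect accents is analyzed with a combination of data-driven and knowledge-based methods. A Uni-gram-based cumulative probability pruning strategy is proposed to prune the multi-pronunciation dictionary effectively, and the output probabilities of the dictionary are normalized. The effectiveness of pronunciation dictionary adaptation for Uyghur dialect speech recognition is verified, and the recognition rate improves.

This research provides foundational support for Uyghur speech information processing and deeper applications, and helps fill gaps in the nation's minority-language speech and text information resources, giving it significant research value and practical importance. It aims to advance minority-language speech recognition research, meet local needs for multilingual speech system applications, explore a research approach suitable for wide use in minority regions, accumulate the necessary experience, and refine the implementation process, thereby meeting the growing demands of the information processing field.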
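The confusion-matrix and mapping-pair step described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: it assumes that canonical and observed phonemes have already been aligned into pairs, and the function names, the `min_prob` threshold, and the toy phoneme labels are all hypothetical.

```python
from collections import Counter, defaultdict


def confusion_matrix(aligned_pairs):
    """Count how often each canonical phoneme is realized as each observed phoneme."""
    counts = defaultdict(Counter)
    for canonical, observed in aligned_pairs:
        counts[canonical][observed] += 1
    return counts


def mapping_pairs(counts, min_prob=0.1):
    """Return (canonical, observed, probability) for substitutions whose
    relative frequency within the canonical phoneme's row is >= min_prob."""
    pairs = []
    for canonical, row in counts.items():
        total = sum(row.values())
        for observed, n in row.items():
            if observed != canonical and n / total >= min_prob:
                pairs.append((canonical, observed, n / total))
    # Most frequent substitutions first.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Toy aligned data with made-up phoneme labels (assumption, not thesis data).
aligned = [("a", "a")] * 7 + [("a", "e")] * 3 + [("k", "k")] * 9 + [("k", "g")] * 1
counts = confusion_matrix(aligned)
pairs = mapping_pairs(counts, min_prob=0.2)
print(pairs)
```

With this toy data, only the "a" to "e" substitution passes the 0.2 threshold; the rarer "k" to "g" substitution is dropped. Surviving pairs of this kind are what would be used to generate variant pronunciations for an initial multi-pronunciation dictionary.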

Other abstract

Recognizing spoken language is a difficult problem in speech recognition research, and Uyghur spoken language recognition is still at an early stage. The recognition rate of a standard Uyghur speech recognition system drops considerably on a corpus with pronunciation variation, and such variation degrades the performance of automatic speech recognition for Uyghur. This research mainly concerns pronunciation variation in Uyghur and the construction of a multi-pronunciation dictionary. A recognition system built on standard speech does not reach a high enough recognition rate on spoken language with pronunciation variation. To solve this problem, a new method is proposed for creating a multi-pronunciation dictionary from the standard dictionary. By combining knowledge-based and data-driven methods, the rules of Uyghur dialect pronunciation variation are analyzed, the set of pronunciation variants is constructed, the mapping pairs of vowels and consonants are calculated, the phoneme confusion matrix is established, and the initial multi-pronunciation dictionary is generated. Using automatic data-processing algorithms and a threshold, a precise multi-pronunciation dictionary is generated automatically from an accented Uyghur spoken-language speech corpus. Preliminary experimental results show that the proposed method can improve the performance of Uyghur continuous speech recognition. This study focuses on recognizing accented spoken Uyghur and attempts to resolve the serious recognition difficulties through acoustic modeling and a multi-pronunciation dictionary. It proposes new ideas, demonstrates their effectiveness, and accumulates experience for subsequent research in this field.
The innovations can be summarized as follows: (1) A Uyghur accent corpus with acoustic characteristics was established and refined, and the prosodic features of the speech were analyzed systematically. This provides a real and effective basis for work on the standard pronunciation of the language, speech instruction, speech communication, speech recognition, and so on. (2) Information processing technology for the minority languages of Xinjiang is still at an early stage. A speech endpoint detection algorithm based on Uyghur acoustic frequency features was proposed, Uyghur question sets for clustering were designed, and acoustic model training and optimization strategies were studied. (3) Knowledge integration and technological innovation were applied to Uyghur speech research. A pronunciation dictionary adaptation framework for accented spoken Uyghur speech recognition was proposed and implemented, combining knowledge-based and data-driven methods with Uni-gram-based multi-pronunciation expansion, and the final dictionary was obtained after pruning. Preliminary experimental results show that the proposed method can improve the performance of Uyghur continuous speech recognition. This research will enhance the quality of minority-language speech recognition, provide a foundation for Uyghur information processing and applications, and promote the construction of minority-language information resources, which is significant for both research and practice.
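As a rough illustration of the pruning step mentioned in the abstract, the following Python sketch shows one plausible reading of cumulative-probability pruning over per-word variant frequencies, followed by renormalization of the surviving pronunciation probabilities. The thesis's actual algorithm is not given in this record, so the function name, the `cumulative_threshold` parameter, and the placeholder words and pronunciations are assumptions.

```python
def prune_and_normalize(variant_counts, cumulative_threshold=0.9):
    """variant_counts maps word -> {pronunciation: observed count}.

    For each word, keep the most frequent variants until their cumulative
    relative frequency reaches the threshold, then renormalize the kept
    variants so their probabilities sum to 1.
    """
    lexicon = {}
    for word, counts in variant_counts.items():
        total = sum(counts.values())
        if total == 0:
            continue  # no evidence for this word; skip it
        ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
        kept, cumulative = [], 0.0
        for pron, n in ranked:
            kept.append((pron, n))
            cumulative += n / total
            if cumulative >= cumulative_threshold:
                break
        kept_total = sum(n for _, n in kept)
        lexicon[word] = {pron: n / kept_total for pron, n in kept}
    return lexicon


# Placeholder word and pronunciation labels (hypothetical, not thesis data).
toy_counts = {"w1": {"p1": 70, "p2": 25, "p3": 5}}
lexicon = prune_and_normalize(toy_counts, cumulative_threshold=0.9)
print(lexicon)
```

In this toy case the two most frequent variants cover 95% of the observations, so the rare third variant is pruned and the remaining two are rescaled to sum to 1. A lower threshold gives a more compact dictionary at the cost of dropping more variants.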

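Document type: Thesis
Item identifier: http://ir.xjipc.cas.cn/handle/365002/4397
Collection: Multilingual Information Technology Research Laboratory
Author affiliation: Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences
Recommended citation
GB/T 7714
Yang Yating. Research on Acoustic Modeling and Multi-Pronunciation Dictionary Adaptation for Uyghur Dialect Spoken Language Speech Recognition [D]. Beijing: Graduate University of Chinese Academy of Sciences, 2012.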
Files in this item
File name/size: 维吾尔语方言口语语音识别中声学建模及多发 (1213KB); Document type: Thesis; Access type: Open access; License: CC BY-NC-SA
File name: 维吾尔语方言口语语音识别中声学建模及多发音字典自适应研究.pdf
Format: Adobe PDF