Design and Implementation of a Log Analysis and Massive Data Retrieval System Based on ELK (基于 ELK 的日志分析与海量数据检索系统的设计与实现)
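Yao Pan (姚攀)
Subtype: Master's thesis
Thesis Advisor: Ma Yupeng (马玉鹏)
2018-05-25
Degree Grantor: University of Chinese Academy of Sciences
Place of Conferral: Beijing
Degree Discipline: Computer Application Technology
Keywords: log processing; information retrieval; server monitoring; ELK; Elasticsearch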
Abstract

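Rapid advances in network and information processing technology have brought humanity into the era of big data. Data volumes are growing exponentially, and every industry faces pressure to process massive amounts of data. In an IoT system in the Autonomous Region, application logs are still inspected by hand. This makes log investigation slow, and the logs receive no centralized processing or analysis. Queries over the massive data stored in relational databases also take too long, and the runtime monitoring of the many servers urgently needs a solution. This thesis aims to collect application logs and server metrics in a distributed way, analyze them in real time, and speed up the analysis of massive data. To that end, it proposes a distributed, real-time data processing and analysis solution based on the ELK stack (ELK is the abbreviation of Elasticsearch, Logstash, and Kibana). Guided by actual requirements, it also designs and implements a log analysis system and a massive data retrieval system. The specific research contents are as follows:

1. Big data processing technologies were surveyed, and the characteristics and application scenarios of mainstream big data processing systems were compared. The study focused on the core technical principles of distributed search engines. It also researched, and applied in practice, the system architecture, technical principles, and usage of the emerging ELK stack.
2. Application logs lacked centralized processing and analysis, so a distributed log analysis system was designed and implemented. It covers distributed collection, parsing, storage, and visual analysis of application logs. It removes the main drawbacks of traditional log processing: low efficiency, long processing times, and no visual analysis.
3. The IoT application servers lacked effective monitoring, so distributed collection and real-time monitoring of server metrics were implemented. This reduces the workload of engineers and operations staff and helps keep the servers running stably.
4. A massive data retrieval system was designed and implemented on the Elasticsearch distributed search engine. It overcomes two weaknesses of relational databases in large-scale retrieval: long query times and the lack of full-text search. Elasticsearch's built-in analyzers and the dictionary-based IK and Mmseg analyzers all fail to segment Chinese addresses adequately. The thesis therefore adopts an address segmentation method that combines address-element levels with rules, and implements it as an Elasticsearch Chinese analysis plugin.

The log analysis system and the massive data retrieval system have been completed and put into use. On the log analysis and server monitoring side, logs are collected and analyzed in real time and server metrics are monitored in real time. This significantly reduces the workload of engineers and operations staff and helps keep the servers running stably. The massive data retrieval system delivers high-performance retrieval over hundreds of millions of records and scales linearly. Its efficient data processing effectively speeds up data analysis.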
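The abstract says only that the address analyzer combines address-element levels with rules. It does not give the actual level table, the rules, or the plugin code. The Python sketch below illustrates that general idea under stated assumptions. It uses a hypothetical table of six levels, from province down to house number, each with a few typical Chinese suffixes. It then applies three simple rules. First, an element must contain at least one character before its suffix. Second, the element that closes earliest wins, and a tie goes to the coarser level. Third, later elements must come from strictly finer levels. The names `LEVELS` and `segment_address` are hypothetical. A real Elasticsearch analysis plugin would be written in Java against Lucene's tokenizer classes, not in Python.

```python
# Hypothetical address-element levels, ordered from coarse to fine; each level
# lists a few suffixes that typically close an element at that level.
LEVELS = [
    ("province", ("自治区", "省")),
    ("city", ("自治州", "市", "地区")),
    ("district", ("县", "区", "旗")),
    ("town", ("街道", "镇", "乡")),
    ("road", ("路", "巷")),
    ("number", ("号",)),
]


def segment_address(address: str) -> list[tuple[str, str]]:
    """Split an address into (level, element) pairs using level order plus simple rules."""
    result = []
    pos, level = 0, 0
    while pos < len(address):
        best = None  # (end index of the candidate element, level index)
        for idx in range(level, len(LEVELS)):
            for suffix in LEVELS[idx][1]:
                # Rule 1: an element needs at least one character before its suffix.
                start = address.find(suffix, pos + 1)
                if start == -1:
                    continue
                end = start + len(suffix)
                # Rule 2: the element that closes earliest wins; on a tie the
                # coarser level (visited first) is kept, so "自治区" beats "区".
                if best is None or end < best[0]:
                    best = (end, idx)
        if best is None:
            # No rule applies: keep the rest as one unclassified element.
            result.append(("other", address[pos:]))
            break
        end, idx = best
        result.append((LEVELS[idx][0], address[pos:end]))
        # Rule 3: later elements must belong to strictly finer levels.
        pos, level = end, idx + 1
    return result
```

For example, `segment_address("乌鲁木齐市天山区")` returns `[("city", "乌鲁木齐市"), ("district", "天山区")]`. Greedy earliest-close matching keeps the sketch short. However, it mis-splits place names that themselves contain a suffix character, so a production segmenter would likely also need a place-name dictionary.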

Other Abstract
With the rapid development of network and information processing technology, humanity has entered the era of big data. Data volumes are growing exponentially, and every industry faces pressure to process massive amounts of data. Application logs in the gasoline sales information collection and supervision system are still investigated manually. Log investigation is therefore inefficient, and the logs receive no centralized processing or analysis. In addition, analyzing large amounts of data in relational databases takes a long time, and monitoring the many servers is an urgent problem. This thesis aims to collect application logs and server metrics in a distributed way, analyze them in real time, and improve the efficiency of massive data analysis. To that end, it proposes a distributed real-time data processing and analysis solution based on the ELK stack (ELK is the abbreviation of Elasticsearch, Logstash, and Kibana). A log analysis system and a massive data retrieval system are designed and implemented according to actual requirements. The specific research contents of this thesis are as follows:

1. Big data processing technologies are surveyed, and the characteristics and application scenarios of the major data processing systems are compared. The study focuses on the core technical principles of distributed search engines. It also researches, and applies in practice, the system architecture, technical principles, and usage of the newly emerging ELK stack.
2. System application logs lack centralized processing and analysis, so a distributed log analysis system is designed and implemented. It covers distributed collection, parsing, storage, and visual analysis of application logs. It overcomes the shortcomings of traditional log processing methods, such as inefficient processing, long processing times, and lack of visual analysis.
3. IoT servers lack effective monitoring, so distributed collection and real-time monitoring of server metrics are implemented. This reduces the workload of engineers and operators and helps ensure the stable operation of the servers.
4. A massive data retrieval system is designed and implemented on the distributed search engine Elasticsearch. It addresses two weaknesses of relational databases in large-scale data retrieval: long query times and the lack of full-text search. Elasticsearch's built-in analyzers, the IK analyzer, and the Mmseg analyzer cannot meet the needs of Chinese address segmentation. An address segmentation method that combines address-element levels with rules is therefore adopted.

The log analysis system and the massive data retrieval system have been completed and put into use. For log analysis and server monitoring, real-time collection and analysis of application logs have been implemented. This significantly reduces the workload of engineers and system administrators and helps guarantee the stable operation of the servers. The massive data retrieval system achieves high-performance search over massive data and scales linearly. Its high data processing efficiency can effectively improve the efficiency of data analysis.
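The log pipeline is described only in terms of its stages: collection, parsing, storage, and visual analysis. The sketch below illustrates the parsing and storage stages under stated assumptions. It assumes a hypothetical log layout, `<date> <time>,<ms> <LEVEL> [<logger>] <message>`, and parses each line with a regular expression, similar in spirit to a Logstash grok pattern. It then serializes the results into the newline-delimited body that Elasticsearch's `_bulk` API expects: an action line, then a source line, with a trailing newline. The pattern, the index name `app-logs`, and the `_parse_failure` tag are illustrative choices and are not taken from the thesis.

```python
import json
import re

# Hypothetical log layout: "<date> <time>,<ms> <LEVEL> [<logger>] <message>"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) \[(?P<logger>[^\]]+)\] (?P<message>.*)$"
)


def parse_line(line: str) -> dict:
    """Turn one raw log line into a structured document."""
    match = LOG_PATTERN.match(line)
    if match is None:
        # Keep the raw line and tag it, much as Logstash's grok filter
        # tags unmatched events (by default with "_grokparsefailure").
        return {"message": line, "tags": ["_parse_failure"]}
    return match.groupdict()


def to_bulk_body(lines: list[str], index: str) -> str:
    """Build an NDJSON body for Elasticsearch's _bulk endpoint."""
    parts = []
    for line in lines:
        parts.append(json.dumps({"index": {"_index": index}}))  # action line
        parts.append(json.dumps(parse_line(line), ensure_ascii=False))  # source line
    # The _bulk API requires the body to end with a newline.
    return "\n".join(parts) + "\n"


if __name__ == "__main__":
    sample = [
        "2018-03-01 12:00:05,123 ERROR [OrderService] Payment timeout",
        "not a structured line",
    ]
    print(to_bulk_body(sample, "app-logs"), end="")
```

You could post the resulting body to a cluster with, for example, `curl -H 'Content-Type: application/x-ndjson' -X POST 'http://localhost:9200/_bulk' --data-binary @body.ndjson`. This command assumes a local node on the default port. `--data-binary` preserves the newlines that the bulk format depends on.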
Pages: 83
Document Type: Thesis
Identifier: http://ir.xjipc.cas.cn/handle/365002/5454
Collection: Multilingual Information Technology Research Laboratory (多语种信息技术研究室)
Recommended Citation
GB/T 7714
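Yao Pan. Design and Implementation of a Log Analysis and Massive Data Retrieval System Based on ELK[D]. Beijing: University of Chinese Academy of Sciences, 2018.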
Files in This Item:
File name: 基于ELK的日志分析与海量数据检索系统的设计与实现.pdf
Size: 2986 KB
Format: Adobe PDF
Document type: Thesis
Access: Open access
License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.