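Thesis Advisor: 李晓
Degree Grantor: University of Chinese Academy of Sciences (中国科学院大学)
Place of Conferral: Beijing
Degree Name: Master's
Degree Discipline: Computer Application Technology
Keywords: cloud computing, MySQL, Hadoop, MongoDB, Mahout, K-means, massive data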

Abstract
With the arrival of the big data era, people's lifestyles and consumption habits have changed significantly, and online shopping, being convenient and inexpensive, is favored by more and more people. A growing number of merchants now run e-commerce websites in addition to their physical stores. Because of their heavy traffic and large transaction volumes, these websites generate massive transaction records that must be stored and analyzed, while traditional relational databases face severe challenges in handling data at this scale. Cloud computing emerged in this period of rapid data growth to address these challenges. Hadoop is a concrete implementation of a cloud computing framework for the distributed processing of big data; it lets users develop distributed programs for processing big data without understanding the low-level details of the distributed system. Hadoop has become a research focus in both industry and academia.

This thesis studies data storage and algorithm optimization in a massive transaction record analysis system built on the Hadoop platform. First, it studies Hadoop and related technologies, mainly HDFS and MapReduce within Hadoop and the parallel implementation of algorithms in Mahout. Second, it compares the storage efficiency of a relational database, a non-relational database, and Hadoop, and proposes a solution for storing different types of massive data. Third, it compares a Mahout-based K-means algorithm with optimized cluster centers against the K-means algorithm shipped with Mahout; the former shortens clustering time and improves clustering efficiency. Finally, it designs and implements a massive transaction record analysis system on the Hadoop platform. The transaction records are the consumption information generated when users swipe cards through their mobile phones. The system implements functions according to user requirements, such as identifying high-value customers, sales volume statistics and analysis per user, and regional analysis and recommendation. The new storage scheme and the improved algorithm are applied in this system, and the system's feasibility and correctness are verified.
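The abstract does not say how the initial cluster centers are optimized, so the following is only an illustrative sketch of the general idea that better starting centers can reduce the work K-means has to do. It assumes a farthest-first seeding rule as a stand-in for the thesis's method, runs on a single machine in plain Python rather than on Mahout or MapReduce, and uses hypothetical names (`farthest_first_centers`, `kmeans`) that do not come from the thesis or from Mahout.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centers(points, k):
    """Assumed seeding rule (not the thesis's): start from the first point,
    then repeatedly add the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(nxt)
    return centers


def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations from the given initial centers.
    Returns (centers, labels, number of iterations performed)."""
    centers = [tuple(c) for c in centers]
    labels = []
    for it in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(len(centers)), key=lambda j: dist2(p, centers[j]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:  # converged: no center moved
            return centers, labels, it
        centers = new_centers
    return centers, labels, max_iter


if __name__ == "__main__":
    # Toy data standing in for per-user transaction features (hypothetical).
    data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
            (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
    seeds = farthest_first_centers(data, 2)
    final_centers, labels, iterations = kmeans(data, seeds)
    print(final_centers, labels, iterations)
```

Seeding with well-separated points tends to place one starting center in each natural group, so the assignment step is close to final from the first pass. In a distributed setting the same two steps would map onto a MapReduce job (assignment in the mappers, center updates in the reducers), where each saved iteration avoids a full pass over the data.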
Document Type: Thesis
Recommended Citation
GB/T 7714
韩岩. 基于Hadoop平台对交易记录的数据分析系统的设计与实现 [Design and Implementation of a Hadoop-Based Data Analysis System for Transaction Records][D]. 北京: 中国科学院大学, 2015.
Files in This Item:
File Name/Size: 基于Hadoop平台对交易记录的数据分析 (3175KB)
DocType: Thesis
Access: Open Access
License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.