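Thesis Advisor: 李晓
Degree Grantor: University of Chinese Academy of Sciences (中国科学院大学)
Place of Conferral: Beijing
Degree Name: Master's
Degree Discipline: Computer Application Technology
Keyword: e-government; government websites; Web log mining; Web usage mining; data analysis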

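With the rapid development of the Internet, the ways people communicate and obtain information have changed greatly, and the network has become people's main source of information. As the core of e-government, government websites have gradually become the mainstream platform on which governments publish policies, laws, and information, and the way the public uses government websites has changed accordingly. The public hopes to communicate with staff of the relevant government departments through government websites and to offer its own supervisory opinions. China's government strongly supports the development of e-government; after years of effort, government websites at all levels have steadily improved and have accumulated massive volumes of log data files. How to effectively collect and preprocess the usage data of government website users directly affects the results of mining the latent patterns in that data, and is a topic well worth studying. This thesis studies user behavior data collection and preprocessing for e-government. First, it surveys the history and current state of domestic e-government, analyzes the functions, characteristics, and user experience of government websites, and points out the problems in their development. To address these practical problems, and weighing the difficulty of the available user behavior data collection methods against the granularity of data actually required, it adopts a data collection method based on server logs. In the actual data processing, to improve the efficiency of data cleaning, it proposes an improved version of the SNM (Sorted Neighborhood Method) algorithm that adds length filtering and a check for missing attributes, improving both the accuracy and the efficiency of data cleaning. Based on the characteristics of government website user behavior, it implements heuristic algorithms for user identification, session identification, and path completion, and verifies the effectiveness of the user identification. Finally, it puts a user behavior data collection and preprocessing platform for e-government into operation, analyzes actual operating log data from government websites, obtains the corresponding analysis results, and evaluates the platform's performance.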
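The abstract gives no implementation details for the improved SNM algorithm, so the following is only a minimal sketch of the general idea: sort the records by a key, compare each record with its neighbors inside a sliding window, skip attributes that are missing, and reject a pair early when the lengths of a shared attribute differ too much. The function names, the use of `difflib.SequenceMatcher` as the similarity measure, and every threshold value are assumptions made for illustration, not the thesis's actual design.

```python
from difflib import SequenceMatcher


def is_match(r1, r2, fields, threshold=0.9, min_len_ratio=0.5):
    """Decide whether two records look like duplicates (hypothetical rule)."""
    scores = []
    for f in fields:
        a, b = r1.get(f), r2.get(f)
        if not a or not b:
            # Missing attribute: compare only fields present in both records.
            continue
        # Length filter: very different lengths cannot be near-duplicates,
        # so skip the more expensive string comparison.
        if min(len(a), len(b)) / max(len(a), len(b)) < min_len_ratio:
            return False
        scores.append(SequenceMatcher(None, a, b).ratio())
    # Require at least one comparable field and a high mean similarity.
    return bool(scores) and sum(scores) / len(scores) >= threshold


def find_duplicates(records, key, fields, window=2):
    """Sorted-neighborhood pass: compare each record with the next
    (window - 1) records in sorted order; return index pairs."""
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    pairs = []
    for pos, i in enumerate(order):
        for j in order[pos + 1 : pos + window]:
            if is_match(records[i], records[j], fields):
                pairs.append((min(i, j), max(i, j)))
    return pairs


records = [
    {"ip": "10.0.0.1", "url": "/news/index.html", "agent": "Mozilla/5.0"},
    {"ip": "10.0.0.1", "url": "/news/index.htm", "agent": None},
    {"ip": "10.0.0.2", "url": "/about", "agent": "Mozilla/5.0"},
]
duplicates = find_duplicates(
    records,
    key=lambda r: (r["ip"], r["url"] or ""),
    fields=("ip", "url", "agent"),
)
print(duplicates)
```

In this toy input, the first two records are paired even though one lacks the `agent` field, while the third is rejected by the length filter on `url` before any similarity score is computed.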

Other Abstract

The rapid development of the Internet has greatly changed how people communicate and obtain information, and the Internet has become people's main source of information. As the core of e-government, government websites have become the platform for publishing policies, laws, and related information, and the way people use government websites has changed as well. The public hopes to communicate with the relevant officials through government websites, to supervise them, and to offer its own advice. The Chinese government strongly supports the development of e-government. After years of effort, government websites at all levels have steadily improved, and vast amounts of log data have accumulated. How to effectively collect and preprocess the data of government website users directly affects the results of subsequent data mining, and is a subject that deserves study. This thesis studies user behavior data collection and preprocessing for e-government. First, we investigate the history and current status of domestic e-government development, analyze the functions, features, and user experience of government websites, and point out their shortcomings. To address these problems, and considering both the advantages and disadvantages of different collection methods and the required granularity of the user behavior data, we collect data from server logs. To improve the efficiency of data cleaning, we propose an improved SNM (Sorted-Neighborhood Method) algorithm based on length filtering and dynamic fault tolerance (LF-SNM), which improves both the accuracy and the efficiency of data cleaning. Considering the characteristics of e-government website users, we propose heuristic rules for user identification, session identification, and path completion. Finally, we successfully ran a user behavior data collection and preprocessing platform for e-government. The platform can analyze the log data of e-government websites and produce result reports.
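The abstract does not state the heuristic rules themselves. As an illustration of what user and session identification over server logs commonly looks like, the sketch below assumes two widely used heuristics: a user is a distinct (IP address, user agent) pair, and a gap of more than 30 minutes between consecutive requests starts a new session. The function name, the tuple layout of the log entries, and the timeout value are all hypothetical, and path completion is not covered.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def identify_sessions(entries, timeout=timedelta(minutes=30)):
    """Group log entries into per-user sessions.

    entries: iterable of (ip, agent, timestamp, url) tuples (assumed layout).
    Returns a list of (user, hits), where user is (ip, agent) and hits is a
    time-ordered list of (timestamp, url).
    """
    # User identification heuristic: same IP and same user agent = same user.
    by_user = defaultdict(list)
    for ip, agent, ts, url in entries:
        by_user[(ip, agent)].append((ts, url))

    sessions = []
    for user, hits in by_user.items():
        hits.sort()
        current = [hits[0]]
        for prev, hit in zip(hits, hits[1:]):
            # Session identification heuristic: a long idle gap ends the session.
            if hit[0] - prev[0] > timeout:
                sessions.append((user, current))
                current = []
            current.append(hit)
        sessions.append((user, current))
    return sessions


entries = [
    ("10.0.0.1", "UA-1", datetime(2016, 1, 1, 9, 0), "/"),
    ("10.0.0.1", "UA-1", datetime(2016, 1, 1, 9, 10), "/news"),
    ("10.0.0.1", "UA-1", datetime(2016, 1, 1, 10, 0), "/contact"),
    ("10.0.0.1", "UA-2", datetime(2016, 1, 1, 9, 5), "/"),
]
sessions = identify_sessions(entries)
for user, hits in sessions:
    print(user, [url for _, url in hits])
```

Here the first user's requests split into two sessions because of the 50-minute gap, and the second user agent on the same IP address is treated as a separate user.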

Document Type: Thesis
Recommended Citation
GB/T 7714
刘雅思. 面向电子政务的用户行为数据收集与预处理[D]. 北京. 中国科学院大学,2016.
Files in This Item:
File Name/Size: 面向电子政务的用户行为数据收集与预处理 (2339KB)
DocType: Thesis
Access: Open Access
License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.