|Place of Conferral||北京|
|Keyword||重叠社区发现 非负矩阵分解 复杂网络 社交网络|
At present,there are plenty of graph data in real-world which contains both entities and the relations of entities, and this kind of relations are pretty complex. Take social network data as an example, the data contain both users and the interactions between users such as comments, follow and forwarding. This data can be represented as graph data where nodes represent users and edges denote the relations between users.The nodes, edges and structures of the graph data contain a lot of information. And many techniques have been applied to reveal the information in graph data and one of them is community detection. Community detection aims at detection groups which are densely connected and consist of a group of nodes--communities. In this field, if the communities are disjoint, it is called non-overlapping community detection or community detection, otherwise, it is called overlapping community detection. This essay falls into the second category. That is the circles detected are overlapping.The research of community detection is of great value in theory and applications. At the theoretical level, community detection technique helps mine the structure of graph data, and the cluster and interactions between nodes. At the application level, community detection technique can be applied to products recommendation or friends recommendation, gang detection on shopping platform, and the detection of abnormal sites such as pornographic and gambling drug sites, junk websites, etc.The thesis focuses on overlapping community detection and unfold with non-negative matrix factorization algorithm. Firstly, to improve the stability of overlapping community detection based on non-negative matrix factorization and explore the performance from different views, and a novel overlapping community detection algorithm model network from probabilistic aspect is proposed. This algorithm treats vertices and edges in network as observations and communities as latent factors, the network includes unknown factors in addition to observations. To reveal this latent factors, expectation-maximization algorithm is adopted to solve it iteratively. The generation process of the vertices-communities matrix is treated as a type of non-negative matrix factorization, and the square sum of Euclidean distance is formulated as loss function to control iterations. This algorithm can computes the membership of nodes to communities, and has a comparable performance.Then, in the study of community detection algorithms based on non-negative matrix factorization, we find that if the text data are converted into graph data by means of definitions, then the overlapping community detection algorithms can be applied to solve problems in the filed of text mining.Therefore, to solve the problems mentioned above, we convert text data into document-word matrix by term-frequency inverse document frequency, and discretize document-word matrix into binary matrix by its nature of sparsity, and detect topics by non-negative matrix factorization methods. Compared with traditional topic models such as LDA, our method makes the conditional independence assumption between words unnecessary and can capture the semantic relations between words, and is more computationally efficient (30x faster) and comparative effective.Then, when the overlapping communities are detected by non-negative matrix factorization method, these communities are the subset of raw graph data with high dimensional sparse structure. However, vertices and edges in graph always own attributes existing in the form of vectors, which makes it difficult to combine both communities and attributes.To solve problems mentioned above, in this thesis, we detect network communities via non-negative matrix factorization method, and then map network communities into feature vector in Euclidean space, so that the network communities feature can be concatenated with other features vectors easily. The experimental results show that the performance improve by 7.2% with the addition of the community feature and the transformation of mapping communities into Euclidean space feature vector is effective.Finally, gas stations in the Xinjiang province accumulate a lot of vehicles refueling data, and these data can be represented as bipartite heterogeneous network. Bipartite heterogeneous network contains two types of vertices and links are only formed between different type of vertices. To analyse the data by graph analysis technique, overlapping community detection based on non-negative matrix factorization are adopted. Analysing the refueling network by overlapping community detection algorithm can mine the cluster relations between vehicles and gas stations. This cluster relations are of great importance for maintaining public safety, such as abnormal vehicles detection.
|赵清华. 基于非负矩阵分解的重叠社区发现研究及其应用[D]. 北京. 中国科学院大学,2018.|
|Files in This Item:|
|基于非负矩阵分解的重叠社区发现研究及其应（1914KB）||学位论文||开放获取||CC BY-NC-SA||View Application Full Text|
|Recommend this item|
|Export to Endnote|
|Similar articles in Google Scholar|
|Similar articles in Baidu academic|
|Similar articles in Bing Scholar|
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.