Bipartite Graph Partitioning for Data Clustering
Abstract: This research proposes a new data clustering method based on partitioning a bipartite graph. The partition is constructed by minimizing the normalized sum of edge weights between unmatched pairs of vertices. The algorithm uses a partial singular value decomposition (SVD) to approximate the solution. The study connects the clustering algorithm to correspondence analysis in multivariate analysis and discusses assigning data objects to multiple clusters. Experimental results on document clustering demonstrate the algorithm's effectiveness and efficiency.
Main Research Question: How can we develop an efficient method for clustering data by partitioning a bipartite graph?
Methodology: The method builds a bipartite graph whose two vertex sets are terms and documents; an edge joins a term to each document in which it occurs. Optionally, the edge weight records how often the term occurs in that document. The clustering objective is to minimize the normalized sum of edge weights between unmatched pairs of vertices. An approximate solution to this minimization problem is obtained by computing a partial singular value decomposition (SVD) of the edge weight matrix.
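The following Python sketch illustrates how a two-way partition could be read off from the SVD of the edge weight matrix. It is not taken from the paper: the function name `bipartition`, the scaling of the weight matrix by the square roots of the row and column sums, and the sign-based split of the second singular vectors are all assumptions, chosen because they are the usual spectral relaxation of a normalized cut. The example matrix is made up for illustration. A full SVD from NumPy is used for brevity; for a large sparse term-document matrix, a partial SVD routine would be the natural substitute.

```python
import numpy as np


def bipartition(W):
    """Split terms (rows) and documents (columns) of W into two matched groups.

    W is a nonnegative term-by-document edge weight matrix. Assumes every
    row and every column has a positive sum (no isolated vertices).
    """
    W = np.asarray(W, dtype=float)
    d_rows = W.sum(axis=1)  # weighted degree of each term vertex
    d_cols = W.sum(axis=0)  # weighted degree of each document vertex

    # Assumed normalization: divide entry (i, j) by sqrt(d_rows[i] * d_cols[j]).
    W_hat = W / np.sqrt(np.outer(d_rows, d_cols))

    # Full SVD for simplicity; only the two leading singular triplets are used.
    U, s, Vt = np.linalg.svd(W_hat, full_matrices=False)

    # The leading singular pair is trivial (it reflects the vertex degrees),
    # so the partition is read from the second pair after undoing the scaling.
    x = U[:, 1] / np.sqrt(d_rows)
    y = Vt[1, :] / np.sqrt(d_cols)

    # Assumed cut point: split at zero. Terms and documents with the same
    # boolean label form one matched pair of clusters.
    return x >= 0, y >= 0


# Hypothetical 5-term by 4-document matrix with two nearly separate blocks.
W_example = np.array([
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 4.0, 0.0, 0.0],
    [1.0, 3.0, 0.1, 0.0],
    [0.0, 0.0, 5.0, 2.0],
    [0.0, 0.1, 2.0, 4.0],
])
term_side, doc_side = bipartition(W_example)
```

Which group receives the label `True` depends on the arbitrary sign of the singular vectors, so only the grouping itself is meaningful. Splitting each side again in the same way would be one possible route to more than two clusters.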
Results: Experiments on document clustering illustrate the effectiveness and efficiency of the proposed algorithm.
Implications: This study provides a new approach to data clustering by using a bipartite graph partitioning method. It also connects the clustering algorithm to correspondence analysis in multivariate analysis, which could lead to further research in this area. The ability to assign data objects to multiple clusters could also have practical applications in various fields.
Link to Article: https://arxiv.org/abs/0108018v1
Authors:
arXiv ID: 0108018v1