No More Graph Clustering Algorithms, Please: Disjoint Clustering Algorithms Now Able To Detect Overlapping Clusters

Published by Tanmoy Chakraborty

IIIT Delhi, India

These findings are described in the article entitled Ensemble-based overlapping community detection using disjoint community structures, recently published in the journal Knowledge-Based Systems (Knowledge-Based Systems 163 (2019) 241-251). This work was conducted by Tanmoy Chakraborty from IIIT Delhi, Saptarshi Ghosh from IIT Kharagpur, and Noseong Park from George Mason University.

Most of the complex systems such as social networks (e.g., Facebook, Twitter), biological systems (e.g., protein-protein interactions) can be modeled by networks/graphs where nodes/vertices in a graph represent different objects/entities (users in social networks, proteins in biological systems) and edges/links between nodes represent relationships between objects (friendships in social networks, proteins interacting during the metabolic process within a cell). Due to the availability of large data, researchers working on difference disciplines (such as computer science, physics, biology, and mathematics) have started representing the overall complex system using graphs.

One of the important problems in graph analysis is to detect densely connected nodes, aka “communities” or “clusters”. What is the definition of a community structure is still an “ill-posed problem” – although it is roughly defined as the groups of nodes densely connected internally and sparsely connected externally, several metrics such as modularity, conductance, cut ratio, and permanence have been proposed to define a community structure (Readers are encouraged to read “Metrics for community analysis: A survey,” by Chakraborty et al., ACM Computing Surveys, 2017 to know all possible definitions of a community).

There are two types of communities in graphs – disjoint, where a node can belong to only one community, and overlapping, where a node can be a part of more than one community (see Fig 1). While disjoint community is a more theoretical concept of network partition, overlapping community is more natural in real world, e.g., in social networks, a user is usually a part of many communities (communities of his/her high school friends, college friends, and colleagues, etc.). However, the complexity of the problem of community detection will exponentially increase when we go from disjoint to overlapping community detection due to the exponential number of possible ways to assign nodes into groups.

Fig 1: A schematic example disjoint and overlapping community structures. Image republished with permission from Elsevier from

There has been a plethora of research in detecting disjoint communities. Readers are encouraged to read “Community detection in graphs,” by Santo Fortunato, Physics Reviews 2010, to know more about disjoint community detection methods. Compared to this, the literature on overlapping community detection has been relatively less (read “Overlapping community detection in networks: the state-of-the-art and comparative study,” by Xie et al., ACM Computing Surveys 2011) till 2014. In recent years, attempts have been made by various interdisciplinary research communities to develop algorithms for overlapping community detection. As a result, it has become immensely difficulty for beginners (who are unfamiliar with the literature on community detection) to pick and apply a specific (and usually the best) community detection algorithm from the ocean of algorithms.

Through a series of studies, we started believing that overlapping community detection can be solved using disjoint community detection methods. We empirically observed that in the case of most of the real networks, nodes laying at the border of the disjoint communities are responsible for forming the overlapping part of the community structure. So if we run an accurate disjoint community detection algorithm, identify peripheral nodes, and assign them to appropriate communities, we may be able to get the final overlapping community structure.

In 2015, I made the first attempt to prove this hypothesis empirically, which resulted in the following publication: “Leveraging disjoint communities for detecting overlapping community structure,” T. Chakraborty, Journal of Statistical Mechanics: Theory and Experiment (JSTAT), 2015. In this paper, I used a metric, called “permanence,” which was proposed by us earlier for disjoint community detection (“On the permanence of vertices in network communities,” Chakraborty et al. ACM SIGKDD 2014), to identify such potential peripheral nodes which are responsible to form the overlapping community structure.

In 2018, we made a second attempt to prove our hypothesis again, which resulted in the following publication: “Ensemble-based Overlapping Community Detection using Disjoint Community Structures,” Chakraborty et al., Knowledge-Based Systems, 2019. In this work, we hypothesized that, instead of designing yet another overlapping clustering method, an alternative way could be to leverage the community structures obtained from various heuristic-based disjoint community detection methods on a given network and discover the overlapping regions. We are the first to propose an ensemble algorithm, called EnCoD, which systematically materializes the hypothesis mentioned above.

Fig 2: Conditional probability P (in y-axis) of (a) existence of an edge between two vertices, and (b) common overlapping community (OC) membership of two vertices in the ground-truth community structure of six real networks (Senate, Flickr, Coauthorship, Youtube, LiveJournal, Orkut), given the fraction of the base disjoint communities that both the vertices share. We conclude that the more the vertices share common base disjoint communities, the more the chance that they are connected in the graph and belong to the same overlapping community. Image republished with permission from Elsevier from

The backbone of EnCoD stands on an interesting observation drawn from multiple real-world network structures as follows: the more a pair of nodes co-occur together in the same “base disjoint communities” (i.e., communities obtained from different disjoint community detection algorithms), the more the chance that they will be connected via an edge in the original network and they share same overlapping region. We ran several well-known disjoint community detection algorithms (we call them “base disjoint community detection methods”) and detected disjoint community structures (we call them “base community structures”). EnCoD uses the information of the community memberships of a vertex as features and iteratively learns the major dimensions of features which would lead to the densely connected network.

In particular, since the base disjoint community detection methods are built on different heuristic methods, the base community structures that we obtain from a particular network would be significantly different. Therefore, EnCoD iteratively keeps on exploring the important dimensions from the vector to generate the network structure. EnCoD further uses two novel observations from the real-world networks: (i) nodes are densely connected in the overlapping regions, compared to the non-overlapping regions, and (ii) nodes belonging to the same communities are highly similar in terms of their feature vectors.

To evaluate the performance of EnCoD, six large real-world networks were considered whose ground-truth overlapping community structures (the actual community structure that the algorithm should be able to detect) are known to us, along with many synthetically-generated networks. EnCoD was compared against several state-of-the-art overlapping community detection methods. We observed that:

  • The performance of EnCoD does not vary much with the performance of the base algorithms; however, care should be taken to detect more diverse base algorithms.
  • The performance of EnCoD remains consistent after inclusion of a certain number of base algorithms.
    In general, EnCoD turns out to be the better overlapping community detection method than other baseline methods.
  • The idea behind the design of EnCoD can also be used when vertices are associated with other types of features.

We believe that this work will enable others to realize that there may not be any further need to develop yet another overlapping community detection method. We also believe that the paradigm of ensemble learning is new in the context of community detection, which needs further studies both theoretically and empirically.

The paper is available at

The preprint of the paper is available at

More details on the related publications can be available at and

About The Author

Tanmoy Chakraborty

Tanmoy Chakraborty is an assistant professor at the Indraprastha Institute of Information. His primary research interests include Network Science, Data Mining, Natural Language Processing and Data-driven cybersecurity. He has served as a PC member in various top conferences including WWW, AAAI, PAKDD, IJCAI, and reviewer of top journals including ACM TKDD, IEEE TKDE, ACM TIST, CACM.

Speak Your Mind!


Labeled Periodic Table

The periodic table of the elements is a representation of all of the chemical elements that have been discovered. The elements on the periodic table run from top to bottom and left to right in order of increasing atomic number, and in general the order of the elements is correlated with their atomic mass. In […]

Aggregation Induced Emission: Using New Luminescent Materials for Food Safety Monitoring

In 2016, the world shipped over USD 220 billion of meats and seafood. Naturally, ensuring the quality and safety of these foods is a major concern for everyone in the supply chain, from the food suppliers to consumers to government agencies. In particular, monitoring the vast amounts of food can easily become a logistical nightmare. […]

High Primary Production In Shallow Waters

Small freshwater bodies are supercharged ecosystems with regards to carbon turnover within a landscape. They often form the lowest point of catchments and, due to their high surface area to volume ratios, they receive significant external carbon and nutrient input through leaf litter, surface runoff, fertilizer seepage and other anthropogenic effluents. Consequently, such small, shallow […]

Using An Electrochemical Approach To Show That TP Binds To DNA Nucleobases

Exposure of deoxyribonucleic acid to endogenous or exogenous chemical and physical agents often leads to conformational changes in DNA structure. Small inorganic and organic molecules can non-covalently interact with DNA via formation of DNA-guest complexes. A ligand fixation into DNA structure can be performed by electrostatic binding between a positively charged guest and negatively charged […]

The Nitrogen Ice Activity On Pluto Explored With Numerical Modeling  

Numerical modeling of the nitrogen cycle on Pluto explains the distribution, color, geology, and morphology of the different nitrogen ice deposits observed. On July 14, 2015, our vision of Pluto changed as the NASA New Horizons spacecraft flew by Pluto and revealed an active frozen world, with unprecedented landscapes in the Solar System. On the […]

Modeling Metabolism To Investigate The Response Of S. aureus To Different Nutrient Environments

S. aureus is a type of bacterium that is carried in the nose of 30% of healthy adults and can cause diseases ranging from mild to severe. It is categorized as a serious-threat pathogen, especially because of its low susceptibility to many classes of antibiotics. Efforts to find new antibiotic regimens have shown that the […]

The Science Is In: Left-Handed People Are Exceptional

Many cultures in the olden days had superstitions about left-handed people. But science has more recently demonstrated that left-handed people are indeed very special. But it has not been plain sailing for left-handed people historically. Not only have superstitions often been against then but even in the 20th century, several studies suggested that on average […]