Measuring the similarity between two graphs is a basic operation with many applications. For example, in cheminformatics and drug design, we often need to measure how similar a chemical compound (modeled as a labeled graph) is to a known drug (modeled as another labeled graph) in order to evaluate the compound's potential for novel drug design. Other application areas include bioinformatics, data mining, pattern recognition, and graph classification.

Graph edit distance (GED) is a useful metric for quantifying the similarity of two graphs. Intuitively, the GED measures how many edit operations must be applied to transform the first graph into the second. Typical operations include inserting or deleting a node, inserting or deleting an edge, changing a node label, and changing an edge label. Computing the GED is an NP-hard problem, so computational cost is a serious concern, especially in applications such as graph similarity search, where many distances must be computed.
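To make the edit operations concrete, here is a minimal Python sketch that compares two small labeled graphs using NetworkX's built-in exact GED routine. The toy molecule-like graphs and their labels are hypothetical, and this is not the algorithm from the paper, just an illustration of what is being computed.

```python
import networkx as nx

# Two small labeled graphs (hypothetical toy example).
g1 = nx.Graph()
g1.add_nodes_from([(1, {"label": "C"}), (2, {"label": "C"}), (3, {"label": "O"})])
g1.add_edges_from([(1, 2, {"bond": "single"}), (2, 3, {"bond": "double"})])

g2 = nx.Graph()
g2.add_nodes_from([(1, {"label": "C"}), (2, {"label": "N"}), (3, {"label": "O"})])
g2.add_edges_from([(1, 2, {"bond": "single"}), (2, 3, {"bond": "single"})])

# Unit-cost GED: every insertion, deletion, or relabeling costs 1.
ged = nx.graph_edit_distance(
    g1, g2,
    node_match=lambda a, b: a["label"] == b["label"],
    edge_match=lambda a, b: a["bond"] == b["bond"],
)
print(ged)  # 2.0: relabel node 2 (C -> N) and edge (2, 3) (double -> single)
```

Because the underlying problem is NP-hard, this exact routine also slows down quickly as the graphs grow, which is what motivates the faster algorithms discussed next.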


Figure 1: An illustration of graph edit distance. Figure republished with permission from Elsevier, doi.org/10.1016/j.knosys.2018.10.002

Several approaches have been developed for the efficient computation of graph edit distance. Typically, the possible mappings between the two graphs are organized as an ordered search tree, in which the inner nodes denote partial mappings and the leaf nodes denote complete mappings. Riesen et al. proposed the A*-GED method. The method is vertex-based: partial mappings are grown iteratively by matching previously unmapped vertices of the two graphs. The authors developed a heuristic function that estimates a lower bound on the GED, which is used to prune the search and reduce the number of mappings considered. Nevertheless, a major drawback of A*-GED is that it often must store an enormous number of partial mappings, resulting in huge memory consumption; in practice, A*-GED is applicable only to relatively small graphs.
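The following sketch shows the vertex-based search idea in simplified form; it is not Riesen et al.'s implementation. Partial mappings are grown one g1 vertex at a time, and a priority queue orders them by accumulated cost plus an admissible lower bound. The callbacks extend_cost, insert_cost, and lower_bound are hypothetical placeholders for a concrete cost model and heuristic.

```python
import heapq
from itertools import count

def astar_ged(g1_nodes, g2_nodes, extend_cost, insert_cost, lower_bound):
    """A*-style vertex-based search over partial mappings (illustrative sketch).

    extend_cost(mapping, u, v): edit cost of adding the pair u -> v to a
      partial mapping (v is None when u is deleted).
    insert_cost(v): cost of inserting an unmatched g2 node.
    lower_bound(mapping): estimate of the remaining cost; it must never
      overestimate, or the returned distance may be suboptimal.
    """
    if not g1_nodes:  # degenerate case: only insertions remain
        return sum(insert_cost(w) for w in g2_nodes)
    tie = count()  # tie-breaker so heapq never has to compare mappings
    frontier = [(lower_bound(()), next(tie), 0.0, ())]
    while frontier:
        _, _, g, mapping = heapq.heappop(frontier)
        if len(mapping) == len(g1_nodes):
            return g  # first complete mapping popped is optimal (A* property)
        u = g1_nodes[len(mapping)]
        used = {v for _, v in mapping if v is not None}
        # Extend by mapping u to every unused g2 node, or by deleting u.
        for v in [w for w in g2_nodes if w not in used] + [None]:
            child = mapping + ((u, v),)
            g_child = g + extend_cost(mapping, u, v)
            if len(child) == len(g1_nodes):
                # Complete mapping: pay for g2 nodes nothing was mapped to.
                matched = used | ({v} if v is not None else set())
                g_child += sum(insert_cost(w) for w in g2_nodes
                               if w not in matched)
                h = 0.0
            else:
                h = lower_bound(child)
            heapq.heappush(frontier, (g_child + h, next(tie), g_child, child))
```

Note how the frontier holds one entry per surviving partial mapping; on graphs of even moderate size this queue grows combinatorially, which is exactly the memory blow-up described above.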

In this paper, Chen and co-authors present BSS_GED, a novel vertex-based mapping method that computes the GED in a reduced search space, achieved by more aggressively identifying and discarding invalid and redundant mappings. BSS_GED employs beam-stack search, a memory-bounded search paradigm widely used in AI, combined with two specially designed heuristics to speed up the GED computation, striking a balance between memory utilization and expensive backtracking calls. Through extensive experiments, the authors demonstrate that BSS_GED is highly efficient on both sparse and dense graphs and outperforms the state-of-the-art methods. They further evaluated BSS_GED on the well-studied graph similarity search problem, where the experimental results show that it is an order of magnitude faster than the state-of-the-art graph similarity search methods.
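BSS_GED itself interleaves beam search with a stack that remembers what was pruned, so it can backtrack and still return the exact distance. The simplified sketch below shows only the memory-bounded half of that idea: a plain beam search over the same search tree as the A* sketch above, keeping at most beam_width partial mappings per level. As written it returns an upper bound on the GED rather than the exact value; the cost callbacks are the same hypothetical interface as before.

```python
def beam_ged(g1_nodes, g2_nodes, extend_cost, insert_cost, lower_bound,
             beam_width=10):
    """Memory-bounded beam search over partial mappings (upper bound only).

    At most beam_width partial mappings survive per level, so memory is
    O(beam_width * |g1|); the price is that the optimal mapping may be
    pruned. Beam-stack search (as used in BSS_GED) additionally records
    the pruned alternatives and backtracks to them, recovering exactness
    while keeping the memory bound.
    """
    level = [(0.0, ())]  # (accumulated cost g, partial mapping)
    for u in g1_nodes:
        children = []
        for g, mapping in level:
            used = {v for _, v in mapping if v is not None}
            for v in [w for w in g2_nodes if w not in used] + [None]:
                children.append((g + extend_cost(mapping, u, v),
                                 mapping + ((u, v),)))
        # Keep only the most promising beam_width partial mappings.
        children.sort(key=lambda item: item[0] + lower_bound(item[1]))
        level = children[:beam_width]
    best = float("inf")
    for g, mapping in level:  # charge insertions for unmatched g2 nodes
        matched = {v for _, v in mapping if v is not None}
        best = min(best, g + sum(insert_cost(w) for w in g2_nodes
                                 if w not in matched))
    return best
```

The trade-off the authors mention is visible here: a small beam keeps memory low but discards candidates that beam-stack search would later have to backtrack to, while a large beam behaves more like the exhaustive A* search.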

These findings are described in the article entitled An Efficient Algorithm for Graph Edit Distance Computation, recently published in the journal Knowledge-Based Systems, 163, January 2019, 762–775, doi.org/10.1016/j.knosys.2018.10.002. This work was conducted by Xiaoyang Chen, Hongwei Huo, Jun Huan, and Jeffrey Scott Vitter.


About The Authors

Xiaoyang Chen received the B.S. degree from Xidian University in 2013, where he is currently pursuing the Ph.D. degree. His research interests include graph indexing and search, design and analysis of algorithms, external memory algorithms, and compressed indexes.

Hongwei Huo (M’00-SM’17) received the B.S. degree in mathematics from Northwest University, China, and the M.S. degree in computer science and the Ph.D. degree in electronic engineering from Xidian University, China. She is currently a professor with the Department of Computer Science, Xidian University. Her research interests include the design and analysis of algorithms, bioinformatics algorithms, external memory algorithms and compressed indexes, parallel and distributed algorithms, and algorithm engineering.

Jun Huan (SM’11) received the Ph.D. degree in computer science from the University of North Carolina. He is currently head of the Big Data Lab at Baidu Research. He was formerly a professor with the Department of Electrical Engineering and Computer Science at the University of Kansas, where he directed the Data Science and Computational Life Sciences Laboratory in the KU Information and Telecommunication Technology Center. He held courtesy appointments at the KU Bioinformatics Center and the KU Bioengineering Program, and a visiting professorship from GlaxoSmithKline plc. He has authored over 120 peer-reviewed papers in leading conferences and journals and has graduated more than ten graduate students, including seven Ph.D. students. His research focuses on data science, machine learning, data mining, big data, and interdisciplinary topics including bioinformatics. He serves on the editorial boards of several international journals, including the Springer Journal of Big Data, the Elsevier Journal of Big Data Research, and the International Journal of Data Mining and Bioinformatics. He regularly serves on the program committees of top-tier international conferences on machine learning, data mining, big data, and bioinformatics.

Jeffrey Scott Vitter (F’93) received the B.S. degree (Hons.) in mathematics from the University of Notre Dame in 1977, the Ph.D. degree in computer science from Stanford University in 1980, and an M.B.A. degree from Duke University in 2002. He is currently Distinguished Professor of Computer and Information Science with the University of Mississippi, where he served as chancellor from 2016 to 2019. From 2010 to 2015, he was provost and executive vice chancellor and Roy A. Roberts Distinguished Professor at the University of Kansas. From 2008 to 2010, he was on the faculty at Texas A&M University, where he also served as provost and executive vice president for academics. From 2002 to 2008, he was the Frederick L. Hovde Dean of Science with Purdue University. From 1993 to 2002, he held the Gilbert, Louis, and Edward Lehrman Distinguished Professorship with Duke University, where he also served as chair of the Department of Computer Science and co-director of Duke’s Center for Geometric and Biological Computing. From 1980 to 1992, he advanced through the faculty ranks in computer science at Brown University. He is a Guggenheim Fellow and a Fellow of the ACM, AAAS, and IEEE, and he is a U.S. National Science Foundation Presidential Young Investigator and Fulbright Scholar. He has authored the book Algorithms and Data Structures for External Memory, co-authored the books Design and Analysis of Coalesced Hashing and Efficient Algorithms for MPEG Video Compression, co-edited the collections External Memory Algorithms and Algorithm Engineering, and is a co-holder of patents in the areas of external sorting, prediction, and approximate data structures. His research interests span the design and analysis of algorithms, external memory algorithms, data compression, databases, compressed data structures, parallel algorithms, machine learning, random variate generation, and sampling. He has received the IBM Faculty Development Award, the ACM Recognition of Service Award (twice), and the 2009 ACM SIGMOD Test of Time Award. He has served on several boards and professional committees. He serves or has served on the editorial boards of Algorithmica, Communications of the ACM, IEEE Transactions on Computers, Theory of Computing Systems, and SIAM Journal on Computing, and has edited several special issues.