Best Student Paper Award for work on faster network classification for machine learning
Comparing graphs the team’s tool is up to an order of magnitude faster than competitive baselines.
CSE PhD students Mark Heimann and Tara Safavi received a Best Student Paper Award at the IEEE International Conference on Data Mining (ICDM 2019) for their work “Distribution of Node Embeddings as Multiresolution Features for Graphs.” The students work with Prof. Danai Koutra to make networks more useful, predictive tools for structuring data.
An important problem in machine learning is graph classification, which allows the learning model to identify what category a graph or network belongs to by comparing it to other networks. This technique can be used to draw conclusions about datasets very quickly – in previous work, the group classified brain networks to distinguish between healthy patients and those suffering from a mental disorder such as schizophrenia. This was done based purely on the patterns of connectivity in their brains.
It turns out that to characterize the high-level structure of a network, it is helpful to look at the lower-level structural roles of the individual nodes. This observation is at the heart of the group’s new technique, called Randomized Grid Mapping (RGM). RGM leverages a technique called node embedding, which learns a real-valued vector to represent each individual node based on the structural roles that they play in the network.
“Usually these features are used for node-level classification tasks,” says Heimann, “but we show that they can be used for graph-level classification tasks too, as the spatial distribution of the nodes’ features actually serves as a distinctive way to characterize entire graphs.”
The group characterized each graph by a set of features representing the distribution of its node embeddings in the space of real-valued vectors. These graph feature maps are computationally efficient to compute even for large graphs and can also be used together with fast off-the-shelf supervised machine learning algorithms In contrast, the most competitive existing graph classification works use techniques that are notoriously difficult to scale to large datasets.
Experiments run by the group show that RGM feature maps gives comparable or better graph classification accuracy than several alternative approaches, including powerful graph kernels, unsupervised graph feature mappings, and deep neural networks. Comparing graphs based on their node embeddings with RGM is up to an order of magnitude faster than competitive baselines, while maintaining high classification accuracy.