This post complements my recent paper Graph reduction with spectral and cut guarantees, a follow-up to Spectrally Approximating Large Graphs with Smaller Graphs, published at ICML 2018. The research was kindly supported by the Swiss National Science Foundation.

## Problem setup

We will consider three representative networks:

**minnesota**: a road network consisting of 2642 nodes.

```python
G = graphs.Minnesota()
```

**airfoil**: a mesh consisting of 4000 nodes.

```python
G = graphs.Airfoil()
G = graphs.Graph(G.W[:, :4000][:4000, :], G.coords[:4000, :])
```

**yeast**: consists of protein-to-protein interactions in budding yeast. The data were taken from networkrepository.com. I am not sure of the source, unfortunately. After keeping only the giant component, the graph consists of 819 nodes.

```python
W = np.load('./data/bio-yeast.npy')
G = graphs.Graph(W=W[0:1000, 0:1000])
if not G.is_connected():
    G, _ = graph_utils.get_giant_component(G)
```

For the yeast network we will also need to compute some node coordinates using graphviz. This will help us visualize things later on.

```python
if not hasattr(G, 'coords'):
    graph = nx.from_scipy_sparse_matrix(G.W)
    pos = nx.nx_agraph.pygraphviz_layout(graph, prog='neato')
    G.set_coordinates(np.array(list(pos.values())))
```
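In case `graph_utils.get_giant_component` is unavailable in your setup, the same extraction can be sketched with scipy alone. The helper `giant_component` below is my own illustration, not part of pygsp:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def giant_component(W):
    """Return the adjacency restricted to the largest connected component."""
    n_comp, labels = connected_components(csr_matrix(W), directed=False)
    sizes = np.bincount(labels)
    keep = np.where(labels == sizes.argmax())[0]
    return W[np.ix_(keep, keep)], keep

# toy graph: a 3-node path plus a disconnected pair
W = np.zeros((5, 5))
W[0, 1] = W[1, 0] = W[1, 2] = W[2, 1] = 1  # component of size 3
W[3, 4] = W[4, 3] = 1                      # component of size 2
Wg, keep = giant_component(W)
print(Wg.shape)  # (3, 3)
```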

## Let’s do some local variation coarsening

We focus on the following two methods:

- **variation_edges**: contracts edges
- **variation_neighborhood**: contracts sets formed by node neighborhoods

There are two parameters we need to set:

- First, the *dimensionality reduction ratio* $r = 1 - n/N$ controls how many nodes to remove. Remember, $N$ is the size of the original graph and $n$ the target size.
- The parameter $k$ is the size of the eigenspace we are interested in preserving.

```python
# Parameters
r = 0.5
k = 10
```

To coarsen our graph, we simply execute the following:

```python
C, Gc, Call, Gall = coarsen(G, K=k, r=r, method='variation_neighborhood')
```
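To build intuition for what the returned coarsening matrix `C` does, here is a minimal toy sketch (my own construction, not the library's exact one): each row of `C` corresponds to one contraction set, and `pinv(C) @ C` is the projection that coarsening followed by lifting applies to a signal.

```python
import numpy as np

# Toy coarsening matrix: 4 nodes -> 2 coarse nodes.
# Each row is a (normalized) indicator of one contraction set.
C = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]]) / np.sqrt(2)

Pi = np.linalg.pinv(C) @ C       # the coarsen-then-lift projection
assert np.allclose(Pi @ Pi, Pi)  # idempotent, as a projection should be

x = np.array([1.0, 1.0, 2.0, 2.0])  # signal constant on each set
print(np.allclose(Pi @ x, x))       # True: such signals survive coarsening intact
```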

## Visual inspection

We use the following utility function to inspect the result:

```python
fig = plot_coarsening(Gall, Call);
```

This will plot $c+1$ graphs in total, where $c$ is the number of coarsening levels. The left-most is the original graph and the right-most is the final coarse graph. Colors indicate the size of each contraction set:

- green is for contraction sets of size 2, blue for size 3, red for size 4, and yellow for larger sets

Below, I illustrate the code output for all three graphs and both coarsening methods.

A few general remarks:

- Selecting larger contraction sets (as the neighborhood variation method does) means that we generally need fewer levels of coarsening and often yields better results.
- In many instances it is difficult to say why the algorithm chose what it did. Informally, the methods aim to reduce a graph such that what is smooth remains smooth after coarsening. It is an interesting and non-trivial consequence that all good cuts of the original large graph are then also (approximately) preserved in the coarse graph.

## Some quantitative results

We consider three metrics for coarsening quality:

- (Left) The
**restricted similarity** is defined such that, for every $x \in \mathrm{span}(U_k)$, we have

$\| x - \Pi x \|_L \leq \epsilon \, \| x \|_L,$

where $\Pi = C^+ C$ is the projection induced by coarsening. Note that $U_k$ consists of the first $k$ eigenvectors of the Laplacian $L$ associated with $G$.

- (Center) The
**relative eigenvalue error** is computed as

$\frac{|\lambda_i - \tilde{\lambda}_i|}{\lambda_i} \quad \text{for every } i = 2, \ldots, k,$

where $\lambda_i$ and $\tilde{\lambda}_i$ are, respectively, the eigenvalues of the Laplacian of $G$ and of $G_c$.

- (Right) The
**angle matrix** contains the angles between the eigenvectors of $L$ (y-axis) and the lifted eigenvectors of $L_c$ (x-axis). The closer it is to counter-diagonal, the better.
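As a concrete toy illustration of all three metrics, here is a numpy sketch on the 4-node path graph with a hypothetical coarsening that contracts {0, 1} and {2, 3}; the coarse Laplacian uses one common construction, $L_c = (C^+)^\top L \, C^+$, which may differ in details from the library's:

```python
import numpy as np

# Laplacian of the path graph on 4 nodes
L = np.array([[ 1, -1,  0,  0],
              [-1,  2, -1,  0],
              [ 0, -1,  2, -1],
              [ 0,  0, -1,  1]], float)
lam, U = np.linalg.eigh(L)

# hypothetical coarsening matrix: contract {0,1} and {2,3}
C = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1]], float) / np.sqrt(2)
Cinv = np.linalg.pinv(C)
Pi = Cinv @ C              # projection induced by coarsening
Lc = Cinv.T @ L @ Cinv     # coarse Laplacian (one common construction)
lamc, Uc = np.linalg.eigh(Lc)

# 1) restricted similarity constant for k = 2
#    (the constant eigenvector has zero L-norm, so u_2 decides epsilon)
def L_norm(x):
    return np.sqrt(x @ L @ x)

eps = L_norm(U[:, 1] - Pi @ U[:, 1]) / L_norm(U[:, 1])
print('epsilon:', round(eps, 3))           # 0.866

# 2) relative error of the first nonzero eigenvalue
err = abs(lam[1] - lamc[1]) / lam[1]
print('eigenvalue error:', round(err, 3))  # 0.707

# 3) angle matrix between eigenvectors of L and lifted eigenvectors of Lc
lifted = Cinv @ Uc
lifted /= np.linalg.norm(lifted, axis=0)
angles = np.degrees(np.arccos(np.clip(np.abs(U.T @ lifted), 0, 1)))
print('angle(u2, lifted u2):', round(float(angles[1, 1]), 1))  # 22.5
```

Even this crude contraction keeps the constant eigenvector exactly (angle 0), while the first non-trivial eigenpair is only approximately preserved, which is what the three plots quantify at scale.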

I will only show the results for the neighborhood-based method.

Minnesota graph:

Airfoil graph:

Yeast graph:

For reference, in the paper I also include an exhaustive comparison with other standard methods for graph coarsening found in the literature (heavy edge, Kron reduction, affinity, algebraic distance).

## What about the signals?

Finally, let’s inspect how coarsening affects the information defined on the nodes of a graph (i.e., *graph signals* in the GSP lingo). The code is as follows:

```python
size = 2.04
fig, axes = plt.subplots(2, 2, figsize=(2*size*4, 2*3*size))
lineWidth = 1

# a random smooth signal
x = G.U[:, :k] @ np.random.randn(k, 1)
x = x / np.linalg.norm(x)
G.plot_signal(x, ax=axes[0][0], title='signal')

# coarsen it
xc = coarsen_vector(x, C)
Gc.plot_signal(xc, ax=axes[0][1], title='coarsened signal')

# lift it
xp = lift_vector(xc, C)
G.plot_signal(xp, ax=axes[1][0], title='lifted signal')

# reconstruction error
G.plot_signal(np.abs(x - xp), ax=axes[1][1], title='reconstruction error')
print('signal error: {}'.format(np.linalg.norm(x - xp)))
```

Again, I show only the results for the neighborhood-based method.

Minnesota graph:

signal error: 0.022988891880551716

Airfoil graph:

signal error: 0.020485495357371118

Yeast graph:

signal error: 0.05260328950018943

From the small reconstruction errors we can deduce that smooth signals are almost unaffected by coarsening. This is an indirect consequence of optimizing restricted similarity: the specific signal of interest is unknown at coarsening time, yet anything smooth is preserved.
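To see this effect in isolation, here is a toy numpy sketch; the path graph and pairwise contraction below are my own stand-ins for the real pipeline:

```python
import numpy as np

n = 20
# Laplacian of the path graph on n nodes
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L[0, 0] = L[-1, -1] = 1
_, U = np.linalg.eigh(L)

# hypothetical coarsening: contract consecutive pairs of nodes
C = np.zeros((n // 2, n))
for i in range(n // 2):
    C[i, 2 * i:2 * i + 2] = 1 / np.sqrt(2)
Pi = np.linalg.pinv(C) @ C  # coarsen-then-lift projection

rng = np.random.default_rng(0)
smooth = U[:, :3] @ rng.standard_normal(3)  # lives in span(U_3)
smooth /= np.linalg.norm(smooth)
rough = rng.standard_normal(n)              # no smoothness structure
rough /= np.linalg.norm(rough)

err_smooth = np.linalg.norm(smooth - Pi @ smooth)
err_rough = np.linalg.norm(rough - Pi @ rough)
print('smooth signal error:', round(err_smooth, 3))
print('random signal error:', round(err_rough, 3))
```

With unit-norm signals, the random one loses roughly half of its energy to a projection of half the dimension, while the smooth one is reconstructed almost perfectly.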