This post complements my recent article Graph reduction with spectral and cut guarantees that will appear in JMLR 2019. It is a follow up of Spectrally Approximating Large Graphs with Smaller Graphs published in ICML 2018. The research was kindly supported by the Swiss National Science Foundation.

You can download the demo-code here and the slides of my presentation here. The latest version of the code can be found in my github repository. If you find this useful, consider citing the above papers.

Andreas Loukas,

November 1st 2018

## Problem setup

We will consider three representative networks:

**minnesota**: a road network consisting of 2642 nodes

G = graphs.Minnesota()

**airfoil**: a mesh consisting of 4000 nodes.

G = graphs.Airfoil() G = graphs.Graph(G.W[:,:4000][:4000,:], G.coords[:4000,:])

**yeast**: consists of protein-to-protein interactions in budding yeast. The data were taken from networkrepository.com. I am not sure of the source, unfortunately. After keeping only the giant component, the graph consists of 819 nodes.

W = np.load('./data/bio-yeast.npy') G = graphs.Graph(W = W[0:1000,0:1000]) if not G.is_connected(): G,_ = graph_utils.get_giant_component(G)

For the yeast network we will also need to compute some node coordinates using graphviz. This will help us visualize things later on.

if not hasattr(G, 'coords'): graph = nx.from_scipy_sparse_matrix(G.W) pos = nx.nx_agraph.pygraphviz_layout(graph, prog='neato') G.set_coordinates(np.array(list(pos.values())))

## Let’s do some local variation coarsening

We focus on the following two methods:

**variation_edges:**contracts edges**variation_neighborhood**: contracts sets formed by node neighborhoods

There are two parameters we need to set:

- First the
*dimensionality reduction*ratio controls how many nodes to remove. Remember, is the size of the original graph and the target size. - The parameter is the size of the eigenspace we are interested in preserving.

# Parameters r = 0.5 k = 10

To coarsen our graph, we simply execute the following:

C, Gc, Call, Gall = coarsen(G, K=k, r=r, method='variation_neighborhood')

## Visual inspection

We use the following utility function to inspect the result

fig = plot_coarsening(Gall, Call);

This will plot graphs in total, where is the number of levels. The left-most is the original graph and the right-most is the final coarse graph. Colors are used to indicate the size of each contraction set :

- green is for blue is for , red is for , and yellow for

Below, I illustrate the code output for all three graphs and both coarsening methods.

Few general remarks:

- Selecting larger contraction sets (as the neighborhood variation method does) means that we generally need fewer levels of coarsening and often yields better results.
- In many instances it is difficult to say why the algorithm choose what it did. Informally, the methods aim to reduce a graph such that what is smooth remains smooth also after coarsening. It is an interesting and non-trivial consequence then that all good cuts of the original large graph will also be (approximately) preserved in the coarse graph.

## Some quantitative results

We consider three metrics for coarsening quality:

- (Left) The
**restricted similarity**is defined such that, for every we have

Note that consists of the first eigenvectors of the Laplacian associated with .

- (Center) The
**relative eigenvalue error**is computed as

for every

where $latex\lambda_i$ and are respectively the eigenvalues of the Laplacian of and of .

- (Right) The
**angle matrix**contains the angles between the eigenvectors of (y-axis) and the lifted eigenvectors of . The closer to counter-diagonal it is, the better.

I will only show the results for the neighborhood-based method.

Minnesota graph:Airfoil graph:Yeast graph:

For reference, in the paper I also include exhaustive comparison with other standard methods for graph coarsening (heavy edge, kron reduction, affinity, algebraic distance) that can be found in the literature.

## What about the signals?

Finally, let’s inspect how coarsening affects the information defined on the nodes of a graph (i.e., *graph signals* in the GSP lingo). The code is as follows:

size = 2.04; fig, axes = plt.subplots(2, 2, figsize=(2*size*4, 2*3*size)); lineWidth = 1 # a random smooth signal x = G.U[:,:k] @ np.random.randn(k,1) x = x / np.linalg.norm(x) G.plot_signal(x, ax=axes[0][0], title='signal') # coarsen it xc = coarsen_vector(x, C) Gc.plot_signal(xc, ax=axes[0][1], title='coarsened signal') # lift it xp = lift_vector(xc, C) G.plot_signal(xp, ax=axes[1][0], title='lifted signal') # reconstruction error G.plot_signal(np.abs(x-xp), ax=axes[1][1], title='reconstruction error') print('signal error: {}'.format(np.linalg.norm(x - xp)))

Again, I show only the results for the neighborhood-based method.

Minnesota graph:

signal error: 0.022988891880551716

Airfoil graph:

signal error: 0.020485495357371118

signal error: 0.05260328950018943

From the small reconstruction error it can be deduced that smooth signals are almost unaffected by coarsening. This is an indirect consequence of optimizing restricted similarity. Indeed, the signal of interest is unknown at coarsening time.