This blog post gives an overview of the paper “*Extrapolating paths with graph neural networks*” by Jean-Baptiste Cordonnier and Andreas Loukas (github code repository). The work was kindly supported by the Swiss National Science Foundation.

Lately, we have been thinking about the following question:

*Can a graph neural network learn to extrapolate natural paths from examples?*

The word ‘natural’ here implies that we are not looking for the shortest or otherwise optimal path based on some graph-theoretic criterion. Rather, we consider paths that occur as a by-product of the interaction of an *agent* with a *graph*: a driver on the transportation graph, an information seeker in a knowledge graph, or a client in an online shop.

It is reasonable to assume that natural paths follow predictable patterns. For instance, when I take an excursion into the city I tend to fall into habitual patterns, favoring a few familiar roads and straying little. Somewhat similarly, the way I navigate from article to article in a Wikipedia reading session might be complicated, but it is usually not random: consciously or not, my clicks are determined by a complex interaction between my choices (as constrained by the Wikipedia graph and its features) and my beliefs about how interesting or relevant these choices are.

## The path inference problem

I will return to this example later on. Before that, let me define the associated learning problem more concretely:

**The path inference problem:** given a graph and a path prefix, i.e., a partially observed sequence of nodes, we want to predict which nodes are in the missing suffix.
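To make the setup concrete, here is a minimal sketch of the problem's input and output (the adjacency dict, node names, and `split_path` helper are illustrative, not taken from the paper's code): an observed path on a graph is split into a prefix the model sees and a suffix it must predict.

```python
# Illustrative setup for path inference: a graph as an adjacency dict,
# and an observed path split into (prefix, suffix).
graph = {
    "a": ["b"], "b": ["a", "c", "d"],
    "c": ["b", "e"], "d": ["b"], "e": ["c"],
}

def split_path(path, prefix_len):
    """Split an observed node sequence into an observed prefix and a hidden suffix."""
    return path[:prefix_len], path[prefix_len:]

path = ["a", "b", "c", "e"]
# Sanity check: every consecutive pair is an edge, so this is a valid walk.
assert all(v in graph[u] for u, v in zip(path, path[1:]))

prefix, suffix = split_path(path, prefix_len=2)
# prefix == ["a", "b"]; the model must predict suffix == ["c", "e"]
```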

Path inference problems are demanding because natural paths tend to differ qualitatively from shortest paths. The choice of the agent at every step may depend on a number of factors, such as the structural properties of the graph and the conceptual similarity of nodes as perceived by the agent. At the same time, some form of directionality is involved in path formation, in the sense that an agent’s actions can be conditioned on the entire history of its trajectory. Making a parallel to Euclidean domains, a path can be thought of as ‘*straight*’ when the agent moves towards nodes that are far from where it was in the past and ‘*circular*’ when it returns to its starting position. Contrary to our geometric intuition, however, the space here is non-Euclidean, and what is far or close should be determined in light of the graph structure.

*From a deep learning perspective, the main challenge we face is reconciling graph-based approaches with sequential data.*

Graph convolutional networks (GCN) have exhibited a measure of success at predicting properties of nodes (e.g., the category of a Wikipedia article or the average traffic flow in a road network) or of the graph as a whole (e.g., the solubility of a molecule or the functional similarity of two proteins). The isotropic nature of graph convolution however renders it a poor fit for sequential data. On the other hand, sequence prediction problems are typically solved with Recurrent Neural Networks (RNN). These are ideal for pure sequences, such as sentences or time-series, but do not take into account the graph structure.
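The isotropy issue can be seen in a toy computation (a sketch, not any particular GCN library's layer): one neighborhood-averaging propagation step weighs all neighbors equally, so the result at a node is symmetric in its neighbors and cannot tell "where the agent came from" apart from "where it is heading".

```python
import numpy as np

# Path graph 0 - 1 - 2, written as an adjacency matrix.
A = np.array([
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
], dtype=float)

# Row-normalised averaging operator, a simple stand-in for one
# isotropic graph-convolution step.
P = A / A.sum(axis=1, keepdims=True)

x = np.array([[1.0], [0.0], [0.0]])  # one-hot signal: agent seen at node 0
h = P @ x

# Node 1 receives weight 1/2 from each side; putting the agent at
# node 2 instead of node 0 would yield the exact same value at node 1,
# so direction of travel is lost.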

## Finding paths with Gretel

Gretel is a graph neural network that is capable of encoding the directionality of paths. In essence, it is a path generative model that is trained (end-to-end) to maximize the log-likelihood of true suffixes (or, alternatively, to predict the agent’s position after a given number of hops).
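The training objective can be sketched in a few lines (illustrative only; `suffix_nll` and its interface are made up for this post): maximizing the log-likelihood of the true suffix is the same as minimizing the negative log-probability the model assigns to each node the agent actually visits.

```python
import math

def suffix_nll(step_probs):
    """Negative log-likelihood of a suffix, given the model's
    probability for the true next node at each hop."""
    return -sum(math.log(p) for p in step_probs)

# A model that is confident about the observed suffix incurs a lower loss.
confident = suffix_nll([0.9, 0.8])
uncertain = suffix_nll([0.5, 0.5])
assert confident < uncertain
```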

The network uses an encoder-decoder architecture, a simplification of which is described next:

**Encoder.** Gretel encodes the directionality of an observed path prefix by transforming the input graph into a *latent graph*.

Though the latent graph shares the same vertex and edge sets as the original graph, its edges are re-weighted so as to point towards the directions the agent is most likely to follow. The re-weighting is computed by a multi-layer graph neural network with learnable parameters (under the hood, we combine multiple edge transformers with graph convolution).
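The re-weighting idea can be sketched as follows (a simplification under my own assumptions, not the paper's architecture; the edge scores would come from the GNN, here they are hard-coded): keep the topology, assign each directed edge a score, and normalise scores over each node's outgoing edges so they form a transition distribution pointing in the inferred direction of travel.

```python
import math

# Directed edges of a tiny graph a - b - c, plus hypothetical scores
# standing in for the GNN's output (higher = more aligned with the path).
edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")]
score = {("a", "b"): 2.0, ("b", "a"): -1.0, ("b", "c"): 1.0, ("c", "b"): 0.0}

def softmax_out(edges, score):
    """Softmax-normalise edge scores over each node's outgoing edges."""
    weights = {}
    for u in {e[0] for e in edges}:
        out = [e for e in edges if e[0] == u]
        z = sum(math.exp(score[e]) for e in out)
        for e in out:
            weights[e] = math.exp(score[e]) / z
    return weights

w = softmax_out(edges, score)
# Out of node b, the forward edge (b, c) now dominates the backward
# edge (b, a): the latent graph "points" along the path.
```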

**Decoder.** We then approximate the likelihood of any suffix by the *graph-dependent model*

which sees instead of . In particular, candidate suffixes are generated by a *non-backtracking random walk* on the latent graph . This comes with a number of benefits: Inference can be done efficiently and in closed-form. In addition, the network can be trained to estimate the true path likelihood from very little data.
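As a sketch of the sampling scheme (illustrative of a non-backtracking walk in general, not Gretel's closed-form inference), the walker picks its next node in proportion to the latent edge weights, but may never immediately return to the node it just left:

```python
import random

def nb_walk(weights, start, steps, rng):
    """Non-backtracking random walk.
    weights: {(u, v): w} over directed edges of the (latent) graph."""
    path = [start]
    prev = None
    for _ in range(steps):
        u = path[-1]
        # Candidate moves: outgoing edges of u, excluding the node we just left.
        out = [(v, w) for (s, v), w in weights.items() if s == u and v != prev]
        if not out:
            break  # dead end: walk stops
        total = sum(w for _, w in out)
        r, acc = rng.random() * total, 0.0
        for v, w in out:
            acc += w
            if r <= acc:
                path.append(v)
                break
        prev = u
    return path

# On the path graph a - b - c, forbidding backtracking forces a -> b -> c.
weights = {("a", "b"): 1.0, ("b", "a"): 1.0, ("b", "c"): 1.0, ("c", "b"): 1.0}
walk = nb_walk(weights, "a", steps=2, rng=random.Random(0))
# walk == ["a", "b", "c"]
```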

## A toy scenario: *can Gretel learn a straight path?*

We constructed a toy experiment to qualitatively test whether Gretel has the capacity to capture directionality in the Euclidean sense.

Specifically, we generated straight trajectories on a random graph built to approximate a plane. The trajectories were obtained by mapping straight lines to the closest nodes and sub-sampling the resulting path. The task is non-trivial, as the algorithm is not given the positions of the nodes. In addition, the graph differs from a regular grid and does not offer a good approximation of the underlying Euclidean space.
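The trajectory-generation step can be sketched as follows (a simplification with made-up names; the experiment uses a random planar-like graph, whereas this sketch snaps to a regular grid for brevity): sample points along a line segment and map each to its nearest node, dropping consecutive duplicates.

```python
def snap_line_to_nodes(p0, p1, n_samples, nodes):
    """Map a straight segment from p0 to p1 onto the closest nodes."""
    path = []
    for i in range(n_samples):
        t = i / (n_samples - 1)
        x = p0[0] + t * (p1[0] - p0[0])
        y = p0[1] + t * (p1[1] - p0[1])
        nearest = min(nodes, key=lambda n: (n[0] - x) ** 2 + (n[1] - y) ** 2)
        if not path or nearest != path[-1]:  # drop consecutive duplicates
            path.append(nearest)
    return path

# Illustrative node set: a regular 5x5 grid of planar positions.
nodes = [(i, j) for i in range(5) for j in range(5)]
traj = snap_line_to_nodes((0.1, 0.2), (3.9, 3.8), n_samples=8, nodes=nodes)
# The resulting path runs diagonally from (0, 0) to (4, 4); sub-sampling
# it (e.g. keeping every other node) yields the training trajectories.
```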

Four typical runs of the experiment are shown above. It can be seen that most of the probability mass (red disks) of the predicted distribution is concentrated close to the target. Moreover, as intended, the direction of the trajectory is encoded into the edge weights of the latent graph, despite the sampling irregularity (a black arrow indicates the most significant edge at each node). It is intriguing to observe that, whereas in the top-left figure Gretel does not correctly identify the target, the neural network’s answer presents a visually plausible alternative.

In the next blog post, I will share some insights from the behavior of Gretel on a much more interesting (and difficult) task: predicting the choices of human players in the game of Wikispeedia!