# Uncovering the structure of an epidemic

An epidemic is spreading fast from airport to airport (not really). Scientists have been trying to create a map of the spread to study and prevent. They have been attempting to find the day that each airport hosted an infected individual. Still, they suspect that their measurements are imprecise. How can they denoise their data?

To have fun I decided to simulate this scenario. One interesting dataset I stumbled upon is the global flight network, depicting 2980 airports and their flight interconnections.  The network comprises of a giant component covering the major world airports – if it wasn’t so, flying would be much more cumbersome. Network check! How about the epidemic? Well, studying the spread of diseases is a well established scientific topic both mathematically and numerically. To simulate my epidemic, I opted to use the simple (-istic) SIR model, standing for Suspected Infected Recovered.  The results are shown below.

The video shows how the disease travels from the core of the network to those airports that are hard to get to. Here is a map view:
It might be more interesting however to consider the visualize the time (in days) until infection. In the picture below, blue nodes are infected latter than red ones (left). In addition, the figure on the right depicts the same data but this time corrupted with uniform noise – this is what the scientists know.
The objective is to identify the real signal (on the left) from noisy data (on the right). The key insight we exploit to do the denoising is that, since the epidemic spreads from airport to airport, it results in a smooth signal on the flight network, meaning that well connected airports will have similar infection times. Our task is then to distinguish between smooth and non-smooth signals on the network, an ideal task for graph filters. A graph filter is very similar to a standard filter: it processes the high dimensional data so as to remove high frequency variations attributed to noise. The only difference being that it takes into account the topology of the underlying space where the epidemic occurs, i.e., the network. For more information on graph filters, I urge you to check [The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains, Distributed Autoregressive Moving Average Graph Filters]. A  visual comparison between the real (left) noisy (middle) and the denoised signal (right) is shown below:
If you look really closely you might observe that the graph filter eliminated some of the details of the real epidemic. Nevertheless, the denoised signal is pretty close to the real epidemic.