The goal of the graph data processing (GDP) course is to provide a broad introduction to graph algorithms in data science and machine learning.
The course features two modules:
The network science module. Here we look as the analysis of networks from the perspective of network science. The lectures will follow chapters 2, 3, 4, 5, and 10 from Barabasi’s Network Science (NS) book. Some questions we will answer are:
- How are networks formed?
- Can we model them effectively and what network properties can be predicted accurately?
- What is the role of hubs in information spreading and why are epidemics so difficult to contain?
The learning on graphs module. A major effort will be given to show that existing data analysis techniques can be extended (and enhanced) on graphs. To do so, we will present strong mathematical tools based on linear and non-linear graph spectral harmonic analysis. Some key points covered are:
- How can graphs combat the curse of dimensionality?
- Why is the graph spectrum important?
- How to solve graph problems related to unsupervised and supervised learning, recommendation, visualization?
The course is given in the University of Rennes in the fall of 2018. It builds on the EPFL course ‘A Network Tour of Data Science‘ given by Prof. Pascal Frossard and Prof. Pierre Vandergheynst in 2017 (and earlier by Xavier Bresson), as well as on Prof. Barabási’s class on Network Science.
- keywords: learning on graphs, graph-structured data, graph signal processing; deep learning on graphs
- prerequisites: graph theory, basic probability theory & algorithms, linear algebra, familiarity with python
- instructor: Dr. Andreas Loukas
- advisor: Prof. Pierre Vandergheynst
Course schedule & syllabus
Lectures will take place every Tuesday 10:15-12:15 and Thursday 16:15-18:15 in room E110, building B02B at Rennes.
The topics covered will be
- 04/12/2018: Introduction (read chapter 2 of the NS book)
- 06/12/2018: Random graphs (read chapter 3 of the NS book)
- 11/12/2018: Scale-free networks (read chapter 4 of the NS book)
- 13/12/2018: Network formation models (read chapter 5 of the NS book)
- 18/12/2018: Spreading phenomena and network epidemics (read chapter 10 of the NS book, skip 10.D)
- 10/12/2018: Elements of spectral graph theory (chapters 1 (up to and including 1.4) and 2 (up to and including 2.3) from Fan Chung’s lectures.)
- 08/01/2019: Unsupervised learning with graphs: spectral clustering (slides, read von Luxburg’s tutorial.)
- 10/01/2019: Unsupervised learning with graphs: dimensionality reduction (slides, papers on ISOMAP, Laplacian Eigenmaps, LLE and t-SNE)
- 15/01/2019: Regularization on graphs with graph signal processing (slides, for GFT see relevant sections of 1/2, for filtering read this)
- 17/01/2019: Project presentation
- 22/01/2019: Supervised learning on graphs with deep learning (slides)
- 24/01/2019: Written exam
Final grade = 40% team project grade + 10% individual assignments grade + 50% written exam grade.
If you are motivated to learn more, in addition to the project papers, you may consider the following references:
- The “Spectral Graph Theory” chapter from David Spielman’s book.
There are three in total:
- Random graphs. Deadline: 10/12/2018
- Parsing, visualizing and testing scale-free networks. Deadline: 19/12/2018
- Dimensionality reduction with graphs. Deadline: 29/01/2019
Please send your answers by email. I will only accept submissions sent by midnight.
In a nutshell: the objective is to grasp, implement, and present the essence of a state-of-the-art paper in the field.
- Teams of 2.
- Each team selects one subject (see list below).
- Give a 15 minute presentation that explains main ideas during lecture on 17/01/2019.
- Implement and evaluate proposed method with real data. Send code reproducing experiments, with proper documentation. Tentative deadline: midnight of 30/01/2019.
- An in depth critical treatment of the core idea is expected, rather than an exhaustive account. Some brief guidelines are provided next to each subject. The teams are expected to present their implementation plan, along with any preliminary results, during the presentation.
- Present your findings in a blog style Jupyter notebook, including an easily accessible description of the code and performed experiments. Send the code by email.
- Feel free to use available libraries for common routines (e.g., graph filters, optimization solvers). The main method examined however should be coded from scratch (even if a pre-existing implementation exists). The grading will focus (i) the correctness/quality of the implementation, as well as (ii) on the depth of the evaluation.
- The teams are highly recommended to use real data for their experiments (see resources). Creative applications are appreciated.
- Graph sparsification by effective resistances. Is it possible to make graph sparser without affecting its spectrum? An implementation of the sparsification algorithm is expected, along with tests on real data.
- Local clustering with heat kernel pagerank. This paper gives an algorithm that finds small clusters in sublinear time. It is a theoretical work with strong practical implications. Only an implementation/evaluation of the simple heat kernel pagerank algorithm is expected, skipping the approximation argument.
- Sparse inverse covariance estimation with the graphical lasso. There is a strong connection between the task of learning a graph from features and Gaussian models. This algorithm is simple to implement given basic familiarity with convex optimization frameworks. A good evaluation is a must here.
- Graph kernels. Can two graphs of different sizes be compared? An implementation and evaluation of the random walk graph kernel is expected. For benchmark datasets specific to this problem, see here.
- Compressive spectral clustering. A fast clustering algorithm using graph filters and randomness. The project should focus primarily on the approximate spectral embedding obtained by combining randomness and graph filters. An examination of the compression step is optional.
- node2vec: scalable feature learning for networks. Adapt ideas from natural language processing (i.e., word embeddings) in order to map nodes into points in some high dimensional space.
- In certain cases, I will allow teams to choose their own subjects. If you are interested in this option, present to me your suggestion after class.
Subjects will be assigned to teams in a first come first serve basis. To claim yours, please announce your preference by replying to the post below.