Andres Masegosa Arredondo, University of Almeria

Probabilistic Programming with Deep Neural Networks


Probabilistic programming is currently a very active research topic. A main motivation has been two significant flaws of current deep learning systems, which are limiting their adoption in many industries and in other research fields. The first is the lack of a natural way to quantify the inherent uncertainty of predictions, together with the lack of transparency in these predictions. The second is that deep learning models have mainly been applicable to supervised settings, which require huge manually labelled data sets that are not always easily available.

In recent years, researchers have tried to address these issues by merging deep learning with Bayesian statistics, giving rise to a new field of research known as Bayesian Deep Learning. In this field, models contain random variables whose relationships are defined by complex non-linear functions in the form of deep neural networks. These models can be naturally defined and learned using modern probabilistic programming languages (PPLs), such as those recently released by Google (TensorFlow Probability) and Uber (Pyro). In these lectures, we will cover the main elements of modern probabilistic programming languages and how they make it possible to define modern Bayesian deep learning models. The talks will include plenty of hands-on exercises in Python notebooks to illustrate the presented methods.

Tentative Outline:

  • Lesson 1 (90 mins). Introduction to PPLs.
  • Lesson 2 (90 mins). Bayesian inference with PPLs.
  • Lesson 3 (90 mins). Bayesian Neural Networks.
  • Lesson 4 (90 mins). Deep Generative Models.


Aymeric Dieuleveut, ENS Paris, Inria

Large-scale machine learning and convex optimization

Modern machine learning methods involve learning in high dimensional spaces and utilize tremendous amounts of data: these two factors alone, even for simple models, have resulted in new challenges and given birth to new algorithms over the last 20 years. The goal of this course is to describe the large scale supervised learning context and to review methods and algorithms used in such a setting, especially in convex optimization settings. We will focus on stochastic gradient methods and algorithms that come with convergence guarantees.

Starting from the traditional statistical analysis, we will introduce classical methods for convex optimization, and describe convergence properties of algorithms under different regularity assumptions. We will also focus on variance reduced methods and convergence rates in distributed architectures.
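
As a rough illustration of the stochastic gradient methods mentioned above (not taken from the course material), the following sketch runs SGD with a decaying step size on a one-dimensional least-squares problem. The synthetic data, the function name `sgd_least_squares`, and the step-size schedule are assumptions chosen for the example.

```python
import random


def sgd_least_squares(data, steps, step_size=0.5, seed=0):
    """Minimize the average of 0.5 * (w * x - y) ** 2 over `data` with SGD.

    Hypothetical helper for illustration only. Each iteration uses the
    gradient of a single randomly sampled term of the finite sum.
    """
    rng = random.Random(seed)
    w = 0.0
    for t in range(1, steps + 1):
        x, y = rng.choice(data)            # sample one observation
        grad = (w * x - y) * x             # gradient of the sampled loss term
        w -= step_size / t ** 0.5 * grad   # step size decays as 1 / sqrt(t)
    return w


# Synthetic data (assumed): y = 3 * x + small Gaussian noise.
data_rng = random.Random(1)
data = []
for _ in range(200):
    x = data_rng.uniform(-1.0, 1.0)
    data.append((x, 3.0 * x + data_rng.gauss(0.0, 0.1)))

w_hat = sgd_least_squares(data, steps=5000)
print(f"estimated slope: {w_hat:.3f}")
```

The objective here is convex in `w`, so the iterates approach the least-squares solution, which for this data lies close to the true slope of 3.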

Tentative Outline:

  • Lesson 1 (90 mins). Statistical and optimization tools.
  • Lesson 2 (90 mins). Stochastic Approximation.
  • Lesson 3 (90 mins). Finite sum problems and variance reduction.
  • Lesson 4 (90 mins). Distributed Optimization.


Julia Ive, University of Sheffield / Imperial College London

Machine Translation

Summary: Machine Translation (MT) has recently experienced a paradigm shift and is now dominated by neural models. In this course, after a brief historical overview, we will start with a primer on neural language modelling with Recurrent Neural Networks (RNNs). We will then look into the state-of-the-art (SOTA) sequence-to-sequence models for Neural MT (NMT) based on RNNs, also covering important concepts such as attention and beam search. Finally, we will highlight the principles behind the current SOTA Transformer model, which replaces RNNs with attention. A lecture on advanced NMT topics will provide an overview of current NMT problems: low-resource NMT, robust NMT, discourse-level MT, and multimodal MT. The final lecture on MT evaluation will cover automatic and human evaluation, error analysis, and quality estimation.

Tentative outline:

  • Lesson 1 (90 mins): Machine Translation in its historical perspective. Neural Language Modelling.
  • Lesson 2 (90 mins): Primer on Neural Machine Translation: Models based on Recurrent Neural Networks. Transformer-based models.
  • Lesson 3 (90 mins): Neural Machine Translation: Advanced Topics.
  • Lesson 4 (90 mins): Evaluation of Machine Translation.


Paolo Napoletano, University of Milano-Bicocca

Computer Vision

Summary: The aim of this course is to provide the main concepts of computer vision, ranging from image formation, feature extraction, image classification and retrieval to deep learning with convolutional neural networks.
The course includes a discussion of some case studies and a demo session.

Tentative outline:

  • Lesson 1 (90 mins): Introduction to Computer Vision; Image formation; Feature Extraction.
  • Lesson 2 (90 mins): Image Classification, Image Retrieval.
  • Lesson 3 (90 mins): Convolutional Neural Networks.
  • Lesson 4 (90 mins): Case studies and demo session.


Pedro Larrañaga, Technical University of Madrid

Bayesian Networks: From Theory to Practice

Summary: Bayesian networks are a kind of probabilistic graphical model with two main components: a directed acyclic graph representing conditional independencies among triplets of variables, and a set of conditional probability distributions, one for each variable given its parents in the graph. These models constitute a paradigm for explainable and transparent machine learning, a hot topic in today's artificial intelligence.

During the course we will see how a joint probability distribution can be factorized by means of conditional probability distributions, using the structure of the graph. Different inference methods will be presented: exact inference (brute force approach, variable elimination, and message-passing) as well as approximate inference (probabilistic logic sampling). The problem of how to learn a Bayesian network from data will be presented with two types of methods: methods based on testing conditional independencies and methods based on score and search. Different types of Bayesian classifiers (from naive Bayes to Bayesian multinets) will be shown. The use of Bayesian networks for probabilistic clustering (based on the EM algorithm) will also be introduced.
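
To make the factorization and the brute force approach to exact inference concrete, here is a minimal sketch on a three-variable network (Rain → Sprinkler, and both → GrassWet). The network, the probability values, and all names are assumptions made up for illustration; they do not come from the course material.

```python
from itertools import product

# Conditional probability tables (illustrative values, assumed).
p_rain = {True: 0.2, False: 0.8}
# p_sprinkler_given_rain[rain][sprinkler]
p_sprinkler_given_rain = {
    True: {True: 0.01, False: 0.99},
    False: {True: 0.4, False: 0.6},
}
# P(GrassWet = True | sprinkler, rain), keyed by (sprinkler, rain)
p_wet_true = {
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}


def joint(rain, sprinkler, wet):
    """Joint probability from the graph factorization:
    P(R, S, W) = P(R) * P(S | R) * P(W | S, R)."""
    p_wet = p_wet_true[(sprinkler, rain)]
    return (
        p_rain[rain]
        * p_sprinkler_given_rain[rain][sprinkler]
        * (p_wet if wet else 1.0 - p_wet)
    )


def posterior_rain_given_wet():
    """P(Rain = True | GrassWet = True) by enumerating the joint."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    evidence = sum(
        joint(r, s, True) for r, s in product((True, False), repeat=2)
    )
    return numerator / evidence


posterior = posterior_rain_given_wet()
print(f"P(rain | wet grass) = {posterior:.4f}")
```

Enumeration sums over every configuration of the unobserved variables, so its cost grows exponentially with the number of variables; this is the cost that variable elimination and message-passing aim to reduce by exploiting the graph structure.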

The recent use of Bayesian networks in challenging real-world applications will be presented in three different areas: neuroscience, Industry 4.0, and sports analytics. Neuroscience applications will cover problems at different scales: from neuroanatomy questions, such as interneuron classification and spine clustering, to the diagnosis of the neurodegenerative Parkinson's and Alzheimer's diseases. The applications in Industry 4.0 will be related to the automatic inspection of a laser process and the discovery of fingerprints in real machinery performing servo-motor movements. Finally, the scouting problem and football as a science will be introduced as representatives of sports analytics.

Tentative outline:

  • Lecture 1 (90 minutes): Basics. Exact and approximate inference.
  • Lecture 2 (90 minutes): Learning from data: detecting conditional independencies and score+search methods.
  • Lecture 3 (90 minutes): Bayesian classifiers and multi-dimensional classification.
  • Lecture 4 (90 minutes): Applications in neuroscience, industry 4.0, and sports.


Björn Schuller, Imperial College London and University of Augsburg

Deep Learning for Signal Analysis


Automatic signal analysis, such as for audio, video, physiological, and other signal interpretation, is currently witnessing a major shift from traditional pre-processing and representation to increased usage of Deep Learning approaches. In this lecture series, we will cover methods that first denoise the signal parts of interest and learn suitable representations, both using convolutional neural networks including attention mechanisms. For the actual decision making, recurrent neural networks enhanced by memory, such as long short-term memory or gated recurrent units, will follow. Connectionist temporal classification will cater for the dynamics in the considered signals. Together, this will lead to end-to-end learning from raw multisensorial and multimodal signals. However, as the topology of the networks still plays a crucial role, automatic machine learning will next be introduced to ease this bottleneck, moving towards completely automated learning from raw data alongside labels of arbitrary signal types. As this requires larger amounts of data, we will also deal with suitable means of transfer and cross-modal learning as well as data augmentation and generation. The latter includes the discussion of generative adversarial networks and variational autoencoders. In addition, we will touch upon efficient ways of integrating humans into the learning process in two ways: as labellers, by active, semi-supervised, and cooperative learning; and as users of such technology, by deep reinforcement learning. Finally, we shall investigate means of “green” and explainable Deep Learning for real-world application. The introduction of suitable open-source toolkits will accompany the lectures.

Tentative Outline:

  • Lesson 1 (90 mins). Deep Signal Denoising and Representation Learning.
  • Lesson 2 (90 mins). Deep Decision Making and Network Design.
  • Lesson 3 (90 mins). Data Efficiency Methods for Deep Signal Analysis.
  • Lesson 4 (90 mins). Deep Learning Efficiency and Explainability.


Manuel Gomez Rodriguez, MPI-SWS, Saarland

Social Network Analysis


In recent years, there has been an increasing effort to develop realistic representations and models, as well as learning, inference, and control algorithms, to understand, predict, and control dynamic processes over social and information networks. This has been in part due to the increasing availability and granularity of large-scale social activity data, which allows for data-driven approaches with unprecedented accuracy. In this course, you will first learn how to use the theory of temporal point processes to create realistic representations and models for a wide variety of dynamic processes in social and information networks. Then, you will be introduced to several inference and control problems of practical importance in the context of dynamic processes over networks, and learn about state-of-the-art machine learning algorithms to solve these problems.
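
As a small illustration of a temporal point process defined through its intensity function, the sketch below simulates a self-exciting (Hawkes) process with an exponential kernel using a thinning procedure. The choice of process, the parameter values, and the function names are assumptions for illustration and are not taken from the course material.

```python
import math
import random


def hawkes_intensity(t, events, mu, alpha, beta):
    """Intensity lambda(t) = mu + alpha * sum over past events s < t
    of exp(-beta * (t - s))."""
    return mu + alpha * sum(
        math.exp(-beta * (t - s)) for s in events if s < t
    )


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times on [0, horizon) by thinning.

    Between events the intensity only decays, so its value just after
    the current time is a valid upper bound for proposing the next point.
    """
    rng = random.Random(seed)
    events = []
    t = 0.0
    while True:
        # Upper bound: intensity just after t (includes an event at t itself).
        bound = mu + alpha * sum(math.exp(-beta * (t - s)) for s in events)
        t += rng.expovariate(bound)        # candidate next event time
        if t >= horizon:
            return events
        # Accept the candidate with probability lambda(t) / bound.
        if rng.random() * bound <= hawkes_intensity(t, events, mu, alpha, beta):
            events.append(t)


# Assumed parameters; alpha / beta < 1 keeps the process stable.
events = simulate_hawkes(mu=1.0, alpha=0.5, beta=1.0, horizon=50.0)
print(f"simulated {len(events)} events")
```

Each accepted event raises the intensity by `alpha`, which makes further events temporarily more likely; this self-excitation is one way such models capture bursts of activity.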

Tentative Outline:

  • Lesson 1 (90 mins). Introduction to temporal point processes (I): intensity function and basic types of temporal point processes.
  • Lesson 2 (90 mins). Introduction to temporal point processes (II): marks and dynamical systems with jumps.
  • Lesson 3 (90 mins). Models of social and information systems.
  • Lesson 4 (90 mins). Control and reinforcement learning of social and information systems.