UAI 2024 - Tutorials
1. An introduction to Performative Prediction
Celestine Mendler-Dünner, Tijana Zrnic
Predictions in the social world generally influence the target of prediction, a phenomenon known as performativity. Self-fulfilling and self-negating predictions are examples of performativity. Of fundamental importance to economics, finance, and the social sciences, the notion has long been absent from the development of machine learning. In machine learning applications, performativity surfaces as distribution shift. A predictive model deployed on a digital platform, for example, influences consumption and thereby changes the data-generating distribution. We will give an introduction to the recently established area of performative prediction, which provides a definition and conceptual framework for studying performativity in machine learning. A consequence of performative prediction is a natural equilibrium notion that gives rise to new optimization challenges. Another consequence is a distinction between learning and steering, two mechanisms at play in performative prediction. The notion of steering is in turn intimately related to questions of power in digital markets. Throughout, we will focus on presenting the key technical results in performative prediction and highlight connections to statistics, game theory, and causality. Finally, we end with a discussion of future directions, such as the role that performativity plays in contesting algorithmic systems.
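The equilibrium notion the abstract mentions can be made concrete with a toy sketch. The following is an illustrative assumption, not material from the tutorial: a one-dimensional mean-estimation problem in which deploying a model theta shifts the data distribution linearly, and repeated risk minimization (retraining on the distribution the current model induces) converges to a performatively stable point.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, eps = 1.0, 0.5  # base mean and strength of performative feedback (illustrative)

def sample(theta, n=100_000):
    # Deploying the model theta shifts the data: D(theta) = N(mu + eps*theta, 1).
    return rng.normal(mu + eps * theta, 1.0, size=n)

# Repeated risk minimization: retrain on the distribution the current model induces.
theta = 0.0
for _ in range(30):
    theta = sample(theta).mean()  # squared-loss minimizer on D(theta) is its mean

# A performatively stable point solves theta = mu + eps*theta, i.e. theta = mu/(1-eps) = 2.
```

Note that this fixed point is performatively *stable* (optimal against the distribution it itself induces), which in general differs from the performatively *optimal* model; the iteration contracts here because the feedback strength eps is below one.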
[slides]

2. Causal Graphical Methods For Handling Nonignorable Missing Data
Razieh Nabi
It is often said that the fundamental problem of causal inference is a missing data problem. The focus of this course is on the implications of the converse view: that missing data problems are a form of causal inference. We start off by providing a quick overview of classical approaches to missing data and then redefine missing data models using the terminology of causal models, where missingness indicators are viewed as treatment variables that can be intervened on, and the underlying variables are viewed as counterfactuals, i.e., the values we would have recorded had we (possibly contrary to fact) been able to observe them. We focus on missing data models whose restrictions may be represented by a factorization of the full data law with respect to a directed acyclic graph (DAG) with potentially latent variables. In addition to providing concise representations of statistical models, missing data DAGs (m-DAGs) illustrate the causal mechanisms responsible for missingness and provide a natural interpretation of such mechanisms in applied settings. Using the m-DAG representations, we discuss various approaches for identifying a given target parameter as a function of the observed data distribution. We then discuss how to carry out non/semi-parametric inference for the identified targets, primarily under missing-not-at-random mechanisms. Finally, we introduce methods for conducting sensitivity analysis and illustrate, via a case study, how these methods can be employed in practice.
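To give one concrete instance of the classical approaches the course reviews, the sketch below (the variable names, logistic missingness model, and numbers are illustrative assumptions, not the course's example) contrasts a complete-case mean with an inverse-probability-weighted mean when the missingness indicator depends only on a fully observed covariate, i.e., a missing-at-random mechanism.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.normal(size=n)             # fully observed covariate
y = 2.0 * x + rng.normal(size=n)   # outcome with true mean 0

# Missing at random: the missingness indicator R depends only on observed X.
p_obs = 1.0 / (1.0 + np.exp(-x))   # P(R = 1 | X), a logistic propensity
r = rng.random(n) < p_obs

naive = y[r].mean()                # complete-case mean: biased, since R and Y share X
ipw = np.mean(r * y / p_obs)       # inverse-probability weighting: consistent for E[Y]
```

Under missing-not-at-random mechanisms, the focus of the course, P(R = 1 | ...) is not identified from observed data without further structure, which is where the m-DAG identification machinery comes in.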
[slides]

3. Distribution-Free Predictive Uncertainty Quantification: Strengths and Limits of Conformal Prediction
Aymeric Dieuleveut, Margaux Zaffran
The goal of this tutorial is to give a detailed and rigorous introduction and comprehensive overview of the booming field of conformal prediction. In particular, the focus is to extensively discuss methods, theoretical results, and practical trade-offs in order to enable participants to efficiently and purposefully leverage those techniques within their own fields and applications.
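To give a flavor of the techniques, here is a minimal split conformal regression sketch (the toy sinusoidal data, polynomial base model, and alpha = 0.1 are assumptions for illustration; any black-box predictor could be substituted):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return np.sin(3 * x)

# Toy data, split into a fitting half and a calibration half.
x = rng.uniform(-1, 1, 2000)
y = f(x) + 0.1 * rng.normal(size=2000)
x_fit, y_fit = x[:1000], y[:1000]
x_cal, y_cal = x[1000:], y[1000:]

coef = np.polyfit(x_fit, y_fit, deg=5)   # stand-in "trained model"

def predict(t):
    return np.polyval(coef, t)

# Calibration: take the ceil((n+1)(1-alpha))-th smallest absolute residual.
alpha = 0.1
scores = np.sort(np.abs(y_cal - predict(x_cal)))
q = scores[int(np.ceil((len(scores) + 1) * (1 - alpha))) - 1]

# Intervals [prediction - q, prediction + q] cover a fresh exchangeable point
# with probability at least 1 - alpha; check marginal coverage empirically.
x_new = rng.uniform(-1, 1, 5000)
y_new = f(x_new) + 0.1 * rng.normal(size=5000)
coverage = np.mean(np.abs(y_new - predict(x_new)) <= q)
```

The guarantee is marginal over the data, requiring only exchangeability and no assumption on the base model; its practical trade-offs (data splitting, conditional coverage) are among the limits the tutorial discusses.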
[slides]

4. Geometric Probabilistic Models
Viacheslav Borovitskiy, Alexander Terenin
In applications from drug design to robotics, recent advances in geometric deep learning have demonstrated the value of having specialized methods beyond R^d. Deep learning, however, tends to require a lot of data and makes it non-trivial to quantify uncertainty in a way that leads to efficient decision-making, which motivates a need for complementary technical capabilities. In this tutorial, we describe geometric counterparts of data-efficient probabilistic modeling techniques, which are effective in small-data settings and enable uncertainty to be quantified in a geometry-compatible way. We focus mainly on Gaussian processes and related techniques such as Bayesian neural networks. These methods can be used to power automated decision-making systems such as Bayesian optimization, which are used in important applications such as molecular optimization and robotic policy tuning---areas where data efficiency is key and geometric properties such as symmetries play a fundamental role. This research area has been developing rapidly and is starting to become mature enough that comprehensive software packages are available---we will therefore cover both the theory and practical implementation of these methods through software demonstrations, and conclude by showcasing a number of emerging applications.
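As a minimal taste of geometry-compatible modeling (a simplified stand-in for the manifold kernels the tutorial covers; the data, lengthscale, and noise level are illustrative assumptions), the sketch below runs Gaussian process regression on the circle S^1 with a kernel that depends on angles only through a periodic distance, so the posterior agrees at 0 and 2*pi by construction.

```python
import numpy as np

def kernel(a, b, length=1.0):
    # Exponentiated sine-squared kernel: depends on the angles only through
    # sin((a - b) / 2), hence is well-defined on the circle; k(x, x) = 1.
    d = a[:, None] - b[None, :]
    return np.exp(-2.0 * np.sin(d / 2.0) ** 2 / length**2)

rng = np.random.default_rng(0)
theta_train = np.linspace(0, 2 * np.pi, 25, endpoint=False)
y_train = np.cos(theta_train) + 0.05 * rng.normal(size=25)

noise = 0.05**2
K = kernel(theta_train, theta_train) + noise * np.eye(25)
theta_test = np.linspace(0, 2 * np.pi, 101)
Ks = kernel(theta_test, theta_train)

# Standard GP conditioning for the posterior mean and variance.
L = np.linalg.cholesky(K)
a = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
mean = Ks @ a
v = np.linalg.solve(L, Ks.T)
var = 1.0 - np.sum(v**2, axis=0)   # prior variance k(x, x) = 1
```

A Euclidean RBF kernel applied to raw angles would treat 0 and 2*pi as far apart; the periodic kernel respects the manifold, which is the basic point the more general constructions in the tutorial develop rigorously.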
[slides]

5. Recent Advances of Statistical Reinforcement Learning
Yuejie Chi, Sattar Vakili, Gergely Neu
As a paradigm for sequential decision-making in unknown environments, reinforcement learning (RL) has received a flurry of attention in recent years. However, the explosion of model complexity in emerging applications and the presence of nonconvexity exacerbate the challenge of achieving efficient RL in sample-starved situations, where data collection is expensive, time-consuming, or even high-stakes. Understanding and enhancing the sample and computational efficiency of RL algorithms is thus of great interest and urgently needed. In this tutorial, we aim to present a coherent framework that covers important foundational ideas and recent algorithmic developments in RL, highlighting the connections between new ideas and classical topics. Employing Markov Decision Processes as the central mathematical model, we start by introducing classical dynamic programming algorithms for the case where precise descriptions of the environment are available. Equipped with this background, we introduce distinctive RL scenarios, including RL with a generative model, online RL, and offline RL. We also present mainstream RL paradigms such as model-based and model-free approaches. While most existing analytical results focus on settings with a limited number of state-action pairs or simple models, such as linearly modeled state-action value functions, deriving RL policies that efficiently handle large state-action spaces with more general value functions remains a challenge. Recent works have explored nonlinear function approximation, with kernel ridge regression serving as a natural intermediate step between linear models and neural-network-based models. We cover the structural complexity of the models, ranging from tabular to linear and nonlinear function approximation, highlighting some of the current challenges and open problems in achieving statistical efficiency in these settings. We will systematically introduce several effective algorithmic ideas that are central to the design of efficient RL algorithms.
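The classical dynamic-programming starting point, applicable when a precise description of the environment is available, can be sketched in a few lines. The two-state, two-action MDP below is an illustrative assumption, not an example from the tutorial; the backup itself is standard value iteration.

```python
import numpy as np

# Toy tabular MDP (hypothetical numbers): P[s, a, s'] is the transition
# probability, R[s, a] the expected immediate reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.6, 0.4], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

# Value iteration: repeatedly apply the Bellman optimality operator
#   V(s) <- max_a [ R(s, a) + gamma * sum_s' P(s, a, s') V(s') ],
# a gamma-contraction, so the iterates converge to the optimal values.
V = np.zeros(2)
for _ in range(1000):
    Q = R + gamma * (P @ V)        # shape (S, A); P @ V sums over s'
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

policy = Q.argmax(axis=1)          # greedy policy w.r.t. the converged values
```

The sample-based RL settings the tutorial then turns to (generative model, online, offline) replace the known P and R in this backup with estimates or stochastic updates, which is exactly where the statistical-efficiency questions arise.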
[slides 1] [slides 2]