Gaussian process (GP) models are flexible non-parametric models with rich representational power. By using a Gaussian process with additive structure, complex responses can be modelled whilst retaining interpretability. Previous work showed that additive Gaussian process models require high-dimensional interaction terms. We propose the orthogonal additive kernel (OAK), which imposes an orthogonality constraint on the additive functions, enabling an identifiable, low-dimensional representation of the functional relationship. We connect the OAK kernel to the functional ANOVA decomposition and show improved convergence rates for sparse computation methods. With only a small number of additive low-dimensional terms, we demonstrate that the OAK model achieves similar or better predictive performance compared to black-box models, while retaining interpretability.
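As a rough illustration of the idea (a minimal sketch, not the paper's implementation), the snippet below builds a first-order additive kernel whose one-dimensional components are corrected to integrate to zero against the input measure. The uniform input measure on [0, 1], the quadrature approximation, and the function names are illustrative assumptions; the paper derives closed-form constraints.

```python
import numpy as np

def rbf(x, y, ell=1.0):
    """Squared-exponential base kernel between two vectors of scalars."""
    return np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2 / ell ** 2)

def constrained_rbf(x, y, ell=1.0, n_grid=200):
    """One-dimensional base kernel with the zero-integral ('orthogonality')
    correction, with integrals against the input measure approximated by
    quadrature on a uniform measure over [0, 1] (an assumption; closed
    forms exist for e.g. Gaussian measures)."""
    s = np.linspace(0.0, 1.0, n_grid)
    w = np.full(n_grid, 1.0 / n_grid)     # uniform quadrature weights
    kxs = rbf(x, s, ell) @ w              # ~ E_s[k(x, s)]
    kys = rbf(y, s, ell) @ w
    kss = w @ rbf(s, s, ell) @ w          # ~ E_{s,s'}[k(s, s')]
    return rbf(x, y, ell) - np.outer(kxs, kys) / kss

def additive_first_order(X, Y, ells):
    """First-order additive kernel: a sum of constrained 1-D kernels, one per
    input dimension; higher-order interaction terms are omitted here."""
    return sum(constrained_rbf(X[:, d], Y[:, d], ells[d])
               for d in range(X.shape[1]))
```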
The world outside our labs seldom conforms to the assumptions of our models. This is especially true for dynamics models used in control and motion planning for complex high-DOF systems like deformable objects. We must develop better models, but we must also accept that, no matter how powerful our simulators or how big our datasets, our models will sometimes be wrong. This talk will present our recent work on using unreliable dynamics models for motion planning and manipulation. Given a dynamics model, our methods learn where that model can be trusted given either batch data or online experience. These approaches allow imperfect dynamics models to be useful for a wide range of tasks in novel scenarios, while requiring much less data than baseline methods. This data-efficiency is a key requirement for scalable and flexible motion planning and manipulation capabilities.
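The abstract does not spell out the estimator, but one concrete reading of "learning where the model can be trusted given batch data" is to label transitions by one-step prediction error and fit a classifier. The sketch below is a hypothetical illustration of that reading only; the `dynamics_model` callable, the tolerance `tol`, and the choice of classifier are assumptions, not the authors' method.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fit_trust_classifier(states, actions, next_states, dynamics_model, tol=0.05):
    """Label each (state, action) by whether the model's one-step prediction
    lands within `tol` of the observed next state, then fit a classifier that
    predicts where the model is reliable. All names and thresholds here are
    illustrative assumptions."""
    preds = np.array([dynamics_model(s, a) for s, a in zip(states, actions)])
    errors = np.linalg.norm(preds - next_states, axis=1)
    reliable = (errors < tol).astype(int)
    clf = RandomForestClassifier(n_estimators=100)
    clf.fit(np.hstack([states, actions]), reliable)
    return clf  # query clf.predict_proba during planning to avoid untrusted regions
```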
We give an exact characterization of admissibility in statistical decision problems in terms of Bayes optimality in a so-called nonstandard extension of the original decision problem, as introduced by Duanmu and Roy. Unlike the consideration of improper priors or other generalized notions of Bayes optimality, the nonstandard extension is distinguished, in part, by having priors that can assign ‘infinitesimal’ mass in a sense that can be made rigorous using results from nonstandard analysis. With these additional priors, we find that, informally speaking, a decision procedure δ0 is admissible in the original statistical decision problem if and only if, in the nonstandard extension of the problem, the nonstandard extension of δ0 is Bayes optimal among the (extensions of) standard decision procedures with respect to a nonstandard prior that assigns at least infinitesimal mass to every standard parameter value. We use this theorem to give further characterizations of admissibility: one related to Blyth’s method; one related to a condition, due to Stein, that characterizes admissibility under some regularity assumptions; and finally one using finitely additive priors in decision problems meeting certain regularity requirements. Our results imply that Blyth’s method is a sound and complete method for establishing admissibility. Buoyed by this result, we revisit the univariate two-sample common-mean problem and show that the Graybill–Deal estimator is admissible among a certain class of unbiased decision procedures. Joint work with Haosui Duanmu (HIT) and David Schrittesser (Toronto).
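For readers who prefer symbols, the central equivalence can be transcribed roughly as follows. This is an informal rendering of the abstract's statement only; the notation for nonstandard extensions and the risk function are assumed, and the rigorous formulation lives in the paper.

```latex
% Informal rendering of the main equivalence stated in the abstract; the
% rigorous nonstandard-analysis formulation is in the paper. Here \Theta is
% the standard parameter space, r the risk function, and {}^{*}(\cdot) the
% nonstandard extension (notation assumed for illustration).
\[
  \delta_0 \text{ is admissible}
  \;\iff\;
  \exists\,\pi \text{ (a nonstandard prior with } \pi(\{\theta\}) > 0,
  \text{ at least infinitesimal, for every standard } \theta \in \Theta)
\]
\[
  \text{such that}\quad
  {}^{*}\delta_0 \in \operatorname*{arg\,min}_{\delta \text{ standard}}
  \int_{{}^{*}\Theta} {}^{*}r(\theta, {}^{*}\delta)\, \pi(\mathrm{d}\theta).
\]
```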
In this talk I will discuss several recent papers that develop new graph neural networks by considering their relation to continuous processes. I will discuss how graph neural networks can be arrived at as numerical schemes for solving differential equations, what they have to do with Perelman’s famous solution to the Poincaré conjecture, and how they are related to string theory. Graphs are fundamentally discrete structures, and at first glance treating them continuously does not appear to be a promising research direction. However, there are many examples where handling discrete objects as if they were continuous has been a catalyst for progress. Photons are now known to be discrete, but modelling quantum physical processes with continuous differential equations such as heat diffusion produced many great breakthroughs in classical physics and chemistry. In computer science, digital images are also discrete, but continuous tools such as diffusion-based denoising are still widely used, and the question of whether digital images are best modelled continuously or discretely remains a source of great philosophical debate. Even in ML, the most common approach to handling discrete objects is to embed them into a continuous space. I will show that for graph ML too, there is much to be gained from unlocking the magnificent toolbox of continuous mathematics.
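To make the diffusion analogy concrete, here is a minimal numpy sketch (not any specific architecture from the papers discussed) showing that an explicit Euler discretisation of graph heat diffusion is, layer for layer, a simple message-passing network. The step size and depth are arbitrary illustrative choices.

```python
import numpy as np

def normalized_laplacian(A):
    """Symmetric normalized graph Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    d = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d, dtype=float)
    d_inv_sqrt[d > 0] = d[d > 0] ** -0.5
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def heat_diffusion(A, X, step=0.2, n_steps=10):
    """Explicit-Euler discretisation of graph heat diffusion dX/dt = -L X.
    Each Euler step X <- X - step * (L @ X) is structurally a linear,
    parameter-free message-passing (GNN) layer; learnable GNN variants
    replace the fixed diffusion operator with parameterised ones."""
    L = normalized_laplacian(A)
    for _ in range(n_steps):
        X = X - step * (L @ X)
    return X
```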
We introduce a framework for Continual Learning (CL) based on Bayesian inference over the function space rather than the parameters of a deep neural network. This method, referred to as functional regularisation for Continual Learning, avoids forgetting a previous task by constructing and memorising an approximate posterior belief over the underlying task-specific function. To achieve this we rely on a Gaussian process obtained by treating the weights of the last layer of a neural network as random and Gaussian distributed. The training algorithm then sequentially encounters tasks and constructs posterior beliefs over the task-specific functions using inducing-point sparse Gaussian process methods. At each step a new task is first learnt, and then a summary is constructed consisting of (i) inducing inputs – a fixed-size subset of the task inputs selected to optimally represent the task – and (ii) a posterior distribution over the function values at these inputs. This summary then regularises the learning of future tasks through Kullback-Leibler regularisation terms. Our method thus unites approaches focused on (pseudo-)rehearsal with those derived from a sequential Bayesian inference perspective in a principled way, leading to strong results on standard benchmarks.
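A minimal sketch of the regularisation step only, under stated assumptions: the (mean, covariance) container format, the KL direction as written, and the function names are illustrative, and the full method also involves the sparse GP construction and inducing-input selection.

```python
import numpy as np

def gauss_kl(mu_q, cov_q, mu_p, cov_p):
    """KL( N(mu_q, cov_q) || N(mu_p, cov_p) ) between the current belief over
    function values at a past task's inducing inputs and the stored summary."""
    k = len(mu_q)
    cov_p_inv = np.linalg.inv(cov_p)
    diff = mu_p - mu_q
    _, logdet_q = np.linalg.slogdet(cov_q)
    _, logdet_p = np.linalg.slogdet(cov_p)
    return 0.5 * (np.trace(cov_p_inv @ cov_q) + diff @ cov_p_inv @ diff
                  - k + logdet_p - logdet_q)

def regularised_objective(task_nll, stored, current):
    """Current task loss plus one KL term per memorised task summary.
    `stored` and `current` are lists of (mean, covariance) pairs evaluated at
    each past task's inducing inputs (a hypothetical container format)."""
    return task_nll + sum(gauss_kl(mq, cq, mp, cp)
                          for (mq, cq), (mp, cp) in zip(current, stored))
```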
We introduce a scalable approach to Gaussian process inference that combines spatio-temporal filtering with natural gradient variational inference, resulting in a non-conjugate GP method for multivariate data that scales linearly with respect to time. Through a natural gradient approach, we derive a sparse approximation that constructs a state-space model over a reduced set of spatial inducing points, and we show that for separable Markov kernels the full and sparse cases exactly recover the standard variational GP. This leads to an efficient and accurate method for large spatio-temporal problems that we demonstrate on multiple real-world examples.
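The linear-in-time claim rests on the state-space view of Markov kernels. The sketch below illustrates it in the simplest temporal-only, conjugate case: Kalman filtering for a Matérn-1/2 (exponential) kernel GP. The talk's method additionally handles spatial inducing points, non-conjugate likelihoods, and natural gradients, none of which appear here; parameter values are illustrative.

```python
import numpy as np

def kalman_filter_gp(t, y, ell=1.0, var=1.0, noise=0.1):
    """Linear-time GP regression in 1-D time for the Matern-1/2 (exponential)
    kernel, written as its equivalent Ornstein-Uhlenbeck state-space model and
    solved by Kalman filtering. Returns filtered (not smoothed) posteriors."""
    m, P = 0.0, var                      # prior state mean and variance
    means, variances = [], []
    prev_t = t[0]
    for tk, yk in zip(t, y):
        a = np.exp(-(tk - prev_t) / ell)              # OU transition over the gap
        m, P = a * m, a * a * P + var * (1.0 - a * a)  # predict step
        S = P + noise                                  # innovation variance
        K = P / S                                      # Kalman gain
        m, P = m + K * (yk - m), (1.0 - K) * P         # update step
        means.append(m)
        variances.append(P)
        prev_t = tk
    return np.array(means), np.array(variances)
```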
Reproducing kernel Hilbert spaces (RKHS) provide a powerful framework, termed kernel mean embeddings, for representing probability distributions, enabling nonparametric statistical inference in a variety of applications. Combining the RKHS formalism with Gaussian process modelling, we present a methodology to refine low-resolution (LR) spatial fields with high-resolution (HR) information. This task, known as statistical downscaling, is challenging because the diversity of spatial datasets often prevents direct matching of observations. Yet, when LR samples are modelled as aggregate conditional means of HR samples with respect to a mediating variable that is globally observed, recovering the underlying fine-grained field can be framed as taking an ‘inverse’ of the conditional expectation, namely as a deconditioning problem. Leveraging this deconditioning perspective, we introduce a Bayesian formulation of statistical downscaling able to handle potentially unmatched multi-resolution spatial fields.
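As context for the ‘inverse’ being taken, here is a minimal sketch of the standard conditional mean embedding estimator that deconditioning runs in reverse. The RBF kernel, ridge parameter `lam`, and function names are illustrative assumptions, not the talk's model.

```python
import numpy as np

def rbf_gram(A, B, ell=1.0):
    """RBF Gram matrix between two point sets (one point per row)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell ** 2)

def cme_weights(Z_train, Z_query, lam=1e-3):
    """Weights alpha(z) such that the conditional mean embedding of Y | Z = z
    is sum_i alpha_i(z) * phi(y_i):  alpha = (K_ZZ + n*lam*I)^{-1} k_Z(z).
    Deconditioning, per the abstract, is the reverse direction: recovering the
    HR conditional from embeddings of the aggregated LR field."""
    n = len(Z_train)
    K = rbf_gram(Z_train, Z_train)
    return np.linalg.solve(K + n * lam * np.eye(n), rbf_gram(Z_train, Z_query))
```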
Arctic sea ice is a major component of the Earth’s climate system, as well as an integral platform for travel, subsistence, and habitat. Since the late 1970s, significant advancements have been made in our ability to closely monitor the state of the ice cover in the polar regions through the launch of Earth-observation satellites. Consequently, with over four decades of time-series data now at our disposal, we have observed significant reductions in the spatial extent of Arctic sea ice, and more recently its thickness, directly in line with increasing anthropogenic CO2 emissions. The summer months in particular show the largest rate of decline in sea ice extent of any season, as well as the greatest inter-annual variability, making seasonal to inter-annual predictions difficult. Advance predictions of summer ice conditions are important because this is when the ice cover is at its minimum extent and the Arctic becomes open to a whole host of traffic, including coastal resupply vessels, eco-tourism, and the movement of local communities. This presentation explores Gaussian processes as a framework both for sea ice forecasting and for optimally combining and interpolating multiple satellite observation sets. In the first application, the spatio-temporal patterns of variability in past ice conditions are captured with a complex-network framework, which is then fed into a Gaussian process regression forecast model through a random-walk graph kernel, to predict regional and pan-Arctic (basin-wide) September sea ice extents with high skill. We will then see how this work can be extended to spatial forecasts by adopting a multi-task learning approach. In the second application, Gaussian process regression is used to optimally combine (and interpolate) observations from three separate satellite altimeters in space and time, in order to produce the first-ever daily pan-Arctic observational data set of Arctic sea ice freeboard (the base product for deriving sea ice thickness). We will then see how this work can be extended through computational speed-ups based on relevance vector machines. In both the forecasting and interpolation applications, the hyperparameters of the models are learned through the empirical Bayes (type-II maximum likelihood) approach, which in the second application allows us to derive information about the spatio-temporal correlation length scales of Arctic sea ice thickness.
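As an illustration of the empirical Bayes step mentioned at the end (a generic GP sketch, not the talk's forecasting or interpolation pipeline), the snippet below minimises the negative log marginal likelihood of an RBF-kernel GP; the learned length-scale plays the role of the spatio-temporal correlation scale. The kernel choice, log-parameterisation, and names are assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_marginal_likelihood(log_params, X, y):
    """Negative GP log marginal likelihood under an RBF kernel; minimising it
    over hyperparameters is the empirical Bayes / type-II maximum likelihood
    step described in the abstract."""
    ell, var, noise = np.exp(log_params)            # positive via log-params
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = var * np.exp(-0.5 * d2 / ell ** 2) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # K^{-1} y
    return (0.5 * y @ alpha + np.log(np.diag(L)).sum()
            + 0.5 * len(y) * np.log(2.0 * np.pi))

# Hypothetical usage: X holds space-time coordinates, y de-trended freeboard
# values (illustrative data, not the talk's dataset).
# result = minimize(neg_log_marginal_likelihood, np.zeros(3), args=(X, y))
# ell_hat, var_hat, noise_hat = np.exp(result.x)
```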