Model Choice Workshop

CAS

Description

Bayesian inference is often said to begin with a prior, yet an even earlier step—the articulation of a statistical model—quietly fixes the outcome space, the likelihood, the data representation, and the structural assumptions that make inference possible. This workshop asks whether and how that step can be principled or partly formalized: what counts as a good modeling choice? How can simplicity and expressiveness be negotiated given the available evidence? Bringing together perspectives from different communities, we will compare the rationales and heuristics they use when specifying models, identify common ground and points of divergence, and work through concrete cases that reveal the practical stakes of "prior to the prior" decisions. The aim is to distill a shared vocabulary for reasoning about model choice and equip participants with questions and habits that make early modeling decisions more transparent and effective.

The workshop includes a plenary talk by Eric-Jan Wagenmakers (University of Amsterdam) on the evening of the 23rd, as well as a reception.

    • 18:30–19:30
      The Ockham factor 1h

      In Bayesian model selection, the Ockham factor is the extent to which the maximum likelihood needs to be discounted to attain a fair assessment of a model's predictive performance. As a correction for selection, the Ockham factor quantifies the amount of prior mass that was wasted on parameter values that are undercut by the data. In this talk I will outline several complementary perspectives on the Ockham factor, and suggest that, outside of the domain of physics, its conceptual benefits have been underappreciated.

      Speaker: Eric-Jan Wagenmakers
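      The relationship the abstract describes can be made concrete in a toy calculation (this example is hypothetical, not taken from the talk): for a binomial model with a uniform prior on the success probability, the marginal likelihood equals the maximum likelihood times an Ockham factor below one, which measures how much prior mass the data undercut.

```python
from math import comb

# Hypothetical toy example of the Ockham factor: 7 heads in 10 coin flips,
# model M with theta ~ Uniform(0, 1). The marginal likelihood discounts the
# maximum likelihood by the fraction of prior mass not undercut by the data.

def max_likelihood(k, n):
    """Likelihood at the MLE theta = k/n (fixed data order, no binomial coefficient)."""
    theta = k / n
    return theta**k * (1 - theta)**(n - k)

def marginal_likelihood(k, n):
    """Integral of theta^k (1-theta)^(n-k) over a uniform prior = Beta(k+1, n-k+1)."""
    return 1 / ((n + 1) * comb(n, k))

k, n = 7, 10
ml = marginal_likelihood(k, n)
ockham = ml / max_likelihood(k, n)
print(f"marginal likelihood: {ml:.3g}")     # 1/1320 ≈ 0.000758
print(f"Ockham factor:       {ockham:.3f}")  # ≈ 0.341, i.e. a ~3x discount
```

      The Ockham factor here is about 0.34: the marginal likelihood pays roughly a threefold penalty relative to the best-fitting parameter value, which is the "fair assessment" the abstract refers to.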
    • 09:30–10:15
      Post-Bayesian beliefs 45m

      In this talk, I give my perspective on efforts to develop inference procedures with Bayesian characteristics that go beyond Bayes' Rule as an epistemological principle. I will explain why these efforts are needed and the forms they take. As an example, I will focus on the recently developed predictively oriented (PrO) posterior, which expresses epistemic uncertainty as a consequence of predictive ability.

      Speaker: Jeremias Knoblauch
    • 10:15–11:00
      Can a statistical model be true? 45m

      In statistics it is often said that "All models are wrong but some are useful". This dictum seems to suggest that statistical models are somehow truth-apt, or factive. In my talk I investigate how we might conceptualize statistical models so that they are indeed truth-apt, and I argue that ideas on the factivity and truth-aptness of models can be used to clarify certain debates over statistical methodology. Finally, in a pragmatist spirit, I will consider whether we can relate the truth of models to their usefulness.

      Speaker: Jan-Willem Romeijn
    • 11:30–12:15
      Bayesian hierarchical modelling to relax auxiliary assumptions 45m

      When analyzing their data, researchers usually need to choose among a number of statistical models that are not diverse and flexible enough to adequately capture the data generation mechanism. In this situation, they often make the problem fit the tools by introducing auxiliary assumptions that are at best questionable and at worst indefensible. Drawing on past and ongoing projects, this talk will illustrate how Bayesian hierarchical models can help to relax auxiliary assumptions through the explicit modeling of various sources of evidence and uncertainty. It will also discuss how the flexibility to choose a model that adequately captures many aspects of the data generating mechanism can help address the problem of overconfidence in statistical results, and how the ability to avoid making the problem fit the tools can be both a boon and a bane.

      Speaker: Sabine Hoffmann
    • 12:15–13:00
      Occam’s Razor in Bayesian Inference: Evidence, compression, and generalization 45m

      Occam’s razor appears in several perspectives across statistics, information theory, and learning theory. In Bayesian model selection, it emerges through the marginal likelihood, which automatically trades off goodness of fit with the volume of parameter space supported by the data. Closely related ideas arise in the Minimum Description Length (MDL) framework, where model selection is interpreted as data compression, and in PAC-Bayes theory, where generalization guarantees depend on how far a learned predictor deviates from a prior distribution.
      In this talk, I discuss these three perspectives on Occam’s razor—Bayesian evidence, compression, and generalization—and highlight their common structure. In each case, model complexity can be understood in terms of the information required to move from prior assumptions to a data-explaining predictor. I will discuss when these perspectives agree on model selection, when they differ, and what this reveals about the role of priors, representation, and information in modern Bayesian inference.

      Speaker: Vincent Fortuin
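      The automatic trade-off between fit and parameter-space volume that this abstract mentions can be seen in a minimal numerical sketch (an illustrative construction, not material from the talk): two models explain the data equally well at their best parameter, but the one that spreads its prior over a larger volume receives a lower marginal likelihood.

```python
import numpy as np

# Illustrative sketch of the automatic Occam penalty in the marginal
# likelihood. Model N(mu, 1) for the data, with two rival priors on mu:
# "narrow" mu ~ Uniform(-1, 1) and "wide" mu ~ Uniform(-10, 10).
# Both contain the best-fitting mu, so the evidence ratio reflects
# only the wasted prior volume of the wide model.

data = 0.2 + np.linspace(-1, 1, 20)  # deterministic toy data with mean 0.2

def log_evidence(data, lo, hi, n_grid=20001):
    """Marginal likelihood of N(mu, 1) data with mu ~ Uniform(lo, hi), by quadrature."""
    mu = np.linspace(lo, hi, n_grid)
    loglik = -0.5 * ((data[:, None] - mu[None, :])**2).sum(axis=0) \
             - 0.5 * len(data) * np.log(2 * np.pi)
    # mean over a uniform grid ≈ ∫ p(data|mu) p(mu) dmu with p(mu) = 1/(hi-lo)
    m = loglik.max()
    return m + np.log(np.exp(loglik - m).mean())

lbf = log_evidence(data, -1, 1) - log_evidence(data, -10, 10)
print(f"log Bayes factor (narrow vs wide): {lbf:.2f}")  # ≈ log 10 ≈ 2.30
```

      The Bayes factor of about 10 in favor of the narrower model is exactly the ratio of prior widths: the wide model is penalized for flexibility the data never used, with no explicit complexity term added by hand.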
    • 14:00–14:45
      Forward modeling cosmic large-scale structure 45m

      The distribution of galaxies in the Universe holds important information about the origin and evolution of the Universe. It can elucidate some fundamental questions in current cosmology, for example, the nature of Dark Energy that accelerates the expansion of the Universe. Extracting this information from observations, however, poses considerable challenges in theoretical modeling and statistical analysis. By following the physical processes that shape the observed structures, it is possible to conduct a fully Bayesian analysis of the three-dimensional distribution of galaxies. In this framework, both model choices and priors are determined by physical principles. This talk outlines the underlying concepts and discusses both the challenges and limitations of this approach.

      Speaker: Julia Stadler
    • 14:45–15:05
      Evidence Lower Bound for Model Selection in High-Dimensional Bayesian Inference 20m

      Model comparison in high-dimensional Bayesian inference remains computationally challenging due to the intractability of the marginal likelihood (evidence). Variational inference offers an attractive alternative by providing a tractable lower bound to the evidence, the Evidence Lower Bound (ELBO), which can be optimized and estimated efficiently even in very large parameter spaces.

      In this talk, I derive the Evidence Lower Bound (ELBO) for Metric Gaussian Variational Inference (MGVI) and its geometric extension geoVI, as implemented in the information field theory framework and the NIFTy library. The method approximates the posterior with a Gaussian in latent space whose covariance is determined by the local Fisher metric, enabling scalable Bayesian inference in extremely high-dimensional problems. I show how the ELBO can be estimated efficiently using posterior samples and analytic expectations, providing a practical tool for model comparison in high-dimensional settings.

      Finally, I illustrate applications ranging from astrophysical inverse problems (e.g. imaging and lensing reconstruction) to causal inference in complex probabilistic models, where scalable evidence estimation is crucial for comparing competing generative hypotheses.

      Speaker: Matteo Guardiani (Max Planck Institute for Astrophysics)
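      Why the ELBO can stand in for the marginal likelihood is visible in a conjugate toy problem (a hypothetical sketch, not the MGVI/geoVI implementation in NIFTy): for a Gaussian model the ELBO has a closed form, it is bounded above by the log evidence, and the bound is tight exactly when the variational distribution matches the posterior.

```python
import numpy as np

# Toy model: mu ~ N(0, 1), x | mu ~ N(mu, 1), one observation x = 1.
# Variational family q(mu) = N(m, s2). The ELBO
#   E_q[log p(x, mu)] + H[q]
# is computed analytically and compared against the exact log evidence.

x = 1.0

def elbo(m, s2):
    """Closed-form ELBO for q(mu) = N(m, s2) in the conjugate Gaussian model."""
    e_loglik = -0.5 * np.log(2 * np.pi) - 0.5 * ((x - m)**2 + s2)
    e_logprior = -0.5 * np.log(2 * np.pi) - 0.5 * (m**2 + s2)
    entropy = 0.5 * np.log(2 * np.pi * np.e * s2)
    return e_loglik + e_logprior + entropy

log_evidence = -0.5 * np.log(2 * np.pi * 2.0) - x**2 / 4  # marginally, x ~ N(0, 2)

print(elbo(0.5, 0.5) - log_evidence)  # ≈ 0: the exact posterior N(1/2, 1/2) closes the gap
print(elbo(0.0, 1.0) - log_evidence)  # negative: any other q gives a strict lower bound
```

      In high dimensions the gap generally does not close, which is why using the ELBO for model comparison, as in the talk, requires arguing that the bound is comparably tight across the competing models.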
    • 15:30–15:50
      Model specification and truth convergence 20m

      Classical convergence results show that Bayesian agents who entertain the true hypothesis H as one of their alternatives will become certain of H in the limit. If H is not in the set of alternatives, on the other hand, we may still converge on the 'best' alternative (e.g. in terms of minimal KL-divergence, see Barron 1998). However, it has also been demonstrated that this can fail if certain structural properties (in particular, convexity) are violated (Grünwald & van Ommen 2017). If we are careful to set up the model space properly, we may be able to ensure convergence on the best alternative. However, if the parameter space is very complex, doing so may leave the model intractable. In this contribution, I examine the prospects for setting up the hypothesis space so as to enable convergence on the best alternative as far as possible, while keeping the model tractable and allowing for efficient data collection.

      References:

      Barron, A. R. (1998). “Information-Theoretic Characterization of Bayes Performance and the Choice of Priors in Parametric and Nonparametric Problems.” In Bernardo, J. M., Berger, J. O., Dawid, A. P., and Smith, A. F. M. (eds.), Bayesian Statistics, volume 6, 27–52. Oxford: Oxford University Press.

      Grünwald, P. and van Ommen, T. (2017). "Inconsistency of Bayesian Inference for Misspecified Linear Models, and a Proposal for Repairing It." Bayesian Analysis, 12(4), 1069–1103.

      Speaker: Rafael Fuchs (Munich Center for Mathematical Philosophy, LMU)
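      The well-behaved side of the misspecified case can be simulated in a few lines (an illustrative sketch, not material from the talk): when the truth lies outside a finite hypothesis set, the posterior still concentrates on the KL-closest member, in the spirit of Barron's result.

```python
import numpy as np

# Misspecified toy setup: data come from N(0.4, 1), but the agent only
# entertains three hypotheses N(mu, 1) with mu in {-1, 0, 1}. The truth
# is absent; the KL-closest alternative is mu = 0, since
# KL(N(0.4,1) || N(mu,1)) = (0.4 - mu)^2 / 2.

rng = np.random.default_rng(1)
data = rng.normal(0.4, 1.0, size=2000)

mus = np.array([-1.0, 0.0, 1.0])
loglik = np.array([(-0.5 * (data - m)**2).sum() for m in mus])  # up to a shared constant
loglik -= loglik.max()
posterior = np.exp(loglik) / np.exp(loglik).sum()  # uniform prior over the three models
print(dict(zip(mus, posterior.round(4))))  # essentially all mass on mu = 0.0
```

      The failure modes discussed in the talk (Grünwald & van Ommen 2017) arise in richer, non-convex settings; in this simple finite family the convergence to the best alternative is unproblematic, which is what makes the structural conditions interesting.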
    • 15:50–16:10
      Choosing what to choose from: model selection when prior knowledge is real but imprecise 20m

      In adaptive Bayesian spectroscopy, physics fixes the forward model and what remains is the choice of prior. This prior also implicitly encodes an effective size of the model, which makes model selection and prior selection overlapping problems. In spectroscopy we typically have real structural knowledge: qualitatively speaking, spectra are smooth, variance is finite, and certain frequency ranges matter. But this constrains hyperparameters only to plausible regions, not to specific values. Marginalizing over a diffuse hyperprior gives support to unphysical regions, while a sharp one encodes precision we lack. In practice we compare a finite candidate set via the Bayesian evidence, but correlated candidates (e.g., smoothness priors at neighboring length scales) let densely sampled regions of model space accumulate disproportionate weight, making results sensitive to an arbitrary discretization. We present these tensions from information-optimal Fourier-transform spectroscopy and invite discussion on navigating the space between "I know something" and "I can write down a measure."

      Speaker: Jakob Maria Schröder
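      The discretization sensitivity the abstract describes can be reproduced with invented numbers (all log-evidence values below are hypothetical, not from the talk): averaging over a finite candidate set gives each candidate equal prior weight, so sampling one region of hyperparameter space more densely inflates that region's posterior share without any candidate becoming individually better.

```python
import numpy as np

# Sketch of the arbitrary-discretization problem: a candidate set of priors
# is compared by (hypothetical) log-evidence. Regime B is represented once
# on the coarse grid and three times, by near-duplicates, on the dense grid.

def posterior_over_candidates(log_evidence):
    """Posterior over candidates under a uniform prior on the candidate set."""
    w = np.exp(log_evidence - np.max(log_evidence))
    return w / w.sum()

coarse = np.array([-10.0, -10.5])                   # [regime A, regime B]
dense = np.array([-10.0, -10.5, -10.45, -10.55])    # regime B sampled 3x

p_coarse = posterior_over_candidates(coarse)
p_dense = posterior_over_candidates(dense)
print("P(regime B), coarse grid:", round(p_coarse[1:].sum(), 3))  # ≈ 0.378
print("P(regime B), dense grid: ", round(p_dense[1:].sum(), 3))   # ≈ 0.646
```

      Regime B's posterior share rises from roughly 0.38 to 0.65 purely because it was discretized more finely, which is the sensitivity that makes correlated candidates problematic for evidence-based comparison.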
    • 16:10–16:30
      What Goes into a Model of EPR Experiments? 20m

      This talk explores the steps that need to be taken to build a causal model for EPR experiments. Wood and Spekkens (2015) argued that causal discovery algorithms are insufficient for this task: they cannot distinguish EPR from Bell-inequality-violating correlations, since both share the same independence relations. Bell inequality violations instead provide a hypothesis space of possible causal models. I argue that selecting from this space requires a commitment to both an interpretation of quantum mechanics and a theory of causation. I further argue that even these commitments may not select a unique causal model, since different adequacy criteria for causal arrows can yield competing models within the same physical theory, as I illustrate using de Broglie-Bohm theory.

      References
      Wood, C. J. and Spekkens, R. W. (2015). The lesson of causal discovery algorithms for quantum correlations: Causal explanations of Bell-inequality violations require fine-tuning. New Journal of Physics, 17.

      Speaker: Mario Hubert