Publications

Regions of Reliability in the Evaluation of Multivariate Probabilistic Forecasts
Étienne Marcotte
Valentina Zantedeschi
Multivariate probabilistic time series forecasts are commonly evaluated via proper scoring rules, i.e., functions that are minimal in expectation for the ground-truth distribution. However, this property is not sufficient to guarantee good discrimination in the non-asymptotic regime. In this paper, we provide the first systematic finite-sample study of proper scoring rules for time-series forecasting evaluation. Through a power analysis, we identify the "region of reliability" of a scoring rule, i.e., the set of practical conditions where it can be relied on to identify forecasting errors. We carry out our analysis on a comprehensive synthetic benchmark, specifically designed to test several key discrepancies between ground-truth and forecast distributions, and we gauge the generalizability of our findings to real-world tasks with an application to an electricity production problem. Our results reveal critical shortcomings in the evaluation of multivariate probabilistic forecasts as commonly performed in the literature.
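To make the notion of a proper scoring rule concrete, here is a minimal sketch of the energy score, a well-known multivariate proper scoring rule, estimated from forecast samples. The abstract does not say which scoring rules the paper studies, so the choice of the energy score, the Gaussian toy setup, and all function names are assumptions for illustration only.

```python
import math
import random


def energy_score(samples, y):
    """Plug-in estimate of the energy score from forecast samples (lower is better).

    ES = E||X - y|| - 0.5 * E||X - X'||, with X and X' drawn from the forecast.
    """
    m = len(samples)
    term_obs = sum(math.dist(x, y) for x in samples) / m
    term_spread = sum(math.dist(a, b) for a in samples for b in samples) / (2 * m * m)
    return term_obs - term_spread


def average_score(forecast_mean, n_obs=200, n_samples=50, seed=0):
    """Average energy score of a 2-D Gaussian forecast against N(0, I) observations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_obs):
        y = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        samples = [
            [rng.gauss(forecast_mean, 1.0), rng.gauss(forecast_mean, 1.0)]
            for _ in range(n_samples)
        ]
        total += energy_score(samples, y)
    return total / n_obs


# Hypothetical comparison: a forecast matching the ground truth vs. a shifted one.
well_specified = average_score(0.0)
shifted = average_score(1.0)
print(well_specified, shifted)
```

In this toy setting the mean shift is large and the gap between the two averages is easy to see; the finite-sample question raised in the abstract is how reliably such gaps show up when the discrepancy is subtle and few observations are available.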
Repository-Level Prompt Generation for Large Language Models of Code
Disha Shrivastava
Daniel Tarlow
With the success of large language models (LLMs) of code and their use as code assistants (e.g. Codex used in GitHub Copilot), techniques for introducing domain-specific knowledge in the prompt design process become important. In this work, we propose a framework called Repo-Level Prompt Generator that learns to generate example-specific prompts using prompt proposals. The prompt proposals take context from the entire repository, thereby incorporating both the structure of the repository and the context from other relevant files (e.g. imports, parent class files). Our technique doesn't require any access to the weights of the LLM, making it applicable in cases where we only have black-box access to the LLM. We conduct experiments on the task of single-line code-autocompletion using code repositories taken from Google Code archives. We demonstrate that an oracle constructed from our prompt proposals gives a remarkably high relative improvement of 36% over Codex, showing the quality of these proposals. Further, we show that when we train a model to predict a prompt proposal, we can achieve significant performance gains over Codex and other baselines. We release our code, data, and trained checkpoints at: https://github.com/shrivastavadisha/repo_level_prompt_generation.
Robust Perception through Equivariance
Chengzhi Mao
Lingyu Zhang
Abhishek Vaibhav Joshi
Junfeng Yang
Hao Wang
Carl Vondrick
R-U-SURE? Uncertainty-Aware Code Suggestions By Maximizing Utility Across Random User Intents
Daniel D. Johnson
Daniel Tarlow
Christian Walder
Sampling-Based Accuracy Testing of Posterior Estimators for General Inference
Parameter inference, i.e. inferring the posterior distribution of the parameters of a statistical model given some data, is a central problem in many scientific disciplines. Generative models can be used as an alternative to Markov Chain Monte Carlo methods for conducting posterior inference, both in likelihood-based and simulation-based problems. However, assessing the accuracy of posteriors encoded in generative models is not straightforward. In this paper, we introduce "Tests of Accuracy with Random Points" (TARP) coverage testing as a method to estimate coverage probabilities of generative posterior estimators. Our method differs from previously existing coverage-based methods, which require posterior evaluations. We prove that our approach is necessary and sufficient to show that a posterior estimator is accurate. We demonstrate the method on a variety of synthetic examples, and show that TARP can be used to test the results of posterior inference analyses in high-dimensional spaces. We also show that our method can detect inaccurate inferences in cases where existing methods fail.
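The idea of a sample-only coverage test built around random reference points can be sketched as follows. This is a simplified reading of the abstract, not the authors' implementation: the conjugate Gaussian toy model, the uniform reference-point distribution, and every function name below are assumptions.

```python
import math
import random


def exact_posterior(x, n, rng):
    # Toy model (assumed): theta ~ N(0, 1), x | theta ~ N(theta, 1),
    # so the exact posterior is N(x / 2, 1 / 2).
    return [rng.gauss(x / 2.0, math.sqrt(0.5)) for _ in range(n)]


def overconfident_posterior(x, n, rng):
    # Same mean, but half the correct standard deviation.
    return [rng.gauss(x / 2.0, math.sqrt(0.5) / 2.0) for _ in range(n)]


def tarp_fractions(sample_posterior, n_sims=2000, n_post=200, seed=0):
    """For each simulation, the fraction of posterior samples that lie closer
    to a random reference point than the true parameter does."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_sims):
        theta = rng.gauss(0.0, 1.0)            # true parameter from the prior
        x = theta + rng.gauss(0.0, 1.0)        # simulated observation
        ref = rng.uniform(-3.0, 3.0)           # random reference point
        post = sample_posterior(x, n_post, rng)
        radius = abs(theta - ref)
        inside = sum(abs(p - ref) < radius for p in post)
        fractions.append(inside / n_post)
    return fractions


def expected_coverage(fractions, alpha):
    """Share of simulations whose fraction falls below credibility level alpha."""
    return sum(f < alpha for f in fractions) / len(fractions)


accurate = expected_coverage(tarp_fractions(exact_posterior), 0.9)
too_narrow = expected_coverage(tarp_fractions(overconfident_posterior), 0.9)
print(accurate, too_narrow)
```

Only samples from the estimator are used; no posterior density is ever evaluated. For the exact posterior the estimated coverage should sit close to the nominal level, while the too-narrow estimator falls visibly below it at high credibility levels.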
Uncertain Evidence in Probabilistic Models and Stochastic Simulators
Andreas Munk
Alexander Mead
Frank N. Wood
We consider the problem of performing Bayesian inference in probabilistic models where observations are accompanied by uncertainty, referred to as "uncertain evidence." We explore how to interpret uncertain evidence, and by extension the importance of proper interpretation as it pertains to inference about latent variables. We consider a recently proposed method, "distributional evidence," and revisit two older methods: Jeffrey's rule and virtual evidence. We devise guidelines on how to account for uncertain evidence and we provide new insights, particularly regarding consistency. To showcase the impact of different interpretations of the same uncertain evidence, we carry out experiments in which one interpretation is defined as "correct." We then compare inference results from each interpretation, illustrating the importance of careful consideration of uncertain evidence.
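A small worked example shows why the interpretation matters. The sketch below applies the textbook forms of Jeffrey's rule and virtual evidence to the same statement, "the observation is 1 with weight 0.8", in a two-state model. The model, the numbers, and the function names are hypothetical, and the paper's distributional-evidence method is not reproduced here.

```python
def jeffrey_update(prior_z1, lik_e1_given_z, q_e1):
    """Jeffrey's rule: mix the two exact posteriors P(Z=1 | E=e)
    with the new marginal q(E=e)."""
    p_e1 = prior_z1 * lik_e1_given_z[1] + (1 - prior_z1) * lik_e1_given_z[0]
    post_e1 = prior_z1 * lik_e1_given_z[1] / p_e1
    post_e0 = prior_z1 * (1 - lik_e1_given_z[1]) / (1 - p_e1)
    return q_e1 * post_e1 + (1 - q_e1) * post_e0


def virtual_evidence_update(prior_z1, lik_e1_given_z, lam_e1, lam_e0):
    """Virtual evidence: treat (lam_e1, lam_e0) as likelihoods of an
    auxiliary observation that depends on E only."""
    w1 = prior_z1 * (lik_e1_given_z[1] * lam_e1 + (1 - lik_e1_given_z[1]) * lam_e0)
    w0 = (1 - prior_z1) * (lik_e1_given_z[0] * lam_e1 + (1 - lik_e1_given_z[0]) * lam_e0)
    return w1 / (w0 + w1)


# Hypothetical model: P(Z=1) = 0.3, P(E=1 | Z=1) = 0.9, P(E=1 | Z=0) = 0.2.
prior = 0.3
lik = {1: 0.9, 0: 0.2}

jeffrey = jeffrey_update(prior, lik, q_e1=0.8)
virtual = virtual_evidence_update(prior, lik, lam_e1=0.8, lam_e0=0.2)
print(jeffrey, virtual)
```

The two rules return different posteriors for the latent variable from the same numbers, and they coincide with ordinary conditioning only when the evidence is certain (weight 1 on a single value).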
Unlocking Slot Attention by Changing Optimal Transport Costs
David W. Zhang
Gertjan J. Burghouts
Cees G. M. Snoek
Slot attention is a powerful method for object-centric modeling in images and videos. However, its set-equivariance limits its ability to handle videos with a dynamic number of objects because it cannot break ties. To overcome this limitation, we first establish a connection between slot attention and optimal transport. Based on this new perspective we propose MESH (Minimize Entropy of Sinkhorn): a cross-attention module that combines the tiebreaking properties of unregularized optimal transport with the speed of regularized optimal transport. We evaluate slot attention using MESH on multiple object-centric learning benchmarks and find significant improvements over slot attention in every setting.
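For readers unfamiliar with regularized optimal transport, the following is a minimal sketch of plain Sinkhorn iterations with uniform marginals. It is not MESH and not the authors' code; the cost matrices, the regularization strength `eps`, and the function name are assumptions chosen to show how tied costs produce a tied transport plan.

```python
import math


def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropy-regularized optimal transport between uniform marginals.

    Returns the transport plan as a list of rows.
    """
    n, m = len(cost), len(cost[0])
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Distinct costs: the plan concentrates on the cheap diagonal assignment.
sharp = sinkhorn([[0.0, 1.0], [1.0, 0.0]])

# Fully tied costs: every assignment is equally good, so the plan stays uniform.
tied = sinkhorn([[1.0, 1.0], [1.0, 1.0]])
print(sharp, tied)
```

With tied costs the regularized plan spreads mass evenly across all assignments, which is the kind of tie the abstract says plain slot attention cannot break.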
Omega: Optimistic EMA Gradients
Stochastic min-max optimization has gained interest in the machine learning community with the advancements in GANs and adversarial training. Although game optimization is fairly well understood in the deterministic setting, some issues persist in the stochastic regime. Recent work has shown that stochastic gradient descent-ascent methods such as the optimistic gradient are highly sensitive to noise or can fail to converge. Although alternative strategies exist, they can be prohibitively expensive. We introduce Omega, a method with optimistic-like updates that mitigates the impact of noise by incorporating an EMA of historic gradients in its update rule. We also explore a variation of this algorithm that incorporates momentum. Although we do not provide convergence guarantees, our experiments on stochastic games show that Omega outperforms the optimistic gradient method when applied to linear players.
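As background for the deterministic setting mentioned above, the sketch below compares plain gradient descent-ascent with the standard optimistic gradient update on the bilinear game f(x, y) = x * y. It illustrates only the baseline: Omega's update rule is not given in the abstract and is not reproduced here, and the step size, step count, and function names are assumptions.

```python
import math


def field(x, y):
    # f(x, y) = x * y: x descends along df/dx = y, y ascends along df/dy = x.
    # Returning (y, -x) lets both players use the same "minus eta" update.
    return y, -x


def gda(steps, eta, x=1.0, y=1.0):
    """Simultaneous gradient descent-ascent; returns the distance to the
    equilibrium (0, 0) after the given number of steps."""
    for _ in range(steps):
        gx, gy = field(x, y)
        x, y = x - eta * gx, y - eta * gy
    return math.hypot(x, y)


def optimistic(steps, eta, x=1.0, y=1.0):
    """Optimistic gradient: step along 2 * g_t - g_{t-1}."""
    prev = field(x, y)
    for _ in range(steps):
        g = field(x, y)
        x = x - eta * (2 * g[0] - prev[0])
        y = y - eta * (2 * g[1] - prev[1])
        prev = g
    return math.hypot(x, y)


dist_gda = gda(3000, 0.1)
dist_optimistic = optimistic(3000, 0.1)
print(dist_gda, dist_optimistic)
```

Without noise, descent-ascent spirals away from the equilibrium while the optimistic update contracts toward it; the abstract's concern is that this advantage degrades once the gradients become stochastic.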
Artificial Intelligence in COVID-19-Related Geriatric Care: A Scoping Review
Emina Burnazovic
Amanda Yee
Joshua Howard Levy
Genevieve Gore
S. A. Rahimi
Chat2Code: A Chatbot for Model Specification and Code Generation, The Case of Smart Contracts
Ilham Qasse
Shailesh Mishra
Björn Þór Jónsson
Mohammad Hamdaqa
The potential of automatic code generation through Model-Driven Engineering (MDE) frameworks has yet to be realized. Beyond their ability to help software professionals write more accurate, reusable code, MDE frameworks could make programming accessible for a new class of domain experts. However, domain experts have been slow to embrace these tools, as they still need to learn how to specify their applications' requirements using the concrete syntax (i.e., textual or graphical) of the new and unified domain-specific language. Conversational interfaces (chatbots) could smooth the learning process and offer a more interactive way for domain experts to specify their application requirements and generate the desired code. If integrated with MDE frameworks, chatbots may offer domain experts a richer domain vocabulary without sacrificing the power of agnosticism that unified modelling frameworks provide. In this paper, we discuss the challenges of integrating chatbots within MDE frameworks and then examine a specific application: the auto-generation of smart contract code based on conversational syntax. We demonstrate how this can be done and evaluate our approach by conducting a user experience survey to assess the usability and functionality of the chatbot framework. The paper concludes by drawing attention to the potential benefits of leveraging Large Language Models (LLMs) in this context.
Continuous cutting plane algorithms in integer programming
Didier Chételat
Andrea Lodi
Curriculum frameworks and educational programs in artificial intelligence for medical students, residents, and practicing physicians: a scoping review protocol.
Raymond Tolentino
Ashkan Baradaran
Genevieve Gore
Pierre Pluye
S. A. Rahimi
OBJECTIVE: The aim of this scoping review is to synthesize knowledge from the literature on curriculum frameworks and current educational programs that focus on the teaching and learning of artificial intelligence (AI) for medical students, residents, and practicing physicians. INTRODUCTION: To advance the implementation of AI in clinical practice, physicians need to have a better understanding of AI and how to use it within clinical practice. Consequently, medical education must introduce AI topics and concepts into the curriculum. Curriculum frameworks are educational road maps to teaching and learning. Therefore, any existing AI curriculum frameworks must be reviewed and, if none exist, such a framework must be developed. INCLUSION CRITERIA: This review will include articles that describe curriculum frameworks for teaching and learning AI in medicine, irrespective of country. All types of articles and study designs will be included, except conference abstracts and protocols. METHODS: This review will follow the JBI methodology for scoping reviews. Keywords will first be identified from relevant articles. Another search will then be conducted using the identified keywords and index terms. The following databases will be searched: MEDLINE (Ovid), Embase (Ovid), Cochrane Central Register of Controlled Trials (CENTRAL), CINAHL (EBSCOhost), and Scopus. Gray literature will also be searched. Articles will be limited to the English and French languages, commencing from the year 2000. The reference lists of all included articles will be screened for additional articles. Data will then be extracted from included articles and the results will be presented in a table.