Publications

Graphically Structured Diffusion Models
Christian Dietrich Weilbach
William Harvey
We introduce a framework for automatically defining and learning deep generative models with problem-specific structure. We tackle problem domains that are more traditionally solved by algorithms such as sorting, constraint satisfaction for Sudoku, and matrix factorization. Concretely, we train diffusion models with an architecture tailored to the problem specification. This problem specification should contain a graphical model describing relationships between variables, and often benefits from explicit representation of subcomputations. Permutation invariances can also be exploited. Across a diverse set of experiments we improve the scaling relationship between problem dimension and our model's performance, in terms of both training time and final accuracy.
A Group Symmetric Stochastic Differential Equation Model for Molecule Multi-modal Pretraining
Shengchao Liu
Weitao Du
Zhi-Ming Ma
Hongyu Guo
Molecule pretraining has quickly become the go-to scheme for boosting the performance of AI-based drug discovery. Molecules can naturally be represented as 2D topological graphs or 3D geometric point clouds. Although most existing pretraining methods focus on a single modality, recent research has shown that maximizing the mutual information (MI) between these two modalities enhances molecule representation ability. However, existing multi-modal molecule pretraining approaches approximate MI in the representation space encoded from the topology and geometry, which results in the loss of critical structural information about molecules. To address this issue, we propose MoleculeSDE. MoleculeSDE leverages group symmetric (e.g., SE(3)-equivariant and reflection-antisymmetric) stochastic differential equation models to generate 3D geometries from 2D topologies, and vice versa, directly in the input space. It not only obtains a tighter MI bound but also supports a wider range of downstream tasks than previous work. Comparing against 17 pretraining baselines, we empirically verify that MoleculeSDE learns an expressive representation, achieving state-of-the-art performance on 26 out of 32 downstream tasks.
Guessing Random Additive Noise Decoding
Syed Mohsin Abbas
Marwan Jalaleddine
Guiding Language Model Math Reasoning with Planning Tokens
Xinyi Wang
Lucas Caccia
Oleksiy Ostapenko
Xingdi Yuan
William Yang Wang
Large language models (LLMs) have recently attracted considerable interest for their ability to perform complex reasoning tasks, such as chain-of-thought reasoning. However, most existing approaches to enhancing this ability rely heavily on data-driven methods while neglecting the structural aspects of the model's reasoning capacity. We find that while LLMs can manage individual reasoning steps well, they struggle to maintain consistency across an entire reasoning chain. To address this, we introduce planning tokens at the start of each reasoning step to serve as a guide for the model, and add their embeddings to the model parameters. Our approach requires a negligible increase in trainable parameters (just 0.001%) and can be applied through either full fine-tuning or a more parameter-efficient scheme. We demonstrate our method's effectiveness by applying it to three different LLMs, showing notable accuracy improvements across three math word problem datasets relative to standard fine-tuning baselines.
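To make "planning tokens at the start of each reasoning step" concrete, here is a minimal data-formatting sketch. It is an illustration under assumptions, not the paper's procedure: the token names (`<plan_add>`, `<plan_other>`, etc.) and the rule of choosing a token from the first arithmetic operator in a step are hypothetical.

```python
import re

# Hypothetical planning-token vocabulary keyed by the arithmetic operator a step uses.
PLAN_TOKENS = {"+": "<plan_add>", "-": "<plan_sub>", "*": "<plan_mul>", "/": "<plan_div>"}
DEFAULT_TOKEN = "<plan_other>"  # hypothetical fallback for steps without an operator

# An operator only counts when it sits between digits/spaces, so hyphens in words are ignored.
OPERATOR = re.compile(r"(?<=[\d\s])[+\-*/](?=[\s\d])")


def add_planning_tokens(steps):
    """Prefix each reasoning step with a planning token chosen from its first operator."""
    tagged = []
    for step in steps:
        match = OPERATOR.search(step)
        token = PLAN_TOKENS[match.group()] if match else DEFAULT_TOKEN
        tagged.append(f"{token} {step}")
    return "\n".join(tagged)


if __name__ == "__main__":
    solution = [
        "Tom has 3 + 4 = 7 apples.",
        "He gives away half: 7 / 2 = 3.5.",
        "So the answer is 3.5.",
    ]
    print(add_planning_tokens(solution))
```

For the model to learn from such data, the planning tokens would need to be new vocabulary entries with their own trainable embeddings; with Hugging Face Transformers, for instance, that is typically done with `tokenizer.add_special_tokens` followed by `model.resize_token_embeddings`.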
GUILGET: GUI Layout GEneration with Transformer
Andrey Sobolevsky
Guillaume-Alexandre Bilodeau
Jinghui Cheng
Guillotine Regularization: Why removing layers is needed to improve generalization in Self-Supervised Learning
Florian Bordes
Randall Balestriero
Quentin Garrido
Adrien Bardes
One unexpected technique that has emerged in recent years consists of training a Deep Network (DN) with a Self-Supervised Learning (SSL) method, and then using this network on downstream tasks with its last few projector layers entirely removed. This trick of throwing away the projector is actually critical for SSL methods to display competitive performance on ImageNet, where more than 30 percentage points can be gained this way. This is somewhat vexing, as one would hope that the network layer at which invariance is explicitly enforced by the SSL criterion during training (the last projector layer) would be the one that gives the best downstream generalization performance. But it seems not to be, and this study sheds some light on why. This trick, which we name Guillotine Regularization (GR), is in fact a generically applicable method that has been used to improve generalization performance in transfer learning scenarios. In this work, we identify the underlying reasons behind its success and show that the optimal layer to use can change significantly depending on the training setup, the data, or the downstream task. Lastly, we give some insights on how to reduce the need for a projector in SSL by aligning the pretext SSL task and the downstream task.
A Heat Diffusion Perspective on Geodesic Preserving Dimensionality Reduction
Guillaume Huguet
Alexander Tong
Edward De Brouwer
Yanlei Zhang
Ian Adelstein
Smita Krishnaswamy
Hierarchical Distributed Energy Management Framework for Multiple Greenhouses Considering Demand Response
Ehsan Rezaei
Kianoosh Ojand
Greenhouses are a key component of modernised agriculture, aiming to produce high-quality crops and plants. Furthermore, a network of greenhouses has enormous potential as part of demand response programs. Saving energy during off-peak time, and reducing power consumption and delaying the start time of subsystems during on-peak time, are some strategies that can be used to limit the power exchanged with the main grid. In this work, we propose a hierarchical distributed model predictive control framework based on the alternating direction method of multipliers (ADMM), with two main objectives: 1) providing appropriate conditions for the greenhouses' crops and plants to grow, and 2) limiting the total power exchanged with the main grid. At each time step in the framework, an aggregator coordinates the greenhouses to reach a consensus and limit the total electric power exchanged while managing shared resources, e.g., reservoir water. The performance of the proposed framework is investigated through a case study.
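The aggregator-coordination idea can be illustrated with the textbook ADMM "sharing" problem (Boyd et al., 2011, Sec. 7.3). This is a toy sketch, not the paper's hierarchical MPC framework: it assumes a single time step, one scalar power set-point per greenhouse, quadratic costs for deviating from a desired load, and no shared water resources. Each greenhouse solves a small local problem, the aggregator projects the average onto the grid-exchange limit, and a shared dual variable acts as a price signal.

```python
def admm_share_power(desired, p_max, rho=1.0, iters=2000):
    """Toy sharing ADMM: minimize sum_i (p_i - desired_i)^2 s.t. sum_i p_i <= p_max.

    Assumptions (not from the paper): one time step, scalar power per
    greenhouse, quadratic deviation cost, fixed penalty rho.
    """
    n = len(desired)
    p = [0.0] * n  # local power set-points, one per greenhouse
    p_bar = 0.0    # average of the local set-points
    z_bar = 0.0    # aggregator's feasible average
    u = 0.0        # scaled dual variable, shared by all greenhouses
    for _ in range(iters):
        # Local step: each greenhouse minimizes (p - d)^2 + (rho/2)(p - v)^2 in closed form.
        p = [(2.0 * d + rho * (pi - p_bar + z_bar - u)) / (2.0 + rho)
             for d, pi in zip(desired, p)]
        p_bar = sum(p) / n
        # Aggregator step: project the average onto the grid-exchange limit.
        z_bar = min(u + p_bar, p_max / n)
        # Dual update: accumulate the coupling-constraint mismatch.
        u += p_bar - z_bar
    return p


if __name__ == "__main__":
    # Hypothetical loads (kW) for three greenhouses and a 9 kW grid limit.
    schedule = admm_share_power([3.0, 5.0, 4.0], p_max=9.0)
    print([round(x, 3) for x in schedule])  # [2.0, 4.0, 3.0]
```

Here the combined demand of 12 kW exceeds the 9 kW limit, so the optimum shares the 3 kW reduction equally, since every greenhouse has the same quadratic cost. When the limit is slack, the dual variable stays at zero and each greenhouse keeps its desired load.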
How can intelligent systems revolutionise healthcare?
How Useful Are Educational Questions Generated by Large Language Models?
Sabina Elkins
Ekaterina Kochmar
Iulian V. Serban
Human-Centered Responsible Artificial Intelligence: Current & Future Trends
Mohammad Tahaei
Marios Constantinides
Daniele Quercia
Sean Kennedy
Michael Muller
Simone Stumpf
Q. Vera Liao
Ricardo Baeza-Yates
Lora Aroyo
Jess Holbrook
Ewa Luger
Michael Madaio
Ilana Golbin Blumenfeld
Maria De-Arteaga
Jessica Vitak
HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution
Eric Nguyen
Michael Poli
Marjan Faizi
Armin W Thomas
Callum Birch-Sykes
Michael Wornow
Aman Patel
Clayton M. Rabideau
Stefano Massaroli
Stefano Ermon
Stephen Baccus
Christopher Re