Publications

GROOD: Gradient-Aware Out-of-Distribution Detection
Mostafa Elaraby
Yann Batiste Pequignot
Paul Novello
Guessing Random Additive Noise Decoding
Syed Mohsin Abbas
Marwan Jalaleddine
Warren J. Gross
Guiding Language Model Math Reasoning with Planning Tokens
Xinyi Wang
Lucas Caccia
Xingdi Yuan
William Yang Wang
Large language models (LLMs) have recently attracted considerable interest for their ability to perform complex reasoning tasks, such as chain-of-thought reasoning. However, most of the existing approaches to enhance this ability rely heavily on data-driven methods, while neglecting the structural aspects of the model's reasoning capacity. We find that while LLMs can manage individual reasoning steps well, they struggle with maintaining consistency across an entire reasoning chain. To solve this, we introduce planning tokens at the start of each reasoning step, serving as a guide for the model, and add their embeddings to the model parameters. Our approach requires a negligible increase in trainable parameters (just 0.001%) and can be applied through either full fine-tuning or a more parameter-efficient scheme. We demonstrate our method's effectiveness by applying it to three different LLMs, showing notable accuracy improvements across three math word problem datasets w.r.t. standard fine-tuning baselines.
GUILGET: GUI Layout GEneration with Transformer
Andrey Sobolevsky
Guillaume-Alexandre Bilodeau
Jinghui Cheng
Jin L.C. Guo
Guillotine Regularization: Why removing layers is needed to improve generalization in Self-Supervised Learning
Randall Balestriero
Quentin Garrido
Adrien Bardes
P Vincent
One unexpected technique that emerged in recent years consists in training a Deep Network (DN) with a Self-Supervised Learning (SSL) method, and using this network on downstream tasks but with its last few projector layers entirely removed. This trick of throwing away the projector is actually critical for SSL methods to display competitive performances on ImageNet for which more than 30 percentage points can be gained that way. This is a little vexing, as one would hope that the network layer at which invariance is explicitly enforced by the SSL criterion during training (the last projector layer) should be the one to use for best generalization performance downstream. But it seems not to be, and this study sheds some light on why. This trick, which we name Guillotine Regularization (GR), is in fact a generically applicable method that has been used to improve generalization performance in transfer learning scenarios. In this work, we identify the underlying reasons behind its success and show that the optimal layer to use might change significantly depending on the training setup, the data or the downstream task. Lastly, we give some insights on how to reduce the need for a projector in SSL by aligning the pretext SSL task and the downstream task.
Hard-Constrained Deep Learning for Climate Downscaling
Prasanna Sattegeri
D. Szwarcman
Campbell Watson
The availability of reliable, high-resolution climate and weather data is important to inform long-term decisions on climate adaptation and mitigation and to guide rapid responses to extreme events. Forecasting models are limited by computational costs and, therefore, often generate coarse-resolution predictions. Statistical downscaling, including super-resolution methods from deep learning, can provide an efficient method of upsampling low-resolution data. However, despite achieving visually compelling results in some cases, such models frequently violate conservation laws when predicting physical variables. In order to conserve physical quantities, here we introduce methods that guarantee statistical constraints are satisfied by a deep learning downscaling model, while also improving their performance according to traditional metrics. We compare different constraining approaches and demonstrate their applicability across different neural architectures as well as a variety of climate and weather data sets. Besides enabling faster and more accurate climate predictions through downscaling, we also show that our novel methodologies can improve super-resolution for satellite data and natural images data sets.
A Heat Diffusion Perspective on Geodesic Preserving Dimensionality Reduction
Edward De Brouwer
Yanlei Zhang
Ian Adelstein
Diffusion-based manifold learning methods have proven useful in representation learning and dimensionality reduction of modern high dimensional, high throughput, noisy datasets. Such datasets are especially present in fields like biology and physics. While it is thought that these methods preserve underlying manifold structure of data by learning a proxy for geodesic distances, no specific theoretical links have been established. Here, we establish such a link via results in Riemannian geometry explicitly connecting heat diffusion to manifold distances. In this process, we also formulate a more general heat kernel based manifold embedding method that we call heat geodesic embeddings. This novel perspective makes clearer the choices available in manifold learning and denoising. Results show that our method outperforms existing state of the art in preserving ground truth manifold distances, and preserving cluster structure in toy datasets. We also showcase our method on single cell RNA-sequencing datasets with both continuum and cluster structure, where our method enables interpolation of withheld timepoints of data. Finally, we show that parameters of our more general method can be configured to give results similar to PHATE (a state-of-the-art diffusion based manifold learning method) as well as SNE (an attraction/repulsion neighborhood based method that forms the basis of t-SNE).
Hierarchical Distributed Energy Management Framework for Multiple Greenhouses Considering Demand Response
Ehsan Rezaei
Kianoosh Ojand
Greenhouses are a key component of modernised agriculture, aiming for producing high-quality crops and plants. Furthermore, a network of greenhouses has enormous potential as part of demand response programs. Saving energy during off-peak time, reducing power consumption and delaying the start time of subsystems during on-peak time are some strategies that can be used to limit power exchanged with the main grid. In this work, a hierarchical distributed alternating direction method of multipliers-based model predictive control framework is proposed that has two main objectives: 1) providing appropriate conditions for greenhouses' crops and plants to grow, and 2) limiting the total power exchanged with the main grid. At each time step in the framework, an aggregator coordinates the greenhouses to reach a consensus and limit the total electric power exchanged while managing shared resources, e.g., reservoir water. The proposed framework's performance is investigated through a case study.
High-Throughput Edge Inference for BERT Models via Neural Architecture Search and Pipeline
Hung-Yang Chang
Seyyed Hasan Mozafari
James J. Clark
Brett H. Meyer
Warren J. Gross
There has been growing interest in improving the BERT inference throughput on resource-constrained edge devices for a satisfactory user experience. One methodology is to employ heterogeneous computing, which utilizes multiple processing elements to accelerate inference. Another methodology is to deploy Neural Architecture Search (NAS) to find optimal solutions in accuracy-throughput design space. In this paper, for the first time, we incorporate NAS with pipelining for BERT models. We show that performing NAS with pipelining achieves on average 53% higher throughput, compared to NAS with a homogeneous system. Additionally, we propose a NAS algorithm that incorporates hardware performance feedback to accelerate the NAS process. Our proposed NAS algorithm speeds up the search process by ~4x, and 5.5x on the design space of the BERT and CNNs, respectively. Also, by exploring the accuracy-throughput design space of BERT models, we demonstrate that performing pipelining then NAS (Pipeline-then-NAS) can lead to solutions with up to 9x higher inference throughput, compared to running homogeneous inference on the BERT-base model, with only a 1.3% decrease in accuracy.
Home alone: A population neuroscience investigation of brain morphology substrates
MaryAnn Noonan
Chris Zajner
As a social species, ready exchange with peers is a pivotal asset - our “social capital”. Yet, single-person households have come to pervade metropolitan cities worldwide, with unknown consequences in the long run. Here, we systematically explore the morphological manifestations associated with singular living in ∼40,000 UK Biobank participants. The uncovered population-level signature spotlights the highly associative default mode network, in addition to findings such as in the amygdala central, cortical and corticoamygdaloid nuclei groups, as well as the hippocampal fimbria and dentate gyrus. Sex-stratified analyses revealed male-specific neural substrates, including somatomotor, saliency and visual systems, while female-specific neural substrates centred on the dorsomedial prefrontal cortex. In line with our demographic profiling results, the discovered neural imprint of living alone is potentially linked to alcohol and tobacco consumption, anxiety, sleep quality as well as daily TV watching. The secular trend for solitary living will require new answers from public-health decision makers.
Homotopic local-global parcellation of the human cerebral cortex from resting-state functional connectivity
Xiaoxuan Yan
Ru Kong
Aihuiping Xue
Qing Yang
Csaba Orban
Lijun An
Avram J. Holmes
Xing Qian
Jianzhong Chen
Xi-Nian Zuo
Juan Helen Zhou
Marielle V Fortier
Ai Peng Tan
Peter Gluckman
Yap Seng Chong
Michael J Meaney
Simon B. Eickhoff
B.T. Thomas Yeo
Resting-state fMRI is commonly used to derive brain parcellations, which are widely used for dimensionality reduction and interpreting human neuroscience studies. We previously developed a model that integrates local and global approaches for estimating areal-level cortical parcellations. The resulting local-global parcellations are often referred to as the Schaefer parcellations. However, the lack of homotopic correspondence between left and right Schaefer parcels has limited their use for brain lateralization studies. Here, we extend our previous model to derive homotopic areal-level parcellations. Using resting-fMRI and task-fMRI across diverse scanners, acquisition protocols, preprocessing and demographics, we show that the resulting homotopic parcellations are as homogeneous as the Schaefer parcellations, while being more homogeneous than five publicly available parcellations. Furthermore, weaker correlations between homotopic parcels are associated with greater lateralization in resting network organization, as well as lateralization in language and motor task activation. Finally, the homotopic parcellations agree with the boundaries of a number of cortical areas estimated from histology and visuotopic fMRI, while capturing sub-areal (e.g., somatotopic and visuotopic) features. Overall, these results suggest that the homotopic local-global parcellations represent neurobiologically meaningful subdivisions of the human cerebral cortex and will be a useful resource for future studies. Multi-resolution parcellations estimated from 1479 participants are publicly available (https://github.com/ThomasYeoLab/CBIG/tree/master/stable_projects/brain_parcellation/Yan2023_homotopic).
How can intelligent systems revolutionise health care?