Large language models (LLMs) have recently attracted considerable interest for their ability to perform complex reasoning tasks, such as chain-of-thought reasoning. However, most existing approaches to enhance this ability rely heavily on data-driven methods and neglect the structural aspects of the model's reasoning capacity. We find that while LLMs can manage individual reasoning steps well, they struggle to maintain consistency across an entire reasoning chain. To address this, we introduce planning tokens at the start of each reasoning step, which serve as a guide for the model, and add their embeddings to the model parameters. Our approach requires a negligible increase in trainable parameters (just 0.001%) and can be applied through either full fine-tuning or a more parameter-efficient scheme. We demonstrate the method's effectiveness by applying it to three different LLMs, showing notable accuracy improvements on three math word problem datasets relative to standard fine-tuning baselines.
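The parameter overhead claimed in the abstract is easy to see concretely: the only new trainable weights are the embedding rows for the added planning tokens. A minimal back-of-the-envelope sketch, assuming illustrative sizes (a 7B-parameter model with a 32k vocabulary and 4096-dimensional embeddings; the token names are made up, not the paper's):

```python
import numpy as np

# Illustrative sizes loosely based on a 7B-parameter LLM with a 32k
# vocabulary; the paper's exact configuration may differ.
vocab_size, hidden_dim = 32_000, 4_096
total_params = 7e9

# One hypothetical planning token per reasoning-step type.
planning_tokens = ["<plan:arith>", "<plan:lookup>", "<plan:conclude>"]

# The only newly trainable parameters are the added embedding rows.
new_rows = np.random.randn(len(planning_tokens), hidden_dim) * 0.02
added = new_rows.size
print(f"added params: {added} "
      f"({added / total_params:.6%} of a 7B model)")
```

Even with these rough numbers, the addition is on the order of thousandths of a percent of the model, consistent with the abstract's 0.001% figure.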
One unexpected technique that emerged in recent years consists in training a Deep Network (DN) with a Self-Supervised Learning (SSL) method, and using this network on downstream tasks but with its last few projector layers entirely removed. This trick of throwing away the projector is actually critical for SSL methods to display competitive performance on ImageNet, where more than 30 percentage points can be gained that way. This is a little vexing, as one would hope that the network layer at which invariance is explicitly enforced by the SSL criterion during training (the last projector layer) should be the one to use for best generalization performance downstream. But it seems not to be, and this study sheds some light on why. This trick, which we name Guillotine Regularization (GR), is in fact a generically applicable method that has been used to improve generalization performance in transfer learning scenarios. In this work, we identify the underlying reasons behind its success and show that the optimal layer to use might change significantly depending on the training setup, the data or the downstream task. Lastly, we give some insights on how to reduce the need for a projector in SSL by aligning the pretext SSL task and the downstream task.
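Mechanically, Guillotine Regularization amounts to choosing which layer's activations to hand to the downstream task. A toy sketch, with random weights standing in for a pretrained backbone and a two-layer projector (sizes are arbitrary, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Toy backbone + 2-layer projector, as in a typical SSL setup.
# Random weights stand in for a pretrained network.
W_backbone = rng.normal(size=(16, 32))
W_proj1 = rng.normal(size=(32, 32))
W_proj2 = rng.normal(size=(32, 8))

def forward(x, cut=0):
    """Return features with the last `cut` projector layers removed
    (cut=0: full projector output; cut=2: backbone output)."""
    h = relu(x @ W_backbone)          # backbone representation
    layers = [W_proj1, W_proj2]
    for W in layers[: len(layers) - cut]:
        h = relu(h @ W)
    return h

x = rng.normal(size=(4, 16))
# Guillotine Regularization: evaluate downstream tasks on features
# taken at each possible cut point and keep the best-performing one.
shapes = [forward(x, cut=c).shape for c in (0, 1, 2)]
print(shapes)
```

The study's point is that the best value of `cut` is not fixed: it depends on the training setup, the data, and the downstream task, so it should be treated as something to validate rather than a convention.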
The availability of reliable, high-resolution climate and weather data is important to inform long-term decisions on climate adaptation and mitigation and to guide rapid responses to extreme events. Forecasting models are limited by computational costs and, therefore, often generate coarse-resolution predictions. Statistical downscaling, including super-resolution methods from deep learning, can provide an efficient method of upsampling low-resolution data. However, despite achieving visually compelling results in some cases, such models frequently violate conservation laws when predicting physical variables. In order to conserve physical quantities, here we introduce methods that guarantee statistical constraints are satisfied by a deep learning downscaling model, while also improving their performance according to traditional metrics. We compare different constraining approaches and demonstrate their applicability across different neural architectures as well as a variety of climate and weather datasets. Besides enabling faster and more accurate climate predictions through downscaling, we also show that our novel methodologies can improve super-resolution for satellite data and natural image datasets.
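One simple way to make such a guarantee hold by construction is a corrective output layer. A minimal sketch of one such constraint (an additive mean correction, one of several possible constraint layers; the function name and the 4x upscaling factor are illustrative):

```python
import numpy as np

def enforce_mean_constraint(y_pred, x_lr, factor=4):
    """Additive correction: shift each upscaled block so its mean equals
    the corresponding low-resolution value, i.e. a conservation
    constraint on the coarse quantity. Applied after the network's
    raw super-resolution output."""
    h, w = x_lr.shape
    y = y_pred.reshape(h, factor, w, factor)
    block_mean = y.mean(axis=(1, 3), keepdims=True)
    y = y + (x_lr.reshape(h, 1, w, 1) - block_mean)
    return y.reshape(h * factor, w * factor)

rng = np.random.default_rng(1)
x_lr = rng.random((2, 2))          # coarse input field
y_pred = rng.random((8, 8))        # stand-in for the network's output
y = enforce_mean_constraint(y_pred, x_lr)

# Every 4x4 block now averages exactly to the coarse pixel it refines.
blocks = y.reshape(2, 4, 2, 4).mean(axis=(1, 3))
print(np.allclose(blocks, x_lr))
```

Because the correction is differentiable, it can sit inside the model during training rather than being applied as a post-hoc fix, which is what allows the constraint to also improve, not just preserve, predictive performance.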
Diffusion-based manifold learning methods have proven useful in representation learning and dimensionality reduction of modern high-dimensional, high-throughput, noisy datasets. Such datasets are especially present in fields like biology and physics. While it is thought that these methods preserve the underlying manifold structure of data by learning a proxy for geodesic distances, no specific theoretical links have been established. Here, we establish such a link via results in Riemannian geometry explicitly connecting heat diffusion to manifold distances. In this process, we also formulate a more general heat kernel based manifold embedding method that we call heat geodesic embeddings. This novel perspective makes clearer the choices available in manifold learning and denoising. Results show that our method outperforms the existing state of the art in preserving ground truth manifold distances and cluster structure in toy datasets. We also showcase our method on single-cell RNA-sequencing datasets with both continuum and cluster structure, where our method enables interpolation of withheld timepoints of data. Finally, we show that parameters of our more general method can be configured to give results similar to PHATE (a state-of-the-art diffusion-based manifold learning method) as well as SNE (an attraction/repulsion neighborhood-based method that forms the basis of t-SNE).
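The heat-to-distance link the abstract refers to can be illustrated on a tiny graph: Varadhan's formula relates the short-time heat kernel to squared geodesic distance via d(i,j)² ≈ −4t·log Hₜ(i,j). A minimal sketch on a 5-node path graph (this is the general idea only, not the paper's full embedding method):

```python
import numpy as np

# Path graph on 5 nodes: geodesic distance is the hop count.
n = 5
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(1)) - A              # combinatorial graph Laplacian

# Heat kernel H_t = exp(-t L) via eigendecomposition of symmetric L.
t = 0.5
w, V = np.linalg.eigh(L)
H = (V * np.exp(-t * w)) @ V.T

# Varadhan-style distance estimate: d(i, j)^2 ~ -4 t log H_t(i, j).
D = np.sqrt(np.maximum(-4.0 * t * np.log(np.clip(H, 1e-12, None)), 0.0))
np.fill_diagonal(D, 0.0)

# Nodes farther along the path get larger heat-based distances.
print(D[0, 1] < D[0, 2] < D[0, 4])
```

Embedding points so that Euclidean distances match such heat-derived distances is the core move; the paper's more general formulation then recovers methods like PHATE and SNE as special parameter settings.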
2022-12-31
Advances in Neural Information Processing Systems 36 (NeurIPS 2023) (published)
Greenhouses are a key component of modernised agriculture, aiming to produce high-quality crops and plants. Furthermore, a network of greenhouses has enormous potential as part of demand response programs. Saving energy during off-peak time, reducing power consumption and delaying the start time of subsystems during on-peak time are some strategies that can be used to limit power exchanged with the main grid. In this work, a hierarchical distributed alternating direction method of multipliers-based model predictive control framework is proposed that has two main objectives: 1) providing appropriate conditions for greenhouses' crops and plants to grow, and 2) limiting the total power exchanged with the main grid. At each time step in the framework, an aggregator coordinates the greenhouses to reach a consensus and limit the total electric power exchanged while managing shared resources, e.g., reservoir water. The proposed framework's performance is investigated through a case study.
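The aggregator's role in the second objective can be illustrated with a deliberately simplified stand-in: the paper uses an iterative ADMM-based consensus inside an MPC loop, but a one-shot proportional curtailment already shows what "limit the total power exchanged" means (all numbers below are made up):

```python
# Simplified stand-in for the aggregator's coordination step: if the
# greenhouses' requested powers exceed the grid limit, curtail each
# request proportionally. The actual framework reaches this consensus
# iteratively via ADMM while also managing shared resources.
def coordinate(requests, grid_limit):
    total = sum(requests)
    if total <= grid_limit:
        return list(requests)
    scale = grid_limit / total
    return [p * scale for p in requests]

requests = [12.0, 8.0, 5.0]     # kW requested by three greenhouses
limited = coordinate(requests, grid_limit=20.0)
print(limited, sum(limited))
```

In the full framework each greenhouse additionally solves its own local MPC problem (keeping temperature and humidity in the crop's comfort range), and the aggregator's consensus step trades off those local objectives against the shared grid limit.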
2022-12-31
IEEE Transactions on Sustainable Energy (published)
High-Throughput Edge Inference for BERT Models via Neural Architecture Search and Pipeline.
Hung-Yang Chang
Seyyed Hasan Mozafari
James J. Clark
Brett H. Meyer
Warren J. Gross
There has been growing interest in improving the BERT inference throughput on resource-constrained edge devices for a satisfactory user experience. One methodology is to employ heterogeneous computing, which utilizes multiple processing elements to accelerate inference. Another methodology is to deploy Neural Architecture Search (NAS) to find optimal solutions in the accuracy-throughput design space. In this paper, for the first time, we incorporate NAS with pipelining for BERT models. We show that performing NAS with pipelining achieves on average 53% higher throughput, compared to NAS with a homogeneous system. Additionally, we propose a NAS algorithm that incorporates hardware performance feedback to accelerate the NAS process. Our proposed NAS algorithm speeds up the search process by ~4x and 5.5x on the design spaces of BERT and CNNs, respectively. Also, by exploring the accuracy-throughput design space of BERT models, we demonstrate that performing pipelining then NAS (Pipeline-then-NAS) can lead to solutions with up to 9x higher inference throughput, compared to running homogeneous inference on the BERT-base model, with only a 1.3% decrease in accuracy.
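The pipelining side of this search space is easy to sketch: for a layered model split across two processing elements, steady-state throughput is bounded by the slower stage, so the search looks for the most balanced cut. A minimal two-stage sketch (the per-layer latencies are invented; real values would come from profiling each device):

```python
from itertools import accumulate

# Illustrative per-layer latencies (ms) for a 6-layer transformer;
# in practice these come from profiling each processing element.
latencies = [3.0, 3.0, 4.0, 2.0, 5.0, 3.0]

def best_two_stage_split(lat):
    """Try every split of the layer list into two pipeline stages;
    steady-state throughput is bounded by the slower stage, so pick
    the cut that minimizes the bottleneck stage latency."""
    prefix = list(accumulate(lat))
    total = prefix[-1]
    best = None
    for cut in range(1, len(lat)):
        bottleneck = max(prefix[cut - 1], total - prefix[cut - 1])
        if best is None or bottleneck < best[1]:
            best = (cut, bottleneck)
    return best

cut, bottleneck = best_two_stage_split(latencies)
print(f"split after layer {cut}: bottleneck stage = {bottleneck} ms")
```

Joining this with NAS means the architecture itself (layer widths, depths) is searched at the same time as the split, which is why the paper's pipeline-aware search finds configurations a homogeneous-system NAS cannot reach.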
As a social species, ready exchange with peers is a pivotal asset - our "social capital". Yet, single-person households have come to pervade metropolitan cities worldwide, with unknown consequences in the long run. Here, we systematically explore the morphological manifestations associated with singular living in ∼40,000 UK Biobank participants. The uncovered population-level signature spotlights the highly associative default mode network, in addition to findings in the amygdala's central, cortical and corticoamygdaloid nuclei groups, as well as the hippocampal fimbria and dentate gyrus. Sex-stratified analyses revealed male-specific neural substrates, including somatomotor, saliency and visual systems, while female-specific neural substrates centred on the dorsomedial prefrontal cortex. In line with our demographic profiling results, the discovered neural imprint of living alone is potentially linked to alcohol and tobacco consumption, anxiety, sleep quality as well as daily TV watching. The secular trend for solitary living will require new answers from public-health decision makers.
Resting-state fMRI is commonly used to derive brain parcellations, which are widely used for dimensionality reduction and interpreting human neuroscience studies. We previously developed a model that integrates local and global approaches for estimating areal-level cortical parcellations. The resulting local-global parcellations are often referred to as the Schaefer parcellations. However, the lack of homotopic correspondence between left and right Schaefer parcels has limited their use for brain lateralization studies. Here, we extend our previous model to derive homotopic areal-level parcellations. Using resting-fMRI and task-fMRI across diverse scanners, acquisition protocols, preprocessing and demographics, we show that the resulting homotopic parcellations are as homogeneous as the Schaefer parcellations, while being more homogeneous than five publicly available parcellations. Furthermore, weaker correlations between homotopic parcels are associated with greater lateralization in resting network organization, as well as lateralization in language and motor task activation. Finally, the homotopic parcellations agree with the boundaries of a number of cortical areas estimated from histology and visuotopic fMRI, while capturing sub-areal (e.g., somatotopic and visuotopic) features. Overall, these results suggest that the homotopic local-global parcellations represent neurobiologically meaningful subdivisions of the human cerebral cortex and will be a useful resource for future studies. Multi-resolution parcellations estimated from 1479 participants are publicly available (https://github.com/ThomasYeoLab/CBIG/tree/master/stable_projects/brain_parcellation/Yan2023_homotopic).