Publications

Gintare Karolina Dziugaite

The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning

Tian Jin

Nolan Clement

Xin Dong

Vaishnavh Nagarajan

Michael Carbin

Jonathan Ragan-kelley

How does scaling the number of parameters in large language models (LLMs) affect their core capabilities? We study two natural scaling techn… (see more)iques -- weight pruning and simply training a smaller or larger model, which we refer to as dense scaling -- and their effects on two core capabilities of LLMs: (a) recalling facts presented during pre-training and (b) processing information presented in-context during inference. By curating a suite of tasks that help disentangle these two capabilities, we find a striking difference in how these two abilities evolve due to scaling. Reducing the model size by more than 30\% (via either scaling approach) significantly decreases the ability to recall facts seen in pre-training. Yet, a 60--70\% reduction largely preserves the various ways the model can process in-context information, ranging from retrieving answers from a long context to learning parameterized functions from in-context exemplars. The fact that both dense scaling and weight pruning exhibit this behavior suggests that scaling model size has an inherently disparate effect on fact recall and in-context learning.

2023-10-06

ArXiv (preprint)

Towards Reliable Misinformation Mitigation: Generalization, Uncertainty, and GPT-4

Kellin Pelrine

Anne Imouza

Meilina Reksoprodjo

Camille Thibault

Caleb Gupta

Joel Christoph

Reihaneh Rabbany

Jean-François Godbout

Published online: 24 May 2023

2023-10-06

EMNLP/2023/Conference (accepted)

Transductive Learning for Textual Few-Shot Classification in API-based Embedding Models

Pierre Colombo

Victor Pellegrain

Malik Boudiaf

Victor Storchan

Myriam Tami

Ismail Ben Ayed

C'eline Hudelot

Pablo Piantanida

Proprietary and closed APIs are becoming increasingly common to process natural language, and are impacting the practical applications of na… (see more)tural language processing, including few-shot classification. Few-shot classification involves training a model to perform a new classification task with a handful of labeled data. This paper presents three contributions. First, we introduce a scenario where the embedding of a pre-trained model is served through a gated API with compute-cost and data-privacy constraints. Second, we propose a transductive inference, a learning paradigm that has been overlooked by the NLP community. Transductive inference, unlike traditional inductive learning, leverages the statistics of unlabeled data. We also introduce a new parameter-free transductive regularizer based on the Fisher-Rao loss, which can be used on top of the gated API embeddings. This method fully utilizes unlabeled data, does not share any label with the third-party API provider and could serve as a baseline for future research. Third, we propose an improved experimental setting and compile a benchmark of eight datasets involving multiclass classification in four different languages, with up to 151 classes. We evaluate our methods using eight backbone models, along with an episodic evaluation over 1,000 episodes, which demonstrate the superiority of transductive inference over the standard inductive setting.

2023-10-06

EMNLP/2023/Conference (accepted)

Using In-Context Learning to Improve Dialogue Safety

Devamanyu Hazarika

Di Jin

Yang Liu

Dilek Hakkani-Tur

2023-10-06

EMNLP/2023/Conference (accepted)

DragD3D: Vertex-based Editing for Realistic Mesh Deformations using 2D Diffusion Priors

Tianhao Xie

Eugene Belilovsky

Sudhir Mudur

Tiberiu Popa

Direct mesh editing and deformation are key components in the geometric modeling and animation pipeline. Direct mesh editing methods are typ… (see more)ically framed as optimization problems combining user-specified vertex constraints with a regularizer that determines the position of the rest of the vertices. The choice of the regularizer is key to the realism and authenticity of the final result. Physics and geometry-based regularizers are not aware of the global context and semantics of the object, and the more recent deep learning priors are limited to a specific class of 3D object deformations. In this work, our main contribution is a local mesh editing method called DragD3D for global context-aware realistic deformation through direct manipulation of a few vertices. DragD3D is not restricted to any class of objects. It achieves this by combining the classic geometric ARAP (as rigid as possible) regularizer with 2D priors obtained from a large-scale diffusion model. Specifically, we render the objects from multiple viewpoints through a differentiable renderer and use the recently introduced DDS loss which scores the faithfulness of the rendered image to one from a diffusion model. DragD3D combines the approximate gradients of the DDS with gradients from the ARAP loss to modify the mesh vertices via neural Jacobian field, while also satisfying vertex constraints. We show that our deformations are realistic and aware of the global context of the objects, and provide better results than just using geometric regularizers.

2023-10-05

ArXiv (preprint)

Evolution of High Throughput Satellite Systems: Vision, Requirements, and Key Technologies

Olfa Ben Yahia

Zineb Garroussi

Olivier Bélanger

Brunilde Sansò

Jean-François Frigon

Stéphane Martel

Antoine Lesage-Landry

Gunes Karabulut Kurt

High throughput satellites (HTS), with their digital payload technology, are expected to play a key role as enablers of the upcoming 6G netw… (see more)orks. HTS are mainly designed to provide higher data rates and capacities. Fueled by technological advancements including beamforming, advanced modulation techniques, reconfigurable phased array technologies, and electronically steerable antennas, HTS have emerged as a fundamental component for future network generation. This paper offers a comprehensive state-of-the-art of HTS systems, with a focus on standardization, patents, channel multiple access techniques, routing, load balancing, and the role of software-defined networking (SDN). In addition, we provide a vision for next-satellite systems that we named as extremely-HTS (EHTS) toward autonomous satellites supported by the main requirements and key technologies expected for these systems. The EHTS system will be designed such that it maximizes spectrum reuse and data rates, and flexibly steers the capacity to satisfy user demand. We introduce a novel architecture for future regenerative payloads while summarizing the challenges imposed by this architecture.

2023-10-05

ArXiv (preprint)

Realizing XR Applications Using 5G-Based 3D Holographic Communication and Mobile Edge Computing

Dun Yuan

Ekram Hossain

Di Wu

Xue Liu

Gregory Dudek

3D holographic communication has the potential to revolutionize the way people interact with each other in virtual spaces, offering immersiv… (see more)e and realistic experiences. However, demands for high data rates, extremely low latency, and high computations to enable this technology pose a significant challenge. To address this challenge, we propose a novel job scheduling algorithm that leverages Mobile Edge Computing (MEC) servers in order to minimize the total latency in 3D holographic communication. One of the motivations for this work is to prevent the uncanny valley effect, which can occur when the latency hinders the seamless and real-time rendering of holographic content, leading to a less convincing and less engaging user experience. Our proposed algorithm dynamically allocates computation tasks to MEC servers, considering the network conditions, computational capabilities of the servers, and the requirements of the 3D holographic communication application. We conduct extensive experiments to evaluate the performance of our algorithm in terms of latency reduction, and the results demonstrate that our approach significantly outperforms other baseline methods. Furthermore, we present a practical scenario involving Augmented Reality (AR), which not only illustrates the applicability of our algorithm but also highlights the importance of minimizing latency in achieving high-quality holographic views. By efficiently distributing the computation workload among MEC servers and reducing the overall latency, our proposed algorithm enhances the user experience in 3D holographic communications and paves the way for the widespread adoption of this technology in various applications, such as telemedicine, remote collaboration, and entertainment.

2023-10-05

ArXiv (preprint)

Causal Inference in Gene Regulatory Networks with GFlowNet: Towards Scalability in Large Systems

Trang Nguyen

Alexander Tong

Kanika Madan

Yoshua Bengio

Dianbo Liu

Understanding causal relationships within Gene Regulatory Networks (GRNs) is essential for unraveling the gene interactions in cellular proc… (see more)esses. However, causal discovery in GRNs is a challenging problem for multiple reasons including the existence of cyclic feedback loops and uncertainty that yields diverse possible causal structures. Previous works in this area either ignore cyclic dynamics (assume acyclic structure) or struggle with scalability. We introduce Swift-DynGFN as a novel framework that enhances causal structure learning in GRNs while addressing scalability concerns. Specifically, Swift-DynGFN exploits gene-wise independence to boost parallelization and to lower computational cost. Experiments on real single-cell RNA velocity and synthetic GRN datasets showcase the advancement in learning causal structure in GRNs and scalability in larger systems.

2023-10-04

ArXiv (preprint)

Improved baselines for vision-language pre-training

Enrico Fini

Pietro Astolfi

Adriana Romero-Soriano

Jakob Verbeek

Michal Drozdzal

Contrastive learning has emerged as an efficient framework to learn multimodal representations. CLIP, a seminal work in this area, achieved … (see more)impressive results by training on paired image-text data using the contrastive loss. Recent work claims improvements over CLIP using additional non-contrastive losses inspired from self-supervised learning. However, it is sometimes hard to disentangle the contribution of these additional losses from other implementation details, e.g., data augmentation or regularization techniques, used to train the model. To shed light on this matter, in this paper, we first propose, implement and evaluate several baselines obtained by combining contrastive learning with recent advances in self-supervised learning. In particular, we use the loss functions that were proven successful for visual self-supervised learning to align image and text modalities. We find that these baselines outperform a basic implementation of CLIP. However, when a stronger training recipe is employed, the advantage disappears. Indeed, we find that a simple CLIP baseline can also be improved substantially, up to a 25% relative improvement on downstream zero-shot tasks, by using well-known training techniques that are popular in other subfields. Moreover, we discover that it is enough to apply image and text augmentations to make up for most of the improvement attained by prior works. With our improved training recipe for CLIP, we obtain state-of-the-art performance on four standard datasets, and consistently outperform prior work (up to +4% on the largest dataset), while being substantially simpler. The code is available at https://github.com/facebookresearch/clip-rocket

2023-10-04

TMLR (accepted)

« L’étude de la synchronisation intercérébrale renouvelle le regard sur nos cerveaux »

Guillaume Dumas

François Lassagne

2023-10-04

Pour la Science (published)

Laurence Perreault-Levasseur

Posterior Sampling of the Initial Conditions of the Universe from Non-linear Large Scale Structures using Score-Based Generative Models

Ronan Legin

Matthew Ho

Pablo Lemos

Shirley Ho

Yashar Hezaveh

Benjamin Wandelt

Reconstructing the initial conditions of the universe is a key problem in cosmology. Methods based on simulating the forward evolution of th… (see more)e universe have provided a way to infer initial conditions consistent with present-day observations. However, due to the high complexity of the inference problem, these methods either fail to sample a distribution of possible initial density fields or require significant approximations in the simulation model to be tractable, potentially leading to biased results. In this work, we propose the use of score-based generative models to sample realizations of the early universe given present-day observations. We infer the initial density field of full high-resolution dark matter N-body simulations from the present-day density field and verify the quality of produced samples compared to the ground truth based on summary statistics. The proposed method is capable of providing plausible realizations of the early universe density field from the initial conditions posterior distribution marginalized over cosmological parameters and can sample orders of magnitude faster than current state-of-the-art methods.

2023-10-03

Monthly Notices of the Royal Astronomical Society: Letters (published)