Publications

Sparse Universal Transformer

Shawn Tan

Yikang Shen

Zhenfang Chen

Aaron Courville

Chuang Gan

The Universal Transformer (UT) is a variant of the Transformer that shares parameters across its layers and is Turing-complete under certain assumptions. Empirical evidence also shows that UTs have better compositional generalization than Vanilla Transformers (VTs) in formal language tasks. The parameter-sharing also affords it better parameter efficiency than VTs. Despite its many advantages, most state-of-the-art NLP systems use VTs as their backbone model instead of UTs. This is mainly because scaling UT parameters is more compute and memory intensive than scaling up a VT. This paper proposes the Sparse Universal Transformer (SUT), which leverages Sparse Mixture of Experts (SMoE) to reduce UT's computation complexity while retaining its parameter efficiency and generalization ability. Experiments show that SUT combines the best of both worlds, achieving strong generalization results on formal language tasks (Logical inference and CFQ) and impressive parameter and computation efficiency on standard natural language benchmarks like WMT'14.

2023-10-07

EMNLP/2023/Conference (accepted)

Gintare Karolina Dziugaite

The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning

Tian Jin

Nolan Clement

Xin Dong

Vaishnavh Nagarajan

Michael Carbin

Jonathan Ragan-Kelley

How does scaling the number of parameters in large language models (LLMs) affect their core capabilities? We study two natural scaling techniques -- weight pruning and simply training a smaller or larger model, which we refer to as dense scaling -- and their effects on two core capabilities of LLMs: (a) recalling facts presented during pre-training and (b) processing information presented in-context during inference. By curating a suite of tasks that help disentangle these two capabilities, we find a striking difference in how these two abilities evolve due to scaling. Reducing the model size by more than 30% (via either scaling approach) significantly decreases the ability to recall facts seen in pre-training. Yet, a 60-70% reduction largely preserves the various ways the model can process in-context information, ranging from retrieving answers from a long context to learning parameterized functions from in-context exemplars. The fact that both dense scaling and weight pruning exhibit this behavior suggests that scaling model size has an inherently disparate effect on fact recall and in-context learning.

2023-10-07

ArXiv (preprint)

Towards Reliable Misinformation Mitigation: Generalization, Uncertainty, and GPT-4

Kellin Pelrine

Anne Imouza

Meilina Reksoprodjo

Camille Thibault

Caleb Gupta

Joel Christoph

Reihaneh Rabbany

Jean-François Godbout

Misinformation poses a critical societal challenge, and current approaches have yet to produce an effective solution. We propose focusing on generalization, uncertainty, and how to leverage recent large language models, in order to create more practical tools to evaluate information veracity in contexts where perfect classification is impossible. We first demonstrate that GPT-4 can outperform prior methods in multiple settings and languages. Next, we explore generalization, revealing that GPT-4 and RoBERTa-large exhibit differences in failure modes. Third, we propose techniques to handle uncertainty that can detect impossible examples and strongly improve outcomes. We also discuss results on other language models, temperature, prompting, versioning, explainability, and web retrieval, each one providing practical insights and directions for future research. Finally, we publish the LIAR-New dataset with novel paired English and French misinformation data and Possibility labels that indicate if there is sufficient context for veracity evaluation. Overall, this research lays the groundwork for future tools that can drive real-world progress to combat misinformation.

2023-10-07

EMNLP/2023/Conference (accepted)

Transductive Learning for Textual Few-Shot Classification in API-based Embedding Models

Pierre Colombo

Victor Pellegrain

Malik Boudiaf

Myriam Tami

Victor Storchan

Ismail Ben Ayed

Pablo Piantanida

Céline Hudelot

Proprietary and closed APIs are becoming increasingly common to process natural language, and are impacting the practical applications of natural language processing, including few-shot classification. Few-shot classification involves training a model to perform a new classification task with a handful of labeled data. This paper presents three contributions. First, we introduce a scenario where the embedding of a pre-trained model is served through a gated API with compute-cost and data-privacy constraints. Second, we propose a transductive inference, a learning paradigm that has been overlooked by the NLP community. Transductive inference, unlike traditional inductive learning, leverages the statistics of unlabeled data. We also introduce a new parameter-free transductive regularizer based on the Fisher-Rao loss, which can be used on top of the gated API embeddings. This method fully utilizes unlabeled data, does not share any label with the third-party API provider and could serve as a baseline for future research. Third, we propose an improved experimental setting and compile a benchmark of eight datasets involving multiclass classification in four different languages, with up to 151 classes. We evaluate our methods using eight backbone models, along with an episodic evaluation over 1,000 episodes, which demonstrate the superiority of transductive inference over the standard inductive setting.

2023-10-07

EMNLP/2023/Conference (accepted)

Using In-Context Learning to Improve Dialogue Safety

Nicholas Meade

Spandana Gella

Devamanyu Hazarika

Prakhar Gupta

Di Jin

Siva Reddy

Yang Liu

Dilek Hakkani-Tur

2023-10-07

EMNLP/2023/Conference (published)

DragD3D: Vertex-based Editing for Realistic Mesh Deformations using 2D Diffusion Priors

Tianhao Xie

Eugene Belilovsky

Sudhir Mudur

Tiberiu Popa

Direct mesh editing and deformation are key components in the geometric modeling and animation pipeline. Direct mesh editing methods are typically framed as optimization problems combining user-specified vertex constraints with a regularizer that determines the position of the rest of the vertices. The choice of the regularizer is key to the realism and authenticity of the final result. Physics and geometry-based regularizers are not aware of the global context and semantics of the object, and the more recent deep learning priors are limited to a specific class of 3D object deformations. In this work, our main contribution is a local mesh editing method called DragD3D for global context-aware realistic deformation through direct manipulation of a few vertices. DragD3D is not restricted to any class of objects. It achieves this by combining the classic geometric ARAP (as rigid as possible) regularizer with 2D priors obtained from a large-scale diffusion model. Specifically, we render the objects from multiple viewpoints through a differentiable renderer and use the recently introduced DDS loss which scores the faithfulness of the rendered image to one from a diffusion model. DragD3D combines the approximate gradients of the DDS with gradients from the ARAP loss to modify the mesh vertices via neural Jacobian field, while also satisfying vertex constraints. We show that our deformations are realistic and aware of the global context of the objects, and provide better results than just using geometric regularizers.

2023-10-06

ArXiv (preprint)

Evolution of High Throughput Satellite Systems: Vision, Requirements, and Key Technologies

Olfa Ben Yahia

Zineb Garroussi

Olivier Bélanger

Brunilde Sansò

J. Frigon

Stéphane Martel

Antoine Lesage-Landry

G. Kurt

High throughput satellites (HTS), with their digital payload technology, are expected to play a key role as enablers of the upcoming 6G networks. HTS are mainly designed to provide higher data rates and capacities. Fueled by technological advancements including beamforming, advanced modulation techniques, reconfigurable phased array technologies, and electronically steerable antennas, HTS have emerged as a fundamental component for future network generation. This paper offers a comprehensive state-of-the-art of HTS systems, with a focus on standardization, patents, channel multiple access techniques, routing, load balancing, and the role of software-defined networking (SDN). In addition, we provide a vision for next-satellite systems that we named as extremely-HTS (EHTS) toward autonomous satellites supported by the main requirements and key technologies expected for these systems. The EHTS system will be designed such that it maximizes spectrum reuse and data rates, and flexibly steers the capacity to satisfy user demand. We introduce a novel architecture for future regenerative payloads while summarizing the challenges imposed by this architecture.

2023-10-06

ArXiv (preprint)

Realizing XR Applications Using 5G-Based 3D Holographic Communication and Mobile Edge Computing

Dun Yuan

Ekram Hossain

Di Wu

Xue (Steve) Liu

Gregory Dudek

3D holographic communication has the potential to revolutionize the way people interact with each other in virtual spaces, offering immersive and realistic experiences. However, demands for high data rates, extremely low latency, and high computations to enable this technology pose a significant challenge. To address this challenge, we propose a novel job scheduling algorithm that leverages Mobile Edge Computing (MEC) servers in order to minimize the total latency in 3D holographic communication. One of the motivations for this work is to prevent the uncanny valley effect, which can occur when the latency hinders the seamless and real-time rendering of holographic content, leading to a less convincing and less engaging user experience. Our proposed algorithm dynamically allocates computation tasks to MEC servers, considering the network conditions, computational capabilities of the servers, and the requirements of the 3D holographic communication application. We conduct extensive experiments to evaluate the performance of our algorithm in terms of latency reduction, and the results demonstrate that our approach significantly outperforms other baseline methods. Furthermore, we present a practical scenario involving Augmented Reality (AR), which not only illustrates the applicability of our algorithm but also highlights the importance of minimizing latency in achieving high-quality holographic views. By efficiently distributing the computation workload among MEC servers and reducing the overall latency, our proposed algorithm enhances the user experience in 3D holographic communications and paves the way for the widespread adoption of this technology in various applications, such as telemedicine, remote collaboration, and entertainment.

2023-10-06

ArXiv (preprint)

Adaptive Dynamic Programming for Energy-Efficient Base Station Cell Switching

Junliang Luo

Yi Tian Xu

Di Wu

M. Jenkin

Xue (Steve) Liu

Gregory Dudek

Energy saving in wireless networks is growing in importance due to increasing demand for evolving new-gen cellular networks, environmental and regulatory concerns, and potential energy crises arising from geopolitical tensions. In this work, we propose an approximate dynamic programming (ADP)-based method coupled with online optimization to switch on/off the cells of base stations to reduce network power consumption while maintaining adequate Quality of Service (QoS) metrics. We use a multilayer perceptron (MLP) given each state-action pair to predict the power consumption to approximate the value function in ADP for selecting the action with optimal expected power saved. To save the largest possible power consumption without deteriorating QoS, we include another MLP to predict QoS and a long short-term memory (LSTM) for predicting handovers, incorporated into an online optimization algorithm producing an adaptive QoS threshold for filtering cell switching actions based on the overall QoS history. The performance of the method is evaluated using a practical network simulator with various real-world scenarios with dynamic traffic patterns.

2023-10-05

ArXiv (preprint)

Causal Inference in Gene Regulatory Networks with GFlowNet: Towards Scalability in Large Systems

Trang Nguyen

Alexander Tong

Kanika Madan

Yoshua Bengio

Dianbo Liu

Understanding causal relationships within Gene Regulatory Networks (GRNs) is essential for unraveling the gene interactions in cellular processes. However, causal discovery in GRNs is a challenging problem for multiple reasons including the existence of cyclic feedback loops and uncertainty that yields diverse possible causal structures. Previous works in this area either ignore cyclic dynamics (assume acyclic structure) or struggle with scalability. We introduce Swift-DynGFN as a novel framework that enhances causal structure learning in GRNs while addressing scalability concerns. Specifically, Swift-DynGFN exploits gene-wise independence to boost parallelization and to lower computational cost. Experiments on real single-cell RNA velocity and synthetic GRN datasets showcase the advancement in learning causal structure in GRNs and scalability in larger systems.

2023-10-05

ArXiv (preprint)

Improved baselines for vision-language pre-training

Enrico Fini

Pietro Astolfi

Adriana Romero Soriano

Jakob Verbeek

Michal Drozdzal

2023-10-05

TMLR (accepted)