Publications

The Impact of Positional Encoding on Length Generalization in Transformers

Inkit Padhi

Karthikeyan Natesan Ramamurthy

Payel Das

Length generalization, the ability to generalize from small training context sizes to larger ones, is a critical challenge in the development of Transformer-based language models. Positional encoding (PE) has been identified as a major factor influencing length generalization, but the exact impact of different PE schemes on extrapolation in downstream tasks remains unclear. In this paper, we conduct a systematic empirical study comparing the length generalization performance of decoder-only Transformers with five different position encoding approaches including Absolute Position Embedding (APE), T5's Relative PE, ALiBi, and Rotary, in addition to Transformers without positional encoding (NoPE). Our evaluation encompasses a battery of reasoning and mathematical tasks. Our findings reveal that the most commonly used positional encoding methods, such as ALiBi, Rotary, and APE, are not well suited for length generalization in downstream tasks. More importantly, NoPE outperforms other explicit positional encoding methods while requiring no additional computation. We theoretically demonstrate that NoPE can represent both absolute and relative PEs, but when trained with SGD, it mostly resembles T5's relative PE attention patterns. Finally, we find that scratchpad is not always helpful to solve length generalization and its format highly impacts the model's performance. Overall, our work suggests that explicit position embeddings are not essential for decoder-only Transformers to generalize well to longer sequences.
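As an illustration of how a decoder-only model without positional encoding could still recover position, the sketch below applies uniform causal attention to a sequence whose first token carries a marker value. This is only one possible construction, not necessarily the one used in the paper: the function name `causal_attention`, the all-zero queries and keys, and the marker value are assumptions made for this example.

```python
import math

def causal_attention(queries, keys, values):
    """Single-head causal self-attention over lists of vectors.

    No positional encoding is added anywhere; the only source of
    position information is the causal mask itself.
    """
    dim = len(queries[0])
    outputs = []
    for t, q in enumerate(queries):
        # Causal mask: position t attends only to positions 0..t.
        scores = [
            sum(a * b for a, b in zip(q, keys[s])) / math.sqrt(dim)
            for s in range(t + 1)
        ]
        # Numerically stable softmax over the visible positions.
        peak = max(scores)
        weights = [math.exp(x - peak) for x in scores]
        total = sum(weights)
        outputs.append([
            sum(w / total * values[s][d] for s, w in enumerate(weights))
            for d in range(len(values[0]))
        ])
    return outputs

seq_len = 5
# Hypothetical setup: zero queries and keys give uniform attention weights.
queries = [[0.0, 0.0] for _ in range(seq_len)]
keys = [[0.0, 0.0] for _ in range(seq_len)]
# Only the first token carries a marker value of 1.
values = [[1.0]] + [[0.0] for _ in range(seq_len - 1)]

position_signal = [row[0] for row in causal_attention(queries, keys, values)]
```

Under these assumptions the output at 0-indexed position t is 1/(t+1), so each position receives a distinct value even though no explicit position embedding was supplied.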

2023-09-20

NeurIPS.cc/2023/Conference (poster)

doi.org

openreview.net

Thinker: Learning to Plan and Act

Stephen Chung

Ivan Anokhin

David Krueger

We propose the Thinker algorithm, a novel approach that enables reinforcement learning agents to autonomously interact with and utilize a learned world model. The Thinker algorithm wraps the environment with a world model and introduces new actions designed for interacting with the world model. These model-interaction actions enable agents to perform planning by proposing alternative plans to the world model before selecting a final action to execute in the environment. This approach eliminates the need for handcrafted planning algorithms by enabling the agent to learn how to plan autonomously and allows for easy interpretation of the agent's plan with visualization. We demonstrate the algorithm's effectiveness through experimental results in the game of Sokoban and the Atari 2600 benchmark, where the Thinker algorithm achieves state-of-the-art performance and competitive results, respectively. Visualizations of agents trained with the Thinker algorithm demonstrate that they have learned to plan effectively with the world model to select better actions. Thinker is the first work showing that an RL agent can learn to plan with a learned world model in complex environments.

2023-09-20

NeurIPS.cc/2023/Conference (poster)

doi.org

openreview.net

Towards Hybrid-grained Feature Interaction Selection for Deep Sparse Network

Fuyuan Lyu

Xing Tang

Dugang Liu

Chen Ma

Weihong Luo

Liang Chen

Xiuqiang He

Xue Liu

2023-09-20

NeurIPS.cc/2023/Conference (poster)

openreview.net

A Unified, Scalable Framework for Neural Population Decoding

Mehdi Azabou

Vinam Arora

Venkataramana Ganesh

Ximeng Mao

Santosh Nachimuthu

Michael J. Mendelson

Blake Richards

Matthew G. Perich

Guillaume Lajoie

Eva L. Dyer

Our ability to use deep learning approaches to decipher neural activity would likely benefit from greater scale, in terms of both model size and datasets. However, the integration of many neural recordings into one unified model is challenging, as each recording contains the activity of different neurons from different individual animals. In this paper, we introduce a training framework and architecture designed to model the population dynamics of neural activity across diverse, large-scale neural recordings. Our method first tokenizes individual spikes within the dataset to build an efficient representation of neural events that captures the fine temporal structure of neural activity. We then employ cross-attention and a PerceiverIO backbone to further construct a latent tokenization of neural population activities. Utilizing this architecture and training framework, we construct a large-scale multi-session model trained on large datasets from seven nonhuman primates, spanning over 158 different sessions of recording from over 27,373 neural units and over 100 hours of recordings. In a number of different tasks, we demonstrate that our pretrained model can be rapidly adapted to new, unseen sessions with unspecified neuron correspondence, enabling few-shot performance with minimal labels. This work presents a powerful new approach for building deep learning tools to analyze neural data and stakes out a clear path to training at scale.
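A minimal sketch of what tokenizing individual spikes might look like, assuming each token records only which unit fired and when. The abstract does not specify the token fields, so the `SpikeToken` class, its field names, and the `tokenize_spikes` helper are hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SpikeToken:
    unit_id: str   # hypothetical identifier of the recorded neural unit
    time_s: float  # spike time in seconds

def tokenize_spikes(spike_times_by_unit):
    """Turn {unit_id: [spike times]} into one time-ordered token per spike."""
    tokens = [
        SpikeToken(unit_id, t)
        for unit_id, times in spike_times_by_unit.items()
        for t in times
    ]
    # Sort by time so the sequence preserves fine temporal structure;
    # ties are broken by unit identifier for a deterministic order.
    return sorted(tokens, key=lambda tok: (tok.time_s, tok.unit_id))

# Toy recording with made-up spike times for two units.
recording = {"unit_a": [0.012, 0.250], "unit_b": [0.100]}
tokens = tokenize_spikes(recording)
```

Because each spike becomes its own token, no time binning is needed, and sessions with different sets of units can share one sequence format.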

2023-09-20

NeurIPS.cc/2023/Conference (poster)

doi.org

openreview.net

When Do Transformers Shine in RL? Decoupling Memory from Credit Assignment

Tianwei Ni

Michel Ma

Benjamin Eysenbach

Pierre-Luc Bacon

Reinforcement learning (RL) algorithms face two distinct challenges: learning effective representations of past and present observations, and determining how actions influence future returns. Both challenges involve modeling long-term dependencies. The Transformer architecture has been very successful at solving problems that involve long-term dependencies, including in the RL domain. However, the underlying reason for the strong performance of Transformer-based RL methods remains unclear: is it because they learn effective memory, or because they perform effective credit assignment? After introducing formal definitions of memory length and credit assignment length, we design simple configurable tasks to measure these distinct quantities. Our empirical results reveal that Transformers can enhance the memory capability of RL algorithms, scaling up to tasks that require memorizing observations

2023-09-20

NeurIPS.cc/2023/Conference (oral)

doi.org

openreview.net

Conserving avian evolutionary history can effectively safeguard future benefits for people

Rikki Gumbs

Claudia L. Gray

Michael Hoffmann

Rafael Molina-Venegas

Nisha Owen

Laura J. Pollock

Phylogenetic diversity (PD), the evolutionary history of a set of species, is conceptually linked to the maintenance of yet-to-be-discovered benefits from biodiversity or “option value.” We used global phylogenetic and utilization data for birds to test the PD option value link, under the assumption that the performance of sets of PD-maximizing species at capturing known benefits is analogous to selecting the same species at a point in human history before these benefits were realized. PD performed better than random at capturing utilized bird species across 60% of tests, with performance linked to the phylogenetic dispersion and prevalence of each utilization category. Prioritizing threatened species for conservation by the PD they encapsulate performs comparably to prioritizing by their functional distinctiveness. However, species selected by each metric show low overlap, indicating that we should conserve both components of biodiversity to effectively conserve a variety of uses. Our findings provide empirical support for the link between evolutionary history and benefits for future generations.

2023-09-19

Science Advances (published)

doi.org

In-Context Learning for Text Classification with Many Labels

Aristides Milios

Siva Reddy

Dzmitry Bahdanau

2023-09-18

ArXiv (preprint)

doi.org

arxiv.org

M-TAG: A modular teaching-aid for Geant4

Liam Carroll

S. Enger

2023-09-18

Heliyon (published)

doi.org

Estimating the population effectiveness of interventions against COVID-19 in France: a modelling study

Iris Ganser

David L Buckeridge

Jane M Heffernan

M. Prague

Rodolphe Thiébaut

Background: Non-pharmaceutical interventions (NPIs) and vaccines have been widely used to manage the COVID-19 pandemic. However, uncertainty persists regarding the effectiveness of these interventions due to data quality issues, methodological challenges, and differing contextual factors. Accurate estimation of their effects is crucial for future epidemic preparedness. Methods: To address this, we developed a population-based mechanistic model that includes the impact of NPIs and vaccines on SARS-CoV-2 transmission and hospitalization rates. Our statistical approach estimated all parameters in one step, accurately propagating uncertainty. We fitted the model to comprehensive epidemiological data in France from March 2020 to October 2021. With the same model, we simulated scenarios of vaccine rollout. Results: The first lockdown was the most effective, reducing transmission by 84% (95% confidence interval (CI) 83-85). Subsequent lockdowns had diminished effectiveness (reduction of 74% (69-77) and 11% (9-18), respectively). A 6 pm curfew was more effective than one at 8 pm (68% (66-69) vs. 48% (45-49) reduction), while school closures reduced transmission by 15% (12-18). In a scenario without vaccines before November 2021, we predicted 159,000 or 194% (95% prediction interval (PI) 74-424) more deaths and 1,488,000 or 340% (136-689) more hospitalizations. If a vaccine had been available after 100 days, over 71,000 deaths (16,507-204,249) and 384,000 (88,579-1,020,386) hospitalizations could have been averted. Conclusion: Our results highlight the substantial impact of NPIs, including lockdowns and curfews, in controlling the COVID-19 pandemic. We also demonstrate the value of the 100 days objective of the CEPI initiative for vaccine availability.

2023-09-13

medRxiv (preprint)

doi.org

Addressing uncertainty when projecting marine species' distributions under climate change

Sarah C. Davies

Patrick L. Thompson

Catalina Gómez

Jessica Nephin

Anders Knudby

Ashley E. Park

Sarah K. Friesen

Laura J. Pollock

Emily M. Rubidge

Sean C. Anderson

Josephine C. Iacarella

Devin A. Lyons

Andrew MacDonald

Andrew McMillan

Eric J. Ward

Amber M. Holdsworth

Neil Swart

Jeff Price

Karen L. Hunter

2023-09-12

Ecography (published)

doi.org

Artificial Intelligence for Detection of Dementia Using Motion Data: A Scoping Review

Lily Puterman-Salzman

Jory Katz

Howard Bergman

Roland Grad

Vladimir Khanassov

Genevieve Gore

Isabelle Vedel

Machelle Wilchesky

Narges Armanfard

Negar Ghourchian

S. A. Rahimi

Background: Dementia is a neurodegenerative disease resulting in the loss of cognitive and psychological functions. Artificial intelligence (AI) may help in detection and screening of dementia; however, little is known in this area. Objectives: The objective of this study was to identify and evaluate AI interventions for detection of dementia using motion data. Method: The review followed the framework proposed by O’Malley’s and Joanna Briggs Institute methodological guidance for scoping reviews. We adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) checklist for reporting the results. An information specialist performed a comprehensive search from the date of inception until November 2020, in five bibliographic databases: MEDLINE, EMBASE, Web of Science Core Collection, CINAHL, and IEEE Xplore. We included studies aimed at the deployment and testing or implementation of AI interventions using motion data for the detection of dementia among a diverse population, encompassing varying age, sex, gender, economic backgrounds, and ethnicity, extending to their health care providers across multiple health care settings. Studies were excluded if they focused on Parkinson’s or Huntington’s disease. Two independent reviewers screened the abstracts and titles, and then read the full texts. Disagreements were resolved by consensus, and if this was not possible, the opinion of a third reviewer was sought. The reference lists of included studies were also screened. Results: After removing duplicates, 2,632 articles were obtained. After title and abstract screening and full-text screening, 839 articles were considered for categorization. The authors categorized the papers into six categories, and data extraction and synthesis was performed on 20 included papers from the motion tracking data category. The included studies assessed cognitive performance (n = 5, 25%); screened dementia and cognitive decline (n = 8, 40%); investigated visual behaviors (n = 4, 20%); and analyzed motor behaviors (n = 3, 15%). Conclusions: We presented evidence of AI systems being employed in the detection of dementia, showcasing the promising potential of motion tracking within this domain. Although some progress has been made in this field recently, there remain notable research gaps that require further exploration and investigation. Future endeavors need to compare AI interventions using motion data with traditional screening methods or other tech-enabled dementia detection mechanisms. In addition, future work should aim to understand how gender and sex, and ethnic and cultural sensitivity, can contribute to refining AI interventions, ensuring they are accessible, equitable, and beneficial across all of society.

2023-09-12

Dementia and Geriatric Cognitive Disorders EXTRA (published)

doi.org

A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale

Hao-Jun Michael Shi

Tsung-Hsien Lee

Shintaro Iwasaki

Jose Gallego-Posada

Zhijing Li

Kaushik Rangadurai

Dheevatsa Mudigere

Michael G. Rabbat

2023-09-11

ArXiv (preprint)

doi.org

arxiv.org
