Publications

Feasibility of cognitive neuroscience data collection during a speleological expedition

Anita Paas

Hugo R. Jourde

Arnaud Brignol

Marie-Anick Savard

Zseyvfin Eyqvelle

Samuel Bassetto

Giovanni Beltrame

Emily B.J. Coffey

2023-12-14

bioRxiv (preprint)

doi.org

Global Rewards in Multi-Agent Deep Reinforcement Learning for Autonomous Mobility on Demand Systems

Heiko Hoppe

Tobias Enders

Quentin Cappart

Maximilian Schiffer

2023-12-14

arXiv (preprint)

doi.org

arxiv.org

A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions

Jack Urbanek

Florian Bordes

Pietro Astolfi

Mary Williamson

Vasu Sharma

Adriana Romero Soriano

Curation methods for massive vision-language datasets trade off between dataset size and quality. However, even the highest-quality curated captions available are far too short to capture the rich visual detail in an image. To show the value of dense and highly aligned image-text pairs, we collect the Densely Captioned Images (DCI) dataset, containing 7,805 natural images human-annotated with mask-aligned descriptions averaging over 1,000 words each. With precise and reliable captions associated with specific parts of an image, we can evaluate vision-language models' (VLMs) understanding of image content with a novel task that matches each caption with its corresponding subcrop. As current models are often limited to 77 text tokens, we also introduce a summarized version (sDCI) in which each caption's length is limited. We show that modern techniques that make progress on standard benchmarks do not yield significant improvements on our sDCI-based benchmark. Lastly, we fine-tune CLIP using sDCI and show significant improvements over the baseline despite a small training set. By releasing the first human-annotated dense image captioning dataset, we hope to enable the development of new benchmarks or fine-tuning recipes for the next generation of VLMs.

2023-12-14

arXiv (preprint)

doi.org

arxiv.org
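The subcrop-matching task described in this abstract can be pictured as a retrieval problem. Each caption and each subcrop is embedded, and a caption counts as correct when its most similar subcrop is its own. The following is a minimal sketch that assumes precomputed embeddings. The toy vectors and the helper name `subcrop_matching_accuracy` are hypothetical. This is not the DCI benchmark code, and no CLIP model is loaded.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def subcrop_matching_accuracy(caption_embs, subcrop_embs):
    """Fraction of captions whose most similar subcrop has the same index.

    Hypothetical helper: caption_embs[i] is assumed to describe subcrop_embs[i].
    """
    correct = 0
    for i, cap in enumerate(caption_embs):
        scores = [cosine(cap, crop) for crop in subcrop_embs]
        best = max(range(len(scores)), key=scores.__getitem__)
        correct += int(best == i)
    return correct / len(caption_embs)

# Made-up 3-d embeddings: three captions and the three subcrops they describe.
captions = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.9, 0.0, 0.3]]
subcrops = [[0.9, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.1, 1.0]]

print(subcrop_matching_accuracy(captions, subcrops))
```

In this toy data the third caption is closer to the first subcrop than to its own, so the accuracy is 2/3. In a real evaluation the vectors would come from a VLM's text and image encoders.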

Symmetry Breaking and Equivariant Neural Networks

Sékou-Oumar Kaba

Siamak Ravanbakhsh

Using symmetry as an inductive bias in deep learning has been proven to be a principled approach for sample-efficient model design. However, the relationship between symmetry and the imperative for equivariance in neural networks is not always obvious. Here, we analyze a key limitation that arises in equivariant functions: their incapacity to break symmetry at the level of individual data samples. In response, we introduce a novel notion of 'relaxed equivariance' that circumvents this limitation. We further demonstrate how to incorporate this relaxation into equivariant multilayer perceptrons (E-MLPs), offering an alternative to the noise-injection method. The relevance of symmetry breaking is then discussed in various application domains: physics, graph representation learning, combinatorial optimization and equivariant decoding.

2023-12-14

arXiv (preprint)

doi.org

arxiv.org
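The limitation named in this abstract follows directly from the definition of equivariance. Suppose an input is fixed by a symmetry g, so that g·x = x. Then f(x) = f(g·x) = g·f(x), which means the output is fixed by g as well. The following is a minimal sketch using a permutation-equivariant, DeepSets-style layer. The layer form and its coefficients are illustrative assumptions, not the paper's E-MLP, and the code does not implement relaxed equivariance.

```python
import math

def equivariant_layer(x, a=0.7, b=-0.3, c=0.1):
    """Permutation-equivariant map on a list of scalars:
    y_i = tanh(a * x_i + b * mean(x) + c).

    Permuting the inputs permutes the outputs in the same way.
    The coefficients a, b, c are arbitrary illustrative values.
    """
    m = sum(x) / len(x)
    return [math.tanh(a * xi + b * m + c) for xi in x]

def permute(x, perm):
    """Apply a permutation given as a list of source indices."""
    return [x[j] for j in perm]

# Equivariance check on a generic input: f(g.x) == g.f(x) up to rounding.
x = [0.2, -1.0, 3.5]
perm = [2, 0, 1]
lhs = equivariant_layer(permute(x, perm))
rhs = permute(equivariant_layer(x), perm)
print(all(abs(u - v) < 1e-12 for u, v in zip(lhs, rhs)))

# Symmetric input (all entries equal): every permutation fixes it, so the
# output is forced to be symmetric too; the layer cannot single out one entry.
sym = [1.0, 1.0, 1.0]
print(equivariant_layer(sym))
```

Stacking more layers of this kind does not help. A composition of equivariant maps is still equivariant, so the symmetric input stays symmetric at every depth.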

Asymmetric Actor-Critic with Approximate Information State

Amit Sinha

Aditya Mahajan

Reinforcement learning (RL) for partially observable Markov decision processes (POMDPs) is a challenging problem because decisions need to be made based on the entire history of observations and actions. However, in several scenarios, state information is available during the training phase. We are interested in exploiting this state information during training to efficiently learn a history-based policy using RL. Specifically, we consider actor-critic algorithms in which the actor uses only the history information while the critic uses both history and state. Such algorithms are called asymmetric actor-critic, to highlight the fact that the actor and critic have asymmetric information. Motivated by the recent success of representation losses in RL for POMDPs [1], we derive similar theoretical results for the asymmetric actor-critic case and evaluate the effectiveness of adding such auxiliary losses in experiments. In particular, we learn a history representation, called an approximate information state (AIS), and bound the performance loss incurred when acting using the AIS.

2023-12-13

IEEE Conference on Decision and Control (published)

doi.org
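The information split in an asymmetric actor-critic can be made concrete on a small problem. The policy reads only the observation history. The critic may also read the true state, which is available during training. The following is a minimal tabular sketch with several illustrative assumptions:

- a made-up two-state problem with noisy observations;
- one-step episodes, so the "history" is a single observation;
- hand-picked learning rates;
- a plain TD(0)/policy-gradient update.

It does not learn an approximate information state (AIS) and is not the authors' algorithm.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train(episodes=5000, alpha=0.1, beta=0.05, obs_accuracy=0.8, seed=0):
    """Toy asymmetric actor-critic (all settings are illustrative assumptions)."""
    rng = random.Random(seed)
    # Actor: logits indexed by the history only (here, one noisy observation).
    actor = {h: [0.0, 0.0] for h in (0, 1)}
    # Critic: value indexed by (true state, history); the state is usable
    # only because we are in the training phase.
    critic = {(s, h): 0.0 for s in (0, 1) for h in (0, 1)}
    for _ in range(episodes):
        state = rng.randrange(2)
        # The observation equals the hidden state with probability obs_accuracy.
        history = state if rng.random() < obs_accuracy else 1 - state
        probs = softmax(actor[history])
        action = 0 if rng.random() < probs[0] else 1
        reward = 1.0 if action == state else 0.0
        # One-step episodes: the TD target is just the reward.
        advantage = reward - critic[(state, history)]
        critic[(state, history)] += beta * advantage
        # Policy-gradient step on the history-conditioned actor.
        for a in (0, 1):
            grad_log = (1.0 if a == action else 0.0) - probs[a]
            actor[history][a] += alpha * advantage * grad_log
    return actor, critic

actor, critic = train()
for h in (0, 1):
    print("observation", h, "-> action probabilities", [round(p, 3) for p in softmax(actor[h])])
```

Using the state-dependent critic as a baseline does not bias the policy gradient. The action is sampled from the history alone, so the baseline term averages to zero. The actor that results can still be deployed without access to the state.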

Current Practices in Voice Data Collection and Limitations to Voice AI Research: A National Survey.

Emily Evangelista

Rohan Kale

Desiree McCutcheon

Anais Rameau

Alexander Gelbard

Maria Powell

Michael Johns

Anthony Law

Phillip Song

Matthew Naunheim

Stephanie Watts

Paul C. Bryson

Matthew G. Crowson

Jeremy Pinto

Yael Bensoussan

INTRODUCTION The accuracy and validity of voice AI algorithms rely on substantial, high-quality voice data. Although considerable amounts of voice data are captured daily in voice centers across North America, there is no standardized protocol for acoustic data management, which limits the usability of these datasets for voice artificial intelligence (AI) research. OBJECTIVE The aim was to capture current practices of voice data collection, storage, and analysis, and perceived limitations to collaborative voice research. METHODS A 30-question online survey was developed with expert guidance from the members of voicecollab.ai, an international collaborative of voice AI researchers. The survey was disseminated via REDCap to an estimated 200 practitioners at North American voice centers. Survey questions assessed respondents' current practices in acoustic data collection, storage, and retrieval, as well as limitations to collaborative voice research. RESULTS Seventy-two respondents completed the survey, of whom 81.7% were laryngologists and 18.3% were speech-language pathologists (SLPs). Eighteen percent of respondents reported seeing 40-60 patients with voice disorders weekly and 55% reported seeing more than 60 (a conservative estimate of over 4000 patients/week). Only 28% of respondents reported using standardized protocols for the collection and storage of acoustic data. Although 87% of respondents conduct voice research, only 38% report doing so at a multi-institutional level. Perceived limitations to conducting collaborative voice research include a lack of standardized methodology for collection (30%) and a lack of human resources to adequately prepare and label voice data (55%). CONCLUSION To conduct large-scale, multi-institutional voice research with AI, there is a pressing need for standardization of acoustic data management, as well as for an infrastructure for secure and efficient data sharing. LEVEL OF EVIDENCE Level 5 Laryngoscope, 2023.

2023-12-13

The Laryngoscope (published)

doi.org

A deep learning benchmark for first break detection from hardrock seismic reflection data

Pierre-Luc St-Charles

Bruno Rousseau

Joumana Ghosn

Gilles Bellefleur

Ernst Schetselaar

2023-12-13

GEOPHYSICS (published)

doi.org

Privacy-preserving analysis of time-to-event data under nested case-control sampling

Lamin Juwara

Archer Yang

Ana M Velly

Paramita Saha-Chaudhuri

2023-12-13

Statistical Methods in Medical Research (published)

doi.org

Q-learners Can Provably Collude in the Iterated Prisoner's Dilemma

Quentin Bertrand

Juan Duque

Emilio Calvano

Gauthier Gidel

The deployment of machine learning systems in the market economy has triggered academic and institutional fears over potential tacit collusion between fully automated agents. Multiple recent economics studies have empirically shown the emergence of collusive strategies from agents guided by machine learning algorithms. In this work, we prove that multi-agent Q-learners playing the iterated prisoner's dilemma can learn to collude. The complexity of the cooperative multi-agent setting yields multiple fixed-point policies for

2023-12-13

arXiv (preprint)

doi.org

arxiv.org
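The following sketch illustrates the setting of this abstract for readers unfamiliar with it. Two independent, tabular, ε-greedy Q-learners play the iterated prisoner's dilemma. Each agent's state is the previous joint action, seen from its own perspective (one-step memory). The payoff values, learning rate, discount factor, fixed exploration rate and memory length are all illustrative assumptions. The sketch shows the general learning dynamics only. It does not reproduce the paper's proof, and it does not guarantee that the agents end up colluding.

```python
import random

C, D = 0, 1  # cooperate, defect
# Prisoner's-dilemma ordering T > R > P > S; the values are illustrative.
PAYOFF = {
    (C, C): (3, 3),
    (C, D): (0, 5),
    (D, C): (5, 0),
    (D, D): (1, 1),
}
# State = previous joint action as (own move, other's move).
STATES = [(a, b) for a in (C, D) for b in (C, D)]

def greedy(q_row):
    """Greedy action for one Q-table row; ties go to cooperation."""
    return C if q_row[C] >= q_row[D] else D

def perspective(joint, i):
    """Reorder a joint action so that agent i's own move comes first."""
    return joint if i == 0 else (joint[1], joint[0])

def play(steps=20000, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    # One Q-table per agent; each agent learns independently of the other.
    q = [{s: [0.0, 0.0] for s in STATES} for _ in range(2)]
    state = (C, C)
    for _ in range(steps):
        actions = []
        for i in range(2):
            obs = perspective(state, i)
            if rng.random() < eps:
                actions.append(rng.randrange(2))  # explore
            else:
                actions.append(greedy(q[i][obs]))  # exploit
        next_state = (actions[0], actions[1])
        rewards = PAYOFF[next_state]
        for i in range(2):
            obs = perspective(state, i)
            nxt = perspective(next_state, i)
            # Standard Q-learning target: reward plus discounted best next value.
            target = rewards[i] + gamma * max(q[i][nxt])
            q[i][obs][actions[i]] += alpha * (target - q[i][obs][actions[i]])
        state = next_state
    return q

q_tables = play()
for i, q in enumerate(q_tables):
    policy = {s: "C" if greedy(q[s]) == C else "D" for s in STATES}
    print(f"agent {i} greedy policy, (own, other) -> action:", policy)
```

Whether the printed greedy policies sustain mutual cooperation depends on these hyperparameters. The abstract's claim concerns the specific conditions analyzed in the paper, not this toy configuration.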
