Publications

Modeling Caption Diversity in Contrastive Vision-Language Pretraining

Samuel Lavoie

Polina Kirichenko

Mark Ibrahim

Mahmoud Assran

Andrew Gordon Wilson

Aaron Courville

Nicolas Ballas

There are a thousand ways to caption an image. Contrastive Language Pretraining (CLIP) on the other hand, works by mapping an image and its … (voir plus)caption to a single vector -- limiting how well CLIP-like models can represent the diverse ways to describe an image. In this work, we introduce Llip, Latent Language Image Pretraining, which models the diversity of captions that could match an image. Llip's vision encoder outputs a set of visual features that are mixed into a final representation by conditioning on information derived from the text. We show that Llip outperforms non-contextualized baselines like CLIP and SigLIP on a variety of tasks even with large-scale encoders. Llip improves zero-shot classification by an average of 2.9% zero-shot classification benchmarks with a ViT-G/14 encoder. Specifically, Llip attains a zero-shot top-1 accuracy of 83.5% on ImageNet outperforming a similarly sized CLIP by 1.4%. We also demonstrate improvement on zero-shot retrieval on MS-COCO by 6.0%. We provide a comprehensive analysis of the components introduced by the method and demonstrate that Llip leads to richer visual representations.

2024-04-30

ArXiv (prépublication)

Semantically Consistent Video Inpainting with Conditional Diffusion Models

Dylan Green

William Harvey

Saeid Naderiparizi

Matthew Niedoba

Yunpeng Liu

Xiaoxuan Liang

Jonathan Wilder Lavington

Ke Zhang

Vasileios Lioutas

Setareh Dabiri

Adam Ścibior

Berend Zwartsenberg

Frank N. Wood

Current state-of-the-art methods for video inpainting typically rely on optical flow or attention-based approaches to inpaint masked regions… (voir plus) by propagating visual information across frames. While such approaches have led to significant progress on standard benchmarks, they struggle with tasks that require the synthesis of novel content that is not present in other frames. In this paper we reframe video inpainting as a conditional generative modeling problem and present a framework for solving such problems with conditional video diffusion models. We highlight the advantages of using a generative approach for this task, showing that our method is capable of generating diverse, high-quality inpaintings and synthesizing new content that is spatially, temporally, and semantically consistent with the provided context.

2024-04-30

ArXiv (prépublication)

openreview.net

8-inch Wafer-scale Epitaxial Monolayer MoS2.

Hua Yu

Liangfeng Huang

Lanying Zhou

Yalin Peng

Xiuzhen Li

Peng Yin

Jiaojiao Zhao

M. Zhu

Shuopei Wang

Jieying Liu

Hongyue Du

Jian Tang

Songge Zhang

Yuchao Zhou

Nianpeng Lu

Kaihui Liu

Na Li

Guangyu Zhang

Large-scale, high-quality, and uniform monolayer MoS2 films are crucial for their applications in next-generation electronics and optoelectr… (voir plus)onics. Epitaxy is a mainstream technique for achieving high-quality MoS2 films and has been demonstrated at a wafer scale up to 4-inch. In this study, we report the epitaxial growth of 8-inch wafer-scale highly oriented monolayer MoS2 on sapphire with excellent spatial homogeneity, using a specially designed vertical chemical vapor deposition (VCVD) system. Field effect transistors (FETs) based on the as-grown 8-inch wafer-scale monolayer MoS2 film were fabricated and exhibited high performances, with an average mobility and an on/off ratio of 53.5 cm2V-1s-1 and 107, respectively. In addition, batch fabrication of logic devices and 11-stage ring oscillators were also demonstrated, showcasing excellent electrical functions. Our work may pave way of MoS2 in practical industry-scale applications. This article is protected by copyright. All rights reserved.

2024-04-29

Advances in Materials (publié)

8-inch Wafer-scale Epitaxial Monolayer MoS2.

Hua Yu

Liangfeng Huang

Lanying Zhou

Yalin Peng

Xiuzhen Li

Peng Yin

Jiaojiao Zhao

Min Zhu

Shuopei Wang

Jieying Liu

Hongyue Du

Jian Tang

Songge Zhang

Yuchao Zhou

Nianpeng Lu

Kaihui Liu

Na Li

Guangyu Zhang

2024-04-29

Advances in Materials (publié)

MiPa: Mixed Patch Infrared-Visible Modality Agnostic Object Detection

Heitor Rapela Medeiros

David Latortue

Fidel A. Guerrero Peña

Eric Granger

Marco Pedersoli

,

2024-04-29

ArXiv (prépublication)

Sequential predictive learning is a unifying theory for hippocampal representation and replay

Aleksei Efremov

The mammalian hippocampus contains a cognitive map that represents an animal’s position in the environment 1 and generates offline “repl… (voir plus)ay” 2,3 for the purposes of recall 4, planning 5,6, and forming long term memories 7. Recently, it’s been found that artificial neural networks trained to predict sensory inputs develop spatially tuned cells 8, aligning with predictive theories of hippocampal function 9–11. However, whether predictive learning can also account for the ability to produce offline replay is unknown. Here, we find that spatially tuned cells, which robustly emerge from all forms of predictive learning, do not guarantee the presence of a cognitive map with the ability to generate replay. Offline simulations only emerged in networks that used recurrent connections and head-direction information to predict multi-step observation sequences, which promoted the formation of a continuous attractor reflecting the geometry of the environment. These offline trajectories were able to show wake-like statistics, autonomously replay recently experienced locations, and could be directed by a virtual head direction signal. Further, we found that networks trained to make cyclical predictions of future observation sequences were able to rapidly learn a cognitive map and produced sweeping representations of future positions reminiscent of hippocampal theta sweeps 12. These results demonstrate how hippocampal-like representation and replay can emerge in neural networks engaged in predictive learning, and suggest that hippocampal theta sequences reflect a circuit that implements a data-efficient algorithm for sequential predictive learning. Together, this framework provides a unifying theory for hippocampal functions and hippocampal-inspired approaches to artificial intelligence.

2024-04-29

bioRxiv (prépublication)

Spinal cord perfusion impairments in the M83 mouse model of Parkinson’s disease

Benjamin F. Combes

Sandeep Kumar Kalva

Pierre-Louis Benveniste

Agathe Tournant

Man Hoi Law

Joshua Newton

Maik Krüger

Rebecca Z. Weber

Inês Dias

Daniela Noain

Xose Luis Dean-Ben

Uwe Konietzko

Christian R. Baumann

Per-Göran Gillberg

Christoph Hock

Roger M. Nitsch

Julien Cohen-Adad

Daniel Razansky

Ruiqing Ni

Metabolism and bioenergetics in the central nervous system play important roles in the pathophysiology of Parkinson’s disease (PD). Here, … (voir plus)we employed a multimodal imaging approach to assess oxygenation changes in the spinal cord of a transgenic M83 murine model of PD in comparison to non-transgenic littermates at 9-12 months-of-age. A lower oxygen saturation (SO2)SVOT was detected in vivo with spiral volumetric optoacoustic tomography (SVOT) in the spinal cord of M83 mice compared to non-transgenic littermate mice. Ex-vivo high-field T1-weighted magnetic resonance imaging (MRI) and immunostaining for alpha-synuclein (phospho-S129) and vascular organisation (CD31 and GLUT1) were used to investigate the nature of the abnormalities detected via in vivo imaging. Ex-vivo analysis showed that the vascular network in the spinal cord was not impaired in the spinal cord of M83 mice. Ex-vivo MRI assisted with deep learning-based automatic segmentation showed no volumetric atrophy in the spinal cord of M83 mice compared to non-transgenic littermates, whereas nuclear alpha-synuclein phosphorylated at Ser129 site could be linked to early pathology and metabolic dysfunction. The proposed and validated non-invasive high-resolution imaging tool to study oxygen saturation in the spinal cord of PD mice holds promise for assessing early changes preceding motor deficits in PD mice.

2024-04-29

bioRxiv (prépublication)

Comparing LLM prompting with Cross-lingual transfer performance on Indigenous and Low-resource Brazilian Languages

A. Seza Dougruoz

Andr'e Coneglian

Atul Kr. Ojha

Large Language Models are transforming NLP for a lot of tasks. However, how LLMs perform NLP tasks for LRLs is less explored. In alliance wi… (voir plus)th the theme track of the NAACL’24, we focus on 12 low-resource languages (LRLs) from Brazil, 2 LRLs from Africa and 2 high-resource languages (HRLs) (e.g., English and Brazilian Portuguese). Our results indicate that the LLMs perform worse for the labeling of LRLs in comparison to HRLs in general. We explain the reasons behind this failure and provide an error analyses through examples from 2 Brazilian LRLs.

2024-04-28

ArXiv (prépublication)

Comparing LLM prompting with Cross-lingual transfer performance on Indigenous and Low-resource Brazilian Languages

A. Seza Dougruoz

Andr'e Coneglian

Atul Kr. Ojha

Large Language Models are transforming NLP for a lot of tasks. However, how LLMs perform NLP tasks for LRLs is less explored. In alliance wi… (voir plus)th the theme track of the NAACL’24, we focus on 12 low-resource languages (LRLs) from Brazil, 2 LRLs from Africa and 2 high-resource languages (HRLs) (e.g., English and Brazilian Portuguese). Our results indicate that the LLMs perform worse for the labeling of LRLs in comparison to HRLs in general. We explain the reasons behind this failure and provide an error analyses through examples from 2 Brazilian LRLs.

2024-04-28

ArXiv (prépublication)

Comparing LLM prompting with Cross-lingual transfer performance on Indigenous and Low-resource Brazilian Languages

A. Seza Doğruöz

Andr'e Coneglian

Atul Kr. Ojha

Large Language Models are transforming NLP for a variety of tasks. However, how LLMs perform NLP tasks for low-resource languages (LRLs) is … (voir plus)less explored. In line with the goals of the AmericasNLP workshop, we focus on 12 LRLs from Brazil, 2 LRLs from Africa and 2 high-resource languages (HRLs) (e.g., English and Brazilian Portuguese). Our results indicate that the LLMs perform worse for the part of speech (POS) labeling of LRLs in comparison to HRLs. We explain the reasons behind this failure and provide an error analysis through examples observed in our data set.

2024-04-28

ArXiv (prépublication)

EkoHate: Abusive Language and Hate Speech Detection for Code-switched Political Discussions on Nigerian Twitter

Comfort Eseohen Ilevbare

Jesujoba Oluwadara Alabi

Firdous Damilola Bakare

Oluwatoyin Bunmi Abiola

Oluwaseyi A. Adeyemo

Nigerians have a notable online presence and actively discuss political and topical matters. This was particularly evident throughout the 20… (voir plus)23 general election, where Twitter was used for campaigning, fact-checking and verification, and even positive and negative discourse. However, little or none has been done in the detection of abusive language and hate speech in Nigeria. In this paper, we curated code-switched Twitter data directed at three musketeers of the governorship election on the most populous and economically vibrant state in Nigeria; Lagos state, with the view to detect offensive speech in political discussions. We developed EkoHate -- an abusive language and hate speech dataset for political discussions between the three candidates and their followers using a binary (normal vs offensive) and fine-grained four-label annotation scheme. We analysed our dataset and provided an empirical evaluation of state-of-the-art methods across both supervised and cross-lingual transfer learning settings. In the supervised setting, our evaluation results in both binary and four-label annotation schemes show that we can achieve 95.1 and 70.3 F1 points respectively. Furthermore, we show that our dataset adequately transfers very well to three publicly available offensive datasets (OLID, HateUS2020, and FountaHate), generalizing to political discussions in other regions like the US.

2024-04-28

ArXiv (prépublication)