Offert en partenariat avec Indspire, ce parcours professionnel sur mesure est conçu pour permettre aux talents autochtones d'apprendre, de développer et de diriger l'évolution de l'IA. Les candidatures pour le programme 2025 sont ouvertes jusqu'au 31 janvier.
Nous utilisons des témoins pour analyser le trafic et l’utilisation de notre site web, afin de personnaliser votre expérience. Vous pouvez désactiver ces technologies à tout moment, mais cela peut restreindre certaines fonctionnalités du site. Consultez notre Politique de protection de la vie privée pour en savoir plus.
Paramètre des cookies
Vous pouvez activer et désactiver les types de cookies que vous souhaitez accepter. Cependant certains choix que vous ferez pourraient affecter les services proposés sur nos sites (ex : suggestions, annonces personnalisées, etc.).
Cookies essentiels
Ces cookies sont nécessaires au fonctionnement du site et ne peuvent être désactivés. (Toujours actif)
Cookies analyse
Acceptez-vous l'utilisation de cookies pour mesurer l'audience de nos sites ?
Multimedia Player
Acceptez-vous l'utilisation de cookies pour afficher et vous permettre de regarder les contenus vidéo hébergés par nos partenaires (YouTube, etc.) ?
Publications
SelfIE: Self-Interpretation of Large Language Model Embeddings
Haozhe Chen
Carl Vondrick
Chengzhi Mao
How do large language models (LLMs) obtain their answers? The ability to explain and control an LLM's reasoning process is key for reliabili… (voir plus)ty, transparency, and future model developments. We propose SelfIE (Self-Interpretation of Embeddings), a framework that enables LLMs to interpret their own embeddings in natural language by leveraging their ability to respond to inquiries about a given passage. Capable of interpreting open-world concepts in the hidden embeddings, SelfIE reveals LLM internal reasoning in cases such as making ethical decisions, internalizing prompt injection, and recalling harmful knowledge. SelfIE's text descriptions on hidden embeddings also open up new avenues to control LLM reasoning. We propose Supervised Control, which allows editing open-ended concepts while only requiring gradient computation of individual layer. We extend RLHF to hidden embeddings and propose Reinforcement Control that erases harmful knowledge in LLM without supervision targets.
Masked Image Modeling (MIM) is a promising self-supervised learning approach that enables learning from unlabeled images. Despite its recent… (voir plus) success, learning good representations through MIM remains challenging because it requires predicting the right semantic content in accurate locations. For example, given an incomplete picture of a dog, we can guess that there is a tail, but we cannot determine its exact location. In this work, we propose to incorporate location uncertainty into MIM by using stochastic positional embeddings (StoP). Specifically, we condition the model on stochastic masked token positions drawn from a Gaussian distribution. StoP reduces overfitting to location features and guides the model toward learning features that are more robust to location uncertainties. Quantitatively, StoP improves downstream MIM performance on a variety of downstream tasks, including
Semantically Consistent Video Inpainting with Conditional Diffusion Models
Dylan Green
William Harvey
Saeid Naderiparizi
Matthew Niedoba
Yunpeng Liu
Xiaoxuan Liang
Jonathan Wilder Lavington
Ke Zhang
Vasileios Lioutas
Setareh Dabiri
Adam Ścibior
Berend Zwartsenberg
Frank Wood
Current state-of-the-art methods for video inpainting typically rely on optical flow or attention-based approaches to inpaint masked regions… (voir plus) by propagating visual information across frames. While such approaches have led to significant progress on standard benchmarks, they struggle with tasks that require the synthesis of novel content that is not present in other frames. In this paper we reframe video inpainting as a conditional generative modeling problem and present a framework for solving such problems with conditional video diffusion models. We highlight the advantages of using a generative approach for this task, showing that our method is capable of generating diverse, high-quality inpaintings and synthesizing new content that is spatially, temporally, and semantically consistent with the provided context.
The mammalian hippocampus contains a cognitive map that represents an animal’s position in the environment 1 and generates offline “repl… (voir plus)ay” 2,3 for the purposes of recall 4, planning 5,6, and forming long term memories 7. Recently, it’s been found that artificial neural networks trained to predict sensory inputs develop spatially tuned cells 8, aligning with predictive theories of hippocampal function 9–11. However, whether predictive learning can also account for the ability to produce offline replay is unknown. Here, we find that spatially tuned cells, which robustly emerge from all forms of predictive learning, do not guarantee the presence of a cognitive map with the ability to generate replay. Offline simulations only emerged in networks that used recurrent connections and head-direction information to predict multi-step observation sequences, which promoted the formation of a continuous attractor reflecting the geometry of the environment. These offline trajectories were able to show wake-like statistics, autonomously replay recently experienced locations, and could be directed by a virtual head direction signal. Further, we found that networks trained to make cyclical predictions of future observation sequences were able to rapidly learn a cognitive map and produced sweeping representations of future positions reminiscent of hippocampal theta sweeps 12. These results demonstrate how hippocampal-like representation and replay can emerge in neural networks engaged in predictive learning, and suggest that hippocampal theta sequences reflect a circuit that implements a data-efficient algorithm for sequential predictive learning. Together, this framework provides a unifying theory for hippocampal functions and hippocampal-inspired approaches to artificial intelligence.
Metabolism and bioenergetics in the central nervous system play important roles in the pathophysiology of Parkinson’s disease (PD). Here, … (voir plus)we employed a multimodal imaging approach to assess oxygenation changes in the spinal cord of a transgenic M83 murine model of PD in comparison to non-transgenic littermates at 9-12 months-of-age. A lower oxygen saturation (SO2)SVOT was detected in vivo with spiral volumetric optoacoustic tomography (SVOT) in the spinal cord of M83 mice compared to non-transgenic littermate mice. Ex-vivo high-field T1-weighted magnetic resonance imaging (MRI) and immunostaining for alpha-synuclein (phospho-S129) and vascular organisation (CD31 and GLUT1) were used to investigate the nature of the abnormalities detected via in vivo imaging. Ex-vivo analysis showed that the vascular network in the spinal cord was not impaired in the spinal cord of M83 mice. Ex-vivo MRI assisted with deep learning-based automatic segmentation showed no volumetric atrophy in the spinal cord of M83 mice compared to non-transgenic littermates, whereas nuclear alpha-synuclein phosphorylated at Ser129 site could be linked to early pathology and metabolic dysfunction. The proposed and validated non-invasive high-resolution imaging tool to study oxygen saturation in the spinal cord of PD mice holds promise for assessing early changes preceding motor deficits in PD mice.
Large Language Models are transforming NLP for a variety of tasks. However, how LLMs perform NLP tasks for low-resource languages (LRLs) is … (voir plus)less explored. In line with the goals of the AmericasNLP workshop, we focus on 12 LRLs from Brazil, 2 LRLs from Africa and 2 high-resource languages (HRLs) (e.g., English and Brazilian Portuguese). Our results indicate that the LLMs perform worse for the part of speech (POS) labeling of LRLs in comparison to HRLs. We explain the reasons behind this failure and provide an error analysis through examples observed in our data set.
Nigerians have a notable online presence and actively discuss political and topical matters. This was particularly evident throughout the 20… (voir plus)23 general election, where Twitter was used for campaigning, fact-checking and verification, and even positive and negative discourse. However, little or none has been done in the detection of abusive language and hate speech in Nigeria. In this paper, we curated code-switched Twitter data directed at three musketeers of the governorship election on the most populous and economically vibrant state in Nigeria; Lagos state, with the view to detect offensive speech in political discussions. We developed EkoHate -- an abusive language and hate speech dataset for political discussions between the three candidates and their followers using a binary (normal vs offensive) and fine-grained four-label annotation scheme. We analysed our dataset and provided an empirical evaluation of state-of-the-art methods across both supervised and cross-lingual transfer learning settings. In the supervised setting, our evaluation results in both binary and four-label annotation schemes show that we can achieve 95.1 and 70.3 F1 points respectively. Furthermore, we show that our dataset adequately transfers very well to three publicly available offensive datasets (OLID, HateUS2020, and FountaHate), generalizing to political discussions in other regions like the US.
The cross-sectional area (CSA) of the spinal cord (SC) computed from its segmentation is a relevant clinical biomarker for the diagnosis and… (voir plus) monitoring of cord compression and atrophy. One key limitation of existing automatic methods is that their SC segmentations depend on the MRI contrast, resulting in different CSA across contrasts. Furthermore, these methods rely on CNNs, leaving a gap in the literature for exploring the performance of modern deep learning (DL) architectures. In this study, we extend our recent work \cite{Bdard2023TowardsCS} by evaluating the contrast-agnostic SC segmentation capabilities of different classes of DL architectures, namely, ConvNeXt, vision transformers (ViTs), and hierarchical ViTs. We compared 7 different DL models using the open-source \textit{Spine Generic} Database of healthy participants