Publications
MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African Languages
In this paper, we present MasakhaPOS, the largest part-of-speech (POS) dataset for 20 typologically diverse African languages. We discuss the challenges in annotating POS for these languages using the Universal Dependencies (UD) guidelines. We conducted extensive POS baseline experiments using conditional random fields and several multilingual pre-trained language models. We applied various cross-lingual transfer models trained with data available in UD. Evaluating on the MasakhaPOS dataset, we show that choosing the best transfer language(s) in both single-source and multi-source setups greatly improves the POS tagging performance of the target languages, in particular when combined with cross-lingual parameter-efficient fine-tuning methods. Crucially, transferring knowledge from a language that matches the language family and morphosyntactic properties seems more effective for POS tagging in unseen languages.
While pretraining on large-scale image–text data from the Web has facilitated rapid progress on many vision-and-language (V&L) tasks, recent work has demonstrated that pretrained models lack “fine-grained” understanding, such as the ability to recognise relationships, verbs, and numbers in images. This has resulted in an increased interest in the community to either develop new benchmarks or models for such capabilities. To better understand and quantify progress in this direction, we investigate four competitive V&L models on four fine-grained benchmarks. Through our analysis, we find that X-VLM (Zeng et al., 2022) consistently outperforms other baselines, and that modelling innovations can impact performance more than scaling Web data, which even degrades performance sometimes. Through a deeper investigation of X-VLM, we highlight the importance of both novel losses and rich data sources for learning fine-grained skills. Finally, we inspect training dynamics, and discover that for some tasks, performance peaks early in training or significantly fluctuates, never converging.
While significant research advances have been made in the field of deep reinforcement learning, there have been no concrete adversarial attack strategies in the literature tailored for studying the vulnerability of deep reinforcement learning algorithms to membership inference attacks. In such attacking systems, the adversary targets the set of collected input data on which the deep reinforcement learning algorithm has been trained. To address this gap, we propose an adversarial attack framework designed for testing the vulnerability of a state-of-the-art deep reinforcement learning algorithm to a membership inference attack. In particular, we design a series of experiments to investigate the impact of temporal correlation, which naturally exists in reinforcement learning training data, on the probability of information leakage. Moreover, we compare the performance of collective and individual membership attacks against the deep reinforcement learning algorithm. Experimental results show that the proposed adversarial attack framework is surprisingly effective at inferring data with an accuracy exceeding
Motor cortex latent dynamics encode arm movement direction and urgency independently
Andrea Colins Rodriguez
Matthew G Perich
Lee Miller
Mark D. Humphries
The fluid movement of an arm is controlled by multiple parameters that can be set independently. Recent studies argue that arm movements are generated by the collective dynamics of neurons in motor cortex. But how these collective dynamics simultaneously encode and control multiple parameters of movement is an open question. Using a task where monkeys made sequential, varied arm movements, we show that the direction and urgency of arm movements are simultaneously encoded in the low-dimensional trajectories of population activity: each movement’s direction by a fixed, looped neural trajectory and its urgency by how quickly that trajectory was traversed. Network models showed this latent coding is potentially advantageous as it allows the direction and urgency of arm movement to be independently controlled. Our results suggest how low-dimensional neural dynamics can define multiple parameters of goal-directed movement simultaneously.
To integrate high amounts of renewable energy resources, electrical power grids must be able to cope with high amplitude, fast timescale variations in power generation. Frequency regulation through demand response has the potential to coordinate temporally flexible loads, such as air conditioners, to counteract these variations. Existing approaches for discrete control with dynamic constraints struggle to provide satisfactory performance for fast timescale action selection with hundreds of agents. We propose a decentralized agent trained with multi-agent proximal policy optimization with localized communication. We explore two communication frameworks: hand-engineered, or learned through targeted multi-agent communication. The resulting policies perform well and robustly for frequency regulation, and scale seamlessly to arbitrary numbers of houses for constant processing times.
Using massive datasets to train large-scale models has emerged as a dominant approach for broad generalization in natural language and vision applications. In reinforcement learning, however, a key challenge is that available data of sequential decision making is often not annotated with actions. For example, videos of game-play are much more available than sequences of frames paired with their logged game controls. We propose to circumvent this challenge by combining large but sparsely-annotated datasets from a target environment of interest with fully-annotated datasets from various other source environments. Our method, Action Limited PreTraining (ALPT), leverages the generalization capabilities of inverse dynamics modelling (IDM) to label missing action data in the target environment. We show that utilizing even one additional environment dataset of labelled data during IDM pretraining gives rise to substantial improvements in generating action labels for unannotated sequences. We evaluate our method on benchmark game-playing environments and show that we can significantly improve game performance and generalization capability compared to other approaches, using annotated datasets equivalent to only
The complexity of the human brain gives the illusion that brain activity is intrinsically high-dimensional. Nonlinear dimensionality-reduction methods such as uniform manifold approximation and t-distributed stochastic neighbor embedding have been used for high-throughput biomedical data. However, they have not been used extensively for brain activity data such as those from functional magnetic resonance imaging (fMRI), primarily due to their inability to maintain dynamic structure. Here we introduce a nonlinear manifold learning method for time-series data, including those from fMRI, called temporal potential of heat-diffusion for affinity-based transition embedding (T-PHATE). In addition to recovering a low-dimensional intrinsic manifold geometry from time-series data, T-PHATE exploits the data’s autocorrelative structure to faithfully denoise and unveil dynamic trajectories. We empirically validate T-PHATE on three fMRI datasets, showing that it greatly improves data visualization, classification, and segmentation of the data relative to several other state-of-the-art dimensionality-reduction benchmarks. These improvements suggest many potential applications of T-PHATE to other high-dimensional datasets of temporally diffuse processes.
We propose a new first-order optimization algorithm, Accelerated Gradient-Optimistic Gradient (AG-OG) Descent Ascent, for separable convex-concave minimax optimization. The main idea of our algorithm is to carefully leverage the structure of the minimax problem, performing Nesterov acceleration on the individual component and optimistic gradient on the coupling component. Equipped with proper restarting, we show that AG-OG achieves the optimal convergence rate (up to a constant) for a variety of settings, including bilinearly coupled strongly convex-strongly concave minimax optimization (bi-SC-SC), bilinearly coupled convex-strongly concave minimax optimization (bi-C-SC), and bilinear games. We also extend our algorithm to the stochastic setting and achieve the optimal convergence rate in both bi-SC-SC and bi-C-SC settings. AG-OG is the first single-call algorithm with optimal convergence rates in both deterministic and stochastic settings for bilinearly coupled minimax optimization problems.
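As a rough illustration of the optimistic-gradient ingredient mentioned in this abstract (not the AG-OG algorithm itself, which also uses Nesterov acceleration and restarting), the sketch below compares plain gradient descent ascent with a standard optimistic variant on the toy bilinear game f(x, y) = x * y. The function names, step size, and iteration count are illustrative assumptions, not taken from the paper.

```python
import math


def gda(x, y, lr, steps):
    """Plain simultaneous gradient descent ascent on f(x, y) = x * y."""
    for _ in range(steps):
        gx, gy = y, x  # df/dx = y, df/dy = x
        x, y = x - lr * gx, y + lr * gy
    return x, y


def ogda(x, y, lr, steps):
    """Optimistic gradient descent ascent: each step uses the
    extrapolated gradient 2 * g_t - g_{t-1}, reusing the previous
    gradient so only one gradient evaluation is needed per step."""
    gx_prev, gy_prev = y, x  # assumption: first step falls back to plain GDA
    for _ in range(steps):
        gx, gy = y, x
        x = x - lr * (2 * gx - gx_prev)
        y = y + lr * (2 * gy - gy_prev)
        gx_prev, gy_prev = gx, gy
    return x, y


if __name__ == "__main__":
    start = (1.0, 1.0)
    print("GDA distance to saddle: ", math.hypot(*gda(*start, lr=0.1, steps=2000)))
    print("OGDA distance to saddle:", math.hypot(*ogda(*start, lr=0.1, steps=2000)))
```

On this toy problem the plain iterates spiral away from the saddle point at the origin, while the optimistic iterates contract toward it, which is the usual motivation for applying an optimistic update to a bilinear coupling term.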
Neural activity tends to reside on manifolds whose dimension is much lower than the dimension of the whole neural state space. Experiments using brain-computer interfaces with microelectrode arrays implanted in the motor cortex of nonhuman primates tested the hypothesis that external perturbations should produce different adaptation strategies depending on how “aligned” the perturbation is with respect to a pre-existing intrinsic manifold. On the one hand, perturbations within the manifold (WM) evoked fast reassociations of existing patterns for rapid adaptation. On the other hand, perturbations outside the manifold (OM) triggered the slow emergence of new neural patterns underlying a much slower (and, without adequate training protocols, inconsistent or virtually impossible) adaptation. This suggests that the time scale and the overall difficulty of the brain to adapt depend fundamentally on the structure of neural activity. Here, we used a simplified static Gaussian model to show that gradient-descent learning could explain the differences between adaptation to WM and OM perturbations. For small learning rates, we found that the adaptation speeds were different but the model eventually adapted to both perturbations. Moreover, sufficiently large learning rates could entirely prohibit adaptation to OM perturbations while preserving adaptation to WM perturbations, in agreement with experiments. Adopting an incremental training protocol, as has been done in experiments, permitted a swift recovery of a full adaptation in the cases where OM perturbations were previously impossible to relearn. Finally, we also found that gradient descent was compatible with the reassociation mechanism on short adaptation time scales. Since gradient descent has many biologically plausible variants, our findings thus establish gradient-based learning as a plausible mechanism for adaptation under network-level constraints, with a central role for the learning rate.