Publications
Learning a Universal Template for Few-shot Dataset Generalization
We study the learning performance of gradient descent when the empirical risk is weakly convex, namely, when the smallest negative eigenvalue of the empirical risk's Hessian is bounded in magnitude. By showing that this eigenvalue can control the stability of gradient descent, we prove generalisation error bounds that hold under a wider range of step sizes than in previous work. Out-of-sample guarantees are then obtained by decomposing the test error into generalisation, optimisation and approximation errors, each of which can be bounded and traded off against the algorithmic parameters, the sample size and the magnitude of this eigenvalue. In the case of a two-layer neural network, we demonstrate that the empirical risk can satisfy a notion of local weak convexity; specifically, the smallest eigenvalue of the Hessian during training can be controlled by the normalisation of the layers, i.e., the network scaling. This allows test error guarantees to be achieved when the population risk minimiser satisfies a complexity assumption. By trading off network complexity against scaling, we gain insight into the implicit bias of neural network scaling, which is further supported by experimental findings.
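For reference, the two ingredients named in this abstract can be written out as below. The notation is our own and not necessarily the paper's: R_S is the empirical risk on the sample S, R the population risk, w_T the gradient-descent iterate after T steps, w a fixed comparator chosen independently of S, and ε ≥ 0 the weak-convexity parameter. In the local version, the eigenvalue condition need only hold along the training trajectory.

```latex
% epsilon-weak convexity of the empirical risk
\lambda_{\min}\!\left(\nabla^2 R_S(w)\right) \ge -\varepsilon \quad \text{for all } w

% test-error decomposition (w independent of S, so \mathbb{E}[R_S(w)] = R(w))
\mathbb{E}\,R(w_T)
= \underbrace{\mathbb{E}\big[R(w_T) - R_S(w_T)\big]}_{\text{generalisation}}
+ \underbrace{\mathbb{E}\big[R_S(w_T) - R_S(w)\big]}_{\text{optimisation}}
+ \underbrace{R(w)}_{\text{approximation}}
```

The identity holds term by term because the middle expectation uses the same fixed w whose expected empirical risk equals R(w); each bracket can then be bounded separately.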
We present a new behavioural distance over the state space of a Markov decision process and demonstrate its use as an effective means of shaping the learnt representations of deep reinforcement learning agents. Existing notions of state similarity are typically difficult to learn at scale, owing to their high computational cost and the lack of sample-based algorithms; our newly proposed distance addresses both issues. In addition to detailed theoretical analyses, we provide empirical evidence that learning this distance alongside the value function yields structured and informative representations, including strong results on the Arcade Learning Environment benchmark.
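The abstract does not state the distance's definition, so the sketch below assumes one plausible bisimulation-style form on a small tabular MDP: U(x, y) = |r(x) - r(y)| + γ E[U(x', y')], with next states drawn independently from each state's transition row. It shows both an exact fixed-point iteration and a single sample-based (TD-style) update; the function names, the example chain, and γ = 0.9 are all hypothetical.

```python
import numpy as np

def exact_state_distance(P, r, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Iterate the assumed operator
        U(x, y) = |r(x) - r(y)| + gamma * E[U(x', y')],
    where x' ~ P[x] and y' ~ P[y] are drawn independently, so the
    expectation equals (P @ U @ P.T)[x, y]. The operator is a
    gamma-contraction in the max norm, so the iteration converges.
    """
    reward_gap = np.abs(r[:, None] - r[None, :])  # |r(x) - r(y)| for every pair
    U = np.zeros_like(reward_gap)
    for _ in range(max_iter):
        U_next = reward_gap + gamma * P @ U @ P.T
        if np.max(np.abs(U_next - U)) < tol:
            return U_next
        U = U_next
    return U

def sampled_update(U, x, y, r_x, r_y, x_next, y_next, gamma=0.9, lr=0.1):
    """One sample-based (TD-style) step towards the same fixed point,
    using one observed transition from x and an independent one from y."""
    U = U.copy()
    target = abs(r_x - r_y) + gamma * U[x_next, y_next]
    U[x, y] += lr * (target - U[x, y])
    U[y, x] = U[x, y]  # keep the estimate symmetric
    return U

# Hypothetical 3-state chain: rows of P are next-state distributions.
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])
r = np.array([0.0, 0.5, 1.0])
U = exact_state_distance(P, r, gamma=0.9)
print(np.round(U, 3))
```

Under this independent coupling the self-distance U(x, x) is generally non-zero, so U is distance-like rather than a true metric; the sampled update only needs transitions, which is what makes such a quantity learnable at scale.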
Motivated by estimation problems arising in autonomous vehicles and in decentralized control of unmanned aerial vehicles, we consider multi-agent estimation and filtering problems in which multiple agents generate state estimates based on decentralized information, and the objective is to minimize a coupled mean-squared error, which we call the team mean-squared error. We call the resulting estimates minimum team mean-squared error (MTMSE) estimates. We show that MTMSE estimates differ from minimum mean-squared error (MMSE) estimates. We derive closed-form expressions for MTMSE estimates, which are linear functions of the observations with gains that depend on the weight matrix coupling the estimation errors. We then consider a filtering problem in which a linear stochastic process is monitored by multiple agents that can share their observations (with delay) over a communication graph, and we derive expressions to compute the MTMSE estimates recursively. To illustrate the effectiveness of the proposed scheme, we consider the example of estimating the distances between vehicles in a platoon and show that MTMSE estimates significantly outperform both MMSE estimates and consensus Kalman filtering estimates.
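To make the difference between MTMSE and MMSE gains concrete, here is a minimal static sketch, not the paper's closed-form expression: two agents each see a noisy version of one correlated scalar state (y_i = x_i + v_i, independent noise), each is restricted to a linear estimate z_i = l_i y_i, and the team cost is trace(S E[e eᵀ]) with e = z - x. The covariances and the weight matrix S (which heavily weights the error in x1 - x2, loosely echoing the platoon example) are hypothetical.

```python
import numpy as np

def team_cost(gains, S, Sigma_x, Sigma_y):
    """Team MSE trace(S E[e e^T]) for estimates z_i = gains[i] * y_i of x_i."""
    L = np.diag(gains)
    # With y = x + v and v independent of x, E[y x^T] = Sigma_x.
    E_ee = L @ Sigma_y @ L.T - L @ Sigma_x - Sigma_x @ L.T + Sigma_x
    return float(np.trace(S @ E_ee))

def mmse_gains(Sigma_x, Sigma_y):
    # Each agent minimises its own E[(l_i y_i - x_i)^2], ignoring the coupling.
    return np.diag(Sigma_x) / np.diag(Sigma_y)

def team_gains(S, Sigma_x, Sigma_y):
    # Setting d/dl_k trace(S E[e e^T]) = 0 for symmetric S gives
    #   sum_j S_kj Sigma_y[k, j] l_j = sum_j S_kj Sigma_x[k, j],
    # i.e. (S * Sigma_y) l = (S * Sigma_x) 1, with * the elementwise product.
    A = S * Sigma_y
    b = (S * Sigma_x) @ np.ones(len(S))
    return np.linalg.solve(A, b)

Sigma_x = np.array([[1.0, 0.8], [0.8, 1.0]])  # correlated states (hypothetical)
Sigma_y = Sigma_x + np.diag([0.5, 2.0])       # independent sensor noise per agent
S = np.array([[1.0, -0.9], [-0.9, 1.0]])      # positive definite coupling weights

l_mmse = mmse_gains(Sigma_x, Sigma_y)
l_team = team_gains(S, Sigma_x, Sigma_y)
print("MMSE gains :", l_mmse, "team cost:", team_cost(l_mmse, S, Sigma_x, Sigma_y))
print("MTMSE gains:", l_team, "team cost:", team_cost(l_team, S, Sigma_x, Sigma_y))
```

For positive definite S, the matrix S * Sigma_y is positive definite by the Schur product theorem, so the stationary point is the unique minimiser over such linear gains; with S equal to the identity the system decouples and the two sets of gains coincide.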
Learning models that generalize under different distribution shifts has been a long-standing research challenge in medical imaging. Vision researchers have proposed several approaches to efficient and robust visual representation learning, especially for the sensitive and critical biomedical domain. In this paper, we propose a simple balanced batch sampling technique for out-of-distribution generalization on chest X-ray pathologies. We observed that balancing the sampling across multiple training datasets improves performance over baseline models trained without balancing. Code for this work is available on GitHub.
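Balanced batch sampling across several training datasets can be sketched as follows. This is a minimal illustration, not the released code: each batch draws an equal share of examples from every source (with replacement, so small datasets are oversampled rather than exhausted), and any remainder is assigned to randomly chosen sources. The dataset names and the dict-of-lists input format are hypothetical.

```python
import random

def balanced_batches(datasets, batch_size, num_batches, seed=0):
    """Yield batches that draw (nearly) the same number of examples from each dataset.

    `datasets` maps a dataset name to a list of examples (hypothetical format).
    Each yielded item is a (source_name, example) pair.
    """
    rng = random.Random(seed)
    names = sorted(datasets)
    per_source, remainder = divmod(batch_size, len(names))
    for _ in range(num_batches):
        batch = []
        # Randomly pick which sources get one extra example when batch_size
        # is not divisible by the number of sources.
        extra = set(rng.sample(names, remainder))
        for name in names:
            k = per_source + (1 if name in extra else 0)
            # Sample with replacement so small datasets are oversampled, not exhausted.
            batch.extend((name, rng.choice(datasets[name])) for _ in range(k))
        rng.shuffle(batch)
        yield batch

# Hypothetical sources of very different sizes.
datasets = {
    "dataset_a": [f"a{i}" for i in range(1000)],
    "dataset_b": [f"b{i}" for i in range(50)],
    "dataset_c": [f"c{i}" for i in range(300)],
}
for batch in balanced_batches(datasets, batch_size=12, num_batches=2):
    counts = {n: sum(1 for src, _ in batch if src == n) for n in datasets}
    print(counts)  # each source contributes 4 examples
```

Without balancing, dataset_a would supply roughly three quarters of every batch; equalising the per-source share keeps the model from fitting mainly to the largest source's distribution.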
Tandem spoken language understanding (SLU) systems suffer from the so-called automatic speech recognition (ASR) error propagation problem. Additionally, because the ASR is optimized to extract only the linguistic content rather than semantics, relevant semantic cues may be left out of its transcripts. In this work, we propose a multimodal language understanding (MLU) architecture to mitigate these problems. Our solution is based on two compact unidirectional long short-term memory (LSTM) models that encode speech and text information. A fusion layer combines the audio and text embeddings. Two fusion strategies are explored: a simple concatenation of these embeddings and a cross-modal attention mechanism that learns the contribution of each modality. The first approach proved optimal for robustly extracting semantic information from audio-textual data; we found that attention is less effective at test time when the text modality is corrupted. Our model is evaluated on three SLU datasets, and its robustness is tested using ASR outputs from three off-the-shelf ASR engines. Results show that the proposed approach effectively mitigates the ASR error propagation problem for all datasets.
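The two fusion strategies can be sketched in NumPy under stated assumptions: the LSTM encoders are replaced by random stand-in utterance embeddings of equal size d, and the attention variant is assumed to be a softmax gate that scores each modality embedding (with hypothetical parameters W and v, which would be learned in practice) and returns their weighted sum. The paper's cross-modal attention may be formulated differently.

```python
import numpy as np

rng = np.random.default_rng(0)

def concat_fusion(audio_emb, text_emb):
    # Strategy 1: simple concatenation of the two utterance embeddings.
    return np.concatenate([audio_emb, text_emb], axis=-1)

def attention_fusion(audio_emb, text_emb, W, v):
    # Strategy 2 (assumed form): score each modality, softmax the scores,
    # and return the weighted sum together with the per-modality weights.
    H = np.stack([audio_emb, text_emb], axis=-2)          # (batch, 2, d)
    scores = np.tanh(H @ W) @ v                           # (batch, 2)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    fused = (weights[..., None] * H).sum(axis=-2)         # (batch, d)
    return fused, weights

d, k, batch = 8, 4, 3
audio = rng.normal(size=(batch, d))  # stand-in for the speech LSTM's final state
text = rng.normal(size=(batch, d))   # stand-in for the text LSTM's final state
W, v = rng.normal(size=(d, k)), rng.normal(size=k)

print(concat_fusion(audio, text).shape)  # (3, 16)
fused, weights = attention_fusion(audio, text, W, v)
print(fused.shape, weights)
```

Concatenation passes both embeddings through unchanged and leaves weighting to later layers, whereas the gate commits to a per-utterance mix of the two modalities before classification.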
Despite considerable advances in deep neural language models (LMs), neural text generation still suffers from degeneration: generated text is repetitive, generic, self-inconsistent, and lacking in commonsense. Empirical analyses of sentence-level attention patterns reveal that neural text degeneration may be associated with insufficient learning of inductive biases by the attention mechanism. These findings motivate on-the-fly attention modularization, a simple but effective method for injecting inductive biases into attention computation during inference. Text produced by a language model with attention modularization can show enhanced diversity and commonsense reasoning while maintaining fluency and coherence.
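The abstract does not spell out how attention is modularized, so the following is only a hypothetical illustration of injecting an inductive bias into attention at inference time without retraining: an additive sentence-level bias (the helper `sentence_bias` and its `penalty` are our own) that discourages tokens from attending outside their own sentence in a single causal attention head.

```python
import numpy as np

def causal_softmax_attention(q, k, v, bias=None):
    """Single-head causal scaled dot-product attention with an optional additive bias."""
    T, d = q.shape
    logits = q @ k.T / np.sqrt(d)
    causal = np.tril(np.ones((T, T), dtype=bool))
    logits = np.where(causal, logits, -np.inf)  # no attention to future tokens
    if bias is not None:
        logits = logits + bias                  # inference-time inductive bias
    logits = logits - logits.max(axis=-1, keepdims=True)
    w = np.exp(logits)
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ v, w

def sentence_bias(sentence_ids, penalty=-1e9):
    """Hypothetical bias: 0 within a sentence, `penalty` across sentences."""
    ids = np.asarray(sentence_ids)
    same = ids[:, None] == ids[None, :]
    return np.where(same, 0.0, penalty)

rng = np.random.default_rng(0)
T, d = 6, 4
q, k, v = (rng.normal(size=(T, d)) for _ in range(3))
sentence_ids = [0, 0, 0, 1, 1, 1]  # two sentences of three tokens each

_, w_plain = causal_softmax_attention(q, k, v)
_, w_mod = causal_softmax_attention(q, k, v, bias=sentence_bias(sentence_ids))
print(np.round(w_mod, 3))  # tokens 3-5 no longer attend to tokens 0-2
```

A very large penalty acts as a hard mask; a smaller one would only soften cross-sentence attention, which is the kind of knob an inference-time method can tune without touching the model's weights.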