Publications
Textual Time Travel: A Temporally Informed Approach to Theory of Mind
Natural language processing systems such as dialogue agents should be able to reason about other people’s beliefs, intentions and desires. This capability, called theory of mind (ToM), is crucial, as it allows a model to predict and interpret the needs of users based on their mental states. A recent line of research evaluates the ToM capability of existing memory-augmented neural models through question answering. These models perform poorly on false belief tasks where beliefs differ from reality, especially when the dataset contains distracting sentences. In this paper, we propose a new temporally informed approach for improving the ToM capability of memory-augmented neural models. Our model incorporates priors about the entities’ minds and tracks their mental states as they evolve over time through an extended passage. It then responds to queries through textual time travel, i.e., by accessing the stored memory of an earlier time step. We evaluate our model on ToM datasets and find that this approach improves performance, particularly by correcting the predicted mental states to match the false belief.
2021-01-01
Conference on Empirical Methods in Natural Language Processing (published)
Reward is the driving force for reinforcement-learning agents. This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions of "task" that might be desirable: (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering over trajectories. Our main results prove that while reward can express many of these tasks, there exist instances of each task type that no Markov reward function can capture. We then provide a set of polynomial-time algorithms that construct a Markov reward function that allows an agent to optimize tasks of each of these three types, and correctly determine when no such reward function exists. We conclude with an empirical study that corroborates and illustrates our theoretical findings.
Combinatorial optimization is a well-established area in operations research and computer science. Until recently, its methods have focused on solving problem instances in isolation, ignoring that they often stem from related data distributions in practice. However, recent years have seen a surge of interest in using machine learning as a new approach for solving combinatorial problems, either directly as solvers or by enhancing exact solvers. Based on this context, the ML4CO competition aims at improving state-of-the-art combinatorial optimization solvers by replacing key heuristic components. The competition featured three challenging tasks: finding the best feasible solution, producing the tightest optimality certificate, and giving an appropriate solver configuration. Three realistic datasets were considered: balanced item placement, workload apportionment, and maritime inventory routing. This last dataset was kept anonymous for the contestants.
Authorship attribution is the problem of identifying the most plausible author of an anonymous text from a set of candidate authors. Researchers have investigated same-topic and cross-topic scenarios of authorship attribution, which differ according to whether unseen topics are used in the testing phase. However, neither scenario allows us to explain whether errors are caused by failure to capture authorship style, by the topic shift or by other factors. Motivated by this, we propose the topic confusion task, where we switch the author-topic configuration between training and testing set. This setup allows us to probe errors in the attribution process. We investigate the accuracy and two error measures: one caused by the models’ confusion by the switch because the features capture the topics, and one caused by the features’ inability to capture the writing styles, leading to weaker models. By evaluating different features, we show that stylometric features with part-of-speech tags are less susceptible to topic variations and can increase the accuracy of the attribution process. We further show that combining them with word-level n-grams can outperform the state-of-the-art technique in the cross-topic scenario. Finally, we show that pretrained language models such as BERT and RoBERTa perform poorly on this task, and are outperformed by simple n-gram features.
The problem of characterizing quantum channels arises in a number of contexts such as quantum process tomography and quantum error correction. However, direct approaches to parameterizing and optimizing the Choi matrix representation of quantum channels face a curse of dimensionality: the number of parameters scales exponentially in the number of qubits. Recently, Torlai et al. [2020] proposed using locally purified density operators (LPDOs), a tensor network representation of Choi matrices, to overcome the unfavourable scaling in parameters. While the LPDO structure allows it to satisfy a ‘complete positivity’ (CP) constraint required of physically valid quantum channels, it makes no guarantees about a similarly required ‘trace preservation’ (TP) constraint. In practice, the TP constraint is violated, and the learned quantum channel may even be trace-increasing, which is non-physical. In this work, we present the problem of optimizing over TP LPDOs, discuss two approaches to characterizing the TP constraints on LPDOs, and outline the next steps for developing an optimization scheme.
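To make the CP and TP constraints concrete, the following is a minimal sketch that checks both on a dense Choi matrix for a small channel. It is not the LPDO method from the abstract: the helper names (`choi_from_kraus`, `is_trace_preserving`, `is_completely_positive`), the input-then-output tensor ordering, and the tolerance are all assumptions made for illustration, and a dense matrix of this kind is exactly what becomes infeasible as the number of qubits grows.

```python
import numpy as np

def choi_from_kraus(kraus_ops):
    """Dense Choi matrix J = sum_ij |i><j| (x) E(|i><j|), input factor first.

    Assumed convention for this sketch; other orderings are also in use.
    """
    d_out, d_in = kraus_ops[0].shape
    J = np.zeros((d_in * d_out, d_in * d_out), dtype=complex)
    for i in range(d_in):
        for j in range(d_in):
            E_ij = np.zeros((d_in, d_in), dtype=complex)
            E_ij[i, j] = 1.0
            # Apply the channel to the matrix unit |i><j|.
            out = sum(K @ E_ij @ K.conj().T for K in kraus_ops)
            J += np.kron(E_ij, out)
    return J

def is_trace_preserving(J, d_in, d_out, atol=1e-9):
    """TP holds when tracing out the output factor leaves the identity."""
    J4 = J.reshape(d_in, d_out, d_in, d_out)
    reduced = np.einsum("iaja->ij", J4)  # partial trace over the output
    return np.allclose(reduced, np.eye(d_in), atol=atol)

def is_completely_positive(J, atol=1e-9):
    """CP holds when the Choi matrix is positive semidefinite."""
    hermitian_part = (J + J.conj().T) / 2
    return np.linalg.eigvalsh(hermitian_part).min() >= -atol

# Amplitude damping on one qubit (a valid channel) versus a rescaled
# identity map, which is CP but trace-increasing.
g = 0.3
damping = [np.array([[1, 0], [0, np.sqrt(1 - g)]]),
           np.array([[0, np.sqrt(g)], [0, 0]])]
inflated = [1.1 * np.eye(2)]

J_ok = choi_from_kraus(damping)
J_bad = choi_from_kraus(inflated)
print(is_completely_positive(J_ok), is_trace_preserving(J_ok, 2, 2))
print(is_completely_positive(J_bad), is_trace_preserving(J_bad, 2, 2))
```

The second map illustrates the failure mode the abstract describes: positivity is satisfied by construction, yet the trace condition is not.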
A Unified Few-Shot Classification Benchmark to Compare Transfer and Meta Learning Approaches
Meta and transfer learning are two successful families of approaches to few-shot learning. Despite highly related goals, state-of-the-art advances in each family are measured largely in isolation of each other. As a result of diverging evaluation norms, a direct or thorough comparison of different approaches is challenging. To bridge this gap, we introduce a few-shot classification evaluation protocol named VTAB+MD with the explicit goal of facilitating sharing of insights from each community. We demonstrate its accessibility in practice by performing a cross-family study of the best transfer and meta learners which report on both a large-scale meta-learning benchmark (Meta-Dataset, MD), and a transfer learning benchmark (Visual Task Adaptation Benchmark, VTAB). We find that, on average, large-scale transfer methods (Big Transfer, BiT) outperform competing approaches on MD, even when trained only on ImageNet. In contrast, meta-learning approaches struggle to compete on VTAB when trained and validated on MD. However, BiT is not without limitations, and pushing for scale does not improve performance on highly out-of-distribution MD tasks. We hope that this work contributes to accelerating progress on few-shot learning research.
Few-shot classification aims to recognize unseen classes when presented with only a small number of samples. We consider the problem of multi-domain few-shot image classification, where unseen classes and examples come from diverse data sources. This problem has seen growing interest and has inspired the development of benchmarks such as Meta-Dataset. A key challenge in this multi-domain setting is to effectively integrate the feature representations from the diverse set of training domains. Here, we propose a Universal Representation Transformer (URT) layer that meta-learns to leverage universal features for few-shot classification by dynamically re-weighting and composing the most appropriate domain-specific representations. In experiments, we show that URT sets a new state-of-the-art result on Meta-Dataset. Specifically, it achieves top performance on the highest number of data sources compared to competing methods. We analyze variants of URT and present a visualization of the attention score heatmaps that sheds light on how the model performs cross-domain generalization.
Discrete-time diffusion-based generative models and score matching methods have shown promising results in modeling high-dimensional image data. Recently, Song et al. (2021) show that diffusion processes that transform data into noise can be reversed via learning the score function, i.e. the gradient of the log-density of the perturbed data. They propose to plug the learned score function into an inverse formula to define a generative diffusion process. Despite the empirical success, a theoretical underpinning of this procedure is still lacking. In this work, we approach the (continuous-time) generative diffusion directly and derive a variational framework for likelihood estimation, which includes continuous-time normalizing flows as a special case, and can be seen as an infinitely deep variational autoencoder. Under this framework, we show that minimizing the score-matching loss is equivalent to maximizing a lower bound of the likelihood of the plug-in reverse SDE proposed by Song et al. (2021), bridging the theoretical gap.
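For readers unfamiliar with the objects named in this abstract, the block below writes out the score function and the plug-in reverse SDE in the standard notation of Song et al. (2021). The symbols $f$, $g$, $w$, $\bar{w}$, $s_\theta$ and the weighting $\lambda(t)$ do not appear in the abstract and are assumed here for illustration.

```latex
% Forward (noising) diffusion, with drift f and diffusion coefficient g:
\mathrm{d}x = f(x, t)\,\mathrm{d}t + g(t)\,\mathrm{d}w

% Score function: gradient of the log-density of the perturbed data at time t:
\nabla_x \log p_t(x)

% Reverse-time SDE, with \bar{w} a Brownian motion run backward in time:
\mathrm{d}x = \left[ f(x, t) - g(t)^2\, \nabla_x \log p_t(x) \right] \mathrm{d}t + g(t)\,\mathrm{d}\bar{w}

% Plug-in reverse SDE: the unknown score is replaced by a learned model s_\theta:
\mathrm{d}x = \left[ f(x, t) - g(t)^2\, s_\theta(x, t) \right] \mathrm{d}t + g(t)\,\mathrm{d}\bar{w}

% Weighted score-matching loss used to fit s_\theta:
\mathcal{L}(\theta) = \mathbb{E}_{t}\, \mathbb{E}_{x \sim p_t}
  \left[ \lambda(t)\, \bigl\| s_\theta(x, t) - \nabla_x \log p_t(x) \bigr\|^2 \right]
```

The abstract's claim concerns the last two lines: minimizing the loss is shown to maximize a lower bound on the likelihood of the model defined by the plug-in reverse SDE.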
For a natural language understanding benchmark to be useful in research, it has to consist of examples that are diverse and difficult enough to discriminate among current and near-future state-of-the-art systems. However, we do not yet know how best to select passages to collect a variety of challenging examples. In this study, we crowdsource multiple-choice reading comprehension questions for passages taken from seven qualitatively distinct sources, analyzing what attributes of passages contribute to the difficulty and question types of the collected examples. To our surprise, we find that passage source, length, and readability measures do not significantly affect question difficulty. Through our manual annotation of seven reasoning types, we observe several trends between passage sources and reasoning types, e.g., logical reasoning is more often required in questions written for technical passages. These results suggest that when creating a new benchmark dataset, selecting a diverse set of passages can help ensure a diverse range of question types, but that passage difficulty need not be a priority.
Where Did You Learn That From? Surprising Effectiveness of Membership Inference Attacks Against Temporally Correlated Data in Deep Reinforcement Learning