GPAI Report & Policy Guide: Towards Substantive Equality in AI
Join us at Mila on November 26 for the launch of the report and policy guide that outlines actionable recommendations for building inclusive AI ecosystems.
We use cookies to analyze the browsing and usage of our website and to personalize your experience. You can disable these technologies at any time, but this may limit certain functionalities of the site. Read our Privacy Policy for more information.
Setting cookies
You can enable and disable the types of cookies you wish to accept. However certain choices you make could affect the services offered on our sites (e.g. suggestions, personalised ads, etc.).
Essential cookies
These cookies are necessary for the operation of the site and cannot be deactivated. (Still active)
Analytics cookies
Do you accept the use of cookies to measure the audience of our sites?
Multimedia Player
Do you accept the use of cookies to display and allow you to watch the video content hosted by our partners (YouTube, etc.)?
Publications
Multi-label Iterated Learning for Image Classification with Label Ambiguity
Transfer learning from large-scale pre-trained models has become essential for many computer vision tasks. Recent studies have shown that da… (see more)tasets like ImageNet are weakly labeled since images with multiple object classes present are assigned a single label. This ambiguity biases models towards a single prediction, which could result in the suppression of classes that tend to co-occur in the data. Inspired by language emergence literature, we propose multi-label iterated learning (MILe) to incorporate the inductive biases of multi-label learning from single labels using the framework of iterated learning. MILe is a simple yet effective procedure that builds a multi-label description of the image by propagating binary predictions through successive generations of teacher and student networks with a learning bottleneck. Experiments show that our approach exhibits systematic benefits on ImageNet accuracy as well as ReaL F1 score, which indicates that MILe deals better with label ambiguity than the standard training procedure, even when fine-tuning from self-supervised weights. We also show that MILe is effective reducing label noise, achieving state-of-the-art performance on real-world large-scale noisy data such as WebVision. Furthermore, MILe improves performance in class incremental settings such as IIRC and it is robust to distribution shifts. Code: https://github.com/rajeswar18/MILe
Data scientists commonly use computational notebooks because they provide a good environment for testing multiple models. However, once the … (see more)scientist completes the code and finds the ideal model, he or she will have to dedicate time to clean up the code in order for others to easily understand it. In this paper, we perform a qualitative study on how scientists clean their code in hopes of being able to suggest a tool to automate this process. Our end goal is for tool builders to address possible gaps and provide additional aid to data scientists, who then can focus more on their actual work rather than the routine and tedious cleaning work. By sampling notebooks from GitHub and analyzing changes between subsequent commits, we identified common cleaning activities, such as changes to markdown (e.g., adding headers sections or descriptions) or comments (both deleting dead code and adding descriptions) as well as reordering cells. We also find that common cleaning activities differ depending on the intended purpose of the notebook. Our results provide a valuable foundation for tool builders and notebook users, as many identified cleaning activities could benefit from codification of best practices and dedicated tool support, possibly tailored depending on intended use.
2021-11-15
2021 36th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW) (published)
Data scientists reportedly spend a significant amount of their time in their daily routines on data wrangling, i.e. cleaning data and extrac… (see more)ting features. However, data wrangling code is often repetitive and error-prone to write. Moreover, it is easy to introduce subtle bugs when reusing and adopting existing code, which results in reduced model quality. To support data scientists with data wrangling, we present a technique to generate documentation for data wrangling code. We use (1) program synthesis techniques to automatically summarize data transformations and (2) test case selection techniques to purposefully select representative examples from the data based on execution information collected with tailored dynamic program analysis. We demonstrate that a JupyterLab extension with our technique can provide on-demand documentation for many cells in popular notebooks and find in a user study that users with our plugin are faster and more effective at finding realistic bugs in data wrangling code.
2021-11-15
2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE) (published)
Although deep learning (DL) has achieved notable progress in speech enhancement (SE), further research is still required for a DL-based SE s… (see more)ystem to adapt effectively and efficiently to particular speakers. In this study, we propose a novel meta-learning-based speaker-adaptive SE approach (called OSSEM) that aims to achieve SE model adaptation in a one-shot manner. OSSEM consists of a modified transformer SE network and a speaker-specific masking (SSM) network. In practice, the SSM network takes an enrolled speaker embedding extracted using ECAPA-TDNN to adjust the input noisy feature through masking. To evaluate OSSEM, we designed a modified Voice Bank-DEMAND dataset, in which one utterance from the testing set was used for model adaptation, and the remaining utterances were used for testing the performance. Moreover, we set restrictions allowing the enhancement process to be conducted in real time, and thus designed OSSEM to be a causal SE system. Experimental results first show that OSSEM can effectively adapt a pretrained SE model to a particular speaker with only one utterance, thus yielding improved SE results. Meanwhile, OSSEM exhibits a competitive performance compared to state-of-the-art causal SE systems.
We introduce the Cut-and-Play, an efficient algorithm for computing equilibria in simultaneous non-cooperative games where players solve non… (see more)convex and possibly unbounded optimization problems. Our algorithm exploits an intrinsic relationship between the equilibria of the original nonconvex game and the ones of a convexified counterpart. In practice, Cut-and-Play formulates a series of convex approximations of the original game and refines them with techniques from integer programming, for instance, cutting planes and branching operations. We test our algorithm on two families of challenging nonconvex games involving discrete decisions and bilevel programs, and we empirically demonstrate that it efficiently computes equilibria and outperforms existing game-specific algorithms.
We present an end-to-end differentiable training method for retrieval-augmented open-domain question answering systems that combine informat… (see more)ion from multiple retrieved documents when generating answers. We model retrieval decisions as latent variables over sets of relevant documents. Since marginalizing over sets of retrieved documents is computationally hard, we approximate this using an expectation-maximization algorithm. We iteratively estimate the value of our latent variable (the set of relevant documents for a given question) and then use this estimate to update the retriever and reader parameters. We hypothesize that such end-to-end training allows training signals to flow to the reader and then to the retriever better than staged-wise training. This results in a retriever that is able to select more relevant documents for a question and a reader that is trained on more accurate documents to generate an answer. Experiments on three benchmark datasets demonstrate that our proposed method outperforms all existing approaches of comparable size by 2-3% absolute exact match points, achieving new state-of-the-art results. Our results also demonstrate the feasibility of learning to retrieve to improve answer generation without explicit supervision of retrieval decisions.
The goal of program synthesis from examples is to find a computer program that is consistent with a given set of input-output examples. Most… (see more) learning-based approaches try to find a program that satisfies all examples at once. Our work, by contrast, considers an approach that breaks the problem into two stages: (a) find programs that satisfy only one example, and (b) leverage these per-example solutions to yield a program that satisfies all examples. We introduce the Cross Aggregator neural network module based on a multi-head attention mechanism that learns to combine the cues present in these per-example solutions to synthesize a global solution. Evaluation across programs of different lengths and under two different experimental settings reveal that when given the same time budget, our technique significantly improves the success rate over PCCoder [Zohar et. al 2018] and other ablation baselines.
Tensor network (TN) methods have been a key ingredient of advances in condensed matter physics and have recently sparked interest in the mac… (see more)hine learning community for their ability to compactly represent very high-dimensional objects. TN methods can for example be used to efficiently learn linear models in exponentially large feature spaces [56]. In this work, we derive upper and lower bounds on the VC-dimension and pseudo-dimension of a large class of TN models for classification, regression and completion. Our upper bounds hold for linear models parameterized by arbitrary TN structures, and we derive lower bounds for common tensor decomposition models (CP, Tensor Train, Tensor Ring and Tucker) showing the tightness of our general upper bound. These results are used to derive a generalization bound which can be applied to classification with low-rank matrices as well as linear classifiers based on any of the commonly used tensor decomposition models. As a corollary of our results, we obtain a bound on the VC-dimension of the matrix product state classifier introduced in [56] as a function of the so-called bond dimension (i.e. tensor train rank), which answers an open problem listed by Cirac, Garre-Rubio and Pérez-García in [13].
Two of the most prominent algorithms for solving unconstrained smooth games are the classical stochastic gradient descent-ascent (SGDA) and … (see more)the recently introduced stochastic consensus optimization (SCO) [Mescheder et al., 2017]. SGDA is known to converge to a stationary point for specific classes of games, but current convergence analyses require a bounded variance assumption. SCO is used successfully for solving large-scale adversarial problems, but its convergence guarantees are limited to its deterministic variant. In this work, we introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO under this condition for solving a class of stochastic variational inequality problems that are potentially non-monotone. We prove linear convergence of both methods to a neighborhood of the solution when they use constant step-size, and we propose insightful stepsize-switching rules to guarantee convergence to the exact solution. In addition, our convergence guarantees hold under the arbitrary sampling paradigm, and as such, we give insights into the complexity of minibatching.