Publications

GFlowNet Foundations

Edward J. Hu

Mo Tiwari

Generative Flow Networks (GFlowNets) have been introduced as a method to sample a diverse set of candidates in an active learning context, with a training objective that makes them approximately sample in proportion to a given reward function. In this paper, we show a number of additional theoretical properties of GFlowNets. They can be used to estimate joint probability distributions and the corresponding marginal distributions where some variables are unspecified and, of particular interest, can represent distributions over composite objects like sets and graphs. GFlowNets amortize the work typically done by computationally expensive MCMC methods in a single but trained generative pass. They could also be used to estimate partition functions and free energies, conditional probabilities of supersets (supergraphs) given a subset (subgraph), as well as marginal distributions over all supersets (supergraphs) of a given set (graph). We introduce variations enabling the estimation of entropy and mutual information, sampling from a Pareto frontier, connections to reward-maximizing policies, and extensions to stochastic environments, continuous actions and modular energy functions.

2021-11-16

ArXiv (preprint)

doi.org

arxiv.org

Digital Ageism: Challenges and Opportunities in Artificial Intelligence for Older Adults

Charlene H. Chu

Rune Nyrup

Kathleen Leslie

Jiamin Shi

Andria Bianchi

Alexandra Lyn

Molly McNicholl

Shehroz Khan

Samira Rahimi

Amanda Grenier

Artificial intelligence (AI) and machine learning are changing our world through their impact on sectors including health care, education, employment, finance, and law. AI systems are developed using data that reflect the implicit and explicit biases of society, and there are significant concerns about how the predictive models in AI systems amplify inequity, privilege, and power in society. The widespread applications of AI have led to mainstream discourse about how AI systems are perpetuating racism, sexism, and classism; yet, concerns about ageism have been largely absent in the AI bias literature. Given the globally aging population and proliferation of AI, there is a need to critically examine the presence of age-related bias in AI systems. This forum article discusses ageism in AI systems and introduces a conceptual model that outlines intersecting pathways of technology development that can produce and reinforce digital ageism in AI systems. We also describe the broader ethical and legal implications and considerations for future directions in digital ageism research to advance knowledge in the field and deepen our understanding of how ageism in AI is fostered by broader cycles of injustice.

2021-11-14

The Gerontologist (published)

doi.org

Splitting, Renaming, Removing: A Study of Common Cleaning Activities in Jupyter Notebooks

Helen Dong

Shurui Zhou

Jin L.C. Guo

Christian Kästner

Data scientists commonly use computational notebooks because they provide a good environment for testing multiple models. However, once the scientist completes the code and finds the ideal model, he or she will have to dedicate time to clean up the code in order for others to easily understand it. In this paper, we perform a qualitative study on how scientists clean their code in hopes of being able to suggest a tool to automate this process. Our end goal is for tool builders to address possible gaps and provide additional aid to data scientists, who then can focus more on their actual work rather than the routine and tedious cleaning work. By sampling notebooks from GitHub and analyzing changes between subsequent commits, we identified common cleaning activities, such as changes to markdown (e.g., adding headers sections or descriptions) or comments (both deleting dead code and adding descriptions) as well as reordering cells. We also find that common cleaning activities differ depending on the intended purpose of the notebook. Our results provide a valuable foundation for tool builders and notebook users, as many identified cleaning activities could benefit from codification of best practices and dedicated tool support, possibly tailored depending on intended use.

2021-11-14

2021 36th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW) (published)

doi.org

Subtle Bugs Everywhere: Generating Documentation for Data Wrangling Code

Chenyang Yang

Shurui Zhou

Jin L.C. Guo

Christian Kästner

Data scientists reportedly spend a significant amount of their time in their daily routines on data wrangling, i.e. cleaning data and extracting features. However, data wrangling code is often repetitive and error-prone to write. Moreover, it is easy to introduce subtle bugs when reusing and adopting existing code, which results in reduced model quality. To support data scientists with data wrangling, we present a technique to generate documentation for data wrangling code. We use (1) program synthesis techniques to automatically summarize data transformations and (2) test case selection techniques to purposefully select representative examples from the data based on execution information collected with tailored dynamic program analysis. We demonstrate that a JupyterLab extension with our technique can provide on-demand documentation for many cells in popular notebooks and find in a user study that users with our plugin are faster and more effective at finding realistic bugs in data wrangling code.

2021-11-14

2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE) (published)

doi.org

ZERO: Playing Mathematical Programming Games

Gabriele Dragotto

S. Sankaranarayanan

Margarida Carvalho

Andrea Lodi

2021-11-14

ArXiv (preprint)

arxiv.org

Hidden Hypergraphs, Error-Correcting Codes, and Critical Learning in Hopfield Networks

Christopher Hillar

Tenzin Chan

Rachel Taubman

David Rolnick

In 1943, McCulloch and Pitts introduced a discrete recurrent neural network as a model for computation in brains. The work inspired breakthroughs such as the first computer design and the theory of finite automata. We focus on learning in Hopfield networks, a special case with symmetric weights and fixed-point attractor dynamics. Specifically, we explore minimum energy flow (MEF) as a scalable convex objective for determining network parameters. We catalog various properties of MEF, such as biological plausibility, and then compare to classical approaches in the theory of learning. Trained Hopfield networks can perform unsupervised clustering and define novel error-correcting coding schemes. They also efficiently find hidden structures (cliques) in graph theory. We extend this known connection from graphs to hypergraphs and discover n-node networks with robust storage of $2^{\Omega(n^{1-\epsilon})}$ memories for any $\epsilon > 0$. In the case of graphs, we also determine a critical ratio of training samples at which networks generalize completely.

2021-11-10

Entropy (published)

doi.org

The Cut and Play Algorithm: Computing Nash Equilibria via Outer Approximations

Margarida Carvalho

Gabriele Dragotto

Andrea Lodi

Sriram Sankaranarayanan

We introduce the Cut-and-Play, an efficient algorithm for computing equilibria in simultaneous non-cooperative games where players solve nonconvex and possibly unbounded optimization problems. Our algorithm exploits an intrinsic relationship between the equilibria of the original nonconvex game and the ones of a convexified counterpart. In practice, Cut-and-Play formulates a series of convex approximations of the original game and refines them with techniques from integer programming, for instance, cutting planes and branching operations. We test our algorithm on two families of challenging nonconvex games involving discrete decisions and bilevel programs, and we empirically demonstrate that it efficiently computes equilibria and outperforms existing game-specific algorithms.

2021-11-09

ArXiv (preprint)

arxiv.org

S$^3$: Sign-Sparse-Shift Reparametrization for Effective Training of Low-bit Shift Networks

Xinlin Li

Bang Liu

Yaoliang Yu

Wulong Liu

Chunjing Xu

Vahid Partovi Nia

2021-11-08

NeurIPS.cc/2021/Conference (poster)

openreview.net

Active 3D Shape Reconstruction from Vision and Touch

Edward J. Smith

David Meger

Luis Pineda

Roberto Calandra

Jitendra Malik

Adriana Romero

Michal Drozdzal

Humans build 3D understandings of the world through active object exploration, using jointly their senses of vision and touch. However, in 3D shape reconstruction, most recent progress has relied on static datasets of limited sensory data such as RGB images, depth maps or haptic readings, leaving the active exploration of the shape largely unexplored. In active touch sensing for 3D reconstruction, the goal is to actively select the tactile readings that maximize the improvement in shape reconstruction accuracy. However, the development of deep learning-based active touch models is largely limited by the lack of frameworks for shape exploration. In this paper, we focus on this problem and introduce a system composed of: 1) a haptic simulator leveraging high spatial resolution vision-based tactile sensors for active touching of 3D objects; 2) a mesh-based 3D shape reconstruction model that relies on tactile or visuotactile signals; and 3) a set of data-driven solutions with either tactile or visuotactile priors to guide the shape exploration. Our framework enables the development of the first fully data-driven solutions to active touch on top of learned models for object understanding. Our experiments show the benefits of such solutions in the task of 3D shape understanding where our models consistently outperform natural baselines. We provide our framework as a tool to foster future research in this direction.

2021-11-08

NeurIPS.cc/2021/Conference (poster)

openreview.net

End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering

Devendra Singh Sachan

Siva Reddy

William Hamilton

Chris Dyer

Dani Yogatama

We present an end-to-end differentiable training method for retrieval-augmented open-domain question answering systems that combine information from multiple retrieved documents when generating answers. We model retrieval decisions as latent variables over sets of relevant documents. Since marginalizing over sets of retrieved documents is computationally hard, we approximate this using an expectation-maximization algorithm. We iteratively estimate the value of our latent variable (the set of relevant documents for a given question) and then use this estimate to update the retriever and reader parameters. We hypothesize that such end-to-end training allows training signals to flow to the reader and then to the retriever better than staged-wise training. This results in a retriever that is able to select more relevant documents for a question and a reader that is trained on more accurate documents to generate an answer. Experiments on three benchmark datasets demonstrate that our proposed method outperforms all existing approaches of comparable size by 2-3% absolute exact match points, achieving new state-of-the-art results. Our results also demonstrate the feasibility of learning to retrieve to improve answer generation without explicit supervision of retrieval decisions.

2021-11-08

NeurIPS.cc/2021/Conference (poster)

doi.org

openreview.net

Gradient Starvation: A Learning Proclivity in Neural Networks

Sékou-Oumar Kaba

We identify and formalize a fundamental gradient descent phenomenon resulting in a learning proclivity in over-parameterized neural networks. Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task, despite the presence of other predictive features that fail to be discovered. This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks. Using tools from Dynamical Systems theory, we identify simple properties of learning dynamics during gradient descent that lead to this imbalance, and prove that such a situation can be expected given certain statistical structure in training data. Based on our proposed formalism, we develop guarantees for a novel regularization method aimed at decoupling feature learning dynamics, improving accuracy and robustness in cases hindered by gradient starvation. We illustrate our findings with simple and real-world out-of-distribution (OOD) generalization experiments.

2021-11-08

NeurIPS.cc/2021/Conference (poster)

doi.org

openreview.net

Learning to Combine Per-Example Solutions for Neural Program Synthesis

Disha Shrivastava

Hugo Larochelle

Daniel Tarlow

The goal of program synthesis from examples is to find a computer program that is consistent with a given set of input-output examples. Most learning-based approaches try to find a program that satisfies all examples at once. Our work, by contrast, considers an approach that breaks the problem into two stages: (a) find programs that satisfy only one example, and (b) leverage these per-example solutions to yield a program that satisfies all examples. We introduce the Cross Aggregator neural network module based on a multi-head attention mechanism that learns to combine the cues present in these per-example solutions to synthesize a global solution. Evaluation across programs of different lengths and under two different experimental settings reveals that when given the same time budget, our technique significantly improves the success rate over PCCoder [Zohar et al. 2018] and other ablation baselines. The code, data and trained models for our work can be found at https://github.com/shrivastavadisha/N-PEPS.

2021-11-08

NeurIPS.cc/2021/Conference (poster)

doi.org

openreview.net
