Publications

Near-Optimal Glimpse Sequences for Improved Hard Attention Neural Network Training

William Harvey

Michael Teng

Hard visual attention is a promising approach to reduce the computational burden of modern computer vision methodologies. However, hard attention mechanisms can be difficult and slow to train, which is especially costly for applications like neural architecture search where multiple networks must be trained. We introduce a method to amortise the cost of training by generating an extra supervision signal for a subset of the training data. This supervision is in the form of sequences of ‘good’ locations to attend to for each image. We find that the best method to generate supervision sequences comes from framing hard attention for image classification as a Bayesian optimal experimental design (BOED) problem. From this perspective, the optimal locations to attend to are those which provide the greatest expected reduction in the entropy of the classification distribution. We introduce methodology from the BOED literature to approximate this optimal behaviour and generate ‘near-optimal’ supervision sequences. We then present a hard attention network training objective that makes use of these sequences and show that it allows faster training than prior work. We finally demonstrate the utility of faster hard attention training by incorporating supervision sequences in a neural architecture search, resulting in hard attention architectures which can outperform networks with access to the entire image.

2022-07-18

2022 International Joint Conference on Neural Networks (IJCNN) (published)

doi.org

arxiv.org

On the Effectiveness of Interpretable Feedforward Neural Network

Miles Q. Li

Benjamin Fung

Adel Abusitta

Deep learning models have achieved state-of-the-art performance in many classification tasks. However, most of them cannot provide an explanation for their classification results. Machine learning models that are interpretable are usually linear or piecewise linear and yield inferior performance. Non-linear models achieve much better classification performance, but it is usually hard to explain their classification results. As a counter-example, an interpretable feedforward neural network (IFFNN) is proposed to achieve both high classification performance and interpretability for malware detection. If the IFFNN can perform well in a more flexible and general form for other classification tasks while providing meaningful explanations, it may be of great interest to the applied machine learning community. In this paper, we propose a way to generalize the interpretable feedforward neural network to multi-class classification scenarios and to any type of feedforward neural network, and evaluate its classification performance and interpretability on interpretable datasets. We find that the generalized IFFNNs achieve classification performance comparable to their normal feedforward neural network counterparts and provide meaningful explanations. Thus, this kind of neural network architecture has great practical use.

2022-07-18

2022 International Joint Conference on Neural Networks (IJCNN) (published)

doi.org

arxiv.org

Generative Models of Brain Dynamics

Mahta Ramezanian-Panahi

Germán Abrevaya

Jean-Christophe Gagnon-Audet

Vikram Voleti

Irina Rish

Guillaume Dumas

2022-07-15

Frontiers in Artificial Intelligence (published)

doi.org

Predicting Adverse Radiation Effects in Brain Tumors After Stereotactic Radiotherapy With Deep Learning and Handcrafted Radiomics

Simon A. Keek

Manon Beuque

Sergey Primakov

Henry C. Woodruff

Avishek Chatterjee

Janita E. van Timmeren

Martin Vallières

Lizza E. L. Hendriks

Johannes Kraft

Nicolaus Andratschke

Steve E. Braunstein

Olivier Morin

Philippe Lambin

Introduction: There is a cumulative risk of 20–40% of developing brain metastases (BM) in solid cancers. Stereotactic radiotherapy (SRT) enables the application of high focal doses of radiation to a volume and is often used for BM treatment. However, SRT can cause adverse radiation effects (ARE), such as radiation necrosis, which sometimes cause irreversible damage to the brain. It is therefore of clinical interest to identify patients at a high risk of developing ARE. We hypothesized that models trained with radiomics features, deep learning (DL) features, and patient characteristics or their combination can predict ARE risk in patients with BM before SRT.
Methods: Gadolinium-enhanced T1-weighted MRIs and characteristics from patients treated with SRT for BM were collected for a training and testing cohort (N = 1,404) and a validation cohort (N = 237) from a separate institute. From each lesion in the training set, radiomics features were extracted and used to train an extreme gradient boosting (XGBoost) model. A DL model was trained on the same cohort to make a separate prediction and to extract the last layer of features. Different models using XGBoost were built using only radiomics features, DL features, and patient characteristics or a combination of them. Evaluation was performed using the area under the curve (AUC) of the receiver operating characteristic curve on the external dataset. Predictions for individual lesions and per patient developing ARE were investigated.
Results: The best-performing XGBoost model on a lesion level was trained on a combination of radiomics features and DL features (AUC of 0.71 and recall of 0.80). On a patient level, a combination of radiomics features, DL features, and patient characteristics obtained the best performance (AUC of 0.72 and recall of 0.84). The DL model achieved an AUC of 0.64 and recall of 0.85 per lesion and an AUC of 0.70 and recall of 0.60 per patient.
Conclusion: Machine learning models built on radiomics features and DL features extracted from BM combined with patient characteristics show potential to predict ARE at the patient and lesion levels. These models could be used in clinical decision making, informing patients on their risk of ARE and allowing physicians to opt for different therapies.

2022-07-13

Frontiers in Oncology (published)

doi.org

Revisiting Transfer Functions: Learning About a Lagged Exposure-Outcome Association in Time-Series Data

Hiroshi Mamiya

Alexandra M. Schmidt

Erica E. M. Moodie

David Buckeridge

2022-07-11

International Journal of Public Health (published)

doi.org

Challenging Common Assumptions about Catastrophic Forgetting

Timothee Lesort

Oleksiy Ostapenko

Pau Rodriguez

Md Rifat Arefin

Diganta Misra

Laurent Charlin

Irina Rish

Building learning agents that can progressively learn and accumulate knowledge is the core goal of the continual learning (CL) research field. Unfortunately, training a model on new data usually compromises the performance on past data. In the CL literature, this effect is referred to as catastrophic forgetting (CF). CF has been extensively studied, and a plethora of methods have been proposed to address it on short sequences of non-overlapping tasks. In such setups, CF always leads to a quick and significant drop in performance on past tasks. Nevertheless, despite CF, recent work showed that SGD training on linear models accumulates knowledge in a CL regression setup. This phenomenon becomes especially visible when tasks reoccur. We might then wonder if DNNs trained with SGD or any standard gradient-based optimization accumulate knowledge in such a way. Such phenomena would have interesting consequences for applying DNNs to real continual scenarios. Indeed, standard gradient-based optimization methods are significantly less computationally expensive than existing CL algorithms. In this paper, we study the progressive knowledge accumulation (KA) in DNNs trained with gradient-based algorithms in long sequences of tasks with data re-occurrence. We propose a new framework, SCoLe (Scaling Continual Learning), to investigate KA and discover that catastrophic forgetting has a limited effect on DNNs trained with SGD. When trained on long sequences with data sparsely re-occurring, the overall accuracy improves, which might be counter-intuitive given the CF phenomenon. We empirically investigate KA in DNNs under various data occurrence frequencies and propose simple and scalable strategies to increase knowledge accumulation in DNNs.

2022-07-10

ArXiv (preprint)

openreview.net

Leveraging Structure Between Environments: Phylogenetic Regularization Incentivizes Disentangled Representations

Elliot Layne

Jason Hartford

Sébastien Lachapelle

Mathieu Blanchette

Dhanya Sridhar

Recently, learning invariant predictors across varying environments has been shown to improve the generalization of supervised learning methods. This line of investigation holds great potential for application to biological problem settings, where data is often naturally heterogeneous. Biological samples often originate from different distributions, or environments. However, in biological contexts, the standard "invariant prediction" setting may not completely fit: the optimal predictor may in fact vary across biological environments. There also exists strong domain knowledge about the relationships between environments, such as the evolutionary history of a set of species, or the differentiation process of cell types. Most work on generic invariant predictors has not assumed the existence of structured relationships between environments. However, this prior knowledge about environments themselves has already been shown to improve prediction through a particular form of regularization applied when learning a set of predictors. In this work, we empirically evaluate whether a regularization strategy that exploits environment-based prior information can be used to learn representations that better disentangle causal factors that generate observed data. We find evidence that these methods do in fact improve the disentanglement of latent embeddings. We also show a setting where these methods can leverage phylogenetic information to estimate the number of latent causal features.

2022-07-09

auai.org/UAI/2022/Workshop/CRL (poster)

doi.org

openreview.net

FIXME: synchronize with database! An empirical study of data access self-admitted technical debt

Biruk Asmare Muse

Csaba Nagy

Anthony Cleve

Foutse Khomh

Giuliano Antoniol

2022-07-08

Empirical Software Engineering (published)

doi.org

arxiv.org

Joint Multisided Exposure Fairness for Recommendation

Haolun Wu

Bhaskar Mitra

Chen Ma

Fernando Diaz

Xue (Steve) Liu

Prior research on exposure fairness in the context of recommender systems has focused mostly on disparities in the exposure of individual or groups of items to individual users of the system. The problem of how individual or groups of items may be systemically under- or over-exposed to groups of users, or even all users, has received relatively less attention. However, such systemic disparities in information exposure can result in observable social harms, such as withholding economic opportunities from historically marginalized groups (allocative harm) or amplifying gendered and racialized stereotypes (representational harm). Previously, Diaz et al. developed the expected exposure metric, which incorporates existing user browsing models developed for information retrieval, to study fairness of content exposure to individual users. We extend their proposed framework to formalize a family of exposure fairness metrics that model the problem jointly from the perspective of both the consumers and producers. Specifically, we consider group attributes for both types of stakeholders to identify and mitigate fairness concerns that go beyond individual users and items towards more systemic biases in recommendation. Furthermore, we study and discuss the relationships between the different exposure fairness dimensions proposed in this paper, as well as demonstrate how stochastic ranking policies can be optimized towards said fairness goals.

2022-07-07

Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (published)

doi.org

arxiv.org

On Natural Language User Profiles for Transparent and Scrutable Recommendation

Filip Radlinski

Krisztian Balog

Fernando Diaz

Lucas Dixon

Ben Wedin

Natural interaction with recommendation and personalized search systems has received tremendous attention in recent years. We focus on the challenge of supporting people's understanding and control of these systems and explore a fundamentally new way of thinking about representation of knowledge in recommendation and personalization systems. Specifically, we argue that it may be both desirable and possible for algorithms that use natural language representations of users' preferences to be developed. We make the case that this could provide significantly greater transparency, as well as affordances for practical actionable interrogation of, and control over, recommendations. Moreover, we argue that such an approach, if successfully applied, may enable a major step towards systems that rely less on noisy implicit observations while increasing portability of knowledge of one's interests.

2022-07-07

Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (published)

doi.org

arxiv.org

Offline Retrieval Evaluation Without Evaluation Metrics

Fernando Diaz

Andres Ferraro

Offline evaluation of information retrieval and recommendation has traditionally focused on distilling the quality of a ranking into a scalar metric such as average precision or normalized discounted cumulative gain. We can use this metric to compare the performance of multiple systems for the same request. Although evaluation metrics provide a convenient summary of system performance, they also collapse subtle differences across users into a single number and can carry assumptions about user behavior and utility not supported across retrieval scenarios. We propose recall-paired preference (RPP), a metric-free evaluation method based on directly computing a preference between ranked lists. RPP simulates multiple user subpopulations per query and compares systems across these pseudo-populations. Our results across multiple search and recommendation tasks demonstrate that RPP substantially improves discriminative power while correlating well with existing metrics and being equally robust to incomplete data.

2022-07-07

Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (published)

doi.org

arxiv.org

Retrieval-Enhanced Machine Learning

Hamed Zamani

Fernando Diaz

Mostafa Dehghani

Donald Metzler

Michael Bendersky

Although information access systems have long supported people in accomplishing a wide range of tasks, we propose broadening the scope of users of information access systems to include task-driven machines, such as machine learning models. In this way, the core principles of indexing, representation, retrieval, and ranking can be applied and extended to substantially improve model generalization, scalability, robustness, and interpretability. We describe a generic retrieval-enhanced machine learning (REML) framework, which includes a number of existing models as special cases. REML challenges information retrieval conventions, presenting opportunities for novel advances in core areas, including optimization. The REML research agenda lays a foundation for a new style of information access research and paves a path towards advancing machine learning and artificial intelligence.

2022-07-07

Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (published)

doi.org

arxiv.org
