Publications

Metro: Memory-Enhanced Transformer for Retrosynthetic Planning via Reaction Tree

Songtao Liu

Rex Ying

Zhitao Ying

Zuobai Zhang

Peilin Zhao

Jian Tang

Lu Lin

Dinghao Wu

Retrosynthetic planning plays a critical role in drug discovery and organic chemistry. Starting from a target molecule as the root node, it aims to find a complete reaction tree subject to the constraint that all leaf nodes belong to a set of starting materials. The multi-step reactions are crucial because they determine the flow chart in the production of the Organic Chemical Industry. However, existing datasets lack curation of tree-structured multi-step reactions, and fail to provide such reaction trees, limiting models’ understanding of organic molecule transformations. In this work, we first develop a benchmark curated for the retrosynthetic planning task, which consists of 124,869 reaction trees retrieved from the public USPTO-full dataset. On top of that, we propose Metro: Memory-Enhanced Transformer for RetrOsynthetic planning. Specifically, the dependency among molecules in the reaction tree is captured as context information for multi-step retrosynthesis predictions through transformers with a memory module. Extensive experiments show that Metro dramatically outperforms existing single-step retrosynthesis models by at least 10.7% in top-1 accuracy. The experiments demonstrate the superiority of exploiting context information in the retrosynthetic planning task. Moreover, the proposed model can be directly used for synthetic accessibility analysis, as it is trained on reaction trees with the shortest depths. Our work is the first step towards a brand new formulation for retrosynthetic planning in the aspects of data construction, model design, and evaluation. Code is available at https://github.com/SongtaoLiu0823/metro.

2022-01-01

arXiv.org (preprint)

doi.org

openreview.net

Multi-Head Adapter Routing for Data-Efficient Fine-Tuning

Lucas Caccia

Edoardo Ponti

Lu Liu

Matheus Pereira

Nicolas Le Roux

Alessandro Sordoni

Parameter-efficient fine-tuning (PEFT) methods can adapt large language models to downstream tasks by training a small amount of newly added parameters. In multi-task settings, PEFT adapters typically train on each task independently, inhibiting transfer across tasks, or on the concatenation of all tasks, which can lead to negative interference. To address this, Polytropon [Ponti et al., 2022] jointly learns an inventory of PEFT adapters and a routing function to share variable-size sets of adapters across tasks. Subsequently, adapters can be re-combined and fine-tuned on novel tasks even with limited data. In this paper, we investigate to what extent the ability to control which adapters are active for each task leads to sample-efficient generalization. Thus, we propose less expressive variants where we perform weighted averaging of the adapters before few-shot adaptation (Poly-µ) instead of learning a routing function. Moreover, we introduce more expressive variants where finer-grained task–adapter allocation is learned through a multi-head routing function (Poly-S). We test these variants on three separate benchmarks for multi-task learning. We find that Poly-S achieves gains on all three (up to 5.3 points on average) over strong baselines, while incurring a negligible additional cost in parameter count. In particular, we find that instruction tuning, where models are fully fine-tuned on natural language instructions for each task, is inferior to modular methods such as Polytropon and our proposed variants.

2022-01-01

arXiv.org (preprint)

doi.org

Multilingual Language Model Adaptive Fine-Tuning: A Study on African Languages

Jesujoba Oluwadara Alabi

David Ifeoluwa Adelani

Marius Mosbach

Dietrich Klakow

and XLM-R) and three NLP tasks (NER, news topic classification, and sentiment classification) shows that our approach is competitive to applying LAFT on individual languages while requiring significantly less disk space. Finally, we show that our adapted PLM also improves the zero-shot cross-lingual transfer abilities of parameter efficient fine-tuning methods.

2022-01-01

arXiv.org (preprint)

doi.org

Optimizing deep learning for Magnetoencephalography (MEG): From sensory perception to sex prediction and brain fingerprinting

Arthur Dehgan

Irina Rish

Karim Jerbi

2022-01-01

2022 Conference on Cognitive Computational Neuroscience (published)

doi.org

Peer-to-Peer Energy Trading and Energy Conversion in Interconnected Multi-Energy Microgrids Using Multi-Agent Deep Reinforcement Learning

Tianyi Chen

Shengrong Bu

Xue (Steve) Liu

Jikun Kang

F. Richard Yu

Zhu Han

A key aspect of multi-energy microgrids (MEMGs) is the capability to efficiently convert and store energy in order to reduce the costs and environmental impact. Peer-to-peer (P2P) energy trading is a novel paradigm for decentralised energy market designs. In this paper, we investigate the external P2P energy trading problem and internal energy conversion problem within interconnected residential, commercial and industrial MEMGs. These two problems are complex decision-making problems with enormous high-dimensional data and uncertainty, so a multi-agent deep reinforcement learning approach combining the multi-agent actor-critic algorithm with the twin delayed deep deterministic policy gradient algorithm is proposed. The proposed approach can handle the high-dimensional continuous action space and aligns with the nature of P2P energy trading with multiple MEMGs. Simulation results based on three real-world MG datasets show that the proposed approach significantly reduces each MG’s average hourly operation cost. The impact of carbon tax pricing is also considered.

2022-01-01

IEEE Transactions on Smart Grid (published)

doi.org

Predicting Tactical Solutions to Operational Planning Problems under Imperfect Information

Eric Larsen

Sébastien Lachapelle

This paper offers a methodological contribution at the intersection of machine learning and operations research. Namely, we propose a methodology to quickly predict expected tactical descriptions of operational solutions (TDOSs). The problem we address occurs in the context of two-stage stochastic programming, where the second stage is demanding computationally. We aim to predict at a high speed the expected TDOS associated with the second-stage problem, conditionally on the first-stage variables. This may be used in support of the solution to the overall two-stage problem by avoiding the online generation of multiple second-stage scenarios and solutions. We formulate the tactical prediction problem as a stochastic optimal prediction program, whose solution we approximate with supervised machine learning. The training data set consists of a large number of deterministic operational problems generated by controlled probabilistic sampling. The labels are computed based on solutions to these problems (solved independently and offline), employing appropriate aggregation and subselection methods to address uncertainty. Results on our motivating application on load planning for rail transportation show that deep learning models produce accurate predictions in very short computing time (milliseconds or less). The predictive accuracy is close to the lower bounds calculated based on sample average approximation of the stochastic prediction programs.

2022-01-01

INFORMS Journal on Computing (published)

doi.org

arxiv.org

Probabilistic surrogate networks for simulators with unbounded randomness

Andreas Munk

Berend Zwartsenberg

Adam Ścibior

Atilim Güneş Baydin

Andrew Lawrence Stewart

Goran Fernlund

Anoush Poursartip

Frank Wood

We present a framework for automatically structuring and training fast, approximate, deep neural surrogates of stochastic simulators. Unlike traditional approaches to surrogate modeling, our surrogates retain the interpretable structure and control flow of the reference simulator. Our surrogates target stochastic simulators where the number of random variables itself can be stochastic and potentially unbounded. Our framework further enables an automatic replacement of the reference simulator with the surrogate when undertaking amortized inference. The fidelity and speed of our surrogates allow for both faster stochastic simulation and accurate and substantially faster posterior inference. Using an illustrative yet non-trivial example we show our surrogates' ability to accurately model a probabilistic program with an unbounded number of random variables. We then proceed with an example that shows our surrogates are able to accurately model a complex structure like an unbounded stack in a program synthesis example. We further demonstrate how our surrogate modeling technique makes amortized inference in complex black-box simulators an order of magnitude faster. Specifically, we do simulator-based materials quality testing, inferring safety-critical latent internal temperature profiles of composite materials undergoing curing.

2022-01-01

UAI (published)

proceedings.mlr.press

openreview.net

Question Personalization in an Intelligent Tutoring System

Sabina Elkins

Robert Belfer

Ekaterina Kochmar

Iulian V. Serban

Jackie Cheung

2022-01-01

AIED (2) (published)

doi.org

arxiv.org

Realistic Evaluation of Transductive Few-Shot Learning - Supplementary Material

Olivier Veilleux

Malik Boudiaf

Pablo Piantanida

Ismail Ben Ayed

In the main tables of the paper, we did not include the performances of α-TIM in the standard balanced setting. Here, we emphasize that α-TIM is a generalization of TIM [1] as when α → 1 (i.e., the α-entropies tend to the Shannon entropies), α-TIM tends to TIM. Therefore, in the standard setting, where optimal hyper-parameter α is obtained over validation tasks that are balanced (as in the standard validation tasks of the original TIM and the other existing methods), the performance of α-TIM is the same as TIM. When α is tuned on balanced validation tasks, we obtain an optimal value of α very close to 1, and our α-mutual information approaches the standard mutual information. When the validation tasks are uniformly random, as in our new setting and in the validation plots we provided in the main figure, one can see that the performance of α-TIM remains competitive when we tend to balanced testing tasks (i.e., when a is increasing), but is significantly better than TIM when we tend to uniformly-random testing tasks (a = 1). These results illustrate the flexibility of α-divergences, and are in line with the technical analysis provided in the main paper.

2022-01-01

(published)

www.semanticscholar.org

Recipe for a General, Powerful, Scalable Graph Transformer

Ladislav Rampášek

Mikhail Galkin

Vijay Prakash Dwivedi

Anh Tuan Luu

Guy Wolf

Dominique Beaini

We propose a recipe on how to build a general, powerful, scalable (GPS) graph Transformer with linear complexity and state-of-the-art results on a diverse set of benchmarks. Graph Transformers (GTs) have gained popularity in the field of graph representation learning with a variety of recent publications but they lack a common foundation about what constitutes a good positional or structural encoding, and what differentiates them. In this paper, we summarize the different types of encodings with a clearer definition and categorize them as being

openreview.net

Reincarnating Reinforcement Learning: Reusing Prior Computation to Accelerate Progress

Rishabh Agarwal

Max Schwarzer

Pablo Samuel Castro

Aaron Courville

Marc Gendron-Bellemare

openreview.net