Publications

G4SATBench: Benchmarking and Advancing SAT Solving with Graph Neural Networks

Zhaoyu Li

Jinpei Guo

Xujie Si

2023-12-31

Trans. Mach. Learn. Res. (publié)

doi.org

openreview.net

Game On, Hate Off: A Study of Toxicity in Online Multiplayer Environments.

Zachary Yang

Nicolas Grenon-Godbout

Reihaneh Rabbany

The advent of online spaces, particularly social media platforms and video games, has brought forth a significant challenge: the detection a… (voir plus)nd mitigation of toxic and harmful speech. This issue is not only pervasive but also detrimental to the overall user experience. In this study, we leverage small language models to reliably detect toxicity, achieving an average precision of 0.95. Analyzing eight months of chat data from two Ubisoft games, we uncover patterns and trends in toxic behavior. The insights derived from our research will contribute to the development of healthier online communities and inform preventive measures against toxicity.

2023-12-31

Games Res. Pract. (publié)

doi.org

Game Theoretical Formulation for Residential Community Microgrid via Mean Field Theory: Proof of Concept

Mohamad Aziz

Hanane Dagdougui

Issmail ElHallaoui

Incentive-based demand response aggregators are widely recognized as a powerful strategy to increase the flexibility of residential communit… (voir plus)y MG (RCM) while allowing consumers’ assets to participate in the operation of the power system in critical peak times. RCM implementing demand response approaches are of high interest as collectively, they have a high impact on shaping the demand curve during peak time while providing a wide range of economic and technical benefits to consumers and utilities. The penetration of distributed energy resources such as battery energy storage and photovoltaic systems introduces additional flexibility to manage the community loads and increase revenue. This letter proposes a game theoretical formulation for an incentive-based residential community microgrid, where an incentive-based pricing mechanism is developed to encourage peak demand reduction and share the incentive demand curve with the residential community through the aggregator. The aggregator’s objective is to maximize the welfare of the residential community by finding the optimal community equilibrium electricity price. Each household communicates with each other and with the distributed system operator (DSO) through the aggregator and aims to minimize the local electricity cost.

2023-12-31

IEEE Control Systems Letters (publié)

doi.org

Generative Active Learning for the Search of Small-Molecule Protein Binders

Maksym Korablyov

Cheng-Hao Liu

Moksh Jain

Almer Van Der Sloot

Éric Jolicoeur

Edward Ruediger

Andrei Nica

Emmanuel Bengio

Kostiantyn Lapchevskyi

Daniel St-Cyr

Doris Alexandra Schuetz

Victor Ion Butoi

Saikrishna Gottipati

Prateek Gupta

Ladislav Rampasek … (voir 14 de plus)

Sasikanth Avancha

Pierre-Luc Bacon

William Hamilton

Brooks Paige

Sanchit Misra

Stanislaw Jastrzebski

Bharat Kaul

Doina Precup

José Miguel Hernández-Lobato

Marwin Segler

Michael Bronstein

Anne Marinier

Mike Tyers

Yoshua Bengio

Despite substantial progress in machine learning for scientific discovery in recent years, truly de novo design of small molecules which exh… (voir plus)ibit a property of interest remains a significant challenge. We introduce LambdaZero, a generative active learning approach to search for synthesizable molecules. Powered by deep reinforcement learning, LambdaZero learns to search over the vast space of molecules to discover candidates with a desired property. We apply LambdaZero with molecular docking to design novel small molecules that inhibit the enzyme soluble Epoxide Hydrolase 2 (sEH), while enforcing constraints on synthesizability and drug-likeliness. LambdaZero provides an exponential speedup in terms of the number of calls to the expensive molecular docking oracle, and LambdaZero de novo designed molecules reach docking scores that would otherwise require the virtual screening of a hundred billion molecules. Importantly, LambdaZero discovers novel scaffolds of synthesizable, drug-like inhibitors for sEH. In in vitro experimental validation, a series of ligands from a generated quinazoline-based scaffold were synthesized, and the lead inhibitor N-(4,6-di(pyrrolidin-1-yl)quinazolin-2-yl)-N-methylbenzamide (UM0152893) displayed sub-micromolar enzyme inhibition of sEH.

2023-12-31

arXiv (prépublication)

doi.org

arxiv.org

Generative Adversarial Neural Networks for Realistic Stock Market Simulations

Badre Labiad

Abdelaziz Berrado

Loubna Benabbou

—Stock market simulations are widely used to create synthetic environments for testing trading strategies before deploying them to real-ti… (voir plus)me markets. However, the weak realism often found in these simulations presents a significant challenge. Improving the quality of stock market simulations could be facilitated by the availability of rich and granular real Limit Order Books (LOB) data. Unfortunately, access to LOB data is typically very limited. To address this issue, a framework based on Generative Adversarial Networks (GAN) is proposed to generate synthetic realistic LOB data. This generated data can then be utilized for simulating downstream decision-making tasks, such as testing trading strategies, conducting stress tests, and performing prediction tasks. To effectively tackle challenges related to the temporal and local dependencies inherent in LOB structures and to generate highly realistic data, the framework relies on a specific data representation and preprocessing scheme, transformers, and conditional Wasserstein GAN with gradient penalty. The framework is trained using the FI-2010 benchmark dataset and an ablation study is conducted to demonstrate the importance of each component of the proposed framework. Moreover, qualitative and quantitative metrics are proposed to assess the quality of the generated data. Experimental results indicate that the framework outperforms existing benchmarks in simulating realistic market conditions, thus demonstrating its effectiveness in generating synthetic LOB data for diverse downstream tasks.

2023-12-31

International Journal of Advanced Computer Science and Applications (publié)

doi.org

GenRL: Multimodal-foundation world models for generalization in embodied agents

Pietro Mazzaglia

Tim Verbelen

Bart Dhoedt

Aaron Courville

Sai Rajeswar

Learning generalist embodied agents, able to solve multitudes of tasks in different domains is a long-standing problem. Reinforcement learni… (voir plus)ng (RL) is hard to scale up as it requires a complex reward design for each task. In contrast, language can specify tasks in a more natural way. Current foundation vision-language models (VLMs) generally require fine-tuning or other adaptations to be adopted in embodied contexts, due to the significant domain gap. However, the lack of multimodal data in such domains represents an obstacle to developing foundation models for embodied applications. In this work, we overcome these problems by presenting multimodal-foundation world models, able to connect and align the representation of foundation VLMs with the latent space of generative world models for RL, without any language annotations. The resulting agent learning framework, GenRL, allows one to specify tasks through vision and/or language prompts, ground them in the embodied domain's dynamics, and learn the corresponding behaviors in imagination. As assessed through large-scale multi-task benchmarking in locomotion and manipulation domains, GenRL enables multi-task generalization from language and visual prompts. Furthermore, by introducing a data-free policy learning strategy, our approach lays the groundwork for foundational policy learning using generative world models. Website, code and data: https://mazpie.github.io/genrl/

2023-12-31

NeurIPS (publié)

doi.org

openreview.net

GFETM: Genome Foundation-based Embedded Topic Model for scATAC-seq Modeling

Yimin Fan

Shi Han

Single-cell Assay for Transposase-Accessible Chromatin with sequencing (scATAC-seq) has emerged as a powerful technique for investigating op… (voir plus)en chromatin landscapes at single-cell resolution. However, analyzing scATAC-seq data remain challenging due to its sparsity and noise. Genome Foundation Models (GFMs), pre-trained on massive DNA sequences, have proven effective at genome analysis. Given that open chromatin regions (OCRs) harbour salient sequence features, we hypothesize that leveraging GFMs’ sequence embeddings can improve the accuracy and generalizability of scATAC-seq modeling. Here, we introduce the Genome Foundation Embedded Topic Model (GFETM), an interpretable deep learning framework that combines GFMs with the Embedded Topic Model (ETM) for scATAC-seq data analysis. By integrating the DNA sequence embeddings extracted by a GFM from OCRs, GFETM demonstrates superior accuracy and generalizability and captures cell-state specific TF activity both with zero-shot inference and attention mechanism analysis. Finally, the topic mixtures inferred by GFETM reveal biologically meaningful epigenomic signatures of kidney diabetes.

2023-12-31

Lecture Notes in Computer Science (publié)

doi.org

GIST: Generated Inputs Sets Transferability in Deep Learning

Florian Tambon

Foutse Khomh

Giuliano Antoniol

2023-12-31

ACM Trans. Softw. Eng. Methodol. (publié)

doi.org

arxiv.org

GrowSpace: A reinforcement learning environment for plant architecture

Mark Lefsrud

2023-12-31

Computers and Electronics in Agriculture (publié)

doi.org

Harnessing small projectors and multiple views for efficient vision pretraining

Arna Ghosh

Kumar Krishna Agrawal

Shagun Sodhani

Adam M. Oberman

Blake A. Richards

Recent progress in self-supervised (SSL) visual representation learning has led to the development of several different proposed frameworks … (voir plus)that rely on augmentations of images but use different loss functions. However, there are few theoretically grounded principles to guide practice, so practical implementation of each SSL framework requires several heuristics to achieve competitive performance. In this work, we build on recent analytical results to design practical recommendations for competitive and efficient SSL that are grounded in theory. Specifically, recent theory tells us that existing SSL frameworks are minimizing the same idealized loss, which is to learn features that best match the data similarity kernel defined by the augmentations used. We show how this idealized loss can be reformulated to a functionally equivalent loss that is more efficient to compute. We study the implicit bias of using gradient descent to minimize our reformulated loss function and find that using a stronger orthogonalization constraint with a reduced projector dimensionality should yield good representations. Furthermore, the theory tells us that approximating the reformulated loss should be improved by increasing the number of augmentations, and as such using multiple augmentations should lead to improved convergence. We empirically verify our findings on CIFAR, STL and Imagenet datasets, wherein we demonstrate an improved linear readout performance when training a ResNet-backbone using our theoretically grounded recommendations. Remarkably, we also demonstrate that by leveraging these insights, we can reduce the pretraining dataset size by up to 2

2023-12-31

NeurIPS (publié)

doi.org

openreview.net

High-Probability Convergence for Composite and Distributed Stochastic Minimization and Variational Inequalities with Heavy-Tailed Noise.

Eduard Gorbunov

Abdurakhmon Sadiev

Marina Danilova

Samuel Horváth

Gauthier Gidel

Pavel Dvurechensky

Alexander Gasnikov

Peter Richtárik

2023-12-31

International Conference on Machine Learning (publié)

doi.org

proceedings.mlr.press

Hint Marginalization for Improved Reasoning in Large Language Models

Soumyasundar Pal

Didier Chételat

Yingxue Zhang

Mark J. Coates

Large Language Models (LLMs) have exhibited an impressive capability to perform reasoning tasks, especially if they are encouraged to genera… (voir plus)te a sequence of intermediate steps. Reasoning performance can be improved by suitably combining multiple LLM responses, generated either in parallel in a single query, or via sequential interactions with LLMs throughout the reasoning process. Existing strategies for combination, such as self-consistency and progressive-hint-prompting, make inefficient usage of the LLM responses. We present Hint Marginalization, a novel and principled algorithmic framework to enhance the reasoning capabilities of LLMs. Our approach can be viewed as an iterative sampling strategy for forming a Monte Carlo approximation of an underlying distribution of answers, with the goal of identifying the mode the most likely answer. Empirical evaluation on several benchmark datasets for arithmetic reasoning demonstrates the superiority of the proposed approach.

2023-12-31

arXiv.org (prépublication)

doi.org

openreview.net

TRAIL : IA responsable pour les professionnels et les leaders

Fondateur en résidence Mila Ventures

Avantage IA : productivité dans la fonction publique

Publications

TRAIL : IA responsable pour les professionnels et les leaders

Fondateur en résidence Mila Ventures

Avantage IA : productivité dans la fonction publique

Mots-clés populaires:

Publications