Publications

Dissecting Deep RL with High Update Ratios: Combatting Value Divergence.

Marcel Hussing

Claas Voelcker

Igor Gilitschenski

Amir-massoud Farahmand

Eric R. Eaton

2024-01-01

RLJ (published)

arxiv.org

Dynamic Neural Control Flow Execution: An Agent-Based Deep Equilibrium Approach for Binary Vulnerability Detection

Litao Li

Steven H. H. Ding

Andrew Walenstein

Philippe Charland

Benjamin Fung

2024-01-01

CIKM (published)

doi.org

arxiv.org

E(3)-Equivariant Mesh Neural Networks

Thuan Nguyen Anh Trang

Khang Nhat Ngo

Daniel Levy

Thieu Vo

Siamak Ravanbakhsh

Truong Son Hy

Triangular meshes are widely used to represent three-dimensional objects. As a result, many recent works have addressed the need for geometric deep learning on 3D meshes. However, we observe that the complexities in many of these architectures do not translate to practical performance, and simple deep models for geometric graphs are competitive in practice. Motivated by this observation, we minimally extend the update equations of E(n)-Equivariant Graph Neural Networks (EGNNs) (Satorras et al., 2021) to incorporate mesh face information and further improve it to account for long-range interactions through a hierarchy. The resulting architecture, Equivariant Mesh Neural Network (EMNN), outperforms other, more complicated equivariant methods on mesh tasks, with a fast run-time and no expensive preprocessing. Our implementation is available at https://github.com/HySonLab/EquiMesh.

2024-01-01

AISTATS (published)

doi.org

arxiv.org

ECBD: Evidence-Centered Benchmark Design for NLP

Yu Lu Liu

Su Lin Blodgett

Jackie Chi Kit Cheung

Q. Vera Liao

Alexandra Olteanu

Ziang Xiao

Benchmarking is seen as critical to assessing progress in NLP. However, creating a benchmark involves many design decisions (e.g., which datasets to include, which metrics to use) that often rely on tacit, untested assumptions about what the benchmark is intended to measure or is actually measuring. There is currently no principled way of analyzing these decisions and how they impact the validity of the benchmark's measurements. To address this gap, we draw on evidence-centered design in educational assessments and propose Evidence-Centered Benchmark Design (ECBD), a framework which formalizes the benchmark design process into five modules. ECBD specifies the role each module plays in helping practitioners collect evidence about capabilities of interest. Specifically, each module requires benchmark designers to describe, justify, and support benchmark design choices -- e.g., clearly specifying the capabilities the benchmark aims to measure or how evidence about those capabilities is collected from model responses. To demonstrate the use of ECBD, we conduct case studies with three benchmarks: BoolQ, SuperGLUE, and HELM. Our analysis reveals common trends in benchmark design and documentation that could threaten the validity of benchmarks' measurements.

2024-01-01

ACL (1) (published)

doi.org

arxiv.org

Empirical Analysis of Model Selection for Heterogeneous Causal Effect Estimation

Divyat Mahajan

Ioannis Mitliagkas

Brady Neal

Vasilis Syrgkanis

We study the problem of model selection in causal inference, specifically for the case of conditional average treatment effect (CATE) estimation under binary treatments. Unlike model selection in machine learning, there is no perfect analogue of cross-validation as we do not observe the counterfactual potential outcome for any data point. Towards this, there have been a variety of proxy metrics proposed in the literature, that depend on auxiliary nuisance models estimated from the observed data (propensity score model, outcome regression model). However, the effectiveness of these metrics has only been studied on synthetic datasets as we can access the counterfactual data for them. We conduct an extensive empirical analysis to judge the performance of these metrics introduced in the literature, and novel ones introduced in this work, where we utilize the latest advances in generative modeling to incorporate multiple realistic datasets. Our analysis suggests novel model selection strategies based on careful hyperparameter tuning of CATE estimators and causal ensembling.

2024-01-01

ICLR (published)

doi.org

arxiv.org

Enhancing Click-through Rate Prediction in Recommendation Domain with Search Query Representation

Yuening Wang

Man Chen

Yaochen Hu

Wei Guo

Yingxue Zhang

Huifeng Guo

Yong Liu

Mark Coates

2024-01-01

CIKM (published)

doi.org

arxiv.org

Enhancing Security and Energy Efficiency of Cyber-Physical Systems using Deep Reinforcement Learning

Saeid Jamshidi

Ashkan Amirnia

Amin Nikanjam

Foutse Khomh

2024-01-01

Procedia Computer Science (published)

doi.org

Enhancing Supervised Visualization through Autoencoder and Random Forest Proximities for Out-of-Sample Extension

Shuang Ni

Adrien Aumon

Guy Wolf

Kevin R. Moon

Jake S. Rhodes

The value of supervised dimensionality reduction lies in its ability to uncover meaningful connections between data features and labels. Common dimensionality reduction methods embed a set of fixed, latent points, but are not capable of generalizing to an unseen test set. In this paper, we provide an out-of-sample extension method for the random forest-based supervised dimensionality reduction method, RF-PHATE, combining information learned from the random forest model with the function-learning capabilities of autoencoders. Through quantitative assessment of various autoencoder architectures, we identify that networks that reconstruct random forest proximities are more robust for the embedding extension problem. Furthermore, by leveraging proximity-based prototypes, we achieve a 40% reduction in training time without compromising extension quality. Our method does not require label information for out-of-sample points, thus serving as a semi-supervised method, and can achieve consistent quality using only 10% of the training data.

2024-01-01

MLSP (published)

doi.org

arxiv.org

Evaluating In-Context Learning of Libraries for Code Generation

Arkil Patel

Siva Reddy

Dzmitry Bahdanau

Pradeep Dasigi

Contemporary Large Language Models (LLMs) exhibit a high degree of code generation and comprehension capability. A particularly promising area is their ability to interpret code modules from unfamiliar libraries for solving user-instructed tasks. Recent work has shown that large proprietary LLMs can learn novel library usage in-context from demonstrations. These results raise several open questions: whether demonstrations of library usage are required, whether smaller (and more open) models also possess such capabilities, etc. In this work, we take a broader approach by systematically evaluating a diverse array of LLMs across three scenarios reflecting varying levels of domain specialization to understand their abilities and limitations in generating code based on libraries defined in-context. Our results show that even smaller open-source LLMs like Llama-2 and StarCoder demonstrate an adept understanding of novel code libraries based on specification presented in-context. Our findings further reveal that LLMs exhibit a surprisingly high proficiency in learning novel library modules even when provided with just natural language descriptions or raw code implementations of the functions, which are often cheaper to obtain than demonstrations. Overall, our results pave the way for harnessing LLMs in more adaptable and dynamic coding environments.

2024-01-01

North American Chapter of the Association for Computational Linguistics (published)

doi.org

Evaluating Supervision Levels Trade-Offs for Infrared-Based People Counting

David Latortue

Moetez Kdayem

Fidel A. Guerrero Peña

Eric Granger

Marco Pedersoli

Object detection models are commonly used for people counting (and localization) in many applications but require a dataset with costly bounding box annotations for training. Given the importance of privacy in people counting, these models rely more and more on infrared images, making the task even harder. In this paper, we explore how weaker levels of supervision affect the performance of deep person counting architectures for image classification and point-level localization. Our experiments indicate that counting people using a convolutional neural network with image-level annotation achieves a level of accuracy that is competitive with YOLO detectors and point-level localization models yet provides a higher frame rate and a similar amount of model parameters. Our code is available at: https://github.com/tortueTortue/IRPeopleCounting.

2024-01-01

2024 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW) (published)

doi.org

arxiv.org

Evaluating WMT 2024 Metrics Shared Task Submissions on AfriMTE (the African Challenge Set)

Jiayi Wang

David Ifeoluwa Adelani

Pontus Stenetorp

2024-01-01

Conference on Machine Translation (published)

doi.org
