Publications

In deep reinforcement learning, a pruned network is a good network
Johan Samir Obando Ceron
Recent work has shown that deep reinforcement learning agents have difficulty in effectively using their network parameters. We leverage prior insights into the advantages of sparse training techniques and demonstrate that gradual magnitude pruning enables agents to maximize parameter effectiveness. This results in networks that yield dramatic performance improvements over traditional networks and exhibit a type of "scaling law", using only a small fraction of the full network parameters.
Dynamic Neural Control Flow Execution: An Agent-Based Deep Equilibrium Approach for Binary Vulnerability Detection
Litao Li
Steven H. H. Ding
Andrew Walenstein
Philippe Charland
E(3)-Equivariant Mesh Neural Networks
Thuan Nguyen Anh Trang
Khang Nhat Ngo
Daniel Levy
Thieu Vo
Truong Son Hy
Triangular meshes are widely used to represent three-dimensional objects. As a result, many recent works have addressed the need for geometric deep learning on 3D meshes. However, we observe that the complexities in many of these architectures do not translate to practical performance, and simple deep models for geometric graphs are competitive in practice. Motivated by this observation, we minimally extend the update equations of E(n)-Equivariant Graph Neural Networks (EGNNs) (Satorras et al., 2021) to incorporate mesh face information and further improve it to account for long-range interactions through a hierarchy. The resulting architecture, Equivariant Mesh Neural Network (EMNN), outperforms other, more complicated equivariant methods on mesh tasks, with a fast run-time and no expensive preprocessing. Our implementation is available at https://github.com/HySonLab/EquiMesh.
ECBD: Evidence-Centered Benchmark Design for NLP
Yu Lu Liu
Su Lin Blodgett
Jackie Chi Kit Cheung
Q. Vera Liao
Ziang Xiao
Benchmarking is seen as critical to assessing progress in NLP. However, creating a benchmark involves many design decisions (e.g., which datasets to include, which metrics to use) that often rely on tacit, untested assumptions about what the benchmark is intended to measure or is actually measuring. There is currently no principled way of analyzing these decisions and how they impact the validity of the benchmark's measurements. To address this gap, we draw on evidence-centered design in educational assessments and propose Evidence-Centered Benchmark Design (ECBD), a framework which formalizes the benchmark design process into five modules. ECBD specifies the role each module plays in helping practitioners collect evidence about capabilities of interest. Specifically, each module requires benchmark designers to describe, justify, and support benchmark design choices -- e.g., clearly specifying the capabilities the benchmark aims to measure or how evidence about those capabilities is collected from model responses. To demonstrate the use of ECBD, we conduct case studies with three benchmarks: BoolQ, SuperGLUE, and HELM. Our analysis reveals common trends in benchmark design and documentation that could threaten the validity of benchmarks' measurements.
Empirical Analysis of Model Selection for Heterogeneous Causal Effect Estimation
Divyat Mahajan
Brady Neal
Vasilis Syrgkanis
We study the problem of model selection in causal inference, specifically for the case of conditional average treatment effect (CATE) estimation under binary treatments. Unlike model selection in machine learning, there is no perfect analogue of cross-validation, as we do not observe the counterfactual potential outcome for any data point. To address this, a variety of proxy metrics have been proposed in the literature that depend on auxiliary nuisance models estimated from the observed data (propensity score model, outcome regression model). However, the effectiveness of these metrics has only been studied on synthetic datasets, since counterfactual data is accessible for them. We conduct an extensive empirical analysis to judge the performance of these metrics introduced in the literature, and novel ones introduced in this work, where we utilize the latest advances in generative modeling to incorporate multiple realistic datasets. Our analysis suggests novel model selection strategies based on careful hyperparameter tuning of CATE estimators and causal ensembling.
Enhancing Click-through Rate Prediction in Recommendation Domain with Search Query Representation
Yuening Wang
Man Chen
Yaochen Hu
Wei Guo
Yingxue Zhang
Huifeng Guo
Yong Liu
Enhancing Security and Energy Efficiency of Cyber-Physical Systems using Deep Reinforcement Learning
Saeid Jamshidi
Ashkan Amirnia
Amin Nikanjam
Enhancing Supervised Visualization through Autoencoder and Random Forest Proximities for Out-of-Sample Extension
Shuang Ni
Adrien Aumon
Kevin R. Moon
Jake S. Rhodes
The value of supervised dimensionality reduction lies in its ability to uncover meaningful connections between data features and labels. Common dimensionality reduction methods embed a set of fixed, latent points, but are not capable of generalizing to an unseen test set. In this paper, we provide an out-of-sample extension method for the random forest-based supervised dimensionality reduction method, RF-PHATE, combining information learned from the random forest model with the function-learning capabilities of autoencoders. Through quantitative assessment of various autoencoder architectures, we identify that networks that reconstruct random forest proximities are more robust for the embedding extension problem. Furthermore, by leveraging proximity-based prototypes, we achieve a 40% reduction in training time without compromising extension quality. Our method does not require label information for out-of-sample points, thus serving as a semi-supervised method, and can achieve consistent quality using only 10% of the training data.
Evaluating In-Context Learning of Libraries for Code Generation
Arkil Patel
Pradeep Dasigi
Contemporary Large Language Models (LLMs) exhibit a high degree of code generation and comprehension capability. A particularly promising area is their ability to interpret code modules from unfamiliar libraries for solving user-instructed tasks. Recent work has shown that large proprietary LLMs can learn novel library usage in-context from demonstrations. These results raise several open questions: whether demonstrations of library usage are required, whether smaller (and more open) models also possess such capabilities, etc. In this work, we take a broader approach by systematically evaluating a diverse array of LLMs across three scenarios reflecting varying levels of domain specialization to understand their abilities and limitations in generating code based on libraries defined in-context. Our results show that even smaller open-source LLMs like Llama-2 and StarCoder demonstrate an adept understanding of novel code libraries based on specifications presented in-context. Our findings further reveal that LLMs exhibit a surprisingly high proficiency in learning novel library modules even when provided with just natural language descriptions or raw code implementations of the functions, which are often cheaper to obtain than demonstrations. Overall, our results pave the way for harnessing LLMs in more adaptable and dynamic coding environments.
An Evaluation of Language Models for Hyperpartisan Ideology Detection in Persian Twitter
Sahar Omidi Shayegan
Isar Nejadgholi
Kellin Pelrine
Hao Yu
Sacha Lévy
Zachary Yang
Jean-François Godbout
Large Language Models (LLMs) have shown significant promise in various tasks, including identifying the political beliefs of English-speaking social media users from their posts. However, assessing LLMs for this task in non-English languages remains unexplored. In this work, we ask to what extent LLMs can predict the political ideologies of users in Persian social media. To answer this question, we first acknowledge that political parties are not well-defined among Persian users, and therefore we reduce the task to the much simpler one of hyperpartisan ideology detection. We create a new benchmark and show the potential and limitations of both open-source and commercial LLMs in classifying the hyperpartisan ideologies of users. We compare these models with smaller fine-tuned models, both on the Persian language (ParsBERT) and translated data (RoBERTa), showing that the fine-tuned models considerably outperform generative LLMs in this task. We further demonstrate that the performance of the generative LLMs degrades when classifying users based on their tweets instead of their bios and even when tweets are added as additional information, whereas the smaller fine-tuned models are robust and achieve similar performance for all classes. This study is a first step toward political ideology detection in Persian Twitter, with implications for future research to understand the dynamics of ideologies in Persian social media.
Evolution of High-Throughput Satellite Systems: A Vision of Programmable Regenerative Payload
Olfa Ben Yahia
Zineb Garroussi
Olivier Bélanger
Brunilde Sansò
Jean-François Frigon
Stéphane Martel
Gunes Karabulut Kurt
High-throughput satellite (HTS), with its digital payload technology, is expected to play a key role as an enabler of the upcoming sixth-generation (6G) networks. HTS is mainly designed to provide higher data rates and capacities. Fueled by technological advancements, including beamforming, advanced modulation techniques, reconfigurable phased array technologies, and electronically steerable antennas, HTS has emerged as a fundamental component for future network generations. This paper offers a comprehensive state-of-the-art survey of HTS systems, focusing on standardization, patents, channel multiple access techniques, routing, load balancing, and the role of software-defined networking (SDN). In addition, we provide a vision for next-generation satellite systems, which we have named Extremely-HTS (EHTS), toward autonomous satellites, supported by the main requirements and key technologies expected for these systems. The EHTS system will be designed to maximize spectrum reuse and data rates and to flexibly steer the capacity to satisfy user demand. We also introduce a novel architecture for future programmable regenerative payloads.