Publications

Enhancing Click-through Rate Prediction in Recommendation Domain with Search Query Representation
Yuening Wang
Man Chen
Yaochen Hu
Wei Guo
Yingxue Zhang
Huifeng Guo
Yong Liu
Enhancing Security and Energy Efficiency of Cyber-Physical Systems using Deep Reinforcement Learning
Saeid Jamshidi
Ashkan Amirnia
Amin Nikanjam
Enhancing Supervised Visualization through Autoencoder and Random Forest Proximities for Out-of-Sample Extension
Shuang Ni
Adrien Aumon
Kevin R. Moon
Jake S. Rhodes
The value of supervised dimensionality reduction lies in its ability to uncover meaningful connections between data features and labels. Com… (see more)mon dimensionality reduction methods embed a set of fixed, latent points, but are not capable of generalizing to an unseen test set. In this paper, we provide an out-of-sample extension method for the random forest-based supervised dimensionality reduction method, RF-PHATE, combining information learned from the random forest model with the function-learning capabilities of autoencoders. Through quantitative assessment of various autoencoder architectures, we identify that networks that reconstruct random forest proximities are more robust for the embedding extension problem. Furthermore, by leveraging proximity-based prototypes, we achieve a 40% reduction in training time without compromising extension quality. Our method does not require label information for out-of-sample points, thus serving as a semi-supervised method, and can achieve consistent quality using only 10% of the training data.
An Evaluation of Language Models for Hyperpartisan Ideology Detection in Persian Twitter
Sahar Omidi Shayegan
Isar Nejadgholi
Kellin Pelrine
Hao Yu
Sacha Lévy
Zachary Yang
Jean-François Godbout
Large Language Models (LLMs) have shown significant promise in various tasks, including identifying the political beliefs of English-speakin… (see more)g social media users from their posts. However, assessing LLMs for this task in non-English languages remains unexplored. In this work, we ask to what extent LLMs can predict the political ideologies of users in Persian social media. To answer this question, we first acknowledge that political parties are not well-defined among Persian users, and therefore, we simplify the task to a much simpler task of hyperpartisan ideology detection. We create a new benchmark and show the potential and limitations of both open-source and commercial LLMs in classifying the hyper-partisan ideologies of users. We compare these models with smaller fine-tuned models, both on the Persian language (ParsBERT) and translated data (RoBERTa), showing that they considerably outperform generative LLMs in this task. We further demonstrate that the performance of the generative LLMs degrades when classifying users based on their tweets instead of their bios and even when tweets are added as additional information, whereas the smaller fine-tuned models are robust and achieve similar performance for all classes. This study is a first step toward political ideology detection in Persian Twitter, with implications for future research to understand the dynamics of ideologies in Persian social media.
An Evaluation of Language Models for Hyperpartisan Ideology Detection in Persian Twitter
Sahar Omidi Shayegan
Isar Nejadgholi
Kellin Pelrine
Hao Yu
Sacha Lévy
Zachary Yang
Jean-François Godbout
Large Language Models (LLMs) have shown significant promise in various tasks, including identifying the political beliefs of English-speakin… (see more)g social media users from their posts. However, assessing LLMs for this task in non-English languages remains unexplored. In this work, we ask to what extent LLMs can predict the political ideologies of users in Persian social media. To answer this question, we first acknowledge that political parties are not well-defined among Persian users, and therefore, we simplify the task to a much simpler task of hyperpartisan ideology detection. We create a new benchmark and show the potential and limitations of both open-source and commercial LLMs in classifying the hyper-partisan ideologies of users. We compare these models with smaller fine-tuned models, both on the Persian language (ParsBERT) and translated data (RoBERTa), showing that they considerably outperform generative LLMs in this task. We further demonstrate that the performance of the generative LLMs degrades when classifying users based on their tweets instead of their bios and even when tweets are added as additional information, whereas the smaller fine-tuned models are robust and achieve similar performance for all classes. This study is a first step toward political ideology detection in Persian Twitter, with implications for future research to understand the dynamics of ideologies in Persian social media.
Evolution of High-Throughput Satellite Systems: A Vision of Programmable Regenerative Payload
Olfa Ben Yahia
Zineb Garroussi
Olivier Bélanger
Brunilde Sansò
Jean-François Frigon
Stéphane Martel
Gunes Karabulut Kurt
High-throughput satellite (HTS), with its digital payload technology, is expected to play a key role as an enabler of the upcoming sixth-gen… (see more)eration (6G) networks. HTS is mainly designed to provide higher data rates and capacities. Fueled by technological advancements, including beamforming, advanced modulation techniques, reconfigurable phased array technologies, and electronically steerable antennas, HTS has emerged as a fundamental component for future network generations. This paper offers a comprehensive state-of-the-art on HTS systems, focusing on standardization, patents, channel multiple access techniques, routing, load balancing, and the role of software-defined networking (SDN). In addition, we provide a vision for next-generation satellite systems that we have named Extremely-HTS (EHTS) toward autonomous satellites supported by the main requirements and key technologies expected for these systems. The EHTS system will be designed to maximize spectrum reuse and data rates and to flexibly steer the capacity to satisfy user demand. We introduce a novel architecture for future programmable regenerative payloads as well.
An Exact Method for (Constrained) Assortment Optimization Problems with Product Costs
Markus Leitner
Andrea Lodi
Roberto Roberti
Claudio Sole
Exploring Quantization for Efficient Pre-Training of Transformer Language Models
Kamran Chitsaz
Quentin Fournier
Goncalo Mordido
The increasing scale of Transformer models has led to an increase in their pre-training computational requirements. While quantization has p… (see more)roven to be effective after pre-training and during fine-tuning, applying quantization in Transformers during pre-training has remained largely unexplored at scale for language modeling. This study aims to explore the impact of quantization for efficient pre-training of Transformers, with a focus on linear layer components. By systematically applying straightforward linear quantization to weights, activations, gradients, and optimizer states, we assess its effects on model efficiency, stability, and performance during training. By offering a comprehensive recipe of effective quantization strategies to be applied during the pre-training of Transformers, we promote high training efficiency from scratch while retaining language modeling ability. Code is available at https://github.com/chandar-lab/EfficientLLMs.
Exploring the digital divide: results of a survey informing mobile application development
Maira Corinne Claudio
Zachary Rehany
Katerina Stachtari
Elena Guadagno
Esli Osmanlliu
Introduction Mobile health apps risk widening health disparities if they overlook digital inclusion. The digital divide, encompassing access… (see more), familiarity, and readiness, poses a significant barrier to medical interventions. Existing literature lacks exploration of the digital divide's contributing factors. Hence, data are needed to comprehend the challenges in developing inclusive health apps. Methods We created a survey to gauge internet and smartphone access, smartphone familiarity, and readiness for using mobile health apps among caregivers of pediatric patients in tertiary care. Open-ended questions solicited feedback and suggestions on mobile health applications. Responses were categorized by similarity and compared. Developed with patient partners, the survey underwent cognitive testing and piloting for accuracy. Results Data from 209 respondents showed that 23% were affected by the digital divide, mainly due to unfamiliarity with digital skills. Among 49 short text responses about health app concerns, 31 mentioned security and confidentiality, with 7 mentioning the impersonal nature of such apps. Desired features included messaging healthcare providers, scheduling, task reminders, and simplicity. Conclusions This study underscores a digital divide among caregivers of pediatric patients, with nearly a quarter affected primarily due to a lack of digital comfort. Respondents emphasized user-friendliness and online security for health apps. Future apps should prioritize digital inclusion by addressing the significant barriers and carefully considering patient and family concerns.
Exploring validation metrics for offline model-based optimisation
Christopher Beckham
Alexandre Piché
David Vazquez
In offline model-based optimisation (MBO) we are interested in using machine learning to de-sign candidates that maximise some measure of d… (see more)esirability through an expensive but real-world scoring process. Offline MBO tries to approximate this expensive scoring function and use that to evaluate generated designs, however evaluation is non-exact because one approximation is being evaluated with another. Instead, we ask ourselves: if we did have the real world scoring function at hand, what cheap-to-compute validation metrics would correlate best with this? Since the real-world scoring function is available for simulated MBO datasets, insights obtained from this can be transferred over to real-world offline MBO tasks where the real-world scoring function is expensive to compute. To address this, we propose a conceptual evaluation framework that is amenable to measuring extrapolation, and apply this to conditional denoising diffusion models. Empirically, we find that two validation metrics – agreement and Frechet distance – correlate quite well with the ground truth. When there is high variability in conditional generation, feedback is required in the form of an approximated version of the real-world scoring function. Furthermore, we find that generating high-scoring samples may require heavily weighting the generative model in favour of sample quality, potentially at the cost of sample diversity.
Fairness Through Domain Awareness: Mitigating Popularity Bias For Music Discovery
Rebecca Salganik
As online music platforms grow, music recommender systems play a vital role in helping users navigate and discover content within their vast… (see more) musical databases. At odds with this larger goal, is the presence of popularity bias, which causes algorithmic systems to favor mainstream content over, potentially more relevant, but niche items. In this work we explore the intrinsic relationship between music discovery and popularity bias. To mitigate this issue we propose a domain-aware, individual fairness-based approach which addresses popularity bias in graph neural network (GNNs) based recommender systems. Our approach uses individual fairness to reflect a ground truth listening experience, i.e., if two songs sound similar, this similarity should be reflected in their representations. In doing so, we facilitate meaningful music discovery that is robust to popularity bias and grounded in the music domain. We apply our BOOST methodology to two discovery based tasks, performing recommendations at both the playlist level and user level. Then, we ground our evaluation in the cold start setting, showing that our approach outperforms existing fairness benchmarks in both performance and recommendation of lesser-known content. Finally, our analysis explains why our proposed methodology is a novel and promising approach to mitigating popularity bias and improving the discovery of new and niche content in music recommender systems.
Findings of the Association for Computational Linguistics: NAACL 2024, Mexico City, Mexico, June 16-21, 2024
Mohamed Abdalla
Gavin Abercrombie
Rodrigo Agerri
Zeljko Agic
Eneko Agirre
Monica Agrawal
Wasi Uddin Ahmad
James Allan
Aijun An
Antonios Anasta-sopoulos
Mark Anderson
Jacob Andreas
Marianna Apidianaki
Alessio Palmero
Yuki Aprosio
Ehsaneddin Arase
Giuseppe Asgari
Wilker Attardi
Aziz JinYeong … (see 480 more)
Timothy Bak
Mohamad Hardyman Baldwin
Pierpaolo Barawi
Ali Basile
Ja-smijn Basirat
Timo Bastings
Gábor Baumann
Eyal Bella
Farah Ben-David
Luciana Benamara
Benotti Yevgeni
Brijesh Berzak
Federico Bhatt
Chris Bianchi
Lidong Biemann
Alexandra Bing
Birch Eduardo
Gemma Blanco
Aurélien Boleda
Florian Bossard
Leonid Boudin
Ronan Boytsov
Pavel Le Bras
Chris Braslavski
Eleftheria Brew
Thomas Briakou
Emanuele Brochhagen
Wray Buglia-rello
Buntine Elena
Aoife Cabrio
Ruken Cahill
Jose Cakici
Marie Camacho-Collados
Pengfei Candito
Ziqiang Cao
Dallas Cao
Paula Card
Tommaso Carvalho
Andrew Caselli
Tanmoy Cattle
Ilias Chakrabor-ty
Angel X Chalkidis
Ching-Yun Chang
Snigdha Chang
Chen Chaturvedi
Kehai Chen
Long Chen
Lu Chen
Muhao Chen
Wei Chen
Wenhu Chen
Wenliang Chen
Xiang Chen
Yidong Chen
Yun-Nung Chen
Zhiyu Chen
Zhuang Chen
Hao Chen
Yu Cheng
Colin Cheng
Cherry Hai
Eunsol Leong Chieu
Leshem Choi
Monojit Choshen
Christos Choudhury
Yi-Ling Christodoulopou-los
Stephen Chung
Vincent Clark
Simone Claveau
John M Conia
Caio Filippo Conroy
Mathias Corro
Leyang Creutz
Aron Cui
Anna E Culotta
Amanda Cercas Currey
Curry Raj
Daniel Dabre
Cristian Dakota
Verna Danescu-Niculescu-Mizil
Budhaditya Dankers
Deb Vera
Zhenyun Demberg
Li Deng
Ruihai Dong
Antoine Dong
Eduard Doucet
Nan Dragut
Kevin Duan
Greg Duh
Ondrej Durrett
Tomasz Dusek
Dwojak Julian Martin
Asif Eisenschlos
Yanai Ekbal
Cristina Elazar
Luis España-Bonet
Espinosa-Anke Allyson
Kilian Ettinger
Evang Alexander
Agnieszka Fabbri
Meng Falenska
Marcello Fang
Hao Federico
Anna Fei
Feldman Naomi
Fuli Feldman
Xiaocheng Feng
Yansong Feng
Eric Feng
Francis Le Ferrand
Eli-sabetta Ferraro
Simone Fersini
Mark Filice
Mark Finlayson
Jennifer Fishel
Annemarie Foster
Friedrich Matthias
Zhe Gallé
Siddhant Gan
Judith Garg
Kallirroi Gaspers
Alborz Georgila
Geramifard Luke
Mor Gessler
Abbas Geva
Sahar Ghaddar
Filip Ghannay
Mario Ginter
Tejas Giulianelli
Sharon Gokhale
Rob Goldwater
Kyle van der Goot
Tanya Gorman
Jia-Chen Goyal
Qing Gu
Frank Gu
Lin Guerin
Honglei Gui
Qipeng Guo
Vivek Guo
Gupta Thanh-Le
Nizar Ha
Ivan Habash
Barry Habernal
Xianpei Haddow
Daniel Han
Peter Hardt
Di Hase
Michael He
Behnam Heck
Peter Hedayatnia
Daniel Heeman
Jack Hershcovich
Ryuichiro Hes-sel
Julia Higashinaka
Enamul Hockenmaier
Andreas Hoque
Yufang Hotho
Hou Dirk
Kristen Hovy
Di Howell
Xuming Hu
Fei Hu
Jie Huang
Lifu Huang
Peijie Huang
Shaohan Huang
Shujian Huang
Xuanjing Huang
Zhen Huang
Mika Huang
Hämäläinen Kentaro
Inui Kokil
Hyeju Jaidka
Mustafa Jang
Yangfeng Jarrar
Lifeng Ji
Mali Jin
Qin Jin
Richard Jin
David Johansson
Preethi Jurgens
Jyothi Ehsan
Diptesh Kamalloo
S. Kanojia
Sarvnaz Kar
Pei Karimi
Daniel Ke
So-pan Khashabi
Tushar Khosla
Hyounghun Khot
Jin-Dong Kim
Joo-Kyung Kim
Taeuk Kim
Kim Roman
Rebecca Klinger
Ivan Knowles
Ekaterina Kobyzev
Philipp Kochmar
Koehn Mamoru
Rik Komachi
Lingpeng Koncel-Kedziorski
Julia Kong
Amrith Kreutzer
Kal-pesh Krishna
Udo Krishna
Artur Kruschwitz
Adhiguna Kulmizev
Kuncoro Wai
Gerasimos Lam
Mirella Lampouras
Staffan Lapata
Mark Larsson
Ivano Last
Lauriola Thu
Dong-Ho Le
Hwanhee Lee
Jinhyuk Lee
Mark G Lee
SangKeun Lee
Oliver Lee
Heather Le-mon
Piyawat Lent
Gina-Anne Lertvittayakumjorn
Miryam Levow
Bing de Lhoneux
Chuyuan Li
Dongxu Li
Jing Li
Junhui Li
Juntao Li
Liang Li
Peng Li
Piji Li
Sujian Li
Li Tao
Wenjie Li
Xin Li
Yongbin Li
Yu Li
Yufei Li
Zhifei Li
Constantine Li
Chenghua Lignos
Hongyu Lin
Robert Lin
Bing Litschko
Hao Liu
Kang Liu
Ming Liu
Qianying Liu
Tin-gwen Liu
Xuebo Liu
Yang Liu
Zhiyuan Liu
Zoey Liu
Ximing Liu
Anh Tuan Lu
Luu Chenyang
Lyu Ji
Jing Ma
Ruotian Ma
Xiaojuan Ma
Aman Ma
Harish Tayyar Madaan
Andrea Madabushi
Navonil Ma-dotto
Prodromos Majumder
Shervin Malakasiotis
Yuning Malmasi
Kelly Mao
Vukosi Marchi-sio
Stella Marivate
Lara J Markantonatou
Bruno Martin
Yuval Martins
Sérgio Marton
Yuji Matos
Julian Matsumoto
Bryan McAuley
Ryan McCann
Kathleen McDonald
McKeown Mahnoosh
Yuxian Mehrabani
Samuel Meng
Timothee Mensah
Margot Mickus
Simon Mieskes
Yasuhide Mille
Makoto Miura
Daichi Miwa
David R Mochihashi
Lili Mortensen
Kha-lil Mou
Benjamin Mrini
Philippe Muller
Smaranda Muller
Rudra Muresan
Thomas Murthy
Müller Max
Müller-Eberstein Maria
Nona Nadejde
Mikio Naderi
Hideki Nakano
Linyong Nakayama
Nan
Franco Maria
Tapas Nardini
Mark-Jan Nayak
Isar Nederhof
Mariana Nejadgholi
Dat Quoc Neves
Nguyen Le-Minh
Thien Huu Nguyen
Vahid Nguyen
Partovi Nia
Jan Niehues
Qiang Ning
Maciej Ogrodniczuk
Alice Oh
Naoaki Okazaki
Manabu Okumura
Matan Orbach
Nedjma Ou-sidhoum
Vasile Pais
Nikolaos Pappas
Joonsuk Park
Yannick Parmentier
Prasannan Parthasarathi
Lucia Passaro
Ramakanth Pasunuru
Siddharth Patwardhan
Hao Peng
Lis Pereira
Laura Perez-Beltrachini
Maxime Peyrard
Jonas Pfeiffer
Bryan A. Plummer
Maja Popovic
Soujanya Poria
Daniel Preotiuc-Pietro
Emily Prud'hommeaux
Vikram Pudi
Peng Qian
Tieyun Qian
Deepak Ramachandran
Carlos Ramisch
Leonardo Ranaldi
Sudha Rao
Shauli Ravfogel
Marek Rei
Leonardo F. R. Ribeiro
Oleg Rokhlenko
Salvatore Romeo
Joseph Le Roux
Alla Rozov-skaya
Terry Ruas
Raphael Rubino
Ivan Vladimir Meza Ruiz
Maria Ryskina
Hassan Sajjad
Shubhra Kanti
Karmaker Santu
Maarten Sap
Naomi Saphra
Asad B. Sayeed
Dominik Schlechtweg
Viktor Schlegel
Natalie Schluter
Nathan Schneider
Hinrich Schuetze
H. Schwartz
Jingbo Shang
Vasu Sharma
Tianze Shi
Mohammad Shoeybi
Lei Shu
Melanie Siegel Maneesh
Kumar Singh
Pranaydeep Singh
Sunayana Sitaram
Kevin Small
Luca Soldaini
Aina Garí Soler
Wei Song
Xingyi Song
Yan Song
Jeffrey S. Sorensen
Aitor Soroa
Jacopo Staiano
Efstathios Stamatatos
Gabriel Stanovsky
Shane Steinert-Threlkeld
Jannik Strötgen
Sara Stymne
Jinsong Su
Saku Sugawara
Alessandro Suglia
Aixin Sun
Cheng-jie Sun
Kai Sun
György Szarvas
Víctor M. Sánchez-Cartagena
Gözde Gül ¸Sahin
Zeerak Talat
Chenhao Tan
Hao Tan
Tianyi Tang
Jesse Thomason
Brian Thompson
Yuanhe Tian
Zhiliang Tian
Amalia Todirascu
Sara Tonelli
Paolo Torroni
Kristina Toutanova
Amine xv Trabelsi
Trang Tran
David R. Traum
Kewei Tu
Martin Tutek
Ana Sabina Uban
Takehito Utsuro
Olga Vechtomova
Yannick Versley
Karin M. Verspoor
David Vilar
David Vilares 0001
Serena Villa-ta
Esaú Villatoro-Tello
Thuy Vu
Ivan Vuli´c
Fei Xia
Tong Xiao
Bo Xu
Huijuan Xu
Nianwen Xue
S. Yadav
Hang Yan
Rui Yan
Min Yang
Wei Yang
Yezhou Yang
Yi Yang
Zhenglu Yang
Jin-Ge Yao
Wei Ye
Yongjing Yin
Naoki Yoshinaga
Koichiro Yoshino
Jianfei Yu
Juntao Yu Mo
Yu Manzil Zaheer
Fabio Massimo Zanzotto
Weixin Zeng
Luke Zettlemoyer
Biao Zhang
Chen Zhang
Crystina Zhang
Jiajun Zhang
Jingyi Zhang
Justine Zhang
Meishan Zhang
Ningyu Zhang
Shaolei Zhang
Sheng Zhang
Shiyue Zhang
Shuai Zhang
Shuo Zhang
Wei Zhang
Yang Zhang
Zhe Zhang
Jieyu Zhao
Shiwan Zhao
Hai-Tao Zheng
Zaixiang Zheng
Jie Zhou
Yi Zhou
Xiaodan Zhu