A Distributed ADMM-based Deep Learning Approach for Thermal Control in Multi-Zone Buildings
The surge in electricity use, coupled with the dependency on intermittent renewable energy sources, poses significant hurdles to effectively managing power grids, particularly during times of peak demand. Demand Response programs and energy conservation measures are essential to operate energy grids while ensuring a responsible use of our resources. This research combines distributed optimization using ADMM with Deep Learning models to plan indoor temperature setpoints effectively. A two-layer hierarchical structure is used, with a central building coordinator at the upper layer and local controllers at the thermal zone layer. The coordinator limits the building's maximum power by translating the building's total power target into local power targets for each zone. Local controllers modify the temperature setpoints to meet the local power targets. The resulting control algorithm, called Distributed Planning Networks, is designed to be both adaptable and scalable to many types of buildings, tackling two of the main challenges in the development of such systems. The proposed approach is tested on an 18-zone building modeled in EnergyPlus. The algorithm successfully manages Demand Response peak events.
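The coordinator/zone split described in this abstract follows the standard consensus form of ADMM. Below is a minimal sketch of that pattern; the quadratic comfort costs, zone power preferences, and building power cap are made-up stand-ins for the paper's building model, not its actual formulation.

```python
# Toy consensus-ADMM sketch of a coordinator/zone split for power-limited control.
# Zone costs, preferences, and the cap are illustrative assumptions.
import numpy as np

rho = 1.0                                 # ADMM penalty parameter
pref = np.array([3.0, 5.0, 2.0, 4.0])     # each zone's preferred power (kW), assumed
p_max = 10.0                              # building-level power cap (kW), assumed
n = len(pref)

p = np.zeros(n)   # local zone decisions
z = np.zeros(n)   # coordinator's per-zone power targets
u = np.zeros(n)   # scaled dual variables

for _ in range(100):
    # Zone layer: closed-form minimizer of (p_i - pref_i)^2 + (rho/2)(p_i - z_i + u_i)^2.
    p = (2 * pref + rho * (z - u)) / (2 + rho)
    # Coordinator layer: project the proposals onto the building power budget
    # (Euclidean projection onto the halfspace sum(z) <= p_max is a uniform shift).
    v = p + u
    excess = max(0.0, (v.sum() - p_max) / n)
    z = v - excess
    # Dual update drives zones and coordinator into agreement.
    u = u + p - z

print("zone targets:", np.round(z, 3), "total:", round(z.sum(), 3))
```

With this split, each iteration only exchanges per-zone targets and duals between the two layers; the coordinator step is just a projection onto the power budget, which is what makes the scheme scale with the number of zones.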
An Effective Theory of Bias Amplification
Arjun Subramonian
Samuel J. Bell
Levent Sagun
Machine learning models may capture and amplify biases present in data, leading to disparate test performance across social groups. To better understand, evaluate, and mitigate these possible biases, a deeper theoretical understanding of how model design choices and data distribution properties could contribute to bias is needed. In this work, we contribute a precise analytical theory in the context of ridge regression, both with and without random projections, where the former models neural networks in a simplified regime. Our theory offers a unified and rigorous explanation of machine learning bias, providing insights into phenomena such as bias amplification and minority-group bias in various feature and parameter regimes. For example, we demonstrate that there may be an optimal regularization penalty or training time to avoid bias amplification, and there can be fundamental differences in test error between groups that do not vanish with increased parameterization. Importantly, our theoretical predictions align with several empirical observations reported in the literature. We extensively empirically validate our theory on diverse synthetic and semi-synthetic datasets.
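For intuition on how the regularization penalty interacts with group disparities, here is a toy ridge-regression experiment in the spirit of that setup. The group sizes, dimension, noise level, and the way the minority group's signal differs are all illustrative assumptions, not the paper's theory.

```python
# Toy two-group ridge regression: sweep the penalty and watch the per-group
# test-error gap. All distributional choices here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, n_major, n_minor = 50, 400, 40
w_major = rng.normal(size=d) / np.sqrt(d)
w_minor = w_major + 0.5 * rng.normal(size=d) / np.sqrt(d)  # minority signal differs slightly

def sample(n, w):
    X = rng.normal(size=(n, d))
    y = X @ w + 0.5 * rng.normal(size=n)
    return X, y

Xa, ya = sample(n_major, w_major)            # majority training data
Xb, yb = sample(n_minor, w_minor)            # minority training data
X = np.vstack([Xa, Xb]); y = np.concatenate([ya, yb])
Xa_t, ya_t = sample(1000, w_major)           # per-group test sets
Xb_t, yb_t = sample(1000, w_minor)

for lam in [1e-3, 1e-1, 1e1, 1e3]:
    # Closed-form ridge solution on the pooled data.
    w_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    err_a = np.mean((Xa_t @ w_hat - ya_t) ** 2)
    err_b = np.mean((Xb_t @ w_hat - yb_t) ** 2)
    print(f"lambda={lam:g}  majority MSE={err_a:.3f}  minority MSE={err_b:.3f}  gap={err_b - err_a:.3f}")
```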
Embedding Cultural Diversity in Prototype-based Recommender Systems
Armin Moradi
Nicola Neophytou
Popularity bias in recommender systems can increase cultural overrepresentation by favoring norms from dominant cultures and marginalizing underrepresented groups. This issue is critical for platforms offering cultural products, as they influence consumption patterns and human perceptions. In this work, we address popularity bias by identifying demographic biases within prototype-based matrix factorization methods. Using the country of origin as a proxy for cultural identity, we link this demographic attribute to popularity bias by refining the embedding space learning process. First, we propose filtering out irrelevant prototypes to improve representativity. Second, we introduce a regularization technique to enforce a uniform distribution of prototypes within the embedding space. Across four datasets, our results demonstrate a 27% reduction in the average rank of long-tail items and a 2% reduction in the average rank of items from underrepresented countries. Additionally, our model achieves a 2% improvement in HitRatio@10 compared to the state-of-the-art, highlighting that fairness is enhanced without compromising recommendation quality. Moreover, the distribution of prototypes leads to more inclusive explanations by better aligning items with diverse prototypes.
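One plausible way to instantiate a regularizer that spreads prototypes uniformly is a Gaussian-potential repulsion term over the prototypes on the unit sphere, sketched below; the paper's exact regularizer, prototype count, and embedding dimension may differ from these assumptions.

```python
# A uniformity-style regularizer over prototype embeddings: lower loss when
# prototypes are spread evenly on the unit sphere. One plausible instantiation,
# not necessarily the paper's; prototype count and dimension are assumed.
import torch

def uniformity_loss(prototypes: torch.Tensor, t: float = 2.0) -> torch.Tensor:
    """Gaussian-potential loss; minimized by a uniform spread of prototypes."""
    p = torch.nn.functional.normalize(prototypes, dim=1)
    sq_dists = torch.cdist(p, p).pow(2)
    k = p.shape[0]
    off_diag = sq_dists[~torch.eye(k, dtype=torch.bool)]  # drop self-distances
    return torch.log(torch.exp(-t * off_diag).mean())

prototypes = torch.randn(20, 64, requires_grad=True)  # 20 prototypes, dim 64 (assumed)
loss = uniformity_loss(prototypes)
loss.backward()  # gradients push nearby prototypes apart
print(float(loss))
```

In training, a term like this would be added to the recommendation loss with a weighting coefficient, so prototypes stay predictive of interactions while being discouraged from clustering around popular regions of the embedding space.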
An Empirical Study of Pre-trained Model Selection for Out-of-Distribution Generalization and Calibration
Ryuichiro Hataya
Kotaro Yoshida
An empirical study of testing machine learning in the wild
Moses Openja
Armstrong Foundjem
Zhen Ming (Jack) Jiang
Mouna Abidi
Ahmed E. Hassan
Background: Recently, machine and deep learning (ML/DL) algorithms have been increasingly adopted in many software systems. Due to their inductive nature, ensuring the quality of these systems remains a significant challenge for the research community. Traditionally, software systems were constructed deductively, by writing explicit rules that govern the behavior of the system as program code. However, ML/DL systems infer rules from training data (i.e., they are generated inductively). Recent research in ML/DL quality assurance has adapted concepts from traditional software testing, such as mutation testing, to improve reliability. However, it is unclear whether these proposed testing techniques are adopted in practice, or whether new testing strategies have emerged from real-world ML deployments; there is little empirical evidence about the testing strategies. Aims: To fill this gap, we perform the first fine-grained empirical study on ML testing in the wild to identify the ML properties being tested, the testing strategies, and their implementation throughout the ML workflow. Method: We conducted a mixed-methods study to understand ML software testing practices. We analyzed test files and cases from 11 open-source ML/DL projects on GitHub. Using open coding, we manually examined the testing strategies, tested ML properties, and implemented testing methods to understand their practical application in building and releasing ML/DL software systems. Results: Our findings reveal several key insights: (1) the most common testing strategies, accounting for less than 40%, are grey-box and white-box methods, such as Negative Testing, Oracle Approximation, and Statistical Testing; (2) a wide range of 17 ML properties are tested, of which only 20% to 30% are frequently tested, including Consistency, Correctness, and Efficiency; (3) Bias and Fairness is tested more in Recommendation (6%) and CV (3.9%) systems, while Security & Privacy is tested in CV (2%), Application Platforms (0.9%), and NLP (0.5%); (4) we identified 13 types of testing methods, such as Unit Testing, Input Testing, and Model Testing. Conclusions: This study sheds light on the current adoption of software testing techniques and highlights gaps and limitations in existing ML testing practices.
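To make two of the named strategies concrete, the sketch below pairs a negative test with a statistical property check around a hypothetical predict_proba stand-in; it illustrates the study's categories, not code from the surveyed projects.

```python
# Pytest-style illustration of negative testing (invalid input is rejected) and
# a statistical property check (probabilities sum to one across random inputs).
# `predict_proba` is a hypothetical stand-in for a trained model component.
import numpy as np
import pytest

def predict_proba(x: np.ndarray) -> np.ndarray:
    """Toy stand-in for a classifier's probability output."""
    if x.ndim != 2:
        raise ValueError("expected a 2-D feature matrix")
    logits = x.sum(axis=1, keepdims=True)
    p = 1.0 / (1.0 + np.exp(-logits))
    return np.hstack([1.0 - p, p])

def test_negative_rejects_bad_shape():
    with pytest.raises(ValueError):  # negative testing: malformed input
        predict_proba(np.zeros(3))

def test_probabilities_sum_to_one():
    x = np.random.default_rng(0).normal(size=(100, 5))
    out = predict_proba(x)
    assert np.allclose(out.sum(axis=1), 1.0)  # property holds across sampled inputs
```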
Evaluating machine learning-driven intrusion detection systems in IoT: Performance and energy consumption
Saeid Jamshidi
Kawser Wazed Nafi
Amin Nikanjam
Evaluating Numeracy of Language Models as a Natural Language Inference Task
Rahmad Mahendra
Damiano Spina
Lawrence Cavedon
Karin Verspoor
While recent advancements in large language models (LLMs) have enhanced their capabilities to solve mathematical problems, other aspects of numeracy remain underexplored. In this paper, we propose a benchmark to evaluate the ability of language models to perform basic numeracy tasks. We frame numeracy as a Natural Language Inference (NLI) task to assess the models’ ability to understand both numbers and language contexts. We evaluate 49 language models (LMs), including fine-tuned LMs on NLI datasets, instruction-tuned LLMs, and specialized math-LLMs. Our findings reveal three main insights: (1) LLMs only clearly outperform smaller LMs in arithmetic tasks, indicating that mathematical reasoning cannot be generalized to other numeracy skills such as number comparison and normalization; (2) while most language models achieve fair to good accuracy for NLI entailment cases, they still struggle to predict contradiction and neutral cases; and (3) the robustness of language models’ numeracy capabilities needs improvement, particularly in understanding the semantics and pragmatics of numbers in linguistic contexts.
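The NLI framing can be illustrated with a few hand-made premise/hypothesis pairs of the kind the benchmark targets; these specific items and labels are our own illustrative assumptions, not entries from the dataset.

```python
# Illustrative numeracy-as-NLI items (assumed examples, not dataset entries).
examples = [
    {"premise": "The stadium holds 45,000 people.",
     "hypothesis": "The stadium holds more than 40,000 people.",
     "label": "entailment"},      # number comparison
    {"premise": "The trip took three hours.",
     "hypothesis": "The trip took 180 minutes.",
     "label": "entailment"},      # unit normalization
    {"premise": "She bought a dozen eggs.",
     "hypothesis": "She bought 15 eggs.",
     "label": "contradiction"},   # number-word conversion
    {"premise": "About 100 guests attended.",
     "hypothesis": "Exactly 100 guests attended.",
     "label": "neutral"},         # pragmatics of approximation
]
for ex in examples:
    print(f'{ex["label"]:13s} {ex["premise"]!r} -> {ex["hypothesis"]!r}')
```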
Evolution of High-Throughput Satellite Systems: A Vision of Programmable Regenerative Payload
Olfa Ben Yahia
Zineb Garroussi
Olivier Bélanger
Brunilde Sansò
Jean-François Frigon
Stéphane Martel
Gunes Karabulut Kurt
High-throughput satellite (HTS), with its digital payload technology, is expected to play a key role as an enabler of the upcoming sixth-generation (6G) networks. HTS is mainly designed to provide higher data rates and capacities. Fueled by technological advancements, including beamforming, advanced modulation techniques, reconfigurable phased array technologies, and electronically steerable antennas, HTS has emerged as a fundamental component for future network generations. This paper offers a comprehensive state-of-the-art review of HTS systems, focusing on standardization, patents, channel multiple access techniques, routing, load balancing, and the role of software-defined networking (SDN). In addition, we provide a vision for next-generation satellite systems, which we have named Extremely-HTS (EHTS), toward autonomous satellites supported by the main requirements and key technologies expected for these systems. The EHTS system will be designed to maximize spectrum reuse and data rates and to flexibly steer capacity to satisfy user demand. We also introduce a novel architecture for future programmable regenerative payloads.
Exploiting Instruction-Following Retrievers for Malicious Information Retrieval
Instruction-following retrievers have been widely adopted alongside LLMs in real-world applications, but little work has investigated the safety risks surrounding their increasing search capabilities. We empirically study the ability of retrievers to satisfy malicious queries, both when used directly and when used in a retrieval-augmented generation (RAG) setup. Concretely, we investigate six leading retrievers, including NV-Embed and LLM2Vec, and find that given malicious requests, most retrievers can (for >50% of queries) select relevant harmful passages. For example, LLM2Vec correctly selects passages for 61.35% of our malicious queries. We further uncover an emerging risk with instruction-following retrievers, where highly relevant harmful information can be surfaced by exploiting their instruction-following capabilities. Finally, we show that even safety-aligned LLMs, such as Llama3, can satisfy malicious requests when provided with harmful retrieved passages in-context. In summary, our findings underscore the malicious misuse risks associated with increasing retriever capability.
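Mechanically, the direct-use evaluation amounts to embedding a query and candidate passages, ranking by similarity, and inspecting what surfaces. A minimal sketch with benign placeholder text is below; embed is a hypothetical stand-in (a character-frequency encoder), not the API of NV-Embed, LLM2Vec, or any retriever studied.

```python
# Minimal dense-retrieval ranking sketch. `embed` is a hypothetical encoder
# standing in for a real retriever's embedding model; texts are placeholders.
import numpy as np

def embed(texts: list[str]) -> np.ndarray:
    """Hypothetical encoder: character-frequency vectors, unit-normalized."""
    vocab = "abcdefghijklmnopqrstuvwxyz "
    mat = np.array([[t.lower().count(c) for c in vocab] for t in texts], dtype=float)
    norms = np.linalg.norm(mat, axis=1, keepdims=True)
    return mat / np.maximum(norms, 1e-9)

query = "how to bake bread"
passages = [
    "a recipe for baking bread at home",
    "train schedules for the northern line",
    "history of the printing press",
]
scores = embed(passages) @ embed([query])[0]  # cosine similarity of unit vectors
for i in np.argsort(scores)[::-1]:
    print(f"{scores[i]:.3f}  {passages[i]}")
```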
A “fine-cuts” approach disentangling psychopathic, autistic and alexithymic traits in their associations with affective, cognitive and motor empathy
Julia Ayache
Nikki Stevenson
Elisha Patel
Alexander Sumich
Nadja Heym
From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons
Andrew Szot
Omar Attia
Aleksei Timofeev
Harsh Agrawal
Zhe Gan
Zsolt Kira
Alexander T Toshev
We examine the capability of Multimodal Large Language Models (MLLMs) to tackle diverse domains that extend beyond the traditional language and vision tasks these models are typically trained on. Specifically, our focus lies in areas such as Embodied AI, Games, UI Control, and Planning. To this end, we introduce a process of adapting an MLLM to a Generalist Embodied Agent (GEA). GEA is a single unified model capable of grounding itself across these varied domains through a multi-embodiment action tokenizer. GEA is trained with supervised learning on a large dataset of embodied experiences and with online RL in interactive simulators. We explore the data and algorithmic choices necessary to develop such a model. Our findings reveal the importance of training with cross-domain data and online RL for building generalist agents. The final GEA model achieves strong generalization performance to unseen tasks across diverse benchmarks compared to other generalist models and benchmark-specific approaches.
Generalization Limits of Graph Neural Networks in Identity Effects Learning
Giuseppe Alessio D’Inverno
Simone Brugiapaglia
Graph Neural Networks (GNNs) have emerged as a powerful tool for data-driven learning on various graph domains. They are usually based on a message-passing mechanism and have gained increasing popularity for their intuitive formulation, which is closely linked to the Weisfeiler-Lehman (WL) test for graph isomorphism, to which they have been proven equivalent in terms of expressive power. In this work, we establish new generalization properties and fundamental limits of GNNs in the context of learning so-called identity effects, i.e., the task of determining whether an object is composed of two identical components or not. Our study is motivated by the need to understand the capabilities of GNNs when performing simple cognitive tasks, with potential applications in computational linguistics and chemistry. We analyze two case studies: (i) two-letter words, for which we show that GNNs trained via stochastic gradient descent are unable to generalize to unseen letters when utilizing orthogonal encodings like one-hot representations; (ii) dicyclic graphs, i.e., graphs composed of two cycles, for which we present positive existence results leveraging the connection between GNNs and the WL test. Our theoretical analysis is supported by an extensive numerical study.
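The data structure behind case (i) is easy to reproduce: each word is two one-hot-encoded letters, labeled by whether the letters match, with some letters held out entirely at test time so generalization to unseen symbols can be probed. The sketch below flattens the word into a vector rather than building the paper's graph representation, and the train/test letter split is an illustrative assumption.

```python
# Two-letter-word identity-effect task: label = 1 iff both letters are equal.
# Letters unseen during training form the test set; the 20/6 split is assumed.
import numpy as np

letters = list("abcdefghijklmnopqrstuvwxyz")
train_letters, test_letters = letters[:20], letters[20:]  # unseen letters at test time

def one_hot(c: str) -> np.ndarray:
    v = np.zeros(len(letters))
    v[letters.index(c)] = 1.0  # orthogonal encoding, as in the negative result
    return v

def make_pairs(pool):
    X, y = [], []
    for a in pool:
        for b in pool:
            X.append(np.concatenate([one_hot(a), one_hot(b)]))
            y.append(int(a == b))  # identity effect: are the two components equal?
    return np.array(X), np.array(y)

X_train, y_train = make_pairs(train_letters)
X_test, y_test = make_pairs(test_letters)  # built entirely from held-out letters
print(X_train.shape, X_test.shape)
```

Because one-hot codes for held-out letters are orthogonal to everything seen in training, a model has no gradient signal tying them to the identity rule, which is the intuition behind the reported generalization failure.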