Tight Lower Bounds and Improved Convergence in Performative Prediction
Pedram J. Khorsandi
Rushil Gupta
Mehrnaz Mofakhami
Performative prediction is a framework accounting for the shift in the data distribution induced by the prediction of a model deployed in the real world. Ensuring rapid convergence to a stable solution where the data distribution remains the same after the model deployment is crucial, especially in evolving environments. This paper extends the Repeated Risk Minimization (RRM) framework by utilizing historical datasets from previous retraining snapshots, yielding a class of algorithms that we call Affine Risk Minimizers and enabling convergence to a performatively stable point for a broader class of problems. We introduce a new upper bound for methods that use only the final iteration of the dataset and prove for the first time the tightness of both this new bound and the previously existing bounds within the same regime. We also prove that utilizing historical datasets can surpass the lower bound for last-iterate RRM, and empirically observe faster convergence to the stable point on various performative prediction benchmarks. We offer at the same time the first lower bound analysis for RRM within the class of Affine Risk Minimizers, quantifying the potential improvements in convergence speed that could be achieved with other variants in our framework.
Beta cells are essential drivers of pancreatic ductal adenocarcinoma development
Cathy C. Garcia
Aarthi Venkat
Daniel C. McQuaid
Sherry Agabiti
Alex Tong
Rebecca L. Cardone
Rebecca Starble
Akin Sogunro
Jeremy B. Jacox
Christian F. Ruiz
Richard G. Kibbey
Mandar Deepak Muzumdar
Pancreatic endocrine-exocrine crosstalk plays a key role in normal physiology and disease. For instance, endocrine islet beta (β) cell secretion of insulin or cholecystokinin (CCK) promotes progression of pancreatic adenocarcinoma (PDAC), an exocrine cell-derived tumor. However, the cellular and molecular mechanisms that govern endocrine-exocrine signaling in tumorigenesis remain incompletely understood. We find that β cell ablation impedes PDAC development in mice, arguing that the endocrine pancreas is critical for exocrine tumorigenesis. Conversely, obesity induces β cell hormone dysregulation, alters CCK-dependent peri-islet exocrine cell transcriptional states, and enhances islet-proximal tumor formation. Single-cell RNA-sequencing, in silico latent-space archetypal and trajectory analysis, and genetic lineage tracing in vivo reveal that obesity stimulates postnatal immature β cell expansion and adaptation towards a pro-tumorigenic CCK+ state via JNK/cJun stress-responsive signaling. These results define endocrine-exocrine signaling as a driver of PDAC development and uncover new avenues to target the endocrine pancreas to subvert exocrine tumorigenesis.
Improved Localized Machine Unlearning Through the Lens of Memorization
Reihaneh Torkzadehmahani
Reza Nasirigerdeh
Georgios Kaissis
Daniel Rueckert
Eleni Triantafillou
Machine unlearning refers to efficiently removing the influence of a specified subset of training data from a machine learning model after it has already been trained. This is important for key applications, including making the model more accurate by removing outdated, mislabeled, or poisoned data. In this work, we study localized unlearning, where the unlearning algorithm operates on a (small) identified subset of parameters. Drawing inspiration from the memorization literature, we propose an improved localization strategy that yields strong results when paired with existing unlearning algorithms. We also propose a new unlearning algorithm, Deletion by Example Localization (DEL), that resets the parameters deemed most critical according to our localization strategy, and then finetunes them. Our extensive experiments on different datasets, forget sets, and metrics reveal that DEL sets a new state of the art for unlearning metrics, against both localized and full-parameter methods, while modifying a small subset of parameters, and also outperforms the state-of-the-art localized unlearning method in terms of test accuracy.
Improving Text-to-Image Consistency via Automatic Prompt Optimization
Oscar Mañas
Pietro Astolfi
Melissa Hall
Candace Ross
Jack Urbanek
Adina Williams
Michal Drozdzal
MiRGraph: A hybrid deep learning approach to identify microRNA-target interactions by integrating heterogeneous regulatory network and genomic sequences
Pei Liu
Ying Liu
Jiawei Luo
Insect Identification in the Wild: The AMI Dataset
Aditya Jain
Fagner Cunha
M. J. Bunsen
Juan Sebastián Cañas
L. Pasi
N. Pinoy
Flemming Helsing
JoAnne Russo
Marc Botham
Michael Sabourin
Jonathan Fréchette
Alexandre Anctil
Yacksecari Lopez
Eduardo Navarro
Filonila Perez Pimentel
Ana Cecilia Zamora
José Alejandro Ramirez Silva
Jonathan Gagnon
Tom August
K. Bjerge
Alba Gomez Segura
Marc Bélisle
Yves Basset
K. P. McFarland
David Roy
Toke Thomas Høye
Maxim Larrivée
Insects represent half of all global biodiversity, yet many of the world's insects are disappearing, with severe implications for ecosystems and agriculture. Despite this crisis, data on insect diversity and abundance remain woefully inadequate, due to the scarcity of human experts and the lack of scalable tools for monitoring. Ecologists have started to adopt camera traps to record and study insects, and have proposed computer vision algorithms as an answer for scalable data processing. However, insect monitoring in the wild poses unique challenges that have not yet been addressed within computer vision, including the combination of long-tailed data, extremely similar classes, and significant distribution shifts. We provide the first large-scale machine learning benchmarks for fine-grained insect recognition, designed to match real-world tasks faced by ecologists. Our contributions include a curated dataset of images from citizen science platforms and museums, and an expert-annotated dataset drawn from automated camera traps across multiple continents, designed to test out-of-distribution generalization under field conditions. We train and evaluate a variety of baseline algorithms and introduce a combination of data augmentation techniques that enhance generalization across geographies and hardware setups.
ProGRes: Prompted Generative Rescoring on ASR n-Best
Ada Defne Tur
Adel Moumen
The Landscape of Causal Discovery Data: Grounding Causal Discovery in Real-World Applications
Philippe Brouillard
Chandler Squires
Jonas Wahl
Konrad P. Kording
Karen Sachs
Causal discovery aims to automatically uncover causal relationships from data, a capability with significant potential across many scientific disciplines. However, its real-world applications remain limited. Current methods often rely on unrealistic assumptions and are evaluated only on simple synthetic toy datasets, often with inadequate evaluation metrics. In this paper, we substantiate these claims by performing a systematic review of the recent causal discovery literature. We present applications in biology, neuroscience, and Earth sciences - fields where causal discovery holds promise for addressing key challenges. We highlight available simulated and real-world datasets from these domains and discuss common assumption violations that have spurred the development of new methods. Our goal is to encourage the community to adopt better evaluation practices by utilizing realistic datasets and more adequate metrics.