Publications

Interacting Diffusion Processes for Event Sequence Forecasting

Mai Zeng

Florence Regol

Neural Temporal Point Processes (TPPs) have emerged as the primary framework for predicting sequences of events that occur at irregular time… (see more) intervals, but their sequential nature can hamper performance for long-horizon forecasts. To address this, we introduce a novel approach that incorporates a diffusion generative model. The model facilitates sequence-to-sequence prediction, allowing multi-step predictions based on historical event sequences. In contrast to previous approaches, our model directly learns the joint probability distribution of types and inter-arrival times for multiple events. The model is composed of two diffusion processes, one for the time intervals and one for the event types. These processes interact through their respective denoising functions, which can take as input intermediate representations from both processes, allowing the model to learn complex interactions. We demonstrate that our proposal outperforms state-of-the-art baselines for long-horizon forecasting of TPPs.

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

openreview.net

Layerwise Proximal Replay: A Proximal Point Method for Online Continual Learning

Jinsoo Yoo

Yunpeng Liu

Frank N. Wood

Geoff Pleiss

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

openreview.net

Leveraging Transformers for Weakly Supervised Object Localization in Unconstrained Videos

Shakeeb Murtaza

Marco Pedersoli

Aydin Sarraf

Eric Granger

Weakly-Supervised Video Object Localization (WSVOL) involves localizing an object in videos using only video-level labels, also referred to … (see more)as tags. State-of-the-art WSVOL methods like Temporal CAM (TCAM) rely on class activation mapping (CAM) and typically require a pre-trained CNN classifier. However, their localization accuracy is affected by their tendency to minimize the mutual information between different instances of a class and exploit temporal information during training for downstream tasks, e.g., detection and tracking. In the absence of bounding box annotation, it is challenging to exploit precise information about objects from temporal cues because the model struggles to locate objects over time. To address these issues, a novel method called transformer based CAM for videos (TrCAM-V), is proposed for WSVOL. It consists of a DeiT backbone with two heads for classification and localization. The classification head is trained using standard classification loss (CL), while the localization head is trained using pseudo-labels that are extracted using a pre-trained CLIP model. From these pseudo-labels, the high and low activation values are considered to be foreground and background regions, respectively. Our TrCAM-V method allows training a localization network by sampling pseudo-pixels on the fly from these regions. Additionally, a conditional random field (CRF) loss is employed to align the object boundaries with the foreground map. During inference, the model can process individual frames for real-time localization applications. Extensive experiments on challenging YouTube-Objects unconstrained video datasets show that our TrCAM-V method achieves new state-of-the-art performance in terms of classification and localization accuracy.

2024-07-08

ArXiv (preprint)

doi.org

arxiv.org

Listenable Maps for Audio Classifiers

Francesco Paissan

Mirco Ravanelli

Cem Subakan

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

openreview.net

Lookbehind-SAM: k steps back, 1 step forward

Goncalo Mordido

Pranshu Malviya

Aristide Baratin

Sarath Chandar

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

proceedings.mlr.press

openreview.net

Memory Efficient Neural Processes via Constant Memory Attention Block

Leo Feng

Frederick Tung

Hossein Hajimirsadeghi

Yoshua Bengio

Mohamed Osama Ahmed

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

proceedings.mlr.press

openreview.net

Modeling Caption Diversity in Contrastive Vision-Language Pretraining

Samuel Lavoie

Polina Kirichenko

Mark Ibrahim

Mahmoud Assran

Andrew Gordon Wilson

Aaron Courville

Nicolas Ballas

There are a thousand ways to caption an image. Contrastive Language Pretraining (CLIP) on the other hand, works by mapping an image and its … (see more)caption to a single vector -- limiting how well CLIP-like models can represent the diverse ways to describe an image. In this work, we introduce Llip, Latent Language Image Pretraining, which models the diversity of captions that could match an image. Llip's vision encoder outputs a set of visual features that are mixed into a final representation by conditioning on information derived from the text. We show that Llip outperforms non-contextualized baselines like CLIP and SigLIP on a variety of tasks even with large-scale encoders. Llip improves zero-shot classification by an average of 2.9\% zero-shot classification benchmarks with a ViT-G/14 encoder. Specifically, Llip attains a zero-shot top-1 accuracy of 83.5\% on ImageNet outperforming a similarly sized CLIP by 1.4\%. We also demonstrate improvement on zero-shot retrieval on MS-COCO by 6.0\%. We provide a comprehensive analysis of the components introduced by the method and demonstrate that Llip leads to richer visual representations.

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

openreview.net

Nearest Neighbour Score Estimators for Diffusion Generative Models

Matthew Niedoba

Dylan Green

Saeid Naderiparizi

Vasileios Lioutas

Jonathan Wilder Lavington

Xiaoxuan Liang

Yunpeng Liu

Ke Zhang

Setareh Dabiri

Adam Ścibior

Berend Zwartsenberg

Frank N. Wood

Score function estimation is the cornerstone of both training and sampling from diffusion generative models. Despite this fact, the most com… (see more)monly used estimators are either biased neural network approximations or high variance Monte Carlo estimators based on the conditional score. We introduce a novel nearest neighbour score function estimator which utilizes multiple samples from the training set to dramatically decrease estimator variance. We leverage our low variance estimator in two compelling applications. Training consistency models with our estimator, we report a significant increase in both convergence speed and sample quality. In diffusion models, we show that our estimator can replace a learned network for probability-flow ODE integration, opening promising new avenues of future research. Code will be released upon paper acceptance.

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

openreview.net

A Persuasive Approach to Combating Misinformation

Safwan Hossain

Andjela Mladenovic

Yiling Chen

Gauthier Gidel

Bayesian Persuasion is proposed as a tool for social media platforms to combat the spread of misinformation. Since platforms can use machine… (see more) learning to predict the popularity and misinformation features of to-be-shared posts, and users are largely motivated to share popular content, platforms can strategically signal this informational advantage to change user beliefs and persuade them not to share misinformation. We characterize the optimal signaling scheme with imperfect predictions as a linear program and give sufficient and necessary conditions on the classifier to ensure optimal platform utility is non-decreasing and continuous. Next, this interaction is considered under a performative model, wherein platform intervention affects the user's future behaviour. The convergence and stability of optimal signaling under this performative process are fully characterized. Lastly, we experimentally validate that our approach significantly reduces misinformation in both the single round and performative setting.

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

openreview.net

Position: Cracking the Code of Cascading Disparity Towards Marginalized Communities

Golnoosh Farnadi

Mohammad Havaei

Negar Rostamzadeh

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

openreview.net

Quantitative Analysis of Miniature Synaptic Calcium Transients Using Positive Unlabeled Deep Learning

Frédéric Beaupré

Anthony Bilodeau

Theresa Wiesner

Gabriel Leclerc

Mado Lemieux

Gabriel Nadeau

Katrine Castonguay

Bolin Fan

Simon Labrecque

Renée Hložek

Paul De Koninck

Christian Gagné

Flavie Lavoie-Cardinal

Ca2+ imaging methods are widely used for studying cellular activity in the brain, allowing detailed analysis of dynamic processes across var… (see more)ious scales. Enhanced by high-contrast optical microscopy and fluorescent Ca2+ sensors, this technique can be used to reveal localized Ca2+ fluctuations within neurons, including in sub-cellular compartments, such as the dendritic shaft or spines. Despite advances in Ca2+ sensors, the analysis of miniature Synaptic Calcium Transients (mSCTs), characterized by variability in morphology and low signal-to-noise ratios, remains challenging. Traditional threshold-based methods struggle with the detection and segmentation of these small, dynamic events. Deep learning (DL) approaches offer promising solutions but are limited by the need for large annotated datasets. Positive Unlabeled (PU) learning addresses this limitation by leveraging unlabeled instances to increase dataset size and enhance performance. This approach is particularly useful in the case of mSCTs that are scarce and small, associated with a very small proportion of the foreground pixels. PU learning significantly increases the effective size of the training dataset, improving model performance. Here, we present a PU learning-based strategy for detecting and segmenting mSCTs. We evaluate the performance of two 3D deep learning models, StarDist-3D and 3D U-Net, which are well established for the segmentation of small volumetric structures in microscopy datasets. By integrating PU learning, we enhance the 3D U-Net’s performance, demonstrating significant gains over traditional methods. This work pioneers the application of PU learning in Ca2+ imaging analysis, offering a robust framework for mSCT detection and segmentation. We also demonstrate how this quantitative analysis pipeline can be used for subsequent mSCTs feature analysis. We characterize morphological and kinetic changes of mSCTs associated with the application of chemical long-term potentiation (cLTP) stimulation in cultured rat hippocampal neurons. Our data-driven approach shows that a cLTP-inducing stimulus leads to the emergence of new active dendritic regions and differently affects mSCTs subtypes.

2024-07-08

bioRxiv (preprint)

doi.org

Randomized Confidence Bounds for Stochastic Partial Monitoring

Maxime Heuillet

Ola Ahmad

Audrey Durand

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

openreview.net