
Christos Tsirigotis

PhD - Université de Montréal
Supervisor
Research Topics
Deep Learning
Generative Models
Probabilistic Models
Representation Learning

Publications

BiXSE: Improving Dense Retrieval via Probabilistic Graded Relevance Distillation
Joao Monteiro
Perouz Taslakian
Neural sentence embedding models for dense retrieval typically rely on binary relevance labels, treating query-document pairs as either relevant or irrelevant. However, real-world relevance often exists on a continuum, and recent advances in large language models (LLMs) have made it feasible to scale the generation of fine-grained graded relevance labels. In this work, we propose BiXSE, a simple and effective pointwise training method that optimizes binary cross-entropy (BCE) over LLM-generated graded relevance scores. BiXSE interprets these scores as probabilistic targets, enabling granular supervision from a single labeled query-document pair per query. Unlike pairwise or listwise losses that require multiple annotated comparisons per query, BiXSE achieves strong performance with reduced annotation and compute costs by leveraging in-batch negatives. Extensive experiments across sentence embedding (MMTEB) and retrieval benchmarks (BEIR, TREC-DL) show that BiXSE consistently outperforms softmax-based contrastive learning (InfoNCE), and matches or exceeds strong pairwise ranking baselines when trained on LLM-supervised data. BiXSE offers a robust, scalable alternative for training dense retrieval models as graded relevance supervision becomes increasingly accessible.
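A minimal sketch of the pointwise objective described in the abstract, assuming a bi-encoder that scores every query against every in-batch document; the function name, the temperature, and the choice to give off-diagonal (in-batch negative) pairs a target of 0 are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def bixse_style_loss(query_emb, doc_emb, graded_labels, temperature=0.05):
    """Pointwise BCE over graded relevance targets with in-batch negatives.

    query_emb:     (B, d) query embeddings
    doc_emb:       (B, d) document embeddings; doc_emb[i] is the labeled doc for query i
    graded_labels: (B,)   LLM-generated graded relevance scores rescaled to [0, 1]
    """
    # Score every query against every document in the batch.
    logits = query_emb @ doc_emb.T / temperature            # (B, B)

    # Diagonal pairs receive their graded (probabilistic) target;
    # off-diagonal in-batch pairs are treated as irrelevant (target 0).
    targets = torch.diag(graded_labels.float())             # (B, B)

    # Binary cross-entropy applied pointwise to every query-document pair.
    return F.binary_cross_entropy_with_logits(logits, targets)
```

The key point is that the diagonal targets are graded probabilities in [0, 1] rather than hard 0/1 labels, so a single labeled pair per query already provides granular supervision.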
FLAM: Frame-Wise Language-Audio Modeling
Ke Chen
Oriol Nieto
Prem Seetharaman
Justin Salamon
Group Robust Classification Without Any Group Information
Joao Monteiro
Pau Rodriguez
David Vazquez
A General Framework For Proving The Equivariant Strong Lottery Ticket Hypothesis
The Strong Lottery Ticket Hypothesis (SLTH) stipulates the existence of a subnetwork within a sufficiently overparameterized (dense) neural network that -- when initialized randomly and without any training -- achieves the accuracy of a fully trained target network. Recent works by Da Cunha et al. (2022) and Burkholz (2022) demonstrate that the SLTH can be extended to translation-equivariant networks -- i.e., CNNs -- with the same level of overparametrization as needed for SLTs in dense networks. However, modern neural networks can incorporate more than just translation symmetry, and developing architectures equivariant to more general symmetries, such as rotations and permutations, has been a powerful design principle. In this paper, we generalize the SLTH to functions that preserve the action of the group…
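For reference, the standard notion of group equivariance that the abstract appeals to can be stated as follows; the notation is the usual one and is not copied from the paper.

```latex
% Standard definition of G-equivariance: a network f commutes with the
% group action,
\[
  f\bigl(\rho_{\mathrm{in}}(g)\,x\bigr) \;=\; \rho_{\mathrm{out}}(g)\,f(x)
  \qquad \forall\, g \in G,\ \forall\, x,
\]
% where \rho_{in} and \rho_{out} are representations of G on the input
% and output spaces. Translation equivariance of CNNs is the special
% case where G is the group of (discrete) translations.
```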
Simplicial Embeddings in Self-Supervised Learning and Downstream Classification
Samuel Lavoie
Max Schwarzer
Kenji Kawaguchi
Ankit Vani
Simplicial Embeddings (SEM) are representations learned through self-supervised learning (SSL), wherein a representation is projected into …
Convex Potential Flows: Universal Probability Distributions with Optimal Transport and Convex Optimization
Chin-Wei Huang
Ricky T. Q. Chen
Flow-based models are powerful tools for designing probabilistic models with tractable density. This paper introduces Convex Potential Flows (CP-Flow), a natural and efficient parameterization of invertible models inspired by optimal transport (OT) theory. CP-Flows are the gradient map of a strongly convex neural potential function. The convexity implies invertibility and allows us to resort to convex optimization to solve the convex conjugate for efficient inversion. To enable maximum likelihood training, we derive a new gradient estimator of the log-determinant of the Jacobian, which involves solving an inverse-Hessian vector product using the conjugate gradient method. The gradient estimator has constant-memory cost, and can be made effectively unbiased by reducing the error tolerance level of the convex optimization routine. Theoretically, we prove that CP-Flows are universal density approximators and are optimal in the OT sense. Our empirical results show that CP-Flow performs competitively on standard benchmarks of density estimation and variational inference.
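The change-of-variables identity behind this construction can be written out as follows; this is the standard formula implied by the abstract (the flow is the gradient of a strongly convex potential), not an excerpt from the paper.

```latex
% With a strongly convex potential F, the flow is its gradient map
% f(x) = \nabla F(x), whose Jacobian is the positive-definite Hessian
% \nabla^2 F(x). The log-likelihood under the flow is therefore
\[
  \log p_X(x) \;=\; \log p_Z\bigl(\nabla F(x)\bigr)
              \;+\; \log\det \nabla^2 F(x).
\]
% Maximum-likelihood training needs the gradient of the log-det term,
% which the abstract says is estimated via inverse-Hessian vector
% products solved with the conjugate-gradient method.
```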
A Walk with SGD
Chen Xing
Devansh Arpit
Exploring why stochastic gradient descent (SGD) based optimization methods train deep neural networks (DNNs) that generalize well has become an active area of research. Towards this end, we empirically study the dynamics of SGD when training over-parametrized DNNs. Specifically, we study the DNN loss surface along the trajectory of SGD by interpolating the loss surface between parameters from consecutive iterations and tracking various metrics during training. We find that the loss interpolation between parameters before and after a training update is roughly convex with a minimum (the "valley floor") in between for most of the training. Based on this and other metrics, we deduce that during most of the training, SGD explores regions in a valley by bouncing off valley walls at a height above the valley floor. This 'bouncing off walls at a height' mechanism helps SGD traverse larger distances for small batch sizes and large learning rates, which we find play qualitatively different roles in the dynamics. While a large learning rate maintains a large height from the valley floor, a small batch size injects noise facilitating exploration. We find this mechanism is crucial for generalization because the valley floor has barriers, and this exploration above the valley floor allows SGD to quickly travel far away from the initialization point (without being affected by barriers) and find flatter regions, corresponding to better generalization.
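A small sketch of the interpolation diagnostic described in the abstract, i.e. evaluating the training loss on the straight line between two consecutive SGD iterates; the function signature and the use of a single mini-batch are assumptions for illustration, not the authors' exact protocol.

```python
import copy
import torch

@torch.no_grad()
def loss_along_segment(model, loss_fn, batch, params_t, params_t1, n_points=11):
    """Evaluate the training loss along the line between two consecutive SGD
    iterates theta_t and theta_{t+1}, given as lists of parameter tensors."""
    probe = copy.deepcopy(model)
    inputs, targets = batch
    losses = []
    for alpha in torch.linspace(0.0, 1.0, n_points):
        # theta(alpha) = (1 - alpha) * theta_t + alpha * theta_{t+1}
        for p, a, b in zip(probe.parameters(), params_t, params_t1):
            p.copy_((1 - alpha) * a + alpha * b)
        losses.append(loss_fn(probe(inputs), targets).item())
    return losses  # roughly convex with a "valley floor" minimum for most of training
```

Plotting the returned losses against the interpolation coefficient is what reveals the roughly convex, valley-floor-shaped profile the abstract describes.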