Portrait de Mark Coates

Mark Coates

Membre académique associé
Professeur agrégé, McGill University, Département de génie électrique et informatique

Biographie

Mark Coates est professeur au Département de génie électrique et informatique de l'Université McGill, auquel il s’est joint en 2002. Il a obtenu une licence en génie des systèmes informatiques de l'Université d'Adélaïde (Australie) en 1995 et un doctorat en génie de l'information de l'Université de Cambridge (Royaume-Uni) en 1999. Il a été associé de recherche et conférencier à l'Université Rice, au Texas, de 1999 à 2001. En 2012-2013, il a travaillé en tant que scientifique principal chez Winton Capital Management à Oxford, au Royaume-Uni. Il a assumé de multiples rôles éditoriaux, notamment en tant que rédacteur principal pour IEEE Signal Processing Letters, rédacteur associé pour IEEE Transactions on Signal Processing et rédacteur associé pour IEEE Transactions on Signal and Information Processing over Networks. Les recherches de Mark Coates portent sur l'apprentissage automatique et le traitement statistique des signaux, l'inférence bayésienne et Monte Carlo, et l'apprentissage sur les graphes et les réseaux. Ses contributions les plus influentes et les plus citées concernent la tomographie des réseaux et le filtrage distribué des particules.

Étudiants actuels

Publications

Neural Graph Generation from Graph Statistics.
Kiarash Zahirnia
Yaochen Hu
Oliver Schulte
Bidirectional Learning for Offline Model-based Biological Sequence Design
Can Chen
Yingxue Zhang
Evaluation of Categorical Generative Models - Bridging the Gap Between Real and Synthetic Data
Florence Regol
Anja Kroon
The machine learning community has mainly relied on real data to benchmark algorithms as it provides compelling evidence of model applicabil… (voir plus)ity. Evaluation on synthetic datasets can be a powerful tool to provide a better understanding of a model’s strengths, weaknesses and overall capabilities. Gaining these insights can be particularly important for generative modeling as the target quantity is completely unknown. Multiple issues related to the evaluation of generative models have been reported in the literature. We argue those problems can be avoided by an evaluation based on ground truth. General criticisms of synthetic experiments are that they are too simplified and not representative of practical scenarios. As such, our experimental setting is tailored to a realistic generative task. We focus on categorical data and introduce an appropriately scalable evaluation method. Our method involves tasking a generative model to learn a distribution in a high-dimensional setting. We then successively bin the large space to obtain smaller probability spaces where meaningful statistical tests can be applied. We consider increasingly large probability spaces, which correspond to increasingly difficult modeling tasks, and compare the generative models based on the highest task difficulty they can reach before being detected as being too far from the ground truth. We validate our evaluation procedure with synthetic experiments on both synthetic generative models and current state-of-the-art categorical generative models.
Intent-aware Multi-source Contrastive Alignment for Tag-enhanced Recommendation
Haolun Wu
Yingxue Zhang
Chen Ma
Wei Guo
Ruiming Tang
To offer accurate and diverse recommendation services, recent methods use auxiliary information to foster the learning process of user and i… (voir plus)tem representations. Many state-of-the-art (SOTA) methods fuse different sources of information (user, item, knowledge graph, tags, etc.) into a graph and use Graph Neural Networks (GNNs) to introduce the auxiliary information through the message passing paradigm. In this work, we seek an alternative framework that is light and effective through self-supervised learning across different sources of information, particularly for the commonly accessible item tag information. We use a self-supervision signal to pair users with the auxiliary information (tags) associated with the items they have interacted with before. To achieve the pairing, we create a proxy training task. For a given item, the model predicts which is the correct pairing between the representations obtained from the users that have interacted with this item and the tags assigned to it. This design provides an efficient solution, using the auxiliary information directly to enhance the quality of user and item embeddings. User behavior in recommendation systems is driven by the complex interactions of many factors behind the users’ decision-making processes. To make the pairing process more fine-grained and avoid embedding collapse, we propose a user intent-aware self-supervised pairing process where we split the user embeddings into multiple sub-embedding vectors. Each sub-embedding vector captures a specific user intent via self-supervised alignment with a particular cluster of tags. We integrate our designed framework with various recommendation models, demonstrating its flexibility and compatibility. Through comparison with numerous SOTA methods on seven real-world datasets, we show that our method can achieve better performance while requiring less training time. This indicates the potential of applying our approach on web-scale datasets.
Graph Inductive Biases in Transformers without Message Passing
Liheng Ma
Chen Lin
Derek Lim
Puneet K. Dokania
Philip Torr
Ser-Nam Lim
Transformers for graph data are increasingly widely studied and successful in numerous learning tasks. Graph inductive biases are crucial fo… (voir plus)r Graph Transformers, and previous works incorporate them using message-passing modules and/or positional encodings. However, Graph Transformers that use message-passing inherit known issues of message-passing, and differ significantly from Transformers used in other domains, thus making transfer of research advances more difficult. On the other hand, Graph Transformers without message-passing often perform poorly on smaller datasets, where inductive biases are more crucial. To bridge this gap, we propose the Graph Inductive bias Transformer (GRIT) -- a new Graph Transformer that incorporates graph inductive biases without using message passing. GRIT is based on several architectural changes that are each theoretically and empirically justified, including: learned relative positional encodings initialized with random walk probabilities, a flexible attention mechanism that updates node and node-pair representations, and injection of degree information in each layer. We prove that GRIT is expressive -- it can express shortest path distances and various graph propagation matrices. GRIT achieves state-of-the-art empirical performance across a variety of graph datasets, thus showing the power that Graph Transformers without message-passing can deliver.
Graph Inductive Biases in Transformers without Message Passing
Liheng Ma
Chen Lin
Derek Lim
Puneet K. Dokania
Philip Torr
Ser-Nam Lim
Transformers for graph data are increasingly widely studied and successful in numerous learning tasks. Graph inductive biases are crucial fo… (voir plus)r Graph Transformers, and previous works incorporate them using message-passing modules and/or positional encodings. However, Graph Transformers that use message-passing inherit known issues of message-passing, and differ significantly from Transformers used in other domains, thus making transfer of research advances more difficult. On the other hand, Graph Transformers without message-passing often perform poorly on smaller datasets, where inductive biases are more crucial. To bridge this gap, we propose the Graph Inductive bias Transformer (GRIT) — a new Graph Transformer that incorporates graph inductive biases without using message passing. GRIT is based on several architectural changes that are each theoretically and empirically justified, including: learned relative positional encodings initialized with random walk probabilities, a flexible attention mechanism that updates node and node-pair representations, and injection of degree information in each layer. We prove that GRIT is expressive — it can express shortest path distances and various graph propagation matrices. GRIT achieves state-of-the-art empirical performance across a variety of graph datasets, thus showing the power that Graph Transformers without message-passing can deliver.
Neighbor Auto-Grouping Graph Neural Networks for Handover Parameter Configuration in Cellular Network
Mehrtash Mehrabi
Walid Masoudimansour
Yingxue Zhang
Jie Chuai
Zhitang Chen
Jianye Hao
Yanhui. Geng
Adapting Triplet Importance of Implicit Feedback for Personalized Recommendation
Haolun Wu
Chen Ma
Yingxue Zhang
Ruiming Tang
TIE: A Framework for Embedding-based Incremental Temporal Knowledge Graph Completion
Jiapeng Wu
Yishi Xu
Yingxue Zhang
Chen Ma
Reasoning in a temporal knowledge graph (TKG) is a critical task for information retrieval and semantic search. It is particularly challengi… (voir plus)ng when the TKG is updated frequently. The model has to adapt to changes in the TKG for efficient training and inference while preserving its performance on historical knowledge. Recent work approaches TKG completion (TKGC) by augmenting the encoder-decoder framework with a time-aware encoding function. However, naively fine-tuning the model at every time step using these methods does not address the problems of 1) catastrophic forgetting, 2) the model's inability to identify the change of facts (e.g., the change of the political affiliation and end of a marriage), and 3) the lack of training efficiency. To address these challenges, we present the Time-aware Incremental Embedding (TIE) framework, which combines TKG representation learning, experience replay, and temporal regularization. We introduce a set of metrics that characterizes the intransigence of the model and propose a constraint that associates the deleted facts with negative labels. Experimental results on Wikidata12k and YAGO11k datasets demonstrate that the proposed TIE framework reduces training time by about ten times and improves on the proposed metrics compared to vanilla full-batch training. It comes without a significant loss in performance for any traditional measures. Extensive ablation studies reveal performance trade-offs among different evaluation metrics, which is essential for decision-making around real-world TKG applications.
RNN with Particle Flow for Probabilistic Spatio-temporal Forecasting
Soumyasundar Pal
Liheng Ma
Yingxue Zhang
Spatio-temporal forecasting has numerous applications in analyzing wireless, traffic, and financial networks. Many classical statistical mod… (voir plus)els often fall short in handling the complexity and high non-linearity present in time-series data. Recent advances in deep learning allow for better modelling of spatial and temporal dependencies. While most of these models focus on obtaining accurate point forecasts, they do not characterize the prediction uncertainty. In this work, we consider the time-series data as a random realization from a nonlinear state-space model and target Bayesian inference of the hidden states for probabilistic forecasting. We use particle flow as the tool for approximating the posterior distribution of the states, as it is shown to be highly effective in complex, high-dimensional settings. Thorough experimentation on several real world time-series datasets demonstrates that our approach provides better characterization of uncertainty while maintaining comparable accuracy to the state-of-the art point forecasting methods.
A Multisensor Multi-Bernoulli Filter
Augustin A. Saucan
In this paper, we derive a multisensor multi-Bernoulli (MS-MeMBer) filter for multitarget tracking. Measurements from multiple sensors are e… (voir plus)mployed by the proposed filter to update a set of tracks modeled as a multi-Bernoulli random finite set. An exact implementation of the MS-MeMBer update procedure is computationally intractable. We propose an efficient approximate implementation by using a greedy measurement partitioning mechanism. The proposed filter allows for Gaussian mixture or particle filter implementations. Numerical simulations conducted for both linear-Gaussian and nonlinear models highlight the improved accuracy of the MS-MeMBer filter and its reduced computational load with respect to the multisensor cardinalized probability hypothesis density filter and the iterated-corrector cardinality-balanced multi-Bernoulli filter especially for low probabilities of detection.
Optimization and Analysis of Distributed Averaging With Short Node Memory
Boris Oreshkin
Distributed averaging describes a class of network algorithms for the decentralized computation of aggregate statistics. Initially, each nod… (voir plus)e has a scalar data value, and the goal is to compute the average of these values at every node (the so-called average consensus problem). Nodes iteratively exchange information with their neighbors and perform local updates until the value at every node converges to the initial network average. Much previous work has focused on algorithms where each node maintains and updates a single value; every time an update is performed, the previous value is forgotten. Convergence to the average consensus is achieved asymptotically. The convergence rate is fundamentally limited by network connectivity, and it can be prohibitively slow on topologies such as grids and random geometric graphs, even if the update rules are optimized. In this paper, we provide the first theoretical demonstration that adding a local prediction component to the update rule can significantly improve the convergence rate of distributed averaging algorithms. We focus on the case where the local predictor is a linear combination of the node's current and previous values (i.e., two memory taps), and our update rule computes a combination of the predictor and the usual weighted linear combination of values received from neighboring nodes. We derive the optimal mixing parameter for combining the predictor with the neighbors' values, and conduct a theoretical analysis of the improvement in convergence rate that can be achieved using this acceleration methodology. For a chain topology on N nodes, this leads to a factor of N improvement over standard consensus, and for a two-dimensional grid, our approach achieves a factor of ¿N improvement.