
Toby Dylan Hocking

Associate Academic Member
Associate Professor, Université de Sherbrooke, Department of Computer Science
Research Topics
Medical Machine Learning
Deep Learning
Computational Biology
Data Mining
Optimization
Computer Vision

Biography

Originally from California and educated at Berkeley, Toby Dylan Hocking earned his PhD in mathematics (machine learning) from the École normale supérieure de Cachan (Paris, France) in 2012. He worked as a postdoc in Masashi Sugiyama's machine learning lab at Tokyo Tech in 2013, and in Guillaume Bourque's genomics lab at McGill University.

He was a tenure-track assistant professor at Northern Arizona University for five years, and today he is a tenured associate professor at the Université de Sherbrooke, where he leads the LASSO (Learning Algorithms, Statistical Software, Optimization) research lab. Toby is also an Associate Academic Member of Mila - Quebec Artificial Intelligence Institute.

He is the author of dozens of R packages and has published more than 50 peer-reviewed research papers on machine learning and statistical software. He has mentored more than 30 students in research projects, as well as more than 30 open-source software contributors through the R project in Google Summer of Code.

Publications

Cross-validation for training and testing co-occurrence network inference algorithms
Daniel Agyapong
Jeffrey Ryan Propster
Jane Marks
Interval Regression: A Comparative Study with Proposed Models
Tung L. Nguyen
Regression models are essential for a wide range of real-world applications. However, in practice, target values are not always precisely known; instead, they may be represented as intervals of acceptable values. This challenge has led to the development of Interval Regression models. In this study, we provide a comprehensive review of existing Interval Regression models and introduce alternative models for comparative analysis. Experiments are conducted on both real-world and synthetic datasets to offer a broad perspective on model performance. The results demonstrate that no single model is universally optimal, highlighting the importance of selecting the most suitable model for each specific scenario.
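To make the interval-target setting concrete, here is a minimal sketch of one classical approach: a linear model trained with a squared hinge loss that is zero whenever the prediction falls inside the target interval. This illustrates the problem setting only, not any of the specific models benchmarked in the paper; all names, data, and hyperparameters below are assumptions.

```python
import numpy as np

def interval_loss(pred, lo, hi):
    """Squared hinge loss: zero when pred lies inside [lo, hi].

    lo may be -inf and hi may be +inf, which covers censored targets."""
    return np.maximum(0, lo - pred) ** 2 + np.maximum(0, pred - hi) ** 2

def fit_linear_interval_regression(X, lo, hi, lr=0.01, epochs=1000):
    """Illustrative linear model fit by gradient descent on the mean loss."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(epochs):
        pred = X @ w + b
        # gradient of the squared hinge loss with respect to pred
        grad = 2 * (np.maximum(0, pred - hi) - np.maximum(0, lo - pred))
        w -= lr * (X.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

# toy data: each target is known only up to an interval of acceptable values
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5])
lo, hi = y - 0.5, y + 0.5
w, b = fit_linear_interval_regression(X, lo, hi)
```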
Efficient line search for optimizing Area Under the ROC Curve in gradient descent
Jadon Fowler
Receiver Operating Characteristic (ROC) curves are useful for evaluation in binary classification and changepoint detection, but difficult to use for learning since the Area Under the Curve (AUC) is piecewise constant (gradient zero almost everywhere). Recently the Area Under Min (AUM) of false positive and false negative rates has been proposed as a differentiable surrogate for AUC. In this paper we study the piecewise linear/constant nature of the AUM/AUC, and propose new efficient path-following algorithms for choosing the learning rate which is optimal for each step of gradient descent (line search), when optimizing a linear model. Remarkably, our proposed line search algorithm has the same log-linear asymptotic time complexity as gradient descent with constant step size, but it computes a complete representation of the AUM/AUC as a function of step size. In our empirical study of binary classification problems, we verify that our proposed algorithm is fast and exact; in changepoint detection problems we show that the proposed algorithm is just as accurate as grid search, but faster.
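For intuition, the AUM quantity itself can be computed exactly in one pass over sorted scores, since the false positive and false negative rates are step functions of the decision threshold. The sketch below is illustrative only: it computes the surrogate, not the paper's path-following line-search algorithm, and it assumes both classes are present in the labels.

```python
import numpy as np

def area_under_min(scores, labels):
    """AUM: integral over thresholds c of min(FPR(c), FNR(c)).

    labels in {0, 1}; an example is predicted positive when score > c.
    FPR/FNR only change at the observed scores, so the integral
    reduces to a finite sum over consecutive sorted scores."""
    order = np.argsort(scores)
    s, y = scores[order], labels[order]
    n_pos, n_neg = y.sum(), (1 - y).sum()
    aum = 0.0
    fp = n_neg   # threshold below all scores: every negative is a FP
    fn = 0       # and no positive is missed
    for i in range(len(s) - 1):
        # moving the threshold past s[i] flips example i to "negative"
        if y[i] == 1:
            fn += 1
        else:
            fp -= 1
        aum += min(fp / n_neg, fn / n_pos) * (s[i + 1] - s[i])
    return aum

rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=20)
scores = rng.normal(size=20) + labels   # informative toy scores
print(area_under_min(scores, labels))
```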
SOAK: Same/Other/All K-fold cross-validation for estimating similarity of patterns in data subsets
Gabrielle Thibault
C. S. Bodine
Paul Nelson Arellano
Alexander F Shenkin
Olivia J. Lindly
In many real-world applications of machine learning, we are interested to know if it is possible to train on the data that we have gathered so far, and obtain accurate predictions on a new test data subset that is qualitatively different in some respect (time period, geographic region, etc). Another question is whether data subsets are similar enough so that it is beneficial to combine subsets during model training. We propose SOAK, Same/Other/All K-fold cross-validation, a new method which can be used to answer both questions. SOAK systematically compares models which are trained on different subsets of data, and then used for prediction on a fixed test subset, to estimate the similarity of learnable/predictable patterns in data subsets. We show results of using SOAK on six new real data sets (with geographic/temporal subsets, to check if predictions are accurate on new subsets), 3 image pair data sets (subsets are different image types, to check that we get smaller prediction error on similar images), and 11 benchmark data sets with predefined train/test splits (to check similarity of predefined splits).
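The comparison SOAK performs can be sketched as a loop over test subsets and folds, training on Same/Other/All rows and scoring them on a common test fold. The simplified version below uses a scikit-learn classifier as a stand-in for the models in the paper; the function name, toy data, and per-subset fold assignment are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

def soak_cv(X, y, subset, n_splits=3, seed=0):
    """For each test subset and fold, train on Same / Other / All
    training rows, then score all three models on the same test rows."""
    results = []
    for test_subset in np.unique(subset):
        idx = np.where(subset == test_subset)[0]
        kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
        for fold, (_, test_fold) in enumerate(kf.split(idx)):
            is_test = np.zeros(len(y), dtype=bool)
            is_test[idx[test_fold]] = True
            train_sets = {
                "same":  ~is_test & (subset == test_subset),
                "other": ~is_test & (subset != test_subset),
                "all":   ~is_test,
            }
            for name, train_mask in train_sets.items():
                model = LogisticRegression(max_iter=1000)
                model.fit(X[train_mask], y[train_mask])
                acc = accuracy_score(y[is_test], model.predict(X[is_test]))
                results.append((test_subset, fold, name, acc))
    return results

# toy example: two geographic subsets with related but different patterns
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))
subset = np.repeat(["north", "south"], 60)
y = (X[:, 0] + (subset == "south") * X[:, 1] > 0).astype(int)
for row in soak_cv(X, y, subset):
    print(row)
```

Comparing the "same" and "other" accuracies for each test subset indicates whether the subsets share learnable patterns; "all" indicates whether combining them during training helps.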
Finite Sample Complexity Analysis of Binary Segmentation
Binary segmentation is the classic greedy algorithm which recursively splits a sequential data set by optimizing some loss or likelihood function. Binary segmentation is widely used for changepoint detection in data sets measured over space or time, and as a sub-routine for decision tree learning. In theory it should be extremely fast for…
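As a reference point, a plain square-loss version of the algorithm fits in a few lines: each candidate segment stores its best split, and the split giving the largest loss decrease is taken greedily. This sketch is for illustration only and does not reflect the paper's complexity analysis.

```python
import numpy as np

def best_split(x):
    """Best single changepoint in segment x under the square loss.

    Returns (loss_decrease, split), where the second segment starts
    at index split. Cumulative sums evaluate all splits in O(n)."""
    n = len(x)
    csum = np.cumsum(x)
    total = csum[-1]
    before = np.arange(1, n)          # first-segment sizes 1..n-1
    after = n - before
    # square loss up to a constant: -(segment sum)^2 / (segment size)
    cost_split = -csum[:-1] ** 2 / before - (total - csum[:-1]) ** 2 / after
    cost_none = -total ** 2 / n
    i = int(np.argmin(cost_split))
    return cost_none - cost_split[i], i + 1

def binary_segmentation(x, n_changes):
    """Greedy binary segmentation: repeatedly split the segment whose
    best split most decreases the total square loss."""
    changepoints = []
    segments = {(0, len(x)): best_split(x)}   # (start, end) half-open
    for _ in range(n_changes):
        if not segments:
            break
        (s, e), (gain, split) = max(segments.items(), key=lambda kv: kv[1][0])
        del segments[(s, e)]
        changepoints.append(s + split)
        for a, b in [(s, s + split), (s + split, e)]:
            if b - a >= 2:                    # need 2 points to split again
                segments[(a, b)] = best_split(x[a:b])
    return sorted(changepoints)

# toy signal with two changes, at 50 and 100
x = np.concatenate([np.zeros(50), np.full(50, 3.0), np.zeros(50)])
print(binary_segmentation(x, 2))              # -> [50, 100]
```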
Enhancing Changepoint Detection: Penalty Learning through Deep Learning Techniques
Tung L. Nguyen
Changepoint detection, a technique for identifying significant shifts within data sequences, is crucial in various fields such as finance, genomics, medicine, etc. Dynamic programming changepoint detection algorithms are employed to identify the locations of changepoints within a sequence, which rely on a penalty parameter to regulate the number of changepoints. To estimate this penalty parameter, previous work uses simple models such as linear or tree-based models. This study introduces a novel deep learning method for predicting penalty parameters, leading to demonstrably improved changepoint detection accuracy on large benchmark supervised labeled datasets compared to previous methods.
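The penalty-learning setup can be illustrated with a small neural network that maps sequence features to a predicted log(penalty), trained with an interval hinge loss on the range of penalties that yield zero label errors. The architecture and toy supervision below are assumptions for illustration, not the network from the paper.

```python
import torch

# Each sequence is summarized by a feature vector; supervision gives an
# interval [lo, hi] of log(penalty) values that produce zero label errors
# (computed offline with a dynamic-programming changepoint solver).
# Illustrative architecture, not the one from the paper.
model = torch.nn.Sequential(
    torch.nn.Linear(4, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 1),
)

def interval_hinge_loss(pred, lo, hi):
    """Zero when the predicted log(penalty) lies inside [lo, hi]."""
    return (torch.relu(lo - pred) ** 2 + torch.relu(pred - hi) ** 2).mean()

# toy supervision: 200 sequences, 4 features, unit-width target intervals
features = torch.randn(200, 4)
lo = features.sum(dim=1, keepdim=True)
hi = lo + 1.0

opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    loss = interval_hinge_loss(model(features), lo, hi)
    loss.backward()
    opt.step()
```

At prediction time, the network's output is exponentiated to obtain the penalty passed to the changepoint solver for a new sequence.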
Automated River Substrate Mapping From Sonar Imagery With Machine Learning
C. S. Bodine
D. Buscombe
Reply to: Model uncertainty obscures major driver of soil carbon
Feng Tao
Benjamin Z. Houlton
Serita D. Frey
Johannes Lehmann
Stefano Manzoni
Yuanyuan Huang
Lifen Jiang
Umakant Mishra
Bruce A. Hungate
Michael W. I. Schmidt
Markus Reichstein
Nuno Carvalhais
Philippe Ciais
Ying-Ping Wang
Bernhard Ahrens
Gustaf Hugelius
Xingjie Lu
Zheng Shi
Kostiantyn Viatkin
Ronald Vargas
Yusuf Yigini
Christian Omuto
Ashish A. Malik
Guillermo Peralta
Rosa Cuevas-Corona
Luciano E. Di Paolo
Isabel Luotto
Cuijuan Liao
Yi-Shuang Liang
Yixin Liang
Vinisa S. Saynes
Xiaomeng Huang
Yiqi Luo