
Toby Dylan Hocking

Associate Academic Member
Associate Professor, Université de Sherbrooke, Department of Computer Science
Research Topics
Computational Biology
Computer Vision
Data Mining
Deep Learning
Medical Machine Learning
Optimization

Biography

A Berkeley-educated California native, Toby Dylan Hocking received his PhD in mathematics (machine learning) from École Normale Supérieure de Cachan (Paris, France) in 2012. He worked as a postdoc in Masashi Sugiyama's machine learning lab at Tokyo Tech in 2013, and in Guillaume Bourque's genomics lab at McGill University (2014-2018).

From 2018 to 2024 he was a tenure-track Assistant Professor at Northern Arizona University, and since 2024 he has been a tenured Associate Professor at Université de Sherbrooke, where he directs the LASSO research lab (Learning Algorithms, Statistical Software, Optimization). Toby is also an Associate Academic Member at Mila - Quebec Artificial Intelligence Institute.

He has authored dozens of R packages and published 50+ peer-reviewed research papers on machine learning and statistical software. He has mentored 30+ students in research projects, as well as 30+ open-source software contributors through the R Project in Google Summer of Code.

Publications

Finite Sample Complexity Analysis of Binary Segmentation
Binary segmentation is the classic greedy algorithm which recursively splits a sequential data set by optimizing some loss or likelihood function. Binary segmentation is widely used for changepoint detection in data sets measured over space or time, and as a sub-routine for decision tree learning. In theory it should be extremely fast for …
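The greedy recursion described in the abstract can be sketched as follows. This is a minimal illustration with the squared-error loss; the function names and the constant-mean segment model are my own choices, not the paper's exact setup.

```python
def square_loss(x):
    # Cost of fitting a single constant (the mean) to segment x.
    n = len(x)
    if n == 0:
        return 0.0
    mean = sum(x) / n
    return sum((v - mean) ** 2 for v in x)

def best_split(x):
    # Return (loss_decrease, split_index) for the best single split of x,
    # or (0.0, None) if the segment is too short to split.
    total = square_loss(x)
    best = (0.0, None)
    for t in range(1, len(x)):
        decrease = total - square_loss(x[:t]) - square_loss(x[t:])
        if best[1] is None or decrease > best[0]:
            best = (decrease, t)
    return best

def binary_segmentation(x, n_changepoints):
    # Greedily split whichever current segment's best split
    # most decreases the total loss.
    segments = [(0, len(x))]  # half-open intervals
    changepoints = []
    for _ in range(n_changepoints):
        candidates = []
        for (lo, hi) in segments:
            dec, t = best_split(x[lo:hi])
            if t is not None:
                candidates.append((dec, lo, lo + t, hi))
        if not candidates:
            break
        dec, lo, cp, hi = max(candidates)
        segments.remove((lo, hi))
        segments += [(lo, cp), (cp, hi)]
        changepoints.append(cp)
    return sorted(changepoints)
```

For example, `binary_segmentation([0.0, 0.0, 0.0, 5.0, 5.0, 5.0], 1)` recovers the single change at index 3. A naive implementation like this recomputes segment losses from scratch; practical implementations use cumulative sums so each split search is linear in the segment length.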
SOAK: Same/Other/All K-fold cross-validation for estimating similarity of patterns in data subsets
Gabrielle Thibault
C. S. Bodine
Paul Nelson Arellano
Alexander F Shenkin
Olivia J. Lindly
In many real-world applications of machine learning, we are interested to know if it is possible to train on the data that we have gathered so far, and obtain accurate predictions on a new test data subset that is qualitatively different in some respect (time period, geographic region, etc). Another question is whether data subsets are similar enough so that it is beneficial to combine subsets during model training. We propose SOAK, Same/Other/All K-fold cross-validation, a new method which can be used to answer both questions. SOAK systematically compares models which are trained on different subsets of data, and then used for prediction on a fixed test subset, to estimate the similarity of learnable/predictable patterns in data subsets. We show results of using SOAK on six new real data sets (with geographic/temporal subsets, to check if predictions are accurate on new subsets), 3 image pair data sets (subsets are different image types, to check that we get smaller prediction error on similar images), and 11 benchmark data sets with predefined train/test splits (to check similarity of predefined splits).
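The Same/Other/All comparison can be illustrated with a toy sketch. Everything here (function names, the trivial mean-predictor "model", the dictionary layout of subsets) is my own illustration of the idea, not the paper's implementation: for a fixed test subset, compare held-out error when training on the Same subset, the Other subsets, or All subsets combined.

```python
import random

def kfold_indices(n, k, seed=0):
    # Shuffle indices once, then deal them into k folds.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def mean_predictor(y_train):
    # Trivial "model": always predict the training-label mean.
    mu = sum(y_train) / len(y_train)
    return lambda x: mu

def soak_errors(data, test_subset, k=3):
    # data: dict of subset_name -> (X, y); X is unused by the toy model.
    _, y_test = data[test_subset]
    folds = kfold_indices(len(y_test), k)
    results = {}
    for name in ["same", "other", "all"]:
        errs = []
        for fold in folds:
            held_out = set(fold)
            # Train labels from the test subset, minus the held-out fold.
            same_y = [y for i, y in enumerate(y_test) if i not in held_out]
            # Train labels pooled from every other subset.
            other_y = [yv for s, (_, ys) in data.items()
                       if s != test_subset for yv in ys]
            train_y = {"same": same_y,
                       "other": other_y,
                       "all": same_y + other_y}[name]
            model = mean_predictor(train_y)
            errs += [(model(None) - y_test[i]) ** 2 for i in fold]
        results[name] = sum(errs) / len(errs)
    return results
```

With two dissimilar subsets (say labels near 1 in subset A and near 5 in subset B), "same" beats "all", which beats "other", signalling that pooling hurts; with similar subsets the three errors converge, signalling that combining subsets is safe.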
Enhancing Changepoint Detection: Penalty Learning through Deep Learning Techniques
Tung L. Nguyen
Changepoint detection, a technique for identifying significant shifts within data sequences, is crucial in various fields such as finance, genomics, medicine, etc. Dynamic programming changepoint detection algorithms are employed to identify the locations of changepoints within a sequence, which rely on a penalty parameter to regulate the number of changepoints. To estimate this penalty parameter, previous work uses simple models such as linear or tree-based models. This study introduces a novel deep learning method for predicting penalty parameters, leading to demonstrably improved changepoint detection accuracy on large benchmark supervised labeled datasets compared to previous methods.
Penalty Learning for Optimal Partitioning using Multilayer Perceptron
Tung L. Nguyen
Changepoint detection is a technique used to identify significant shifts in sequences and is widely used in fields such as finance, genomics, and medicine. To identify the changepoints, dynamic programming (DP) algorithms, particularly the Optimal Partitioning (OP) family, are widely used. To control the changepoint count, these algorithms use a fixed penalty to penalize the presence of changepoints. To predict the optimal value of that penalty, existing methods used simple models such as linear or tree-based, which may limit predictive performance. To address this issue, this study proposes using a multilayer perceptron (MLP) with a ReLU activation function to predict the penalty. The proposed model generates continuous predictions -- as opposed to the stepwise ones in tree-based models -- and handles non-linearity better than linear models. Experiments on large benchmark genomic datasets demonstrate that the proposed model improves accuracy and F1 score compared to existing models.
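The penalty-learning setup can be sketched with a tiny one-hidden-layer ReLU MLP trained by gradient descent. The architecture, feature choice (e.g. log sequence length and log variance as inputs), and all names below are illustrative assumptions, not the paper's exact model; the output would feed an OP solver as its log(penalty).

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(X, W1, b1, W2, b2):
    # One hidden ReLU layer, linear output = predicted log(penalty).
    H = np.maximum(0.0, X @ W1 + b1)
    return H @ W2 + b2, H

def train_penalty_mlp(X, y, hidden=8, lr=0.01, epochs=2000):
    # X: (n, d) sequence features; y: (n, 1) target log-penalties.
    n, d = X.shape
    W1 = rng.normal(0, 0.5, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, 1)); b2 = np.zeros(1)
    for _ in range(epochs):
        pred, H = mlp_forward(X, W1, b1, W2, b2)
        err = pred - y                      # d(MSE/2)/d(pred)
        gW2 = H.T @ err / n; gb2 = err.mean(0)
        dH = (err @ W2.T) * (H > 0)         # backprop through ReLU
        gW1 = X.T @ dH / n; gb1 = dH.mean(0)
        W2 -= lr * gW2; b2 -= lr * gb2      # full-batch gradient step
        W1 -= lr * gW1; b1 -= lr * gb1
    return W1, b1, W2, b2
```

Because the ReLU MLP is piecewise linear in its inputs, its predictions vary continuously with the features, unlike a regression tree whose predicted penalty jumps between a finite set of leaf values.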
Automated River Substrate Mapping From Sonar Imagery With Machine Learning
C. S. Bodine
D. Buscombe
Reply to: Model uncertainty obscures major driver of soil carbon
Feng Tao
Benjamin Z. Houlton
Serita D. Frey
Johannes Lehmann
Stefano Manzoni
Yuanyuan Huang
Lifen Jiang
Umakant Mishra
Bruce A. Hungate
Michael W. I. Schmidt
Markus Reichstein
Nuno Carvalhais
Philippe Ciais
Ying-Ping Wang
Bernhard Ahrens
Gustaf Hugelius
Xingjie Lu
Zheng Shi
Kostiantyn Viatkin
Ronald Vargas
Yusuf Yigini
Christian Omuto
Ashish A. Malik
Guillermo Peralta
Rosa Cuevas-Corona
Luciano E. Di Paolo
Isabel Luotto
Cuijuan Liao
Yi-Shuang Liang
Yixin Liang
Vinisa S. Saynes
Xiaomeng Huang
Yiqi Luo
Functional Labeled Optimal Partitioning
Jacob M. Kaufman
Alyssa J. Stenberg
Deep Learning Approach for Changepoint Detection: Penalty Parameter Optimization
Tung L. Nguyen
Changepoint detection, a technique for identifying significant shifts within data sequences, is crucial in various fields such as finance, genomics, medicine, etc. Dynamic programming changepoint detection algorithms are employed to identify the locations of changepoints within a sequence, which rely on a penalty parameter to regulate the number of changepoints. To estimate this penalty parameter, previous work uses simple models such as linear models or decision trees. This study introduces a novel deep learning method for predicting penalty parameters, leading to demonstrably improved changepoint detection accuracy on large benchmark supervised labeled datasets compared to previous methods.