Rahma Nouaji

PhD - McGill University

Supervisor

Oana Balmau

Research Topics

Distributed Systems

Machine Learning Systems

Website

Publications

MinatoLoader: Accelerating Machine Learning Training Through Efficient Data Preprocessing

Rahma Nouaji

Stella Bitchebe

Ricardo Macedo

Oana Balmau

Machine learning (ML) frameworks, such as PyTorch and TensorFlow, rely on data loaders to preprocess data before feeding it to accelerators. When preprocessing is inefficiently pipelined, GPUs can remain idle over long periods of time, leading to substantial training delays. For example, PyTorch's default data loaders can cause up to 76% GPU idleness. A key bottleneck is the variability in preprocessing time across samples within the same dataset. Existing data loaders are oblivious to this variability, treating all samples uniformly. In this case, a single slow sample can stall the entire batch, causing head-of-line blocking.

2026-04-25

European Conference on Computer Systems (published)

doi.org

arxiv.org
