Rahma Nouaji

PhD - McGill University
Supervisor
Research Topics
Distributed Systems
Machine Learning Systems

Publications

MinatoLoader: Accelerating Machine Learning Training Through Efficient Data Preprocessing
Stella Bitchebe
Ricardo Macedo
Machine learning (ML) frameworks, such as PyTorch and TensorFlow, rely on data loaders to preprocess data before feeding it to accelerators. When preprocessing is inefficiently pipelined, GPUs can remain idle for long periods, leading to substantial training delays. For example, PyTorch's default data loaders can cause up to 76% GPU idleness. A key bottleneck is the variability in preprocessing time across samples within the same dataset. Existing data loaders are oblivious to this variability, treating all samples uniformly. In this case, a single slow sample can stall the entire batch, causing head-of-line blocking.