
Shuhao Zheng

Ph.D. Student, McGill University

I’m currently a first-year Ph.D. student at McGill University, supervised by Prof. Xue Liu. Before that, I completed my undergraduate studies and received my bachelor’s degree in 2021 from the Yao Class at Tsinghua University, an undergraduate honour program founded by Turing Award winner Andrew Chi-Chih Yao.

I have a broad interest in AI and blockchain. My research in AI mainly focuses on applied machine learning and meta-learning, and my research in blockchain focuses on developing Web3 infrastructure and applying modern cryptography (e.g., zero-knowledge proofs, secure multi-party computation) to blockchain systems and applications.

I’m leading the Crypto-Metaverse-Blockchain-Cloud (CMBC) research team and am actively looking for collaborators. If you are interested, please feel free to reach out by email. BTW, I’m also the owner of the ENS domain shuhao.eth. Give it a try :).


Generalized Data Weighting via Class-level Gradient Manipulation (NeurIPS 2021)

Shuhao Zheng*, Can Chen*, Xi Chen, Erqun Dong, Xue Liu, Hao Liu, Dejing Dou

Label noise and class imbalance are two major issues coexisting in real-world datasets. To alleviate the two issues, state-of-the-art methods reweight each instance by leveraging a small amount of clean and unbiased data. Yet, these methods overlook class-level information within each instance, which can be further utilized to improve performance. To this end, in this paper, we propose Generalized Data Weighting (GDW) to simultaneously mitigate label noise and class imbalance by manipulating gradients at the class level. To be specific, GDW unrolls the loss gradient to class-level gradients by the chain rule and reweights the flow of each gradient separately. In this way, GDW achieves remarkable performance improvement on both issues. Aside from the performance gain, GDW efficiently obtains class-level weights without introducing any extra computational cost compared with instance weighting methods. Specifically, GDW performs a gradient descent step on class-level weights, which only relies on intermediate gradients. Extensive experiments in various settings verify the effectiveness of GDW. For example, GDW outperforms state-of-the-art methods by 2.56% under the 60% uniform noise setting in CIFAR10. Our code is available at
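As a rough illustration of the class-level idea in the abstract, the sketch below uses softmax cross-entropy on a single linear layer. The gradient with respect to each logit is treated as one class-level flow, and each flow is scaled by its own weight before it reaches the parameters. This is a simplified reading of the abstract, not the paper's implementation: the function names, the linear-layer setup, and the fixed `class_weights` are all assumptions made for illustration, and the step that learns the weights from clean data is omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_gradient(logits, label):
    """Gradient of cross-entropy w.r.t. each logit: p_j - 1[j == label]."""
    probs = softmax(logits)
    return [p - (1.0 if j == label else 0.0) for j, p in enumerate(probs)]


def class_weighted_weight_grad(features, logits, label, class_weights):
    """Gradient for a linear layer W (logits = W @ features).

    The flow through logit j is scaled by class_weights[j] (hypothetical,
    hand-set weights here) before the chain rule carries it back to W.
    """
    per_logit = logit_gradient(logits, label)
    return [
        [class_weights[j] * per_logit[j] * x for x in features]
        for j in range(len(logits))
    ]
```

With all class weights equal to 1 this reduces to the ordinary gradient, and with all class weights equal to the same scalar it reduces to instance-level weighting, so per-class weights are a strict generalization of per-instance ones in this sketch.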
