Irina Rish

Biography

Irina Rish is a full professor at the Université de Montréal (UdeM), where she leads the Autonomous AI Lab. A faculty member of Mila – Quebec Artificial Intelligence Institute, she holds a Canada Excellence Research Chair (CERC) and a Canada CIFAR AI Chair. Irina leads the U.S. Department of Energy's INCITE project on scalable foundation models on the Summit and Frontier supercomputers at the Oak Ridge Leadership Computing Facility (OLCF). She is a co-founder and the chief scientific officer of Nolano.ai.

Her current research focuses on neural scaling laws and emergent behaviors (capabilities and alignment) in foundation models, as well as continual learning, out-of-distribution generalization, and robustness. Before joining UdeM in 2019, Irina was a research scientist at the IBM Thomas J. Watson Research Center, where she worked on various projects at the intersection of neuroscience and AI and led the NeuroAI challenge. She has received several IBM awards: the Excellence Award and the Outstanding Innovation Award (2018), the Outstanding Technical Achievement Award (2017), and the Research Accomplishment Award (2009). She holds 64 patents and has written more than 120 research papers, several book chapters, three published books, and a monograph on sparse modeling.

Current Students

George Adamopoulos

Research intern

Ivan Anokhin

PhD - UdeM

Co-supervisor:

Samira Ebrahimi Kahou

PhD - UdeM

Arjun Ashok

PhD - UdeM

Co-supervisor:

Research Master's - UdeM

PhD - McGill

Principal supervisor:

Blake Richards

Mohammad Javad Darvishi Bayazi

Amin Darabi

PhD - UdeM

PhD - UdeM

PhD - UdeM

Co-supervisor:

Karim Jerbi

Wagner Drew

Research Master's - Concordia

Principal supervisor:

PhD - UdeM

Alumni collaborator - UdeM

Research Master's

Alumni collaborator - UdeM

Principal supervisor:

Ioannis Mitliagkas

Nizar Islah

PhD - UdeM

Principal supervisor:

Eilif Benjamin Muller

PhD - UdeM

Research collaborator

Zafir Khalid

Research Master's - Concordia

Principal supervisor:

Research Master's - UdeM

Neeraj Kumar

Alumni collaborator - UdeM

Gwen Legate

PhD - Concordia

Principal supervisor:

Eugene Belilovsky

David Lemay

Research Master's - UdeM

Jonathan Lim

Research collaborator

amin.mansouri@mila.quebec

Baihan Lin

Independent visiting researcher - Mt. Sinai

Research Master's - UdeM

Research collaborator

PhD - UdeM

Research Master's - UdeM

Diganta Misra

Research Master's - UdeM

Timothy Nest

PhD - UdeM

Co-supervisor:

Eilif Benjamin Muller

Mohammad Pezeshki

Research collaborator

Co-supervisor:

PhD - McGill

Principal supervisor:

Pouya Bashivan

Mahta Ramezanian

Research Master's - UdeM

Co-supervisor:

Guillaume Dumas

Roland Riachi

Research collaborator - UdeM

Matthew Riemer

PhD - UdeM

Alexis Roger

PhD - McGill

Principal supervisor:

Blake Richards

Vaibhav Singh

PhD - Concordia

Principal supervisor:

PhD - UdeM

PhD - UdeM

Co-supervisor:

Research Master's - UdeM

PhD - UdeM

Co-supervisor:

Research Master's - UdeM

Publications

LLMs and Personalities: Inconsistencies Across Scales

Tosato Tommaso

Mahmood Hegazy

David Lemay

Mohammed Abukalam

Guillaume Dumas

This study investigates the application of human psychometric assessments to large language models (LLMs) to examine their consistency and malleability in exhibiting personality traits. We administered the Big Five Inventory (BFI) and the Eysenck Personality Questionnaire-Revised (EPQ-R) to various LLMs across different model sizes and persona prompts. Our results reveal substantial variability in responses due to question order shuffling, challenging the notion of a stable LLM "personality." Larger models demonstrated more consistent responses, while persona prompts significantly influenced trait scores. Notably, the assistant persona led to more predictable scaling, with larger models exhibiting more socially desirable and less variable traits. In contrast, non-conventional personas displayed unpredictable behaviors, sometimes extending personality trait scores beyond the typical human range. These findings have important implications for understanding LLM behavior under different conditions and reflect on the consequences of scaling.

2024-10-09

NeurIPS.cc/2024/Workshop/Behavioral_ML (oral presentation)

RedPajama: an Open Dataset for Training Large Language Models

Maurice Weber

Daniel Y Fu

Quentin Gregory Anthony

Yonatan Oren

Shane Adams

Anton Alexandrov

Xiaozhong Lyu

Huu Nguyen

Xiaozhe Yao

Virginia Adams

Ben Athiwaratkun

Rahul Chalamala

Kezhen Chen

Max Ryabinin

Tri Dao

Percy Liang

Christopher Re

Ce Zhang

2024-09-26

NeurIPS.cc/2024/Datasets_and_Benchmarks_Track (spotlight)

Using Unity to Help Solve Reinforcement Learning

Connor Brennan

Andrew Robert Williams

Omar G. Younis

Vedant Vyas

Daria Yasafova

Leveraging the depth and flexibility of XLand as well as the rapid prototyping features of the Unity engine, we present the United Unity Universe, an open-source toolkit designed to accelerate the creation of innovative reinforcement learning environments. This toolkit includes a robust implementation of XLand 2.0 complemented by a user-friendly interface which allows users to modify the details of procedurally generated terrains and task rules with ease. Additionally, we provide a curated selection of terrains and rule sets, accompanied by implementations of reinforcement learning baselines to facilitate quick experimentation with novel architectural designs for adaptive agents. Furthermore, we illustrate how the United Unity Universe serves as a high-level language that enables researchers to develop diverse and endlessly variable 3D environments within a unified framework. This functionality establishes the United Unity Universe (U3) as an essential tool for advancing the field of reinforcement learning, especially in the development of adaptive and generalizable learning systems.

2024-09-26

NeurIPS.cc/2024/Datasets_and_Benchmarks_Track (poster)

When Machines Outshine Humans in Object Recognition, Benchmarking Dilemma

Mohammad Javad Darvishi Bayazi

Md Rifat Arefin

Jocelyn Faubert

2024-09-15

Journal of Vision (published)

Knowledge Distillation in Federated Learning: A Practical Guide

Alessio Mora

Irene Tenison

Paolo Bellavista

2024-08-01

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (published)

Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models

Ayush Kaushal

Tejas Pandey

Tejas Vaidhya

Aaryan Bhagat

2024-07-17

arXiv (preprint)

arxiv.org

Spectra: Surprising Effectiveness of Pretraining Ternary Language Models at Scale

Ayush Kaushal

Tejas Pandey

Tejas Vaidhya

Aaryan Bhagat

2024-07-17

arXiv (preprint)

arxiv.org

Spectra: Surprising Effectiveness of Pretraining Ternary Language Models at Scale

Ayush Kaushal

Tejas Pandey

Tejas Vaidhya

Arnab Kumar Mondal

Aaryan Bhagat

2024-07-17

arXiv (preprint)

arxiv.org

Simple and Scalable Strategies to Continually Pre-train Large Language Models

Adam Ibrahim

Benjamin Thérien

Kshitij Gupta

Mats Leon Richter

Quentin Gregory Anthony

Timothée Lesort

Eugene Belilovsky

2024-07-08

TMLR (accepted)

Unsupervised Concept Discovery Mitigates Spurious Correlations

Md Rifat Arefin

Yan Zhang

Aristide Baratin

Francesco Locatello

Dianbo Liu

Kenji Kawaguchi

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

LORD: Low Rank Decomposition Of Monolingual Code LLMs For One-Shot Compression

Ayush Kaushal

Tejas Vaidhya

Low Rank Decomposition of a matrix, that is, splitting a large matrix into a product of two smaller matrices, offers a means for compression that reduces the parameters of a model without sparsification, and hence delivers more speedup on modern hardware. Moreover, unlike quantization, the compressed linear layers remain fully differentiable and all the parameters trainable, while being able to leverage the existing highly efficient kernels over floating point matrices. We study the potential to compress Large Language Models (LLMs) for monolingual code generation via Low Rank Decomposition (LoRD) and observe that ranks for the linear layers in these models can be reduced by up to 39.58% with less than 1% increase in perplexity. We then use Low Rank Decomposition (LoRD) to compress StarCoder 16B to 13.2B parameters with no drop and to 12.3B with minimal drop in HumanEval Pass@1 score, in less than 10 minutes on a single A100. The compressed models speed up inference by up to 22.35% with just a single line of change in code over Hugging Face's implementation with the PyTorch backend. Low Rank Decomposition (LoRD) models remain compatible with state-of-the-art near-lossless quantization methods such as SpQR, which allows leveraging the further compression gains of quantization. Lastly, QLoRA over a Low Rank Decomposition (LoRD) model further reduces memory requirements by as much as 21.2% over vanilla QLoRA while offering similar gains from parameter-efficient fine-tuning. Our work shows Low Rank Decomposition (LoRD) as a promising new paradigm for LLM compression.

2024-07-03

ICML.cc/2024/Workshop/FM-Wild (poster)