Siva Reddy

Biography

Siva Reddy is an assistant professor of computer science and linguistics at McGill University. His work focuses on algorithms that enable computers to understand and process human languages. He completed his postdoctoral studies with the Stanford NLP Group. His expertise includes building symbolic (linguistic and induced) and deep learning models for language.

Current Students

Vaibhav Adlakha

PhD - McGill

Parishad BehnamGhader

Master's Research - McGill

PhD - McGill

Collaborating researcher - McGill

Verna Dankers

Postdoctorate - University of Edinburgh

Jiaqi Deng

Collaborating researcher

Charbel El Feghali

Research intern - McGill

Desmond Elliott

Independent visiting researcher

Co-supervisor:

Yoshua Bengio

Jay Gala

Master's Research - McGill

Co-supervisor:

Collaborating researcher

Hanseok Oh

Collaborating alumni

PhD - McGill

Co-supervisor:

Timothy O'Donnell

Imene Kerboua

Collaborating researcher - INSA Lyon, France

PhD - McGill

Principal supervisor:

Golnoosh Farnadi

Austin Kraft

PhD - McGill

Co-supervisor:

PhD - McGill

Zichao Li

PhD - McGill

Co-supervisor:

Jackie Cheung

Fengyuan Liu

Master's Research - McGill

Co-supervisor:

Dzmitry Bahdanau

Xing Han Lu

PhD - McGill

Master's Research - McGill

PhD - McGill

Postdoctorate - McGill

Marzia Nouri

Master's Research - McGill

Arkil Patel

PhD - McGill

Principal supervisor:

Collaborating researcher - N/A

Ben Saine

Research intern - McGill

Dongchan Shin

Collaborating alumni

Karolina Ewa Stańczak

Collaborating alumni - McGill

Ivan Titov

Collaborating researcher

Co-supervisor:

Yoshua Bengio

Ada Tur

Research intern - McGill

PhD - McGill

Collaborating alumni - McGill

Donghao Zeng

Research intern - McGill

How can we explain AI and make sure the explanation is true? Faithfulness measurable models show how

Blog posts

October 1, 2024

by

Andreas Madsen

Siva Reddy

Sarath Chandar

Publications

BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks

Juan A. Rodriguez

Xiangru Jian

Siba Smarak Panigrahi

Tianyu Zhang

Aarash Feizi

Abhay Puri

Akshay Kalkunte Suresh

Amirhossein Abaskohi

Pierre-Andre Noel

Sanket Biswas

Sara Shanian

Ying Zhang

Noah Bolger

Kurt MacDonald

Simon Fauvel

Sathwik Tejaswi Madhusudhan

Srinivas Sunkara

Joao Monteiro

Krishnamurthy Dj Dvijotham

Torsten Scholak

Nicolas Chapados

Sepideh Kharaghani

Sean Hughes

M. Özsu

Issam Hadj Laradji

Sai Rajeswar

Multimodal AI has the potential to significantly enhance document-understanding tasks, such as processing receipts, understanding workflows, extracting data from documents, and summarizing reports. Code generation tasks that require long-structured outputs can also be enhanced by multimodality. Despite this, their use in commercial applications is often limited due to limited access to training data and restrictive licensing, which hinders open access. To address these limitations, we introduce BigDocs-7.5M, a high-quality, open-access dataset comprising 7.5 million multimodal documents across 30 tasks. We use an efficient data curation process to ensure our data is high-quality and license-permissive. Our process emphasizes accountability, responsibility, and transparency through filtering rules, traceable metadata, and careful content analysis. Additionally, we introduce BigDocs-Bench, a benchmark suite with 10 novel tasks where we create datasets that reflect real-world use cases involving reasoning over Graphical User Interfaces (GUI) and code generation from images. Our experiments show that training with BigDocs-Bench improves average performance up to 25.8% over closed-source GPT-4o in document reasoning and structured output tasks such as Screenshot2HTML or Image2Latex generation. Finally, human evaluations showed a preference for outputs from models trained on BigDocs over GPT-4o. This suggests that BigDocs can help both academics and the open-source community utilize and improve AI tools to enhance multimodal capabilities and document reasoning. The project is hosted at https://bigdocs.github.io.

2024-12-05

ArXiv (preprint)

BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks

Juan A. Rodriguez

Xiangru Jian

Siba Smarak Panigrahi

Tianyu Zhang

Aarash Feizi

Abhay Puri

Akshay Kalkunte Suresh

Amirhossein Abaskohi

Pierre-Andre Noel

Sanket Biswas

Sara Shanian

Ying Zhang

Noah Bolger

Kurt MacDonald

Simon Fauvel

Sathwik Tejaswi Madhusudhan

Srinivas Sunkara

Joao Monteiro

Krishnamurthy Dj Dvijotham

Torsten Scholak

Nicolas Chapados

Sepideh Kharaghani

Sean Hughes

M. Özsu

Issam Hadj Laradji

Sai Rajeswar

2024-10-10

NeurIPS.cc/2024/Workshop/RBFM (poster)

VinePPO: Accurate Credit Assignment in RL for LLM Mathematical Reasoning

Large language models (LLMs) are increasingly required to solve complex reasoning tasks, like mathematical problems, that involve multiple reasoning steps before feedback is received. Effectively identifying and prioritizing key steps by accurately assigning credit to these intermediate steps is essential for enhancing model performance. Proximal Policy Optimization (PPO), a state-of-the-art reinforcement learning algorithm for finetuning LLMs, addresses the credit assignment problem by employing value networks to predict the expected cumulative rewards of intermediate states. In this work, we identify significant limitations with this value estimation method. To address this, we propose VinePPO, which leverages the flexibility of language environments to compute unbiased Monte Carlo-based estimates of the intermediate values. VinePPO consistently outperforms standard PPO, doing so more efficiently and with lower divergence from the reference model. Our findings underscore the critical importance of accurate credit assignment in LLM post-training and present a simple, yet effective solution.

2024-10-09

NeurIPS.cc/2024/Workshop/MATH-AI (accepted)

VinePPO: Refining Credit Assignment in RL Training of LLMs

Large language models (LLMs) are increasingly applied to complex reasoning tasks that require executing several complex steps before receiving any reward. Properly assigning credit to these steps is essential for enhancing model performance. Proximal Policy Optimization (PPO), a common reinforcement learning (RL) algorithm used for LLM finetuning, employs value networks to tackle credit assignment. However, recent approaches achieve strong results without it, raising questions about the efficacy of value networks in practice. In this work, we systematically evaluate the efficacy of value networks and reveal their significant shortcomings in reasoning-heavy LLM tasks, showing that they often produce poor estimates of expected return and barely outperform a random baseline when comparing alternative steps. This motivates our key question: Can improved credit assignment enhance RL training for LLMs? To address this, we propose VinePPO, a straightforward approach that leverages the flexibility of language environments to compute unbiased Monte Carlo-based estimates. Our method consistently outperforms PPO and other baselines across MATH and GSM8K datasets in less wall-clock time (up to 3.0x). Crucially, it achieves higher test accuracy for a given training accuracy, capturing more generalization signal per sample. These results emphasize the importance of accurate credit assignment in RL training of LLMs.

2024-10-02

ArXiv (preprint)
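
The Monte Carlo idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`mc_value`, `step_advantages`), the toy rollout and reward, and the choice of an undiscounted, terminal-only reward are all assumptions made for illustration. The value of an intermediate state is estimated by sampling several completions from that state and averaging their rewards, and the credit for a step is taken as the change in estimated value across it.

```python
import random
from typing import Callable, List, Sequence

Rollout = Callable[[Sequence[str], random.Random], List[str]]
Reward = Callable[[Sequence[str]], float]


def mc_value(prefix: Sequence[str], rollout: Rollout, reward: Reward,
             k: int, rng: random.Random) -> float:
    """Average the terminal reward of k completions sampled from `prefix`."""
    returns = [reward(rollout(prefix, rng)) for _ in range(k)]
    return sum(returns) / k


def step_advantages(steps: Sequence[str], rollout: Rollout, reward: Reward,
                    k: int = 8, seed: int = 0) -> List[float]:
    """Per-step credit as V(after step) - V(before step).

    Assumes (hypothetically) a terminal-only reward and no discounting.
    """
    rng = random.Random(seed)
    # values[t] estimates the expected return after the first t steps.
    values = [mc_value(steps[:t], rollout, reward, k, rng)
              for t in range(len(steps))]
    # The full trajectory needs no sampling: its reward is known.
    values.append(reward(list(steps)))
    return [values[t + 1] - values[t] for t in range(len(steps))]


# Hypothetical toy task: a solution has 3 steps and succeeds if none is "bad".
def toy_rollout(prefix: Sequence[str], rng: random.Random) -> List[str]:
    out = list(prefix)
    while len(out) < 3:
        out.append("good" if rng.random() < 0.7 else "bad")
    return out


def toy_reward(steps: Sequence[str]) -> float:
    return 0.0 if "bad" in steps else 1.0


if __name__ == "__main__":
    print(step_advantages(["good", "bad", "good"], toy_rollout, toy_reward, k=32))
```

In this toy setting the step that introduces the error should receive clearly negative credit, while steps taken after the trajectory is already doomed receive none. A larger `k` lowers the variance of each estimate at the cost of more sampled completions.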

VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment

Large language models (LLMs) are increasingly applied to complex reasoning tasks that require executing several complex steps before receiving any reward. Properly assigning credit to these steps is essential for enhancing model performance. Proximal Policy Optimization (PPO), a state-of-the-art reinforcement learning (RL) algorithm used for LLM finetuning, employs value networks to tackle credit assignment. However, value networks face challenges in predicting the expected cumulative rewards accurately in complex reasoning tasks, often leading to high-variance updates and suboptimal performance. In this work, we systematically evaluate the efficacy of value networks and reveal their significant shortcomings in reasoning-heavy LLM tasks, showing that they barely outperform a random baseline when comparing alternative steps. To address this, we propose VinePPO, a straightforward approach that leverages the flexibility of language environments to compute unbiased Monte Carlo-based estimates, bypassing the need for large value networks. Our method consistently outperforms PPO and other RL-free baselines across MATH and GSM8K datasets with fewer gradient updates (up to 9x) and less wall-clock time (up to 3.0x). These results emphasize the importance of accurate credit assignment in RL finetuning of LLMs and demonstrate VinePPO's potential as a superior alternative.

2024-10-02

ArXiv (preprint)

Learning Action and Reasoning-Centric Image Editing from Videos and Simulation

Dheeraj Vattikonda

Varun Jampani

2024-09-26

NeurIPS.cc/2024/Datasets_and_Benchmarks_Track (spotlight)

Are self-explanations from Large Language Models faithful?

Andreas Madsen

Sarath Chandar

Siva Reddy

2024-08-01

Findings of the Association for Computational Linguistics ACL 2024 (published)

Benchmarking Vision Language Models for Cultural Understanding

Sjoerd van Steenkiste

Lisa Anne Hendricks

Karolina Stanczak

Aishwarya Agrawal

Foundation models and vision-language pre-training have notably advanced Vision Language Models (VLMs), enabling multimodal processing of visual and linguistic data. However, their performance has been typically assessed on general scene understanding - recognizing objects, attributes, and actions - rather than cultural comprehension. This study introduces CulturalVQA, a visual question-answering benchmark aimed at assessing VLMs' geo-diverse cultural understanding. We curate a diverse collection of 2,378 image-question pairs with 1-5 answers per question representing cultures from 11 countries across 5 continents. The questions probe understanding of various facets of culture such as clothing, food, drinks, rituals, and traditions. Benchmarking VLMs on CulturalVQA, including GPT-4V and Gemini, reveals disparity in their level of cultural understanding across regions, with strong cultural understanding capabilities for North America while significantly weaker capabilities for Africa. We observe disparity in their performance across cultural facets too, with clothing, rituals, and traditions seeing higher performances than food and drink. These disparities help us identify areas where VLMs lack cultural understanding and demonstrate the potential of CulturalVQA as a comprehensive evaluation set for gauging VLM progress in understanding diverse cultures.

2024-07-15

ArXiv (preprint)