Portrait de Nicolas Chapados

Nicolas Chapados

Membre industriel associé
Professeur adjoint, Polytechnique Montréal, Département de mathématiques appliquées
Vice-président, Recherche, ServiceNow Research
Sujets de recherche
Apprentissage profond

Biographie

Nicolas Chapados est vice-président de la recherche chez ServiceNow Inc. Il est titulaire d'un diplôme d'ingénieur de l'Université McGill et d'un doctorat en informatique de l'Université de Montréal. Conjointement avec son directeur de thèse, Yoshua Bengio, il a fondé ApSTAT Technologies en 2001. ApSTAT est une entreprise de transfert technologique visant à développer en contexte industriel des idées de pointe en apprentissage automatique, dans des domaines tels que l'évaluation des risques d'assurance, la planification de la chaîne d'approvisionnement, les prévisions commerciales, la biotechnologie et la gestion des fonds de couverture. À partir de ces travaux, il a également cofondé des entreprises dérivées : Imagia, pour détecter et quantifier précocement le cancer grâce à l'analyse d'images médicales par l'IA; Element AI (acquise par ServiceNow en janvier 2021); et Chapados Couture Capital, un gestionnaire d'actifs quantitatifs. Ses intérêts de recherche comprennent la modélisation de séries temporelles, le traitement du langage naturel et la prise de décisions. Il est titulaire du titre d'analyste financier agréé (CFA).

Étudiants actuels

Doctorat - UdeM
Superviseur⋅e principal⋅e :

Publications

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction
Shravan Nayak
Xiangru Jian
Kevin Qinghong Lin
Juan A. Rodriguez
Montek Kalsi
Rabiul Awal
M. T. ¨Ozsu
David Vazquez
Perouz Taslakian
Spandana Gella
Sai Rajeswar
Human Annotator
The BrowserGym Ecosystem for Web Agent Research
Thibault Le Sellier de Chezelles
Alexandre Lacoste
Massimo Caccia
Léo Boisvert
Megh Thakkar
Tom Marty
Rim Assouel
Sahar Omidi Shayegan
Lawrence Keunho Jang
Xing Han Lu
Ori Yoran
Dehan Kong
Frank F. Xu
Graham Neubig
Russ Salakhutdinov
The BrowserGym ecosystem addresses the growing need for efficient evaluation and benchmarking of web agents, particularly those leveraging a… (voir plus)utomation and Large Language Models (LLMs) for web interaction tasks. Many existing benchmarks suffer from fragmentation and inconsistent evaluation methodologies, making it challenging to achieve reliable comparisons and reproducible results. BrowserGym aims to solve this by providing a unified, gym-like environment with well-defined observation and action spaces, facilitating standardized evaluation across diverse benchmarks. Combined with AgentLab, a complementary framework that aids in agent creation, testing, and analysis, BrowserGym offers flexibility for integrating new benchmarks while ensuring consistent evaluation and comprehensive experiment management. This standardized approach seeks to reduce the time and complexity of developing web agents, supporting more reliable comparisons and facilitating in-depth analysis of agent behaviors, and could result in more adaptable, capable agents, ultimately accelerating innovation in LLM-driven automation. As a supporting evidence, we conduct the first large-scale, multi-benchmark web agent experiment and compare the performance of 6 state-of-the-art LLMs across all benchmarks currently available in BrowserGym. Among other findings, our results highlight a large discrepancy between OpenAI and Anthropic's latests models, with Claude-3.5-Sonnet leading the way on almost all benchmarks, except on vision-related tasks where GPT-4o is superior. Despite these advancements, our results emphasize that building robust and efficient web agents remains a significant challenge, due to the inherent complexity of real-world web environments and the limitations of current models.
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding
Ahmed Masry
Juan A. Rodriguez
Tianyu Zhang
Suyuchen Wang
Chao Wang
Aarash Feizi
Akshay Kalkunte Suresh
Abhay Puri
Xiangru Jian
Pierre-Andre Noel
Sathwik Tejaswi Madhusudhan
Enamul Hoque
Issam Hadj Laradji
David Vazquez
Perouz Taslakian … (voir 2 de plus)
Spandana Gella
Sai Rajeswar
Aligning visual features with language embeddings is a key challenge in vision-language models (VLMs). The performance of such models hinges… (voir plus) on having a good connector that maps visual features generated by a vision encoder to a shared embedding space with the LLM while preserving semantic similarity. Existing connectors, such as multilayer perceptrons (MLPs), often produce out-of-distribution or noisy inputs, leading to misalignment between the modalities. In this work, we propose a novel vision-text alignment method, AlignVLM, that maps visual features to a weighted average of LLM text embeddings. Our approach leverages the linguistic priors encoded by the LLM to ensure that visual features are mapped to regions of the space that the LLM can effectively interpret. AlignVLM is particularly effective for document understanding tasks, where scanned document images must be accurately mapped to their textual content. Our extensive experiments show that AlignVLM achieves state-of-the-art performance compared to prior alignment methods. We provide further analysis demonstrating improved vision-text feature alignment and robustness to noise.
Societal Alignment Frameworks Can Improve LLM Alignment
Karolina Sta'nczak
Nicholas Meade
Mehar Bhatia
Hattie Zhou
Konstantin Bottinger
Jeremy Barnes
Jason Stanley
Jessica Montgomery
Richard Zemel
Nicolas Papernot
Denis Therien
Timothy P. Lillicrap
Ana Marasovi'c
Sylvie Delacroix
Gillian K. Hadfield
BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks
Juan A. Rodriguez
Xiangru Jian
Siba Smarak Panigrahi
Tianyu Zhang
Aarash Feizi
Abhay Puri
Akshay Kalkunte Suresh
François Savard
Ahmed Masry
Shravan Nayak
Rabiul Awal
Mahsa Massoud
Amirhossein Abaskohi
Zichao Li
Suyuchen Wang
Pierre-Andre Noel
Mats Leon Richter
Saverio Vadacchino
Shubham Agarwal
Sanket Biswas … (voir 19 de plus)
Sara Shanian
Ying Zhang
Sathwik Tejaswi Madhusudhan
Joao Monteiro
Krishnamurthy Dj Dvijotham
Torsten Scholak
Sepideh Kharaghani
Sean Hughes
M. Özsu
Issam Hadj Laradji
Spandana Gella
Perouz Taslakian
David Vazquez
Sai Rajeswar
Multimodal AI has the potential to significantly enhance document-understanding tasks, such as processing receipts, understanding workflows,… (voir plus) extracting data from documents, and summarizing reports. Code generation tasks that require long-structured outputs can also be enhanced by multimodality. Despite this, their use in commercial applications is often limited due to limited access to relevant training data and restrictive licensing, which hinders open access. To address these limitations, we introduce BigDocs-7.5M, a high-quality, open-access dataset comprising 7.5 million multimodal documents across 30 tasks. We use an efficient data curation process to ensure that our data is high quality and license-permissive. Our process emphasizes accountability, responsibility, and transparency through filtering rules, traceable metadata, and careful content analysis. Additionally, we introduce BigDocs-Bench,, a benchmark suite with 10 novel tasks where we carefully create datasets that reflect real-world use cases involving reasoning over Graphical User Interfaces (GUI) and code generation from images. Our experiments show that training with BigDocs-Bench, improves average performance up to 25.8% over closed-source GPT-4o in document reasoning and structured output tasks such as Screenshot2HTML or Image2Latex generation. Finally, human evaluations revealed that participants preferred the outputs from models trained with BigDocs over those from GPT-4o. This suggests that BigDocs can help both academics and the open-source community utilize and improve AI tools to enhance multimodal capabilities and document reasoning.
BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks
Juan A. Rodriguez
Xiangru Jian
Siba Smarak Panigrahi
Tianyu Zhang
Aarash Feizi
Abhay Puri
Akshay Kalkunte Suresh
François Savard
Ahmed Masry
Shravan Nayak
Rabiul Awal
Mahsa Massoud
Amirhossein Abaskohi
Zichao Li
Suyuchen Wang
Pierre-Andre Noel
Mats Leon Richter
Saverio Vadacchino
Shubham Agarwal
Sanket Biswas … (voir 23 de plus)
Sara Shanian
Ying Zhang
Noah Bolger
Kurt MacDonald
Simon Fauvel
Sathwik Tejaswi Madhusudhan
Srinivas Sunkara
Joao Monteiro
Krishnamurthy Dj Dvijotham
Torsten Scholak
Sepideh Kharaghani
Sean Hughes
M. Özsu
Issam Hadj Laradji
Spandana Gella
Perouz Taslakian
David Vazquez
Sai Rajeswar
InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation
Gaurav Sahu
Abhay Puri
Juan A. Rodriguez
Amirhossein Abaskohi
Mohammad Chegini
Perouz Taslakian
Valentina Zantedeschi
Alexandre Lacoste
David Vazquez
Sai Rajeswar
Issam Hadj Laradji
The BrowserGym Ecosystem for Web Agent Research
Thibault Le Sellier de Chezelles
Alexandre Lacoste
Massimo Caccia
Léo Boisvert
Megh Thakkar
Tom Marty
Rim Assouel
Sahar Omidi Shayegan
Lawrence Jang
Xing Han Lu
Ori Yoran
Dehan Kong
Frank F. Xu
Graham Neubig
Ruslan Salakhutdinov
The BrowserGym ecosystem addresses the growing need for efficient evaluation and benchmarking of web agents, particularly those leveraging a… (voir plus)utomation and Large Language Models (LLMs) for web interaction tasks. Many existing benchmarks suffer from fragmentation and inconsistent evaluation methodologies, making it challenging to achieve reliable comparisons and reproducible results. BrowserGym aims to solve this by providing a unified, gym-like environment with well-defined observation and action spaces, facilitating standardized evaluation across diverse benchmarks. Combined with AgentLab, a complementary framework that aids in agent creation, testing, and analysis, BrowserGym offers flexibility for integrating new benchmarks while ensuring consistent evaluation and comprehensive experiment management. This standardized approach seeks to reduce the time and complexity of developing web agents, supporting more reliable comparisons and facilitating in-depth analysis of agent behaviors, and could result in more adaptable, capable agents, ultimately accelerating innovation in LLM-driven automation. As a supporting evidence, we conduct the first large-scale, multi-benchmark web agent experiment and compare the performance of 6 state-of-the-art LLMs across all benchmarks currently available in BrowserGym. Among other findings, our results highlight a large discrepancy between OpenAI and Anthropic's latests models, with Claude-3.5-Sonnet leading the way on almost all benchmarks, except on vision-related tasks where GPT-4o is superior. Despite these advancements, our results emphasize that building robust and efficient web agents remains a significant challenge, due to the inherent complexity of real-world web environments and the limitations of current models.
The BrowserGym Ecosystem for Web Agent Research
Thibault Le Sellier de Chezelles
Alexandre Lacoste
Massimo Caccia
Léo Boisvert
Megh Thakkar
Tom Marty
Rim Assouel
Sahar Omidi Shayegan
Lawrence Jang
Xing Han Lu
Ori Yoran
Dehan Kong
Frank F. Xu
Graham Neubig
Ruslan Salakhutdinov
The BrowserGym ecosystem addresses the growing need for efficient evaluation and benchmarking of web agents, particularly those leveraging a… (voir plus)utomation and Large Language Models (LLMs) for web interaction tasks. Many existing benchmarks suffer from fragmentation and inconsistent evaluation methodologies, making it challenging to achieve reliable comparisons and reproducible results. BrowserGym aims to solve this by providing a unified, gym-like environment with well-defined observation and action spaces, facilitating standardized evaluation across diverse benchmarks. Combined with AgentLab, a complementary framework that aids in agent creation, testing, and analysis, BrowserGym offers flexibility for integrating new benchmarks while ensuring consistent evaluation and comprehensive experiment management. This standardized approach seeks to reduce the time and complexity of developing web agents, supporting more reliable comparisons and facilitating in-depth analysis of agent behaviors, and could result in more adaptable, capable agents, ultimately accelerating innovation in LLM-driven automation. As a supporting evidence, we conduct the first large-scale, multi-benchmark web agent experiment and compare the performance of 6 state-of-the-art LLMs across all benchmarks currently available in BrowserGym. Among other findings, our results highlight a large discrepancy between OpenAI and Anthropic's latests models, with Claude-3.5-Sonnet leading the way on almost all benchmarks, except on vision-related tasks where GPT-4o is superior. Despite these advancements, our results emphasize that building robust and efficient web agents remains a significant challenge, due to the inherent complexity of real-world web environments and the limitations of current models.
The BrowserGym Ecosystem for Web Agent Research
Thibault Le Sellier de Chezelles
Alexandre Lacoste
Massimo Caccia
Léo Boisvert
Megh Thakkar
Tom Marty
Rim Assouel
Sahar Omidi Shayegan
Lawrence Jang
Xing Han Lu
Ori Yoran
Dehan Kong
Frank F. Xu
Graham Neubig
Ruslan Salakhutdinov
The BrowserGym ecosystem addresses the growing need for efficient evaluation and benchmarking of web agents, particularly those leveraging a… (voir plus)utomation and Large Language Models (LLMs) for web interaction tasks. Many existing benchmarks suffer from fragmentation and inconsistent evaluation methodologies, making it challenging to achieve reliable comparisons and reproducible results. BrowserGym aims to solve this by providing a unified, gym-like environment with well-defined observation and action spaces, facilitating standardized evaluation across diverse benchmarks. Combined with AgentLab, a complementary framework that aids in agent creation, testing, and analysis, BrowserGym offers flexibility for integrating new benchmarks while ensuring consistent evaluation and comprehensive experiment management. This standardized approach seeks to reduce the time and complexity of developing web agents, supporting more reliable comparisons and facilitating in-depth analysis of agent behaviors, and could result in more adaptable, capable agents, ultimately accelerating innovation in LLM-driven automation. As a supporting evidence, we conduct the first large-scale, multi-benchmark web agent experiment and compare the performance of 6 state-of-the-art LLMs across all benchmarks currently available in BrowserGym. Among other findings, our results highlight a large discrepancy between OpenAI and Anthropic's latests models, with Claude-3.5-Sonnet leading the way on almost all benchmarks, except on vision-related tasks where GPT-4o is superior. Despite these advancements, our results emphasize that building robust and efficient web agents remains a significant challenge, due to the inherent complexity of real-world web environments and the limitations of current models.
BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks
Juan Rodriguez
Xiangru Jian
Siba Smarak Panigrahi
Tianyu Zhang
Aarash Feizi
Abhay Puri
Akshay Kalkunte
Franccois Savard
Ahmed Masry
Shravan Nayak
Rabiul Awal
Mahsa Massoud
Amirhossein Abaskohi
Zichao Li
Suyuchen Wang
Pierre-Andre Noel
M. L. Richter
Saverio Vadacchino
Shubbam Agarwal
Sanket Biswas … (voir 23 de plus)
Sara Shanian
Ying Zhang
Noah Bolger
Kurt MacDonald
Simon Fauvel
Sathwik Tejaswi
Srinivas Sunkara
Joao Monteiro
Krishnamurthy Dj Dvijotham
Torsten Scholak
Sepideh Kharagani
Sean Hughes
M. Özsu
Issam Hadj Laradji
Spandanna Gella
Perouz Taslakian
David Vazquez
Sai Rajeswar
Multimodal AI has the potential to significantly enhance document-understanding tasks, such as processing receipts, understanding workflows,… (voir plus) extracting data from documents, and summarizing reports. Code generation tasks that require long-structured outputs can also be enhanced by multimodality. Despite this, their use in commercial applications is often limited due to limited access to training data and restrictive licensing, which hinders open access. To address these limitations, we introduce BigDocs-7.5M, a high-quality, open-access dataset comprising 7.5 million multimodal documents across 30 tasks. We use an efficient data curation process to ensure our data is high-quality and license-permissive. Our process emphasizes accountability, responsibility, and transparency through filtering rules, traceable metadata, and careful content analysis. Additionally, we introduce BigDocs-Bench, a benchmark suite with 10 novel tasks where we create datasets that reflect real-world use cases involving reasoning over Graphical User Interfaces (GUI) and code generation from images. Our experiments show that training with BigDocs-Bench improves average performance up to 25.8% over closed-source GPT-4o in document reasoning and structured output tasks such as Screenshot2HTML or Image2Latex generation. Finally, human evaluations showed a preference for outputs from models trained on BigDocs over GPT-4o. This suggests that BigDocs can help both academics and the open-source community utilize and improve AI tools to enhance multimodal capabilities and document reasoning. The project is hosted at https://bigdocs.github.io .
BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks
Juan Rodriguez
Xiangru Jian
Siba Smarak Panigrahi
Tianyu Zhang
Aarash Feizi
Abhay Puri
Akshay Kalkunte
Franccois Savard
Ahmed Masry
Shravan Nayak
Rabiul Awal
Mahsa Massoud
Amirhossein Abaskohi
Zichao Li
Suyuchen Wang
Pierre-Andre Noel
Mats Leon Richter
Saverio Vadacchino
Shubbam Agarwal
Sanket Biswas … (voir 23 de plus)
Sara Shanian
Ying Zhang
Noah Bolger
Kurt MacDonald
Simon Fauvel
Sathwik Tejaswi
Srinivas Sunkara
Joao Monteiro
Krishnamurthy Dj Dvijotham
Torsten Scholak
Sepideh Kharaghani
Sean Hughes
M. Özsu
Issam Hadj Laradji
Spandanna Gella
Perouz Taslakian
David Vazquez
Sai Rajeswar
Multimodal AI has the potential to significantly enhance document-understanding tasks, such as processing receipts, understanding workflows,… (voir plus) extracting data from documents, and summarizing reports. Code generation tasks that require long-structured outputs can also be enhanced by multimodality. Despite this, their use in commercial applications is often limited due to limited access to training data and restrictive licensing, which hinders open access. To address these limitations, we introduce BigDocs-7.5M, a high-quality, open-access dataset comprising 7.5 million multimodal documents across 30 tasks. We use an efficient data curation process to ensure our data is high-quality and license-permissive. Our process emphasizes accountability, responsibility, and transparency through filtering rules, traceable metadata, and careful content analysis. Additionally, we introduce BigDocs-Bench, a benchmark suite with 10 novel tasks where we create datasets that reflect real-world use cases involving reasoning over Graphical User Interfaces (GUI) and code generation from images. Our experiments show that training with BigDocs-Bench improves average performance up to 25.8% over closed-source GPT-4o in document reasoning and structured output tasks such as Screenshot2HTML or Image2Latex generation. Finally, human evaluations showed a preference for outputs from models trained on BigDocs over GPT-4o. This suggests that BigDocs can help both academics and the open-source community utilize and improve AI tools to enhance multimodal capabilities and document reasoning. The project is hosted at https://bigdocs.github.io .