Portrait de Chris Pal

Chris Pal

Membre académique principal
Chaire en IA Canada-CIFAR
Professeur titulaire, Polytechnique Montréal, Département de génie informatique et de génie logiciel
Professeur associé, Université de Montréal, Département d'informatique et de recherche opérationnelle
Sujets de recherche
Apprentissage profond

Biographie

Christopher Pal est titulaire d'une chaire en IA Canada-CIFAR, professeur titulaire à Polytechnique Montréal et professeur adjoint au Département d'informatique et de recherche opérationnelle (DIRO) de l'Université de Montréal. Il est également chercheur émérite à ServiceNow Research. Il est engagé dans la recherche sur l'intelligence artificielle et l'apprentissage automatique depuis plus de 25 ans, publiant souvent des travaux sur les méthodes de modélisation du langage à grande échelle et les techniques de modélisation générative. Il a obtenu un doctorat en informatique à l'Université de Waterloo.

Étudiants actuels

Stagiaire de recherche - McGill
Postdoctorat - HEC
Superviseur⋅e principal⋅e :
Collaborateur·rice de recherche - McGill
Superviseur⋅e principal⋅e :
Maîtrise recherche - UdeM
Doctorat - Polytechnique
Doctorat - McGill
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Doctorat - Polytechnique
Maîtrise recherche - UdeM
Co-superviseur⋅e :
Collaborateur·rice alumni - Polytechnique
Doctorat - Polytechnique
Postdoctorat - McGill
Co-superviseur⋅e :
Maîtrise recherche - Polytechnique
Doctorat - UdeM
Co-superviseur⋅e :
Maîtrise recherche - Concordia
Co-superviseur⋅e :
Collaborateur·rice de recherche - UdeM
Maîtrise recherche - UdeM
Doctorat - UdeM
Doctorat - Polytechnique
Doctorat - Polytechnique
Doctorat - École de technologie suprérieure
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Postdoctorat - HEC
Superviseur⋅e principal⋅e :
Doctorat - Polytechnique
Superviseur⋅e principal⋅e :
Doctorat - McGill
Superviseur⋅e principal⋅e :
Doctorat - Polytechnique

Publications

AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories
Xing Han Lu
Amirhossein Kazemnejad
Nicholas Meade
Arkil Patel
Dongchan Shin
Alejandra Zambrano
Karolina Stanczak
Peter Shaw
Web agents enable users to perform tasks on web browsers through natural language interaction. Evaluating web agents trajectories is an impo… (voir plus)rtant problem, since it helps us determine whether the agent successfully completed the tasks. Rule-based methods are widely used for this purpose, but they are challenging to extend to new tasks and may not always recognize successful trajectories. We may achieve higher accuracy through human evaluation, but the process would be substantially slower and more expensive. Automatic evaluations with LLMs may avoid the challenges of designing new rules and manually annotating trajectories, enabling faster and cost-effective evaluation. However, it is unclear how effective they are at evaluating web agents. To this end, we propose AgentRewardBench, the first benchmark to assess the effectiveness of LLM judges for evaluating web agents. AgentRewardBench contains 1302 trajectories across 5 benchmarks and 4 LLMs. Each trajectory in AgentRewardBench is reviewed by an expert, who answers questions pertaining to the success, side effects, and repetitiveness of the agent. Using our benchmark, we evaluate 12 LLM judges and find that no single LLM excels across all benchmarks. We also find that the rule-based evaluation used by common benchmarks tends to underreport the success rate of web agents, highlighting a key weakness of rule-based evaluation and the need to develop more flexible automatic evaluations. We release the benchmark at: https://agent-reward-bench.github.io
LitLLMs, LLMs for Literature Review: Are we there yet?
Shubham Agarwal
Gaurav Sahu
Abhay Puri
Issam Hadj Laradji
Krishnamurthy Dj Dvijotham
Jason Stanley
Scenario Dreamer: Vectorized Latent Diffusion for Generating Driving Simulation Environments
Luke Rowe
Roger Girgis
Anthony Gosselin
Felix Heide
Scenario Dreamer: Vectorized Latent Diffusion for Generating Driving Simulation Environments
Luke Rowe
Roger Girgis
Anthony Gosselin
Felix Heide
StarFlow: Generating Structured Workflow Outputs From Sketch Images
Patrice Bechard
Chao Wang
Amirhossein Abaskohi
Juan A. Rodriguez
David Vazquez
Spandana Gella
Sai Rajeswar
Perouz Taslakian
StarFlow: Generating Structured Workflow Outputs From Sketch Images
Patrice Bechard
Chao Wang
Amirhossein Abaskohi
Juan A. Rodriguez
David Vazquez
Spandana Gella
Sai Rajeswar
Perouz Taslakian
UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction
Shravan Nayak
Xiangru Jian
Kevin Qinghong Lin
Juan A. Rodriguez
Montek Kalsi
Rabiul Awal
M. T. ¨Ozsu
David Vazquez
Perouz Taslakian
Spandana Gella
Sai Rajeswar
Human Annotator
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding
Ahmed Masry
Juan A. Rodriguez
Tianyu Zhang
Suyuchen Wang
Chao Wang
Aarash Feizi
Akshay Kalkunte Suresh
Abhay Puri
Xiangru Jian
Pierre-Andre Noel
Sathwik Tejaswi Madhusudhan
Enamul Hoque
Issam Hadj Laradji
David Vazquez
Perouz Taslakian … (voir 2 de plus)
Spandana Gella
Sai Rajeswar
Aligning visual features with language embeddings is a key challenge in vision-language models (VLMs). The performance of such models hinges… (voir plus) on having a good connector that maps visual features generated by a vision encoder to a shared embedding space with the LLM while preserving semantic similarity. Existing connectors, such as multilayer perceptrons (MLPs), often produce out-of-distribution or noisy inputs, leading to misalignment between the modalities. In this work, we propose a novel vision-text alignment method, AlignVLM, that maps visual features to a weighted average of LLM text embeddings. Our approach leverages the linguistic priors encoded by the LLM to ensure that visual features are mapped to regions of the space that the LLM can effectively interpret. AlignVLM is particularly effective for document understanding tasks, where scanned document images must be accurately mapped to their textual content. Our extensive experiments show that AlignVLM achieves state-of-the-art performance compared to prior alignment methods. We provide further analysis demonstrating improved vision-text feature alignment and robustness to noise.
WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation
Rabiul Awal
Mahsa Massoud
Zichao Li
Aarash Feizi
Suyuchen Wang
David Vazquez
Juan A. Rodriguez
Perouz Taslakian
Spandana Gella
Sai Rajeswar
Understanding diverse web data and automating web development presents an exciting challenge for agentic AI. While existing benchmarks addre… (voir plus)ss isolated web-based tasks—such as website-based Visual Question Answering (VQA) and UI-to-code generation—they lack a unified evaluation suite for assessing web agents that interact with and reason about web environments. We introduce WebMMU, a large-scale benchmark for evaluating AI-driven web agents across multilingual website VQA, HTML/CSS/JavaScript code editing, and sketch-to-code generation. WebMMU provides a comprehensive evaluation suite with real-world website data, multi-step reasoning tasks, and functional UI understanding. Benchmarking state-of-the-art multimodal models on WebMMU reveals significant limitations in web-based reasoning, layout understanding, and structured code generation, particularly in preserving UI hierarchy, handling multilingual content, and producing robust, functional code. While most existing models are optimized for English-only settings, WebMMU highlights the challenges of cross-lingual adaptation in real-world web development. These findings expose critical gaps in current models’ ability to understand website structures, execute user instructions, and generate high-quality web code, underscoring the need for more advanced multimodal reasoning in AI-driven web understanding and development.
Beyond FVD: An Enhanced Evaluation Metrics for Video Generation Distribution Quality
Ge Ya Luo
Gian Mario Favero
Zhi Hao Luo
Alexia Jolicoeur-Martineau
The Fréchet Video Distance (FVD) is a widely adopted metric for evaluating video generation distribution quality. However, its effectivenes… (voir plus)s relies on critical assumptions. Our analysis reveals three significant limitations: (1) the non-Gaussianity of the Inflated 3D Convnet (I3D) feature space; (2) the insensitivity of I3D features to temporal distortions; (3) the impractical sample sizes required for reliable estimation. These findings undermine FVD's reliability and show that FVD falls short as a standalone metric for video generation evaluation. After extensive analysis of a wide range of metrics and backbone architectures, we propose JEDi, the JEPA Embedding Distance, based on features derived from a Joint Embedding Predictive Architecture, measured using Maximum Mean Discrepancy with polynomial kernel. Our experiments on multiple open-source datasets show clear evidence that it is a superior alternative to the widely used FVD metric, requiring only 16% of the samples to reach its steady value, while increasing alignment with human evaluation by 34%, on average. Project page: https://oooolga.github.io/JEDi.github.io/.
Beyond FVD: An Enhanced Evaluation Metrics for Video Generation Distribution Quality
Ge Ya Luo
Gian Mario Favero
Zhi Hao Luo
Alexia Jolicoeur-Martineau
The Fréchet Video Distance (FVD) is a widely adopted metric for evaluating video generation distribution quality. However, its effectivenes… (voir plus)s relies on critical assumptions. Our analysis reveals three significant limitations: (1) the non-Gaussianity of the Inflated 3D Convnet (I3D) feature space; (2) the insensitivity of I3D features to temporal distortions; (3) the impractical sample sizes required for reliable estimation. These findings undermine FVD's reliability and show that FVD falls short as a standalone metric for video generation evaluation. After extensive analysis of a wide range of metrics and backbone architectures, we propose JEDi, the JEPA Embedding Distance, based on features derived from a Joint Embedding Predictive Architecture, measured using Maximum Mean Discrepancy with polynomial kernel. Our experiments on multiple open-source datasets show clear evidence that it is a superior alternative to the widely used FVD metric, requiring only 16% of the samples to reach its steady value, while increasing alignment with human evaluation by 34%, on average. Project page: https://oooolga.github.io/JEDi.github.io/.
BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks
Juan A. Rodriguez
Xiangru Jian
Siba Smarak Panigrahi
Tianyu Zhang
Aarash Feizi
Abhay Puri
Akshay Kalkunte Suresh
François Savard
Ahmed Masry
Shravan Nayak
Rabiul Awal
Mahsa Massoud
Amirhossein Abaskohi
Zichao Li
Suyuchen Wang
Pierre-Andre Noel
Mats Leon Richter
Saverio Vadacchino
Shubham Agarwal
Sanket Biswas … (voir 19 de plus)
Sara Shanian
Ying Zhang
Sathwik Tejaswi Madhusudhan
Joao Monteiro
Krishnamurthy Dj Dvijotham
Torsten Scholak
Sepideh Kharaghani
Sean Hughes
M. Özsu
Issam Hadj Laradji
Spandana Gella
Perouz Taslakian
David Vazquez
Sai Rajeswar
Multimodal AI has the potential to significantly enhance document-understanding tasks, such as processing receipts, understanding workflows,… (voir plus) extracting data from documents, and summarizing reports. Code generation tasks that require long-structured outputs can also be enhanced by multimodality. Despite this, their use in commercial applications is often limited due to limited access to relevant training data and restrictive licensing, which hinders open access. To address these limitations, we introduce BigDocs-7.5M, a high-quality, open-access dataset comprising 7.5 million multimodal documents across 30 tasks. We use an efficient data curation process to ensure that our data is high quality and license-permissive. Our process emphasizes accountability, responsibility, and transparency through filtering rules, traceable metadata, and careful content analysis. Additionally, we introduce BigDocs-Bench,, a benchmark suite with 10 novel tasks where we carefully create datasets that reflect real-world use cases involving reasoning over Graphical User Interfaces (GUI) and code generation from images. Our experiments show that training with BigDocs-Bench, improves average performance up to 25.8% over closed-source GPT-4o in document reasoning and structured output tasks such as Screenshot2HTML or Image2Latex generation. Finally, human evaluations revealed that participants preferred the outputs from models trained with BigDocs over those from GPT-4o. This suggests that BigDocs can help both academics and the open-source community utilize and improve AI tools to enhance multimodal capabilities and document reasoning.