Publications

Adaptive Exploration for Data-Efficient General Value Function Evaluations

Arushi Jain

Josiah P. Hanna

General Value Functions (GVFs) (Sutton et al, 2011) are an established way to represent predictive knowledge in reinforcement learning. Each… (voir plus) GVF computes the expected return for a given policy, based on a unique pseudo-reward. Multiple GVFs can be estimated in parallel using off-policy learning from a single stream of data, often sourced from a fixed behavior policy or pre-collected dataset. This leaves an open question: how can behavior policy be chosen for data-efficient GVF learning? To address this gap, we propose GVFExplorer, which aims at learning a behavior policy that efficiently gathers data for evaluating multiple GVFs in parallel. This behavior policy selects actions in proportion to the total variance in the return across all GVFs, reducing the number of environmental interactions. To enable accurate variance estimation, we use a recently proposed temporal-difference-style variance estimator. We prove that each behavior policy update reduces the mean squared error in the summed predictions over all GVFs. We empirically demonstrate our method's performance in both tabular representations and nonlinear function approximation.

2024-05-13

ArXiv (prépublication)

doi.org

arxiv.org

ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning

Qiao Gu

Alihusein Kuwajerwala

Sacha Morin

Krishna Murthy

Bipasha Sen

Aditya Agarwal

Corban Rivera

William Paul

Kirsty Ellis

Rama Chellappa

Chuang Gan

Celso M de Melo

Joshua B. Tenenbaum

Antonio Torralba

Florian Shkurti

Liam Paull

For robots to perform a wide variety of tasks, they require a 3D representation of the world that is semantically rich, yet compact and effi… (voir plus)cient for task-driven perception and planning. Recent approaches have attempted to leverage features from large vision-language models to encode semantics in 3D representations. However, these approaches tend to produce maps with per-point feature vectors, which do not scale well in larger environments, nor do they contain semantic spatial relationships between entities in the environment, which are useful for downstream planning. In this work, we propose ConceptGraphs, an open-vocabulary graph-structured representation for 3D scenes. ConceptGraphs is built by leveraging 2D foundation models and fusing their output to 3D by multi-view association. The resulting representations generalize to novel semantic classes, without the need to collect large 3D datasets or finetune models. We demonstrate the utility of this representation through a number of downstream planning tasks that are specified through abstract (language) prompts and require complex reasoning over spatial and semantic concepts. (Project page: https://concept-graphs.github.io/ Explainer video: https://youtu.be/mRhNkQwRYnc )

2024-05-13

2024 IEEE International Conference on Robotics and Automation (ICRA) (publié)

doi.org

openreview.net

Divergent Creativity in Humans and Large Language Models

Antoine Bellemare-Pepin

Franccois Lespinasse

Philipp Thölke

Yann Harel

Kory Wallace Mathewson

Jay A. Olson

Yoshua Bengio

Karim Jerbi CoCo Lab

Psychology Department

U. Montr'eal

Montreal

Qc

Canada

Music department

C. University

Sociology

Anthropology department

Mila

Departmentof Psychology

University of Toronto Mississauga … (voir 5 de plus)

Mississauga

On

Department of Computer Science

Operations Research

Unique Center

The recent surge in the capabilities of Large Language Models (LLMs) has led to claims that they are approaching a level of creativity akin … (voir plus)to human capabilities. This idea has sparked a blend of excitement and apprehension. However, a critical piece that has been missing in this discourse is a systematic evaluation of LLM creativity, particularly in comparison to human divergent thinking. To bridge this gap, we leverage recent advances in creativity science to build a framework for in-depth analysis of divergent creativity in both state-of-the-art LLMs and a substantial dataset of 100,000 humans. We found evidence suggesting that LLMs can indeed surpass human capabilities in specific creative tasks such as divergent association and creative writing. Our quantitative benchmarking framework opens up new paths for the development of more creative LLMs, but it also encourages more granular inquiries into the distinctive elements that constitute human inventive thought processes, compared to those that can be artificially generated.

2024-05-13

ArXiv (prépublication)

doi.org

arxiv.org

GAGE: Genetic Algorithm-Based Graph Explainer for Malware Analysis

Mohd Saqib

Benjamin Fung

Philippe Charland

Andrew Walenstein

Malware analysts often prefer reverse engineering using Call Graphs, Control Flow Graphs (CFGs), and Data Flow Graphs (DFGs), which involves… (voir plus) the utilization of black-box Deep Learning (DL) models. The proposed research introduces a structured pipeline for reverse engineering-based analysis, offering promising results compared to state-of-the-art methods and providing high-level interpretability for malicious code blocks in subgraphs. We propose the Canonical Executable Graph (CEG) as a new representation of Portable Executable (PE) files, uniquely incorporating syntactical and semantic information into its node embeddings. At the same time, edge features capture structural aspects of PE files. This is the first work to present a PE file representation encompassing syntactical, semantic, and structural characteristics, whereas previous efforts typically focused solely on syntactic or structural properties. Furthermore, recognizing the limitations of existing graph explanation methods within Explainable Artificial Intelligence (XAI) for malware analysis, primarily due to the specificity of malicious files, we introduce Genetic Algorithm-based Graph Explainer (GAGE). GAGE operates on the CEG, striving to identify a precise subgraph relevant to predicted malware families. Through experiments and comparisons, our proposed pipeline exhibits substantial improvements in model robustness scores and discriminative power compared to the previous benchmarks. Furthermore, we have successfully used GAGE in practical applications on real-world data, producing meaningful insights and interpretability. This research offers a robust solution to enhance cybersecurity by delivering a transparent and accurate understanding of malware behaviour. Moreover, the proposed algorithm is specialized in handling graph-based data, effectively dissecting complex content and isolating influential nodes.

2024-05-13

IEEE International Conference on Data Engineering (publié)

doi.org

Globally Stable Neural Imitation Policies

Amin Abyaneh

Mariana Sosa Guzmán

Hsiu-Chin Lin

2024-05-13

2024 IEEE International Conference on Robotics and Automation (ICRA) (publié)

doi.org

arxiv.org

TEMPLATES: Characterization of a Merger in the Dusty Lensing SPT0418-47 System

Jared Cathey

Anthony H. Gonzalez

Sidney Lower

Kedar A. Phadke

Justin Spilker

Manuel Aravena

Matthew Bayliss

Jack E. Birkin

Simon Birrer

Scott Chapman

Håkon Dahle

Christopher C. Hayward

Yashar Hezaveh

Ryley Hill

Taylor A. Hutchison

Keunho J. Kim

Guillaume Mahler

Daniel P. Marrone

Desika Narayanan

Alexander Navarre … (voir 7 de plus)

Cassie Reuter

Jane R Rigby

Keren Sharon

Manuel Solimano

Nikolaus Sulzenauer

Joaquin Vieira

David Vizgan

2024-05-13

The Astrophysical Journal (publié)

doi.org

arxiv.org

The 1st International Workshop on Graph Foundation Models (GFM)

Haitao Mao

Jianan Zhao

Xiaoxin He

Zhikai Chen

Qian Huang

Zhaocheng Zhu

Jian Tang

Micheal Bronstein

Xavier Bresson

Bryan Hooi

Haiyang Zhang

Xianfeng Tang

Luo Chen

Jiliang Tang

2024-05-13

Companion Proceedings of the ACM on Web Conference 2024 (publié)

doi.org

An AI-Resilient Text Rendering Technique for Reading and Skimming Documents

Ziwei Gu

Ian Arawjo

Kenneth Li

Jonathan K. Kummerfeld

Elena L. Glassman

2024-05-11

Proceedings of the CHI Conference on Human Factors in Computing Systems (publié)

doi.org

arxiv.org

ChainForge: A Visual Toolkit for Prompt Engineering and LLM Hypothesis Testing

Ian Arawjo

Chelse Swoopes

Priyan Vaithilingam

Martin Wattenberg

Elena L. Glassman

Evaluating outputs of large language models (LLMs) is challenging, requiring making -- and making sense of -- many responses. Yet tools that… (voir plus) go beyond basic prompting tend to require knowledge of programming APIs, focus on narrow domains, or are closed-source. We present ChainForge, an open-source visual toolkit for prompt engineering and on-demand hypothesis testing of text generation LLMs. ChainForge provides a graphical interface for comparison of responses across models and prompt variations. Our system was designed to support three tasks: model selection, prompt template design, and hypothesis testing (e.g., auditing). We released ChainForge early in its development and iterated on its design with academics and online users. Through in-lab and interview studies, we find that a range of people could use ChainForge to investigate hypotheses that matter to them, including in real-world settings. We identify three modes of prompt engineering and LLM hypothesis testing: opportunistic exploration, limited evaluation, and iterative refinement.

2024-05-11

Proceedings of the CHI Conference on Human Factors in Computing Systems (publié)

doi.org

arxiv.org

Designing and Evaluating Dialogue LLMs for Co-Creative Improvised Theatre

Boyd Branch

Piotr Mirowski

Kory Wallace Mathewson

Sophia Ppali

Alexandra Covaci

Social robotics researchers are increasingly interested in multi-party trained conversational agents. With a growing demand for real-world e… (voir plus)valuations, our study presents Large Language Models (LLMs) deployed in a month-long live show at the Edinburgh Festival Fringe. This case study investigates human improvisers co-creating with conversational agents in a professional theatre setting. We explore the technical capabilities and constraints of on-the-spot multi-party dialogue, providing comprehensive insights from both audience and performer experiences with AI on stage. Our human-in-the-loop methodology underlines the challenges of these LLMs in generating context-relevant responses, stressing the user interface's crucial role. Audience feedback indicates an evolving interest for AI-driven live entertainment, direct human-AI interaction, and a diverse range of expectations about AI's conversational competence and utility as a creativity support tool. Human performers express immense enthusiasm, varied satisfaction, and the evolving public opinion highlights mixed emotions about AI's role in arts.

2024-05-11

ArXiv (prépublication)

doi.org

arxiv.org

Calibration‐free parallel transmission of the cervical, thoracic, and lumbar spinal cord at <scp>7T</scp>

Christoph S. Aigner

Manuel F. Sánchez Alarcon

Alexandre D'Astous

Eva Alonso‐Ortiz

Julien Cohen-Adad

Sebastian Schmitter

2024-05-10

Magnetic Resonance in Medicine (publié)

doi.org

Repeat it without me: Crowdsourcing the T1 mapping common ground via the ISMRM reproducibility challenge.

Mathieu Boudreau

Agah Karakuzu

Julien Cohen-Adad

Ecem Bozkurt

Madeline Carr

Marco Castellaro

Luis Concha

Mariya Doneva

Seraina A. Dual

Alex Ensworth

Alexandru Foias

Véronique Fortier

Refaat E. Gabr

Guillaume Gilbert

Carri K. Glide‐Hurst

Matthew Grech‐Sollars

Siyuan Hu

Oscar Jalnefjord

Jorge Jovicich

Kübra Keskin … (voir 22 de plus)

Peter Koken

Anastasia Kolokotronis

Simran Kukran

Nam G. Lee

Ives R. Levesque

Bochao Li

Dan Ma

Burkhard Mädler

Nyasha G. Maforo

Jamie Near

Erick Pasaye

Alonso Ramirez‐Manzanares

Ben Statton

Christian Stehning

Stefano Tambalo

Ye Tian

Chenyang Wang

Kilian Weiss

Niloufar Zakariaei

Shuo Zhang

Ziwei Zhao

Nikola Stikov

PURPOSE T1 mapping is a widely used quantitative MRI technique, but its tissue-specific values remain inconsistent across protocols, sites, … (voir plus)and vendors. The ISMRM Reproducible Research and Quantitative MR study groups jointly launched a challenge to assess the reproducibility of a well-established inversion-recovery T1 mapping technique, using acquisition details from a seminal T1 mapping paper on a standardized phantom and in human brains. METHODS The challenge used the acquisition protocol from Barral et al. (2010). Researchers collected T1 mapping data on the ISMRM/NIST phantom and/or in human brains. Data submission, pipeline development, and analysis were conducted using open-source platforms. Intersubmission and intrasubmission comparisons were performed. RESULTS Eighteen submissions (39 phantom and 56 human datasets) on scanners by three MRI vendors were collected at 3 T (except one, at 0.35 T). The mean coefficient of variation was 6.1% for intersubmission phantom measurements, and 2.9% for intrasubmission measurements. For humans, the intersubmission/intrasubmission coefficient of variation was 5.9/3.2% in the genu and 16/6.9% in the cortex. An interactive dashboard for data visualization was also developed: https://rrsg2020.dashboards.neurolibre.org. CONCLUSION The T1 intersubmission variability was twice as high as the intrasubmission variability in both phantoms and human brains, indicating that the acquisition details in the original paper were insufficient to reproduce a quantitative MRI protocol. This study reports the inherent uncertainty in T1 measures across independent research groups, bringing us one step closer to a practical clinical baseline of T1 variations in vivo.

2024-05-10

Magnetic Resonance in Medicine (publié)

doi.org

Le traitement du langage naturel à l'ère de l'IA générative

Boussole des politiques en IA

Vie étudiante et ressources

Publications

Le traitement du langage naturel à l'ère de l'IA générative

Boussole des politiques en IA

Vie étudiante et ressources

Mots-clés populaires:

Publications