Publications

VisMin: Visual Minimal-Change Understanding

Rabiul Awal

Saba Ahmadi

Le Zhang

Aishwarya Agrawal

Fine-grained understanding of objects, attributes, and relationships between objects is crucial for visual-language models (VLMs). Existing benchmarks primarily focus on evaluating VLMs' capability to distinguish between two very similar captions given an image. In this paper, we introduce a new, challenging benchmark termed Visual Minimal-Change Understanding (VisMin), which requires models to predict the correct image-caption match given two images and two captions. The image pair and caption pair contain minimal changes, i.e., only one aspect changes at a time from among the following: object, attribute, count, and spatial relation. These changes test the models' understanding of objects, attributes (such as color, material, shape), counts, and spatial relationships between objects. We built an automatic framework using large language models and diffusion models, followed by a rigorous 4-step verification process by human annotators. Empirical experiments reveal that current VLMs exhibit notable deficiencies in understanding spatial relationships and in counting. We also generate a large-scale training dataset to finetune CLIP and Idefics2, showing significant improvements in fine-grained understanding across benchmarks and in CLIP's general image-text alignment. We release all resources, including the benchmark, training data, and finetuned model checkpoints, at https://vismin.net/.
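To make the two-image, two-caption matching task concrete, here is a minimal sketch of how one example could be scored from a 2x2 similarity matrix. The strict-inequality criterion and the function names are assumptions chosen for illustration; the abstract does not state the exact metric VisMin uses.

```python
from typing import Sequence

# sim[i][j] is a model's similarity between image i and caption j.
# Image 0 is assumed to pair with caption 0, and image 1 with caption 1.
Matrix = Sequence[Sequence[float]]


def text_correct(sim: Matrix) -> bool:
    """Each image scores its own caption above the other caption."""
    return sim[0][0] > sim[0][1] and sim[1][1] > sim[1][0]


def image_correct(sim: Matrix) -> bool:
    """Each caption scores its own image above the other image."""
    return sim[0][0] > sim[1][0] and sim[1][1] > sim[0][1]


def group_correct(sim: Matrix) -> bool:
    """Both directions of the match are resolved correctly."""
    return text_correct(sim) and image_correct(sim)


# Hypothetical similarity scores, not taken from the paper.
example = [[0.90, 0.40],
           [0.50, 0.45]]
print(text_correct(example), image_correct(example), group_correct(example))
```

In this made-up example the second image is scored closer to the wrong caption, so the text direction fails even though the image direction succeeds.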

2024-07-23

arXiv (preprint)

doi.org

arxiv.org

Wasserstein Distributionally Robust Shallow Convex Neural Networks

Julien Pallage

Antoine Lesage-Landry

In this work, we propose Wasserstein distributionally robust shallow convex neural networks (WaDiRo-SCNNs) to provide reliable nonlinear predictions when subject to adverse and corrupted datasets. Our approach is based on a new convex training program for

2024-07-23

arXiv (preprint)

doi.org

arxiv.org

A Rapid Method for Impact Analysis of Grid-Edge Technologies on Power Distribution Networks

Feng Li

Ilhan Kocar

Antoine Lesage-Landry

This paper presents a novel rapid estimation method (REM) to perform stochastic impact analysis of grid-edge technologies (GETs) on power distribution networks. The evolution of the probability density functions (PDFs) of network states in terms of GET penetration levels is characterized by the Fokker-Planck equation (FPE). The FPE is numerically solved to compute the PDFs of network states, and a calibration process is also proposed such that the accuracy of the REM is maintained for large-scale distribution networks. The approach is illustrated on a large-scale realistic distribution network using a modified version of the IEEE 8500 feeder, where electric vehicles (EVs) or photovoltaic systems (PVs) are installed at various penetration rates. Quantitative analyses demonstrate that the results from our proposed approach have negligible errors compared with those obtained from Monte Carlo simulations.

2024-07-21

2024 IEEE Power & Energy Society General Meeting (PESGM) (published)

doi.org

Improving Context-Aware Preference Modeling for Language Models

Silviu Pitis

Ziang Xiao

Nicolas Le Roux

Alessandro Sordoni

While finetuning language models from pairwise preferences has proven remarkably effective, the underspecified nature of natural language presents critical challenges. Direct preference feedback is uninterpretable, difficult to provide where multidimensional criteria may apply, and often inconsistent, either because it is based on incomplete instructions or provided by diverse principals. To address these challenges, we consider the two-step preference modeling procedure that first resolves the under-specification by selecting a context, and then evaluates preference with respect to the chosen context. We decompose reward modeling error according to these two steps, which suggests that supervising context in addition to context-specific preference may be a viable approach to aligning models with diverse human preferences. For this to work, the ability of models to evaluate context-specific preference is critical. To this end, we contribute context-conditioned preference datasets and accompanying experiments that investigate the ability of language models to evaluate context-specific preference. We use our datasets to (1) show that existing preference models benefit from, but fail to fully consider, added context, (2) finetune a context-aware reward model with context-specific performance exceeding that of GPT-4 and Llama 3 70B on tested datasets, and (3) investigate the value of context-aware preference modeling.
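The two-step procedure described in this abstract (select a context, then evaluate preference under it) can be illustrated with a toy sketch. Everything here is hypothetical: the context labels, the probabilities, the scoring function, and the choice of picking the most probable context are illustrative assumptions, not the authors' implementation.

```python
from typing import Callable, Dict, Tuple


def two_step_preference(
    context_probs: Dict[str, float],
    prefers_a: Callable[[str], float],
) -> Tuple[str, float]:
    """Step 1: pick the most probable context for an underspecified prompt.
    Step 2: evaluate the preference for response A over B under that context.
    """
    context = max(context_probs, key=context_probs.get)
    return context, prefers_a(context)


# Hypothetical numbers: how likely each reading of the prompt is, and how
# strongly response A is preferred over response B under each reading.
context_probs = {"be concise": 0.7, "be thorough": 0.3}
scores = {"be concise": 0.9, "be thorough": 0.2}

context, preference = two_step_preference(context_probs, lambda c: scores[c])
print(context, preference)
```

The sketch shows why the split matters: the same response pair receives a very different preference score depending on which context is selected in the first step.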

2024-07-20

arXiv (preprint)

doi.org

arxiv.org

T2VIndexer: A Generative Video Indexer for Efficient Text-Video Retrieval

Yili Li

Jing Yu

Keke Gai

Bang Liu

Gang Xiong

Qi Wu

Current text-video retrieval methods mainly rely on cross-modal matching between queries and videos to calculate their similarity scores, which are then sorted to obtain retrieval results. This method considers the matching between each candidate video and the query, but it incurs a significant time cost that grows notably with the number of candidates. Generative models are common in natural language processing and computer vision, and have been successfully applied in document retrieval, but their application in multimodal retrieval remains unexplored. To enhance retrieval efficiency, in this paper, we introduce a model-based video indexer named T2VIndexer, which is a sequence-to-sequence generative model directly generating video identifiers and retrieving candidate videos with constant time complexity. T2VIndexer aims to reduce retrieval time while maintaining high accuracy. To achieve this goal, we propose video identifier encoding and query-identifier augmentation approaches to represent videos as short sequences while preserving their semantic information. Our method consistently enhances the retrieval efficiency of current state-of-the-art models on four standard datasets. It enables baselines with only 30%-50% of the original retrieval time to achieve better retrieval performance on MSR-VTT (+1.0%), MSVD (+1.8%), ActivityNet (+1.5%), and DiDeMo (+0.2%). The code is available at https://anonymous.4open.science/r/T2VIndexer-40BE.

2024-07-20

acmmm.org/ACMMM/2024/Conference (oral presentation)

openreview.net

Temporal Residual Jacobians For Rig-free Motion Transfer

Sanjeev Muralikrishnan

Niladri Shekhar Dutt

Siddhartha Chaudhuri

Noam Aigerman

Vladimir Kim

Matthew Fisher

Niloy J. Mitra

We introduce Temporal Residual Jacobians as a novel representation to enable data-driven motion transfer. Our approach does not assume access to any rigging or intermediate shape keyframes, produces geometrically and temporally consistent motions, and can be used to transfer long motion sequences. Central to our approach are two coupled neural networks that individually predict local geometric and temporal changes that are subsequently integrated, spatially and temporally, to produce the final animated meshes. The two networks are jointly trained, complement each other in producing spatial and temporal signals, and are supervised directly with 3D positional information. During inference, in the absence of keyframes, our method essentially solves a motion extrapolation problem. We test our setup on diverse meshes (synthetic and scanned shapes) to demonstrate its superiority in generating realistic and natural-looking animations on unseen body shapes against SoTA alternatives. Supplemental video and code are available at https://temporaljacobians.github.io/.

2024-07-20

arXiv (preprint)

doi.org

arxiv.org

ANDES, the high resolution spectrograph for the ELT: science goals, project overview, and future developments

Alessandro Marconi

Artur R. Abreu

Vardan Adibekyan

Valentina Alberti

Simon Albrecht

Jailson Alcaniz

Matteo Aliverti

Carlos Allende Prieto

Julian Alvarado-Gomez

Catarina Alves

Pedro J. Amado

Manuel Amate

Michael Andersen

Simone Antoniucci

E. Artigau

Christophe Bailet

Clark E. Baker

Veronica Baldini

Andrea Balestra

S.A. Barnes

Frédérique Baron

Susana Barros

Svend-Marian Bauer

Mathilde Beaulieu

Olga Bellido-Tirado

Björn Benneke

Thomas Bensby

Edwin Bergin

P. Berio

Katia Biazzo

Laurent Bigot

Arjan Bik

Jayne L. Birkby

Nicolas Blind

Olivier Boebion

Isabelle Boisse

Emeline Bolmont

J. S. Bolton

Marco Bonaglia

Xavier Bonfils

Lea Bonhomme

Francesco Borsa

Jean-Claude Bouret

Alexis Brandeker

Wolfgang Brandner

Christopher H. Broeg

Matteo Brogi

Denis Brousseau

Anna Brucalassi

Joar G. Brynnel

Lars A. Buchhave

David F. Buscher

Lorenzo Cabona

A. Cabral

Giorgio Calderone

Rocío Calvo-Ortega

Faustine Cantalloube

Bruno L. Canto Martins

Luca Carbonaro

Yan Caujolle

Gaël Chauvin

Bruno Chazelas

Anne-Laure L. Cheffot

Yuk Shan Cheng

Andrea Chiavassa

Lise B. Christensen

Roberto Cirami

Michele Cirasuolo

Neil J. Cook

Ryan Cooke

Igor Coretti

Stefano Covino

Nicolas B. Cowan

Giovanni Cresci

Stefano Cristiani

Vanderlei Cunha Parro

Guido Cupani

Valentina D'Odorico

Kamalesh Dadi

Izan C. de Castro Leão

Annalisa De Cia

Jose R. De Medeiros

Florian Debras

Michael Debus

Alain Delorme

Olivier Demangeon

Frederic Derie

M. Dessauges-Zavadsky

Paolo Di Marcantonio

Simona Di Stefano

Frank Dionies

Armando Domiciano de Souza

René Doyon

Jennifer S. Dunn

Sébastien E. Egner

David Ehrenreich

Joao P. Faria

Debora Ferruzzi

Chiara Feruglio

Martin Fisher

Adriano Fontana

B S. Frank

C. Fuesslein

M. Fumagalli

Thierry Fusco

Johan P. U. Fynbo

O. Gabella

W. Gaessler

E. Gallo

X. Gao

L. Genolet

M. Genoni

P. Giacobbe

E. Giro

R. S. Gonçalves

O. A. Gonzalez

J. I. González-Hernández

C. Gouvret

F. Gracia Témich

M. G. Haehnelt

C. Haniff

A. Hatzes

R. Helled

H. J. Hoeijmakers

I. Hughes

Philipp Huke

Y. Ivanisenko

A. S. Järvinen

S. P. Järvinen

A. Kaminski

J. Kern

J. Knoche

A. Kordt

H. Korhonen

A. Korn

D. Kouach

G. Kowzan

L. Kreidberg

M. Landoni

A. A. Lanotte

A. Lavail

B. Lavie

D. Lee

M. Lehmitz

J. Li

W. Li

J. Liske

C. Lovis

S. Lucatello

D. Lunney

M. J. MacIntosh

N. Madhusudhan

L. Magrini

R. Maiolino

J. Maldonado

L. Malo

A. W. S. Man

T. Marquart

C. M. J. Marques

E. L. Marques

P. Martinez

A. M. Martins

C. J. A. P. Martins

J. H. C. Martins

P. Maslowski

C. Mason

E. Mason

R. A. McCracken

M. A. F. Melo e Sousa

P. Mergo

G. Micela

D. Milaković

P. Mollière

M. A. Monteiro

D. Montgomery

C. Mordasini

J. Morin

A. Mucciarelli

M. T. Murphy

M. N'Diaye

N. Nardetto

B. Neichel

N. Neri

A. T. Niedzielski

E. Niemczura

B. Nisini

L. Nortmann

P. Noterdaeme

N. J. Nunes

L. Oggioni

F. Olchewsky

E. Oliva

H. Önel

L. Origlia

G. Östlin

N. N.-Q. Ouellette

Enric Pallé

P. Papaderos

G. Pariani

L. Pasquini

J. Peñate Castro

F. Pepe

C. Peroux

L. Perreault Levasseur

Sandrine Perruchot

P. Petit

Oliver Pfuhl

L. Pino

Javier Piqueras

N. Piskunov

A. Pollo

K. Poppenhaeger

M. Porru

J. Puschnig

A. Quirrenbach

Emily Rauscher

R. Rebolo

E. M. A. Redaelli

S. Reffert

D. T. Reid

A. Reiners

P. Richter

M. Riva

S. Rivoire

C. Rodríguez-López

I. U. Roederer

D. Romano

M. Roth

S. Rousseau

J. Rowe

A. Saccardi

S. Salvadori

N. Sanna

N. C. Santos

P. Santos Diaz

Jorge Sanz-Forcada

M. Sarajlic

J.-F. Sauvage

D. Savio

A. Scaudo

S. Schäfer

R. P. Schiavon

T. M. Schmidt

C. Selmi

R. Simoes

A. Simonnin

S. Sivanandam

M. Sordet

R. Sordo

F. Sortino

D. Sosnowska

S. G. Sousa

A. Spang

R. Spiga

E. Stempels

J. R. Y. Stevenson

Klaus G. Strassmeier

A. Suárez Mascareño

A. Sulich

X. Sun

N. R. Tanvir

F. Tenegi-Sanginés

S. Thibault

S. J. Thompson

P. Tisserand

A. Tozzi

M. Turbet

Julien Veran

P. Vallée

I. Vanni

R. Varas

A. Vega-Moreno

K. A. Venn

A. Verma

J. Vernet

M. Viel

G. Wade

C. Waring

M. Weber

J. Weder

B. Wehbé

J. Weingrill

M. Woche

M. Xompero

E. Zackrisson

A. Zanutta

M. R. Zapatero Osorio

M. Zechmeister

J. Zimara

The first generation of ELT instruments includes an optical-infrared high-resolution spectrograph, indicated as ELT-HIRES and recently christened ANDES (ArmazoNes high Dispersion Echelle Spectrograph). ANDES consists of three fibre-fed spectrographs ([U]BV, RIZ, YJH) providing a spectral resolution of

2024-07-18

arXiv (preprint)

doi.org

arxiv.org
