Publications

INTREPPPID—an orthologue-informed quintuplet network for cross-species prediction of protein-protein interaction

An overwhelming majority of protein–protein interaction (PPI) studies are conducted in a select few model organisms largely due to constraints in time and cost of the associated ‘wet lab’ experiments. In silico PPI inference methods are ideal tools to overcome these limitations, but often struggle with cross-species predictions. We present INTREPPPID, a method that incorporates orthology data using a new ‘quintuplet’ neural network, which is constructed with five parallel encoders with shared parameters. INTREPPPID incorporates both a PPI classification task and an orthologous locality task. The latter learns embeddings of orthologues that have small Euclidean distances between them and large distances between embeddings of all other proteins. INTREPPPID outperforms all other leading PPI inference methods tested on both the intraspecies and cross-species tasks using strict evaluation datasets. We show that INTREPPPID’s orthologous locality loss increases performance because of the biological relevance of the orthologue data and not due to some other specious aspect of the architecture. Finally, we introduce PPI.bio and PPI Origami, a web server interface for INTREPPPID and a software tool for creating strict evaluation datasets, respectively. Together, these two initiatives aim to make both the use and development of PPI inference tools more accessible to the community.

2023-12-31

Briefings Bioinform. (published)

doi.org
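
The orthologous locality task in the INTREPPPID abstract above (small Euclidean distances between orthologue embeddings, large distances to all other proteins) resembles a margin-based metric-learning objective. The abstract does not give the exact loss, so the following is only a minimal sketch under assumptions: a standard triplet-style hinge loss, with the hypothetical names `euclidean`, `locality_loss`, and `margin` introduced here for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def locality_loss(anchor, orthologue, others, margin=1.0):
    """Triplet-style hinge loss (an assumption, not the paper's stated loss).

    The loss is zero for a non-orthologous protein once it lies at least
    `margin` farther from the anchor than the orthologue does.
    """
    d_pos = euclidean(anchor, orthologue)
    per_negative = [
        max(0.0, d_pos - euclidean(anchor, other) + margin) for other in others
    ]
    return sum(per_negative) / len(per_negative)


# Toy 2-D embeddings, purely illustrative.
anchor = [0.0, 0.0]
orthologue = [0.0, 1.0]
others = [[3.0, 4.0], [0.0, 1.5]]
loss = locality_loss(anchor, orthologue, others)
```

In this toy case the first non-orthologue is already far enough away and contributes nothing, while the second sits inside the margin and is penalized.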

INViTE: INterpret and Control Vision-Language Models with Text Explanations

Haozhe Chen

Junfeng Yang

Carl Vondrick

Chengzhi Mao

Columbia University

M. University

Large-scale pre-trained vision foundation models, such as CLIP, have become de facto backbones for various vision tasks. However, due to their black-box nature, understanding the underlying rules behind these models’ predictions and controlling model behaviors have remained open challenges. We present INViTE: a framework for INterpreting Vision Transformer’s latent tokens with Text Explanations. Given a latent token, INViTE retains its semantic information to the final layer using the transformer’s local operations and retrieves the closest text for explanation. INViTE enables understanding of the model’s visual reasoning procedure without needing additional model training or data collection. Based on the obtained interpretations, INViTE allows for model editing that controls model reasoning behaviors and improves model robustness against biases and spurious correlations. Our code is available at https://github.com/tonychenxyz/vit-interpret.

2023-12-31

International Conference on Learning Representations (published)

openreview.net

ÌròyìnSpeech: A multi-purpose Yorùbá Speech Corpus

Tolúlope' Ògúnremí

Kọ́lá Túbọ̀sún

Aremu Anuoluwapo

Iroro Orife

David Ifeoluwa Adelani

2023-12-31

LREC/COLING (published)

doi.org

arxiv.org

iWISDM: Assessing instruction following in multimodal models at scale

Xiaoxuan Lei

Lucas Gomez

Hao Yuan Bai

Pouya Bashivan

The ability to perform complex tasks from detailed instructions is a key to the remarkable achievements of our species. As humans, we are not only capable of performing a wide variety of tasks but also very complex ones that may entail hundreds or thousands of steps to complete. Large language models and their more recent multimodal counterparts that integrate textual and visual inputs have achieved unprecedented success in performing complex tasks. Yet, most existing benchmarks are largely confined to single-modality inputs (either text or vision), which narrows the scope of multimodal integration assessments, particularly for instruction-following in multimodal contexts. To bridge this gap, we introduce the instructed-Virtual VISual Decision Making (iWISDM) environment, engineered to generate a limitless array of vision-language tasks of varying complexity. Using iWISDM, we compiled three distinct benchmarks of instruction-following visual tasks across varying complexity levels and evaluated several newly developed multimodal models on these benchmarks. Our findings establish iWISDM as a robust benchmark for assessing the instructional adherence of both existing and emergent multimodal models and highlight a large gap in these models’ ability to precisely follow instructions.

2023-12-31

CoLLAs (published)

doi.org

proceedings.mlr.press

Joint Multimodal Transformer for Dimensional Emotional Recognition in the Wild

Paul Waligora

Muhammad Osama Zeeshan

Muhammad Haseeb Aslam

Soufiane Belharbi

Alessandro Lameiras Koerich

Marco Pedersoli

Simon Bacon

Eric Granger

Audiovisual emotion recognition (ER) in videos has immense potential over unimodal performance. It effectively leverages the inter- and intra-modal dependencies between visual and auditory modalities. This work proposes a novel audio-visual emotion recognition system utilizing a joint multimodal transformer architecture with key-based cross-attention. This framework aims to exploit the complementary nature of audio and visual cues (facial expressions and vocal patterns) in videos, leading to superior performance compared to solely relying on a single modality. The proposed model leverages separate backbones for capturing intra-modal temporal dependencies within each modality (audio and visual). Subsequently, a joint multimodal transformer architecture integrates the individual modality embeddings, enabling the model to effectively capture inter-modal (between audio and visual) and intra-modal (within each modality) relationships. Extensive evaluations on the challenging Affwild2 dataset demonstrate that the proposed model significantly outperforms baseline and state-of-the-art methods in ER tasks.

2023-12-31

arXiv.org (preprint)

doi.org

KD-LoRA: A Hybrid Approach to Efficient Fine-Tuning with LoRA and Knowledge Distillation

Rambod Azimi

Rishav

Marek Teichmann

S Ebrahimi Kahou

2023-12-31

ENLSP (published)

doi.org

proceedings.mlr.press

Do Large Language Models Know How Much They Know?

Gabriele Prato

Jerry Huang

Prasanna Parthasarathi

Shagun Sodhani

A. Chandar

2023-12-31

EMNLP (published)

doi.org

arxiv.org

Layerwise Proximal Replay: A Proximal Point Method for Online Continual Learning

Jinsoo Yoo

Yunpeng Liu

Frank N. Wood

Geoff Pleiss

2023-12-31

ICML (published)

doi.org

proceedings.mlr.press

Learnable Filters for Geometric Scattering Modules

Alexander Tong

Frederik Wenkel

Dhananjay Bhaskar

Kincaid MacDonald

Jackson Grady

Michael Perlmutter

Smita Krishnaswamy

Guy Wolf

2023-12-31

IEEE Transactions on Signal Processing (published)

doi.org

arxiv.org

Learning conditional policies for crystal design using offline reinforcement learning

Prashant Govindarajan

Santiago Miret

Jarrid Rector-Brooks

Mariano Phielipp

Janarthanan Rajendran

Sarath Chandar

Conservative Q-learning for band-gap conditioned crystal design with DFT evaluations: the model is trained on trajectories constructed from crystals in the Materials Project. Results indicate promising performance for lower band gap targets.

2023-12-31

Digital Discovery (published)

doi.org

openreview.net

Learning Lagrangian Multipliers for the Travelling Salesman Problem

Augustin Parjadis

Quentin Cappart

Bistra Dilkina

Aaron M. Ferber

Louis-Martin Rousseau

2023-12-31

CP (published)