Publications

A Literature Review on Detecting, Verifying, and Mitigating Online Misinformation

Arezo Bodaghi

Ketra A. Schmitt

Pierre Watine

Social media use has transformed communication and made social interaction more accessible. Public microblogs allow people to share and access news through existing and social-media-created social connections and access to public news sources. These benefits also create opportunities for the spread of false information. False information online can mislead people, decrease the benefits derived from social media, and reduce trust in genuine news. We divide false information into two categories: unintentional false information, also known as misinformation; and intentionally false information, also known as disinformation and fake news. Given the increasing prevalence of misinformation, it is imperative to address its dissemination on social media platforms. This survey focuses on six key aspects related to misinformation: 1) clarify the definition of misinformation to differentiate it from intentional forms of false information; 2) categorize proposed approaches to manage misinformation into three types: detection, verification, and mitigation; 3) review the platforms and languages for which these techniques have been proposed and tested; 4) describe the specific features that are considered in each category; 5) compare public datasets created to address misinformation and categorize into prelabeled content-only datasets and those including users and their connections; and 6) survey fact-checking websites that can be used to verify the accuracy of information. This survey offers a comprehensive and unprecedented review of misinformation, integrating various methodological approaches, datasets, and content-, user-, and network-based approaches, which will undoubtedly benefit future research in this field.

2023-01-01

IEEE Transactions on Computational Social Systems (published)

doi.org

Lower Bounds for Active Automata Learning

Loes Kruger

Bharat Garhewal

François Coste

Frits W. Vaandrager

Faissal Ouardi

Guillaume Rabusseau

2023-01-01

ICGI (published)

dblp.uni-trier.de

MAGNIFICo: Evaluating the In-Context Learning Ability of Large Language Models to Generalize to Novel Interpretations

Arkil Patel

Satwik Bhattamishra

Siva Reddy

Dzmitry Bahdanau

2023-01-01

EMNLP (published)

doi.org

openreview.net

Maintenance Cost of Software Ecosystem Updates

Solomon Berhe

M. Maynard

Foutse Khomh

2023-01-01

ANT/EDI40 (published)

doi.org

MasakhaNEWS: News Topic Classification for African languages

David Ifeoluwa Adelani

Marek Masiak

Israel Abebe Azime

Jesujoba Oluwadara Alabi

Atnafu Lambebo Tonja

Christine Mwase

Odunayo Ogundepo

Bonaventure F. P. Dossou

Akintunde Oladipo

Doreen Nixdorf

Chris Emezue

Sana Sabah Al-Azzawi

Blessing Kudzaishe Sibanda

Davis David

Lolwethu Ndolela

Jonathan Mukiibi

Tunde Oluwaseyi Ajayi

Tatiana Moteu Ngoli

Brian Odhiambo

Abraham Toluwase Owodunni

Nnaemeka Casmir Obiefuna

Shamsuddeen Hassan Muhammad

Saheed Salahudeen Abdullahi

Mesay Gemeda Yigezu

Tajuddeen Gwadabe

Idris Abdulmumin

Mahlet Taye Bame

Oluwabusayo Olufunke Awoyomi

Iyanuoluwa Shode

Tolulope Anu Adelani

Habiba Abdulganiy Kailani

Abdul-Hakeem Omotayo

Adetola Adeeko

Afolabi Abeeb

Aremu Anuoluwapo

Olanrewaju Samuel

Clemencia Siro

Wangari Kimotho

Onyekachi Ogbu

Chinedu Emmanuel Mbonu

Chiamaka Ijeoma Chukwuneke

Samuel Fanijo

Jessica Ojo

Oyinkansola Fiyinfoluwa Awosan

Tadesse Kebede Guge

Toadoum Sari Sakayo

Pamela Nyatsine

Freedmore Sidume

Oreen Yousuf

Mardiyyah Oduwole

Ussen Abre Kimanuka

Kanda Patrick Tshinu

Thina Diko

Siyanda Nxakama

Abdulmejid Tuni Johar

Sinodos Gebre

Muhidin A. Mohamed

Shafie Abdi Mohamed

Fuad Mire Hassan

Moges Ahmed Mehamed

Evrard Ngabire

Pontus Stenetorp

African languages are severely under-represented in NLP research due to lack of datasets covering several NLP tasks. While there are individual language specific datasets that are being expanded to different tasks, only a handful of NLP tasks (e.g. named entity recognition and machine translation) have standardized benchmark datasets covering several geographical and typologically-diverse African languages. In this paper, we develop MasakhaNEWS -- a new benchmark dataset for news topic classification covering 16 languages widely spoken in Africa. We provide an evaluation of baseline models by training classical machine learning models and fine-tuning several language models. Furthermore, we explore several alternatives to full fine-tuning of language models that are better suited for zero-shot and few-shot learning such as cross-lingual parameter-efficient fine-tuning (like MAD-X), pattern exploiting training (PET), prompting language models (like ChatGPT), and prompt-free sentence transformer fine-tuning (SetFit and Cohere Embedding API). Our evaluation in zero-shot setting shows the potential of prompting ChatGPT for news topic classification in low-resource African languages, achieving an average performance of 70 F1 points without leveraging additional supervision like MAD-X. In few-shot setting, we show that with as little as 10 examples per label, we achieved more than 90% (i.e. 86.0 F1 points) of the performance of full supervised training (92.6 F1 points) leveraging the PET approach.

2023-01-01

AfricaNLP (published)

doi.org

openreview.net

MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African languages

Cheikh M. Bamba Dione

David Ifeoluwa Adelani

Peter Nabende

Jesujoba Oluwadara Alabi

Thapelo Sindane

Happy Buzaaba

Shamsuddeen Hassan Muhammad

Chris Emezue

Perez Ogayo

Aremu Anuoluwapo

Catherine Gitau

Derguene Mbaye

Jonathan Mukiibi

Blessing Kudzaishe Sibanda

Bonaventure F. P. Dossou

Andiswa Bukula

Rooweither Mabuya

Allahsera Auguste Tapo

Edwin Munkoh-Buabeng

Victoire Memdjokam Koagne

Fatoumata Ouoba Kabore

Amelia Taylor

Godson Kalipe

Tebogo Macucwa

Vukosi Marivate

Tajuddeen Gwadabe

Mboning Tchiaze Elvis

Ikechukwu Onyenwe

Gratien Atindogbe

Tolulope Anu Adelani

Idris Akinade

Olanrewaju Samuel

Marien Nahimana

Théogène Musabeyezu

Emile Niyomutabazi

Ester Chimhenga

Kudzai Gotosa

Patrick Mizha

Apelete Agbolo

Seydou Traore

Chinedu Uchechukwu

Aliyu Yusuf

Muhammad Abdullahi

Dietrich Klakow

In this paper, we present AfricaPOS, the largest part-of-speech (POS) dataset for 20 typologically diverse African languages. We discuss the challenges in annotating POS for these languages using the universal dependencies (UD) guidelines. We conducted extensive POS baseline experiments using both conditional random field and several multilingual pre-trained language models. We applied various cross-lingual transfer models trained with data available in the UD. Evaluating on the AfricaPOS dataset, we show that choosing the best transfer language(s) in both single-source and multi-source setups greatly improves the POS tagging performance of the target languages, in particular when combined with parameter-fine-tuning methods. Crucially, transferring knowledge from a language that matches the language family and morphosyntactic properties seems to be more effective for POS tagging in unseen languages.

2023-01-01

ACL (1) (published)

doi.org

arxiv.org

Measuring Progress in Fine-grained Vision-and-Language Understanding

Emanuele Bugliarello

Laurent Sartran

Aishwarya Agrawal

Lisa Anne Hendricks

Aida Nematzadeh

While pretraining on large-scale image–text data from the Web has facilitated rapid progress on many vision-and-language (V&L) tasks, recent work has demonstrated that pretrained models lack “fine-grained” understanding, such as the ability to recognise relationships, verbs, and numbers in images. This has resulted in an increased interest in the community to either develop new benchmarks or models for such capabilities. To better understand and quantify progress in this direction, we investigate four competitive V&L models on four fine-grained benchmarks. Through our analysis, we find that X-VLM (Zeng et al., 2022) consistently outperforms other baselines, and that modelling innovations can impact performance more than scaling Web data, which even degrades performance sometimes. Through a deeper investigation of X-VLM, we highlight the importance of both novel losses and rich data sources for learning fine-grained skills. Finally, we inspect training dynamics, and discover that for some tasks, performance peaks early in training or significantly fluctuates, never converging.

2023-01-01

ACL (1) (published)

doi.org

arxiv.org

Mechanistic Mode Connectivity

Ekdeep Singh Lubana

Eric J Bigelow

Robert P. Dick

David Scott Krueger

Hidenori Tanaka

We study neural network loss landscapes through the lens of mode connectivity, the observation that minimizers of neural networks retrieved via training on a dataset are connected via simple paths of low loss. Specifically, we ask the following question: are minimizers that rely on different mechanisms for making their predictions connected via simple paths of low loss? We provide a definition of mechanistic similarity as shared invariances to input transformations and demonstrate that lack of linear connectivity between two models implies they use dissimilar mechanisms for making their predictions. Relevant to practice, this result helps us demonstrate that naive fine-tuning on a downstream dataset can fail to alter a model's mechanisms, e.g., fine-tuning can fail to eliminate a model's reliance on spurious attributes. Our analysis also motivates a method for targeted alteration of a model's mechanisms, named connectivity-based fine-tuning (CBFT), which we analyze using several synthetic datasets for the task of reducing a model's reliance on spurious attributes.

2023-01-01

ICML (published)

doi.org

openreview.net

Membership Inference Attacks Against Temporally Correlated Data in Deep Reinforcement Learning

Maziar Gomrokchi

Susan Amin

Hossein Aboutalebi

Alexander Wong

Doina Precup

While significant research advances have been made in the field of deep reinforcement learning, there have been no concrete adversarial attack strategies in literature tailored for studying the vulnerability of deep reinforcement learning algorithms to membership inference attacks. In such attacking systems, the adversary targets the set of collected input data on which the deep reinforcement learning algorithm has been trained. To address this gap, we propose an adversarial attack framework designed for testing the vulnerability of a state-of-the-art deep reinforcement learning algorithm to a membership inference attack. In particular, we design a series of experiments to investigate the impact of temporal correlation, which naturally exists in reinforcement learning training data, on the probability of information leakage. Moreover, we compare the performance of collective and individual membership attacks against the deep reinforcement learning algorithm. Experimental results show that the proposed adversarial attack framework is surprisingly effective at inferring data with an accuracy exceeding 84% in individual and 97% in collective modes in three different continuous control Mujoco tasks, which raises serious privacy concerns in this regard. Finally, we show that the learning state of the reinforcement learning algorithm influences the level of privacy breaches significantly.

2023-01-01

IEEE Access (published)

doi.org

arxiv.org

Meta Pseudo Labels for Anomaly Detection via Partially Observed Anomalies

Sinong Zhao

Zhaoyang Yu

Xiaofei Wang

T. Marbach

Gang Wang

X. Liu

2023-01-01

International Conference on Database Systems for Advanced Applications (published)

doi.org

MixupE: Understanding and improving Mixup from directional derivative perspective

Yingtian Zou

Vikas Verma

Sarthak Mittal

Wai Hoh Tang

Hieu Pham

Juho Kannala

Yoshua Bengio

Arno Solin

Kenji Kawaguchi

2023-01-01

UAI (published)

doi.org

openreview.net

MixupE: Understanding and Improving Mixup from Directional Derivative Perspective

Vikas Verma

Yingtian Zou

Sarthak Mittal

Wai Hoh Tang

Hieu Pham

Juho Kannala

Yoshua Bengio

Arno Solin

Kenji Kawaguchi

Mixup is a popular data augmentation technique for training deep neural networks where additional samples are generated by linearly interpolating pairs of inputs and their labels. This technique is known to improve the generalization performance in many learning paradigms and applications. In this work, we first analyze Mixup and show that it implicitly regularizes infinitely many directional derivatives of all orders. Based on this new insight, we propose an improved version of Mixup, theoretically justified to deliver better generalization performance than the vanilla Mixup. To demonstrate the effectiveness of the proposed method, we conduct experiments across various domains such as images, tabular data, speech, and graphs. Our results show that the proposed method improves Mixup across multiple datasets using a variety of architectures, for instance, exhibiting an improvement over Mixup by 0.8% in ImageNet top-1 accuracy.

2023-01-01

UAI (published)

doi.org

openreview.net
