"I Am the One and Only, Your Cyber BFF": Understanding the Impact of GenAI Requires Understanding the Impact of Anthropomorphic AI
Myra Cheng
Alicia DeVrio
Lisa Egede
Su Lin Blodgett
Many state-of-the-art generative AI (GenAI) systems are increasingly prone to anthropomorphic behaviors, i.e., to generating outputs that ar… (voir plus)e perceived to be human-like. While this has led to scholars increasingly raising concerns about possible negative impacts such anthropomorphic AI systems can give rise to, anthropomorphism in AI development, deployment, and use remains vastly overlooked, understudied, and underspecified. In this perspective, we argue that we cannot thoroughly map the social impacts of generative AI without mapping the social impacts of anthropomorphic AI, and outline a call to action.
ACCO: Accumulate while you Communicate, Hiding Communications in Distributed LLM Training
Adel Nabli
Louis Fournier
Pierre ERBACHER
Louis Serrano
Edouard Oyallon
AgentMerge: Enhancing Generalization in Fine-Tuned LLM Agents
Megh Thakkar
Léo Boisvert
Thibault Le Sellier de Chezelles
Alexandre Piché
Alexandre Lacoste
Massimo Caccia
BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks
Juan A. Rodriguez
Xiangru Jian
Siba Smarak Panigrahi
Tianyu Zhang
Aarash Feizi
Abhay Puri
Akshay Kalkunte Suresh
François Savard
Ahmed Masry
Shravan Nayak
Rabiul Awal
Mahsa Massoud
Amirhossein Abaskohi
Zichao Li
Suyuchen Wang
Pierre-Andre Noel
Mats Leon Richter
Saverio Vadacchino
Shubham Agarwal
Sanket Biswas … (voir 23 de plus)
Sara Shanian
Ying Zhang
Noah Bolger
Kurt MacDonald
Simon Fauvel
Sathwik Tejaswi Madhusudhan
Srinivas Sunkara
Joao Monteiro
Krishnamurthy Dj Dvijotham
Torsten Scholak
Sepideh Kharaghani
Sean Hughes
M. Özsu
Issam Hadj Laradji
Spandana Gella
Perouz Taslakian
David Vazquez
Sai Rajeswar
Multimodal AI has the potential to significantly enhance document-understanding tasks, such as processing receipts, understanding workflows,… (voir plus) extracting data from documents, and summarizing reports. Code generation tasks that require long-structured outputs can also be enhanced by multimodality. Despite this, their use in commercial applications is often limited due to limited access to training data and restrictive licensing, which hinders open access. To address these limitations, we introduce BigDocs-7.5M, a high-quality, open-access dataset comprising 7.5 million multimodal documents across 30 tasks. We use an efficient data curation process to ensure our data is high-quality and license-permissive. Our process emphasizes accountability, responsibility, and transparency through filtering rules, traceable metadata, and careful content analysis. Additionally, we introduce BigDocs-Bench, a benchmark suite with 10 novel tasks where we create datasets that reflect real-world use cases involving reasoning over Graphical User Interfaces (GUI) and code generation from images. Our experiments show that training with BigDocs-Bench improves average performance up to 25.8% over closed-source GPT-4o in document reasoning and structured output tasks such as Screenshot2HTML or Image2Latex generation. Finally, human evaluations showed a preference for outputs from models trained on BigDocs over GPT-4o. This suggests that BigDocs can help both academics and the open-source community utilize and improve AI tools to enhance multimodal capabilities and document reasoning. The project is hosted at https://bigdocs.github.io .
Combining Domain and Alignment Vectors to Achieve Better Knowledge-Safety Trade-offs in LLMs
Megh Thakkar
Yash More
Quentin Fournier
Matthew D Riemer
Pin-Yu Chen
Payel Das
There is a growing interest in training domain-expert LLMs that excel in specific technical fields compared to their general-purpose instruc… (voir plus)tion-tuned counterparts. However, these expert models often experience a loss in their safety abilities in the process, making them capable of generating harmful content. As a solution, we introduce an efficient and effective merging-based alignment method called \textsc{MergeAlign} that interpolates the domain and alignment vectors, creating safer domain-specific models while preserving their utility. We apply \textsc{MergeAlign} on Llama3 variants that are experts in medicine and finance, obtaining substantial alignment improvements with minimal to no degradation on domain-specific benchmarks. We study the impact of model merging through model similarity metrics and contributions of individual models being merged. We hope our findings open new research avenues and inspire more efficient development of safe expert LLMs.
Compositional Risk Minimization
Divyat Mahajan
Mohammad Pezeshki
Kartik Ahuja
$\mu$LO: Compute-Efficient Meta-Generalization of Learned Optimizers
Benjamin Thérien
Charles-Étienne Joseph
Boris Knyazev
Edouard Oyallon
Consistency-diversity-realism Pareto fronts of conditional image generative models
Pietro Astolfi
Marlene Careil
Melissa Hall
Oscar Mañas
Matthew J. Muckley
Jakob Verbeek
Michal Drozdzal
Building world models that accurately and comprehensively represent the real world is the utmost aspiration for conditional image generative… (voir plus) models as it would enable their use as world simulators. For these models to be successful world models, they should not only excel at image quality and prompt-image consistency but also ensure high representation diversity. However, current research in generative models mostly focuses on creative applications that are predominantly concerned with human preferences of image quality and aesthetics. We note that generative models have inference time mechanisms - or knobs - that allow the control of generation consistency, quality, and diversity. In this paper, we use state-of-the-art text-to-image and image-and-text-to-image models and their knobs to draw consistency-diversity-realism Pareto fronts that provide a holistic view on consistency-diversity-realism multi-objective. Our experiments suggest that realism and consistency can both be improved simultaneously; however there exists a clear tradeoff between realism/consistency and diversity. By looking at Pareto optimal points, we note that earlier models are better at representation diversity and worse in consistency/realism, and more recent models excel in consistency/realism while decreasing significantly the representation diversity. By computing Pareto fronts on a geodiverse dataset, we find that the first version of latent diffusion models tends to perform better than more recent models in all axes of evaluation, and there exist pronounced consistency-diversity-realism disparities between geographical regions. Overall, our analysis clearly shows that there is no best model and the choice of model should be determined by the downstream application. With this analysis, we invite the research community to consider Pareto fronts as an analytical tool to measure progress towards world models.
Context is Key: A Benchmark for Forecasting with Essential Textual Information
Andrew Robert Williams
Arjun Ashok
Étienne Marcotte
Valentina Zantedeschi
Jithendaraa Subramanian
Roland Riachi
James Requeima
Alexandre Lacoste
Controlling Forgetting with Test-Time Data in Continual Learning
Vaibhav Singh
Rahaf Aljundi
Foundational vision-language models excel in various tasks but require updates as new tasks or domains emerge. Current Continual Learning (C… (voir plus)L) methods, which focus on supervised training, often suffer from significant forgetting, performing worse than the original models in zero-shot scenarios. This work proposes leveraging test-time, unsupervised data in a self-supervised manner to refresh the model’s memory of previously learned tasks, minimizing forgetting without additional labeling. By introducing a student-teacher framework with gradient-based sparse parameter updates, the approach enhances performance on prior tasks and reduces reliance on offline memory buffers, effectively improving continual learning outcomes.
Controlling Multimodal LLMs via Reward-guided Decoding
Oscar Mañas
Pierluca D'Oro
Koustuv Sinha
Michal Drozdzal
CTRL-O: Language-Controllable Object-Centric Visual Representation Learning
Aniket Rajiv Didolkar
Andrii Zadaianchuk
Rabiul Awal
Maximilian Seitzer
Efstratios Gavves
Object-centric representation learning aims to decompose visual scenes into fixed-size vectors called "slots" or "object files", where each … (voir plus)slot captures a distinct object. Current state-of-the-art models have shown remarkable success in object discovery, particularly in complex real-world scenes, while also generalizing well to unseen domains. However, these models suffer from a key limitation: they lack controllability. Specifically, current object-centric models learn representations based on their preconceived understanding of objects and parts, without allowing user input to guide or modify which objects are represented. Introducing controllability into object-centric models could unlock a range of useful capabilities, such as enabling models to represent scenes at variable levels of granularity based on user specification. In this work, we propose a novel approach that conditions slot representations through guided decomposition, paired with a novel contrastive learning objective, to enable user-directed control over which objects are represented. Our method achieves such controllability without any mask supervision and successfully binds to user-specified objects in complex real-world scenes.