Publications

HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution
Eric Nguyen
Michael Poli
Marjan Faizi
Armin W Thomas
Callum Birch-Sykes
Michael Wornow
Aman Patel
Clayton M. Rabideau
Stefano Massaroli
Stefano Ermon
Stephen Baccus
Christopher Re
Identification of Substitutable Context-Free Languages over Infinite Alphabets from Positive Data
Yutaro Numaya
Diptarama Hendrian
Ryo Yoshinaka
Ayumi Shinohara
François Coste
Faissal Ouardi
This paper is concerned with the identification in the limit from positive data of sub-stitutable context-free languages cfl s) over infinit… (voir plus)e alphabets. Clark and Eyraud (2007) showed that substitutable cfl s over finite alphabets are learnable in this learning paradigm. We show that substitutable cfl s generated by grammars whose production rules may have predicates that represent sets of potentially infinitely many terminal symbols in a compact manner are learnable if the terminal symbol sets represented by those predicates are learnable, under a certain condition. This can be seen as a result parallel to Argyros and D’Antoni’s work (2018) that amplifies the query learnability of predicate classes to that of symbolic automata classes. Our result is the first that shows such amplification is possible for identifying some cfl s in the limit from positive data.
Impact in Software Engineering Activities After One Year of COVID-19 Restrictions for Startups and Established Companies
Hosna Hooshyar
Eduardo Guerra
Jorge Melegati
Dron Khanna
Abdullah Aldaeej
Gerardo Matturro
Luciana Zaina
Des Greer
Usman Rafiq
Rafael Chanin
Xiaofeng Wang
Juan Garbajosa
Pekka Abrahamsson
Anh Nguyen-Duc
The restrictions imposed by the COVID-19 pandemic required software development teams to adapt, being forced to work remotely and adjust the… (voir plus) software engineering activities accordingly. In the studies evaluating these effects, a few have assessed the impact on software engineering activities from a broader perspective and after a period of time when teams had time to adjust to the changes. No studies have been found comparing software startups and established companies either. This paper aims to investigate the impacts of COVID-19 on software development activities after one year of the pandemic restrictions, comparing the results between startups and established companies. Our approach was to design a cross-sectional survey and distribute it online among software development companies worldwide. The participants were asked about their perception of COVID-19’s pandemic impact on different software engineering activities: requirements engineering, software architecture, user experience design, software implementation, and software quality assurance. The survey received 170 valid answers from 29 countries, and for all the software engineering activities, we found that most respondents did not observe a significant impact. The results also showed that software startups and established companies were affected differently since, in some activities, we found a negative impact in the former and a positive impact in the latter. Regarding the time spent on each software engineering activity, most of the answers reported no change, but on those that did, the result points to an increase in time. Thus, we cannot find any relation between the change in time of effort and the reported positive or negative impact.
Inferring Dynamic Regulatory Interaction Graphs From Time Series Data With Perturbations
Dhananjay Bhaskar
Daniel Sumner Magruder
Edward De Brouwer
Matheo Morales
Aarthi Venkat
Frederik Wenkel
Smita Krishnaswamy
Inferring multiple consensus trees and supertrees using clustering: a review
Gayane S. Barseghyan
Nadia Tahiri
An Intentional Forgetting-Driven Self-Healing Method for Deep Reinforcement Learning Systems
Ahmed Haj Yahmed
Rached Bouchoucha
Houssem Ben Braiek
Deep reinforcement learning (DRL) is increasingly applied in large-scale productions like Netflix and Facebook. As with most data-driven sys… (voir plus)tems, DRL systems can exhibit undesirable behaviors due to environmental drifts, which often occur in constantly-changing production settings. Continual Learning (CL) is the inherent self-healing approach for adapting the DRL agent in response to the environment's conditions shifts. However, successive shifts of considerable magnitude may cause the production environment to drift from its original state. Recent studies have shown that these environmental drifts tend to drive CL into long, or even unsuccessful, healing cycles, which arise from inefficiencies such as catastrophic forgetting, warm-starting failure, and slow convergence. In this paper, we propose Dr. DRL, an effective self-healing approach for DRL systems that integrates a novel mechanism of intentional forgetting into vanilla CL (i.e., standard CL) to overcome its main issues. Dr. DRL deliberately erases the DRL system's minor behaviors to systematically prioritize the adaptation of the key problem-solving skills. Using well-established DRL algorithms, Dr. DRL is compared with vanilla CL on various drifted environments. Dr. DRL is able to reduce, on average, the healing time and fine-tuning episodes by, respectively, 18.74% and 17.72%. Dr. DRL successfully helps agents to adapt to 19.63% of drifted environments left unsolved by vanilla CL while maintaining and even enhancing by up to 45% the obtained rewards for drifted environments that are resolved by both approaches.
Invasion of Ukraine Discourse on TikTok Dataset
Benjamin D. Steel
Sara J. Parker
Derek Ruths
We present a dataset of videos and comments from the social media platform TikTok, centred around the invasion of Ukraine in 2022, an event … (voir plus)that launched TikTok into the geopolitical arena. The discourse around the invasion exposed myriad political behaviours and dynamics that are unexplored on this platform. To this end we provide a mass scale language and interaction dataset for further research into these processes. An initial investigation of language and social interaction dynamics are explored in this paper. The dataset and the library used to collect it are open sourced to the public.
Invited commentary on Stoehr J et al: The personal impact of involvement in international global health outreach: A national survey of former operation smile student volunteers.
Iorl: Inductive-Offline-Reinforcement-Learning for Traffic Signal Control Warmstarting
François-Xavier Devailly
Denis Larocque
A Kernel Perspective on Behavioural Metrics for Markov Decision Processes
Tyler Kastner
Mark Rowland
We present a novel perspective on behavioural metrics for Markov decision processes via the use of positive definite kernels. We define a ne… (voir plus)w metric under this lens that is provably equivalent to the recently introduced MICo distance (Castro et al., 2021). The kernel perspective enables us to provide new theoretical results, including value-function bounds and low-distortion finite-dimensional Euclidean embeddings, which are crucial when using behavioural metrics for reinforcement learning representations. We complement our theory with strong empirical results that demonstrate the effectiveness of these methods in practice.
Lag-Llama: Towards Foundation Models for Time Series Forecasting
Kashif Rasul
Arjun Ashok
Andrew Robert Williams
Arian Khorasani
George Adamopoulos
Rishika Bhagwatkar
Marin Biloš
Hena Ghonia
N. Hassen
Anderson Schneider
Sahil Garg
Yuriy Nevmyvaka
Aiming to build foundation models for time-series forecasting and study their scaling behavior, we present here our work-in-progress on Lag-… (voir plus)Llama , a general-purpose univariate probabilistic time-series forecasting model trained on a large collection of time-series data. The model shows good zero-shot prediction capabilities on unseen “out-of-distribution” time-series datasets, outperforming supervised baselines. We use smoothly broken power-laws [7] to fit and predict model scaling behavior. The open source code is made available at https://github
LEAD: Min-Max Optimization from a Physical Perspective
Reyhane Askari Hemmat
Amartya Mitra
Adversarial formulations have rekindled interest in two-player min-max games. A central obstacle in the optimization of such games is the ro… (voir plus)tational dynamics that hinder their convergence. In this paper, we show that game optimization shares dynamic properties with particle systems subject to multiple forces, and one can leverage tools from physics to improve optimization dynamics. Inspired by the physical framework, we propose LEAD, an optimizer for min-max games. Next, using Lyapunov stability theory from dynamical systems as well as spectral analysis, we study LEAD’s convergence properties in continuous and discrete time settings for a class of quadratic min-max games to demonstrate linear convergence to the Nash equilibrium. Finally, we empirically evaluate our method on synthetic setups and CIFAR-10 image generation to demonstrate improvements in GAN training.