Publications

Structure-Aligned Protein Language Model
Can Chen
David Heurtel-Depeiges
Robert M. Vernon
Christopher J. Langmead
On the generalization of language models from in-context learning and finetuning: a controlled study
Andrew Lampinen
Arslan Chaudhry
Stephanie C.Y. Chan
Cody Wild
Diane Wan
Alexander Y. Ku
Alex Ku
Murray P. Shanahan
James L McClelland
The NaijaVoices Dataset: Cultivating Large-Scale, High-Quality, Culturally-Rich Speech Data for African Languages
The NaijaVoices Community
Busayo Awobade
Abraham Owodunni
Handel Emezue
Gloria Monica Tobechukwu Emezue
N. N. Emezue
Sewade Ogun
Bunmi Akinremi
Christopher Pal
TypyBench: Evaluating LLM Type Inference for Untyped Python Repositories
Honghua Dong
Jiacheng Yang
Xun Deng
Yuhe Jiang
Gennady Pekhimenko
Fan Long
Virtual Cells: Predict, Explain, Discover
Emmanuel Noutahi
Jason Hartford
Ali Denton
Kristina Ulicna
Michael Craig
Jonathan Hsu
Michael Cuccarese
Christopher Gibson
Daniel Cohen
Berton Earnshaw
Caffeine induces age-dependent increases in brain complexity and criticality during sleep
Maxine Arcand-Lavigne
Tarek Lajnef
Sonia Frenette
Julie Carrier
Caffeine is the most widely consumed psychoactive stimulant worldwide. Yet important gaps persist in understanding its effects on the brain, especially during sleep. We analyzed sleep electroencephalography (EEG) in 40 subjects, contrasting 200 mg of caffeine against a placebo condition, using inferential statistics and machine learning. We found that caffeine ingestion led to an increase in brain complexity, a widespread flattening of the power spectrum’s 1/f-like slope, and a reduction in long-range temporal correlations. These effects were most prominent during non-rapid eye movement (NREM) sleep and suggest that caffeine shifts the brain towards a critical regime and more diverse neural dynamics. Interestingly, during rapid eye movement (REM) sleep the effects were more pronounced in younger adults (20–27 years) than in middle-aged participants (41–58 years), whereas no significant age effects were observed during NREM sleep. Interpreting these data in light of modeling and empirical work on EEG-derived measures of excitation-inhibition balance suggests that caffeine promotes a shift in brain dynamics towards increased neural excitation and closer proximity to a critical regime, particularly during NREM sleep.
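A "flattening of the 1/f-like slope" means the exponent of the aperiodic power spectrum moves towards zero. The abstract does not say how the slope was estimated; the following is a minimal sketch assuming the simplest approach, an ordinary least-squares line fitted to log10(power) versus log10(frequency). Dedicated tools model the aperiodic component more carefully, and the function name `spectral_slope` and the 1–40 Hz range in the usage lines are illustrative assumptions, not the authors' pipeline.

```python
import math
from typing import Optional, Sequence


def spectral_slope(freqs: Sequence[float], power: Sequence[float],
                   fmin: Optional[float] = None, fmax: Optional[float] = None) -> float:
    """Least-squares slope of log10(power) versus log10(frequency).

    For a 1/f^beta spectrum the returned slope is -beta, so a "flatter"
    spectrum yields a slope closer to zero. (Hypothetical helper.)
    """
    pts = [
        (math.log10(f), math.log10(p))
        for f, p in zip(freqs, power)
        if f > 0 and p > 0
        and (fmin is None or f >= fmin)
        and (fmax is None or f <= fmax)
    ]
    if len(pts) < 2:
        raise ValueError("need at least two positive (frequency, power) points")
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    if sxx == 0:
        raise ValueError("frequencies must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx


# Synthetic spectra over an assumed 1-40 Hz range (no real EEG data).
freqs = [float(f) for f in range(1, 41)]
steep = spectral_slope(freqs, [f ** -2.0 for f in freqs])  # close to -2.0
flat = spectral_slope(freqs, [f ** -1.5 for f in freqs])   # close to -1.5, i.e. flatter
```

Comparing `steep` and `flat` mirrors the kind of contrast described above: the condition whose slope is less negative has the flatter spectrum.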
JPerfEvo: A Tool for Tracking Method-Level Performance Changes in Java Projects
Kaveh Shahedi
Maxime Lamothe
Heng Li
Performance regressions and improvements are common phenomena in software development, occurring periodically as software evolves and matures. When developers introduce new changes to a program’s codebase, unforeseen performance variations may arise. Identifying these changes at the method level, however, can be challenging due to the complexity and scale of modern codebases. In this work, we present JPerfEvo, a tool designed to automate the evaluation of the method-level performance impact of each code commit (i.e., the performance variations between the versions before and after a commit). Leveraging the Java Microbenchmark Harness (JMH) to benchmark the modified methods, JPerfEvo instruments their execution and applies robust statistical evaluations to detect performance changes. The tool classifies each change as a performance improvement, a regression, or neutral (i.e., no change) and reports its magnitude. We evaluated JPerfEvo on three popular, mature open-source Java projects, demonstrating its effectiveness in identifying performance changes throughout their development histories.
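The abstract does not name JPerfEvo's statistical tests. To illustrate what "classify as improvement, regression, or neutral, with a magnitude" can look like, here is a minimal sketch that swaps in Cliff's delta, a non-parametric effect size, computed over the benchmark timings before and after a commit. It uses commonly cited thresholds for negligible, small, medium, and large effects. The function names and the timing samples are hypothetical, and this is not the tool's implementation.

```python
from statistics import median
from typing import Sequence, Tuple


def cliffs_delta(before: Sequence[float], after: Sequence[float]) -> float:
    """P(before > after) - P(before < after) over all pairs of timings."""
    greater = sum(1 for b in before for a in after if b > a)
    less = sum(1 for b in before for a in after if b < a)
    return (greater - less) / (len(before) * len(after))


def effect_size(delta: float) -> str:
    """Map |delta| to a label using commonly cited Cliff's delta thresholds."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"


def classify_change(before: Sequence[float], after: Sequence[float]) -> Tuple[str, str, float]:
    """Return (verdict, effect size, relative change of the median execution time)."""
    delta = cliffs_delta(before, after)
    size = effect_size(delta)
    rel = (median(after) - median(before)) / median(before)
    if size == "negligible":
        verdict = "neutral"
    elif delta > 0:  # timings after the commit tend to be lower, so the method got faster
        verdict = "improvement"
    else:
        verdict = "regression"
    return verdict, size, rel


# Hypothetical per-invocation timings (ms) for one method, before and after a commit.
before = [10.2, 10.4, 10.1, 10.3, 10.5]
after = [8.1, 8.3, 8.0, 8.2, 8.4]
verdict, size, rel = classify_change(before, after)
```

In this sketch, a rank-based effect size is used because benchmark timings are often skewed by outliers. The relative median change supplies the "how much" that a verdict alone lacks.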
Solving Combinatorial Pricing Problems using Embedded Dynamic Programming Models
Quang Minh Bui
José Neto
The combinatorial pricing problem (CPP) is a bilevel problem in which the leader maximizes their revenue by imposing tolls on certain items that they can control. Based on the tolls set by the leader, the follower selects a subset of items corresponding to an optimal solution of a combinatorial optimization problem. To accomplish the leader's goal, the tolls need to be sufficiently low to discourage the follower from choosing the items offered by the competitors. In this paper, we derive a single-level reformulation for the CPP by rewriting the follower's problem as a longest path problem using a dynamic programming model, then taking its dual and applying strong duality. We then solve the reformulation dynamically with a cutting plane method. We apply this methodology to two distinct dynamic programming models, namely a novel formulation designated as the selection diagram and the well-known decision diagram. We also report numerical results evaluating their performance across three specializations of the CPP and a closely related problem, the knapsack interdiction problem. Our results show the potential of the two proposed reformulations compared with the natural value function approach, expanding the set of tools for solving combinatorial bilevel programs.
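To make "rewriting the follower's problem as a longest path problem using a dynamic programming model" concrete, here is a minimal sketch. It assumes a knapsack follower, which is only one possible specialization. States `(i, c)` (items decided, capacity used) form a layered DAG, and the follower's optimum is a longest path in it. This is neither the paper's selection diagram nor its decision diagram, and it does not build the dual reformulation. It only evaluates the follower's best response for fixed tolls, with ties broken in the leader's favor (the optimistic convention, an assumption here). A brute-force scan over a few tolls then shows why tolls must stay sufficiently low. All names and the instance are hypothetical.

```python
from typing import Dict, List, Tuple

Label = Tuple[float, float, Tuple[int, ...]]  # (follower profit, leader revenue, chosen items)


def follower_best_response(values: List[float], weights: List[int], capacity: int,
                           tolls: Dict[int, float]) -> Label:
    """Solve a knapsack follower problem as a longest path in a layered DAG.

    Node (i, c): items 0..i-1 have been decided and c units of capacity are used.
    Each item adds a "skip" arc (length 0) and, if it fits, a "take" arc of
    length values[i] - toll. Labels are compared lexicographically on
    (follower profit, leader revenue), so follower ties favor the leader.
    """
    layer: Dict[int, Label] = {0: (0.0, 0.0, ())}
    for i, (v, w) in enumerate(zip(values, weights)):
        toll = tolls.get(i, 0.0)  # competitors' items carry no toll
        nxt: Dict[int, Label] = {}
        for used, (profit, rev, chosen) in layer.items():
            arcs = [(used, (profit, rev, chosen))]  # skip item i
            if used + w <= capacity:                 # take item i and pay its toll
                arcs.append((used + w, (profit + v - toll, rev + toll, chosen + (i,))))
            for node, label in arcs:
                # Keep only the best label per node (Bellman recursion on the DAG).
                if node not in nxt or label[:2] > nxt[node][:2]:
                    nxt[node] = label
        layer = nxt
    return max(layer.values(), key=lambda label: label[:2])


# Hypothetical instance: item 0 belongs to the leader; items 1 and 2 are competitors'.
values, weights, capacity = [5.0, 4.0, 3.0], [2, 2, 2], 4
revenue = {t: follower_best_response(values, weights, capacity, {0: float(t)})[1]
           for t in range(6)}
best_toll = max(revenue, key=revenue.get)
```

Here a toll of 2 still keeps item 0 in the follower's optimal bundle, while a toll of 3 makes the competitors' items 1 and 2 more attractive and the leader's revenue drops to zero.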
How Programmers Interact with Multimodal Software Documentation
Deeksha M. Arya
Jin L.C. Guo
Martin P. Robillard
There is a wide variety of online documentation for learning about a given software technology, and prior research has reported that programmers must invest time and effort to identify the one that best suits their needs. We evaluated five modalities for presenting information that enable a software document to cater to programmers' different presentation needs. We developed a prototype tutorial with these modalities on three topics in Java, namely regular expressions, inheritance, and exception handling. We investigated how people interact with the modalities in the tutorial given a programming topic and a type of task. We conducted a survey study with 56 respondents and confirmed that although text content is most useful for solving conceptual tasks, code examples support deeper comprehension of the underlying concepts. Furthermore, we report that respondents' contradictory preferences for the modalities suggest the need to have multiple alternatives in a software tutorial.
AIFM-ed Curriculum Framework for Postgraduate Family Medicine Education on Artificial Intelligence: Mixed Methods Study
Raymond Tolentino
Fanny Hersson-Edery
Mark Yaffe
As health care moves to a more digital environment, there is a growing need to train future family doctors on the clinical uses of artificial intelligence (AI). However, family medicine training in AI has often been inconsistent or lacking. The aim of this study was to develop a curriculum framework for family medicine postgraduate education on AI called “Artificial Intelligence Training in Postgraduate Family Medicine Education” (AIFM-ed). First, we conducted a comprehensive scoping review of existing AI education frameworks, guided by the methodological framework developed by Arksey and O’Malley and the Joanna Briggs Institute methodological framework for scoping reviews. We adhered to the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) checklist for reporting the results. Next, we conducted 2 national expert panels. Panelists included family medicine educators and residents knowledgeable in AI from family medicine residency programs across Canada. Participants were purposively sampled, and panels were held via Zoom, recorded, and transcribed. Data were analyzed using content analysis. We followed the Standards for Reporting Qualitative Research for the panels. Integrating the scoping review results with 2 panel discussions involving 14 participants led to the development of the AIFM-ed curriculum framework for AI training in postgraduate family medicine education, which has five key elements: (1) need and purpose of the curriculum, (2) learning objectives, (3) curriculum content, (4) organization of curriculum content, and (5) implementation aspects of the curriculum.
This framework serves as a structured guide for integrating AI competencies into medical education, ensuring that future family physicians are equipped with the necessary skills to use AI effectively in their clinical practice. Future research should focus on the validation and implementation of the AIFM-ed framework within family medicine education. Institutions are also encouraged to consider adapting the AIFM-ed framework within their own programs, tailoring it to meet the specific needs of their trainees and health care environments.
RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning
Mingqi Yuan
Roger Creus Castanyer
Bin Li
Xin Jin
Wenjun Zeng
3DMolFormer: A Dual-Channel Framework for Structure-Based Drug Discovery
Xiuyuan Hu
Guoqing Liu
Yang Zhao
Hao Zhang
Xue Liu
Structure-based drug discovery, encompassing the tasks of protein-ligand docking and pocket-aware 3D drug design, represents a core challenge in drug discovery. However, no existing work addresses both tasks in a way that leverages the duality between them, and current methods for each task are hindered by challenges in modeling 3D information and by the limitations of available data. To address these issues, we propose 3DMolFormer, a unified dual-channel transformer-based framework applicable to both docking and 3D drug design tasks, which exploits their duality by utilizing docking functionalities within the drug design process. Specifically, we represent 3D pocket-ligand complexes using parallel sequences of discrete tokens and continuous numbers, and we design a corresponding dual-channel transformer model to handle this format, thereby overcoming the challenges of 3D information modeling. Additionally, we alleviate data limitations through large-scale pre-training on a mixed dataset, followed by supervised and reinforcement learning fine-tuning tailored to the two tasks, respectively. Experimental results demonstrate that 3DMolFormer outperforms previous approaches in both protein-ligand docking and pocket-aware 3D drug design, highlighting its promising application in structure-based drug discovery. The code is available at: https://github.com/HXYfighter/3DMolFormer .
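The "parallel sequences of discrete tokens and continuous numbers" format can be pictured as two aligned lists of equal length. The token list carries symbols, and the number list carries a real value wherever the token is a numeric placeholder (0.0 elsewhere). The sketch below is only an illustration of that idea under stated assumptions. The section tags `[POCKET]`, `[LIGAND]`, `[END]`, the placeholder `<num>`, and the one-element-token-plus-three-coordinates layout are hypothetical and are not taken from the 3DMolFormer code.

```python
from typing import List, Sequence, Tuple

NUM_TOKEN = "<num>"  # hypothetical placeholder marking a slot whose value lives in the number channel

Atom = Tuple[str, Tuple[float, float, float]]  # (element symbol, (x, y, z))


def to_dual_channel(pocket: Sequence[Atom], ligand: Sequence[Atom]) -> Tuple[List[str], List[float]]:
    """Serialize a pocket-ligand complex into aligned token and number sequences."""
    tokens: List[str] = []
    numbers: List[float] = []

    def emit(token: str, value: float = 0.0) -> None:
        # Both channels always grow together, so position k describes one slot.
        tokens.append(token)
        numbers.append(value)

    for tag, atoms in (("[POCKET]", pocket), ("[LIGAND]", ligand)):
        emit(tag)
        for element, coords in atoms:
            emit(element)
            for c in coords:
                emit(NUM_TOKEN, c)
    emit("[END]")
    return tokens, numbers


# Toy complex with one pocket atom and one ligand atom (made-up coordinates).
tokens, numbers = to_dual_channel([("C", (1.0, 2.0, 3.0))], [("O", (0.5, -1.0, 2.5))])
```

One plausible design is for a dual-channel model to embed the token at each position and add a learned projection of the number at that position. That would let coordinates stay continuous instead of being discretized into vocabulary bins. This, too, is an assumption about how such a model could consume the format.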