We use cookies to analyze the browsing and usage of our website and to personalize your experience. You can disable these technologies at any time, but this may limit certain functionalities of the site. Read our Privacy Policy for more information.
Setting cookies
You can enable and disable the types of cookies you wish to accept. However certain choices you make could affect the services offered on our sites (e.g. suggestions, personalised ads, etc.).
Essential cookies
These cookies are necessary for the operation of the site and cannot be deactivated. (Still active)
Analytics cookies
Do you accept the use of cookies to measure the audience of our sites?
Multimedia Player
Do you accept the use of cookies to display and allow you to watch the video content hosted by our partners (YouTube, etc.)?
In human cognition theory, human thinking is governed by two systems: the fast and intuitive System 1 and the slower but more deliberative S… (see more)ystem 2. Recent studies have shown that incorporating System 2 process into Transformers including large language models (LLMs), significantly enhances their reasoning capabilities. Nevertheless, models that purely resemble System 2 thinking require substantially higher computational costs and are much slower to respond. To address this challenge, we present Dualformer, a single Transformer model that seamlessly integrates both the fast and slow reasoning modes. Dualformer is obtained by training on data with randomized reasoning traces, where different parts of the traces are dropped during training. The dropping strategies are specifically tailored according to the trace structure, analogous to analyzing our thinking process and creating shortcuts with patterns. At inference time, our model can be configured to output only the solutions (fast mode) or both the reasoning chain and the final solution (slow mode), or automatically decide which mode to engage (auto mode). In all cases, Dualformer outperforms the corresponding baseline models in both performance and computational efficiency: (1) in slow mode, Dualformer optimally solves unseen 30 x 30 maze navigation tasks 97.6% of the time, surpassing the Searchformer (trained on data with complete reasoning traces) baseline performance of 93.3%, while only using 45.5% fewer reasoning steps; (2) in fast mode, Dualformer completes those tasks with an 80% optimal rate, significantly outperforming the Solution-Only model (trained on solution-only data), which has an optimal rate of only 30%. For math problems, our techniques have also achieved improved performance with LLM fine-tuning, showing its generalization beyond task-specific models.
Predicting molecular impact on cellular function is a core challenge in therapeutic design.
Phenomic experiments, designed to capture cellu… (see more)lar morphology, utilize microscopy based techniques and demonstrate a high throughput solution for uncovering molecular impact on the cell.
In this work, we learn a joint latent space between molecular structures and microscopy phenomic experiments, aligning paired samples with contrastive learning.
Specifically, we study the problem of Contrastive PhenoMolecular Retrieval, which consists of zero-shot molecular structure identification conditioned on phenomic experiments.
We assess challenges in multi-modal learning of phenomics and molecular modalities such as experimental batch effect, inactive molecule perturbations, and encoding perturbation concentration.
We demonstrate improved multi-modal learner retrieval through (1) a uni-modal pre-trained phenomics model, (2) a novel inter sample similarity aware loss, and (3) models conditioned on a representation of molecular concentration.
Following this recipe, we propose MolPhenix, a molecular phenomics model.
MolPhenix leverages a pre-trained phenomics model to demonstrate significant performance gains across perturbation concentrations, molecular scaffolds, and activity thresholds.
In particular, we demonstrate an 8.1
Understanding how neurons communicate and coordinate their activity is essential for unraveling the brain’s complex functionality. To anal… (see more)yze the intricate spatiotemporal dynamics of neural signaling, we developed Geometric Scattering Trajectory Homology (neuro-GSTH), a novel framework that captures time-evolving neural signals and encodes them into low-dimensional representations. GSTH integrates geometric scattering transforms, which extract multiscale features from brain signals modeled on anatomical graphs, with t-PHATE, a manifold learning method that maps the temporal evolution of neural activity. Topological descriptors from computational homology are then applied to characterize the global structure of these neural trajectories, enabling the quantification and differentiation of spatiotemporal brain dynamics. We demonstrate the power of neuro-GSTH in neuroscience by applying it to both simulated and biological neural datasets. First, we used neuro-GSTH to analyze neural oscillatory behavior in the Kuramoto model, revealing its capacity to track the synchronization of neural circuits as coupling strength increases. Next, we applied neuro-GSTH to neural recordings from the visual cortex of mice, where it accurately reconstructed visual stimulus patterns such as sinusoidal gratings. Neuro-GSTH-derived neural trajectories enabled precise classification of stimulus properties like spatial frequency and orientation, significantly outperforming traditional methods in capturing the underlying neural dynamics. These findings demonstrate that neuro-GSTH effectively identifies neural motifs—distinct patterns of spatiotemporal activity—providing a powerful tool for decoding brain activity across diverse tasks, sensory inputs, and neurological disorders. Neuro-GSTH thus offers new insights into neural communication and dynamics, advancing our ability to map and understand complex brain functions.
Joint-embedding predictive architecture (JEPA) is a self-supervised learning (SSL) paradigm with the capacity of world modeling via action-c… (see more)onditioned prediction. Previously, JEPA world models have been shown to learn action-invariant or action-equivariant representations by predicting one view of an image from another. Unlike JEPA and similar SSL paradigms, animals, including humans, learn to recognize new objects through a sequence of active interactions. To introduce \emph{sequential} interactions, we propose \textit{seq-JEPA}, a novel SSL world model equipped with an autoregressive memory module. Seq-JEPA aggregates a sequence of action-conditioned observations to produce a global representation of them. This global representation, conditioned on the next action, is used to predict the latent representation of the next observation. We empirically show the advantages of this sequence of action-conditioned observations and examine our sequential modeling paradigm in two settings: (1) \emph{predictive learning across saccades}; a method inspired by the role of eye movements in embodied vision. This approach learns self-supervised image representations by processing a sequence of low-resolution visual patches sampled from image saliencies, without any hand-crafted data augmentations. (2) \emph{invariance-equivariance trade-off}; seq-JEPA's architecture results in automatic separation of invariant and equivariant representations, with the aggregated autoregressor outputs being mostly action-invariant and the encoder output being equivariant. This is in contrast with many equivariant SSL methods that expect a single representational space to contain both invariant and equivariant features, potentially creating a trade-off between the two. Empirically, seq-JEPA achieves competitive performance on both invariance and equivariance-related benchmarks compared to existing methods. Importantly, both invariance and equivariance-related downstream performances increase as the number of available observations increases.
As LLMs develop increasingly advanced capabilities, there is an increased need to minimize the harm that could be caused to society by certa… (see more)in model outputs; hence, most LLMs have safety guardrails added, for example via fine-tuning. In this paper, we argue the position that current safety fine-tuning is very similar to a traditional cat-and-mouse game (or arms race) between attackers and defenders in cybersecurity. Model jailbreaks and attacks are patched with bandaids to target the specific attack mechanism, but many similar attack vectors might remain. When defenders are not proactively coming up with principled mechanisms, it becomes very easy for attackers to sidestep any new defenses. We show how current defenses are insufficient to prevent new adversarial jailbreak attacks, reward hacking, and loss of control problems. In order to learn from past mistakes in cybersecurity, we draw analogies with historical examples and develop lessons learned that can be applied to LLM safety. These arguments support the need for new and more principled approaches to designing safe models, which are architected for security from the beginning. We describe several such approaches from the AI literature.
As LLMs develop increasingly advanced capabilities, there is an increased need to minimize the harm that could be caused to society by certa… (see more)in model outputs; hence, most LLMs have safety guardrails added, for example via fine-tuning. In this paper, we argue the position that current safety fine-tuning is very similar to a traditional cat-and-mouse game (or arms race) between attackers and defenders in cybersecurity. Model jailbreaks and attacks are patched with bandaids to target the specific attack mechanism, but many similar attack vectors might remain. When defenders are not proactively coming up with principled mechanisms, it becomes very easy for attackers to sidestep any new defenses. We show how current defenses are insufficient to prevent new adversarial jailbreak attacks, reward hacking, and loss of control problems. In order to learn from past mistakes in cybersecurity, we draw analogies with historical examples and develop lessons learned that can be applied to LLM safety. These arguments support the need for new and more principled approaches to designing safe models, which are architected for security from the beginning. We describe several such approaches from the AI literature.
Large language models are increasingly relied upon as sources of information, but their propensity for generating false or misleading statem… (see more)ents with high confidence poses risks for users and society. In this paper, we confront the critical problem of epistemic miscalibration—where a model's linguistic assertiveness fails to reflect its true internal certainty. We introduce a new human-labeled dataset and a novel method for measuring the linguistic assertiveness of Large Language Models which cuts error rates by over 50% relative to previous benchmarks. Validated across multiple datasets, our method reveals a stark misalignment between how confidently models linguistically present information and their actual accuracy. Further human evaluations confirm the severity of this miscalibration. This evidence underscores the urgent risk of the overstated certainty Large Language Models hold which may mislead users on a massive scale. Our framework provides a crucial step forward in diagnosing and correcting this miscalibration, offering a path to safer and more trustworthy AI across domains.