Publications

Iterative Amortized Inference: Unifying In-Context Learning and Learned Optimizers
TrajGPT: Irregular Time-Series Representation Learning of Health Trajectory
Qincheng Lu
Mike He Zhu
In the healthcare domain, time-series data are often irregularly sampled, with intervals that vary across outpatient visits, posing challenges for existing models designed for equally spaced sequential data. To address this, we propose the Trajectory Generative Pre-trained Transformer (TrajGPT) for representation learning on irregularly sampled healthcare time series. TrajGPT introduces a novel Selective Recurrent Attention (SRA) module that leverages a data-dependent decay to adaptively filter out irrelevant past information. Formulated as a discretized ordinary differential equation (ODE), TrajGPT captures the underlying continuous dynamics and supports time-specific inference, forecasting arbitrary target timesteps without auto-regressive prediction. Experimental results on the longitudinal EHR dataset PopHR from the Montreal health system and eICU from PhysioNet showcase TrajGPT's superior zero-shot performance in disease forecasting, drug usage prediction, and sepsis detection. The inferred trajectories of diabetic and cardiac patients reveal meaningful comorbid conditions, underscoring TrajGPT as a useful tool for forecasting patient health evolution.
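To make the data-dependent decay idea concrete, here is a minimal sketch, not the paper's implementation; names such as sra_step and W_decay are hypothetical. An input-dependent decay rate, applied over the elapsed time between visits, discretizes an underlying ODE and filters stale history:

import torch

# Sketch of a data-dependent decay recurrence, loosely inspired by
# Selective Recurrent Attention; the paper's exact formulation may differ.
def sra_step(state, x_t, dt, W_in, W_decay):
    # Input-dependent decay rate (positive); larger rates forget faster.
    rate = torch.nn.functional.softplus(W_decay @ x_t)
    # Exponential decay over elapsed time dt discretizes dh/dt = -rate * h.
    decay = torch.exp(-rate * dt)
    return decay * state + (1.0 - decay) * torch.tanh(W_in @ x_t)

d = 8
W_in, W_decay = torch.randn(d, d) / d**0.5, torch.randn(d, d) / d**0.5
state = torch.zeros(d)
# Irregularly sampled visits: (embedding, days elapsed since previous visit)
for x_t, dt in [(torch.randn(d), 3.0), (torch.randn(d), 45.0)]:
    state = sra_step(state, x_t, torch.tensor(dt), W_in, W_decay)

Because the decay is an explicit function of elapsed time, such a state can in principle be rolled forward to an arbitrary target timestep, which is what makes non-autoregressive, time-specific forecasting possible.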
CTR-LoRA: Curvature-Aware and Trust-Region Guided Low-Rank Adaptation for Large Language Models
Zhuxuanzi Wang
Mingqiao Mo
Xi Xiao
Chen Liu
Chenrui Ma
Yunbei Zhang
Xiao Wang
Tianyang Wang
Parameter-efficient fine-tuning (PEFT) has become the standard approach for adapting large language models under limited compute and memory budgets. Although previous methods improve efficiency through low-rank updates, quantization, or heuristic budget reallocation, they often decouple the allocation of capacity from the way updates evolve during training. In this work, we introduce CTR-LoRA, a framework guided by a curvature trust region that integrates rank scheduling with stability-aware optimization. CTR-LoRA allocates parameters based on marginal utility derived from lightweight second-order proxies and constrains updates using a Fisher/Hessian-metric trust region. Experiments on multiple open-source backbones (7B-13B), evaluated on both in-distribution and out-of-distribution benchmarks, show consistent improvements over strong PEFT baselines. In addition to increased accuracy, CTR-LoRA enhances training stability, reduces memory requirements, and achieves higher throughput, positioning it on the Pareto frontier of performance and efficiency. These results highlight a principled path toward more robust and deployable PEFT.
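The Fisher/Hessian-metric trust region can be pictured with a short sketch under stated assumptions: a proposed low-rank update is rescaled so its length under a diagonal Fisher proxy never exceeds a radius. All names are illustrative; CTR-LoRA's actual update rule is not specified in the abstract:

import torch

def trust_region_clip(update, fisher_diag, radius):
    # Norm of the update measured in the (diagonal) Fisher metric:
    # sqrt(update^T F update), a cheap second-order proxy.
    metric_norm = torch.sqrt((fisher_diag * update.pow(2)).sum())
    if metric_norm <= radius:
        return update
    return update * (radius / metric_norm)

delta = torch.randn(1000)            # flattened LoRA update proposal
fisher = torch.rand(1000) + 1e-6     # positive diagonal Fisher estimate
safe_delta = trust_region_clip(delta, fisher, radius=0.1)

Capacity allocation in the same spirit would rank adapter budgets by marginal utility under the same proxy, but that scheduling logic is beyond this sketch.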
Measuring What Matters: Connecting AI Ethics Evaluations to System Attributes, Hazards, and Harms
Over the past decade, an ecosystem of measures has emerged to evaluate the social and ethical implications of AI systems, largely shaped by high-level ethics principles. These measures are developed and used in fragmented ways, without adequate attention to how they are situated in AI systems. In this paper, we examine how existing measures used in the computing literature map to AI system components, attributes, hazards, and harms. Our analysis draws on a scoping review resulting in nearly 800 measures corresponding to 11 AI ethics principles. We find that most measures focus on four principles (fairness, transparency, privacy, and trust) and primarily assess the model or output system components. Few measures account for interactions across system elements, and only a narrow set of hazards is typically considered for each harm type. Many measures are disconnected from where harm is experienced and lack guidance for setting meaningful thresholds. These patterns reveal how current evaluation practices remain fragmented, measuring in pieces rather than capturing how harms emerge across systems. Framing measures with respect to system attributes, hazards, and harms can strengthen regulatory oversight, support actionable practices in industry, and ground future research in systems-level understanding.
Permissive Information-Flow Analysis for Large Language Models
Shoaib Ahmed Siddiqui
Radhika Gaonkar
Boris Köpf
Andrew Paverd
Ahmed Salem
Shruti Tople
Lukas Wutschitz
Menglin Xia
Santiago Zanella-Beguelin
Large Language Models (LLMs) are rapidly becoming commodity components of larger software systems. This poses natural security and privacy problems: poisoned data retrieved from one component can change the model's behavior and compromise the entire system, including coercing the model to spread confidential data to untrusted components. Assuming each piece of information comes with an additional meta-label (such as low/high integrity labels), one promising approach is to tackle this problem at the system level via dynamic information flow (aka taint) tracking. Unfortunately, this approach of propagating the most restrictive input label to the output is too conservative for applications where LLMs operate on inputs retrieved from diverse sources. In this paper, we propose a novel, more permissive approach to propagating information flow labels through LLM queries. The key idea behind our approach is to propagate only the input labels that were influential in generating the model output and to eliminate the labels of unnecessary inputs. We implement and investigate the effectiveness of two variations of this approach, based on (i) prompt-based retrieval augmentation, and (ii) a …
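A toy sketch of the permissive propagation rule, illustrative only: the influence scores below are stand-ins, since estimating influence is exactly what the paper's two variations address:

LOW, HIGH = 0, 1  # integrity labels; the output takes the weakest relevant label

def propagate(inputs, influence, threshold=0.5):
    # inputs: list of (text, label) pairs; influence: parallel relevance scores.
    labels = [lab for (_, lab), s in zip(inputs, influence) if s >= threshold]
    # Fall back to the classical conservative rule if nothing is flagged.
    labels = labels or [lab for _, lab in inputs]
    return min(labels)

docs = [("trusted KB entry", HIGH), ("unrelated web comment", LOW)]
print(propagate(docs, influence=[0.9, 0.1]))  # -> 1 (HIGH): the low-integrity
                                              # input did not shape the output

The classical taint-tracking rule corresponds to always taking the minimum over all input labels; the permissive rule recovers it only when no input can be ruled out as uninfluential.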
Scaling Laws and Symmetry, Evidence from Neural Force Fields
We present an empirical study in the geometric task of learning interatomic potentials, which shows that equivariance matters even more at larger scales; we observe clear power-law scaling behaviour with respect to data, parameters, and compute, with "architecture-dependent exponents". In particular, equivariant architectures, which leverage task symmetry, scale better than non-equivariant models. Moreover, among equivariant architectures, higher-order representations translate to better scaling exponents. Our analysis also suggests that for compute-optimal training, data and model sizes should scale in tandem regardless of the architecture. At a high level, these results suggest that, contrary to common belief, we should not leave it to the model to discover fundamental inductive biases such as symmetry, especially as we scale, because they change the inherent difficulty of the task and its scaling laws.
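As a reading aid, an architecture-dependent scaling exponent of this kind can be recovered from loss-versus-compute measurements by a log-log linear fit; the numbers below are synthetic, not from the paper:

import numpy as np

compute = np.logspace(15, 20, 6)   # training FLOPs (synthetic)
rng = np.random.default_rng(0)
# Synthetic power law L = a * C^(-alpha) with small multiplicative noise.
loss = 50.0 * compute ** -0.12 * np.exp(rng.normal(0, 0.01, 6))

slope, _ = np.polyfit(np.log(compute), np.log(loss), 1)
print(f"fitted exponent alpha = {-slope:.3f}")   # ~0.12 for this series

Under the paper's claim, equivariant architectures would exhibit a larger alpha (faster improvement with scale) than non-equivariant ones, with higher-order representations larger still.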
Convergence Theorems for Entropy-Regularized and Distributional Reinforcement Learning
Yash Jhaveri
Patrick Shafto
In the pursuit of an optimal policy, reinforcement learning (RL) methods generally ignore the properties of learned policies apart from their expected return. Thus, even when successful, it is difficult to characterize which policies will be learned and what they will do. In this work, we present a theoretical framework for policy optimization that guarantees convergence to a particular optimal policy, via vanishing entropy regularization and a temperature decoupling gambit. Our approach realizes an interpretable, diversity-preserving optimal policy as the regularization temperature vanishes and ensures the convergence of policy-derived objects (value functions and return distributions). In a particular instance of our method, for example, the realized policy samples all optimal actions uniformly. Leveraging our temperature decoupling gambit, we present an algorithm that estimates, to arbitrary accuracy, the return distribution associated with its interpretable, diversity-preserving optimal policy.
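The temperature limit at the heart of the result can be seen in a few lines; this is illustrative only, as the paper's temperature decoupling gambit involves more than this bare softmax:

import numpy as np

q = np.array([1.0, 0.2, 1.0, -0.5])   # actions 0 and 2 tie for optimal

def softmax_policy(q, tau):
    z = (q - q.max()) / tau            # numerically stabilized logits
    p = np.exp(z)
    return p / p.sum()

for tau in [1.0, 0.1, 0.01]:
    print(tau, np.round(softmax_policy(q, tau), 3))
# As tau -> 0 the policy converges to [0.5, 0, 0.5, 0]: uniform over the
# optimal actions, matching the diversity-preserving limit described above.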
Sound and Modular Activity Analysis for Automatic Differentiation in MLIR
Mai Jacob Peng
William S. Moses
Oleksandr Zinenko