Hyena Hierarchy: Towards Larger Convolutional Language Models
Michael Poli
Stefano Massaroli
Eric Nguyen
Daniel Y Fu
Tri Dao
Stephen Baccus
Stefano Ermon
Christopher Re
Recent advances in deep learning have relied heavily on the use of large Transformers due to their ability to learn at scale. However, the core building block of Transformers, the attention operator, exhibits quadratic cost in sequence length, limiting the amount of context accessible. Existing subquadratic methods based on low-rank and sparse approximations need to be combined with dense attention layers to match Transformers, indicating a gap in capability. In this work, we propose Hyena, a subquadratic drop-in replacement for attention constructed by interleaving implicitly parametrized long convolutions and data-controlled gating. In recall and reasoning tasks on sequences of thousands to hundreds of thousands of tokens, Hyena improves accuracy by more than 50 points over operators relying on state-spaces and other implicit and explicit methods, matching attention-based models. We set a new state-of-the-art for dense-attention-free architectures on language modeling in standard datasets (WikiText103 and The Pile), reaching Transformer quality with a 20% reduction in training compute required at sequence length 2K. Hyena operators are twice as fast as highly optimized attention at sequence length 8K, and 100x faster at sequence length 64K.
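To make the interleaving of long convolutions and gating concrete, here is a minimal PyTorch sketch of an order-2 Hyena-style operator. It illustrates the mechanism described above, not the authors' implementation: the free-parameter filter, the FFT-based causal convolution, and all sizes are simplifying assumptions (the paper parameterizes long filters implicitly with a small network).

    # Minimal sketch of an order-2 Hyena-style operator in PyTorch.
    # The filter is a free parameter here for simplicity; the paper
    # parameterizes long filters implicitly with a small network.
    import torch
    import torch.nn as nn

    class HyenaSketch(nn.Module):
        def __init__(self, d_model, max_len):
            super().__init__()
            self.in_proj = nn.Linear(d_model, 3 * d_model)   # value + two gates
            self.out_proj = nn.Linear(d_model, d_model)
            self.filters = nn.Parameter(torch.randn(2, max_len, d_model) * 0.02)

        def fft_conv(self, u, h):
            # Causal long convolution via FFT: O(L log L) rather than O(L^2).
            L = u.shape[1]
            u_f = torch.fft.rfft(u.float(), n=2 * L, dim=1)
            h_f = torch.fft.rfft(h[:L].float(), n=2 * L, dim=0)
            return torch.fft.irfft(u_f * h_f, n=2 * L, dim=1)[:, :L]

        def forward(self, u):                 # u: (batch, seq_len, d_model)
            v, x1, x2 = self.in_proj(u).chunk(3, dim=-1)
            v = x1 * self.fft_conv(v, self.filters[0])  # long conv, then gate
            v = x2 * self.fft_conv(v, self.filters[1])  # interleave again
            return self.out_proj(v)

    y = HyenaSketch(d_model=64, max_len=1024)(torch.randn(2, 512, 64))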
Unsupervised Layer-wise Score Aggregation for Textual OOD Detection
Maxime DARRIN
Guillaume Staerman
Eduardo Dadalto Câmara Gomes
Pierre Colombo
Interpret Your Care: Predicting the Evolution of Symptoms for Cancer Patients
Rupali Bhati
Jennifer Jones
Cancer treatment is an arduous process for patients and causes many side effects during and after treatment. Treatment can affect almost all body systems and result in pain, fatigue, sleep disturbances, cognitive impairments, etc. These conditions are often under-diagnosed or under-treated. In this paper, we use patient data to predict the evolution of symptoms so that treatment-related impairments can be prevented or their effects meaningfully ameliorated. The focus of this study is on predicting a patient's pain and tiredness levels after diagnosis. We implement LightGBM, an interpretable decision-tree-based model, on real-world data from 20,163 patients. The dataset exhibits class imbalance, which we address with SMOTE oversampling. Our empirical results show that a symptom's previous level is a key predictor, and that the weighted average prediction deviation is 3.52 for pain level and 2.27 for tiredness level.
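As a hedged illustration of the pipeline named above (LightGBM with SMOTE oversampling), the sketch below uses synthetic data in place of the patient records, which are not public; it assumes the symptom levels are treated as discrete classes, and the feature and class counts are placeholders.

    # Sketch: LightGBM classifier with SMOTE oversampling for imbalanced
    # symptom levels. Synthetic data stands in for the patient records.
    import lightgbm as lgb
    from imblearn.over_sampling import SMOTE
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    # Pretend classes 0-3 are discretized symptom levels, heavily imbalanced.
    X, y = make_classification(n_samples=2000, n_features=10, n_informative=6,
                               n_classes=4, weights=[0.7, 0.15, 0.1, 0.05],
                               random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

    # Oversample minority levels on the training split only.
    X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)

    model = lgb.LGBMClassifier(random_state=0)
    model.fit(X_res, y_res)
    print((model.predict(X_te) == y_te).mean())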
Stochastic Generative Flow Networks
Ling Pan
Dinghuai Zhang
Moksh J. Jain
Longbo Huang
Generative Flow Networks (or GFlowNets for short) are a family of probabilistic agents that learn to sample complex combinatorial structures through the lens of "inference as control". They have shown great potential in generating high-quality and diverse candidates from a given energy landscape. However, existing GFlowNets can be applied only to deterministic environments, and fail in more general tasks with stochastic dynamics, which can limit their applicability. To overcome this challenge, this paper introduces Stochastic GFlowNets, a new algorithm that extends GFlowNets to stochastic environments. By decomposing state transitions into two steps, Stochastic GFlowNets isolate environmental stochasticity and learn a dynamics model to capture it. Extensive experimental results demonstrate that Stochastic GFlowNets offer significant advantages over standard GFlowNets as well as MCMC- and RL-based approaches, on a variety of standard benchmarks with stochastic dynamics.
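The two-step decomposition can be pictured as follows: the agent first commits to an action (the controllable half of the transition), and the environment then produces the next state, with a learned dynamics model capturing that stochastic half. The sketch below is a loose conceptual rendering under those assumptions, not the paper's training procedure; all module shapes are placeholders.

    # Conceptual sketch of the two-step transition: the agent picks an
    # action, then a learned dynamics model accounts for the environment's
    # stochastic response. Shapes and modules are placeholders.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class StochasticTransitionSketch(nn.Module):
        def __init__(self, state_dim, n_actions):
            super().__init__()
            self.policy = nn.Linear(state_dim, n_actions)                 # forward policy P_F(a | s)
            self.dynamics = nn.Linear(state_dim + n_actions, state_dim)  # learned model of P(s' | s, a)

        def step(self, s):
            # Step 1: the agent's (controllable) choice of action.
            a = torch.distributions.Categorical(logits=self.policy(s)).sample()
            a_onehot = F.one_hot(a, self.policy.out_features).float()
            # Step 2: the environment's (stochastic) response, which the
            # dynamics model is trained to capture from observed transitions.
            s_next_pred = self.dynamics(torch.cat([s, a_onehot], dim=-1))
            return a, s_next_pred

    agent = StochasticTransitionSketch(state_dim=8, n_actions=4)
    a, s_next = agent.step(torch.randn(16, 8))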
LAGrad: Statically Optimized Differentiable Programming in MLIR
Mai Jacob Peng
Effects of incoming particle energy and cluster size on the G-value of hydrated electrons.
Alaina Bui
H. Bekerat
Lilian Childress
Jack C Sankey
Jan Seuntjens
MOT: A Multi-Omics Transformer for Multiclass Classification Tumour Types Predictions
Mazid Osseni
Prudencio Tossou
François Laviolette
Refactoring practices in the context of data-intensive systems
Biruk Asmare Muse
Giuliano Antoniol
Learning to Substitute Ingredients in Recipes
Bahare Fatemi
Quentin Duval
Rohit Girdhar
Michal Drozdzal
Recipe personalization through ingredient substitution has the potential to help people meet their dietary needs and preferences, avoid potential allergens, and ease culinary exploration in everyone's kitchen. To address ingredient substitution, we build a benchmark composed of a dataset of substitution pairs with standardized splits, evaluation metrics, and baselines. We further introduce the Graph-based Ingredient Substitution Module (GISMo), a novel model that leverages the context of a recipe as well as generic ingredient relational information encoded within a graph to rank plausible substitutions. We show through comprehensive experimental validation that GISMo surpasses the best-performing baseline by a large margin in terms of mean reciprocal rank. Finally, we highlight the benefits of GISMo by integrating it into an improved image-to-recipe generation pipeline, enabling recipe personalization through user intervention. Quantitative and qualitative results show the efficacy of our proposed system, paving the way towards truly personalized cooking and tasting experiences.
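Since mean reciprocal rank is the headline metric here, a small self-contained sketch of how ranked substitutions are scored may help; the candidate scores below stand in for GISMo's model outputs, which this abstract does not specify.

    # Mean reciprocal rank (MRR) over ranked substitution candidates;
    # the scores stand in for GISMo's model outputs.
    import numpy as np

    def mean_reciprocal_rank(score_rows, gold_indices):
        rr = []
        for scores, gold in zip(score_rows, gold_indices):
            order = np.argsort(-np.asarray(scores))        # best score first
            rank = int(np.where(order == gold)[0][0]) + 1  # 1-based rank of the true substitute
            rr.append(1.0 / rank)
        return float(np.mean(rr))

    # Two queries, three candidate substitutes each:
    print(mean_reciprocal_rank([[0.2, 0.9, 0.1], [0.5, 0.3, 0.8]], [1, 0]))  # (1/1 + 1/2) / 2 = 0.75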
New wave theory
Score-based Diffusion Models in Function Space
Jae Hyun Lim
Nikola B. Kovachki
R. Baptista
Christopher Beckham
Kamyar Azizzadenesheli
Jean Kossaifi
Vikram Voleti
Jiaming Song
Karsten Kreis
Jan Kautz
Arash Vahdat
Animashree Anandkumar
The Stable Entropy Hypothesis and Entropy-Aware Decoding: An Analysis and Algorithm for Robust Natural Language Generation
Kushal Arora
Jason Aaron Edward Weston
Jackie C.K. Cheung
State-of-the-art language generation models can degenerate when applied to open-ended generation problems such as text completion, story generation, or dialog modeling. This degeneration usually shows up in the form of incoherence, lack of vocabulary diversity, and self-repetition or copying from the context. In this paper, we postulate that "human-like" generations usually lie in a narrow and nearly flat entropy band, and that violation of these entropy bounds correlates with degenerate behavior. Our experiments show that this stable narrow entropy zone exists across models, tasks, and domains, and confirm the hypothesis that violations of this zone correlate with degeneration. We then use this insight to propose an entropy-aware decoding algorithm that respects these entropy bounds, resulting in less degenerate, more contextual, and "human-like" language generation in open-ended text generation settings.
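A minimal sketch of what entropy-aware decoding could look like, assuming a PyTorch setting: compute the entropy of the next-token distribution at each step and intervene when it leaves the target band. The bound values and the specific interventions are illustrative assumptions, not the paper's exact algorithm.

    # Sketch of entropy-aware next-token selection; bounds and
    # interventions are illustrative, not the paper's exact algorithm.
    import torch

    def entropy_aware_step(logits, lower=1.0, upper=4.0):
        # logits: (vocab_size,) for the next token.
        probs = torch.softmax(logits, dim=-1)
        entropy = -(probs * torch.log(probs + 1e-12)).sum()
        if entropy < lower:
            # Too peaked: repetition/copying risk, so sample for diversity.
            return torch.multinomial(probs, 1)
        if entropy > upper:
            # Too flat: incoherence risk, so fall back to the greedy token.
            return logits.argmax(dim=-1, keepdim=True)
        return torch.multinomial(probs, 1)

    next_token = entropy_aware_step(torch.randn(32000))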