Clara Lacroce

Rémi Eyraud

Rémi Emonet

Amaury Habrard

2024-04-17

Proceedings of The 27th International Conference on Artificial Intelligence and Statistics (published)

proceedings.mlr.press

Simulating Weighted Automata over Sequences and Trees with Transformers

Michael Rizvi

Maude Lizaire

Transformers are ubiquitous models in the natural language processing (NLP) community and have shown impressive empirical successes in the p… (see more)ast few years. However, little is understood about how they reason and the limits of their computational capabilities. These models do not process data sequentially, and yet outperform sequential neural models such as RNNs. Recent work has shown that these models can compactly simulate the sequential reasoning abilities of deterministic finite automata (DFAs). This leads to the following question: can transformers simulate the reasoning of more complex finite state machines? In this work, we show that transformers can simulate weighted finite automata (WFAs), a class of models which subsumes DFAs, as well as weighted tree automata (WTA), a generalization of weighted automata to tree structured inputs. We prove these claims formally and provide upper bounds on the sizes of the transformer models needed as a function of the number of states the target automata. Empirically, we perform synthetic experiments showing that transformers are able to learn these compact solutions via standard gradient-based training.

2024-04-17

Proceedings of The 27th International Conference on Artificial Intelligence and Statistics (published)

proceedings.mlr.press

Optimal Approximate Minimization of One-Letter Weighted Finite Automata

Borja Balle

Prakash Panangaden

In this paper, we study the approximate minimization problem of weighted finite automata (WFAs): to compute the best possible approximation … (see more)of a WFA given a bound on the number of states. By reformulating the problem in terms of Hankel matrices, we leverage classical results on the approximation of Hankel operators, namely the celebrated Adamyan-Arov-Krein (AAK) theory. We solve the optimal spectral-norm approximate minimization problem for irredundant WFAs with real weights, defined over a one-letter alphabet. We present a theoretical analysis based on AAK theory, and bounds on the quality of the approximation in the spectral norm and

2023-12-31

Mathematical Structures in Computer Science (published)

openreview.net

Towards an AAK Theory Approach to Approximate Minimization in the Multi-Letter Case

Prakash Panangaden

We study the approximate minimization problem of weighted finite automata (WFAs): given a WFA, we want to compute its optimal approximation … (see more)when restricted to a given size. We reformulate the problem as a rank-minimization task in the spectral norm, and propose a framework to apply Adamyan-Arov-Krein (AAK) theory to the approximation problem. This approach has already been successfully applied to the case of WFAs and language modelling black boxes over one-letter alphabets \citep{AAK-WFA,AAK-RNN}. Extending the result to multi-letter alphabets requires solving the following two steps. First, we need to reformulate the approximation problem in terms of noncommutative Hankel operators and noncommutative functions, in order to apply results from multivariable operator theory. Secondly, to obtain the optimal approximation we need a version of noncommutative AAK theory that is constructive. In this paper, we successfully tackle the first step, while the second challenge remains open.

2022-05-31

ArXiv (preprint)

arxiv.org

Extracting Weighted Automata for Approximate Minimization in Language Modelling

Prakash Panangaden

In this paper we study the approximate minimization problem for language modelling. We assume we are given some language model as a black bo… (see more)x. The objective is to obtain a weighted finite automaton (WFA) that fits within a given size constraint and which mimics the behaviour of the original model while minimizing some notion of distance between the black box and the extracted WFA. We provide an algorithm for the approximate minimization of black boxes trained for language modelling of sequential data over a one-letter alphabet. By reformulating the problem in terms of Hankel matrices, we leverage classical results on the approximation of Hankel operators, namely the celebrated Adamyan-Arov-Krein (AAK) theory. This allows us to use the spectral norm to measure the distance between the black box and the WFA. We provide theoretical guarantees to study the potentially infinite-rank Hankel matrix of the black box, without accessing the training data, and we prove that our method returns an asymptotically-optimal approximation.

2021-08-24

Proceedings of the Fifteenth International Conference on Grammatical Inference (published)