Publications

CTRL-O: Language-Controllable Object-Centric Visual Representation Learning
Andrii Zadaianchuk
Maximilian Seitzer
Efstratios Gavves
Object-centric representation learning aims to decompose visual scenes into fixed-size vectors called "slots" or "object files", where each slot captures a distinct object. Current state-of-the-art models have shown remarkable success in object discovery, particularly in complex real-world scenes, while also generalizing well to unseen domains. However, these models suffer from a key limitation: they lack controllability. Specifically, current object-centric models learn representations based on their preconceived understanding of objects and parts, without allowing user input to guide or modify which objects are represented. Introducing controllability into object-centric models could unlock a range of useful capabilities, such as enabling models to represent scenes at variable levels of granularity based on user specification. In this work, we propose a novel approach that conditions slot representations through guided decomposition, paired with a novel contrastive learning objective, to enable user-directed control over which objects are represented. Our method achieves such controllability without any mask supervision and successfully binds to user-specified objects in complex real-world scenes.
Data-Driven Combinatorial Optimisation (Dagstuhl Seminar 22431)
Andrea Lodi
Michele Lombardi
Neil Yorke-Smith
Machine learning’s impressive achievements in the last decade have urged many scientific communities to ask if and how the techniques developed in that field to leverage data could be used to advance research in others. The combinatorial optimisation community is one of those communities.
Data-Efficient Structured Pruning via Submodular Optimization
Structured pruning is an effective approach for compressing large pre-trained neural networks without significantly affecting their performance. However, most current structured pruning methods do not provide any performance guarantees, and often require fine-tuning, which makes them inapplicable in the limited-data regime. We propose a principled data-efficient structured pruning method based on submodular optimization. In particular, for a given layer, we select neurons/channels to prune and corresponding new weights for the next layer, that minimize the change in the next layer's input induced by pruning. We show that this selection problem is a weakly submodular maximization problem, thus it can be provably approximated using an efficient greedy algorithm. Our method is guaranteed to have an exponentially decreasing error between the original model and the pruned model outputs w.r.t. the pruned size, under reasonable assumptions. It is also one of the few methods in the literature that uses only a limited number of training data and no labels. Our experimental results demonstrate that our method outperforms state-of-the-art methods in the limited-data regime.
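The core selection step the abstract describes, choosing which neurons to keep so that the next layer's input changes as little as possible, can be sketched with a plain greedy loop. This is an illustrative NumPy sketch, not the authors' implementation; the function name `greedy_select` and the least-squares reconstruction of the next layer's input are assumptions made for the example.

```python
import numpy as np

def greedy_select(A, k):
    """Greedily pick k columns of the activation matrix A (samples x neurons)
    so that the full set of activations is best reconstructed from the kept
    subset, i.e. approximately minimizing ||A - A_S W||_F^2.
    Illustrative sketch of the greedy step only."""
    n = A.shape[1]
    selected = []
    for _ in range(k):
        best, best_err = None, np.inf
        for j in range(n):
            if j in selected:
                continue
            As = A[:, selected + [j]]
            # Least-squares weights mapping the kept neurons onto all neurons,
            # playing the role of the "new weights for the next layer".
            W, *_ = np.linalg.lstsq(As, A, rcond=None)
            err = np.linalg.norm(A - As @ W)
            if err < best_err:
                best, best_err = j, err
        selected.append(best)
    return sorted(selected), best_err
```

On redundant layers (columns that are linear combinations of others), the reconstruction error drops to zero once the kept set spans the activations, which is the regime where structured pruning is lossless.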
Deposited in DRO: 17 January 2022. Version of attached file: Accepted Version.
Nelly Bencomo
Jin L.C. Guo
Rachel Harrison
Hans-Martin Heyn
Tim Menzies
Much has been written about the algorithmic role that AI plays for automation in SE. But what about the role of AI, augmented by human knowledge? Can we make a profound advance by combining human and artificial intelligence? Researchers in requirements engineering think so, arguing that requirements engineering is the secret weapon for better AI and better software. To begin, we first need a definition. What is requirements engineering, or RE? RE used to be viewed as an early lifecycle activity that preceded analysis, design, coding and testing. For safety-critical applications there is certainly a pressing need to create those requirements before the coding starts (we will return to this point later in the paper). However, in this age of DevOps and autonomous and self-adaptive systems, requirements can happen at many other times in a software project [15], [14]. We say that: requirements engineering is any discussion about what to build and how to trade off competing costs/benefits. It can happen before, during, or after runtime. As shown in Table 1 and Table 2, there are many ways AI can help RE, across a broad range of SE activities. But what about the other way around? If we add more requirements into AI, and use RE methods to get truly desired requirements, can we make better software by combining human and artificial intelligence? (This paper is based on the panel "Artificial Intelligence and Requirement Engineering: Challenges and Opportunities", which took place at the Eighth International Workshop on Artificial Intelligence and Requirements Engineering (AIRE).)
In our view, integrating AI into software engineering is a co-design problem between humans, the AI model, the data required to train and validate the desired behaviour, and the hardware running the AI model, in addition to the classical software components. This means that when integrating AI, you need to know and understand the context of the system in which you want to apply your AI model in order to derive the necessary model requirements [17]. For example, in the arena of safety-critical systems, model construction must be guided by safety requirements. One challenge for AI in RE is the family of safety standards based on the EN-IEC 61508 standard (Functional Safety of Electrical/Electronic/Programmable Electronic Safety-related Systems; for example, ISO 26262 for the automotive sector or IEC 61511 for the process industry). These safety standards assume that for software only systematic faults exist. Therefore, they emphasise correct processes and the creation of lifecycle artifacts to minimise systematic mistakes. IEEE Software (submitted), © 2021 IEEE.
Detecting Languages Unintelligible to Multilingual Models through Local Structure Probes
Providing better language tools for low-resource and endangered languages is imperative for equitable growth. Recent progress with massively multilingual pretrained models has proven surprisingly effective at performing zero-shot transfer to a wide variety of languages. However, this transfer is not universal, with many languages not currently understood by multilingual approaches. It is estimated that only 72 languages possess a "small set of labeled datasets" on which we could test a model's performance, with the vast majority of languages lacking even the resources needed to evaluate performance. In this work, we attempt to clarify which languages do and do not currently benefit from such transfer. To that end, we develop a general approach that requires only unlabelled text to detect which languages are not well understood by a cross-lingual model. Our approach is derived from the hypothesis that if a model's understanding is insensitive to perturbations to text in a language, it is likely to have a limited understanding of that language. We construct a cross-lingual sentence similarity task to evaluate our approach empirically on 350, primarily low-resource, languages.
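The perturbation-sensitivity idea behind the probe, that if an encoder's output barely moves when the text is locally corrupted it probably does not model that language, can be sketched with a toy encoder. The `embed` callables below are stand-ins invented for the example; the paper applies this to real massively multilingual models.

```python
import numpy as np

def char_swap(text, i):
    """Local perturbation: swap characters i and i+1."""
    s = list(text)
    s[i], s[i + 1] = s[i + 1], s[i]
    return "".join(s)

def sensitivity(embed, text, n_perturb=5, seed=0):
    """Mean cosine distance between a sentence's embedding and the
    embeddings of locally perturbed copies. Near-zero sensitivity is the
    signal that the encoder may have a limited understanding of the
    language. `embed` stands in for a real multilingual encoder."""
    rng = np.random.default_rng(seed)
    e0 = embed(text)
    dists = []
    for _ in range(n_perturb):
        i = int(rng.integers(0, len(text) - 1))
        e1 = embed(char_swap(text, i))
        cos = e0 @ e1 / (np.linalg.norm(e0) * np.linalg.norm(e1) + 1e-12)
        dists.append(1.0 - cos)
    return float(np.mean(dists))
```

An order-blind encoder (e.g. a bag of characters) scores near zero on this probe, while any encoder that actually models local structure moves under the same perturbations.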
Discrete Factorial Representations as an Abstraction for Goal Conditioned RL
Hongyu Zang
Xin Li
Romain Laroche
Remi Tachet des Combes
Discrete-Valued Neural Communication in Structured Architectures Enhances Generalization
Dianbo Liu
Chen Sun
Michael C. Mozer
Deep learning has advanced from fully connected architectures to structured models organized into components, e.g., the transformer composed of positional elements, modular architectures divided into slots, and graph neural nets made up of nodes. In structured models, an interesting question is how to conduct dynamic and possibly sparse communication among the separate components. Here, we explore the hypothesis that restricting the transmitted information among components to discrete representations is a beneficial bottleneck. The motivating intuition is human language in which communication occurs through discrete symbols. Even though individuals have different understandings of what a "cat" is based on their specific experiences, the shared discrete token makes it possible for communication among individuals to be unimpeded by individual differences in internal representation. To discretize the values of concepts dynamically communicated among specialist components, we extend the quantization mechanism from the Vector-Quantized Variational Autoencoder to multi-headed discretization with shared codebooks and use it for discrete-valued neural communication (DVNC). Our experiments show that DVNC substantially improves systematic generalization in a variety of architectures -- transformers, modular architectures, and graph neural networks. We also show that the DVNC is robust to the choice of hyperparameters, making the method very useful in practice. Moreover, we establish a theoretical justification of our discretization process, proving that it has the ability to increase noise robustness and reduce the underlying dimensionality of the model.
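The multi-headed discretization the abstract describes, splitting a vector into heads and snapping each head to its nearest entry in a shared codebook, can be written in a few lines. This is a minimal NumPy sketch assuming a Euclidean nearest-neighbour lookup; training the real DVNC mechanism additionally requires a straight-through gradient estimator and codebook/commitment losses, which are omitted here.

```python
import numpy as np

def discretize(h, codebook, n_heads):
    """Split vector h into n_heads segments, quantize each segment to its
    nearest codebook entry (codebook shared across heads), and
    re-concatenate. Sketch of DVNC's multi-headed discretization."""
    d = h.shape[-1]
    assert d % n_heads == 0, "head size must divide the vector dimension"
    heads = h.reshape(n_heads, d // n_heads)
    indices, segments = [], []
    for seg in heads:
        j = int(np.argmin(np.linalg.norm(codebook - seg, axis=1)))
        indices.append(j)
        segments.append(codebook[j])
    return np.concatenate(segments), indices
```

Using several heads with a shared codebook lets a small codebook express exponentially many composite messages (one code index per head), which is the source of the bottleneck's expressiveness.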
Disentanglement via Mechanism Sparsity Regularization: A New Principle for Nonlinear ICA
Pau Rodríguez
Yash Sharma
Katie E Everett
Rémi Le Priol
Alexandre Lacoste
This work introduces a novel principle we call disentanglement via mechanism sparsity regularization, which can be applied when the latent factors of interest depend sparsely on past latent factors and/or observed auxiliary variables. We propose a representation learning method that induces disentanglement by simultaneously learning the latent factors and the sparse causal graphical model that relates them. We develop a rigorous identifiability theory, building on recent nonlinear independent component analysis (ICA) results, that formalizes this principle and shows how the latent variables can be recovered up to permutation if one regularizes the latent mechanisms to be sparse and if some graph connectivity criterion is satisfied by the data generating process. As a special case of our framework, we show how one can leverage unknown-target interventions on the latent factors to disentangle them, thereby drawing further connections between ICA and causality. We propose a VAE-based method in which the latent mechanisms are learned and regularized via binary masks, and validate our theory by showing it learns disentangled representations in simulations.
Distinguishing between fake news and satire with transformers
Jwen Fai Low
Benjamin C. M. Fung
Farkhund Iqbal
Shih-Chia Huang
DsMLP: A Learning-Based Multi-Layer Perception for MIMO Detection Implemented by Dynamic Stochastic Computing
Qidie Wu
Jinsheng Kuang
Jiyun Tao
Jienan Chen
Warren J. Gross
As the number of antennas increases in multi-input and multi-output (MIMO) systems, even linear detection methods suffer from sharply increasing complexity. This paper proposes a learning-based multi-layer perceptron (MLP), named dynamic stochastic multi-layer perceptron (DsMLP), which is implemented by dynamic stochastic computing (DSC). We first establish a similar form between the MLP structure and minimum mean square error (MMSE) matrix operations. Consequently, DsMLP transforms the complex computation problem into an optimization problem of MLP training. Due to the specific design of the MLP structure, e.g., the same input/output dimension and a single layer without an activation function, the mathematical representation of DsMLP is identical to the MMSE matrix operations. Therefore, DsMLP guarantees sound model explainability in mathematics, fast convergence in training, and low complexity in computation. Furthermore, we transform the MLP training process to the DSC domain and propose a hardware-efficient scheme for DsMLP. Compared with other state-of-the-art MIMO detectors, DsMLP achieves 1.2× energy efficiency and 1.74× area efficiency.
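The equivalence the abstract relies on, that a single linear layer trained on a regularized MSE objective computes the same estimate as the closed-form MMSE detector, can be checked numerically. This is a sketch under the standard linear model y = Hx + n; the function names are illustrative, and the plain gradient-descent loop stands in for the paper's stochastic-computing training.

```python
import numpy as np

def mmse_detect(H, y, sigma2):
    """Closed-form linear MMSE MIMO detection:
       x_hat = (H^T H + sigma^2 I)^{-1} H^T y."""
    n = H.shape[1]
    return np.linalg.solve(H.T @ H + sigma2 * np.eye(n), H.T @ y)

def gd_detect(H, y, sigma2, lr=0.1, steps=2000):
    """The same estimate reached by gradient descent on the regularized
    least-squares objective 0.5*||y - Hx||^2 + 0.5*sigma2*||x||^2,
    i.e. the sense in which a single linear layer trained with MSE
    mimics the MMSE matrix operations."""
    x = np.zeros(H.shape[1])
    for _ in range(steps):
        grad = H.T @ (H @ x - y) + sigma2 * x
        x -= lr * grad
    return x
```

Because the objective is strongly convex, gradient descent with a small enough stepsize converges to the unique minimizer, which is exactly the MMSE solution; this is what makes the training dynamics well behaved.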
DyG2Vec: Representation Learning for Dynamic Graphs with Self-Supervision
Mohammad Alomrani
Mahdi Biparva
Yingxue Zhang
Mark J. Coates
Temporal graph neural networks have shown promising results in learning inductive representations by automatically extracting temporal patterns. However, previous works often rely on complex memory modules or inefficient random-walk methods to construct temporal representations. In addition, the existing dynamic graph encoders are non-trivial to adapt to self-supervised paradigms, which prevents them from utilizing unlabeled data. To address these limitations, we present an efficient yet effective attention-based encoder that leverages temporal edge encodings and window-based subgraph sampling to generate task-agnostic embeddings. Moreover, we propose a joint-embedding architecture using non-contrastive SSL to learn rich temporal embeddings without labels. Experimental results on 7 benchmark datasets indicate that on average, our model outperforms SoTA baselines on the future link prediction task by 4.23% for the transductive setting and 3.30% for the inductive setting, while requiring 5-10x less training/inference time. Additionally, we empirically validate the significance of SSL pre-training under two probing tasks commonly used in the language and vision modalities. Lastly, different aspects of the proposed framework are investigated through experimental analysis and ablation studies.
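Window-based subgraph sampling, one of the two ingredients the abstract names, reduces to filtering the timestamped edge stream to a fixed look-back horizon. A minimal sketch; the edge-tuple format and function name are assumptions made for the example.

```python
def window_subgraph(edges, t, window):
    """Keep only edges whose timestamp falls in the look-back window
    (t - window, t], together with the nodes they touch. Sketch of
    window-based temporal subgraph sampling: the encoder operates on
    such fixed-horizon edge sets rather than random walks or per-node
    memory modules. Edges are (src, dst, timestamp) tuples."""
    recent = [(u, v, ts) for (u, v, ts) in edges if t - window < ts <= t]
    nodes = sorted({n for u, v, _ in recent for n in (u, v)})
    return recent, nodes
```

Because the window has a fixed size, the cost of encoding each prediction is bounded regardless of how long the graph's history grows.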
Dynamics of SGD with Stochastic Polyak Stepsizes: Truly Adaptive Variants and Convergence to Exact Solution
Recently, Loizou et al. (2021) proposed and analyzed stochastic gradient descent (SGD) with stochastic Polyak stepsize (SPS). The proposed SPS comes with strong convergence guarantees and competitive performance; however, it has two main drawbacks when it is used in non-over-parameterized regimes: (i) It requires a priori knowledge of the optimal mini-batch losses, which are not available when the interpolation condition is not satisfied (e.g., regularized objectives), and (ii) it guarantees convergence only to a neighborhood of the solution. In this work, we study the dynamics and the convergence properties of SGD equipped with new variants of the stochastic Polyak stepsize and provide solutions to both drawbacks of the original SPS. We first show that a simple modification of the original SPS that uses lower bounds instead of the optimal function values can directly solve issue (i). On the other hand, solving issue (ii) turns out to be more challenging and leads us to valuable insights into the method's behavior. We show that if interpolation is not satisfied, the correlation between SPS and stochastic gradients introduces a bias, which effectively distorts the expectation of the gradient signal near minimizers, leading to non-convergence - even if the stepsize is scaled down during training. To fix this issue, we propose DecSPS, a novel modification of SPS, which guarantees convergence to the exact minimizer - without a priori knowledge of the problem parameters. For strongly-convex optimization problems, DecSPS is the first stochastic adaptive optimization method that converges to the exact solution without restrictive assumptions like bounded iterates/gradients.
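The capped SPS update the abstract builds on, together with the fix for issue (i) of replacing the unknown optimal mini-batch loss by a lower bound, can be sketched in one function. This assumes the bounded variant gamma = min((f_i(x) - l_i)/||g_i||^2, gamma_max); DecSPS additionally decreases the cap over iterations, which is not shown here.

```python
import numpy as np

def sps_step(x, grad, loss, lower_bound, gamma_max=1.0):
    """One SGD step with a (capped) stochastic Polyak stepsize:
       gamma = min((f_i(x) - l_i) / ||g_i||^2, gamma_max),
    where l_i lower-bounds the optimal mini-batch loss (l_i = 0 works
    for any non-negative loss). Sketch of the stepsize rule, not the
    full DecSPS schedule."""
    g2 = float(grad @ grad)
    if g2 == 0.0:
        return x  # at a stationary point of this mini-batch
    gamma = min((loss - lower_bound) / g2, gamma_max)
    return x - gamma * grad
```

On the quadratic f(x) = 0.5*||x||^2 with lower bound 0, the rule yields gamma = 0.5 and halves the iterate each step, illustrating how the stepsize adapts to the local loss without any tuning.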