JARV1S: Phenotype Clone Search for Rapid Zero-Day Malware Triage and Functional Decomposition for Cyber Threat Intelligence
Christopher Molloy
Philippe Charland
Steven H. H. Ding
Cyber threat intelligence (CTI) has become a critical component of the defense of organizations against the steady surge of cyber attacks. Malware is one of the most challenging problems for CTI, due to its prevalence, the massive number of variants, and constantly changing threat actor behaviors. Currently, Malpedia has indexed 2,390 unique malware families, while the AV-TEST Institute has recorded more than 166 million new unique malware samples in 2021. There exists a vast number of variants per malware family. Consequently, the signature-based representation of patterns and knowledge in legacy systems can no longer generalize to detect future malware attacks. Machine learning-based solutions can match more variants; however, as black-box approaches, they lack the explainability and maintainability required by incident response teams. There is thus an urgent need for a data-driven system that can abstract a future-proof, human-friendly, systematic, actionable, and dependable knowledge representation from past software artifacts for more effective and insightful malware triage. In this paper, we present the first phenotype-based malware decomposition system for rapid malware triage that is effective against malware variants. We define phenotypes as directly observable characteristics such as code fragments, constants, functions, and strings. Malware development rarely starts from scratch, and many components and code fragments are reused. The target under investigation is decomposed into known phenotypes that are mapped to known malware families, malware behaviors, and Advanced Persistent Threat (APT) groups. The implemented system provides visualizable phenotypes through an interactive tree map, helping cyber analysts navigate the decomposition results. We evaluated our system on 200,000 malware samples, 100,000 benign samples, and a malware family with over 27,284 variants.
The results indicate our system is scalable, efficient, and effective against zero-day malware and new variants of known families.
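The core decomposition idea described above can be sketched as an inverted index from observable phenotypes (strings, constants, and the like) to the malware families known to exhibit them, with candidate families ranked by how many phenotypes of the sample they explain. This is a minimal illustrative sketch, not the paper's system: the index contents, names, and the simple hit-count ranking are all hypothetical.

```python
from collections import defaultdict

# Hypothetical phenotype index: observable artifact -> known malware families.
# The paper's phenotypes also include code fragments and whole functions.
PHENOTYPE_INDEX = {
    "cmd.exe /c del": {"FamilyA"},
    "0xDEADBEEF": {"FamilyA", "FamilyB"},
    "http://c2.example": {"FamilyB"},
}

def decompose(sample_phenotypes):
    """Map each observed phenotype to known families and tally the matches."""
    family_hits = defaultdict(list)
    for p in sample_phenotypes:
        for family in PHENOTYPE_INDEX.get(p, ()):
            family_hits[family].append(p)
    # Rank candidate families by the number of matched phenotypes.
    return sorted(family_hits.items(), key=lambda kv: -len(kv[1]))

# An unknown string simply matches nothing; known fragments drive the ranking.
ranking = decompose(["cmd.exe /c del", "0xDEADBEEF", "unknown_string"])
```

Because matching is against directly observable artifacts rather than whole-file signatures, a new variant that reuses even a few known fragments still maps to its family.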
Agnostic Physics-Driven Deep Learning
Benjamin Scellier
Siddhartha Mishra
Yann Ollivier
Works for Me! Cannot Reproduce – A Large Scale Empirical Study of Non-reproducible Bugs
Mohammad Masudur Rahman
Marco Castelluccio
Contextual bandit optimization of super-resolution microscopy
Anthony Bilodeau
Renaud Bernatchez
Albert Michaud-Gagnon
Flavie Lavoie-Cardinal
Evaluating Multimodal Interactive Agents
Josh Abramson
Arun Ahuja
Federico Carnevale
Petko Georgiev
Alex Goldin
Alden Hung
Jessica Landon
Timothy P. Lillicrap
Alistair M. Muldal
Adam Santoro
Tamara von Glehn
Greg Wayne
Nathaniel Wong
Chen Yan
Creating agents that can interact naturally with humans is a common goal in artificial intelligence (AI) research. However, evaluating these interactions is challenging: collecting online human-agent interactions is slow and expensive, yet faster proxy metrics often do not correlate well with interactive evaluation. In this paper, we assess the merits of these existing evaluation metrics and present a novel approach to evaluation called the Standardised Test Suite (STS). The STS uses behavioural scenarios mined from real human interaction data. Agents see replayed scenario context, receive an instruction, and are then given control to complete the interaction offline. These agent continuations are recorded and sent to human annotators to mark as success or failure, and agents are ranked according to the proportion of continuations in which they succeed. The resulting STS is fast, controlled, interpretable, and representative of naturalistic interactions. Altogether, the STS consolidates much of what is desirable across many of our standard evaluation metrics, allowing us to accelerate research progress towards producing agents that can interact naturally with humans. A video may be found at https://youtu.be/YR1TngGORGQ.
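The final ranking step described above — score each agent by the fraction of its continuations that annotators marked as successes — reduces to a few lines. This sketch assumes a simple dict of boolean annotator marks per agent; the agent names and data shape are illustrative, not from the paper.

```python
def sts_rank(annotations):
    """Rank agents by the proportion of annotator-marked successful
    continuations. annotations: {agent_name: [True/False per continuation]}."""
    scores = {agent: sum(marks) / len(marks) for agent, marks in annotations.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])

ranking = sts_rank({
    "agent_a": [True, True, False],   # 2 of 3 continuations succeeded
    "agent_b": [True, False, False],  # 1 of 3
})
```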
Assessing the Quality of Direct-to-Consumer Teleconsultation Services in Canada
Jean Noel Nikiema
Eleah Stringer
Marie-Pierre Moreault
Priscille Pana
Marco Laverdiere
Jean-Louis Denis
Béatrice Godard
Mylaine Breton
Guy Paré
Aviv Shachak
Claudia Lai
Elizabeth M. Borycki
Andre W. Kushniruk
Aude Motulsky
A Conceptual Framework for Representing Events Under Public Health Surveillance
Anya Okhmatovskaia
Yannan Shen
Iris Ganser
Nigel Collier
Nicholas B King
Zaiqiao Meng
Information integration across multiple event-based surveillance (EBS) systems has been shown to improve global disease surveillance in experimental settings. In practice, however, integration does not occur due to the lack of a common conceptual framework for encoding data within EBS systems. We aim to address this gap by proposing a candidate conceptual framework for representing events and related concepts in the domain of public health surveillance.
MaskEval: Weighted MLM-Based Evaluation for Text Summarization and Simplification
Yu Lu Liu
Rachel Bawden
Thomas Scialom
Benoît Sagot
AB0393 SURVIVAL ON JANUS KINASE INHIBITORS VERSUS OTHER ADVANCED THERAPIES IN RHEUMATOID ARTHRITIS
N. Bakhtiar
Leanne Gray
S. Bilgrami
Lesley Ottewell
Frank Wood
Mohsin Bukhari
ASHA: Assistive Teleoperation via Human-in-the-Loop Reinforcement Learning
Sean Chen
Jensen Gao
Siddharth Reddy
Anca Dragan
Sergey Levine
Building assistive interfaces for controlling robots through arbitrary, high-dimensional, noisy inputs (e.g., webcam images of eye gaze) can be challenging, especially when it involves inferring the user's desired action in the absence of a natural ‘default’ interface. Reinforcement learning from online user feedback on the system's performance presents a natural solution to this problem, and enables the interface to adapt to individual users. However, this approach tends to require a large amount of human-in-the-loop training data, especially when feedback is sparse. We propose a hierarchical solution that learns efficiently from sparse user feedback: we use offline pre-training to acquire a latent embedding space of useful, high-level robot behaviors, which, in turn, enables the system to focus on using online user feedback to learn a mapping from user inputs to desired high-level behaviors. The key insight is that access to a pre-trained policy enables the system to learn more from sparse rewards than a naïve RL algorithm: using the pre-trained policy, the system can make use of successful task executions to relabel, in hindsight, what the user actually meant to do during unsuccessful executions. We evaluate our method primarily through a user study with 12 participants who perform tasks in three simulated robotic manipulation domains using a webcam and their eye gaze: flipping light switches, opening a shelf door to reach objects inside, and rotating a valve. The results show that our method successfully learns to map 128-dimensional gaze features to 7-dimensional joint torques from sparse rewards in under 10 minutes of online training, and seamlessly helps users who employ different gaze strategies, while adapting to distributional shift in webcam inputs, tasks, and environments.
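The hindsight relabeling insight above can be illustrated with a deliberately simplified sketch: when an episode fails, the behavior the robot actually achieved is treated, in hindsight, as what the user's input meant, yielding a positive training pair even from a zero-reward episode. This is a schematic reduction of the idea, not ASHA's actual algorithm — the tuple format, discrete behavior labels, and binary targets are all assumptions for illustration.

```python
def hindsight_relabel(episode, reward):
    """Turn one (user_input, commanded_behavior, achieved_behavior) episode
    with a sparse reward into labeled (input, behavior, target) pairs for
    the input-to-behavior mapping. Illustrative simplification only."""
    user_input, commanded, achieved = episode
    pairs = []
    if reward > 0:
        # Success: the input indeed meant the commanded high-level behavior.
        pairs.append((user_input, commanded, 1.0))
    else:
        # Failure: in hindsight, the achieved behavior explains the input,
        # and the behavior we wrongly commanded becomes a negative example.
        pairs.append((user_input, achieved, 1.0))
        pairs.append((user_input, commanded, 0.0))
    return pairs
```

The point is data efficiency: without relabeling, a failed episode contributes nothing under sparse rewards; with it, every episode produces at least one positive pair.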
Diffusion Kurtosis Imaging of the neonatal Spinal Cord: design and application of the first processing pipeline implemented in Spinal Cord Toolbox
Rosella Trò
Monica Roascio
Domenico Tortora
Mariasavina Severino
Andrea Rossi
Marco Massimo Fato
Gabriele Arnulfo
Diffusion Kurtosis Imaging (DKI) has undisputed advantages over more classical diffusion Magnetic Resonance Imaging (dMRI), as witnessed by a fast-increasing number of clinical applications and software packages widely adopted in the brain imaging domain. Despite its power in probing tissue microstructure compared to conventional MRI, DKI is still largely underutilized in Spinal Cord (SC) imaging because of its inherently demanding technological requirements. While state-of-the-art hardware advancements have recently allowed great strides in applying this emerging method to the adult and child SC, the same does not apply to the neonatal setting. Indeed, amplified technical issues related to the SC region in this age range have left this field unexplored. However, results arising from recent applications of DKI to the adult and child SC are promising enough to suggest how informative this technique would also be for investigating newborns. Due to its extreme sensitivity to non-Gaussian diffusion, DKI proves particularly suitable for detecting the complex, subtle, and fast microstructural changes occurring in this area at this early and critical stage of development, which are not identifiable with DTI alone. Given the multiplicity of congenital anomalies of the spinal canal, their crucial effect on later developmental outcomes, and the close interconnection between the SC region and the brain above, applying such a method to a neonatal cohort becomes of utmost importance. In this work, we illustrate the first semi-automated pipeline for handling DKI data of the neonatal SC, from the acquisition setting to the estimation of diffusion (DTI and DKI) measures, through careful adjustment of processing algorithms originally customized for the adult SC.
Each processing step of this pipeline, built on the Spinal Cord Toolbox (SCT) software, has undergone a quality control check under the supervision of an expert pediatric neuroradiologist, and the overall procedure has been preliminarily tested in a pilot clinical case study. The results of this application agree with findings from a corresponding adult study, confirming the validity of the adopted pipeline and the diagnostic value of DKI in pediatrics. This novel tool hence paves the way for extending the application to other promising advanced dMRI models, such as Neurite Orientation Dispersion and Density Imaging (NODDI), and to a wider range of potential clinical applications concerning the neonatal period.
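The estimation step that distinguishes DKI from DTI rests on the standard kurtosis signal model along a gradient direction, ln S(b) = ln S0 − bD + (1/6) b² D² K, which is quadratic in the b-value and so requires at least three b-shells to fit. The following self-contained sketch uses synthetic, hypothetical parameter values (not data from this pipeline) to show how the apparent diffusivity D and kurtosis K are recovered exactly from three equally spaced b-values:

```python
import math

# DKI signal model along one gradient direction, with hypothetical values:
#   ln S(b) = ln S0 - b*D + (1/6) * b^2 * D^2 * K
S0, D, K = 1000.0, 0.9e-3, 0.8            # a.u., mm^2/s, unitless kurtosis
bvals = [0.0, 1000.0, 2000.0]             # s/mm^2, equally spaced shells
logS = [math.log(S0) - b * D + (b ** 2) * (D ** 2) * K / 6.0 for b in bvals]

# Exact recovery from three equally spaced b-values via finite differences.
h = 1000.0                                # shell spacing in s/mm^2
c2 = (logS[2] - 2 * logS[1] + logS[0]) / (2 * h ** 2)  # quadratic coefficient
D_fit = (c2 * h ** 2 - (logS[1] - logS[0])) / h        # apparent diffusivity
K_fit = 6.0 * c2 / D_fit ** 2                          # apparent kurtosis
```

With only two shells the quadratic term is unidentifiable and the fit collapses to DTI, which is why DKI acquisitions are more demanding — the technological constraint the abstract refers to.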
Improving Source Separation by Explicitly Modeling Dependencies between Sources
Ethan Manilow
Curtis Hawthorne
Bryan Pardo
Jesse Engel
We propose a new method for training a supervised source separation system that aims to learn the interdependent relationships between all combinations of sources in a mixture. Rather than independently estimating each source from a mix, we reframe the source separation problem as an Orderless Neural Autoregressive Density Estimator (NADE), and estimate each source from both the mix and a random subset of the other sources. We adapt a standard source separation architecture, Demucs, with additional inputs for each individual source, in addition to the input mixture. We randomly mask these input sources during training so that the network learns the conditional dependencies between the sources. By pairing this training method with a blocked Gibbs sampling procedure at inference time, we demonstrate that the network can iteratively improve its separation performance by conditioning a source estimate on its earlier source estimates. Experiments on two source separation datasets show that training a Demucs model with an Orderless NADE approach and using Gibbs sampling (up to 512 steps) at inference time strongly outperforms a Demucs baseline that uses a standard regression loss and direct (one step) estimation of sources.
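The orderless-NADE training scheme above hinges on one mechanism: for each training example, a random subset of the ground-truth sources is exposed as extra inputs while the rest are masked and must be predicted. A minimal sketch of that masking step, assuming a simple dict of per-source signals (the real inputs are waveforms fed to a modified Demucs, and the masking ratio here is an arbitrary choice):

```python
import random

def mask_sources(sources, rng=random):
    """Randomly hide a subset of ground-truth sources, as in orderless NADE
    training: the model conditions on the mix plus the visible sources and
    must predict the masked ones. Illustrative simplification."""
    visible, targets = {}, {}
    for name, audio in sources.items():
        if rng.random() < 0.5:
            visible[name] = audio        # conditioned on as an extra input
        else:
            visible[name] = None         # masked input slot
            targets[name] = audio        # prediction target for this example
    return visible, targets
```

At inference time the same interface supports blocked Gibbs sampling: start with all slots masked, then repeatedly feed earlier source estimates back in as visible inputs while re-estimating the rest.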