Publications

Exploring Security Practices in Infrastructure as Code: An Empirical Study

Alexandre Verdet

Mohammad Hamdaqa

Leuson Da Silva

Cloud computing has become popular thanks to the widespread use of Infrastructure as Code (IaC) tools, allowing the community to conveniently manage and configure cloud infrastructure using scripts. However, the scripting process itself does not automatically prevent practitioners from introducing misconfigurations, vulnerabilities, or privacy risks. As a result, ensuring security relies on practitioners' understanding and adoption of explicit policies, guidelines, or best practices. To understand how practitioners deal with this problem, in this work we perform an empirical study analyzing the adoption of scripted IaC security best practices. First, we select and categorize widely recognized Terraform security practices promulgated in the industry for popular cloud providers such as AWS, Azure, and Google Cloud. Next, we assess the adoption of these practices for each cloud provider, analyzing a sample of 812 open-source projects hosted on GitHub. To do so, we scan each project's configuration files, looking for policy implementations through static analysis (Checkov). Additionally, we investigate GitHub measures that might be correlated with adopting these best practices. The Access policy category emerges as the most widely adopted across all providers, while Encryption at rest policies are the most neglected. Regarding GitHub measures correlated with best-practice adoption, we observe a strong positive correlation between a repository's number of stars and the adoption of practices in its cloud infrastructure. Based on our findings, we provide guidelines for cloud practitioners to limit infrastructure vulnerability and discuss further aspects associated with policies that have yet to be extensively embraced within the industry.

2023-08-06

ArXiv (preprint)

doi.org

arxiv.org

Multi-variable Hard Physical Constraints for Climate Model Downscaling

Jose González-Abad

Álex Hernández-García

Paula Harder

David Rolnick

José Manuel Gutiérrez

2023-08-01

ArXiv (preprint)

doi.org

arxiv.org

Are vividness judgments in mental imagery correlated with perceptual thresholds?

Ian Charest

Clémence Bertrand Pilon

Hugo Delhaye

Vincent Taschereau-Dumouchel

Frédéric Gosselin

2023-07-31

Journal of Vision (published)

doi.org

Class imbalance should not throw you off balance: Choosing the right classifiers and performance metrics for brain decoding with imbalanced data

Philipp Thölke

Yorguin-Jose Mantilla-Ramos

Hamza Abdelhedi

Charlotte Maschke

Arthur Dehgan

Yann Harel

Anirudha Kemtur

Loubna Mekki Berrada

Myriam Sahraoui

Tammy Young

Antoine Bellemare Pépin

Clara El Khantour

Mathieu Landry

Annalisa Pascarella

Vanessa Hadid

Etienne Combrisson

Jordan O'Byrne

Karim Jerbi

Machine learning (ML) is increasingly used in cognitive, computational and clinical neuroscience. The reliable and efficient application of ML requires a sound understanding of its subtleties and limitations. Training ML models on datasets with imbalanced classes is a particularly common problem, and it can have severe consequences if not adequately addressed. With the neuroscience ML user in mind, this paper provides a didactic assessment of the class imbalance problem and illustrates its impact through systematic manipulation of data imbalance ratios in (i) simulated data and (ii) brain data recorded with electroencephalography (EEG), magnetoencephalography (MEG) and functional magnetic resonance imaging (fMRI). Our results illustrate how the widely used Accuracy (Acc) metric, which measures the overall proportion of successful predictions, yields misleadingly high performance as class imbalance increases. Because Acc weights the per-class ratios of correct predictions proportionally to class size, it largely disregards performance on the minority class. A binary classification model that learns to systematically vote for the majority class will yield an artificially high decoding accuracy that directly reflects the imbalance between the two classes, rather than any genuine, generalizable ability to discriminate between them. We show that other evaluation metrics, such as the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) and the less common Balanced Accuracy (BAcc) metric (the arithmetic mean of sensitivity and specificity), provide more reliable performance evaluations for imbalanced data. Our findings also highlight the robustness of Random Forest (RF), and the benefits of using stratified cross-validation and hyperparameter optimization to tackle data imbalance.
Critically, for neuroscience ML applications that seek to minimize overall classification error, we recommend the routine use of BAcc, which in the specific case of balanced data is equivalent to using standard Acc, and readily extends to multi-class settings. Importantly, we present a list of recommendations for dealing with imbalanced data, as well as open-source code to allow the neuroscience community to replicate and extend our observations and explore alternative approaches to coping with imbalanced data.
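
The majority-class failure mode described above can be reproduced in a few lines. This is a minimal plain-Python sketch, not the authors' released code: the 90/10 class split is a hypothetical toy example, and `accuracy` and `balanced_accuracy` are illustrative helpers written here (BAcc computed as the mean of per-class recall, which for two classes equals the mean of sensitivity and specificity).

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels (Acc)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (BAcc).

    For binary labels this equals (sensitivity + specificity) / 2.
    """
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        correct = sum(y_pred[i] == cls for i in idx)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced dataset: 90 samples of class 0, 10 of class 1.
y_true = [0] * 90 + [1] * 10

# A degenerate classifier that always votes for the majority class.
majority = Counter(y_true).most_common(1)[0][0]
y_pred = [majority] * len(y_true)

print(accuracy(y_true, y_pred))           # 0.9, mirrors the 90/10 imbalance
print(balanced_accuracy(y_true, y_pred))  # 0.5, chance level for two classes
```

On balanced data every class has the same size, so weighting per-class recall by class size (Acc) and weighting it uniformly (BAcc) give the same number, consistent with the recommendation above.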

2023-07-31

NeuroImage (published)

doi.org

Consultative engagement of stakeholders toward a roadmap for African language technologies

Kathleen Siminyu

Jade Abbott

Kọ́lá Túbọ̀sún

Aremu Anuoluwapo

Blessing Kudzaishe Sibanda

Kofi Yeboah

David Ifeoluwa Adelani

Masabata Mokgesi-Selinga

Frederick R. Apina

Angela Thandizwe Mthembu

Arshath Ramkilowan

Babatunde Oladimeji

2023-07-31

Patterns (published)

doi.org

A cop-winning strategy on strongly cop-win graphs

Josée Desharnais

François Laviolette

Héli Marcoux

Norbert Polat

2023-07-31

Discrete Mathematics (published)

doi.org

Decentralized Linear Quadratic Systems With Major and Minor Agents and Non-Gaussian Noise

Mohammad Afshari

Aditya Mahajan

A decentralized linear quadratic system with a major agent and a collection of minor agents is considered. The major agent affects the minor agents, but not vice versa. The state of the major agent is observed by all agents. In addition, the minor agents have a noisy observation of their local state. The noise process is not assumed to be Gaussian. The structures of the optimal strategy and the best linear strategy are characterized. It is shown that the major agent's optimal control action is a linear function of the major agent's minimum mean-squared error (MMSE) estimate of the system state while the minor agent's optimal control action is a linear function of the major agent's MMSE estimate of the system state and a “correction term” that depends on the difference of the minor agent's MMSE estimate of its local state and the major agent's MMSE estimate of the minor agent's local state. Since the noise is non-Gaussian, the minor agent's MMSE estimate is a nonlinear function of its observation. It is shown that replacing the minor agent's MMSE estimate with its linear least mean square estimate gives the best linear control strategy. The results are proved using a direct method based on conditional independence, common-information-based splitting of state and control actions, and simplifying the per-step cost based on conditional independence, orthogonality principle, and completion of squares.
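
For readers unfamiliar with the linear least mean square (LLMS) estimate mentioned above, the scalar textbook form is x̂ = E[x] + Cov(x, y) / Var(y) · (y − E[y]). It depends only on first and second moments, which is why it stays linear in the observation even when the noise is non-Gaussian. The sketch below is a generic illustration, not the paper's derivation; the helper `llms_estimate` and the observation model y = x + v with unit variances are hypothetical choices made here.

```python
def llms_estimate(y, mean_x, mean_y, cov_xy, var_y):
    """Linear least mean square (LLMS) estimate of a scalar x from observation y.

    x_hat = E[x] + Cov(x, y) / Var(y) * (y - E[y])
    Only means and (co)variances are used, so the estimate is linear in y even
    if the noise is non-Gaussian, where E[x | y] may be nonlinear in y.
    """
    return mean_x + (cov_xy / var_y) * (y - mean_y)


# Hypothetical model: y = x + v, with x and v independent and zero-mean,
# Var(x) = 1 and Var(v) = 1 (e.g. v uniform on [-sqrt(3), sqrt(3)], non-Gaussian).
# Then Cov(x, y) = Var(x) = 1 and Var(y) = Var(x) + Var(v) = 2.
var_x, var_v = 1.0, 1.0
x_hat = llms_estimate(y=1.0, mean_x=0.0, mean_y=0.0, cov_xy=var_x, var_y=var_x + var_v)
print(x_hat)  # 0.5
```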

2023-07-31

IEEE Transactions on Automatic Control (published)

doi.org

arxiv.org

Determinants of Access to Essential Surgery in the Democratic Republic of Congo

Luc Malemo Kalisya

Ava Yap

Boniface Mitume

Christian Salmon

Kambale Karafuli

Dan Poenaru

Rosebella Onyango