Publications
Adversarial Training with Synthesized Data: A Path to Robust and Generalizable Neural Networks
Adversarial Training (AT) is a well-known framework designed to mitigate adversarial vulnerabilities in neural networks. Recent research indicates that incorporating adversarial examples (AEs) in training can enhance models' generalization capabilities. To understand the impact of AEs on learning dynamics, we study AT through the lens of sample difficulty methodologies. Our findings show that AT leads to more stable learning dynamics compared to Natural Training (NT), resulting in gradual performance improvements and less overconfident predictions. This suggests that AT steers training away from learning easy, perturbable spurious features toward more resilient and generalizable ones. However, a trade-off exists between adversarial robustness and generalization gains, due to robust overfitting, limiting practical deployment. To address this, we propose using synthesized data to bridge this gap. Our results demonstrate that AT benefits significantly from synthesized data, whereas NT does not, enhancing generalization without compromising robustness and offering new avenues for developing robust and generalizable models.
Economic evaluation of the effect of needle and syringe programs on skin, soft tissue, and vascular infections in people who inject drugs: a microsimulation modelling approach
Language model capabilities predictably improve from scaling a model's size and training data. Motivated by this, increasingly large language models have been trained, yielding an array of impressive capabilities. Yet these models are vulnerable to adversarial prompts, such as "jailbreaks" that hijack models to perform undesired behaviors, posing a significant risk of misuse. Prior work indicates that computer vision models become more robust with model and data scaling, raising the question: does language model robustness also improve with scale? We study this question empirically, finding that larger models respond substantially better to adversarial training, but there is little to no benefit from model scale in the absence of explicit defenses.
Predicting the Population Risk of Suicide Using Routinely Collected Health Administrative Data in Quebec, Canada: Model-Based Synthetic Estimation Study
Background Suicide is a significant public health issue. Many risk prediction tools have been developed to estimate an individual’s risk of suicide. Risk prediction models can go beyond individual risk assessment; one important application of risk prediction models is population health planning. Suicide is a result of the interaction among the risk and protective factors at the individual, health care system, and community levels. Thus, policy and decision makers can play an important role in suicide prevention. However, few prediction models for the population risk of suicide have been developed. Objective This study aims to develop and validate prediction models for the population risk of suicide using health administrative data, considering individual-, health system–, and community-level predictors. Methods We used a case-control study design to develop sex-specific risk prediction models for suicide, using the health administrative data in Quebec, Canada. The training data included all suicide cases (n=8899) that occurred from January 1, 2002, to December 31, 2010. The control group was a 1% random sample of living individuals in each year between January 1, 2002, and December 31, 2010 (n=645,590). Logistic regression was used to develop the prediction models based on individual-, health care system–, and community-level predictors. The developed model was converted into synthetic estimation models, which converted the individual-level predictors into community-level predictors. The synthetic estimation models were directly applied to the validation data from January 1, 2011, to December 31, 2019. We assessed the performance of the synthetic estimation models with four indicators: the agreement between predicted and observed proportions of suicide, mean average error, root mean square error, and the proportion of correctly identified high-risk regions.
Results The sex-specific models based on individual data had good discrimination (male model: C=0.79; female model: C=0.85) and calibration (Brier score for male model 0.01; Brier score for female model 0.005). With the regression-based synthetic models applied in the validation data, the absolute differences between the synthetic risk estimates and observed suicide risk ranged from 0% to 0.001%. The root mean square errors were under 0.2. The synthetic estimation model for males correctly predicted 4 of 5 high-risk regions in 8 years, and the model for females correctly predicted 4 of 5 high-risk regions in 5 years. Conclusions Using linked health administrative databases, this study demonstrated the feasibility and the validity of developing prediction models for the population risk of suicide, incorporating individual-, health system–, and community-level variables. Synthetic estimation models built on routinely collected health administrative data can accurately predict the population risk of suicide. This effort can be enhanced by timely access to other critical information at the population level.
Vision-Language Models (VLMs) have witnessed a surge in both research and real-world applications. However, as they become increasingly prevalent, ensuring their robustness against adversarial attacks is paramount. This work systematically investigates the impact of model design choices on the adversarial robustness of VLMs against image-based attacks. Additionally, we introduce novel, cost-effective approaches to enhance robustness through prompt formatting. By rephrasing questions and suggesting potential adversarial perturbations, we demonstrate substantial improvements in model robustness against strong image-based attacks such as Auto-PGD. Our findings provide important guidelines for developing more robust VLMs, particularly for deployment in safety-critical environments.
Yoruba, an African language with roughly 47 million speakers, encompasses a continuum with several dialects. Recent efforts to develop NLP technologies for African languages have focused on their standard dialects, resulting in disparities for dialects and varieties for which there are little to no resources or tools. We take steps towards bridging this gap by introducing a new high-quality parallel text and speech corpus, YORULECT, across three domains and four regional Yoruba dialects. To develop this corpus, we engaged native speakers, traveling to communities where these dialects are spoken, to collect text and speech data. Using our newly created corpus, we conducted extensive experiments on (text) machine translation, automatic speech recognition, and speech-to-text translation. Our results reveal substantial performance disparities between standard Yoruba and the other dialects across all tasks. However, we also show that with dialect-adaptive finetuning, we are able to narrow this gap. We believe our dataset and experimental analysis will contribute greatly to developing NLP tools for Yoruba and its dialects, and potentially for other African languages, by improving our understanding of existing challenges and offering a high-quality dataset for further development. We will release the YORULECT dataset and models publicly under an open license.