Mila is hosting its first quantum computing hackathon on November 21, a unique day to explore quantum and AI prototyping, collaborate on Quandela and IBM platforms, and learn, share, and network in a stimulating environment at the heart of Quebec’s AI and quantum ecosystem.
This new initiative aims to strengthen connections between Mila’s research community, its partners, and AI experts across Quebec and Canada through in-person meetings and events focused on AI adoption in industry.
We use cookies to analyze the browsing and usage of our website and to personalize your experience. You can disable these technologies at any time, but this may limit certain functionalities of the site. Read our Privacy Policy for more information.
Setting cookies
You can enable and disable the types of cookies you wish to accept. However certain choices you make could affect the services offered on our sites (e.g. suggestions, personalised ads, etc.).
Essential cookies
These cookies are necessary for the operation of the site and cannot be deactivated. (Still active)
Analytics cookies
Do you accept the use of cookies to measure the audience of our sites?
Multimedia Player
Do you accept the use of cookies to display and allow you to watch the video content hosted by our partners (YouTube, etc.)?
This paper introduces Radical Questioning (RQ), a structured, pre-design ethics framework developed to assess whether artificial intelligenc… (see more)e (AI) should be applied to complex social problems rather than merely how. While much of responsible AI development focuses on aligning systems with principles such as fairness, transparency, and accountability, it often begins after the decision to build has already been made, implicitly treating the deployment of AI as a given rather than a question in itself. In domains such as human trafficking, marked by contested definitions, systemic injustice, and deep stakeholder asymmetries, such assumptions can obscure foundational ethical concerns. RQ offers an upstream, deliberative process for surfacing these concerns before design begins. Drawing from critical theory, participatory ethics, and relational responsibility, RQ formalizes a five-step framework to interrogate problem framings, confront techno-solutionist tendencies, and reflect on the moral legitimacy of intervention. Developed through interdisciplinary collaboration and engagement with survivor-led organizations, RQ was piloted in the domain of human trafficking (HT) which is a particularly high-stakes and ethically entangled application area. Its use led to a fundamental design shift: away from automated detection tools and toward survivor-controlled, empowerment-based technologies. We argue that RQ's novelty lies in both its temporal position, i.e, prior to technical design, and its orientation toward domains where harm is structural and ethical clarity cannot be achieved through one-size-fits-all solutions. RQ thus addresses a critical gap between abstract principles of responsible AI and the lived ethical demands of real-world deployment.
2025-10-15
Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (published)
Large language models are increasingly relied upon as sources of information, but their propensity for generating false or misleading statem… (see more)ents with high confidence poses risks for users and society. In this paper, we confront the critical problem of epistemic miscalibration—where a model's linguistic assertiveness fails to reflect its true internal certainty. We introduce a new human-labeled dataset and a novel method for measuring the linguistic assertiveness of Large Language Models which cuts error rates by over 50% relative to previous benchmarks. Validated across multiple datasets, our method reveals a stark misalignment between how confidently models linguistically present information and their actual accuracy. Further human evaluations confirm the severity of this miscalibration. This evidence underscores the urgent risk of the overstated certainty Large Language Models hold which may mislead users on a massive scale. Our framework provides a crucial step forward in diagnosing and correcting this miscalibration, offering a path to safer and more trustworthy AI across domains.
Large language models are increasingly relied upon as sources of information, but their propensity for generating false or misleading statem… (see more)ents with high confidence poses risks for users and society. In this paper, we confront the critical problem of epistemic miscalibration—where a model's linguistic assertiveness fails to reflect its true internal certainty. We introduce a new human-labeled dataset and a novel method for measuring the linguistic assertiveness of Large Language Models which cuts error rates by over 50% relative to previous benchmarks. Validated across multiple datasets, our method reveals a stark misalignment between how confidently models linguistically present information and their actual accuracy. Further human evaluations confirm the severity of this miscalibration. This evidence underscores the urgent risk of the overstated certainty Large Language Models hold which may mislead users on a massive scale. Our framework provides a crucial step forward in diagnosing and correcting this miscalibration, offering a path to safer and more trustworthy AI across domains.
Human trafficking (HT) for forced sexual exploitation, often described as modern-day slavery, is a pervasive problem that affects millions o… (see more)f people worldwide. Perpetrators of this crime post advertisements (ads) on behalf of their victims on adult service websites (ASW). These websites typically contain hundreds of thousands of ads including those posted by independent escorts, massage parlor agencies and spammers (fake ads). Detecting suspicious activity in these ads is difficult and developing data-driven methods is challenging due to the hard-to-label, complex and sensitive nature of the data.
In this paper, we propose T-Net, which unlike previous solutions, formulates this problem as weakly supervised classification. Since it takes several months to years to investigate a case and obtain a single definitive label, we design domain-specific signals or indicators that provide weak labels. T-Net also looks into connections between ads and models the problem as a graph learning task instead of classifying ads independently. We show that T-Net outperforms all baselines on a real-world dataset of ads by 7% average weighted F1 score. Given that this data contains personally identifiable information, we also present a realistic data generator and provide the first publicly available dataset in this domain which may be leveraged by the wider research community.
2024-03-24
AAAI Conference on Artificial Intelligence (published)
Human trafficking (HT) for forced sexual exploitation, often described as modern-day slavery, is a pervasive problem that affects millions o… (see more)f people worldwide. Perpetrators of this crime post advertisements (ads) on behalf of their victims on adult service websites (ASW). These websites typically contain hundreds of thousands of ads including those posted by independent escorts, massage parlor agencies and spammers (fake ads). Detecting suspicious activity in these ads is difficult and developing data-driven methods is challenging due to the hard-to-label, complex and sensitive nature of the data.
In this paper, we propose T-Net, which unlike previous solutions, formulates this problem as weakly supervised classification. Since it takes several months to years to investigate a case and obtain a single definitive label, we design domain-specific signals or indicators that provide weak labels. T-Net also looks into connections between ads and models the problem as a graph learning task instead of classifying ads independently. We show that T-Net outperforms all baselines on a real-world dataset of ads by 7% average weighted F1 score. Given that this data contains personally identifiable information, we also present a realistic data generator and provide the first publicly available dataset in this domain which may be leveraged by the wider research community.
2024-03-24
Proceedings of the AAAI Conference on Artificial Intelligence (published)
In this work, we propose a weak supervision pipeline SWEET: Supervise Weakly for Entity Extraction to fight Trafficking for extracting perso… (see more)n names from noisy escort advertisements. Our method combines the simplicity of rule-matching (through antirules, i.e., negated rules) and the generalizability of large language models fine-tuned on benchmark, domain-specific and synthetic datasets, treating them as weak labels.
One of the major challenges in this domain is limited labeled data. SWEET addresses this by obtaining multiple weak labels through labeling functions and effectively aggregating them. SWEET outperforms the previous supervised SOTA method for this task by 9% F1 score on domain data and better generalizes to common benchmark datasets. Furthermore, we also release HTGEN, a synthetically generated dataset of escort advertisements (built using ChatGPT) to facilitate further research within the community.
2023-12-01
Findings of the Association for Computational Linguistics: EMNLP 2023 (published)
Online sex trafficking is on the rise and a majority of trafficking victims report being advertised online. The use of OnlyFans as a platfor… (see more)m for adult content is also increasing, with Twitter as its main advertising tool. Furthermore, we know that traffickers usually work within a network and control multiple victims. Consequently, we suspect that there may be networks of traffickers promoting multiple OnlyFans accounts belonging to their victims. To this end, we present the first study of OnlyFans advertisements on Twitter in the context of finding organized activities. Preliminary analysis of this space shows that most tweets related to OnlyFans contain generic text, making text-based methods less reliable. Instead, focusing on what ties the authors of these tweets together, we propose a novel method for uncovering coordinated networks of users based on their behaviour. Our method, called Multi-Level Clustering (MLC), combines two levels of clustering that considers both the network structure as well as embedded node attribute information. It focuses jointly on user connections (through mentions) and content (through shared URLs). We apply MLC to real-world data of 2 million tweets pertaining to OnlyFans and analyse the detected groups. We also evaluate our method on synthetically generated data (with injected ground truth) and show its superior performance compared to competitive baselines. Finally, we discuss examples of organized clusters as case studies and provide interesting conclusions to our study.
2023-04-30
Proceedings of the 15th ACM Web Science Conference 2023 (published)
Law enforcement and domain experts can detect human trafficking (HT) in online escort websites by analyzing suspicious clusters of connected… (see more) ads. How can we explain clustering results intuitively and interactively, visualizing potential evidence for experts to analyze? We present TrafficVis, the first interface for cluster-level HT detection and labeling. Developed through months of participatory design with domain experts, TrafficVis provides coordinated views in conjunction with carefully chosen backend algorithms to effectively show spatio-temporal and text patterns to a wide variety of anti-HT stakeholders. We build upon state-of-the-art text clustering algorithms by incorporating shared metadata as a signal of connected and possibly suspicious activity, then visualize the results. Domain experts can use TrafficVis to label clusters as HT, or other, suspicious, but non-HT activity such as spam and scam, quickly creating labeled datasets to enable further HT research. Through domain expert feedback and a usage scenario, we demonstrate TRAFFICVIS's efficacy. The feedback was overwhelmingly positive, with repeated high praises for the usability and explainability of our tool, the latter being vital for indicting possible criminals.
2023-01-01
IEEE Transactions on Visualization and Computer Graphics (published)
Human trafficking analysts investigate groups of related online escort advertisements (called micro-clusters) to detect suspicious activitie… (see more)s and identify various modus operandi. This task is complex as it requires finding patterns and linked meta-data across micro-clusters such as the geographical spread of ads, cluster sizes, etc. Additionally, drawing insights from the data is challenging without visualizing these micro-clusters. To address this, in close-collaboration with domain experts, we built VisPaD, a novel interactive way for characterizing and visualizing micro-clusters and their associated meta-data, all in one place. VisPaD helps discover underlying patterns in the data by projecting micro-clusters in a lower dimensional space. It also allows the user to select micro-clusters involved in suspicious patterns and interactively examine them leading to faster detection and identification of trends in the data. A demo of VisPaD is also released1.