
AIR


The field of artificial intelligence (AI) is moving fast, and policymakers around the world are creating guidelines, rules and regulations to keep pace.

However, navigating this shifting regulatory landscape can be difficult. We spoke with lawyers, judges, policymakers and academics who highlighted the challenges they have faced in obtaining consolidated, up-to-date information about global policy developments. They all pointed to the need for a tool that could help with the discovery and curation of AI-related policy changes without having to deal with paywalls. They also wanted help in the early research phase: feedback on research avenues and ideas, almost like a brainstorming companion.

While the OECD.AI Observatory has been a trusted resource for many, there was a desire for more efficient access to its information and for a tool that could help initiate policy research.

That’s why Mila has partnered with the OECD’s AI Observatory to tackle the challenges faced by those working in AI policy and adjacent fields. Together, we are developing AIR, a question-and-answer (Q&A) tool that provides easy access to high-quality information about AI policies from around the world, along with analyses produced by the OECD’s team of experts.

Objectives


While existing tools use keyword searches to match and retrieve relevant policy documents, AIR uses Large Language Models to improve how information is retrieved and consolidated. AIR interprets the context of your question and generates condensed, natural-language responses, making the information easier to access than if the user had to search through an entire database of policy documents.

AIR is restricted in the information it can provide: it only draws on the OECD AI Observatory’s dataset, which allows us to ensure relevant, high-quality responses.

Test the beta version

AIR is built on Buster, an open-source Python library. Once finalized, AIR will help policymakers and anyone else interested in AI policy get answers they can trust.

Test the beta version of AIR on Hugging Face now

Who’s Driving AIR at Mila? 


Work on AIR is led by Mila’s AI for Humanity team (Hadrien Bertrand, Allison Cohen, Jeremy Pinto, Benjamin Prud’homme and Jerome Solis) together with the OECD’s team (Luis Aranda, Fabio Curipaixao and Jan Sturm).

Hadrien Bertrand
Senior Applied Research Scientist, Applied Research Team, Mila

Allison Cohen
Senior Applied AI Projects Manager, Mila

Jeremy Pinto
Senior Applied Research Scientist, Applied Research Team, Mila

Benjamin Prud’homme
Vice-President, Policy, Society and Global Affairs, Mila

Jerome Solis
Director, Applied AI Projects, AI for Humanity, Mila

Luis Aranda, OECD

Fabio Curipaixao, OECD

Jan Sturm, OECD

FAQ

Learn more about the open-source Python library, Buster.

What is Buster?

Buster is an open-source Python library developed by Mila’s Applied Machine Learning Research Team. It narrows the focus of tools based on Large Language Models, such as ChatGPT, by instructing them to retrieve information only from trusted sources. Ultimately, it can be used as a chatbot to answer questions about any topic based on documents pre-selected by a specific organization or individual.

Which problem does Buster seek to address?

Sifting through hundreds of pages of legal, policy, or technical documents is very time-consuming, and traditional keyword-based search can yield unsatisfactory results. Large Language Model-based tools like ChatGPT can answer a wide range of questions but are not sufficiently accurate for specialized tasks. Buster seeks to fill that gap by facilitating access to large amounts of data while limiting (but not fully eliminating) ChatGPT’s common pitfalls, such as hallucinations, or made-up answers.

How does Buster work?

First, a knowledge base is prepared by splitting documents into retrievable chunks of data that can then be “fed” to Buster. When a user submits a question in the chat box, the tool searches this predetermined dataset, selects the most relevant chunks of data and sends them to ChatGPT, which synthesizes them into a natural-language answer.
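To illustrate the flow described above, here is a minimal sketch of a retrieval-augmented Q&A pipeline in the same spirit. It is not Buster’s actual API: the embed, build_knowledge_base, answer and call_llm functions are hypothetical placeholders, and a real system would call an embedding model and a ChatGPT-style completion endpoint rather than the stand-ins used here.

# Minimal sketch of a retrieval-augmented Q&A flow, in the spirit of Buster.
# Function names and model calls are placeholders, not Buster's actual API.
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Placeholder embedding: a real system would call an embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    vector = rng.standard_normal(dim)
    return vector / np.linalg.norm(vector)

def build_knowledge_base(documents: list[str], chunk_size: int = 500):
    """Split documents into retrievable chunks and embed each one."""
    chunks = [doc[i:i + chunk_size]
              for doc in documents
              for i in range(0, len(doc), chunk_size)]
    return chunks, np.stack([embed(chunk) for chunk in chunks])

def call_llm(prompt: str) -> str:
    """Stand-in for the ChatGPT call that would synthesize the final answer."""
    return f"[model answer based on a prompt of {len(prompt)} characters]"

def answer(question: str, chunks: list[str], embeddings: np.ndarray, top_k: int = 3) -> str:
    """Select the most relevant chunks and hand them to the language model."""
    scores = embeddings @ embed(question)  # cosine similarity (vectors are unit-normalized)
    best = [chunks[i] for i in np.argsort(scores)[::-1][:top_k]]
    prompt = ("Answer the question using only the sources below.\n"
              + "\n---\n".join(best)
              + f"\n\nQuestion: {question}")
    return call_llm(prompt)

chunks, embeddings = build_knowledge_base(["Example AI policy document text..."])
print(answer("What does this policy cover?", chunks, embeddings))

The key design point is that the language model only ever sees the retrieved chunks, which is what keeps the answers tied to the pre-selected knowledge base.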

What are the advantages of using Buster?

Buster allows users to save time by easily accessing information from large databases using natural language. Its scope is limited, which reduces the chances of getting an irrelevant answer, since the user knows exactly which dataset has been used to produce an output. Finally, Buster always cites the sources it used to craft its answers, providing a degree of transparency into its responses.

What are the limitations of Buster?

Buster is limited to the information contained in the knowledge base and can’t answer questions outside of its scope. It is not designed to handle excessively broad queries (like summarizing a whole document) or to make comparisons between different documents. Finally, like ChatGPT, there is no guarantee that Buster will not make up answers, so users should always verify its responses against the source material provided.

Buster is also built on top of existing infrastructure (it is not a standalone solution), so it inherits the limitations of those underlying systems as well (e.g., hallucinations, safety features, and changes in model performance over time).