The paper, titled “Data science opportunities of large-language models for neuroscience and biomedicine,” seeks to bridge the gap between general LLM-based AI tools like ChatGPT and scientific research.
Danilo Bzdok, also an Associate Professor at McGill University’s Department of Biomedical Engineering and lead author on the paper, explained that these models, designed by the machine learning community, do not necessarily work out of the box in other domains.
“One big question that we address is: how can we re-express neuroscientific data in a way that it resembles sequence information, which is the form of information that the large language models prefer to ingest?”
“In other words, how can we take sources of information from neuroscience, but shape them in a way that they're a natural starting point for a large language model?”
Breaking down silos
LLM-based tools, he argued, could play a pivotal role in scientific research by breaking down silos and fostering interdisciplinary dialogue and collaboration.
He gave the example of researchers who study different aspects of complex diseases like Alzheimer’s: they are specialists in their own fields but have no easy way to obtain and use data from other fields.
“The geneticists who study Alzheimer's don't necessarily know the epidemiologists who study Alzheimer's and they, in turn, don't really know the physicians who do randomized clinical trials about what treatments to give, what interventions to do in the clinical practice every day.”
In this scenario, AI-based tools could be trained on massive amounts of data from all these disciplines in far less time than a human would need to read all the papers and collate the data.
Though hallucinations, or confidently stated falsehoods, are a common flaw of large language models, Bzdok said that using AI would benefit the scientific community nonetheless.
“Even if you have to double check the final result by a panel of experts because it touches on different disciplines that don't usually talk to each other, you still have a huge gain in productivity and time.”
Pursuing novel research directions
According to him, LLMs could even point to new research directions and hypotheses by having access to large amounts of data from many different fields studying a specific issue from various angles.
In that sense, the paper seeks to spark interest in the use of LLMs for scientific research and help prioritize projects that would have the highest likelihood of benefiting from using these tools.
“We want to discuss and animate discussions around what directions of research (that were very much impossible before the use of this technology) we can embark on confidently.”
According to him, scientists should embrace the growing potential of LLMs to keep pace with an ever-expanding research landscape.
“We would be missing out on a huge opportunity if we don't try to make the best use of this potential in neuroscience and health as well because the human brain and human health are some of the most complicated topics in general.”
“We need to stay open to the possibility that the human mind alone may be insufficient to solve at least some scientific challenges. And this is where there could be something like a partnership between large language models and the scientist,” he concluded.