The 2024 Nobel Prize Winners in Chemistry and Physics Mark an Important Shift Toward Using AI in Scientific Research, including Drug Discovery
In awarding two of the 2024 Nobel Prizes, one for Chemistry and the other for Physics, to scientific researchers focused on advancing the use of artificial intelligence, the Nobel Committee is signaling that AI is becoming a mainstream pillar of advanced scientific research.
In Physics, the 2024 Nobel Prize was awarded to John Hopfield and Geoffrey Hinton “for foundational discoveries and inventions that enable machine learning with artificial neural networks.”
In Chemistry, the 2024 Nobel Prize was awarded to researchers working on computational protein design and on predicting how proteins fold, topics that fall within the growing field of structural biology.
David Baker explains his Nobel Prize-winning research on protein design at the Institute for Protein Design.
Half of the Nobel Prize in Chemistry went to David Baker, who developed the Rosetta software system for computational protein design. The other half went jointly to Demis Hassabis and John M. Jumper for their work in developing the AI-based AlphaFold 2 system for protein structure prediction.
AlphaFold, the AI-Based Protein Folding Prediction Champ, Goes Limited Open Source
By the time the AI-based AlphaFold 2 system was released, it had already emerged as the clear-cut winner of CASP (Critical Assessment of Structure Prediction), the community contest for protein structure prediction.
AlphaFold (and competing toolsets) are now able to unlock some of the key secrets of protein folding. For example, we now have a much clearer idea of how a sequence of amino acids encodes the final 3D shape of a protein. We also have a better understanding of the individual steps required to fold and unfold complex proteins. Finally, we have a basis for computationally predicting the 3D structure of any existing or yet-to-be-invented protein. All of these are incredibly useful tools for new drug discovery and development.
One reason this knowledge matters is that many human diseases, such as sickle cell anemia and potentially Alzheimer’s, are associated with proteins folding incorrectly. Understanding these protein folding errors could help drug developers design new clinical therapies against specific genetic targets, creating new, more effective treatments.
There was great excitement in the spring of 2024 when Nature published an article about AlphaFold3, the latest iteration, which adds the ability to model proteins in the context of other molecules, an extremely useful feature for researchers hoping to study how proteins interact with potential drug compounds.
However, the wider scientific community quickly became frustrated because the publication, apparently at the behest of Google, the parent company of AlphaFold’s developers, omitted the all-important source code for AlphaFold3.
Google has since reversed course and made the AlphaFold3 source code publicly available, but its use is limited to non-commercial applications. Furthermore, access to the important trained model weights is available only by request and is limited to those with academic affiliations.
This restriction puts commercial drug discovery research in a bind.
In response, several research teams, including two Chinese organizations, Baidu and ByteDance (owner of the “TikTok” social media platform), are developing their own versions with trained weights. Meanwhile, Mohammed AlQuraishi, a computational biologist at Columbia University, is hoping to release an open version, dubbed OpenFold3, complete with its training weights, which would allow commercial drug companies to use their proprietary data without running afoul of Google’s access restrictions.
How Does the Pharma Industry Plan to Use AI for Drug Discovery and Development?
AI-based drug discovery and development techniques are poised to revolutionize the pharma industry.
Large investments are pouring into creating AI-based drug discovery and development software platforms that can:
- Identify suitable disease “targets” that would benefit from developing new drugs, including the ability to parse and interpret comprehensive patient disease datasets (known as omics).
- Generate a large number of novel compounds (such as chemicals or proteins) that could target specific disease mechanisms.
- Evaluate and optimize the clinical efficacy of new drug candidates.
Here are some of the leading early entrants into the drug development platform race:
· RFDiffusion, a Platform Developed at the David Baker Lab
RFDiffusion is an AI-based platform developed by members of the lab run by David Baker (co-winner of the 2024 Chemistry Nobel Prize, as mentioned above) at the University of Washington in Seattle. RFDiffusion, as the name implies, uses generative AI diffusion techniques to create novel chemical or protein drug candidates based on requirements suggested by AI prompts. (doi 10.1038/d41586-023-02227-y and doi 10.1038/d41586-022-02947-7)
· Isomorphic Labs, a new Google/DeepMind Spinoff
DeepMind (which developed AlphaFold) has created a new, separate entity called Isomorphic Labs that will leverage its AI tools, including developing AlphaFold into a protein design tool for drug discovery. Pharma giants Eli Lilly and Novartis have reportedly already inked partnerships with Isomorphic Labs.
· Chan Zuckerberg Initiative (CZI), Building an AI-based Virtual Cell Platform
The Chan Zuckerberg Initiative has announced it plans to build one of the world’s largest AI clusters to host biomedical Large Language Models (LLMs) at scale, providing nonprofit life science researchers access to the next generation of tools, what CZI is calling “Virtual Cells with Artificial Intelligence.”
· Evolutionary Scale, Founded by Ex-Meta Researcher
Alexander Rives, a former scientist at Meta, founded Evolutionary Scale in New York City and has received $142 million in new seed funding. Their software platform is a protein language model, dubbed ESM3, which has already demonstrated its prowess by creating a new green fluorescent protein (GFP) for use as a marker in the laboratory. (doi 10.1038/d41586-024-02214-x)
Other notable AI-based drug discovery initiatives include the UK-based Healx, which recently raised another $47 million; the PINNACLE initiative at Harvard Medical School, which seeks to better understand how proteins interact with surrounding cells; and Inceptive, an AI-based drug discovery startup founded by Jakob Uszkoreit, an ex-Google employee famous as a co-author of the seminal AI paper “Attention is All You Need” that laid the groundwork for today’s transformer-based LLMs.
The pharma industry is eager to take advantage of this new technology.
According to research provided by SciLife, the market for artificial intelligence applications in the pharmaceutical industry totaled just over $3 billion in 2024 and will increase to just over $18 billion by 2029 (a CAGR of 42.68%). They also report that pharma companies hope to shorten drug discovery and development timeframes from five years down to one.
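As a quick sanity check on the growth figure above, the compound annual growth rate (CAGR) can be recomputed from the two market sizes. The sketch below is only an illustration: it uses the rounded values from the text ($3 billion and $18 billion) rather than SciLife's exact figures, and the function name cagr is our own. With those rounded inputs the result comes out at roughly 43% per year, which is in line with the reported 42.68%.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate as a fraction (0.10 means 10% per year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    # CAGR = (end / start) ** (1 / years) - 1
    return (end_value / start_value) ** (1 / years) - 1


# Rounded market sizes from the text, in billions of US dollars (assumed inputs).
market_2024 = 3.0
market_2029 = 18.0

rate = cagr(market_2024, market_2029, 2029 - 2024)
print(f"Implied CAGR: {rate:.1%}")
```

The small gap between this estimate and the published number is expected, since both market sizes are described only as "just over" the rounded values.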
Are there Limitations to AI-Based Drug Discovery and Development?
According to researchers at Boston Consulting Group (doi 10.1016/j.drudis.2024.104009), about 75 AI-discovered drug candidate molecules have already reached the clinical testing phase, with oncology drugs highly represented.
But are there limitations to using AI-based tools as the primary method for discovering new drugs?
There are several challenges.
The first is as old as computer science itself: “Garbage in, Garbage out.”
It turns out there isn’t nearly enough reliable, accurate data to power these AI models. More work is needed in this area because the success of the underlying Large Language Models (LLMs) is closely tied to the amount of accurate training data available to them. (doi 10.1038/d41586-023-02896-9)
Without sufficient data, LLMs can sometimes output information that is biased or outright wrong, a phenomenon known as hallucination.
Another area of concern is data standardization. Data is often captured in widely varying formats, creating integration and compilation headaches that mirror the difficulties of combining patient health records stored in different medical record software formats. Privacy rules also shield a lot of data from researchers, whether it’s HIPAA patient privacy rules in the US or the stringent GDPR rules in the EU, and these rules often preclude the simple collation of data from as many sources as possible.
Another concern is that, in many cases, we may understand less about the underlying mechanisms of drugs developed by AI-based systems than about drugs developed by the slower, more painstaking traditional “human-based” research methods.
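To make the format-mismatch problem described above concrete, here is a minimal sketch of what harmonizing records from two sources can involve. Everything in it is hypothetical: the two "labs," their field names, the compound identifiers, and the choice of a potency measurement (IC50) reported in different units are assumptions made for illustration, not a description of any real dataset or tool.

```python
from dataclasses import dataclass

# Conversion factors to nanomolar; the unit spellings are illustrative assumptions.
TO_NANOMOLAR = {"nm": 1.0, "um": 1_000.0, "mm": 1_000_000.0}


@dataclass(frozen=True)
class AssayRecord:
    """Common schema: one compound ID and one potency value in nanomolar."""
    compound_id: str
    ic50_nm: float


def from_lab_a(row: dict) -> AssayRecord:
    # Hypothetical lab A stores the number and the unit in separate fields.
    factor = TO_NANOMOLAR[row["unit"].lower()]
    return AssayRecord(row["compound"].strip().upper(), float(row["ic50"]) * factor)


def from_lab_b(row: dict) -> AssayRecord:
    # Hypothetical lab B packs value and unit into one string, e.g. "0.25 uM".
    value, unit = row["IC50"].split()
    return AssayRecord(row["id"].strip().upper(), float(value) * TO_NANOMOLAR[unit.lower()])


# Made-up example rows showing the same kind of measurement in two layouts.
lab_a_rows = [{"compound": "cmpd-001", "ic50": "250", "unit": "nM"}]
lab_b_rows = [{"id": "CMPD-002 ", "IC50": "0.25 uM"}]

records = [from_lab_a(r) for r in lab_a_rows] + [from_lab_b(r) for r in lab_b_rows]
for record in records:
    print(record)
```

Even in this toy case, each new source needs its own adapter. Real biomedical datasets add missing values, conflicting identifiers, and the privacy constraints noted above, which is why standardization remains a bottleneck.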
But is this a problem if a drug is proven to work effectively despite not knowing its mechanism of action?
Consider aspirin, which was prescribed for over 70 years before its mechanism of pain relief was finally discovered by Sir John Robert Vane in 1971 (for which he shared the 1982 Nobel Prize in Physiology or Medicine).
There are other technical challenges facing AI-based discovery tools.
In an article published in Nature, Sara Reardon has identified some key areas that need more work:
- Protein binders. AI systems need to get better at fine-tuning the protein binders that help attach drug molecules to specific sites.
- Catalyst enzymes. These are tricky for AI to design because their physical structure (which models like AlphaFold focus on) is not always indicative of their function.
- Changing environmental conditions. Changes in conditions such as temperature or pH can temporarily affect a protein’s structure, known as its “conformation.”
- Complex combinations. Combining different proteins or compounds into little drug delivery machines is proving hard for current-generation AI models.
- Timely feedback. AI-based drug discovery systems suffer from a lack of timely feedback to help them train to do better. Testing each drug candidate takes time in the physical world, years even, which is achingly slow when you are trying to improve an AI system.
Could AI-Based Drug Discovery and Development Create Dangerous Risks?
The final concern about AI-based drug discovery and development is the very real possibility that it could be used to create highly dangerous synthetic pathogens just as easily as life-saving medicines, either on purpose (e.g., weaponization) or by accident. (doi 10.1038/d41586-024-00699-0)
This potential danger echoes existing concerns about “gain of function” research in general, in which researchers use gene editing tools (such as CRISPR-Cas9) to make and study variants of viruses, for example, to determine how likely they are to evolve in nature into dangerous pathogens.