In recent years, demand for the high-quality data used to train and test artificial intelligence (AI) models has exploded. However, an array of regulatory requirements and privacy rules is making that data more complex, expensive, and time-consuming to gather.
As a result, interest in using synthetic data to power AI models has intensified across many sectors, especially those that are highly regulated or that depend on hard-to-obtain real-world data (RWD). In drug development, researchers are investigating whether synthetic data is a viable alternative to RWD in some AI-informed approaches. Let’s take a closer look at how synthetic data is affecting the drug development landscape.
In contrast to RWD (data collected from electronic health records, medical claims, product or disease registries, or other real-world sources), synthetic data is information that is artificially generated by computer algorithms or other statistical methods to simulate real-world data. Synthetic datasets can include numerical, binary, categorical, or unstructured data, and they can be used in place of real-world datasets to train or validate machine learning (ML) models.
Synthetic data can be created in a variety of ways, including random sampling from a distribution, agent-based modeling, and AI-supported generative models.
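To make the simplest of these approaches concrete, here is a minimal sketch of random sampling from a distribution, written in Python using only the standard library. It is an illustration under stated assumptions, not a production method: the function names (`fit_and_sample`, `sample_categories`) are hypothetical, the "real" values are made up for the example, and the numeric column is assumed to be roughly normally distributed.

```python
import random
import statistics


def fit_and_sample(real_values, n_samples, seed=0):
    """Fit a normal distribution to real_values and draw synthetic values.

    Assumption: the real data is roughly normal. Real clinical
    variables often are not, so treat this as illustrative only.
    """
    mu = statistics.mean(real_values)
    sigma = statistics.stdev(real_values)
    rng = random.Random(seed)  # seeded so the output is reproducible
    return [rng.gauss(mu, sigma) for _ in range(n_samples)]


def sample_categories(real_labels, n_samples, seed=0):
    """Draw synthetic categorical values that match observed frequencies."""
    rng = random.Random(seed)
    categories = sorted(set(real_labels))
    weights = [real_labels.count(c) for c in categories]
    return rng.choices(categories, weights=weights, k=n_samples)


# Made-up stand-ins for real-world data, for illustration only.
real_weights_kg = [62.0, 75.5, 81.2, 58.9, 90.4, 70.3, 66.8, 77.1]
real_sex = ["F", "M", "M", "F", "M", "F", "F", "M"]

synthetic_weights = fit_and_sample(real_weights_kg, 1000, seed=42)
synthetic_sex = sample_categories(real_sex, 1000, seed=42)

print(round(statistics.mean(synthetic_weights), 1))
print(synthetic_sex[:10])
```

Note that this sketch samples each column independently, so any relationship between the variables (for example, between sex and body weight) is lost. Preserving such relationships is one reason more sophisticated approaches, such as agent-based or generative models, are used in practice.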
Synthetic data generation is still considered an emerging field, which means many of the methodologies used to create synthetic datasets are still being actively developed and tested.
Synthetic data offers a range of important benefits.
Many real and potential applications for synthetic data relate specifically to healthcare and drug discovery. Because health data is so strictly regulated, synthetic data gives researchers a way to obtain vital information without accessing actual patient records. This differs from data masking techniques, which still present a range of privacy-related complications.
Some possible applications for synthetic data use in drug development include:
The broader role synthetic data will play in drug development remains to be seen, though many experts agree its use in AI-informed approaches is likely to grow rapidly. Gartner has estimated that by 2030, synthetic data will overtake actual data in training AI models. For widespread adoption, researchers will have to work alongside regulators and policymakers to develop and adapt clinical-quality measures and evaluation metrics for synthetic data and its practical use.
VeriSIM Life has developed its own sophisticated computational platform that leverages advanced AI and ML techniques to improve drug discovery and development by greatly reducing the time and money it takes to bring a drug to market. Contact us to learn more about BIOiSIM™ and how our AI-enabled platform helps de-risk R&D decisions.