We still understand only a small fraction of biology. In clinical trials, cost and time make it impractical to study an entire population, so researchers must find ways to represent the whole with a subset, or sample. Done correctly, these methodologies all rest on statistical methods. This is true for population genetics and for patient stratification in clinical trials and studies. It is also true for algorithmic techniques like machine learning, where you train a model on a statistically “sampled” subset of the total data and then test and validate it on data held out from training. One essential concept that affects everything you are about to read is data quality and integrity. We have all heard the phrase “garbage in, garbage out.” If you start a sampling exercise with poor-quality or inconsistent data, be prepared for a data wrangling exercise and a greater potential for problems with your trials and patient outcomes.
Let us discuss some of these methods. Stratified Sampling is the method of obtaining a representative sample from a population that researchers have divided into similar subpopulations, or strata. As with organization in general, you are grouping by similarities so that you can understand your data in greater detail. Put simply, Simple Random Sampling or Systematic Sampling applied to the whole population does not generally capture the subgroups efficiently and will most likely miss the less populated or rarer subgroups. Those two methods can still be used within subgroups or when you have a homogeneous population.
Now let us discuss the terms proportionate and disproportionate sampling. Proportionate means corresponding in size or amount to something else. Proportionate Sampling is the method where each subgroup’s share of the sample matches its share of the total population. So, if a particular subgroup is 5% of the total population, it makes up 5% of your sample, and every subgroup is represented proportionately. With Proportionate Sampling, you sacrifice detail on individual subgroups in exchange for a better understanding of the overall population. Disproportionate Sampling means you ignore the relative numbers and increase a subgroup’s sample size based on its variability. By including more samples, your level of detail and understanding can go up for each subgroup, but, unless the results are reweighted, the level of knowledge from a population perspective goes down. Another method is Cluster Sampling, in which a large population is divided into clusters, often by geographical location, with each cluster holding as diverse a subpopulation as possible and a similar distribution of critical characteristics. Each cluster has its own unique members but is meant to be representative of the entire population on the critical criteria. Depending on the application and use case, random sampling within and across clusters may give you good outcomes, but statistical analysis may prove otherwise.
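To make the difference between the two allocation styles concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the function names, the two made-up strata ("common" at 95% and "rare" at 5% of 1,000 patients), and the standard deviations. For the disproportionate case it uses one common variability-based rule, Neyman allocation (sample size proportional to stratum size times stratum standard deviation); that is a swapped-in choice, not a method prescribed above.

```python
import random
from collections import defaultdict


def proportionate_allocation(stratum_sizes, total_n):
    """Each stratum's sample share equals its population share."""
    population = sum(stratum_sizes.values())
    # Note: rounding means the parts may not always sum exactly to total_n.
    return {name: round(total_n * size / population)
            for name, size in stratum_sizes.items()}


def variability_weighted_allocation(stratum_sizes, stratum_sd, total_n):
    """Neyman-style allocation: weight each stratum by size * standard deviation."""
    weights = {name: stratum_sizes[name] * stratum_sd[name]
               for name in stratum_sizes}
    total_weight = sum(weights.values())
    return {name: round(total_n * w / total_weight)
            for name, w in weights.items()}


def draw_stratified_sample(records, stratum_of, allocation, seed=0):
    """Simple random sample of the allocated size inside every stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for record in records:
        by_stratum[stratum_of(record)].append(record)
    return {name: rng.sample(by_stratum[name], n)
            for name, n in allocation.items()}


# Hypothetical population: 950 "common" and 50 "rare" patients.
population = [("common", i) for i in range(950)] + [("rare", i) for i in range(50)]
sizes = {"common": 950, "rare": 50}

proportionate = proportionate_allocation(sizes, total_n=100)

# Assumed (made-up) standard deviations: the rare subgroup is far more variable.
disproportionate = variability_weighted_allocation(
    sizes, {"common": 1.0, "rare": 5.0}, total_n=120)

sample = draw_stratified_sample(population, lambda r: r[0], disproportionate)
print(proportionate, disproportionate)
```

With these made-up numbers, the proportionate plan samples only 5 of the rare patients, while the variability-weighted plan samples 25 of them. That is the trade-off described above: more detail on the rare subgroup, at the cost of a sample that no longer mirrors the population unless you reweight it.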
Stratify means arranging, classifying, or putting into strata, Latin for layers. Patient stratification simply means separating patients based on established criteria, which may include gender, ethnicity, risk, patient data completeness, disease state, socio-economic condition, employment, education, etc. Patients are stratified in clinical trials so that you can remove or manage bias, maximize treatment prognosis and responsiveness, and ensure each stratum or subgroup of patients has equal exposure or allocation to experimental treatments. Each stratum is a subset or subgroup of the total patient population, and the data sampled from each subgroup statistically represents that subset of your entire population. When the subgroups are added together, the “total” population can be well represented.
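One way to picture "equal allocation within each stratum" is stratified block randomization. The Python sketch below is only an illustration under stated assumptions: the patient fields (`id`, `sex`, `risk`), the two arm names, and the function name are all hypothetical, and real trials use validated randomization systems rather than a few lines of script.

```python
import random
from collections import Counter, defaultdict


def stratified_block_randomize(patients, stratum_of,
                               arms=("treatment", "control"), seed=0):
    """Assign arms in shuffled blocks inside each stratum, so the arm
    counts within a stratum never differ by more than one."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for patient in patients:
        by_stratum[stratum_of(patient)].append(patient)

    assignment = {}
    for members in by_stratum.values():
        block = []
        for patient in members:
            if not block:               # start a fresh block of one slot per arm
                block = list(arms)
                rng.shuffle(block)
            assignment[patient["id"]] = block.pop()
    return assignment


# Hypothetical cohort: 16 patients spread evenly over 4 strata (sex x risk).
patients = [
    {"id": i,
     "sex": "F" if i % 2 == 0 else "M",
     "risk": "high" if i % 4 < 2 else "low"}
    for i in range(16)
]


def stratum_of(patient):
    return (patient["sex"], patient["risk"])


assignment = stratified_block_randomize(patients, stratum_of)

# Count arms per stratum to check the balance.
balance = defaultdict(Counter)
for patient in patients:
    balance[stratum_of(patient)][assignment[patient["id"]]] += 1
```

Because each stratum here holds four patients and the block size is two, every stratum ends up with two patients per arm. Which arm a given patient receives still depends on the random shuffle.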
Risk Stratification is a multi-step process for identifying and assigning risk factors to patients based on health, disease state, lifestyle, and other ancillary risk factors. Think of it like the triage used to assess injured patients for emergency surgery. Patients are assigned scores based on their medical history and status. Algorithms can rank patients or put them in subgroups based on their scores. Clinicians can then manage their patients based on the level of care they are expected to need and their associated risks, and spend more time with patients at higher risk. This stratified approach to patient care management is more effective for the overall patient population.
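The score-then-group idea can be sketched in a few lines of Python. The risk factors, weights, and tier cut-offs below are entirely made up for illustration and have no clinical validity; a real risk model would come from validated clinical evidence.

```python
# Hypothetical risk factors and weights (illustrative only, not clinical guidance).
RISK_WEIGHTS = {
    "age_over_65": 2,
    "diabetes": 2,
    "smoker": 1,
    "prior_hospitalization": 3,
}

# Hypothetical cut-offs, checked from highest to lowest.
TIERS = [(5, "high"), (2, "medium"), (0, "low")]


def risk_score(patient):
    """Sum the weights of every risk factor the patient has."""
    return sum(weight for factor, weight in RISK_WEIGHTS.items()
               if patient.get(factor))


def risk_tier(score):
    """Map a score to the first tier whose threshold it meets."""
    for threshold, tier in TIERS:
        if score >= threshold:
            return tier
    return "low"


def stratify_by_risk(patients):
    """Group patient ids by tier, highest scores first within each tier."""
    groups = {"high": [], "medium": [], "low": []}
    for patient in sorted(patients, key=risk_score, reverse=True):
        groups[risk_tier(risk_score(patient))].append(patient["id"])
    return groups


# Made-up example patients.
patients = [
    {"id": "A", "smoker": True},
    {"id": "B", "age_over_65": True, "prior_hospitalization": True},
    {"id": "C", "diabetes": True},
]
groups = stratify_by_risk(patients)
print(groups)
```

Here patient B scores 5 and lands in the high tier, C scores 2 (medium), and A scores 1 (low), so a clinician reading the output top to bottom sees the patients who need the most attention first.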
Patient stratification is critical in the design of today’s costly clinical trials. It can prevent unnecessary patient hardship and health risks, and it can prevent or salvage clinical trial failures and delays.
Today’s patient stratification can be significantly enhanced by Data Science, Artificial Intelligence, and Machine Learning. Like many other Life Science domains, however, the Clinical environment is still suffering from Data Integrity and unFAIR (not Findable, Accessible, Interoperable, and Reusable) data and process issues. Clinical imaging is one of the great tools used in stratification today. Combining patient data with machine learning applied to imaging gives clinicians and trial designers a potent tool. Its value will grow significantly as technology and usage increase. To summarize, data-driven approaches that can use patient history from electronic health records (EHR), Real-World Data (RWD), and Real-World Evidence (RWE), combined with diagnostics and screening techniques, including imaging optimized with machine learning and artificial intelligence, will drive better stratification methods and techniques and result in better outcomes for patients.
Drug discovery and development is the ultimate team sport! Unfortunately, it’s a costly endeavor that fails much more often than it succeeds. One of the ultimate tests is, does the drug work? In other words, is the drug efficacious, or did it work as intended? Clinical studies are very expensive, so it is costly to find out in a Phase II study that the investigational drug is failing against your target disease. Such a failure can usually be attributed to one of three causes: an insufficient mode of action, where the drug does not alter the disease state (25-35%); unexpected safety, toxicity, and/or off-target effects (40-50%); or limited impact, where commercial feasibility or viability comes into question (15-20%).
Patient selection can have a significant impact on these outcomes. With proper patient stratification that drives diversity and heterogeneity, some of these missteps can be prevented. Drug development organizations want to know beforehand if a drug or therapy is generically safe for a whole population versus a precision medicine that works for only subgroups of a population. With this information or understanding of the translational gap, trial outcomes should improve. This is where translational science and medicine come into play. These scientists and clinicians are trying very hard to make these connections in late research, preclinical, and early development to improve outcomes for patients and design better and safer clinical trials. Because of this approach and the need to perform more Multi-Omics, Microbiome, and advanced experimentation and testing, researchers will rely on patient stratification in their internal and external biobanks. [Further reading: Integrating AI/ML Models for Patient Stratification Leveraging Omics Dataset and Clinical Biomarkers from COVID-19 Patients: A Promising Approach to Personalized Medicine]
One of their biggest challenges is data integrity, including consent, and the concept of FAIR (Findable, Accessible, Interoperable, Reusable) data. Finding the data, accessing the data, using the data with other data and analysis techniques, and then ultimately reusing the data in things like predictive science shouldn’t be a nice-to-have but a must-have. This scenario is real and unfolding in biopharma, biotech, and CROs around the globe, and those that understand the need for a highly functional informatics environment will come out on top.
If you’re interested in learning more about the transformative insights VeriSIM Life can provide to stratify patients for better clinical trial results and deliver precision medicine, read our solution brief on Patient Stratification.