Real World, Big Data

There are now important opportunities for academia and certain industries to harness the potential of large volumes of routinely collected electronic health records (Real World Data, or RWD) for research across the spectrum from the life sciences to clinical research.

This article summarizes a recently published literature review that assessed the extent to which large-scale (“big”) RWD is already being utilized for drug development, including biomarker discovery or validation, understanding disease courses and associations, stratifying populations to profile diseases or to target therapies more precisely, and drug safety monitoring.

Epidemiology and public health research have been using large collections of health data (e.g. disease registries, cohort datasets, and pooled EHR data such as GPRD) for decades. However, the value of large health data sets for drug development is perceived as a much-heralded new opportunity, which prompted this review to ascertain how far that opportunity is currently being pursued.

The authors struggled to find a formal definition of “big” data for clinical research and opted to set a threshold of one million patient records as the minimum analysis data set size to qualify for inclusion in this review. They have made clear in the paper that this figure was somewhat arbitrary and have invited the community to further debate what scale of data set should be considered “big data” in health.

Their literature search initially identified 534 publications for manual screening, which were narrowed down to 20 that actually reported new empirical research. Most of the rejected publications described potential opportunities, described data sets that could be used for such research but had not yet been used, or presented methodologies or tools that could be used to conduct research on a large scale. The 20 selected studies mainly reported public health or health services research undertaken by academic or public bodies or by health insurers, but their results could inform drug development even if the authors had not themselves foreseen that usage. Although the search went back further, the relevant publications were all from the last five years, especially the last three, and their distribution suggests that this is a rapidly growing area of research.

Some examples of the findings reported in the full paper are:

  • A Bayesian model to develop a more effective map of molecular interactions of cancer outcomes;
  • Analyzing the records of 25 million patients with periodontal disease to find that they were more likely than the general population to have rheumatoid arthritis;
  • Analyzing the records of 27 million patients to profile the individual risk factors following knee arthroplasty;
  • 2.8 million data points examined from the real-world pragmatic use of ranibizumab to treat age-related macular degeneration;
  • The records of a million patients, comprising 9 million care notes, analyzed for statistically significant drug safety signals.

The authors’ findings suggest that big health data has successfully been used to improve understanding of diagnostic and therapeutic pathways and to discover new drug safety signals. Big data specifically offers the opportunity to study rare diseases and early incidents of a disease that have not so far proved detectable. These studies demonstrate the value of using RWD to uncover new disease and treatment insights in rapid, low-cost, and non-invasive ways.

The authors conclude that the field is still at an early stage of using big data for research in drug development. They also note the possibility that some industry-sponsored or industry-conducted research might not have been published in the academic literature, and might, therefore, have been missed.

They call upon the field to define more clearly what is meant by big data in health, and also to consider what quality and detection thresholds (such as effect size, number needed to treat, and sample size) signals derived from big data should meet in order to qualify as Real World Evidence. They also call specifically on the pharma industry to invest in more pre-competitive collaborations on best practices and methodologies to harness the potential of big health data.
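Of the thresholds mentioned above, number needed to treat (NNT) is perhaps the easiest to make concrete. The sketch below uses the standard definition, NNT = 1 / absolute risk reduction, rounded up to a whole patient. The function name and the event rates are hypothetical illustrations and are not taken from the review.

```python
import math


def number_needed_to_treat(control_event_rate: float, treated_event_rate: float) -> int:
    """Return the NNT: patients to treat to prevent one additional adverse event.

    Both arguments are proportions between 0 and 1 (hypothetical inputs,
    not figures reported in the review).
    """
    # Absolute risk reduction: how much the event rate drops under treatment.
    absolute_risk_reduction = control_event_rate - treated_event_rate
    if absolute_risk_reduction <= 0:
        raise ValueError("No risk reduction observed; NNT is undefined.")
    # Round away floating-point noise before taking the ceiling, so that an
    # exact reciprocal such as 50 is not pushed up to 51.
    return math.ceil(round(1.0 / absolute_risk_reduction, 9))


# Hypothetical example: events fall from 10% (untreated) to 8% (treated).
print(number_needed_to_treat(0.10, 0.08))
```

With a two-percentage-point reduction, roughly 50 patients would need to be treated to prevent one event. A community-agreed threshold of the kind the authors call for would state how small this number must be, and over what sample size, before a signal counts as evidence.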

These findings are described in the article entitled Real world big data for clinical research and drug development, recently published in the journal Drug Discovery Today. This work was conducted by Gurparkash Singh, Nigel Hughes, and Bart Vannieuwenhuyse from Janssen Research and Development, Duane Schulthess from Vital Transformation, and Dipak Kalra from the University of Ghent.