Real World, Big Data

There are now important opportunities for academia and certain industries to harness the potential of large volumes of routinely collected electronic health records (Real World Data, RWD) for research across the spectrum from the life sciences to clinical research.

This article summarizes a recently published literature review that assessed the extent to which large-scale (“big”) RWD is already being utilized for drug development, including biomarker discovery or validation, understanding disease courses and associations, stratifying populations to profile diseases or to target therapies more precisely, and drug safety monitoring.

Epidemiology and public health research have been using large collections of health data (e.g. disease registries, cohort datasets, and pooled EHR data such as GPRD) for decades. However, the value of large health data sets for drug development is perceived as a much-heralded new opportunity, which prompted this review to ascertain how far that opportunity is currently being pursued.

The authors were challenged to find a formal definition of “big” data for clinical research and opted to set a threshold of one million patient records being the required minimum analysis data set size to qualify for inclusion in this review. They have made clear in the paper that this figure was somewhat arbitrary and have invited the community to further debate what scale of data set should be considered “big data” in health.

Their literature search initially identified 534 publications for manual screening, which were narrowed down to 20 that actually reported new empirical research. Most of the rejected publications described potential opportunities, described data sets that could be used for such research but had not yet been used, or presented methodologies or tools that could be used to conduct research on a large scale. The 20 selected studies mainly reported public health or health services research undertaken by academic or public bodies or by health insurers, but their results could inform drug development even if the authors had not themselves foreseen that usage. Although the search went back further, the relevant publications were all from the last five years, especially the last three, and their distribution suggests that this is a rapidly growing area of research.

Some examples of the findings reported in the full paper are:

  • A Bayesian model to develop a more effective map of molecular interactions of cancer outcomes;
  • Analyzing the records of 25 million patients with periodontal disease to find that they were more likely than the general population to have rheumatoid arthritis;
  • Analyzing the records of 27 million patients to profile the individual risk factors following knee arthroplasty;
  • 2.8 million data points examined from the real-world pragmatic use of ranibizumab to treat age-related macular degeneration;
  • The records of a million patients, comprising 9 million care notes, analyzed for statistically significant drug safety signals.

The authors’ findings suggest that big health data has successfully been used to improve understanding of diagnostic and therapeutic pathways and to discover new drug safety signals. Big data specifically offers the opportunity to study rare diseases and early incidents of a disease that have not so far proved detectable. These studies demonstrate the value of using RWD to uncover new disease and treatment insights in rapid, low-cost, and non-invasive ways.

The authors conclude that the field is still at an early stage of using big data for research in drug development. They also note the possibility that some industry-sponsored or industry-conducted research might not have been published in the academic literature, and might, therefore, have been missed.

They call upon the field to define more clearly what is meant by big data in health, and also to consider what quality and detection thresholds (such as effect size, number needed to treat, and sample size) signals derived from big data should meet to qualify as Real World Evidence. They also call more specifically on the pharma industry to invest in more pre-competitive collaborations on best practices and methodologies to harness the potential of big health data.

These findings are described in the article entitled Real world big data for clinical research and drug development, recently published in the journal Drug Discovery Today. This work was conducted by Gurparkash Singh, Nigel Hughes, and Bart Vannieuwenhuyse from Janssen Research and Development, Duane Schulthess from Vital Transformation, and Dipak Kalra from the University of Ghent.

About The Author

Dipak Kalra

"Professor Dipak Kalra is President of the European Institute for Health Records and of the European Institute for Innovation through Health Data. He undertakes international research and standards development, and advises on adoption strategies, relating to Electronic Health Records."
