In my last blog, Zika and Big Data, I introduced the concept of applying big data to help address today’s Zika epidemic. Before diving into specifics on this, I’d like to take a step back and share my personal learnings from last year’s Ebola scare, which have helped shape my thoughts on the topic.
To be grounded, though, let me clarify: big data is not a cure. It can’t be injected by a caregiver, and by itself it cannot give scientists the formula for a vaccine, supply compassion, or deliver urgent care.
At the height of the Ebola outbreak, I found myself at the U.S. National Institutes of Health (NIH) in Bethesda, Maryland. I was to meet with an influential infectious disease physician who was directing Ebola treatment protocols around the world. My plan was to describe the great possibilities that increased use of big data could offer in the fight against Ebola.
The doctor explained that he had just written and released the multi-step nursing workflow for donning and doffing protective suits. He wondered whether I was there to learn the protocol and provide feedback from the field. When I stayed silent, he explained that his top priority, right then, was to figure out the minimum number of caregivers needed for a single Ebola patient. (He was close to deciding it was nine, and was open to hearing any feedback I had on that.) By the end of the conversation it had become clear that at Ground Zero, at the height of a viral epidemic, big data might, like many other powerful vectors, have non-infinite value.
Ground Zero aside, big data is being used, or can be used, in virtually every part of viral epidemiology today. By eliminating constraints on the volume, variety, or velocity of data that can be captured, processed, and explored, big data technologies (Apache Hadoop in particular) facilitate a wide range of use cases in epidemiology. These use cases require a data platform that can ingest complex data types in real time and make that data immediately available for discovery, querying, and analysis.
Some examples of epidemiology applications for big data technologies like Hadoop include:
- Understanding molecular properties at a basic and applied level a priori
- Global signal detection as the virus spreads across the world
- Signal detection at the patient level in the developed world
- Population education and patient communication
- Mosquito gene drives using clustered regularly interspaced short palindromic repeats (CRISPR), as they come online
- Analysis to identify biomarkers in mother and child at the genetic level that point to transmission
- Additional workflows in data collection, discovery, dissemination and communication
In the third and final blog in this series, I’ll circle back to the original topic of Zika and explore in greater detail some of the ways big data technologies like Hadoop could be applied to help identify its cause, treat its victims, and prevent its further spread.