Healthcare is perhaps more affected by the rapid proliferation of data than any other industry. With the HITECH and HIPAA regulations and high-visibility data breaches driving more focus on security, retention, and privacy of personal health information (PHI), the latent demand for a scalable data strategy has become a primary driver of industry decision-making. Rapid innovation, spearheaded by new technology and data-driven leadership, is upending the business of health. This brings analytics into the spotlight for a movement dedicated to efficiency, accessibility, and, most importantly, outcomes.
The past decade has seen a proliferation of analytic offerings related to population health and medical home, disease-specific and cohort-specific outcomes, and the accountable care organization (ACO). Underlying all these new solutions is the challenge and opportunity of massive data growth. The potential payoff is huge: when collected, combined with public and historic information, and made secure, big data offers providers, payers, pharma, device manufacturers, and solutions developers deeper insights than previously imaginable.
Hadoop and the Whole Genome
In the healthcare domain, whole genome research is among the most visible cases of the symbiotic relationship between big data and better outcomes for all. The industry already relies on limited genetic capabilities to help researchers, clinicians, and investigators understand diseases, diagnose patients, and develop targeted drug mixes. However, unlike legacy strategies such as genetic profiling and exome sequencing, a view of the complete DNA sequence potentially enables personalized and predictive medicine for the first time, in addition to driving a fuller understanding of the origin, nature, and proper treatment of many more diseases, toxins, and common ailments. Whether for more thorough clinical genetics, identification of individual biomarkers, or more general translational research, whole genome analytics is key to clinical improvement at hospitals.
Historically, genomic sequencing and analysis of data on hundreds of drugs, thousands of individuals, and countless genetic variations has been prohibitively complex and expensive for health systems. Where research hospitals could afford high-performance computing (HPC), the massive data in a complete sequence had to be moved to memory and brought to compute, adding a non-trivial time and cost requirement. However, as part of an enterprise data hub, Hadoop can help analyze an entire genome in a reported hour or less (compared to weeks in the past), accompanied by a 99% accuracy rate at a cost in the hundreds of dollars—a thousand-fold improvement in expense and throughput.
Leveraging Apache Spark, Impala, Apache Parquet, and Apache Avro with HDFS’s unparalleled storage scalability as part of an enterprise data hub, Cloudera can help governments, pharmaceutical companies, health systems, and research organizations centrally feed both cold storage and high performance computing on data of any origin, age, and format to deliver whole genome analytics as part of a single workstream for the first time.
Find Us at HIMSS 2015
Visit Cloudera next week in Chicago at the HIMSS conference in the Clinical & Business Knowledge Center at booth 5484, kiosk 19 to learn how an enterprise data hub drives innovation at Cerner, Children’s Healthcare of Atlanta, Explorys, Kaiser Permanente, RelayHealth, and Premier. Find out how Hadoop is transforming the industry by:
- Enabling clinical genomics through analysis of thousands of whole genomes
- Processing data streams in real time, such as HL7 and biomonitor data from the bedside
- Turning unstructured clinical text into insights through natural language processing
Attend our talks to go deeper into the big data breakthroughs that customers are achieving from better insights:
- Better Care with Big Data with Kaiser Permanente, Cerner, and Children’s Healthcare of Atlanta
- Using Big Data to Improve Outcomes in the NICU with Children’s Healthcare of Atlanta