Big Data Solution to Harnessing Unstructured Data in Healthcare


Today, I have the honor of hosting Murali Nagendranath, Strategy Officer at Wired Informatics, for this blog post on how to use unstructured data to gain insights from health data.


To make effective clinical decisions and offer the best possible treatment and care protocols, healthcare professionals need a comprehensive understanding of the patient characteristics indicated in their records. This longitudinal view of the patient’s clinical history lets them treat patients proactively by surfacing high-probability triggers hidden among comorbidities, symptoms, and the genetic predisposition recorded in the family history.

Today, a major portion of patient clinical observations, including radiology reports, operative notes, and discharge summaries, is recorded as narrative text (dictated and transcribed, or entered directly into the system by care providers). In some systems, even laboratory and medication records are available only as part of the physician’s notes. This is largely because the conversation between provider and patient tends to be personalized and does not lend itself well to a “click” model of data capture. The volume of unstructured clinical data is set to grow over the next decade as more mobile transcription solutions are rolled out along with speech-to-text translation.

Unlike most other written text, the resulting clinical narrative tends to be complex both structurally and semantically, because it contains a lot of domain-specific information and many fragmented phrases. For example, “Negative for Sepsis” is a very common phrase in clinical narratives. In regular text this would have been written as “The patient was negative for Sepsis”, making the context and meaning easier to understand.
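To make the fragment problem concrete, here is a minimal Python sketch of how a parser might decide that a finding is negated even when the phrase has no subject or verb. It is a toy, rule-based illustration and not the Wired Informatics method: the cue list and the `is_negated` helper are hypothetical, and a production system would rely on far richer context handling.

```python
import re

# Hypothetical negation cues; real clinical systems curate much larger lists.
NEGATION_CUES = ("negative for", "no evidence of", "denies", "without")


def is_negated(phrase: str, term: str) -> bool:
    """Return True if `term` follows a negation cue within a few words."""
    text = phrase.lower()
    target = re.escape(term.lower())
    for cue in NEGATION_CUES:
        # The cue, then up to three intervening words, then the term itself.
        pattern = r"\b" + re.escape(cue) + r"\b(?:\s+\w+){0,3}?\s+" + target + r"\b"
        if re.search(pattern, text):
            return True
    return False


if __name__ == "__main__":
    for fragment in ("Negative for Sepsis", "Patient admitted with sepsis"):
        print(fragment, "->", "negated" if is_negated(fragment, "sepsis") else "asserted")
```

The point of the sketch is that the bare fragment and the full sentence should resolve to the same meaning: sepsis was considered and ruled out.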

This kind of structure makes clinical text processing complex and interesting at the same time. Knowledge extraction in the clinical domain is a more demanding offshoot of general-purpose Natural Language Processing (NLP). Clinical text parsing must not only identify rich knowledge attributes such as medications, procedures, symptoms, and family history, but also resolve co-references and the severity associated with each attribute, thereby providing the complete context. Once extraction is complete, the extracted elements are normalized to industry-standard vocabularies for universal representation. In healthcare these vocabularies are called ‘ontologies’, and they provide a universally accepted way of semantically describing medical terms and their groupings. A few of the ontologies in common commercial use are ICD-9, ICD-10, and SNOMED. The National Library of Medicine makes these ontologies available so that clinical text parsing can be implemented with semantic standards. This ‘variety’ of data, all of which is needed together, calls for the Cloudera platform.
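The normalization step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `LEXICON` table, the `Concept` class, and `extract_concepts` are hypothetical names, the lookup is a plain dictionary match instead of real NLP, and the four ICD-10-CM codes are a tiny hand-picked sample, not a complete or authoritative mapping.

```python
from dataclasses import dataclass
from typing import List

# Toy lexicon (assumption): surface form -> (vocabulary, concept code).
# Two different surface forms deliberately map to the same code.
LEXICON = {
    "sepsis": ("ICD-10-CM", "A41.9"),
    "hypertension": ("ICD-10-CM", "I10"),
    "high blood pressure": ("ICD-10-CM", "I10"),
    "type 2 diabetes": ("ICD-10-CM", "E11.9"),
}


@dataclass(frozen=True)
class Concept:
    text: str        # the words as they appeared in the note
    vocabulary: str  # which ontology the code comes from
    code: str        # the normalized concept code
    start: int       # character offset in the note


def extract_concepts(note: str) -> List[Concept]:
    """Find every lexicon entry in the note and attach its concept code."""
    lowered = note.lower()
    found = []
    for surface, (vocabulary, code) in LEXICON.items():
        pos = lowered.find(surface)
        while pos != -1:
            found.append(Concept(note[pos:pos + len(surface)], vocabulary, code, pos))
            pos = lowered.find(surface, pos + 1)
    return sorted(found, key=lambda c: c.start)


if __name__ == "__main__":
    note = "History of high blood pressure and Type 2 diabetes. Negative for Sepsis."
    for concept in extract_concepts(note):
        print(concept)
```

Because “high blood pressure” and “hypertension” resolve to the same code, downstream consumers can treat them as one concept regardless of how the provider phrased the note.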

Search alone against the clinical notes is not enough. A pure “Search” function can only identify documents that contain a given text string, whereas clinical text processing turns the corpus into one that is semantically searchable and minable. Search then becomes a value-added offering on top of the processed data. Further, the search set is not restricted to text strings; it can combine strings with concept codes defined in ontologies. The results are semantically more complete and therefore more useful for decision-making.
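A small Python sketch shows the difference between string search and concept search. Everything here is assumed for illustration: the three notes in `CORPUS` are invented, their codes are imagined to come from an upstream clinical NLP step that already handled synonyms and negation, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class ProcessedNote:
    text: str              # the raw narrative
    codes: FrozenSet[str]  # concept codes asserted for the patient (assumed NLP output)


# Invented example corpus (assumption), keyed by a made-up note id.
CORPUS: Dict[str, ProcessedNote] = {
    "note-1": ProcessedNote("History of high blood pressure. Negative for sepsis.",
                            frozenset({"I10"})),
    "note-2": ProcessedNote("Patient with hypertension, started on lisinopril.",
                            frozenset({"I10"})),
    "note-3": ProcessedNote("Admitted with sepsis secondary to pneumonia.",
                            frozenset({"A41.9", "J18.9"})),
}


def string_search(term: str) -> List[str]:
    """Plain text search: ids of notes whose text contains the string."""
    needle = term.lower()
    return sorted(i for i, n in CORPUS.items() if needle in n.text.lower())


def concept_search(code: str) -> List[str]:
    """Semantic search: ids of notes tagged with the concept code."""
    return sorted(i for i, n in CORPUS.items() if code in n.codes)


def combined_search(term: str, code: str) -> List[str]:
    """Ids of notes that match both the text string and the concept code."""
    return sorted(set(string_search(term)) & set(concept_search(code)))


if __name__ == "__main__":
    print(string_search("hypertension"))  # misses the "high blood pressure" note
    print(concept_search("I10"))          # finds both phrasings
    print(string_search("sepsis"))        # also returns the negated mention
    print(concept_search("A41.9"))        # only the note that asserts sepsis
```

In this toy corpus, string search misses a synonym and returns a negated mention, while the concept-code query avoids both problems; combining the two narrows results further.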

In the past, clinical text parsing was slow because older regular-expression-based approaches required significant processing time to extract and normalize attributes. With advances in NLP and machine learning (ML), combined with the big data stack, clinical text parsing can now be configured to operate in near real time, processing incoming text at the source and enriching the decision-centric databases within the healthcare system.

Enter Cloudera and Wired Informatics: a combination of a big data stack and a clinical NLP offering that enables organizations to deploy and manage their data hubs efficiently. By normalizing unstructured narratives in near real time before they move into the data hub, organizations can source all their application data needs from a single source in a consistent and cohesive fashion. Analogous to financial institutions’ need to generate value at risk (VaR) on their portfolios in real time, healthcare institutions can generate precise segmentation and risk classification at half the cost and in half the time. The normalized longitudinal data derived for individual patients further allows care to be delivered in a precise and consistent manner, with an emphasis on higher quality. The results:

  • Healthcare institutions can perform granular reporting and advanced segmentation across populations, and monitor patient care and health longitudinally.
  • Healthcare institutions can create population risk and health analytics at half the cost and in half the time of traditional warehousing alternatives.
  • Healthcare institutions can improve their revenue cycles by over 30% by leveraging stronger documentation and assisted coding.
  • Healthcare institutions can reduce documentation errors and incorrect procedures through a more rigorous automated approach, and reduce expenses by 20%.

This Big Data NLP stack promises to address current and growing pain points in managing and leveraging healthcare data across several clinical and operational areas within the ecosystem, and it deserves strong consideration from organizations embarking on this transformation.


Murali Nagendranath is the Co-founder and Strategy Officer of Wired Informatics, a clinical NLP company that helps healthcare organizations extract knowledge from unstructured clinical data. In his role, he leads all efforts related to business strategy, sales, and marketing for the company and helps his clients manage and harness their unstructured data in an enterprise-friendly way.

