I am always excited when I hear how Apache Hadoop is being used as a tool to improve health and quality of life. Since joining Cloudera I have seen Hadoop used to reduce sepsis, understand Parkinson’s disease, combat child sex trafficking, and fight Ebola, among many other great causes. When Cloudera Cares, an employee-led philanthropic group within Cloudera, approached me with the idea of running a Data for Good Hackathon at Cloudera’s Austin office, I was thrilled. After considering many challenges where we could apply data science for social good, I came across an article on how the University of Texas’ DIY Diagnostics team was working on a method to detect and reduce mosquito-transmitted diseases, including the Zika virus. I quickly reached out to the group and decided that this would be the subject of our Hackathon.
The University of Texas already runs large Apache Hadoop clusters for different types of research and just launched its Wrangler system at the Texas Advanced Computing Center (TACC). When the TACC department heard about the Hackathon, they quickly jumped on board. It all came together: students, faculty, compute resources, research, and industry collaboration, all for a social cause.
The Data Science Hackathon to combat mosquito-transmitted diseases was held on May 15th at Cloudera’s Austin office with sponsorship from Intersys Consulting. We had a great turnout of over 50 socially conscious citizens from UT and local startups, and even a visitor from Brazil who was in Austin for OSCON and came prepared with Brazil-specific research.
We learned more about the Aedes mosquito and the spreading cases of Zika virus, scraped data from the CDC, WHO, and ECDC, and visualized that data to show the rapid progression of cases across Central America over the past weeks.
One participant presented a method to detect stagnant bodies of water using a TensorFlow model on a 28GB collection of high-definition aerial photography of Austin, TX. By training the TensorFlow model (a technology that can run on Apache Spark) to differentiate green or brown bodies of water from a clear pool or a stream, the system can help find these prime mosquito breeding grounds.
Others worked on a data collection mobile app that lets people quickly and easily report potential Zika cases and symptoms from their mobile devices. The app automatically collects the geolocation of each report, and when combined with the UT DIY mosquito virus detection kit, it can help collect information on where infected mosquitoes may be traveling.
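To make the idea of a geotagged report concrete, here is a minimal Python sketch of what one report record and a simple location roll-up might look like. It is not the app built at the Hackathon: the `CaseReport` class, its field names, the `grid_cell` helper, and the 0.01-degree cell size are all hypothetical choices made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone
import math


@dataclass(frozen=True)
class CaseReport:
    """One self-reported potential case (hypothetical schema)."""
    latitude: float
    longitude: float
    symptoms: tuple
    reported_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def __post_init__(self):
        # Reject coordinates that cannot be a real location.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError("latitude out of range")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError("longitude out of range")


def grid_cell(report, cell_degrees=0.01):
    """Map a report to a coarse grid cell (assumed size: 0.01 degrees)."""
    return (
        math.floor(report.latitude / cell_degrees),
        math.floor(report.longitude / cell_degrees),
    )


def count_by_cell(reports, cell_degrees=0.01):
    """Count how many reports fall into each grid cell."""
    return Counter(grid_cell(r, cell_degrees) for r in reports)


# Made-up example reports near Austin, TX (not real case data).
reports = [
    CaseReport(30.2672, -97.7431, ("fever", "rash")),
    CaseReport(30.2679, -97.7439, ("joint pain",)),
    CaseReport(30.4005, -97.7005, ("fever",)),
]
counts = count_by_cell(reports)
```

Grouping reports into coarse cells rather than storing exact points is one simple way to look for clusters while keeping individual locations less precise.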
The attendees learned about the different sides of data science from one of Cloudera’s resident data scientists, Juliet Hougland. Juliet also demonstrated the power of Jupyter notebooks and libraries to parse and cleanse data sets from Excel spreadsheets and transform them into well-formed data, which was then loaded, aggregated, and analyzed in RStudio. This is the “data plumbing” side of data science.
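As a rough illustration of that kind of data plumbing, the Python sketch below cleans a small spreadsheet export. It is not the notebook shown at the event: it assumes the spreadsheet has already been exported to CSV, it uses only the standard `csv` module, and the column names and sample values are made up for the example.

```python
import csv
import io


def normalize_header(name):
    """Turn ' Report Week ' into 'report_week'."""
    return "_".join(name.strip().lower().split())


def parse_count(text):
    """Parse '1,234' into 1234; return None when it is not a number."""
    digits = text.replace(",", "").strip()
    return int(digits) if digits.isdigit() else None


def clean_rows(raw_csv_text):
    """Return well-formed records from a messy CSV export."""
    reader = csv.reader(io.StringIO(raw_csv_text))
    # Drop rows that are completely blank.
    rows = [row for row in reader if any(cell.strip() for cell in row)]
    if not rows:
        return []
    header = [normalize_header(h) for h in rows[0]]
    cleaned = []
    for row in rows[1:]:
        record = dict(zip(header, (cell.strip() for cell in row)))
        count = parse_count(record.get("cases", ""))
        if count is None:
            continue  # skip rows without a usable case count
        record["cases"] = count
        cleaned.append(record)
    return cleaned


# Made-up sample values, not real case counts.
RAW = """ Country , Report Week ,Cases
Brazil, 2016-W18 ," 1,234 "
,,
Colombia,2016-W18,n/a
Honduras ,2016-W18,57
"""

cleaned = clean_rows(RAW)
```

The cleaned records could then be written back out as CSV for loading into a tool such as RStudio.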
The goal of the event was not to find “the cure for Zika,” but to build awareness around the disease and to highlight the challenges of finding and creating data sets for research and of socializing open data sets for social good.
TACC’s Wrangler supercomputer, with 120 Intel Haswell-based servers, 3,000 processor cores, and a total of 10PB of storage, is ready to take on this and other new big data problems.
The idea is that this Hackathon is not a one-time event, but a platform that can be reused for future research, data sharing, and continual collaboration between industry, academia, and data citizens.
Get involved and learn more about the current state of the Zika virus at the CDC, WHO, and ECDC
Check out the Zika research from the DIY Diagnostics team at UT Austin
Learn about the University of Texas’ involvement in Cloudera’s Academic Partnership (CAP) program
Read other blogs from Cloudera:
Zika and Big Data
Learning from Ebola — How Big Data Can Be Applied to Viral Epidemiology
Applying Big Data to Help Solve Zika
Cloudera’s Precision Medicine Program