With the third baby born in the US with microcephaly related to Zika virus, the disease is again making headlines. Researchers are working diligently to find prevention and treatment methods, and as with any research, collaboration and data sharing are critical to this process. I learned recently at a Zika Hackathon, hosted by Cloudera Cares, that public data sets related to Zika are hard to find. Hackers have resorted to writing code to scrape CDC, WHO and other websites to collect information for research when public data is not available. Open data sets for collaboration and research are something that DJ Patil, US Chief Data Scientist, has long been driving, and we have seen the results of this on www.data.gov. But what if we had not only open data, but also systems that can store terabytes of data ready for researchers to collaborate on, with compute just waiting to accelerate analysis? This last point is a goal for the Texas Advanced Computing Center (TACC) at The University of Texas at Austin.
On June 2nd the Texas Advanced Computing Center inaugurated its Advanced Computing Building and announced a $30 million award from the National Science Foundation (NSF) to build Stampede 2, a supercomputer that will surpass the performance of the current Stampede system, doubling the peak performance, memory, storage capacity, and bandwidth. The processors in the system will include a mix of upcoming Intel® Xeon Phi™ Processors, codenamed “Knights Landing,” and future-generation Intel® Xeon® processors, connected by Intel® Omni-Path Architecture. The last phase of the system will include integration of the upcoming 3D XPoint non-volatile memory technology.
The opportunities that an environment like Stampede 2 can unlock for research and science are exciting. We have already seen results at small scale at our Zika Hackathon, with big data compute and storage resources volunteered by TACC on its powerful Wrangler system. If a small group of volunteers can take an important step forward towards common goals in the battle against infectious diseases, just think what industry, academia and the public sector can do at large. TACC is looking for hard problems to solve; it wants to collect, store and publicly share big data sets that can be used for research and analytics. TACC makes building and running a Cloudera CDH cluster easy, so any researcher who would like to run a model in R, Scala or Python can do so with just a few clicks and commands. This is huge, as many times the challenge in research is building an environment from the ground up.
Cloudera is a strong supporter of using data to improve health and quality of life and has joined President Barack Obama’s Precision Medicine Initiative (PMI) by providing training and software and by collaborating with academic and government researchers using data and analytics. Cloudera has committed to training 1,000 precision medicine researchers on big data science and technology, a three-year commitment in software, services and training.
Industry, academia and the public sector are joining forces to unlock the potential of data with advanced analytics, machine learning and data science in the fight against infectious diseases and other illnesses to improve health and quality of life. Become a data citizen and join this effort by learning more about how you can volunteer, individually or through your organization.
Read other blogs from Cloudera:
Data Citizens Gather at Hackathon in Austin, TX to Fight Mosquito-Transmitted Disease
Zika and Big Data
Learning from Ebola — How Big Data Can Be Applied to Viral Epidemiology
Applying Big Data to Help Solve Zika
Genome Analysis Toolkit
For more information, contact: Makeda Easter, Texas Advanced Computing Center, 512-471-8217; Faith Singer-Villalobos, Texas Advanced Computing Center, 512-232-5771.
Irene Qualters, NSF, (703) 292-2339, firstname.lastname@example.org
Robert Chadduck, NSF, (703) 292-224, email@example.com