NSF Cloud Enlightens Scientific Research
When National Science Foundation officials announced in February that three major providers of cloud computing were donating up to $9 million collectively for big data research, they already were looking for ways to broaden the effort to include a wider variety of topics, including cybersecurity. The expansion is intended to benefit both research and education initiatives and is necessary, in part, because the cloud providers now acquire cutting-edge hardware before it is made available to researchers.
The foundation announced nearly $30 million in new funding for research in data science and engineering through a program known as Critical Techniques, Technologies and Methodologies for Advancing Foundations and Applications of Big Data Sciences and Engineering (BIGDATA). The National Science Foundation (NSF) awards are paired with support from Amazon Web Services, Google Cloud Platform and Microsoft Azure, which committed up to $3 million each in cloud resources for relevant BIGDATA projects over a three-year period starting last fiscal year. A key goal of the collaboration is to encourage research projects focusing on large-scale experimentation and scalability studies.
The BIGDATA program funds novel research in computer science, statistics, computational science and mathematics that seeks to advance the frontiers of data science, an NSF announcement explains. The program also supports work on innovative applications that leverage data science breakthroughs to enhance knowledge in various domains, including social and behavioral sciences, education, biology, physical sciences and engineering.
But researchers intend to broaden the BIGDATA program still further. The NSF hosted a workshop earlier this year in Alexandria, Virginia, that included cloud providers Amazon, Google, Microsoft, Oracle and IBM. It also included researchers and many college and university chief information officers (CIOs). “We’ve been hearing from our community that this is a good idea and that they would like to expand,” reports Chaitanya Baru, senior adviser for data science in the NSF’s Directorate for Computer and Information Science and Engineering.
The workshop explored ways to structure an NSF-wide effort to create an interface between the research and education communities, the computer science field and the commercial cloud ecosystem. Participants also discussed possible plans for paying the cloud providers. “I don’t think we can bank on them continuing to donate huge amounts of money forever,” Baru says. “This is why CIOs were also involved. At some point, somebody may have to pay them; otherwise, they may lose their interest.”
He points out that many university campuses already are switching entirely to cloud computing rather than continuing to purchase new information technology systems. Furthermore, Baru notes, universities are no longer gaining access to state-of-the-art hardware. “In the old days, academia used to get the latest and greatest, the bleeding-edge stuff. The latest, greatest hardware nowadays is in the cloud,” he says. “You and I can’t buy the kind of hardware that the cloud can deploy because these hardware vendors are now working with the cloud guys. They’re the guys who are able to pay the money.”
Ideally, the NSF will “be able to play with the cloud guys,” Baru says, and give researchers access to the latest technologies, such as graphics processing units (GPUs). He cites Google’s technology as one example of leading-edge hardware. “Google has innovated internally and has this thing called the [Tensor] Processing Unit, which runs superfast with machine learning algorithms,” Baru says.
Some cloud providers are now allowing researchers to use algorithms designed for quantum computers, although the nascent algorithms are not yet as powerful or as world-altering as these machines are expected to become. Baru again mentions Google as a prime example. “Now they’re slowly using things like quantum computers. Google has a quantum computer in their cloud,” he says.
Expanding cloud availability to researchers would benefit many areas of study, but Baru singles out the NSF’s Secure and Trustworthy Cyberspace program as one that could make use of large-scale cloud computing. The program goals are aligned with the Federal Cybersecurity Research and Development Strategic Plan and the National Privacy Research Strategy to protect and preserve the growing social and economic benefits of cyber systems while ensuring security and privacy. The NSF program takes an interdisciplinary, comprehensive and holistic approach to cybersecurity research, development and education, and it encourages the transition of research into practice, according to an NSF website.
The expansion also will benefit educational efforts. “The community is also saying we need to get more of our kids introduced to the cloud. It’s a serious platform in the real world, and a lot of these kids, when they want to go out into the world, want to use clouds. We need to train them,” Baru says.
He reports a “massive boom” in interest in computer science and data science in recent years. To illustrate his point, Baru cites the University of California, Berkeley. About four years ago, he says, the university’s introduction to data science class for undergraduates had about 100 students. Today it has 1,000. That is largely because the class is now mandatory for undergraduates, but cloud computing has played a significant role. “If you’re growing at that rate, there’s no campus in this country that can keep up with providing information technology resources on campus. The only reason Berkeley pulled this off is that their entire course is run in the cloud,” Baru says.
The NSF’s agreements with the three major cloud providers essentially came together following a 2016 discussion at a Starbucks, he recalls. The companies were donating cloud computing resources to universities but found few researchers willing or able to use them. Part of the problem was that the cloud offerings were worth around $50,000, which buys a lot of cloud computing capability but does not pay for research assistants and other resources.
Additionally, the cloud providers are not necessarily equipped to determine which scientific research projects merit funding. On the other hand, evaluating research proposals is a core capability for the NSF. The solution, first agreed upon by Amazon Web Services, was for researchers to be awarded a certain amount of cloud computing resources at the same time they receive an NSF grant. The BIGDATA program was a natural fit because of the amount of computing power required to research vast quantities of information.
Doina Caragea, a computer science professor at Kansas State University, jumped at the opportunity. She is studying big data resulting from major disasters in an effort to provide greater situational awareness for response organizations and facilitate faster, more effective responses. “This project aims to explore machine learning solutions to help emergency response organizations deal with the overload of relevant and trustworthy information, in real time, to improve situational awareness,” Caragea says via email.
The approach will be to design an integrated knowledge transfer framework based on deep learning, with a focus on transferring knowledge from one or more prior crises to a current crisis. “The project assumes that each crisis event has unique characteristics in terms of its nature, location, actors and even social media response, while some patterns persist during different emergency events,” she adds.
The problem, Caragea elaborates, is that the volume of data resulting from a major disaster is too large for responders to glean lessons from it. “Manually sifting through voluminous streaming data to filter useful information in real time is inherently impossible,” she says.
Caragea chose to use Amazon Web Services because it provides a variety of computational resources and services, some “equipped with cutting-edge GPU architectures and powerful frameworks—and optimized for big data analysis, machine learning and deep learning,” she says.
“The staff at Amazon was readily available, proved to be very knowledgeable and helped us identify the most appropriate and cost-effective resources for our project,” Caragea adds. “Thanks to NSF and Amazon Web Services, our project has the potential to transform the way in which crisis response organizations operate and, in turn, provide better support to the victims of disasters in a timely fashion,” she concludes.
The NSF-cloud provider partnership resulted in nine projects starting last fiscal year with the donated cloud resources. Baru reports that only about $3 million in resources were allocated, essentially leaving $6 million unused. But that likely will not last long. “This is the very first time the research community has seen this kind of program, so there is some notion of uptick. We fully expect in the current year to get a whole lot more requests because people get tuned into what’s going on,” he says.