NSF Bare Metal Testbed Benefits Cloud Researchers

The new testbed grows and shrinks to meet demand.

A newly created cloud computing testbed funded by the National Science Foundation (NSF) allows researchers to experiment remotely on bare metal servers and gives them the flexibility to choose their own operating systems and software.

Other cloud computing testbeds use dedicated infrastructure, making it difficult to efficiently handle peaks in demand. Additionally, most testbeds are limited to a specific community of users, which in turn limits the enhancements that community can contribute, according to the NSF website for its newest testbed, the Open Cloud Testbed (OCT). Testbeds also usually are isolated from production environments, meaning they have no direct way to give researchers access to production information, real data sets and real users, which curbs the ability to pursue certain research efforts. Finally, the combination of these challenges introduces barriers to another research goal: transitioning research developed in the testbed to practice.

The OCT is designed to address the challenges inherent in an isolated testbed by integrating testbed capabilities into the Mass Open Cloud, an existing cloud for academic users. In particular, it adds dedicated resources, including a cluster of field-programmable gate arrays, which are integrated circuits designed to be easily configured. It also integrates enhanced nodes at the Massachusetts Green High Performance Computing Center and is complemented by NSF’s existing testbeds, known as Chameleon and CloudLab.

Michael Zink, the OCT principal investigator and a professor at the University of Massachusetts Amherst, touts the new testbed’s “bare metal” or “bare minimum” servers. “In our approach, we do what we call a bare minimum server, so that you can install and run whatever operating system you like. If you came up with a new alternative to Windows, MacOS or Linux, you could run and test it there.”

Though still in its infancy, the OCT includes more hardware than initially proposed, thanks to a donation from Two Sigma, a financial sciences company. “We added a much more significant amount of hardware to our testbed than we had proposed. That is because we got lucky. We got a huge donation of slightly used hardware—probably two years old—that we could integrate into our testbed. In the proposal, we talked about 10 servers and now we have over 100.”

Most of the servers are outfitted with Intel Xeon E5-2660 v3 processors, 256 gigabytes of random access memory, 10-gigabit network interfaces and either 400 or 900 gigabytes of storage. The OCT also boasts 16 field-programmable gate arrays (FPGAs) that can be used for a variety of applications.

That level of access allows flexibility that cannot be found on common commercial clouds, Zink explains. “We often get these questions about why can’t people do that in Amazon or Microsoft Azure or Google Cloud? Because you don’t get access to the server level.”

That “low-level access” applies to the FPGAs as well. “Microsoft has some FPGAs in Azure, but the users do not get this low-level access. That’s a major achievement because that gives the research community access to new hardware in a testbed,” he elaborates.

Zink also touts the OCT’s flexibility for researchers on tight deadlines, such as those preparing papers or presentations for major conferences. “There’s a very interesting paper from the CloudLab team on when this demand goes up ahead of conference deadlines. What we set out to build is a system where you can allocate resources that are sometimes used for other compute tasks and add them to this testbed when there’s high demand and remove them from these testbeds when demand is low—making something that can grow and shrink,” Zink explains.
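The grow-and-shrink behavior Zink describes amounts to an elasticity policy: borrow nodes from a general compute pool when testbed demand spikes and return them when it subsides. As an illustration only, with hypothetical class names and numbers rather than the OCT’s actual scheduler, a minimal sketch in Python:

```python
# Illustrative sketch of an elastic resource pool: nodes are borrowed from a
# shared compute pool when testbed demand is high and returned when it drops.
# Names and thresholds are hypothetical, not drawn from the OCT scheduler.

class ElasticPool:
    def __init__(self, dedicated_nodes, shared_nodes):
        self.dedicated = dedicated_nodes      # always part of the testbed
        self.borrowed = 0                     # nodes currently on loan
        self.shared_available = shared_nodes  # pool to grow into

    def capacity(self):
        return self.dedicated + self.borrowed

    def rebalance(self, pending_requests):
        # Grow: borrow nodes while demand exceeds capacity.
        while pending_requests > self.capacity() and self.shared_available > 0:
            self.shared_available -= 1
            self.borrowed += 1
        # Shrink: return borrowed nodes once demand falls below capacity.
        while self.borrowed > 0 and pending_requests < self.capacity():
            self.borrowed -= 1
            self.shared_available += 1
        return self.capacity()

pool = ElasticPool(dedicated_nodes=10, shared_nodes=90)
print(pool.rebalance(pending_requests=40))  # → 40 (conference-deadline peak)
print(pool.rebalance(pending_requests=5))   # → 10 (quiet period)
```

The point of the sketch is the asymmetry Zink identifies: dedicated capacity stays fixed, while the borrowed portion tracks demand.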

He says the fledgling program does not have hard data on the number or types of users, but he believes it probably averages at least 10 to 20 users per day. The OCT has been cited in a number of research papers, he adds, and has been used to study such topics as cloud cybersecurity, virtualization, operating systems, networking and disaggregation. Its sister testbeds, Chameleon and CloudLab, have been used to research a wide array of topics, including cuttlefish, scale-space attention networks, causal learning with big data and network functions in next-generation wireless technologies.

“None of the researchers have to build their own testbeds. They can use [OCT] remotely and run experiments without having to make a major investment in this type of hardware to do it locally at their institutions,” he says.

The testbed, which is free to users affiliated with U.S. institutions, helps level the playing field, he adds. “A professor might have some research funds to buy their own equipment. Undergrads certainly don’t have that, and if we talk about equality and fairness, this opens doors for under-resourced universities because it might be even harder for them to get their own equipment.”

Zink’s own students also use the OCT. He describes one student preparing to defend her Ph.D. proposal who used the testbed to study an alternative to traditional Internet protocol-based networking. “The Internet as we know it today runs on Internet protocol, or IP, but there are new ideas that kind of materialize in terms of information-centric networking. In the future, the idea is that we can address content just by a unique name,” he reports. “To test that, you need a system that does not run the Internet protocol stack but runs a new stack.”
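The named-content idea Zink describes can be sketched in a few lines: a request names the data itself rather than the address of a host. The structure and names below are illustrative only, not the protocol stack the student implemented.

```python
# Toy sketch of information-centric networking (ICN): a request ("interest")
# carries a unique content name, and any node holding that named content can
# answer it. Hierarchical names here are hypothetical examples.

class IcnNode:
    def __init__(self):
        self.content_store = {}   # content name -> data (in-network cache)

    def publish(self, name, data):
        self.content_store[name] = data

    def express_interest(self, name):
        # The requester names the content, never a host address; whichever
        # node caches the name can satisfy the interest.
        return self.content_store.get(name)

node = IcnNode()
node.publish("/example/video/segment/1", b"segment bytes")
print(node.express_interest("/example/video/segment/1"))  # → b'segment bytes'
```

Contrast this with IP, where the same request would have to be addressed to a specific server that happens to hold the content.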

The new approach, he adds, could be useful for multicast networking, in which data is sent simultaneously to a group of computers. “This new approach is more favorable for multicast approaches, so my student implemented that all in the testbed and did some analysis with that. She couldn’t have done that in any of the public-private clouds that are out there because she needed the testbed where she had full access about what she’s running as a network protocol stack.”
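Why named data favors multicast can be shown with a toy pending-interest table: several outstanding requests for the same name are satisfied by a single data packet. This is a sketch of the general ICN mechanism, not the student’s implementation, and all names are illustrative.

```python
# Toy pending-interest table (PIT): requests for the same content name are
# aggregated, and one arriving Data packet fans out to every waiting
# requester. Names and identifiers are illustrative only.

from collections import defaultdict

class Router:
    def __init__(self):
        self.pit = defaultdict(list)  # content name -> waiting requesters

    def interest(self, name, requester):
        self.pit[name].append(requester)

    def data_arrives(self, name, payload):
        # A single packet satisfies all pending interests for this name.
        receivers = self.pit.pop(name, [])
        return {r: payload for r in receivers}

router = Router()
for client in ("a", "b", "c"):
    router.interest("/video/frame/42", client)
delivered = router.data_arrives("/video/frame/42", b"frame bytes")
print(sorted(delivered))  # → ['a', 'b', 'c']
```

In an IP network, the sender would instead transmit three separate unicast copies or rely on rarely deployed IP multicast.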

Zink, who was a co-principal investigator on NSF’s CloudLab project, says the two testbeds are now connected, and Chameleon is expected to be added this summer. “Once we had our hardware up and running, we could expose some of the hardware through the CloudLab framework, so when you now go to the CloudLab main page, you can allocate resources from the original sites plus our sites. We’re in the process of doing that with Chameleon.”

While CloudLab and Chameleon also include FPGAs, they are used only for computation. The OCT’s FPGAs provide additional capabilities. “Ours can also be hooked up to the network. These FPGAs have their own Ethernet network interfaces, so you can actually hook them up to the network, and you can then run experiments where either your full protocol stack or part of your protocol stack is implemented in the FPGA, and you can run experiments between these FPGAs.”

Based on user feedback, the OCT team will study the testbed’s power consumption and explore the possibility of lowering the energy usage. “To make that possible, you want to figure out if I run operating system X versus operating system Y, or virtualization X versus virtualization Y, or algorithm one versus algorithm two, what does that mean in terms of energy consumption in the data center or in the cloud? Being able to measure that is important, so we’re working on a solution, and I hope that by the end of the year we have that ready and going.”
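The comparison Zink outlines, running alternative X and alternative Y under the same workload and comparing their cost, can be sketched with a simple measurement harness. Real energy accounting would read hardware counters (for example, Intel RAPL on the testbed’s Xeon servers); in this hypothetical sketch, wall-clock time stands in as a proxy, and the two “algorithms” are illustrative stand-ins.

```python
# Sketch of an X-versus-Y measurement harness. Wall-clock time is used here
# as a stand-in for an energy counter; the two algorithms are illustrative
# examples, not workloads from the OCT project.

import time

def measure(fn, *args, repeats=5):
    """Return the best-of-N runtime of fn(*args), reducing timer noise."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

def algorithm_one(items, probes):      # membership tests against a list
    return sum(1 for p in probes if p in items)

def algorithm_two(items, probes):      # same tests against a hash set
    s = set(items)
    return sum(1 for p in probes if p in s)

items = list(range(5000))
probes = list(range(0, 10000, 2))
t1 = measure(algorithm_one, items, probes)
t2 = measure(algorithm_two, items, probes)
print(f"list lookup: {t1:.4f}s, set lookup: {t2:.4f}s")
```

Swapping the timer for a per-socket energy counter would turn the same harness into the operating-system or algorithm energy comparison Zink describes.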

The NSF awarded a grant of about $2 million to Red Hat Inc. in December 2019. Building the initial testbed took nine months or less, and the FPGAs were added within 18 months, Zink recalls.