Virtualization Increases Supercomputers' Potential

June 2010
By Henry S. Kenyon, SIGNAL Magazine


Kevin Pedretti, lead researcher for Sandia National Laboratories’ virtual external operating system program, uses his laptop computer to inspect a virtual machine experiment running on Sandia’s Red Storm supercomputer. The goal of the effort is to create a more flexible environment for scientists conducting research on supercomputers.

New software offers increased flexibility for high-performance computing and simulation.

A prototype virtual operating system will allow researchers to load experiments into supercomputers quickly without having to modify their programs substantially to operate on a specific platform. The software uses a technique called virtualization to enable a machine to run multiple operating systems, something current supercomputers cannot do. This capability would allow high-performance computers to operate a wider range of software, from highly specialized modeling and simulation programs to commercial applications.

The virtual external operating system project, developed at Sandia National Laboratories, aims to run custom software and operating systems on top of the fixed operating systems residing on supercomputers. Kevin Pedretti, a senior member of Sandia’s technical staff and lead researcher for the virtualization project, explains that researchers commonly complain that supercomputers’ specialized operating systems prevent them from using applications not designed specifically for those machines unless the software undergoes major modifications.

Pedretti’s team is employing virtualization to allow application programs to use the computer’s system software on an on-demand basis. A virtual machine is a software program that operates like a physical computer and permits an actual computer to run a variety of operating systems. The virtualization team has leveraged Sandia’s pioneering work on lightweight operating systems for supercomputers to develop the software. Lightweight operating systems have relatively few lines of code: hundreds of thousands of lines as opposed to the millions of lines in a general-purpose operating system such as Linux. Known as lightweight kernels, these programs trade functionality for performance. With limited functionality, they allow computers to be scaled up to more nodes, but they lack the features of general-purpose operating systems.

Virtualization was attractive to the Sandia researchers because it allowed them to combine a hypervisor with a lightweight kernel while continuing to have a lightweight environment supporting a broader range of functionality. Hypervisors sit between a computer’s hardware and the operating system and use virtualization to emulate hardware and software. By controlling the hardware and virtualizing it, hypervisors allow multiple operating systems to run simultaneously on a supercomputer. Pedretti explains that this capability is similar to a personal computer running multiple software applications at the same time.

The project was proposed in fall 2007 to Sandia’s Laboratory Directed Research and Development Program, which funds basic scientific research. In 2008, the Sandia team began collaborating with scientists at the University of New Mexico and Northwestern University in Illinois. Pedretti says that the university researchers had developed a lightweight hypervisor for high-performance computing called Palacios, while Sandia had a lightweight kernel called Kitten. These two were combined to provide Sandia’s lightweight kernel with a hypervisor capability.

The program’s goal is to run unmodified operating systems as guest programs on a supercomputer’s operating system, Pedretti shares. The experiments that currently run on Sandia’s Red Storm supercomputer use its native operating system running as a guest on top of the Palacios/Kitten combination. He adds that while the software is described as a virtual external operating system, what really is being virtualized is the computer’s hardware. “We’re running guest operating systems that are executing on top of virtualized hardware that we export,” he explains.

When they began running experiments with the software, Sandia researchers wanted to understand the virtualization layer’s overhead, that is, the amount of computer resources the layer itself requires to operate. Pedretti notes there has been much academic discussion about using virtualization in high-performance computing, but it has not been adopted widely because the virtualization overhead is assumed to be very high, on the order of 20 percent of memory and processing time.

Pedretti admits that his team is unaware of any recent experiments examining the virtualization support present in modern x86 processors using a hypervisor tuned for high-performance computing and running applications inside of it. “We weren’t aware of anyone doing that to see what the overhead was,” he says.

The Sandia researchers began by running small-scale experiments on a 50-node development system. When the programs were run, researchers noted that the performance overhead was less than five percent. These results were encouraging and led to larger-scale experiments. To scale up the work, Pedretti’s team requested time on Red Storm and ran experiments on 4,096 of its 12,960 processor nodes. The results were the same on the larger system: overhead of the virtualization layer remained below five percent. “It indicates to us that it’s a feasible approach and we’re moving forward to get the functionality we need to move it into a normal high-performance computing environment,” he says.
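The overhead figure quoted above is a relative slowdown: how much longer the same application takes inside a guest operating system than it does running natively. A minimal sketch of that arithmetic follows. The function name and the timings are hypothetical illustrations, not Sandia's measurements; only the node counts come from the article.

```python
def virtualization_overhead(native_seconds: float, guest_seconds: float) -> float:
    """Return the extra run time of the guest as a percentage of the native run time."""
    if native_seconds <= 0:
        raise ValueError("native_seconds must be positive")
    return (guest_seconds - native_seconds) / native_seconds * 100.0


# Hypothetical timings for illustration only (not measured data).
native = 200.0  # seconds, application run directly on the node
guest = 208.0   # seconds, same application run inside a guest operating system
overhead = virtualization_overhead(native, guest)
print(f"overhead: {overhead:.1f}%")

# Share of Red Storm used in the larger experiment (node counts from the article).
share = 4096 / 12960 * 100.0
print(f"nodes used: {share:.1f}% of the machine")
```

With these made-up timings the overhead comes out under the five percent threshold the researchers report, and the node counts show the larger run occupied roughly a third of the machine.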

The current state of the software is what Pedretti refers to as “stunt mode.” He explains that the group has a functioning hypervisor and all the tools necessary to run guest operating systems. What is missing are the tools users need to select the number of nodes and the type of application to run on them. This information currently is entered manually. The next step is to develop user-focused tools for scalably launching parallel virtual machines/guest operating systems, he says. The team first must create a command line interface that will be followed by a graphical user interface. “We need the interface to the user. We’ve got all the underlying tools, the hypervisor and lightweight kernel, which I would argue is the harder part. The next step is to do the user interface,” he says.


Supercomputers such as Sandia’s Red Storm use specialized operating systems. Researchers often find it difficult to test their software on these machines because the programs must be modified substantially. By using virtual machine software, Sandia researchers can instruct a supercomputer to run multiple operating systems, making the machines more adaptable for a variety of research purposes such as simulation and modeling.

The program is working on a flexible timeline. Pedretti notes that by the end of this year, one of his project milestones is to have a functioning user interface for launching guest operating systems. To achieve this, the plan is to demonstrate the capability on a 50-node development system. “It’s largely a research project, but our end goal is to prove the value of virtualization in high-performance computing, show that we have the tools needed to do it and make the case for incorporating it into the standard system software stack on future machines,” he explains. Pedretti adds that another challenge will be convincing vendors such as IBM or Cray to include the software in their systems.

There were some challenges in developing the virtual operating system. Pedretti notes that none of the existing hypervisors were adequate for the type of work his team wanted to do because they were focused on single-server applications. One challenge was to devise a thin, lightweight hypervisor layer that would be well-suited for high-performance computing.

Researchers also did not know the proper memory management strategies for the software. They conducted experiments comparing software memory management against hardware memory management, concluding that each approach was best suited for different applications.

Pedretti says that the ultimate goal of his virtualization research is to create a more flexible environment for supercomputer users. Another objective is using the virtual operating system as a tool for conducting system software research. He relates that the U.S. Department of Energy (DOE) has the goal of developing an exascale or exaflop supercomputer by 2020. This computer would be three orders of magnitude faster than current supercomputers and capable of one million trillion calculations per second. But to achieve this, he says, there must be substantial ground-breaking innovation in the system software stack.
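The scale described above can be checked with a few lines of arithmetic. This is a small sketch, assuming that "current supercomputers" means roughly one petaflop (10^15 calculations per second), which is an assumption about 2010-era machines rather than a figure given in the article; the variable names are illustrative.

```python
MILLION = 10**6
TRILLION = 10**12

# "One million trillion calculations per second" from the article.
exaflop = MILLION * TRILLION

# Assumption: the fastest machines around 2010 peaked at about one petaflop.
petaflop = 10**15

speedup = exaflop // petaflop
# Count the zeros in the speedup to get the orders of magnitude.
orders_of_magnitude = len(str(speedup)) - 1

print(exaflop == 10**18, speedup, orders_of_magnitude)
```

Under that assumption, an exaflop machine is a thousand times faster than a petaflop one, which matches the "three orders of magnitude" in the text.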

To launch the exascale computer program, the DOE has issued a call for proposals for the development of X-Stack system software. Pedretti explains that this work is related to virtualization because one of the largest issues regarding system software research is access to large-scale development platforms. “It’s very difficult to procure access to a machine like Red Storm, boot your custom operating system on it, see what happens, and when things invariably go wrong, to debug it,” he says.

If a virtualization capability were deployed in a standard supercomputer software stack, the X-Stack researchers could boot their custom operating systems on the machine using all its standard mechanisms. They would not have to request special time; they could simply boot their system software stack in the same way as a user booting an application on a desktop computer. Pedretti says this capability would allow researchers to access a large-scale computing platform easily.

A final, tangential goal is the ability to boot commercial operating systems such as Windows on supercomputers, which would allow researchers to simulate large swaths of the Internet. Pedretti contends that such a large research platform would allow scientists to study how botnets and viruses work in an experimental platform. He says that Sandia is not focused on this type of research, but adds that other researchers are looking at using virtualization to study these topics.


