Keylime Provides Root-of-Trust at Scale

The firmware zero-trust architecture is ready for prime-time, cloud-based deployment.

Illustration design by Chris D’Elia based on artwork by Zlatko Guzmic and Anastasia Asadcheva/Shutterstock

Researchers at the Massachusetts Institute of Technology Lincoln Laboratory who developed the Linux-based open-source zero-trust architecture called Keylime are now seeing it deployed more widely. Built on top of the Linux TPM2 software stack, the cloud-enabled Keylime architecture has runtime integrity monitoring as one of its main features, designed to protect hardware, Internet of Things and legacy devices from the moment they power on and begin booting. The platform enables these types of systems to be used in conjunction with the cloud, protecting them at the root-of-trust level using cryptography, a verifying function, monitoring and other cybersecurity measures.

Already instituted as a core part of the Commonwealth of Massachusetts’ cloud security, the zero-trust software platform was also recently adopted by IBM, say Charles Munson and Nabil Schear, who created Keylime while at Massachusetts Institute of Technology (MIT) Lincoln Laboratory’s Secure Resilient Systems and Technology Group.

The architecture’s unique remote boot verification, or attestation, enables users to monitor remote nodes at large scale using a hardware-based cryptographic root of trust that supplies verification not just for Internet of Things (IoT) devices, but also for computing at the edge and cloud-based operations.

“The goal of Keylime is to make sure before you give any kind of machine your secrets that it starts out in a trustworthy state,” explains MIT Lincoln Laboratory’s Munson. “And after you’ve given it your secrets, you want to make sure it stays in a trustworthy state. For example, if you have an IoT device that you want to bring up, you want to make sure it is trustworthy before you put anything sensitive on it. [With Keylime’s attestation] you can load all your data and software running on it and continuously monitor it to make sure it stays in a good state. It helps preserve the sensitivity of any of your secrets and makes sure you have integrity in your computing environment.”

“We like to joke that Keylime is doing what seems like a very straightforward job of just establishing a key on a computer,” says Schear, who is now a senior security partner at Netflix, working to protect its cloud-based global entertainment streaming platform. “But it’s very challenging to do that without accidentally making a mistake and then relying upon something that you shouldn’t have or expecting some other root of trust to kind of come and save you. The problem is that bootstrapping is very hard because around every corner is another problem.”

The solution leverages the trusted platform module, known as a TPM, a common, inexpensive hardware chip widely installed on laptops, IoT devices and other digital technologies. As part of its normal operations, the TPM chip supplies a hash, or short string of data, to the device. The Keylime architecture detects if any part of the hash has been altered.

“You have this TPM module inside of your system or device, and it’s a separate little processor,” says Schear. “The TPM has its own little view of the world, and it has some cryptographic keys that are burned into it when it’s manufactured. And that is what is at the center of the root of trust. That’s the thing that we use to bootstrap everything on.”

“The TPM helps us answer two different questions,” Munson shares. “The first question is that it figures out which machine you are talking to. The TPM has a security certificate and a key that is baked in by the manufacturer of the chip. The key is really difficult to extract or to be able to change; it’s kind of hardcoded into that chip. And that gives us an identity of the chip itself, a unique identity that is signed by the manufacturer. So that allows us to figure out whom we’re talking to.”

Munson explains that when a machine first starts up, it goes through a series of early boot phases. At each phase, a measurement, in the form of a hash, is taken of the next component before control is handed off to it, and that hash is stored in the TPM chip.

“As it goes up the chain, each component is handing off control to the next, and it continuously measures the next component and makes sure that it stores all those in the TPM securely,” Munson explains. “So, what we’re doing is we’re creating a chain of hashes in the TPM chip, and later on, we can reach out to the TPM chip remotely and ask the TPM chip to give us that chain of hashes. That allows us to remotely look through everything that has ever run on that machine, and we can compare it against an allowed list. If any of those hashes don’t match up, it is clear there’s something wrong in that machine. It will be detected by Keylime and then isolated from the network, so it doesn’t do any damage,” Munson says.
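The chain of hashes Munson describes can be illustrated with a short Python sketch. It assumes SHA-256 and the usual TPM-style extend rule, in which the new register value is the hash of the old value concatenated with the new measurement. The function names, the component names and the in-memory allowed list are hypothetical and are not taken from Keylime's code; a real verifier would also check a TPM-signed quote rather than trust a reported value.

```python
import hashlib

PCR_SIZE = 32  # length in bytes of a SHA-256 digest


def measure(component: bytes) -> bytes:
    """Hash a boot component (hypothetical stand-in for a real measurement)."""
    return hashlib.sha256(component).digest()


def extend(pcr: bytes, measurement: bytes) -> bytes:
    """Fold one measurement into the running register value."""
    return hashlib.sha256(pcr + measurement).digest()


def replay(measurements) -> bytes:
    """Recompute the final register value from an ordered list of measurements."""
    pcr = bytes(PCR_SIZE)  # the register starts as all zeros
    for m in measurements:
        pcr = extend(pcr, m)
    return pcr


def verify(reported_pcr: bytes, event_log, allowed) -> bool:
    """Accept a node only if the log matches the register and every entry is allowed."""
    if replay(event_log) != reported_pcr:
        return False  # the log does not explain the value held in the chip
    return all(m in allowed for m in event_log)


# Hypothetical boot chain and allowed list, for illustration only.
boot_chain = [b"firmware-v1", b"bootloader-v1", b"kernel-v1"]
event_log = [measure(c) for c in boot_chain]
allowed = set(event_log)
reported = replay(event_log)

# A node that booted an unexpected kernel produces a different chain.
tampered_log = event_log[:2] + [measure(b"kernel-evil")]
tampered_pcr = replay(tampered_log)

print(verify(reported, event_log, allowed))
print(verify(tampered_pcr, tampered_log, allowed))
```

Because each value depends on every earlier one, a component cannot erase its own measurement after the fact; it can only add to the chain.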

Schear came up with the idea to pursue such a zero-trust architecture at MIT Lincoln Lab as a way to work past the confines of the TPM processes. The chip’s operations are relatively slow, taking about a second to perform a single cryptographic operation, so the chip alone cannot be leveraged at scale.

The researchers needed to find ways to connect the TPM, the physical chip on a motherboard, into the virtual world and do so securely. “We also were trying to solve some of the problems around the TPM because it is very slow to do its [operations], and we needed to scale for the cloud, to be able to support thousands upon thousands of machines. And now that I am at Netflix, I can see how many thousands of machines there are to scale, and it really is a lot,” Schear notes.

In addition, the researchers wanted to design a “proper root of trust in the cloud solution” that was simple to use. As such, they created a verifier. “It can live in the cloud, alongside your other machines, and its job is to reach out to machines as they’re coming up, as they are booting and ask the TPM for some information about its identity,” he says. “And at the end of the day, you can get a value that represents exactly what software was executed on this system, and what the TPM can do is it can present those to an external or internal verifier.”

“The cloud verifier’s first job is basically to reach out to each machine and request the attestations,” Munson continues. “It compares them all against the accepted list of what should be running, to see if a machine is good or bad before it gives it any secrets. And that verifier allows [the Keylime solution] to scale linearly.”

In running the cloud verifier, the Keylime architecture works by taking an initial secret key, cryptographically splitting it in half and giving one half to the cloud verifier and the other half to the cloud node.

“Neither the cloud verifier nor the cloud node have enough details to be able to get the original secret back,” Munson suggests. “And then the cloud verifier will reach out to the cloud node and do that attestation, where it will check everything that has ever run on the node. Once the cloud verifier determines that it is in a good state, it will give its half of the key to the cloud node, and it can use that to bootstrap more things into the cloud.”
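The article does not say how the key is split. One standard way to divide a secret so that neither part reveals anything on its own is XOR-based splitting, and the sketch below assumes that approach. The function and variable names are hypothetical, and the attestation step that gates the release of the verifier's share is represented only by a comment.

```python
import secrets
from typing import Tuple


def split_key(key: bytes) -> Tuple[bytes, bytes]:
    """Split a key into two shares; either share alone is just random bytes."""
    share_a = secrets.token_bytes(len(key))
    share_b = bytes(k ^ a for k, a in zip(key, share_a))
    return share_a, share_b


def combine(share_a: bytes, share_b: bytes) -> bytes:
    """XOR the two shares together to recover the original key."""
    if len(share_a) != len(share_b):
        raise ValueError("shares must be the same length")
    return bytes(a ^ b for a, b in zip(share_a, share_b))


# Hypothetical 256-bit bootstrap key, for illustration only.
bootstrap_key = secrets.token_bytes(32)
verifier_share, node_share = split_key(bootstrap_key)

# In the flow described above, the verifier would hand over its share
# only after the node passes attestation.
recovered = combine(verifier_share, node_share)
print(recovered == bootstrap_key)
```

With this kind of split, a node that fails attestation is left holding a share that is indistinguishable from random data.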

That groundbreaking ability to provide a secure link from hardware to the cloud and deliver that bootstrap key to a computer also makes Keylime scalable, in addition to making it incredibly secure, Munson notes. “Otherwise, it would be difficult for cloud services to reach down into the hardware to do that attestation,” he says. “Keylime makes it capable of scaling and cloud compatible out of the box and gives you the ability to have control over your own keys.”

By comparison, cloud providers store users’ sensitive data in kind of a big bank that users do not have direct control over and that is not based in a hardware root of trust, Schear points out. Moreover, normal cloud protections are usually based on policy and software.

“The state of the art in cloud before, and actually still today, is that you just completely trust the cloud provider to do all of this for you, this bootstrapping process,” he says. “And while I love cloud providers, and Netflix is a big user of Amazon’s cloud service, we all don’t have to trust them that much. … With the Keylime project, what we are trying to prove is that it’s possible to ‘have your cake and eat it too,’ to be able to use the cloud, but also be able to firmly establish your faith and trust in the machine that you have been given and then be able to do sensitive things on top of it.”

In addition to maintaining control over their own keys, users do not have to modify any of their software tools, Munson advises, as Keylime “just runs in the background.”

Moreover, making the Keylime architecture an open-source project has strengthened the tool and its deployment and added to its acceptance by technology firms. “We were an open-source project early on; we started working with a group called Massachusetts Open Cloud, which is headed up by Boston University and Northeastern University,” Munson states. “Essentially, we started getting into their cloud environment, and they used Keylime as the core security component for the cloud that they were building for the state of Massachusetts, and Red Hat was also on that team.”

The MIT Lincoln Lab researchers worked closely with Red Hat to build a community around the Keylime platform, and the company was instrumental in getting Keylime adopted as a Sandbox Technology of the Cloud Native Computing Foundation, which is part of the Linux Foundation group, Munson notes. In 2019, IBM purchased Red Hat, and after seeing the possibilities of Keylime, elected to add the platform into its cloud offerings for financial and other enterprise clients, given its “strong security posture.”

Additionally, over the last several years, Keylime has received several accolades. “This year we received the FLC award for transition of technology, given by the Federal Laboratory Consortium for Technology Transfer,” Munson shares. The MIT Lincoln Lab researchers won the regional award, “and we hope to potentially go on to a national level next year. And in 2020 we received the R&D World 100 award,” which annually recognizes 100 top new technologies.

With Schear and Munson continuing on the Keylime effort as adjunct advisors, other Keylime contributors are working on translating part of the original software—which was written in Python—into the programming language Rust. “[That will make] it harder to exploit the agent that turns on the nodes, making it less likely that you would have any kind of exploits,” Munson shares. Eventually, the plan is to move everything into Rust, he says, not just the nodes.

The MIT Lincoln Lab researcher encourages people to shift their thinking to a zero-trust mentality. “Start out with the assumption that you can never trust anything, even inside your own environments, and that you always have to verify everything,” Munson says.