Control Systems Need Software Security Too
Multiple decades of research have focused on building more secure and resilient systems by incorporating defensive techniques into computer systems. Such techniques range from enforcement-based defenses, which enforce an invariant on the execution of code on a machine, to randomization-based defenses, which enhance a system's resiliency to attacks by creating uncertainty, diversity or dynamism in the internals of the system. Such defenses have evolved both to counter increasingly sophisticated attacks that bypass earlier defensive technologies and to minimize security-related overheads.
While such software security techniques are not truly impervious to attack, their widespread deployment in enterprise and general-purpose computing environments has significantly raised the bar for attackers. However, many cyber-physical systems (CPS) and safety-critical application domains, such as industrial control systems (ICS), avionics, automotive and other mission-critical applications, have not seen widespread adoption of these generally effective software defenses. Attackers, recognizing both the vulnerability and the strategic importance of these critical assets, are increasingly targeting such systems; the Triton attack on industrial safety systems is one example.
A significant difference between CPS applications and traditional enterprise computing applications—for example, web browsing and databases—is the cyber interaction with the physical world. Such interactions impose several unique constraints on the design of such systems. For example, the cyber components of a CPS are often resource-constrained given the size, weight and power (SWaP) requirements of the system.
Furthermore, control systems, which actuate the physical world via motors or other actuators, must do so in a timely fashion to ensure both safe and efficient operation. For example, an arc flash in a power system can cause life-threatening explosions, so a relay must be tripped to shut off power within a few milliseconds to prevent such dangerous situations. The latency requirements of the physical system therefore dictate temporal, or real-time, cyber constraints. Because of this, many real-time systems have not been instrumented with software security defenses that have already proven effective in the enterprise environment.
To reliably meet the latency requirements of safety- and mission-critical applications, such systems require a high degree of runtime determinism. Algorithms with predictable performance are favored over those with strong average-case performance but poor performance in rare corner cases, ensuring that the system can respond reliably and promptly to safety-critical events.
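This preference for predictability can be illustrated with a small sketch. The table size, names and values below are purely illustrative: a binary search over a fixed-size sorted table has a worst-case iteration count known exactly at design time, unlike, say, a hash lookup whose rare collision chains are hard to bound.

```c
#include <stddef.h>

/* Hypothetical lookup table of sensor IDs; the size is fixed at design
 * time, so the worst-case iteration count is known exactly. */
#define TABLE_SIZE 16

/* Binary search over a fixed, sorted table: at most log2(16) + 1 = 5
 * loop iterations in every case, so worst-case timing is easy to bound.
 * A hash table would often be faster on average, but its rare
 * worst-case behavior is far harder to certify. */
static int find_index(const int table[TABLE_SIZE], int key)
{
    int lo = 0, hi = TABLE_SIZE - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (table[mid] == key)
            return mid;
        else if (table[mid] < key)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return -1; /* not found */
}
```

The bounded loop makes worst-case execution-time analysis straightforward, which is exactly what formal timing verification requires.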
In addition, many real-time application domains, such as in avionics, require formal verification and certification that all real-time requirements will be met in all cases. This formal verification requires analysis of the worst-case behavior of all software, as well as how such software will be run on the platform.
These rigorous design, analysis and testing requirements significantly affect how software may be instrumented to improve security. The litany of enterprise-grade defenses studied over the past decades has largely been optimized for minimal average-case runtime overhead, as many consider defenses with more than 5 to 10 percent runtime overhead too slow to deploy in practice.
Many such defenses have not yet been evaluated in the context of worst-case performance, which is critical to real-time safety- and mission-critical applications. Such evaluations are essential to understanding which software security defenses can and should be applied in real-time applications and how future defenses may be designed for them.
Scientists at MIT's Lincoln Laboratory (MITLL) have evaluated high technology readiness level (TRL) software defenses to identify which can be applied in real-time application domains and to determine how next-generation defenses can be adapted for such applications. They chose to evaluate several defenses from the two overarching classes that exist today: enforcement-based defenses, which add security checks at different points during execution, and randomization-based defenses, which randomize aspects of the program so that attackers cannot reliably target specific program features in an attack.
Enforcement-based defenses often have much more deterministic effects on the runtime performance of the protected software. For example, control-flow integrity (CFI) instruments control flow transitions to check that an attacker has not corrupted the target address to divert control to malicious logic. Such defenses are often quite lightweight in both the average case and the worst case. In MITLL's experiments, the worst-case performance overhead has been measured to be less than 2 percent on average for the benchmarks in the TACLeBench benchmark suite, which is commonly used in the embedded-systems community.
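The kind of check a CFI pass inserts can be sketched roughly as follows. This is a hand-written illustration, not how production CFI instrumentation is implemented; the handler names and target set are assumptions for the example.

```c
#include <stdlib.h>

/* Two legitimate handlers an indirect call site may target. */
static int handler_a(int x) { return x + 1; }
static int handler_b(int x) { return x * 2; }

/* The allowed-target set a CFI pass might compute for this call site. */
static int (*const allowed_targets[])(int) = { handler_a, handler_b };
#define N_TARGETS (sizeof(allowed_targets) / sizeof(allowed_targets[0]))

/* A CFI-style guarded indirect call: before transferring control, verify
 * that the (possibly attacker-corrupted) function pointer is one of the
 * legitimate targets; otherwise abort rather than jump to malicious
 * logic. The check is a short, constant-bounded loop, which is why such
 * defenses remain lightweight even in the worst case. */
static int checked_call(int (*fp)(int), int arg)
{
    for (size_t i = 0; i < N_TARGETS; i++)
        if (fp == allowed_targets[i])
            return fp(arg);
    abort(); /* control-flow violation detected */
}
```

Because the check performs the same small amount of work on every call, its worst-case cost is nearly identical to its average-case cost.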
Lightweight defenses such as CFI are effective in many cases and amenable to real-time analysis; however, a sophisticated adversary can sometimes thwart them. Therefore, stronger enforcement-based defenses have been proposed that instrument more checks but are subject to higher overheads.
SoftBound is an example of such a defense. SoftBound provides complete spatial memory safety to prevent memory corruption, a crucial first step in many software attacks. Unfortunately, the additional checks SoftBound imposes can be quite costly, adding well over 100 percent overhead in many cases. Through their evaluations, MITLL's scientists found that, unlike other enforcement-based defenses, the overhead of SoftBound is relatively consistent between the average and worst cases. Thus, while it exhibits high runtime overhead, that overhead is deterministic; if sufficient slack time exists in the target system, SoftBound could therefore be adopted in a real-time application. Many real-time applications, however, are highly SWaP constrained, and such high overheads, even if deterministic, are likely untenable.
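SoftBound's approach can be sketched in simplified form. The struct-based "fat pointer" below is purely illustrative; SoftBound actually keeps base and bound in disjoint metadata tables so pointer layout is unchanged.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrative metadata associating a base and bound with a pointer. */
typedef struct {
    int *ptr;
    int *base;   /* first valid address */
    int *bound;  /* one past the last valid address */
} bounded_ptr;

/* The kind of check inserted before each memory access: the access must
 * lie entirely within [base, bound). Performing this on every access is
 * what produces the high, but consistent, runtime overhead. */
static int checked_load(bounded_ptr p, ptrdiff_t idx)
{
    if (p.ptr + idx < p.base || p.ptr + idx + 1 > p.bound) {
        fprintf(stderr, "spatial memory-safety violation\n");
        abort();
    }
    return p.ptr[idx];
}
```

An out-of-bounds index, such as one past the end of the buffer, would be caught before the corrupting access occurs rather than after the damage is done.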
The other class of software defense under consideration is randomization-based defenses. These defenses have seen widespread deployment in enterprise-class systems for several years, with all major operating systems using address-space layout randomization by default.
Other more recent defenses, such as Selfrando or compiler-assisted code randomization, apply randomization at a finer granularity and, as a result, offer stronger security. These defenses incur less than 2 percent overhead in the average case, but as finer-grained defenses, they can incur overhead exceeding 50 percent in the worst case.
Such high overheads can be attributed to how specific randomizations interact with the low-level cache and memory hierarchy. These defenses are therefore considered much higher overhead in systems where real-time performance is paramount, whereas in enterprise environments, their average-case performance overhead is sufficiently low to be practically deployable. This observation demonstrates that software defenses must be tailored to the real-time application environment.
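One way to picture fine-grained diversification is the toy model below; real tools such as Selfrando permute actual functions in memory, whereas this sketch only shuffles a table of indices standing in for a layout.

```c
#include <stdlib.h>

/* A load-time diversification sketch: shuffle the order in which a set
 * of routines is laid out (modeled here as a table of indices) so that
 * attackers cannot predict code addresses across systems. In a real
 * defense, moving functions across cache-line and page boundaries this
 * way is precisely what perturbs worst-case timing. */
#define N_FUNCS 8

static void shuffle_layout(int order[N_FUNCS], unsigned seed)
{
    for (int i = 0; i < N_FUNCS; i++)
        order[i] = i;
    srand(seed); /* stand-in for a per-boot entropy source */
    for (int i = N_FUNCS - 1; i > 0; i--) { /* Fisher-Yates shuffle */
        int j = rand() % (i + 1);
        int tmp = order[i];
        order[i] = order[j];
        order[j] = tmp;
    }
}
```

Every boot yields a different permutation, and therefore a different mapping of code to cache sets, which is the source of the worst-case variability observed in the evaluations.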
Many randomization-based defenses also face other hurdles to adoption in real-time applications. For example, some safety-critical systems must be certified for logical and temporal correctness before being deployed. To certify the real-time performance of a system, the actual software deployed must be analyzed for certification. Compiler-based randomization defenses, however, work by producing an independently diversified copy of the application for each system.
This is problematic for vendors of real-time applications, as software certification is too expensive and time consuming to conduct for each unit sold. In other words, aircraft vendors certify the software on a specific plane model; certifying each individual plane sold would be prohibitively expensive.
While existing randomization-based defenses face hurdles to adoption in real-time applications, there is an opportunity for new randomization-based defenses with lower worst-case performance overheads. Defenses are needed that diversify the attacker-relevant parts of an application while preserving its timing characteristics in a way that can withstand the rigors of certification.
Another key difference between safety- and mission-critical applications and enterprise applications is the need to maintain continuous and safe operation. In the enterprise environment, when a defense detects or prevents an attack, it is considered acceptable to crash the process to stop the attack. This is not a tenable solution in safety- and mission-critical applications. Therefore, in addition to software security defenses with predictable real-time performance, such systems must be designed for resilience so that they can continue operating even in the presence of a cyber threat. The system must fail over to a backup controller in a timely fashion to continue safe operation while the primary controller restarts and recovers from a cyber event. All of this must be done in a deterministic fashion that can be demonstrated to maintain safe operation.
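A failover monitor of this kind might be sketched as follows; the state names and the deadline-miss budget are assumptions chosen for illustration, not a specific deployed design.

```c
/* A minimal failover sketch: a backup controller checks the primary's
 * heartbeat once per control cycle and takes over after a fixed number
 * of consecutive missed deadlines, so the worst-case takeover latency
 * (MAX_MISSED control cycles) can be stated deterministically. */
#define MAX_MISSED 3 /* assumed deadline-miss budget */

typedef enum { PRIMARY_ACTIVE, BACKUP_ACTIVE } controller_state;

typedef struct {
    int missed;             /* consecutive missed heartbeats */
    controller_state state;
} failover_monitor;

/* Called once per control cycle with whether a heartbeat arrived. */
static controller_state monitor_step(failover_monitor *m, int heartbeat_seen)
{
    if (heartbeat_seen) {
        m->missed = 0;      /* primary healthy; (re)grant control */
        m->state = PRIMARY_ACTIVE;
    } else if (++m->missed >= MAX_MISSED) {
        m->state = BACKUP_ACTIVE; /* deterministic takeover */
    }
    return m->state;
}
```

A real system would add hysteresis and health checks before handing control back to a recovered primary; the point here is only that the takeover bound is a fixed, analyzable number of cycles.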
Low-level embedded control systems are increasingly being targeted by adversaries, and there is a pressing need for stronger software defenses for such systems. The cyber-physical nature of such systems imposes real-time performance constraints not seen in enterprise computing systems, and these constraints fundamentally alter how software defenses should be designed and applied.
MIT Lincoln Laboratory scientists demonstrated that current randomization-based defenses, which have low average-case overhead, can incur significant worst-case overhead that may be untenable in real-time applications. Some low-overhead enforcement-based defenses, by contrast, have low worst-case performance overheads, making them more amenable to real-time applications. Such defenses should be incorporated into a comprehensive resilient architecture with a strategy for failover and timely recovery in the case of a cyber threat.
Dr. Bryan C. Ward is a technical staff member and Ryan D. Burrow is an associate staff member in the Secure Resilient Systems and Technology Group at MIT Lincoln Laboratory and conducted the evaluations to identify high technology readiness level software defenses that can be applied in real-time application domains.
This article is the first place winner in SIGNAL Media’s The Cyber Edge writing contest. The second and third place winning articles will be published in an upcoming issue of SIGNAL Magazine. ManTech International Corporation is sponsoring the competition.
For information on the 2021 The Cyber Edge Writing Contest, go to https://signal.afcea.org/TCEWritingContest
DISTRIBUTION STATEMENT A. Approved for public release: distribution unlimited. This material is based upon work supported by the Department of Defense under Air Force Contract No. FA8721-05-C-0002 and/or FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Department of Defense.