Federal AI Framework Aims at Accountability, Responsibility
The Government Accountability Office sets guidelines around four principles.
Federal organizations that plan on implementing artificial intelligence systems now have a framework for practices that addresses development, monitoring, legal and ethical issues. The Government Accountability Office (GAO) document, “Artificial Intelligence: An Accountability Framework for Federal Agencies and Other Entities,” aims to ensure responsible use of artificial intelligence (AI), including oversight.
“This report is really the first of its kind that pushes a lot of these aspirations down to the practices, procedures and questions level,” says Taka Ariga, chief data scientist for the GAO and director of its innovation lab. “It’s meant to be actionable; it’s meant to be hands-on.” He adds that this framework should be helpful for federal agencies as well as for commercial, academic and nonprofit organizations.
“For some time, we’ve known artificial intelligence as a transformative technology,” Ariga says. “Part of our desire was, looking around, to say, ‘How do we actually empirically verify not only the performance but also the societal impacts of AI?’ Where we arrived at is [that] taking that auditor’s perspective is probably the most effective way of doing that,” he offers.
He continues that many traditional AI frameworks were at a high-altitude, aspirational level. But simply saying, “Thou shalt do no harm,” does not provide the necessary guidance for the day-to-day responsibilities of program managers and data scientists. Conversations with international counterparts as well as state and local partners confirmed the need for this degree of guidelines.
“We could not have timed this any better because of all the proliferating AI mandates out there,” Ariga adds. “It’s not if; it’s when organizations such as GAO will be called upon to do these types of assessments.” He notes that this framework is not the end-all; input will lead to changes as it evolves.
Farahnaaz Khakoo-Mausel, assistant director with the GAO in the science and technology assessments and analytics team, notes there was no lack of desired principles when the GAO began this work. But there was little in the way of implementation. Being able to operationalize these principles became key as federal agencies began to mature their AI efforts.
Ariga cites two key takeaways from the report. One is to treat AI as a team sport, not just the responsibility of a data scientist or a program manager. Many of AI's issues and enablers demand multidisciplinary skill sets that cannot be vested in a single individual or group. The second takeaway is the life cycle of AI development, which is laid out in the report. Rather than examining machine learning models or specific data sets in isolation, the report follows the issue from initial design through the development and deployment of AI solutions, including continuous monitoring. “AI is quite susceptible to the concept of model drift and data drift,” he says. “So how do we not just deploy it and forget it; how do we continually go back and make sure the performance is still aligned with the original design of these systems?”
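The data drift Ariga mentions can be checked empirically. Below is a minimal sketch using the population stability index, one common drift statistic; the statistic, bin count and thresholds are illustrative choices, not something the GAO framework prescribes.

```python
import math
from collections import Counter

def population_stability_index(expected, actual, bins=10):
    """Compare two samples of a numeric feature by binning both against
    the expected sample's range and summing (a - e) * ln(a / e)."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def shares(sample):
        counts = Counter(
            min(bins - 1, max(0, int((x - lo) / width))) for x in sample
        )
        n = len(sample)
        # A small floor avoids log(0) for empty bins.
        return [max(counts.get(b, 0) / n, 1e-4) for b in range(bins)]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# A common rule of thumb: PSI < 0.1 stable, 0.1-0.25 watch, > 0.25 drifted.
train = [i / 100 for i in range(1000)]             # development data
live_shifted = [5 + i / 200 for i in range(1000)]  # operational data, mass shifted
print(population_stability_index(train, live_shifted) > 0.25)
```

Run periodically against fresh operational data, a check like this turns “deploy it and forget it” into a recurring comparison against the distribution the model was designed for.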
As agencies establish these foundational principles, use-case-specific applications are probably the next step, Khakoo-Mausel offers. In envisioning that move, planners are considering whether the procedures may need to be modified, she adds.
The framework is divided into four principles: governance, data, performance and monitoring. Khakoo-Mausel notes that the four principles are not new concepts; the planners wanted the report’s language to be accessible to the evaluation, auditing and oversight communities. Of the four, governance is key. “Without the right goals and objectives, without the right workforce and a risk management strategy, a lot of the setup that governance processes establish, really managing the data, the performance and the monitoring becomes more challenging,” she offers. “That’s the current condition,” she says, comparing the situation to relying on an existing information technology governance structure that is inappropriate for AI.
“We want to make sure we are laying out the life cycle in a way that says, ‘No matter the entry point of your AI journey, that from a monitor’s perspective we want to see some of these design considerations,’” Ariga says.
The key parts of governance are goals and objectives, a risk management strategy and communication methods for internal and external stakeholders, Khakoo-Mausel states. The governance principle comprises organizational-level governance and system-level governance; the latter largely focuses on the technical specifications that are required and must be documented. The top level is familiar in the federal space, but the system level introduces new practices in the GAO report.
For top-level, organizational-level governance, the GAO calls for entities to define clear goals, roles and responsibilities. They also should demonstrate values and principles to foster trust, build an effective workforce, engage stakeholders with diverse perspectives to mitigate risks and implement an AI-specific risk management plan.
At the system level, organizations should establish technical specifications to address important issues. Khakoo-Mausel explains that these specifications must align with existing laws, regulations, standards and guidance. Ariga points out that at the system level, the framework does not recommend whether an organization should buy AI or develop it. But, at the governance level, it wants to ensure that the data and underlying technology have audit rights that can be assessed independently. This includes ensuring that support contractors leave behind the necessary documentation so organizations can sustain and modify the capabilities going forward.
“For us, it’s really important to look at a macro-level governance versus at the individual system component level,” he states. The alignment between both levels “sets the tone with how we proceed with the rest of the AI development journey.”
Monitoring is its own framework category, even though some aspects fall under governance. Khakoo-Mausel explains that auditing standards and internal controls separate the two, with monitoring verifying that a task is achieved. She emphasizes that this is not a static process; monitoring is continuous because the model itself is evolving.
The framework calls for organizations to develop plans for continuous monitoring and to document results and corrective actions. It also requires assessing sustainment and expanded use to ensure AI relevance and to determine if and how an AI system could be scaled or expanded.
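What documenting monitoring results and corrective actions might look like in practice can be sketched minimally as follows; the record fields, metric names and thresholds here are illustrative assumptions, not GAO requirements.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class MonitoringRecord:
    """One documented check: a metric, its target and any corrective action."""
    metric: str
    value: float
    threshold: float
    corrective_action: str = ""
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    @property
    def within_bounds(self) -> bool:
        return self.value >= self.threshold

def open_findings(records):
    """Return degraded checks that still lack a documented corrective action."""
    return [r for r in records if not r.within_bounds and not r.corrective_action]

log = [
    MonitoringRecord("precision", 0.91, 0.85),
    MonitoringRecord("precision", 0.78, 0.85),  # degraded, nothing documented yet
]
print(len(open_findings(log)))  # one open finding awaiting corrective action
```

The point of the structure is auditability: each check leaves a timestamped record, and any breach without a recorded corrective action surfaces as an open finding.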
“In governance you specify what you will do. Monitoring is verifying that you actually did what you said you were going to do,” Ariga says. But he adds that the framework takes into account an element of scalability. Often, an AI journey begins as a prototype that moves into broader use if successful. Scalability is essential for moving forward, he observes.
Khakoo-Mausel adds that monitoring also may require corrective action if a process is not working out. Someone must be responsible for stopping a model if it is no longer appropriate or successful.
A third framework principle is data, and the GAO report breaks it down into two areas: data used to develop an AI model and data used to operate an AI system. Ariga offers that AI often is a collection of machine learning models, and the GAO wants to ensure that the necessary training and validation were completed at the individual machine learning model level using best practices. Again, issues include whether the model is serving its customers and whether corrective action needs to be applied. In model development, another consideration is why certain variables were chosen, whether for convenience or on mathematical grounds, for example.
Data is vital for AI system development, Ariga says. “Once you start stringing together the series of machine learning models that constitutes a system, you are feeding more operational data in to try to predict a desirable outcome as a system; you want to make sure the performance, in fact, is still consistent at the system level with what was done during individual model development,” he states. The sum of the components must also align with the original construct laid out in the governance principle.
Khakoo-Mausel emphasizes that the key audit method for AI is documentation. In today’s traditional static models, people can re-create what an agency is trying to do. But AI will not allow that, so the documentation piece is essential for determining reliability and appropriateness.
The second part of the data criteria, operating an AI system, focuses on real-time data streams in which data must be filtered or labeled correctly. Organizations must assess the interconnectivities and dependencies of these data streams to identify potential biases and also assess data security and privacy.
The demographic variables present in the operational environment often differ from those of the developmental model, Khakoo-Mausel observes. Tracking operational data can help avoid these types of bias.
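The kind of tracking Khakoo-Mausel describes can be illustrated with a minimal sketch that compares a demographic variable's proportions between development and operational data; the group labels and the tolerance are hypothetical choices, not GAO guidance.

```python
from collections import Counter

def demographic_shift(train_groups, live_groups, tolerance=0.10):
    """Report groups whose share of the data moved beyond tolerance
    between the development sample and the operational sample."""
    def shares(labels):
        n = len(labels)
        return {g: c / n for g, c in Counter(labels).items()}

    t, l = shares(train_groups), shares(live_groups)
    return {
        g: (t.get(g, 0.0), l.get(g, 0.0))
        for g in set(t) | set(l)
        if abs(t.get(g, 0.0) - l.get(g, 0.0)) > tolerance
    }

train = ["a"] * 50 + ["b"] * 50  # balanced development data
live = ["a"] * 80 + ["b"] * 20   # group "a" over-represented in operation
print(demographic_shift(train, live))
```

A shift flagged this way does not prove bias by itself, but it tells reviewers where the operational population has departed from the one the model was validated against.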
The fourth focus, performance, brings together all the results of an AI application. Describing it as a dichotomy between technical performance and issues such as bias and transparency, Ariga says the goal is to delineate performance in terms of the mathematical nature of a system versus performance with regard to the design objective.
The performance principle breaks down into the component level and the system level. For the component level, organizations should catalog model and non-model components that compose the AI system, define metrics and assess performance and output of each component. For the system level, organizations should do the same for the entire AI system. And, they should document methods for assessment, performance metrics and outcomes, identify potential biases and establish procedures for human supervision.
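The two-level cataloging described above can be sketched as a simple assessment over a component registry; the component names, metrics and thresholds are hypothetical illustrations, not values from the framework.

```python
def assess(components, system_metric, system_threshold):
    """Check each cataloged component's metric against its target, then
    check the end-to-end system metric, returning findings to document."""
    findings = []
    for name, kind, value, threshold in components:
        if value < threshold:
            findings.append(
                f"component {name} ({kind}) below target: {value} < {threshold}"
            )
    if system_metric < system_threshold:
        findings.append(
            f"system-level metric below target: {system_metric} < {system_threshold}"
        )
    return findings

catalog = [
    # (name, model or non-model, measured metric, target)
    ("ocr_model",     "model",     0.92, 0.90),
    ("entity_linker", "model",     0.81, 0.85),
    ("ingest_parser", "non-model", 0.99, 0.98),
]
for finding in assess(catalog, system_metric=0.84, system_threshold=0.88):
    print(finding)
```

Separating the two levels matters because, as the example shows, a system can fall short of its design objective even when most of its components pass their individual checks.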
The human supervision aspect is vital, Khakoo-Mausel notes. Human monitoring of existing systems can itself introduce bias, so it is important to test these processes and identify where that occurs.
Ariga notes that the publication endeavored to avoid specifying two particular elements. One is the definition of AI; the other is ethics. Instead, the framework sought to align with the ethics values articulated in the governance section of the framework. The GAO did not urge organizations to follow any specific ethics. Instead, it asked whether ethical values are articulated in an organization and how processes can be assessed against those values.
Khakoo-Mausel emphasizes that the framework was careful to support the government auditing principle that ethics is an oversight management responsibility—a commitment to a value. “This should not fall with the data scientists, the designers and the developers,” she says. “That’s really inappropriate. This really falls on a governance perspective, an organizational issue.”
She continues that the concept of appropriateness runs throughout all four principles. A series of questions addresses whether the ability to accrue data makes its use appropriate. The requirement for responsible use operationalizes ethics throughout the implementation of AI solutions, she adds.
“At the end of this … the notion of civil rights issues and legislative issues are going to be a concern,” she relates. “For that program manager, this framework sets up some foundational practices that are good practices as they move forward in their implementation.”
The GAO framework can be downloaded from https://www.gao.gov/products/gao-21-519sp.