We’ve contributed to a multi-stakeholder report by 58 co-authors at 30 organizations, including the Centre for the Future of Intelligence, Mila, Schwartz Reisman Institute for Technology and Society, Center for Advanced Study in the Behavioral Sciences, and Center for Security and Emerging Technologies. This report describes 10 mechanisms to improve the verifiability of claims made about AI systems. Developers can use these tools to provide evidence that AI systems are safe, secure, fair, or privacy-preserving. Users, policymakers, and civil society can use these tools to evaluate AI development processes.
While a growing number of organizations have articulated ethics principles to guide their AI development process, it can be difficult for those outside of an organization to verify whether the organization's AI systems reflect those principles in practice. This ambiguity makes it harder for stakeholders such as users, policymakers, and civil society to scrutinize AI developers' claims about properties of AI systems and could fuel competitive corner-cutting, increasing social risks and harms. The report describes existing and potential mechanisms that can help stakeholders grapple with questions like:
- Can I (as a user) verify the claims made about the level of privacy protection guaranteed by a new AI system I’d like to use for machine translation of sensitive documents?
- Can I (as a regulator) trace the steps that led to an accident caused by an autonomous vehicle? Against what standards should an autonomous vehicle company’s safety claims be compared?
- Can I (as an academic) conduct impartial research on the risks associated with large-scale AI systems when I lack the computing resources of industry?
- Can I (as an AI developer) verify that my competitors in a given area of AI development will follow best practices rather than cut corners to gain an advantage?
The 10 mechanisms highlighted in the report are listed below, along with recommendations aimed at advancing each one. (See the report for discussion of how these mechanisms support verifiable claims as well as relevant caveats about our findings.)
Institutional Mechanisms and Recommendations
- Third party auditing. A coalition of stakeholders should create a task force to research options for conducting and funding third party auditing of AI systems.
- Red teaming exercises. Organizations developing AI should run red teaming exercises to explore risks associated with systems they develop, and should share best practices and tools.
- Bias and safety bounties. AI developers should pilot bias and safety bounties for AI systems to strengthen incentives and processes for broad-based scrutiny of AI systems.
- Sharing of AI incidents. AI developers should share more information about AI incidents, including through collaborative channels.
Software Mechanisms and Recommendations
- Audit trails. Standard setting bodies should work with academia and industry to develop audit trail requirements for safety-critical applications of AI systems.
- Interpretability. Organizations developing AI and funding bodies should support research into the interpretability of AI systems, with a focus on supporting risk assessment and auditing.
- Privacy-preserving machine learning. AI developers should develop, share, and use suites of tools for privacy-preserving machine learning that include measures of performance against common standards.
Hardware Mechanisms and Recommendations
- Secure hardware for machine learning. Industry and academia should work together to develop hardware security features for AI accelerators or otherwise establish best practices for the use of secure hardware (including secure enclaves on commodity hardware) in machine learning contexts.
- High-precision compute measurement. One or more AI labs should estimate the computing power involved in a single project in great detail and report on lessons learned regarding the potential for wider adoption of such methods.
- Compute support for academia. Government funding bodies should substantially increase funding for computing power resources for researchers in academia, in order to improve the ability of those researchers to verify claims made by industry.
We and our co-authors will be doing further research on these mechanisms and OpenAI will be looking to adopt several of these mechanisms in the future. We hope that this report inspires meaningful dialogue, and we are eager to discuss additional institutional, software, and hardware mechanisms that could be useful in enabling trustworthy AI development. We encourage anyone interested in collaborating on these issues to connect with the corresponding authors and visit the report website.