HackerOne Joins Call for Action on AI Third-Party Evaluation and Flaw Disclosure

Ilona Cohen
Chief Legal and Policy Officer
Dane Sherrets
Staff Innovations Architect
New paper validates importance of research

HackerOne has long partnered with the security and AI research communities to advocate for stronger legal protections for independent researchers. To share lessons learned from our work in AI evaluation, we participated in a recent workshop led by Stanford, MIT, and Princeton, where experts from academia, government, and industry gathered to examine the future of AI evaluation. HackerOne highlighted the need for independent scrutiny of AI systems and the potential for the vulnerability management models used in cybersecurity to be adapted for AI security and other flaw reporting.

A newly released paper, “A Call to Action: In-House Evaluation Is Not Enough. Towards Robust Third-Party Flaw Disclosure for General-Purpose AI,” builds on these conversations and presents a critical call for change in how we assess and address the risks associated with general-purpose AI (GPAI) systems. We joined fellow co-authors with backgrounds in fields such as software security, machine learning, law, and policy to underscore the pressing need for third-party evaluations and a coordinated flaw disclosure system to improve AI security and protect AI researchers.

The Growing Risk Landscape for AI Systems

General-purpose AI systems like GPT models are now embedded in industries from healthcare to finance. As the paper highlights, the rapid deployment of GPAI systems brings unprecedented challenges, from unintended outcomes to complex security vulnerabilities. These risks compound as AI systems scale, where even small flaws can have significant consequences.

A key factor in addressing these risks is increasing transparency around AI development and evaluation. In-house evaluations, while common, can miss critical flaws, and AI evaluation infrastructure is still evolving. Independent third-party evaluations provide broader and more continuous scrutiny, helping identify risks that might otherwise go unnoticed. Vulnerability disclosure programs (VDPs) have proven effective at surfacing and resolving flaws in software; adopting a similar “default to disclosure” approach for flaws identified by independent evaluators of AI systems can improve security and risk management and help ensure that flaws are quickly recognized and fixed. Encouraging transparency around AI models and flaws, along with supporting rigorous third-party evaluations, enables stakeholders to proactively identify and address risks, benefiting the entire AI ecosystem.

The Need for Third-Party Evaluation

Third-party evaluations are critical for uncovering risks missed by internal teams. Several workshop participants noted that third-party evaluators bring an essential diversity of perspectives, often uncovering vulnerabilities missed by developers closely tied to the system. By drawing on a wider range of expertise, these evaluations are crucial to the ongoing security and trustworthiness of AI systems. However, the lack of clarity around legal protections for good faith AI research remains a significant barrier. As noted in the paper, broad interpretations of providers’ terms of service, which often prohibit practices like “reverse engineering” or “automatic data collection,” can deter researchers from identifying critical vulnerabilities. HackerOne has been a leader in advocating for good faith AI researcher protections, including supporting an exemption from copyright law restrictions for good faith AI research and urging the U.S. Department of Justice to extend protections for good faith security research to cover good faith AI research as well.

At HackerOne, we’ve seen the value of external evaluations in cybersecurity through our bug bounty programs, where independent researchers help identify hidden vulnerabilities. A similar model can be applied to AI systems, ensuring continuous scrutiny and faster flaw detection, as described in the paper. Flaw bounty programs, like the program HackerOne supported with Anthropic, offer financial incentives to encourage proactive flaw identification. These programs improve AI resilience and provide a valuable mechanism for collaboration on reducing flaws.

Coordinated Flaw Disclosure: The Path Forward 

Our research stresses the need for coordinated flaw disclosures with a safe harbor, noting that the interconnected nature of GPAI systems means flaws can transfer across platforms. A unified disclosure process is essential, with standardized AI flaw reports and clear engagement rules to ensure timely, transparent resolution that protects third-party researchers. 
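To make that concrete, the sketch below shows one way a standardized AI flaw report could be structured as a record that travels with a disclosure. This is a hypothetical illustration in Python, not the report format defined in the paper; every field name (such as affected_model, reproduction_steps, and disclosure_deadline) is an assumption made for the sake of example.

from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class AIFlawReport:
    """Hypothetical structured record for a coordinated AI flaw disclosure.

    Field names are illustrative assumptions, not the schema proposed in the paper.
    """
    reporter: str                      # researcher or organization filing the report
    affected_model: str                # model or system identifier
    affected_versions: List[str]       # versions observed to exhibit the flaw
    flaw_description: str              # plain-language summary of the flaw and its impact
    reproduction_steps: List[str]      # prompts, inputs, or configurations needed to reproduce
    severity: str                      # e.g. "low", "medium", "high", "critical"
    potential_transferability: bool    # whether the flaw may carry over to other GPAI systems
    disclosure_deadline: Optional[date] = None            # agreed date for public disclosure
    mitigations: List[str] = field(default_factory=list)  # suggested or applied fixes


# Example: a report a researcher might file through a coordinated disclosure channel.
report = AIFlawReport(
    reporter="independent-researcher-42",
    affected_model="example-gpai-1",
    affected_versions=["2025.03"],
    flaw_description="Safety filter can be bypassed with a specific multi-turn prompt pattern.",
    reproduction_steps=[
        "Start a new session",
        "Send the crafted prompt sequence",
        "Observe the unfiltered output",
    ],
    severity="high",
    potential_transferability=True,
    disclosure_deadline=date(2025, 6, 1),
)
print(report.affected_model, report.severity)

Because flaws in one GPAI system can carry over to others built on similar models, a field flagging potential transferability would help downstream providers triage the same report.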

This approach aligns with VDPs, where ethical hackers are provided a secure, authorized channel to report vulnerabilities. At HackerOne, we’ve seen how VDPs foster collaboration and quick remediation. Applying similar models to AI systems can enhance security and reduce unintended outcomes for all stakeholders. The paper further suggests creating a centralized disclosure hub to aggregate and analyze AI flaws, which would promote transparency, enable sharing across organizations, and enhance the overall security of AI models.

Strengthening AI Through Third-Party Evaluation

Our paper highlights the critical need for third-party evaluations to catch AI flaws that internal teams might miss. Without that outside scrutiny, we risk overlooking flaws with far-reaching consequences.

By adopting third-party evaluations and coordinated flaw disclosure, we can better ensure AI security and reduce unintended outcomes. HackerOne has been at the forefront of AI evaluation, as demonstrated through our AI red teaming efforts with organizations like Snapchat and Anthropic.