Technology 2026-02-20 5 min read

Study: Only 4 of 30 Leading AI Agents Publish Agent-Specific Safety Documents

AI Agent Index finds browser agents have 64% of safety fields unreported, while only 4 of 13 frontier-autonomy systems disclose any agent-level evaluation

Every major AI developer publishes safety commitments. Most have ethics frameworks, responsible AI principles, and documentation about the underlying language models powering their products. What very few publish is evidence of safety evaluation for the AI agents built on top of those models - the chatbots, browser automation tools, and enterprise workflow systems that millions of people now use daily to write emails, book travel, manage invoices, and browse the web on their behalf.

The AI Agent Index, an ongoing transparency project involving researchers from the University of Cambridge, MIT, Stanford, Harvard Law School, the University of Pennsylvania, the University of Washington, the Hebrew University of Jerusalem, and Concordia AI, has produced a systematic assessment of 30 leading AI agents. Its latest update, covering public information available through the end of 2025, finds a significant gap between what developers say about AI safety and what they demonstrate about the specific agents they deploy.

The Numbers

Of the 30 agents examined, only four have published agent-specific system cards - the formal safety and evaluation documents that detail autonomy levels, behavioral boundaries, real-world risk analysis, and guardrails specific to that agent's design and deployment context. Those four are ChatGPT Agent, OpenAI Codex, Claude Code, and Gemini 2.5 Computer Use.

The picture is similar across other safety-related disclosures. Twenty-five of the 30 agents do not publish internal safety results. Twenty-three do not share data from third-party testing. Known security incidents or concerns have been documented for only five agents. Prompt injection vulnerabilities - cases where malicious content in a webpage or document can manipulate an agent into ignoring its safety constraints and executing unintended commands - have been documented for two of those five.

The Index assessed safety-related fields including guardrails, sandboxing, risk evaluation, safety evaluations, third-party testing, benchmarks, bug bounty programs, and records of known incidents. Browser agents - tools that autonomously navigate websites, click buttons, fill in forms, and make purchases on users' behalf - had 64% of these fields unreported. Enterprise agents, designed for business workflow automation, had 63% missing. Chat agents were the most transparent category, but still had 43% of safety fields unreported.

Why Agent-Specific Safety Data Matters

Leon Staufer, an MPhil researcher at Cambridge's Leverhulme Centre for the Future of Intelligence who leads the Index update, draws a distinction that cuts to the heart of the transparency problem. Model safety - whether a language model produces harmful text in test conditions - is different from agent safety, which emerges from how that model is embedded in a system with memory, tool access, planning capabilities, and real-world action authority.

An agent that can execute purchases, access email accounts, fill in government forms, or scrape web content can cause consequences that a chatbot answering questions cannot. A safety regression in the underlying language model might be caught by the model developer's testing. A safety flaw in the agent's planning logic, its memory management, or its handling of adversarial web content might not surface until it is exploited in deployment.

The Index identified 13 agents operating at what it classifies as "frontier levels" of autonomy - levels 4 and 5 on a five-point scale, where the user's role shifts from active director to approver or observer. Of those 13 high-autonomy agents, only four disclose any safety evaluations specific to the agent itself.

The Browser Agent Problem

Browser agents present a specific and growing concern. Several of the products reviewed are designed to operate across the open web, navigating arbitrary websites and interacting with services ranging from e-commerce platforms to government portals. At least six agents in the Index explicitly use code and IP address techniques designed to mimic human browsing behavior and bypass anti-bot protections - raising questions about legality, user disclosure, and the practical security posture of systems that operate by deliberately resembling humans.

The Index's case study on Perplexity Comet illustrates the issue concretely. Comet is marketed as an autonomous assistant that works "just like a human assistant". Amazon has threatened legal action against Comet for failing to identify itself as an AI agent when interacting with Amazon's services. The case makes visible a broader problem: 21 of the 30 agents in the Index have no documented default behavior for disclosing their AI nature to the websites and services they interact with. Only three support watermarking of AI-generated media.

Security researchers documented last year that malicious content on a webpage could hijack certain browser agents into executing unauthorized commands. Other attacks have extracted private user data from connected services. These are not theoretical scenarios; they are documented incidents for systems that tens of thousands of people use to manage consequential real-world tasks.

Platform Concentration

The Index also flags a structural risk in how the agent ecosystem is built. Outside of Chinese AI agents, almost all of the 30 reviewed systems depend on one of a small number of foundation models - primarily GPT, Claude, and Gemini. A pricing change, outage, or safety regression in any of those models would cascade across the agents built on top of it. This concentration creates both systemic fragility and, as Staufer notes, potential opportunities for centralized safety monitoring - though those opportunities are not currently being formalized.

Of the five Chinese AI agents reviewed, only one had published any safety frameworks or compliance standards. The study does not draw regulatory conclusions but notes the transparency pattern as a data point in a broader global picture of AI governance development.

What Accountability Looks Like

The Index is descriptive rather than prescriptive, but its framework implies what agent-level accountability would require: published system cards that address the agent's specific capabilities and risk profile, internal safety testing results, third-party audit data, and clear disclosure protocols for agent behavior on the web. No regulatory body currently mandates these disclosures for AI agents in major markets, and the Index's findings suggest that voluntary compliance is far from the norm.

The timing of the study matters. AI agents are not experimental systems confined to research environments; they are products used at scale by consumers and businesses for real-world tasks with real-world consequences. The gap the Index documents - between what developers say about safety and what they demonstrate - is not a future risk but a present one. Security incidents involving browser agents have already been documented. More will follow as these systems become more capable and more widely deployed.

Staufer's final assessment of the latest Index data is direct: the pace of AI agent deployment is outrunning the pace of safety evaluation. The transparency and governance frameworks needed to manage systems that can act autonomously in the real world are, in his characterization, dangerously behind where they need to be. The data in the Index does not contradict that conclusion. What the Index provides, for the first time at this level of detail, is a systematic public record of what those 30 leading systems actually disclose - a baseline against which future disclosures, or the lack of them, can be measured.

Source: AI Agent Index annual update, published February 2026. Lead author: Leon Staufer, Leverhulme Centre for the Future of Intelligence, University of Cambridge. Collaborating institutions: University of Cambridge, University of Washington, Harvard Law School, Stanford University, Concordia AI, University of Pennsylvania, MIT, Hebrew University of Jerusalem. Index covers 30 AI agents with data through December 31, 2025.