97,127 hosts discovered.
One benchmark emerged.
OASB is the open standard for AI agent security: compliance controls, behavioral governance, and a reproducible tool evaluation, built from real-world data.
npx hackmyagent secure --benchmark oasb-1
Internet-wide scan data
The current state of AI agent security
HackMyAgent scanned the public internet for exposed AI agent infrastructure. The results informed which OASB controls matter most.
97,127
Hosts discovered
11,192
Hosts scanned
1,594
Vulnerable
1,190
CLAUDE.md exposed
645
MCP tools exposed
5,042
Outdated endpoints
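To put the counts above in proportion, the following is a minimal sketch that turns them into percentages. It assumes each finding count is a subset of the 11,192 scanned hosts, which the page does not state explicitly; the names `share` and `FINDINGS` are illustrative and not part of any OASB tool.

```python
# Counts copied from the scan summary above.
HOSTS_DISCOVERED = 97_127
HOSTS_SCANNED = 11_192

# ASSUMPTION: every finding count is a subset of the scanned hosts.
# The page does not confirm the denominator, so treat the shares as rough.
FINDINGS = {
    "Vulnerable": 1_594,
    "CLAUDE.md exposed": 1_190,
    "MCP tools exposed": 645,
    "Outdated endpoints": 5_042,
}


def share(count: int, total: int) -> float:
    """Return count / total as a fraction, rejecting a non-positive total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count / total


print(f"Scanned of discovered: {share(HOSTS_SCANNED, HOSTS_DISCOVERED):.1%}")
for label, count in FINDINGS.items():
    print(f"{label}: {share(count, HOSTS_SCANNED):.1%} of scanned hosts")
```

If the counts use a different denominator (for example, endpoints rather than hosts), only the second argument to `share` needs to change.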
Specifications
Three measurement systems
Check agent compliance
CIS Benchmarks for AI agents. Answers: is your agent secure?
Govern agent behavior
Behavioral governance. Answers: does your agent behave correctly?
Evaluate security tools
MITRE ATT&CK Evaluations. Answers: does your EDR catch this?
Security controls
46 controls across 10 categories
Open-source toolkit
Every control maps to a free tool
Scan, fix, and verify compliance without vendor lock-in. All tools available at opena2a.org.
HackMyAgent
238 security checks + attack simulation
npx hackmyagent secure
Secretless AI
Credential protection for AI tools
npx secretless-ai init
AIM
Cryptographic identity and trust scoring
opena2a identity create
Browser Guard
Detect and control browser-based AI agents
Chrome Web Store
DVAA
Vulnerable AI agent for security training
docker compose up
OpenA2A CLI
Orchestrates all tools from one command
npx opena2a-cli review
Reference results / scanner leaderboard
82.9% F1 on ground-truth labeled attacks.
4,245 labeled samples, 9 attack categories. The HMA full pipeline scores 82.9% F1 at a 1.16% false-positive rate (82.6% recall). The verdict counts attack findings, not posture: wildcard tool access, which thousands of benign MCP servers declare, is surfaced but excluded from the malicious decision.
| # | Adapter | F1 | Precision | Recall | FPR |
|---|---|---|---|---|---|
| 1 | HMA Full Pipeline (reference) | 82.9% | 83.2% | 82.6% | 1.16% |
| 2 | HMA Static (regex) (reference) | 67.5% | 99.3% | 51.1% | 0.03% |
| 3 | NanoMind TME v0.5.0 (ablation) | 14.0% | 7.5% | 93.0% | 79.2% |
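The F1 column is the harmonic mean of precision and recall, so it can be rechecked from the other two columns. The sketch below does that with the rounded values from the table; the function name `f1_score` and the `LEADERBOARD` dictionary are illustrative, and this is not the benchmark's own scoring code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # no true positives at all
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the leaderboard, as fractions.
# These are rounded published values, so results can differ in the last digit.
LEADERBOARD = {
    "HMA Full Pipeline": (0.832, 0.826),
    "HMA Static (regex)": (0.993, 0.511),
    "NanoMind TME v0.5.0 (ablation)": (0.075, 0.930),
}

for adapter, (precision, recall) in LEADERBOARD.items():
    print(f"{adapter}: F1 = {f1_score(precision, recall):.1%}")
```

Recomputing from the rounded inputs reproduces 82.9% and 67.5% for the first two rows and gives roughly 13.9% for the third, which is within rounding of the published 14.0%. The third row also shows why F1 is reported alongside FPR: very high recall does not compensate for 7.5% precision.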
Verify your agent's security
Run the benchmark against your AI agent. Read the docs for CI/CD integration.
npx hackmyagent secure --benchmark oasb-1
Adopt and contribute
Open standard / Apache-2.0
OASB is developed in the open and welcomes co-authors and adopters. Run the benchmark against your product, submit results, or propose new controls and scenarios.
Submit an adapter
Implement the SecurityProductAdapter interface and run the same scorecard. Independent submissions are shown alongside the reference adapter.
Adapter interface
Map to the controls
Assess your agent against the 46 OASB-1 compliance controls and the 72 OASB-2 governance controls.
View the controls
Propose changes
Open an issue or pull request to refine the controls, scenarios, or scoring methodology.
OASB on GitHub
Cite this standard
v0.3.2
BibTeX
@misc{oasb2026,
title = {OASB: Open Agent Security Benchmark},
author = {OpenA2A},
year = {2026},
version = {0.3.2},
url = {https://oasb.ai}
}