OASB
Jun 5 | OASB Eval re-measured on the v2.0 dataset: 82.9% F1 at a 1.16% false-positive rate
OASB / Open Agent Security Benchmark · v0.3.2 · Apache-2.0 · OpenA2A

97,127 hosts discovered.
One benchmark emerged.

OASB is the open standard for AI agent security: compliance controls, behavioral governance, and a reproducible tool evaluation, built from real-world data.

$ npx hackmyagent secure --benchmark oasb-1

Status: Open standard
License: Apache-2.0
Maintainer: OpenA2A
Version: 0.3.2

[Diagram labels: AI Agent; Identity, Authorization, Input, Output, Credentials, Supply Chain, Agent-to-Agent, Memory, Operations, Monitoring; L1, L2, L3]

L1 Essential
L2 Standard
L3 Hardened

Internet-wide scan data

The current state of AI agent security

HackMyAgent scanned the public internet for exposed AI agent infrastructure. The results informed which OASB controls matter most.

Hosts discovered: 97,127
Hosts scanned: 11,192
Vulnerable: 1,594
CLAUDE.md exposed: 1,190
MCP tools exposed: 645
Outdated endpoints: 5,042

Read the full research report

Reference results / scanner leaderboard

82.9% F1 on ground-truth labeled attacks.

The dataset contains 4,245 labeled samples across 9 attack categories. The HMA full pipeline scores 82.9% F1 at a 1.16% false-positive rate (82.6% recall). The verdict counts attack findings, not posture: wildcard tool access, which thousands of benign MCP servers declare, is still reported but does not by itself mark a sample as malicious.
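
The attack-versus-posture distinction can be illustrated with a minimal sketch. This is not HackMyAgent's implementation: the `Finding` type, the `kind` values, and the rule names are all hypothetical, chosen only to show how a posture finding can be reported without deciding the verdict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    rule: str  # hypothetical rule identifier
    kind: str  # "attack" or "posture" (an assumed two-way split)


def verdict(findings: list[Finding]) -> str:
    """Return "malicious" only when at least one attack finding is present.

    Posture findings stay in the report but never decide the verdict.
    """
    if any(f.kind == "attack" for f in findings):
        return "malicious"
    return "benign"


# A server that only declares wildcard tool access (posture).
wildcard_only = [Finding("wildcard-tool-access", "posture")]
# The same server with an additional attack finding.
with_injection = wildcard_only + [Finding("prompt-injection", "attack")]

print(verdict(wildcard_only))   # benign
print(verdict(with_injection))  # malicious
```

Keeping posture out of the decision is what would stop the many benign servers that declare wildcard access from being counted as false positives.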

Disclosure: OpenA2A authors OASB. HackMyAgent is the reference adapter, shown transparently and not ranked above independent submissions.
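
| # | Adapter | F1 | Precision | Recall | FPR |
|---|---------|----|-----------|--------|-----|
| 1 | HMA Full Pipeline (reference) | 82.9% | 83.2% | 82.6% | 1.16% |
| 2 | HMA Static (regex) (reference) | 67.5% | 99.3% | 51.1% | 0.03% |
| 3 | NanoMind TME v0.5.0 (ablation) | 14.0% | 7.5% | 93.0% | 79.2% |

Full leaderboard and methodology
Dataset: v2.0 / June 2026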
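
To see how the leaderboard columns relate, the sketch below recomputes F1 as the harmonic mean of precision and recall, using the published figures as inputs. The function and variable names are illustrative and not part of OASB. The first two rows reproduce exactly; the third comes out at 13.9% rather than the published 14.0%, presumably because the published precision and recall are themselves rounded.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units in, same units out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, published F1 %) copied from the leaderboard.
rows = {
    "HMA Full Pipeline": (83.2, 82.6, 82.9),
    "HMA Static (regex)": (99.3, 51.1, 67.5),
    "NanoMind TME v0.5.0 (ablation)": (7.5, 93.0, 14.0),
}

for name, (precision, recall, published) in rows.items():
    computed = f1_score(precision, recall)
    print(f"{name}: computed {computed:.1f}%, published {published}%")
```

The third row shows why recall alone is a poor yardstick: 93.0% recall paired with 7.5% precision gives a low F1, because the harmonic mean is dominated by the smaller of the two values.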

Your security team will ask what standard you are using.

Send them here.

OASB Eval

Verify your agent's security

Run the benchmark against your AI agent. Read the docs for CI/CD integration.

$ npx hackmyagent secure --benchmark oasb-1

Adopt and contribute

Open standard / Apache-2.0

OASB is developed in the open and welcomes co-authors and adopters. Run the benchmark against your product, submit results, or propose new controls and scenarios.

01 Submit an adapter

Implement the SecurityProductAdapter interface and run the same scorecard. Independent submissions are shown alongside the reference adapter.

Adapter interface

02 Map to the controls

Assess your agent against the 46 OASB-1 compliance controls and the 72 OASB-2 governance controls.

View the controls

03 Propose changes

Open an issue or pull request to refine the controls, scenarios, or scoring methodology.

OASB on GitHub

Cite this standard


Plain

OpenA2A. (2026). OASB: Open Agent Security Benchmark (v0.3.2). https://oasb.ai

BibTeX

@misc{oasb2026,
  title   = {OASB: Open Agent Security Benchmark},
  author  = {OpenA2A},
  year    = {2026},
  version = {0.3.2},
  url     = {https://oasb.ai}
}