Jun 5 | OASB Eval re-measured on the v2.0 dataset: 82.9% F1 at a 1.16% false-positive rate

OASB / Open Agent Security Benchmark · v0.3.2 · Apache-2.0 · OpenA2A

97,127 hosts scanned.
One benchmark emerged.

OASB is the open standard for AI agent security: compliance controls, behavioral governance, and a reproducible tool evaluation, built from real-world data.

$ npx hackmyagent secure --benchmark oasb-1

View the 46 controls

Status: Open standard

License: Apache-2.0

Maintainer: OpenA2A

Version: 0.3.2

L1 Essential

L2 Standard

L3 Hardened

Internet-wide scan data

The current state of AI agent security

HackMyAgent scanned the public internet for exposed AI agent infrastructure. The results informed which OASB controls matter most.

Hosts discovered: 97,127

Hosts scanned: 11,192

Vulnerable: 1,594

CLAUDE.md exposed: 1,190

MCP tools exposed: 645

Outdated endpoints: 5,042
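To put the scan counts in proportion, the short Python sketch below derives two ratios from the figures above. It assumes the vulnerable count is a subset of the scanned hosts, which the page implies but does not state. The remaining counts (CLAUDE.md, MCP tools, outdated endpoints) are left out because their denominators are not given.

```python
# Counts copied from the scan summary above.
HOSTS_DISCOVERED = 97_127
HOSTS_SCANNED = 11_192
VULNERABLE = 1_594


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of discovered hosts that were actually scanned.
scan_coverage = pct(HOSTS_SCANNED, HOSTS_DISCOVERED)  # 11.5

# Assumption: every vulnerable host is one of the scanned hosts.
vulnerable_rate = pct(VULNERABLE, HOSTS_SCANNED)  # 14.2

print(f"Scanned {scan_coverage}% of discovered hosts")
print(f"{vulnerable_rate}% of scanned hosts were vulnerable")
```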

Read the full research report

Specifications

Three measurement systems

§1

OASB-1 · Compliance

Check agent compliance

The equivalent of CIS Benchmarks for AI agents. Answers: is your agent secure?

46 controls / 10 categories / L1 L2 L3

§2

OASB-2 · Governance

Govern agent behavior

Behavioral governance. Answers: does your agent behave correctly?

72 controls / 9 domains / 4 tiers

§3

OASB Eval · Evaluation

Evaluate security tools

The equivalent of MITRE ATT&CK Evaluations for AI agent security tools. Answers: does your EDR catch this?

222 scenarios / 15 ATLAS techniques

Security controls

46 controls across 10 categories

01 Identity & Provenance: 4 controls

02 Capability & Authorization

05 Credential Protection: 5 controls

06 Supply Chain Integrity: 5 controls

07 Agent-to-Agent Security: 4 controls

08 Memory & Context Integrity: 4 controls

09 Operational Security: 5 controls

10 Monitoring & Response: 5 controls

View all 46 controls

Open-source toolkit

Every control maps to a free tool

Scan, fix, and verify compliance without vendor lock-in. All tools available at opena2a.org.

HackMyAgent

238 security checks + attack simulation

Input · Output · Supply Chain · Memory · Operations

npx hackmyagent secure

Secretless AI

Credential protection for AI tools

Credential Protection

npx secretless-ai init

AIM

Cryptographic identity and trust scoring

Identity · Authorization · Agent-to-Agent · Monitoring

opena2a identity create

Browser Guard

Detect and control browser-based AI agents

Detection · Browser Security

Chrome Web Store

DVAA

Vulnerable AI agent for security training

All 10 categories

docker compose up

OpenA2A CLI

Orchestrates all tools from one command

All 10 categories

npx opena2a-cli review

Reference results / scanner leaderboard

82.9% F1 on ground-truth labeled attacks.

The dataset contains 4,245 labeled samples across 9 attack categories. The HackMyAgent (HMA) full pipeline scores 82.9% F1 at a 1.16% false-positive rate (82.6% recall). The verdict counts attack findings, not posture: wildcard tool access, which thousands of benign MCP servers declare, is surfaced but excluded from the malicious decision.

Disclosure: OpenA2A authors OASB. HackMyAgent is the reference adapter, shown transparently and not ranked above independent submissions.

#	Adapter	F1	Precision	Recall	FPR
1	HMA Full Pipeline (reference)	82.9%	83.2%	82.6%	1.16%
2	HMA Static (regex) (reference)	67.5%	99.3%	51.1%	0.03%
3	NanoMind TME v0.5.0 (ablation)	14.0%	7.5%	93.0%	79.2%
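F1 is the harmonic mean of precision and recall, so the F1 column can be re-derived from the other two. The sketch below does that in Python using the rounded percentages from the table. The function name `f1_score` and the `rows` structure are illustrative and not part of any OASB tooling.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (adapter, precision, recall) taken from the leaderboard table above.
rows = [
    ("HMA Full Pipeline", 0.832, 0.826),
    ("HMA Static (regex)", 0.993, 0.511),
    ("NanoMind TME v0.5.0 (ablation)", 0.075, 0.930),
]

for name, precision, recall in rows:
    f1_pct = round(100 * f1_score(precision, recall), 1)
    print(f"{name}: F1 = {f1_pct}%")
# Recomputed values: 82.9, 67.5 and 13.9 respectively.
```

The third row comes out at 13.9% rather than the published 14.0%, presumably because the precision and recall shown in the table are themselves rounded. The same row shows why F1 is reported instead of recall alone: 93.0% recall paired with 7.5% precision still yields a low score.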

Full leaderboard and methodology

Dataset: v2.0 / June 2026

Your security team will ask what standard you are using.

Send them here.

OASB Eval

Verify your agent's security

Run the benchmark against your AI agent. Read the docs for CI/CD integration.

$ npx hackmyagent secure --benchmark oasb-1

Adopt and contribute

Open standard / Apache-2.0

OASB is developed in the open and welcomes co-authors and adopters. Run the benchmark against your product, submit results, or propose new controls and scenarios.

Submit an adapter

Implement the SecurityProductAdapter interface and run the same scorecard. Independent submissions are shown alongside the reference adapter.

Adapter interface

Map to the controls

Assess your agent against the 46 OASB-1 compliance controls and the 72 OASB-2 governance controls.

View the controls

Propose changes

Open an issue or pull request to refine the controls, scenarios, or scoring methodology.

OASB on GitHub

Cite this standard

v0.3.2

Plain

OpenA2A. (2026). OASB: Open Agent Security Benchmark (v0.3.2). https://oasb.ai

BibTeX

@misc{oasb2026,
  title   = {OASB: Open Agent Security Benchmark},
  author  = {OpenA2A},
  year    = {2026},
  version = {0.3.2},
  url     = {https://oasb.ai}
}
