Ward is security review infrastructure. Every finding ships with a deterministic evidence chain, a calibrated probability, and a signed witness — replayable bit-for-bit and verifiable by a third party without trusting Ward. Runs in your environment.
We ran Ward and the tools teams most commonly rely on over the same corpus of 3,408 entries grounded in historical CVEs across five ecosystems. A finding only counts when it flags the vulnerable code targeted by the patch and disappears on the fix commit.
▸ Paired scoring: a finding is “real” only if present on the vulnerable commit, localized to the code the patch fixed, and absent on the fix commit. CodeQL numbers are withheld until the full-corpus rerun completes cleanly enough to publish a reproducible figure under the same harness.
A typical scanner emits an alert and a severity. Ward emits a deterministic record: the source-to-sink trace, the calibrated probability, the signed evidence chain that produced the decision, and a hermetic capsule a third party can replay bit-for-bit. Verifiable, not trust-me.
Ward treats a finding as a verifiable record, not a flag in a dashboard. The pipeline is deterministic, the evidence chain is signed, and the inputs and tools are pinned, so anyone, including a skeptic, has everything needed to reproduce the result.
The scanner remains the base layer. On top of it, Ward runs an investigation loop that carries each finding from candidate signal to a decision a reviewer can defend — with the trace, repro, and provenance attached.
Ward reasons across files to surface vulnerable flows that single-file pattern matching often misses.
Investigation runs inside a capability-restricted sandbox. Every step — model call, repro execution, patch attempt — is captured in a signed witness with provenance pinned to specific tool and model versions. Reproducible, auditable, attributable.
Each finding carries an explicit grade. The block / warn / allow decision is driven by a versioned loss function, not a static threshold — auditable and policy-controlled.
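To make the contrast with a static threshold concrete, here is a minimal sketch of a decision driven by expected loss. The cost values, the `LOSS_V1` name, and the `decide` function are illustrative assumptions, not Ward's actual policy or API; the idea is only that the action with the lowest expected cost under the calibrated probability wins.

```python
# Hypothetical versioned loss table: cost of each action given the true state.
# All numbers here are invented for illustration.
LOSS_V1 = {
    "block": {"vulnerable": 0.0, "benign": 5.0},    # blocking benign code costs developer time
    "warn": {"vulnerable": 20.0, "benign": 1.0},    # a warning may be ignored
    "allow": {"vulnerable": 100.0, "benign": 0.0},  # shipping a real bug is the worst case
}


def decide(p_vulnerable: float, loss: dict = LOSS_V1) -> str:
    """Return the action with the lowest expected loss for a calibrated probability."""

    def expected(action: str) -> float:
        costs = loss[action]
        return p_vulnerable * costs["vulnerable"] + (1.0 - p_vulnerable) * costs["benign"]

    return min(loss, key=expected)
```

Under this assumed table, `decide(0.9)` returns `"block"`, `decide(0.1)` returns `"warn"`, and `decide(0.01)` returns `"allow"`. Changing policy means publishing a new loss table version rather than nudging a threshold, which is what makes the decision auditable.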
The benchmark matters. So do its limits. Here’s what we count, what we compare, and where the current pre-release claims stop.
For each CVE we have a repo and two SHAs: vuln_sha (the commit the CVE was filed against) and fix_sha (the merge that closed it). We run the scanner on both and call the finding “real” only if it fires on vuln_sha at a location whose scope includes the code the patch fixed, and does not fire on fix_sha. Any other pattern is not credited. Raw alert counts across scanners aren’t comparable; paired scoring is.
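The paired rule above can be sketched in a few lines. This is an illustration under stated assumptions, not the published harness: the `Finding` and `PatchedRegion` types are hypothetical, "scope includes the patched code" is simplified to line-range overlap in the same file, and a finding is treated as still firing on `fix_sha` if the same rule fires in the same file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    rule_id: str
    path: str
    start_line: int
    end_line: int


@dataclass(frozen=True)
class PatchedRegion:
    path: str
    start_line: int
    end_line: int


def overlaps(finding: Finding, region: PatchedRegion) -> bool:
    """True when the finding's line range intersects the patched range in the same file."""
    return (
        finding.path == region.path
        and finding.start_line <= region.end_line
        and region.start_line <= finding.end_line
    )


def is_credited(finding: Finding, fix_findings: list, patched: list) -> bool:
    """Credit a vuln_sha finding only if it is localized to the patch and gone on fix_sha."""
    localized = any(overlaps(finding, region) for region in patched)
    # Assumption: match across commits by rule and file, since line numbers shift after a patch.
    still_fires = any(
        f.rule_id == finding.rule_id and f.path == finding.path for f in fix_findings
    )
    return localized and not still_fires
```

A finding that fires on both commits, or that fires only in code the patch never touched, earns no credit under this rule, which is why raw alert volume does not help a scanner's score.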
Ward is not just a static analyzer. Static analysis is the entry point, not the whole story. Ward is verifiable security review infrastructure: deterministic scanner output, signed evidence chains, hermetic replay capsules, calibrated probabilities, and a sandboxed investigation layer on top. The scanner is farther along than the investigation layer today.
The static pipeline runs entirely in your environment with no external calls. The investigation layer can use pinned model providers; a fully self-hosted variant for air-gapped deployments is on the near roadmap. In every mode, code only enters declared, sandboxed surfaces, and every model interaction is captured in the signed witness chain.
Each run produces a hermetic replay capsule and a signed evidence chain anchored to an external transparency log. A third party can re-derive the same findings from the capsule and verify the attestations independently — no Ward cooperation required. Anchor staleness and verification failures explicitly downgrade the run’s trust state.
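As a rough illustration of why a third party can check an evidence chain without cooperation, here is a generic hash-chain sketch. It is not Ward's witness format, which is not described here: the entry layout, `GENESIS` value, and function names are assumptions, and signatures and the transparency-log anchor are omitted. A real system would additionally sign each digest with an asymmetric key.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder digest for the first entry


def entry_digest(prev_digest: str, payload: dict) -> str:
    """Hash the previous digest together with a canonical JSON encoding of the payload."""
    body = json.dumps(
        {"prev": prev_digest, "payload": payload},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def build_chain(payloads: list) -> list:
    """Link each step to its predecessor so that later entries commit to earlier ones."""
    chain, prev = [], GENESIS
    for payload in payloads:
        digest = entry_digest(prev, payload)
        chain.append({"prev": prev, "payload": payload, "digest": digest})
        prev = digest
    return chain


def verify_chain(chain: list) -> bool:
    """Recompute every digest from scratch; any edited or reordered entry fails."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["digest"] != entry_digest(prev, entry["payload"]):
            return False
        prev = entry["digest"]
    return True
```

Because verification only recomputes hashes from the recorded payloads, anyone holding the chain can run it. Editing a recorded step after the fact changes its digest and breaks every link that follows.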
The current comparison includes Semgrep. CodeQL is being rerun under a stricter full-corpus setup, but those runs currently take more than 24 hours and have not completed cleanly enough for us to publish a reproducible headline number. We’ll publish the exact versions, configurations, and harness details alongside the benchmark methodology.
We intend to publish the methodology, scoring harness, benchmark dates, and pinned tool configurations. We have not finalized what portion of the corpus itself will be public.
Ward is pre-release and in active development. There’s no public install today. If you want to be notified when there is, leave your email below.