pre-release · in development

Security findings with evidence, not just alerts.

Ward starts with real-CVE-tested static analysis and carries findings forward into traces, provenance, and reviewable evidence. It is being built for private codebases that need more than a scanner dashboard.

Rust, Go, Python, JavaScript, Java · 19 vulnerability classes · Evidence-backed findings · Pre-release
~/src/project zsh · ward
$ ward review
loading rules ok
analyzing repo ok
building evidence ok
ranking findings ok

▸ finding #1 witness:draft
class: path traversal
severity: high
confidence: 0.91
<path>:<line> trace: source → sink
▸ finding #2 config-trigger
class: path traversal
severity: med
confidence: 0.70
grade: config-dependent
trigger: follow_symlinks False → True

2 findings retained ranked with evidence attached
output shown is illustrative
01   Real-world baseline

Real-CVE evaluation is the floor.

We ran Ward and the tools teams most commonly rely on over the same corpus of 2,068 entries grounded in historical CVEs across five ecosystems. A tool only gets credit when it flags the vulnerable code targeted by the patch and that finding disappears on the fix commit. That tells you the scanner is real. It is not the whole product story.

Paired true positives · same corpus, same harness

paired scoring · dated 2026-04-11
ward · 799 TP · 72.6% recall · 73.1% precision
semgrep · 58% of Ward
codeql · withheld · headline result withheld pending full-corpus rerun

▸ Paired scoring: a finding is “real” only if present on the vulnerable commit, localized to the code the patch fixed, and absent on the fix commit. CodeQL is shown as withheld pending validation: the stricter full-corpus rerun currently runs for more than 24 hours and has not completed cleanly enough to publish a reproducible headline number. Read the methodology →

02   Evidence, not vibes

Findings should carry more than a severity label.

Ward is being built as security review infrastructure, not a flat alert stream. The scanner is the first layer; the longer arc is evidence-backed investigation with witness bundles, provenance, and config-aware risk classification.

witness bundles · provenance · config-aware grading

A Ward finding can carry more than a rule match: a cross-file trace, reproducible evidence where a proof lane exists, a reviewable bundle state, and the provenance needed to explain why the system believes the issue is real.

  • Cross-file trace from source to sink
  • Reviewable witness bundle for investigation state
  • Provenance and pinned execution context captured per run
  • Config-dependent risk distinguished from default-unsafe behavior
  • Pre-release: proof lanes are narrower than the scanner surface today
bundle surface current state ▸
signal · surface · state · notes
trace · shipped · cross-file
repro · pilot · proof lanes
patch · pilot · fixture-first
proof · shipped · witness
config · shipped · path c
03   From detection to evidence

Ward is aiming past flat scan output.

The scanner remains the base layer. On top of it, Ward is adding an investigation workflow that can carry forward traces, repro artifacts, provenance, and evidence grades instead of ending at an alert list.

I · DETECT

Cross-file reasoning

<source> (input) → <intermediate> → <sink> · risk (action) → finding

Reasons across files to surface vulnerable flows that single-file pattern matching often misses.

II · INVESTIGATE

Witness bundles

candidate trace
  • taint flow attached
  • entry point identified
  • bundle: draft
investigation evidence
  • repro test
  • repro result
  • provenance
  • review queue

For supported lanes, Ward can carry a candidate forward into repro artifacts, provenance, and review state.
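
As a rough illustration of what "carrying a candidate forward" could look like as data, here is a minimal Python sketch. The `WitnessBundle` class, its field names, and the `ready_for_review` rule are all hypothetical, chosen only to mirror the items named above (trace, entry point, repro artifacts, provenance, review state); Ward's actual bundle format has not been published.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WitnessBundle:
    """Hypothetical container for one candidate finding and its evidence."""
    trace: list[str]                      # ordered hops from source to sink
    entry_point: str                      # where untrusted input enters
    state: str = "draft"                  # review state; starts as a draft
    repro_test: Optional[str] = None      # path or id of a repro test, if any
    repro_result: Optional[str] = None    # outcome of running that test
    provenance: dict[str, str] = field(default_factory=dict)  # pinned run context

    def ready_for_review(self) -> bool:
        # Assumed rule: a bundle is queued for review only once it has
        # a repro test, a repro result, and recorded provenance.
        return bool(self.repro_test and self.repro_result and self.provenance)


bundle = WitnessBundle(
    trace=["handler.read_param", "storage.resolve", "open()"],
    entry_point="handler.read_param",
)
bundle.repro_test = "tests/repro_traversal.py"
bundle.repro_result = "reproduced"
bundle.provenance = {"commit": "<vuln_sha>", "tool_version": "<pinned>"}
```

In this sketch a freshly created bundle stays in `draft` with no evidence attached, which matches the idea that proof lanes cover fewer findings than the scanner does.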

III · DECIDE

Evidence grades

library intrinsic · reproduced
config dependent · opt-in risk
semantic only · needs review
default-unsafe vs opt-in risk vs semantic evidence

Ward is adding product-level distinctions between bugs that are unsafe by default, risks that require an opt-in configuration, and findings that still need analyst judgment.
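
One way to picture the three-way distinction is as a small decision function. The sketch below is an assumption-laden illustration in Python: the `Grade` enum reuses the labels shown above, but the two boolean inputs and the decision order are hypothetical and are not Ward's published grading logic.

```python
from enum import Enum


class Grade(Enum):
    LIBRARY_INTRINSIC = "library intrinsic"   # unsafe by default
    CONFIG_DEPENDENT = "config dependent"     # requires an opt-in configuration
    SEMANTIC_ONLY = "semantic only"           # still needs analyst judgment


def grade(reproduced: bool, needs_non_default_config: bool) -> Grade:
    """Assign an evidence grade from two hypothetical signals."""
    if not reproduced:
        # No reproduction: the finding rests on static reasoning alone.
        return Grade.SEMANTIC_ONLY
    if needs_non_default_config:
        # Reproduced, but only after a setting is flipped from its default.
        return Grade.CONFIG_DEPENDENT
    # Reproduced under default settings.
    return Grade.LIBRARY_INTRINSIC
```

Under these assumptions, a path traversal that only reproduces once `follow_symlinks` is switched from False to True would land in `CONFIG_DEPENDENT` rather than being reported as default-unsafe.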

04   Methodology

What the evaluation shows.

The benchmark matters. So do its limits. Here’s what we count, what we compare, and where the current pre-release claims stop.

What is paired scoring, precisely?

For each CVE we have a repo and two SHAs: vuln_sha (the commit the CVE was filed against) and fix_sha (the merge that closed it). We run the scanner on both and call the finding “real” only if it fires on vuln_sha at a location whose scope includes the code the patch fixed, and does not fire on fix_sha. Any other pattern is not credited. Raw alert counts across scanners aren’t comparable; paired scoring is.
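
The rule above can be sketched in a few lines of Python. This is a simplified illustration, not the scoring harness itself: the `Finding` and `PatchedRegion` types are hypothetical, "scope includes the patched code" is approximated as line-range overlap in the same file, and a finding is treated as still firing on fix_sha if the same rule fires in the same file there (an assumption, since line numbers shift between commits).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    rule: str
    path: str
    start_line: int
    end_line: int


@dataclass(frozen=True)
class PatchedRegion:
    path: str
    start_line: int
    end_line: int


def covers(finding: Finding, region: PatchedRegion) -> bool:
    """True if the finding's line span overlaps the code the patch fixed."""
    return (
        finding.path == region.path
        and finding.start_line <= region.end_line
        and region.start_line <= finding.end_line
    )


def is_paired_true_positive(vuln_findings, fix_findings, patched: PatchedRegion) -> bool:
    """Credit a scanner only if a localized finding on vuln_sha is gone on fix_sha."""
    still_firing = {(f.rule, f.path) for f in fix_findings}
    return any(
        covers(f, patched) and (f.rule, f.path) not in still_firing
        for f in vuln_findings
    )


patched = PatchedRegion("app/files.py", 40, 48)
on_vuln = [Finding("path-traversal", "app/files.py", 42, 42)]
on_fix = []  # the scanner reports nothing on the fix commit
credited = is_paired_true_positive(on_vuln, on_fix, patched)
```

A finding that fires on both commits, or that fires only in an unrelated file, earns no credit under this sketch, which is why raw alert counts do not translate into paired true positives.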

Is Ward just another SAST tool?

No. Static analysis is the entry point, not the whole story. Ward is being built as evidence-backed security review infrastructure: scanner findings, witness bundles, provenance, and reviewable investigation state. The scanner is farther along than the investigation layer today.

Which tools did you compare against?

The current comparison includes Semgrep. CodeQL is being rerun under a stricter full-corpus setup, but those runs currently take more than 24 hours and have not completed cleanly enough for us to publish a reproducible headline number. We’ll publish the exact versions, configurations, and harness details alongside the benchmark methodology.

Will the corpus be public?

We intend to publish the methodology, scoring harness, benchmark dates, and pinned tool configurations. We have not finalized what portion of the corpus itself will be public.

When can I try it?

Ward is pre-release and in active development. There’s no public install today. If you want to be notified when there is, leave your email below.