Skip to content

scan command

Run the full canary-scan analysis pipeline across all seven stages on a target data-source directory.

Usage

canary-scan scan [OPTIONS] DATASOURCE

DATASOURCE is the path to the read-only mounted data-source directory. See the Workflow guide for how to mount a data-source safely.

Examples

# Full scan with defaults
canary-scan scan /mnt/datasource

# Write output to a specific directory
canary-scan scan /mnt/datasource -o /evidence/case-123/canary-scan

# Emit critical findings to stdout (for piping to SIEM)
canary-scan scan /mnt/datasource --stdout --severity-threshold critical

# SARIF output for GitHub Security tab
canary-scan scan /mnt/datasource --format sarif

# Enable specialised file types and steganography brute-forcing
canary-scan scan /mnt/datasource --enable-specialized --crack-steg /wordlists/rockyou.txt

# Suppress known-safe domains
canary-scan scan /mnt/datasource --allowlist allowlist.json

# Force a full re-run, ignoring previous progress
canary-scan scan /mnt/datasource --force

Options

Option Type Default Description
-o, --outdir PATH .canary-scan Output directory for stage artefacts and the final report.
-f, --format json\|csv\|sarif\|all json Final report format. all writes all three formats simultaneously.
--stdout flag off Emit findings as JSONL to stdout in addition to writing files.
--severity-threshold info\|low\|medium\|high\|critical info Only include findings at or above this severity in the final report.
--workers INTEGER 8 Number of parallel workers for CPU-bound scanner stages.
--resume / --no-resume flag --resume Skip stages that completed successfully in a prior run. Use --no-resume to re-run everything.
--force flag off Re-run all stages regardless of prior state, appending a new audit entry.
--strict-deps flag off Treat missing optional dependencies as fatal (exit code 2).
--keep-tmp flag off Retain the temporary extraction directory after the scan completes.
--crack-steg PATH Path to a passphrase wordlist for opt-in steganography brute-forcing via stegseek.
--fuzzy-cluster flag off Allow producer/creator version differences when clustering near-duplicates in the uniqueness stage.
--min-cluster-size INTEGER 2 Minimum number of near-duplicate documents required to form a uniqueness cluster.
--max-archive-depth INTEGER 3 Maximum depth for recursive archive extraction (zip-in-zip etc.).
--enable-specialized flag off Enable Tier 3 specialised file types: audio, video, fonts, DICOM, OneNote. Requires canary-scan[specialized].
--allowlist PATH Path to a JSON or plain-text allowlist file. Matching findings are suppressed.
--denylist PATH Path to a JSON or plain-text denylist file. Matching indicators are force-flagged.
--verbose / --quiet flag off Increase or reduce console output verbosity.

Pipeline stages run

All seven stages execute in sequence:

# Stage Description
1 inventory Walks the filesystem, computes SHA-256 hashes, identifies MIME types, classifies into buckets
2 metadata Extracts metadata via exiftool, scans for tracking URLs, GPS coordinates, PII, and device identifiers
3 remote-refs Inspects document structure for XXE, tracking pixels, remote template links, formula injections, OLE hyperlinks
4 embedded Extracts nested binaries, raster images, OLE/ActiveX objects
5 stego Detects steganographic carriers via steghide/stegseek, QR code URL detection, EXIF thumbnail mismatch
6 uniqueness Near-duplicate clustering to identify per-recipient canary values
7 report Merges and deduplicates findings from all stages, filters by severity, emits the final report

To run a single stage, use the stage command.