PSA Intelligence Canary Scan

Scan document data-sources for canaries, trackers, web beacons, and per-recipient fingerprints before interacting with supplied datasets.
When you receive a large document dump from an external party (a leak, legal disclosure, or investigation), those files can contain deliberate or indirect canaries. Tracking pixels, embedded JavaScript, and remote template links can phone home the moment a file is opened, while steganographic watermarks and per-recipient metadata fingerprints can identify which copy of the data you hold.
canary-scan inspects each file without opening it in its native viewer, extracting and analysing its raw structure, metadata, embedded objects, and near-duplicate fingerprints to surface anything that may reveal to an external party that the data-source is being examined.
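To make "inspecting raw structure without a native viewer" concrete, here is a minimal Python sketch, not canary-scan's actual implementation. It treats an Office Open XML file (for example a .docx) as a plain ZIP archive and lists relationship entries marked `TargetMode="External"`, which is how remote template links are typically expressed. The function names and the `tracker.example` URL are hypothetical and chosen only for illustration.

```python
import io
import re
import zipfile

# A <Relationship> element whose target lives outside the package.
EXTERNAL_REL = re.compile(
    rb'<Relationship\b[^>]*\bTargetMode="External"[^>]*>', re.IGNORECASE
)
# The Target attribute inside such an element.
TARGET = re.compile(rb'\bTarget="([^"]*)"')


def external_targets(ooxml_bytes: bytes) -> list[tuple[str, str]]:
    """Return (part name, target) pairs for external relationships.

    Only the ZIP container is read; nothing is rendered or executed.
    """
    hits = []
    with zipfile.ZipFile(io.BytesIO(ooxml_bytes)) as zf:
        for name in zf.namelist():
            if not name.endswith(".rels"):
                continue
            data = zf.read(name)
            for rel in EXTERNAL_REL.finditer(data):
                match = TARGET.search(rel.group(0))
                if match:
                    hits.append((name, match.group(1).decode("utf-8", "replace")))
    return hits


def demo_package() -> bytes:
    """Build a tiny in-memory package with one remote template link (made up)."""
    rels = (
        '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
        '<Relationship Id="rId1" '
        'Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/attachedTemplate" '
        'Target="https://tracker.example/t.dotm" TargetMode="External"/>'
        "</Relationships>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/_rels/settings.xml.rels", rels)
    return buf.getvalue()


if __name__ == "__main__":
    for part, target in external_targets(demo_package()):
        print(f"{part}: {target}")
```

A regex over the relationship parts is a deliberate simplification: it avoids handing untrusted XML to a parser, at the cost of missing unusual attribute quoting. A production scanner would need to handle those cases.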
Quick Start (Docker)
The recommended way to run canary-scan is with the Docker image, which bundles all system dependencies and utilities:
```bash
# Run the scan using the GitHub Container Registry image
docker run --rm \
  -v /mnt/datasource:/data:ro \
  -v "$(pwd)/canary-scan-out:/output" \
  ghcr.io/psaintelligence/canary-scan:latest scan /data -o /output

# Review findings
jq '.[] | select(.severity=="critical")' canary-scan-out/canary-scan-report.json
```
Run canary-scan --guide inside the container or a local shell for a concise cheat sheet, or see the Workflow guide for a full walkthrough.
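If jq is not available, the same review can be sketched in Python. This assumes only what the jq filter above implies: that canary-scan-report.json is a top-level JSON array of objects, each with a `severity` field. The helper names are hypothetical, and no other report fields are assumed.

```python
import json
from collections import Counter
from pathlib import Path

# Severity names as listed in the Severity Levels table.
SEVERITY_ORDER = ["critical", "high", "medium", "low", "info"]


def load_findings(report_path):
    """Read the report, assumed to be a JSON array of finding objects."""
    return json.loads(Path(report_path).read_text(encoding="utf-8"))


def count_by_severity(findings):
    """Count findings per severity level, most severe first."""
    counts = Counter(f.get("severity", "unknown") for f in findings)
    return {level: counts.get(level, 0) for level in SEVERITY_ORDER}


if __name__ == "__main__":
    report = Path("canary-scan-out/canary-scan-report.json")
    if report.exists():
        for level, n in count_by_severity(load_findings(report)).items():
            print(f"{level:>8}: {n}")
    else:
        print(f"No report found at {report}")
```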
Quick Start (pipx)
If you prefer to run the tool natively, you can install the Python package:
See the Install guide for required system dependencies, optional packages, and air-gapped environment setup.
Canary Scan Report Summary

Detection Pipeline
Seven sequential stages, each writing a JSONL artefact:
```mermaid
graph LR
    A[inventory] --> B[metadata] --> C[remote-refs] --> D[embedded] --> E[stego] --> F[uniqueness] --> G[report]
```

| Stage | What it checks |
|---|---|
| inventory | File walk, SHA-256 hashes, MIME types, bucket classification |
| metadata | exiftool extraction, tracking URLs, GPS/serial/PII indicators |
| remote-refs | XXE, tracking pixels, remote template links, formula injections, OLE hyperlinks |
| embedded | Nested binaries, OLE/ActiveX objects, raster image extraction |
| stego | Steghide/stegseek carrier checks, QR code URL detection, EXIF thumbnail mismatch |
| uniqueness | Near-duplicate clustering to find per-recipient canary values |
| report | Merge, deduplicate, filter by severity, emit JSON/CSV/SARIF |
See File Types for the full matrix of supported formats and canary vectors.
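As an illustration of the first stage in the table, the following is a rough Python sketch of an inventory pass: walk a directory, hash each file with SHA-256, guess a MIME type, and write one JSON object per line. It is not the tool's implementation. The record field names (`path`, `size`, `sha256`, `mime`) are assumptions, the MIME guess uses the file extension only (a real scanner would more likely inspect content), and bucket classification is omitted.

```python
import hashlib
import json
import mimetypes
import sys
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def inventory(root):
    """Yield one record per regular file under root, in sorted order."""
    root = Path(root)
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        mime, _ = mimetypes.guess_type(path.name)  # extension-based guess only
        yield {
            "path": path.relative_to(root).as_posix(),
            "size": path.stat().st_size,
            "sha256": sha256_of(path),
            "mime": mime or "application/octet-stream",
        }


def write_jsonl(records, out_path):
    """Write records as JSON Lines and return how many were written."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, sort_keys=True) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Usage: python inventory_sketch.py <source-dir> <output.jsonl>
    if len(sys.argv) == 3:
        n = write_jsonl(inventory(sys.argv[1]), sys.argv[2])
        print(f"wrote {n} records")
```

Writing one record per line means a later stage can stream the artefact without loading the whole inventory, which matters for large dumps.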
Severity Levels
| Severity | Meaning | Recommended Action |
|---|---|---|
| critical | Active phone-home URL or JS that fires on open | Do NOT open in native viewer — quarantine |
| high | Embedded OLE/JS objects, steganographic payload | Investigate in an isolated sandbox |
| medium | Unique fingerprint, GPS/PII metadata | Strip metadata before further handling |
| low | Metadata oddity, non-standard producer string | Note for chain of custody |
| info | Annotated — no canary confirmed | Informational only |
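The table above can be turned into a small triage helper. The sketch below is illustrative only: the severity names and actions are copied from the table, while the numeric ranking, the function names, and the assumption that each finding is a dict with a `severity` key are mine.

```python
# Higher number means more severe; the ordering follows the table rows.
SEVERITY_RANK = {"critical": 4, "high": 3, "medium": 2, "low": 1, "info": 0}

# Recommended actions, taken from the Severity Levels table.
RECOMMENDED_ACTION = {
    "critical": "Do NOT open in native viewer; quarantine",
    "high": "Investigate in an isolated sandbox",
    "medium": "Strip metadata before further handling",
    "low": "Note for chain of custody",
    "info": "Informational only",
}


def at_or_above(findings, minimum):
    """Keep findings at or above `minimum` severity, most severe first.

    Findings with a missing or unrecognised severity are dropped.
    """
    floor = SEVERITY_RANK[minimum]
    kept = [
        f for f in findings
        if SEVERITY_RANK.get(f.get("severity"), -1) >= floor
    ]
    return sorted(kept, key=lambda f: SEVERITY_RANK[f["severity"]], reverse=True)


def triage(findings, minimum="medium"):
    """Pair each retained finding's severity with its recommended action."""
    return [
        (f["severity"], RECOMMENDED_ACTION[f["severity"]])
        for f in at_or_above(findings, minimum)
    ]


if __name__ == "__main__":
    sample = [{"severity": "low"}, {"severity": "critical"}, {"severity": "medium"}]
    for severity, action in triage(sample):
        print(f"{severity}: {action}")
```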