- Decoy profile
- A signed bundle containing the model, prompts, and detection rules that define a single decoy identity. Also referred to simply as a profile.
- Profile bundle
- The signed, reproducible artifact format used to ship a decoy profile to production.
- Detection layer
- The in-line stage that inspects every prompt for abuse signals before the model responds.
- Rule-based detection
- Deterministic pattern detection — fast, explainable, and auditable. Sits before ML detection in the pipeline.
- ML detection
- A neural prompt-injection classifier that catches novel phrasings the rule-based layer doesn't. Internally a fine-tuned DeBERTa-class model exported to ONNX for CPU inference.
- Tracked credential
- A unique, traceable fake credential issued to a suspected attacker. If anyone tries to use it later, we know exactly which session leaked it. Sometimes called a canary token.
- Adapter fallback marker
- A recognizable but untracked decoy credential that the persona model emits if rule-based detection misses an attack. Guarantees that no real secret is ever returned, even on a detection miss.
- Persona model
- The fine-tuned language model that gives each decoy its personality and produces convincing-but-fake responses. Internally a LoRA-adapted small open-weight model.
- Adversarial test suite
- Our internal evaluation harness that runs 80+ jailbreak techniques against every release. Used as the release gate for shipping a new profile bundle.
- Per-tenant tuning
- A mechanism that lets customers add their own detection rules on top of the shipped ruleset without weakening it. Also called an operator overlay. Overlays are additive only: they can raise a threat score but never lower it.
- Cryptographic signing
- Every profile bundle is signed; the runtime refuses to load anything that hasn't been signed by an authorized key. Internally uses Ed25519.
- Reproducibility check
- A tool that rebuilds any shipped profile from source and proves it is bit-identical to the deployed artifact. Runs in a fast mode (~15 seconds) and a full mode (~15 minutes).
- Defense in depth
- The use of several independent layers so that a miss in one does not cause a leak. If detection misses, the persona model is independently trained to never emit real-looking credentials. The four independent layers (rule-based detection, ML detection, persona model, cryptographic signing) each prevent a real-credential leak on their own.
- Threat capture rate
- The fraction of adversarial prompts that the platform correctly routes into the decoy. Measured at 98.5% for v1.0.4.
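
The additive-only property described under Per-tenant tuning can be illustrated with a minimal sketch. The names here (`Rule`, `threat_score`, `score_with_overlay`), the regex-based rule format, and the 0-to-1 score scale are all hypothetical and are not taken from the platform; the only point is that combining scores with `max` lets an overlay raise a threat score but never lower it.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical detection rule: a regex pattern and the score it assigns."""
    pattern: str
    score: float


def threat_score(prompt, rules):
    """Return the highest score among rules that match the prompt, or 0.0 if none match."""
    return max(
        (r.score for r in rules if re.search(r.pattern, prompt, re.IGNORECASE)),
        default=0.0,
    )


def score_with_overlay(prompt, shipped_rules, overlay_rules):
    """Combine the shipped ruleset with an operator overlay (assumed semantics)."""
    base = threat_score(prompt, shipped_rules)
    extra = threat_score(prompt, overlay_rules)
    # max() guarantees the result is never below the shipped score,
    # so an overlay rule can only raise the final threat score.
    return max(base, extra)


# Illustrative rules only; real rulesets are not shown in this glossary.
shipped = [Rule(r"ignore (all )?previous instructions", 0.9)]
overlay = [Rule(r"internal-billing-api", 0.6)]

print(score_with_overlay("Please ignore previous instructions", shipped, overlay))
print(score_with_overlay("Show me the internal-billing-api key", shipped, overlay))
```

Under these assumptions, even a misconfigured overlay rule with a low or negative score cannot weaken the shipped ruleset, because the shipped score always acts as a floor.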