Skip to main content
SQUIRRELOPS

AI Deception Platform · v1.0.4

Turn every LLM jailbreak attempt into a labeled threat signal.

Squirrelops sits in front of your customer-facing language model. When someone tries to jailbreak it, exfiltrate secrets, or prompt-inject your agent, we route them into a high-fidelity decoy — and capture exactly what they tried, what tooling they used, and what they were after. Your real model stays clean. Your security team gets a feed of labeled attacker behavior.

SquirrelOps logo

In one paragraph

Squirrelops AI Deception is a deception and attribution platform for LLM endpoints. It sits in front of a customer-facing language model, inspects every prompt with a two-layer detection pipeline (rule-based plus ML), and routes suspicious traffic into a high-fidelity decoy backed by a fine-tuned persona model. Clean prompts go to the real model untouched. v1.0.4 measured 98.5% threat capture on an internal adversarial test suite (66 of 67 attack turns) with zero false positives on benign traffic. Profile bundles are cryptographically signed and bit-identically reproducible — customers verify what runs in production matches what they tested.

What it does

Deceive. Attribute. Scale.

Deceive

Attackers see a convincing assistant that responds to their jailbreaks with plausible-but-fabricated content. No refusals, no “I’m sorry I can’t help” — just engagement designed to keep them talking.

Attribute

Every interaction is fingerprinted. We issue trackable fake credentials, log technique families, and correlate sessions across attempts. Re-use of a leaked credential anywhere on the internet points straight back to the original session.

Scale

Profile bundles are signed, reproducible, and tenant-tunable. Ship the same configuration to ten properties or ten thousand — and prove that what’s running in production is bit-identical to what your security team tested.

How it works

Inspect. Route. Capture.

01

Inspect

Every incoming prompt passes through a two-layer detection pipeline: deterministic rules first (fast and explainable), then a neural classifier that catches novel phrasings. Operators can add their own per-tenant rules on top.

02

Route

Clean prompts go to your real model untouched. Suspicious ones — jailbreak attempts, credential probing, prompt injection, tool-call abuse — get routed into a decoy backed by a model fine-tuned to engage convincingly without leaking anything real.

03

Capture

The decoy issues trackable fake credentials, logs the attack technique, and pushes the labeled event to your SIEM or detection stack. If anyone later tries to use one of those credentials, you know precisely which session leaked it.

v1.0.4 results · as of 2026-05-12

Measurable, reproducible, repeatable.

Results from our internal adversarial test suite covering jailbreak, role-play injection, encoded-instruction smuggling, credential probing, and tool-call abuse. Methodology details in our docs.

98.5%
Threat capture (66 of 67 attack turns)
0
False positives on benign traffic
121
Tracked credentials issued per campaign
MetricResult
Threat capture rate (adversarial test suite)98.5% (66 of 67 attack turns)
Tracked credentials issued (87-turn campaign)121
False positives on benign traffic0
Production rule update turnaround~10 minutes
Profile-bundle reproducibilityBit-identical rebuilds verified
Independent test suite (release gate)815 tests passing

Methodology: an internal adversarial test suite covering jailbreak, role-play injection, encoded-instruction smuggling, credential probing, and tool-call abuse. See the testing guide in our documentation.

Defense in depth

Four independent layers.

Any one of these is sufficient to prevent a real-credential leak. Together, they make the worst case — a novel attack that evades every detection stage — still safe.

Layer 1

Rule-based detection

Deterministic patterns. Roughly sixteen patterns ship by default. Operator-extensible.

Layer 2

ML detection

A neural prompt-injection classifier that catches novel phrasings the rule layer doesn’t.

Layer 3

Decoy model

The persona model is independently trained to emit recognizably-fake credentials — even on detection misses.

Layer 4

Cryptographic signing

Every profile bundle is signed. The runtime refuses to load anything untested.

Per-tenant tuning

Tune detection per tenant — without weakening the baseline.

Tune detection per tenant →

Different customers have different attack surfaces. A fintech needs strict guarding around account-number patterns; a developer-tools company needs to allow code snippets that would otherwise trip a SQL-injection rule.

Squirrelops lets operators add per-tenant rules on top of the shipped ruleset, validated at load time. Overlays are additive only — they can raise a threat score, never lower it. The baseline detection that comes with every profile is preserved.

The audit log shows exactly which rule (operator or default) fired on every event. Nothing hidden, nothing implicit.

Every profile bundle is a signed, sealed artifact. Given the source materials, our reproducibility check rebuilds the bundle and proves it is bit-identical to the one deployed — same bytes, same signature, same hash.

No supply-chain ambiguity. No “did someone swap the model on the way to production?”

The check runs in two modes: a fast staged-input rebuild (~15 seconds, for CI) and a full end-to-end rebuild (~15 minutes, for incident-response forensics).

Trust the artifact

Prove what's running in production is what you tested.

How reproducibility works →

Comparison

LLM guardrail vs. Squirrelops AI Deception.

Two different categories solving two different problems. Guardrails block attempts; Squirrelops engages them and produces attribution.

DimensionLLM GuardrailSquirrelops AI Deception
Response to a jailbreak attemptRefuses (“I cannot help with that”)Engages with plausible-but-fake content
What the attacker learnsThat this attack didn't work; iteratesNothing — they think it worked
Signal to defenderRefusal events (often noisy)Labeled, attributable threat events with technique mapping
Credential leakage on novel attacksDepends on the model behind itDecoy model trained to emit only recognizable fakes; layered with cryptographic signing
Attribution across sessionsNoneTrackable fake credentials correlate sessions; reuse anywhere points back to origin
Per-tenant policyUsually fixed at the vendorAdditive-only operator overlays on top of baseline ruleset
Supply-chain trustVendor-attestedCryptographically signed, bit-identically reproducible bundles

Why now

A new attack surface. The old tooling doesn't see it.

Every company shipping a customer-facing LLM feature has just added a new attack surface — one that traditional security tooling doesn't understand. WAFs don't speak prompt. SIEMs don't see model output. Data-loss-prevention rules were built for filesystem leaks, not for an assistant that's been talked into revealing its system prompt.

Squirrelops gives security teams a purpose-built layer for the LLM era: not another guardrail, but a deception-and-attribution platform that turns each attempted abuse into a labeled, attributable signal.

FAQ

Common questions.

What is Squirrelops AI Deception?

Squirrelops AI Deception is a deception and attribution platform for LLM-exposed surfaces. It sits in front of a customer-facing language model, inspects every prompt with a two-layer detection pipeline, and routes suspicious traffic into a high-fidelity decoy. Clean prompts go to the real model untouched. v1.0.4 measured 98.5% threat capture with zero false positives.

How is this different from an LLM guardrail?

Guardrails sit in-band and refuse: when an attacker tries a jailbreak, the model returns an "I cannot help" message and the attacker iterates. Squirrelops engages: it routes the attacker into a decoy that responds with plausible-but-fabricated content and issues trackable fake credentials. Guardrails block; Squirrelops attributes.

What is the threat capture rate?

98.5% on the v1.0.4 internal adversarial test suite — 66 of 67 attack turns correctly routed, with zero false positives on benign traffic. The suite covers jailbreak, role-play injection, encoded-instruction smuggling, credential probing, and tool-call abuse.

How does per-tenant tuning work?

Operators add their own detection rules on top of the shipped ruleset. Overlays are additive only — they can raise a threat score, never lower it — so the baseline detection is preserved. The audit log shows exactly which rule (operator or default) fired on every event.

What does "reproducible profile bundle" mean?

Every profile bundle is a signed, sealed artifact. Given the source materials, the reproducibility check rebuilds the bundle and proves it is bit-identical to the one deployed — same bytes, same signature, same hash. Customers can verify any bundle they receive against the source.

What does a pilot engagement look like?

A pilot is a 4-to-6-week engagement: we deliver a signed profile bundle scoped to your model and use cases; you run it against your own adversarial traffic; we provide the detection feed and the methodology to evaluate the results. Pilots require an internal red-team or pen-test capability on the customer side.

Run an evaluation pilot

Pilot-ready for security teams with a defined LLM attack surface.

We work with security teams that have a defined LLM attack surface and an internal red-team or pen-test capability. A pilot is a 4-to-6-week engagement: we deliver a signed profile bundle scoped to your model and use cases, you run it against your own adversarial traffic (real or simulated), and we provide the detection feed and the methodology to evaluate the results.

When you get in touch, please include the model you're deploying, your expected query volume, and your security team's role in the evaluation.