Security & Content Safety
Wire containers are used by AI agents, so hostile content in your context can hijack an agent’s tool use or steer its output. Wire screens every piece of content entering a container for prompt-injection patterns before it is stored or made searchable.
What gets scanned
Every inbound write is scanned, including:
- File uploads: the processed text content of each file
- Agent writes: every wire_write tool call
Scanning runs on the full processed content (for structured files like CSV and JSON, string values are extracted first). Results are evaluated against a severity scale.
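To make the extraction step concrete, here is a minimal sketch of pulling string values out of JSON and CSV before scanning. It is an illustration only, not Wire's actual pipeline: the function names `iter_json_strings` and `extract_scannable_text` are hypothetical, and the choice to skip JSON object keys and empty CSV cells is an assumption.

```python
import csv
import io
import json
from typing import Any, Iterator


def iter_json_strings(value: Any) -> Iterator[str]:
    """Yield every string value found in a parsed JSON document."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        # Assumption: only values are scanned, object keys are skipped.
        for item in value.values():
            yield from iter_json_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_json_strings(item)
    # Numbers, booleans, and null carry no text to scan.


def extract_scannable_text(raw: str, kind: str) -> str:
    """Flatten structured content into newline-joined string values."""
    if kind == "json":
        return "\n".join(iter_json_strings(json.loads(raw)))
    if kind == "csv":
        rows = csv.reader(io.StringIO(raw))
        return "\n".join(cell for row in rows for cell in row if cell)
    return raw  # unstructured text is scanned as-is
```

Flattening first means an instruction hidden in a deeply nested field is scanned the same way as one in a plain text file.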
What Wire looks for
Wire looks for patterns commonly associated with prompt injection, for example instructions that try to override an agent’s behavior, markers that mimic system prompts, or attempts to manipulate an agent’s assumed role.
Each detection is scored on a three-level severity scale (low, medium, high). The scoring combines multiple detection layers and accounts for common evasion techniques, including invisible Unicode characters and visually similar homoglyph substitutions.
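As a rough illustration of why these evasion techniques matter, the sketch below normalizes a copy of the text before pattern matching. It is not Wire's detector: the `HOMOGLYPHS` table and the `normalize_for_scanning` function are hypothetical, and the table covers only a handful of Cyrillic look-alikes, where a real confusables table is far larger.

```python
import unicodedata

# Tiny illustrative homoglyph table (Cyrillic -> Latin). Assumption: a real
# detector would use a much larger confusables mapping.
HOMOGLYPHS = str.maketrans({
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic ie
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
    "\u0441": "c",  # Cyrillic es
    "\u0456": "i",  # Cyrillic Byelorussian-Ukrainian i
})


def normalize_for_scanning(text: str) -> str:
    """Return a matching-friendly copy of text; the input is not modified."""
    # NFKC folds compatibility forms such as fullwidth letters into plain ones.
    folded = unicodedata.normalize("NFKC", text)
    # Drop format characters (category Cf), e.g. zero-width spaces and joiners.
    visible = "".join(ch for ch in folded if unicodedata.category(ch) != "Cf")
    # Map look-alike letters to Latin, then casefold for case-insensitive matching.
    return visible.translate(HOMOGLYPHS).casefold()
```

With this sketch, a string such as `"ign\u200b\u043ere"` (a zero-width space plus a Cyrillic "о") normalizes to `"ignore"`, so a pattern written for the plain word still matches. Only the copy used for matching is normalized.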
What happens when something is flagged
- low severity: ingested normally. These patterns have frequent legitimate uses (e.g. the word “jailbreak” appearing in a blog post).
- medium / high severity: the content is rejected. Files are marked as failed; agent writes return an error. Your existing container data is untouched.
When a file is rejected, you’ll see the failure reason in the container UI. When an agent write is rejected, the agent receives a structured error it can surface to you.
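The severity policy described in this section reduces to a small decision rule. The sketch below shows it under stated assumptions: the `Severity` enum and `should_ingest` function are hypothetical names, and `None` is used here to mean that no detection fired.

```python
from enum import Enum
from typing import Optional


class Severity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def should_ingest(severity: Optional[Severity]) -> bool:
    """Return True when content is stored, False when it is rejected."""
    # None (no detection) and LOW are ingested; MEDIUM and HIGH are rejected.
    return severity is None or severity is Severity.LOW
```

Because a rejection stops the write before anything is stored, existing container data is never touched by a failed upload or agent write.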
What Wire does not do
- Wire does not modify your content. Scanning produces a verdict; the content itself is stored byte-for-byte or rejected.
- Wire does not make editorial or quality judgments about your context. Injection detection is a security boundary (protecting agents from hijack), not a content-quality filter.
- Wire does not share flagged content with anyone. Detection happens entirely within your container’s processing pipeline.
False positives
The detection layers are tuned to minimize false positives on normal business content. If you believe a legitimate file or write was blocked in error, the rejection notification links back to the container where you can review the flagged item.
If you have a persistent false positive on important content, contact support. We tune the detector based on real-world feedback.