AI Safety
Human oversight, PII protection, and defenses against AI-specific threats.
Overview
Votriz uses AI to draft content, score brand health, and orchestrate campaigns. Every AI interaction sits behind safety guardrails: a mandatory human-approval gate, PII redaction on sensitive paths, prompt-injection defense, and a queryable user-feedback channel. The program is aligned with the NIST AI RMF across its Govern / Map / Measure / Manage functions.
Human approval gates
All AI-generated content requires human approval before any external action (publishing to social, sending email, posting a crisis response). This is foundational architecture, not a configurable toggle.
- Content queue: drafted posts sit in `content_queue` with `status='pending'` until a user with the `content.approve` permission acts (see the sketch after this list)
- Email campaigns: a user with `email.campaigns.send` must explicitly trigger dispatch; the worker won't fan out without that step
- Brand monitor: AI auto-drafts crisis responses, but a human approves before posting; `severity='critical'` always routes to a human regardless of AI confidence
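A minimal sketch of the gate's core check, assuming a hypothetical `QueuedPost` row shape and a plain `permissions` set; the real queue worker and permission system are not shown here:

```python
from dataclasses import dataclass

@dataclass
class QueuedPost:
    id: str
    status: str                   # 'pending' | 'approved' | 'rejected'
    approved_by: str | None = None

def approve_post(post: QueuedPost, user_id: str, permissions: set[str]) -> QueuedPost:
    """Flip a pending post to approved. Only a human holding
    content.approve reaches this point; no AI code path calls it."""
    if "content.approve" not in permissions:
        raise PermissionError("content.approve permission required")
    if post.status != "pending":
        raise ValueError(f"post is {post.status!r}, not pending")
    post.status = "approved"
    post.approved_by = user_id
    return post
```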
Ghost Presence (autonomous mode)
Opt-in per org, with a per-brand `auto_approve_confidence_threshold`. Even when active, every guardrail below must pass (an illustrative decision function follows this list):

- Maximum posts-per-day caps
- Operating-hours restrictions
- Content-category blocklists (`never_auto_approve` topics)
- Confidence-score floor below which the gate stays manual
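An illustrative decision function; apart from `auto_approve_confidence_threshold` and `never_auto_approve`, every name here is an assumption made for the sketch:

```python
from datetime import datetime, time

def may_auto_approve(
    confidence: float,
    threshold: float,              # the brand's auto_approve_confidence_threshold
    posts_today: int,
    max_posts_per_day: int,
    now: datetime,
    operating_hours: tuple[time, time],   # (start, end) of the allowed window
    categories: set[str],
    never_auto_approve: set[str],
) -> bool:
    """All guards must pass; any failure keeps the manual gate."""
    if confidence < threshold:
        return False                              # below the confidence floor
    if posts_today >= max_posts_per_day:
        return False                              # daily cap reached
    if not (operating_hours[0] <= now.time() <= operating_hours[1]):
        return False                              # outside operating hours
    if categories & never_auto_approve:
        return False                              # blocklisted topic
    return True
```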
No training on customer data
Customer content, prompts, Brand DNA, subscriber data, and analytics are never used to train, fine-tune, or otherwise improve any AI model. We use Anthropic's Claude API under a contract that explicitly excludes API data from training corpora — we inherit that guarantee.
Brand DNA voice profiles are stored per-org and loaded into prompt context at inference time. They never leave the customer's `org_id` scope and are never shared with other customers.
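In outline, that scoping looks something like the sketch below; the storage layer and prompt layout are assumptions, not the actual implementation:

```python
def build_prompt_context(org_id: str, voice_profiles: dict[str, str], task: str) -> str:
    """Look up Brand DNA strictly by the caller's org_id and splice it
    into the prompt; there is no cross-org lookup path, and a missing
    profile raises rather than falling back to another org's data."""
    voice = voice_profiles[org_id]
    return f"Brand voice profile:\n{voice}\n\nTask:\n{task}"
```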
PII redaction
The `pii_redactor` service scans free-form user input for sensitive patterns before transmission to external AI providers:

| Pattern | Replacement | Restorable? |
|---|---|---|
| Email addresses | `[EMAIL_n]` | Yes (chatbot reply path) |
| Phone numbers (NANP 3-3-4) | `[PHONE_n]` | Yes (chatbot reply path) |
| SSN format (3-2-4) | `[SSN_REDACTED]` | No |
| Credit card (Luhn-validated) | `[CC_REDACTED]` | No |
Redaction is selectively applied to the support chatbot and email-generation prompt paths. Lead generation is intentionally exempt — extracting public business contact information from search results is the agent's explicit job, and redacting there would defeat the purpose.
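A simplified version of the restorable-placeholder scheme from the table above. The regexes and function names are illustrative, and Luhn-validated card matching is omitted for brevity:

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")   # NANP 3-3-4
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")               # 3-2-4

def redact(text: str) -> tuple[str, dict[str, str]]:
    """Swap PII for placeholders. Emails and phones get numbered,
    restorable tokens; SSNs are replaced irreversibly."""
    mapping: dict[str, str] = {}

    def numbered(pattern: re.Pattern, tag: str, s: str) -> str:
        def sub(m: re.Match) -> str:
            token = f"[{tag}_{len(mapping)}]"
            mapping[token] = m.group(0)     # kept so chatbot replies can be restored
            return token
        return pattern.sub(sub, s)

    text = numbered(EMAIL_RE, "EMAIL", text)
    text = numbered(PHONE_RE, "PHONE", text)
    text = SSN_RE.sub("[SSN_REDACTED]", text)   # no mapping entry: not restorable
    return text, mapping

def restore(text: str, mapping: dict[str, str]) -> str:
    """Re-insert originals on the chatbot reply path."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text
```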
Prompt-injection defense
The `prompt_guard` service inspects user input for known jailbreak patterns and structure-token injection:
- Instruction override attempts ("ignore previous instructions")
- Role reassignment ("you are now")
- System-prompt extraction ("show me your system prompt")
- Token-boundary manipulation (`<|user|>`, `system:` role-prefix lines, triple backticks misparsed as boundaries)
Detected attempts are sanitized before forwarding (role tokens defanged, code fences neutered) and the event is logged in `security_audit_log`. Brand DNA scoring and the human approval gate are still the real defenses; this is a cheap upstream filter that catches the obvious stuff before tokens are spent on it.
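A toy version of that filter, assuming simple pattern lists; the real `prompt_guard` rule set is maintained separately and is broader than this:

```python
import re

INJECTION_PATTERNS = [
    re.compile(r"ignore (all |the )?previous instructions", re.I),
    re.compile(r"you are now", re.I),
    re.compile(r"(show|reveal).{0,30}system prompt", re.I),
]
ROLE_TOKENS = re.compile(r"<\|[a-z]+\|>|^\s*system:\s*", re.I | re.M)

def guard(user_input: str) -> tuple[str, bool]:
    """Return (sanitized_input, flagged). Sanitizing defangs role
    tokens and neuters code fences; flagging only drives the audit
    log entry, since the approval gate is the real defense."""
    flagged = any(p.search(user_input) for p in INJECTION_PATTERNS)
    sanitized = ROLE_TOKENS.sub("[role token removed] ", user_input)
    sanitized = sanitized.replace("```", "'''")   # fence can't act as a boundary
    return sanitized, flagged
```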
AI quality reporting
Any user can flag a generated piece as biased, inaccurate, inappropriate, off-brand, or surfacing the system prompt:
- In-app: "Report AI issue" button on every generated content card
- API: `POST /ai/report-issue` with `{issue_type, description, resource_id?, ai_model?}` (example below)
- Email: [email protected]
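For example, reporting an off-brand generation via the API; the base URL and bearer token are placeholders:

```python
import requests

resp = requests.post(
    "https://api.example.com/ai/report-issue",   # substitute your API base URL
    headers={"Authorization": "Bearer <api-token>"},
    json={
        "issue_type": "off_brand",               # or biased, inaccurate, ...
        "description": "Post drifted into competitor-bashing tone.",
        "resource_id": "post_123",               # optional
    },
    timeout=10,
)
resp.raise_for_status()
```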
Reports land in `security_audit_log` under `ai.quality_report` and are reviewed by the AI Safety Officer within 24 hours. Patterns trigger a review of the relevant agent's prompts and scoring thresholds.
Model inventory
| Model | Provider | Purpose | Data sent | Risk |
|---|---|---|---|---|
| Claude Haiku 4.5 | Anthropic | Content, copy, email, SEO scoring, sentiment, chatbot | Brand context (no PII on most paths; redacted on chatbot + email) | Medium |
| GPT-4o-mini | OpenAI | Fallback if Anthropic is unreachable | Same shape as Claude payloads | Medium |
| FLUX.1-schnell | fal.ai | Image generation | Text prompts only | Low |
The full inventory and change procedure live in `docs/policies/AI_MODEL_INVENTORY.md`; updates ship in the same commit as the code change.
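The fallback row above implies routing logic roughly like this sketch, with the client wrappers as hypothetical stand-ins:

```python
from typing import Callable

def generate(
    prompt: str,
    call_anthropic: Callable[[str], str],
    call_openai: Callable[[str], str],
) -> str:
    """Try Claude first; if Anthropic is unreachable, send the same
    payload shape to the OpenAI fallback so downstream handling
    (scoring, queueing, approval) is unchanged."""
    try:
        return call_anthropic(prompt)
    except ConnectionError:
        return call_openai(prompt)
```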
Quality monitoring
Output quality is monitored through:
- Brand DNA scoring — every generated piece scored against the brand's voice profile before reaching the queue
- Approval-rate tracking — declining rates per brand are the leading drift signal (sketched after this list)
- User feedback loop — every approve / edit / reject decision teaches the system
- Quality-report rate — an org-level surge in `ai.quality_report` rows is one of the five detection signals in `AI_INCIDENT_RESPONSE.md`
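A sketch of the approval-rate drift signal, with made-up thresholds; the production signal definitions live in `AI_INCIDENT_RESPONSE.md`:

```python
def approval_rate_drifting(weekly_rates: list[float], floor: float = 0.7) -> bool:
    """Flag a brand when its approval rate falls below an absolute
    floor or trends down for three consecutive weeks."""
    if weekly_rates and weekly_rates[-1] < floor:
        return True
    recent = weekly_rates[-4:]
    return len(recent) == 4 and all(a > b for a, b in zip(recent, recent[1:]))
```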
Questions or a custom security review?
Enterprise customers receive dedicated security reviews and direct access to our security team. Reach us anytime at [email protected].
Talk to security →