AI Safety

Human oversight, PII protection, and defenses against AI-specific threats.

Last reviewed 2026-04 · Engineering · Owned by AI Safety Officer

On this page

  1. Overview
  2. Human approval gates
  3. No training on customer data
  4. PII redaction
  5. Prompt-injection defense
  6. AI quality reporting
  7. Model inventory
  8. Quality monitoring
  9. Related documents

Overview

Votriz uses AI to draft content, score brand health, and orchestrate campaigns. Every AI interaction sits behind safety guardrails: a mandatory human-approval gate, PII redaction on sensitive paths, prompt-injection defense, and a queryable user-feedback channel. These controls align with the NIST AI RMF across its Govern, Map, Measure, and Manage functions.

Human approval gates

All AI-generated content requires human approval before any external action (publishing to social, sending email, posting a crisis response). This is foundational architecture, not a configurable toggle.
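
As a minimal sketch of what a structural gate looks like (the names Draft, ApprovalRequired, and publish_to_social are illustrative, not Votriz's actual API):

```python
from dataclasses import dataclass

# Hypothetical types for illustration -- not the actual Votriz API.
@dataclass
class Draft:
    content: str
    approved_by: str | None = None  # user id of the human approver, if any

class ApprovalRequired(Exception):
    """Raised when an external action is attempted on unapproved content."""

def publish_to_social(draft: Draft) -> None:
    # The gate is structural: every external-action entry point checks
    # approval state first, so no code path can publish unreviewed output.
    if draft.approved_by is None:
        raise ApprovalRequired("draft must be approved by a human first")
    ...  # hand off to the publishing pipeline
```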

Ghost Presence (autonomous mode)

Opt-in per org, with a per-brand auto_approve_confidence_threshold. Even when active, drafts scoring below that threshold still route to the standard human approval queue.
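
A rough sketch of that routing rule, assuming confidence scores and thresholds on a 0-1 scale (the function name is hypothetical; the real Ghost Presence routing logic may differ):

```python
def route_draft(confidence: float, threshold: float) -> str:
    """Decide where a Ghost Presence draft goes next."""
    if confidence >= threshold:
        return "auto_approved"        # autonomous path
    return "human_review_queue"       # standard approval gate

# Drafts below the per-brand threshold fall back to human review.
assert route_draft(0.97, threshold=0.90) == "auto_approved"
assert route_draft(0.72, threshold=0.90) == "human_review_queue"
```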

No training on customer data

Customer content, prompts, Brand DNA, subscriber data, and analytics are never used to train, fine-tune, or otherwise improve any AI model. We use Anthropic's Claude API under a contract that explicitly excludes API data from training corpora — we inherit that guarantee.

Brand DNA voice profiles are stored per-org and loaded into prompt context at inference time. They never leave the customer's org_id scope and are never shared with other customers.
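
A sketch of that isolation boundary, with a hypothetical in-memory store standing in for the real per-org profile storage:

```python
# Hypothetical sketch: voice profiles are fetched by the caller's own
# org_id at inference time, so no prompt can include another org's data.

VOICE_PROFILES: dict[str, str] = {
    "org_123": "Warm, direct, first-person plural; avoids jargon.",
}

def build_prompt_context(org_id: str, user_input: str) -> str:
    # Lookup is keyed strictly by the caller's org_id.
    profile = VOICE_PROFILES.get(org_id, "")
    return f"Brand voice:\n{profile}\n\nTask:\n{user_input}"
```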

PII redaction

The pii_redactor service scans free-form user input for sensitive patterns before transmission to external AI providers:

Pattern | Replacement | Restorable?
--- | --- | ---
Email addresses | [EMAIL_n] | Yes (chatbot reply path)
Phone numbers (NANP 3-3-4) | [PHONE_n] | Yes (chatbot reply path)
SSN format (3-2-4) | [SSN_REDACTED] | No
Credit card (Luhn-validated) | [CC_REDACTED] | No

Redaction is selectively applied to the support chatbot and email-generation prompt paths. Lead generation is intentionally exempt — extracting public business contact information from search results is the agent's explicit job, and redacting there would defeat the purpose.
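
A condensed sketch of the approach (the real pii_redactor service is more thorough; the patterns and placeholder formats here just mirror the table above):

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")   # NANP 3-3-4
SSN   = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")           # 3-2-4
CARD  = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def luhn_ok(digits: str) -> bool:
    """Standard Luhn check, used to avoid redacting non-card numbers."""
    ds = [int(c) for c in digits if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(ds)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def redact(text: str) -> tuple[str, dict[str, str]]:
    """Return (redacted_text, restore_map for the chatbot reply path)."""
    restore: dict[str, str] = {}

    def numbered(pattern: re.Pattern, label: str, s: str) -> str:
        def sub(m: re.Match) -> str:
            key = f"[{label}_{len(restore)}]"
            restore[key] = m.group(0)   # restorable placeholders only
            return key
        return pattern.sub(sub, s)

    text = numbered(EMAIL, "EMAIL", text)
    text = numbered(PHONE, "PHONE", text)
    text = SSN.sub("[SSN_REDACTED]", text)              # not restorable
    text = CARD.sub(
        lambda m: "[CC_REDACTED]" if luhn_ok(m.group(0)) else m.group(0),
        text,
    )
    return text, restore
```

On the chatbot reply path, the restore map lets the service put the original values back into the model's answer before it reaches the user.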

Prompt-injection defense

The prompt_guard service inspects user input for known jailbreak patterns and structure-token injection.

Detected attempts are sanitized before forwarding (role tokens defanged, code fences neutered) and the event is logged in security_audit_log. Brand DNA scoring + the human approval gate are still the real defenses; this is a cheap upstream filter that catches the obvious stuff before tokens are spent on it.
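
A rough sketch of the defang-and-log step (the pattern list and event name are illustrative; the production prompt_guard service maintains a larger, regularly updated set):

```python
import json
import logging
import re

log = logging.getLogger("security_audit_log")  # stand-in for the real sink

ROLE_TOKENS = re.compile(r"</?\s*(system|assistant|tool)\s*>", re.IGNORECASE)
JAILBREAK   = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)

def guard(user_input: str, org_id: str) -> str:
    hit = bool(ROLE_TOKENS.search(user_input) or JAILBREAK.search(user_input))
    # Defang role tokens and neutralize code fences rather than rejecting
    # outright -- the human approval gate remains downstream.
    sanitized = ROLE_TOKENS.sub(
        lambda m: m.group(0).replace("<", "&lt;").replace(">", "&gt;"),
        user_input,
    )
    sanitized = sanitized.replace("`" * 3, "'" * 3)
    if hit:
        # Event name here is illustrative of the audit-log entry.
        log.warning(json.dumps({"event": "ai.prompt_injection_attempt",
                                "org_id": org_id}))
    return sanitized
```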

AI quality reporting

Any user can flag a generated piece as biased, inaccurate, inappropriate, off-brand, or surfacing the system prompt.

Reports land in security_audit_log under ai.quality_report and are reviewed by the AI Safety Officer within 24 hours. Recurring patterns trigger a review of the relevant agent's prompts and scoring thresholds.
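
For illustration, a report record might be shaped like this (field names are hypothetical; only the ai.quality_report event type comes from the audit log described above):

```python
import datetime as dt
import json

VALID_CATEGORIES = {
    "biased", "inaccurate", "inappropriate", "off_brand", "system_prompt_leak",
}

def file_quality_report(content_id: str, category: str, note: str) -> str:
    """Build an audit-log entry for a user-flagged AI output."""
    if category not in VALID_CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    record = {
        "event": "ai.quality_report",
        "content_id": content_id,
        "category": category,
        "note": note,
        "reported_at": dt.datetime.now(dt.timezone.utc).isoformat(),
    }
    return json.dumps(record)  # written to security_audit_log in practice
```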

Model inventory

Model | Provider | Purpose | Data sent | Risk
--- | --- | --- | --- | ---
Claude Haiku 4.5 | Anthropic | Content, copy, email, SEO scoring, sentiment, chatbot | Brand context (no PII on most paths; redacted on chatbot + email) | Medium
GPT-4o-mini | OpenAI | Fallback if Anthropic is unreachable | Same shape as Claude payloads | Medium
FLUX.1-schnell | fal.ai | Image generation | Text prompts only | Low

The full inventory + change procedure lives in docs/policies/AI_MODEL_INVENTORY.md; updates ship in the same commit as the code change.
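
A sketch of the fallback shape the inventory implies (the client functions are stand-ins; only the provider order comes from the table):

```python
# Illustrative provider-fallback sketch; the real routing, retry, and
# client code is not shown here.

class ProviderUnreachable(Exception):
    pass

def call_anthropic(prompt: str) -> str:
    raise ProviderUnreachable("simulated outage")  # stand-in client call

def call_openai(prompt: str) -> str:
    return "fallback completion"                   # stand-in client call

def complete(prompt: str) -> str:
    try:
        return call_anthropic(prompt)      # primary: Claude Haiku 4.5
    except ProviderUnreachable:
        # The same payload shape goes to the fallback, so the upstream
        # redaction and guard steps apply identically.
        return call_openai(prompt)         # fallback: GPT-4o-mini
```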

Quality monitoring

Output quality is monitored through:

Related documents

Questions or a custom security review?

Enterprise customers receive dedicated security reviews and direct access to our security team. Reach us anytime at [email protected].
