Incident Response
Four severity levels, defined response targets, and kill switches for every autonomous component.
Overview
Votriz maintains a documented IR plan covering both general
security incidents and AI-specific scenarios. Every action is
logged in the immutable security_audit_log with full
traceability. The 72-hour breach-notification clock for personal
data (GDPR Article 33) starts at the moment Votriz becomes
“aware” of the breach — that's the trigger
documented in our DPA.
Severity ladder
P1 — Critical
Cross-tenant data exposure via AI output, PII to an unauthorized third party, lead-generator returning rows for the wrong org.
| Step | Target |
|---|---|
| AI Safety Officer paged | 15 min |
| Affected component disabled | 15 min |
| Scope assessment via audit log | 1 h |
| Affected customers notified | 4 h |
| Root cause + fix deployed | 24 h |
| Public post-mortem published | 72 h |
P2 — High
Prompt injection that produces harmful content (caught at the approval gate), system-prompt extraction, brand-monitor safety guard bypass.
- Affected content quarantined — 1 h
- prompt_guard patterns updated — 4 h
- Audit-log sweep for similar patterns — 24 h
- Internal post-mortem — 7 days
P2 doesn't require customer notification by default — the human approval gate caught the problem. If content went out, escalate to P1.
P3 — Medium
Quality regression: Brand-DNA scores drop, user-reported issue rate spikes, hallucinated facts.
- Investigate root cause — 24 h
- Adjust prompts / scoring thresholds — 48 h
- Monitor recovery — 7 days
P4 — Low
AI provider timeout, rate limiting, latency outside SLA.
- Automatic failover to OpenAI fallback
- Status-page banner if degradation persists > 30 min
- No customer notification unless SLA target is breached
Detection signals
Five signals page the on-call engineer:
- ai.quality_report rate > 5/hour platform-wide
- auth.login_failed rate (credential-stuffing precursor)
- LLM-provider 5xx rate from services/llm.py
- Brand-DNA scoring p95 > baseline + 30%
- Customer email to [email protected]
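The paging logic these signals feed can be sketched as a simple threshold check. This is a minimal illustration, not the production alerting code; the class and constant names are assumptions, with only the thresholds taken from the list above.

```python
from dataclasses import dataclass

# Thresholds from the signal list above; names are illustrative.
QUALITY_REPORTS_PER_HOUR = 5     # ai.quality_report, platform-wide
P95_BASELINE_MULTIPLIER = 1.30   # Brand-DNA scoring p95 vs. baseline

@dataclass
class Signals:
    quality_reports_last_hour: int
    login_failed_spike: bool       # credential-stuffing precursor
    llm_5xx_spike: bool            # from services/llm.py
    scoring_p95_ms: float
    scoring_p95_baseline_ms: float

def should_page(s: Signals) -> bool:
    """Return True if any on-call paging signal fires."""
    return (
        s.quality_reports_last_hour > QUALITY_REPORTS_PER_HOUR
        or s.login_failed_spike
        or s.llm_5xx_spike
        or s.scoring_p95_ms
        > s.scoring_p95_baseline_ms * P95_BASELINE_MULTIPLIER
    )
```

(The fifth signal, customer email to [email protected], is triaged by a human rather than evaluated in code.)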
Kill switches
Three layers, each progressively more drastic.
1. Per-agent disable (preferred)
Comment out the agent's cron registration in
votriz-worker/main.py and rebuild. In-flight jobs
finish; the agent stops being scheduled.
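In code terms, the registration might look like the sketch below. The registry shape and agent names are hypothetical, assuming a simple cron table in votriz-worker/main.py; the point is that disabling an agent is a one-line, reviewable change.

```python
# Hypothetical agent registry in votriz-worker/main.py. Commenting out
# a line removes the agent from scheduling; in-flight jobs still finish.
CRON_AGENTS = {
    "brand_monitor": "*/15 * * * *",
    "lead_generator": "0 * * * *",
    # "content_writer": "0 6 * * *",  # disabled during incident
}

def scheduled_agents() -> list[str]:
    """Only registered agents are eligible to be scheduled."""
    return sorted(CRON_AGENTS)
```

Because the change goes through a rebuild, it is captured in version control and the deploy log rather than applied as hidden runtime state.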
2. Global AI disable
Set the worker-wide kill flag:
Every agent that calls services/llm.py short-circuits
with a "degraded mode" log line and skips its tick. Manual
content + publishing flows still work.
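A minimal sketch of that short-circuit, assuming an environment-variable flag; the flag name and function signatures are illustrative, not the real worker configuration.

```python
import logging
import os

log = logging.getLogger("worker")

def ai_disabled() -> bool:
    # Hypothetical flag name; the real kill flag lives in worker config.
    return os.environ.get("AI_KILL_SWITCH", "0") == "1"

def run_tick(agent_name: str, work) -> None:
    """Each agent tick checks the flag before touching services/llm.py."""
    if ai_disabled():
        log.warning("degraded mode: %s skipping tick (AI disabled)", agent_name)
        return
    work()
```

Manual content and publishing flows never pass through this check, which is why they keep working while the flag is set.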
3. Network block (nuclear)
Last resort — only when confirmed exfiltration is in progress and we need to physically prevent more LLM round-trips:
The rule is timestamped in the incident channel and removed when the issue is resolved.
Multi-provider failover
If the primary AI provider (Anthropic) is unavailable:
- Automatic failover to OpenAI via the services/llm.py provider chain
- If both fail, agents skip their tick and retry on the next cron run
- No data is lost — pending work accumulates and is processed once a provider returns
- Status page updates automatically
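The chain described above can be sketched as an ordered list of providers tried in turn. This is an assumption about the shape of services/llm.py, not its actual API; the function names and error type are placeholders.

```python
# Illustrative provider chain; the outage in call_anthropic is simulated.
class ProviderError(Exception):
    pass

def call_anthropic(prompt: str) -> str:
    raise ProviderError("anthropic unavailable")

def call_openai(prompt: str) -> str:
    return f"openai:{prompt}"

PROVIDER_CHAIN = [call_anthropic, call_openai]

def complete(prompt: str) -> str:
    """Try each provider in order. Raise only if all fail, so the
    calling agent skips its tick and cron retries on the next run."""
    last_err: ProviderError | None = None
    for provider in PROVIDER_CHAIN:
        try:
            return provider(prompt)
        except ProviderError as err:
            last_err = err
    raise last_err
```

Because pending work is only marked done after a successful completion, nothing is lost while every provider is down; the queue drains once one returns.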
Operator access
| Access type | Requirement |
|---|---|
| Production shell | Encrypted VPN tunnel; tagged identity required |
| Database direct | Encrypted VPN tunnel + isolated container network |
| Container management | Encrypted VPN tunnel + container CLI |
| Deployment | Automated deployment pipeline; no manual production shell |
All operator access is logged. No customer data is accessed without a documented reason that lands in the audit log.
Customer comms
Customer notification template (P1)
Status page
votriz.com/status — 30-second polling, per-service health indicators, push updates on subscribe.
Questions or a custom security review?
Enterprise customers receive dedicated security reviews and direct access to our security team. Reach us anytime at [email protected].