Kubernetes-style RBAC for AI agents calling tools, plus behavioral analysis that catches attack chains even when every individual call is authorized. The same engine also governs which LLMs your agents call on OpenRouter.
MCP servers expose tools. Agents call them. Nothing in between decides whether they should.
Any agent can call any tool regardless of what it's supposed to do. No roles, no scopes, no default deny.
Each individual call passes inspection. The attack happens across calls. Read secrets here, leak them there.
External content in tool responses instructs the agent to make calls it was never meant to make.
Pagination-based extraction looks routine per request. It only becomes visible as a pattern across the session.
Layer 1 catches unauthorized calls. Layer 2 catches authorized calls that chain into attacks.
Agents, roles, bindings, scopes, verbs, constraints. Allow rules compose additively, deny rules win globally, server-prefixed selectors expand at load time. Default deny.
roles:
  - name: sql-readonly
    rules:
      - name: "Read-only SQL"
        resources: ["db.execute_sql"]
        verbs: [invoke]
        constraints:
          sql_intent: [select]   # DELETE blocked at the argument level
  - name: github-guard
    rules:
      - name: "Block destructive GitHub"
        effect: deny                   # deny wins, even if another role would allow
        resources: ["server:github"]   # expands to every github.* tool
        verbs: [invoke]
bindings:
  - agent: support-bot
    roles: [sql-readonly, github-guard]
    scopes: [support, customer_data]   # can't touch billing or code
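The deny-wins, default-deny semantics described above can be sketched in a few lines of Go. This is a minimal illustration, not the project's evaluator: the `Rule` struct and `Evaluate` function are hypothetical, the tool names `github.delete_repo` and `billing.refund` are made up for the example, and scopes, verbs, and argument constraints are left out. The sketch also resolves `server:` selectors at evaluation time for brevity, whereas the text says the real engine expands them at load time.

```go
package main

import (
	"fmt"
	"strings"
)

// Rule is a simplified policy rule. Field names echo the YAML policy,
// but the struct itself is a hypothetical illustration.
type Rule struct {
	Name      string
	Deny      bool     // true when the rule has effect: deny
	Resources []string // exact tool names or "server:<name>" selectors
}

// matches reports whether the rule's resources cover the given tool.
// A "server:github" selector is treated as covering every "github.*" tool.
func (r Rule) matches(tool string) bool {
	for _, res := range r.Resources {
		if strings.HasPrefix(res, "server:") {
			server := strings.TrimPrefix(res, "server:")
			if strings.HasPrefix(tool, server+".") {
				return true
			}
			continue
		}
		if res == tool {
			return true
		}
	}
	return false
}

// Evaluate checks deny rules first, then allow rules, then falls back to deny.
func Evaluate(rules []Rule, tool string) (bool, string) {
	for _, r := range rules {
		if r.Deny && r.matches(tool) {
			return false, "denied by rule: " + r.Name
		}
	}
	for _, r := range rules {
		if !r.Deny && r.matches(tool) {
			return true, "allowed by rule: " + r.Name
		}
	}
	return false, "default deny"
}

func main() {
	rules := []Rule{
		{Name: "Read-only SQL", Resources: []string{"db.execute_sql"}},
		{Name: "Block destructive GitHub", Deny: true, Resources: []string{"server:github"}},
	}
	// The last two tool names are hypothetical examples.
	for _, tool := range []string{"db.execute_sql", "github.delete_repo", "billing.refund"} {
		ok, why := Evaluate(rules, tool)
		fmt.Println(tool, ok, why)
	}
}
```

Checking every deny rule before any allow rule is what makes deny global: the order in which roles are bound to an agent cannot change the outcome.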
Builds a causal graph across tool calls and detects when data flows from a restricted source to an external sink, even across multiple hops.
# Every call passed RBAC. The session is the attack surface.
[1] support.read_ticket   allowed   ticket_id: TICKET-4829
[2] db.execute_sql        allowed   SELECT api_key FROM integrations
[3] support.post_reply    allowed   body: "Here are the keys: sk-live-..."
secret_relay detected   event 2 -> event 3   confidence=0.97
Restricted SQL response flowed into a public-facing egress tool.
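A one-hop version of this detection can be sketched as follows. It is a simplified illustration under stated assumptions: the `Event` and `Edge` types, the `FindSecretRelays` function, the regular expression for secret-looking tokens, and the sample key `sk-live-EXAMPLE123` are all hypothetical. The real engine is described as following flows across multiple hops, which this sketch does not attempt.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"regexp"
)

// secretPattern is a hypothetical token shape used only for this sketch.
var secretPattern = regexp.MustCompile(`sk-live-[A-Za-z0-9]+`)

// Event is a simplified tool call record; the fields are illustrative.
type Event struct {
	ID         int
	Tool       string
	Restricted bool   // the response came from a restricted source
	Egress     bool   // the tool writes to a public-facing sink
	Payload    string // response text for sources, arguments for sinks
}

// Edge records that a token seen in event From reappeared in event To.
type Edge struct{ From, To int }

// tokenHashes extracts secret-looking tokens and returns their SHA-256 hashes,
// so that raw secrets never need to be stored for comparison.
func tokenHashes(payload string) map[string]bool {
	hashes := make(map[string]bool)
	for _, tok := range secretPattern.FindAllString(payload, -1) {
		sum := sha256.Sum256([]byte(tok))
		hashes[hex.EncodeToString(sum[:])] = true
	}
	return hashes
}

// FindSecretRelays links each restricted event to every later egress event
// whose payload shares at least one token hash with it.
func FindSecretRelays(events []Event) []Edge {
	var edges []Edge
	for i, src := range events {
		if !src.Restricted {
			continue
		}
		srcHashes := tokenHashes(src.Payload)
		for _, dst := range events[i+1:] {
			if !dst.Egress {
				continue
			}
			for h := range tokenHashes(dst.Payload) {
				if srcHashes[h] {
					edges = append(edges, Edge{From: src.ID, To: dst.ID})
					break
				}
			}
		}
	}
	return edges
}

func main() {
	events := []Event{
		{ID: 1, Tool: "support.read_ticket", Payload: "ticket_id: TICKET-4829"},
		{ID: 2, Tool: "db.execute_sql", Restricted: true, Payload: "api_key=sk-live-EXAMPLE123"},
		{ID: 3, Tool: "support.post_reply", Egress: true, Payload: "Here are the keys: sk-live-EXAMPLE123"},
	}
	fmt.Println(FindSecretRelays(events))
}
```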
Tool calls and model calls are the same shape of problem. The same policy engine that gates MCP traffic also picks which model on OpenRouter answers each prompt.
Drop-in for openrouter.ai/api/v1. Every incoming chat completion is classified by a cheap model into {task, complexity, capabilities}, matched against a declarative YAML policy, and forwarded to the chosen target. The reported cost is the real cost returned by OpenRouter, shown next to a counterfactual baseline so the savings are honest.
classifier:
  model: openai/gpt-4o-mini
  timeout_ms: 1500
baseline:
  model: openai/gpt-4o   # what every call WOULD have cost
routes:
  - name: "High-complexity code"
    if:
      task: code
      complexity: [medium, high]
    target: anthropic/claude-sonnet-4.6
    fallback: [anthropic/claude-haiku-4.5]
  - name: "Cheap classification"
    if:
      task: classification
      complexity: [low]
    target: openai/gpt-4o-mini
  - name: "Vision needed"
    if:
      capabilities_include: vision
    target: google/gemini-flash-latest
default: { target: openai/gpt-4o-mini }
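To make the matching step concrete, here is a minimal Go sketch of first-match-wins route selection over classifier output. The `Classification` and `Route` types and the `Pick` function are hypothetical names chosen for illustration, and the sketch leaves out fallback chains and the classifier call itself.

```go
package main

import "fmt"

// Classification is the classifier output: task, complexity, capabilities.
type Classification struct {
	Task         string
	Complexity   string
	Capabilities []string
}

// Route is a simplified routing rule. Empty fields match anything.
type Route struct {
	Name               string
	Task               string
	Complexity         []string
	CapabilityRequired string
	Target             string
}

// contains reports whether v is present in list.
func contains(list []string, v string) bool {
	for _, x := range list {
		if x == v {
			return true
		}
	}
	return false
}

// matches checks every condition set on the route against the classification.
func (r Route) matches(c Classification) bool {
	if r.Task != "" && r.Task != c.Task {
		return false
	}
	if len(r.Complexity) > 0 && !contains(r.Complexity, c.Complexity) {
		return false
	}
	if r.CapabilityRequired != "" && !contains(c.Capabilities, r.CapabilityRequired) {
		return false
	}
	return true
}

// Pick returns the target of the first matching route, or def if none match.
func Pick(routes []Route, def string, c Classification) string {
	for _, r := range routes {
		if r.matches(c) {
			return r.Target
		}
	}
	return def
}

func main() {
	routes := []Route{
		{Name: "High-complexity code", Task: "code", Complexity: []string{"medium", "high"}, Target: "anthropic/claude-sonnet-4.6"},
		{Name: "Cheap classification", Task: "classification", Complexity: []string{"low"}, Target: "openai/gpt-4o-mini"},
		{Name: "Vision needed", CapabilityRequired: "vision", Target: "google/gemini-flash-latest"},
	}
	c := Classification{Task: "code", Complexity: "high"}
	fmt.Println(Pick(routes, "openai/gpt-4o-mini", c))
}
```

Because the first match wins, route order is part of the policy: a code prompt that also needs vision lands on the code route here, since that route is listed first.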
A 13-prompt mixed workload (classification, code generation, RAG, summarization, reasoning, vision-hint, tool-use) routed across 3 models. Dashboard tails the audit log in real time.
SESSION TOTALS
total calls 13
total cost $0.0141 # actually spent
baseline cost $0.0229 # if all via openai/gpt-4o
savings $0.0088 38.5%
PER-MODEL
openai/gpt-4o-mini 9 calls $0.0007
anthropic/claude-haiku-4.5 3 calls $0.0017
anthropic/claude-sonnet-4.6 2 calls $0.0108
Run against real trace files from the command line. No infrastructure needed. To bring up the full live stack with the dashboard, run make demo.
The engine and services are written in Go; the dashboard is Vite + React + TypeScript. The evaluator is pure, the compiled policy is immutable, and every decision leaves a full audit trail.
| Component | Description | Status |
|---|---|---|
| engine/authz | Pure RBAC evaluator. Allow + deny rules with global deny precedence, server-prefixed selectors, atomic PolicyHolder for hot reload, hardcoded denial reason precedence, per-constraint alias tables. | Done |
| engine/routing | Pure routing policy evaluator. First-match-wins over classifier output, capability filters, complexity ranges, fallback chains. | Done |
| engine/session | Session state and event types. Verb constants, PolicyDecision struct with full rule provenance on every decision. | Done |
| engine/lineage | Causal edge builder using token matching and field overlap across payloads. | Done |
| engine/rules | Behavioral rule evaluator with 6 detection rules covering known MCP attack patterns. | Done |
| engine/risk | Cumulative risk scorer. Outputs a session disposition: allow, warn, pause, or terminate. | Done |
| cmd/gateway | Inline MCP stdio proxy. JSONL audit (schema-versioned, fsync'd), SIGHUP hot reload, SIGTERM graceful drain, HTTP sidecar. | Done |
| cmd/router | OpenAI-compatible drop-in for openrouter.ai. Classifier stage, policy stage, OpenRouter forwarder with real cost capture, counterfactual baseline. | Done |
| cmd/control | HTTP + WebSocket control plane. Tails both audit files via fsnotify, fans live decisions to subscribers, YAML round-trip editing with atomic write + hot reload. | Done |
| cmd/replay | CLI for loading and running JSON session fixtures against the full engine. | Done |
| ui/ | Vite + React + TypeScript dashboard. Seven screens: Connections, Connection detail, Agents, Sessions/Live, Roles editor, Router Live, Router Routes. | Done |
| config/ | YAML policy files for both surfaces. Loaded and compiled at startup; hot-reloadable from the dashboard. | Done |
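The table mentions an atomic PolicyHolder for hot reload. One common way to build such a holder in Go is `sync/atomic`'s generic `atomic.Pointer`, sketched below. Only the name `PolicyHolder` comes from the table; the `CompiledPolicy` fields and the `Load` and `Swap` methods are assumptions made for illustration and may not match the actual implementation.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// CompiledPolicy stands in for an immutable compiled policy.
// Its fields are hypothetical and must not be mutated after construction.
type CompiledPolicy struct {
	Version int
	Allowed map[string]bool
}

// PolicyHolder lets request handlers read the current policy without locks
// while a reload replaces it in a single atomic pointer store.
type PolicyHolder struct {
	current atomic.Pointer[CompiledPolicy]
}

// NewPolicyHolder returns a holder that starts with policy p.
func NewPolicyHolder(p *CompiledPolicy) *PolicyHolder {
	h := &PolicyHolder{}
	h.current.Store(p)
	return h
}

// Load returns the policy that is current at the time of the call.
func (h *PolicyHolder) Load() *CompiledPolicy { return h.current.Load() }

// Swap installs a newly compiled policy for all future Load calls.
func (h *PolicyHolder) Swap(p *CompiledPolicy) { h.current.Store(p) }

func main() {
	h := NewPolicyHolder(&CompiledPolicy{Version: 1, Allowed: map[string]bool{"db.execute_sql": true}})

	inFlight := h.Load() // a request that started before the reload
	h.Swap(&CompiledPolicy{Version: 2, Allowed: map[string]bool{}})

	fmt.Println(inFlight.Version, h.Load().Version)
}
```

A request that loaded the old policy keeps evaluating against that snapshot, so a reload never changes the rules in the middle of a decision.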
6 of 10 detection rules are implemented:
| Rule | Detects |
|---|---|
| secret_relay | A secret token from a restricted response relayed to an egress tool |
| restricted_read_external_write | Restricted data flowing to an external write path |
| pagination_exfiltration | Repeated calls with monotonically increasing page parameters |
| cross_scope_data_movement | Data from one scope being used in another |
| tool_poisoning_indicator | Instruction-like content embedded in tool responses |
| filesystem_traversal_sequence | File operations escalating toward sensitive paths |
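As an example of how one of these rules might look, the sketch below flags a run of calls to the same tool with strictly increasing page parameters, in the spirit of pagination_exfiltration. The `PageCall` type, the `DetectPagination` function, the `minRun` threshold, and the tool names are hypothetical; the real rule may weigh additional signals.

```go
package main

import "fmt"

// PageCall is one tool call reduced to the tool name and its page argument.
type PageCall struct {
	Tool string
	Page int
}

// DetectPagination returns the first tool that reaches minRun calls in a row
// (counted per tool) with strictly increasing page numbers.
func DetectPagination(calls []PageCall, minRun int) (string, bool) {
	type state struct{ last, run int }
	seen := make(map[string]*state)
	for _, c := range calls {
		s, ok := seen[c.Tool]
		switch {
		case !ok:
			// First call to this tool starts a run of length 1.
			s = &state{last: c.Page, run: 1}
			seen[c.Tool] = s
		case c.Page > s.last:
			// Page advanced: extend the run.
			s.last, s.run = c.Page, s.run+1
		default:
			// Page repeated or went backwards: restart the run.
			s.last, s.run = c.Page, 1
		}
		if s.run >= minRun {
			return c.Tool, true
		}
	}
	return "", false
}

func main() {
	// "crm.list_customers" is a hypothetical tool name.
	calls := []PageCall{
		{"crm.list_customers", 1},
		{"support.read_ticket", 1},
		{"crm.list_customers", 2},
		{"crm.list_customers", 3},
		{"crm.list_customers", 4},
	}
	tool, found := DetectPagination(calls, 4)
	fmt.Println(tool, found)
}
```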
Every request goes through two independent gates. Denied events are recorded but never reach the behavioral engine.
The request carries an agent identity, tool name, scope, and argument payload.
engine/authz evaluates the request against the compiled policy. It checks binding, scope, verb, and argument constraints. If denied, the decision is recorded and the call is dropped. If allowed, it continues.
Tokens, secrets, and structured fields are extracted and hashed from the payload.
Token matches and field overlaps between events create directed edges, building a causal graph of data movement across the session.
Detection rules run against the graph. Each match produces a finding with severity, confidence, and a recommended action.
Findings accumulate into a risk score. The session gets a final disposition: allow, warn, pause, or terminate.
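The final scoring step can be illustrated with a small Go sketch. Everything specific here is an assumption: the `Finding` fields, the severity-times-confidence weighting, and the 0.3 / 0.6 / 0.9 thresholds are invented for the example. Only the four dispositions come from the text.

```go
package main

import "fmt"

// Finding is a simplified detection result; the numeric scales are hypothetical.
type Finding struct {
	Rule       string
	Severity   float64 // assumed range 0..1
	Confidence float64 // assumed range 0..1
}

// Disposition is the session-level outcome.
type Disposition string

const (
	Allow     Disposition = "allow"
	Warn      Disposition = "warn"
	Pause     Disposition = "pause"
	Terminate Disposition = "terminate"
)

// Score accumulates severity weighted by confidence across all findings.
func Score(findings []Finding) float64 {
	total := 0.0
	for _, f := range findings {
		total += f.Severity * f.Confidence
	}
	return total
}

// Dispose maps a cumulative score to a disposition using made-up thresholds.
func Dispose(score float64) Disposition {
	switch {
	case score >= 0.9:
		return Terminate
	case score >= 0.6:
		return Pause
	case score >= 0.3:
		return Warn
	default:
		return Allow
	}
}

func main() {
	findings := []Finding{{Rule: "secret_relay", Severity: 1.0, Confidence: 0.97}}
	fmt.Println(Dispose(Score(findings)))
}
```

Because the score is cumulative, several low-confidence findings in one session can add up to a pause or terminate even when no single finding would.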
Four phases shipped. Detection rule catalog and approval workflows are next.
tools/list filtering based on the discover verb
/v1/chat/completions drop-in
usage.cost capture plus counterfactual baseline savings