Security-First Design

Triggerfish is built on a single premise: the LLM has zero authority. It requests actions; the policy layer decides. Every security decision is made by deterministic code that the AI cannot bypass, override, or influence.

This page explains why Triggerfish takes this approach, how it differs from traditional AI agent platforms, and where to find details on each component of the security model.

Why Security Must Be Below the LLM

Large language models can be prompt-injected. A carefully crafted input -- whether from a malicious external message, a poisoned document, or a compromised tool response -- can cause an LLM to ignore its instructions and take actions it was told not to take. This is not a theoretical risk. It is a well-documented, unsolved problem in the AI industry.

If your security model depends on the LLM following rules, a single successful injection can bypass every safeguard you have built.

Triggerfish solves this by moving all security enforcement to a code layer that sits below the LLM. The AI never sees security decisions. It never evaluates whether an action should be allowed. It simply requests actions, and the policy enforcement layer -- running as pure, deterministic code -- decides whether those actions proceed.

+----------------------------------------------------------+
|  LLM / Agent Reasoning Layer                             |
|  - Can be manipulated via prompt injection               |
|  - Does NOT make security decisions                      |
|  - Requests actions, does not execute them               |
|  - Has ZERO authority                                    |
+----------------------------------------------------------+
                         |
                         v  action request (structured)
+----------------------------------------------------------+
|  POLICY ENFORCEMENT LAYER                                |
|  - Pure code, deterministic                              |
|  - Tracks taint labels on all data in context            |
|  - Computes effective classification of any output       |
|  - Compares against channel/recipient classification     |
|  - BLOCKS or ALLOWS -- no LLM discretion                 |
|  - Logs all decisions for audit                          |
|  - Cannot be prompt-injected                             |
+----------------------------------------------------------+
                         |
                         v  allowed actions only
+----------------------------------------------------------+
|  Execution Layer (plugins, tool calls, messages)         |
+----------------------------------------------------------+

SECURITY

The LLM layer has no mechanism to override, skip, or influence the policy enforcement layer. There is no "parse LLM output for bypass commands" logic. The separation is architectural, not behavioral.

The Core Invariant

Every design decision in Triggerfish flows from one invariant:

Same input always produces the same security decision. No randomness, no LLM calls, no discretion.

This means security behavior is:

Auditable -- you can replay any decision and get the same result
Testable -- deterministic code can be covered by automated tests
Verifiable -- the policy engine is open source (MIT licensed) and anyone can inspect it

Security Principles

Principle	What It Means	Detail Page
Data Classification	All data carries a sensitivity level (RESTRICTED, CONFIDENTIAL, INTERNAL, PUBLIC). Classification is assigned by code when data enters the system.	Architecture: Classification
No Write-Down	Data can only flow to channels and recipients at an equal or higher classification level. CONFIDENTIAL data cannot reach a PUBLIC channel. No exceptions.	No Write-Down Rule
Session Taint	When a session accesses data at a classification level, the entire session is tainted to that level. Taint can only escalate, never decrease.	Architecture: Taint
Deterministic Hooks	Eight enforcement hooks run at critical points in every data flow. Each hook is synchronous, logged, and unforgeable.	Architecture: Policy Engine
Identity in Code	User identity is determined by code at session establishment, not by the LLM interpreting message content.	Identity & Auth
Agent Delegation	Agent-to-agent calls are governed by cryptographic certificates, classification ceilings, and depth limits.	Agent Delegation
Secrets Isolation	Credentials are stored in OS keychains or vaults, never in config files. Plugins cannot access system credentials.	Secrets Management
Audit Everything	Every policy decision is logged with full context: timestamp, hook type, session ID, input, result, and rules evaluated.	Audit & Compliance

Traditional AI Agents vs. Triggerfish

Most AI agent platforms rely on the LLM to enforce safety. The system prompt says "do not share sensitive data," and the agent is trusted to comply. This approach has fundamental weaknesses.

Aspect	Traditional AI Agent	Triggerfish
Security enforcement	System prompt instructions to the LLM	Deterministic code below the LLM
Prompt injection defense	Hope the LLM resists	LLM has no authority to begin with
Data flow control	LLM decides what is safe to share	Classification labels + no-write-down rule in code
Identity verification	LLM interprets "I am the admin"	Code checks cryptographic channel identity
Audit trail	LLM conversation logs	Structured policy decision logs with full context
Credential access	System service account for all users	Delegated user credentials; source system permissions inherited
Testability	Fuzzy -- depends on prompt wording	Deterministic -- same input, same decision, every time
Open for verification	Usually proprietary	MIT licensed, fully auditable

TIP

Triggerfish does not claim that LLMs are unreliable. It claims that LLMs are the wrong layer for security enforcement. A well-prompted LLM will follow its instructions most of the time. But "most of the time" is not a security guarantee. Triggerfish provides a guarantee: the policy layer is code, and code does what it is told, every time.

Defense in Depth

Triggerfish implements ten layers of defense. No single layer is sufficient on its own; together, they form a comprehensive security boundary:

Channel authentication -- code-verified identity at session establishment
Permission-aware data access -- source system permissions, not system credentials
Session taint tracking -- automatic, mandatory, escalation-only
Data lineage -- full provenance chain for every data element
Policy enforcement hooks -- deterministic, non-bypassable, logged
MCP Gateway -- secure external tool access with per-tool permissions
Plugin sandbox -- Deno + WASM double isolation
Secrets isolation -- OS keychain or vault, never config files
Agent identity -- cryptographic delegation chains
Audit logging -- all decisions recorded, no exceptions

Next Steps

Page	Description
No Write-Down Rule	The fundamental data flow rule and how it is enforced
Identity & Auth	Channel authentication and owner identity verification
Agent Delegation	Agent-to-agent identity, certificates, and delegation chains
Secrets Management	How Triggerfish handles credentials across tiers
Audit & Compliance	Audit trail structure, tracing, and compliance exports

Security-First Design ​

Why Security Must Be Below the LLM ​

The Core Invariant ​

Security Principles ​

Traditional AI Agents vs. Triggerfish ​

Defense in Depth ​

Next Steps ​