Skip to main content

AI/ML / Multi Agent Refarch / Threats / DEV

Direct prompt injection overrides guardrails

CCC.MARefArc.TH11

An actor interacting through the application crafts inputs that override system prompts, bypass safety guardrails, or coerce disclosure, requiring no special privileges and exploiting any gap in ingress and model-interaction guardrails.

Related Capabilities

IDTitleDescription
CCC.MARefArc.CP05Agent-ingress zero-trust guardrailsTreats all inputs as untrusted and enforces authentication, authorization, input validation, content filtering, access control, rate limits, and dynamic policy before any request reaches an agent.
CCC.MARefArc.CP16Model-interaction zero-trust guardrailsEnforces authentication and authorization for every inference request and applies input validation against prompt injection, output filtering and redaction, access control, rate limits, and cost management before and after model execution.
CCC.MARefArc.CP01User-facing application surfacePresentation and orchestration surface (web, mobile, chatbot, workflow tool, or integrated enterprise system) that captures user intent, forwards requests to the agent layer, and returns agent outputs.

Related Controls

IDTitleDescription
CCC.MARefArc.CN02User, Application, and Model FirewallingEstablish enforced trust boundaries between the user, the application, and the models and tools by routing all traffic through the agent, LLM, and MCP gateways where guardrails inspect and constrain requests and responses.
CCC.MARefArc.CN10AI Firewall Implementation and ManagementImplement and operate an AI firewall within the guardrail components that inspects prompts, content, and responses for injection, sensitive data, and policy violations.
CCC.MARefArc.CN12Tool Chain Validation and SanitizationValidate tool selection, sanitize tool-call parameters, and constrain tool sequencing within the runtime and MCP guardrails to prevent manipulation of agent tool use.

External Mappings

FrameworkIDRemarks
air-vecAIR-SEC-010-01