Direct prompt injection overrides guardrails
CCC.MARefArc.TH11
An actor interacting through the application crafts inputs that override system prompts, bypass safety guardrails, or coerce disclosure, requiring no special privileges and exploiting any gap in ingress and model-interaction guardrails.
Related Capabilities
| ID | Title | Description |
|---|---|---|
| CCC.MARefArc.CP05 | Agent-ingress zero-trust guardrails | Treats all inputs as untrusted and enforces authentication, authorization, input validation, content filtering, access control, rate limits, and dynamic policy before any request reaches an agent. |
| CCC.MARefArc.CP16 | Model-interaction zero-trust guardrails | Enforces authentication and authorization for every inference request and applies input validation against prompt injection, output filtering and redaction, access control, rate limits, and cost management before and after model execution. |
| CCC.MARefArc.CP01 | User-facing application surface | Presentation and orchestration surface (web, mobile, chatbot, workflow tool, or integrated enterprise system) that captures user intent, forwards requests to the agent layer, and returns agent outputs. |
Related Controls
| ID | Title | Description |
|---|---|---|
| CCC.MARefArc.CN02 | User, Application, and Model Firewalling | Establish enforced trust boundaries between the user, the application, and the models and tools by routing all traffic through the agent, LLM, and MCP gateways where guardrails inspect and constrain requests and responses. |
| CCC.MARefArc.CN10 | AI Firewall Implementation and Management | Implement and operate an AI firewall within the guardrail components that inspects prompts, content, and responses for injection, sensitive data, and policy violations. |
| CCC.MARefArc.CN12 | Tool Chain Validation and Sanitization | Validate tool selection, sanitize tool-call parameters, and constrain tool sequencing within the runtime and MCP guardrails to prevent manipulation of agent tool use. |
External Mappings
| Framework | ID | Remarks |
|---|---|---|
| air-vec | AIR-SEC-010-01 | |
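As an added illustration of the layered CP05 ingress and CP16 model-interaction guardrails this threat depends on being absent, the following Python sketch (written for this page, not code from the reference architecture or its controls) gates a request before it reaches an agent and filters the model's output afterwards. The system prompt text, the injection pattern list, the principal names and the rate-limit values are all constructed placeholders.

```python
import re
import time
from dataclasses import dataclass, field

# Constructed stand-in for an operator system prompt; the egress filter must
# stop the model from disclosing it (placeholder text, not a real policy).
SYSTEM_PROMPT = "You are a support assistant. Internal policy code: PLACEHOLDER-POLICY-123"

# Constructed list of override phrasings used for input validation. A real AI
# firewall (CN10) would rely on a maintained ruleset plus model-based scoring.
INJECTION_PATTERNS = [
    r"ignore (all |any )?(previous|prior|above) instructions",
    r"disregard (the )?(system|previous) prompt",
    r"reveal (your|the) (system|hidden) prompt",
    r"you are now (in )?developer mode",
]

@dataclass
class RateLimiter:
    """Sliding-window limiter keyed by principal (CP05 rate limits)."""
    max_per_window: int
    window_s: float
    hits: dict = field(default_factory=dict)

    def allow(self, principal, now=None):
        now = time.monotonic() if now is None else now
        recent = [t for t in self.hits.get(principal, []) if now - t < self.window_s]
        if len(recent) >= self.max_per_window:
            self.hits[principal] = recent
            return False
        self.hits[principal] = recent + [now]
        return True

def ingress_check(principal, authorized, text, limiter, now=None):
    """CP05-style gate: every input is untrusted until authz, rate limit and
    input validation all pass. Returns 'allow' or a 'deny:<reason>' string."""
    if not authorized:
        return "deny:unauthorized"
    if not limiter.allow(principal, now):
        return "deny:rate_limited"
    for pattern in INJECTION_PATTERNS:
        if re.search(pattern, text, re.IGNORECASE):
            return "deny:injection_pattern"
    return "allow"

def egress_filter(model_output):
    """CP16-style post-inference filter: redact system prompt leakage."""
    if SYSTEM_PROMPT in model_output or "PLACEHOLDER-POLICY-123" in model_output:
        return "[redacted: system prompt disclosure blocked]"
    return model_output
```

Pattern matching alone is a weak control; the point is that the ingress decision, the rate limit and the output redaction are independent layers, so a prompt that slips past one still meets the others.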