VRAM exhaustion on model-serving infrastructure

CCC.MARefArc.TH10

Configuration changes, aggressive caching, or memory leaks in the model-serving libraries behind the LLM gateway can exhaust GPU VRAM, degrading responsiveness or crashing the model-serving process.
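
A leak of the kind described above tends to show up as steady growth in used VRAM even when load is flat. The following is a minimal sketch of one way to flag that pattern, not a method prescribed by this catalog. The function names, the sampling interval, and the `min_slope` threshold are all hypothetical; the samples are assumed to be periodic used-VRAM readings in bytes, collected by whatever telemetry the serving host already exposes.

```python
from statistics import mean


def vram_slope(samples: list[float]) -> float:
    """Least-squares slope of used-VRAM samples, in bytes per sample interval."""
    n = len(samples)
    if n < 2:
        return 0.0  # not enough data to estimate a trend
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(samples)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, samples))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def looks_like_leak(samples: list[float], min_slope: float) -> bool:
    """Flag sustained growth; min_slope is an assumed, site-specific threshold."""
    return vram_slope(samples) >= min_slope
```

A trend check like this only suggests a leak. Growth caused by legitimately rising traffic or a newly enabled cache would look the same, so the result is better treated as a prompt for investigation than as a verdict.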

Related Capabilities

ID	Title	Description
CCC.MARefArc.CP15	LLM inference gateway routing	Validates inference requests and routes each to the correct model instance, abstracting model hosting behind a consistent interface.

Related Controls

ID	Title	Description
CCC.MARefArc.CN02	User, Application, and Model Firewalling	Establish enforced trust boundaries between the user, the application, and the models and tools by routing all traffic through the agent, LLM, and MCP gateways where guardrails inspect and constrain requests and responses.
CCC.MARefArc.CN06	Quality of Service and DDoS Prevention	Protect model and tool availability by enforcing quality-of-service controls, rate limits, and abuse and DDoS mitigation at the gateways.
CCC.MARefArc.CN10	AI Firewall Implementation and Management	Implement and operate an AI firewall within the guardrail components that inspects prompts, content, and responses for injection, sensitive data, and policy violations.
CCC.MARefArc.CN17	AI System Observability	Instrument every layer to emit logs, traces, metrics, and events to the Observability Layer so that behaviour, drift, availability, and data handling are continuously visible and auditable.
CCC.MARefArc.CN18	AI System Alerting and Denial of Wallet Monitoring	Monitor spend and usage of models and tools, and alert on anomalous consumption indicative of Denial of Wallet or runaway agentic loops.
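
As an illustration of how the observability and quality-of-service controls above might combine for this threat, the sketch below maps a VRAM reading to a coarse action. It is an assumption-laden example rather than part of the control definitions: `VramReading`, `vram_action`, and the 80% and 92% thresholds are hypothetical, and the readings are assumed to come from existing GPU telemetry (for example, NVML memory info on NVIDIA hardware).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VramReading:
    """One used/total VRAM sample for a single GPU (hypothetical type)."""
    used_bytes: int
    total_bytes: int


def vram_action(reading: VramReading,
                warn_at: float = 0.80,
                shed_at: float = 0.92) -> str:
    """Return "ok", "warn", or "shed" for a reading.

    The thresholds are illustrative defaults, not values from the catalog.
    """
    if reading.total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    ratio = reading.used_bytes / reading.total_bytes
    if ratio >= shed_at:
        return "shed"  # e.g. the gateway rejects or queues new requests
    if ratio >= warn_at:
        return "warn"  # e.g. raise an alert before the server runs out
    return "ok"
```

In this sketch, "warn" would feed alerting and "shed" would feed rate limiting at the gateway, so that new requests are turned away before the serving process crashes.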

External Mappings

Framework	ID	Remarks
air-vec	AIR-OP-007-03