Skip to main content

AI/ML / Multi Agent Refarch / Threats / DEV

VRAM exhaustion on model-serving infrastructure

CCC.MARefArc.TH10

Configuration changes, aggressive caching, or memory leaks in model-serving libraries behind the LLM gateway exhaust GPU VRAM, degrading responsiveness or crashing model serving.

Related Capabilities

IDTitleDescription
CCC.MARefArc.CP15LLM inference gateway routingValidates inference requests and routes each to the correct model instance, abstracting model hosting behind a consistent interface.

Related Controls

IDTitleDescription
CCC.MARefArc.CN02User, Application, and Model FirewallingEstablish enforced trust boundaries between the user, the application, and the models and tools by routing all traffic through the agent, LLM, and MCP gateways where guardrails inspect and constrain requests and responses.
CCC.MARefArc.CN06Quality of Service and DDoS PreventionProtect model and tool availability by enforcing quality-of-service controls, rate limits, and abuse and DDoS mitigation at the gateways.
CCC.MARefArc.CN10AI Firewall Implementation and ManagementImplement and operate an AI firewall within the guardrail components that inspects prompts, content, and responses for injection, sensitive data, and policy violations.
CCC.MARefArc.CN17AI System ObservabilityInstrument every layer to emit logs, traces, metrics, and events to the Observability Layer so that behaviour, drift, availability, and data handling are continuously visible and auditable.
CCC.MARefArc.CN18AI System Alerting and Denial of Wallet MonitoringMonitor spend and usage of models and tools, and alert on anomalous consumption indicative of Denial of Wallet or runaway agentic loops.

External Mappings

FrameworkIDRemarks
air-vecAIR-OP-007-03