Configuration changes, aggressive caching, or memory leaks in model-serving libraries behind the LLM gateway exhaust GPU VRAM, degrading responsiveness or crashing model serving.
AI/ML / Multi Agent Refarch / Threats / DEV
VRAM exhaustion on model-serving infrastructure
CCC.MARefArc.TH10
Related Capabilities
| ID | Title | Description |
|---|---|---|
| CCC.MARefArc.CP15 | LLM inference gateway routing | Validates inference requests and routes each to the correct model instance, abstracting model hosting behind a consistent interface. |
Related Controls
| ID | Title | Description |
|---|---|---|
| CCC.MARefArc.CN02 | User, Application, and Model Firewalling | Establish enforced trust boundaries between the user, the application, and the models and tools by routing all traffic through the agent, LLM, and MCP gateways where guardrails inspect and constrain requests and responses. |
| CCC.MARefArc.CN06 | Quality of Service and DDoS Prevention | Protect model and tool availability by enforcing quality-of-service controls, rate limits, and abuse and DDoS mitigation at the gateways. |
| CCC.MARefArc.CN10 | AI Firewall Implementation and Management | Implement and operate an AI firewall within the guardrail components that inspects prompts, content, and responses for injection, sensitive data, and policy violations. |
| CCC.MARefArc.CN17 | AI System Observability | Instrument every layer to emit logs, traces, metrics, and events to the Observability Layer so that behaviour, drift, availability, and data handling are continuously visible and auditable. |
| CCC.MARefArc.CN18 | AI System Alerting and Denial of Wallet Monitoring | Monitor spend and usage of models and tools, and alert on anomalous consumption indicative of Denial of Wallet or runaway agentic loops. |
External Mappings
| Framework | ID | Remarks |
|---|---|---|
| air-vec | AIR-OP-007-03 |