Vector databases provide semantic search and grounding so that agents can find relevant information in large text corpora.
Vector-based semantic retrieval
CCC.MARefArc.CP13
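
As an illustration of this capability, the following is a minimal sketch of vector-based retrieval. It rests on loud assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model (so it matches only on shared words, not on meaning), and `ToyVectorStore`, its methods, and the sample documents are hypothetical names invented for this example rather than part of any product or of this catalog.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store: keeps (vector, text) pairs."""

    def __init__(self) -> None:
        self._items = []

    def add(self, text: str) -> None:
        self._items.append((embed(text), text))

    def search(self, query: str, k: int = 2) -> list:
        """Return the k (score, text) pairs most similar to the query."""
        q = embed(query)
        scored = [(cosine(q, vec), text) for vec, text in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


store = ToyVectorStore()
store.add("Wire transfers above the daily limit require manager approval")
store.add("The cafeteria menu changes every Monday")
store.add("Password resets require identity verification")

# An agent would pass the top-ranked text to the model as grounding context.
results = store.search("approval for a wire transfer above the limit", k=1)
top_text = results[0][1]
print(top_text)
```

The ranking step (score every stored vector against the query vector, keep the top `k`) is the part that carries over to a real deployment; production vector databases typically replace the exhaustive scan with an approximate nearest-neighbour index.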
Related Threats
| ID | Title | Description |
|---|---|---|
| CCC.MARefArc.TH03 | Embedding inversion and membership inference on the vector store | Vectors stored for semantic retrieval can be inverted to reconstruct original source text, or probed to infer whether specific confidential information is present, exposing PII or proprietary content held in the knowledge layer. |
| CCC.MARefArc.TH04 | Embedding-store poisoning degrades retrieved context | An actor with write access injects malicious or misleading embeddings into the vector store, degrading the accuracy of retrieved grounding context; the dense numerical representation makes the tampering hard to detect. |
| CCC.MARefArc.TH05 | Vector-store access-control, encryption, and audit gaps | Missing role-based access control, encryption at rest, or audit logging on the vector store allows unauthorized retrieval, modification, or undetected exfiltration of embeddings derived from sensitive internal data. |
| CCC.MARefArc.TH12 | Indirect prompt injection via retrieved or processed content | Malicious instructions hidden in retrieved documents, web-search results, tool outputs, or persisted memory are processed by an agent and hijack its decision-making, escalate privileges, trigger unauthorized actions, or exfiltrate data. The risk is especially high in automated multi-agent workflows. |
| CCC.MARefArc.TH18 | RAG grounding failures | Even with retrieval, responses may contradict retrieved documents, drop caveats truncated by the context window, fill gaps with incorrect general knowledge, exceed authorized advisory scope, or adopt an inappropriate tone or certainty for the domain. |
| CCC.MARefArc.TH22 | Poor-quality, drifting, and bias-amplifying data | Inaccurate, incomplete, outdated, or biased grounding and training data lead to unreliable outputs, while data drift and concept drift erode predictive power over time and amplify historical errors at scale. |
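
To make the gaps named in CCC.MARefArc.TH05 concrete, the sketch below shows what a role check and an audit record on the retrieval path might look like. It is an illustration under stated assumptions, not a prescribed control: `Record`, `GuardedStore`, the role names, and the logger name are all hypothetical, substring matching stands in for the similarity search, and encryption at rest is not covered.

```python
import logging
from dataclasses import dataclass, field

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("vector_store.audit")  # hypothetical logger name


@dataclass(frozen=True)
class Record:
    text: str
    allowed_roles: frozenset  # roles permitted to retrieve this record


@dataclass
class GuardedStore:
    """Hypothetical store wrapper that filters by role and logs each query."""

    records: list = field(default_factory=list)

    def query(self, user: str, role: str, term: str) -> list:
        # Substring matching stands in for the similarity search.
        matches = [r for r in self.records if term.lower() in r.text.lower()]
        # Role check: drop anything the caller's role may not see.
        visible = [r.text for r in matches if role in r.allowed_roles]
        # Audit trail: record who asked for what, and how much was withheld.
        audit_log.info(
            "user=%s role=%s term=%r returned=%d withheld=%d",
            user, role, term, len(visible), len(matches) - len(visible),
        )
        return visible


guarded = GuardedStore([
    Record("Public travel policy", frozenset({"staff", "compliance"})),
    Record("Internal sanctions screening policy", frozenset({"compliance"})),
])

staff_hits = guarded.query("alice", "staff", "policy")
compliance_hits = guarded.query("bob", "compliance", "policy")
```

Filtering after the match, rather than before, keeps the example short; a real system would more likely push the role filter into the index query so that restricted records never leave the store.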