

System Acceptance Testing

CCC.MARefArc.CN03 · PREV

Validate agents, models, and end-to-end workflows against accuracy, robustness, bias, drift, and compliance criteria before promotion to production, and re-validate after material changes.
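The criteria above can be turned into an automated promotion gate. Below is a minimal Python sketch of that idea. It assumes a labelled evaluation set in which each case records the expected output, the actual output, a protected-attribute group label, and a compliance verdict produced elsewhere, for example by a reviewer or a rule engine. All class and function names, the `"approve"` positive label, and the threshold values are hypothetical and are not defined by this control. Bias is measured here as a demographic-parity gap, which is only one of several possible fairness metrics. Robustness and drift checks would add further metrics in the same pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical acceptance thresholds; real values would come from the
    # organization's model-risk-management policy.
    min_accuracy: float = 0.90
    max_parity_gap: float = 0.05
    min_compliance_rate: float = 1.0


@dataclass(frozen=True)
class EvalCase:
    expected: str   # reference answer for the test input
    actual: str     # output produced by the agent/model under test
    group: str      # protected-attribute group label used for the bias check
    compliant: bool # verdict from a separate compliance review or rule check


def accuracy(cases):
    """Fraction of cases where the actual output matches the expected output."""
    return sum(c.expected == c.actual for c in cases) / len(cases)


def parity_gap(cases, positive="approve"):
    """Largest difference in positive-outcome rate between any two groups
    (a demographic-parity style fairness metric)."""
    rates = {}
    for group in {c.group for c in cases}:
        members = [c for c in cases if c.group == group]
        rates[group] = sum(c.actual == positive for c in members) / len(members)
    return max(rates.values()) - min(rates.values())


def compliance_rate(cases):
    """Fraction of cases judged compliant."""
    return sum(c.compliant for c in cases) / len(cases)


def acceptance_gate(cases, t=Thresholds()):
    """Return (passed, report); every criterion must pass before onboarding."""
    if not cases:
        raise ValueError("acceptance suite must contain at least one case")
    report = {
        "accuracy": accuracy(cases),
        "parity_gap": parity_gap(cases),
        "compliance_rate": compliance_rate(cases),
    }
    passed = (
        report["accuracy"] >= t.min_accuracy
        and report["parity_gap"] <= t.max_parity_gap
        and report["compliance_rate"] >= t.min_compliance_rate
    )
    return passed, report
```

Returning the full metric report alongside the boolean keeps a documented record of why a configuration passed or failed, which is what a reviewer or auditor would typically want to see rather than a bare verdict.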

Related Capabilities

ID	Title	Description
CCC.MARefArc.CP14	Approved-model registry and lifecycle	Catalog of approved models with metadata, version information, configuration parameters, and usage constraints, ensuring agents access only models meeting organizational, regulatory, and security standards.
CCC.MARefArc.CP12	Authoritative knowledge source bases	Internal and external repositories of structured data, unstructured documents, and graph-based representations that provide authoritative information for grounding.
CCC.MARefArc.CP20	Feedback engine	Collects and aggregates structured and unstructured feedback from users, evaluators, and automated systems, including correctness assessments, preference signals, and quality ratings, to inform system improvement.
CCC.MARefArc.CP21	Human supervision and oversight	Mechanisms for human reviewers to inspect, approve, correct, or override agent outputs, supporting human-in-the-loop and human-over-the-loop workflows for sensitive or high-impact tasks.
CCC.MARefArc.CP05	Agent-ingress zero-trust guardrails	Treats all inputs as untrusted and enforces authentication, authorization, input validation, content filtering, access control, rate limits, and dynamic policy before any request reaches an agent.
CCC.MARefArc.CP02	Human-in-the-loop output review	Application-embedded controls that allow users to review, approve, or modify agent outputs before they are executed or shared.
CCC.MARefArc.CP16	Model-interaction zero-trust guardrails	Enforces authentication and authorization for every inference request and applies input validation against prompt injection, output filtering and redaction, access control, rate limits, and cost management before and after model execution.
CCC.MARefArc.CP22	Runtime protection	Monitors agent actions and model outputs during execution to detect unsafe, non-compliant, or anomalous behavior, enforcing constraints, blocking disallowed actions, or triggering escalation.
CCC.MARefArc.CP15	LLM inference gateway routing	Validates inference requests and routes each to the correct model instance, abstracting model hosting behind a consistent interface.
CCC.MARefArc.CP13	Vector-based semantic retrieval	Vector databases providing semantic search and grounding so agents can find relevant information from large text corpora.

Related Threats

ID	Title	Description
CCC.MARefArc.TH23	Discriminatory outputs from bias	Biased training data, architectural and feature choices, proxy variables such as postal codes, and uncorrected feedback loops cause systematically discriminatory outcomes against protected groups, with legal and reputational exposure.
CCC.MARefArc.TH25	Non-compliant outputs and model-risk-management gaps	AI-generated advice, marketing, or communications that fail KYC, suitability, disclosure, record-keeping, or model-risk-management expectations create regulatory exposure; weak supervision and accountability lines turn this into direct non-compliance.
CCC.MARefArc.TH19	Silent model version, prompt, and deployment drift	Providers silently retrain, re-prompt, or re-architect models, or change deployment and API defaults, shifting behaviour even when inputs are unchanged; without version pinning in the model registry this breaks reproducibility and validated behaviour.
CCC.MARefArc.TH16	Confident hallucination and fabricated facts	Lacking ground truth and faced with ambiguous prompts or helpfulness-biased tuning, the model fabricates plausible but false facts, figures, or citations, presented with high fluency that makes errors hard to catch and likely to be acted upon.
CCC.MARefArc.TH17	Non-deterministic and non-reproducible outputs	Probabilistic sampling, internal-state variation, context sensitivity, and decoding parameters cause identical inputs to yield different outputs across runs, undermining testing, reproducibility, and reliable evaluation.
CCC.MARefArc.TH18	RAG grounding failures	Even with retrieval, responses may contradict retrieved documents, drop caveats truncated by the context window, fill gaps with incorrect general knowledge, exceed authorized advisory scope, or adopt an inappropriate tone or certainty for the domain.
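Because of non-deterministic outputs (TH17), a single passing run says little about how the system will behave in production. One common mitigation is to sample each acceptance prompt several times, using the production decoding settings, and to require that the answers agree often enough. The sketch below assumes a hypothetical `generate` callable that wraps the model under test. It normalizes answers only by trimming whitespace and lower-casing them, which is a deliberately crude stand-in for a proper semantic-equivalence check.

```python
from collections import Counter
from typing import Callable, Iterable


def consistency_rate(generate: Callable[[str], str], prompt: str, runs: int = 10) -> float:
    """Fraction of runs that return the most common (modal) answer.

    `generate` is a hypothetical callable wrapping the model under test.
    Answers are stripped and lower-cased so trivial whitespace or case
    differences are not counted as disagreement.
    """
    if runs < 1:
        raise ValueError("runs must be >= 1")
    answers = [generate(prompt).strip().lower() for _ in range(runs)]
    _, modal_count = Counter(answers).most_common(1)[0]
    return modal_count / runs


def stable_enough(
    generate: Callable[[str], str],
    prompts: Iterable[str],
    runs: int = 10,
    min_rate: float = 0.9,
) -> bool:
    """True if every prompt in the acceptance set meets the consistency bar."""
    return all(consistency_rate(generate, p, runs) >= min_rate for p in prompts)
```

For example, a stub that returns `"Yes"`, `"yes "`, and `"No"` over three runs yields a consistency rate of 2/3. The `min_rate` bar is illustrative; an appropriate value depends on the task and its risk.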

Assessment Requirements

ID	Text	Applicability
CCC.MARefArc.CN03.AR01	Each agent and model configuration MUST pass a documented acceptance test suite covering accuracy, bias and fairness, and compliance criteria before being onboarded into the respective registry.	tlp-clear, tlp-green, tlp-amber, tlp-red
CCC.MARefArc.CN03.AR02	Acceptance testing MUST be repeated when a pinned model version, system prompt, or deployment configuration changes.	tlp-clear, tlp-green, tlp-amber, tlp-red
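AR02 can be enforced mechanically by fingerprinting the three inputs it names (pinned model version, system prompt, and deployment configuration) and comparing the result with the fingerprint recorded at the last successful acceptance run, for example one stored alongside the registry entry. The sketch below is illustrative only. The function names are hypothetical, and it assumes the deployment configuration is a JSON-serializable dict.

```python
import hashlib
import json
from typing import Optional


def config_fingerprint(model_version: str, system_prompt: str, deployment: dict) -> str:
    """Stable SHA-256 digest over the three inputs named in AR02.

    json.dumps with sort_keys=True makes the digest independent of dict key
    order (including nested dicts), so only real changes to the pinned
    version, prompt text, or deployment settings alter the fingerprint.
    """
    payload = json.dumps(
        {
            "model_version": model_version,
            "system_prompt": system_prompt,
            "deployment": deployment,
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_revalidation(current: str, last_validated: Optional[str]) -> bool:
    """Re-run acceptance tests if no validation is on record or the config changed."""
    return last_validated is None or current != last_validated
```

This check only detects changes that appear in the recorded configuration. A provider-side change behind an unchanged version string (TH19) would not alter the fingerprint, which is why strict version pinning in the model registry remains necessary.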

Guideline Mappings

Framework	ID	Remarks
finos-air	AIR-PREV-005