LLM Validation Framework wraps your LLM calls in a composable pipeline of guardrails. Each guardrail is an independent agent that evaluates either the user’s input or the model’s output — returning a structured score and status that your application can act on.
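The agent interface itself isn't reproduced here, but conceptually a guardrail pipeline reduces to something like the sketch below. GuardrailResult, evaluate, and run_pipeline are illustrative names, not the framework's actual API.

from dataclasses import dataclass

@dataclass
class GuardrailResult:
    name: str      # which guardrail produced this result
    score: float   # 0.0 (worst) to 1.0 (best)
    status: str    # "PASS" or "FAIL"

def run_pipeline(guardrails, text):
    # Each independent agent evaluates the same text and returns its own verdict.
    results = [g.evaluate(text) for g in guardrails]
    # The pipeline fails if any single guardrail fails.
    overall = "PASS" if all(r.status == "PASS" for r in results) else "FAIL"
    return overall, results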
Toxicity
Three-layer harmful content detection — runs fully locally with no API calls. Catches profanity, ML-detected toxicity, and semantic similarity to illegal categories.
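For illustration, the three layers could be assembled locally from common open-source components along these lines. This is a sketch of the general technique, not the framework's actual models, category list, or thresholds.

from better_profanity import profanity
from detoxify import Detoxify
from sentence_transformers import SentenceTransformer, util

profanity.load_censor_words()
toxicity_model = Detoxify("original")               # local ML toxicity classifier
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # local embedding model

# Illustrative category descriptions only; the framework's real list is not shown here.
ILLEGAL_CATEGORIES = ["instructions for building weapons", "acquiring illegal drugs"]
category_embs = embedder.encode(ILLEGAL_CATEGORIES, convert_to_tensor=True)

def toxicity_check(text, threshold=0.5):
    # Layer 1: word-list profanity match.
    if profanity.contains_profanity(text):
        return "FAIL"
    # Layer 2: ML-detected toxicity score.
    if toxicity_model.predict(text)["toxicity"] > threshold:
        return "FAIL"
    # Layer 3: semantic similarity to illegal categories.
    text_emb = embedder.encode(text, convert_to_tensor=True)
    if util.cos_sim(text_emb, category_embs).max() > threshold:
        return "FAIL"
    return "PASS"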
Privacy
Regex-based PII and secrets scanning. Detects SSNs, credit cards (Luhn-validated), API keys, and system prompt leakage — also local, no API calls.
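As a sketch of that pattern-matching approach: regexes flag candidate SSNs and card-like digit runs, and the Luhn checksum filters out card false positives. The patterns below are simplified examples, not the framework's full rule set.

import re

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def luhn_valid(number: str) -> bool:
    # Standard Luhn checksum: double every second digit from the right.
    digits = [int(d) for d in number if d.isdigit()]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0

def contains_pii(text: str) -> bool:
    if SSN_RE.search(text):
        return True
    # Only flag card-like digit runs that also pass the Luhn check.
    return any(luhn_valid(m.group()) for m in CARD_RE.finditer(text))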
Accuracy
LLM-as-a-judge factual correctness check. Optionally grounded against your own document corpus via a RAG retriever.
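Conceptually, the check builds a judge prompt from the question, the candidate answer, and any retrieved context, then asks a second model for a verdict. The sketch below assumes a generic judge_llm callable and a retriever with a retrieve method; both are illustrative, not the framework's API.

def accuracy_judge(question, answer, judge_llm, retriever=None):
    # Optionally ground the judgment in your own document corpus.
    context = ""
    if retriever is not None:
        docs = retriever.retrieve(question)  # hypothetical retriever interface
        context = "Reference documents:\n" + "\n".join(docs) + "\n"
    prompt = (
        "You are grading an answer for factual correctness.\n"
        f"{context}"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Reply with a score between 0.0 and 1.0 and PASS or FAIL."
    )
    return judge_llm(prompt)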
Relevancy
LLM-as-a-judge check that the answer actually addresses the question asked.
Bias
LLM-as-a-judge detection of stereotypes, discriminatory language, and unfair generalisations across protected traits.
pip install validate-llm

from llm_validation_framework import (
    ValidationFramework,
    LLMProvider,
    Pipe,
    ToxicityAgent,
    AccuracyAgent,
)
from llm_validation_framework.config_loader import load_api_key

api_key = load_api_key(provider="ANTHROPIC")
llm = LLMProvider(provider="anthropic", model="claude-haiku-4-5-20251001", key=api_key)

# Input guardrails screen the prompt before it reaches the model;
# output guardrails screen the reply before it reaches the user.
input_guardrail = Pipe(steps=[ToxicityAgent()], verbose=False)
output_guardrail = Pipe(steps=[ToxicityAgent(), AccuracyAgent()], verbose=False)

vf = ValidationFramework(llm=llm, input_guardrail=input_guardrail, output_guardrail=output_guardrail)
result = vf.validate("What is the Pacific Ocean?")

print(result["status"])  # "PASS" or "FAIL"
print(result["score"])   # 0.0 – 1.0
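The documented fields your application needs to act on are status and score, so callers can gate on them directly. A minimal sketch of that pattern (the min_score threshold and fallback message are application-level choices, not framework defaults):

FALLBACK = "Sorry, I can't help with that request."

def validate_or_fallback(vf, question, min_score=0.8):
    result = vf.validate(question)
    # Treat a failed guardrail or a low score as a blocked request.
    if result["status"] == "FAIL" or result["score"] < min_score:
        return {"blocked": True, "message": FALLBACK}
    return {"blocked": False, "result": result}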