Getting Started
Installation
```bash
pip install validate-llm
```

Optional extras:

```bash
pip install "validate-llm[demo]"   # FastAPI demo server + web UI
pip install "validate-llm[test]"   # pytest + datasets
pip install "validate-llm[dev]"    # everything
```

Requirements: Python 3.11+
Configure your API key
LLM-as-a-judge agents (AccuracyAgent, RelevancyAgent, BiasAgent) call an LLM to score responses, so they need an API key. ToxicityAgent and PrivacyAgent run entirely locally and need no key.
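For example, a guardrail built only from local agents needs no credentials at all. A minimal sketch, assuming `PrivacyAgent`, like `ToxicityAgent`, takes no required constructor arguments:

```python
from llm_validation_framework import Pipe, PrivacyAgent, ToxicityAgent

# Both agents run locally, so no API key is read or sent anywhere.
# PrivacyAgent() with no arguments is an assumption; check the agent docs.
local_guardrail = Pipe(steps=[ToxicityAgent(), PrivacyAgent()], verbose=False)
```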
Keys are resolved in this order:
- Environment variable `{PROVIDER}_API_KEY` (e.g. `ANTHROPIC_API_KEY`)
- `config.ini` at the path you pass to `load_api_key(config_path=...)`
- `config.ini` in the current working directory
Set the environment variable:

```bash
export ANTHROPIC_API_KEY=sk-ant-...
```

Or create a `config.ini` file (keep it out of version control):
```ini
[ANTHROPIC]
API_KEY=sk-ant-...
```

Then load it explicitly:
```python
from llm_validation_framework.config_loader import load_api_key

api_key = load_api_key(config_path="./config.ini", provider="ANTHROPIC")
```

Your first pipeline
The central class is `ValidationFramework`. It orchestrates the full flow:
```
user query → input guardrail → LLM → output guardrail → ValidationSummary
```

```python
from llm_validation_framework import (
    ValidationFramework,
    LLMProvider,
    Pipe,
    ToxicityAgent,
    AccuracyAgent,
)
from llm_validation_framework.config_loader import load_api_key

# 1. Load credentials and create the LLM
api_key = load_api_key(provider="ANTHROPIC")
llm = LLMProvider(provider="anthropic", model="claude-haiku-4-5-20251001", key=api_key)

# 2. Build guardrail pipelines
input_guardrail = Pipe(steps=[ToxicityAgent()], verbose=False)
output_guardrail = Pipe(steps=[ToxicityAgent(), AccuracyAgent()], verbose=False)

# 3. Create the framework and validate
vf = ValidationFramework(
    llm=llm,
    input_guardrail=input_guardrail,
    output_guardrail=output_guardrail,
)

result = vf.validate("What is the Pacific Ocean?")
```

Understanding the result
`validate()` returns a `ValidationSummary`, a typed dict with this shape:
{ "status": "PASS", # Overall: "PASS" or "FAIL" "score": 0.87, # Average of input + output scores (0.0 – 1.0) "input": { "status": "PASS", "score": 0.95, "results": [ {"status": "PASS", "score": 0.95} # one entry per agent in input_guardrail ] }, "output": { "status": "PASS", "score": 0.79, "results": [ {"status": "PASS", "score": 0.93}, # ToxicityAgent {"status": "PASS", "score": 0.65, "reason": "..."} # AccuracyAgent ] }}Next steps
Next steps

- Learn about each agent: ToxicityAgent, PrivacyAgent, AccuracyAgent
- Use agents standalone, without `ValidationFramework`, via `Pipe`
- Ground accuracy checks against your own documents: RAG Integration
- Understand how scoring works under the hood: Concepts