AccuracyAgent evaluates whether an LLM's answer is factually correct. It combines a relevancy sub-check (40%) and a factual accuracy check (60%), both powered by GEval, DeepEval's LLM-as-a-judge metric.
When a RAGProvider is supplied, the factual check is grounded against retrieved documents instead of the judge model’s own knowledge.
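As a rough sketch, and assuming the percentages are applied as a simple weighted average of the two sub-scores (the exact aggregation lives inside evaluate()), the combined score works out like this:

# Illustrative weighting only; real sub-scores come from the GEval judge.
relevancy_score = 0.90  # relevancy sub-check, range [0.0, 1.0]
factual_score = 0.85    # factual accuracy check, range [0.0, 1.0]
combined = 0.4 * relevancy_score + 0.6 * factual_score  # 0.87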
Constructor
AccuracyAgent(
    config_path: str | None = None,
    provider: str = "anthropic",
    model: str = "claude-haiku-4-5-20251001",
    rag: RAGProvider | None = None,
)
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `config_path` | `str \| None` | `None` | Path to config.ini for API key loading. Falls back to an env var, then the cwd. |
| `provider` | `str` | `"anthropic"` | LLM provider for the judge model (any litellm provider). |
| `model` | `str` | `"claude-haiku-4-5-20251001"` | Judge model identifier. |
| `rag` | `RAGProvider \| None` | `None` | Optional RAG retriever. When provided, factual checks are grounded against retrieved context. |
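For example, to point the judge at a different litellm provider and model (the values below are illustrative, not defaults):

from llm_validation_framework import AccuracyAgent

# Any litellm-supported provider/model pair should work; these are examples.
agent = AccuracyAgent(
    config_path="config.ini",
    provider="openai",
    model="gpt-4o-mini",
)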
evaluate(data)
def evaluate(
    data: dict,  # {"question": str, "answer": str}
    on_progress=None,
) -> EvaluationResult
Runs the relevancy sub-check and the factual accuracy check, then returns a combined score.
data must be a dict with both "question" and "answer" keys.
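on_progress is optional, and its callback signature is not spelled out here; the sketch below assumes it is called with a human-readable progress message as the checks run:

def log_progress(message):
    # Assumption: invoked with a status message for each sub-check.
    print(f"[AccuracyAgent] {message}")

agent = AccuracyAgent()
result = agent.evaluate(
    {"question": "Where is the Eiffel Tower?", "answer": "In Paris."},
    on_progress=log_progress,
)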
Both sub-scores come from GEval and are in the range [0.0, 1.0]. The factual check uses different evaluation criteria depending on whether a RAGProvider is attached:
Without RAG: The judge uses its own knowledge to assess factual correctness. It is intentionally lenient on brief or single-word correct answers.
With RAG: The judge uses retrieved context as the source of truth. Answers that contradict the context fail even if they’re plausible from general knowledge.
Examples
Without RAG
from llm_validation_framework import AccuracyAgent
agent = AccuracyAgent()
result = agent.evaluate({
    "question": "Where is the Eiffel Tower?",
    "answer": "The Eiffel Tower is located in Paris, France.",
})
print(result["status"]) # "PASS"
print(f"{result['score']:.2f}") # e.g. "0.88"
print(result["reason"])
With RAG
from llm_validation_framework import AccuracyAgent, RAGProvider
# Any LangChain-compatible retriever
retriever = your_vectorstore.as_retriever()
rag = RAGProvider(retriever)

agent = AccuracyAgent(rag=rag)
result = agent.evaluate({
    "question": "Who founded the company according to our internal docs?",
    "answer": "Jane Smith founded the company in 2010.",
})
# Grounded against retrieved documents — will FAIL if docs say otherwise
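# The result exposes the same keys as in the first example:
print(result["status"])  # "FAIL" when the retrieved docs contradict the answer
print(result["reason"])  # the judge's explanation for the grounded verdict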