Getting Started
AI AutoEvals provides automated factuality evaluation of AI responses. This guide will help you get up and running quickly.
Installation
1. Install the module using Composer and enable it:

   ```shell
   composer require drupal/ai_autoevals
   drush en ai_autoevals
   ```

2. Configure the module at /admin/config/ai/autoevals
3. Visit the dashboard at /admin/content/ai-autoevals
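After installing, you can confirm the module is enabled from the command line. This is a quick check using the standard Drush `pm:list` command (it requires a working Drupal site, so adjust to your environment):

```shell
# List the module and verify it shows as enabled.
drush pm:list --filter=ai_autoevals --status=enabled
```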
Basic Configuration
1. Configure Default AI Provider: Set the provider and model to use for evaluations at /admin/config/ai/autoevals:
   - Default Provider: Select your configured AI provider (e.g., OpenAI, Anthropic)
   - Default Model: Choose the model for evaluations (e.g., GPT-4, Claude 3)
2. Enable Auto-Tracking: Check “Auto-track requests” to automatically evaluate all AI responses that match the configured operation types.
3. Configure Evaluation Settings:
   - Operation Types: Which operations to evaluate (chat, chat_completion)
   - Fact Extraction Method: Choose AI-generated, rule-based, or hybrid
   - Context Depth: Number of conversation turns to include
   - Retention Period: How long to keep evaluation results
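If you prefer the command line, the same settings can be inspected with Drush. Note that the config object name `ai_autoevals.settings` is an assumption based on common Drupal module conventions; check the module's config schema for the actual name:

```shell
# Dump the module's configuration object (name assumed, not confirmed by this guide).
drush config:get ai_autoevals.settings
```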
Processing Evaluations
Evaluations are processed asynchronously via the Drupal Queue API. You can process them in two ways:
Automatic Processing
Let cron process evaluations automatically (60-second time limit per cron run).
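You don't have to wait for the site's scheduled cron: the standard Drush `cron` command triggers a cron run on demand, which will process queued evaluations within the time limit described above:

```shell
# Run Drupal cron immediately; queued evaluations are processed for up to 60 seconds.
drush cron
```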
Manual Processing
Process the queue manually:

```shell
drush queue:run ai_autoevals_evaluation_worker
```

View Results
Check the dashboard at /admin/content/ai-autoevals to see:
- Total evaluations
- Average score
- Evaluations by status
- Evaluations by evaluation set
- Recent evaluations
- Score distribution
Understanding Scores
Evaluations return scores from 0.0 to 1.0 based on factual accuracy:
| Score | Meaning | Description |
|---|---|---|
| 1.0 | Exact Match | Response fully meets expected criteria |
| 0.6 | Superset | Response includes all expected info plus more |
| 0.4 | Subset | Response contains some expected info but is missing other parts |
| 0.0 | Disagreement | Response contradicts expected facts |
| 1.0 | Irrelevant | Differences don’t affect factuality |
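The table above can be sketched as a small helper that maps a verdict to its score. The verdict names (`exact`, `superset`, `subset`, `disagreement`, `irrelevant`) are illustrative labels for this sketch, not identifiers exposed by the module:

```shell
#!/bin/sh
# Sketch: map a factuality verdict to its score, mirroring the table above.
# Verdict names are illustrative, not module identifiers.
score_for() {
  case "$1" in
    exact|irrelevant)  echo "1.0" ;;  # exact match, or differences don't affect factuality
    superset)          echo "0.6" ;;  # all expected info plus extra
    subset)            echo "0.4" ;;  # some expected info present, some missing
    disagreement)      echo "0.0" ;;  # contradicts expected facts
    *) echo "unknown verdict: $1" >&2; return 1 ;;
  esac
}

score_for superset   # prints 0.6
```

Note that both `exact` and `irrelevant` score 1.0: a response whose only differences from the expected answer don't affect factuality is treated as fully factual.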