Basic Usage
This guide covers the basic usage of AI AutoEvals for automated factuality evaluation.
Configuration
Global Settings
Visit /admin/config/ai/autoevals to configure global settings:
- Default Evaluation Provider: AI provider and model to use for running evaluations
- Auto-track: Automatically evaluate all matching AI requests
- Operation Types: Which AI operation types to evaluate (chat, chat_completion)
- Fact Extraction Method: Default method for extracting evaluation criteria (fallback setting, see below)
- Context Depth: Number of conversation turns to include for context
- Retention Period: How long to keep evaluation results
- Debug Mode: Enable additional logging for troubleshooting
Important: The “Fact Extraction Method” global setting is used as a fallback. Individual evaluation sets can override this setting with their own configuration. This allows you to have different extraction strategies for different types of content or evaluation sets.
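If you need to inspect these settings from code, a minimal sketch of reading them via Drupal's Configuration API follows. The config object name (`ai_autoevals.settings`) and the key names are assumptions inferred from the admin path and labels above, not confirmed machine names:

```php
<?php

// Assumption: settings are stored in the 'ai_autoevals.settings' config
// object, with keys mirroring the labels above.
$config = \Drupal::config('ai_autoevals.settings');

// The global fallback used when an evaluation set defines no override.
$fallback_method = $config->get('fact_extraction_method');
$retention = $config->get('retention_period');
```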
Auto-Tracking
The easiest way to use AI AutoEvals is with auto-tracking enabled. This automatically evaluates all AI responses that match your configured operation types.
Enable Auto-Tracking
1. Go to /admin/config/ai/autoevals
2. Check the “Auto-track requests” checkbox
3. Save the configuration
All AI requests will now be automatically queued for evaluation.
Process Evaluations
Evaluations are processed asynchronously. Run the evaluation queue manually:

```shell
drush queue:run ai_autoevals_evaluation_worker
```

Or let cron process them automatically.
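If you drive queues from system cron rather than Drupal's cron, a crontab entry along these lines keeps the backlog down. The interval and paths are illustrative, and `--time-limit` is a standard `drush queue:run` option:

```shell
# Illustrative crontab entry: drain the evaluation queue every 15 minutes,
# working for at most 60 seconds per run. Adjust paths to your install.
*/15 * * * * /usr/local/bin/drush --root=/var/www/html queue:run ai_autoevals_evaluation_worker --time-limit=60
```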
Manual Tracking
For more control, you can selectively track specific requests by adding a tag to your AI calls.
Tagging Requests
```php
$ai_provider = \Drupal::service('ai.provider')->createInstance('amazeeio');
$response = $ai_provider->chat($input, $model, [
  'ai_autoevals:track' => TRUE,
]);
```

Only requests with this tag will be evaluated.
Adding Context to Tags
```php
$response = $ai_provider->chat($input, $model, [
  'ai_autoevals:track' => TRUE,
  'category' => 'support',
  'priority' => 'high',
]);
```

This information is stored with the evaluation and can be used for filtering or routing.
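The tags ride along with the normal chat call, so the response is used exactly as before. A sketch of reading the reply back, where the `getNormalized()->getText()` accessor is assumed from the Drupal AI module's normalized chat output (it is not something this module adds):

```php
<?php

$response = $ai_provider->chat($input, $model, [
  'ai_autoevals:track' => TRUE,
  'category' => 'support',
]);

// Assumption: the AI module's ChatOutput exposes a normalized ChatMessage.
$reply_text = $response->getNormalized()->getText();
```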
Understanding Evaluation Scores
Evaluations return scores from 0.0 to 1.0 based on factual accuracy:
| Score | Meaning | Description |
|---|---|---|
| 1.0 | Exact Match | Response fully meets expected criteria |
| 0.6 | Superset | Response includes all expected info plus more |
| 0.4 | Subset | Response includes some of the expected info but is missing the rest |
| 0.0 | Disagreement | Response contradicts expected facts |
Scores are determined by comparing the AI response against evaluation criteria extracted from the user’s question.
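The rubric above can be read as a simple outcome-to-score mapping. This sketch only illustrates the table; it is not the module's actual scoring code, and the outcome names are hypothetical labels:

```php
<?php

// Illustrative mapping of the factuality rubric to scores (see table above).
function ai_autoevals_example_rubric_score(string $outcome): float {
  return match ($outcome) {
    'exact_match' => 1.0,   // Response fully meets expected criteria.
    'superset' => 0.6,      // All expected info, plus more.
    'subset' => 0.4,        // Some expected info, but incomplete.
    'disagreement' => 0.0,  // Contradicts expected facts.
  };
}
```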
View Results
Dashboard Overview
Visit /admin/content/ai-autoevals to see:
- Total evaluations processed
- Average score across all evaluations
- Evaluations by status (pending, processing, completed, failed)
- Evaluations by evaluation set
- Recent evaluations
- Score distribution chart
View Individual Evaluations
Click on any evaluation to see:
- Original question and AI response
- Extracted evaluation criteria
- Score and analysis
- Evaluation set used
- Provider and model information
- Timestamp and metadata
Filter Results
Use filters to find specific evaluations:
- By status (pending, processing, completed, failed)
- By evaluation set
- By score range
- By provider or model
- By date range
- By tags
Programmatic Usage
You can also create and manage evaluations programmatically:
```php
$evaluationManager = \Drupal::service('ai_autoevals.evaluation_manager');

// Create an evaluation.
$evaluation = $evaluationManager->createEvaluation([
  'evaluation_set_id' => 'default',
  'request_id' => 'unique-request-id',
  'provider_id' => 'amazeeio',
  'model_id' => 'chat',
  'operation_type' => 'chat',
  'input' => 'What is the capital of France?',
  'output' => 'The capital of France is Paris.',
  'tags' => ['category' => 'geography'],
]);

// Queue it for processing.
$evaluationManager->queueEvaluation($evaluation->id());
```

Best Practices
1. Start with Auto-Tracking: Begin with auto-tracking enabled to get a baseline of your AI’s performance.
2. Monitor Scores Regularly: Check the dashboard regularly to track performance trends and identify issues.
3. Use Tags for Organization: Add tags to categorize your evaluations for better filtering and analysis.
4. Process the Queue Regularly: Ensure the evaluation queue is processed regularly to avoid a backlog.
5. Review Failed Evaluations: Investigate failed evaluations to identify configuration issues or API problems.
Next Steps
- Learn about Evaluation Sets for advanced configuration
- Explore the Dashboard for detailed analytics
- Read the API Reference for programmatic usage