Secure Private AI for Enterprises and Developers - amazee.ai

Getting Started

AI AutoEvals provides automated factuality evaluation of AI responses. This guide will help you get up and running quickly.

  1. Install and enable the module using Composer and Drush:

    composer require drupal/ai_autoevals
    drush en ai_autoevals
  2. Configure the module at /admin/config/ai/autoevals

  3. Visit the dashboard at /admin/content/ai-autoevals

Configuration

  1. Configure the Default AI Provider: set the provider and model to use for evaluations at /admin/config/ai/autoevals:

    • Default Provider: Select your configured AI provider (e.g., OpenAI, Anthropic)
    • Default Model: Choose the model for evaluations (e.g., GPT-4, Claude 3)
  2. Enable Auto-Tracking: Check “Auto-track requests” to automatically evaluate all AI responses that match the configured operation types.

  3. Configure Evaluation Settings:

    • Operation Types: Which operations to evaluate (chat, chat_completion)
    • Fact Extraction Method: Choose AI-generated, rule-based, or hybrid
    • Context Depth: Number of conversation turns to include
    • Retention Period: How long to keep evaluation results
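
Taken together, the settings above would export to a configuration object along these lines. This is a sketch only: the config name (ai_autoevals.settings) and every key name are assumptions for illustration, so check the module's exported configuration for the real schema.

```yaml
# Hypothetical export of ai_autoevals.settings
# (all key names assumed, not taken from the module's actual schema).
default_provider: openai
default_model: gpt-4
auto_track: true
operation_types:
  - chat
  - chat_completion
fact_extraction_method: hybrid
context_depth: 3
retention_period_days: 30
```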

Processing Evaluations

Evaluations are processed asynchronously via the Drupal Queue API. You can process the queue in two ways:

  1. Cron: let cron process evaluations automatically (60-second time limit per cron run).

  2. Manually: run the queue worker with Drush:

    drush queue:run ai_autoevals_evaluation_worker
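
If you rely on system cron rather than visiting the site to trigger Drupal cron, a crontab entry can run the worker on a schedule. This is a sketch only: the Drupal root path and the five-minute interval are assumptions for illustration.

```
# Run the evaluation queue worker every 5 minutes.
# The path to the Drupal root is hypothetical -- adjust for your site.
*/5 * * * * cd /var/www/html && drush queue:run ai_autoevals_evaluation_worker
```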

Viewing the Dashboard

Check the dashboard at /admin/content/ai-autoevals to see:

  • Total evaluations
  • Average score
  • Evaluations by status
  • Evaluations by evaluation set
  • Recent evaluations
  • Score distribution

Understanding Scores

Evaluations return scores from 0.0 to 1.0 based on factual accuracy:

Score | Meaning      | Description
------|--------------|-----------------------------------------------------
1.0   | Exact Match  | Response fully meets expected criteria
0.6   | Superset     | Response includes all expected info plus more
0.4   | Subset       | Response has some expected info but is missing some
0.0   | Disagreement | Response contradicts expected facts
1.0   | Irrelevant   | Differences don't affect factuality
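
As a language-agnostic illustration of how the rubric feeds the dashboard's average score, here is a small sketch. The category labels mirror the table above, but the label spellings and the sample evaluation list are made up for this example.

```python
# Map each factuality category to its score, per the rubric above.
# (Label spellings are hypothetical; the scores come from the table.)
SCORES = {
    "exact_match": 1.0,   # fully meets expected criteria
    "superset": 0.6,      # all expected info plus more
    "subset": 0.4,        # some expected info, some missing
    "disagreement": 0.0,  # contradicts expected facts
    "irrelevant": 1.0,    # differences don't affect factuality
}

def average_score(results):
    """Average the scores for a list of category labels."""
    return sum(SCORES[r] for r in results) / len(results)

# Example: three evaluations -- one exact match, one superset, one subset.
print(round(average_score(["exact_match", "superset", "subset"]), 3))  # 0.667
```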