Secure Private AI for Enterprises and Developers - amazee.ai

Dashboard

The AI AutoEvals dashboard provides comprehensive analytics and insights into your AI response evaluations.

Navigate to /admin/content/ai-autoevals to access the dashboard.

You’ll need the “view ai autoevals results” permission to access the dashboard.
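
If you manage permissions from the command line, the grant can be scripted with Drush. This is a sketch assuming Drush 10+ and an existing role named "editor" (the role name is illustrative, not part of the module):

```shell
# Grant the dashboard permission to the "editor" role (role name is an example).
drush role:perm:add editor 'view ai autoevals results'

# Rebuild caches so the permission change takes effect immediately.
drush cache:rebuild
```

Alternatively, grant the permission through the UI at /admin/people/permissions.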

Dashboard Overview

The dashboard displays the following high-level metrics:

  • Total Evaluations: Total number of evaluations processed
  • Average Score: Mean score across all completed evaluations (0.0 - 1.0)
  • Completion Rate: Percentage of evaluations that completed successfully

View evaluations by status:

  • Pending: Awaiting processing
  • Processing: Currently being evaluated
  • Completed: Successfully evaluated
  • Failed: Evaluation failed due to an error

Compare performance across different evaluation sets:

  • Average score per evaluation set
  • Number of evaluations per set
  • Completion rate per set

View the most recent evaluations with:

  • Request ID
  • Evaluation set used
  • Score
  • Status
  • Timestamp

Click on any evaluation to view detailed information.

A score distribution chart shows how completed evaluations are spread across the four possible scores:

  • 1.0 (Exact Match)
  • 0.6 (Superset)
  • 0.4 (Subset)
  • 0.0 (Disagreement)

This helps identify patterns in your AI’s performance.
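
Each score corresponds to one of the evaluator's A–D choices shown on the detail view. As a rough sketch, the mapping below is inferred from the values listed above (the function name and the choice-to-score pairing are assumptions, not taken from the module's source):

```php
<?php

// Hypothetical helper: maps an evaluation choice to the score and label
// listed in the score distribution above. The A-D ordering is assumed.
function choice_to_score(string $choice): array {
  return match ($choice) {
    'A' => ['score' => 1.0, 'label' => 'Exact Match'],
    'B' => ['score' => 0.6, 'label' => 'Superset'],
    'C' => ['score' => 0.4, 'label' => 'Subset'],
    'D' => ['score' => 0.0, 'label' => 'Disagreement'],
    default => throw new InvalidArgumentException("Unknown choice: $choice"),
  };
}
```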

Click “View All Results” or navigate to /admin/content/ai-autoevals/results to open the full results listing. The results table includes the following columns:

  • ID: Evaluation ID
  • Score: Evaluation score
  • Status: Current status (pending, processing, completed, failed)
  • Evaluation Set: Name of evaluation set used
  • Provider: AI provider used
  • Model: AI model used
  • Operation Type: Type of operation (chat, chat_completion)
  • Created: Date and time created
  • Operations: Actions (view, requeue, re-evaluate, delete)

Use the filter form to narrow down evaluations:

  • Status: Filter by status (pending, processing, completed, failed)
  • Evaluation Set: Filter by specific evaluation set
  • Score Range: Filter by minimum and maximum scores
  • Provider: Filter by AI provider
  • Model: Filter by AI model
  • Date Range: Filter by creation date

Click on column headers to sort:

  • Score (ascending/descending)
  • Created date (ascending/descending)

Batch Operations Interface

Select multiple evaluations and perform batch operations:

  • Requeue: Queue selected evaluations for re-processing
  • Re-evaluate: Re-evaluate with a different evaluation set
  • Delete: Delete selected evaluations

Evaluation Result Detail

Click on any evaluation in the list to view detailed information.

  • Request ID: Unique identifier for the request
  • Evaluation Set: Configuration used for evaluation
  • Score: Final score (0.0 - 1.0)
  • Choice: Evaluation choice (A, B, C, D)
  • Status: Current status
  • Created: Timestamp
  • Input: User’s original question
  • Output: AI’s response
  • Facts: Extracted evaluation criteria
  • Analysis: LLM’s analysis of response
  • Reasoning: Explanation for the score
  • Provider: AI provider used
  • Model: AI model used
  • Operation Type: Type of operation
  • Evaluation Time: How long the evaluation took
  • Tags: Associated tags
  • Additional Metadata: Custom metadata stored with the evaluation

On the detail view, you can:

  • Requeue: Queue evaluation for re-processing
  • Re-evaluate: Evaluate with a different configuration
  • Delete: Delete evaluation

Monitor trends by:

  1. Regularly checking the average score

  2. Comparing score distributions over time

  3. Tracking completion rates

  4. Identifying patterns in failures

Use filters to identify issues:

  • Filter by status “failed” to see failed evaluations
  • Filter by score range to find low-scoring responses
  • Filter by provider/model to compare performance
  • Filter by tags to identify problematic categories

Compare different evaluation sets:

  1. Filter by different evaluation sets

  2. Compare average scores

  3. Review score distributions

  4. Identify which configuration performs better

Export evaluation data for external analysis:

  1. Apply desired filters

  2. Select evaluations to export

  3. Use batch operations to export as CSV

Set up custom notifications based on evaluation results using the event system:

use Drupal\ai_autoevals\Event\PostEvaluationEvent;

/**
 * Reacts to a completed evaluation.
 */
public function onPostEvaluation(PostEvaluationEvent $event): void {
  if ($event->getScore() < 0.5) {
    // Send notification for low-scoring evaluations.
    // Log to moderation queue.
    // Trigger workflow.
  }
}

See the Event System documentation for more details.

  1. Regular Monitoring

    Check the dashboard regularly to track performance trends and identify issues early.

  2. Filter Strategically

    Use filters to focus on specific areas:

    • Low-scoring evaluations
    • Failed evaluations
    • Specific evaluation sets
    • Specific providers/models

  3. Compare Configurations

    Use A/B testing to compare different evaluation strategies and optimize your configuration.

  4. Investigate Failures

    Regularly review failed evaluations to identify and fix configuration or API issues.

  5. Track Trends

    Monitor performance over time to identify improvements or degradations in AI quality.