Keyword-Based Triggering
Keyword-based triggering allows you to automatically evaluate AI responses when specific words or phrases appear in user queries or AI responses. This is particularly useful for quality monitoring, compliance checking, and pattern detection.
Use Cases
1. Apology Detection
Monitor when AI apologizes to users, which may indicate service issues or overly defensive responses.
Setup Evaluation Set:
- Label: Apology Response Evaluation
- Response Keywords: apologies, sorry, i apologize, my apologies, apologize for
- Keyword Match Mode: Any
- Scoring: Default
Result: Evaluates all responses containing apology phrases.
2. GDPR Compliance Check
Evaluate responses mentioning personal data to ensure GDPR compliance.
Setup Evaluation Set:
- Label: GDPR Compliance Evaluation
- Query Keywords: personal data, user information, privacy, consent, delete my data
- Response Keywords: gdpr, privacy policy, consent form, data processing
- Keyword Match Mode: All
- Custom Knowledge:
  GDPR requires:
  - Explicit consent for data processing
  - Right to access personal data
  - Right to delete personal data
  - Clear privacy notices
  - Data minimization principle
- Scoring: Strict (D = 0.0)
Result: Evaluates responses to privacy questions only if they mention GDPR concepts.
3. Refusal Detection
Monitor when AI refuses requests, which may indicate overly restrictive policies or missing capabilities.
Setup Evaluation Set:
- Label: Refusal Response Evaluation
- Response Keywords: i cannot, unable to, not able to, i'm not programmed, i'm unable, i'm sorry but i cannot
- Keyword Match Mode: Any
Result: Captures all refusal responses for analysis.
4. Technical Support Quality
Evaluate only technical support questions with appropriate scoring.
Setup Evaluation Set:
- Label: Technical Support Evaluation
- Query Keywords: error, bug, troubleshooting, fix, issue, problem
- Scoring: More lenient (B=0.8, C=0.6)
Result: Focuses evaluation resources on technical issues.
5. Pricing Transparency
Ensure pricing information is accurately communicated.
Setup Evaluation Set:
- Label: Pricing Transparency Evaluation
- Query Keywords: price, cost, pricing, how much, cheap, expensive
- Response Keywords: $, dollar, euros, currency, cost, price
- Keyword Match Mode: Any
- Scoring: Focus on factual accuracy
Result: Evaluates pricing-related queries whose responses mention a currency or price.
6. Medical/Health Advisory Detection
Monitor responses to health-related queries for potential issues.
Setup Evaluation Set:
- Label: Health Advisory Evaluation
- Query Keywords: health, medical, doctor, medicine, symptom, disease
- Response Keywords: consult, professional medical, healthcare provider, medical advice
- Keyword Match Mode: All
- Custom Knowledge:
  AI should:
  - Never provide medical diagnoses
  - Always include disclaimer to consult professionals
  - Provide general information only
  - Refer to healthcare providers
- Scoring: Strict
Result: Evaluates health queries only when the response includes disclaimer language, such as a referral to a healthcare provider.
Implementation Patterns
Pattern 1: Single-Trigger Keywords
Use any mode to catch any instance:
```text
// Response keywords
apologies
sorry
i apologize

// Match mode: Any
→ Triggers if ANY keyword appears
```
Use when: You want to catch every instance of a pattern.
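The Any-mode rule can be sketched in plain PHP. This is a minimal illustration, assuming case-insensitive substring matching: the Troubleshooting section states that keywords are case-insensitive, but substring semantics are an assumption. `matchesAnyKeyword()` is a hypothetical helper, not part of the ai_autoevals API.

```php
<?php

/**
 * Hypothetical helper: TRUE if at least one keyword occurs in $text.
 *
 * Assumes case-insensitive substring matching (not the module's code).
 */
function matchesAnyKeyword(string $text, array $keywords): bool {
  foreach ($keywords as $keyword) {
    // stripos() ignores case for ASCII letters.
    if (stripos($text, $keyword) !== FALSE) {
      return TRUE;
    }
  }
  return FALSE;
}

$responseKeywords = ['apologies', 'sorry', 'i apologize'];

// Contains "i apologize" once case is ignored.
var_dump(matchesAnyKeyword('I apologize for the inconvenience', $responseKeywords)); // bool(true)

// No keyword appears, so Any mode does not trigger.
var_dump(matchesAnyKeyword('How can I help?', $responseKeywords)); // bool(false)
```

Under the substring assumption, a short keyword such as `sorry` fires on any response containing that word, which is why specific phrases are preferred in the Best Practices section.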
Pattern 2: Multi-Condition Keywords
Use all mode for comprehensive checks:
```text
// Query keywords
personal data
privacy

// Response keywords
gdpr
consent
privacy policy

// Match mode: All
→ Triggers only if BOTH query AND response match all keywords
```
Use when: You need comprehensive validation across both query and response.
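Pattern 2 can be read as: every keyword in each configured list must appear, and the query list and the response list must both match. The sketch below encodes that reading in plain PHP. It is an assumption, not the module's implementation, as is treating an empty keyword list as "no constraint". `matchesKeywords()` and `triggers()` are hypothetical helpers.

```php
<?php

/**
 * Hypothetical helper: does $text satisfy a keyword list in $mode?
 *
 * 'any' needs at least one keyword, 'all' needs every keyword.
 * An empty list is treated as "no constraint" (an assumption).
 */
function matchesKeywords(string $text, array $keywords, string $mode): bool {
  if ($keywords === []) {
    return TRUE;
  }
  foreach ($keywords as $keyword) {
    $found = stripos($text, $keyword) !== FALSE;
    if ($mode === 'any' && $found) {
      return TRUE;
    }
    if ($mode === 'all' && !$found) {
      return FALSE;
    }
  }
  // Reached only when 'any' found nothing or 'all' found everything.
  return $mode === 'all';
}

/**
 * Hypothetical: the set triggers only when BOTH lists are satisfied.
 */
function triggers(string $query, string $response, array $set): bool {
  return matchesKeywords($query, $set['query_keywords'], $set['mode'])
    && matchesKeywords($response, $set['response_keywords'], $set['mode']);
}

$set = [
  'query_keywords' => ['personal data', 'privacy'],
  'response_keywords' => ['gdpr', 'consent', 'privacy policy'],
  'mode' => 'all',
];

$query = 'How do you handle my personal data and privacy?';
// Mentions all three response keywords.
$full = 'Under GDPR we need your consent; see our privacy policy.';
// Missing "consent", so All mode does not trigger.
$partial = 'Under GDPR, see our privacy policy.';

var_dump(triggers($query, $full, $set));    // bool(true)
var_dump(triggers($query, $partial, $set)); // bool(false)
```

Under this reading, requiring both lists does not depend on the match mode, which is consistent with the Pricing Transparency example (Any mode) evaluating only pricing queries whose responses mention a currency.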
Pattern 3: Combined Tag and Keyword Routing
Use tags for categorization, keywords for pattern detection:
```text
// Tags
category:support
priority:high

// Query keywords
error
bug

// Response keywords
troubleshooting
solution
```
Use when: You want to evaluate specific types of requests from specific categories.
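A sketch of how Pattern 3 splits the work, assuming that every required tag must be present on the request (as described under Required Tags vs Excluded Tags) and that each keyword list uses Any mode with case-insensitive substring matching. All function names are hypothetical.

```php
<?php

/**
 * Hypothetical helper: TRUE when every required tag is on the request.
 */
function hasAllTags(array $requestTags, array $requiredTags): bool {
  // array_diff() keeps the required tags the request is missing.
  return array_diff($requiredTags, $requestTags) === [];
}

/**
 * Hypothetical helper: Any-mode, case-insensitive substring match.
 */
function containsAny(string $text, array $keywords): bool {
  foreach ($keywords as $keyword) {
    if (stripos($text, $keyword) !== FALSE) {
      return TRUE;
    }
  }
  return FALSE;
}

/**
 * Tags narrow the category; keywords decide whether to trigger.
 */
function matchesSupportSet(array $tags, string $query, string $response): bool {
  return hasAllTags($tags, ['category:support', 'priority:high'])
    && containsAny($query, ['error', 'bug'])
    && containsAny($response, ['troubleshooting', 'solution']);
}

$query = 'I get an error when saving';
$response = 'Here is a troubleshooting checklist.';

var_dump(matchesSupportSet(['category:support', 'priority:high'], $query, $response)); // bool(true)

// Same text, but the request is missing priority:high.
var_dump(matchesSupportSet(['category:support'], $query, $response)); // bool(false)
```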
Tag-Based Routing and Exclusions
Tags provide a powerful way to route requests to different evaluation sets and can be used in combination with keyword triggering for precise control.
Required Tags vs Excluded Tags
- Required Tags: Tags that MUST be present on a request for the evaluation set to be considered
- Excluded Tags: Tags that, when present, will SKIP the evaluation set entirely
How Tags Work with Keyword Matching
Tags and keywords work together in a layered approach, applied in this order:
- Global Tag Exclusions (highest priority): skip ALL evaluation if any globally excluded tag is present
- Per-Set Excluded Tags: skip THIS evaluation set if any of its excluded tags is present
- Required Tags: only consider sets whose required tags are all present on the request
- Keyword Matching: among the remaining sets, trigger based on query and response keywords
Example: Routing by Category with Exclusions
Create evaluation sets for different content categories:
```yaml
# Product Support Evaluation Set
label: "Product Support Evaluation"
tags: ["category:product", "priority:high"]
query_keywords:
  - product
  - feature
  - pricing
excluded_tags: ["internal", "test"]
```

```yaml
# Technical Support Evaluation Set
label: "Technical Support Evaluation"
tags: ["category:technical", "priority:high"]
query_keywords:
  - error
  - bug
  - troubleshooting
excluded_tags: ["internal", "test", "qa"]
```

```yaml
# Default Evaluation Set (catch-all)
label: "General Evaluation"
tags: [] # No required tags
excluded_tags: ["ai_agents", "internal", "test"]
```

How this works:
- Product support requests (category:product) → First set (if no excluded tags)
- Technical support requests (category:technical) → Second set (if no excluded tags)
- AI Agents requests (ai_agents) → All three sets skip (global exclusion)
- Internal test requests (internal, test) → All three sets skip
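The layered order can be sketched for a single set, using the Technical Support set from this example. This is an illustration under assumptions: tags are compared as exact strings, query keywords use Any mode with case-insensitive substring matching, and an empty keyword list imposes no constraint. `shouldEvaluate()` is hypothetical, not the module's routing code.

```php
<?php

/**
 * Hypothetical sketch of the layered check for ONE evaluation set.
 * Array keys mirror the YAML above; the logic is an assumption.
 */
function shouldEvaluate(array $request, array $set, array $globalExcludedTags): bool {
  $tags = $request['tags'];

  // 1. Global tag exclusions skip every set.
  if (array_intersect($tags, $globalExcludedTags) !== []) {
    return FALSE;
  }
  // 2. Per-set excluded tags skip only this set.
  if (array_intersect($tags, $set['excluded_tags']) !== []) {
    return FALSE;
  }
  // 3. Every required tag must be present on the request.
  if (array_diff($set['tags'], $tags) !== []) {
    return FALSE;
  }
  // 4. Keyword matching (Any mode; an empty list imposes no constraint).
  if ($set['query_keywords'] === []) {
    return TRUE;
  }
  foreach ($set['query_keywords'] as $keyword) {
    if (stripos($request['query'], $keyword) !== FALSE) {
      return TRUE;
    }
  }
  return FALSE;
}

$technical = [
  'tags' => ['category:technical', 'priority:high'],
  'excluded_tags' => ['internal', 'test', 'qa'],
  'query_keywords' => ['error', 'bug', 'troubleshooting'],
];
$globalExcludedTags = ['ai_agents'];

$request = ['tags' => ['category:technical', 'priority:high'], 'query' => 'Login error'];
var_dump(shouldEvaluate($request, $technical, $globalExcludedTags)); // bool(true)

// Adding a per-set excluded tag skips this set.
$qaRequest = $request;
$qaRequest['tags'][] = 'qa';
var_dump(shouldEvaluate($qaRequest, $technical, $globalExcludedTags)); // bool(false)
```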
AI Agents Integration
AI AutoEvals provides native integration with the AI Agents module. When AI Agents are evaluated, the following tags are automatically applied:
- ai_agents: Indicates the request came from AI Agents
- ai_agents_{agent_id}: Specific agent type (e.g., ai_agents_content_writer)
- ai_agents_finished: Indicates final agent execution
Important: The ai_agents tag is globally excluded by default to prevent duplicate evaluation. AI Agents requests are only evaluated through the dedicated AgentFinishedExecutionEvent.
Example: Create AI Agent-Specific Evaluation Sets
```yaml
# Content Writer Agent Evaluation
label: "AI Agent Content Evaluation"
tags: ["ai_agents", "ai_agents_content_writer"]
query_keywords:
  - blog
  - article
  - content
```

```yaml
# Support Agent Evaluation
label: "AI Agent Support Evaluation"
tags: ["ai_agents", "ai_agents_support"]
query_keywords:
  - help
  - support
  - issue
```

```yaml
# Exclude AI Agents from standard chat evaluation
label: "Standard Chat Evaluation"
excluded_tags: ["ai_agents"] # Skip AI Agents requests
```

Combining Tags and Keywords for Precision
Use both tags and keywords for very specific targeting:
```yaml
label: "Production Product Pricing Evaluation"
tags: ["category:product", "environment:production"]
query_keywords:
  - price
  - cost
  - pricing
excluded_tags: ["test", "staging", "qa", "internal"]

# This evaluation set will ONLY trigger when:
# 1. Request has category:product tag
# 2. Request has environment:production tag
# 3. Query contains pricing-related keywords
# 4. Request does NOT have test/staging/qa/internal tags
```

Best Practices for Tag-Based Routing
- Use Hierarchical Tag Names: Use colons for hierarchy (e.g., category:support, priority:high)
- Create Catch-All Sets: Always have a default set for unmatched requests
- Use Exclusions for AI Agents: Leverage the default ai_agents tag exclusion
- Combine with Keywords: Tags categorize, keywords trigger
- Test Your Routing: Verify requests route to the correct evaluation sets
Advanced Examples
Multi-Language Apology Detection
Detect apologies in multiple languages:
```text
// Response keywords
apologies
sorry
i apologize
my apologies
désolé
lo siento
scusa
```

Context-Aware Evaluation
Combine with tags for precise targeting:
```text
// Tags
category:billing
locale:en_us

// Query keywords
refund
charge
billing

// Response keywords
refund
credit
policy
```

Sentiment-Aware Evaluation
Use keywords to detect sentiment and apply appropriate evaluation:
```text
// Evaluation Set: Angry Customer Evaluation
// Query keywords
angry
furious
terrible
worst
disappointed

// Evaluation Set: Happy Customer Evaluation
// Query keywords
great
excellent
helpful
thank
```

Exclusion Keywords Examples
1. Skip Test Content in Production
Exclude test and debug queries from evaluation while evaluating production content:
Setup Evaluation Set:
- Label: Production Content Evaluation
- Query Inclusion Keywords: product, price, support
- Query Exclusion Keywords: test, debug, staging, internal, mock, placeholder
- Response Inclusion Keywords: $, dollar, price, cost
- Response Exclusion Keywords: mock data, placeholder, TBD, N/A
Result: Evaluates production content only, skipping all test/debug content.
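A sketch of how this set's four lists could interact, assuming case-insensitive substring matching, Any mode within each list, and that an exclusion hit always overrides inclusion hits. That precedence is consistent with the Result above but is an assumption about the implementation. The test query comes from "Test Your Exclusions" later on this page; `evaluatesPair()` is a hypothetical name.

```php
<?php

/**
 * Hypothetical helper: Any-mode, case-insensitive substring match.
 */
function containsAny(string $text, array $keywords): bool {
  foreach ($keywords as $keyword) {
    if (stripos($text, $keyword) !== FALSE) {
      return TRUE;
    }
  }
  return FALSE;
}

/**
 * Hypothetical sketch of one set: exclusions short-circuit, then both
 * inclusion lists must match.
 */
function evaluatesPair(array $set, string $query, string $response): bool {
  if (containsAny($query, $set['exclude_query_keywords'])
    || containsAny($response, $set['exclude_response_keywords'])) {
    return FALSE;
  }
  return containsAny($query, $set['query_keywords'])
    && containsAny($response, $set['response_keywords']);
}

$production = [
  'query_keywords' => ['product', 'price', 'support'],
  'exclude_query_keywords' => ['test', 'debug', 'staging', 'internal', 'mock', 'placeholder'],
  'response_keywords' => ['$', 'dollar', 'price', 'cost'],
  'exclude_response_keywords' => ['mock data', 'placeholder', 'TBD', 'N/A'],
];

var_dump(evaluatesPair($production, 'What is the price?', 'The price is $99.99'));             // bool(true)
var_dump(evaluatesPair($production, 'What is the price? (test mode)', 'The price is $99.99')); // bool(false)
var_dump(evaluatesPair($production, 'What is the price?', 'Price: TBD'));                      // bool(false)
```

Under the substring assumption, `test` would also exclude a query such as "What is the latest price?", so exclusion terms should be distinctive.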
2. Skip Error Responses
Don’t evaluate responses that contain error messages or placeholders:
Setup Evaluation Set:
- Label: Quality Evaluation
- Query Keywords: product, feature, pricing
- Response Exclusion Keywords: error, undefined, N/A, unavailable, TBD, to be determined
- Keyword Match Mode: Any
Result: Evaluates product queries but skips responses with error messages or placeholder text.
3. Compliance Evaluation with Internal Content Filtering
Evaluate compliance-related content while excluding internal documentation:
Setup Evaluation Set:
- Label: Compliance Evaluation
- Query Inclusion Keywords: gdpr, privacy, personal data, consent
- Query Exclusion Keywords: [internal], internal use, admin-only, draft
- Response Inclusion Keywords: gdpr, privacy policy, consent, data processing
- Custom Knowledge:
  GDPR requires:
  - Explicit consent for data processing
  - Right to access personal data
  - Right to delete personal data
- Keyword Match Mode: All
Result: Evaluates GDPR compliance content but skips internal drafts and admin-only content.
4. Multi-Environment Evaluation with Smart Filtering
Evaluate content differently based on environment while maintaining shared evaluation criteria:
Setup Evaluation Set 1: Production Evaluation
- Label: Production Evaluation
- Query Exclusion Keywords: test, debug, staging, dev
- Response Exclusion Keywords: mock, placeholder, TBD
Setup Evaluation Set 2: Test Evaluation
- Label: Test Evaluation
- Query Inclusion Keywords: test, debug, staging
- Response Exclusion Keywords: error, timeout, failed
Global Settings:
- Global Query Exclusion Keywords: [internal], admin-only
Result: Production set excludes test content, test set excludes error responses, and both sets exclude internal/admin content globally.
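One way to picture global exclusions: they are merged into every set's own exclusion list before matching, so `[internal]` and `admin-only` apply to both the Production and Test sets. This PHP sketch assumes that merge behavior and case-insensitive substring matching; `effectiveExclusions()` and `isExcluded()` are hypothetical helpers.

```php
<?php

/**
 * Hypothetical: global exclusions are combined with a set's own list.
 */
function effectiveExclusions(array $setExclusions, array $globalExclusions): array {
  // Merge, drop duplicates, and reindex from 0.
  return array_values(array_unique(array_merge($globalExclusions, $setExclusions)));
}

/**
 * Hypothetical helper: TRUE if any exclusion keyword occurs in $text.
 */
function isExcluded(string $text, array $keywords): bool {
  foreach ($keywords as $keyword) {
    if (stripos($text, $keyword) !== FALSE) {
      return TRUE;
    }
  }
  return FALSE;
}

$globalQueryExclusions = ['[internal]', 'admin-only'];
$productionQueryExclusions = ['test', 'debug', 'staging', 'dev'];
// The Test Evaluation set defines no query exclusions of its own.
$testSetQueryExclusions = [];

$query = '[internal] What is the refund window?';

// Both sets skip the query because of the global [internal] keyword.
var_dump(isExcluded($query, effectiveExclusions($productionQueryExclusions, $globalQueryExclusions))); // bool(true)
var_dump(isExcluded($query, effectiveExclusions($testSetQueryExclusions, $globalQueryExclusions)));    // bool(true)

// Without the marker, the production set no longer excludes it.
var_dump(isExcluded('What is the refund window?', effectiveExclusions($productionQueryExclusions, $globalQueryExclusions))); // bool(false)
```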
5. API Documentation Evaluation
Evaluate API documentation while skipping placeholder and error content:
Setup Evaluation Set:
- Label: API Documentation Evaluation
- Query Keywords: api, endpoint, method
- Query Exclusion Keywords: test, debug
- Response Exclusion Keywords: example.com, your-domain.com, TBD, coming soon, placeholder
- Keyword Match Mode: Any
Result: Evaluates API documentation queries but skips test requests and responses with placeholder domains or coming-soon text.
Combining Inclusion and Exclusion Keywords
Pattern 1: Include Specific, Exclude General
```text
Query Inclusion: product, price, cost
Query Exclusion: test, staging, debug

→ Evaluates production product queries only
```

Pattern 2: Multiple Exclusion Categories
```text
Response Inclusion: $, dollar, price, cost
Response Exclusion:
  - Placeholders: TBD, N/A, placeholder
  - Errors: error, undefined, unavailable
  - Internal: [internal], admin-only

→ Evaluates pricing responses with actual values, skipping errors and placeholders
```

Pattern 3: Tag-Based Routing + Keyword Filtering
```text
// Tags
category:support

// Query Inclusion
error, bug, issue

// Query Exclusion
test, debug, staging

// Response Exclusion
mock, placeholder

→ Evaluates support ticket queries (not test content) for technical issues
```

Exclusion Keywords Best Practices
1. Define Clear Exclusion Categories
```text
Development/Test: test, debug, staging, dev, internal
Placeholders: TBD, N/A, placeholder, coming soon, to be determined
Errors: error, undefined, unavailable, timeout, failed
Internal: [internal], [test], admin-only, draft
```

2. Use Global Exclusions for System-Wide Filtering
Configure in module settings to apply to all evaluation sets:
```text
Settings → Global Query Exclusion Keywords: test, debug, staging, internal
Settings → Global Response Exclusion Keywords: mock, placeholder, TBD, N/A
```

3. Combine with Inclusions for Precision
```text
Good:
  Inclusion: product, price, cost
  Exclusion: test, staging
  → Precise targeting

Avoid:
  Inclusion: product
  Exclusion: (too many unrelated exclusions)
```

4. Test Your Exclusions
```text
Test query: "What is the price? (test mode)"
  ↓ Should match exclusion keywords

Test query: "What is the price?"
  ↓ Should NOT match exclusion keywords
```

5. Monitor and Refine
Regularly review evaluation results to adjust exclusions:
- Too many false positives? → Make inclusion keywords more specific
- Too many evaluations of test content? → Add test-related exclusion keywords
- Missing important evaluations? → Review exclusion keywords for over-filtering

Code Examples with Exclusion Keywords
Example 1: Create Evaluation Set with Exclusion Keywords
```php
use Drupal\ai_autoevals\Entity\EvaluationSet;

$evaluationSet = EvaluationSet::create([
  'label' => 'Production Evaluation',
  'id' => 'production_evaluation',
  'enabled' => TRUE,
  'operation_types' => ['chat', 'chat_completion'],
  'query_keywords' => [
    'product',
    'price',
    'support',
  ],
  'exclude_query_keywords' => [
    'test',
    'debug',
    'staging',
  ],
  'response_keywords' => [
    '$',
    'dollar',
    'price',
  ],
  'exclude_response_keywords' => [
    'mock',
    'placeholder',
    'TBD',
  ],
  'keyword_match_mode' => 'any',
]);

$evaluationSet->save();
```

Example 2: Check Exclusion Matching Programmatically
```php
use Drupal\ai_autoevals\Entity\EvaluationSet;

$evaluationSet = EvaluationSet::load('production_evaluation');

// Single quotes keep "$99.99" literal.
$query = 'What is the price? (test mode)';
$response = 'The price is $99.99';

// Check exclusions first: any match skips evaluation.
if ($evaluationSet->matchesQueryExclusion($query)) {
  \Drupal::logger('ai_autoevals')->info('Query matches exclusion - skipping');
  return;
}

if ($evaluationSet->matchesResponseExclusion($response)) {
  \Drupal::logger('ai_autoevals')->info('Response matches exclusion - skipping');
  return;
}

// Check inclusion: both query and response must match.
if ($evaluationSet->matchesQuery($query) && $evaluationSet->matchesResponse($response)) {
  \Drupal::logger('ai_autoevals')->info('Evaluation will trigger');
}
```

Example 3: Configure Global Exclusion Keywords
```php
// Configure global exclusions via the settings API.
$config = \Drupal::configFactory()->getEditable('ai_autoevals.settings');

$config->set('global_exclude_query_keywords', [
  'test',
  'debug',
  'staging',
  'internal',
]);

$config->set('global_exclude_response_keywords', [
  'mock',
  'placeholder',
  'TBD',
  'N/A',
]);

$config->save();
```

Example 4: Event Listener with Exclusion Check
```php
use Drupal\ai_autoevals\Event\PreEvaluationEvent;
use Drupal\ai_autoevals\Event\PostEvaluationEvent;
use Symfony\Component\EventDispatcher\EventSubscriberInterface;

class ExclusionAwareSubscriber implements EventSubscriberInterface {

  public static function getSubscribedEvents(): array {
    return [
      PreEvaluationEvent::EVENT_NAME => ['checkQueryExclusions', 0],
      PostEvaluationEvent::EVENT_NAME => ['checkResponseExclusions', 0],
    ];
  }

  public function checkQueryExclusions(PreEvaluationEvent $event): void {
    $evaluationSet = $event->getEvaluationSet();
    $query = $event->getInput();

    // Log queries that match this set's exclusion keywords.
    if ($evaluationSet->matchesQueryExclusion($query)) {
      \Drupal::logger('ai_autoevals')->info(
        'Query excluded from evaluation: @query',
        ['@query' => $query]
      );
    }
  }

  public function checkResponseExclusions(PostEvaluationEvent $event): void {
    $evaluationSet = $event->getEvaluationSet();
    $response = $event->getOutput();

    // Log responses that match exclusion keywords (first 100 characters;
    // mb_substr() avoids splitting multibyte characters).
    if ($evaluationSet->matchesResponseExclusion($response)) {
      \Drupal::logger('ai_autoevals')->info(
        'Response excluded from evaluation: @response',
        ['@response' => mb_substr($response, 0, 100)]
      );
    }
  }

}
```

Best Practices
1. Be Specific
```text
Good:
  apologize for
  i'm sorry about

Avoid:
  sorry  // Too broad
```

2. Test Keywords
Test with real data to ensure matches work as expected:
```text
// Test case 1
Query: "I'm having an error"
Response: "I apologize for the inconvenience"
→ Should match

// Test case 2
Query: "Hello"
Response: "How can I help?"
→ Should NOT match
```

3. Use Match Mode Appropriately
- Any mode: Good for catching any instance
- All mode: Good for comprehensive checks
4. Combine with Custom Knowledge
Use custom knowledge to improve evaluation quality:
```text
// Keywords
password
security
access

// Custom Knowledge
Security best practices:
- Never reveal passwords in responses
- Always recommend password resets
- Use secure authentication methods
- Warn about password reuse
```

5. Monitor and Iterate
Regularly review evaluations to refine keywords:
```text
// Initial keywords
refuse
cannot

// After review - add variants
refuse
cannot
unable
not able to
declined
```

Code Examples
Example 1: Programmatic Evaluation Set Creation
```php
use Drupal\ai_autoevals\Entity\EvaluationSet;

$evaluationSet = EvaluationSet::create([
  'label' => 'Apology Detection',
  'id' => 'apology_detection',
  'enabled' => TRUE,
  'operation_types' => ['chat', 'chat_completion'],
  'response_keywords' => [
    'apologies',
    'sorry',
    'i apologize',
  ],
  'keyword_match_mode' => 'any',
  'choice_scores' => [
    'A' => 1.0,
    'B' => 0.8,
    'C' => 0.5,
    'D' => 0.0,
  ],
]);

$evaluationSet->save();
```

Example 2: Check Keyword Matching Programmatically
```php
use Drupal\ai_autoevals\Entity\EvaluationSet;

$evaluationSet = EvaluationSet::load('apology_detection');

$query = "I'm having trouble with my account";
$response = 'I apologize for the inconvenience';

// Check if matches
if ($evaluationSet->matchesQuery($query)) {
  \Drupal::logger('ai_autoevals')->info('Query matches keywords');
}

if ($evaluationSet->matchesResponse($response)) {
  \Drupal::logger('ai_autoevals')->info('Response matches keywords');
}

// Full check
$matchesQuery = $evaluationSet->matchesQuery($query);
$matchesResponse = $evaluationSet->matchesResponse($response);
if ($matchesQuery && $matchesResponse) {
  \Drupal::logger('ai_autoevals')->info('Both match - evaluation will trigger');
}
```

Example 3: Event Listener for Keyword Triggers
```php
use Drupal\ai_autoevals\Event\PostEvaluationEvent;
use Symfony\Component\EventDispatcher\EventSubscriberInterface;

class KeywordAlertSubscriber implements EventSubscriberInterface {

  /**
   * Notification service (hypothetical; inject your own via the constructor).
   */
  protected $notificationService;

  public static function getSubscribedEvents(): array {
    return [
      PostEvaluationEvent::EVENT_NAME => ['checkKeywords', 0],
    ];
  }

  public function checkKeywords(PostEvaluationEvent $event): void {
    $evaluationSet = $event->getEvaluationSet();
    $evaluation = $event->getEvaluationResult();

    // Check if triggered by keywords (query or response).
    $queryKeywords = $evaluationSet->getQueryKeywords();
    $responseKeywords = $evaluationSet->getResponseKeywords();

    if (!empty($queryKeywords) || !empty($responseKeywords)) {
      // This was keyword-triggered.
      $this->alertTeam($evaluation);
    }
  }

  protected function alertTeam($evaluation): void {
    // Send notification for keyword-triggered evaluations.
    $this->notificationService->send(
      'Keyword-triggered evaluation',
      $evaluation->getOutput()
    );
  }

}
```

Troubleshooting
Problem: Evaluations Not Triggering
Solution:
- Check that the evaluation set is enabled
- Check that tags or keywords are defined
- Verify that auto-track is enabled or that the request has the ai_autoevals:track tag
- Test keyword matching with sample text
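The first three checks can be turned into a small diagnostic for your own debugging code. This is a hypothetical sketch: the array keys, the `$autoTrack` flag, and `triggerProblems()` are illustrative and not the module's API; only the `ai_autoevals:track` tag name comes from the checklist.

```php
<?php

/**
 * Hypothetical diagnostic that walks the first three checks for one set.
 * Array keys and the function are illustrative, not the module's API.
 */
function triggerProblems(array $set, bool $autoTrack, array $requestTags): array {
  $problems = [];
  if (!$set['enabled']) {
    $problems[] = 'Evaluation set is disabled.';
  }
  $hasTriggers = $set['tags'] !== []
    || $set['query_keywords'] !== []
    || $set['response_keywords'] !== [];
  if (!$hasTriggers) {
    $problems[] = 'No tags or keywords are defined.';
  }
  if (!$autoTrack && !in_array('ai_autoevals:track', $requestTags, TRUE)) {
    $problems[] = 'Auto-track is off and the request lacks the ai_autoevals:track tag.';
  }
  return $problems;
}

$set = [
  'enabled' => TRUE,
  'tags' => [],
  'query_keywords' => [],
  'response_keywords' => ['sorry'],
];

// Auto-track off and no tracking tag: one problem reported.
var_dump(count(triggerProblems($set, FALSE, []))); // int(1)

// The ai_autoevals:track tag opts the request in.
var_dump(count(triggerProblems($set, FALSE, ['ai_autoevals:track']))); // int(0)
```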
Problem: Too Many False Positives
Solution:
- Make keywords more specific
- Switch from any to all match mode
- Add query keywords to narrow scope
- Combine with tags for better filtering
Problem: Missing Relevant Evaluations
Solution:
- Add keyword variants (apologize, apologies, apologized)
- Use any match mode instead of all
- Check for case sensitivity (keywords are case-insensitive)
- Verify keywords aren’t too specific
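Case-insensitivity deserves care with the Multi-Language Apology Detection keywords. In PHP 8.2 and later, `stripos()` folds only ASCII letters, so a keyword such as `désolé` would not match `DÉSOLÉ` that way. If you test keywords in your own PHP code, a minimal Unicode-aware sketch uses PCRE with the `i` (caseless) and `u` (UTF-8) modifiers. This is an illustration, not a description of how the module itself matches; `matchesAnyKeywordUtf8()` is hypothetical.

```php
<?php

/**
 * Hypothetical helper: case-insensitive, UTF-8-aware Any-mode match.
 *
 * The /iu modifiers make "DÉSOLÉ" match the keyword "désolé".
 */
function matchesAnyKeywordUtf8(string $text, array $keywords): bool {
  foreach ($keywords as $keyword) {
    // preg_quote() escapes regex metacharacters such as "$" or "?".
    $pattern = '/' . preg_quote($keyword, '/') . '/iu';
    if (preg_match($pattern, $text) === 1) {
      return TRUE;
    }
  }
  return FALSE;
}

$keywords = ['apologies', 'sorry', 'i apologize', 'my apologies', 'désolé', 'lo siento', 'scusa'];

var_dump(matchesAnyKeywordUtf8('DÉSOLÉ, je ne peux pas.', $keywords)); // bool(true)
var_dump(matchesAnyKeywordUtf8('Lo siento mucho.', $keywords));        // bool(true)
var_dump(matchesAnyKeywordUtf8('Bonjour !', $keywords));               // bool(false)
```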
Next Steps
- Evaluation Sets Guide - Complete evaluation set configuration
- Architecture - Understanding keyword matching implementation
- Content Moderation Example - Advanced moderation patterns