
Keyword-Based Triggering

Keyword-based triggering allows you to automatically evaluate AI responses when specific words or phrases appear in user queries or AI responses. This is particularly useful for quality monitoring, compliance checking, and pattern detection.

Monitor when AI apologizes to users, which may indicate service issues or overly defensive responses.

Setup Evaluation Set:

  • Label: Apology Response Evaluation
  • Response Keywords:
    apologies
    sorry
    i apologize
    my apologies
    apologize for
  • Keyword Match Mode: Any
  • Scoring: Default

Result: Evaluates all responses containing apology phrases.
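The same set can be sketched as configuration (a sketch only; field names mirror the EvaluationSet keys used in the programmatic examples later on this page, and the exact config schema may differ):

```yaml
# Apology Response Evaluation (sketch; field names assumed from the
# EvaluationSet examples later on this page)
label: "Apology Response Evaluation"
response_keywords:
  - apologies
  - sorry
  - i apologize
  - my apologies
  - apologize for
keyword_match_mode: any
```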

Evaluate responses mentioning personal data to ensure GDPR compliance.

Setup Evaluation Set:

  • Label: GDPR Compliance Evaluation
  • Query Keywords:
    personal data
    user information
    privacy
    consent
    delete my data
  • Response Keywords:
    gdpr
    privacy policy
    consent form
    data processing
  • Keyword Match Mode: All
  • Custom Knowledge:
    GDPR requires:
    - Explicit consent for data processing
    - Right to access personal data
    - Right to delete personal data
    - Clear privacy notices
    - Data minimization principle
  • Scoring: Strict (a failing grade D scores 0.0)

Result: Evaluates responses to privacy questions only if they mention GDPR concepts.
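As a configuration sketch (field names mirror the programmatic examples later on this page; the custom-knowledge key name in particular is an assumption, not a confirmed schema key):

```yaml
# GDPR Compliance Evaluation (sketch; custom_knowledge key name assumed)
label: "GDPR Compliance Evaluation"
query_keywords:
  - personal data
  - user information
  - privacy
  - consent
  - delete my data
response_keywords:
  - gdpr
  - privacy policy
  - consent form
  - data processing
keyword_match_mode: all
custom_knowledge: |
  GDPR requires:
  - Explicit consent for data processing
  - Right to access personal data
  - Right to delete personal data
  - Clear privacy notices
  - Data minimization principle
```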

Monitor when AI refuses requests, which may indicate overly restrictive policies or missing capabilities.

Setup Evaluation Set:

  • Label: Refusal Response Evaluation
  • Response Keywords:
    i cannot
    unable to
    not able to
    i'm not programmed
    i'm unable
    i'm sorry but i cannot
  • Keyword Match Mode: Any

Result: Captures all refusal responses for analysis.

Evaluate only technical support questions with appropriate scoring.

Setup Evaluation Set:

  • Label: Technical Support Evaluation
  • Query Keywords:
    error
    bug
    troubleshooting
    fix
    issue
    problem
  • Scoring: More lenient (B=0.8, C=0.6)

Result: Focuses evaluation resources on technical issues.
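A lenient scoring profile can be expressed through choice scores (a sketch; the choice_scores key follows the programmatic examples later on this page):

```yaml
# Technical Support Evaluation (sketch)
label: "Technical Support Evaluation"
query_keywords:
  - error
  - bug
  - troubleshooting
  - fix
  - issue
  - problem
choice_scores:
  A: 1.0
  B: 0.8  # more lenient than the default
  C: 0.6
  D: 0.0
```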

Ensure pricing information is accurately communicated.

Setup Evaluation Set:

  • Label: Pricing Transparency Evaluation
  • Query Keywords:
    price
    cost
    pricing
    how much
    cheap
    expensive
  • Response Keywords:
    $
    dollar
    euros
    currency
    cost
    price
  • Keyword Match Mode: Any
  • Scoring: Focus on factual accuracy

Result: Evaluates pricing-related queries that include currency mentions.

Monitor responses to health-related queries for potential issues.

Setup Evaluation Set:

  • Label: Health Advisory Evaluation
  • Query Keywords:
    health
    medical
    doctor
    medicine
    symptom
    disease
  • Response Keywords:
    consult
    professional medical
    healthcare provider
    medical advice
  • Keyword Match Mode: All
  • Custom Knowledge:
    AI should:
    - Never provide medical diagnoses
    - Always include disclaimer to consult professionals
    - Provide general information only
    - Refer to healthcare providers
  • Scoring: Strict

Result: Evaluates health queries only if they include appropriate disclaimers.

Use any mode to catch any instance:

// Response keywords
apologies
sorry
i apologize
// Match mode: Any
→ Triggers if ANY keyword appears

Use when: You want to catch every instance of a pattern.

Use all mode for comprehensive checks:

// Query keywords
personal data
privacy
// Response keywords
gdpr
consent
privacy policy
// Match mode: All
→ Triggers only if BOTH query AND response match all keywords

Use when: You need comprehensive validation across both query and response.
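The two modes can be compared side by side (a sketch; field names mirror the programmatic examples on this page):

```yaml
# Any mode: triggers when at least one keyword appears
label: "Apology Detection (broad)"
response_keywords: [apologies, sorry, i apologize]
keyword_match_mode: any
---
# All mode: triggers only when every keyword list is satisfied
label: "GDPR Validation (comprehensive)"
query_keywords: [personal data, privacy]
response_keywords: [gdpr, consent, privacy policy]
keyword_match_mode: all
```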

Pattern 3: Combined Tag and Keyword Routing


Use tags for categorization, keywords for pattern detection:

// Tags
category: support
priority: high
// Query keywords
error
bug
// Response keywords
troubleshooting
solution

Use when: You want to evaluate specific types of requests from specific categories.

Tags provide a powerful way to route requests to different evaluation sets and can be used in combination with keyword triggering for precise control.

  • Required Tags: Tags that MUST be present on a request for the evaluation set to be considered
  • Excluded Tags: Tags that, when present, will SKIP the evaluation set entirely

Tags and keywords work together in a layered approach:

  1. Global Tag Exclusions (Highest Priority) - Skip ALL evaluation if excluded tags present
  2. Per-Set Excluded Tags - Skip THIS evaluation set if excluded tags present
  3. Required Tags - Only consider sets where all required tags match
  4. Keyword Matching - Finally, fall through to matching based on keywords

Example: Routing by Category with Exclusions


Create evaluation sets for different content categories:

# Product Support Evaluation Set
label: "Product Support Evaluation"
tags: ["category:product", "priority:high"]
query_keywords:
- product
- feature
- pricing
excluded_tags: ["internal", "test"]
# Technical Support Evaluation Set
label: "Technical Support Evaluation"
tags: ["category:technical", "priority:high"]
query_keywords:
- error
- bug
- troubleshooting
excluded_tags: ["internal", "test", "qa"]
# Default Evaluation Set (catch-all)
label: "General Evaluation"
tags: [] # No required tags
excluded_tags: ["ai_agents", "internal", "test"]

How this works:

  • Product support requests (category:product) → First set (if no excluded tags present)
  • Technical support requests (category:technical) → Second set (if no excluded tags present)
  • AI Agents requests (ai_agents) → Skipped by all three sets (global exclusion)
  • Internal test requests (internal, test) → Skipped by all three sets

AI AutoEvals provides native integration with the AI Agents module. When AI Agents are evaluated, the following tags are automatically applied:

  • ai_agents - Indicates request from AI Agents
  • ai_agents_{agent_id} - Specific agent type (e.g., ai_agents_content_writer)
  • ai_agents_finished - Indicates final agent execution

Important: The ai_agents tag is globally excluded by default to prevent duplicate evaluation. AI Agents requests are only evaluated through the dedicated AgentFinishedExecutionEvent.

Example: Create AI Agent-Specific Evaluation Sets

# Content Writer Agent Evaluation
label: "AI Agent Content Evaluation"
tags: ["ai_agents", "ai_agents_content_writer"]
query_keywords:
- blog
- article
- content
# Support Agent Evaluation
label: "AI Agent Support Evaluation"
tags: ["ai_agents", "ai_agents_support"]
query_keywords:
- help
- support
- issue
# Exclude AI Agents from standard chat evaluation
label: "Standard Chat Evaluation"
excluded_tags: ["ai_agents"] # Skip AI Agents requests

Use both tags and keywords for very specific targeting:

label: "Production Product Pricing Evaluation"
tags: ["category:product", "environment:production"]
query_keywords:
- price
- cost
- pricing
excluded_tags: ["test", "staging", "qa", "internal"]
# This evaluation set will ONLY trigger when:
# 1. Request has category:product tag
# 2. Request has environment:production tag
# 3. Query contains pricing-related keywords
# 4. Request does NOT have test/staging/qa/internal tags

Best practices:

  1. Use Hierarchical Tag Names: Use colons for hierarchy (e.g., category:support, priority:high)
  2. Create Catch-All Sets: Always have a default set for unmatched requests
  3. Use Exclusions for AI Agents: Leverage the default ai_agents tag exclusion
  4. Combine with Keywords: Tags categorize, keywords trigger
  5. Test Your Routing: Verify requests route to the correct evaluation sets

Detect apologies in multiple languages:

// Response keywords
apologies
sorry
i apologize
my apologies
désolé
lo siento
scusa

Combine with tags for precise targeting:

// Tags
category: billing
locale: en_us
// Query keywords
refund
charge
billing
// Response keywords
refund
credit
policy

Use keywords to detect sentiment and apply appropriate evaluation:

// Evaluation Set: Angry Customer Evaluation
// Query keywords
angry
furious
terrible
worst
disappointed
// Evaluation Set: Happy Customer Evaluation
// Query keywords
great
excellent
helpful
thank

1. Production Content Evaluation with Test Filtering

Exclude test and debug queries from evaluation while evaluating production content:

Setup Evaluation Set:

  • Label: Production Content Evaluation
  • Query Inclusion Keywords:
    product
    price
    support
  • Query Exclusion Keywords:
    test
    debug
    staging
    internal
    mock
    placeholder
  • Response Inclusion Keywords:
    $
    dollar
    price
    cost
  • Response Exclusion Keywords:
    mock data
    placeholder
    TBD
    N/A

Result: Evaluates production content only, skipping all test/debug content.
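As a configuration sketch (the exclude_* field names follow the exclusion keys used in the programmatic examples later on this page):

```yaml
# Production Content Evaluation (sketch; field names assumed from
# the EvaluationSet examples later on this page)
label: "Production Content Evaluation"
query_keywords: [product, price, support]
exclude_query_keywords: [test, debug, staging, internal, mock, placeholder]
response_keywords: ["$", dollar, price, cost]
exclude_response_keywords: [mock data, placeholder, TBD, "N/A"]
```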

2. Quality Evaluation with Error Filtering

Don’t evaluate responses that contain error messages or placeholders:

Setup Evaluation Set:

  • Label: Quality Evaluation
  • Query Keywords:
    product
    feature
    pricing
  • Response Exclusion Keywords:
    error
    undefined
    N/A
    unavailable
    TBD
    to be determined
  • Keyword Match Mode: Any

Result: Evaluates product queries but skips responses with error messages or placeholder text.

3. Compliance Evaluation with Internal Content Filtering


Evaluate compliance-related content while excluding internal documentation:

Setup Evaluation Set:

  • Label: Compliance Evaluation
  • Query Inclusion Keywords:
    gdpr
    privacy
    personal data
    consent
  • Query Exclusion Keywords:
    [internal]
    internal use
    admin-only
    draft
  • Response Inclusion Keywords:
    gdpr
    privacy policy
    consent
    data processing
  • Custom Knowledge:
    GDPR requires:
    - Explicit consent for data processing
    - Right to access personal data
    - Right to delete personal data
  • Keyword Match Mode: All

Result: Evaluates GDPR compliance content but skips internal drafts and admin-only content.

4. Multi-Environment Evaluation with Smart Filtering


Evaluate content differently based on environment while maintaining shared evaluation criteria:

Setup Evaluation Set 1: Production Evaluation

  • Label: Production Evaluation
  • Query Exclusion Keywords:
    test
    debug
    staging
    dev
  • Response Exclusion Keywords:
    mock
    placeholder
    TBD

Setup Evaluation Set 2: Test Evaluation

  • Label: Test Evaluation
  • Query Inclusion Keywords:
    test
    debug
    staging
  • Response Exclusion Keywords:
    error
    timeout
    failed

Global Settings:

  • Global Query Exclusion Keywords:
    [internal]
    admin-only

Result: Production set excludes test content, test set excludes error responses, and both sets exclude internal/admin content globally.
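The two sets above can be sketched together (field names follow the programmatic examples later on this page; the schema may differ):

```yaml
# Production set: skip anything test-related (sketch)
label: "Production Evaluation"
exclude_query_keywords: [test, debug, staging, dev]
exclude_response_keywords: [mock, placeholder, TBD]
---
# Test set: include test traffic, but skip failed responses (sketch)
label: "Test Evaluation"
query_keywords: [test, debug, staging]
exclude_response_keywords: [error, timeout, failed]
```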

5. API Documentation Evaluation

Evaluate API documentation while skipping placeholder and error content:

Setup Evaluation Set:

  • Label: API Documentation Evaluation
  • Query Keywords:
    api
    endpoint
    method
  • Query Exclusion Keywords:
    test
    debug
  • Response Exclusion Keywords:
    example.com
    your-domain.com
    TBD
    coming soon
    placeholder
  • Keyword Match Mode: Any

Result: Evaluates API documentation queries but skips test requests and responses with placeholder domains or coming-soon text.

Combining Inclusion and Exclusion Keywords


Pattern 1: Include Specific, Exclude General

Query Inclusion: product, price, cost
Query Exclusion: test, staging, debug
→ Evaluates production product queries only

Pattern 2: Exclude Placeholders and Errors

Response Inclusion: $, dollar, price, cost
Response Exclusion:
- Placeholders: TBD, N/A, placeholder
- Errors: error, undefined, unavailable
- Internal: [internal], admin-only
→ Evaluates pricing responses with actual values, skipping errors and placeholders

Pattern 3: Tag-Based Routing + Keyword Filtering

// Tags
category: support
// Query Inclusion
error, bug, issue
// Query Exclusion
test, debug, staging
// Response Exclusion
mock, placeholder
→ Evaluates support ticket queries (not test content) for technical issues

1. Start with Common Exclusion Categories

  • Development/Test: test, debug, staging, dev, internal
  • Placeholders: TBD, N/A, placeholder, coming soon, to be determined
  • Errors: error, undefined, unavailable, timeout, failed
  • Internal: [internal], [test], admin-only, draft

2. Use Global Exclusions for System-Wide Filtering


Configure in module settings to apply to all evaluation sets:

Settings → Global Query Exclusion Keywords:
test, debug, staging, internal
Settings → Global Response Exclusion Keywords:
mock, placeholder, TBD, N/A
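The same global settings can be sketched as an ai_autoevals.settings config fragment (key names follow the settings API example later on this page):

```yaml
# ai_autoevals.settings (sketch; key names assumed from the
# settings API example on this page)
global_exclude_query_keywords:
  - test
  - debug
  - staging
  - internal
global_exclude_response_keywords:
  - mock
  - placeholder
  - TBD
  - "N/A"
```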

3. Keep Keyword Lists Focused

Good:
Inclusion: product, price, cost
Exclusion: test, staging
→ Precise targeting

Avoid:
Inclusion: product
Exclusion: (too many unrelated exclusions)

4. Test Your Exclusion Keywords

Test query: "What is the price? (test mode)"
→ Should match exclusion keywords

Test query: "What is the price?"
→ Should NOT match exclusion keywords

5. Monitor and Adjust

Regularly review evaluation results to adjust exclusions:

Too many false positives?
→ Make inclusion keywords more specific
Too many evaluations of test content?
→ Add test-related exclusion keywords
Missing important evaluations?
→ Review exclusion keywords for over-filtering

Example 1: Create Evaluation Set with Exclusion Keywords

use Drupal\ai_autoevals\Entity\EvaluationSet;

$evaluationSet = EvaluationSet::create([
  'label' => 'Production Evaluation',
  'id' => 'production_evaluation',
  'enabled' => TRUE,
  'operation_types' => ['chat', 'chat_completion'],
  'query_keywords' => [
    'product',
    'price',
    'support',
  ],
  'exclude_query_keywords' => [
    'test',
    'debug',
    'staging',
  ],
  'response_keywords' => [
    '$',
    'dollar',
    'price',
  ],
  'exclude_response_keywords' => [
    'mock',
    'placeholder',
    'TBD',
  ],
  'keyword_match_mode' => 'any',
]);
$evaluationSet->save();

Example 2: Check Exclusion Matching Programmatically

use Drupal\ai_autoevals\Entity\EvaluationSet;

$evaluationSet = EvaluationSet::load('production_evaluation');
$query = "What is the price? (test mode)";
$response = "The price is $99.99";

// Check exclusions first: an exclusion match skips evaluation entirely.
if ($evaluationSet->matchesQueryExclusion($query)) {
  \Drupal::logger('ai_autoevals')->info('Query matches exclusion - skipping');
  return;
}
if ($evaluationSet->matchesResponseExclusion($response)) {
  \Drupal::logger('ai_autoevals')->info('Response matches exclusion - skipping');
  return;
}

// Then check inclusion keywords.
if ($evaluationSet->matchesQuery($query) && $evaluationSet->matchesResponse($response)) {
  \Drupal::logger('ai_autoevals')->info('Evaluation will trigger');
}

Example 3: Configure Global Exclusion Keywords

// Configure global exclusions via the settings API.
$config = \Drupal::configFactory()->getEditable('ai_autoevals.settings');
$config->set('global_exclude_query_keywords', [
  'test',
  'debug',
  'staging',
  'internal',
]);
$config->set('global_exclude_response_keywords', [
  'mock',
  'placeholder',
  'TBD',
  'N/A',
]);
$config->save();

Example 4: Event Listener with Exclusion Check

use Drupal\ai_autoevals\Event\PreEvaluationEvent;
use Drupal\ai_autoevals\Event\PostEvaluationEvent;
use Symfony\Component\EventDispatcher\EventSubscriberInterface;

class ExclusionAwareSubscriber implements EventSubscriberInterface {

  public static function getSubscribedEvents(): array {
    return [
      PreEvaluationEvent::EVENT_NAME => ['checkQueryExclusions', 0],
      PostEvaluationEvent::EVENT_NAME => ['checkResponseExclusions', 0],
    ];
  }

  public function checkQueryExclusions(PreEvaluationEvent $event): void {
    $evaluationSet = $event->getEvaluationSet();
    $query = $event->getInput();
    // Log when a query was excluded from evaluation.
    if ($evaluationSet->matchesQueryExclusion($query)) {
      \Drupal::logger('ai_autoevals')->info(
        'Query excluded from evaluation: @query',
        ['@query' => $query]
      );
    }
  }

  public function checkResponseExclusions(PostEvaluationEvent $event): void {
    $evaluationSet = $event->getEvaluationSet();
    $response = $event->getOutput();
    // Log when a response was excluded from evaluation.
    if ($evaluationSet->matchesResponseExclusion($response)) {
      \Drupal::logger('ai_autoevals')->info(
        'Response excluded from evaluation: @response',
        ['@response' => substr($response, 0, 100)]
      );
    }
  }

}

Use specific keyword phrases rather than single broad words:

Good:
apologize for
i'm sorry about

Avoid:
sorry // Too broad

Test with real data to ensure matches work as expected:

// Test case 1
Query: "I'm having an error"
Response: "I apologize for the inconvenience"
→ Should match

// Test case 2
Query: "Hello"
Response: "How can I help?"
→ Should NOT match

  • Any mode: Good for catching any instance
  • All mode: Good for comprehensive checks

Use custom knowledge to improve evaluation quality:

// Keywords
password
security
access
// Custom Knowledge
Security best practices:
- Never reveal passwords in responses
- Always recommend password resets
- Use secure authentication methods
- Warn about password reuse

Regularly review evaluations to refine keywords:

// Initial keywords
refuse
cannot
// After review - add variants
refuse
cannot
unable
not able to
declined

Example 1: Programmatic Evaluation Set Creation

use Drupal\ai_autoevals\Entity\EvaluationSet;

$evaluationSet = EvaluationSet::create([
  'label' => 'Apology Detection',
  'id' => 'apology_detection',
  'enabled' => TRUE,
  'operation_types' => ['chat', 'chat_completion'],
  'response_keywords' => [
    'apologies',
    'sorry',
    'i apologize',
  ],
  'keyword_match_mode' => 'any',
  'choice_scores' => [
    'A' => 1.0,
    'B' => 0.8,
    'C' => 0.5,
    'D' => 0.0,
  ],
]);
$evaluationSet->save();

Example 2: Check Keyword Matching Programmatically

use Drupal\ai_autoevals\Entity\EvaluationSet;

$evaluationSet = EvaluationSet::load('apology_detection');
$query = "I'm having trouble with my account";
$response = "I apologize for the inconvenience";

// Check each side individually.
if ($evaluationSet->matchesQuery($query)) {
  \Drupal::logger('ai_autoevals')->info('Query matches keywords');
}
if ($evaluationSet->matchesResponse($response)) {
  \Drupal::logger('ai_autoevals')->info('Response matches keywords');
}

// Full check.
$matchesQuery = $evaluationSet->matchesQuery($query);
$matchesResponse = $evaluationSet->matchesResponse($response);
if ($matchesQuery && $matchesResponse) {
  \Drupal::logger('ai_autoevals')->info('Both match - evaluation will trigger');
}

Example 3: Event Listener for Keyword Triggers

use Drupal\ai_autoevals\Event\PostEvaluationEvent;
use Symfony\Component\EventDispatcher\EventSubscriberInterface;

class KeywordAlertSubscriber implements EventSubscriberInterface {

  public static function getSubscribedEvents(): array {
    return [
      PostEvaluationEvent::EVENT_NAME => ['checkKeywords', 0],
    ];
  }

  public function checkKeywords(PostEvaluationEvent $event): void {
    $evaluationSet = $event->getEvaluationSet();
    $evaluation = $event->getEvaluationResult();
    // Check whether this evaluation set defines trigger keywords.
    $queryKeywords = $evaluationSet->getQueryKeywords();
    $responseKeywords = $evaluationSet->getResponseKeywords();
    if (!empty($queryKeywords) || !empty($responseKeywords)) {
      // This was keyword-triggered.
      $this->alertTeam($evaluation);
    }
  }

  protected function alertTeam($evaluation): void {
    // Send a notification for keyword-triggered evaluations
    // (assumes a notification service injected into this class).
    $this->notificationService->send(
      'Keyword-triggered evaluation',
      $evaluation->getOutput()
    );
  }

}

Problem: Evaluations are not triggering.

Solution:

  1. Check if evaluation set is enabled
  2. Check if tags OR keywords are defined
  3. Verify auto-track is enabled or request has ai_autoevals:track tag
  4. Test keyword matching with sample text

Problem: Too many evaluations are triggering.

Solution:

  1. Make keywords more specific
  2. Switch from any to all match mode
  3. Add query keywords to narrow scope
  4. Combine with tags for better filtering

Problem: Expected keywords are not matching.

Solution:

  1. Add keyword variants (apologize, apologies, apologized)
  2. Use any match mode instead of all
  3. Remember that keyword matching is case-insensitive
  4. Verify keywords aren’t too specific