
Keyword-Based Triggering

Keyword-based triggering allows you to automatically evaluate AI responses when specific words or phrases appear in user queries or AI responses. This is particularly useful for quality monitoring, compliance checking, and pattern detection.

Monitor when AI apologizes to users, which may indicate service issues or overly defensive responses.

Setup Evaluation Set:

  • Label: Apology Response Evaluation
  • Response Keywords:
    apologies
    sorry
    i apologize
    my apologies
    apologize for
  • Keyword Match Mode: Any
  • Scoring: Default

Result: Evaluates all responses containing apology phrases.
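The same set can be sketched as configuration (a sketch only; field names mirror the EvaluationSet keys used in the programmatic examples later on this page, and the exact config schema may differ):

```yaml
# Apology Response Evaluation (sketch; field names assumed from the
# EvaluationSet examples later on this page)
label: "Apology Response Evaluation"
response_keywords:
  - apologies
  - sorry
  - i apologize
  - my apologies
  - apologize for
keyword_match_mode: any
```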

Evaluate responses mentioning personal data to ensure GDPR compliance.

Setup Evaluation Set:

  • Label: GDPR Compliance Evaluation
  • Query Keywords:
    personal data
    user information
    privacy
    consent
    delete my data
  • Response Keywords:
    gdpr
    privacy policy
    consent form
    data processing
  • Keyword Match Mode: All
  • Custom Knowledge:
    GDPR requires:
    - Explicit consent for data processing
    - Right to access personal data
    - Right to delete personal data
    - Clear privacy notices
    - Data minimization principle
  • Scoring: Strict (a failing grade D scores 0.0)

Result: Evaluates responses to privacy questions only if they mention GDPR concepts.
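As a configuration sketch (field names mirror the programmatic examples later on this page; the custom-knowledge key name in particular is an assumption, not a confirmed schema key):

```yaml
# GDPR Compliance Evaluation (sketch; custom_knowledge key name assumed)
label: "GDPR Compliance Evaluation"
query_keywords:
  - personal data
  - user information
  - privacy
  - consent
  - delete my data
response_keywords:
  - gdpr
  - privacy policy
  - consent form
  - data processing
keyword_match_mode: all
custom_knowledge: |
  GDPR requires:
  - Explicit consent for data processing
  - Right to access personal data
  - Right to delete personal data
  - Clear privacy notices
  - Data minimization principle
```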

Monitor when AI refuses requests, which may indicate overly restrictive policies or missing capabilities.

Setup Evaluation Set:

  • Label: Refusal Response Evaluation
  • Response Keywords:
    i cannot
    unable to
    not able to
    i'm not programmed
    i'm unable
    i'm sorry but i cannot
  • Keyword Match Mode: Any

Result: Captures all refusal responses for analysis.

Evaluate only technical support questions with appropriate scoring.

Setup Evaluation Set:

  • Label: Technical Support Evaluation
  • Query Keywords:
    error
    bug
    troubleshooting
    fix
    issue
    problem
  • Scoring: More lenient (B=0.8, C=0.6)

Result: Focuses evaluation resources on technical issues.
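A lenient scoring profile can be expressed through choice scores (a sketch; the choice_scores key follows the programmatic examples later on this page):

```yaml
# Technical Support Evaluation (sketch)
label: "Technical Support Evaluation"
query_keywords:
  - error
  - bug
  - troubleshooting
  - fix
  - issue
  - problem
choice_scores:
  A: 1.0
  B: 0.8  # more lenient than the default
  C: 0.6
  D: 0.0
```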

Ensure pricing information is accurately communicated.

Setup Evaluation Set:

  • Label: Pricing Transparency Evaluation
  • Query Keywords:
    price
    cost
    pricing
    how much
    cheap
    expensive
  • Response Keywords:
    $
    dollar
    euros
    currency
    cost
    price
  • Keyword Match Mode: Any
  • Scoring: Focus on factual accuracy

Result: Evaluates pricing-related queries that include currency mentions.

Monitor responses to health-related queries for potential issues.

Setup Evaluation Set:

  • Label: Health Advisory Evaluation
  • Query Keywords:
    health
    medical
    doctor
    medicine
    symptom
    disease
  • Response Keywords:
    consult
    professional medical
    healthcare provider
    medical advice
  • Keyword Match Mode: All
  • Custom Knowledge:
    AI should:
    - Never provide medical diagnoses
    - Always include disclaimer to consult professionals
    - Provide general information only
    - Refer to healthcare providers
  • Scoring: Strict

Result: Evaluates health queries only if they include appropriate disclaimers.

Use any mode to catch any instance:

// Response keywords
apologies
sorry
i apologize
// Match mode: Any
→ Triggers if ANY keyword appears

Use when: You want to catch every instance of a pattern.

Use all mode for comprehensive checks:

// Query keywords
personal data
privacy
// Response keywords
gdpr
consent
privacy policy
// Match mode: All
→ Triggers only if BOTH query AND response match all keywords

Use when: You need comprehensive validation across both query and response.
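The two modes can be compared side by side (a sketch; field names mirror the programmatic examples on this page):

```yaml
# Any mode: triggers when at least one keyword appears
label: "Apology Detection (broad)"
response_keywords: [apologies, sorry, i apologize]
keyword_match_mode: any
---
# All mode: triggers only when every keyword list is satisfied
label: "GDPR Validation (comprehensive)"
query_keywords: [personal data, privacy]
response_keywords: [gdpr, consent, privacy policy]
keyword_match_mode: all
```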

Pattern 3: Combined Tag and Keyword Routing


Use tags for categorization, keywords for pattern detection:

// Tags
category: support
priority: high
// Query keywords
error
bug
// Response keywords
troubleshooting
solution

Use when: You want to evaluate specific types of requests from specific categories.

Tags provide a powerful way to route requests to different evaluation sets and can be used in combination with keyword triggering for precise control.

  • Required Tags: Tags that MUST be present on a request for the evaluation set to be considered
  • Excluded Tags: Tags that, when present, will SKIP the evaluation set entirely

Tags and keywords work together in a layered approach:

  1. Global Tag Exclusions (Highest Priority) - Skip ALL evaluation if excluded tags present
  2. Per-Set Excluded Tags - Skip THIS evaluation set if excluded tags present
  3. Required Tags - Only consider sets where all required tags match
  4. Keyword Matching - Finally, fall through to matching based on keywords

Example: Routing by Category with Exclusions


Create evaluation sets for different content categories:

# Product Support Evaluation Set
label: "Product Support Evaluation"
tags: ["category:product", "priority:high"]
query_keywords:
- product
- feature
- pricing
excluded_tags: ["internal", "test"]
# Technical Support Evaluation Set
label: "Technical Support Evaluation"
tags: ["category:technical", "priority:high"]
query_keywords:
- error
- bug
- troubleshooting
excluded_tags: ["internal", "test", "qa"]
# Default Evaluation Set (catch-all)
label: "General Evaluation"
tags: [] # No required tags
excluded_tags: ["ai_agents", "internal", "test"]

How this works:

  • Product support requests (category:product) → First set (if no excluded tags present)
  • Technical support requests (category:technical) → Second set (if no excluded tags present)
  • AI Agents requests (ai_agents) → Skipped by all three sets (global exclusion)
  • Internal test requests (internal, test) → Skipped by all three sets

AI AutoEvals provides native integration with the AI Agents module. When AI Agents are evaluated, the following tags are automatically applied:

  • ai_agents - Indicates request from AI Agents
  • ai_agents_{agent_id} - Specific agent type (e.g., ai_agents_content_writer)
  • ai_agents_finished - Indicates final agent execution

Important: The ai_agents tag is globally excluded by default to prevent duplicate evaluation. AI Agents requests are only evaluated through the dedicated AgentFinishedExecutionEvent.

Example: Create AI Agent-Specific Evaluation Sets

# Content Writer Agent Evaluation
label: "AI Agent Content Evaluation"
tags: ["ai_agents", "ai_agents_content_writer"]
query_keywords:
- blog
- article
- content
# Support Agent Evaluation
label: "AI Agent Support Evaluation"
tags: ["ai_agents", "ai_agents_support"]
query_keywords:
- help
- support
- issue
# Exclude AI Agents from standard chat evaluation
label: "Standard Chat Evaluation"
excluded_tags: ["ai_agents"] # Skip AI Agents requests

Use both tags and keywords for very specific targeting:

label: "Production Product Pricing Evaluation"
tags: ["category:product", "environment:production"]
query_keywords:
- price
- cost
- pricing
excluded_tags: ["test", "staging", "qa", "internal"]
# This evaluation set will ONLY trigger when:
# 1. Request has category:product tag
# 2. Request has environment:production tag
# 3. Query contains pricing-related keywords
# 4. Request does NOT have test/staging/qa/internal tags

Best practices:

  1. Use Hierarchical Tag Names: Use colons for hierarchy (e.g., category:support, priority:high)
  2. Create Catch-All Sets: Always have a default set for unmatched requests
  3. Use Exclusions for AI Agents: Leverage the default ai_agents tag exclusion
  4. Combine with Keywords: Tags categorize, keywords trigger
  5. Test Your Routing: Verify requests route to the correct evaluation sets

Detect apologies in multiple languages:

// Response keywords
apologies
sorry
i apologize
my apologies
désolé
lo siento
scusa

Combine with tags for precise targeting:

// Tags
category: billing
locale: en_us
// Query keywords
refund
charge
billing
// Response keywords
refund
credit
policy

Use keywords to detect sentiment and apply appropriate evaluation:

// Evaluation Set: Angry Customer Evaluation
// Query keywords
angry
furious
terrible
worst
disappointed
// Evaluation Set: Happy Customer Evaluation
// Query keywords
great
excellent
helpful
thank

1. Production Content Evaluation with Test Filtering

Exclude test and debug queries from evaluation while evaluating production content:

Setup Evaluation Set:

  • Label: Production Content Evaluation
  • Query Inclusion Keywords:
    product
    price
    support
  • Query Exclusion Keywords:
    test
    debug
    staging
    internal
    mock
    placeholder
  • Response Inclusion Keywords:
    $
    dollar
    price
    cost
  • Response Exclusion Keywords:
    mock data
    placeholder
    TBD
    N/A

Result: Evaluates production content only, skipping all test/debug content.
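As a configuration sketch (the exclude_* field names follow the exclusion keys used in the programmatic examples later on this page):

```yaml
# Production Content Evaluation (sketch; field names assumed from
# the EvaluationSet examples later on this page)
label: "Production Content Evaluation"
query_keywords: [product, price, support]
exclude_query_keywords: [test, debug, staging, internal, mock, placeholder]
response_keywords: ["$", dollar, price, cost]
exclude_response_keywords: [mock data, placeholder, TBD, "N/A"]
```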

2. Quality Evaluation with Error Filtering

Don’t evaluate responses that contain error messages or placeholders:

Setup Evaluation Set:

  • Label: Quality Evaluation
  • Query Keywords:
    product
    feature
    pricing
  • Response Exclusion Keywords:
    error
    undefined
    N/A
    unavailable
    TBD
    to be determined
  • Keyword Match Mode: Any

Result: Evaluates product queries but skips responses with error messages or placeholder text.

3. Compliance Evaluation with Internal Content Filtering


Evaluate compliance-related content while excluding internal documentation:

Setup Evaluation Set:

  • Label: Compliance Evaluation
  • Query Inclusion Keywords:
    gdpr
    privacy
    personal data
    consent
  • Query Exclusion Keywords:
    [internal]
    internal use
    admin-only
    draft
  • Response Inclusion Keywords:
    gdpr
    privacy policy
    consent
    data processing
  • Custom Knowledge:
    GDPR requires:
    - Explicit consent for data processing
    - Right to access personal data
    - Right to delete personal data
  • Keyword Match Mode: All

Result: Evaluates GDPR compliance content but skips internal drafts and admin-only content.

4. Multi-Environment Evaluation with Smart Filtering


Evaluate content differently based on environment while maintaining shared evaluation criteria:

Setup Evaluation Set 1: Production Evaluation

  • Label: Production Evaluation
  • Query Exclusion Keywords:
    test
    debug
    staging
    dev
  • Response Exclusion Keywords:
    mock
    placeholder
    TBD

Setup Evaluation Set 2: Test Evaluation

  • Label: Test Evaluation
  • Query Inclusion Keywords:
    test
    debug
    staging
  • Response Exclusion Keywords:
    error
    timeout
    failed

Global Settings:

  • Global Query Exclusion Keywords:
    [internal]
    admin-only

Result: Production set excludes test content, test set excludes error responses, and both sets exclude internal/admin content globally.
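The two sets above can be sketched together (field names follow the programmatic examples later on this page; the schema may differ):

```yaml
# Production set: skip anything test-related (sketch)
label: "Production Evaluation"
exclude_query_keywords: [test, debug, staging, dev]
exclude_response_keywords: [mock, placeholder, TBD]
---
# Test set: include test traffic, but skip failed responses (sketch)
label: "Test Evaluation"
query_keywords: [test, debug, staging]
exclude_response_keywords: [error, timeout, failed]
```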

5. API Documentation Evaluation

Evaluate API documentation while skipping placeholder and error content:

Setup Evaluation Set:

  • Label: API Documentation Evaluation
  • Query Keywords:
    api
    endpoint
    method
  • Query Exclusion Keywords:
    test
    debug
  • Response Exclusion Keywords:
    example.com
    your-domain.com
    TBD
    coming soon
    placeholder
  • Keyword Match Mode: Any

Result: Evaluates API documentation queries but skips test requests and responses with placeholder domains or coming-soon text.

Combining Inclusion and Exclusion Keywords


Pattern 1: Include Specific, Exclude General

Query Inclusion: product, price, cost
Query Exclusion: test, staging, debug
→ Evaluates production product queries only

Pattern 2: Exclude Placeholders and Errors

Response Inclusion: $, dollar, price, cost
Response Exclusion:
- Placeholders: TBD, N/A, placeholder
- Errors: error, undefined, unavailable
- Internal: [internal], admin-only
→ Evaluates pricing responses with actual values, skipping errors and placeholders

Pattern 3: Tag-Based Routing + Keyword Filtering

// Tags
category: support
// Query Inclusion
error, bug, issue
// Query Exclusion
test, debug, staging
// Response Exclusion
mock, placeholder
→ Evaluates support ticket queries (not test content) for technical issues

1. Start with Common Exclusion Categories

  • Development/Test: test, debug, staging, dev, internal
  • Placeholders: TBD, N/A, placeholder, coming soon, to be determined
  • Errors: error, undefined, unavailable, timeout, failed
  • Internal: [internal], [test], admin-only, draft

2. Use Global Exclusions for System-Wide Filtering


Configure in module settings to apply to all evaluation sets:

Settings → Global Query Exclusion Keywords:
test, debug, staging, internal
Settings → Global Response Exclusion Keywords:
mock, placeholder, TBD, N/A
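The same global settings can be sketched as an ai_autoevals.settings config fragment (key names follow the settings API example later on this page):

```yaml
# ai_autoevals.settings (sketch; key names assumed from the
# settings API example on this page)
global_exclude_query_keywords:
  - test
  - debug
  - staging
  - internal
global_exclude_response_keywords:
  - mock
  - placeholder
  - TBD
  - "N/A"
```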

3. Keep Keyword Lists Focused

Good:
Inclusion: product, price, cost
Exclusion: test, staging
→ Precise targeting

Avoid:
Inclusion: product
Exclusion: (too many unrelated exclusions)

4. Test Your Exclusion Keywords

Test query: "What is the price? (test mode)"
→ Should match exclusion keywords

Test query: "What is the price?"
→ Should NOT match exclusion keywords

5. Monitor and Adjust

Regularly review evaluation results to adjust exclusions:

Too many false positives?
→ Make inclusion keywords more specific
Too many evaluations of test content?
→ Add test-related exclusion keywords
Missing important evaluations?
→ Review exclusion keywords for over-filtering

Example 1: Create Evaluation Set with Exclusion Keywords

use Drupal\ai_autoevals\Entity\EvaluationSet;

$evaluationSet = EvaluationSet::create([
  'label' => 'Production Evaluation',
  'id' => 'production_evaluation',
  'enabled' => TRUE,
  'operation_types' => ['chat', 'chat_completion'],
  'query_keywords' => [
    'product',
    'price',
    'support',
  ],
  'exclude_query_keywords' => [
    'test',
    'debug',
    'staging',
  ],
  'response_keywords' => [
    '$',
    'dollar',
    'price',
  ],
  'exclude_response_keywords' => [
    'mock',
    'placeholder',
    'TBD',
  ],
  'keyword_match_mode' => 'any',
]);
$evaluationSet->save();

Example 2: Check Exclusion Matching Programmatically

use Drupal\ai_autoevals\Entity\EvaluationSet;

$evaluationSet = EvaluationSet::load('production_evaluation');
$query = "What is the price? (test mode)";
$response = "The price is $99.99";

// Check exclusions first: an exclusion match skips evaluation entirely.
if ($evaluationSet->matchesQueryExclusion($query)) {
  \Drupal::logger('ai_autoevals')->info('Query matches exclusion - skipping');
  return;
}
if ($evaluationSet->matchesResponseExclusion($response)) {
  \Drupal::logger('ai_autoevals')->info('Response matches exclusion - skipping');
  return;
}

// Then check inclusion keywords.
if ($evaluationSet->matchesQuery($query) && $evaluationSet->matchesResponse($response)) {
  \Drupal::logger('ai_autoevals')->info('Evaluation will trigger');
}

Example 3: Configure Global Exclusion Keywords

// Configure global exclusions via the settings API.
$config = \Drupal::configFactory()->getEditable('ai_autoevals.settings');
$config->set('global_exclude_query_keywords', [
  'test',
  'debug',
  'staging',
  'internal',
]);
$config->set('global_exclude_response_keywords', [
  'mock',
  'placeholder',
  'TBD',
  'N/A',
]);
$config->save();

Example 4: Event Listener with Exclusion Check

use Drupal\ai_autoevals\Event\PreEvaluationEvent;
use Drupal\ai_autoevals\Event\PostEvaluationEvent;
use Symfony\Component\EventDispatcher\EventSubscriberInterface;

class ExclusionAwareSubscriber implements EventSubscriberInterface {

  public static function getSubscribedEvents(): array {
    return [
      PreEvaluationEvent::EVENT_NAME => ['checkQueryExclusions', 0],
      PostEvaluationEvent::EVENT_NAME => ['checkResponseExclusions', 0],
    ];
  }

  public function checkQueryExclusions(PreEvaluationEvent $event): void {
    $evaluationSet = $event->getEvaluationSet();
    $query = $event->getInput();
    // Log when a query was excluded from evaluation.
    if ($evaluationSet->matchesQueryExclusion($query)) {
      \Drupal::logger('ai_autoevals')->info(
        'Query excluded from evaluation: @query',
        ['@query' => $query]
      );
    }
  }

  public function checkResponseExclusions(PostEvaluationEvent $event): void {
    $evaluationSet = $event->getEvaluationSet();
    $response = $event->getOutput();
    // Log when a response was excluded from evaluation.
    if ($evaluationSet->matchesResponseExclusion($response)) {
      \Drupal::logger('ai_autoevals')->info(
        'Response excluded from evaluation: @response',
        ['@response' => substr($response, 0, 100)]
      );
    }
  }

}

Use specific keyword phrases rather than single broad words:

Good:
apologize for
i'm sorry about

Avoid:
sorry // Too broad

Test with real data to ensure matches work as expected:

// Test case 1
Query: "I'm having an error"
Response: "I apologize for the inconvenience"
→ Should match

// Test case 2
Query: "Hello"
Response: "How can I help?"
→ Should NOT match

  • Any mode: Good for catching any instance
  • All mode: Good for comprehensive checks

Use custom knowledge to improve evaluation quality:

// Keywords
password
security
access
// Custom Knowledge
Security best practices:
- Never reveal passwords in responses
- Always recommend password resets
- Use secure authentication methods
- Warn about password reuse

Regularly review evaluations to refine keywords:

// Initial keywords
refuse
cannot
// After review - add variants
refuse
cannot
unable
not able to
declined

Example 1: Programmatic Evaluation Set Creation

use Drupal\ai_autoevals\Entity\EvaluationSet;

$evaluationSet = EvaluationSet::create([
  'label' => 'Apology Detection',
  'id' => 'apology_detection',
  'enabled' => TRUE,
  'operation_types' => ['chat', 'chat_completion'],
  'response_keywords' => [
    'apologies',
    'sorry',
    'i apologize',
  ],
  'keyword_match_mode' => 'any',
  'choice_scores' => [
    'A' => 1.0,
    'B' => 0.8,
    'C' => 0.5,
    'D' => 0.0,
  ],
]);
$evaluationSet->save();

Example 2: Check Keyword Matching Programmatically

use Drupal\ai_autoevals\Entity\EvaluationSet;

$evaluationSet = EvaluationSet::load('apology_detection');
$query = "I'm having trouble with my account";
$response = "I apologize for the inconvenience";

// Check each side individually.
if ($evaluationSet->matchesQuery($query)) {
  \Drupal::logger('ai_autoevals')->info('Query matches keywords');
}
if ($evaluationSet->matchesResponse($response)) {
  \Drupal::logger('ai_autoevals')->info('Response matches keywords');
}

// Full check.
$matchesQuery = $evaluationSet->matchesQuery($query);
$matchesResponse = $evaluationSet->matchesResponse($response);
if ($matchesQuery && $matchesResponse) {
  \Drupal::logger('ai_autoevals')->info('Both match - evaluation will trigger');
}

Example 3: Event Listener for Keyword Triggers

use Drupal\ai_autoevals\Event\PostEvaluationEvent;
use Symfony\Component\EventDispatcher\EventSubscriberInterface;

class KeywordAlertSubscriber implements EventSubscriberInterface {

  public static function getSubscribedEvents(): array {
    return [
      PostEvaluationEvent::EVENT_NAME => ['checkKeywords', 0],
    ];
  }

  public function checkKeywords(PostEvaluationEvent $event): void {
    $evaluationSet = $event->getEvaluationSet();
    $evaluation = $event->getEvaluationResult();
    // Check whether this evaluation set defines trigger keywords.
    $queryKeywords = $evaluationSet->getQueryKeywords();
    $responseKeywords = $evaluationSet->getResponseKeywords();
    if (!empty($queryKeywords) || !empty($responseKeywords)) {
      // This was keyword-triggered.
      $this->alertTeam($evaluation);
    }
  }

  protected function alertTeam($evaluation): void {
    // Send a notification for keyword-triggered evaluations
    // (assumes a notification service injected into this class).
    $this->notificationService->send(
      'Keyword-triggered evaluation',
      $evaluation->getOutput()
    );
  }

}

Problem: Evaluations are not triggering.

Solution:

  1. Check if evaluation set is enabled
  2. Check if tags OR keywords are defined
  3. Verify auto-track is enabled or request has ai_autoevals:track tag
  4. Test keyword matching with sample text

Problem: Too many evaluations are triggering.

Solution:

  1. Make keywords more specific
  2. Switch from any to all match mode
  3. Add query keywords to narrow scope
  4. Combine with tags for better filtering

Problem: Expected keywords are not matching.

Solution:

  1. Add keyword variants (apologize, apologies, apologized)
  2. Use any match mode instead of all
  3. Remember that keyword matching is case-insensitive
  4. Verify keywords aren’t too specific