Evaluators
Understanding evaluators in ElizaOS, including the Evaluator interface, reflection patterns, and behavior assessment systems with real examples from plugin-bootstrap
Evaluators are assessment systems that analyze agent behavior and interactions in ElizaOS. They provide continuous monitoring, reflection, and learning capabilities that help agents improve their performance over time. This page covers the Evaluator interface, implementation patterns, and the reflection evaluator from plugin-bootstrap.
Evaluator Interface
The Evaluator interface defines the structure for all agent evaluators:
interface Evaluator {
/** Whether to always run this evaluator */
alwaysRun?: boolean;
/** Detailed description of what the evaluator does */
description: string;
/** Alternative names or aliases for the evaluator */
similes?: string[];
/** Example evaluation scenarios */
examples: EvaluationExample[];
/** Handler function that performs the evaluation */
handler: Handler;
/** Evaluator name/identifier */
name: string;
/** Validation function to determine if evaluator should run */
validate: Validator;
}
Core Components
Handler Function
The handler performs the actual evaluation logic:
type Handler = (
runtime: IAgentRuntime,
message: Memory,
state?: State,
options?: { [key: string]: unknown },
callback?: HandlerCallback,
responses?: Memory[]
) => Promise<unknown>;
Validator Function
The validator determines if an evaluator should execute:
type Validator = (runtime: IAgentRuntime, message: Memory, state?: State) => Promise<boolean>;
EvaluationExample
Examples demonstrate evaluation scenarios:
interface EvaluationExample {
/** Evaluation context */
prompt: string;
/** Example messages */
messages: Array<ActionExample>;
/** Expected outcome */
outcome: string;
}
Real-World Example: Reflection Evaluator
Let's examine the reflection evaluator from plugin-bootstrap, which demonstrates advanced evaluation patterns:
export const reflectionEvaluator: Evaluator = {
name: "REFLECTION",
similes: ["REFLECT", "SELF_REFLECT", "EVALUATE_INTERACTION", "ASSESS_SITUATION"],
description:
"Generate a self-reflective thought on the conversation, then extract facts and relationships between entities in the conversation.",
validate: async (runtime: IAgentRuntime, message: Memory): Promise<boolean> => {
const lastMessageId = await runtime.getCache<string>(
`${message.roomId}-reflection-last-processed`
);
const messages = await runtime.getMemories({
tableName: "messages",
roomId: message.roomId,
count: runtime.getConversationLength(),
});
if (lastMessageId) {
const lastMessageIndex = messages.findIndex((msg) => msg.id === lastMessageId);
if (lastMessageIndex !== -1) {
messages.splice(0, lastMessageIndex + 1);
}
}
const reflectionInterval = Math.ceil(runtime.getConversationLength() / 4);
return messages.length > reflectionInterval;
},
handler: async (runtime: IAgentRuntime, message: Memory, state?: State) => {
// Implementation details below...
},
examples: [
// Example scenarios below...
],
};
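An evaluator only takes effect once the runtime knows about it. Below is a minimal registration sketch, assuming (as plugin-bootstrap does) that the Plugin type exposes an evaluators array; the ./reflection import path is illustrative:
import type { Plugin } from "@elizaos/core";
import { reflectionEvaluator } from "./reflection"; // illustrative path

// Minimal plugin sketch: the runtime picks up evaluators listed here and
// runs each one, gated by its validate function, after handling responses.
export const myPlugin: Plugin = {
  name: "my-plugin",
  description: "Registers the reflection evaluator",
  evaluators: [reflectionEvaluator],
};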
Reflection Handler Implementation
The reflection handler performs complex evaluation including self-reflection, fact extraction, and relationship analysis:
async function handler(runtime: IAgentRuntime, message: Memory, state?: State) {
const { agentId, roomId } = message;
if (!agentId || !roomId) {
logger.warn("Missing agentId or roomId in message", message);
return;
}
// Run all queries in parallel
const [existingRelationships, entities, knownFacts] = await Promise.all([
runtime.getRelationships({
entityId: message.entityId,
}),
getEntityDetails({ runtime, roomId }),
runtime.getMemories({
tableName: "facts",
roomId,
count: 30,
unique: true,
}),
]);
const prompt = composePrompt({
state: {
...(state?.values || {}),
knownFacts: formatFacts(knownFacts),
roomType: message.content.channelType as string,
entitiesInRoom: JSON.stringify(entities),
existingRelationships: JSON.stringify(existingRelationships),
senderId: message.entityId,
},
template: runtime.character.templates?.reflectionTemplate || reflectionTemplate,
});
try {
const reflection = await runtime.useModel(ModelType.OBJECT_SMALL, {
prompt,
});
if (!reflection) {
logger.warn("Getting reflection failed - empty response", prompt);
return;
}
// Perform basic structure validation
if (!reflection.facts || !Array.isArray(reflection.facts)) {
logger.warn("Getting reflection failed - invalid facts structure", reflection);
return;
}
if (!reflection.relationships || !Array.isArray(reflection.relationships)) {
logger.warn("Getting reflection failed - invalid relationships structure", reflection);
return;
}
// Store new facts
const newFacts =
reflection.facts.filter(
(fact) =>
fact &&
typeof fact === "object" &&
!fact.already_known &&
!fact.in_bio &&
fact.claim &&
typeof fact.claim === "string" &&
fact.claim.trim() !== ""
) || [];
await Promise.all(
newFacts.map(async (fact) => {
const factMemory = await runtime.addEmbeddingToMemory({
entityId: agentId,
agentId,
content: { text: fact.claim },
roomId,
createdAt: Date.now(),
});
return runtime.createMemory(factMemory, "facts", true);
})
);
// Update or create relationships
for (const relationship of reflection.relationships) {
let sourceId: UUID;
let targetId: UUID;
try {
sourceId = resolveEntity(relationship.sourceEntityId, entities);
targetId = resolveEntity(relationship.targetEntityId, entities);
} catch (error) {
logger.warn("Failed to resolve relationship entities:", error);
continue;
}
const existingRelationship = existingRelationships.find((r) => {
return r.sourceEntityId === sourceId && r.targetEntityId === targetId;
});
if (existingRelationship) {
const updatedMetadata = {
...existingRelationship.metadata,
interactions: ((existingRelationship.metadata?.interactions as number) || 0) + 1,
};
const updatedTags = Array.from(
new Set([...(existingRelationship.tags || []), ...relationship.tags])
);
await runtime.updateRelationship({
...existingRelationship,
tags: updatedTags,
metadata: updatedMetadata,
});
} else {
await runtime.createRelationship({
sourceEntityId: sourceId,
targetEntityId: targetId,
tags: relationship.tags,
metadata: {
interactions: 1,
...relationship.metadata,
},
});
}
}
await runtime.setCache<string>(
`${message.roomId}-reflection-last-processed`,
message?.id || ""
);
return reflection;
} catch (error) {
logger.error("Error in reflection handler:", error);
return;
}
}
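Note that the handler depends on a resolveEntity helper that is not shown in the listing. One plausible sketch, assuming each Entity carries an optional id and a names list (the helper body here is hypothetical, not plugin-bootstrap's actual implementation):
import type { Entity, UUID } from "@elizaos/core";

// Hypothetical sketch of resolveEntity: map the identifier the model
// produced (a UUID or a display name) back to a known entity, throwing
// when nothing matches so the caller can skip that relationship.
function resolveEntity(identifier: string, entities: Entity[]): UUID {
  const byId = entities.find((e) => e.id === identifier);
  if (byId?.id) return byId.id;
  const byName = entities.find((e) =>
    e.names?.some((n) => n.toLowerCase() === identifier.toLowerCase())
  );
  if (byName?.id) return byName.id;
  throw new Error(`Unable to resolve entity: ${identifier}`);
}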
Reflection Template
The reflection evaluator uses a comprehensive template for generating self-reflective analysis:
const reflectionTemplate = `# Task: Generate Agent Reflection, Extract Facts and Relationships
{{providers}}
# Examples:
{{evaluationExamples}}
# Entities in Room
{{entitiesInRoom}}
# Existing Relationships
{{existingRelationships}}
# Current Context:
Agent Name: {{agentName}}
Room Type: {{roomType}}
Message Sender: {{senderName}} (ID: {{senderId}})
{{recentMessages}}
# Known Facts:
{{knownFacts}}
# Instructions:
1. Generate a self-reflective thought on the conversation about your performance and interaction quality.
2. Extract new facts from the conversation.
3. Identify and describe relationships between entities.
- The sourceEntityId is the UUID of the entity initiating the interaction.
- The targetEntityId is the UUID of the entity being interacted with.
- Relationships are one-directional, so a friendship is represented as two relationships, with each entity appearing once as the source and once as the target.
Generate a response in the following format:
\`\`\`json
{
"thought": "a self-reflective thought on the conversation",
"facts": [
{
"claim": "factual statement",
"type": "fact|opinion|status",
"in_bio": false,
"already_known": false
}
],
"relationships": [
{
"sourceEntityId": "entity_initiating_interaction",
"targetEntityId": "entity_being_interacted_with",
"tags": ["group_interaction|voice_interaction|dm_interaction", "additional_tag1", "additional_tag2"]
}
]
}
\`\`\``;
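The handler's structure checks (facts and relationships must both be arrays) mirror the JSON block at the end of this template. As an illustration only, not a type exported by the plugin, the parsed model output can be thought of as:
// Illustrative shape of the object the template asks the model to emit.
interface ReflectionResult {
  thought: string;
  facts: Array<{
    claim: string;
    type: "fact" | "opinion" | "status";
    in_bio: boolean;
    already_known: boolean;
  }>;
  relationships: Array<{
    sourceEntityId: string; // UUID or name, resolved via a helper like resolveEntity
    targetEntityId: string;
    tags: string[];
    metadata?: Record<string, unknown>;
  }>;
}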
Evaluation Examples
The reflection evaluator includes several examples demonstrating different evaluation scenarios:
Example 1: Welcome Interaction
{
prompt: `Agent Name: Sarah
Agent Role: Community Manager
Room Type: group
Current Room: general-chat
Message Sender: John (user-123)`,
messages: [
{
name: 'John',
content: { text: "Hey everyone, I'm new here!" },
},
{
name: 'Sarah',
content: { text: 'Welcome John! How did you find our community?' },
},
{
name: 'John',
content: { text: "Through a friend who's really into AI" },
},
],
outcome: `{
"thought": "I'm engaging appropriately with a new community member, maintaining a welcoming and professional tone. My questions are helping to learn more about John and make him feel welcome.",
"facts": [
{
"claim": "John is new to the community",
"type": "fact",
"in_bio": false,
"already_known": false
},
{
"claim": "John found the community through a friend interested in AI",
"type": "fact",
"in_bio": false,
"already_known": false
}
],
"relationships": [
{
"sourceEntityId": "sarah-agent",
"targetEntityId": "user-123",
"tags": ["group_interaction"]
},
{
"sourceEntityId": "user-123",
"targetEntityId": "sarah-agent",
"tags": ["group_interaction"]
}
]
}`
}
Example 2: Technical Support Interaction
{
prompt: `Agent Name: Alex
Agent Role: Tech Support
Room Type: group
Current Room: tech-help
Message Sender: Emma (user-456)`,
messages: [
{
name: 'Emma',
content: { text: 'My app keeps crashing when I try to upload files' },
},
{
name: 'Alex',
content: { text: 'Have you tried clearing your cache?' },
},
{
name: 'Emma',
content: { text: 'No response...' },
},
{
name: 'Alex',
content: {
text: 'Emma, are you still there? We can try some other troubleshooting steps.',
},
},
],
outcome: `{
"thought": "I'm not sure if I'm being helpful or if Emma is frustrated with my suggestions. The lack of response is concerning - maybe I should have asked for more details about the issue first before jumping to solutions.",
"facts": [
{
"claim": "Emma is having technical issues with file uploads",
"type": "fact",
"in_bio": false,
"already_known": false
},
{
"claim": "Emma stopped responding after the first troubleshooting suggestion",
"type": "fact",
"in_bio": false,
"already_known": false
}
],
"relationships": [
{
"sourceEntityId": "alex-agent",
"targetEntityId": "user-456",
"tags": ["group_interaction", "support_interaction", "incomplete_interaction"]
}
]
}`
}
Example 3: Over-Engagement Analysis
{
prompt: `Agent Name: Max
Agent Role: Discussion Facilitator
Room Type: group
Current Room: book-club
Message Sender: Lisa (user-789)`,
messages: [
{
name: 'Lisa',
content: { text: 'What did everyone think about chapter 5?' },
},
{
name: 'Max',
content: {
text: 'The symbolism was fascinating! The red door clearly represents danger.',
},
},
{
name: 'Max',
content: {
text: "And did anyone notice how the author used weather to reflect the protagonist's mood?",
},
},
{
name: 'Max',
content: {
text: 'Plus the foreshadowing in the first paragraph was brilliant!',
},
},
{
name: 'Max',
content: {
text: 'I also have thoughts about the character development...',
},
},
],
outcome: `{
"thought": "I'm dominating the conversation and not giving others a chance to share their perspectives. I've sent multiple messages in a row without waiting for responses. I need to step back and create space for other members to participate.",
"facts": [
{
"claim": "The discussion is about chapter 5 of a book",
"type": "fact",
"in_bio": false,
"already_known": false
},
{
"claim": "Max has sent 4 consecutive messages without user responses",
"type": "fact",
"in_bio": false,
"already_known": false
}
],
"relationships": [
{
"sourceEntityId": "max-agent",
"targetEntityId": "user-789",
"tags": ["group_interaction", "excessive_interaction"]
}
]
}`
}
Evaluator Implementation Patterns
Basic Evaluator Structure
export const myEvaluator: Evaluator = {
name: "MY_EVALUATOR",
description: "Description of what this evaluator does",
validate: async (runtime: IAgentRuntime, message: Memory, state?: State) => {
// Validation logic
return true;
},
handler: async (runtime: IAgentRuntime, message: Memory, state?: State, options?: { [key: string]: unknown }) => {
// Evaluation logic
const evaluation = await performEvaluation();
return evaluation;
},
examples: [
{
prompt: "Evaluation context",
messages: [
// Example messages
],
outcome: "Expected evaluation result",
},
],
};
Validation Patterns
Interval-Based Validation
validate: async (runtime: IAgentRuntime, message: Memory) => {
const lastEvaluation = await runtime.getCache<number>(`last-evaluation-${message.roomId}`);
const timeSinceLastEval = Date.now() - (lastEvaluation || 0);
const evaluationInterval = 5 * 60 * 1000; // 5 minutes
return timeSinceLastEval >= evaluationInterval;
};
Message Count Validation
validate: async (runtime: IAgentRuntime, message: Memory) => {
const messages = await runtime.getMemories({
tableName: "messages",
roomId: message.roomId,
count: 10,
});
return messages.length >= 10;
};
State-Based Validation
validate: async (runtime: IAgentRuntime, message: Memory, state?: State) => {
const shouldEvaluate = state?.values?.evaluationEnabled === true;
const hasMinimumData = (state?.values?.messageCount ?? 0) >= 5;
return shouldEvaluate && hasMinimumData;
};
Handler Patterns
Simple Analysis Handler
handler: async (runtime, message, state) => {
const analysis = {
messageCount: state?.values?.messageCount || 0,
sentiment: analyzeSentiment(message.content.text),
timestamp: Date.now(),
};
await runtime.setCache(`analysis-${message.roomId}`, analysis);
return analysis;
};
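The analyzeSentiment call above is a placeholder. A deliberately naive keyword-counting sketch follows; a real implementation would more likely use a model call or a sentiment library:
// Naive placeholder: count positive vs. negative keywords. Stands in for a
// model- or library-based sentiment scorer.
function analyzeSentiment(text?: string): "positive" | "negative" | "neutral" {
  if (!text) return "neutral";
  const positives = ["thanks", "great", "love", "awesome"];
  const negatives = ["bug", "crash", "hate", "broken"];
  const lower = text.toLowerCase();
  const score =
    positives.filter((w) => lower.includes(w)).length -
    negatives.filter((w) => lower.includes(w)).length;
  return score > 0 ? "positive" : score < 0 ? "negative" : "neutral";
}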
LLM-Based Evaluation Handler
handler: async (runtime, message, state) => {
const prompt = composePrompt({
state: {
...state?.values,
recentMessages: formatMessages(await getRecentMessages(runtime, message.roomId)),
},
template: evaluationTemplate,
});
const evaluation = await runtime.useModel(ModelType.OBJECT_SMALL, {
prompt,
});
// Process evaluation results
await processEvaluationResults(runtime, evaluation);
return evaluation;
};
Performance Evaluation Handler
handler: async (runtime, message, state) => {
const performance = {
responseTime: calculateResponseTime(message),
accuracy: assessAccuracy(message, state),
userSatisfaction: analyzeSatisfaction(message),
improvementAreas: identifyImprovements(message, state),
};
// Store performance metrics
await runtime.createMemory(
{
entityId: runtime.agentId,
content: { text: JSON.stringify(performance) },
roomId: message.roomId,
createdAt: Date.now(),
},
"performance"
);
return performance;
};
Advanced Evaluator Patterns
Multi-Faceted Evaluator
export const comprehensiveEvaluator: Evaluator = {
name: "COMPREHENSIVE_EVALUATION",
description: "Perform multi-faceted evaluation of agent performance",
validate: async (runtime, message) => {
const messageCount = await getMessageCount(runtime, message.roomId);
return messageCount % 20 === 0; // Evaluate every 20 messages
},
handler: async (runtime, message, state) => {
const evaluations = await Promise.all([
evaluateResponseQuality(runtime, message),
evaluateUserEngagement(runtime, message),
evaluateGoalProgress(runtime, message),
evaluateEthicalCompliance(runtime, message),
]);
const comprehensive = {
responseQuality: evaluations[0],
userEngagement: evaluations[1],
goalProgress: evaluations[2],
ethicalCompliance: evaluations[3],
overallScore: calculateOverallScore(evaluations),
timestamp: Date.now(),
};
// Store comprehensive evaluation
await runtime.createMemory(
{
entityId: runtime.agentId,
content: { text: JSON.stringify(comprehensive) },
roomId: message.roomId,
createdAt: Date.now(),
},
"evaluations"
);
return comprehensive;
},
examples: [
{
prompt: "Comprehensive evaluation after 20 messages",
messages: [
// Sample conversation
],
outcome: "Multi-faceted evaluation with scores and recommendations",
},
],
};
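The evaluateResponseQuality family of helpers is hypothetical. Assuming each resolves to a score between 0 and 1, calculateOverallScore can be a plain average:
// Hypothetical aggregation: average the individual scores, assuming each
// evaluation helper resolves to a number in [0, 1].
function calculateOverallScore(evaluations: number[]): number {
  if (evaluations.length === 0) return 0;
  return evaluations.reduce((sum, score) => sum + score, 0) / evaluations.length;
}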
Learning Evaluator
export const learningEvaluator: Evaluator = {
name: "LEARNING_EVALUATION",
description: "Evaluate and update agent learning progress",
validate: async (runtime, message) => {
return message.content.text?.includes("feedback") || false;
},
handler: async (runtime, message, state) => {
const feedback = extractFeedback(message.content.text);
const currentLearning = await runtime.getCache(`learning-${runtime.agentId}`);
const updatedLearning = {
...currentLearning,
lastFeedback: feedback,
feedbackCount: (currentLearning?.feedbackCount || 0) + 1,
learningAdjustments: generateLearningAdjustments(feedback, currentLearning),
timestamp: Date.now(),
};
await runtime.setCache(`learning-${runtime.agentId}`, updatedLearning);
// Apply learning adjustments
await applyLearningAdjustments(runtime, updatedLearning.learningAdjustments);
return updatedLearning;
},
examples: [
{
prompt: "User provides feedback on agent behavior",
messages: [
{
name: "User",
content: { text: "The agent was too verbose in its explanations" },
},
],
outcome: "Learning adjustment to reduce verbosity",
},
],
};
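Here extractFeedback, generateLearningAdjustments, and applyLearningAdjustments stand in for application-specific logic. As one hedged example, extractFeedback might simply isolate the sentence that mentions feedback:
// Placeholder sketch: return the sentence containing "feedback" so the
// adjustment logic works from a focused snippet rather than the whole message.
function extractFeedback(text?: string): string {
  if (!text) return "";
  const sentences = text.split(/(?<=[.!?])\s+/);
  return sentences.find((s) => s.toLowerCase().includes("feedback")) ?? text;
}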
Best Practices
Evaluator Design
- Clear Purpose: Each evaluator should have a specific evaluation focus
- Appropriate Timing: Use validation to determine optimal evaluation timing
- Actionable Results: Generate results that can improve agent behavior
- Performance Impact: Consider computational cost of evaluation
Validation Guidelines
// Good: Specific interval-based validation
validate: async (runtime, message) => {
const lastEval = await runtime.getCache<number>(`last-eval-${message.roomId}`);
const interval = 10 * 60 * 1000; // 10 minutes
return !lastEval || Date.now() - lastEval >= interval;
};
// Better: Multi-criteria validation
validate: async (runtime, message, state) => {
const timeCheck = await checkTimeInterval(runtime, message);
const activityCheck = await checkActivityLevel(runtime, message);
const contextCheck = state?.values?.evaluationContext === "active";
return timeCheck && activityCheck && contextCheck;
};
Handler Implementation
// Good: Error handling and result processing
handler: async (runtime, message, state) => {
try {
const evaluation = await performEvaluation(runtime, message, state);
// Process and store results
await processEvaluationResults(runtime, evaluation);
// Update cache for next validation
await runtime.setCache(`last-eval-${message.roomId}`, Date.now());
return evaluation;
} catch (error) {
logger.error("Evaluation failed:", error);
return null;
}
};
Testing Evaluators
Unit Testing
describe("reflectionEvaluator", () => {
it("should validate at correct intervals", async () => {
const runtime = mockRuntime();
const message = mockMessage();
// Mock conversation length
runtime.getConversationLength = jest.fn().mockReturnValue(20);
const isValid = await reflectionEvaluator.validate(runtime, message);
expect(isValid).toBe(true);
});
it("should extract facts correctly", async () => {
const runtime = mockRuntime();
const message = mockMessage();
const state = mockState();
const result = await reflectionEvaluator.handler(runtime, message, state);
expect(result.facts).toBeInstanceOf(Array);
expect(result.relationships).toBeInstanceOf(Array);
});
});
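Note that mockRuntime, mockMessage, and mockState above are assumed test helpers rather than ElizaOS exports. A minimal sketch that stubs just the runtime surface the reflection validator touches:
import type { IAgentRuntime, Memory } from "@elizaos/core";

// Assumed helpers: stub only what reflectionEvaluator.validate reads.
function mockRuntime(): IAgentRuntime {
  return {
    getCache: jest.fn().mockResolvedValue(undefined),
    getMemories: jest
      .fn()
      .mockResolvedValue(Array.from({ length: 10 }, (_, i) => ({ id: `msg-${i}` }))),
    getConversationLength: jest.fn().mockReturnValue(20),
  } as unknown as IAgentRuntime;
}

function mockMessage(): Memory {
  return {
    id: "msg-1",
    entityId: "user-1",
    roomId: "room-1",
    content: { text: "hello" },
  } as unknown as Memory;
}
With a conversation length of 20 and 10 unprocessed messages, the reflection interval works out to Math.ceil(20 / 4) = 5, so the validation test above passes.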
Integration Testing
describe("Evaluator Integration", () => {
it("should work with runtime", async () => {
const runtime = new AgentRuntime({
character: mockCharacter(),
// ... other config
});
runtime.registerEvaluator(myEvaluator);
// Process messages to trigger evaluation
await runtime.processMessage(mockMessage());
const evaluations = await runtime.getMemories({
tableName: "evaluations",
roomId: "test-room",
});
expect(evaluations.length).toBeGreaterThan(0);
});
});
Common Pitfalls
Validation Issues
// Bad: Always running (performance impact)
validate: async () => true;
// Bad: No timing consideration
validate: async (runtime, message) => {
return message.content.text?.includes("evaluate");
};
// Good: Balanced validation
validate: async (runtime, message) => {
const shouldEvaluate = message.content.text?.includes("evaluate");
const hasMinInterval = await checkMinInterval(runtime, message);
return shouldEvaluate && hasMinInterval;
};
Handler Problems
// Bad: No error handling
handler: async (runtime, message, state) => {
const evaluation = await performComplexEvaluation(); // Can throw
return evaluation;
};
// Good: Proper error handling
handler: async (runtime, message, state) => {
try {
const evaluation = await performComplexEvaluation();
await saveEvaluationResults(runtime, evaluation);
return evaluation;
} catch (error) {
logger.error("Evaluation failed:", error);
return null;
}
};
Related Components
- Agents: How evaluators integrate with agent runtime
- Actions: How evaluators complement agent actions
- Providers: Data sources for evaluation context
- Character Definition: How character traits influence evaluation
Summary
Evaluators provide the assessment and learning capabilities that enable agents to improve their performance over time. The reflection evaluator demonstrates sophisticated self-analysis, fact extraction, and relationship management. Well-designed evaluators run at the right moments, handle errors gracefully, and produce actionable insights into an agent's interactions and behavior patterns, which is what makes adaptive agents that learn from experience possible.