Easily Compare LLMs Using OpenAI and Google Sheets
Workflow Overview
This 21-node workflow sends each chat message to two LLMs in parallel, shows both answers side by side in the chat interface, and logs the inputs, answers, and prior context to a Google Sheet for evaluation.
Workflow Source Code
{
"id": "",
"meta": {
"instanceId": "",
"templateCredsSetupCompleted": true
},
"name": "Easily Compare LLMs Using OpenAI and Google Sheets",
"tags": [],
"nodes": [
{
"id": "",
"name": "When chat message received",
"type": "@n8n/n8n-nodes-langchain.chatTrigger",
"position": [
-7400,
3040
],
"webhookId": "",
"parameters": {
"options": {}
},
"typeVersion": 1.1
},
{
"id": "",
"name": "Loop Over Items",
"type": "n8n-nodes-base.splitInBatches",
"position": [
-5960,
3040
],
"parameters": {
"options": {
"reset": false
}
},
"typeVersion": 3
},
{
"id": "",
"name": "Simple Memory",
"type": "@n8n/n8n-nodes-langchain.memoryBufferWindow",
"position": [
-4880,
3000
],
"parameters": {
"sessionKey": "={{$('Set model, sessionId, chatInput, sessionIdBase').item.json.sessionId}}",
"sessionIdType": "customKey"
},
"typeVersion": 1.3
},
{
"id": "",
"name": "Chat Memory Manager",
"type": "@n8n/n8n-nodes-langchain.memoryManager",
"position": [
-4980,
3180
],
"parameters": {
"options": {}
},
"typeVersion": 1.1
},
{
"id": "",
"name": "Sticky Note",
"type": "n8n-nodes-base.stickyNote",
"position": [
-8120,
2600
],
"parameters": {
"color": 5,
"width": 640,
"height": 1180,
"content": "## Easily Compare LLMs Using OpenAI and Google Sheets
This workflow allows you to **easily evaluate and compare the outputs of two language models (LLMs)** before choosing one for production.
In the chat interface, both model outputs are shown side by side. Their responses are also logged into a Google Sheet, where they can be evaluated manually or automatically using a more advanced model.
### Use Case
You're developing an AI agent, and since LLMs are non-deterministic, you want to determine which one performs best for your specific use case. This template is designed to help you compare them effectively.
### How It Works
- The user sends a message to the chat interface.
- The input is duplicated and sent to two different LLMs.
- Each model processes the same prompt independently, using its own memory context.
- Their answers, along with the user input and previous context, are logged to Google Sheets.
- You can review, compare, and evaluate the model outputs manually (or automate it later).
- In the chat, both responses are also shown one after the other for direct comparison.
### How To Use It
- Copy this [Google Sheets template](https://docs.google.com/spreadsheets/d/1grO5jxm05kJ7if9wBIOozjkqW27i8tRedrheLRrpxf4/) (File > Make a Copy).
- Set up your **System Prompt** and **Tools** in the **AI Agent** node to suit your use case.
- Start chatting! Each message will trigger both models and log their responses to the spreadsheet.
*Note: This version is set up for two models. If you want to compare more, you’ll need to extend the workflow logic and update the sheet.*
### About Models
You can use **OpenRouter** or **Vertex AI** to test models across providers.
If you're using a node for a specific provider, like OpenAI, you can compare different models from that provider (e.g., `gpt-4.1` vs `gpt-4.1-mini`).
### Evaluation in Google Sheets
This is ideal for teams, allowing non-technical stakeholders (not just data scientists) to evaluate responses based on real-world needs.
Advanced users can automate this evaluation using a more capable model (like `o3` from **OpenAI**), but note that this will increase token usage and cost.
### Token Considerations
Since **each input is processed by two different models**, the workflow will consume more tokens overall.
Keep an eye on usage, especially if working with longer prompts or running multiple evaluations, as this can impact cost.
"
},
"typeVersion": 1
},
{
"id": "",
"name": "OpenRouter Chat Model",
"type": "@n8n/n8n-nodes-langchain.lmChatOpenRouter",
"position": [
-5180,
3000
],
"parameters": {
"model": "={{$json.model}}"
},
"credentials": {
"openRouterApi": {
"id": "",
"name": ""
}
},
"typeVersion": 1
},
{
"id": "",
"name": "Sticky Note1",
"type": "n8n-nodes-base.stickyNote",
"position": [
-7220,
2620
],
"parameters": {
"color": 7,
"width": 360,
"height": 580,
"content": "## Define Models to Compare
This node defines the array of model IDs to be compared.
In this template, we compare two models using the OpenRouter API. You can modify the list by specifying the full model IDs you want to test.
Example:
**[\"openai/gpt-4.1\", \"mistralai/mistral-large\"]**
If you're using a different LLM provider (like OpenAI directly, or Google Vertex AI), make sure to update the model IDs according to that provider's naming conventions.
*Note: This template is built for two models. For more, you’ll need to adjust the workflow logic and the Google Sheet structure.*
"
},
"typeVersion": 1
},
{
"id": "",
"name": "Sticky Note2",
"type": "n8n-nodes-base.stickyNote",
"position": [
-6500,
2620
],
"parameters": {
"color": 7,
"width": 360,
"height": 580,
"content": "## Set model, sessionId, chatInput, sessionIdBase
This node prepares the variables used during the loop that queries each model.
- **model**: The ID of the model being used in the current iteration.
- **sessionId**: A unique session key combining the original session ID and model name. This ensures memory isolation per model.
- **chatInput**: The user’s input message.
- **sessionIdBase**: The original session ID without any model-specific suffix. Used in Sheets to group evaluations from the same session."
},
"typeVersion": 1
},
{
"id": "",
"name": "Set model, sessionId, chatInput, sessionIdBase",
"type": "n8n-nodes-base.set",
"position": [
-6380,
3040
],
"parameters": {
"options": {},
"assignments": {
"assignments": [
{
"id": "",
"name": "model",
"type": "string",
"value": "={{ $json.models }}"
},
{
"id": "",
"name": "sessionId",
"type": "string",
"value": "={{ $('When chat message received').item.json.sessionId }}{{$json.models }}"
},
{
"id": "",
"name": "chatInput",
"type": "string",
"value": "={{ $('When chat message received').item.json.chatInput }}"
},
{
"id": "",
"name": "sessionIdBase",
"type": "string",
"value": "={{ $('When chat message received').item.json.sessionId }}"
}
]
}
},
"typeVersion": 3.4
},
{
"id": "",
"name": "AI Agent",
"type": "@n8n/n8n-nodes-langchain.agent",
"position": [
-5480,
3180
],
"parameters": {
"options": {
"returnIntermediateSteps": false
}
},
"typeVersion": 1.8
},
{
"id": "",
"name": "Sticky Note3",
"type": "n8n-nodes-base.stickyNote",
"position": [
-5600,
3160
],
"parameters": {
"color": 7,
"width": 540,
"height": 520,
"content": "
## AI Agent
This AI Agent is connected to OpenRouter Models. The model is selected dynamically from the variable `{{$json.model}}`, defined earlier.
Memory is isolated per model using the `{{$('Set model, sessionId, chatInput, sessionIdBase').item.json.sessionId}}` key.
**⚠️ This agent currently has no system prompt or tools configured**. If you want to test specific tasks, you must define them yourself to reflect realistic use cases."
},
"typeVersion": 1
},
{
"id": "",
"name": "Sticky Note4",
"type": "n8n-nodes-base.stickyNote",
"position": [
-5040,
3160
],
"parameters": {
"color": 7,
"width": 380,
"height": 520,
"content": "
## Chat Memory Manager
This node handles retrieval of prior context for the chat session. It helps with qualitative evaluation by storing context that’s injected into the Google Sheet.
It shares memory with the AI Agent via the “Simple Memory” node.
> You can switch to Redis or Postgres memory backends if needed."
},
"typeVersion": 1
},
{
"id": "",
"name": "Sticky Note5",
"type": "n8n-nodes-base.stickyNote",
"position": [
-4640,
3160
],
"parameters": {
"color": 7,
"width": 380,
"height": 760,
"content": "
## Prepare Data for Chat and Google Sheets
This node sets the following fields:
- **output**: The model's response, formatted for chat display with visual separation to make comparison easier.
- **chatInput**: The user input that will be recorded in Google Sheets.
- **model_answer**: The actual answer from the model being evaluated.
- **model**: The name or ID of the model providing the answer, used for identifying performance.
- **context**: A history of the prior conversation (excluding the latest input). If it's the user's first message, a placeholder is used.
- **sessionId**: A unique session identifier combining model name and session, ensuring separate context windows for each model.
- **sessionIdBase**: The original user session ID (without model suffix), useful for grouping responses from different models in Sheets."
},
"typeVersion": 1
},
{
"id": "",
"name": "Concatenate Chat Answers",
"type": "n8n-nodes-base.summarize",
"position": [
-5300,
2620
],
"parameters": {
"options": {},
"fieldsToSummarize": {
"values": [
{
"field": "output",
"separateBy": "
",
"aggregation": "concatenate"
}
]
}
},
"typeVersion": 1.1
},
{
"id": "",
"name": "Sticky Note6",
"type": "n8n-nodes-base.stickyNote",
"position": [
-5080,
2120
],
"parameters": {
"color": 5,
"width": 460,
"height": 500,
"content": "## Add Model Results to Google Sheet
This Google Sheets step records both model responses, along with the user input and prior context, for evaluation.
⚠️ Depending on the length of model responses, you may need to adjust row height or column width.
The template includes basic evaluation fields (`model_1_eval`, `model_2_eval`) with a dropdown like:
**\"Good\", \"Correct\", \"Bad\"**, but feel free to customize with more granular rating criteria."
},
"typeVersion": 1
},
{
"id": "",
"name": "Group Model Outputs for Evaluation",
"type": "n8n-nodes-base.aggregate",
"position": [
-5300,
2440
],
"parameters": {
"options": {},
"fieldsToAggregate": {
"fieldToAggregate": [
{
"fieldToAggregate": "model_answer"
},
{
"fieldToAggregate": "context"
},
{
"fieldToAggregate": "chatInput"
},
{
"fieldToAggregate": "sessionIdBase"
},
{
"fieldToAggregate": "model"
}
]
}
},
"typeVersion": 1
},
{
"id": "",
"name": "Add Model Results to Google Sheet",
"type": "n8n-nodes-base.googleSheets",
"onError": "continueRegularOutput",
"position": [
-4940,
2440
],
"parameters": {
"columns": {
"value": {
"sessionId": "={{ $json.sessionIdBase[0] }}",
"model_1_id": "={{ $json.model[0] }}",
"model_2_id": "={{ $json.model[1] }}",
"user_input": "={{ $json.chatInput[0] }}",
"model_1_answer": "={{ $json.model_answer[0] }}",
"model_2_answer": "={{ $json.model_answer[1] }}",
"context_model_1": "={{ $json.context[0] }}",
"context_model_2": "={{ $json.context[1] }}"
},
"schema": [
{
"id": "sessionId",
"type": "string",
"display": true,
"required": false,
"displayName": "sessionId",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "model_1_id",
"type": "string",
"display": true,
"required": false,
"displayName": "model_1_id",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "model_2_id",
"type": "string",
"display": true,
"required": false,
"displayName": "model_2_id",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "user_input",
"type": "string",
"display": true,
"required": false,
"displayName": "user_input",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "model_1_answer",
"type": "string",
"display": true,
"required": false,
"displayName": "model_1_answer",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "model_2_answer",
"type": "string",
"display": true,
"required": false,
"displayName": "model_2_answer",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "model_1_eval",
"type": "string",
"display": true,
"required": false,
"displayName": "model_1_eval",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "model_2_eval",
"type": "string",
"display": true,
"required": false,
"displayName": "model_2_eval",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "context_model_1",
"type": "string",
"display": true,
"required": false,
"displayName": "context_model_1",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "context_model_2",
"type": "string",
"display": true,
"required": false,
"displayName": "context_model_2",
"defaultMatch": false,
"canBeUsedToMatch": true
}
],
"mappingMode": "defineBelow",
"matchingColumns": [],
"attemptToConvertTypes": false,
"convertFieldsToString": false
},
"options": {},
"operation": "append",
"sheetName": {
"__rl": true,
"mode": "list",
"value": "gid=0",
"cachedResultUrl": "https://docs.google.com/spreadsheets/d/1grO5jxm05kJ7if9wBIOozjkqW27i8tRedrheLRrpxf4/",
"cachedResultName": "llms_eval"
},
"documentId": {
"__rl": true,
"mode": "list",
"value": "1grO5jxm05kJ7if9wBIOozjkqW27i8tRedrheLRrpxf4",
"cachedResultUrl": "https://docs.google.com/spreadsheets/d/1grO5jxm05kJ7if9wBIOozjkqW27i8tRedrheLRrpxf4/",
"cachedResultName": "Template - Easy LLMs Eval"
},
"authentication": "serviceAccount"
},
"credentials": {
"googleApi": {
"id": "",
"name": ""
}
},
"typeVersion": 4.5
},
{
"id": "",
"name": "Prepare Data for Chat and Google Sheets",
"type": "n8n-nodes-base.set",
"position": [
-4500,
3180
],
"parameters": {
"options": {},
"assignments": {
"assignments": [
{
"id": "",
"name": "output",
"type": "string",
"value": "=### `{{ $('Set model, sessionId, chatInput, sessionIdBase').item.json.model }}` answered :
{{ $('AI Agent').item.json.output }}
----------
"
},
{
"id": "",
"name": "chatInput",
"type": "string",
"value": "={{ $('Set model, sessionId, chatInput, sessionIdBase').item.json.chatInput }}"
},
{
"id": "",
"name": "model_answer",
"type": "string",
"value": "={{ $('AI Agent').item.json.output }}"
},
{
"id": "",
"name": "model",
"type": "string",
"value": "={{ $('Set model, sessionId, chatInput, sessionIdBase').item.json.model }}"
},
{
"id": "",
"name": "context",
"type": "string",
"value": "={{
(() => {
const history = $json[\"messages\"]; // or adapt to match your actual data path
if (!Array.isArray(history) || history.length <= 1) {
return \"No prior context available — likely the user's first message or memory not yet initialized.\";
}
const truncated = history.slice(0, -1); // drop the latest exchange
return truncated.map(pair => `Human: ${pair.human}\nAI: ${pair.ai}`).join('\n');
})()
}}
"
},
{
"id": "",
"name": "sessionId",
"type": "string",
"value": "={{ $('Loop Over Items').item.json.sessionId }}"
},
{
"id": "",
"name": "sessionIdBase",
"type": "string",
"value": "={{ $('Loop Over Items').item.json.sessionIdBase }}"
}
]
}
},
"typeVersion": 3.4
},
{
"id": "",
"name": "Define Models to Compare",
"type": "n8n-nodes-base.set",
"position": [
-7100,
3040
],
"parameters": {
"options": {},
"assignments": {
"assignments": [
{
"id": "",
"name": "=models",
"type": "array",
"value": "=[\"openai/gpt-4.1\", \"mistralai/mistral-large\"]"
}
]
}
},
"typeVersion": 3.4
},
{
"id": "",
"name": "Split Models into Items",
"type": "n8n-nodes-base.splitOut",
"position": [
-6760,
3040
],
"parameters": {
"options": {},
"fieldToSplitOut": "models"
},
"typeVersion": 1
},
{
"id": "",
"name": "Set Output for Chat UI",
"type": "n8n-nodes-base.set",
"position": [
-4940,
2620
],
"parameters": {
"options": {},
"assignments": {
"assignments": [
{
"id": "",
"name": "output",
"type": "string",
"value": "={{ $json.concatenated_output }}"
}
]
}
},
"typeVersion": 3.4
}
],
"active": false,
"pinData": {},
"settings": {
"executionOrder": "v1"
},
"versionId": "",
"connections": {
"AI Agent": {
"main": [
[
{
"node": "Chat Memory Manager",
"type": "main",
"index": 0
}
]
]
},
"Simple Memory": {
"ai_memory": [
[
{
"node": "Chat Memory Manager",
"type": "ai_memory",
"index": 0
},
{
"node": "AI Agent",
"type": "ai_memory",
"index": 0
}
]
]
},
"Loop Over Items": {
"main": [
[
{
"node": "Concatenate Chat Answers",
"type": "main",
"index": 0
},
{
"node": "Group Model Outputs for Evaluation",
"type": "main",
"index": 0
}
],
[
{
"node": "AI Agent",
"type": "main",
"index": 0
}
]
]
},
"Chat Memory Manager": {
"main": [
[
{
"node": "Prepare Data for Chat and Google Sheets",
"type": "main",
"index": 0
}
]
]
},
"OpenRouter Chat Model": {
"ai_languageModel": [
[
{
"node": "AI Agent",
"type": "ai_languageModel",
"index": 0
}
]
]
},
"Split Models into Items": {
"main": [
[
{
"node": "Set model, sessionId, chatInput, sessionIdBase",
"type": "main",
"index": 0
}
]
]
},
"Concatenate Chat Answers": {
"main": [
[
{
"node": "Set Output for Chat UI",
"type": "main",
"index": 0
}
]
]
},
"Define Models to Compare": {
"main": [
[
{
"node": "Split Models into Items",
"type": "main",
"index": 0
}
]
]
},
"When chat message received": {
"main": [
[
{
"node": "Define Models to Compare",
"type": "main",
"index": 0
}
]
]
},
"Group Model Outputs for Evaluation": {
"main": [
[
{
"node": "Add Model Results to Google Sheet",
"type": "main",
"index": 0
}
]
]
},
"Prepare Data for Chat and Google Sheets": {
"main": [
[
{
"node": "Loop Over Items",
"type": "main",
"index": 0
}
]
]
},
"Set model, sessionId, chatInput, sessionIdBase": {
"main": [
[
{
"node": "Loop Over Items",
"type": "main",
"index": 0
}
]
]
}
}
}
Features
- Chat trigger that fans each user message out to two LLMs
- Side-by-side model answers in the chat interface
- Per-model memory isolation via model-specific session keys
- Automatic logging of user input, model answers, and prior context to Google Sheets
- Evaluation columns (`model_1_eval`, `model_2_eval`) for manual or automated scoring
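The sheet columns used for logging are defined in the Add Model Results to Google Sheet node. The TypeScript sketch below is illustrative only (the interface and helper names are not part of the workflow); it mirrors how the aggregated per-model arrays map onto one spreadsheet row.

```typescript
// Illustrative shape of one row appended by the
// "Add Model Results to Google Sheet" node. Column names come from
// the workflow's schema; the interface and helper are hypothetical.
interface LlmEvalRow {
  sessionId: string;        // sessionIdBase shared by both model runs
  model_1_id: string;
  model_2_id: string;
  user_input: string;
  model_1_answer: string;
  model_2_answer: string;
  model_1_eval?: string;    // filled in manually, e.g. "Good", "Correct", "Bad"
  model_2_eval?: string;
  context_model_1: string;
  context_model_2: string;
}

// The node maps the aggregated per-model arrays onto these columns.
function toRow(agg: {
  sessionIdBase: string[];
  model: string[];
  chatInput: string[];
  model_answer: string[];
  context: string[];
}): LlmEvalRow {
  return {
    sessionId: agg.sessionIdBase[0],
    model_1_id: agg.model[0],
    model_2_id: agg.model[1],
    user_input: agg.chatInput[0],
    model_1_answer: agg.model_answer[0],
    model_2_answer: agg.model_answer[1],
    context_model_1: agg.context[0],
    context_model_2: agg.context[1],
  };
}
```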
Technical Analysis
Node types and roles
- `@n8n/n8n-nodes-langchain.chatTrigger`: receives the user's chat message
- `n8n-nodes-base.set`: defines the model list, per-model variables, and the data prepared for chat and Sheets
- `n8n-nodes-base.splitOut`: splits the models array into one item per model
- `n8n-nodes-base.splitInBatches`: loops over the per-model items
- `@n8n/n8n-nodes-langchain.agent`: runs the prompt against the currently selected model
- `@n8n/n8n-nodes-langchain.lmChatOpenRouter`: supplies the model dynamically via OpenRouter
- `@n8n/n8n-nodes-langchain.memoryBufferWindow`: keeps a separate conversation memory per model
- `@n8n/n8n-nodes-langchain.memoryManager`: retrieves prior context for logging
- `n8n-nodes-base.aggregate` and `n8n-nodes-base.summarize`: group and concatenate both model outputs
- `n8n-nodes-base.googleSheets`: appends the results to the evaluation sheet
- `n8n-nodes-base.stickyNote`: in-canvas documentation
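The most involved expression in the workflow is the context builder inside the Prepare Data for Chat and Google Sheets node. The sketch below restates it as standalone TypeScript for readability, assuming the `{ human, ai }` message shape that the workflow's own expression expects; it is not a drop-in replacement for the n8n expression.

```typescript
// Prior-conversation context, rebuilt from the Chat Memory Manager
// output with the latest exchange removed (mirrors the expression in
// "Prepare Data for Chat and Google Sheets").
interface Exchange {
  human: string;
  ai: string;
}

function buildContext(messages: Exchange[] | undefined): string {
  if (!Array.isArray(messages) || messages.length <= 1) {
    // Same placeholder string the workflow logs on the first message.
    return "No prior context available — likely the user's first message or memory not yet initialized.";
  }
  const truncated = messages.slice(0, -1); // drop the latest exchange
  return truncated
    .map((pair) => `Human: ${pair.human}\nAI: ${pair.ai}`)
    .join("\n");
}

// First message: only the placeholder is returned.
console.log(buildContext([{ human: "Hi", ai: "Hello!" }]));
```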
Complexity Assessment
Configuration difficulty:
Maintenance difficulty:
Extensibility:
Implementation Guide
Prerequisites
- Access to an n8n instance
- An OpenRouter API key (or credentials for another supported LLM provider)
- Google API credentials; the template's Google Sheets node authenticates with a service account
- Your own copy of the Google Sheets evaluation template
Configuration Steps
- Import the workflow JSON file into n8n.
- Add your OpenRouter credentials to the OpenRouter Chat Model node (or swap in another provider's chat model node).
- Copy the Google Sheets template (File > Make a Copy) and point the Add Model Results to Google Sheet node at your copy.
- Edit the model IDs in the Define Models to Compare node (see the sketch after this list).
- Configure a system prompt and any tools on the AI Agent node to reflect your real use case.
- Send a test message from the chat interface and confirm that a row is appended to the sheet.
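As a reading aid, the sketch below restates in plain TypeScript how the Define Models to Compare, Split Models into Items, and Set model, sessionId, chatInput, sessionIdBase nodes fan one chat message out into one run per model. It is a simplified sketch of the workflow's expressions, not n8n code.

```typescript
// Model IDs from the "Define Models to Compare" node.
const models = ["openai/gpt-4.1", "mistralai/mistral-large"];

interface ModelRun {
  model: string;          // model used for this iteration
  sessionId: string;      // base session ID + model ID => isolated memory
  sessionIdBase: string;  // original session ID, used to group sheet rows
  chatInput: string;      // the user's message
}

// One item per model, mirroring Split Models into Items + the Set node.
function fanOut(sessionId: string, chatInput: string): ModelRun[] {
  return models.map((model) => ({
    model,
    sessionId: `${sessionId}${model}`,
    sessionIdBase: sessionId,
    chatInput,
  }));
}

// Each ModelRun is then handled by the AI Agent in its own loop iteration.
console.log(fanOut("abc123", "Summarize this ticket."));
```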
Key Parameters
| Parameter | Default | Description |
|---|---|---|
| models | ["openai/gpt-4.1", "mistralai/mistral-large"] | Array of model IDs compared on each run (Define Models to Compare node) |
| sessionId | chat session ID + model ID | Per-model memory key that keeps each model's context isolated |
| sessionIdBase | chat session ID | Original session ID used to group both models' answers in the sheet |
| documentId / sheetName | template spreadsheet / llms_eval | Target Google Sheet and tab for logged results |
Best Practices
Optimization Suggestions
- Configure the AI Agent with a realistic system prompt and tools so the comparison reflects your production use case.
- Keep the comparison to two models unless you also extend the loop logic and the sheet columns.
- Apply consistent evaluation criteria (the model_1_eval / model_2_eval dropdowns) across sessions so results stay comparable.
- If you automate evaluation with a stronger model (such as o3), account for the extra token usage and cost.
Security Notes
- Store the OpenRouter and Google credentials in n8n's credential manager and keep them out of node parameters.
- Restrict access to the workflow and to the evaluation spreadsheet.
- Review the logged conversations periodically, since they may contain user data.
- Protect the Google account that owns the spreadsheet, for example with two-factor authentication.
Performance Optimization
- Every chat message is processed by both models, so token usage roughly doubles per turn, as estimated in the sketch below.
- Keep system prompts and memory windows as short as your evaluation allows.
- Long answers also inflate the spreadsheet rows; adjust row height or trim outputs if the sheet becomes hard to read.
- Monitor provider usage and cost when running longer evaluation sessions.
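A rough, back-of-the-envelope way to reason about that cost, assuming only that every model in the comparison is billed for the full prompt, memory context, and its own answer (the function and numbers below are illustrative, not measured):

```typescript
// Rough per-turn token estimate for an N-model comparison:
// each model is billed for the prompt, the shared memory context,
// and its own answer.
function estimateTokensPerTurn(
  promptTokens: number,
  contextTokens: number,
  avgAnswerTokens: number,
  modelCount = 2,
): number {
  return modelCount * (promptTokens + contextTokens + avgAnswerTokens);
}

// Example: a 200-token prompt with 800 tokens of history and
// ~300-token answers costs about 2 * (200 + 800 + 300) = 2600 tokens.
console.log(estimateTokensPerTurn(200, 800, 300)); // 2600
```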
Troubleshooting
Common Issues
A model's answers are missing or the agent errors out
Check that the model IDs in the Define Models to Compare node follow your provider's naming convention (for OpenRouter, e.g. openai/gpt-4.1) and that the OpenRouter credentials are valid.
Rows are not appended to Google Sheets
Confirm that the Google service account credentials are valid, that the service account has edit access to your copy of the spreadsheet, and that the documentId points to your copy rather than the original template.
Debugging Tips
- Run the workflow manually and inspect each node's output in the n8n editor.
- Start with a short test message to verify that both models answer and a row is appended.
- Check network connectivity and the status of the OpenRouter and Google Sheets APIs.
- Execute the workflow step by step to locate the failing node.
Error Handling
The workflow ships with minimal error handling:
- The Add Model Results to Google Sheet node is set to continue with regular output on error, so a failed append does not interrupt the chat response.
- All other nodes use n8n's default behavior and stop the execution on failure.
- For production evaluations, consider adding retry settings on the LLM and Google Sheets nodes or a dedicated error workflow.