Read sitemap and filter URLs
工作流概述
这是一个包含10个节点的复杂工作流,主要用于自动化处理各种任务。
工作流源代码
{
"id": "7fdJOvYNILCr24fH",
"meta": {
"instanceId": "568298fde06d3db80a2eea77fe5bf45f0c7bb898dea20b769944e9ac7c6c5a80"
},
"name": "Read sitemap and filter URLs",
"tags": [],
"nodes": [
{
"id": "38910330-5286-4f3f-b62e-9216acccd503",
"name": "‘Test workflow’ trigger",
"type": "n8n-nodes-base.manualTrigger",
"position": [
-460,
-60
],
"parameters": {},
"typeVersion": 1
},
{
"id": "d4e5991b-62d9-45ca-962f-c1077f3bce19",
"name": "Set sitemap URL",
"type": "n8n-nodes-base.set",
"position": [
-280,
-60
],
"parameters": {
"options": {},
"assignments": {
"assignments": [
{
"id": "d6c5ac86-6d67-42fb-96ec-9826caf452e2",
"name": "sitemapUrl",
"type": "string",
"value": "https://duckduckgo.com/sitemap.xml"
}
]
}
},
"typeVersion": 3.4
},
{
"id": "0d957deb-5830-4077-97e4-437dc7c0e527",
"name": "Split Out",
"type": "n8n-nodes-base.splitOut",
"position": [
260,
-60
],
"parameters": {
"options": {},
"fieldToSplitOut": "urlset.url"
},
"typeVersion": 1
},
{
"id": "7021088c-dfa1-4aae-b2e7-15b0ca10a750",
"name": "Get Sitemap",
"type": "n8n-nodes-base.httpRequest",
"position": [
-100,
-60
],
"parameters": {
"url": "={{ $json.sitemapUrl }}",
"options": {}
},
"typeVersion": 4.2
},
{
"id": "d3b86577-01fc-40f8-ab65-93ba420187b8",
"name": "Convert Sitemap to JSON",
"type": "n8n-nodes-base.xml",
"position": [
80,
-60
],
"parameters": {
"options": {
"trim": true,
"normalize": true,
"mergeAttrs": true,
"ignoreAttrs": true,
"normalizeTags": true
}
},
"typeVersion": 1
},
{
"id": "bc0758ae-06eb-4a29-a91e-414407ec8ade",
"name": "Filter URLs",
"type": "n8n-nodes-base.filter",
"position": [
440,
-60
],
"parameters": {
"options": {},
"conditions": {
"options": {
"version": 2,
"leftValue": "",
"caseSensitive": true,
"typeValidation": "strict"
},
"combinator": "and",
"conditions": [
{
"id": "0bf8e98c-b6c5-4129-852c-0d3e63f32f9f",
"operator": {
"type": "string",
"operation": "endsWith"
},
"leftValue": "={{ $json.loc }}",
"rightValue": ".pdf"
}
]
}
},
"typeVersion": 2.2
},
{
"id": "1d3fed97-1e72-426c-a48d-1a9683f40c4c",
"name": "Sticky Note1",
"type": "n8n-nodes-base.stickyNote",
"position": [
-300,
-140
],
"parameters": {
"color": 6,
"width": 150,
"height": 240,
"content": "**Set your sitemap.xml
url here.**"
},
"typeVersion": 1
},
{
"id": "521ec74d-6707-47fd-992d-eecebed415ab",
"name": "Sticky Note2",
"type": "n8n-nodes-base.stickyNote",
"position": [
420,
-140
],
"parameters": {
"color": 6,
"width": 150,
"height": 240,
"content": "**Create your filter here.**"
},
"typeVersion": 1
},
{
"id": "07e6c3de-cc72-490d-b614-67034ce04bfb",
"name": "Sticky Note3",
"type": "n8n-nodes-base.stickyNote",
"position": [
-140,
-180
],
"parameters": {
"color": 7,
"width": 540,
"height": 300,
"content": "## Fetch and process the sitemap.xml file
This part fetches and process the sitemap.xml file from XML data to JSON that we can work with."
},
"typeVersion": 1
},
{
"id": "abf5f02d-d2a0-43f1-9a1f-386cc4f9861b",
"name": "Sticky Note",
"type": "n8n-nodes-base.stickyNote",
"position": [
-780,
-220
],
"parameters": {
"width": 280,
"height": 420,
"content": "## Sitemap.xml reader
This workflow reads an sitemap.xml and filters out the entries you want.
By default only PDF documents are returned at the end of the workflow.
**SETUP**
- Edit the **Set sitemap URL** block and add the url to the sitemap you want to read.
- Edit the **Filter URLs** to your needs."
},
"typeVersion": 1
}
],
"active": false,
"pinData": {},
"settings": {
"executionOrder": "v1"
},
"versionId": "74793599-4c7d-4532-bbd5-a2ce4761fbc8",
"connections": {
"Split Out": {
"main": [
[
{
"node": "Filter URLs",
"type": "main",
"index": 0
}
]
]
},
"Get Sitemap": {
"main": [
[
{
"node": "Convert Sitemap to JSON",
"type": "main",
"index": 0
}
]
]
},
"Set sitemap URL": {
"main": [
[
{
"node": "Get Sitemap",
"type": "main",
"index": 0
}
]
]
},
"Convert Sitemap to JSON": {
"main": [
[
{
"node": "Split Out",
"type": "main",
"index": 0
}
]
]
},
"‘Test workflow’ trigger": {
"main": [
[
{
"node": "Set sitemap URL",
"type": "main",
"index": 0
}
]
]
}
}
}
功能特点
- 自动检测新邮件
- AI智能内容分析
- 自定义分类规则
- 批量处理能力
- 详细的处理日志
技术分析
节点类型及作用
- Manualtrigger
- Set
- Splitout
- Httprequest
- Xml
复杂度评估
配置难度:
维护难度:
扩展性:
实施指南
前置条件
- 有效的Gmail账户
- n8n平台访问权限
- Google API凭证
- AI分类服务订阅
配置步骤
- 在n8n中导入工作流JSON文件
- 配置Gmail节点的认证信息
- 设置AI分类器的API密钥
- 自定义分类规则和标签映射
- 测试工作流执行
- 配置定时触发器(可选)
关键参数
| 参数名称 | 默认值 | 说明 |
|---|---|---|
| maxEmails | 50 | 单次处理的最大邮件数量 |
| confidenceThreshold | 0.8 | 分类置信度阈值 |
| autoLabel | true | 是否自动添加标签 |
最佳实践
优化建议
- 定期更新AI分类模型以提高准确性
- 根据邮件量调整处理批次大小
- 设置合理的分类置信度阈值
- 定期清理过期的分类规则
安全注意事项
- 妥善保管API密钥和认证信息
- 限制工作流的访问权限
- 定期审查处理日志
- 启用双因素认证保护Gmail账户
性能优化
- 使用增量处理减少重复工作
- 缓存频繁访问的数据
- 并行处理多个邮件分类任务
- 监控系统资源使用情况
故障排除
常见问题
邮件未被正确分类
检查AI分类器的置信度阈值设置,适当降低阈值或更新训练数据。
Gmail认证失败
确认Google API凭证有效且具有正确的权限范围,重新进行OAuth授权。
调试技巧
- 启用详细日志记录查看每个步骤的执行情况
- 使用测试邮件验证分类逻辑
- 检查网络连接和API服务状态
- 逐步执行工作流定位问题节点
错误处理
工作流包含以下错误处理机制:
- 网络超时自动重试(最多3次)
- API错误记录和告警
- 处理失败邮件的隔离机制
- 异常情况下的回滚操作