Read sitemap and filter URLs

Workflow overview

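This workflow contains 10 nodes: six functional nodes and four sticky notes. It fetches a sitemap.xml file, converts it from XML to JSON, splits it into one item per URL entry, and filters those entries. By default only URLs ending in .pdf are returned.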

Workflow source code

{
  "id": "7fdJOvYNILCr24fH",
  "meta": {
    "instanceId": "568298fde06d3db80a2eea77fe5bf45f0c7bb898dea20b769944e9ac7c6c5a80"
  },
  "name": "Read sitemap and filter URLs",
  "tags": [],
  "nodes": [
    {
      "id": "38910330-5286-4f3f-b62e-9216acccd503",
      "name": "‘Test workflow’ trigger",
      "type": "n8n-nodes-base.manualTrigger",
      "position": [
        -460,
        -60
      ],
      "parameters": {},
      "typeVersion": 1
    },
    {
      "id": "d4e5991b-62d9-45ca-962f-c1077f3bce19",
      "name": "Set sitemap URL",
      "type": "n8n-nodes-base.set",
      "position": [
        -280,
        -60
      ],
      "parameters": {
        "options": {},
        "assignments": {
          "assignments": [
            {
              "id": "d6c5ac86-6d67-42fb-96ec-9826caf452e2",
              "name": "sitemapUrl",
              "type": "string",
              "value": "https://duckduckgo.com/sitemap.xml"
            }
          ]
        }
      },
      "typeVersion": 3.4
    },
    {
      "id": "0d957deb-5830-4077-97e4-437dc7c0e527",
      "name": "Split Out",
      "type": "n8n-nodes-base.splitOut",
      "position": [
        260,
        -60
      ],
      "parameters": {
        "options": {},
        "fieldToSplitOut": "urlset.url"
      },
      "typeVersion": 1
    },
    {
      "id": "7021088c-dfa1-4aae-b2e7-15b0ca10a750",
      "name": "Get Sitemap",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        -100,
        -60
      ],
      "parameters": {
        "url": "={{ $json.sitemapUrl }}",
        "options": {}
      },
      "typeVersion": 4.2
    },
    {
      "id": "d3b86577-01fc-40f8-ab65-93ba420187b8",
      "name": "Convert Sitemap to JSON",
      "type": "n8n-nodes-base.xml",
      "position": [
        80,
        -60
      ],
      "parameters": {
        "options": {
          "trim": true,
          "normalize": true,
          "mergeAttrs": true,
          "ignoreAttrs": true,
          "normalizeTags": true
        }
      },
      "typeVersion": 1
    },
    {
      "id": "bc0758ae-06eb-4a29-a91e-414407ec8ade",
      "name": "Filter URLs",
      "type": "n8n-nodes-base.filter",
      "position": [
        440,
        -60
      ],
      "parameters": {
        "options": {},
        "conditions": {
          "options": {
            "version": 2,
            "leftValue": "",
            "caseSensitive": true,
            "typeValidation": "strict"
          },
          "combinator": "and",
          "conditions": [
            {
              "id": "0bf8e98c-b6c5-4129-852c-0d3e63f32f9f",
              "operator": {
                "type": "string",
                "operation": "endsWith"
              },
              "leftValue": "={{ $json.loc }}",
              "rightValue": ".pdf"
            }
          ]
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "1d3fed97-1e72-426c-a48d-1a9683f40c4c",
      "name": "Sticky Note1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -300,
        -140
      ],
      "parameters": {
        "color": 6,
        "width": 150,
        "height": 240,
        "content": "**Set your sitemap.xml\nurl here.**"
      },
      "typeVersion": 1
    },
    {
      "id": "521ec74d-6707-47fd-992d-eecebed415ab",
      "name": "Sticky Note2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        420,
        -140
      ],
      "parameters": {
        "color": 6,
        "width": 150,
        "height": 240,
        "content": "**Create your filter here.**"
      },
      "typeVersion": 1
    },
    {
      "id": "07e6c3de-cc72-490d-b614-67034ce04bfb",
      "name": "Sticky Note3",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -140,
        -180
      ],
      "parameters": {
        "color": 7,
        "width": 540,
        "height": 300,
        "content": "## Fetch and process the sitemap.xml file\nThis part fetches and processes the sitemap.xml file from XML data to JSON that we can work with."
      },
      "typeVersion": 1
    },
    {
      "id": "abf5f02d-d2a0-43f1-9a1f-386cc4f9861b",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -780,
        -220
      ],
      "parameters": {
        "width": 280,
        "height": 420,
        "content": "## Sitemap.xml reader\nThis workflow reads a sitemap.xml and filters out the entries you want.\n\nBy default only PDF documents are returned at the end of the workflow.\n\n**SETUP**\n- Edit the **Set sitemap URL** block and add the url to the sitemap you want to read.\n\n- Edit the **Filter URLs** to your needs."
      },
      "typeVersion": 1
    }
  ],
  "active": false,
  "pinData": {},
  "settings": {
    "executionOrder": "v1"
  },
  "versionId": "74793599-4c7d-4532-bbd5-a2ce4761fbc8",
  "connections": {
    "Split Out": {
      "main": [
        [
          {
            "node": "Filter URLs",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Get Sitemap": {
      "main": [
        [
          {
            "node": "Convert Sitemap to JSON",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Set sitemap URL": {
      "main": [
        [
          {
            "node": "Get Sitemap",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Convert Sitemap to JSON": {
      "main": [
        [
          {
            "node": "Split Out",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "‘Test workflow’ trigger": {
      "main": [
        [
          {
            "node": "Set sitemap URL",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
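
For readers who want to check the logic outside n8n, the pipeline above can be approximated in Python. This is a minimal sketch under stated assumptions, not the n8n implementation: the function name `filter_sitemap_urls` and the sample XML are hypothetical, and the sketch parses an inline string instead of fetching `sitemapUrl` over HTTP.

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemaps.org-style sitemap files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def filter_sitemap_urls(xml_text, suffix=".pdf"):
    """Return every <loc> value in a <urlset> that ends with `suffix`.

    Approximates the default "Filter URLs" condition: a case-sensitive
    endsWith check on the `loc` field.
    """
    root = ET.fromstring(xml_text)
    locs = []
    for url in root.findall(f"{SITEMAP_NS}url"):
        loc = url.findtext(f"{SITEMAP_NS}loc")
        if loc is not None:
            # The XML node in the workflow sets "trim": true, so strip here too.
            locs.append(loc.strip())
    return [loc for loc in locs if loc.endswith(suffix)]


# Hypothetical sample data; example.com is a placeholder host.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/index.html</loc></url>
  <url><loc> https://example.com/docs/guide.pdf </loc></url>
  <url><loc>https://example.com/docs/REPORT.PDF</loc></url>
</urlset>"""

if __name__ == "__main__":
    print(filter_sitemap_urls(SAMPLE))
```

Because the comparison is case sensitive, the entry ending in `.PDF` is dropped, which matches the `caseSensitive: true` option in the Filter URLs node.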

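Features

  • Started manually through the ‘Test workflow’ trigger
  • Fetches the sitemap.xml file at the configured sitemapUrl over HTTP
  • Converts the sitemap XML to JSON
  • Splits urlset.url into one item per URL entry
  • Filters the entries by their loc value; by default only URLs ending in .pdf are returned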

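Technical analysis

Node types and roles

  • manualTrigger: starts the workflow when it is run manually
  • set: stores the sitemap address in the sitemapUrl field
  • httpRequest: downloads the file at sitemapUrl
  • xml: converts the sitemap XML to JSON
  • splitOut: splits urlset.url into one item per URL entry
  • filter: keeps the entries whose loc ends with .pdf
  • stickyNote: four canvas notes with setup hints; they do not process data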
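
The `nodes` array in the JSON is not listed in execution order (Split Out appears before Get Sitemap); the `connections` object defines the actual chain. The sketch below walks that object to recover the order. It assumes a purely linear workflow like this one, and the helper name `execution_order` is hypothetical.

```python
def execution_order(connections, start):
    """Follow the first "main" output of each node, starting at `start`.

    Only suitable for linear workflows: branches and merges are ignored.
    """
    order = [start]
    seen = {start}
    current = start
    while current in connections:
        targets = connections[current]["main"][0]
        if not targets:
            break
        next_node = targets[0]["node"]
        if next_node in seen:  # guard against loops
            break
        order.append(next_node)
        seen.add(next_node)
        current = next_node
    return order


# The "connections" object copied from the workflow JSON.
CONNECTIONS = {
    "Split Out": {"main": [[{"node": "Filter URLs", "type": "main", "index": 0}]]},
    "Get Sitemap": {"main": [[{"node": "Convert Sitemap to JSON", "type": "main", "index": 0}]]},
    "Set sitemap URL": {"main": [[{"node": "Get Sitemap", "type": "main", "index": 0}]]},
    "Convert Sitemap to JSON": {"main": [[{"node": "Split Out", "type": "main", "index": 0}]]},
    "‘Test workflow’ trigger": {"main": [[{"node": "Set sitemap URL", "type": "main", "index": 0}]]},
}

if __name__ == "__main__":
    for name in execution_order(CONNECTIONS, "‘Test workflow’ trigger"):
        print(name)
```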

Complexity assessment

Configuration difficulty:
★★★★☆
Maintenance difficulty:
★★☆☆☆
Extensibility:
★★★★☆

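Implementation guide

Prerequisites

  • Access to an n8n instance
  • The URL of a sitemap.xml file that the n8n instance can reach
  • No credentials: the exported workflow configures no authentication on any node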

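Configuration steps

  1. Import the workflow JSON file into n8n
  2. Edit the Set sitemap URL node and set sitemapUrl to the sitemap you want to read
  3. Edit the Filter URLs node to match the entries you want to keep
  4. Run the workflow manually to test it
  5. Replace the manual trigger with a schedule trigger (optional)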
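
If you prefer to change the sitemap address in the exported JSON before importing it, the edit can be scripted. The sketch below is one possible approach, assuming the JSON has been loaded into a Python dict; `set_sitemap_url` is a hypothetical helper, and `WORKFLOW` is a trimmed-down stand-in for the full export.

```python
import copy


def set_sitemap_url(workflow, new_url):
    """Return a copy of `workflow` with the `sitemapUrl` assignment replaced."""
    updated = copy.deepcopy(workflow)
    for node in updated["nodes"]:
        if node.get("name") != "Set sitemap URL":
            continue
        for assignment in node["parameters"]["assignments"]["assignments"]:
            if assignment["name"] == "sitemapUrl":
                assignment["value"] = new_url
    return updated


# Stand-in containing only the node this sketch touches.
WORKFLOW = {
    "nodes": [
        {
            "name": "Set sitemap URL",
            "type": "n8n-nodes-base.set",
            "parameters": {
                "options": {},
                "assignments": {
                    "assignments": [
                        {
                            "name": "sitemapUrl",
                            "type": "string",
                            "value": "https://duckduckgo.com/sitemap.xml",
                        }
                    ]
                },
            },
        }
    ]
}

if __name__ == "__main__":
    # example.com is a placeholder host.
    changed = set_sitemap_url(WORKFLOW, "https://example.com/sitemap.xml")
    print(changed["nodes"][0]["parameters"]["assignments"]["assignments"][0]["value"])
```

The helper copies the workflow first, so the original dict stays untouched and can be compared against the result.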

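Key parameters

Parameter         Default                              Description
sitemapUrl        https://duckduckgo.com/sitemap.xml   Sitemap to read (Set sitemap URL node)
fieldToSplitOut   urlset.url                           Field split into one item per URL entry (Split Out node)
leftValue         ={{ $json.loc }}                     Value tested by the filter condition (Filter URLs node)
operation         endsWith                             String comparison used by the filter condition
rightValue        .pdf                                 Suffix an entry must end with to be kept
caseSensitive     true                                 Whether the comparison distinguishes upper and lower case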

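Best practices

Optimization suggestions

  • Keep the filter condition as specific as possible so that only the URLs you need are returned
  • Remember that the default condition is case sensitive: a URL ending in .PDF does not match .pdf
  • Review the filter conditions when the structure of the site's URLs changes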

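Security considerations

  • The exported workflow stores no API keys or credentials; if you add authentication to the Get Sitemap node, keep the secrets in n8n credentials
  • Restrict access to the workflow
  • Review the execution logs regularly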

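Performance optimization

  • If you schedule the workflow, avoid fetching the sitemap more often than it changes
  • Monitor resource usage with very large sitemaps, because every URL entry becomes a separate item after Split Out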

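Troubleshooting

Common problems

No URLs are returned

Check the Filter URLs condition. The default condition is case sensitive and keeps only loc values ending in .pdf, so a sitemap without PDF links yields an empty result. Also check that the file is a URL set: Split Out reads urlset.url, which a sitemap index file does not contain.

The sitemap request fails

Confirm that sitemapUrl is correct and reachable from the n8n instance, and that the server returns XML.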
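
One reason this workflow can return nothing is that the configured URL points to a sitemap index rather than a URL set: Split Out reads `urlset.url`, which a `<sitemapindex>` document does not have. The sketch below shows one way to tell the two apart outside n8n. The function name `sitemap_kind` and both sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def sitemap_kind(xml_text):
    """Return "urlset", "sitemapindex", or "unknown" for the root element."""
    root = ET.fromstring(xml_text)
    # ElementTree prefixes tag names with "{namespace}"; keep the local part.
    local_name = root.tag.rsplit("}", 1)[-1].lower()
    if local_name in ("urlset", "sitemapindex"):
        return local_name
    return "unknown"


# Hypothetical sample documents; example.com is a placeholder host.
URLSET = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/a.pdf</loc></url>
</urlset>"""

INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-1.xml</loc></sitemap>
</sitemapindex>"""

if __name__ == "__main__":
    print(sitemap_kind(URLSET))
    print(sitemap_kind(INDEX))
```

For a sitemap index, point `sitemapUrl` at one of the child sitemaps listed in it instead.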

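Debugging tips

  • Inspect the input and output of each node to see what every step produced
  • Use a small, known sitemap to verify the filter logic
  • Check the network connection and the availability of the sitemap host
  • Execute the workflow step by step to locate the failing node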

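Error handling

The exported workflow defines no error handling of its own: no node sets a retry option, and the workflow settings contain only executionOrder. If you need error handling, consider the following:

  • Enable Retry On Fail on the Get Sitemap node so that failed requests are retried
  • Assign an error workflow in the workflow settings to record and report failures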
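
Bounded retry on a network timeout can be sketched as follows for a standalone script. This illustrates the general technique only; it is not part of the workflow JSON, and `fetch_with_retries` and its parameters are hypothetical names. Inside n8n, the node-level Retry On Fail setting serves the same purpose without custom code.

```python
import time


def fetch_with_retries(fetch, max_retries=3, delay_seconds=0.0):
    """Call `fetch()`, retrying on TimeoutError up to `max_retries` times.

    `fetch` is any zero-argument callable that returns the response body.
    The last TimeoutError is re-raised once the retries are used up.
    """
    retries_used = 0
    while True:
        try:
            return fetch()
        except TimeoutError:
            if retries_used >= max_retries:
                raise
            retries_used += 1
            time.sleep(delay_seconds)


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_fetch():
        # Simulated request: times out twice, then succeeds.
        state["calls"] += 1
        if state["calls"] < 3:
            raise TimeoutError("simulated timeout")
        return "<urlset/>"

    print(fetch_with_retries(flaky_fetch), "after", state["calls"], "calls")
```

Passing the fetch operation in as a callable keeps the retry logic independent of any particular HTTP library.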