【Question title】: Filtering arrays in NodeJS without knowing where the value is located
【Posted】: 2021-10-14 12:12:46
【Question description】:

I'm working on a project that scrapes websites and outputs the HTML as JSON. The only parts of that JSON that are useful to us are the forms.

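I want to filter the JSON, but the native array filter only works when I know where the property sits relative to the whole page (the DOM?), and that is not always the case. I'm afraid that checking the values of every object until I reach the one I want is not feasible, because:

  1. some pages are very large, and
  2. "form" also appears as a plain string in other places that we don't want.

This is all in NodeJS.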

Input snippet:

[
  {
    "type": "element",
    "tagName": "p",
    "attributes": [],
    "children": [
      {
        "type": "text",
        "content": "This is how the HTML code above will be displayed in a browser:"
      }
    ]
  },
  {
    "type": "text",
    "content": "\n"
  },
  {
    "type": "element",
    "tagName": "form",
    "attributes": [
      {
        "key": "action",
        "value": "/action_page.php"
      },
      {
        "key": "target",
        "value": "_blank"
      }
    ],
    "children": [
      {
        "type": "text",
        "content": "\nFirst name:"
      },
      {
        "type": "element",
        "tagName": "br",
        "attributes": [],
        "children": []
      },
      {
        "type": "text",
        "content": "\n"
      },
      {
        "type": "element",
        "tagName": "input",
        "attributes": [
          {
            "key": "type",
            "value": "text"
          },
          {
            "key": "name",
            "value": "firstname0"
          },
          {
            "key": "value",
            "value": "John"
          }
        ],
        "children": []
      },
      {
        "type": "element",
        "tagName": "br",
        "attributes": [],
        "children": []
      },
      {
        "type": "text",
        "content": "\nLast name:"
      },
      {
        "type": "element",
        "tagName": "br",
        "attributes": [],
        "children": []
      },
      {
        "type": "text",
        "content": "\n"
      },
      {
        "type": "element",
        "tagName": "input",
        "attributes": [
          {
            "key": "type",
            "value": "text"
          },
          {
            "key": "name",
            "value": "lastname0"
          },
          {
            "key": "value",
            "value": "Doe"
          }
        ],
        "children": []
      },
      {
        "type": "text",
        "content": "\n"
      },
      {
        "type": "element",
        "tagName": "br",
        "attributes": [],
        "children": []
      },
      {
        "type": "element",
        "tagName": "br",
        "attributes": [],
        "children": []
      },
      {
        "type": "text",
        "content": "\n"
      },
      {
        "type": "element",
        "tagName": "input",
        "attributes": [
          {
            "key": "type",
            "value": "submit"
          },
          {
            "key": "value",
            "value": "Submit"
          }
        ],
        "children": []
      },
      {
        "type": "text",
        "content": "\n"
      },
      {
        "type": "element",
        "tagName": "input",
        "attributes": [
          {
            "key": "type",
            "value": "reset"
          }
        ],
        "children": []
      },
      {
        "type": "text",
        "content": "\n"
      }
    ]
  },
  {
    "type": "text",
    "content": "\n"
  }
]

Output snippet:

[
  {
    "type": "element",
    "tagName": "form",
    "attributes": [
      {
        "key": "action",
        "value": "/action_page.php"
      },
      {
        "key": "target",
        "value": "_blank"
      }
    ],
    "children": [
      {
        "type": "text",
        "content": "\nFirst name:"
      },
      {
        "type": "element",
        "tagName": "br",
        "attributes": [],
        "children": []
      },
      {
        "type": "text",
        "content": "\n"
      },
      {
        "type": "element",
        "tagName": "input",
        "attributes": [
          {
            "key": "type",
            "value": "text"
          },
          {
            "key": "name",
            "value": "firstname0"
          },
          {
            "key": "value",
            "value": "John"
          }
        ],
        "children": []
      },
      {
        "type": "element",
        "tagName": "br",
        "attributes": [],
        "children": []
      },
      {
        "type": "text",
        "content": "\nLast name:"
      },
      {
        "type": "element",
        "tagName": "br",
        "attributes": [],
        "children": []
      },
      {
        "type": "text",
        "content": "\n"
      },
      {
        "type": "element",
        "tagName": "input",
        "attributes": [
          {
            "key": "type",
            "value": "text"
          },
          {
            "key": "name",
            "value": "lastname0"
          },
          {
            "key": "value",
            "value": "Doe"
          }
        ],
        "children": []
      },
      {
        "type": "text",
        "content": "\n"
      },
      {
        "type": "element",
        "tagName": "br",
        "attributes": [],
        "children": []
      },
      {
        "type": "element",
        "tagName": "br",
        "attributes": [],
        "children": []
      },
      {
        "type": "text",
        "content": "\n"
      },
      {
        "type": "element",
        "tagName": "input",
        "attributes": [
          {
            "key": "type",
            "value": "submit"
          },
          {
            "key": "value",
            "value": "Submit"
          }
        ],
        "children": []
      },
      {
        "type": "text",
        "content": "\n"
      },
      {
        "type": "element",
        "tagName": "input",
        "attributes": [
          {
            "key": "type",
            "value": "reset"
          }
        ],
        "children": []
      },
      {
        "type": "text",
        "content": "\n"
      }
    ]
  }
]

TL;DR: keep only the forms and all of their children.

【Comments】:

  • This question really needs some good formatting :D

Tags: node.js arrays json filter puppeteer


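【Solution 1】:

First, this input looks incomplete; it could be an array or an object. Assuming it is an array of objects, I can use jsonpath to reach any value, wherever it sits in the tree.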

const jp = require('jsonpath');
// `nodes` is the parsed array of node objects from the question
const formNodes = jp.query(nodes, '$..[?(@.tagName=="form")]');

You can achieve the same thing with vanilla JavaScript; there are several Stack Overflow questions that cover it. However, I find jsonpath and xpath easier to implement than those approaches.
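
For comparison, here is a minimal sketch of the vanilla JavaScript route, assuming every node has the shape shown in the question (`type`, `tagName`, `children`). The helper name `findByTagName` and the `sample` data are made up for illustration and are not part of the original answer. It only compares the `tagName` of element nodes, so text that merely contains the word "form" is ignored, and it uses an explicit stack instead of recursion so that very large pages do not risk a stack overflow.

```javascript
// Hypothetical helper: collect every element node with the given tagName,
// at any depth, in document order.
function findByTagName(nodes, tagName) {
  const matches = [];
  // Reverse so that pop() yields nodes in their original order.
  const stack = [...nodes].reverse();
  while (stack.length > 0) {
    const node = stack.pop();
    if (node.type === 'element' && node.tagName === tagName) {
      // Keep the match together with its whole subtree; do not search inside it.
      matches.push(node);
      continue;
    }
    // Text nodes have no children, so fall back to an empty list.
    const children = Array.isArray(node.children) ? node.children : [];
    for (let i = children.length - 1; i >= 0; i--) {
      stack.push(children[i]);
    }
  }
  return matches;
}

// Illustrative input: a text node containing "form" and a form nested in a div.
const sample = [
  { type: 'element', tagName: 'p', attributes: [], children: [
    { type: 'text', content: 'form' }
  ] },
  { type: 'text', content: '\n' },
  { type: 'element', tagName: 'div', attributes: [], children: [
    { type: 'element', tagName: 'form',
      attributes: [{ key: 'action', value: '/action_page.php' }],
      children: [
        { type: 'element', tagName: 'input',
          attributes: [{ key: 'type', value: 'text' }], children: [] }
      ] }
  ] }
];

const forms = findByTagName(sample, 'form');
console.log(JSON.stringify(forms, null, 2));
```

Unlike the jsonpath query, this sketch stops descending once it finds a match, so a form nested inside another form would not be reported separately. Drop the `continue` if that matters for your pages.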

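【Discussion】:

  • Thank you very much, it worked. I really appreciate the help.
  • Now that I have filtered it, I am attaching coordinates to each element. I have finished the coordinates part (the element handle's bounding box) and only need to return the value and type. Any ideas on how to parse that data out of the JSON and then attach it back to the respective objects? Every solution I have found for this is too hard-coded or too specific.
  • Please create a new question with the details.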