Use XMLReader to find a node and retrieve XML from the current node and its children

【Question title】: Use XMLReader to find node and retrieve XML from current node and following children
【Posted】: 2021-11-01 05:32:14
【Question】:

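I'm trying to retrieve a specific node from a huge XML file based on its <id> element. I have used DOMDocument, but it isn't ideal because it loads the whole document first. The document contains about 1400 <item> nodes. Here is a simplified version of the document: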

<main>
  <body>
    ...
    <sub>
      ...
      <items>
        ...
        <item>
          <name>Abc</name>
          ...
          <id>123</id>
            <calls>
              <call>
                <name>Monkey</name>
                <text>Monkeys r cool</text>
                ...
              </call>
              <call>
                <name>Pig</name>
                <text>Pigs too!</text>
                ...
              </call>
            </calls>
            <cones>
              <cone>
                <name>Lorem</name>
                <text>Lorem ipsum</text>
                ...
              </cone>
              <cone>
                <name>More</name>
                <text>Placeholder</text>
                ...
              </cone>
            </cones>
          <a>true</a>
        </item>
        <item>
          <name>Def</name>
          ...
          <id>456</id>
            <calls>
              <call>
                <name>aa</name>
                <text>aa</text>
                ...
              </call>
              <call>
                <name>bb</name>
                <text>bb</text>
                ...
              </call>
            </calls>
            <cones>
              <cone>
                <name>cc</name>
                <text>cc</text>
                ...
              </cone>
              <cone>
                <name>dd</name>
                <text>dd</text>
                ...
              </cone>
            </cones>
          <a>true</a>
        </item>
      </items>
    </sub>
  </body>
</main>

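So basically I'm trying to retrieve the data of the current node and its children by matching the <id> element. I have looked for tutorials on XMLReader but can't seem to find many. This is what I've tried so far: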

$xml = new XMLReader();
$xml->open('doc.xml');

while ($xml->read()) {
    if ($xml->nodeType === XMLReader::ELEMENT && $xml->localName === 'id') {
        // move from the <id> start tag to its text node
        $xml->read();
        echo $xml->value;
    }
}

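This finds every <id> element, but I want to find one specific element and read the data from the current node and its children. Maybe use something like the example above to find the node and readInnerXml() to get the data.

I'm no expert, so any help or a push in the right direction is much appreciated :D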

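【Comments】:

You could use a hybrid approach with XMLReader and DOMDocument. For example, when you reach the first item, create a DOMNode from it (using XMLReader::expand), append it to a DOMDocument instance and check whether the id is the right one. If not, use XMLReader::next to jump to the next item.
@CasimiretHippolyte Actually you don't need to append (and remove) the node. Just expand it into a prepared document and make the XPath calls with the context argument.
@ThW: I didn't know XPath could be used "outside" the DOM tree (even with a context argument). It is indeed lighter if you don't have to append or replace a node for every item. Nice work.

Tags: php xml xmlreader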

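【Solution 1】:

If all item elements are siblings, you can use XMLReader::read() to find the first one and XMLReader::next() to iterate over them.

Then use XMLReader::expand() to load the item and its descendants into DOM, and use XPath to read data from it.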

$searchForID = '123';

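$reader = new XMLReader();
// getXMLString() is a helper (not shown here) that returns the XML source as a string;
// loading it through a data: URI keeps the example self-contained
$reader->open('data:text/xml;base64,'.base64_encode(getXMLString()));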

$document = new DOMDocument();
$xpath = new DOMXpath($document);

// look for the first "item" element node
while (
  $reader->read() && $reader->localName !== 'item'
) {
  continue;
}

// iterate "item" sibling elements
while ($reader->localName === 'item') {
  // expand into DOM
  $item = $reader->expand($document);
  // if the node has a child "id" with the searched contents
  if ($xpath->evaluate("count(self::*[id = '$searchForID']) > 0", $item)) {
    var_dump(
      [
        // fetch node text content as string  
        'name' => $xpath->evaluate('string(name)', $item),
        // fetch list of "call" elements and map them
        'calls' => array_map(
          function(DOMElement $call) use ($xpath) {
            return [
              'name' => $xpath->evaluate('string(name)', $call),
              'text' => $xpath->evaluate('string(text)', $call)
            ];
          },
          iterator_to_array(
            $xpath->evaluate('calls/call', $item)
          )
        )
      ] 
    );
  }
  $reader->next('item');
}
$reader->close();
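
The question also asks for the XML of the matching node itself. Here is a minimal sketch of one way to get it with the same read()/next()/expand() pattern: serialize the expanded node with DOMDocument::saveXML(). The helper name findItemXml() and the reduced sample data are assumptions made for illustration, not part of the original answer.

```php
<?php
// Hypothetical, reduced sample data standing in for the real document.
$xml = <<<'XML'
<main>
  <items>
    <item><name>Abc</name><id>123</id><a>true</a></item>
    <item><name>Def</name><id>456</id><a>true</a></item>
  </items>
</main>
XML;

/**
 * Returns the serialized XML of the first "item" whose "id" child equals $id,
 * or null if no such item exists. (Illustrative helper, name is an assumption.)
 */
function findItemXml(string $xml, string $id): ?string
{
    $reader = new XMLReader();
    // same data: URI approach as in the answer, so no file is needed
    $reader->open('data:text/xml;base64,' . base64_encode($xml));

    $document = new DOMDocument();
    $xpath = new DOMXPath($document);

    // advance to the first "item" element
    while ($reader->read() && $reader->localName !== 'item') {
        continue;
    }

    $found = null;
    // iterate the "item" siblings
    while ($reader->localName === 'item') {
        // expand only the current item (and its descendants) into DOM
        $item = $reader->expand($document);
        // compare the text content of the "id" child with the searched value
        if ($xpath->evaluate('string(id)', $item) === $id) {
            // serialize just this node, including its children
            $found = $document->saveXML($item);
            break;
        }
        $reader->next('item');
    }
    $reader->close();

    return $found;
}

echo findItemXml($xml, '456'), "\n";
```

Only one item at a time is held as a DOM node, so memory use should stay small even for a large file. XMLReader::readOuterXml() might be an alternative if no XPath checks are needed on the node.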

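XML with namespaces

If the XML uses a namespace (like the one you linked in the comments), you have to take it into account.

For XMLReader this means validating not only localName (the node name without any namespace prefix/alias) but also namespaceURI.

For DOM this means using the namespace-aware methods (those with the NS suffix) and registering your own alias/prefix for XPath expressions.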

$searchForID = '2755';

$reader = new XMLReader();
$reader->open('data:text/xml;base64,'.base64_encode(getXMLString()));

// the namespace uri
$xmlns_siri = 'http://www.siri.org.uk/siri';

$document = new DOMDocument();
$xpath = new DOMXpath($document);
// register an alias for the siri namespace 
$xpath->registerNamespace('siri', $xmlns_siri);

// look for the first "item" element node
while (
  $reader->read() && 
  (
    $reader->localName !== 'EstimatedVehicleJourney' ||
    $reader->namespaceURI !== $xmlns_siri
  )
) {
  continue;
}

// iterate "item" sibling elements
while ($reader->localName === 'EstimatedVehicleJourney') {
  // validate the namespace of the node
  if ($reader->namespaceURI === $xmlns_siri) {
    // expand into DOM
    $item = $reader->expand($document);
    // if the node has a child "VehicleRef" with the searched contents
    // note the use of the registered namespace alias
    if ($xpath->evaluate("count(self::*[siri:VehicleRef = '$searchForID']) > 0", $item)) {
      var_dump(
        [
          // fetch node text content as string  
          'name' => $xpath->evaluate('string(siri:OriginName)', $item),
          // fetch list of "call" elements and map them
          'calls' => array_map(
            function(DOMElement $call) use ($xpath) {
              return [
                'name' => $xpath->evaluate('string(siri:StopPointName)', $call),
                'reference' => $xpath->evaluate('string(siri:StopPointRef)', $call)
              ];
            },
            iterator_to_array(
              $xpath->evaluate('siri:RecordedCalls/siri:RecordedCall', $item)
            )
          )
        ] 
      );
    }
  }
  $reader->next('EstimatedVehicleJourney');
}
$reader->close();

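【Discussion】:

It works with the simplified XML document, but when I try to adapt it to a more advanced XML document I can't seem to get it to work. I wonder if you would take a look :D, here is the data: t.srd.tf/data.xml. item = EstimatedVehicleJourney and id = VehicleRef
I also saw that you encoded it, but I couldn't get that to work, so I just used: $reader->open('https://t.srd.tf/data.xml');. Maybe that's the problem
I use 3v4l.org/pZA7L#v8.0.10 to make sure my example source works. Using a string source allows for a fully self-contained test, which is why a data URI is created from the string to load the source. You don't get a result because the XML uses a (default) namespace - I added an example for that.
Hi ThW, how can I find VehicleRef if it sits under a new child element? It is now inside NEW_ELEMENT. Could you take a look? t.srd.tf/data.xml . I still want to retrieve the same information
You mean inside count(self::*[siri:VehicleRef = '$searchForID']) > 0? You can add it to the location path inside the condition - don't forget the namespace alias: count(self::*[siri:NEW_ELEMENT/siri:VehicleRef = '$searchForID']) > 0. I suggest reading up on XPath 1.0 expressions.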
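
To illustrate the last comment, here is a small sketch of the extended condition. It assumes a heavily reduced, made-up sample in which VehicleRef sits inside the NEW_ELEMENT placeholder. The real feed's structure is not shown in this thread, so the element layout and the values are hypothetical. For brevity it loads the sample with plain DOMDocument rather than XMLReader, and the XPath condition is the part that carries over.

```php
<?php
// Hypothetical sample; NEW_ELEMENT is the placeholder name used in the comment above.
$xml = <<<'XML'
<Siri xmlns="http://www.siri.org.uk/siri">
  <EstimatedVehicleJourney>
    <OriginName>Alpha</OriginName>
    <NEW_ELEMENT><VehicleRef>2755</VehicleRef></NEW_ELEMENT>
  </EstimatedVehicleJourney>
  <EstimatedVehicleJourney>
    <OriginName>Beta</OriginName>
    <NEW_ELEMENT><VehicleRef>1000</VehicleRef></NEW_ELEMENT>
  </EstimatedVehicleJourney>
</Siri>
XML;

$searchForID = '2755';

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);
// register an alias for the (default) namespace of the document
$xpath->registerNamespace('siri', 'http://www.siri.org.uk/siri');

$names = [];
foreach ($xpath->evaluate('//siri:EstimatedVehicleJourney') as $journey) {
    // the extra location step siri:NEW_ELEMENT/ reaches the nested VehicleRef
    $matches = $xpath->evaluate(
        "count(self::*[siri:NEW_ELEMENT/siri:VehicleRef = '$searchForID']) > 0",
        $journey
    );
    if ($matches) {
        $names[] = $xpath->evaluate('string(siri:OriginName)', $journey);
    }
}

var_dump($names);
```

Every step of the path needs the siri: alias, because unprefixed names in XPath 1.0 only match elements in no namespace.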