PHP XPath query: get specific characters from the href of tags (answers)

【Question title】: PHP XPath query: get specific characters from href on tags
【Posted】: 2017-07-18 00:02:39
【Question】:

Markup:

<a href="http://www.example.com/5809/book">Origin of Species</a>
<a href="http://www.example.com/author/id=124">Darwin</a>
<a href="http://www.example.com/196/genres">Science, Biology</a>
<span class="Xbkznofv">24/11/1859</span>

How can I use an XPath query to get the ID numbers from the href attributes of these tags?

For this example I want the following result:

5809, 124, 196, 24/11/1859

PHP code:

$url = 'http://www.example.com/Books/Default.aspx';
libxml_use_internal_errors(true); 
$doc = new DOMDocument();
$doc->loadHTMLFile($url);
$xpath = new DOMXpath($doc);

$elements1 = $xpath->query('//a[contains(@href, "www.example.com/Book/")]');  
$elements2 = $xpath->query('//a[contains(@href,  "www.example.com/author/id=")]');  
$elements3 = $xpath->query('//a[contains(@href, "www.example.com/genres/")]');  
$elements4 = $xpath->query('//span[contains(@class, "")]');

if (!is_null($elements)) {
  foreach ($elements as $element) {
    echo "<br/>". "";

    $nodes = $element->childNodes;
    foreach ($nodes as $node) {
      echo $node->nodeValue. "\n";
    }
  }
}

【Comments on the question】:

Tags: php xpath

【Solution 1】:

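XPath 1.0 offers a limited set of string functions, but past a certain point it is much easier to read the attribute and extract the value with a regular expression.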
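As a rough illustration of the regular-expression route, the sketch below selects the href attributes with XPath and then extracts the ID in PHP with preg_match(). The pattern, and the assumption that the ID is the first run of digits in the URL path, are illustrative choices and not part of the original answer.

```php
<?php
$html = <<<'HTML'
<a href="http://www.example.com/5809/book">Origin of Species</a>
<a href="http://www.example.com/author/id=124">Darwin</a>
<a href="http://www.example.com/196/genres">Science, Biology</a>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

$ids = [];
// Select the href attribute nodes, then pull the ID out in PHP.
foreach ($xpath->query('//a/@href') as $href) {
    // Assumption: the ID is the first run of digits in the URL path.
    $path = (string) parse_url($href->value, PHP_URL_PATH);
    if (preg_match('/\d+/', $path, $match)) {
        $ids[] = $match[0];
    }
}

echo implode(', ', $ids), "\n"; // 5809, 124, 196
```

Restricting the match to the path (via parse_url()) keeps digits in the host name or query string from being picked up by accident.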

That said, here is an example that uses only XPath:

$html = <<<'HTML'
<a href="http://www.example.com/5809/book">Origin of Species</a>  
<a href="http://www.example.com/author/id=124">Darwin</a>  
<a href="http://www.example.com/196/genres">Science, Biology</a>  
<span class="Xbkznofv">24/11/1859</span>
HTML;

$document = new DOMDocument();
$document->loadHtml($html);
$xpath = new DOMXpath($document);

$data = [
  'book_title' => $xpath->evaluate(
    'string(//a[contains(@href,  "www.example.com") and contains(@href, "/book")])'
  ),
  'book_id' => $xpath->evaluate(
    'substring-before(
      substring-after(
        //a[contains(@href,  "www.example.com") and contains(@href, "/book")]/@href,
        "www.example.com/"
      ),
      "/"
    )'
  ),
  'author_id' => $xpath->evaluate(
    'substring-after(
      //a[contains(@href,  "www.example.com/author/id=")]/@href,
      "/id="
    )'
  )
];

var_dump($data);

Output:

array(3) {
  ["book_title"]=>
  string(17) "Origin of Species"
  ["book_id"]=>
  string(4) "5809"
  ["author_id"]=>
  string(3) "124"
}
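The question also asks for the genre ID and the date. A possible extension using the same string functions is sketched below; the key names `genre_id` and `published` are made up for illustration, and the span selector assumes the class value shown in the question is stable.

```php
<?php
$html = <<<'HTML'
<a href="http://www.example.com/5809/book">Origin of Species</a>
<a href="http://www.example.com/author/id=124">Darwin</a>
<a href="http://www.example.com/196/genres">Science, Biology</a>
<span class="Xbkznofv">24/11/1859</span>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

$more = [
  // Same pattern as book_id: take the path segment right after the host.
  'genre_id' => $xpath->evaluate(
    'substring-before(
      substring-after(
        //a[contains(@href, "www.example.com") and contains(@href, "/genres")]/@href,
        "www.example.com/"
      ),
      "/"
    )'
  ),
  // Assumption: the date lives in the span with exactly this class value.
  'published' => $xpath->evaluate('normalize-space(//span[@class="Xbkznofv"])'),
];

var_dump($more);
```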

These expressions only work with DOMXpath::evaluate(), because they return scalar values (strings); DOMXpath::query() can only return node lists.
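The difference can be seen in a small sketch. The variable names are arbitrary; the point is only that evaluate() hands back PHP scalars for string() and count(), while a plain location path gives a DOMNodeList whose values you then read in PHP.

```php
<?php
$document = new DOMDocument();
$document->loadHTML('<a href="http://www.example.com/author/id=124">Darwin</a>');
$xpath = new DOMXPath($document);

// evaluate() can return a scalar: string() yields a PHP string,
// count() yields a PHP float.
$asString  = $xpath->evaluate('string(//a/@href)');
$linkCount = $xpath->evaluate('count(//a)');

// A location path yields a DOMNodeList; read the attribute value in PHP.
$nodes    = $xpath->query('//a/@href');
$fromNode = $nodes->item(0)->value;

var_dump(is_string($asString), is_float($linkCount), $nodes instanceof DOMNodeList);
var_dump($asString === $fromNode);
```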

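Most of the time you will use one expression to fetch a list of nodes, iterate over them, and use further expressions to fetch the values for each node. Here is a simplified example: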

$html = <<<'HTML'
<div class="book">
  <a href="#1">Origin of Species</a>
</div>
<div class="book">
  <a href="#2">On the Shoulders of Giants</a>
</div>
HTML;

$document = new DOMDocument();
$document->loadHtml($html);
$xpath = new DOMXpath($document);

foreach ($xpath->evaluate('//div[@class="book"]') as $book) {
  var_dump(
    $xpath->evaluate('string(.//a)', $book),
    $xpath->evaluate('string(.//a/@href)', $book)
  );
}

Output:

string(17) "Origin of Species"
string(2) "#1"
string(26) "On the Shoulders of Giants"
string(2) "#2"

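【Comments】:

Thank you very much. One more thing... how can I do this inside a foreach loop for more than one book and author, with the result in comma-separated format?
The second argument of DOMXpath::evaluate() is the context node. You need to do something like $xpath->evaluate('string(.//a)', $outerNode). .// matches any descendant of the current context node. For writing CSV, look up fputcsv().
I have only just started learning. Could you write a simple example, if you are not busy? This is important to me.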
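A minimal sketch of what the comment above describes, not the answerer's own code. It assumes each book is wrapped in a hypothetical <div class="book"> container; that wrapper and the second book's ID (6001) are invented for illustration.

```php
<?php
$html = <<<'HTML'
<div class="book">
  <a href="http://www.example.com/5809/book">Origin of Species</a>
  <a href="http://www.example.com/author/id=124">Darwin</a>
</div>
<div class="book">
  <a href="http://www.example.com/6001/book">The Descent of Man</a>
  <a href="http://www.example.com/author/id=124">Darwin</a>
</div>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

$rows = [];
foreach ($xpath->evaluate('//div[@class="book"]') as $book) {
    // The second argument makes $book the context node, so ".//" only
    // searches inside the current <div class="book">.
    $rows[] = [
        $xpath->evaluate(
            'substring-before(substring-after(.//a[contains(@href, "/book")]/@href, "www.example.com/"), "/")',
            $book
        ),
        $xpath->evaluate('string(.//a[contains(@href, "/book")])', $book),
        $xpath->evaluate('substring-after(.//a[contains(@href, "/author/id=")]/@href, "/id=")', $book),
        $xpath->evaluate('string(.//a[contains(@href, "/author/id=")])', $book),
    ];
}

// Write one CSV line per book: book ID, title, author ID, author name.
$out = fopen('php://output', 'w');
foreach ($rows as $row) {
    fputcsv($out, $row, ',', '"', '\\');
}
fclose($out);
```

Swapping 'php://output' for a file path writes the same rows to a file instead of the response.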
Hi @ThW, do you have any suggestions for this question: stackoverflow.com/questions/45712585/…
Hi @ThW, do you have any suggestions for this question: stackoverflow.com/questions/50174346/…