The HTML for each tile on the right-hand side of the linked page looks like this*:
<div class="details">
<a href="/store/apps/details?id=com.imangi.templerun" class="card-click-target"></a>
<a title="Temple Run" href="/store/apps/details?id=com.imangi.templerun" class="title">Temple Run
<span class="paragraph-end"/>
</a>
<div>....</div>
<div>....</div>
</div>
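It turns out that the combination of an <a> element and class="title" uniquely identifies the target <a> elements on that page. So the XPath can be as simple as: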
//a[@class="title"]/@href
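In any case, the problem you noticed seems to be specific to Chrome's XPath evaluator**. Since you mentioned Python, a few lines of Python show that the XPath itself works as expected: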
>>> from urllib.request import urlopen  # Python 3 (on Python 2: from urllib2 import urlopen)
>>> from lxml import html
>>> req = urlopen('https://play.google.com/store/apps/details?id=com.mojang.minecraftpe')
>>> raw = req.read()
>>> root = html.fromstring(raw)
>>> [h for h in root.xpath("//a[@class='title']/@href")]
['/store/apps/details?id=com.imangi.templerun', '/store/apps/details?id=com.lego.superheroes.dccomicsteamup', '/store/apps/details?id=com.turner.freefurall', '/store/apps/details?id=com.mtvn.Nickelodeon.GameOn', '/store/apps/details?id=com.disney.disneycrossyroad_goo', '/store/apps/details?id=com.rovio.angrybirdsstarwars.ads.iap', '/store/apps/details?id=com.rovio.angrybirdstransformers', '/store/apps/details?id=com.disney.dinostampede_goo', '/store/apps/details?id=com.turner.atskisafari', '/store/apps/details?id=com.moose.shopville', '/store/apps/details?id=com.DisneyDigitalBooks.SevenDMineTrain', '/store/apps/details?id=com.turner.copatoon', '/store/apps/details?id=com.turner.wbb2016', '/store/apps/details?id=com.tov.google.ben10Xenodrome', '/store/apps/details?id=com.turner.ggl.gumballrainbowruckus', '/store/apps/details?id=com.lego.starwars.theyodachronicles', '/store/apps/details?id=com.mojang.scrolls']
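If you want to sanity-check the expression without hitting the network, here is a minimal sketch that runs against the trimmed tile markup from the top of this answer, using only the standard library. Two assumptions to be aware of: the snippet is parsed as XML (it happens to be well-formed), and since xml.etree.ElementTree supports only a subset of XPath and cannot select attributes, the trailing /@href step is replaced by a .get("href") call. The function name title_hrefs is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed tile markup from the top of this answer. It happens to be
# well-formed XML, which ElementTree requires; a real page would
# normally need an HTML parser such as lxml.html instead.
SNIPPET = """
<div class="details">
<a href="/store/apps/details?id=com.imangi.templerun" class="card-click-target"></a>
<a title="Temple Run" href="/store/apps/details?id=com.imangi.templerun" class="title">Temple Run
<span class="paragraph-end"/>
</a>
<div>....</div>
<div>....</div>
</div>
"""


def title_hrefs(markup):
    """Return the href of every <a class="title"> element in the markup."""
    root = ET.fromstring(markup)
    # ElementTree cannot evaluate the /@href step, so select the elements
    # and read the attribute in Python instead.
    return [a.get("href") for a in root.findall(".//a[@class='title']")]


hrefs = title_hrefs(SNIPPET)
print(hrefs)
```

Only the second <a> matches, because the first one has class="card-click-target". With lxml you can keep the full //a[@class="title"]/@href expression, as in the session above.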
*) Trimmed down. You can take this as an example of how to provide a minimal HTML sample.
**) I can reproduce the issue: the @href values are printed as empty strings in my Chrome console. Others have run into the same problem: Chrome element inspector Xpath with @href won't show link text