【Question title】: XPath for specific elements
【Posted】: 2019-04-04 17:39:34
【Question description】:

I am having trouble scraping the text of specific elements from this web page:

https://www.oddsportal.com/soccer/africa/africa-cup-of-nations/benin-togo-IsfnZDFd/

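This is the URL of one specific match from the archived results, and I need to get the odds from 4 bookmakers on this page. I have thousands of match URLs to scrape. The code looks like this:

This is my attempt to find one bookmaker's odds, but it does not work: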

library(RSelenium)

pjs <- wdman::phantomjs()

eCap <- list(
  # Keep the user agent on one line so the string contains no line break.
  phantomjs.page.settings.userAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/20120101 Firefox/29.0",
  phantomjs.page.settings.loadImages = FALSE,
  phantomjs.phantom.cookiesEnabled = TRUE,
  phantomjs.phantom.javascriptEnabled = TRUE
)


remDr <- remoteDriver(browserName = "phantomjs", port = 4567L, extraCapabilities = eCap)

remDr$open()



remDr$navigate("https://www.oddsportal.com/soccer/africa/africa-cup-of-nations/benin-togo-IsfnZDFd/")
match<-remDr$findElement('xpath','//*[@id="col-content"]/h1')
result<-remDr$findElement('xpath', '//*[@id="event-status"]/p/strong')
odds<-remDr$findElements('xpath', '//*[@class="name" and contains(text(), "18Bet")]')

odds1 <- data.frame(odds = unlist(sapply(odds, function(x){x$getElementText()})))

pjs$stop()

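What I want is the 3 odds in the last divs. The page lists many different bookmakers, and so far I can only select the odds of all bookmakers at once. My goal is to select the odds of one exact bookmaker, but I do not know how to do that, because the divs that hold the odds contain no information about the bookmaker.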

<tr class="lo odd">
    <td>
        <div class="l">
            <a class="name2" title="Go to 18bet website!" onclick="return !window.open(this.href)" href="/bookmaker/18bet/link/"><span class="blogos l416"></span></a>&nbsp;
            <a class="name" title="Go to 18bet website!" onclick="return !window.open(this.href)" href="/bookmaker/18bet/link/">18bet</a>&nbsp;&nbsp;
        </div>
        <span class="ico-bookmarker-info ico-bookmaker-detail">
          <a title="Show more details about 18bet" href="/bookmaker/18bet/"></a>
            </span>
        <span class="ico-bookmarker-info ico-bookmaker-bonus">
          <a onmouseout="globals.getBookmaker(416).cancelBonusOver();" xparam="<div class=&quot;bold&quot;>100% Bonus up to 100€!</div><div>100% first deposit bonus up to 100€! Promocode: WSB100</div>~3" onmouseover="globals.getBookmaker(416).trackBonusOver()" onclick="globals.getBookmaker(416).trackBonusClick();return !window.open(this.href);" href="/bookmaker/18bet/bonus/252"></a>
            </span>
    </td>
    <td class="right odds">
        <div onmouseout="delayHideTip()" onmouseover="page.hist(this,'P-0.00-0-0','2mlnbxv464x0x65lst',416,event,0,1)">2.05</div>
    </td>
    <td class="right odds up">
        <div onmouseout="delayHideTip()" onmouseover="page.hist(this,'P-0.00-0-0','2mlnbxv498x0x0',416,event,0,1)">3.20</div>
    </td>
    <td class="right odds">
        <div onmouseout="delayHideTip()" onmouseover="page.hist(this,'P-0.00-0-0','2mlnbxv464x0x65lsu',416,event,0,1)">3.50</div>
    </td>
    <td class="center info-value"><span>92.1%</span></td>
    <td onmouseout="delayHideTip()" class="check ch3" xparam="The match has already started~2"></td>
</tr>

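Thanks in advance for your replies.

【Comments】:

  • Welcome. Please review the "how to ask" guide; your question is missing a few things. First, provide a reproducible example of what you tried before asking, along with the code you are trying to scrape. Second, avoid posting links to images, since code samples are more explicit. Third, tag the question more clearly, for example with the language you use and the web-scraping tag.

Tags: r xpath web-scraping rselenium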


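【Solution 1】:

Here are examples of how to select the tr for a given bookmaker, in this case 18bet.

1. Find the a with class="name" and text "18bet", then step up to its ancestor tr whose class contains "lo":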

 //a[@class="name" and .="18bet"]/ancestor::tr[contains(@class, "lo")]

2. Find the tr whose class contains "lo" and that has a descendant a with class="name" and text "18bet":

//tr[contains(@class, "lo") and .//a[@class="name" and .="18bet"]]

Odds for 1: //a[@class="name" and .="18bet"]/ancestor::tr[contains(@class, "lo")]//td[2]

Odds for X: //a[@class="name" and .="18bet"]/ancestor::tr[contains(@class, "lo")]//td[3]

Odds for 2: //a[@class="name" and .="18bet"]/ancestor::tr[contains(@class, "lo")]//td[4]

Payout: //a[@class="name" and .="18bet"]/ancestor::tr[contains(@class, "lo")]//td[5]
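
The row-then-column logic can be checked offline, without a browser, against a trimmed copy of the row from the question. The sketch below uses only Python's standard library. `xml.etree.ElementTree` supports just a subset of XPath (no `ancestor::` axis and no nested paths inside predicates), so the bookmaker test from the second expression is done in Python rather than in the XPath itself. The helper name `odds_for`, the trimmed markup, and the second row ("otherbook") are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the row from the question, plus one made-up
# row ("otherbook") so that the filter has something to reject.
SAMPLE = """
<table>
  <tr class="lo odd">
    <td><div class="l"><a class="name" href="/bookmaker/18bet/link/">18bet</a></div></td>
    <td class="right odds"><div>2.05</div></td>
    <td class="right odds up"><div>3.20</div></td>
    <td class="right odds"><div>3.50</div></td>
    <td class="center info-value"><span>92.1%</span></td>
  </tr>
  <tr class="lo even">
    <td><div class="l"><a class="name" href="#">otherbook</a></div></td>
    <td class="right odds"><div>1.00</div></td>
    <td class="right odds"><div>1.00</div></td>
    <td class="right odds"><div>1.00</div></td>
    <td class="center info-value"><span>0.0%</span></td>
  </tr>
</table>
"""


def odds_for(root, bookmaker):
    """Return the 1, X, 2 odds and the payout for one bookmaker, or None."""
    for row in root.iter("tr"):
        # contains(@class, "lo") is a substring test; here "lo" must be
        # present as a whole class token, which is slightly stricter.
        if "lo" not in row.get("class", "").split():
            continue
        name = row.find(".//a[@class='name']")
        if name is None or (name.text or "").strip() != bookmaker:
            continue
        cells = row.findall("td")  # cells[0] is the bookmaker column
        texts = ["".join(cell.itertext()).strip() for cell in cells[1:5]]
        return dict(zip(("1", "X", "2", "payout"), texts))
    return None


root = ET.fromstring(SAMPLE)
print(odds_for(root, "18bet"))
```

On the live page the markup is HTML rather than well-formed XML, so a browser driver or an HTML-aware parser is still needed there; this sketch only demonstrates which cell each index refers to.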


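Python code example:

from selenium.webdriver.common.by import By

# Assumes `driver` is a Selenium WebDriver with the match page already loaded.
row = driver.find_element(By.XPATH, '//a[@class="name" and .="18bet"]/ancestor::tr[contains(@class, "lo")]')

# td[1] is the bookmaker cell; td[2] to td[5] hold the 1, X, 2 odds and the payout.
odd_1 = row.find_element(By.XPATH, './/td[2]')
odd_x = row.find_element(By.XPATH, './/td[3]')
odd_2 = row.find_element(By.XPATH, './/td[4]')
odd_payout = row.find_element(By.XPATH, './/td[5]')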
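
Because the question asks for odds from four bookmakers on each of many match pages, the single-bookmaker lookup above can be wrapped in a loop. The following is one possible sketch, not a tested scraper for the site. `row_xpath` and `scrape_odds` are hypothetical helper names. The code assumes a Selenium WebDriver whose `find_elements(by, value)` accepts the locator strategy string `"xpath"` (the value of `By.XPATH`), and it uses `find_elements` so that a bookmaker missing from a page yields `None` instead of raising an exception.

```python
def row_xpath(bookmaker):
    """Build the XPath that selects the table row of one bookmaker."""
    if '"' in bookmaker:
        # XPath 1.0 has no escape for a quote inside a quoted literal.
        raise ValueError("bookmaker name must not contain double quotes")
    return ('//a[@class="name" and .="%s"]'
            '/ancestor::tr[contains(@class, "lo")]' % bookmaker)


def scrape_odds(driver, bookmakers):
    """Map each bookmaker to its [1, X, 2, payout] cell texts, or None."""
    result = {}
    for bookmaker in bookmakers:
        rows = driver.find_elements("xpath", row_xpath(bookmaker))
        if not rows:
            result[bookmaker] = None  # bookmaker not listed on this page
            continue
        # Direct td children 2 to 5 of the first matching row.
        cells = rows[0].find_elements(
            "xpath", "./td[position() >= 2 and position() <= 5]")
        result[bookmaker] = [cell.text for cell in cells]
    return result
```

With the match page loaded, `scrape_odds(driver, ["18bet"])` would return a dictionary keyed by bookmaker name; the other bookmakers of interest go in the same list, spelled exactly as they appear in the link text on the page.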

【Comments】:

  • Thank you very much. I appreciate your help.