【Question Title】: Web Scraping while meeting a condition
【Posted】: 2017-05-12 22:36:07
【Question】:

I'm scraping some data from this site: https://masteroverwatch.com/profile/pc/us/calvin-1337, specifically span.summary-hero-name. Here is the code I'm using:

scrapeIt("https://masteroverwatch.com/profile/pc/us/calvin-1337",  {
  title: "span.summary-hero-name"
}).then(page => {
  console.log(page.title)
});

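This returns McCreeWidowmakerBastion, which is what it should do, since there are 3 heroes with the summary-hero-name class. However, I only want the first one in the source (the most-played hero). If that isn't possible, I'd like to match on the href /profile/pc/us/Calvin-1337/heroes/6 (this is the most important part).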

【Comments】:

  • Where is the rest of the code? I'm curious whether you could replace the title: selector with span.summary-hero-name:first to get only the first hero.

Tags: javascript html node.js web-scraping tags


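【Solution 1】:

It looks like you need an advanced CSS selector!

You might think of :first-child or :first-of-type, but they don't work the way we'd expect here: each span.summary-hero-name is the first child of its own <strong> element, so all three still match. I analyzed the page you shared, and I think you need the following selector: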

a[href$="/heroes/6"] + .row

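I reproduced the use case with the relevant part of the HTML. The div shown in red is the one you want to select. It contains all the information you need.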

.row {
  padding: 20px;
  margin: 20px;
  border: solid 1px black;
}

a[href$="/heroes/6"] + .row {
  color: red;
}

<div class="data-heroes-summary widget">
  <div class="widget-title">Favorite Hero Performance</div>
    <div class="summary-list">
      <div class="summary-row-container">
        <a class="summary-row-link" href="/profile/pc/us/Calvin-1337/heroes/6"></a>
        <div class="row">
          <div class="summary-icon col-xs-5">
            <span class="summary-icon-unit" style="background-image:url('https://blzgdapipro-a.akamaihd.net/hero/mccree/hero-select-portrait.png');"></span>
            <strong>
              <span class="summary-hero-name">McCree</span>
              <div class="summary-hero-role">Offense</div>
            </strong>
           </div>
           <div class="summary-stats col-xs-4">
            <div class="summary-stats-kda stats-assists">
              <strong>3.44</strong>:1 K/D
            </div>
            <div class="summary-stats-kills">
              <span>7,443 / 2,166</span>
            </div>
          </div>
          <div class="summary-winrate col-xs-3">
            <strong class="stats-kills">58.5%</strong>
            <span>340 Games</span>
          </div>
        </div>
      </div>
      <div class="summary-row-container">
        <a class="summary-row-link" href="/profile/pc/us/Calvin-1337/heroes/9"></a>
        <div class="row">
           <div class="summary-icon col-xs-5">
             <span class="summary-icon-unit" style="background-image:url('https://blzgdapipro-a.akamaihd.net/hero/widowmaker/hero-select-portrait.png');"></span>
             <strong>
               <span class="summary-hero-name">Widowmaker</span>
               <div class="summary-hero-role">Defense</div>
             </strong>
           </div>
           <div class="summary-stats col-xs-4">
            <div class="summary-stats-kda stats-assists">
              <strong>4.29</strong>:1 K/D
            </div>
            <div class="summary-stats-kills">
              <span>6,827 / 1,590</span>
            </div>
          </div>
          <div class="summary-winrate col-xs-3">
            <strong class="stats-kills">64.6%</strong>
            <span>339 Games</span>
          </div>
        </div>
      </div>
      <div class="summary-row-container">
        <a class="summary-row-link" href="/profile/pc/us/Calvin-1337/heroes/15"></a>
        <div class="row">
          <div class="summary-icon col-xs-5">
            <span class="summary-icon-unit" style="background-image:url('https://blzgdapipro-a.akamaihd.net/hero/bastion/hero-select-portrait.png');"></span>
            <strong>
              <span class="summary-hero-name">Bastion</span>
              <div class="summary-hero-role">Defense</div>
            </strong>
          </div>
        <div class="summary-stats col-xs-4">
          <div class="summary-stats-kda stats-assists">
            <strong>3.01</strong>:1 K/D
          </div>
          <div class="summary-stats-kills">
            <span>810 / 269</span>
          </div>
        </div>
        <div class="summary-winrate col-xs-3">
          <strong class="stats-kills">61.4%</strong>
          <span>44 Games</span>
        </div>
      </div>
    </div>
  </div>
  <a href="/profile/pc/us/Calvin-1337/heroes" class="summary-more-link">View More</a>
</div>
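
To connect the selector above back to the scrape-it call from the question, here is a minimal sketch. It rests on several assumptions: `buildHeroOptions` and `scrapeHeroName` are hypothetical helper names (not part of scrape-it), the live page is assumed to have the same structure as the HTML excerpt, and the `eq` field option is assumed to be available in your installed scrape-it version. The scraper function is passed in as an argument so the helper can be tried out with a stub instead of a real network request.

```javascript
// Sketch only: buildHeroOptions and scrapeHeroName are hypothetical helpers,
// not part of the scrape-it API.

// Build scrape-it options that target one hero row by the end of its link href.
function buildHeroOptions(heroId) {
  return {
    // The <a href=".../heroes/<id>"> is immediately followed by the .row
    // holding that hero's data, hence the adjacent sibling combinator (+).
    title: `a[href$="/heroes/${heroId}"] + .row span.summary-hero-name`,
    // Alternative: first match in source order.
    // Assumption: your scrape-it version supports the `eq` field option.
    first: { selector: "span.summary-hero-name", eq: 0 }
  };
}

// `scrapeIt` is injected so this can run against a stub as well as the real package.
function scrapeHeroName(scrapeIt, url, heroId) {
  return scrapeIt(url, buildHeroOptions(heroId)).then(result => {
    // Assumption: older scrape-it versions resolve with the scraped data itself,
    // newer ones with an object that carries it in a `data` property.
    const page = result && result.data ? result.data : result;
    // Prefer the href-based match, fall back to the first hero in the source.
    return page.title || page.first;
  });
}

// Usage (requires the scrape-it package and network access):
// const scrapeIt = require("scrape-it");
// scrapeHeroName(scrapeIt, "https://masteroverwatch.com/profile/pc/us/calvin-1337", 6)
//   .then(name => console.log(name));
```

The href-based field only matches while the hero keeps the same id in its link, whereas the `eq: 0` field follows whichever hero is listed first, so which one to rely on depends on whether you care about a specific hero or about the current favorite.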
