Beautifulsoup find_all does not find all tags (Python answer)

[Question Title]: Beautifulsoup find_all does not find all tags Python
[Posted]: 2017-04-19 15:17:09
[Question]:

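I am trying to scrape a specific web page, for example this web page. I want the product names, but my find_all call does not work as expected: it does not find all the tags I specify.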

Here is what I am doing:

from urllib import request
from bs4 import BeautifulSoup

url = 'https://www.toysrus.fi/nallet-ja-pehmolelut/interaktiiviset-pehmolelut'
# Download the raw HTML and count the divs whose class list contains 'inner-wrapper'
soup = BeautifulSoup(request.urlopen(url).read(), 'html.parser')
print(len(soup.find_all('div', {'class': 'inner-wrapper'})))

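The page actually contains 4 elements with class='inner-wrapper', but find_all only finds 1. How can I scrape the product names from this page, and how do I get the correct number of div tags with class='inner-wrapper'? Thanks.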

[Comments]:

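There is only one element with the inner-wrapper class. Two more appear inside the JavaScript section, but neither request nor BeautifulSoup executes that JavaScript, so they are never applied to the DOM.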
So how can I extract the product names with BeautifulSoup? Or do I have to use something else?
Not necessarily with BeautifulSoup alone. You might be able to evaluate the JavaScript somehow, or work out how the page pulls in its product list.
The site fills in its content with JS. To scrape the product information, you need to use Selenium or emulate the site's AJAX requests.
I am now using Selenium. After initializing browser = webdriver.Chrome(), my code is browser = browser.get('https://www.toysrus.fi/nallet-ja-pehmolelut/interaktiiviset-pehmolelut'). It opens the page in Chrome, but when I print the result to the console it is None.
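The comments suggest two routes, and the sketch below illustrates both under stated assumptions rather than as a tested scraper for this site. For the AJAX route, no endpoint URL is given in this thread (it would have to be found, for example, in the Network tab of the browser's developer tools); the Data key is only inferred from the {{for Data}} loop in the JsRender template quoted in Solution 1, and Name is a hypothetical field name. For the Selenium route, note that WebDriver's get() returns None, which is why browser = browser.get(...) prints None; the rendered HTML is available from browser.page_source instead. The wait condition (at least two inner-wrapper divs) is an assumption about when rendering has finished.

```python
import json
from urllib import request

from bs4 import BeautifulSoup


# Route 1: call the page's data endpoint directly (the endpoint URL is not known here).
def fetch_json(url: str) -> dict:
    """GET a JSON endpoint and decode the response body."""
    req = request.Request(url, headers={"Accept": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)


def product_names(payload: dict, list_key: str = "Data", name_key: str = "Name") -> list:
    """Collect product names from a decoded payload.

    "Data" mirrors the {{for Data}} loop in the page's JsRender template;
    "Name" is a hypothetical field name, so check the real response.
    """
    return [item[name_key] for item in payload.get(list_key, []) if name_key in item]


# Route 2: let a real browser run the JavaScript, then parse the rendered DOM.
def count_inner_wrappers(html: str) -> int:
    """Count <div class="inner-wrapper"> elements in HTML that is already rendered."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.find_all("div", class_="inner-wrapper"))


def fetch_rendered_html(url: str, min_count: int = 2, timeout: float = 10.0) -> str:
    """Open url in Chrome, wait for JS-inserted wrappers, return the live DOM as HTML."""
    # Imported lazily so the helpers above work without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    browser = webdriver.Chrome()
    try:
        # get() navigates the existing driver and returns None,
        # so its result must not be assigned back to `browser`.
        browser.get(url)
        # Assumption: rendering is done once at least min_count wrappers exist.
        WebDriverWait(browser, timeout).until(
            lambda d: len(d.find_elements(By.CSS_SELECTOR, "div.inner-wrapper"))
            >= min_count
        )
        return browser.page_source
    finally:
        browser.quit()
```

With the URL from the question, count_inner_wrappers(fetch_rendered_html(url)) would count the wrappers after the JavaScript has run; the actual result depends on the live site, which may have changed since 2017. Parsing page_source with BeautifulSoup keeps the familiar find_all API while Selenium does the rendering.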

Tags: python html web-scraping beautifulsoup urllib

[Solution 1]:

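Beautiful Soup only finds div tags that are part of the actual HTML document. The contents of a <script> element are parsed as plain text, so div markup that happens to sit inside a script never becomes a tag and is ignored by find_all. Unfortunately, Beautiful Soup also does not execute scripts. If you open the page's HTML source, you will see one real div with that class, plus a number of script/JS templates like the following: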

<script type="text/x-jsrender" id="product-list-skuid-template">
  <div class="product-list-component type-{{:TemplateInfo.type}} outer-wrapper">
    <div class="inner-wrapper">
      <ul class="product-list-container">
        {{for Data}} {{include tmpl="#product-template"/}} {{/for}}
      </ul>
    </div>
  </div>
  {{!-- SHADOW --}} {{if TemplateInfo.divider=='roundshadow'}}
  <div class="round-shadow"></div>
  {{else TemplateInfo.divider=='simple'}}
  <hr /> {{/if}}
</script>
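To see the difference concretely, here is a minimal, self-contained sketch. It uses a small hand-written HTML string modeled on the template above (not the real toysrus.fi page) and shows that find_all counts only the div in the real DOM, while the template's markup is just text inside the <script> element, which can be parsed separately if you want to inspect it.

```python
from bs4 import BeautifulSoup

# A trimmed-down, hand-written page: one real div plus a JsRender template
# whose markup lives inside a <script> element (not the real toysrus.fi HTML).
HTML = """
<html><body>
  <div class="inner-wrapper">rendered on the server</div>
  <script type="text/x-jsrender" id="product-list-skuid-template">
    <div class="product-list-component outer-wrapper">
      <div class="inner-wrapper"><ul class="product-list-container"></ul></div>
    </div>
  </script>
</body></html>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Only the div in the real DOM is found; the script body is plain text to the parser.
real_divs = soup.find_all("div", class_="inner-wrapper")
print(len(real_divs))  # 1

# The template markup is still available as the <script> element's text...
template = soup.find("script", id="product-list-skuid-template")
print("inner-wrapper" in template.string)  # True

# ...and can be parsed on its own if you need to inspect it.
template_soup = BeautifulSoup(template.string, "html.parser")
print(len(template_soup.find_all("div", class_="inner-wrapper")))  # 1
```

Parsing the template this way only reveals the empty template, not product data: the {{for Data}} placeholders are filled in by JavaScript at runtime, so the product names still have to come from a browser that runs the scripts or from the data request the page makes.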

[Discussion]: