Script tag content not found via requests-html (Python)

【Question title】: Script tag content doesn't get found via requests-html (Python)
【Posted】: 2021-10-21 02:27:00
【Question description】:

I have a page from which I want to extract an EAN number (here 8806090571589) that sits inside a script tag.

I first tried to get the script:

        jsonn = r.html.find('script')[3].text
        print(title, price, jsonn)

But it didn't work.

The page source is here (too long to post):

View source: https://www.kaufland.de/product/361834606/?search_value=waschmaschine

【Question discussion】:

Tags: javascript web-scraping python-requests

【Solution 1】:

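In BeautifulSoup, find() returns only the first occurrence of a tag. Since you need the fourth occurrence, use findAll() instead (find_all() is the current name for the same method). It returns a list of all occurrences, which you can then index as needed. Note that this applies to BeautifulSoup: in requests-html, r.html.find() already returns a list of all matches unless first=True is passed.

I tried the code below on my machine: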
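To illustrate the difference between the two BeautifulSoup calls described above, here is a minimal sketch. The HTML in SAMPLE_HTML is a made-up stand-in, not the real Kaufland page, and the variable names are only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for this illustration (NOT the real page).
SAMPLE_HTML = """
<html><head>
<script>var a = 1;</script>
<script>var b = 2;</script>
</head><body>
<script>var c = 3;</script>
<script>var d = 4;</script>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() gives back a single Tag: the first <script> in the document.
first_script = soup.find("script")

# find_all() gives back a list-like ResultSet with every <script>.
all_scripts = soup.find_all("script")

# Index 3 is the fourth occurrence, as in the question.
fourth_script = all_scripts[3]

print(first_script.string)
print(len(all_scripts))
print(fourth_script.string)
```

Indexing by position is fragile: if the site adds or removes a script tag, index 3 will point at something else, so it may be safer to select the tag by its content.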

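import urllib3
from bs4 import BeautifulSoup

URL = "https://www.kaufland.de/product/361834606/?search_value=waschmaschine"

# Fetch the raw HTML; a User-Agent header is sent explicitly with the request.
response = urllib3.PoolManager().request("GET", URL, headers={"User-Agent": "python"})
soup = BeautifulSoup(response.data.decode("utf-8"), "html.parser")

# find_all() (current name for findAll()) returns every <script> tag;
# index 3 is the fourth one.
print(soup.find_all("script")[3])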

You can use this code as a starting point and adapt it as needed.

【Discussion】:
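Once the script tag's text is in hand, the EAN still has to be pulled out of it. The following is only a sketch under assumptions: the layout of the real script is not shown in the question, so SAMPLE_SCRIPT_TEXT is invented, and the helper names is_valid_ean13 and find_ean13_candidates are hypothetical. Instead of relying on a particular key name, it looks for 13-digit runs and keeps those that pass the standard EAN-13 check digit test.

```python
import re
from typing import List


def is_valid_ean13(code: str) -> bool:
    """Return True if code is 13 ASCII digits with a correct EAN-13 check digit."""
    if not re.fullmatch(r"[0-9]{13}", code):
        return False
    digits = [int(ch) for ch in code]
    # Weights alternate 1, 3, 1, 3, ... across the first twelve digits.
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == digits[12]


def find_ean13_candidates(script_text: str) -> List[str]:
    """Return the distinct 13-digit runs in script_text that pass the checksum."""
    # The lookarounds keep a match from being part of a longer digit run.
    runs = re.findall(r"(?<![0-9])[0-9]{13}(?![0-9])", script_text)
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return [run for run in dict.fromkeys(runs) if is_valid_ean13(run)]


# Invented example text (ASSUMPTION: the real script may look quite different).
SAMPLE_SCRIPT_TEXT = (
    'window.__DATA__ = {"id": 361834606, '
    '"ean": "8806090571589", "ts": 1634783220000};'
)

print(find_ean13_candidates(SAMPLE_SCRIPT_TEXT))
```

The checksum filter discards most unrelated 13-digit numbers, such as the millisecond timestamp in the sample, but roughly one in ten random numbers will still pass it, so the result should be treated as a list of candidates rather than a guaranteed EAN.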