Extract a list from a Python script in a web-scraped HTML page: answers

【Question title】: Extract list from a Python script in web scraped html page
【Posted】: 2019-11-27 22:41:14
【Question】:

I'm new to web scraping and have hit a small obstacle. Here is my code:

import requests
from bs4 import BeautifulSoup
url = "https://www.website.com"  # placeholder; requests.get() needs the scheme
page = requests.get(url)
soup = BeautifulSoup(page.content, "html.parser")
# the 24th <script> tag on the page (index 23)
price_scripts = soup.find_all('script')[23]
print(price_scripts)

The extracted scripts all appear to be Python scripts. Here is what the code above prints:

<script>
        p.a = [0,"6.93","9.34","3.42","7.88"];
        p.output();
</script>

I'm trying to extract the list from that script, but every attempt just returns None.

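【Comments】:

What are you running that returns None? Is it print(price_scripts), or something else? There isn't enough information here. What does soup.find_all('script')[23] contain? Can you show some output of what is stored in price_scripts? Also, scripts embedded in HTML are JavaScript, not Python. If the <script> block above is your printout, then what exactly are you doing that returns None when you try to get the list?
Is www.website.com the actual website you are trying to scrape?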

Tags: python html web-scraping beautifulsoup

【Solution 1】:

You should be able to extract the script's text this way:

target = price_scripts.text

Which outputs:

p.a = [0,"6.93","9.34","3.42","7.88"];
    p.output();

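At this point you need to fall back on string manipulation to pull out everything between the square brackets. Since target is already a string, call split() on it directly:

print(target.split('[')[1].split(']')[0])

Note that each call to split() returns a list, so you have to pick the right element from it: index 1 is the text after the first '[', and index 0 of the second split is the text before the first ']'. Output: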

0,"6.93","9.34","3.42","7.88"
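The result above is still a single string. If an actual Python list is needed, one possible approach is sketched below. It assumes the array literal happens to be valid JSON (double-quoted strings, no trailing comma, no nested arrays), which holds for the output shown here but may not hold for other pages. The helper name extract_js_array and the sample script_text are illustrative only; script_text stands in for price_scripts.text.

```python
import json
import re

# Sample text standing in for price_scripts.text (copied from the output above).
script_text = '''
        p.a = [0,"6.93","9.34","3.42","7.88"];
        p.output();
'''

def extract_js_array(text, name="p.a"):
    """Return the array literal assigned to `name` as a Python list, or None.

    Hypothetical helper: it only works when the JavaScript array literal
    is also valid JSON.
    """
    # Match "<name> = [ ... ]" and capture the bracketed part (non-greedy).
    pattern = re.escape(name) + r"\s*=\s*(\[.*?\])"
    match = re.search(pattern, text, re.DOTALL)
    if match is None:
        return None
    return json.loads(match.group(1))

values = extract_js_array(script_text)
print(values)  # [0, '6.93', '9.34', '3.42', '7.88']

# Convert every entry to a float if numeric values are wanted.
prices = [float(v) for v in values]
print(prices)  # [0.0, 6.93, 9.34, 3.42, 7.88]
```

Anchoring the pattern on the variable name rather than on the first '[' makes the lookup a little less fragile if the script contains other brackets before the array.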

【Comments】:

That did it! Thank you! Not sure why I didn't think of using the split method. Cheers!