Python 3 BS4 - Extract Data from <span> tags (Continued)

【Question title】: Python 3 BS4 - Extract Data from <span> tags (Continued)
【Posted】: 2018-11-20 20:46:03
【Question】:

So my HTML code looks like this.

<li data-ng-repeat="sector in data.sectors"> <a target="_self" data-ng-href="/stocks/quotes/-382G/components/A" href="/stocks/quotes/-382G/components/A"><span>SIC-3826 Laboratory Analytical Instruments</span></a> </li>

I want to extract the information inside the span tag. Unfortunately, when I use the following code:

tags = soup.findAll("li",attrs={"data-ng-repeat":"sector in data.sectors"})
# tags = soup.find_all("a",attrs= {"target=","data-ng-href="})
# tags = soup.find_all("a")
for tag in tags:
    print(tag.text)

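The result is [[sector.description]]. What I want to extract is the actual text, such as "SIC-3826 Laboratory Analytical Instruments".

Any help would be greatly appreciated. I have tried various alternatives, but I cannot get the information I want.

Thank you in advance!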

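【Comments】:

What is [[sector.description]]?
This looks like a classic case of scraping dynamic content. If I had to guess, [[sector.description]] is a template placeholder that a script replaces with the real data when the page is rendered. You need a tool that supports dynamic content; try selenium or requests-html. Unfortunately, bs4 only parses the HTML it is given and does not run the scripts that generate the content.
[[sector.description]] is what print(tag.text) outputs, not the text that appears in the tag.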
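
The diagnosis in the comments can be checked offline. The following is a minimal sketch, assuming the server's raw response still contains the un-rendered template. The `raw_html` string is a hypothetical reconstruction, not the actual page source. BeautifulSoup parses only the markup it receives and does not execute JavaScript, so it hands back the placeholder.

```python
from bs4 import BeautifulSoup

# Hypothetical raw response: the template as served, before any script has run.
raw_html = """
<ul>
  <li data-ng-repeat="sector in data.sectors">
    <a target="_self"><span>[[sector.description]]</span></a>
  </li>
</ul>
"""

soup = BeautifulSoup(raw_html, "html.parser")
tags = soup.find_all("li", attrs={"data-ng-repeat": "sector in data.sectors"})

# strip=True trims the whitespace that surrounds the span text inside each <li>.
texts = [tag.get_text(strip=True) for tag in tags]
print(texts)
```

If the list contains placeholders rather than sector names, the names are filled in later by a script in the browser and are simply absent from the HTML that bs4 received.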

Tags: python beautifulsoup tags

【Solution 1】:

Yes, all you need to do is:

x = """<li data-ng-repeat="sector in data.sectors"> <a target="_self" data-ng-href="/stocks/quotes/-382G/components/A" href="/stocks/quotes/-382G/components/A"><span>SIC-3826 Laboratory Analytical Instruments</span></a> </li>"""

from bs4 import BeautifulSoup
print(BeautifulSoup(x, "lxml").text)
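
Calling `.text` on the whole soup returns every text node in the document, which is fine for a one-line snippet but not for a full page. Below is a minimal sketch of narrowing the search to the spans inside the matching `li` elements. The surrounding markup in `page` is made up for illustration, and the sketch still assumes the HTML already contains the rendered text rather than a placeholder.

```python
from bs4 import BeautifulSoup

# Made-up page wrapped around the snippet from the question, for illustration only.
page = """
<html><body>
  <h1>Components</h1>
  <ul>
    <li data-ng-repeat="sector in data.sectors"> <a target="_self" href="/stocks/quotes/-382G/components/A"><span>SIC-3826 Laboratory Analytical Instruments</span></a> </li>
  </ul>
  <p>Footer text</p>
</body></html>
"""

soup = BeautifulSoup(page, "html.parser")

# CSS selector: every <span> nested inside an <li> that has a data-ng-repeat attribute.
spans = soup.select("li[data-ng-repeat] span")
names = [span.get_text(strip=True) for span in spans]
print(names)
```

The selector skips the heading and footer text, which `BeautifulSoup(page, ...).text` would have included.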

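【Discussion】:

What if x is a full web page? What happens then?
The problem is that I don't know what x will be before the page is served. Either that, or I may have misunderstood your answer.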
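
One possible way to obtain the page as a browser would see it, following the Selenium suggestion from the comments. This is a sketch under several assumptions: Selenium and a Chrome driver are installed, `url` is whichever page is being scraped (the question does not give it), the placeholders use the `[[...]]` syntax seen in the question, and the helper names `fetch_rendered_html` and `extract_sector_names` are hypothetical.

```python
from bs4 import BeautifulSoup

# Spans nested inside the repeated <li> elements from the question.
SPAN_SELECTOR = "li[data-ng-repeat] span"


def fetch_rendered_html(url, timeout=10):
    """Load url in a real browser and return the HTML after scripts have run."""
    # Imported here so the parsing helper below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    def placeholders_replaced(driver):
        # True once at least one span exists and none still shows "[[...]]".
        spans = driver.find_elements(By.CSS_SELECTOR, SPAN_SELECTOR)
        return bool(spans) and all("[[" not in span.text for span in spans)

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        WebDriverWait(driver, timeout).until(placeholders_replaced)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if the wait times out


def extract_sector_names(html):
    """Return the text of every span inside the repeated <li> elements."""
    soup = BeautifulSoup(html, "html.parser")
    return [span.get_text(strip=True) for span in soup.select(SPAN_SELECTOR)]
```

With these in place, `extract_sector_names(fetch_rendered_html(url))` would return the rendered sector names, so x no longer has to be known in advance.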