BeautifulSoup4 can't find h3 tag on the page [duplicate]: answers

【Question title】: BeautifulSoup4 can't find h3 tag on the page [duplicate]
【Posted】: 2016-10-26 20:11:35
【Question description】:

I want to parse some articles on a news site, but bs4 doesn't see some of the tags.

My code:

from bs4 import BeautifulSoup
import urllib.request

url = "http://www.noi.md/md/news_id/86602"
page = urllib.request.urlopen(url)

soup = BeautifulSoup(page.read(), "html5lib")

heads = soup.find_all("h3")

for head in heads:
    print(head.string)

Output:

>>> 
None
Citiţi de asemenea:
Adăugați un comentariu:
Citiţi de asemenea:
>>>

As you can see, it finds some of the tags, but not all of them. One is still hidden:

<h3>
Debutul companiei „<a href="http://viorica.md">Viorica-Cosmetic</a>” în calitate de participant al Festivalului „Lavender Fest” a fost încărcat cu emoții pozitive și oferte tentante pentru vizitatori.
</h3>

Is this a problem in my code, or a bs4/html problem?

【Comments】:

Try head.text instead of .string...
Thanks, Bubble Hacker! It works!
Writing it up as an answer for future reference.

Tags: python html parsing beautifulsoup

【Solution 1】:

Taken from this answer:

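.string on a Tag object returns a NavigableString object. .text, on the other hand, collects all the child strings and returns them joined with the given separator. The return type of .text is a unicode object (str in Python 3).

When a tag has more than one child, .string is None. The h3 in question contains text, an <a> tag, and more text, so its .string is None, which is most likely the None printed in your output. The tag was found; only its text was not returned.

Change your code to use head.text.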
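The difference can be reproduced offline with a small sketch. The markup below is a trimmed stand-in for the page (an assumption, not the live HTML), heading_text is a hypothetical helper name, and the built-in "html.parser" is used here instead of html5lib only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Trimmed sample markup (an assumption standing in for the live page).
SAMPLE_HTML = """
<h3>
Debutul companiei „<a href="http://viorica.md">Viorica-Cosmetic</a>” în calitate de participant.
</h3>
<h3>Citiţi de asemenea:</h3>
"""


def heading_text(tag):
    """Hypothetical helper: all nested text of a tag, whitespace collapsed."""
    return " ".join(tag.get_text().split())


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
heads = soup.find_all("h3")

# .string is None for the first h3 because it has three children
# (text, <a>, text); the second h3 has a single text child.
strings = [head.string for head in heads]

# get_text() (the method behind .text) joins every nested string.
texts = [heading_text(head) for head in heads]

for raw, full in zip(strings, texts):
    print(repr(raw), "|", full)
```

Collapsing whitespace with split/join is a design choice for readable output; plain head.text keeps the original newlines around the heading.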
