【Posted】: 2018-03-17 15:11:35
【Question】:
I'm seeing some odd behavior from BeautifulSoup, as shown in the example below.
import re
from bs4 import BeautifulSoup
html = """<p style='color: red;'>This has a <b>color</b> of red. Because it likes the color red</p>
<p class='blue'>This paragraph has a color of blue.</p>
<p>This paragraph does not have a color.</p>"""
soup = BeautifulSoup(html, 'html.parser')
pattern = re.compile('color', flags=re.UNICODE | re.IGNORECASE)
paras = soup.find_all('p', string=pattern)
print(len(paras)) # expected to find 3 paragraphs containing the word "color"
2
print(paras[0].prettify())
<p class="blue">
This paragraph has a color of blue.
</p>
print(paras[1].prettify())
<p>
This paragraph does not have a color.
</p>
As you can see, for some reason the first paragraph, <p style='color: red;'>This has a <b>color</b> of red. Because it likes the color red</p>, is not picked up by find_all(...), and I can't figure out why.
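One possible explanation, offered as a sketch rather than a confirmed diagnosis: when `string=` is combined with a tag name, BeautifulSoup compares the pattern against each tag's `.string`, and `.string` is `None` for a tag that has more than one child. The first paragraph contains text plus a nested `<b>`, so it may simply have nothing to match against. The snippet below checks that and tries a workaround that matches on the paragraph's full text instead. The helper name `p_with_color` is made up for illustration and is not part of bs4.

```python
import re
from bs4 import BeautifulSoup

html = """<p style='color: red;'>This has a <b>color</b> of red. Because it likes the color red</p>
<p class='blue'>This paragraph has a color of blue.</p>
<p>This paragraph does not have a color.</p>"""
soup = BeautifulSoup(html, 'html.parser')
pattern = re.compile('color', flags=re.UNICODE | re.IGNORECASE)

# The first <p> has several children (text, <b>, text), so .string is None.
first = soup.find('p')
first_string = first.string

# Hypothetical helper (not a bs4 API): accept <p> tags whose combined text,
# including text inside nested tags, matches the pattern.
def p_with_color(tag):
    return tag.name == 'p' and pattern.search(tag.get_text()) is not None

matches = soup.find_all(p_with_color)

print(first_string)
print(len(matches))
```

If the assumption holds, this prints None for the first paragraph's `.string` and then 3 for the number of matched paragraphs. Note that `get_text()` also searches text inside nested tags, which may or may not be what you want.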
【Comments】:
Tags: python python-2.7 beautifulsoup