BeautifulSoup Python not capturing all the HTML in a file

【Question title】: BeautifulSoup Python not capturing all the HTML in a file
【Posted】: 2020-07-23 12:41:40
【Question】:

I'm using BeautifulSoup in Python (bs4) and trying to extract information from a web page. For reference, the page I'm looking at is an advanced search engine, and the HTML I want is:

<p class="viewing">
     Viewing: <strong>        
     1</strong> - <strong>       
     50</strong> of <strong>    
     11,204</strong> papers
</p>

After parsing the page with bs4 in Python, I tried to extract it with:

num_papers = soup.find_element_by_xpath('//*[@id="maincontent"]/div/div[1]/div/div[1]/p/strong[3]')

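This is the XPath of the value 11,204 in the HTML. I'm using the lxml parser. Most answers on Stack suggest this is a parsing problem, so I tried html5lib, but that didn't work either. To be clear, the type of my output is NoneType because the element can't be found. I printed the soup for this page and found that the corresponding HTML isn't even present in the soup, hence the NoneType. I suspect the parser, but I don't know where I'm going wrong.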

【Comments】:

Tags: python html beautifulsoup

【Solution 1】:

from bs4 import BeautifulSoup
html = """<p class="viewing">
     Viewing: <strong>        
     1</strong> - <strong>       
     50</strong> of <strong>    
     11,204</strong> papers
</p>
"""


soup = BeautifulSoup(html, 'html.parser')


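# Locate the paragraph that holds the result counts.
target = soup.find("p", class_="viewing")

# contents[-2] is the last <strong>; contents[-1] is the trailing " papers" text.
print(target.contents[-2].get_text(strip=True))

Output: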

11,204

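Or

# find_all_next returns every <strong> after the <p> in document order; index 2 is the third one.
target = soup.find("p", class_="viewing").find_all_next("strong")[2]

print(target.get_text(strip=True))

Output: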

11,204
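
For comparison with the XPath in the question: BeautifulSoup does not implement XPath, and `find_element_by_xpath` is a Selenium WebDriver method rather than a bs4 one. A minimal sketch of a CSS-selector equivalent, assuming the same snippet as above (the selector string is an illustration, not part of the original answer):

```python
from bs4 import BeautifulSoup

html = """<p class="viewing">
     Viewing: <strong>
     1</strong> - <strong>
     50</strong> of <strong>
     11,204</strong> papers
</p>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS selector: the third <strong> that is a direct child of <p class="viewing">.
total = soup.select_one("p.viewing > strong:nth-of-type(3)")
text = total.get_text(strip=True)
print(text)

# Optionally convert the formatted count to an integer.
count = int(text.replace(",", ""))
print(count)
```

Like the variants above, this only works if the paragraph is actually present in the HTML handed to BeautifulSoup.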

【Discussion】:

Hi, thanks for the reply. This still doesn't work: I get AttributeError: 'NoneType' object has no attribute 'find_all_next'. Let me try to be more specific.
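
The AttributeError reported here means that `soup.find("p", class_="viewing")` returned None. In other words, the paragraph is absent from the HTML that was parsed, which matches the question's observation that it is missing from the printed soup. A minimal defensive sketch, assuming a hypothetical helper named `extract_total` and a made-up placeholder page for the missing case:

```python
from bs4 import BeautifulSoup


def extract_total(html):
    """Return the paper count as an int, or None if the paragraph is absent."""
    soup = BeautifulSoup(html, "html.parser")
    target = soup.find("p", class_="viewing")
    if target is None:
        # The paragraph is not in the parsed HTML at all.
        return None
    strongs = target.find_all("strong")
    if len(strongs) < 3:
        # The paragraph exists but does not have the expected three counts.
        return None
    return int(strongs[2].get_text(strip=True).replace(",", ""))


# Hypothetical inputs: one page with the paragraph, one without it.
present = (
    '<p class="viewing">Viewing: <strong>1</strong> - '
    "<strong>50</strong> of <strong>11,204</strong> papers</p>"
)
missing = '<div id="maincontent"></div>'

print(extract_total(present))
print(extract_total(missing))
```

If the helper returns None for the real page, one possibility worth checking is that the markup is generated after the initial response (for example by JavaScript), in which case no choice of parser would recover it. The thread does not confirm that this is the cause.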