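【Posted】:2020-09-12 19:52:01
【Question】:
I'm new to web scraping and am using BeautifulSoup for it, but I only want to extract the content of bare "p" tags. In other words, if a "p" tag carries a class, a style, or any other attribute, I want to ignore it.
Example: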
<p>what I want to extract</p>
<p class="copy">what I do not want to extract from HTML page</p>
So far, this code only lets me extract all of the "p" tags, including the ones I want to skip:
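from bs4 import BeautifulSoup as BS
import requests

# Fetch the page the user wants to scrape
URL = input("Enter url to scrape: ")
content = requests.get(URL)

# Parse the HTML and collect every <p> tag, with or without attributes
soup = BS(content.text, 'html.parser')
content_p = soup.find_all('p')
print(content_p)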
【Discussion】:
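One possible approach, offered as a minimal sketch rather than a definitive answer: every BeautifulSoup tag exposes its attributes as a dict in `tag.attrs`, so a "p" tag with no class, style, or anything else has an empty dict. The helper name `bare_paragraphs` and the inline sample HTML (including the third, styled paragraph) are assumptions made for illustration; they are not part of the original question.

```python
from bs4 import BeautifulSoup

# Sample HTML standing in for the downloaded page (assumed for illustration)
html = """
<p>what I want to extract</p>
<p class="copy">what I do not want to extract from HTML page</p>
<p style="color: red">also styled, so skipped</p>
"""

soup = BeautifulSoup(html, "html.parser")


def bare_paragraphs(soup):
    """Hypothetical helper: return <p> tags that carry no attributes at all."""
    # tag.attrs is a dict of the tag's attributes; it is empty for a bare <p>
    return [p for p in soup.find_all("p") if not p.attrs]


plain = bare_paragraphs(soup)
texts = [p.get_text() for p in plain]
print(texts)
```

With the scraping code from the question, the same filter could be applied to the `soup` built from `content.text`. If only the class attribute matters and other attributes are acceptable, checking `"class" not in p.attrs` instead would be a narrower variant.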
Tags: python html web-scraping beautifulsoup