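【Posted】: 2020-04-01 11:30:37
【Question】:
I'm using the code below to scrape real-estate data from a website. The code runs without errors, but it only extracts data for 30 containers, while more than 3,000 are available. I've just realized that BeautifulSoup is not receiving all of the HTML tags.
My code: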
import requests
from bs4 import BeautifulSoup

# Fetch the search-results page with a browser-like User-Agent header
r = requests.get("https://www.magicbricks.com/property-for-sale/residential-real-estate?proptype=Multistorey-Apartment,Builder-Floor-Apartment,Penthouse,Studio-Apartment,Residential-House,Villa,Residential-Plot&Locality=OMR-Road&cityName=Chennai",
                 headers={'User-agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0'})
c = r.content
bs = BeautifulSoup(c, "html5lib")
# print(bs.prettify())
# Collect every property card container present in the returned HTML
cards = bs.find_all("div", {"class": "flex relative clearfix m-srp-card__container"})
print(len(cards))  # prints 30 for the asker, although 3,000+ listings exist
【Discussion】:
- It's hard to say, because I can't access the URL ("Access Denied"). Can you provide another URL?
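
One common cause of this symptom, which is an assumption here and not confirmed for this site, is that the server puts only the first batch of cards in the initial HTML and loads the rest with JavaScript as the user scrolls. requests does not execute JavaScript, so BeautifulSoup only ever sees that first batch. If the site also serves results page by page, a loop like the sketch below could gather them. The `page` query parameter and the helper names `fetch_cards` and `collect_all_cards` are hypothetical; the real request should be checked in the browser's network tab.

```python
from typing import Callable, List

CARD_CLASS = "m-srp-card__container"  # class name taken from the question


def fetch_cards(base_url: str, page: int) -> List[str]:
    """Fetch one results page and return the HTML of each card on it.

    ASSUMPTION: the site accepts a `page` query parameter. This is
    hypothetical and must be verified against the real site.
    """
    # Imported here so the paging loop below can be exercised
    # without the network libraries being installed.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(
        base_url,
        params={"page": page},  # appended to any existing query string
        headers={"User-Agent": "Mozilla/5.0"},
        timeout=30,
    )
    response.raise_for_status()
    soup = BeautifulSoup(response.content, "html.parser")
    return [str(card) for card in soup.select(f"div.{CARD_CLASS}")]


def collect_all_cards(
    fetch_page: Callable[[int], List[str]], max_pages: int = 200
) -> List[str]:
    """Call fetch_page(1), fetch_page(2), ... until a page comes back empty."""
    cards: List[str] = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:  # an empty page is treated as the end of the results
            break
        cards.extend(batch)
    return cards
```

It could be called as `collect_all_cards(lambda page: fetch_cards(url, page))`. The `max_pages` cap is there because a site that ignores the parameter would return the same first page every time and the loop would otherwise never stop. If the listings are only reachable through JavaScript, a browser-automation tool or the site's underlying JSON request would be needed instead.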
Tags: python beautifulsoup python-requests