【Posted】: 2020-09-26 22:39:17
【Question】:
I'm building a web scraper. When I try to scrape one page of data, it keeps loading the same information for every entry.
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'https://www.realtor.com/realestateagents/phoenix_az'
#opening up connection, grabbing the page
uClient = uReq(my_url)
#read page
page_html = uClient.read()
#close page
uClient.close()
#html parsing
page_soup = soup(page_html, "html.parser")
#finds all realtors on page
containers = page_soup.findAll("div",{"class":"agent-list-card clearfix"})
for container in containers:
    name = page_soup.find('div', class_='agent-name text-bold')
    agent_name = name.text.strip()
    number = page_soup.find('div', class_='agent-phone hidden-xs hidden-xxs')
    agent_number = number.text.strip()
    print("name: " + agent_name)
    print("number: " + agent_number)
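【Comments】:
- What do you mean by "it keeps loading the same information"? If you scrape the same page, why would it load different information?
- Because you are searching page_soup, not container!
- Have you done any debugging? Please see How to Ask and the help center.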
Tags: python python-3.x web-scraping
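To illustrate the comment about searching page_soup instead of container: `page_soup.find(...)` always returns the first match in the whole document, so every loop iteration prints the first agent again. Below is a minimal sketch of the fix, which calls `find` on each `container` instead. The `SAMPLE_HTML` string and the `extract_agents` helper are hypothetical and only stand in for the downloaded page; the class names are copied from the question, and the real site's markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup reusing the class names from the question;
# the real page structure may differ.
SAMPLE_HTML = """
<div class="agent-list-card clearfix">
  <div class="agent-name text-bold"> Alice Example </div>
  <div class="agent-phone hidden-xs hidden-xxs"> (602) 555-0101 </div>
</div>
<div class="agent-list-card clearfix">
  <div class="agent-name text-bold"> Bob Example </div>
  <div class="agent-phone hidden-xs hidden-xxs"> (602) 555-0102 </div>
</div>
"""


def extract_agents(page_html):
    """Return a list of (name, number) tuples, one per agent card."""
    page_soup = BeautifulSoup(page_html, "html.parser")
    containers = page_soup.find_all("div", class_="agent-list-card clearfix")
    agents = []
    for container in containers:
        # Search inside the current card, not the whole page.
        name = container.find("div", class_="agent-name text-bold")
        number = container.find("div", class_="agent-phone hidden-xs hidden-xxs")
        # Skip cards missing either field instead of raising AttributeError.
        if name is None or number is None:
            continue
        agents.append((name.text.strip(), number.text.strip()))
    return agents


for agent_name, agent_number in extract_agents(SAMPLE_HTML):
    print("name: " + agent_name)
    print("number: " + agent_number)
```

With the original script, the same idea applies by passing `page_html` to the helper, or simply by replacing `page_soup.find` with `container.find` inside the loop. The `None` check is an added precaution, not something the question asked for.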