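【Posted】: 2017-08-15 04:08:45
【Question】:
I am trying to scrape this website: https://www.99acres.com
So far I have used BeautifulSoup to extract data from the site, but my code currently only retrieves the first page. Is there a way to access the other pages? When I click through to the next page the URL does not change, so I cannot simply iterate over a different URL each time.
Here is my code so far: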
import csv
import requests
from bs4 import BeautifulSoup

# Search-results URL for 3-bedroom residential listings in Hyderabad.
response = requests.get('https://www.99acres.com/search/property/buy/residential-all/hyderabad?search_type=QS&search_location=CP1&lstAcn=CP_R&lstAcnId=1&src=CLUSTER&preference=S&selected_tab=1&city=269&res_com=R&property_type=R&isvoicesearch=N&keyword_suggest=hyderabad%3B&bedroom_num=3&fullSelectedSuggestions=hyderabad&strEntityMap=W3sidHlwZSI6ImNpdHkifSx7IjEiOlsiaHlkZXJhYmFkIiwiQ0lUWV8yNjksIFBSRUZFUkVOQ0VfUywgUkVTQ09NX1IiXX1d&texttypedtillsuggestion=hy&refine_results=Y&Refine_Localities=Refine%20Localities&action=%2Fdo%2Fquicksearch%2Fsearch&suggestion=CITY_269%2C%20PREFERENCE_S%2C%20RESCOM_R&searchform=1&price_min=null&price_max=null')
html = response.text
soup = BeautifulSoup(html, 'html.parser')

rows = []  # one [title, description] pair per listing
dealer = soup.find_all('div', {'class': 'srpWrap'})
for item in dealer:
    # Listing title; left empty if the element is missing.
    try:
        p = item.contents[1].find_all("div", {"class": "_srpttl srpttl fwn wdthFix480 lf"})[0].text
    except Exception:
        p = ''
    # Listing description; left empty if the element is missing.
    try:
        d = item.contents[1].find_all("div", {"class": "lf f13 hm10 mb5"})[0].text
    except Exception:
        d = ''
    rows.append([p, d])

# newline='' lets the csv module control line endings; the with block closes the file.
with open('project.txt', 'w', encoding="utf-8", newline='') as file:
    writer = csv.writer(file)
    for row in rows:
        writer.writerow(row)  # writerow writes one row; writerows expects a list of rows
【Discussion】:
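When the URL stays the same from page to page, one common explanation is that the site loads each page of results through a background request, which can be inspected in the Network tab of the browser's developer tools. The sketch below assumes that is what happens here, which the question does not confirm. The `page` query parameter and the helper names `fetch_page`, `parse_rows`, and `iter_rows` are hypothetical, so substitute whatever the real request uses. The loop keeps requesting pages until one returns no listings.

```python
from typing import Callable, Iterator, List

# Base search URL from the question; the long query string is trimmed for brevity.
SEARCH_URL = "https://www.99acres.com/search/property/buy/residential-all/hyderabad"


def fetch_page(page: int) -> str:
    """Download the HTML for one results page.

    ASSUMPTION: the site accepts a `page` query parameter. Check the real
    request in the browser's Network tab and adjust the name and value.
    """
    import requests  # imported lazily so iter_rows can be tried without it

    response = requests.get(SEARCH_URL, params={"page": page}, timeout=30)
    response.raise_for_status()
    return response.text


def parse_rows(html: str) -> List[List[str]]:
    """Extract [title, description] pairs using the class names from the question."""
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("div", {"class": "srpWrap"}):
        title = item.find("div", {"class": "_srpttl srpttl fwn wdthFix480 lf"})
        desc = item.find("div", {"class": "lf f13 hm10 mb5"})
        rows.append([
            title.get_text(strip=True) if title else "",
            desc.get_text(strip=True) if desc else "",
        ])
    return rows


def iter_rows(
    fetch: Callable[[int], str] = fetch_page,
    parse: Callable[[str], List[List[str]]] = parse_rows,
    max_pages: int = 50,
) -> Iterator[List[str]]:
    """Yield rows page by page, stopping at the first page with no listings."""
    for page in range(1, max_pages + 1):
        rows = parse(fetch(page))
        if not rows:
            break
        yield from rows
```

Collecting `list(iter_rows())` would then give one combined list of rows to pass to `csv.writer`. The `max_pages` cap is only a safety limit in case the site never returns an empty page. If the Network tab shows no such request, the results are probably rendered by JavaScript, and a browser automation tool such as Selenium may be needed instead.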
Tags: python python-3.x web-scraping beautifulsoup