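[Posted]: 2017-01-20 16:51:03
[Question]:
I want to scrape some images from the Redfin website, but the findAll() method does not seem to find all of the image URLs whose parent element has the class ImageCard.
The code is as follows: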
from bs4 import BeautifulSoup
import urllib2

def make_soup(url):
    # Send a browser-like User-Agent header with the request.
    headers = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'}
    req = urllib2.Request(url, headers=headers)
    thepage = urllib2.urlopen(req).read()
    soupdata = BeautifulSoup(thepage, "html.parser")
    return soupdata

soup = make_soup("https://www.redfin.com/CA/San-Diego/5747-Adobe-Falls-Rd-92120/unit-A/home/5437025")
imgcards = soup.findAll('div', {'class': 'ImageCard'})
for imgcard in imgcards:
    # findAll() returns a list of tags, so loop over it rather than indexing it with 'src'.
    for img in imgcard.findAll('img'):
        print(img['src'])
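I want to download all the images in the slideshow on this web page.
The element tree is: [image: element tree of the web page]
I can only find the div for the first image of the slideshow. I hope someone can figure this out! Thanks!!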
[Discussion]:
- Side note: use find_all() instead of findAll(): crummy.com/software/BeautifulSoup/bs4/doc/#method-names
Tags: python-2.7 beautifulsoup findall
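As a minimal sketch of the side note above, here is the same lookup written with find_all(). The HTML string is a hypothetical stand-in (the real Redfin markup is not reproduced in this thread), and the file names are made up for illustration. It also shows that find_all() returns a list-like result, so each img tag is read individually.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; not the actual Redfin page.
html = """
<div class="ImageCard"><img src="a.jpg"></div>
<div class="ImageCard"><img src="b.jpg"></div>
<div class="Other"><img src="c.jpg"></div>
"""

soup = BeautifulSoup(html, "html.parser")

srcs = []
# class_ (with a trailing underscore) filters on the CSS class attribute.
for card in soup.find_all("div", class_="ImageCard"):
    # find_all() returns a list of tags, so read 'src' from each tag in turn.
    for img in card.find_all("img"):
        src = img.get("src")  # .get() returns None if the attribute is missing
        if src:
            srcs.append(src)

print(srcs)
```

This only changes the method name and how the results are read. It can only return elements that are actually present in the HTML that was downloaded.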