Python - Scraping images using requests (answers)

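【Question title】: Python - Scraping images using requests
【Posted】: 2018-07-04 09:56:07
【Question description】:

I am unable to save/download the images to the target location. The code looks correct to me, but I can't figure out what is wrong.

I am using the requests library to scrape the images.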

import os
import urllib
import urllib.request
from bs4 import BeautifulSoup
import requests
import re

from lxml.html import fromstring

r = requests.get("https://www.scoopwhoop.com/subreddit-nature/#.lce3tjfci")
data = r.text
soup = BeautifulSoup(data, "lxml")

title = fromstring(r.content).findtext('.//title')

#print(title)


newPath = r'C:\Users\Vicky\Desktop\ScrappedImages\ ' + title

for link in soup.find_all('img'):
    image = link.get('src')
    if 'http' in image:
        print(image)
        imageName = os.path.split(image)[1]
        print(imageName)

        r2 = requests.get(image)

        if not os.path.exists(newPath):
            os.makedirs(newPath)
            with open(imageName, "wb") as f:
                f.write(r2.content)

【Comments】:

What error are you getting, if any?
You must add an else to that if, because if the path already exists, the code does nothing.
Possible duplicate of "How to save an image locally using Python whose URL address I already know?"
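
The comment about the missing else can be sketched as follows. In the question's loop, the file is written only inside the `if not os.path.exists(newPath)` branch, and `open(imageName, ...)` does not include the target folder. The helper below is a minimal sketch of one way to restructure this; the name `save_image` and its parameters are hypothetical and do not come from the original post.

```python
import os
from urllib.parse import urlparse


def save_image(content: bytes, image_url: str, target_dir: str) -> str:
    """Write already-downloaded image bytes into target_dir; return the file path."""
    # exist_ok=True makes this a no-op when the folder already exists,
    # so no if/else around the directory check is needed.
    os.makedirs(target_dir, exist_ok=True)

    # Take the file name from the URL path only, ignoring any ?query part.
    image_name = os.path.basename(urlparse(image_url).path)

    # Join the folder and the name; opening image_name alone would write
    # into the current working directory instead of target_dir.
    file_path = os.path.join(target_dir, image_name)
    with open(file_path, "wb") as f:
        f.write(content)
    return file_path
```

Inside the question's loop this would be called as `save_image(r2.content, image, newPath)`. Note that a page title used as a folder name may contain characters that Windows does not allow in paths (such as `|` or `:`), so it may need cleaning before it is joined into `newPath`.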

Tags: python request python-requests

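【Solution 1】:

Try wrapping your r = requests.get("https://www.scoopwhoop.com/subreddit-nature/#.lce3tjfci") in a try: or while: statement to make sure the site you are scraping returns a 200 response. It may be that the site is timing out or cannot serve your request.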
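
One way to read this suggestion is a small retry loop around `requests.get`. The sketch below is an illustration under assumptions, not the answerer's code: the function name `fetch_page` and the `retries`, `timeout`, and `delay` parameters are hypothetical, and the default values are arbitrary.

```python
import time

import requests


def fetch_page(url: str, retries: int = 3, timeout: float = 10.0,
               delay: float = 1.0) -> requests.Response:
    """Return a response that passed raise_for_status(), retrying on failure."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            # timeout stops the call from hanging if the site never answers.
            response = requests.get(url, timeout=timeout)
            # Raises requests.HTTPError for 4xx/5xx status codes.
            response.raise_for_status()
            return response
        except requests.RequestException as exc:
            # Covers connection errors, timeouts, and HTTP error statuses.
            last_error = exc
            print(f"attempt {attempt} failed: {exc}")
            if attempt < retries:
                time.sleep(delay)
    raise RuntimeError(f"could not fetch {url} after {retries} attempts") from last_error
```

With this in place, the first request in the question could become `r = fetch_page("https://www.scoopwhoop.com/subreddit-nature/#.lce3tjfci")`, and a failed download would surface as an exception instead of passing silently.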

【Discussion】:

【Solution 2】:

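# Python 3 port: urlparse and urlretrieve now live in urllib.parse and urllib.request
from urllib.parse import urlparse
from urllib.request import urlretrieve

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.scoopwhoop.com/subreddit-nature/#.lce3tjfci")
data = r.text
soup = BeautifulSoup(data, "lxml")

for link in soup.find_all('img'):
    image = link.get('src')
    # Keep only URLs with a host part; skip <img> tags with a missing or relative src
    if image and urlparse(image).netloc:
        print(image)
        # Everything after the last "/" is used as the file name
        imageName = image[image.rfind("/")+1:]
        print(imageName)

        # Downloads the image into the current working directory
        urlretrieve(image, imageName)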

【Discussion】: