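【Question title】: Python - Scraping images using requests
【Posted】: 2018-07-04 09:56:07
【Question】:

I am unable to save/download the images to the target location. The code looks correct to me, but I cannot figure out what is wrong.

I am using the requests library to scrape the images.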

import os
import urllib
import urllib.request
from bs4 import BeautifulSoup
import requests
import re

from lxml.html import fromstring

r = requests.get("https://www.scoopwhoop.com/subreddit-nature/#.lce3tjfci")
data = r.text
soup = BeautifulSoup(data, "lxml")

title = fromstring(r.content).findtext('.//title')

#print(title)


newPath = r'C:\Users\Vicky\Desktop\ScrappedImages\ ' + title

for link in soup.find_all('img'):
    image = link.get('src')
    if 'http' in image:
        print(image)
        imageName = os.path.split(image)[1]
        print(imageName)

        r2 = requests.get(image)

        if not os.path.exists(newPath):
            os.makedirs(newPath)
            with open(imageName, "wb") as f:
                f.write(r2.content)

【Comments on the question】:

Tags: python request python-requests


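【Solution 1】:

Try wrapping your r = requests.get("https://www.scoopwhoop.com/subreddit-nature/#.lce3tjfci") call in a try: block inside a while loop, so you can confirm that the site you are scraping actually returns a 200 response. The site may be timing out or failing to serve your request.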
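The retry idea above could be sketched roughly as follows. This is only an illustration, not the answerer's code: the function name fetch_with_retry and its retries, delay, timeout and get parameters are hypothetical. The get hook defaults to requests.get and exists only so the function can be exercised without network access.

```python
import time


def fetch_with_retry(url, get=None, retries=3, delay=1.0, timeout=10):
    """Return the first response with status 200, or None if every attempt fails.

    All parameter names here are illustrative assumptions.
    """
    if get is None:
        import requests  # imported lazily so a stub can be passed in instead
        get = requests.get

    for attempt in range(1, retries + 1):
        try:
            response = get(url, timeout=timeout)
            if response.status_code == 200:
                return response
            print(f"attempt {attempt}: got status {response.status_code}")
        except Exception as exc:  # e.g. requests.exceptions.RequestException
            print(f"attempt {attempt}: request failed: {exc}")
        if attempt < retries:
            time.sleep(delay)  # wait before the next attempt
    return None
```

If fetch_with_retry returns None, the page never came back with a 200 status and there is nothing to parse, so the image loop can be skipped.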

【Comments】:

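【Solution 2】:
# Python 3: urlparse and urlretrieve live in urllib.parse and urllib.request.
from urllib.parse import urlparse
from urllib.request import urlretrieve

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.scoopwhoop.com/subreddit-nature/#.lce3tjfci")
data = r.text
soup = BeautifulSoup(data, "lxml")

for link in soup.find_all('img'):
    image = link.get('src')
    # Skip <img> tags without a src and relative URLs (no network location).
    if image and urlparse(image).netloc:
        print(image)
        # Use everything after the last "/" as the file name.
        imageName = image[image.rfind("/")+1:]
        print(imageName)

        # Download the image into the current working directory.
        urlretrieve(image, imageName)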
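Note that in the question's loop the with open(...) block is nested under if not os.path.exists(newPath), so it runs only when the folder is first created, and imageName is never joined with newPath. A minimal sketch of saving into a chosen folder might look like this. The helper name save_image and the "image" fallback file name are hypothetical, not part of either answer.

```python
import os
from urllib.parse import urlparse


def save_image(content, image_url, target_dir):
    """Write image bytes into target_dir, named after the last URL path segment."""
    # Create the folder if needed; exist_ok avoids an error when it already exists.
    os.makedirs(target_dir, exist_ok=True)
    # Take the file name from the URL path only, ignoring any query string.
    name = os.path.basename(urlparse(image_url).path) or "image"
    path = os.path.join(target_dir, name)
    with open(path, "wb") as f:
        f.write(content)
    return path
```

Inside the loop this would be called as save_image(r2.content, image, newPath) for every image URL, so each file lands in newPath whether or not the folder already existed.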
    

【Comments】:
