【Question title】: How to scrape high-resolution images from Google Images using BS4 in Python
【Posted】: 2021-07-18 19:21:23
【Question】:

We wrote a program that takes input through a tkinter GUI, goes to Google Images, and downloads images based on that input. The code is as follows:

import requests
import bs4
import random
from PIL import Image
from tkinter import messagebox as msgbox
i=0
import os
from tkinter import *
from tkinter import filedialog
ac=str(random.randint(1,20))
b=str(random.randint(20,38))
y=Tk()
def find_file():
    aaa=filedialog.askdirectory()
    return aaa
def create_folder():
    ad=find_file()
    global ac
    global b
    ads=os.path.join(ad,f"Img{ac}{b}")
    os.mkdir(ads)
    return ads


defe=Entry(bg="white")
defe.grid(row=2,column=2)
adj=Label(text="Enter the name of the photo(s) you want to download :")
adj.grid(row=2,column=1)
ack=Label(text="How many photos you want to download?")
ack.grid(row=3,column=1)
dee=Entry(bg="white")
dee.grid(row=3,column=2)



def download_images():
    defei=defe.get()
    deee=int(dee.get())
    aadgc=[]
    play=True
    if " gif" in defei or ".gif" in defei:
        msgbox.showerror("GIF not supported",".gif format is not supported by this software.Sorry for the inconvenience")
        play=False
    while play:   
        asd=create_folder()
        for start in range(0,400,20):
            bararara=f"https://www.google.co.in/search?q={defei}&source=lnms&tbm=isch&start={start}#imgrc=fTslNdnf0RRRxM"

            a=requests.get(bararara).text
            soup=bs4.BeautifulSoup(a,"lxml")
            ab=soup.find_all("img",{"class":"n3VNCb"},limit=deee)
            aadgc.extend(ab)  
        aa=[abb["src"] for abb in aadgc]
        for source in aa:
            r=random.randint(0,100) 
            ra=random.randint(0,1000) 
            
            raa=asd+"\\"+str(r)+str(ra)+".png"
            try:
                binary=requests.get(source).content
            except requests.exceptions.MissingSchema:
                binary=requests.get("http:"+source).content  
            except:
                binary=requests.get("https:"+source).content       
            # the with statement closes the file, so no explicit close() is needed
            with open(raa,"wb") as saaho:
                saaho.write(binary)
            global i
            i+=1
            if i==int(deee):
                break 
        
        asd=asd.replace("/","\\")
        os.system(f"explorer \"{asd}\"") 
        break  
aadg=Button(y,bg="red",text="Download!",command=lambda:download_images(),activebackground="dark red",activeforeground="grey")
aadg.grid(row=4,column=1)
y.mainloop()      

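However, we get thumbnails instead of the actual images: the program only returns low-resolution photos, and it does not support .gif images.

Also, we could not find the class that the main image belongs to. Thanks.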

【Question comments】:

    Tags: python selenium beautifulsoup


    【Solution 1】:

    Using Selenium:

    1. Click an image in the search results.

    2. Wait until the image is visible.

    3. image_link = driver.find_element_by_css_selector(".tvh9oe.BIB1wf .eHAdSb>img").get_attribute("src")

    The same locator can be used with bs4.
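
As a minimal sketch of how that locator might be used with bs4, the snippet below runs the same CSS selector against a small inline HTML string. Only the class names come from this answer; the markup, the example.com URL, and the helper name `extract_image_link` are hypothetical, and the real page structure may differ or change over time.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the class names come from the answer;
# the real Google Images page may be structured differently.
HTML = """
<div class="tvh9oe BIB1wf">
  <div class="eHAdSb">
    <img src="https://example.com/full-size.jpg" alt="preview">
  </div>
</div>
"""


def extract_image_link(html):
    """Return the src of the first image matching the locator, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # Same CSS selector as in the Selenium step
    img = soup.select_one(".tvh9oe.BIB1wf .eHAdSb>img")
    return img.get("src") if img is not None else None


image_link = extract_image_link(HTML)
print(image_link)
```

`select_one` returns `None` when nothing matches, so the helper checks for that before reading the `src` attribute.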

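    【Comments】:

    • Is there any other way to do this without Selenium?
    • Also, what should I write as the locator in bs4?
    • You can extract the attribute as suggested here: stackoverflow.com/questions/2612548/…
    • The main idea is the same: find a locator and extract its attribute.
    【Solution 2】: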

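    To find the original, full-resolution image, you first need to get the image's data-tbnid.

    In this example it is sd7iKvYzujke_M. Once you have the ID, you can use a regular expression to extract the full original image URL from the page source.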
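
The regex step might look roughly like the sketch below. It assumes, purely for illustration, that the page source contains the ID followed by two `[url, height, width]` triples, with the thumbnail first and the original second. That layout, the sample string, and the helper name `find_original_url` are assumptions rather than a documented format, so the pattern would need to be adapted to the source you actually receive.

```python
import re

# Hypothetical excerpt of page source. The layout (ID, thumbnail triple,
# original triple) is an assumption made for this sketch.
PAGE_SOURCE = (
    '[1,[0,"sd7iKvYzujke_M",'
    '["https://encrypted-tbn0.gstatic.com/images?q\\u003dtbn:EXAMPLE",183,275],'
    '["https://example.com/full/inception.jpg?w\\u003d1000",1000,667]]]'
)


def find_original_url(page_source, tbnid):
    """Return the URL in the second triple after tbnid, or None if absent."""
    pattern = (
        '"' + re.escape(tbnid) + '",'
        r'\["[^"]*",\d+,\d+\],'    # first triple: thumbnail (skipped)
        r'\["([^"]+)",\d+,\d+\]'   # second triple: assumed original image
    )
    match = re.search(pattern, page_source)
    if match is None:
        return None
    # Decode \uXXXX escapes such as \u003d ("=") that appear in embedded URLs
    return re.sub(
        r"\\u([0-9a-fA-F]{4})",
        lambda m: chr(int(m.group(1), 16)),
        match.group(1),
    )


original = find_original_url(PAGE_SOURCE, "sd7iKvYzujke_M")
print(original)
```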

    Alternatively, you can use a third-party solution such as SerpApi. It is a paid API with a free trial.

    from serpapi import GoogleSearch
    
    params = {
      "api_key": "secret_api_key",
      "engine": "google",
      "q": "inception",
      "tbm": "isch"
    }
    
    search = GoogleSearch(params)
    results = search.get_dict()
    

    Example JSON output:

    "images_results": [
      {
        "position": 1,
        "thumbnail": "https://serpapi.com/searches/60e70bf0e815af01fd163d6a/images/39eac787b1522b4ccc71382ac53fc933e15aa52342a5d06fafca53990897f2f9.jpeg",
        "source": "rottentomatoes.com",
        "title": "Inception (2010) - Rotten Tomatoes",
        "link": "https://www.rottentomatoes.com/m/inception",
        "original": "https://flxt.tmsimg.com/assets/p7825626_p_v10_af.jpg"
      },
      {
        "position": 2,
        "thumbnail": "https://serpapi.com/searches/60e70bf0e815af01fd163d6a/images/39eac787b1522b4ce9c6c6de6d244df2098ae0b7da6856fba545b4376e01d075.jpeg",
        "source": "imdb.com",
        "title": "Inception (2010) - IMDb",
        "link": "https://www.imdb.com/title/tt1375666/",
        "original": "https://m.media-amazon.com/images/M/MV5BMjAxMzY3NjcxNF5BMl5BanBnXkFtZTcwNTI5OTM0Mw@@._V1_.jpg"
      },
      {
        "position": 3,
        "thumbnail": "https://serpapi.com/searches/60e70bf0e815af01fd163d6a/images/39eac787b1522b4ca9d2321fc39291764c3894c9816af84a5af355f2d54f6921.jpeg",
        "source": "screenrant.com",
        "title": "Inception: What Each Character Represents (Confirmed By Christopher Nolan)",
        "link": "https://screenrant.com/inception-movie-christopher-nolan-characters-actors-meaning-confirmed/",
        "original": "https://static2.srcdn.com/wordpress/wp-content/uploads/2020/03/Inception-characters-film-crew.jpg?q=50&fit=crop&w=960&h=500&dpr=1.5"
      },
      ...
    ]
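
Given a response shaped like the JSON above, collecting the full-resolution links could be sketched as follows. The `results` dictionary here is a trimmed, hand-written stand-in for what `search.get_dict()` returned in this answer, the third entry without an `original` key is hypothetical, and `original_urls` is a hypothetical helper name.

```python
# Stand-in for the dictionary returned by search.get_dict(), trimmed to the
# fields used here. The last entry is a hypothetical result with no "original".
results = {
    "images_results": [
        {"position": 1,
         "original": "https://flxt.tmsimg.com/assets/p7825626_p_v10_af.jpg"},
        {"position": 2,
         "original": "https://m.media-amazon.com/images/M/MV5BMjAxMzY3NjcxNF5BMl5BanBnXkFtZTcwNTI5OTM0Mw@@._V1_.jpg"},
        {"position": 3},
    ]
}


def original_urls(results, limit=None):
    """Collect the "original" URL of each image result, skipping entries without one."""
    urls = [
        item["original"]
        for item in results.get("images_results", [])
        if "original" in item
    ]
    return urls if limit is None else urls[:limit]


for url in original_urls(results, limit=2):
    print(url)
```

Each URL can then be downloaded with `requests.get(url).content`, as in the question's code.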
    

    See the documentation for more details.

    Disclaimer: I work at SerpApi.

    【Comments】:
