Posted: 2021-10-03 18:12:13
Question:
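I wrote the following code to download the images for each product from a website scrape. I'm quite new to this and not sure how to stop it from creating a new folder for every product. Currently each call creates a new folder called whiteline_images inside the previous folder, which is also called whiteline_images. With 5 products that's easy to fix by hand; once I change it to 500+, not so much! I know where in the code it does this... I'm just not sure how to fix it. Any help is appreciated!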
import requests
from bs4 import BeautifulSoup
import os

def imagedown(url,folder):
    try:
        os.mkdir(os.path.join(os.getcwd(), folder))
    except:
        pass
    os.chdir(os.path.join(os.getcwd(), folder))
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    images = soup.findAll('img',{"src":True})
    for index, image in enumerate(images, start=1):
        if(image.get('src').startswith('https://imageapi.partsdb.com.au/api/Image')):
            link = (image.get('src'))
            name = f'{soup.find("div", {"class": "head2BR"}).text} ({index})'
            with open(name + '.jpg','wb') as f:
                im = requests.get(link)
                f.write(im.content)
            print('Writing:', name)

imagedown('https://www.whiteline.com.au/product_detail4.php?part_number=KBR15', 'whiteline_images')
imagedown('https://www.whiteline.com.au/product_detail4.php?part_number=W13374', 'whiteline_images')
imagedown('https://www.whiteline.com.au/product_detail4.php?part_number=BMR98', 'whiteline_images')
imagedown('https://www.whiteline.com.au/product_detail4.php?part_number=W51210', 'whiteline_images')
imagedown('https://www.whiteline.com.au/product_detail4.php?part_number=W51211', 'whiteline_images')
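The nesting comes from `os.chdir`: after the first call the working directory is already `whiteline_images`, so on the next call `os.path.join(os.getcwd(), folder)` points one level deeper and a new folder is created there. A minimal sketch of one way around it, assuming you just want every image in a single folder, is to build an absolute path and never change directory. `image_folder` is a hypothetical helper name, not part of the original code, and only the standard library is used here so the folder logic can be tried on its own.

```python
import os
import tempfile


def image_folder(folder, base=None):
    """Return the absolute path of `folder` under `base`, creating it if needed.

    Hypothetical helper. It never calls os.chdir, so the working directory
    does not move, and a second call for the same folder resolves to the
    same place instead of a nested copy.
    """
    base = base if base is not None else os.getcwd()
    path = os.path.abspath(os.path.join(base, folder))
    os.makedirs(path, exist_ok=True)  # no error if the folder already exists
    return path


if __name__ == "__main__":
    # Demo in a throwaway directory: two calls give the same folder.
    with tempfile.TemporaryDirectory() as tmp:
        first = image_folder("whiteline_images", base=tmp)
        second = image_folder("whiteline_images", base=tmp)
        print(first == second)  # True
        print(os.path.join(first, "KBR15 (1).jpg"))
```

To use it in `imagedown`, the `try`/`os.mkdir`/`os.chdir` lines could be replaced by `folder_path = image_folder(folder)`, and the file opened with `open(os.path.join(folder_path, name + '.jpg'), 'wb')`. Because the path is absolute and the working directory stays put, all 500+ calls write into the same `whiteline_images` folder.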
Comments:
Tags: python image web-scraping directory