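[Posted]: 2023-03-12 10:23:01
[Question]:
My program scrapes images from websites I built. I have 3 functions, each of which takes a search term as a parameter and scrapes images from one specific website. Here is the code: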
import requests
from bs4 import BeautifulSoup
from multiprocessing import Process

img1 = []
img2 = []
img3 = []

def my_func1(img_search):
    del img1[:]  # clear results from any previous call
    url1 = "http://www.somewebsite.com/" + str(img_search)
    r1 = requests.get(url1)
    soup1 = BeautifulSoup(r1.content, "html.parser")
    data1 = soup1.find_all("div", {"class": "img"})
    for item in data1:
        try:
            img1.append(item.contents[0].find('img')['src'])
        except (IndexError, TypeError, KeyError):  # no child, no <img>, or no src
            img1.append("img Unknown")

def my_func2(img_search):
    del img2[:]
    url2 = "http://www.somewebsite2.com/" + str(img_search)
    r2 = requests.get(url2)
    soup2 = BeautifulSoup(r2.content, "html.parser")
    data2 = soup2.find_all("div", {"class": "img"})
    for item in data2:
        try:
            img2.append(item.contents[0].find('img')['src'])
        except (IndexError, TypeError, KeyError):
            img2.append("img Unknown")

def my_func3(img_search):
    del img3[:]
    url3 = "http://www.somewebsite3.com/" + str(img_search)
    r3 = requests.get(url3)
    soup3 = BeautifulSoup(r3.content, "html.parser")
    data3 = soup3.find_all("div", {"class": "img"})
    for item in data3:
        try:
            img3.append(item.contents[0].find('img')['src'])
        except (IndexError, TypeError, KeyError):
            img3.append("img Unknown")

my_func1("orange cat")
my_func2("blue cat")
my_func3("green cat")

print(*img1, sep='\n')
print(*img2, sep='\n')
print(*img3, sep='\n')
The scraping works fine but is slow, so I switched to multiprocessing, and it did make things faster. I replaced the three function calls with this:
p = Process(target=my_func1, args=("orange cat",))
p.start()
p2 = Process(target=my_func2, args=("blue cat",))
p2.start()
p3 = Process(target=my_func3, args=("green cat",))
p3.start()
p.join()
p2.join()
p3.join()
However, when I print the img1, img2 and img3 lists afterwards, they are empty. How can I fix this?
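A likely cause (the question does not confirm it): each `Process` runs with its own memory, so `img1.append(...)` inside a child fills the child's copy of the list, and the parent's `img1`, `img2` and `img3` are never modified. The minimal sketch below reproduces that with the standard library only, then shows one common alternative: each worker *returns* its list and the parent collects the results with `multiprocessing.Pool.starmap`. The names `scrape`, `demo_global` and `demo_return_values` are hypothetical, and `scrape` fakes its results instead of calling `requests` and BeautifulSoup.

```python
import multiprocessing as mp

# Module-level list, standing in for img1/img2/img3 in the question.
shared = []


def append_to_global(term):
    # Runs in the child process: it appends to the child's own copy of
    # `shared`, so the parent's list is never touched.
    shared.append(term)


def scrape(base_url, term):
    # Hypothetical stand-in for my_func1/2/3: builds a local list and
    # returns it instead of mutating a global. A real version would call
    # requests.get() and BeautifulSoup here.
    return [f"{base_url}{term}/{i}.jpg" for i in range(2)]


def demo_global():
    p = mp.Process(target=append_to_global, args=("orange cat",))
    p.start()
    p.join()
    return list(shared)  # still empty in the parent


def demo_return_values(jobs):
    # starmap calls scrape(*job) in worker processes, pickles each return
    # value back to the parent, and keeps the order of `jobs`.
    with mp.Pool(processes=len(jobs)) as pool:
        return pool.starmap(scrape, jobs)


if __name__ == "__main__":
    print(demo_global())  # prints []
    jobs = [
        ("http://www.somewebsite.com/", "orange cat"),
        ("http://www.somewebsite2.com/", "blue cat"),
        ("http://www.somewebsite3.com/", "green cat"),
    ]
    img1, img2, img3 = demo_return_values(jobs)
    print(*img1, sep="\n")
```

Both demos sit behind `if __name__ == "__main__":`, which is required when processes are started with the spawn method (the default on Windows and macOS). Other options include a `multiprocessing.Manager().list()` passed to each worker, or, since the work mostly waits on HTTP responses, a thread pool such as `concurrent.futures.ThreadPoolExecutor`, where threads share the module-level lists.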
[Discussion]:
Tags: python python-3.x beautifulsoup multiprocessing