【发布时间】:2021-12-26 17:54:44
【问题描述】:
这是一个类似的问题Why does python multiprocessing script slow down after a while?
使用池的代码示例:
from multiprocessing import Pool
Pool(processes=6).map(some_func, array)
经过几次迭代后,程序变慢了,最后变得比没有多处理时还要慢。也许问题在于与 Selenium 相关的功能? 这是完整的代码:
# libraries
import os
from time import sleep
from bs4 import BeautifulSoup
from selenium import webdriver
from multiprocessing import Pool
# Необходимые переменные
url = "https://eldorado.ua/"
directory = os.path.dirname(os.path.realpath(__file__))
env_path = directory + "\chromedriver"
chromedriver_path = env_path + "\chromedriver.exe"
dict1 = {"Смартфоны и телефоны": "https://eldorado.ua/node/c1038944/",
"Телевизоры и аудиотехника": "https://eldorado.ua/node/c1038957/",
"Ноутбуки, ПК и Планшеты": "https://eldorado.ua/node/c1038958/",
"Техника для кухни": "https://eldorado.ua/node/c1088594/",
"Техника для дома": "https://eldorado.ua/node/c1088603/",
"Игровая зона": "https://eldorado.ua/node/c1285101/",
"Гаджеты и аксесуары": "https://eldorado.ua/node/c1215257/",
"Посуда": "https://eldorado.ua/node/c1039055/",
"Фото и видео": "https://eldorado.ua/node/c1038960/",
"Красота и здоровье": "https://eldorado.ua/node/c1178596/",
"Авто и инструменты": "https://eldorado.ua/node/c1284654/",
"Спорт и туризм": "https://eldorado.ua/node/c1218544/",
"Товары для дома и сада": "https://eldorado.ua/node/c1285161/",
"Товары для детей": "https://eldorado.ua/node/c1085100/"}
def openChrome_headless(url1, name):
options = webdriver.ChromeOptions()
options.headless = True
options.add_experimental_option("excludeSwitches", ['enable-automation'])
options.add_argument(
'--user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36"')
driver = webdriver.Chrome(executable_path=chromedriver_path, options=options)
driver.get(url=url1)
sleep(1)
try:
with open(name + ".html", "w", encoding="utf-8") as file:
file.write(driver.page_source)
except Exception as ex:
print(ex)
finally:
driver.close()
driver.quit()
def processing_goods_pages(name):
for n in os.listdir(f"brand_pages\\{name}"):
with open(f"{directory}\\brand_pages\\{name}\\{n}", encoding="utf-8") as file:
soup = BeautifulSoup(file.read(), "lxml")
if not os.path.exists(f"{directory}\\goods_pages\\{name}\\{n[:-5]}"):
if not os.path.exists(f"{directory}\\goods_pages\\{name}"):
os.mkdir(f"{directory}\\goods_pages\\{name}")
os.mkdir(f"{directory}\\goods_pages\\{name}\\{n[:-5]}")
links = soup.find_all("header", class_="good-description")
for li in links:
ref = url + li.find('a').get('href')
print(li.text)
openChrome_headless(ref, f"{directory}\\goods_pages\\{name}\\{n[:-5]}\\{li.text}")
if __name__ == "__main__":
ar2 = []
for k, v in dict1.items():
ar2.append(k)
Pool(processes=6).map(processing_goods_pages, ar2)
【问题讨论】:
-
网站可能会通过 IP 限制您的连接?也可能与多次打开和关闭chrome有关?我可能会尝试为多个请求保持相同的
driver。在我看来,代码本身没有任何问题。
标签: python selenium selenium-webdriver multiprocessing