- Rotate proxies
- Add delays between requests
- Avoid repeating the same request pattern
- IP rate limiting (probably your issue)
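IP rate limiting is a basic security measure that can ban or block incoming requests from the same IP. The idea is that a regular user would not make 100 requests to the same domain within a few seconds, all following exactly the same pattern (scroll, click, scroll, click, open, for example).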
See also: How to reduce the chance of being blocked while web scraping search engines.
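The delay and proxy-rotation points from the list above can be sketched roughly as follows. This is only an illustration: `polite_schedule`, `crawl`, the `fetch` callback and the 2 to 6 second delay range are all hypothetical names and values that I picked, not something Google documents.

```python
import itertools
import random
import time


def polite_schedule(urls, proxies, min_delay=2.0, max_delay=6.0, seed=None):
    """Pair each URL with a proxy (round-robin) and a randomized delay in seconds."""
    rng = random.Random(seed)
    proxy_pool = itertools.cycle(proxies)  # rotate through the proxies endlessly
    for url in urls:
        yield url, next(proxy_pool), rng.uniform(min_delay, max_delay)


def crawl(urls, proxies, fetch, sleep=time.sleep, min_delay=2.0, max_delay=6.0):
    """Call fetch(url, proxy) for every URL, pausing a random interval after each call.

    `fetch` is whatever function actually performs the request (an assumption:
    it takes the URL and the proxy to use, and returns the response).
    """
    results = []
    for url, proxy, delay in polite_schedule(urls, proxies, min_delay, max_delay):
        results.append(fetch(url, proxy))
        sleep(delay)  # irregular pauses avoid a fixed, machine-like rhythm
    return results
```

Passing `fetch` and `sleep` in as arguments keeps the sketch independent of any particular HTTP library and makes it easy to try out without sending real requests.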
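Alternatively, you can use the Google Shopping Results API from SerpApi. It's a paid API with a free plan.
The difference in your case is that you don't have to spend time figuring out how to bypass blocks from Google, because that's already done for the end user.
Example code to parse data from Google Shopping, and an example in the online IDE: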
import os
from serpapi import GoogleSearch
params = {
"api_key": os.getenv("API_KEY"),
"engine": "google_product",
"product_id": "14506091995175728218", # can be iterated over multiple product ids
"gl": "us", # country to search from
"hl": "en" # language
}
search = GoogleSearch(params)
results = search.get_dict()
title = results['product_results']['title']
prices = results['product_results']['prices']
reviews = results['product_results']['reviews']
rating = results['product_results']['rating']
extensions = results['product_results']['extensions']
description = results['product_results']['description']
user_reviews = results['product_results']['reviews'] # same field as `reviews` above, so the count is printed twice
reviews_results = results['reviews_results']['ratings']
print(f'{title}\n'
f'{prices}\n'
f'{reviews}\n'
f'{rating}\n'
f'{extensions}\n'
f'{description}\n'
f'{user_reviews}\n'
f'{reviews_results}')
'''
Google Pixel 4 White 64 GB, Unlocked
['$247.79', '$245.00', '$439.00']
526
3.7
['October 2019', 'Google', 'Pixel Family', 'Pixel 4', 'Android', '5.7″', 'Facial Recognition', '8 MP front camera', 'Smartphone', 'With Wireless Charging']
Point and shoot for the perfect photo. Capture brilliant color and control the exposure balance of different parts of your photos. Get the shot without the flash. Night Sight is now faster and easier to use it can even take photos of the Milky Way. Get more done with your voice. The new Google Assistant is the easiest way to send texts, share photos, and more. A new way to control your phone. Quick Gestures let you skip songs and silence calls – just by waving your hand above the screen. End the robocalls. With Call Screen, the Google Assistant helps you proactively filter our spam before your phone ever rings.
526
[{'stars': 1, 'amount': 101}, {'stars': 2, 'amount': 43}, {'stars': 3, 'amount': 39}, {'stars': 4, 'amount': 73}, {'stars': 5, 'amount': 270}]
'''
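As a quick sanity check, the star breakdown printed last in the output above is enough to recompute the review count and the average rating. A minimal sketch (`summarize_ratings` is a hypothetical helper, not part of the SerpApi library):

```python
def summarize_ratings(ratings):
    """Return (total_reviews, average_stars) from a list of {'stars', 'amount'} dicts."""
    total = sum(r["amount"] for r in ratings)
    if total == 0:
        return 0, None  # no reviews, so no meaningful average
    weighted = sum(r["stars"] * r["amount"] for r in ratings)
    return total, round(weighted / total, 1)


# the breakdown copied from the output above
ratings = [
    {"stars": 1, "amount": 101},
    {"stars": 2, "amount": 43},
    {"stars": 3, "amount": 39},
    {"stars": 4, "amount": 73},
    {"stars": 5, "amount": 270},
]

total, average = summarize_ratings(ratings)
print(total, average)  # 526 3.7
```

Both numbers agree with the `reviews` and `rating` fields in the same output.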
Example of iterating over multiple product IDs:
# import os
# from serpapi import GoogleSearch
# random numbers except the first one
products = ['14506091995175728218', '1450609199517512118', '145129895175728218']
for product in products:
    params = {
        "api_key": os.getenv("API_KEY"),
        "engine": "google_product",
        "product_id": product,
        "gl": "us",
        "hl": "en"
    }
    search = GoogleSearch(params)
    results = search.get_dict()
    title = results['product_results']['title']
    print(title, sep='\n') # prints 3 titles from 3 different products
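Since two of the IDs in the loop above are random, a lookup like `results['product_results']['title']` may raise a `KeyError` when a product is not found. A hedged sketch of a more defensive lookup: `extract_title` is a hypothetical helper, and reading the failure reason from an `"error"` key is my assumption about the response shape, so check it against a real response.

```python
def extract_title(results):
    """Return (title, error) from a response dict; exactly one of the two is None."""
    product = results.get("product_results")
    if not product or "title" not in product:
        # assumption: a failed lookup carries its reason under an "error" key
        return None, results.get("error", "no product_results in response")
    return product["title"], None


# stand-in responses, no network call needed to try the helper
found = {"product_results": {"title": "Google Pixel 4 White 64 GB, Unlocked"}}
missing = {"error": "product not found"}

print(extract_title(found))    # ('Google Pixel 4 White 64 GB, Unlocked', None)
print(extract_title(missing))  # (None, 'product not found')
```

Inside the loop you would call `extract_title(results)` instead of indexing directly, and skip or log the product whenever the title comes back as `None`.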
Disclaimer: I work for SerpApi.