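This question is quite old, but it may still be relevant. Use SelectorGadget to grab CSS selectors easily. Make sure you are using a proxy; otherwise Google may block your requests, even if you send them through selenium.
Code and full example in the online IDE: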
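from bs4 import BeautifulSoup
import requests, lxml, os

# Browser-like User-Agent so the request does not look like a default script
headers = {
    'User-agent':
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

# requests picks the proxy by URL scheme, and the Scholar URL is https,
# so an 'https' entry is needed too (HTTPS_PROXY is an assumed variable name)
proxies = {
    'http': os.getenv('HTTP_PROXY'),
    'https': os.getenv('HTTPS_PROXY')
}

html = requests.get('https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=samsung&oq=', headers=headers, proxies=proxies).text
soup = BeautifulSoup(html, 'lxml')

# Each result is selected via .gs_ri; its snippet text is in .gs_rs
for result in soup.select('.gs_ri'):
    snippet_tag = result.select_one('.gs_rs')
    if snippet_tag is None:  # guard in case a result has no snippet element
        continue
    print(f"Snippet: {snippet_tag.text}")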
Partial output:
Snippet: Purpose–Extensive research has shown that country‐of‐origin (COO) information significantly affects product evaluations and buying behavior. Yet recently, a competing perspective has emerged suggesting that COO effects have been inflated in prior research …
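Since the advice above depends on the proxy actually being applied, here is a minimal sketch of building the `proxies` mapping so that it covers both URL schemes. The helper name `build_proxies` and the `HTTPS_PROXY` variable are assumptions for illustration, not part of the original script; the only behavior relied on is that `requests` matches entries in the `proxies` argument against the scheme of the requested URL.

```python
import os


def build_proxies(env=None):
    """Return a requests-style proxies mapping from environment variables.

    build_proxies is a hypothetical helper; HTTP_PROXY / HTTPS_PROXY are
    assumed variable names. Unset variables are left out of the mapping.
    """
    env = os.environ if env is None else env
    proxies = {}
    for scheme, var in (("http", "HTTP_PROXY"), ("https", "HTTPS_PROXY")):
        value = env.get(var)
        if value:
            proxies[scheme] = value
    return proxies


if __name__ == "__main__":
    # Example with an explicit mapping instead of the real environment
    print(build_proxies({"HTTPS_PROXY": "http://127.0.0.1:8080"}))
```

The result can be passed as `proxies=build_proxies()` to `requests.get`. Leaving unset variables out keeps the mapping limited to proxies that are actually configured.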
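Alternatively, you can use the Google Scholar Organic Search Results API from SerpApi. It's a paid API with a free trial of 5,000 searches.
Essentially, it does the same thing as the script above, except that you don't need to figure out how to solve CAPTCHAs or find a good proxy.
Code to integrate: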
from serpapi import GoogleSearch
import os

# API key is read from the environment; "q" is the search query
params = {
    "api_key": os.getenv("API_KEY"),
    "engine": "google_scholar",
    "q": "samsung",
}

search = GoogleSearch(params)
results = search.get_dict()

# Print the snippet of every organic result
for result in results['organic_results']:
    print(f"Snippet: {result['snippet']}")
Partial output:
Snippet: Purpose–Extensive research has shown that country‐of‐origin (COO) information significantly affects product evaluations and buying behavior. Yet recently, a competing perspective has emerged suggesting that COO effects have been inflated in prior research …
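The loop above indexes `results['organic_results']` and `result['snippet']` directly, which raises `KeyError` if either key is absent. Below is a small sketch of a more defensive version. It assumes the response is a plain dict shaped like the one used above; `extract_snippets` is a hypothetical helper and the `sample` data is made up for illustration, not real API output.

```python
def extract_snippets(results):
    """Collect snippet strings, skipping results that lack one.

    extract_snippets is a hypothetical helper; it assumes `results` is a
    dict with an optional "organic_results" list of dicts.
    """
    snippets = []
    for result in results.get("organic_results", []):
        snippet = result.get("snippet")
        if snippet:
            snippets.append(snippet)
    return snippets


# Made-up sample standing in for search.get_dict()
sample = {"organic_results": [{"title": "A", "snippet": "First"}, {"title": "B"}]}
for snippet in extract_snippets(sample):
    print(f"Snippet: {snippet}")
```

With real data, the call would be `extract_snippets(search.get_dict())`.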
Disclaimer: I work for SerpApi.