您需要通过代理服务器、VPN 路由您的请求,或者您需要在位于以色列的机器上执行您的代码。
话虽如此,以下工作(截至撰写本文时):
import pprint
from bs4 import BeautifulSoup
import requests
def make_proxy_entry(proxy_ip_port):
val = f"http://{proxy_ip_port}"
return dict(http=val, https=val)
headers = {
"User-Agent": (
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 '
'(KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36')
}
bookdepo_url = (
'https://www.bookdepository.com/search?search=Find+book&searchTerm='
'0671646788'
)
ip_opts = ['82.166.105.66:44081', '82.81.32.165:3128', '82.81.169.142:80',
'81.218.45.159:8080', '82.166.105.66:43926', '82.166.105.66:58774',
'31.154.189.206:8080', '31.154.189.224:8080', '31.154.189.211:8080',
'213.8.208.233:8080', '81.218.45.231:8888', '192.116.48.186:3128',
'185.138.170.204:8080', '213.151.40.43:8080', '81.218.45.141:8080']
search_result = None
for ip_port in ip_opts:
proxy_entry = make_proxy_entry(ip_port)
try:
search_result = requests.get(bookdepo_url, headers=headers,
proxies=proxy_entry)
pprint.pprint('Successfully gathered results')
break
except Exception as e:
pprint.pprint(f'Failed to connect to endpoint, with proxy {ip_port}.\n'
f'Details: {pprint.saferepr(e)}')
else:
pprint.pprint('Never made successful connection to end-point!')
search_result = None
if search_result:
soup = BeautifulSoup(search_result.text, 'html.parser')
result_divs = soup.find_all("div", class_= "book-item")
pprint.pprint(result_divs)
此解决方案利用请求库的proxies 参数。我从众多免费代理列表站点之一中抓取了一个代理列表:http://spys.one/free-proxy-list/IL/
代理 IP 地址和端口列表是使用以下 JavaScript sn-p 创建的,以通过浏览器的开发工具从页面上刮取数据:
console.log(
"['" +
Array.from(document.querySelectorAll('td>font.spy14'))
.map(e=>e.parentElement)
.filter(e=>e.offsetParent !== null)
.filter(e=>window.getComputedStyle(e).display !== 'none')
.filter(e=>e.innerText.match(/\s*(\d{1,3}\.){3}\d{1,3}\s*:\s*\d+\s*/))
.map(e=>e.innerText)
.join("', '") +
"']"
)
注意:是的,JavaScript 丑陋而粗俗,但它完成了工作。
在 Python 脚本执行结束时,我确实看到最终货币根据需要解析为以色列新谢克尔 (ILS),基于生成的 HTML 中的以下元素:
<a ... data-currency="ILS" data-isbn="9780671646783" data-price="57.26" ...>