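【Posted】: 2018-02-08 21:19:23
【Question】:
I have just started learning Python, and I know I am no expert. My area of expertise is VBA.
When it comes to pulling data from the web with VBA, WinHttp is the best approach. With VBA, however, you are limited to a single thread. One way around this is to use VBScript. The VBScript route would be the easiest, since VBA and VBS are nearly identical, but from what I have read, Python seems to be the best language for scraping.
I have provided two examples below, one written in VBA (works) and one written in Python (does not work).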
VBA
Dim postData As String
Dim myHttp2 As WinHttp.WinHttpRequest
'postData is the form data sent in the request body that contains a users login credentials
postData = "SMNC-ISO&LOC=US&target=HTTPS%3AIntranetSite.net&postpreservationdata=&USER=usr123&PASSWORD=pwd123"
set myHttp2 = CreateObject("winHTTP.WinHTTPrequest.5.1")
myHttp2.Open "GET", "https://login.someintranetsite.net", true
myHttp2.setRequestHeader "Request", "GET /abcd123/4567 HTTP/1.1"
myHttp2.setRequestHeader "Accept", "stuff"
myHttp2.setRequestHeader "Accept-Language", "en-US"
myHttp2.setRequestHeader "User-Agent", "stuff"
myHttp2.setRequestHeader "Accept-Encoding", "stuff"
myHttp2.setRequestHeader "Host", "login.someintranetsite.net"
myHttp2.send postData
myHttp2.WaitForResponse
Debug.Print myHttp2.responseText
'obtains the session cookie needed for other requests not shown here
cookie = myHttp2.getResponseHeader("Set-Cookie")
The above returns a valid response from the server... but with Python...
Python
from bs4 import BeautifulSoup
import requests
payload = 'SMNC-ISO&LOC=US&target=HTTPS%3AIntranetSite.net&postpreservationdata=&USER=usr123&PASSWORD=pwd123'
headers = {'Request': 'GET /abcd123/4567 HTTP/1.1',
'Accept' : 'stuff',
'Accept-Language': 'en-US',
'Connection': 'stuff',
'Host': 'someintranetsite.net',
'User-Agent': 'stuff',
'Accept-Encoding': 'stuff'
}
result = requests.get(url="https://login.someintranetsite.net", headers=headers, data=payload)
print result.content
When I try to run the Python above, I get the following:
File "C:\Program Files (x86)\Anaconda\lib\site-packages\requests\api.py", line 55, in get
return request('get', url, **kwargs)
File "C:\Program Files (x86)\Anaconda\lib\site-packages\requests\api.py", line 44, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Program Files (x86)\Anaconda\lib\site-packages\requests\sessions.py", line 456, in request
resp = self.send(prep, **send_kwargs)
File "C:\Program Files (x86)\Anaconda\lib\site-packages\requests\sessions.py", line 559, in send
r = adapter.send(request, **kwargs)
File "C:\Program Files (x86)\Anaconda\lib\site-packages\requests\adapters.py", line 378, in send
raise ProxyError(e)
ProxyError: ('Cannot connect to proxy.', error(10061, 'No connection could be made because the target machine actively refused it'))
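I have searched and searched for a solution and tried several different approaches, all without success. What am I doing wrong? (By the way, the above was run in Spyder/IPython.)
Bonus question: how do I get the session cookie in Python? Any help is much appreciated. Thanks!
【Comments】: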
-
Disable the proxy and try again. Something like: your_proxies = { "http": None, "https": None, }; result = requests.get(url='test.com', proxies=your_proxies, headers=headers, data=payload);
-
You can find an answer on how to get the session cookie here: Python Requests Cookies
-
That worked! Thanks!
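For anyone landing here later, the two suggestions in the comments can be combined into one rough sketch. It assumes the ProxyError comes from proxy settings that requests picks up from the environment, which is what the accepted comment suggests. The helper names (make_direct_session, effective_proxies, fetch_with_cookies) are made up for illustration and are not part of requests. Instead of passing a proxies dict on every call, this version sets trust_env to False on a Session, which is a different way of reaching the same result.

```python
import requests


def make_direct_session():
    """Return a Session that ignores proxy settings from the environment."""
    session = requests.Session()
    # With trust_env off, requests skips HTTP_PROXY/HTTPS_PROXY variables and
    # system proxy settings, along with other environment-derived settings
    # such as .netrc credentials.
    session.trust_env = False
    return session


def effective_proxies(session, url):
    """Return the proxies requests would apply to `url` for this session."""
    settings = session.merge_environment_settings(url, {}, None, None, None)
    return dict(settings["proxies"])


def fetch_with_cookies(session, url, payload, headers=None):
    """Send the request from the question and return (response, cookies)."""
    response = session.get(url, data=payload, headers=headers, timeout=30)
    response.raise_for_status()
    # The session's cookie jar collects the cookies set by the server, so
    # later requests made through the same session send them back.
    return response, session.cookies.get_dict()


# Hypothetical usage with the payload and headers from the question:
# session = make_direct_session()
# response, cookies = fetch_with_cookies(
#     session, "https://login.someintranetsite.net", payload, headers)
```

Calling effective_proxies(session, url) before sending anything is a quick way to check whether a proxy would still be applied. The cookies come back as a plain dict of names to values, and reusing the same session for follow-up requests carries them along without copying the Set-Cookie header by hand as in the VBA version.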
Tags: python web-scraping python-requests httprequest