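【Posted】: 2017-06-30 06:42:20
【Question】:
I'm trying to automate Google searches, but unfortunately my IP got blocked. After some searching, it seems that Tor can give me a new IP dynamically. However, after adding the following code block to my existing code, Google still blocks my attempts, even under the new IP. Is there something wrong with my code?
Code (based on this):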
from TorCtl import TorCtl
import socks
import socket
import urllib2

socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "127.0.0.1", 9050)
__originalSocket = socket.socket

def newId():
    ''' Clean circuit switcher
    Restores socket to original system value.
    Calls TOR control socket and closes it
    Replaces system socket with socksified socket
    '''
    socket.socket = __originalSocket
    conn = TorCtl.connect(controlAddr="127.0.0.1", controlPort=9051, passphrase="mypassword")
    TorCtl.Connection.send_signal(conn, "NEWNYM")
    conn.close()
    socket.socket = socks.socksocket

## generate a new ip
newId()
### verify the new ip
print(urllib2.urlopen("http://icanhazip.com/").read())
## run my scrape code
google_scrape()
New error message:
<br>Sometimes you may be asked to solve the CAPTCHA if you are using advanced terms that robots are known to use, or sending requests very quickly.
</div>
IP address: 89.234.XX.25X<br>Time: 2017-02-12T05:02:53Z<br>
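When debugging a setup like the one above, it can help to confirm that Tor actually accepted the NEWNYM request rather than assuming it did. The following is a minimal Python 3 sketch that uses only the standard library and speaks the Tor control protocol directly instead of going through TorCtl. The function names `build_newnym_commands`, `reply_ok`, and `request_new_identity` are hypothetical, and the host, port, and password are taken from the question rather than from any verified configuration.

```python
import socket


def build_newnym_commands(password):
    """Return the control-port lines that authenticate and request a new circuit."""
    # Quoted strings in the control protocol escape backslashes and double quotes.
    escaped = password.replace("\\", "\\\\").replace('"', '\\"')
    return [
        'AUTHENTICATE "{}"\r\n'.format(escaped).encode("ascii"),
        b"SIGNAL NEWNYM\r\n",
    ]


def reply_ok(reply):
    """Return True if a control-port reply line starts with status code 250."""
    return reply.startswith(b"250")


def request_new_identity(password, host="127.0.0.1", port=9051, timeout=10.0):
    """Send AUTHENTICATE and SIGNAL NEWNYM; raise if Tor rejects either one."""
    # Assumption: socket.socket has not been replaced by socks.socksocket yet,
    # otherwise this connection would itself be routed through the SOCKS proxy.
    with socket.create_connection((host, port), timeout=timeout) as conn:
        reader = conn.makefile("rb")
        for command in build_newnym_commands(password):
            conn.sendall(command)
            reply = reader.readline()
            if not reply_ok(reply):
                raise RuntimeError("Tor control port replied: {!r}".format(reply))
```

Calling `request_new_identity("mypassword")` before the socket is socksified would raise if authentication or the signal fails. Note that a `250` reply only means Tor accepted the signal: Tor may rate-limit NEWNYM, and a new circuit is not guaranteed to use a different exit IP, so the icanhazip check in the question is still worth keeping.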
【Comments】:
Tags: python sockets tor data-extraction