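Your approach is missing a lot of the necessary data and steps. When I first looked at the page I saw that it uses a lot of JavaScript, but after monitoring the requests I saw that you can actually get the result with requests alone. First we need to post to:
http://www.reversephonelookup.com/results.php, with the correct post data (the phone, image.x and image.y fields used in the code below):
Once that is done, we need to make a GET request to http://www.reversephonelookup.com/number/the_number:
So, putting it all together: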
import requests
from lxml import html


def Phone_Checker(number):
    head = {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"}
    url = 'http://www.reversephonelookup.com/results.php'
    data = {"phone": number, "image.x": "26", "image.y": "37"}
    # Use one session so the POST and the GET share cookies.
    with requests.Session() as s:
        # Step 1: submit the lookup form.
        s.post(url, data=data, headers=head)
        # Step 2: fetch the results page for that number.
        r = s.get("http://www.reversephonelookup.com/number/{}/".format(number), headers=head)
        tree = html.fromstring(r.content)
        # Collect every text node inside the results fieldset.
        Service_type = tree.xpath('//*[@id="content"]//fieldset//text()')
        return "wireless" in Service_type


Phone_Checker("2068675309")
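return Service_type and "wireless" in Service_type only returns True when "wireless" is one of the strings in the list; it compares against whole list elements, it is not a substring search. I also adjusted your xpath to get all of the text.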
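To make the membership point concrete, here is a minimal sketch. The helper name has_service_type and the sample list are made up for illustration; the real text nodes returned by the site may look different, so treat the normalization (strip plus lowercase) as an assumption about what you might need rather than something the page is known to require.

```python
def has_service_type(texts, wanted="wireless"):
    """Hypothetical helper: True if any text node equals `wanted`,
    ignoring case and surrounding whitespace."""
    wanted = wanted.strip().lower()
    return any(t.strip().lower() == wanted for t in texts)


# Made-up text nodes, similar in shape to what //text() can return.
sample = ["\n", "Original Service Type:", " Wireless ", "\n"]

# Plain list membership needs an exact element match, so this is False.
exact = "wireless" in sample

# The normalized comparison finds " Wireless ".
normalized = has_service_type(sample)
```

If the page's text nodes already match exactly, the plain in test is enough and the helper is unnecessary.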
A more useful way to write the function is to return the lxml tree:
import requests
from lxml import html


def Phone_Checker(number):
    head = {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"}
    url = 'http://www.reversephonelookup.com/results.php'
    data = {"phone": number, "image.x": "26", "image.y": "37"}
    with requests.Session() as s:
        s.post(url, data=data, headers=head)
        r = s.get("http://www.reversephonelookup.com/number/{}/".format(number), headers=head)
        # Return the parsed tree so the caller can run any xpath against it.
        return html.fromstring(r.content)
Then:
xml = Phone_Checker(....)
An example:
In [5]: xml = Phone_Checker("8598795756")
In [6]: print(xml.xpath("//fieldset//tr/td[text()='Original Service Type:']/following::strong/text()"))
['Landline', 'Independent Telephone Company', 'Versailles, KY', 'VRSLKYXADS0']
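If you want to experiment with that xpath without hitting the site, something like the following sketch should work. The HTML here is a made-up fragment that only imitates the shape implied by the xpath (a fieldset containing table rows with a label cell and a strong value); the "Company:" label in particular is invented, and the real page markup may differ.

```python
from lxml import html

# Made-up page fragment; only the "Original Service Type:" label
# comes from the xpath above, the rest is assumed for illustration.
SAMPLE_PAGE = """
<html><body><div id="content"><fieldset><table>
<tr><td>Original Service Type:</td><td><strong>Landline</strong></td></tr>
<tr><td>Company:</td><td><strong>Independent Telephone Company</strong></td></tr>
</table></fieldset></div></body></html>
"""

XPATH = "//fieldset//tr/td[text()='Original Service Type:']/following::strong/text()"

tree = html.fromstring(SAMPLE_PAGE)

# following::strong selects every <strong> that comes after the label
# cell in document order, not just the one in the same row.
values = tree.xpath(XPATH)

# A page without the label simply yields an empty list.
empty = html.fromstring("<html><body><p>no data</p></body></html>").xpath(XPATH)
```

Because an unmatched xpath returns an empty list rather than raising, it is worth checking the result before indexing into it.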
The first result is the connection type. If that is all you want, use:
"//fieldset//tr/td[text()='Original Service Type:']/following::strong[1]/text()"