【Question title】: Unable to produce results from a webpage using requests module
【Posted】: 2022-10-04 16:24:03
【Question】:

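After visiting this website, when I fill in the input box (City or zip) with Miami, FL and click the search button, I can see the relevant results displayed on the site.

I want to mimic the same thing using the requests module. I tried to follow the steps shown in dev tools, but for some reason the script below produces this output: You are not authorized to access this request

I've tried with: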

import json
import requests
from pprint import pprint
from bs4 import BeautifulSoup

URL = "https://www.realtor.com/realestateagents/"
link = 'https://www.realtor.com/realestateagents/api/v3/search'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36',
    'Accept': 'application/json, text/plain, */*',
    'referer': 'https://www.realtor.com/realestateagents/',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9,bn;q=0.8',
    'X-Requested-With': 'XMLHttpRequest',
    'x-newrelic-id': 'VwEPVF5XGwQHXFNTBAcAUQ==',
    'authorization': 'Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2NjQ1MjU0NDQsInN1YiI6ImZpbmRfYV9yZWFsdG9yIiwiaWF0IjoxNjY0NTI0Nzk2fQ.Q2jryTAD5vgsJ37e1SylBnkaeK7Cln930Q8KL4ANqsM'
}

params = {
    'nar_only': '1',
    'offset': '',
    'limit': '20',
    'marketing_area_cities': 'FL_Miami',
    'postal_code': '',
    'is_postal_search': 'true',
    'name': '',
    'types': 'agent',
    'sort': 'recent_activity_high',
    'far_opt_out': 'false',
    'client_id': 'FAR2.0',
    'recommendations_count_min': '',
    'agent_rating_min': '',
    'languages': '',
    'agent_type': '',
    'price_min': '',
    'price_max': '',
    'designations': '',
    'photo': 'true',
    'seoUserType': "{'isBot':'false','deviceType':'desktop'}",
    'is_county_search': 'false',
    'county': ''
}

with requests.Session() as s:
    s.headers.update(headers)
    res = s.get(link,params=params)
    print(res.status_code)
    print(res.json())
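
  • Please check whether your access token has expired.
  • I collected the authorization token directly from dev tools when I created this post a few days ago. I haven't found any way to renew the token automatically, so it's hard to say whether the token is still valid. When I run the script, I get status 200.
  • I think you should collect the authorization token again and then retry.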
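
The expiry question raised in the comments can be checked offline. The header segment of the token in the question decodes to `{"alg":"HS256","typ":"JWT"}`, so the sketch below assumes it is a standard JWT whose payload carries `exp` and `iat` claims as Unix timestamps. The helper name `jwt_payload` is made up for illustration, and the signature is not verified; the snippet only reads the claims.

```python
import base64
import json
import time


def jwt_payload(token: str) -> dict:
    """Decode the payload (middle) segment of a JWT without verifying it."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore the padding first.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


# Token copied from the question's `authorization` header (without "Bearer ").
token = (
    "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9."
    "eyJleHAiOjE2NjQ1MjU0NDQsInN1YiI6ImZpbmRfYV9yZWFsdG9yIiwiaWF0IjoxNjY0NTI0Nzk2fQ."
    "Q2jryTAD5vgsJ37e1SylBnkaeK7Cln930Q8KL4ANqsM"
)

claims = jwt_payload(token)
lifetime = claims["exp"] - claims["iat"]   # seconds between issue and expiry
expired = time.time() > claims["exp"]      # compare against the current clock

print(claims)
print(f"lifetime: {lifetime} s, expired: {expired}")
```

For the token shown in the question, `exp - iat` comes out to 648 seconds, so a token copied from dev tools is only usable for roughly eleven minutes after it is issued. A token collected days earlier would therefore already be expired, which is consistent with the "not authorized" message.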

Tags: python python-3.x web-scraping python-requests


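【Solution 1】:

Going by your question as asked, you want to extract information from that website using requests. Here is one way to do it with Python requests, reading the paginated HTML listing pages rather than calling the token-protected JSON API: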

import requests
from tqdm.notebook import tqdm  # Jupyter progress bar; in a plain script, `from tqdm import tqdm` works too
from bs4 import BeautifulSoup as bs

headers = {
    'User-Agent': "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36"
    }
s = requests.Session()
s.headers.update(headers)
# Walk the first four listing pages for Miami, FL
for x in tqdm(range(1, 5)):
    url = f'https://www.realtor.com/realestateagents/miami_fl/pg-{x}'
    r = s.get(url)
    soup = bs(r.text, 'html.parser')
    # Each agent is rendered as one card element
    agent_cards = soup.select('div[data-testid="component-agentCard"]')
    for a in agent_cards:
        agent_name = a.select_one('div.agent-name').get_text()
        agent_group = a.select_one('div.agent-group').get_text()
        agent_phone = a.select_one('div.agent-phone').get_text()
        print(agent_name, '|', agent_group, '|', agent_phone)

Result in terminal:

100%
4/4 [00:05<00:00, 1.36s/it]
Edmy Gomez | Coldwell Banker Realty | (954) 434-0501
Nidia L Cortes PA | Beachfront Realty Inc | (786) 287-9268
Rodney Ward | Coldwell Banker Realty | (305) 253-2800
Onelia Hurtado | Elevate Real Estate Brokers | (954) 559-8252
Gustavo Cabrera | Belhouse Real Estate, Llc | (305) 794-8533
Hermes Pallaviccini |  Global Luxury Realty LLC | (305) 772-7232
Maria Carrillo | Keyes - Brickell Office | (305) 984-3180
Nancy Batchelor, P.A. | COMPASS | (305) 903-2850
Winnie Uricola | Keyes - Hollywood Office | (305) 915-7721
monica Deluca | Re/Max Powerpro Realty | (954) 552-1224
Maria Cristina Korman | Keller Williams Realty Partners SW | (954) 588-2850
Ines Hegedus-Garcia | Avanti Way | (305) 758-2323
Jean-Paul Figallo | Concierge Real Estate | (754) 281-9912
[...]

You may want to increase the range to cover the total number of pages.
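
If the total number of pages isn't known up front, one option is to keep requesting pages until one comes back with no agent cards. The following is a rough sketch of that idea, not a tested scraper: it assumes that a page past the end either returns a non-200 status or contains no `component-agentCard` elements, which I have not confirmed for this site. The helper names (`text_or_blank`, `parse_agents`, `scrape_agents`) and the `max_pages` safety cap are made up for illustration. It also tolerates cards that lack a name, group or phone element instead of raising `AttributeError`.

```python
import requests
from bs4 import BeautifulSoup

# URL pattern taken from the answer above; `slug` is e.g. "miami_fl".
BASE = "https://www.realtor.com/realestateagents/{slug}/pg-{page}"


def text_or_blank(card, selector):
    """Return the stripped text of the first match, or '' if it is absent."""
    node = card.select_one(selector)
    return node.get_text(strip=True) if node else ""


def parse_agents(html):
    """Extract (name, group, phone) tuples from one listing page."""
    soup = BeautifulSoup(html, "html.parser")
    cards = soup.select('div[data-testid="component-agentCard"]')
    return [
        (
            text_or_blank(card, "div.agent-name"),
            text_or_blank(card, "div.agent-group"),
            text_or_blank(card, "div.agent-phone"),
        )
        for card in cards
    ]


def scrape_agents(session, slug="miami_fl", max_pages=50):
    """Yield agents page by page, stopping at the first page without cards."""
    for page in range(1, max_pages + 1):
        r = session.get(BASE.format(slug=slug, page=page), timeout=30)
        if r.status_code != 200:
            break  # assumption: a non-200 status means there are no more pages
        agents = parse_agents(r.text)
        if not agents:
            break  # assumption: an empty page marks the end of the listing
        yield from agents


# Usage (performs real network requests, so it is left commented out):
# s = requests.Session()
# s.headers.update({"User-Agent": "Mozilla/5.0 ..."})
# for name, group, phone in scrape_agents(s):
#     print(name, "|", group, "|", phone)
```

Keeping the parsing in `parse_agents` separate from the fetching makes it possible to check the selectors against a saved HTML page without hitting the site.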
