【Question title】: Scrape Target search results with Python
【Posted】: 2021-11-03 01:30:07
【Question】:

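I'm trying to scrape search results on Target.

For example, go to the URL "https://www.target.com/s?searchTerm=lego+duplo"

and try to extract the product name, price, and product ID.

I tried Selenium, but I get asked to verify my identity. I tried requests, but I get a Forbidden page. I've tried other libraries as well, and I'm running out of ideas.

Ideally there would be some JavaScript that loads the prices as JSON, but I can't seem to find it. Any suggestions?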

【Comments on the question】:

    Tags: python python-3.x selenium web-scraping python-requests


    【Answer 1】:

    The data is loaded as JSON from an external URL. You can use the following example to simulate the Ajax request:

    import json
    import requests
    
    # The search page (https://www.target.com/s?searchTerm=lego+duplo) only
    # renders the results; the data itself comes from the API URL below.
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
    }
    
    api_url = "https://redsky.target.com/redsky_aggregations/v1/web/plp_search_v1"
    
    # Query parameters as seen in the browser's request.
    params = {
        "key": "ff457966e64d5e877fdbad070f276d18ecec4a01",
        "channel": "WEB",
        "count": "24",
        "default_purchasability_filter": "false",
        "include_sponsored": "true",
        "keyword": "lego duplo",
        "offset": "0",
        "page": "/s/lego duplo",
        "platform": "desktop",
        "pricing_store_id": "3991",
        "useragent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0",
        "visitor_id": "AAA",
    }
    
    # Send the browser-like User-Agent header along with the parameters.
    data = requests.get(api_url, params=params, headers=headers).json()
    
    # uncomment this to print all data:
    # print(json.dumps(data, indent=4))
    
    # Print the current price and the title of each product.
    for p in data["data"]["search"]["products"]:
        print(
            "{:<10} {}".format(
                p["price"]["current_retail"],
                p["item"]["product_description"]["title"],
            )
        )
    

    Prints:

    29.99      LEGO DUPLO Classic Brick Box First LEGO Set with Storage Box 10913
    23.99      LEGO DUPLO Super Heroes Lab Marvel Avengers Toy 10921
    32.99      LEGO DUPLO Creative Fun 10887
    16.39      LEGO DUPLO My First Number Train 10847
    14.99      LEGO DUPLO Large Green Building Plate 2304
    16.39      LEGO DUPLO Disney Minnie Mouse's Birthday Party 10873
    24.49      LEGO DUPLO Disney Ariel&#39;s Undersea Castle Building Toy; Princess Castle Under the Sea 10922
    16.39      LEGO DUPLO Construction Truck &#38; Tracked Excavator Digger and Tipper Building Site Toy 10931
    16.39      LEGO DUPLO Disney Frozen Toy Featuring Elsa and Olaf&#39;s Tea Party 10920
    9.99       LEGO DUPLO My First Fire Helicopter and Police Car 10957
    49.99      LEGO DUPLO Fire Station 10903
    16.39      LEGO DUPLO Submarine Adventure Bath Toy Building Set for Toddlers with Toy Submarine 10910
    4.99       LEGO DUPLO My First Space Rocket 30332 Building Kit
    15.99      LEGO City Construction Bulldozer Building Set 60252
    29.99      LEGO DUPLO Disney Mickey &#38; Minnie Birthday Train Kids&#39; Birthday Number Train Playset 10941
    41.99      LEGO DUPLO Princess Frozen Ice Castle Toy Castle Building Set with Frozen Characters 10899
    19.99      LEGO DUPLO My First Animal Train Pull-Along 10955
    19.99      LEGO DUPLO Fire Truck 10901
    9.89       LEGO Classic Bricks and Ideas 11001
    29.99      LEGO DUPLO Jurassic World T. rex and Triceratops Dinosaur Breakout 10939
    20.99      LEGO DUPLO Town Airport 10871
    23.99      LEGO Classic Medium Creative Brick Box Building Toys for Creative Play, Kids Creative Kit 10696
    9.99       LEGO DUPLO Police Bike 10900
    19.99      LEGO DUPLO Town Farm Tractor &#38; Animal Care Building Toy 10950
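Some titles in the output above still contain HTML entities such as `&#39;` and `&#38;`, and the question also asks for a product ID, which the loop does not print. The following is only a sketch of how both could be handled: `extract_rows` is a hypothetical helper, and the `tcin` key is an assumption about where the product ID lives, so check the full JSON dump before relying on it. The sample dictionary is hand-made to mirror the key paths used in the answer; it is not real API output.

```python
import html


def extract_rows(data):
    """Return (product_id, price, title) tuples from a parsed search response."""
    rows = []
    for p in data["data"]["search"]["products"]:
        rows.append(
            (
                p.get("tcin"),  # ASSUMED key for the product ID; None if absent
                p["price"]["current_retail"],
                # Titles arrive HTML-escaped (e.g. "&#39;"), so decode them.
                html.unescape(p["item"]["product_description"]["title"]),
            )
        )
    return rows


# Hand-made sample in the same shape as the response used in the answer.
sample = {
    "data": {
        "search": {
            "products": [
                {
                    "tcin": "12345678",  # made-up value for illustration
                    "price": {"current_retail": 24.49},
                    "item": {
                        "product_description": {
                            "title": "LEGO DUPLO Disney Ariel&#39;s Undersea Castle"
                        }
                    },
                }
            ]
        }
    }
}

for product_id, price, title in extract_rows(sample):
    # str() keeps the format call safe even when the ID is missing (None).
    print("{:<10} {:<10} {}".format(str(product_id), price, title))
```

With a real response, `extract_rows(data)` would take the place of the printing loop in the answer.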
    

    【Comments】:

    • Is redsky for the API?
    • @Syazvinski If you open the Firefox (or Chrome) developer tools -> Network tab and reload the page, you can see this URL with its parameters. The data is loaded from this URL.
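Once a request URL has been copied from the Network tab, it can be split into a base URL and a parameter dictionary instead of retyping each parameter by hand. This is a minimal sketch using only the standard library; `split_request_url` is a hypothetical helper, and the `copied` URL is an abbreviated example assembled from a few of the parameters in the answer, not the full request.

```python
from urllib.parse import parse_qsl, urlsplit, urlunsplit


def split_request_url(full_url):
    """Split a copied request URL into (base_url, params_dict)."""
    parts = urlsplit(full_url)
    # Rebuild the URL without its query string or fragment.
    base = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    # parse_qsl decodes percent-escapes and turns "+" into a space.
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    return base, params


# Abbreviated illustrative URL, not the complete request from the browser.
copied = (
    "https://redsky.target.com/redsky_aggregations/v1/web/plp_search_v1"
    "?channel=WEB&count=24&keyword=lego+duplo&offset=0"
)

api_url, params = split_request_url(copied)
print(api_url)
print(params)
```

The resulting `api_url` and `params` can then be passed to `requests.get(api_url, params=params)` in the same way as in the answer.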