【Question title】: Scrape linked category links until there are no more categories
【Posted】: 2021-11-14 22:43:10
【Question】:

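The site https://mavin.io/category has multiple categories. Each category has further subcategories, and so on. When the last level is reached, the page shows a list of products, like https://mavin.io/search?q=&cat=33695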

I want to walk through all the categories and collect the product-list links such as https://mavin.io/search?q=&cat=33695 (not the links to individual products).

How can I scrape these nested category links?

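import requests
from lxml.html import fromstring

url = 'https://mavin.io/category'
r = requests.get(url)  # top-level category page
tree = fromstring(r.content)  # parsed HTML; which links to follow from here is the open question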
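Continuing with lxml as in the snippet above, a minimal sketch of extracting the subcategory links from one page could use XPath. It assumes the markup matched by the .item-image a:has(h4) selector used in the answer below (an <a> containing an <h4> title, inside an element whose class list includes item-image); the site's HTML may have changed since 2021, and category_links is a hypothetical helper name.

```python
from urllib.parse import urljoin

from lxml.html import fromstring

# XPath equivalent of the CSS selector ".item-image a:has(h4)" (assumed markup):
# an <a> that contains an <h4> somewhere inside it, within an element whose
# class attribute includes the token "item-image".
CATEGORY_XPATH = (
    "//*[contains(concat(' ', normalize-space(@class), ' '), ' item-image ')]"
    "//a[.//h4]"
)


def category_links(html, base_url):
    """Return (absolute_url, category_name) pairs for the subcategory tiles on one page."""
    tree = fromstring(html)
    links = []
    for a in tree.xpath(CATEGORY_XPATH):
        href = a.get("href")
        if not href:
            continue  # tile without a link target; nothing to follow
        name = a.find(".//h4").text_content().strip()
        # urljoin resolves root-relative hrefs like "/category/123" against the page URL
        links.append((urljoin(base_url, href), name))
    return links
```

With the page fetched as in the question, category_links(r.content, r.url) returns the first level of subcategories. A page for which it returns an empty list is a product-list page, which is the same stopping condition the answer below relies on.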

【Comments】:

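  • Have you looked at Scrapy?
  • Yes, I have used it.
  • Then look at BS4. It will parse the page so you can pull out the links and feed them into the code above, if you don't want to just loosely set up Scrapy and let it find whatever it can.
  • Is there a help article on this?
  • The best place to start, as always, is the original documentation: crummy.com/software/BeautifulSoup/bs4/doc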

Tags: python web-scraping beautifulsoup python-requests scrapy


【Answer 1】:

You can write a recursive function that walks through the categories until a page has no more subcategories:

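from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

url = "https://mavin.io/category"
s = requests.Session()  # reuse one connection pool for every request


def recur(url, path=None):
    """Yield (url, path) for every category page that has no subcategories."""
    if path is None:
        path = []

    r = s.get(url, timeout=30)
    soup = BeautifulSoup(r.content, "html.parser")
    # Subcategory tiles: an <a> wrapping an <h4> title inside an .item-image block
    cat_links = soup.select(".item-image a:has(h4)")
    for a in cat_links:
        # Descend into each subcategory, extending the breadcrumb path with its title
        yield from recur(
            urljoin(r.url, a["href"]), path + [a.h4.get_text(strip=True)]
        )

    # No subcategories: this is a product-list (search) page, so report it
    if not cat_links:
        yield r.url, path


for link, path in recur(url):
    print(link, path)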
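The recursive generator above works well for a tree of this depth. If the hierarchy were much deeper, or if the same category could be reached along more than one path (an assumption; the output below does not show this happening), an explicit stack plus a set of already-queued URLs avoids Python's recursion limit and duplicate requests. This sketch separates the traversal from fetching: get_children is a hypothetical callable that takes a URL and returns (final_url, [(child_url, child_name), ...]).

```python
from typing import Callable, Iterator, List, Tuple

Child = Tuple[str, str]  # (absolute child URL, category name)


def crawl_leaves(
    start_url: str,
    get_children: Callable[[str], Tuple[str, List[Child]]],
) -> Iterator[Tuple[str, List[str]]]:
    """Depth-first walk of the category tree using an explicit stack instead of recursion.

    Yields (leaf_url, path) for every page that has no subcategories, where path
    is the list of category names leading to it.
    """
    stack: List[Tuple[str, List[str]]] = [(start_url, [])]
    seen = {start_url}  # URLs already queued, so each page is fetched at most once
    while stack:
        url, path = stack.pop()
        final_url, children = get_children(url)
        if not children:
            # No subcategories: this is a product-list page.
            yield final_url, path
            continue
        # Push in reverse so children are visited in the order they appear on the page.
        for child_url, name in reversed(children):
            if child_url not in seen:
                seen.add(child_url)
                stack.append((child_url, path + [name]))
```

To plug it into the answer's code, get_children can call s.get(url), run the same .item-image a:has(h4) selector on the parsed page, and return r.url together with [(urljoin(r.url, a["href"]), a.h4.get_text(strip=True)) for a in links]; that function is also a natural place for a short time.sleep between requests. Because each URL is queued at most once, a category reachable from two parents is yielded only once, under the path through which it was first queued.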

Prints:

https://mavin.io/search?q=&cat=33695 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Consoles & Parts']
https://mavin.io/search?q=&cat=63691 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Cup Holders']
https://mavin.io/search?q=&cat=40017 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Dash Parts']
https://mavin.io/search?q=&cat=33698 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Glove Boxes']
https://mavin.io/search?q=&cat=179848 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Interior Door Handles']
https://mavin.io/search?q=&cat=33696 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Interior Door Panels & Parts']
https://mavin.io/search?q=&cat=33700 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Pedals & Pads']
https://mavin.io/search?q=&cat=33701 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Seats']
https://mavin.io/search?q=&cat=50458 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Seat Belt Shoulder Pads']
https://mavin.io/search?q=&cat=33702 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Seat Covers']
https://mavin.io/search?q=&cat=33703 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Shift Knobs & Boots']
https://mavin.io/search?q=&cat=33704 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Steering Wheels & Horns']
https://mavin.io/search?q=&cat=46102 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Sun Visors']
https://mavin.io/search?q=&cat=50459 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Switches & Controls']
https://mavin.io/search?q=&cat=33697 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Floor Mats & Carpets']
https://mavin.io/search?q=&cat=63690 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Cargo Nets, Trays & Liners']
https://mavin.io/search?q=&cat=33699 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Mirrors']
https://mavin.io/search?q=&cat=33705 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Trim']
https://mavin.io/search?q=&cat=40018 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Window Cranks & Parts']
https://mavin.io/search?q=&cat=33706 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Window Motors & Parts']
https://mavin.io/search?q=&cat=33651 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Exterior', 'Racks']
https://mavin.io/search?q=&cat=36475 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Exterior', 'Body Kits']

...and so on.

【Comments】:
