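【Posted】: 2018-01-19 11:49:48
【Question】:
I have the following problem. I am trying to scrape Amazon subcategories from this link: https://www.amazon.com/workout-clothes/b/ref=nav_shopall_sa_sp_athclg/151-4490025-2599936?ie=UTF8&node=11444071011 using the function begin_crawl(). How can I extract the subcategories from this link? Only the code from this line onward matters: subcategories = page.find_all("div", {"class": "mm-column"}). Is there another way to extract subcategories from a category page? I get TypeError: 'NoneType' object is not callable. The code and the full traceback are below. Any help would be appreciated.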
def begin_crawl():
    with open(settings.start_file, "r") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank and commented out lines

            page, html = make_request(line)
            count = 0

            # look for subcategory links on this page
            subcategories = page.find_all("div", {"class": "mm-column"})
            subcategories.extend(page.find_all("ul", {"class": "mm-category-list"}))
            subcategories.extend(page.find("li"))
            sidebar = page.find("div", "a-col-left")
            if sidebar:
                subcategories.extend(sidebar.findAll("li"))  # left sidebar

            for subcategory in subcategories:
                link = subcategory.find("a")
                if not link:
                    continue
                link = link["href"]
                count += 1
                enqueue_url(link)

            log("Found {} subcategories on {}".format(count, line))
The error is:
Traceback (most recent call last):
  File "crawler.py", line 106, in <module>
    begin_crawl() # put a bunch of subcategory URLs into the queue
  File "crawler.py", line 35, in begin_crawl
    subcategories = page.find_all("div", {"class": "mm-column"})
TypeError: 'NoneType' object is not callable
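
One plausible reading of this traceback, assuming make_request() returns a BeautifulSoup object (its definition is not shown in the question): BeautifulSoup treats an unknown attribute name as a tag lookup, so on BeautifulSoup 3, which only defines findAll, the expression page.find_all is a search for a <find_all> tag and evaluates to None. Calling that None then raises the error above. The sketch below reproduces the mechanism with a hypothetical stand-in class (OldStyleTag is not the real library class) and a hypothetical helper, find_all_compat, so it runs without any third-party install.

```python
class OldStyleTag:
    """Hypothetical stand-in that mimics a BeautifulSoup 3 style tag."""

    def __init__(self, children=None):
        # Maps a tag name to the list of matching child elements.
        self.children = children or {}

    def findAll(self, name, attrs=None):
        # Only the camelCase method is defined, as in BeautifulSoup 3.
        return list(self.children.get(name, []))

    def find(self, name, attrs=None):
        matches = self.findAll(name, attrs)
        return matches[0] if matches else None

    def __getattr__(self, name):
        # Unknown attributes are treated as tag lookups, so a method name
        # the class does not define silently becomes None.
        if name.startswith("__"):
            raise AttributeError(name)
        return self.find(name)


def find_all_compat(node, name, attrs=None):
    """Use find_all when it is a real method, otherwise fall back to findAll."""
    for method_name in ("find_all", "findAll"):
        method = getattr(node, method_name, None)
        if callable(method):
            return method(name, attrs or {})
    raise TypeError("object has no usable find_all/findAll method")


page = OldStyleTag({"div": ["<div 1>", "<div 2>"]})

# page.find_all is None here, so page.find_all("div") would raise
# TypeError: 'NoneType' object is not callable.
print(page.find_all)
print(find_all_compat(page, "div"))
```

If this is the cause, the usual fix is to import BeautifulSoup from the bs4 package, where find_all exists, or to call findAll consistently, as the sidebar line already does. Separately, page.find("li") returns a single element or None rather than a list, so passing it to subcategories.extend() is likely to misbehave once the first error is resolved; find_all("li") is probably what was intended.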
【Comments】: