Making a webscraper when sometimes the information you are trying to scrape is missing

【Question title】: Making a webscraper when sometimes the information you are trying to scrape is missing
【Posted】: 2021-05-15 16:25:27
【Question description】:

When I try to scrape the brand of every graphics card on the page, it works for the first 15, but then I get TypeError: 'NoneType' object is not subscriptable.

from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq

# the url we want to scrape and saves it to a variable
url = 'https://www.newegg.com/p/pl?d=graphics+card&RandomID=551877219014822520210210001440&PageSize=36'

# opens the url and returns a file object
uClient = uReq(url)

# reads the response and returns the html contents as bytes
page_html = uClient.read()

# closes the file
uClient.close()

# html parsing
page_soup = soup(page_html, "html.parser")

# grabs each element with the class of item container and stores in a variable

containers = page_soup.findAll("div", {"class": "item-container"})

# scraping the brands of each graphics card from the website

for container in containers:
    brand = container.div.div.a.img["title"]
    print(brand)

【Question discussion】:

Tags: python web-scraping beautifulsoup

【Solution 1】:

In a case like this, it can make sense to catch the Exception (TypeError in your case).

try:
    something_which_may_raise()
except TypeError:  # specific Exception
    my_code_to_handle_exception()  # maybe do nothing
except Exception as ex:  # generic Exception
    # NOTE you can collect the Exception object here to interact with it!
    print("caught an unexpected Exception: {}".format(repr(ex)))
    raise ex  # re-raise that Exception to the calling function

For your case, maybe this is what you're looking for:

...
for index, container in enumerate(containers):
    try:
        print(container.div.div.a.img["title"])  # brand
    except Exception as ex:
        print("couldn't read brand from container {}".format(index))
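
To see why the loop version keeps going, here is a minimal sketch that runs without the network. The names fake_containers and read_brands are made up for illustration, and plain dictionaries (with None marking a missing img tag) stand in for the real parsed containers, so treat it as an assumption-laden toy rather than the actual Newegg markup.

```python
# Hypothetical stand-ins for the parsed containers:
# a dict plays the role of an <img> tag, None means the tag was not found.
fake_containers = [{"title": "ASUS"}, None, {"title": "MSI"}]


def read_brands(containers):
    """Collect brands, recording the index of every container that fails."""
    brands = []
    skipped = []
    for index, img in enumerate(containers):
        try:
            # None["title"] raises TypeError: 'NoneType' object is not subscriptable
            brands.append(img["title"])
        except TypeError:
            # the failure is handled inside the loop, so iteration continues
            skipped.append(index)
    return brands, skipped


brands, skipped = read_brands(fake_containers)
print(brands)
print(skipped)
```

Because the try/except sits inside the loop body, one bad container only costs that one iteration. Catching TypeError specifically, rather than Exception, is a design choice here so that unrelated bugs still surface.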

【Comments】:

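Wow, that works, thank you very much. Question: why does using enumerate() make it work? Doesn't enumerate just produce a counter and the value? For example, choices = ['a', 'b', 'c', 'd'] for index, value in enumerate(choices): print(index, value) would produce 0 a 1 b 2 c 3 d
Ah, it's not enumerate that makes it work, but rather some form of putting the try/except inside the loop, or not re-raising the exception; and yes, that is exactly what enumerate does: docs.python.org/3/library/functions.html#enumerate
OK, I tried it without enumerate and it works, so I think what makes it work is your except statement, "except Exception as ex:". Can you explain why that is, and what "Exception as ex" actually does?
It's almost certainly either the re-raise (my first example has raise ex, and a bare raise would pick up the most recent exception) or wrapping the entire loop body (incorrectly?) and then editing past the intermediate code! The as ex clause assigns the exception to the name ex (it could be any name) within the context of the except block (more here, but it may be unpleasantly theory-heavy). This allows you to interact with the exception object; see Python datamodel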
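
A small sketch of the two points in that last comment, for anyone who wants to try it: what the name after as is bound to, and what a bare raise does. The function lookup and its inputs are hypothetical and only mimic the shape of the scraping code.

```python
def lookup(img):
    """Return the title, handling a missing tag and re-raising anything else."""
    try:
        return img["title"]
    except TypeError as ex:
        # `ex` is the exception object itself; the name is arbitrary
        return "missing ({})".format(type(ex).__name__)
    except Exception:
        # a bare `raise` re-raises the exception currently being handled
        raise


results = [lookup({"title": "EVGA"}), lookup(None)]
print(results)

try:
    lookup({})  # KeyError is not a TypeError, so it passes through the re-raise
except KeyError as err:
    reraised = type(err).__name__
    print("re-raised:", reraised)
```

If the exception is re-raised and nothing further up catches it, the loop calling lookup stops at that point, which is the behaviour the comment above is describing.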