【Question title】: Code works line-by-line in command prompt, but new problem appears when I run the file as a whole
【Posted】: 2020-04-08 08:35:19
【Question description】:

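I copied the code from a Python web scraping guide, and when I test each line at the command prompt, everything works.

However, when I run the whole file, I get the following message: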

File "web_scrape_practice.py", line 23, in <module>
    shipping = shipping_container[0].text.strip()
IndexError: list index out of range

Here is my code:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = "https://www.newegg.com/p/pl?d=graphics+cards"

uReq(my_url)
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, "html.parser")
containers = page_soup.findAll("div", {"class": "item-container"})
container = containers[4]
brandDiv = container.find("div","item-info")

for container in containers:
    brand = brandDiv.div.img["title"]

    title_container = container.findAll("a", {"class":"item-title"})
    product_name = title_container[0].text

    shipping_container = container.findAll("li", {"class":"price-ship"})
    shipping = shipping_container[0].text.strip()


    print("brand: " + brand)
    print("product_name: " + product_name)
    print("shipping: " + shipping)

Line 23 is this one:

shipping = shipping_container[0].text.strip()

Any help is appreciated.

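【Comments】:

  • "When I test each line at the command prompt, everything works." Are you testing this for every element of containers?
  • IndexError: list index out of range means you are trying to access an element that is not in the list (for example, the 6th element of a 3-element list). Here it means container.findAll is returning an empty list for at least one container. You can check that the list has at least 1 element with if len(shipping_container) > 0
  • The error is clear: the index is out of range. Handle it with try/except or an if condition.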
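
The two suggestions in the comments (a length check and try/except) might look roughly like the sketch below. The helper names first_stripped and first_stripped_try and the "N/A" default are hypothetical, introduced only for illustration; the input is assumed to be a list of strings such as [li.text for li in container.findAll("li", {"class": "price-ship"})].

```python
def first_stripped(texts, default="N/A"):
    """Return the first string, stripped, or `default` if the list is empty.

    Hypothetical helper: uses the explicit length check from the comments.
    """
    if len(texts) > 0:
        return texts[0].strip()
    return default


def first_stripped_try(texts, default="N/A"):
    """Same behavior, but catches the IndexError instead of checking first."""
    try:
        return texts[0].strip()
    except IndexError:
        # Raised when `texts` is empty, which is the situation in the traceback.
        return default


# A container with shipping info versus one without it.
print("shipping: " + first_stripped(["  Free Shipping \n"]))
print("shipping: " + first_stripped([]))
print("shipping: " + first_stripped_try([]))
```

Either form keeps the loop running when one container has no matching element; which one reads better is mostly a matter of taste.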

Tags: python html indexing error-handling beautifulsoup


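【Solution 1】:

The problem is that you are also capturing the items inside the tag with id=recommendItems, and those items have no shipping information. The simplest solution is to exclude them from the search.

For example: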

import requests
from bs4 import BeautifulSoup as soup

my_url = "https://www.newegg.com/p/pl?d=graphics+cards"
page_soup = soup(requests.get(my_url).text, "html.parser")

for container in page_soup.select(':not(#recommendItems).items-view .item-container'):
    brand = container.select_one('a.item-brand img[alt]')['alt']
    product_name = container.select_one('a.item-title').get_text(strip=True)
    shipping = container.select_one('li.price-ship').get_text(strip=True)

    print("brand: ", brand)
    print("product_name: ", product_name)
    print("shipping: ", shipping)

Prints:

brand:  MSI
product_name:  MSI GeForce RTX 2070 DirectX 12 RTX 2070 GAMING 8G 8GB 256-Bit GDDR6 PCI Express 3.0 x16 HDCP Ready Video Card
shipping:  Free Shipping
brand:  EVGA
product_name:  EVGA GeForce RTX 2080 Ti DirectX 12 11G-P4-2281-KR 11GB 352-Bit GDDR6 PCI Express 3.0 HDCP Ready SLI Support BLACK EDITION GAMING Video Card, Dual HDB Fans & RGB LED
shipping:  $6.99 Shipping
brand:  MSI
product_name:  MSI GeForce RTX 2070 SUPER DirectX 12 RTX 2070 Super GAMING X 8GB 256-Bit GDDR6 PCI Express 3.0 x16 HDCP Ready SLI Support Video Card
shipping:  Free Shipping
brand:  ZOTAC

... and so on.
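
If excluding the recommended items is not desirable, another option is to keep every container and fall back to a default when a field is missing. The sketch below is only an illustration under stated assumptions: SAMPLE_HTML is a hypothetical, heavily simplified stand-in for the real page (the actual Newegg markup may differ), and extract_items and the "N/A" default are made-up names. It relies on select_one returning None when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page structure may differ.
SAMPLE_HTML = """
<div id="recommendItems">
  <div class="item-container">
    <a class="item-title">Recommended card</a>
  </div>
</div>
<div class="items-view">
  <div class="item-container">
    <a class="item-title">Listed card</a>
    <li class="price-ship">Free Shipping</li>
  </div>
</div>
"""


def extract_items(html):
    """Return (product_name, shipping) pairs, using "N/A" for missing fields."""
    page_soup = BeautifulSoup(html, "html.parser")
    rows = []
    for container in page_soup.select(".item-container"):
        # Look up each field inside the current container, not outside the loop.
        title_tag = container.select_one("a.item-title")
        shipping_tag = container.select_one("li.price-ship")
        title = title_tag.get_text(strip=True) if title_tag is not None else "N/A"
        shipping = (
            shipping_tag.get_text(strip=True) if shipping_tag is not None else "N/A"
        )
        rows.append((title, shipping))
    return rows


for product_name, shipping in extract_items(SAMPLE_HTML):
    print("product_name: ", product_name)
    print("shipping: ", shipping)
```

Note that the question's code computes brandDiv once, before the loop, so every iteration would report the same brand; looking up each field from the current container, as done here for the title and shipping, avoids that.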
