BeautifulSoup - Handling cases where variable.find( ).string returns empty: answers

【Question title】: BeautifulSoup - Handling of cases where variable.find( ).string returns empty
【Posted】: 2013-12-24 21:09:38
【Question description】:

from bs4 import BeautifulSoup
import urllib.request

# Download the page and keep a local copy (binary mode, so the raw bytes are saved as-is)
site_response = urllib.request.urlopen("http://site/")
html = site_response.read()
with open("cars.html", "wb") as f:
    f.write(html)

# Parse the saved copy; naming the parser avoids BeautifulSoup's "no parser specified" warning
with open("cars.html", "rb") as f:
    soup = BeautifulSoup(f, "html.parser")

output = soup.prettify("latin")  # with an encoding argument, prettify() returns bytes
#print(output) #prints whole file for testing

with open("cars_out.txt", "wb") as file_output:
    file_output.write(output)

fulllist = soup.find_all("div", class_="row vehicle")
#print(fulllist) #prints each row vehicle class for debug

for item in fulllist:
    item_print = item.find("span", class_="modelYearSort").string
    item_print = item_print + "|" + item.find("span", class_="mmtSort").string
    seller_phone = item.find("span", class_="seller-phone")
    print(seller_phone)
    # item_print = item_print + "|" + item.find("span", class_="seller-phone").string
    item_print = item_print + "|" + item.find("span", class_="priceSort").string
    item_print = item_print + "|" + item.find("span", class_="milesSort").string
    print(item_print)

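I have the code above, which parses some HTML and produces a pipe-delimited file. It works fine, except that some entries in the HTML are missing one of the elements (the seller phone). Not every entry has a seller phone number.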

item.find("span", class_="seller-phone").string

This is where it fails. I'm not surprised that this line breaks when the seller phone is missing: I get AttributeError: 'NoneType' object has no attribute 'string'.
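A minimal reproduction makes the cause visible. The markup below is a hypothetical snippet that reuses the class names from the code above (the real page is not shown); it assumes a vehicle row simply has no seller-phone span. find() returns None for the missing span, and reading .string on None is what raises the AttributeError:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one vehicle row with no "seller-phone" span at all
html = """
<div class="row vehicle">
  <span class="modelYearSort">2010</span>
  <span class="mmtSort">Honda Civic</span>
  <span class="priceSort">$9,000</span>
  <span class="milesSort">80,000</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
item = soup.find("div", class_="row vehicle")

phone_span = item.find("span", class_="seller-phone")
print(phone_span)  # find() found no matching tag, so this is None

error_message = ""
try:
    phone_span.string  # attribute access on None fails here
except AttributeError as exc:
    error_message = str(exc)
print(error_message)
```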

I can do 'item.find' without '.string' and get back the whole HTML block, but I can't figure out how to extract the text for these cases.

【Question comments】:

Tags: python python-3.x beautifulsoup

【Solution 1】:

You're right: soup.find returns None if the element isn't found.

You can just add an if/else clause to avoid this:

for item in fulllist:
    span = item.find("span", class_="modelYearSort")
    if span:
        item_print = span.string
        item_print = item_print + "|" + item.find("span", class_="mmtSort").string
        seller_phone = item.find("span", class_="seller-phone")
        print(seller_phone)
        # item_print = item_print + "|" + item.find("span", class_="seller-phone").string
        item_print = item_print + "|" + item.find("span", class_="priceSort").string
        item_print = item_print + "|" + item.find("span", class_="milesSort").string
        print(item_print)
    else:
        continue  # It's empty, go on to the next loop.

Or, if you prefer, use a try/except block:

for item in fulllist:
    try:
        item_print = item.find("span", class_="modelYearSort").string
    except AttributeError:
        continue  # skip to the next loop.
    else:
        item_print = item_print + "|" + item.find("span", class_="mmtSort").string
        seller_phone = item.find("span", class_="seller-phone")
        print(seller_phone)
        # item_print = item_print + "|" + item.find("span", class_="seller-phone").string
        item_print = item_print + "|" + item.find("span", class_="priceSort").string
        item_print = item_print + "|" + item.find("span", class_="milesSort").string
        print(item_print)
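Both versions skip the whole row when the guarded span is missing. If the goal is instead to keep the row, the same try/except idea can be applied to each lookup separately. The sketch below wraps it in a hypothetical helper, span_text, that returns an empty string when the span is absent (it also covers a span that exists but is empty, where .string is None); the sample markup is likewise hypothetical:

```python
from bs4 import BeautifulSoup


def span_text(item, css_class):
    """Return the .string of the first <span> with css_class, or "" if unavailable.

    Hypothetical helper in try/except style: it catches the AttributeError
    that item.find(...).string raises when find() returns None.
    """
    try:
        text = item.find("span", class_=css_class).string
    except AttributeError:
        return ""  # the span is not in the markup at all
    # .string is also None when the span is empty or has several children
    return str(text) if text is not None else ""


# Hypothetical markup: no mmtSort span, and an empty seller-phone span
html = """
<div class="row vehicle">
  <span class="modelYearSort">2010</span>
  <span class="seller-phone"></span>
</div>
"""
item = BeautifulSoup(html, "html.parser").find("div", class_="row vehicle")

print(repr(span_text(item, "modelYearSort")))
print(repr(span_text(item, "mmtSort")))
print(repr(span_text(item, "seller-phone")))
```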

Hope this helps!

【Comments】:

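Thanks, that's helpful... I guess I wasn't clear about what I want to do when the phone number isn't there. I don't actually want to skip to the next item; I just want to treat it as an empty value, so that my string has || in that position. But I think I can use what you've provided above to do that, since the error handling was where I was stuck. I'll try it in a bit.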
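The "||" output described here can be sketched by checking each find() result for None before reading its text, and joining the fields with "|". Assumptions: the column order is copied from the question's loop (with seller-phone where the commented-out line puts it), row_to_line and FIELDS are hypothetical names, the markup is a made-up sample, and get_text(strip=True) is used instead of .string so nested tags or surrounding whitespace do not produce None:

```python
from bs4 import BeautifulSoup

# Column order taken from the question's loop
FIELDS = ["modelYearSort", "mmtSort", "seller-phone", "priceSort", "milesSort"]


def row_to_line(item):
    """Build one pipe-delimited line, using "" for any span that is missing."""
    values = []
    for css_class in FIELDS:
        span = item.find("span", class_=css_class)
        # Check for None before touching the span (if/else rather than try/except)
        values.append(span.get_text(strip=True) if span is not None else "")
    return "|".join(values)


# Hypothetical markup: the second row has no seller-phone span
html = """
<div class="row vehicle">
  <span class="modelYearSort">2010</span>
  <span class="mmtSort">Honda Civic</span>
  <span class="seller-phone">555-0100</span>
  <span class="priceSort">$9,000</span>
  <span class="milesSort">80,000</span>
</div>
<div class="row vehicle">
  <span class="modelYearSort">2008</span>
  <span class="mmtSort">Ford Focus</span>
  <span class="priceSort">$5,500</span>
  <span class="milesSort">120,000</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
lines = [row_to_line(item) for item in soup.find_all("div", class_="row vehicle")]
for line in lines:
    print(line)
```

The second row comes out as 2008|Ford Focus||$5,500|120,000, with the empty seller-phone field in place.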
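Maybe just try item_print = item.find('span', class_='modelYearSort', text=True)... and see whether that works; it should only return nodes that have a non-empty string.
@Jon Mmm.. I think the problem is that BS can't find the span at all?
@aIKid *sighs* Yes... I think I'll go get another coffee :)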
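The point raised in this exchange can be checked directly: a text filter narrows which existing spans match, but when the span is not in the markup, find() still returns None, so .string would still raise. The sketch uses string=True (newer BeautifulSoup releases accept string as the name for what older code calls text); the markup is a hypothetical sample:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the row has no seller-phone span
html = """
<div class="row vehicle">
  <span class="modelYearSort">2010</span>
</div>
"""
item = BeautifulSoup(html, "html.parser").find("div", class_="row vehicle")

# A text filter cannot produce a span that does not exist in the markup
missing = item.find("span", class_="seller-phone", string=True)
print(missing)  # None, so missing.string would still raise AttributeError
```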
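I can see the problem with find, but find_all() inside a list comprehension is worse, because it crashes, and I can't see a way to catch that with an except clause, even inside the comprehension. The problem shows up when developers forget to label the last column of a table, e.g. [th.get_text() for th in table.find("tr").find_all("th")], where find_all chokes on the blank name.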
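One way around the comprehension problem, sketched under the assumption that the crash comes either from table.find("tr") returning None or from reading .string on an empty header cell: move the None checks into a small helper and call get_text(), which returns "" for an empty th where .string would be None. header_names and the sample tables are hypothetical:

```python
from bs4 import BeautifulSoup


def header_names(table):
    """Return the text of each <th> in the table's first row ([] if there is none).

    Hypothetical helper: the None checks live outside the comprehension,
    so nothing inside it can raise on a missing row.
    """
    if table is None:
        return []
    first_row = table.find("tr")
    if first_row is None:
        return []
    # get_text() gives "" for an unlabeled <th>; th.string would be None there
    return [th.get_text(strip=True) for th in first_row.find_all("th")]


# Hypothetical markup: last column unlabeled, plus a table with no rows
html = """
<table>
  <tr><th>Year</th><th>Model</th><th></th></tr>
  <tr><td>2010</td><td>Honda Civic</td><td>x</td></tr>
</table>
<table></table>
"""
soup = BeautifulSoup(html, "html.parser")
tables = soup.find_all("table")
print(header_names(tables[0]))
print(header_names(tables[1]))
```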