【Question Title】: Error reading HTML tag attributes with BeautifulSoup
【Posted】: 2018-01-09 10:07:15
【Question】:

I'm trying to use BeautifulSoup to read and list the text of the td cells, selected by their data attribute:

tr = BeautifulSoup(str(input), 'lxml')
tags = tr.findAll('td')
for t in tags:
    if t.attrs['data-property'] == 'OSVersion':
        ver = t.text

This gives me an error with no further details:

KeyError: 'data-property'

Here is a sample tr extracted as the input:

<tr > 
<td class=" resizable reorderable" data-property="OSVersion">10.2.1</td>
<td class=" resizable reorderable" data-property="DisplayModel">iPad Mini 4 (64 GB Space Gray)</td>
<td class=" resizable reorderable" data-property="PhoneNumber"></td>
<td class="grid_customvariable_colsize resizable reorderable" data-property="DeviceCustomAttributeDetails"></td>
<td class=" resizable reorderable" data-property="DeviceTagDetails"></td>
<td class=" resizable reorderable" data-property="EnrollmentStatusName">    <div class="grid_resizable_col">Enrolled</div>
</td>
<td class=" resizable reorderable" data-property="ComplianceStatusName">    <div class="grid_resizable_col">Compliant</div>
</td>

<td class=" resizable reorderable" data-property="IMEI"></td>
<td class=" resizable reorderable" data-property="LocationGroupName">iOS</td>
<td class=" resizable reorderable" data-property="IsCompromisedYN">No</td>
<td class=" resizable reorderable" data-property="HomeCarrier">Not Reported </td>
<td class=" resizable reorderable" data-property="CurrentCarrier">Not Reported </td>
<td class=" resizable reorderable" data-property="WiFiIPAddress"></td>

<td class=" resizable reorderable" data-property="Notes"></td>
<td class=" resizable reorderable" data-property="WnsStatus">        <span>Disconnected</span>
</td>
<td class=" resizable reorderable" data-property="DmLastSeenTime">    <span class="icon arrow_down_stretched red">-</span>
</td>                    
</tr>

If I use a single dict as below, it works fine:

d = {'class': ['', 'resizable', 'reorderable'], 'data-property': 'FriendlyName'}
print(d['data-property'])
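The single-dict test passes only because that dict happens to contain the key. In bs4, a tag's `attrs` is an ordinary dictionary, so `t.attrs['data-property']` behaves exactly like `d['data-property']` and raises `KeyError` as soon as one `td` lacks the attribute. A minimal pure-Python sketch of that behaviour; the second dict is a hypothetical attribute set for such a cell, not taken from the question's data:

```python
# Attribute dicts shaped like bs4's Tag.attrs. The first mirrors the
# question's dict (value changed to OSVersion); the second is a
# hypothetical <td> that has no data-property attribute at all.
with_attr = {"class": ["", "resizable", "reorderable"], "data-property": "OSVersion"}
without_attr = {"class": ["", "resizable", "reorderable"]}

print(with_attr["data-property"])  # key present: prints OSVersion

raised = False
try:
    without_attr["data-property"]  # same lookup on a dict without the key
except KeyError as exc:
    raised = True
    print("KeyError:", exc)  # prints KeyError: 'data-property'

# Safe alternatives: a membership test, or dict.get(), which returns None
print("data-property" in without_attr)                   # False
print(without_attr.get("data-property") == "OSVersion")  # False
```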

Does anyone know how to fix this?

Thanks

【Comments】:

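  • Rename the variable str in the BeautifulSoup call; it is a reserved word.
  • Tried that, it isn't the problem. If that were the cause, the code wouldn't even get past that line.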

Tags: python beautifulsoup python-requests


【Solution 1】:

Here you go. Code:

from bs4 import BeautifulSoup
with open("xmlfile.xml", "r") as f:  # opening xml file
    content = f.read()  # xml content stored in this variable
soup = BeautifulSoup(content, "lxml")
for values in soup.findAll("td"):
    # .get() returns None for a <td> without data-property instead of raising KeyError
    if values.get("data-property") == "OSVersion":
        print(values.text)

Output:

10.2.1
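Indexing a tag with `t["data-property"]` raises `KeyError` on any `td` that lacks the attribute, which is what broke the loop on the question's real data. Another option is to let BeautifulSoup do the attribute matching through the `attrs` argument of `find()`, so tags without the attribute are simply skipped. A minimal sketch, assuming an inline string instead of `xmlfile.xml`, the built-in `"html.parser"` instead of lxml, and a hypothetical extra `td` with no `data-property`:

```python
from bs4 import BeautifulSoup

# A trimmed copy of the question's <tr>; the second <td> is hypothetical and
# deliberately lacks data-property to mimic the row that caused the KeyError.
html = """<table><tr>
<td class=" resizable reorderable" data-property="OSVersion">10.2.1</td>
<td class=" resizable reorderable">no data-property here</td>
<td class=" resizable reorderable" data-property="LocationGroupName">iOS</td>
</tr></table>"""

soup = BeautifulSoup(html, "html.parser")

# find() only matches tags whose data-property equals "OSVersion";
# tags without the attribute are never indexed, so no KeyError is possible.
cell = soup.find("td", attrs={"data-property": "OSVersion"})
ver = cell.get_text(strip=True) if cell is not None else None
print(ver)  # 10.2.1
```

Passing a dict to `attrs` is also the usual way to search on hyphenated names such as `data-*`, which cannot be written as Python keyword arguments.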

【Comments】:

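【Solution 2】:

Yes, that's right; we were wrong.

Since you are getting a KeyError, make the following change in your code:

if 'data-property' in t.attrs and t.attrs['data-property'] == 'OSVersion':

My answer with demo code:

In the old BeautifulSoup 3 package, t.attrs returns a list of tuples, e.g. [(u'class', u' resizable reorderable'), (u'data-property', u'OSVersion')]. (In bs4, t.attrs is already a dictionary.)

So we convert it to a dictionary with dict(), e.g. attributes = dict(t.attrs).

In the condition, check that the key exists before comparing, e.g. if 'data-property' in attributes and attributes['data-property'] == 'OSVersion':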

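Demo:

from bs4 import BeautifulSoup
tr = BeautifulSoup(data, 'lxml')  # data holds the <tr> HTML shown in the question
tags = tr.findAll('td')
for t in tags:
    attributes = dict(t.attrs)  # already a dict in bs4; the conversion matters only for BeautifulSoup 3
    if 'data-property' in attributes and attributes['data-property'] == 'OSVersion':
        ver = t.text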
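With bs4 (which the question uses, given that it raises `KeyError` rather than a list-indexing error), the important part of this fix is the key check, not the `dict()` conversion. A minimal sketch using bs4's `has_attr()`; the two-cell row is hypothetical and `"html.parser"` is swapped in for lxml so the example has no extra dependency:

```python
from bs4 import BeautifulSoup

# Hypothetical two-cell row: the second <td> has no data-property.
html = """<tr>
<td data-property="OSVersion">10.2.1</td>
<td>missing attribute</td>
</tr>"""
soup = BeautifulSoup(html, "html.parser")

# In bs4 the attributes are already a dict (or a dict subclass),
# so no dict() conversion is needed before testing for a key.
print(isinstance(soup.td.attrs, dict))  # True

versions = []
for t in soup.find_all("td"):
    # has_attr() guards the lookup; t.get("data-property") == "OSVersion"
    # would be an equivalent one-line condition.
    if t.has_attr("data-property") and t["data-property"] == "OSVersion":
        versions.append(t.get_text())
print(versions)  # ['10.2.1']
```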
    

Let me know if you have any other questions. Feel free to ping me.

【Comments】:

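• This does work. I checked all the keys and it turned out that one td was missing the data-property attribute for some unknown reason. Once I added a line to test for that attribute, everything worked. Thanks a lot for your help!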
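Finding the offending cell, as described in the comment above, can be done by listing every `td` that lacks the attribute. A minimal sketch; the row below is hypothetical, with its third cell missing `data-property`:

```python
from bs4 import BeautifulSoup

# Hypothetical row: the third <td> has no data-property attribute.
html = """<tr>
<td data-property="OSVersion">10.2.1</td>
<td data-property="DisplayModel">iPad Mini 4</td>
<td class="resizable">???</td>
</tr>"""
soup = BeautifulSoup(html, "html.parser")

# Collect (position, tag) for every <td> that lacks the attribute,
# so the bad cell can be located in the original page.
missing = [
    (index, td)
    for index, td in enumerate(soup.find_all("td"))
    if not td.has_attr("data-property")
]
for index, td in missing:
    print(index, td)
```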
【Solution 3】:

No need to mess with attrs; read the attribute from the tag itself:

from bs4 import BeautifulSoup as BS

html = """<tr > 
<td class=" resizable reorderable" data-property="OSVersion">10.2.1</td>
<td class=" resizable reorderable" data-property="DisplayModel">iPad Mini 4 (64 GB Space Gray)</td>
<td class=" resizable reorderable" data-property="PhoneNumber"></td>
<td class="grid_customvariable_colsize resizable reorderable" data-property="DeviceCustomAttributeDetails"></td>
<td class=" resizable reorderable" data-property="DeviceTagDetails"></td>
<td class=" resizable reorderable" data-property="EnrollmentStatusName">    <div class="grid_resizable_col">Enrolled</div>
</td>
<td class=" resizable reorderable" data-property="ComplianceStatusName">    <div class="grid_resizable_col">Compliant</div>
</td>

<td class=" resizable reorderable" data-property="IMEI"></td>
<td class=" resizable reorderable" data-property="LocationGroupName">iOS</td>
<td class=" resizable reorderable" data-property="IsCompromisedYN">No</td>
<td class=" resizable reorderable" data-property="HomeCarrier">Not Reported </td>
<td class=" resizable reorderable" data-property="CurrentCarrier">Not Reported </td>
<td class=" resizable reorderable" data-property="WiFiIPAddress"></td>

<td class=" resizable reorderable" data-property="Notes"></td>
<td class=" resizable reorderable" data-property="WnsStatus">        <span>Disconnected</span>
</td>
<td class=" resizable reorderable" data-property="DmLastSeenTime">    <span class="icon arrow_down_stretched red">-</span>
</td>
</tr>"""

soup = BS(html, 'lxml')  # name the parser explicitly, as in the question
tags = soup.findAll('td')
for t in tags:
    # t.get() returns None instead of raising KeyError when a <td> lacks the attribute
    if t.get('data-property') == 'OSVersion':
        ver = t.text
        print(ver)
    

Output:

10.2.1
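A CSS attribute selector is another way to pick the cell without touching `attrs` at all, because it only matches elements that carry the attribute with that exact value. A minimal sketch, assuming a bs4 version with CSS selector support (`select_one()`; bs4 4.7 and later use the `soupsieve` package for this), `"html.parser"` in place of lxml, and a trimmed, hypothetical row:

```python
from bs4 import BeautifulSoup

# Trimmed row; the second <td> is hypothetical and has no data-property.
html = """<tr>
<td class=" resizable reorderable" data-property="OSVersion">10.2.1</td>
<td class=" resizable reorderable">no attribute</td>
</tr>"""
soup = BeautifulSoup(html, "html.parser")

# The attribute selector matches only <td> elements whose data-property is
# exactly "OSVersion", so cells without the attribute are never examined.
cell = soup.select_one('td[data-property="OSVersion"]')
ver = cell.get_text(strip=True) if cell is not None else None
print(ver)  # 10.2.1
```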
    

【Comments】: