【Question title】: Pandas DataFrames KeyError: 0
【Posted】: 2018-09-28 18:27:26
【Question】:

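I am just trying to extract the first column of URLs from the table on this site, and I keep running into KeyError: 0. I have only just started learning Python.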

Traceback (most recent call last):
  File "riscribble.py", line 13, in <module>
    lic_link = soup_data[0].find('a').text
  File "C:\Users\rkrouse\Desktop\Python\lib\site-packages\bs4\element.py", line 1071, in __getitem__
    return self.attrs[key]
KeyError: 0

Any ideas about why I am getting this error and/or how to fix it would be greatly appreciated.

from bs4 import BeautifulSoup as soup
import requests as r
import pandas as pd

url = 'http://www.crb.state.ri.us/verify_CRB.php?page=0&letter='

data = r.get(url)

page_data = soup(data.text, 'html.parser')

soup_data = page_data.find('table')

lic_link = soup_data[0].find('a').text

df = pd.DataFrame()

for each in soup_data:
    lic_link = each.find('a').text

    df=df.append(pd.DataFrame({'LicenseURL': lic_link}, index=[0]))

df.to_csv('RI_License_urls.csv', index=False)

【Comments】:

    Tags: python pandas dataframe keyerror


    【Solution 1】:

    Imports:

    from bs4 import BeautifulSoup as soup
    import requests as r
    import pandas as pd
    import re
    

    Fetch the page:

    url = 'http://www.crb.state.ri.us/verify_CRB.php?page=0&letter='
    
    data = r.get(url)
    
    page_data = soup(data.text, 'html.parser')
    

    Select the links:

    # Keep only the link text
    links = [link.text for link in page_data.table.tr.find_all('a') if re.search('licensedetail.php', str(link))]
    
    # an element of links looks like: 32922
    
    # or keep the whole tag
    
    links = [link for link in page_data.table.tr.find_all('a') if re.search('licensedetail.php', str(link))]
    
    # an element of links looks like: <a href="licensedetail.php?link=32922&amp;type=Resid">32922</a>
    
    # or keep only the href attribute
    
    links = [link['href'] for link in page_data.table.tr.find_all('a') if re.search('licensedetail.php', str(link))]
    
    # an element of links looks like: licensedetail.php?link=32922&type=Resid
    
    # or prefix the href with the host name
    
    links = [r'www.crb.state.ri.us/' + link['href'] for link in page_data.table.tr.find_all('a') if re.search('licensedetail.php', str(link))]
    
    # an element of links looks like: www.crb.state.ri.us/licensedetail.php?link=32922&type=Resid
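The last variant above joins the host name and the href by hand, which leaves out the scheme. As a possible alternative (a swapped-in technique, not part of the original answer), the standard library's `urllib.parse.urljoin` can resolve the relative href against the page URL. The variable names below are illustrative only:

```python
from urllib.parse import urljoin

# The page URL from the question and one relative href taken from the
# example output above.
page_url = "http://www.crb.state.ri.us/verify_CRB.php?page=0&letter="
relative_href = "licensedetail.php?link=32922&type=Resid"

# urljoin replaces the last path segment of the base URL with the relative
# reference and keeps the scheme and host.
absolute_url = urljoin(page_url, relative_href)
print(absolute_url)
```

Applied inside the list comprehension, this would be `urljoin(url, link['href'])` in place of the string concatenation.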
    

    Finish:

    df = pd.DataFrame(links, columns=['LicenseURL'])
    
    df.to_csv('RI_License_urls.csv', index=False)
    

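    【Comments】:

      【Solution 2】:

      Change soup_data = page_data.find('table') to soup_data = page_data.find_all('table'). find returns only the first matching object, whereas find_all returns all matching objects, so only the result of find_all can be indexed with [0] or looped over table by table. See here for more information.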
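The traceback in the question ends in `return self.attrs[key]`, which suggests that indexing a single tag is treated as an attribute lookup. A minimal sketch of the difference follows, assuming a small made-up HTML snippet in place of the live page (the snippet and all variable names are illustrative, not taken from the site):

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the real page.
html = """
<table><tr><td><a href="search.php">Back</a></td></tr></table>
<table><tr><td><a href="licensedetail.php?link=1">1</a></td></tr></table>
"""
page = BeautifulSoup(html, "html.parser")

first_table = page.find("table")     # a single Tag (the first match)
all_tables = page.find_all("table")  # a list-like ResultSet of Tags

# Indexing a Tag looks up an HTML attribute, so 0 is treated as an
# attribute name and the lookup fails with KeyError.
try:
    first_table[0]
    tag_index_error = None
except KeyError as exc:
    tag_index_error = exc

# Indexing the ResultSet behaves like indexing a list.
first_link_text = all_tables[0].find("a").text
```

Here `first_table[0]` fails in the same way as `soup_data[0]` in the question, while `all_tables[0]` returns the first table.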

      from bs4 import BeautifulSoup as soup
      import requests as r
      import pandas as pd
      
      url = 'http://www.crb.state.ri.us/verify_CRB.php?page=0&letter='
      
      data = r.get(url)
      
      page_data = soup(data.text, 'html.parser')
      
      soup_data = page_data.find_all('table')
      
      df = pd.DataFrame()
      
      for each in soup_data:
          lic_link = each.find('a').text
      
          df=df.append(pd.DataFrame({'LicenseURL': lic_link}, index=[0]))
      
      df.to_csv('RI_License_urls.csv', index=False)
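Note that `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so the loop above only runs on older versions. One possible adaptation, sketched here with hypothetical link texts in place of the scraped values, is to collect the rows in a list and build the DataFrame once:

```python
import pandas as pd

# Hypothetical link texts standing in for the scraped values.
link_texts = ["32922", "32923", "32924"]

# Collect plain dicts first, then build the frame in a single step
# instead of appending inside the loop.
rows = [{"LicenseURL": text} for text in link_texts]
df = pd.DataFrame(rows, columns=["LicenseURL"])

# With no path argument, to_csv returns the CSV content as a string.
csv_text = df.to_csv(index=False)
```

Passing a file name such as 'RI_License_urls.csv' to `to_csv` would write the same content to disk, as in the answer.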
      

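      【Comments】:

      • Also, you can use either findAll or find_all; they are the same method. It used to be just findAll, but I believe find_all was added because of PEP8.
      • Thanks. The error is gone, but now it only prints LicenseURL, Return to search page, Return to search page, 32922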
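The output reported in the last comment is consistent with `each.find('a')` returning only the first anchor of each table. A rough sketch of one way to collect every license link instead, in the spirit of the href filter in Solution 1, is shown below. The HTML is made up for illustration (including the second license number and the navigation hrefs) and may not match the real page layout:

```python
from bs4 import BeautifulSoup

# Made-up HTML: two navigation tables followed by a results table.
html = """
<table><tr><td><a href="verify_CRB.php">Return to search page</a></td></tr></table>
<table><tr><td><a href="verify_CRB.php">Return to search page</a></td></tr></table>
<table>
  <tr><td><a href="licensedetail.php?link=32922&amp;type=Resid">32922</a></td></tr>
  <tr><td><a href="licensedetail.php?link=32923&amp;type=Resid">32923</a></td></tr>
</table>
"""
page = BeautifulSoup(html, "html.parser")

# One anchor per table: this picks up the navigation links as well.
first_anchor_per_table = [t.find("a").text for t in page.find_all("table")]

# Every anchor whose href points at the license detail page.
license_links = [
    a.text
    for a in page.find_all("a", href=True)
    if "licensedetail.php" in a["href"]
]
```

With this snippet, `first_anchor_per_table` mixes navigation text with one license number, while `license_links` contains only the license numbers.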