【Question title】:How do I locate which file has a KeyError in Python?
【Posted】:2021-07-20 16:38:42
【Question description】:

I wrote a preprocessing script in Python that fills in missing values in the confidence column. Here is my script:

import pandas as pd
import numpy as np
from pathlib import Path
import glob as glob


inp_dir = Path(r'C:/Users/jtharian/Desktop/bbc/') 

for file in inp_dir.glob('*.csv'):
    df = pd.read_csv(file, sep=',', quotechar='|',error_bad_lines=False)
    df['confidence'] = df['confidence'].replace(np.nan, 0.01)
    df.to_csv(file,index=False)

Error:

Traceback (most recent call last):

  File "C:\Users\jtharian\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 3080, in get_loc
    return self._engine.get_loc(casted_key)

  File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc

  File "pandas\_libs\index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc

  File "pandas\_libs\hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item

  File "pandas\_libs\hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item

KeyError: 'confidence'


The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  File "<ipython-input-1-0cbf17caf540>", line 11, in <module>
    df['confidence'] = df['confidence'].replace(np.nan, 0.01)

  File "C:\Users\jtharian\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py", line 3024, in __getitem__
    indexer = self.columns.get_loc(key)

  File "C:\Users\jtharian\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 3082, in get_loc
    raise KeyError(key) from err

KeyError: 'confidence'

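I understand that I get this error because one of the files in my directory does not have a 'confidence' column. But how do I find that file, or print its name?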

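【Comments on the question】:

  • When you read the CSV into df, check whether the confidence column exists, e.g. if 'confidence' in df.columns. You already have the file variable, so print it.
  • Put print(file) at the start of the loop. The problem is in the file whose name is printed just before the error.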
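
The check suggested in these comments can also be run as a separate pass that lists every file lacking the column before anything is modified. This is a minimal sketch, not the asker's code: the function name find_files_missing_column is hypothetical, and it assumes each CSV has a header row and uses the same separator and quote character as the original script.

```python
from pathlib import Path

import pandas as pd


def find_files_missing_column(directory, column="confidence"):
    """Return the CSV files in `directory` whose header lacks `column`.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    missing = []
    for path in sorted(Path(directory).glob("*.csv")):
        # nrows=0 parses only the header row, so large files stay cheap to check.
        header = pd.read_csv(path, sep=",", quotechar="|", nrows=0)
        if column not in header.columns:
            missing.append(path)
    return missing
```

Calling find_files_missing_column(inp_dir) with the directory from the question should return the paths of the offending files, which can then be printed or fixed by hand.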

Tags: python pandas keyerror


【Solution 1】:

Add a try/except block:

import pandas as pd
import numpy as np
from pathlib import Path
import glob as glob


inp_dir = Path(r'C:/Users/jtharian/Desktop/bbc/') 

for file in inp_dir.glob('*.csv'):
    try:
        df = pd.read_csv(file, sep=',', quotechar='|',error_bad_lines=False)
        df['confidence'] = df['confidence'].replace(np.nan, 0.01)
        df.to_csv(file,index=False)
    except:
        # assumes error is known
        print("Invalid column in file:", file)

You can also use the sys module to get details about the exception being handled.
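
As a rough illustration of the sys approach, sys.exc_info() returns the type, value, and traceback of the exception currently being handled. The helper name describe_current_exception and the small in-memory DataFrame are assumptions made for this sketch; in the real loop the call would sit inside the except block next to the file name.

```python
import sys

import pandas as pd


def describe_current_exception():
    """Summarise the exception being handled, using sys.exc_info()."""
    exc_type, exc_value, _traceback = sys.exc_info()
    return f"{exc_type.__name__}: {exc_value}"


# Stand-in for a CSV that was read without a 'confidence' column.
df = pd.DataFrame({"id": [1, 2]})

try:
    df["confidence"]
except KeyError:
    message = describe_current_exception()
    print(message)
```

In most code, `except KeyError as err:` gives the same information more directly; sys.exc_info() is mainly useful when the handler does not bind the exception to a name.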

【Comments】:

  • Note: it is better to name the exception you want to catch in the except clause (e.g. except KeyError:) so that other errors are not swallowed.
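
A sketch of what this comment describes, under a few assumptions: the function name fill_confidence is hypothetical, the try block is narrowed to the column access only, and the error_bad_lines argument from the original script is left out because it is not accepted by newer pandas releases.

```python
from pathlib import Path

import numpy as np
import pandas as pd


def fill_confidence(directory, default=0.01):
    """Fill missing 'confidence' values in place; return the files that were skipped.

    Hypothetical helper for illustration only.
    """
    skipped = []
    for path in sorted(Path(directory).glob("*.csv")):
        df = pd.read_csv(path, sep=",", quotechar="|")
        try:
            df["confidence"] = df["confidence"].replace(np.nan, default)
        except KeyError:
            # Only a missing column lands here; other errors still propagate.
            print("No 'confidence' column in file:", path)
            skipped.append(path)
            continue
        df.to_csv(path, index=False)
    return skipped
```

Because only KeyError is caught, a parsing failure or a permission error when writing would still surface with its own traceback instead of being reported as an invalid column.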
【Solution 2】:

You could check whether confidence is among the column names and break out of the loop if it is not:

import pandas as pd
import numpy as np
from pathlib import Path
import glob as glob


inp_dir = Path(r'C:/Users/jtharian/Desktop/bbc/') 

for file in inp_dir.glob('*.csv'):
    df = pd.read_csv(file, sep=',', quotechar='|',error_bad_lines=False)
    if 'confidence' not in df.columns:
        print('filename: ' + str(file))
        break
    df['confidence'] = df['confidence'].replace(np.nan, 0.01)
    df.to_csv(file,index=False)

【Comments】:

【Solution 3】:

The simplest way is to print the file you are processing.

import pandas as pd
import numpy as np
from pathlib import Path
import glob as glob


inp_dir = Path(r'C:/Users/jtharian/Desktop/bbc/') 

for file in inp_dir.glob('*.csv'):
    print(f"Reading: {file}")
    df = pd.read_csv(file, sep=',', quotechar='|',error_bad_lines=False)
    df['confidence'] = df['confidence'].replace(np.nan, 0.01)
    df.to_csv(file,index=False)
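
If printing every file name is too noisy, one possible variation is to put the file name into the error itself, so it appears in the traceback only when something goes wrong. This is a sketch under assumptions, not part of the original answer: the function name fill_confidence_or_fail is hypothetical, and error_bad_lines is omitted because newer pandas releases do not accept it.

```python
from pathlib import Path

import numpy as np
import pandas as pd


def fill_confidence_or_fail(directory, default=0.01):
    """Fill missing 'confidence' values; on a missing column, fail naming the file."""
    for path in sorted(Path(directory).glob("*.csv")):
        df = pd.read_csv(path, sep=",", quotechar="|")
        try:
            filled = df["confidence"].replace(np.nan, default)
        except KeyError as err:
            # Chain the original error so the pandas traceback is preserved.
            raise KeyError(f"column 'confidence' not found in {path}") from err
        df["confidence"] = filled
        df.to_csv(path, index=False)
```

The `raise ... from err` form keeps the original pandas KeyError attached as the direct cause, so the traceback shows both the low-level lookup failure and the file that triggered it.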
    

【Comments】:
