【Question title】: missing data in pandas read_csv
【Posted】: 2013-03-21 09:31:08
【Question】:

My data:


a,b,c,d,e,f
1.5,4.8,,6.3
1.60,5.2,6.5,7.2
1.70,5.5,6.6,8.3,5.7
1.80,6.1,6.7,9.7,6.2
1.90,7.1,6.8,11.1,6.7
2,,6.8,12.5,7.3
2.08,,,,7.8
2.1,,7.2
2.2,,8.0
2.3,,8.7
2.4,,9.2,8.2

from pandas import read_csv
ds = read_csv ('lin-nan.dat', index_col=0, sep=',')

Traceback (most recent call last):
  File "read_lin.py", line 7, in <module>
    ds = read_csv ('lin-nan.dat', index_col=0, sep=',')
  File "/home/nbecker/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 253, in read_csv
    return _read(TextParser, filepath_or_buffer, kdict)
  File "/home/nbecker/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 202, in _read
    return parser.get_chunk()
  File "/home/nbecker/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 844, in get_chunk
    alldata = self._rows_to_cols(content)
  File "/home/nbecker/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 809, in _rows_to_cols
    raise ValueError(msg)
ValueError: Expecting 6 columns, got 5 in row 1

【Comments】:

    Tags: pandas


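    【Solution 1】:

    You can pass error_bad_lines=False to read_csv. It skips malformed lines automatically and prints each one it skips. (From pandas 1.3 onward this option is replaced by on_bad_lines, e.g. on_bad_lines='skip'.)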
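As a minimal sketch of the newer spelling, assuming pandas 1.3 or later: the sample data below is made up for illustration and is not the asker's file. Note that current pandas documents a "bad line" as one with too many fields; rows with too few fields, like those in the question, are normally padded with NaN rather than skipped.

```python
import io
import pandas as pd

# Hypothetical sample: the second data line has one field too many.
raw = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"

# on_bad_lines="skip" drops lines with too many fields without raising;
# on_bad_lines="warn" drops them too but emits a warning for each one.
df = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")
print(df)
```

Skipping discards data, so it is usually worth checking which lines were dropped before relying on the result.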

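    【Comments】:

      【Solution 2】:

      The problem is that none of your rows has 6 fields (the longest has 5), and I don't think read_csv has a keyword that gets around this.

      One solution is to be more explicit: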

      In [1]: df = pd.read_csv('lin-nan.dat', names=list('abcde'), index_col=0, skiprows=1)
      
      In [2]: df['f'] = np.nan
      
      In [3]: df
      Out[3]: 
              b    c     d    e   f
      a                            
      1.50  4.8  NaN   6.3  NaN NaN
      1.60  5.2  6.5   7.2  NaN NaN
      1.70  5.5  6.6   8.3  5.7 NaN
      1.80  6.1  6.7   9.7  6.2 NaN
      1.90  7.1  6.8  11.1  6.7 NaN
      2.00  NaN  6.8  12.5  7.3 NaN
      2.08  NaN  NaN   NaN  7.8 NaN
      2.10  NaN  7.2   NaN  NaN NaN
      2.20  NaN  8.0   NaN  NaN NaN
      2.30  NaN  8.7   NaN  NaN NaN
      2.40  NaN  9.2   8.2  NaN NaN
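A variation on the approach above, sketched under the assumption of a reasonably recent pandas: instead of hard-coding `list('abcde')` and adding `f` afterwards, the column names can be taken from the header line and passed in full, so the parser knows the table is six fields wide and pads short rows with NaN. The data is inlined (first three rows only) so the example is self-contained; the variable names `raw` and `header` are illustrative.

```python
import io
import pandas as pd

# First rows of the question's data, inlined so the sketch runs on its own.
raw = """a,b,c,d,e,f
1.5,4.8,,6.3
1.60,5.2,6.5,7.2
1.70,5.5,6.6,8.3,5.7
"""

# Take the column names from the header line instead of hard-coding them.
header = raw.splitlines()[0].split(",")

# Passing all six names fixes the table width; skiprows=1 skips the header
# line itself, and rows with fewer fields are padded with NaN.
df = pd.read_csv(io.StringIO(raw), names=header, skiprows=1, index_col=0)
print(df)
```

With a file on disk, the same idea would read the first line of 'lin-nan.dat' to build `header` and then pass the filename to `read_csv`.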
      

      【Comments】:
