【Question Title】: Python pandas pivot table: how to handle '\xc2\xa0'?
【Posted】: 2016-02-25 08:16:29
【Question】:

I have a sample dataset like the following:

I want to build a time series with every time stamp as a column header. My script is as follows:

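#!/usr/bin/python
import pandas as pd
import os
from os.path import basename


def generate_timeSeries(fileToProcess):

    df = pd.read_csv(fileToProcess)
    # One row per (A_Id, P_Id) pair, one column per 'Time Stamp' value;
    # each cell is the mean of 'C_Number' (pivot_table's default aggfunc)
    timestamps = df.pivot_table('C_Number',['A_Id', 'P_Id'], 'Time Stamp')

    return timestamps

def main():

    folder_path = "Input/"

    # Process every file in the input folder
    for files in os.listdir(folder_path):

        print "processing",files
        file_to_open = os.path.join(folder_path, files)
        # Note: this encodes the file *path* as UTF-8; it does not change how the file's contents are read
        unicoded_file = unicode(file_to_open).encode('utf8')
        TimeSeries_dataframe = generate_timeSeries(unicoded_file)

        # Write the pivoted table to Output/<input name>_timeseries.csv
        TimeSeries_dataframe.to_csv('Output/%s_timeseries.csv' % os.path.splitext(files)[0], sep=',', encoding='utf-8')


if __name__ == "__main__":
    main()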
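For reference, this is what the pivot_table call above does when C_Number really is numeric. It is a minimal Python 3 sketch on a small hypothetical DataFrame: the rows, ids and time stamps are made up, and only the column names come from the script.

```python
import pandas as pd

# Hypothetical rows standing in for one input CSV; column names follow the script.
df = pd.DataFrame({
    "A_Id": [1, 1, 2, 2],
    "P_Id": [10, 10, 20, 20],
    "Time Stamp": ["2015", "2016", "2015", "2016"],
    "C_Number": [3, 5, 7, 11],
})

# One row per (A_Id, P_Id) pair, one column per distinct time stamp;
# each cell aggregates C_Number with the default aggfunc (mean).
timestamps = df.pivot_table(values="C_Number", index=["A_Id", "P_Id"], columns="Time Stamp")
print(timestamps)
```

Each time stamp becomes a column header, which is the layout the question is after. The aggregation step is also where the error below is raised when C_Number is not numeric.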

When I run the script, I get the following error:

pandas.core.groupby.DataError: No numeric types to aggregate

Here is the full traceback:

Traceback (most recent call last):
  File "Error_AuthorTimeSeries.py", line 43, in <module>
    main()
  File "Error_AuthorTimeSeries.py", line 33, in main
    TimeSeries_dataframe = generate_timeSeries(unicoded_file)
  File "Error_AuthorTimeSeries.py", line 16, in generate_timeSeries
    timestamps = df.pivot_table('C_Number',['A_ID', 'P_ID'], 'Time Stamp')
  File "/usr/lib/python2.7/dist-packages/pandas/tools/pivot.py", line 104, in pivot_table
    agged = grouped.agg(aggfunc)
  File "/usr/lib/python2.7/dist-packages/pandas/core/groupby.py", line 437, in agg
    return self.aggregate(func, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/pandas/core/groupby.py", line 1994, in aggregate
    return getattr(self, arg)(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/pandas/core/groupby.py", line 452, in mean
    return self._cython_agg_general('mean')
  File "/usr/lib/python2.7/dist-packages/pandas/core/groupby.py", line 1917, in _cython_agg_general
    new_blocks = self._cython_agg_blocks(how, numeric_only=numeric_only)
  File "/usr/lib/python2.7/dist-packages/pandas/core/groupby.py", line 1964, in _cython_agg_blocks
    raise DataError('No numeric types to aggregate')
pandas.core.groupby.DataError: No numeric types to aggregate
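The DataError means pivot_table found nothing numeric to aggregate: the values column C_Number was read as text, which happens when pandas cannot parse every cell in it as a number. Below is a minimal Python 3 diagnostic sketch, assuming a CSV laid out like the question's. The sample rows and the helper find_non_numeric are hypothetical.

```python
import io

import pandas as pd


def find_non_numeric(series):
    """Return the distinct non-missing values in `series` that are not numbers.

    Hypothetical helper: cells that pd.to_numeric cannot parse become NaN
    under errors="coerce"; comparing with the original NaN mask isolates them.
    """
    converted = pd.to_numeric(series, errors="coerce")
    bad = series[converted.isna() & series.notna()]
    return list(bad.unique())


# Hypothetical CSV: the last C_Number cell holds a non-breaking space, not a number.
csv_text = "A_Id,P_Id,Time Stamp,C_Number\n1,10,2015,163\n1,10,2016,143\n2,20,2015,\xa0\n"
df = pd.read_csv(io.StringIO(csv_text))

# Not a numeric dtype: pandas kept the whole column as text.
print(pd.api.types.is_numeric_dtype(df["C_Number"]))
print(find_non_numeric(df["C_Number"]))
```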

P.S.: Near-duplicates of this question are 1, 2 and 3, but none of them gives a satisfactory answer to my problem.

I tried the fill_value and astype approaches, with no luck.

Edit: Based on a suggestion, I tried to find the cause of the error by adding the following:

pd.unique(df['C_number'].values)

which gives:

['163' '143' '51' '43' '34' '24' '20' '15' '14' '12' '11' '10' '9' '8' '7'
 '6' '5' '4' '3' '2' '1' '\xc2\xa0' '145' '35' '16' '164' '146' '36' '21'
 '165' '148' '37' '171' '154' '52' '44' '22' '17' '13' '158' '160' '147'
 '161']
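The byte pair '\xc2\xa0' is the UTF-8 encoding of U+00A0 NO-BREAK SPACE, which is why gedit displays it as an ordinary space. Python 2 shows the raw bytes because its str type is a byte string. Decoding in Python 3 makes the character explicit, as this short check shows:

```python
import unicodedata

# The two bytes printed by pd.unique() in the question, decoded as UTF-8.
raw = b"\xc2\xa0"
char = raw.decode("utf-8")

print(repr(char))               # '\xa0'
print(unicodedata.name(char))   # NO-BREAK SPACE
print(char.isspace())           # True: Python 3 treats U+00A0 as whitespace
print(repr(" 163\xa0".strip()))  # '163': str.strip() removes it as well
```

A cell that contains only this character is not a number, so it keeps the whole C_Number column as text.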

So I believe '\xc2\xa0' is the culprit, even though I encode everything in UTF-8. I therefore added the following two lines to the function generate_timeSeries():

df.loc[df['Cited By Numbers']=='\xc2\xa0', 'Cited By Numbers' ] = '0'
df['Cited By Numbers'] = df['Cited By Numbers'].astype(int)

This works as a stopgap for files that contain '\xc2\xa0', but it breaks on files that do not contain these characters, producing the following traceback:

Traceback (most recent call last):
  File "imeSeries.py", line 66, in <module>
    main()
  File "TimeSeries.py", line 56, in main
    TimeSeries_dataframe = generate_timeSeries(unicoded_file)
  File "TimeSeries.py", line 23, in generate_timeSeries
    df.loc[df['C_Numbers']=='\xc2\xa0', 'C_Numbers' ] = '0'
  File "/usr/lib/python2.7/dist-packages/pandas/core/ops.py", line 563, in wrapper
    res = na_op(values, other)
  File "/usr/lib/python2.7/dist-packages/pandas/core/ops.py", line 532, in na_op
    raise TypeError("invalid type comparison")
TypeError: invalid type comparison
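In files without the stray character, pandas parses the column as integers, and the pandas version in the traceback raises "invalid type comparison" when an integer column is compared with a string. One way to avoid branching on whether a file contains the character is to let pd.to_numeric do the conversion, since it accepts both text and already-numeric columns. This is a Python 3 sketch. The helper clean_count_column is hypothetical, and filling unparseable cells with 0 mirrors the question's own choice of '0'.

```python
import pandas as pd


def clean_count_column(series, fill=0):
    """Convert `series` to integers, replacing unparseable cells with `fill`.

    Hypothetical helper. It behaves the same whether pandas read the column
    as text (file contained U+00A0 cells) or already as integers (clean file).
    """
    numeric = pd.to_numeric(series, errors="coerce")  # non-numbers -> NaN
    return numeric.fillna(fill).astype(int)


dirty = pd.Series(["163", "143", "\xa0", "51"])  # like a file with the stray character
clean = pd.Series([163, 143, 51])                # like a file without it

print(clean_count_column(dirty).tolist())  # [163, 143, 0, 51]
print(clean_count_column(clean).tolist())  # [163, 143, 51]
```

Because no string comparison is involved, the same line works for both kinds of input file.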

What is the correct way to solve this?

Any help would be greatly appreciated.

【Comments】:

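  • Please show a few rows of your input CSV. It looks like the column is being read as text.
  • Oh, I see: you have some glyphs in that column. Maybe replace them with empty values (before importing into pandas)?
  • @MKesper How do I replace them? I don't know what characters they are. When I open the file in a text editor such as gedit, they show up as spaces.
  • Could you try something like dataframe.fillna(0)?
  • @Sword Tried that. It didn't help; same error.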
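fillna(0) has no effect here because the cell is not missing: it holds a one-character string (the non-breaking space), not NaN. Following the suggestion to clean the text before pandas sees it, the Python 3 sketch below reads the file as UTF-8, deletes every U+00A0, and parses the result. A cell that contained only that character becomes empty, which read_csv treats as NaN, so fillna(0) then applies. It assumes the files are UTF-8 and that no column legitimately needs the character. The helper read_csv_without_nbsp and the demo rows are hypothetical.

```python
import io
import os
import tempfile

import pandas as pd


def read_csv_without_nbsp(path):
    """Read a UTF-8 CSV after deleting every U+00A0 (NO-BREAK SPACE).

    Hypothetical helper: assumes no column legitimately needs the character.
    """
    with open(path, encoding="utf-8") as f:
        text = f.read().replace("\xa0", "")
    return pd.read_csv(io.StringIO(text))


# Demo with a throwaway file; the last C_Number cell holds only a non-breaking space.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".csv", delete=False) as tmp:
    tmp.write("A_Id,P_Id,Time Stamp,C_Number\n1,10,2015,163\n1,10,2016,\xa0\n")
    demo_path = tmp.name

df = read_csv_without_nbsp(demo_path)
os.remove(demo_path)

# The blanked cell is now NaN, so fillna(0) works and the column is numeric.
df["C_Number"] = df["C_Number"].fillna(0)
print(df["C_Number"].tolist())
```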

Tags: python csv numpy pandas encoding


【Solution 1】:

I managed to solve this by adding the following line to the original script:

df = df.convert_objects(convert_numeric=True)
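convert_objects(convert_numeric=True) turns strings that cannot be parsed as numbers into NaN, which is why it unblocks the pivot. The method was deprecated in pandas 0.17 and is not available in current pandas. A rough equivalent for current pandas is pd.to_numeric(..., errors="coerce") on the value column. Here is a Python 3 sketch of generate_timeSeries along those lines, assuming only C_Number needs converting. The function name generate_time_series and the sample CSV are hypothetical.

```python
import io

import pandas as pd


def generate_time_series(csv_source):
    """Pivot C_Number into one column per time stamp (sketch for current pandas)."""
    df = pd.read_csv(csv_source)
    # Stand-in for the removed convert_objects(convert_numeric=True):
    # anything that is not a number (e.g. a lone U+00A0) becomes NaN.
    df["C_Number"] = pd.to_numeric(df["C_Number"], errors="coerce")
    return df.pivot_table(values="C_Number", index=["A_Id", "P_Id"], columns="Time Stamp")


csv_text = (
    "A_Id,P_Id,Time Stamp,C_Number\n"
    "1,10,2015,163\n"
    "1,10,2016,\xa0\n"
    "2,20,2015,51\n"
    "2,20,2016,43\n"
)
result = generate_time_series(io.StringIO(csv_text))
print(result)
```

With errors="coerce", the bad cell becomes NaN and shows up as a missing value in the pivot. If 0 is preferred, as in the question's workaround, chain .fillna(0) after pd.to_numeric. Note that this changes the mean for any group that also has real values.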

