【Title】: Python - CSV - Calculate average of column values by a column id
【Posted】: 2018-11-14 19:54:29
【Question】:

I have a very large CSV file that I managed to sort by a column ID, but I can't work out how to average the column values that share that column ID.

88741,42.84286022,16.41829224,1
88797,42.78081536,16.40743455,1
88797,42.78081536,16.21153455,1
88823,42.51512511,16.43304948,2
88885,42.88204193,16.12412548,2
87227,42.88204193,16.64223948,3
and so on...

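I need to produce a new CSV without the SchoolCode column, with Lat and Long averaged for each cluster. The numbers should also stay in the same format. I tried pandas, and it throws the error shown further down.

The output should look like this: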

Lat,Long,Cluster
<average_lat_forCluster1>,<average_long_forCluster1>,1
<average_lat_forCluster2>,<average_long_forCluster2>,2
<average_lat_forCluster3>,<average_long_forCluster3>,3
and so on...

My code:

import pandas as pd

df = pd.read_csv('SortedCluster.csv', names=[
             'SchoolCode', 'Lat', 'Long', 'Cluster'])
df2 = df.groupby('Cluster')['Lat','Long'].mean()
df2.to_csv('AverageOutput.csv')

Error:

Traceback (most recent call last):
  File "averager.py", line 6, in <module>
    df2 = df.groupby('Cluster')['Lat','Long'].mean()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 1306, in mean
    return self._cython_agg_general('mean', **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 3974, in _cython_agg_general
    how, alt=alt, numeric_only=numeric_only, min_count=min_count)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 4046, in _cython_agg_blocks
    raise DataError('No numeric types to aggregate')
pandas.core.base.DataError: No numeric types to aggregate

【Comments】:

  • Do you want df.groupby('Cluster')['Lat','Long'].mean()?

Tags: python-3.x pandas csv


【Solution 1】:

I think you first need to convert the values to numeric, if they are not numeric already:

df[['Lat','Long']] = df[['Lat','Long']].apply(pd.to_numeric, errors='coerce')

Then aggregate the mean per group:

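# Select the columns with a list (double brackets); the tuple form ['Lat','Long'] is rejected by recent pandas releases
df.groupby('Cluster')[['Lat','Long']].mean()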
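
Putting the two steps together, a minimal end-to-end sketch might look like the following. It is an illustration rather than the answerer's exact code: the helper name average_by_cluster, the in-memory sample (rows copied from the question), the dropna step, and the float_format of eight decimals are all assumptions made here. In a real run the file path 'SortedCluster.csv' would be passed instead of the sample.

```python
import io
import pandas as pd

# Sample rows copied from the question; a real run would pass the file path instead.
sample = io.StringIO(
    "88741,42.84286022,16.41829224,1\n"
    "88797,42.78081536,16.40743455,1\n"
    "88797,42.78081536,16.21153455,1\n"
    "88823,42.51512511,16.43304948,2\n"
    "88885,42.88204193,16.12412548,2\n"
    "87227,42.88204193,16.64223948,3\n"
)

def average_by_cluster(source):
    """Hypothetical helper: one row per Cluster with the mean Lat and Long."""
    df = pd.read_csv(source, header=None,
                     names=['SchoolCode', 'Lat', 'Long', 'Cluster'])
    # Coerce to numbers; anything unparseable (e.g. a stray header row) becomes NaN.
    df[['Lat', 'Long']] = df[['Lat', 'Long']].apply(pd.to_numeric, errors='coerce')
    # Assumption: rows whose coordinates could not be parsed should be ignored.
    df = df.dropna(subset=['Lat', 'Long'])
    means = df.groupby('Cluster')[['Lat', 'Long']].mean().reset_index()
    # Reorder so the columns match the requested Lat,Long,Cluster layout.
    return means[['Lat', 'Long', 'Cluster']]

result = average_by_cluster(sample)
# Assumption: eight decimals, matching the precision of the input values.
csv_text = result.to_csv(index=False, float_format='%.8f')
print(csv_text)
```

To write a file instead of a string, the same call can be given a path, for example result.to_csv('AverageOutput.csv', index=False, float_format='%.8f').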

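【Discussion】:

  • Sorry. I forgot the header in the file. I edited my question, if you could take a look.
  • What does print (df.columns.tolist()) show?
  • Where? I don't see it. I don't have that line.
  • You can add this line below df = pd.read_csv('SortedCluster.csv')
  • It prints this: ['88741', '42.84286022', '16.41829224', '1']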
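
The printed list in the last comment is what pandas produces when read_csv is called without names on a file that has no header line: the first data row is consumed as the column labels. The sketch below reproduces that with two rows from the question held in memory (the variable names raw, implicit, and explicit are illustrative only) and contrasts it with passing names, which pandas treats the same as header=None.

```python
import io
import pandas as pd

# Two data rows from the question, with no header line.
raw = (
    "88741,42.84286022,16.41829224,1\n"
    "88797,42.78081536,16.40743455,1\n"
)

# Default behaviour: the first line is taken as the header, so one data row is lost.
implicit = pd.read_csv(io.StringIO(raw))
print(implicit.columns.tolist())

# Explicit names with header=None keep every line as data.
explicit = pd.read_csv(io.StringIO(raw), header=None,
                       names=['SchoolCode', 'Lat', 'Long', 'Cluster'])
print(explicit.columns.tolist())
print(explicit.dtypes)
```

Conversely, if the file did start with a text header line while names was also passed, that line would be read as a data row and the Lat and Long columns would be loaded as strings, which is one way to end up with "No numeric types to aggregate".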