【Posted】: 2018-11-14 19:54:29
【Question】:
I have a very large CSV file that I managed to sort by the ID column, but I can't work out how to average the column values for each ID.
88741,42.84286022,16.41829224,1
88797,42.78081536,16.40743455,1
88797,42.78081536,16.21153455,1
88823,42.51512511,16.43304948,2
88885,42.88204193,16.12412548,2
87227,42.88204193,16.64223948,3
and so on...
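I need a new CSV without the SchoolCode column, with Lat and Long averaged for each cluster. Also, the numbers should be the same. I tried pandas, but it throws the error below.
The output should look like this: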
Lat,Long,Cluster
<average_lat_forCluster1>,<average_long_forCluster1>,1
<average_lat_forCluster2>,<average_long_forCluster2>,2
<average_lat_forCluster3>,<average_long_forCluster3>,3
and so on...
My code:
import pandas as pd
df = pd.read_csv('SortedCluster.csv', names=[
    'SchoolCode', 'Lat', 'Long', 'Cluster'])
df2 = df.groupby('Cluster')['Lat','Long'].mean()
df2.to_csv('AverageOutput.csv')
Error:
Traceback (most recent call last):
  File "averager.py", line 6, in <module>
    df2 = df.groupby('Cluster')['Lat','Long'].mean()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 1306, in mean
    return self._cython_agg_general('mean', **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 3974, in _cython_agg_general
    how, alt=alt, numeric_only=numeric_only, min_count=min_count)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 4046, in _cython_agg_blocks
    raise DataError('No numeric types to aggregate')
pandas.core.base.DataError: No numeric types to aggregate
【Comments】:
- Do you want df.groupby('Cluster')['Lat','Long'].mean()?
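A possible explanation, offered as a sketch rather than a confirmed diagnosis: "No numeric types to aggregate" usually means the selected columns were loaded as strings (object dtype). One common way this happens is that the file starts with a header line, which `names=[...]` then reads as an ordinary data row. The question does not show whether that is the case here, so the header line in the sample below is an assumption, and `average_by_cluster` and `SAMPLE` are hypothetical names used only for illustration. The sketch also selects the columns with a list (`[['Lat', 'Long']]`), since newer pandas versions no longer accept the `['Lat','Long']` form.

```python
import io

import pandas as pd

# Hypothetical sample: the rows from the question plus a header line.
# The header line is an assumption; the question does not show one.
SAMPLE = """SchoolCode,Lat,Long,Cluster
88741,42.84286022,16.41829224,1
88797,42.78081536,16.40743455,1
88797,42.78081536,16.21153455,1
88823,42.51512511,16.43304948,2
88885,42.88204193,16.12412548,2
87227,42.88204193,16.64223948,3
"""


def average_by_cluster(source):
    """Return mean Lat/Long per Cluster, with columns ordered Lat, Long, Cluster."""
    cols = ['SchoolCode', 'Lat', 'Long', 'Cluster']
    # Read every field as text so a stray header line cannot break parsing.
    df = pd.read_csv(source, header=None, names=cols, dtype=str)

    # Convert to numbers; unparsable values (such as header text) become NaN.
    for col in ['Lat', 'Long', 'Cluster']:
        df[col] = pd.to_numeric(df[col], errors='coerce')

    # Drop the rows that failed conversion, then restore integer cluster IDs.
    df = df.dropna(subset=['Lat', 'Long', 'Cluster']).astype({'Cluster': int})

    # A list of column names (double brackets) selects several columns.
    means = df.groupby('Cluster')[['Lat', 'Long']].mean().reset_index()
    return means[['Lat', 'Long', 'Cluster']]


result = average_by_cluster(io.StringIO(SAMPLE))
# index=False keeps the row index out of the CSV text.
print(result.to_csv(index=False))
```

To run this against the real file, pass `'SortedCluster.csv'` instead of the `io.StringIO` object and write the result with `result.to_csv('AverageOutput.csv', index=False)`. Checking `df.dtypes` right after `read_csv` is a quick way to see whether the columns really were loaded as strings.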
Tags: python-3.x pandas csv