dropna() trouble with labels: answers

【Question title】: dropna() trouble with labels
【Posted】: 2013-09-26 11:01:58
【Question description】:

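I am trying to average a set of data in pandas. The data comes from a csv file. I have a Series called "track". At an earlier stage I used the dropna() method to remove some blank rows that were imported when the csv file was read.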

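The approach I am using is to average a column over 5 rows. I can't use the rolling_mean method because I want to take the average over the two rows before the current value, the current value, and the two rows after the current value.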

I run into problems when I reach the part of the data where the labels were removed along with the NaN rows.

def get_data(filename):
    '''function to read the data from the input csv file to use in the analysis'''
    with open(filename, 'r') as f:
        reader = pd.read_csv(f, sep=',', usecols=('candidate',' final track' ,' status'))                      
    print reader[0:20]            
    reader=reader.dropna()
    print reader[0:20]
    return reader 

def relative_track(nb):

    length= len(reader) 
    track=current_tracks.loc[:,' final track']
    for el in range(2, length):
        means=pd.stats.moments.rolling_mean(track, 5)
        print means

This gives the following output (note that labels 15 and 16 are missing from the second printout):

                candidate   final track  status
0                       1           719       *
1                       2           705       *
2                       3           705       *
3                       4           706       *
4                       5           704       *
5                       1           708       *
6                       2           713       *
7                       3           720       *
8                       4           726       *
9                       5           729       *
10                      1           745       *
11                      2           743       *
12                      3           743       *
13                      4           733       *
14                      5           717       *
15                    NaN           NaN     NaN
16  *** Large track split           NaN     NaN
17                      1           714       *
18                      2           695       *
19                      3           690       *
   candidate   final track  status
0          1           719       *
1          2           705       *
2          3           705       *
3          4           706       *
4          5           704       *
5          1           708       *
6          2           713       *
7          3           720       *
8          4           726       *
9          5           729       *
10         1           745       *
11         2           743       *
12         3           743       *
13         4           733       *
14         5           717       *
17         1           714       *
18         2           695       *
19         3           690       *
20         4           671       *
21         5           657       *

But when I try to calculate the means with the second function, I get this error:

    raise KeyError("stop bound [%s] is not in the [%s]" % (key.stop,self.obj._get_axis_name(axis)))
KeyError: 'stop bound [15] is not in the [index]'

This is because index 15 does not exist. If anyone could help, that would be great.

【Question comments】:

Tags: python csv pandas average

【Solution 1】:

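"I can't use the rolling_mean method because I want to take the average over the two rows before the current value, the current value, and the two rows after the current value."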

Use the keyword argument center=True, which is described at the end of this section of the documentation.
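As a minimal sketch of what a centered window does, the example below uses the first seven values of the question's ' final track' column. It assumes a recent pandas release, where the top-level rolling_mean function is no longer available and the same calculation is written with the Series.rolling method; the variable names are only illustrative.

```python
import pandas as pd

# Sample values copied from the first rows of the question's ' final track' column.
track = pd.Series([719, 705, 705, 706, 704, 708, 713], name=" final track")

# Centered 5-row window: two rows before, the current row, and two rows after.
# The first two and last two rows come out as NaN because their window is incomplete.
centered = track.rolling(window=5, center=True).mean()
print(centered)
```

With center=True the result for row 2 is the mean of rows 0 through 4, which is the "two before, current, two after" average the question asks for.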

Also, pd.stats.moments.rolling_mean can be accessed simply as pd.rolling_mean; it is a top-level function in pandas.

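P.S. I think I understand what you are trying to do here, but your code may have some problems that are unrelated to your question. (For example, the counter variable el in the final for loop is never used, so the loop appears to just repeat the same computation.) But perhaps the center keyword will let you drop most of your existing work anyway.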
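On the missing labels themselves: dropna() keeps the original index labels of the surviving rows, which is why 15 and 16 disappear from the second printout in the question. The sketch below, using a small made-up frame shaped like the question's data, shows two common ways to work with such gaps. This is an illustration under those assumptions, not a diagnosis of the exact line that raised the KeyError.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one blank row, similar to the csv in the question.
df = pd.DataFrame({
    "candidate": [4, 5, np.nan, 1, 2],
    " final track": [733, 717, np.nan, 714, 695],
})

cleaned = df.dropna()
print(list(cleaned.index))  # [0, 1, 3, 4] - label 2 is gone, the rest are unchanged

# Option 1: renumber the rows 0..n-1 and discard the old labels.
renumbered = cleaned.reset_index(drop=True)

# Option 2: keep the labels but select rows by position with .iloc.
third_row = cleaned.iloc[2]
```

Either option avoids asking for a label that no longer exists. A rolling window is computed over row positions, so it is not affected by the gaps in the labels.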

【Comments】: