KeyError when assigning values with a for loop in Pandas: answers

【Question title】: KeyError when Assigning value using For Loop by Pandas
【Posted】: 2017-09-04 05:35:58
【Question】:

I have a long series of data in which the meaningful values are sandwiched between 0 values, like this:

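The lengths of both the zero runs and the meaningful runs vary. I want to extract each meaningful run into its own row of a DataFrame. For example, the data above could be extracted like this: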

1
2   3   1
1
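For reference, the target transformation can be stated in plain Python. This is a minimal sketch, assuming the data is a flat list of integers like the sample the answers below reconstruct; it uses itertools.groupby to split the list into consecutive zero and non-zero chunks and keeps only the non-zero ones. The name split_runs is hypothetical.

```python
from itertools import groupby


def split_runs(values):
    """Hypothetical helper: return the runs of non-zero values, in order, as lists."""
    # groupby yields consecutive chunks sharing the same key (True = non-zero)
    return [list(chunk)
            for is_nonzero, chunk in groupby(values, key=lambda v: v != 0)
            if is_nonzero]


sample = [0, 0, 1, 0, 0, 2, 3, 1, 0, 0, 0, 0, 1, 0]  # assumed sample data
print(split_runs(sample))  # [[1], [2, 3, 1], [1]]
```

Each inner list corresponds to one row of the desired DataFrame.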

I "sliced" out the meaningful data with this code:

import pandas as pd
import numpy as np

raw = pd.read_csv('data.csv')

df = pd.DataFrame(index=np.arange(0, 10000),columns = ['DT01', 'DT02', 'DT03', 'DT04', 'DT05', 'DT06', 'DT07', 'DT08', 'DT02', 'DT09', 'DT10', 'DT11', 'DT12', 'DT13', 'DT14', 'DT15', 'DT16', 'DT17', 'DT18', 'DT19', 'DT20',])
a = 0
b = 0
n=0

for n in range(0,999999):
    if raw.iloc[n].values > 0:
        df.iloc[a,b] = raw.iloc[n].values
        a=a+1
        if raw [n+1] == 0:
            b=b+1
            a=0

But I keep getting KeyError: n, where n is the row right after the first row whose value is not 0.

What's wrong with my code? And is there any way to improve it in terms of speed and memory cost? Thanks a lot.
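A minimal reproduction of the error, assuming raw has a single column (the sample data below is assumed). Indexing a DataFrame with square brackets, as in raw[n+1], looks up a column label n+1, not a row, so it raises KeyError as soon as the inner if is reached. Reading by position with .iloc avoids that. The sketch also guards n + 1 against running past the last row and puts each run in its own row (the original loop advances a per value and b per run, which fills columns instead). It is still a Python-level loop, so on long data it will generally be much slower than the groupby-based answers below.

```python
import pandas as pd

raw = pd.DataFrame({'0': [0, 0, 1, 0, 0, 2, 3, 1, 0, 0, 0, 0, 1, 0]})  # assumed sample

# raw[3] selects a *column* labelled 3, which does not exist
try:
    raw[3]
except KeyError as exc:
    print('KeyError:', exc)

# Corrected loop: read values by position and collect each run as one row
values = raw.iloc[:, 0]
rows, current = [], []
for n in range(len(values)):
    v = values.iloc[n]
    if v != 0:
        current.append(v)
        # the run ends if this is the last row or the next value is zero
        if n + 1 == len(values) or values.iloc[n + 1] == 0:
            rows.append(current)
            current = []

df = pd.DataFrame(rows)  # shorter runs are padded with NaN
print(df)
```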

【Question comments】:

Tags: python python-3.x pandas dataframe

【Answer 1】:

You can use:

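import pandas as pd
df = pd.DataFrame({'col': [0,0,1,0,0,2,3,1,0,0,0,0,1,0]})

# each 0 bumps the counter, so a run of non-zero values shares one group id
df['Group'] = df['col'].eq(0).cumsum()
df = df.loc[df['col'] != 0]  # drop the separator zeros

df = df.groupby('Group')['col'].apply(list)
print (df)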

Group
2          [1]
4    [2, 3, 1]
8          [1]
Name: col, dtype: object

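Or, to get a DataFrame directly, use this in place of the groupby line above (it has to run on the filtered frame, before df is overwritten with the Series of lists):

df = pd.DataFrame(df.groupby('Group')['col'].apply(list).values.tolist())
print (df)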
   0    1    2
0  1  NaN  NaN
1  2  3.0  1.0
2  1  NaN  NaN
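The same steps can be wrapped in a function that takes the column name as a parameter, so the code works whether the column is called 'col', 0 or '0'. This is a sketch; runs_to_frame is a hypothetical name, and it groups the filtered values by the zero-counter directly instead of adding a Group column.

```python
import pandas as pd


def runs_to_frame(df, column):
    """Hypothetical helper: one output row per run of non-zero values in df[column]."""
    is_zero = df[column].eq(0)
    # every zero bumps the counter, so a run shares the id of the zero before it
    group = is_zero.cumsum()
    runs = df.loc[~is_zero, column].groupby(group[~is_zero]).apply(list)
    # shorter runs are padded with NaN
    return pd.DataFrame(runs.tolist())


data = pd.DataFrame({'col': [0, 0, 1, 0, 0, 2, 3, 1, 0, 0, 0, 0, 1, 0]})
print(runs_to_frame(data, 'col'))
```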

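【Comments】:

I'm trying all the solutions and will let you know if I have any problems. Thanks a lot!
With the first code I get KeyError 'col', and with the second I get an error about Group. What am I missing?
My input DataFrame uses col as the column name: df = pd.DataFrame({'col' : [0,0,1,0,0,2,3,1,0,0,0,0,1,0]}). So you only need to change that.
If the column name is 0, change df['col'] to df[0] or df['0'], depending on whether 0 is an int or a string.
Oh right, silly me. Now it works like a charm, thanks!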

【Answer 2】:

Let's try this, which outputs a DataFrame:

df.groupby(df[0].eq(0).cumsum().mask(df[0].eq(0)),as_index=False)[0]\
  .apply(lambda x: x.reset_index(drop=True)).unstack(1)

Output:

     0    1    2
0  1.0  NaN  NaN
1  2.0  3.0  1.0
2  1.0  NaN  NaN

Or as a string:

df.groupby(df[0].eq(0).cumsum().mask(df[0].eq(0)),as_index=False)[0]\
  .apply(lambda x: ' '.join(x.astype(str)))

Output:

0        1
1    2 3 1
2        1
dtype: object

Or as a list:

df.groupby(df[0].eq(0).cumsum().mask(df[0].eq(0)),as_index=False)[0]\
  .apply(list)

Output:

0          [1]
1    [2, 3, 1]
2          [1]
dtype: object
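The grouping key here is the same zero-counter as in Answer 1, with one extra step: .mask(df[0].eq(0)) replaces the key with NaN on the zero rows, and groupby drops NaN keys by default, so the separators never need a separate filter. A small sketch that prints the key for the sample data, assuming the column is labelled with the integer 0:

```python
import pandas as pd

df = pd.DataFrame({0: [0, 0, 1, 0, 0, 2, 3, 1, 0, 0, 0, 0, 1, 0]})  # assumed sample

is_zero = df[0].eq(0)
key = is_zero.cumsum().mask(is_zero)  # NaN on the zero rows, run id elsewhere
print(key.tolist())

# rows whose key is NaN are dropped by groupby (dropna=True is the default)
print(df[0].groupby(key).apply(list).tolist())  # [[1], [2, 3, 1], [1]]
```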

【Comments】:

I'm trying all the solutions and will let you know if I have any problems. Thanks a lot!

【Answer 3】:

Try this; I've broken it down into steps:

df.LIST=df.LIST.replace({0:np.nan})
df['Group']=df.LIST.isnull().cumsum()
df=df.dropna()
df.groupby('Group').LIST.apply(list)
Out[384]: 
Group
2              [1]
4        [2, 3, 1]
8              [1]
Name: LIST, dtype: object

Input data:

df = pd.DataFrame({'LIST' : [0,0,1,0,0,2,3,1,0,0,0,0,1,0]})

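【Comments】:

I'm trying all the solutions and will let you know if I have any problems. Thanks a lot!

【Answer 4】:

Let's start by packing the raw data into a pandas DataFrame (in real life you would probably use pd.read_csv() to produce this DataFrame):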

raw = pd.DataFrame({'0' : [0,0,1,0,0,2,3,1,0,0,0,0,1,0]})
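On the column name: if data.csv has no header line, pd.read_csv uses the first line as the header by default, which could explain a column named '0' (with the first data value consumed as that name). A sketch using an in-memory stand-in for the file; its contents are assumed, not taken from the question:

```python
import io

import pandas as pd

csv_text = "0\n0\n1\n0\n0\n2\n3\n1\n0\n0\n0\n0\n1\n0\n"  # assumed contents, no header line

# Default header=0: the first line becomes the column name '0' (a string)
with_header = pd.read_csv(io.StringIO(csv_text))
print(with_header.columns.tolist(), len(with_header))  # ['0'] 13

# header=None keeps every line as data and labels the column with the integer 0
no_header = pd.read_csv(io.StringIO(csv_text), header=None)
print(no_header.columns.tolist(), len(no_header))  # [0] 14
```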

The default index will help you locate the zero spans:

s1 = raw.reset_index()
s1['index'] = np.where(s1['0'] != 0, np.nan, s1['index'])
s1['index'] = s1['index'].ffill().fillna(0).astype(int)
s1[s1['0'] != 0].groupby('index')['0'].apply(list).tolist()
#[[1], [2, 3, 1], [1]]
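To see what the forward-fill step produces: each non-zero row ends up labelled with the index of the last zero before it, and that label serves as the group id. A sketch printing those labels for the sample data; it uses .ffill(), the non-deprecated equivalent of fillna(method='ffill'):

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({'0': [0, 0, 1, 0, 0, 2, 3, 1, 0, 0, 0, 0, 1, 0]})

s1 = raw.reset_index()
# keep the row number on zero rows, blank it (NaN) on non-zero rows
s1['index'] = np.where(s1['0'] != 0, np.nan, s1['index'])
# carry the last zero's row number onto the run after it;
# fillna(0) covers a run at the very start that has no zero before it
s1['index'] = s1['index'].ffill().fillna(0).astype(int)
print(s1.loc[s1['0'] != 0, 'index'].tolist())  # [1, 4, 4, 4, 11]
```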

【Comments】:

I'm trying all the solutions and will let you know if I have any problems. Thanks a lot!