Answers to "How to extract periodical data from a single column of the Pandas Dataframe"

【Question title】: How to extract periodical data from a single column of the Pandas Dataframe
【Posted】: 2019-05-17 16:59:07
【Question description】:

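I have a large CSV file of 161941 rows × 76 columns, from which I extracted the useful data: 161941 rows × 3 columns.

Now my dataframe looks like this:

Extracted dataframe of size 161941 rows × 3 columns

The column "bKLR_Touchauswertung" holds periodic data, as this table shows: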

"bKLR_Touchauswertung"
7
7
10
10
10
10
10
7
7
0
0
0
0
0
0
0
0
0
0
7
7
10
10
10
10
10
10
7
7
0
0
0
0
0
0
0
0
7
7
10
10
10
10
10
7
7
0
0
0
0
0
0

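It keeps repeating like this until the end.

What I want to get from this:

Each set of non-zero values in the column should be added to the dataframe as a new column.

That is, the first set of non-zero values should become a new column "set1", and so on.

It would be great if I could find any possible solution. Thanks, Abhinay

Here is a more detailed example of the initial and the expected dataframes:

This is my dataframe: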

               temp     toucha
Timestamp

185            83         7
191            83         7
197            83         10
.              .          .
.              .          .
.              .          .
2051           83         10

2057           83         0
2063           83         0
2057           83         0
.              .          .
.              .          .
.              .          .
3000           83         0

3006           83         7
3012           83         7
3018           83         10
.              .          .
.              .          .
.              .          .
6000           83         10

6006           83         0
6012           83         0
6018           83         0
.              .          .
.              .          .
.              .          .
8000           83         0

This sequence continues.

Now I need a dataframe that looks like this:

               temp     toucha   set1   set2   set3 ...
Timestamp

185            83       7        7      0
191            83       7        7      0
197            83       10       10     0
.              .        .        .      .
.              .        .        .      .
.              .        .        .      .
2051           83       10       10     0

2057           83       0        0      0
2063           83       0        0      0
2057           83       0        0      0
.              .        .        .      .
.              .        .        .      .
.              .        .        .      .
3000           83       0        0      0

3006           83       7        0      7
3012           83       7        0      7
3018           83       10       0      10
.              .        .        .      .
.              .        .        .      .
.              .        .        .      .
6000           83       10       0      10

6006           83       0        0      0
6012           83       0        0      0
6018           83       0        0      0
.              .        .        .      .
.              .        .        .      .
.              .        .        .      .
8000           83       0        0      0

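【Question comments】:

Please provide a minimal example of the expected output and of the input dataframe, properly formatted.
So, for example, all the 7s go into set1 and all the 10s go into set2; is that what you mean?
My head hurts... I just can't decipher your question. Can you explain it further? Try showing your desired output.
Welcome to StackOverflow! So far this question is not the worst I have seen from a new user, but this site has rules that you have not followed. You should really read How to Ask to learn how to write a question that is likely to get answers. Specifically, you have not shown your current research, you have posted a link to an image when you could have put the text directly in the question, and you have not described the expected result. You should try to improve this question (after reading How to Ask...)
I wonder why my attachment isn't visible here. I also tried editing the post, and there I can see both the question body and the attachment.

Tags: python pandas dataframe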

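【Solution 1】:

If you can accept that the numbering of the setxx columns is not necessarily consecutive, you can use shift to detect the changes between 0 and non-0 values, and then np.split to split the dataframe index at those changes.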
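To make the two steps concrete, here is a minimal sketch on a small toy column (the values are made up for illustration, not taken from the question's file). It uses the same transition test as the code in this answer and shows what np.split returns:

```python
import numpy as np
import pandas as pd

# Toy column (hypothetical data): two non-zero runs separated by zeros.
toucha = pd.Series([7, 10, 0, 0, 7, 10, 10, 0])

# Compare each row with the previous one. The first row is compared with NaN,
# which counts as "not 0" in the first test and "not equal to 0" in the second.
prev = toucha.shift()
transition = ((toucha == 0) & (prev != 0)) | ((toucha != 0) & (prev == 0))

# Positions where the column switches between zero and non-zero.
cut_points = toucha.index[transition].to_numpy()

# np.split cuts the index array at those positions, giving one array per run.
runs = np.split(toucha.index.to_numpy(), cut_points)

for run in runs:
    print(run, "->", toucha[run].tolist())
```

Zero runs and non-zero runs alternate in the result, which is why only every other piece becomes a set column and the numbering has gaps.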

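Once that is done, it is easy to add, for each sequence, a new column filled with 0 and copy the original values into it. But because of np.split, it is easier to work with a simple consecutive index. So the code could be: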

import numpy as np

# use a simple consecutive index
df.reset_index(inplace=True)

# split the indices at each transition between zero and non-zero values
subs = np.split(df.index.values,
                df[((df.toucha == 0) & (df.toucha.shift() != 0))
                   | ((df.toucha != 0) & (df.toucha.shift() == 0))
                   ].index.values)

# process those sequences
for i, a in enumerate(subs):
    # ignore empty or 0 value sequences
    if len(a) == 0: continue
    if df.toucha[a[0]] == 0: continue
    df['set'+str(i)] = 0    # initialize a new column with 0
    df.loc[a, 'set'+str(i)] = df.toucha.loc[a]  # and copy values

# set the index back
df.set_index('Timestamp', inplace=True)

With the following sample data

           temp  toucha
Timestamp              
185          83       7
191          83       7
197          83      10
2051         83      10
2057         83       0
2063         83       0
2057         83       0
3000         83       0
3006         83       7
3012         83       7
3018         83      10
6000         83      10
6006         83       0
6012         83       0
6018         83       0
8000         83       0

it gives:

           temp  toucha  set0  set2
Timestamp                          
185          83       7     7     0
191          83       7     7     0
197          83      10    10     0
2051         83      10    10     0
2057         83       0     0     0
2063         83       0     0     0
2057         83       0     0     0
3000         83       0     0     0
3006         83       7     0     7
3012         83       7     0     7
3018         83      10     0    10
6000         83      10     0    10
6006         83       0     0     0
6012         83       0     0     0
6018         83       0     0     0
8000         83       0     0     0
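
If consecutive names (set1, set2, ...) are wanted, as in the question, one possible variant is to number the non-zero runs with a cumulative count of run starts. This is a sketch and not part of the original answer; it rebuilds the sample data above and works on positions, so the repeated Timestamp value 2057 does not matter:

```python
import numpy as np
import pandas as pd

# Sample frame copied from the example data above.
df = pd.DataFrame(
    {
        "temp": [83] * 16,
        "toucha": [7, 7, 10, 10, 0, 0, 0, 0, 7, 7, 10, 10, 0, 0, 0, 0],
    },
    index=pd.Index(
        [185, 191, 197, 2051, 2057, 2063, 2057, 3000,
         3006, 3012, 3018, 6000, 6006, 6012, 6018, 8000],
        name="Timestamp",
    ),
)

values = df["toucha"].to_numpy()
nonzero = values != 0

# True where a non-zero row follows a zero row (or is the very first row).
prev_nonzero = np.concatenate(([False], nonzero[:-1]))
run_start = nonzero & ~prev_nonzero

# Cumulative count of run starts: 1 from the first run on, 2 from the second, ...
run_id = np.cumsum(run_start)

for k in range(1, int(run_id.max()) + 1):
    # Original value inside run k, 0 everywhere else.
    df[f"set{k}"] = np.where(nonzero & (run_id == k), values, 0)

print(df)
```

On the sample data this adds the columns set1 and set2. On the full file, it adds one column per non-zero run, so the frame can get wide if there are many runs.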

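【Comments】:

Thank you very much, I really appreciate your help. I almost have the final solution I wanted.
@Abhinayb: I was able to read your now-deleted edit in the answer and have updated my answer with it.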

【Solution 2】:

# use a simple consecutive index
df.reset_index(inplace=True)

# split the indices at each transition between zero and non-zero values
subs = np.split(df.index.values,
                df[((df.toucha == 0) & (df.toucha.shift() != 0))
                   | ((df.toucha != 0) & (df.toucha.shift() == 0))
                   ].index.values)

# process those sequences
for i, a in enumerate(subs):
    # ignore empty or 0 value sequences
    if len(a) == 0: continue
    if df.toucha[a[0]] == 0: continue
    df['set'+str(i)] = 0    # initialize a new column with 0
    df.loc[a, 'set'+str(i)] = df.toucha.loc[a]  # and copy values

# set the index back
df.set_index('Timestamp', inplace=True)

【Comments】: