How do I create a new column of incremented values from a column containing a range of values? (Answers)

【Question title】: How do I create a new column of incremented values from a column containing a range of values?
【Posted】: 2018-02-28 15:01:26
【Question description】:

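I'm very new to Pandas/Python, so I apologize if this is straightforward. I'm working on a project for work and need some help.

I have some data describing the positions of blood samples in storage boxes. Currently, a column named "Position" holds the range of positions occupied by a set of samples in the form "1_5", which means that those samples occupy positions 1, 2, 3, 4 and 5 in the storage box.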

[The data frame is shown here][1]

[1]: https://i.stack.imgur.com/DMhZm.jpg

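What I want is for every sample within the given range to have its own individual position number. So instead of the data looking like this:

Patient - Box - Position

Patient 1 - Box 1 - 97_100

Patient 1 - Box 2 - 30_32

I would like it to look like this:

Patient - Box - Position

Patient 1 - Box 1 - 97

Patient 1 - Box 1 - 98

Patient 1 - Box 1 - 99

Patient 1 - Box 1 - 100

Patient 1 - Box 2 - 30

Patient 1 - Box 2 - 31

Patient 1 - Box 2 - 32

Does anyone know a way to solve this?

Thanks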

【Question comments】:

Tags: python pandas dataframe jupyter-notebook

【Solution 1】:

Use:

df['Position'] = (df.groupby('Position').cumcount() + 
                 df['Position'].str.split('_').str[0].astype(int))
print (df)
     Patient    Box  Position
0  patient 1  box 1        97
1  patient 1  box 1        98
2  patient 1  box 1        99
3  patient 1  box 1       100
4  patient 1  box 2        30
5  patient 1  box 2        31
6  patient 1  box 2        32
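
The one-liner above only works if the frame already holds one row per sample, with the range string repeated on each row, as the seven-row output suggests. A minimal, self-contained sketch under that assumption follows. The sample frame is made up for illustration, and grouping by Patient and Box in addition to Position is an extra precaution that is not part of the original answer:

```python
import pandas as pd

# Hypothetical sample data: one row per sample, each repeating its range string.
df = pd.DataFrame({
    "Patient": ["patient 1"] * 7,
    "Box": ["box 1"] * 4 + ["box 2"] * 3,
    "Position": ["97_100"] * 4 + ["30_32"] * 3,
})

# Start of each range: the text before the underscore, as an integer.
start = df["Position"].str.split("_").str[0].astype(int)

# 0-based running counter within each group of rows that share a range.
# Grouping by Patient and Box as well is an assumption added here, so that two
# boxes that happen to hold the same range string are not counted as one group.
offset = df.groupby(["Patient", "Box", "Position"]).cumcount()

# Start plus offset gives each sample its own position number.
df["Position"] = start + offset
print(df)
```

With grouping on Position alone, as in the answer, two different boxes that both contain "1_5" would share one counter and the second box would continue at 6 rather than restart at 1.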

Details:

Get a 0-based running count of the rows within each group with GroupBy.cumcount:

print (df.groupby('Position').cumcount())
0    0
1    1
2    2
3    3
4    0
5    1
6    2
dtype: int64

Then add the first value extracted from the Position column (the part before the _), converted to integer:

print (df['Position'].str.split('_').str[0].astype(int))
0    97
1    97
2    97
3    97
4    30
5    30
6    30
Name: Position, dtype: int32
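
If the data instead has a single row per range, as in the question's text example, the cumcount approach has nothing to count. One possible alternative, which is not part of the answer above, is to build the list of positions for each row and expand it with DataFrame.explode (available from pandas 0.25). The frame and the names `ranges`, `bounds` and `expanded` are hypothetical:

```python
import pandas as pd

# Hypothetical input: one row per range, as in the question's text example.
ranges = pd.DataFrame({
    "Patient": ["patient 1", "patient 1"],
    "Box": ["box 1", "box 2"],
    "Position": ["97_100", "30_32"],
})

# Split "97_100" into two integer columns: 0 = start, 1 = end.
bounds = ranges["Position"].str.split("_", expand=True).astype(int)

# Replace each range string with the inclusive list of positions it covers.
ranges["Position"] = [
    list(range(lo, hi + 1)) for lo, hi in zip(bounds[0], bounds[1])
]

# explode turns each list element into its own row, repeating the other columns.
expanded = ranges.explode("Position").reset_index(drop=True)
expanded["Position"] = expanded["Position"].astype(int)
print(expanded)
```

This sketch assumes every Position value has the form start_end with start no greater than end; rows with a single number or missing values would need separate handling.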

【Comments】: