How to create a column that measures the number of items that exist in another string column? (Answers)

【Question title】: How to create a column that measures the number of items that exist in another string column?
【Posted】: 2022-01-30 18:51:21
【Question】:

I have a data frame that contains employees and their levels.

import pandas as pd
d = {'employees': ["John", "Jamie", "Ann", "Jane", "Kim", "Steve"],  'Level': ["A/Ba", "C/A", "A", "C", "Ba/C", "D"]}
df = pd.DataFrame(data=d)

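How can I add a new column that counts the number of other employees who share a level with each employee? For example, John (A/Ba) would get 3, because two other employees have level A (Jamie and Ann) and one other has level Ba (Kim). Note that John himself is not included in the count for his own levels.

The final data frame I am aiming for looks like this.

【Comments on the question】:

Tags: python pandas dataframe group-by pandas-groupby

【Solution 1】:

Try this: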

df['Number of levels'] = df['Level'].str.split('/').explode().map(df['Level'].str.split('/').explode().value_counts()).sub(1).groupby(level=0).sum()

Output:

>>> df
  employees Level  Number of levels
0      John  A/Ba                 3
1     Jamie   C/A                 4
2       Ann     A                 2
3      Jane     C                 2
4       Kim  Ba/C                 3
5     Steve     D                 0
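The one-liner above computes the exploded series twice. As a rough sketch only (the intermediate names `exploded` and `level_counts` are introduced here for illustration and are not part of the original answer), the same steps can be spelled out so that each piece can be inspected on its own:

```python
import pandas as pd

d = {'employees': ["John", "Jamie", "Ann", "Jane", "Kim", "Steve"],
     'Level': ["A/Ba", "C/A", "A", "C", "Ba/C", "D"]}
df = pd.DataFrame(data=d)

# One row per (employee, level) pair; the original row index is repeated.
exploded = df['Level'].str.split('/').explode()

# How many times each individual level occurs across all employees.
level_counts = exploded.value_counts()

# Look up the total for each exploded level, subtract the employee's own
# occurrence, then add the pieces back up per original row.
df['Number of levels'] = (
    exploded.map(level_counts)
            .sub(1)
            .groupby(level=0)
            .sum()
)
print(df)
```

This should produce the same column as the one-liner, assuming the data frame has a unique index and no employee lists the same level twice.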

【Comments】:

【Solution 2】:

exploded = df.Level.str.split("/").explode()
counts = exploded.groupby(exploded).transform("count").sub(1)
df["Num Levels"] = counts.groupby(level=0).sum()

We first split the "Level" column on "/" and explode it, so that each individual level gets its own row:

>>> exploded = df.Level.str.split("/").explode()
>>> exploded

0     A
0    Ba
1     C
1     A
2     A
3     C
4    Ba
4     C
5     D
Name: Level, dtype: object

We now need the count of each element in this series, so we group the series by itself and transform with "count":

>>> exploded.groupby(exploded).transform("count")
0    3
0    2
1    3
1    3
2    3
3    3
4    2
4    3
5    1
Name: Level, dtype: int64

Because this count includes the element itself, while you only want the others, we subtract 1 to get counts:

>>> counts = exploded.groupby(exploded).transform("count").sub(1)
>>> counts
0    2
0    1
1    2
1    2
2    2
3    2
4    1
4    2
5    0
Name: Level, dtype: int64

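Now we need to "go back" to the original rows, and the index is our helper: we group by it (that is what level=0 means) and sum the counts within each group: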

>>> counts.groupby(level=0).sum()
0    3
1    4
2    2
3    2
4    3
5    0
Name: Level, dtype: int64

This is the final result, which is assigned to df["Num Levels"].

This gives

  employees Level  Num Levels
0      John  A/Ba           3
1     Jamie   C/A           4
2       Ann     A           2
3      Jane     C           2
4       Kim  Ba/C           3
5     Steve     D           0
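As a sanity check on the numbers above, the same counts can be reproduced without pandas. This is only an illustrative cross-check, not part of the original answer; the names `totals` and `expected` are made up here.

```python
from collections import Counter

employees = ["John", "Jamie", "Ann", "Jane", "Kim", "Steve"]
levels = ["A/Ba", "C/A", "A", "C", "Ba/C", "D"]

# Count every individual level across all employees.
totals = Counter(part for level in levels for part in level.split("/"))

# For each employee, sum (total - 1) over that employee's own levels,
# which removes the employee from their own count.
expected = [sum(totals[part] - 1 for part in level.split("/"))
            for level in levels]

print(dict(zip(employees, expected)))
# {'John': 3, 'Jamie': 4, 'Ann': 2, 'Jane': 2, 'Kim': 3, 'Steve': 0}
```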

All of this can be written as a "one-liner", though that may hurt readability and make later debugging harder:

df["Num Levels"] = (df.Level
                      .str.split("/")
                      .explode()
                      .pipe(lambda ex: ex.groupby(ex))
                      .transform("count")
                      .sub(1)
                      .groupby(level=0)
                      .sum())
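If the chain is needed in more than one place, one option is to wrap the named steps in a small helper instead. The sketch below is an assumption-laden illustration: the function name `count_shared_levels` and its `sep` parameter are hypothetical, and it assumes a unique index and that no row repeats the same level.

```python
import pandas as pd


def count_shared_levels(levels: pd.Series, sep: str = "/") -> pd.Series:
    """For each row, count the other rows' entries that share one of its levels.

    Hypothetical helper; assumes a unique index and no repeated level
    within a single row.
    """
    # One entry per individual level, keeping the original index.
    exploded = levels.str.split(sep).explode()
    # Size of each level's group, minus the entry itself.
    counts = exploded.groupby(exploded).transform("count").sub(1)
    # Collapse back to one value per original row.
    return counts.groupby(level=0).sum()


df = pd.DataFrame({
    "employees": ["John", "Jamie", "Ann", "Jane", "Kim", "Steve"],
    "Level": ["A/Ba", "C/A", "A", "C", "Ba/C", "D"],
})
df["Num Levels"] = count_shared_levels(df["Level"])
print(df)
```

Keeping the intermediate variables inside a function preserves the step-by-step readability while still giving a single call at the use site.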

【Comments】: