Using groupby for user id and combining strings

【Question title】: Using groupby for user id and combining strings
【Posted】: 2019-08-03 07:34:03
【Question description】:

I'm having trouble preprocessing my data. My data looks like:

I want to group by a field called Account Number, which identifies a user, and create a new field that is the concatenation of all Customer Event Type values for each Account Number.

I tried:

df_by_accnum = df.groupby('Account Number')[['Customer Event Type']].agg(','.join).reset_index()

But this ends up concatenating all the column names instead of the values (https://i.imgur.com/VR5JjC3.png)

Can anyone help me solve this? Thanks.

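【Comments】:

I just tried this and it doesn't work. I get: KeyError: "Columns not found: 'Customer', 'Event', 'Type'"
My bad, I thought you had three fields! The bottom line is that you need to provide a minimal reproducible example. Can you at least include a sample of the contents of df?
I have 31 variables, of which only 2 matter: Account Number and Customer Event Type, both strings.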

Tags: python pandas dataframe group-by preprocessor

【Solution 1】:

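Through trial and error, I found that this is probably caused by the type of the Customer Event Type column. My guess is that it contains non-string values, which join cannot handle, and that for some reason the column names get used instead.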

Try creating a new column that converts Customer Event Type to strings, and use that column:

>>> import pandas as pd
>>> d = {'Account Number': [1, 2, 3, 1], 'Customer Event Type': [1, 1, 2, 2]}
>>> df = pd.DataFrame(data=d)
>>> df['Customer Event Type str'] = df['Customer Event Type'].astype(str)
>>> df.groupby('Account Number')[['Customer Event Type str']].agg(','.join).reset_index()
   Account Number Customer Event Type str
0               1                     1,2
1               2                       1
2               3                       2
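
If you would rather not add a helper column, the same conversion can probably be done inside the aggregation itself. This is a minimal sketch on the same toy data as above (the sample values are made up for illustration, not taken from the asker's real data); it selects the single column as a Series and converts each group's values to strings just before joining:

```python
import pandas as pd

# Hypothetical sample data, same shape as the toy example above
df = pd.DataFrame({
    'Account Number': [1, 2, 3, 1],
    'Customer Event Type': [1, 1, 2, 2],
})

# Select one column (a Series per group), cast its values to str,
# then join them with commas
result = (
    df.groupby('Account Number')['Customer Event Type']
      .agg(lambda s: ','.join(s.astype(str)))
      .reset_index()
)
print(result)
```

Note the single brackets around 'Customer Event Type': the lambda then receives one Series per group, so it is always the values that get joined.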

Whereas using the numeric column yields:

>>> df.groupby('Account Number')[['Customer Event Type']].agg(','.join).reset_index()
   Account Number                                Customer Event Type
0               1  Account Number,Customer Event Type,Customer Ev...
1               2  Account Number,Customer Event Type,Customer Ev...
2               3  Account Number,Customer Event Type,Customer Ev...
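
One possible explanation, which I have not confirmed against the pandas source: iterating over a DataFrame yields its column labels rather than its rows. So if the join function ends up being called on a whole group DataFrame instead of on one column's values (an assumption about how this pandas version falls back when the column-wise call fails), it would join the column names. The iteration part is easy to check on its own:

```python
import pandas as pd

# Hypothetical two-column frame, for illustration only
df = pd.DataFrame({
    'Account Number': [1, 1],
    'Customer Event Type': [1, 2],
})

# Iterating over a DataFrame yields its column labels, not its rows
labels = list(df)

# so str.join applied to the DataFrame concatenates the column names
joined = ','.join(df)
print(joined)  # Account Number,Customer Event Type
```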

Hope this helps. If you figure out the reason for this behavior, please let us know. Thanks!

【Discussion】: