【Question title】: Iterate through each category in column and add values from another column as a separate df
【Posted】: 2021-08-12 05:11:31
【Question description】:

For each category in the Customer_Acquisition_Channel column, I want to put all the values from the Days_To_Acquisition column into a separate df.

All Customer_ID values in the dataset below are unique.

DF

Customer_ID Customer_Acquisition_Channel  Days_To_Acquisition
323         Organic                       2
583         Organic                       5
838         Organic                       2
193         Website                       7
241         Website                       7
642         Website                       1

Desired output: Days_To_Acq_Organic_Df

Index Days_To_Acquisition
0     2
1     5
2     2

Days_To_Acq_Website_Df

Index Days_To_Acquisition
0     7
1     7
2     1

This is what I have tried so far, but I would like to use a for loop instead of handling each category by hand:

sub_1 = df.loc[df['Customer_Acquisition_Channel'] == 'Organic']
Days_To_Acq_Organic_Df=sub_1[['Days_To_Acquisition']]

sub_2 = df.loc[df['Customer_Acquisition_Channel'] == 'Website']
Days_To_Acq_Website_Df=sub_2[['Days_To_Acquisition']]

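【Question comments】:

  • I'm not sure what you want to do with the result, but I think pandas groupby could help, combined with agg/apply, with or without a lambda function. Even a list comprehension might get you closer to the result you want.
  • df_dict = {f'Days_To_Acquisition_{g}_df':k.drop(columns='Customer_Acquisition_Channel') for g,k in df.groupby('Customer_Acquisition_Channel')} ??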

Tags: python pandas


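【Solution 1】:

You can iterate over the unique values of the channel column, build a new dataframe for each one, rename its columns, and append it to a list: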

dataframes = []
for channel in df.Customer_Acquisition_Channel.unique():
    new_df = df[df['Customer_Acquisition_Channel'] == channel][['Customer_ID','Days_To_Acquisition']]
    new_df.columns = ['Customer_ID',f'Days_To_Acquisition_{channel}_df']
    dataframes.append(new_df)

Output:

# Use a separate loop variable so the original df is not overwritten
for frame in dataframes:
    print(frame,'\n__________')

   Customer_ID  Days_To_Acquisition_Organic_df
0          323                               2
1          583                               5
2          838                               2 
__________
   Customer_ID  Days_To_Acquisition_Website_df
3          193                               7
4          241                               7
5          642                               1 
__________

Alternatively, you can store the dataframes in a dictionary, so that each one has a name and can be retrieved individually:

dataframes = {}
for channel in df.Customer_Acquisition_Channel.unique():
    new_df = df[df['Customer_Acquisition_Channel'] == channel][['Customer_ID','Days_To_Acquisition']]
    new_df.columns = ['Customer_ID',f'Days_To_Acquisition_{channel}']
    dataframes[f'Days_To_Acquisition_{channel}_df'] = new_df

Output:

print(dataframes['Days_To_Acquisition_Organic_df'])

   Customer_ID  Days_To_Acquisition_Organic
0          323                            2
1          583                            5
2          838                            2
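
As a minimal sketch of a variation on the dictionary approach, the same split can be done with groupby, keeping only the Days_To_Acquisition column and renumbering the index from 0 as in the question's desired output. The sample data is copied from the question; the name frames_by_channel and the key format are assumptions chosen for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Customer_ID": [323, 583, 838, 193, 241, 642],
    "Customer_Acquisition_Channel": ["Organic"] * 3 + ["Website"] * 3,
    "Days_To_Acquisition": [2, 5, 2, 7, 7, 1],
})

# One DataFrame per channel. Each keeps only Days_To_Acquisition and
# gets a fresh 0-based index. frames_by_channel is a hypothetical name.
frames_by_channel = {
    f"Days_To_Acq_{channel}_Df": group[["Days_To_Acquisition"]].reset_index(drop=True)
    for channel, group in df.groupby("Customer_Acquisition_Channel")
}

print(frames_by_channel["Days_To_Acq_Website_Df"])
```

Note that groupby sorts the group keys by default, so the dictionary is ordered alphabetically by channel rather than by first appearance.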

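【Comments】:

  • How can I print just one of the dfs, say the first one? When I print e.g. "Days_To_Acquisition_Organic_df" I get the following error -> error: name is not defined
  • See my revised answer.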
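
The "name is not defined" error in the comment above most likely arises because the names are dictionary keys (strings), not Python variables. The sketch below rebuilds the dictionary from the answer on the question's sample data and looks frames up by key. The names subset and first_key are hypothetical, and the column renaming from the answer is left out for brevity.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Customer_ID": [323, 583, 838, 193, 241, 642],
    "Customer_Acquisition_Channel": ["Organic"] * 3 + ["Website"] * 3,
    "Days_To_Acquisition": [2, 5, 2, 7, 7, 1],
})

dataframes = {}
for channel in df["Customer_Acquisition_Channel"].unique():
    # Rows for this channel, restricted to the two columns of interest.
    subset = df.loc[
        df["Customer_Acquisition_Channel"] == channel,
        ["Customer_ID", "Days_To_Acquisition"],
    ]
    dataframes[f"Days_To_Acquisition_{channel}_df"] = subset

# List the available names, then fetch a frame by its string key.
print(list(dataframes))
first_key = next(iter(dataframes))  # dicts keep insertion order
print(dataframes[first_key])
```

Writing Days_To_Acquisition_Organic_df without quotes and without the dictionary asks Python for a variable of that name, which was never created.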