【Question title】:Splitting a dataframe with the help of a loop (python)
【Posted】:2021-10-13 00:48:27
【Question description】:

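I need some help. I want to go through a DataFrame. Some of my users belong to two teams (separated by a comma within the row). I split them at the comma and write them to the new columns Team_1 and Team_2. If there is only one team, its name goes into Team_1.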

import numpy as np
import pandas as pd
df = pd.DataFrame({
    'name': [
        'abby',
        'bella',
        'coco',
        'deedee',
        'elliot'],
    'email': [
        'a@test.com',
        'b@test.com',
        'c@test.com',
        'd@test.com',
        'e@test.com'],
    'team(s)': [
        'alpha',
        'omega',
        'alpha,omega',
        'beta',
        'beta,omega'
    ]})

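# Split at the first comma only; newer pandas versions require n as a keyword argument.
split_cols = df['team(s)'].str.split(',', n=1, expand=True)
df_split_teams = df.join(split_cols).rename(columns={0: 'Team_1', 1: 'Team_2'})
# A single team already lands in Team_1 (Team_2 is then None), so no extra fix-up step is needed.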

I can use a function to list the individual team names that occur:

def get_team_names(df):
    team_names = set(df['team(s)'])
    split_team_names = set()
    for team in team_names:
        for name in team.split(','):
            split_team_names.add(name)
    return split_team_names

But now I want a separate DataFrame for every team, one DataFrame per team, ideally created automatically in a loop. At first I did it like this:

df_alpha = df_split_teams[(df_split_teams['Team_1'].isin(['alpha'])) | (df_split_teams['Team_2'].isin(['alpha']))]
df_beta = df_split_teams[(df_split_teams['Team_1'].isin(['beta'])) | (df_split_teams['Team_2'].isin(['beta']))]
df_omega = df_split_teams[(df_split_teams['Team_1'].isin(['omega'])) | (df_split_teams['Team_2'].isin(['omega']))]

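But new teams are added from time to time, or we have different teams. The code should work in general, including for my colleagues' data. So I cannot predefine the team names in my code.

I hope you can help. Regards, 贝基里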

【Question discussion】:

    Tags: python pandas dataframe loops


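    【Solution 1】:

    You are very close to the solution.

    You only need to loop over the result of your get_team_names() function and use globals().

    globals() returns the dictionary that holds the current module's global namespace, so assigning to a key in it creates a global variable with that name.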

    for i in get_team_names(df):
        globals()['df_' + str(i)] = df_split_teams[(df_split_teams['Team_1'].isin([i])) | (df_split_teams['Team_2'].isin([i]))]
    

    Output:

    df_alpha
    Out[29]: 
       name       email      team(s) Team_1 Team_2
    0  abby  a@test.com        alpha  alpha   None
    2  coco  c@test.com  alpha,omega  alpha  omega
    
    df_beta
    Out[30]: 
         name       email     team(s) Team_1 Team_2
    3  deedee  d@test.com        beta   beta   None
    4  elliot  e@test.com  beta,omega   beta  omega
    
    df_omega
    Out[31]: 
         name       email      team(s) Team_1 Team_2
    1   bella  b@test.com        omega  omega   None
    2    coco  c@test.com  alpha,omega  alpha  omega
    4  elliot  e@test.com   beta,omega   beta  omega
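
A possible variation on the loop above: instead of creating global variables, the per-team frames can be collected in an ordinary dictionary keyed by team name. This is only a sketch under the assumption that looking frames up by key is acceptable; the names `team_names` and `team_frames` are illustrative and not part of the original answer, and the sample data is a shortened copy of the question's frame.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['abby', 'bella', 'coco', 'deedee', 'elliot'],
    'team(s)': ['alpha', 'omega', 'alpha,omega', 'beta', 'beta,omega'],
})

# Same split as in the question: at most two teams per user.
split_cols = df['team(s)'].str.split(',', n=1, expand=True)
df_split_teams = df.join(split_cols).rename(columns={0: 'Team_1', 1: 'Team_2'})

# Collect every distinct team name (equivalent to get_team_names(df)).
team_names = {name for teams in df['team(s)'] for name in teams.split(',')}

# One filtered DataFrame per team, stored under the team name
# (team_frames is an illustrative name, not from the original answer).
team_frames = {
    name: df_split_teams[
        (df_split_teams['Team_1'] == name) | (df_split_teams['Team_2'] == name)
    ]
    for name in team_names
}

print(team_frames['omega'])
```

With a dictionary, a team is accessed as `team_frames['omega']`, and `for name, frame in team_frames.items():` iterates over all teams without knowing their names in advance.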
    

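    【Discussion】:

    • Thank you very much! But now I have another problem: in my source data the team names contain spaces, and 'df_' + str(i) does not convert well. Can I simply drop the spaces, or should I clean the data first?
    • Replace the spaces with underscores: globals()['df_' + str(i.replace(" ", "_"))] = ...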
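
The replacement suggested in the comment handles spaces only. As a rough sketch, assuming team names may also contain other characters that are not allowed in variable names (hyphens, for instance), a small helper can normalise them first. `to_variable_name` is a hypothetical helper introduced here for illustration, not something from the thread.

```python
import re

def to_variable_name(team_name):
    """Hypothetical helper: build a 'df_<team>' name usable as a Python identifier."""
    # Trim surrounding whitespace, then collapse every run of
    # non-word characters (spaces, hyphens, ...) into one underscore.
    cleaned = re.sub(r'\W+', '_', team_name.strip())
    return 'df_' + cleaned

print(to_variable_name('alpha team'))
```

The result can then serve as the key, as in `globals()[to_variable_name(i)] = ...`. If the frames are kept in a dictionary instead of global variables, the original team name, spaces included, can be used as the key without any cleaning.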