【Question title】: Convert wide dataframe to long dataframe with specific conditions and addition of new columns
【Posted】: 2021-11-26 01:49:09
【Question】:

I have a sample dataframe, shown below.

import pandas as pd
import numpy as np

NaN = np.nan
data = {'ID':['A','A','A','A','A','A','A','A','A','C','C','C','C','C','C','C','C'],
    'Week': ['Week1','Week1','Week1','Week1','Week2','Week2','Week2','Week2','Week3',
             'Week1','Week1','Week1','Week1','Week2','Week2','Week2','Week2'],
    'Risk':['High','','','','','','','','','High','','','','','','',''],
    'Testing':[NaN,'Pos',NaN,'Neg',NaN,NaN,NaN,NaN,'Pos', NaN, 
              NaN,NaN,'Negative',NaN,NaN,NaN,'Positive'],
    'Week1_adher':['Yes',NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,'No',NaN,NaN,NaN,NaN,NaN,NaN,NaN],
    'Week2_adher':['No',NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,'No',NaN,NaN,NaN,NaN,NaN,NaN,NaN],
    'Week3_adher':['No',NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,'No',NaN,NaN,NaN,NaN,NaN,NaN,NaN]}
    
df1 = pd.DataFrame(data)
df1 

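The final dataframe must have as many rows per participant as there are weeks. Once the week columns are converted to rows, each row should carry the value from its corresponding column.

In addition, for each participant and week, the number of non-NA values in the "Testing" column should be stored as the "#of test" value.

The final dataframe should look like the image below.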

【Comments】:

    Tags: python-3.x pandas dataframe data-science data-preprocessing


    【Solution 1】:

    Preprocess your dataframe by creating two new columns, then group by ID and Week, and finally aggregate the new columns:

    # Flag rows where any WeekN_adher column equals 'Yes'
    df1['SurveyAdherence'] = df1.filter(regex=r'Week\d+_adher').eq('Yes').any(axis=1)
    # Mark rows that carry a test result
    df1['#Tests'] = df1['Testing'].notna()

    # Every ID paired with every week, so missing combinations become rows
    mi = pd.MultiIndex.from_product([df1['ID'].unique(), df1['Week'].unique()],
                                    names=['ID', 'Week'])

    out = df1.groupby(['ID', 'Week']) \
             .agg({'SurveyAdherence': 'max', '#Tests': 'sum'})

    out = out.reindex(mi) \
             .fillna({'SurveyAdherence': False, '#Tests': 0}) \
             .astype({'SurveyAdherence': bool, '#Tests': int}) \
             .reset_index()
    

    Output:

    >>> out
      ID   Week  SurveyAdherence  #Tests
    0  A  Week1             True       2
    1  A  Week2            False       0
    2  A  Week3            False       1
    3  C  Week1            False       1
    4  C  Week2            False       1
    5  C  Week3            False       0
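
In the sample data the adherence columns are filled only on each participant's first row, which is a Week1 row, so a row-wise flag does not attribute `Week2_adher` or `Week3_adher` to its own week. If each week should read its own column, a possible alternative is to melt the adherence columns and merge in the test counts. This is a minimal sketch on a smaller, made-up frame with the same layout as `df1`; the names `adher`, `tests` and `long_df` are illustrative, not part of the answer above.

```python
import numpy as np
import pandas as pd

NaN = np.nan
# Smaller illustrative frame with the same layout as df1 in the question.
df = pd.DataFrame({
    'ID':      ['A', 'A', 'A', 'C', 'C'],
    'Week':    ['Week1', 'Week1', 'Week3', 'Week1', 'Week2'],
    'Testing': ['Pos', 'Neg', 'Pos', NaN, 'Positive'],
    'Week1_adher': ['Yes', NaN, NaN, 'No', NaN],
    'Week2_adher': ['Yes', NaN, NaN, 'No', NaN],
    'Week3_adher': ['No', NaN, NaN, 'No', NaN],
})

adher_cols = [c for c in df.columns if c.endswith('_adher')]

# One row per (ID, week column); keep the first non-missing answer per pair.
adher = (
    df.melt(id_vars='ID', value_vars=adher_cols,
            var_name='Week', value_name='SurveyAdherence')
      .dropna(subset=['SurveyAdherence'])
      .drop_duplicates(['ID', 'Week'])
)
# 'Week2_adher' -> 'Week2', so the label matches the Week column.
adher['Week'] = adher['Week'].str.replace('_adher', '', regex=False)
adher['SurveyAdherence'] = adher['SurveyAdherence'].eq('Yes')

# Non-missing test results per participant and week.
tests = (
    df.groupby(['ID', 'Week'])['Testing']
      .count()
      .rename('#Tests')
      .reset_index()
)

# Left merge keeps every (ID, week) that has an adherence answer;
# weeks without any test rows get a count of 0.
long_df = adher.merge(tests, on=['ID', 'Week'], how='left')
long_df['#Tests'] = long_df['#Tests'].fillna(0).astype(int)
long_df = long_df.sort_values(['ID', 'Week']).reset_index(drop=True)
```

Here the set of weeks comes from the `WeekN_adher` column names rather than from the `Week` column, so a participant gets a row for every week that has an adherence answer even when no test rows exist for it. Sorting on the `Week` string is lexicographic, which is fine for single-digit week numbers only.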
    

    【Comments】:

    • Thanks for the elegant solution. What if I also need the last row shown in the image? For ID 'C', Week 3 is not shown here.
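
On the missing ('C', 'Week3') row raised in the comment above: a minimal sketch, using made-up counts and the hypothetical names `agg`, `full` and `filled`, of how reindexing over the full ID x Week product adds a combination that never occurs in the data.

```python
import pandas as pd

# Hypothetical aggregated result in which the ('C', 'Week3') pair is absent.
agg = pd.DataFrame(
    {'#Tests': [2, 1, 1]},
    index=pd.MultiIndex.from_tuples(
        [('A', 'Week1'), ('A', 'Week3'), ('C', 'Week1')],
        names=['ID', 'Week']),
)

# Every ID paired with every week seen anywhere in the data.
full = pd.MultiIndex.from_product(
    [agg.index.get_level_values('ID').unique(),
     agg.index.get_level_values('Week').unique()],
    names=['ID', 'Week'])

# reindex adds the absent pairs; fill_value=0 gives them a count of 0.
filled = agg.reindex(full, fill_value=0).reset_index()
```

After this, `filled` has four rows, including one for ID 'C' in Week3 with a count of 0. A week that appears for no participant at all would still be absent, since the product is built only from values present in the data.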