Create a DataFrame from present Dataframe with multiple conditions: answers

【Question title】: Create a DataFrame from present Dataframe with multiple conditions
【Posted】: 2021-09-25 12:15:46
【Question description】:

I have a DataFrame like the one below.

import pandas as pd

data = {'Participant': ['A', 'B', 'B', 'B', 'B', 'C', 'C', 'D', 'D', 'D'],
        'Total test Result': [1, 4, 4, 4, 4, 2, 2, 3, 3, 3],
        'result': ['negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative'],
        'time': ['2021-06-14', '2021-06-21', '2021-06-24', '2021-06-28', '2021-07-01', '2021-07-05', '2021-07-08', '2021-06-17', '2021-06-17', '2021-06-20']}
pres_df = pd.DataFrame(data)
pres_df

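Note: in case it helps, the 'time' column is in datetime format.

I want to create a new DataFrame in which the multiple rows for each 'Participant' are merged into a single row, with a separate pair of time and result columns for each test. The desired final result is shown below.

Any help is greatly appreciated. Thanks.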

【Question discussion】:

Tags: python python-3.x pandas dataframe data-preprocessing

【Solution 1】:

You can use pd.pivot_table:

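# pres_df is the DataFrame from the question
df = pres_df.rename(columns={'time': 'date'})
# Number each participant's tests in order: Test1, Test2, ...
df = df.assign(test_res='Test' + df.groupby('Participant').cumcount().add(1).astype(str))
df1 = df.pivot_table(index=['Participant', 'Total test Result'],
                     columns=['test_res'],
                     values=['date', 'result'],
                     aggfunc='first')
# Flatten the (value, test) column pairs into names such as Test1_date
df1.columns = df1.columns.map(lambda x: f"{x[1]}_{x[0]}" if ('Test' in x[1]) else x[0])
# Reorder the columns by name (do not relabel them), then restore the index columns
df1 = df1[sorted(df1.columns)].reset_index()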

df1:

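【Discussion】:

This does not work: for participant 'A', the result of the first test should be 'negative', but here it is 'NaN'. I think it is picking values at random, but I am not sure.
@shivakumar: Sorry, the problem was in df1.columns = sorted(df1.columns)
@shivakumar: Actually, I made a mistake. I was trying to sort the columns, but I renamed them instead. That is why the NaN you saw belonged to a date column rather than a result column.

【Solution 2】:

Try: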
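
The mix-up described in the comments above can be reproduced on a tiny frame. This is only a minimal sketch: the frame `demo` and its single row are made up for illustration and are not taken from the question. Assigning a sorted list to `.columns` relabels the existing columns by position and leaves the data where it was, whereas indexing with the sorted list moves each column together with its data.

```python
import pandas as pd

# Hypothetical two-column frame, with the result column deliberately placed first
demo = pd.DataFrame({"Test1_result": ["negative"], "Test1_date": ["2021-06-14"]})

# Relabelling: the names are sorted, but the data stays in its original position,
# so the result values end up under the date label
relabelled = demo.copy()
relabelled.columns = sorted(relabelled.columns)

# Reordering: each column keeps its own data and only its position changes
reordered = demo[sorted(demo.columns)]

print(relabelled)
print(reordered)
```

In `relabelled`, the column named `Test1_date` holds the result value, which is the kind of mismatch the first comment reported.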

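# Collapse each participant to one row, keeping the results and dates as lists
x = pres_df.groupby("Participant", as_index=False).agg(
    {"Total test Result": "first", "result": list, "time": list}
)

# Expand each list of results into columns test1_Result, test2_Result, ...
a = x.pop("result").apply(
    lambda vals: pd.Series(
        vals, index=[f"test{v}_Result" for v in range(1, len(vals) + 1)]
    )
)
# Do the same for the dates: test1_date, test2_date, ...
b = x.pop("time").apply(
    lambda vals: pd.Series(
        vals, index=[f"test{v}_date" for v in range(1, len(vals) + 1)]
    )
)

# Join the pieces side by side and sort the columns by name
out = pd.concat([x, a, b], axis=1).sort_index(axis=1)
print(out)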

Prints:

  Participant  Total test Result test1_Result  test1_date test2_Result  test2_date test3_Result  test3_date test4_Result  test4_date
0           A                  1     negative  2021-06-14          NaN         NaN          NaN         NaN          NaN         NaN
1           B                  4     negative  2021-06-21     negative  2021-06-24     negative  2021-06-28     negative  2021-07-01
2           C                  2     negative  2021-07-05     negative  2021-07-08          NaN         NaN          NaN         NaN
3           D                  3     negative  2021-06-17     negative  2021-06-17     negative  2021-06-20          NaN         NaN

【Discussion】:

Works perfectly. Thank you very much! This is a neat use of pandas, pop, and Series.
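
For readers unfamiliar with the `pop` plus `apply(pd.Series)` pattern mentioned in the comment above, here is a small sketch of the mechanics on a two-row frame. The frame `grouped` and the helper `spread` are hypothetical names chosen for this illustration. `DataFrame.pop` removes a column and returns it as a Series, and applying a function that returns a Series to each element turns the lists into columns, with NaN where a list is shorter than the longest one.

```python
import pandas as pd

# Hypothetical frame that already holds one list of results per participant
grouped = pd.DataFrame({
    "Participant": ["A", "B"],
    "result": [["negative"], ["negative", "negative"]],
})

# pop removes the column from grouped and returns it as a Series of lists
results = grouped.pop("result")

def spread(values):
    # Label the list items test1_Result, test2_Result, ... by position
    labels = [f"test{i}_Result" for i in range(1, len(values) + 1)]
    return pd.Series(values, index=labels)

# Each returned Series becomes one row; shorter lists are padded with NaN
wide = results.apply(spread)

out = pd.concat([grouped, wide], axis=1)
print(out)
```

Participant A has only one test here, so its `test2_Result` cell is NaN, matching the padding seen in the printed table of Solution 2.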