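【Question title】: Comparing One Column against Multiple
【Posted】: 2021-03-09 13:04:43
【Question】:

This is a bit complicated to explain (see the sample table below for reference).

I have a dataframe with a "Date Received" column (datetime).

I want to compare "Date Received" against the dates in the stage columns to see whether each file was on time or late. The problem is that each document corresponds to a different stage: for example, File 26 might have a Stage 4 date while File 28 might be at Stage 1.

How can I get Python to look up the correct stage column and then compare it with the received date?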

Filename Date Received  Stage 1 Expected  Stage 2 Expected  Stage 3 Expected  Stage 4 Expected
File 1   01/01/2021     15/12/2020        NaN               NaN               NaN
File 2   01/01/2021     NaN               05/01/2021        NaN               NaN

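【Question discussion】:

  • Can two stages both have data for a given file?
  • No, each file corresponds to only one stage.
  • Do you need to know which stage it belongs to when comparing?
  • Ideally, that would be useful.

Tags: python pandas dataframe multiple-columns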


【Solution 1】:

Comparing the columns is easier if you melt your dataframe first.

import pandas as pd

# Reshape to long format: one row per (file, stage) pair.
df1 = pd.melt(df, id_vars=['Filename', 'Date_Received'], var_name='Expected', value_name='Date')

# The sample dates are day-first (DD/MM/YYYY), so parse them with an explicit format.
df1[['Date_Received', 'Date']] = df1[['Date_Received', 'Date']].apply(pd.to_datetime, format='%d/%m/%Y')

print(df1)

  Filename Date_Received          Expected       Date
0   File_1    2021-01-01  Stage_1_Expected 2020-12-15
1   File_2    2021-01-01  Stage_1_Expected        NaT
2   File_1    2021-01-01  Stage_2_Expected        NaT
3   File_2    2021-01-01  Stage_2_Expected 2021-01-05
4   File_1    2021-01-01  Stage_3_Expected        NaT
5   File_2    2021-01-01  Stage_3_Expected        NaT
6   File_1    2021-01-01  Stage_4_Expected        NaT
7   File_2    2021-01-01  Stage_4_Expected        NaT

# Rows with no expected date for that stage.
df1.loc[df1['Date'].isna(), 'Status'] = 'Not Received'
# Received on or before the expected date.
df1.loc[df1['Date'] >= df1['Date_Received'], 'Status'] = 'On Time'
# Anything still unlabelled was received after the expected date.
df1['Status'] = df1['Status'].fillna('Late')

print(df1)

  Filename Date_Received          Expected       Date        Status
0   File_1    2021-01-01  Stage_1_Expected 2020-12-15          Late
1   File_2    2021-01-01  Stage_1_Expected        NaT  Not Received
2   File_1    2021-01-01  Stage_2_Expected        NaT  Not Received
3   File_2    2021-01-01  Stage_2_Expected 2021-01-05       On Time
4   File_1    2021-01-01  Stage_3_Expected        NaT  Not Received
5   File_2    2021-01-01  Stage_3_Expected        NaT  Not Received
6   File_1    2021-01-01  Stage_4_Expected        NaT  Not Received
7   File_2    2021-01-01  Stage_4_Expected        NaT  Not Received
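
The dates in the question's table are day-first (15/12/2020 cannot be read month-first), so the parsing step is sensitive to the format. The following is a minimal sketch, not part of the original answer, showing how the same string is read with and without an explicit format. The variable names are illustrative only.

```python
import pandas as pd

raw = "05/01/2021"  # File 2's Stage 2 date from the question's table

# Without a format, pandas reads this ambiguous string month-first.
month_first = pd.to_datetime(raw)

# An explicit day-first format matches the question's DD/MM/YYYY data.
day_first = pd.to_datetime(raw, format="%d/%m/%Y")

print(month_first.date())  # 2021-05-01
print(day_first.date())    # 2021-01-05
```

Either reading happens to give "On Time" for this row, but dates later in a month could flip the result, so passing the format explicitly is the safer habit.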

【Discussion】:

  • That's it! Thank you very much for your help.
  • @NICode Comparing datetime data is fun, happy coding!

【Solution 2】:

You can use melt() together with dropna():

df2 = df.melt(['Filename','Date Received']).dropna()

df2 = df2.reset_index(drop=True).rename({'variable':'Stage','value':'Date'},axis='columns')

Output:

>>> df2
  Filename Date Received             Stage        Date
0   File 1    01/01/2021  Stage 1 Expected  15/12/2020
1   File 2    01/01/2021  Stage 2 Expected  05/01/2021

The original data is still kept in df.

Now compare:

df2['Date']=pd.to_datetime(df2['Date'], format='%d/%m/%Y')
df2['Date Received']=pd.to_datetime(df2['Date Received'], format='%d/%m/%Y')

df2['Status']=(df2['Date Received']>df2['Date']).map({False:'On-Time',True:'Late'})

Output of the comparison:

>>> df2
  Filename Date Received             Stage       Date   Status
0   File 1    2021-01-01  Stage 1 Expected 2020-12-15     Late
1   File 2    2021-01-01  Stage 2 Expected 2021-01-05  On-Time
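
If you would rather keep one row per file without reshaping, here is a possible alternative sketch. It is not from either answer, and it relies on the asker's statement that each file has exactly one stage. The sample frame is rebuilt from the question's table, and the names `stage_cols`, `stages` and `result` are hypothetical.

```python
import pandas as pd

# Sample data rebuilt from the question's table (assumed to be strings, day-first).
df = pd.DataFrame({
    "Filename": ["File 1", "File 2"],
    "Date Received": ["01/01/2021", "01/01/2021"],
    "Stage 1 Expected": ["15/12/2020", None],
    "Stage 2 Expected": [None, "05/01/2021"],
    "Stage 3 Expected": [None, None],
    "Stage 4 Expected": [None, None],
})

# Parse every stage column with the explicit day-first format.
stage_cols = [c for c in df.columns if c.startswith("Stage")]
stages = df[stage_cols].apply(pd.to_datetime, format="%d/%m/%Y")

result = df[["Filename"]].copy()
result["Date Received"] = pd.to_datetime(df["Date Received"], format="%d/%m/%Y")

# Label of the first (and, by assumption, only) stage column holding a date.
result["Stage"] = stages.notna().idxmax(axis=1)

# Pick each row's date out of the column named in "Stage".
result["Expected"] = [stages.at[i, col] for i, col in result["Stage"].items()]

# Same comparison as Solution 2: received after the expected date means late.
result["Status"] = (result["Date Received"] > result["Expected"]).map(
    {False: "On-Time", True: "Late"}
)

print(result)
```

Note that a file with no stage date at all would be labelled with the first stage column and an empty expected date, so such rows would need separate handling.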
