【Question Title】: Comparing date with multiple columns in Pandas
【Posted】: 2021-03-07 17:01:00
【Question】:

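I have a dataframe with 6 columns, and I want to use the 5 columns that contain dates (FIRST TRAVEL, SECOND TRAVEL, THIRD TRAVEL, and so on). From these 5 columns, I want to take the maximum date in each row and compare it with a given date, "2020-09-25 00:00:00".

The following condition must be met:

  • If the maximum date is greater than the given date, that is fine. If it is not, "Offer Expired" has to be written in a new column named RESULT.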

  Customer Name        FIRST TRAVEL       SECOND TRAVEL        THIRD TRAVEL       FOURTH TRAVEL        FIFTH TRAVEL         RESULT
0         USER1                 NaT 2020-09-02 08:21:59                 NaT                 NaT                 NaT  Offer Expired
1         USER2                 NaT 2014-11-05 15:23:38                 NaT                 NaT                 NaT  Offer Expired
2         USER3                 NaT                 NaT                 NaT                 NaT                 NaT            NaN
3         USER4                 NaT                 NaT                 NaT                 NaT                 NaT            NaN
4         USER5                 NaT                 NaT                 NaT                 NaT                 NaT            NaN
5         USER6                 NaT                 NaT                 NaT                 NaT                 NaT            NaN
6         USER7                 NaT                 NaT                 NaT                 NaT                 NaT            NaN
7         USER8                 NaT                 NaT                 NaT                 NaT                 NaT            NaN
8         USER9                 NaT 2020-09-02 10:07:11                 NaT                 NaT                 NaT  Offer Expired
9        USER10 2020-03-16 00:00:00                 NaT                 NaT                 NaT                 NaT  Offer Expired
10       USER11 2019-12-11 00:00:00                 NaT                 NaT                 NaT                 NaT  Offer Expired
11       USER12 2020-09-26 00:00:00 2020-04-14 00:00:00                 NaT                 NaT                 NaT            NaN
12       USER13 2020-04-20 00:00:00 2019-10-18 00:00:00                 NaT                 NaT                 NaT  Offer Expired
13       USER14 2020-02-21 00:00:00 2020-04-20 00:00:00                 NaT                 NaT                 NaT  Offer Expired
14       USER15 2020-01-17 00:00:00 2019-10-17 00:00:00                 NaT                 NaT                 NaT  Offer Expired
15       USER16                 NaT 2020-04-20 00:00:00                 NaT                 NaT                 NaT  Offer Expired
16       USER17                 NaT 2019-08-24 00:00:00                 NaT                 NaT                 NaT  Offer Expired
17       USER18                 NaT 2019-11-01 00:00:00                 NaT                 NaT                 NaT  Offer Expired
18       USER19                 NaT 2019-09-13 00:00:00                 NaT                 NaT                 NaT  Offer Expired
19       USER20                 NaT 2020-01-13 00:00:00                 NaT                 NaT                 NaT  Offer Expired
20       USER21                 NaT 2019-09-13 00:00:00                 NaT                 NaT                 NaT  Offer Expired
21       USER22                 NaT 2020-04-20 00:00:00                 NaT                 NaT                 NaT  Offer Expired
22       USER23                 NaT 2020-02-12 00:00:00                 NaT                 NaT                 NaT  Offer Expired
23       USER24                 NaT 2019-10-18 00:00:00                 NaT                 NaT                 NaT  Offer Expired
24       USER25 2020-09-06 22:09:22 2020-04-07 00:00:00 2020-08-28 10:17:50 2020-09-04 17:03:20 2020-06-03 19:45:36  Offer Expired
25       USER26 2020-09-06 22:09:22 2020-04-21 00:00:00 2020-08-28 10:17:50 2020-09-04 17:03:20 2020-06-03 19:45:36  Offer Expired
26       USER27                 NaT                 NaT                 NaT 2020-09-04 17:03:20 2020-06-03 19:45:36  Offer Expired
27       USER28                 NaT                 NaT                 NaT 2020-09-04 17:03:20 2020-06-03 19:45:36  Offer Expired
28       USER29 2020-09-06 22:09:22 2020-04-17 00:00:00 2020-08-28 10:17:50 2020-09-04 17:03:20 2020-06-03 19:45:36  Offer Expired
29       USER30 2020-09-06 22:09:22                 NaT                 NaT                 NaT 2020-06-03 19:45:36  Offer Expired
30       USER31 2020-09-06 22:09:22                 NaT                 NaT                 NaT 2020-06-03 19:45:36  Offer Expired
31       USER32 2020-09-06 22:09:22                 NaT                 NaT                 NaT 2020-06-03 19:45:36  Offer Expired
32       USER33 2020-09-06 22:09:22                 NaT                 NaT                 NaT 2020-06-03 19:45:36  Offer Expired
33       USER34 2020-09-06 22:09:22 2020-10-27 00:00:00 2020-08-28 10:17:50 2020-09-04 17:03:20 2020-06-03 19:45:36            NaN
34       USER35 2020-09-06 22:09:22 2019-06-18 00:00:00 2020-08-28 10:17:50 2020-09-04 17:03:20 2020-06-03 19:45:36  Offer Expired
35       USER36 2020-09-06 22:09:22 2020-04-15 00:00:00 2020-08-28 10:17:50 2020-09-04 17:03:20 2020-06-03 19:45:36  Offer Expired
36       USER37 2020-09-06 22:09:22 2020-09-04 15:29:45 2020-08-28 10:17:50 2020-09-04 17:03:20 2020-06-03 19:45:36  Offer Expired
37       USER38 2020-09-06 22:09:22                 NaT                 NaT 2020-09-25 17:03:20 2020-06-03 19:45:36            NaN
38       USER39                 NaT                 NaT                 NaT 2020-09-04 17:03:20 2020-06-03 19:45:36  Offer Expired

Note: this is simpler in Excel, where the following formula does the job. However, I could not find a way to do the same in Pandas.

=IF(COUNTBLANK($B2:$F2)=5,"", IF(MAX($B2:$F2)>$H$1,"","Offer Expired"))

Any help is appreciated.

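【Comments】:

  • I would first apply the pandas melt function to get a new dataframe with only two columns, "user" and "travel date"; see the approach in "Pandas Melt with Multiple Value Vars". You can then sort by date and group by user to find each user's travel dates and determine whether they fall within your time frame.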
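
The melt-then-group idea from this comment can be sketched as follows. This is a minimal illustration, not the commenter's code: the three-row frame, the variable name `cutoff`, and the intermediate column names `trip` and `travel_date` are assumptions made for the example.

```python
import pandas as pd

# Hypothetical three-row sample using the question's column names.
df = pd.DataFrame({
    "Customer Name": ["USER1", "USER3", "USER12"],
    "FIRST TRAVEL": pd.to_datetime([None, None, "2020-09-26"]),
    "SECOND TRAVEL": pd.to_datetime(["2020-09-02", None, "2020-04-14"]),
})
cutoff = pd.Timestamp("2020-09-25")

# Wide -> long: one row per (customer, trip) pair.
long_df = df.melt(id_vars="Customer Name", var_name="trip", value_name="travel_date")

# Latest trip per customer; max() skips NaT and yields NaT when there are no dates.
latest = long_df.groupby("Customer Name")["travel_date"].max()

# NaT <= cutoff is False, so customers without any date are left blank.
result = latest.le(cutoff).map({True: "Offer Expired", False: ""})

# Bring the per-customer result back onto the original wide frame.
df["RESULT"] = df["Customer Name"].map(result)
```

This assumes customer names are unique. For a fixed set of five columns a row-wise comparison is shorter, but the long format may be convenient if the number of travel columns varies.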

Tags: python-3.x pandas dataframe


【Solution 1】:

This should do the trick (comments inline):

import numpy as np
import pandas as pd

# Assumes all the relevant columns have already been converted to datetime.

dt = pd.to_datetime("2020-09-25 00:00:00")

# Select the columns to compare, here with a regex on the column names:

dftravels = df.filter(regex=".* TRAVEL$", axis=1)

# Any ordering comparison with NaT evaluates to False, so gt(dt) is False for missing dates.
# A row is expired when it has no date after dt and has at least one date at all.
# notna() makes the "has a date" test explicit; calling any() directly on datetime
# columns is rejected by recent pandas versions.

df["RESULT"] = np.where(~dftravels.gt(dt).any(axis=1) & dftravels.notna().any(axis=1), "Offer Expired", "")
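
The NaT behaviour that the code comments rely on can be checked in isolation. The snippet below is a small sketch with made-up values (the Series `s` is an assumption, not data from the question); it shows why negating `gt` alone is not enough and a separate "has at least one date" test is needed.

```python
import pandas as pd

dt = pd.Timestamp("2020-09-25")

# One date after dt, one before dt, and one missing value.
s = pd.Series(pd.to_datetime(["2020-09-26", "2020-09-02", None]))

later = s.gt(dt)          # NaT compares as False
not_later = s.le(dt)      # NaT compares as False here too
has_date = s.notna()      # the only test that singles out NaT

# Negating "later" would wrongly flag the missing value as expired,
# so it is combined with the has_date mask.
expired = ~later & has_date
```

Here `~later` is True for the missing entry, while `expired` is True only for the entry that actually holds a date on or before `dt`.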

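【Comments】:

  • Hi, thanks for your help. Is it possible to do this without the regex? A few of my other columns also contain the word TRAVEL, so I would rather not use a regex.
  • Yes - just replace the assignment line: dftravels = df[[<column names for columns with dates>]]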
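
A minimal sketch of that substitution, assuming a small made-up frame: the column "PLANNED TRAVEL" is hypothetical and stands in for one of the other columns that also ends in TRAVEL and must be ignored. The `notna()` call is used here for the "has at least one date" test.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Customer Name": ["USER1", "USER3", "USER12"],
    "FIRST TRAVEL": pd.to_datetime([None, None, "2020-09-26"]),
    "SECOND TRAVEL": pd.to_datetime(["2020-09-02", None, "2020-04-14"]),
    # Hypothetical extra column that a regex on "TRAVEL$" would pick up by mistake.
    "PLANNED TRAVEL": pd.to_datetime(["2020-10-01", "2020-10-01", "2020-10-01"]),
})
dt = pd.Timestamp("2020-09-25")

# Explicit list of the date columns to compare (only two in this sample).
date_columns = ["FIRST TRAVEL", "SECOND TRAVEL"]
dftravels = df[date_columns]

no_later_date = ~dftravels.gt(dt).any(axis=1)   # no date after dt in the row
has_any_date = dftravels.notna().any(axis=1)    # at least one date in the row

df["RESULT"] = np.where(no_later_date & has_any_date, "Offer Expired", "")
```

With the regex selection, the hypothetical "PLANNED TRAVEL" column would count as a later date and USER1 would no longer be flagged; the explicit list avoids that.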
【Solution 2】:
import pandas as pd

# Date to compare with
my_date = pd.to_datetime("2020-09-25 00:00:00")

# Columns to search in
columns = [
  "FIRST TRAVEL", "SECOND TRAVEL", "THIRD TRAVEL",
  "FOURTH TRAVEL", "FIFTH TRAVEL"
]

# Function to find if the offer expired
def offer_expired(row):
  # Returns True if the row has at least one date and none is later than my_date
  date_found = False
  expired = True
  for column in columns:
    # Skip missing dates (NaT)
    if not pd.isnull(row[column]):
      date_found = True
      # A date strictly later than my_date keeps the offer valid (matches the Excel ">" test)
      if row[column] > my_date:
        expired = False
  return expired and date_found

df["RESULT"] = df.apply(lambda row: "Offer Expired" if offer_expired(row) else "", axis=1)

【Comments】:

【Solution 3】:

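The most direct way to translate the Excel expression is np.where():

import numpy as np
import pandas as pd

given_date = pd.to_datetime("2020-09-25 00:00:00")
columns = df.columns.str.endswith('TRAVEL')  # boolean mask; any list of the date columns works as well
df['RESULT'] = np.where(df.loc[:, columns].max(axis=1) <= given_date, 'Offer Expired', '')

Try reading the expression from left to right: if the row-wise maximum date of the selected columns is not later than the given date, the offer has expired; otherwise the result is left blank. Rows with no dates at all have a maximum of NaT, which fails the comparison, so they are left blank as well.

It maps directly onto your problem: one line, fully functional and easy to read. Pandas is fun!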
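
If keeping the two nested IF branches of the Excel formula visible is preferred, np.select is one option. The following is only a sketch under assumptions: the four-row frame is made up, and `travel_cols`, `latest`, `conditions` and `choices` are names introduced for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: expired, no dates, valid, and a date equal to the cutoff.
df = pd.DataFrame({
    "FIRST TRAVEL": pd.to_datetime([None, None, "2020-09-26", "2020-09-25"]),
    "SECOND TRAVEL": pd.to_datetime(["2020-09-02", None, "2020-04-14", None]),
})
given_date = pd.Timestamp("2020-09-25")

travel_cols = df.columns[df.columns.str.endswith("TRAVEL")]
latest = df[travel_cols].max(axis=1)   # NaT when every column in the row is empty

# Conditions are checked in order, like the nested IFs:
#   COUNTBLANK(...)=5  -> ""
#   MAX(...) > $H$1    -> ""
#   otherwise          -> "Offer Expired"
conditions = [latest.isna(), latest > given_date]
choices = ["", ""]
df["RESULT"] = np.select(conditions, choices, default="Offer Expired")
```

A date exactly equal to the given date counts as expired here, because the Excel formula uses a strict ">" comparison.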

【Comments】:
