【Question title】: Pandas fill column based on date and time data of entire frame
【Posted】: 2022-01-12 13:47:12
【Question】:

I have a Python DataFrame like this:

Timestamp (UTC) App Status Start Time End Time
11/18/2021 17:13:01 App 1 passing 17:13:01
11/18/2021 17:07:28 App 1 failing 17:07:28
11/18/2021 16:31:11 App 1 failing 16:31:11
11/18/2021 16:15:22 App 1 passing 16:15:22
11/18/2021 16:07:51 App 1 failing 16:07:51
11/22/2021 13:56:18 App 2 passing 13:56:18
11/22/2021 03:43:33 App 2 failing 03:43:33
11/22/2021 02:48:06 App 2 failing 02:48:06
11/19/2021 10:30:21 App 3 passing 10:30:21
11/17/2021 13:42:11 App 3 failing 13:42:11

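This is a sample of the data; the data I will actually be using looks the same, just with more records. I need to calculate each app's downtime from its first failing event to its first passing event. If an app has several passing statuses, I need both the downtime of each individual failing-to-passing sequence and the app's total downtime, in a time format, with those values placed in separate columns.

I am using Pandas for the csv manipulation.

So the final DataFrame would look like this: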
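To make the rule concrete, here is a minimal sketch of a single downtime interval, using App 1's first outage from the sample (failing at 16:07:51, passing at 16:15:22). It only assumes that the timestamps have been parsed; subtracting two pandas timestamps yields a `Timedelta`. The variable names are illustrative.

```python
import pandas as pd

start = pd.Timestamp("2021-11-18 16:07:51")  # first failing event
end = pd.Timestamp("2021-11-18 16:15:22")    # first passing event after it

downtime = end - start                        # a pandas Timedelta
downtime_seconds = downtime.total_seconds()   # the same interval in seconds

print(downtime)
print(downtime_seconds)
```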

Timestamp (UTC) App Status Start Time End Time Downtime Downtime Minutes
11/18/2021 17:13:01 App 1 passing 17:13:01 41:50 49:21
11/18/2021 17:07:28 App 1 failing 17:07:28 41:50 49:21
11/18/2021 16:31:11 App 1 failing 16:31:11 41:50 49:21
11/18/2021 16:15:22 App 1 passing 16:15:22 07:31 49:21
11/18/2021 16:07:51 App 1 failing 16:07:51 07:31 49:21
11/22/2021 13:56:18 App 2 passing 13:56:18 11:08:12 668.12
11/22/2021 03:43:33 App 2 failing 03:43:33 11:08:12 668.12
11/22/2021 02:48:06 App 2 failing 02:48:06 11:08:12 668.12
11/19/2021 10:30:21 App 3 passing 10:30:21 44:48:10 2688.10
11/17/2021 13:42:11 App 3 failing 13:42:11 44:48:10 2688.10

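Any help would be greatly appreciated.

I know these tables are not easy to read, but I had to format them as code before Stack Overflow would let me post.

Here is the code for the sample df: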


import pandas as pd

data = {
    'TimeStamp': ['11/18/2021 17:13:01', '11/18/2021 17:07:28', '11/18/2021 16:31:11', '11/18/2021 16:15:22',
                  '11/18/2021 16:07:51', '11/22/2021 13:56:18', '11/22/2021 03:43:33', '11/22/2021 02:48:06',
                  '11/19/2021 10:30:21', '11/17/2021 13:42:11'],
    'App': ['App1', 'App1', 'App1', 'App1', 'App1', 'App2', 'App2', 'App2', 'App3', 'App3'],
    'Status': ['Passing', 'Failing', 'Failing', 'Passing', 'Failing', 'Passing', 'Failing', 'Failing', 'Passing', 'Failing'],
}

df = pd.DataFrame(data)

print(df)

【Comments】:

    Tags: python pandas date time


    【Solution 1】:

    You can do it like this:

    # Setup: `data` is the dict from the question. Parse the timestamps before
    # sorting so that rows are ordered chronologically rather than as strings.
    df = pd.DataFrame(data)
    df["TimeStamp"] = pd.to_datetime(df["TimeStamp"])
    df = df.sort_values(by=["App", "TimeStamp", "Status"], ignore_index=True)
    
    # Seconds elapsed since the previous row; the first row has no predecessor,
    # so it is back-filled here and reset to 0 in the loop below
    df["Downtime"] = df["TimeStamp"].diff().bfill().dt.total_seconds()
    
    # Start a new group after every "Passing" row; the first row of each group
    # contributes no downtime. This assumes each app's last row is "Passing".
    df["group"] = 0
    for i in df.index:
        if i == 0:
            df.loc[i, "Downtime"] = 0
            continue
        if df.loc[i - 1, "Status"] == "Passing":
            df.loc[i, "Downtime"] = 0
            df.loc[i:, "group"] += 1
    
    # Total downtime per app (select the column so only "Downtime" is summed)
    cum_sums = df.groupby("App")["Downtime"].sum()
    for app in df["App"].unique():
        df.loc[df["App"] == app, "Total Downtime"] = cum_sums.loc[app]
    
    # Downtime per failing-to-passing sequence
    cum_sums = df.groupby("group")["Downtime"].sum()
    for group in df["group"].unique():
        df.loc[df["group"] == group, "Downtime"] = cum_sums.loc[group]
    
    # Cleanup: drop the helper column and format the seconds as HH:MM:SS
    df = df.drop(columns="group")
    df["Downtime"] = df["Downtime"].apply(
        lambda x: f"{int(x // 3600):02}:{int((x % 3600) // 60):02}:{int(x % 60):02}"
    )
    df["Total Downtime"] = df["Total Downtime"].apply(
        lambda x: f"{int(x // 3600):02}:{int((x % 3600) // 60):02}:{int(x % 60):02}"
    )
    
    print(df)
    # Outputs
                TimeStamp   App   Status  Downtime Total Downtime
    0 2021-11-18 16:07:51  App1  Failing  00:07:31       00:49:21
    1 2021-11-18 16:15:22  App1  Passing  00:07:31       00:49:21
    2 2021-11-18 16:31:11  App1  Failing  00:41:50       00:49:21
    3 2021-11-18 17:07:28  App1  Failing  00:41:50       00:49:21
    4 2021-11-18 17:13:01  App1  Passing  00:41:50       00:49:21
    5 2021-11-22 02:48:06  App2  Failing  11:08:12       11:08:12
    6 2021-11-22 03:43:33  App2  Failing  11:08:12       11:08:12
    7 2021-11-22 13:56:18  App2  Passing  11:08:12       11:08:12
    8 2021-11-17 13:42:11  App3  Failing  44:48:10       44:48:10
    9 2021-11-19 10:30:21  App3  Passing  44:48:10       44:48:10
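
As a possible alternative to the row-by-row loop, the same grouping idea can be sketched with `groupby` operations only. This is not the approach of the answer above, just a minimal sketch under the same assumptions (a new outage starts on the row after each "Passing" row within an app); the helper column name `outage` is made up for illustration. The result columns are left as `Timedelta` values instead of formatted strings.

```python
import pandas as pd

data = {
    "TimeStamp": ["11/18/2021 17:13:01", "11/18/2021 17:07:28", "11/18/2021 16:31:11",
                  "11/18/2021 16:15:22", "11/18/2021 16:07:51", "11/22/2021 13:56:18",
                  "11/22/2021 03:43:33", "11/22/2021 02:48:06", "11/19/2021 10:30:21",
                  "11/17/2021 13:42:11"],
    "App": ["App1"] * 5 + ["App2"] * 3 + ["App3"] * 2,
    "Status": ["Passing", "Failing", "Failing", "Passing", "Failing",
               "Passing", "Failing", "Failing", "Passing", "Failing"],
}

df = pd.DataFrame(data)
df["TimeStamp"] = pd.to_datetime(df["TimeStamp"], format="%m/%d/%Y %H:%M:%S")
df = df.sort_values(["App", "TimeStamp"], ignore_index=True)

# Within each app, number the outages: the counter increases on the row
# that follows a "Passing" row (hypothetical helper column "outage")
prev_passing = df.groupby("App")["Status"].shift().eq("Passing").astype(int)
df["outage"] = prev_passing.groupby(df["App"]).cumsum()

# Seconds since the previous row of the same app and outage (0 for the first row)
step = (
    df.groupby(["App", "outage"])["TimeStamp"].diff().dt.total_seconds().fillna(0.0)
)

# Broadcast the per-outage and per-app sums back onto every row
df["Downtime"] = pd.to_timedelta(
    step.groupby([df["App"], df["outage"]]).transform("sum"), unit="s"
)
df["Total Downtime"] = pd.to_timedelta(
    step.groupby(df["App"]).transform("sum"), unit="s"
)
df = df.drop(columns="outage")

print(df)
```

Because the differences are taken inside each app, a gap between the last row of one app and the first row of the next is never counted, even if an app's data ends on a "Failing" row.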
    

    【Comments】:
