[Question title]: Count number of times an element in column appears over timestamp
[Posted]: 2019-09-17 20:46:16
[Question description]:

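For a given row in a Pandas DataFrame, I need to count how many times the current value of a column, e.g. "destination_address_IP", occurred within the past (for example) 2 seconds according to the "Time_stamp" column, and put that value into a new column "count".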

[Comments]:

Tags: pandas dataframe time count


[Solution 1]:

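You can do this by repeatedly shifting the DataFrame, as shown below. The basic assumption is that the DataFrame is sorted by the timestamp column: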

# define the threshold in microseconds (2 seconds)
time_threshold= 2000000
df['ip_count']=1
df_shifted= df
# loop over the dataframe and shift it by one row
# until the time_threshold is violated for all rows
while True:
    # shift the copy of the dataframe
    df_shifted= df_shifted.shift(1)
    # check if time range is ok
    ser_time_diff= (df['Time_stamp'] - df_shifted['Time_stamp'])
    # compare full timedeltas; .dt.seconds alone would ignore whole days
    ser_in_time= ser_time_diff < pd.Timedelta(microseconds=time_threshold)
    if ser_in_time.any():
        # there are still rows left, where the shifted
        # frame's timestamp lies within the threshold
        # so we need to count the matches for those rows
        # if there are any
        ser_match= ser_in_time & (df['destination_address_IP'] == df_shifted['destination_address_IP'])
        df['ip_count']+= ser_match.astype('int')
    else:
        # none of the rows of the shifted df was within
        # the threshold of the original df
        # so further shifts will not change the result
        # anymore
        break

df
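
As an alternative to the shifting loop above, the same count can be sketched per IP group with a binary search over the sorted timestamps. This is only a minimal sketch, not the method from the answer: the helper name `count_recent` and its `window` parameter are hypothetical, and it assumes that `Time_stamp` already has a datetime dtype and that the frame is sorted by it. Like the loop, it counts the row itself plus earlier rows with the same IP that are strictly less than the window away.

```python
import numpy as np
import pandas as pd

def count_recent(df, window=pd.Timedelta(seconds=2)):
    """Count, per row, the rows with the same IP in the preceding window.

    Hypothetical helper; assumes df is sorted by 'Time_stamp' and that
    'Time_stamp' is a datetime64 column.
    """
    counts = pd.Series(0, index=df.index)
    delta = window.to_timedelta64()
    for _, grp in df.groupby("destination_address_IP"):
        ts = grp["Time_stamp"].to_numpy()
        # position of the first timestamp strictly greater than t - window
        start = np.searchsorted(ts, ts - delta, side="right")
        # rows from that position up to and including the current row
        counts.loc[grp.index] = np.arange(len(ts)) - start + 1
    return counts

# usage: df['ip_count'] = count_recent(df)
```

Because each group is handled with one `searchsorted` call, the work does not grow with the number of rows that fall inside a single window, which the shifting loop does.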

The result for my test data looks like this:

                   Time_stamp destination_address_IP  ip_count
0  2019-09-17 19:20:45.093209          157.111.73.31         1
1  2019-09-17 19:20:45.297932              127.0.0.1         1
2  2019-09-17 19:20:45.750725          157.111.73.31         2
3  2019-09-17 19:20:46.787009          192.168.21.15         1
4  2019-09-17 19:20:47.601051           52.18.181.18         1
5  2019-09-17 19:20:47.863428           52.18.181.17         1
6  2019-09-17 19:20:48.418591           52.18.181.18         2
7  2019-09-17 19:20:48.596764           52.18.181.17         2
8  2019-09-17 19:20:49.057553          192.168.21.15         1
9  2019-09-17 19:20:49.153256          192.168.21.15         2
10 2019-09-17 19:20:49.712312              127.0.0.1         1
11 2019-09-17 19:20:50.000119           52.18.181.17         2
12 2019-09-17 19:20:50.248562           52.18.181.18         2
13 2019-09-17 19:20:50.603783           52.18.181.18         2
14 2019-09-17 19:20:50.921631           52.18.181.17         2
15 2019-09-17 19:20:51.366193           52.18.181.18         3
16 2019-09-17 19:20:51.528611           52.18.181.18         4
17 2019-09-17 19:20:51.773429           131.53.97.59         1
18 2019-09-17 19:20:52.618215          192.168.21.15         1
19 2019-09-17 19:20:52.936181           52.18.181.18         3

It was generated from this data:

import io
import pandas as pd

raw=\
"""Time_stamp                   destination_address_IP
2019-09-17T19:20:45.093209   157.111.73.31
2019-09-17T19:20:45.297932   127.0.0.1
2019-09-17T19:20:45.750725   157.111.73.31
2019-09-17T19:20:46.787009   192.168.21.15
2019-09-17T19:20:47.601051   52.18.181.18
2019-09-17T19:20:47.863428   52.18.181.17
2019-09-17T19:20:48.418591   52.18.181.18
2019-09-17T19:20:48.596764   52.18.181.17
2019-09-17T19:20:49.057553   192.168.21.15
2019-09-17T19:20:49.153256   192.168.21.15
2019-09-17T19:20:49.712312   127.0.0.1
2019-09-17T19:20:50.000119   52.18.181.17
2019-09-17T19:20:50.248562   52.18.181.18
2019-09-17T19:20:50.603783   52.18.181.18
2019-09-17T19:20:50.921631   52.18.181.17
2019-09-17T19:20:51.366193   52.18.181.18
2019-09-17T19:20:51.528611   52.18.181.18
2019-09-17T19:20:51.773429   131.53.97.59
2019-09-17T19:20:52.618215   192.168.21.15
2019-09-17T19:20:52.936181   52.18.181.18
"""

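# parse the timestamp column with parse_dates; read_csv may reject
# 'datetime64' when it is passed through dtype
df= pd.read_csv(
        io.StringIO(raw),
        sep=r'\s{2,}',
        dtype={'destination_address_IP': 'str'},
        parse_dates=['Time_stamp'],
        engine='python')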
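
Since the counting approach relies on the frame being sorted by timestamp, it may help to enforce that assumption before counting. The following is a small sketch under that assumption; the helper name `prepare_frame` is hypothetical and not part of the original answer.

```python
import pandas as pd

def prepare_frame(df, time_col="Time_stamp"):
    """Return a copy with a datetime column, sorted by time (hypothetical helper)."""
    out = df.copy()
    # convert strings to datetimes; already-parsed columns pass through unchanged
    out[time_col] = pd.to_datetime(out[time_col])
    # mergesort is stable, so rows with equal timestamps keep their order
    out = out.sort_values(time_col, kind="mergesort").reset_index(drop=True)
    return out
```

Calling `df = prepare_frame(df)` before running the counting code leaves the original frame untouched and returns a sorted copy with a fresh index.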

[Comments]:
