【Question title】: How to include dynamic time?
【Posted】: 2016-11-15 14:48:12
【Question description】:

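I am trying to extract logs by time slot. The program below works fine when only a number of hours is given: it extracts the logs that fall within each slot of that length.

Now I also want to supply the start and end of the window dynamically, e.g. between 8 am and 8 pm, or 6 am and 8 am, and so on.

How can I do that? An edit to the current program is fine, and so is a separate program.

Input: mini version of INPUT

Code: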

import pandas as pd
from datetime import datetime,time
import numpy as np

fn = r'00_Dart.csv'
cols = ['UserID','StartTime','StopTime', 'gps1', 'gps2']
df = pd.read_csv(fn, header=None, names=cols)

df['m'] = df.StopTime + df.StartTime
df['d'] = df.StopTime - df.StartTime

# 'start' and 'end' for the reporting DF: `r`
# which will contain equal intervals (1 hour in this case)
start = pd.to_datetime(df.StartTime.min(), unit='s').date()
end = pd.to_datetime(df.StopTime.max(), unit='s').date() + pd.Timedelta(days=1)

# building reporting DF: `r`
freq = '1H'  # 1 Hour frequency
idx = pd.date_range(start, end, freq=freq)
r = pd.DataFrame(index=idx)
r['start'] = (r.index - datetime(1970,1,1)).total_seconds().astype(np.int64)

# 1 hour in seconds, minus one second (so that we will not count it twice)
interval = 60*60 - 1

r['LogCount'] = 0
r['UniqueIDCount'] = 0

for i, row in r.iterrows():
    # intervals overlap test
    # https://en.wikipedia.org/wiki/Interval_tree#Overlap_test
    # i've slightly simplified the calculations of m and d
    # by getting rid of division by 2,
    # because it can be done eliminating common terms
    u = df[np.abs(df.m - 2*row.start - interval) < df.d + interval].UserID
    r.loc[i, ['LogCount', 'UniqueIDCount']] = [len(u), u.nunique()]

r['Date'] = pd.to_datetime(r.start, unit='s').dt.date
r['Day'] = pd.to_datetime(r.start, unit='s').dt.day_name().str[:3]
r['StartTime'] = pd.to_datetime(r.start, unit='s').dt.time
r['EndTime'] = pd.to_datetime(r.start + interval + 1, unit='s').dt.time

#r.to_csv('results.csv', index=False)
#print(r[r.LogCount > 0])
#print (r['StartTime'], r['EndTime'], r['Day'], r['LogCount'], r['UniqueIDCount'])

rout =  r[['Date', 'StartTime', 'EndTime', 'Day', 'LogCount', 'UniqueIDCount'] ]
#print rout
rout.to_csv('one_hour.csv', index=False, header=False)

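Edit:

Simply put, I should be able to give StartTime and EndTime in the program. The code below is very close to what I want to do, but how do I convert it to pandas?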

from datetime import datetime,time

start = time(8,0,0)
end =   time(20,0,0)

with open('USC28days_0_20', 'r') as infile, open('USC28days_0_20_time','w') as outfile:
    for row in infile:
        col = row.split()
        t1 = datetime.fromtimestamp(float(col[2])).time()
        t2 = datetime.fromtimestamp(float(col[3])).time()
        print (t1 >= start and t2 <= end)

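Edit 2: Working answer in pandas

This takes part of @MaxU's answer (the selected answer). The code below extracts the group of logs that fall between the given StartTime and StopTime.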

import pandas as pd
from datetime import datetime,time
import numpy as np

fn = r'00_Dart.csv'
cols = ['UserID','StartTime','StopTime', 'gps1', 'gps2']

df = pd.read_csv(fn, header=None, names=cols)

#df['m'] = df.StopTime + df.StartTime
#df['d'] = df.StopTime - df.StartTime

# filter input data set ... 
start_hour = 8
end_hour = 9
df = df[(pd.to_datetime(df.StartTime, unit='s').dt.hour >= start_hour) & (pd.to_datetime(df.StopTime, unit='s').dt.hour <= end_hour)]

print(df)

df.to_csv('time_hour.csv', index=False, header=False)

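But: it would be a better solution if the minutes and seconds could be controlled as well.

At present this also picks up logs whose StopTime falls within the end hour, with minutes and seconds running up to the next hour.

Something like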

start_hour = 8:0:0
end_hour = 9:0:0 - 1 # -1 to get the logs until 8:59:59

But this gives me an error.
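One possible way to get minute and second control, sketched under assumptions: convert each timestamp to seconds since midnight and compare that against bounds built from `datetime.time` values. The helper names `seconds_since_midnight` and `filter_by_time_of_day` and the sample rows are made up for illustration. The timestamps are assumed to be epoch seconds read as UTC, as in the code above.

```python
from datetime import time

import pandas as pd


def seconds_since_midnight(epoch_seconds):
    """Time of day, in seconds, for a Series of epoch timestamps (UTC assumed)."""
    ts = pd.to_datetime(epoch_seconds, unit='s')
    return ts.dt.hour * 3600 + ts.dt.minute * 60 + ts.dt.second


def filter_by_time_of_day(df, start_t, end_t):
    """Keep rows that start at or after start_t and stop at or before end_t.

    Hypothetical helper: both bounds are inclusive, and a window that
    wraps past midnight is not handled.
    """
    lo = start_t.hour * 3600 + start_t.minute * 60 + start_t.second
    hi = end_t.hour * 3600 + end_t.minute * 60 + end_t.second
    starts_late_enough = seconds_since_midnight(df['StartTime']) >= lo
    stops_early_enough = seconds_since_midnight(df['StopTime']) <= hi
    return df[starts_late_enough & stops_early_enough]


# Made-up sample rows; 1451606400 is 2016-01-01 00:00:00 UTC.
base = 1451606400
day = 24 * 3600
sample = pd.DataFrame({
    'UserID': [1, 2, 3, 4],
    'StartTime': [base + 8*3600 + 15*60,            # 08:15:00
                  base + 8*3600 + 30*60,            # 08:30:00
                  base + 7*3600 + 59*60 + 59,       # 07:59:59
                  base + day + 8*3600],             # next day, 08:00:00
    'StopTime': [base + 8*3600 + 45*60 + 30,        # 08:45:30
                 base + 9*3600 + 20*60,             # 09:20:00
                 base + 8*3600 + 10*60,             # 08:10:00
                 base + day + 8*3600 + 59*60 + 59], # next day, 08:59:59
})

# 08:00:00 through 08:59:59, regardless of the date
filtered = filter_by_time_of_day(sample, time(8, 0, 0), time(8, 59, 59))
print(filtered)
```

With this sample, users 1 and 4 are kept. User 2 stops after the window closes and user 3 starts one second before it opens.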

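【Question comments】:

  • Can you post a sample input and the desired output data set?
  • @MaxU I have edited the question and included a mini version of the full data set
  • Can you explain how you want to count? Do you want to exclude/ignore all rows whose timestamps are outside the specified range of hours?
  • @MaxU I have already done that.. but the program here helped me split the data by a given number of hours, i.e. every hour, every 2 hours, every 12 hours, and so on. Now I want to give the times themselves, i.e. from 6 am to 8 am, or 8 to 20 hours (8 am to 8 pm), and so on.
  • I should be able to give StartTime and EndTime - this part is clear. :) So what do you plan to do with these variables? Do you want to generate a report for 8 am to 6 pm of a single day, or for all days (excluding non-working hours)?

Tags: python csv datetime numpy pandas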


【Solution 1】:

Try this:

import pandas as pd
from datetime import datetime,time
import numpy as np

fn = r'D:\data\gDrive\data\.stack.overflow\2016-07\dart_small.csv'
cols = ['UserID','StartTime','StopTime', 'gps1', 'gps2']

df = pd.read_csv(fn, header=None, names=cols)

df['m'] = df.StopTime + df.StartTime
df['d'] = df.StopTime - df.StartTime

# filter input data set ... 
start_hour = 8
end_hour = 20
df = df[(pd.to_datetime(df.StartTime, unit='s').dt.hour >= start_hour) & (pd.to_datetime(df.StartTime, unit='s').dt.hour <= end_hour)]


# 'start' and 'end' for the reporting DF: `r`
# which will contain equal intervals (1 hour in this case)
start = pd.to_datetime(df.StartTime.min(), unit='s').date()
end = pd.to_datetime(df.StopTime.max(), unit='s').date() + pd.Timedelta(days=1)

# building reporting DF: `r`
freq = '1H'  # 1 Hour frequency
idx = pd.date_range(start, end, freq=freq)
r = pd.DataFrame(index=idx)
r = r[(r.index.hour >= start_hour) & (r.index.hour <= end_hour)]
r['start'] = (r.index - datetime(1970,1,1)).total_seconds().astype(np.int64)

# 1 hour in seconds, minus one second (so that we will not count it twice)
interval = 60*60 - 1

r['LogCount'] = 0
r['UniqueIDCount'] = 0

for i, row in r.iterrows():
    # intervals overlap test
    # https://en.wikipedia.org/wiki/Interval_tree#Overlap_test
    # i've slightly simplified the calculations of m and d
    # by getting rid of division by 2,
    # because it can be done eliminating common terms
    u = df[np.abs(df.m - 2*row.start - interval) < df.d + interval].UserID
    r.loc[i, ['LogCount', 'UniqueIDCount']] = [len(u), u.nunique()]

r['Date'] = pd.to_datetime(r.start, unit='s').dt.date
r['Day'] = pd.to_datetime(r.start, unit='s').dt.day_name().str[:3]
r['StartTime'] = pd.to_datetime(r.start, unit='s').dt.time
r['EndTime'] = pd.to_datetime(r.start + interval + 1, unit='s').dt.time

#r.to_csv('results.csv', index=False)
#print(r[r.LogCount > 0])
#print (r['StartTime'], r['EndTime'], r['Day'], r['LogCount'], r['UniqueIDCount'])

rout =  r[['Date', 'StartTime', 'EndTime', 'Day', 'LogCount', 'UniqueIDCount'] ]
#print rout
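
The overlap test in the loop can be checked on its own. Two intervals overlap when the distance between their centres is smaller than the sum of their radii. Multiplying both sides by 2 gives the `m` (start + stop) and `d` (stop - start) form used above. The sketch below, with made-up numbers, compares that form with the plain endpoint comparison. Because the inequality is strict, a log that only touches the edge of a slot is not counted.

```python
import numpy as np
import pandas as pd

# Made-up log intervals (any consistent time unit works).
logs = pd.DataFrame({
    'StartTime': [900, 950, 1010, 1099, 1200],
    'StopTime':  [950, 1050, 1020, 1200, 1300],
})
logs['m'] = logs.StopTime + logs.StartTime   # twice the centre of each log
logs['d'] = logs.StopTime - logs.StartTime   # twice the radius of each log

slot_start = 1000
interval = 99                                # the slot covers [1000, 1099]

# Centre/radius form, as used in the loop above.
by_centre = np.abs(logs.m - 2*slot_start - interval) < logs.d + interval

# Endpoint form: each interval starts before the other one ends.
by_endpoints = (logs.StartTime < slot_start + interval) & (slot_start < logs.StopTime)

print(by_centre.tolist())
print(by_endpoints.tolist())
```

Both forms select the same rows here: the second and third logs overlap the slot, while the fourth only touches its end and is left out.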

Old answer:

from_time = '08:00'
to_time = '18:00'
rout.between_time(from_time, to_time).to_csv('one_hour.csv', index=False, header=False)
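
`between_time` selects rows by the time of day of a `DatetimeIndex`, which is why it can be called on the report frame `rout`. As a sketch under assumptions, the same call could be applied to the input rows instead by indexing them on their start time first. The sample rows and the index name `StartTs` are made up, and only `StartTime` is checked against the window here.

```python
import pandas as pd

# Made-up sample rows; 1451606400 is 2016-01-01 00:00:00 UTC.
base = 1451606400
logs = pd.DataFrame({
    'UserID': [1, 2, 3, 4, 5],
    'StartTime': [base + 7*3600 + 30*60,  # 07:30:00
                  base + 8*3600,          # 08:00:00
                  base + 12*3600,         # 12:00:00
                  base + 18*3600,         # 18:00:00
                  base + 18*3600 + 1],    # 18:00:01
})

# between_time needs a DatetimeIndex, so index the rows by their start time.
start_ts = pd.to_datetime(logs['StartTime'], unit='s')
indexed = logs.set_index(pd.DatetimeIndex(start_ts, name='StartTs'))

# Both ends of the window are included by default.
in_window = indexed.between_time('08:00', '18:00')
print(in_window)
```

The bounds can also carry seconds, for example `'08:59:59'`, or be passed as `datetime.time` objects.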

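【Comments】:

  • Thanks! This looks very close. But how do I parse StartTime & EndTime so that they are taken into account in from_time & to_time? This looks like it would work regardless of the date.
  • @SitzBlogz, sorry, I don't quite follow you. Can you give a small example?
  • This looks right.. no problem at all.. but I am confused about how to feed the input into it so that we get the output.. in short, how do I parse the input
  • @SitzBlogz, my idea was to filter the output/report DF, not the input
  • Oh.. please do it on the input if possible.. since the output is already fine..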