【Posted】: 2020-10-22 13:17:06
【Question】:
I have the following sample data from multiple motion sensors (multiple_sensors.csv):
sensorid,date_time,value
303,2012-06-25 11:15:35,0
404,2012-06-25 11:15:35,0
101,2012-06-25 11:15:35,0
202,2012-06-25 11:15:35,0
303,2012-06-25 11:15:36,0
404,2012-06-25 11:15:36,0
101,2012-06-25 11:15:36,0
202,2012-06-25 11:15:36,1
303,2012-06-25 11:15:37,0
404,2012-06-25 11:15:37,0
101,2012-06-25 11:15:37,0
202,2012-06-25 11:15:37,1
303,2012-06-25 11:15:38,0
404,2012-06-25 11:15:38,0
101,2012-06-25 11:15:38,0
202,2012-06-25 11:15:38,0
303,2012-06-25 11:15:39,0
404,2012-06-25 11:15:39,1
101,2012-06-25 11:15:39,0
202,2012-06-25 11:15:39,0
303,2012-06-25 11:15:40,0
404,2012-06-25 11:15:40,1
101,2012-06-25 11:15:40,0
202,2012-06-25 11:15:40,0
303,2012-06-25 11:15:41,1
404,2012-06-25 11:15:41,0
101,2012-06-25 11:15:41,0
202,2012-06-25 11:15:41,0
303,2012-06-25 11:15:42,1
404,2012-06-25 11:15:42,0
101,2012-06-25 11:15:42,0
202,2012-06-25 11:15:42,0
303,2012-06-25 11:15:43,1
404,2012-06-25 11:15:43,0
101,2012-06-25 11:15:43,0
202,2012-06-25 11:15:43,0
303,2012-06-25 11:15:44,0
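I need to return the id and duration of each motion sensor event, in the order the events occurred (see expected_output.png). The value column indicates whether motion was triggered (1 means motion was triggered, 0 means no motion), and the date_time column gives the time at which the motion started or ended.
So far, I have managed to extract the id and duration for a single motion sensor (single_sensor.csv), shown below (see single_sensor_output.png).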
sensorid,date_time,value
202,2012-06-25 00:01:07,0
202,2012-06-25 00:01:08,1
202,2012-06-25 00:01:09,1
202,2012-06-25 00:01:10,0
202,2012-06-25 00:02:12,0
202,2012-06-25 00:02:13,1
202,2012-06-25 00:02:14,1
202,2012-06-25 00:02:15,1
202,2012-06-25 00:02:16,0
202,2012-06-25 00:03:40,0
202,2012-06-25 00:03:41,1
202,2012-06-25 00:03:42,1
202,2012-06-25 00:03:43,1
202,2012-06-25 00:03:44,0
202,2012-06-25 00:05:11,0
202,2012-06-25 00:05:12,1
202,2012-06-25 00:05:13,1
202,2012-06-25 00:05:14,0
202,2012-06-25 00:06:19,0
202,2012-06-25 00:06:20,1
202,2012-06-25 00:06:21,1
202,2012-06-25 00:06:22,0
For the single-sensor code, I followed the example here (Calculate duration between events with pandas):
import pandas as pd

date_time_format = '%Y-%m-%d %H:%M:%S'
df = pd.read_csv('single_sensor.csv')
df['date_time'] = pd.to_datetime(df['date_time'], format=date_time_format)

# Number the rows so that each run of 1s shares a group with the 0 that closes it.
a = (df['value'] != 1).cumsum().mask(df['value'] == 1)
df['value group'] = a.bfill()

# Keep only the groups that contain both 1s and a closing 0,
# then take the sensor id and the first and last timestamp of each group.
df_final = (
    df.groupby('value group')
      .filter(lambda x: set(x['value']) == {0, 1})
      .groupby('value group')
      .agg(id=('sensorid', 'first'),
           start=('date_time', 'first'),
           end=('date_time', 'last'))
      .reset_index()
)
df_final['duration'] = (df_final['end'] - df_final['start']).dt.total_seconds().astype(int)
print(df_final)
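To make the grouping step easier to follow, here is a small sketch of what the cumsum / mask / bfill chain produces. It uses only the first nine values of single_sensor.csv, typed in by hand; the variable names counter, masked and group are chosen for illustration.

```python
import pandas as pd

# First nine readings of single_sensor.csv (value column only).
value = pd.Series([0, 1, 1, 0, 0, 1, 1, 1, 0])

# Counter that goes up by one on every row that is not 1.
counter = (value != 1).cumsum()        # 1, 1, 1, 2, 3, 3, 3, 3, 4

# Blank out the counter on the rows where the sensor is triggered ...
masked = counter.mask(value == 1)      # 1, NaN, NaN, 2, 3, NaN, NaN, NaN, 4

# ... and back-fill, so each run of 1s takes the number of the 0 that follows it.
group = masked.bfill()                 # 1, 2, 2, 2, 3, 4, 4, 4, 4

print(pd.DataFrame({'value': value, 'group': group}))
```

Groups 2 and 4 contain both 1s and a closing 0, so they survive the filter; groups 1 and 3 hold a lone 0 and are dropped.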
How can I extend this to work with multiple_sensors.csv and produce my expected output?
【Comments】:
-
What does the value column represent? At what point do you consider a reading to be a start time or a stop time?
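One possible way to extend the single-sensor approach to several sensors is sketched below; it is a suggestion, not a confirmed answer. It assumes that an event starts at a sensor's first 1 after a 0 and ends at the first 0 that follows, which is what the single-sensor code computes. expected_output.png is not visible here, so the output columns are a guess. The function name sensor_events and the helper column n are made up for illustration, and the data is the multiple_sensors.csv sample from the question, pasted inline.

```python
import io
import pandas as pd

CSV = """sensorid,date_time,value
303,2012-06-25 11:15:35,0
404,2012-06-25 11:15:35,0
101,2012-06-25 11:15:35,0
202,2012-06-25 11:15:35,0
303,2012-06-25 11:15:36,0
404,2012-06-25 11:15:36,0
101,2012-06-25 11:15:36,0
202,2012-06-25 11:15:36,1
303,2012-06-25 11:15:37,0
404,2012-06-25 11:15:37,0
101,2012-06-25 11:15:37,0
202,2012-06-25 11:15:37,1
303,2012-06-25 11:15:38,0
404,2012-06-25 11:15:38,0
101,2012-06-25 11:15:38,0
202,2012-06-25 11:15:38,0
303,2012-06-25 11:15:39,0
404,2012-06-25 11:15:39,1
101,2012-06-25 11:15:39,0
202,2012-06-25 11:15:39,0
303,2012-06-25 11:15:40,0
404,2012-06-25 11:15:40,1
101,2012-06-25 11:15:40,0
202,2012-06-25 11:15:40,0
303,2012-06-25 11:15:41,1
404,2012-06-25 11:15:41,0
101,2012-06-25 11:15:41,0
202,2012-06-25 11:15:41,0
303,2012-06-25 11:15:42,1
404,2012-06-25 11:15:42,0
101,2012-06-25 11:15:42,0
202,2012-06-25 11:15:42,0
303,2012-06-25 11:15:43,1
404,2012-06-25 11:15:43,0
101,2012-06-25 11:15:43,0
202,2012-06-25 11:15:43,0
303,2012-06-25 11:15:44,0
"""


def sensor_events(df):
    """Return one row per motion event: id, start, end, duration in seconds."""
    df = df.sort_values(['sensorid', 'date_time']).reset_index(drop=True)
    # Previous reading of the same sensor; a sensor's first row is treated as following a 0.
    prev = df.groupby('sensorid')['value'].shift().fillna(0)
    cols = ['sensorid', 'date_time']
    starts = df.loc[(df['value'] == 1) & (prev == 0), cols].copy()  # 0 -> 1 transitions
    ends = df.loc[(df['value'] == 0) & (prev == 1), cols].copy()    # 1 -> 0 transitions
    # Starts and ends alternate within a sensor, so the k-th start pairs with the k-th end.
    starts['n'] = starts.groupby('sensorid').cumcount()
    ends['n'] = ends.groupby('sensorid').cumcount()
    # Inner merge: a run of 1s that is never closed by a 0 is dropped.
    events = starts.merge(ends, on=['sensorid', 'n'], suffixes=('_start', '_end'))
    events = events.rename(columns={'sensorid': 'id',
                                    'date_time_start': 'start',
                                    'date_time_end': 'end'})
    events['duration'] = (events['end'] - events['start']).dt.total_seconds().astype(int)
    # Report events in the order in which they began.
    events = events.sort_values('start', kind='mergesort')
    return events[['id', 'start', 'end', 'duration']].reset_index(drop=True)


df = pd.read_csv(io.StringIO(CSV), parse_dates=['date_time'])
events = sensor_events(df)
print(events)
```

On the sample data this gives three events in order of occurrence: sensor 202 for 2 seconds (11:15:36 to 11:15:38), sensor 404 for 2 seconds (11:15:39 to 11:15:41), and sensor 303 for 3 seconds (11:15:41 to 11:15:44). Sensor 101 never fires, so it does not appear. If an event should instead end at the last 1 of a run, the ends selection would need to change accordingly.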
Tags: python pandas dataframe csv time-series