【Posted】: 2020-04-07 13:50:53
【Question】:
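I have the following dataset, which contains one row for each time an employee clocks in to and out of the company premises.
I want to build a summary matrix that shows how many people from each department are in the building during each half-hour interval.
I have already written code that counts how many people in total are in the building during each half-hour interval, but I can't figure out how to count how many people from each department are in the building during those intervals. I have tried many different techniques without success. For the total number of people in the building, I wrote the following code: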
import pandas as pd
from pandas import Timestamp # import pandas date time
# import a few rows of data. our actual real data is much larger
sample_data = pd.DataFrame({'direction_in': {37196: Timestamp('2019-09-26 16:11:11'), 2364: Timestamp('2019-09-03 13:37:48'), 36266: Timestamp('2018-04-05 06:06:14'), 27159: Timestamp('2019-09-04 07:31:22'), 48518: Timestamp('2018-09-05 05:44:46')}, 'emp': {37196: 152.0, 2364: 10.0, 36266: 150.0, 27159: 115.0, 48518: 187.0}, 'direction_out': {37196: Timestamp('2019-09-26 16:32:20'), 2364: Timestamp('2019-09-03 22:21:04'), 36266: Timestamp('2018-04-05 18:15:21'), 27159: Timestamp('2019-09-04 15:58:02'), 48518: Timestamp('2018-09-05 15:51:51')}, 'time_difference': {37196: '0 days 00:21:09', 2364: '0 days 08:43:16', 36266: '0 days 12:09:07', 27159: '0 days 08:26:40', 48518: '0 days 10:07:05'}, 'complete_record': {37196: 'yes', 2364: 'yes', 36266: 'yes', 27159: 'yes', 48518: 'yes'}, 'terminal': {37196: 1.0, 2364: 1.0, 36266: 1.0, 27159: 1.0, 48518: 3.0}, 'job_title': {37196: 59.0, 2364: 14.0, 36266: 83.0, 27159: 82.0, 48518: 4.0}, 'division': {37196: 2.0, 2364: 1.0, 36266: 2.0, 27159: 1.0, 48518: 4.0}})
# Create a new dataframe for the summarised data.
# The dataframe will contain 30-minute intervals from the first date to the last date in the above data
department_clocked_in_matrix = pd.DataFrame() # Creates new dataframe
department_clocked_in_matrix["date_time_from"] = pd.date_range(start="2018-02-12 00:00:00",end="2019-12-09 23:30:00",freq='30min') # Create from column
department_clocked_in_matrix['date_time_to'] = department_clocked_in_matrix['date_time_from'].shift(-1) # Creates the "to" column, 30 minutes after the "from" column; the last row has no successor and becomes NaT
# drop the last row, whose date_time_to is NaT
department_clocked_in_matrix = department_clocked_in_matrix.iloc[:-1].copy() # copy so that the later column assignment does not act on a slice
department_clocked_in_matrix
def sum_function(temp_df):
    temp_sample = sample_data.loc[(temp_df.date_time_from >= sample_data.direction_in ) & (temp_df.date_time_to <= sample_data.direction_out),["division"] ].count()
    return temp_sample
department_clocked_in_matrix2 = department_clocked_in_matrix.apply(sum_function, axis=1) # axis=1 applies the function to each row (each 30-minute interval)
department_clocked_in_matrix["count"] = department_clocked_in_matrix2["division"]
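For illustration, here is a minimal sketch of one possible way to turn the total count above into a per-division count: run the same whole-interval test once per division and store each result in its own column. The toy `records` and `slots` frames, the helper `count_present`, and the `division_<n>` column names are hypothetical and are not part of my real data; only the column names `direction_in`, `direction_out`, `division`, `date_time_from` and `date_time_to` come from the code above.

```python
import pandas as pd

# Hypothetical toy data with the same column names as sample_data
records = pd.DataFrame({
    "direction_in": pd.to_datetime(
        ["2019-09-03 08:10", "2019-09-03 08:40", "2019-09-03 09:05"]),
    "direction_out": pd.to_datetime(
        ["2019-09-03 10:00", "2019-09-03 09:45", "2019-09-03 09:50"]),
    "division": [1.0, 2.0, 1.0],
})

# Half-hour slots covering the toy data
slots = pd.DataFrame({
    "date_time_from": pd.date_range(
        "2019-09-03 08:00", "2019-09-03 09:30", freq="30min"),
})
slots["date_time_to"] = slots["date_time_from"] + pd.Timedelta(minutes=30)


def count_present(slot, group):
    """Count rows of `group` that cover the whole slot (same test as sum_function)."""
    mask = (group["direction_in"] <= slot["date_time_from"]) & (
        group["direction_out"] >= slot["date_time_to"])
    return int(mask.sum())


# One column per division, one row per half-hour slot
counts = pd.DataFrame({"date_time_from": slots["date_time_from"]})
for division, group in records.groupby("division"):
    counts[f"division_{int(division)}"] = slots.apply(
        lambda slot, g=group: count_present(slot, g), axis=1)

print(counts)
```

Like the total-count code, this only counts someone who is inside for the entire half hour, and it calls `apply` once per division, so it may be slow on the full dataset.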
【Discussion】:
Tags: python pandas pandas-groupby