[Posted]: 2015-02-03 08:53:49
[Question]:
How do I calculate the total of a string column in pandas?
import pandas as pd
myl = [('2012-11-07 19:16:07', ' 2012-11-07 19:21:07', ' 0h 05m 00s'),
       ('2012-11-13 06:16:07', ' 2012-11-13 06:21:07', ' 0h 05m 00s'),
       ('2012-11-15 09:56:07', ' 2012-11-15 11:41:07', ' 1h 45m 00s'),
       ('2012-11-15 22:26:07', ' 2012-11-16 07:01:07', ' 8h 35m 00s')]
df = pd.DataFrame(myl, columns=['from', 'to', 'downtime'])
The code above puts the downtime into a single column of strings. How do I get the total of the durations in that column?
In [5]: df
Out[5]:
                  from                    to     downtime
0  2012-11-07 19:16:07   2012-11-07 19:21:07   0h 05m 00s
1  2012-11-13 06:16:07   2012-11-13 06:21:07   0h 05m 00s
2  2012-11-15 09:56:07   2012-11-15 11:41:07   1h 45m 00s
3  2012-11-15 22:26:07   2012-11-16 07:01:07   8h 35m 00s
For example, for the output above, the expected total downtime would be 9h 90m 00s.
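A minimal sketch of one way to get the total, assuming the strings always look like " 0h 05m 00s": convert the column with `pd.to_timedelta` and sum it. Stripping the leading space first is only a precaution added here, not something the question requires. Because the result is a `Timedelta`, minutes carry over into hours, so the "9h 90m 00s" in the question comes out as 10h 30m.

```python
import pandas as pd

# Sample data from the question (note the leading spaces in the strings).
myl = [('2012-11-07 19:16:07', ' 2012-11-07 19:21:07', ' 0h 05m 00s'),
       ('2012-11-13 06:16:07', ' 2012-11-13 06:21:07', ' 0h 05m 00s'),
       ('2012-11-15 09:56:07', ' 2012-11-15 11:41:07', ' 1h 45m 00s'),
       ('2012-11-15 22:26:07', ' 2012-11-16 07:01:07', ' 8h 35m 00s')]
df = pd.DataFrame(myl, columns=['from', 'to', 'downtime'])

# Parse strings like "1h 45m 00s" into Timedelta values
# (strip the leading space as a precaution before parsing).
df['downtime_t'] = pd.to_timedelta(df['downtime'].str.strip())

# Summing a timedelta column normalizes the result: 9h 90m -> 10h 30m.
total = df['downtime_t'].sum()
print(total)  # 0 days 10:30:00
```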
Update:
How can I calculate the downtime per day?
Expected result:
2012-11-07 0h 05m 00s
2012-11-13 0h 05m 00s
2012-11-15 10h 20m 00s
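The expected result above uses an "Xh YYm ZZs" style rather than pandas' default "0 days 10:20:00" rendering. A small helper can turn a `Timedelta` back into that style; `format_downtime` is a hypothetical name, and the sketch assumes non-negative durations and drops any sub-second part.

```python
import pandas as pd

def format_downtime(td: pd.Timedelta) -> str:
    """Hypothetical helper: render a Timedelta as 'Xh YYm ZZs'.

    Hours are not wrapped into days, and fractional seconds are truncated.
    Assumes td is non-negative.
    """
    total_seconds = int(td.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {seconds:02d}s"

print(format_downtime(pd.Timedelta(hours=10, minutes=20)))  # 10h 20m 00s
```

Applied to a per-day `Series` of timedeltas, `per_day.map(format_downtime)` would produce strings in the same layout as the expected result.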
This works as expected:
df['downtime_t'] = pd.to_timedelta(df['downtime'])
df['year'] = pd.DatetimeIndex(pd.to_datetime(df['from'])).year
df['month'] = pd.DatetimeIndex(pd.to_datetime(df['from'])).month
df['day'] = pd.DatetimeIndex(pd.to_datetime(df['from'])).day
df.groupby(['year', 'month', 'day'])['downtime_t'].sum()
This also works for grouping by year:
df['from_d2'] = pd.to_datetime(df['from'])
df.groupby(df['from_d2'].map(lambda x: x.year ))['downtime_t'].sum()
But this does not work:
df.groupby(df['from_d2'].map(lambda x: x.year, x.month, x.day))['downtime_t'].sum()
Is there another way to get the grouped totals?
[Discussion]:
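The failing line passes three arguments to `Series.map`: the lambda, plus `x.month` and `x.day`, which are evaluated outside the lambda where `x` is not defined, so it most likely fails with a `NameError` before any grouping happens. Having the lambda return a tuple, `lambda x: (x.year, x.month, x.day)`, should also work; the sketch below instead passes a list of three key Series to `groupby`, using the `.dt` accessor. The `rename` calls only give the index levels readable names.

```python
import pandas as pd

myl = [('2012-11-07 19:16:07', ' 2012-11-07 19:21:07', ' 0h 05m 00s'),
       ('2012-11-13 06:16:07', ' 2012-11-13 06:21:07', ' 0h 05m 00s'),
       ('2012-11-15 09:56:07', ' 2012-11-15 11:41:07', ' 1h 45m 00s'),
       ('2012-11-15 22:26:07', ' 2012-11-16 07:01:07', ' 8h 35m 00s')]
df = pd.DataFrame(myl, columns=['from', 'to', 'downtime'])
df['downtime_t'] = pd.to_timedelta(df['downtime'].str.strip())
df['from_d2'] = pd.to_datetime(df['from'])

# One key Series per index level instead of a single map() call.
keys = [df['from_d2'].dt.year.rename('year'),
        df['from_d2'].dt.month.rename('month'),
        df['from_d2'].dt.day.rename('day')]

# Result is indexed by a (year, month, day) MultiIndex.
per_day = df.groupby(keys)['downtime_t'].sum()
print(per_day)
```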
-
Do you want exactly that result, or would 10h 30m 00s also be fine (or even better)? -
10h 30m 00s is better and more correct!
-
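You should first convert the date columns to datetimes and the downtime column to timedeltas; then you can do df.groupby(df['from'].dt.date()).mean() -
Sorry, I meant df['from'].dt.date without the parentheses (it is a property, not a method) -
That worked. Thanks.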
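The approach from the comments can be sketched as follows, assuming the totals (not averages) are wanted, so `.sum()` replaces the `.mean()` from the comment. This groups each outage entirely under the day it started, which matches the expected result in the question: the 8h 35m outage that runs into 2012-11-16 is counted under 2012-11-15.

```python
import pandas as pd

myl = [('2012-11-07 19:16:07', ' 2012-11-07 19:21:07', ' 0h 05m 00s'),
       ('2012-11-13 06:16:07', ' 2012-11-13 06:21:07', ' 0h 05m 00s'),
       ('2012-11-15 09:56:07', ' 2012-11-15 11:41:07', ' 1h 45m 00s'),
       ('2012-11-15 22:26:07', ' 2012-11-16 07:01:07', ' 8h 35m 00s')]
df = pd.DataFrame(myl, columns=['from', 'to', 'downtime'])

# Convert first: start times to datetimes, durations to timedeltas.
df['from'] = pd.to_datetime(df['from'])
df['downtime'] = pd.to_timedelta(df['downtime'].str.strip())

# .dt.date is a property (no parentheses) that yields datetime.date keys;
# sum() gives the total downtime for each start date.
per_day = df.groupby(df['from'].dt.date)['downtime'].sum()
print(per_day)
```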