【Question title】:Adding a random number of days to a series of datetime values
【Posted】:2020-05-23 14:35:02
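【Question description】:
I am trying to add a random number of days to a series of datetime values without iterating over each row of the dataframe, because that takes a lot of time (I have a large dataframe). I went through datetime's timedelta, pandas DateOffset, etc., but none of them offers a way to supply the random day counts all at once, i.e. with a list as input (the random numbers have to be given one by one).
Code: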
df['date_columnA'] = df['date_columnB'] + datetime.timedelta(days = n)
The code above adds the same number of days, n, to every row, whereas I want a different random number added to each row.
Tags:
python
pandas
dataframe
datetime
timedelta
【Solution 1】:
If performance matters, create all the random timedeltas in one go with to_timedelta and numpy.random.randint, then add them to the column:
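import numpy as np
import pandas as pd
np.random.seed(2020)
df = pd.DataFrame({'date_columnB': pd.date_range('2015-01-01', periods=20)})
# One random offset per row, 1 to 99 days (the upper bound of randint is exclusive)
td = pd.to_timedelta(np.random.randint(1, 100, size=len(df)), unit='d')
df['date_columnA'] = df['date_columnB'] + td
print(df)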
date_columnB date_columnA
0 2015-01-01 2015-04-08
1 2015-01-02 2015-01-11
2 2015-01-03 2015-03-12
3 2015-01-04 2015-03-13
4 2015-01-05 2015-04-07
5 2015-01-06 2015-01-10
6 2015-01-07 2015-03-20
7 2015-01-08 2015-03-06
8 2015-01-09 2015-02-08
9 2015-01-10 2015-02-28
10 2015-01-11 2015-02-13
11 2015-01-12 2015-02-06
12 2015-01-13 2015-03-29
13 2015-01-14 2015-01-24
14 2015-01-15 2015-03-08
15 2015-01-16 2015-01-28
16 2015-01-17 2015-03-14
17 2015-01-18 2015-03-22
18 2015-01-19 2015-03-28
19 2015-01-20 2015-03-31
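The same idea can also be written with NumPy's newer Generator interface. This is only a sketch under assumptions: `np.random.default_rng` and `rng.integers` are swapped in for the `np.random.seed` / `np.random.randint` pair used in the answer, and the seeded Generator produces a different sequence, so the dates will not match the table above.

```python
import numpy as np
import pandas as pd

# Assumption: a seeded Generator stands in for the legacy np.random.seed / randint pair.
rng = np.random.default_rng(2020)

df = pd.DataFrame({'date_columnB': pd.date_range('2015-01-01', periods=20)})

# One random day count per row; `high` is exclusive, so values fall in 1..99.
days = rng.integers(1, 100, size=len(df))

# Convert the whole array to timedeltas at once and add it to the column.
df['date_columnA'] = df['date_columnB'] + pd.to_timedelta(days, unit='D')

# Sanity check: recover the offsets and confirm they stay inside the requested range.
offsets = (df['date_columnA'] - df['date_columnB']).dt.days
print(offsets.min(), offsets.max())
```

The addition is positional, so the array of day counts only needs to have the same length as the dataframe.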
Performance on 10k rows:
np.random.seed(2020)
df = pd.DataFrame({'date_columnB': pd.date_range('2015-01-01', periods=10000)})
In [357]: %timeit df['date_columnA'] = df['date_columnB'].apply(lambda x:x+timedelta(days=random.randint(0,100)))
158 ms ± 1.85 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [358]: %timeit df['date_columnA1'] = df['date_columnB'] + pd.to_timedelta(np.random.randint(1,100, size=len(df)), unit='d')
1.53 ms ± 37.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
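The timings above come from IPython's %timeit. Outside IPython, a comparable measurement could be sketched with the standard-library `timeit` module. The helper names `row_by_row` and `vectorized` are hypothetical, introduced only for this illustration, and the absolute numbers will depend on the machine and library versions.

```python
import random
import timeit
from datetime import timedelta

import numpy as np
import pandas as pd

df = pd.DataFrame({'date_columnB': pd.date_range('2015-01-01', periods=10000)})

def row_by_row():
    # Hypothetical helper: one Python-level call per row, as in the apply() approach.
    return df['date_columnB'].apply(lambda x: x + timedelta(days=random.randint(0, 100)))

def vectorized():
    # Hypothetical helper: all offsets are generated and added in a single array operation.
    return df['date_columnB'] + pd.to_timedelta(np.random.randint(1, 100, size=len(df)), unit='D')

runs = 5
t_apply = timeit.timeit(row_by_row, number=runs) / runs
t_vec = timeit.timeit(vectorized, number=runs) / runs
print(f'apply: {t_apply * 1e3:.2f} ms per run, vectorized: {t_vec * 1e3:.2f} ms per run')
```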
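【Solution 2】:
import numpy as np
import pandas as pd
# Two datetimes cannot be added together, so sample day counts and convert them to timedeltas.
# The 1 to 99 day range is an assumption; the original snippet sampled from a date range instead.
days = np.random.choice(np.arange(1, 100), size=len(df))
df['date_columnA'] = df['date_columnB'] + pd.to_timedelta(days, unit='D')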
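【Solution 3】:
import random
from datetime import timedelta
# random.randint(0, 100) includes both endpoints, so each row gets 0 to 100 days.
df['date_columnA'] = df['date_columnB'].apply(lambda x: x + timedelta(days=random.randint(0, 100)))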
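One detail worth checking when switching between these approaches: Solution 1 calls np.random.randint(1, 100) while Solution 3 calls random.randint(0, 100), and the two functions treat the upper bound differently. The standard-library version includes it, the NumPy version excludes it. A small sketch of the difference, with seeds and sample size chosen arbitrarily for illustration:

```python
import random

import numpy as np

# Arbitrary seeds, only so that repeated runs give the same draws.
random.seed(0)
np.random.seed(0)

n = 100_000

# Standard library: both endpoints are included, so 100 can be drawn.
stdlib_draws = [random.randint(0, 100) for _ in range(n)]

# NumPy legacy API: the upper bound is excluded, so the largest possible value is 99.
numpy_draws = np.random.randint(0, 100, size=n)

# Passing high=101 makes the NumPy range match random.randint(0, 100).
numpy_matched = np.random.randint(0, 101, size=n)

print(max(stdlib_draws), numpy_draws.max(), numpy_matched.max())
```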