如果 Excel 文件中不存在日期,我认为您正在尝试创建 NaN 行。
这是一种方法。您可以使用df.merge 选项。
我正在创建 df1 来模拟 excel 文件。它有两列sale_dt 和sale_amt。如果sale_dt 不存在,那么我们要创建一个单独的行,其中NaN 在列中。为了确保我们模拟它,我创建了一个从 1998-12-31 到 2020-06-23 的日期范围,中间跳过 4 天。所以我们有一个数据框,每两行之间缺少 4 个日期。该解决方案应按升序创建 4 个具有正确日期的虚拟行。
import pandas as pd
import random
#create the sales dataframe with missing dates
df1 = pd.DataFrame({'sale_dt':pd.date_range(start='1998-12-31', end='2020-06-23', freq='5D'),
'sale_amt':random.sample(range(1, 2000), 1570)
})
print (df1)
#now create a dataframe with all the dates between '1998-12-31' and '2020-06-23'
df2 = pd.DataFrame({'date':pd.date_range(start='1998-12-31', end='2020-06-23', freq='D')})
print (df2)
#now merge both dataframes with outer join so you get all the rows.
#i am also sorting the data in ascending order so you can see the dates
#also dropping the original sale_dt column and renaming the date column as sale_dt
#then resetting index
df1 = (df1.merge(df2,left_on='sale_dt',right_on='date',how='outer')
.drop(columns=['sale_dt'])
.rename(columns={'date':'sale_dt'})
.sort_values(by='sale_dt')
.reset_index(drop=True))
print (df1.head(20))
原始数据框是:
sale_dt sale_amt
0 1998-12-31 1988
1 1999-01-05 1746
2 1999-01-10 1395
3 1999-01-15 538
4 1999-01-20 1186
... ... ...
1565 2020-06-03 560
1566 2020-06-08 615
1567 2020-06-13 858
1568 2020-06-18 298
1569 2020-06-23 1427
输出将是(前 20 行):
sale_amt sale_dt
0 1988.0 1998-12-31
1 NaN 1999-01-01
2 NaN 1999-01-02
3 NaN 1999-01-03
4 NaN 1999-01-04
5 1746.0 1999-01-05
6 NaN 1999-01-06
7 NaN 1999-01-07
8 NaN 1999-01-08
9 NaN 1999-01-09
10 1395.0 1999-01-10
11 NaN 1999-01-11
12 NaN 1999-01-12
13 NaN 1999-01-13
14 NaN 1999-01-14
15 538.0 1999-01-15
16 NaN 1999-01-16
17 NaN 1999-01-17
18 NaN 1999-01-18
19 NaN 1999-01-19