In every case, the code subtracts the column containing "ECD" from the column containing "ACD" (that is, ACD minus ECD).
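To make the direction of the subtraction concrete, here is a small sketch using the first "Site Visit" values from `test.csv` (shown further down). An ACD later than its ECD gives a positive difference; reversing the operands flips the sign.

```python
import pandas as pd

# "Site Visit" dates taken from test.csv
ecd = pd.to_datetime(pd.Series(["2018-06-01", "2017-02-22"]))
acd = pd.to_datetime(pd.Series(["2018-06-04", "2017-03-02"]))

# ACD minus ECD, as in the answer: a later ACD yields a positive Timedelta
diff = acd - ecd
print(diff.dt.days.tolist())  # [3, 8]
```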
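This answer assumes that the column headers are free of noise: matching headers differ only in "ACD" versus "ECD" and are otherwise identical (including upper/lower case). It also assumes that every "ACD" column has a matching "ECD" column. The code does not check whether the column it writes the date difference to already exists, so a column with that name would be silently overwritten.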
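If those assumptions might not hold, the headers can be checked before any subtraction is done. The following is a minimal sketch; `check_columns` and its parameters are hypothetical names introduced here, not part of pandas. It reports a missing "ECD" partner (including case mismatches) and any output column that would be overwritten.

```python
def check_columns(columns, tag_a="ACD", tag_e="ECD", tag_out="Cycle Time"):
    """Hypothetical helper: list the ways `columns` violates the answer's assumptions."""
    existing = set(columns)
    problems = []
    for name in columns:
        if tag_a not in name:
            continue
        partner = name.replace(tag_a, tag_e)  # exact, case-sensitive match required
        out = name.replace(tag_a, tag_out)
        if partner not in existing:
            problems.append(f"no matching column for {name!r}: expected {partner!r}")
        if out in existing:
            problems.append(f"{out!r} already exists and would be overwritten")
    return problems


header = ["Site Visit ECD", "Site Visit ACD", "Sold ECD (Loc A)", "Sold ACD (Loc A)"]
print(check_columns(header))  # []
```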
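The approach runs in time linear in the number of columns, because it iterates over the columns once and looks up each matching column directly by name instead of searching for it.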
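The pairing step on its own can be sketched in plain Python to show where the linear bound comes from: one pass over the names, with each partner found by a hash-set membership test (average O(1)) rather than a scan of all columns. `pair_columns` is a hypothetical name, and the bound treats each header's length as bounded, since `str.replace` is linear in the length of the string.

```python
def pair_columns(columns, tag_a="ACD", tag_e="ECD"):
    """Hypothetical helper: pair each ACD column with its ECD column in one pass."""
    available = set(columns)  # average O(1) membership tests
    pairs = []
    for name in columns:  # single iteration over the column names
        if tag_a in name:
            partner = name.replace(tag_a, tag_e)
            if partner in available:
                pairs.append((name, partner))
    return pairs


header = ["Site Visit ECD", "Site Visit ACD", "Sold ECD (Loc A)", "Sold ACD (Loc A)"]
print(pair_columns(header))
# [('Site Visit ACD', 'Site Visit ECD'), ('Sold ACD (Loc A)', 'Sold ECD (Loc A)')]
```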
test.csv
Site Visit ECD,Site Visit ACD,Sold ECD (Loc A),Sold ACD (Loc A)
2018-06-01,2018-06-04,2018-07-05,2018-07-06
2017-02-22,2017-03-02,2017-02-27,2017-03-02
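To try the example without creating `test.csv` on disk, the same contents can be read from an in-memory string with `io.StringIO`. This is only a convenience sketch; converting every column with `pd.to_datetime` assumes, as in this sample, that all columns hold dates.

```python
import io

import pandas as pd

# Same contents as test.csv, embedded so the example runs without a file
CSV_TEXT = """Site Visit ECD,Site Visit ACD,Sold ECD (Loc A),Sold ACD (Loc A)
2018-06-01,2018-06-04,2018-07-05,2018-07-06
2017-02-22,2017-03-02,2017-02-27,2017-03-02
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))
# Assumes every column is a date column, which is true for this sample
df = df.apply(pd.to_datetime)
print(df.dtypes)
```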
Code
import pandas as pd

df = pd.read_csv("test.csv")  # "," is already the default delimiter

# Iterate over a snapshot of the names, because new columns are added inside the loop
for col_name_acd in list(df.columns):
    # Skip columns that don't have "ACD" in their name
    if "ACD" not in col_name_acd:
        continue
    col_name_ecd = col_name_acd.replace("ACD", "ECD")
    # We assume there is always a matching "ECD" column
    assert col_name_ecd in df.columns, f"no matching column for {col_name_acd!r}"
    col_name_diff = col_name_acd.replace("ACD", "Cycle Time")
    # ACD minus ECD, stored as a Timedelta column
    df[col_name_diff] = pd.to_datetime(df[col_name_acd]) - pd.to_datetime(df[col_name_ecd])

print(df.head())
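The new columns hold `Timedelta` values, which pandas prints as "3 days" and so on. If plain integers are preferred, one option is the `.dt.days` accessor. The sketch below repeats the loop on the embedded sample data so it runs on its own; it is a variation, not the answer's original code.

```python
import io

import pandas as pd

CSV_TEXT = """Site Visit ECD,Site Visit ACD,Sold ECD (Loc A),Sold ACD (Loc A)
2018-06-01,2018-06-04,2018-07-05,2018-07-06
2017-02-22,2017-03-02,2017-02-27,2017-03-02
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))
for col_acd in [c for c in df.columns if "ACD" in c]:
    col_ecd = col_acd.replace("ACD", "ECD")
    diff = pd.to_datetime(df[col_acd]) - pd.to_datetime(df[col_ecd])
    # .dt.days turns each Timedelta into a whole number of days
    df[col_acd.replace("ACD", "Cycle Time")] = diff.dt.days

print(df[["Site Visit Cycle Time", "Sold Cycle Time (Loc A)"]])
```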
Output
Site Visit ECD Site Visit ACD Sold ECD (Loc A) Sold ACD (Loc A) \
0 2018-06-01 2018-06-04 2018-07-05 2018-07-06
1 2017-02-22 2017-03-02 2017-02-27 2017-03-02
Site Visit Cycle Time Sold Cycle Time (Loc A)
0 3 days 1 days
1 8 days 3 days