【Posted】: 2021-04-04 21:55:21
【Question】:
I have the following function, which uses a for loop:
def add_CQI_iterrows(df):
    previous_row = df['Date'].astype(str)[0]
    CQI_index = 0
    series = []
    for index, row in df.iterrows():
        if row['Date'] == previous_row:
            previous_row = row['Date']
            print(CQI_index)
        else:
            CQI_index += 1
            previous_row = row['Date']
        series.append(CQI_index)
    df['CQI'] = series
    return df
I would like to convert this for loop into a call to the apply method. Something like this (which does not work):
def add_CQI_apply(df):
    previous_row = df['Date'].astype(str)[0]
    CQI_index = 1
    series = []
    df['CQI'] = df.apply(lambda row: previous_row = row['Date'] if row['Date'] == previous_row else CQI_index += 1 and previous_row = row['Date'], axis=1)
    return df
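The lambda above cannot work as written, because a Python lambda may contain only an expression, and `=` and `+=` are statements. As a minimal sketch, not taken from the original post, the same row-by-row logic could be passed to apply as a named function that keeps its state in a dictionary. The names `add_cqi_apply`, `step`, and `state` are hypothetical, and the sketch assumes the first group should be numbered 0.

```python
import pandas as pd


def add_cqi_apply(df):
    # Mutable state shared across calls to step(); "cqi" starts at -1 so
    # that the first row bumps it to 0 (an assumption about the numbering).
    state = {"prev": None, "cqi": -1}

    def step(row):
        # Increment the counter whenever the date differs from the last one seen.
        if row["Date"] != state["prev"]:
            state["cqi"] += 1
            state["prev"] = row["Date"]
        return state["cqi"]

    out = df.copy()
    out["CQI"] = out.apply(step, axis=1)
    return out


# Small sample in the same shape as data.json
sample = pd.DataFrame({
    "Date": ["9/20/2020 8:50"] * 3 + ["9/20/2020 8:57"] * 2 + ["9/20/2020 9:12"],
    "UE": [1, 2, 3, 1, 8, 2],
})
result = add_cqi_apply(sample)
print(result["CQI"].tolist())
```

This still calls a Python function once per row, so it is not vectorized; it only swaps the explicit loop for apply.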
I want to make this conversion because I would like to see how fast the apply method is, and whether apply can be vectorized on a pandas Series.
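For comparison, here is one possible fully vectorized sketch that avoids both the loop and apply. It is not from the original post: it compares each date with the previous row's date using `shift`, then takes a cumulative sum of the change flags. The function name `add_cqi_vectorized` is hypothetical, and numbering the first group as 0 is an assumption.

```python
import pandas as pd


def add_cqi_vectorized(df):
    # True wherever the date differs from the previous row's date.
    # The first row is compared with a missing value, so it is always True.
    changed = df["Date"].ne(df["Date"].shift())
    out = df.copy()
    # The running count of changes starts at 1; subtract 1 to start at 0.
    out["CQI"] = changed.cumsum() - 1
    return out


# Small sample in the same shape as data.json
sample = pd.DataFrame({
    "Date": ["9/20/2020 8:50"] * 3 + ["9/20/2020 8:57"] * 2 + ["9/20/2020 9:12"],
    "UE": [1, 2, 3, 1, 8, 2],
})
result = add_cqi_vectorized(sample)
print(result["CQI"].tolist())
```

Because `ne`, `shift`, and `cumsum` all operate on the whole column at once, no Python-level function runs per row.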
Here is my data (data.json):
[
{
"Date": "9/20/2020 8:50",
"UE": 1
},
{
"Date": "9/20/2020 8:50",
"UE": 2
},
{
"Date": "9/20/2020 8:50",
"UE": 3
},
{
"Date": "9/20/2020 8:57",
"UE": 1
},
{
"Date": "9/20/2020 8:57",
"UE": 8
},
{
"Date": "9/20/2020 8:57",
"UE": 2
},
{
"Date": "9/20/2020 9:12",
"UE": 1
},
{
"Date": "9/20/2020 9:12",
"UE": 5
},
{
"Date": "9/20/2020 9:12",
"UE": 3
},
{
"Date": "9/20/2020 9:20",
"UE": 1
},
{
"Date": "9/20/2020 9:20",
"UE": 4
},
{
"Date": "9/20/2020 9:20",
"UE": 3
}
]
Finally, here is the function that loads this data:
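import numpy as np
import pandas as pd

def upload_data(file):
    df = pd.read_json(file)
    # The sample dates look like "9/20/2020 8:50": month/day/year hour:minute
    df['Date'] = pd.to_datetime(df['Date'], format="%m/%d/%Y %H:%M")
    df['CQI'] = np.nan
    return df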
【Discussion】:
Tags: python pandas for-loop vectorization apply