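【Posted】: 2023-04-04 00:24:01
【Question】:
I have a large number of columns with the suffix "mean" or "sum". Sometimes the column with the "mean" suffix is NaN. When that happens, I want the matching column with the "sum" suffix to become NaN as well. I have many variables, so I need (?) to use a loop. I created a fake dataframe and included the 3 things I tried, based on similar posts on SO. Unfortunately, none of them worked.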
import numpy as np
import pandas as pd

original_data_set = pd.DataFrame(
    {
        'customerId': [1, 2],
        'usage_1_sum': [100, 200],
        'usage_1_mean': [np.nan, 100],
        'usage_2_sum': [420, 330],
        'usage_2_mean': [45, np.nan],
    }
)
print('original dataset')
original_data_set
desired_data_set = pd.DataFrame(
    {
        'customerId': [1, 2],
        'usage_1_sum': [np.nan, 200],
        'usage_1_mean': [np.nan, 100],
        'usage_2_sum': [420, np.nan],
        'usage_2_mean': [45, np.nan],
    }
)
print('desired dataset')
desired_data_set
# Attempt 1: Series.where
holder_set = original_data_set.copy()
for number in range(1, 3):
    holder_set['usage_{}_sum'.format(number)] = (
        holder_set['usage_{}_sum'.format(number)]
        .where(holder_set['usage_{}_mean'.format(number)] == np.nan, np.nan)
    )
print('using an np.where statement changed all sum variables into NaN with no discretion')
holder_set
# Attempt 2: np.select
holder_set = original_data_set.copy()
for number in range(1, 3):
    conditions = [holder_set['usage_{}_mean'.format(number)] == np.nan]
    outcome = [np.nan]
    holder_set['usage_{}_sum'.format(number)] = np.select(conditions, outcome, default=holder_set['usage_{}_sum'.format(number)])
print('using an np.select did not have any effect on the dataframe')
holder_set
# Attempt 3: .loc assignment
holder_set = original_data_set.copy()
for number in range(1, 3):
    holder_set.loc[holder_set['usage_{}_mean'.format(number)] == np.nan, 'usage_{}_sum'.format(number)] = 12
print('using a loc did not have any effect on the dataframe')
holder_set
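A likely reason none of the three attempts behaves as intended: NaN never compares equal to anything, including itself, so `holder_set[...] == np.nan` is False in every row. With an all-False condition, `Series.where` replaces every value (hence all sums turning into NaN), while `np.select` and `.loc` never match a row. Below is a minimal sketch that keeps the loop but tests for missing values with `Series.isna()` and uses `Series.mask` to blank out the matching sums. The names `eq_mask`, `na_mask`, `fixed_set`, `mean_col`, and `sum_col` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

original_data_set = pd.DataFrame(
    {
        'customerId': [1, 2],
        'usage_1_sum': [100, 200],
        'usage_1_mean': [np.nan, 100],
        'usage_2_sum': [420, 330],
        'usage_2_mean': [45, np.nan],
    }
)

# Comparing with == never matches NaN, so this mask is False everywhere.
eq_mask = original_data_set['usage_1_mean'] == np.nan
# isna() is the intended way to detect missing values.
na_mask = original_data_set['usage_1_mean'].isna()

fixed_set = original_data_set.copy()
for number in range(1, 3):
    mean_col = 'usage_{}_mean'.format(number)
    sum_col = 'usage_{}_sum'.format(number)
    # mask() replaces values with NaN wherever the condition is True.
    fixed_set[sum_col] = fixed_set[sum_col].mask(fixed_set[mean_col].isna())

print(fixed_set)
```

Note that the sum columns end up as floats, since an integer column cannot hold NaN.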
【Comments】:
- Maybe try looking at the DataFrame.where() function. You should be able to index directly into the problem area without writing a for loop yourself.
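The loop-free idea from the comment could be sketched roughly as follows, assuming every `*_sum` column has a `*_mean` column with the same prefix. It uses `DataFrame.mask`, the inverse of `DataFrame.where`, with a plain boolean array so that the differing column names do not interfere with alignment. The names `sum_cols`, `mean_cols`, `mean_is_missing`, and `result` are hypothetical and not part of the original post.

```python
import numpy as np
import pandas as pd

original_data_set = pd.DataFrame(
    {
        'customerId': [1, 2],
        'usage_1_sum': [100, 200],
        'usage_1_mean': [np.nan, 100],
        'usage_2_sum': [420, 330],
        'usage_2_mean': [45, np.nan],
    }
)

# Pair each sum column with its mean column by swapping the suffix.
sum_cols = [c for c in original_data_set.columns if c.endswith('_sum')]
mean_cols = [c[:-len('_sum')] + '_mean' for c in sum_cols]

# Boolean array with one column per pair: True where the mean is missing.
mean_is_missing = original_data_set[mean_cols].isna().to_numpy()

result = original_data_set.copy()
# Blank out each sum wherever its paired mean is missing.
result[sum_cols] = result[sum_cols].mask(mean_is_missing)

print(result)
```

Converting the condition to a NumPy array is a deliberate choice here: a DataFrame condition would be aligned on column labels, and the `_mean` labels do not match the `_sum` labels.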
Tags: python pandas loops conditional-statements calculated-columns