【Posted】: 2021-05-16 19:33:03
【Question】:
The dataset I am using can be found on Kaggle at this link.
This is what I am doing:
import pandas as pd
df = pd.read_csv('./survey_results_public.csv')
df = df.dropna(subset=['Salary'], axis = 0).drop(['Respondent','ExpectedSalary','Salary'], axis = 1)
print(df['HoursPerWeek'].mean())
print(sum(df['HoursPerWeek'].isnull()))
# Method 1
df1 = df
df1 = df1.select_dtypes(include=['float']).fillna(df1.mean())
print(df['HoursPerWeek'].mean())
print(sum(df['HoursPerWeek'].isnull()))
print(df1['HoursPerWeek'].mean())
print(sum(df1['HoursPerWeek'].isnull()))
# Method 2
df2 = df
num_vars = df2.select_dtypes(include = ['float']).columns
for col in num_vars:
    df2[col].fillna(df2[col].mean(),inplace = True)
print(df['HoursPerWeek'].mean())
print(sum(df['HoursPerWeek'].isnull()))
print(df2['HoursPerWeek'].mean())
print(sum(df2['HoursPerWeek'].isnull()))
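My question is: why does "Method 2" also change df, as seen in the last four print statements, where the mean and the number of null values are the same for df and df2?
This does not happen when I do something similar with ordinary variables in Python: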
a=2
b=a
c=a
print(a,b,c)
b += 2
print(a,b,c)
c += 3
print(a,b,c)
In this example, a does not change.
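To narrow down the difference between the two cases, here is a minimal sketch that needs no pandas. It assumes a DataFrame behaves like any other mutable Python object under plain assignment, and it uses a made-up dict of columns (`table`) as a stand-in for the survey data. The names `alias` and `independent` are purely illustrative. In pandas, the counterpart of `copy.deepcopy` here would presumably be `df.copy()`.

```python
import copy

# Hypothetical stand-in for a DataFrame: a mutable dict of columns.
table = {"HoursPerWeek": [40.0, None, 20.0]}

alias = table                       # plain assignment: a second name for the same object
independent = copy.deepcopy(table)  # a separate object with its own data

# Mutate through the alias: replace missing entries with the column mean.
values = [v for v in alias["HoursPerWeek"] if v is not None]
mean = sum(values) / len(values)
alias["HoursPerWeek"] = [mean if v is None else v for v in alias["HoursPerWeek"]]

print(alias is table)               # True: both names refer to one object
print(table["HoursPerWeek"])        # [40.0, 30.0, 20.0]: changed via the alias
print(independent["HoursPerWeek"])  # [40.0, None, 20.0]: the copy is untouched

# Integers are immutable, so `b += 2` rebinds b to a new object.
a = 2
b = a
b += 2
print(a, b)                         # 2 4
```

If this reading is right, the integer example never modifies an object in place, whereas Method 2 modifies the one object that both `df` and `df2` name.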
【Comments】:
Tags: python python-3.x pandas dataframe