【Posted】: 2021-06-22 07:10:41
【Question】:
I have a DataFrame h3_15min with 3454 rows:
Timestamp KE EH LA PR AG
0 2013-02-27 00:00:00 1.000000 2.0 0.03 201.289993 4.36
1 2013-02-27 00:15:00 0.990000 2.0 0.03 210.070007 4.38
2 2013-02-27 00:30:00 0.950000 2.0 0.02 207.779999 4.35
3 2013-02-27 00:45:00 0.990000 2.0 0.03 151.960007 4.34
4 2013-02-27 01:00:00 341.209991 2.0 0.04 0.000000 4.41
... ... ... ... ... ... ...
3449 2013-04-03 22:15:00 NaN 2.0 0.03 0.000000 NaN
3450 2013-04-03 22:30:00 NaN NaN 0.07 0.000000 NaN
3451 2013-04-03 22:45:00 NaN NaN NaN 0.000000 NaN
3452 2013-04-03 23:00:00 NaN NaN NaN 0.000000 NaN
3453 2013-04-03 23:15:00 NaN NaN NaN 0.000000 NaN
Its describe() output is:
KE EH LA PR AG
count 3439.000000 3450.000000 3451.000000 3454.000000 3416.000000
mean 7.361526 60.447796 20.266174 17.185938 506.416779
std 48.624306 286.459686 168.753860 59.658848 623.306396
min 0.000000 2.000000 0.000000 0.000000 4.010000
25% 0.970000 2.000000 0.020000 0.000000 170.047501
50% 0.990000 2.000000 0.040000 0.000000 245.834991
75% 0.990000 2.000000 0.080000 0.000000 526.140015
max 652.210022 2199.290039 2214.550049 278.029999 3543.469971
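I want to remove the outliers so that I can compute the mean of each column and use it to replace the NaN values.
I tried the following code, adapted from [this][1] post: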
h3_15min[(np.abs(stats.zscore(h3_15min.loc[:, h3_15min.columns != "Timestamp" ])) < 3)]
This leaves h3_15min with 3272 rows:
Timestamp KE EH LA PR AG
3 2013-02-27 00:45:00 0.990000 2.0 0.03 151.960007 4.34
4 2013-02-27 01:00:00 341.209991 2.0 0.04 0.000000 4.41
5 2013-02-27 01:15:00 1.000000 2.0 0.02 0.000000 4.29
6 2013-02-27 01:30:00 0.990000 2.0 0.04 0.000000 4.19
7 2013-02-27 01:45:00 0.990000 2.0 0.01 0.000000 4.15
... ... ... ... ... ... ...
3449 2013-04-03 22:15:00 NaN 2.0 0.03 0.000000 NaN
3450 2013-04-03 22:30:00 NaN NaN 0.07 0.000000 NaN
3451 2013-04-03 22:45:00 NaN NaN NaN 0.000000 NaN
3452 2013-04-03 23:00:00 NaN NaN NaN 0.000000 NaN
3453 2013-04-03 23:15:00 NaN NaN NaN 0.000000 NaN
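It does not seem to have removed the largest outliers (row 4, with KE = 341.209991, is still there); it looks as if it just dropped some arbitrary rows.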
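A possible explanation, offered as an assumption rather than a confirmed diagnosis: scipy.stats.zscore defaults to nan_policy='propagate', so any column that contains a NaN gets NaN for every z-score, and NaN < 3 is always False. That would leave PR (the only column with no NaN, count 3454) as the sole column producing True values. This is consistent with the output above: rows 0 to 2 have PR above roughly 196 (about mean + 3 * std for PR) and are gone, while row 3 with PR = 151.96 is kept. The sketch below reproduces the propagation effect with plain NumPy on made-up toy values, and contrasts it with pandas, whose mean() and std() skip NaN by default.

```python
import numpy as np
import pandas as pd

# Toy column (illustrative values only) with one large value and one NaN.
col = pd.Series([1.0, 0.99, 0.95, 0.99, 341.21, np.nan])

# Plain NumPy arithmetic propagates NaN: mean and std are both NaN,
# so every z-score is NaN and every "< 3" comparison is False.
arr = col.to_numpy()
z_propagate = (arr - arr.mean()) / arr.std()
print((np.abs(z_propagate) < 3).any())  # False

# pandas skips NaN in mean() and std(), so only the missing entry
# itself ends up with a NaN z-score.
z_skipna = (col - col.mean()) / col.std(ddof=0)
print(z_skipna.isna().tolist())
```

If this is indeed the cause, passing nan_policy='omit' to stats.zscore, or computing the z-scores with pandas as above, should avoid the all-NaN columns.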
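For KE, outliers are values > 1; for EH, values > 2; for LA, values > 1; for PR, values > 300. Any ideas on how to remove the outliers from the DataFrame without having to enter a threshold manually for each column? My other dataset has 50 columns, so it would be great if this could be done automatically.
[1]: Detect and exclude outliers in Pandas data frame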
【Discussion】:
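One way to avoid typing a threshold per column is to derive the bounds from each column's own quartiles. The following is a minimal sketch, not a definitive solution: it uses the interquartile-range rule instead of the z-score from the question, the helper names mask_outliers_iqr and fill_with_column_mean are hypothetical, k=1.5 is just the conventional default, and the toy frame is made up. The resulting bounds will not necessarily match the thresholds listed in the question. Outliers are set to NaN rather than dropping whole rows, so the column means used for filling are computed without them.

```python
import numpy as np
import pandas as pd

def mask_outliers_iqr(df, k=1.5):
    """Return a copy with numeric values outside [Q1 - k*IQR, Q3 + k*IQR] set to NaN."""
    num = df.select_dtypes(include="number")   # skips Timestamp and other non-numeric columns
    q1 = num.quantile(0.25)                    # quantile() ignores NaN
    q3 = num.quantile(0.75)
    iqr = q3 - q1
    inside = (num >= q1 - k * iqr) & (num <= q3 + k * iqr)
    out = df.copy()
    out[num.columns] = num.where(inside)       # values outside the bounds become NaN
    return out

def fill_with_column_mean(df):
    """Return a copy with NaN in numeric columns replaced by that column's mean."""
    num = df.select_dtypes(include="number")
    out = df.copy()
    out[num.columns] = num.fillna(num.mean())  # mean() ignores NaN
    return out

# Toy data (illustrative values only), shaped like the frame in the question.
toy = pd.DataFrame({
    "Timestamp": pd.date_range("2013-02-27", periods=6, freq="15min"),
    "KE": [1.0, 0.99, 0.95, 0.99, 341.21, np.nan],
    "EH": [2.0, 2.0, 2.0, 2.0, 2.0, np.nan],
})
masked = mask_outliers_iqr(toy)
filled = fill_with_column_mean(masked)
print(filled)
```

Because the bounds come from quantiles, a column whose quartiles coincide (EH in the describe() output, for instance) gets an IQR of 0, and every value different from that quartile is treated as an outlier. Whether that is acceptable depends on the data, so k, or the rule itself, may need adjusting.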
Tags: python pandas dataframe nan outliers