【Posted】: 2019-12-18 02:20:48
【Question】:
I have the following data frame:
etf_id<-c("a","b","c","d","e","a","b","c","d","e","a","b","c","d","e")
factor<-c("A","A","A","A","A","B","B","B","B","B","C","C","C","C","C")
normalized<-c(-0.048436801,2.850578601,2.551666490,0.928625186,-0.638111793,
-0.540615895,-0.501691539,-1.099239823,-0.040736139,-0.192048665,
0.198915407,-0.092525810,0.214317734,2.550478998,0.024613778)
df<-data.frame(etf_id,factor,normalized)
I tried two ways to remove the outliers from the box plot. First I tried outlier.color = NA, outlier.size = 0, outlier.shape = NA:
library(ggplot2)
library(plotly)
ggplotly(df %>%
ggplot(aes(factor, normalized, color = factor)) +
geom_boxplot(outlier.color = NA,outlier.size = 0,outlier.shape = NA) +
coord_cartesian(ylim = quantile(df$normalized, c(0.01, 0.99), na.rm = TRUE)))
A second example, using the diamonds dataset:
p<-ggplotly(diamonds %>%
ggplot(aes(cut,price, color = cut)) +
geom_boxplot(outlier.color = NA,outlier.size = 0,outlier.shape = NA))
Then I tried:
ggplotly(df %>%
ggplot(aes(factor, normalized, color = factor)) +
geom_boxplot(outlier.color = NA,outlier.size = 0,outlier.shape = NA) +
coord_cartesian(ylim = quantile(boxplot.stats(df$normalized)$stats[c(1, 5)]*1.5, c(0.01, 0.99), na.rm = TRUE)))
But this approach seems to narrow my plot limits too much, and I need a general solution.
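One possible reason the second attempt clips the plot: boxplot.stats(df$normalized) is computed on all values pooled together, so the resulting limits can fall inside the whiskers of an individual group (group A here). The sketch below, which is only an illustration and not a confirmed fix for ggplotly, computes the whisker ends per group in base R and takes their overall range. The names whiskers and whisker_range are made up for this example, and boxplot.stats uses hinges that may differ slightly from the quantiles ggplot2 uses for its boxes.

```r
# Rebuild the data frame from the question so the example is self-contained
etf_id <- rep(c("a", "b", "c", "d", "e"), times = 3)
factor <- rep(c("A", "B", "C"), each = 5)
normalized <- c(-0.048436801, 2.850578601, 2.551666490, 0.928625186, -0.638111793,
                -0.540615895, -0.501691539, -1.099239823, -0.040736139, -0.192048665,
                0.198915407, -0.092525810, 0.214317734, 2.550478998, 0.024613778)
df <- data.frame(etf_id, factor, normalized)

# Whisker ends (stats[1] and stats[5]) computed separately for each group,
# instead of once on the pooled column
whiskers <- tapply(df$normalized, df$factor,
                   function(x) boxplot.stats(x)$stats[c(1, 5)])

# Overall range spanned by the per-group whiskers (hypothetical helper value)
whisker_range <- range(unlist(whiskers))
print(whisker_range)
```

Under these assumptions, whisker_range could be passed as coord_cartesian(ylim = whisker_range) so that every group's box and whiskers stay visible while points beyond the whiskers fall outside the view.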
【Discussion】: