【Posted】: 2020-07-16 20:32:15
【Question】:
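I am trying to plot the relative frequencies of one-dimensional data drawn from 3 clusters. I want a single histogram that uses color to distinguish the 3 clusters, where the height of each bin is the relative frequency of that range of values within the corresponding cluster.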
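In symbols, the desired bar height can be written as follows. The notation is illustrative and not from the original question: N_k is the number of points in cluster k, and n_{k,b} is how many of them fall in bin b = (a_b, a_b + w] of width w.

```latex
h_{k,b} = \frac{n_{k,b}}{N_k}, \qquad \sum_b h_{k,b} = 1 \ \text{for each } k,
\qquad N_1 = N_2 = 25, \; N_3 = 50
```

Dividing by the overall sample size N = N_1 + N_2 + N_3 = 100 instead gives n_{k,b}/N, whose heights sum to 1 across all three clusters together rather than within each cluster.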
Here is the code:
library(mvtnorm)
library(gtools)
library(ggplot2)
K = 3 # number of clusters
p_p = c(0.25, 0.25, 0.5) # population weights
theta_p = c(2, 5, 15) # population gamma params - shape
phi_p = c(2, 2, 5) # population gamma params; used below as rgamma()'s 3rd positional argument, which is the rate (1/scale), not the scale
N_p = c(25, 25, 50) # sample size within each cluster
set.seed(1) # set seed so that the results are the same each time
y <- numeric()
## We will now sample data from all three clusters
y[1:N_p[1]] <- rgamma(N_p[1], theta_p[1], phi_p[1])
y[(N_p[1] + 1):(N_p[1] + N_p[2])] <- rgamma(N_p[2], theta_p[2], phi_p[2])
y[(N_p[1] + N_p[2] + 1):sum(N_p)] <- rgamma(N_p[3], theta_p[3], phi_p[3])
Data = data.frame(y = y, source = as.factor(c(rep(1,25), rep(2,25), rep(3,50))))
ggplot(Data, aes(x = y, color = source)) +
  geom_histogram(aes(y = ..count../sum(..count..)), fill = "white", position = "dodge", binwidth = 0.5) +
  theme(legend.position = "top") +
  labs(title = "Samples against Theoretical Dist", y = "Frequency", x = "Sample Value")
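length(which(y[1:25] <= 0.5)) / length(y)       # cluster-1 values <= 0.5, as a share of all 100 points
length(which(y[1:25] <= 0.5)) / length(y[1:25]) # the same count, as a share of cluster 1's 25 points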
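To see which number a "within-cluster" bar should show, the relative frequencies can be tabulated directly in base R, without ggplot2. This is a sketch under two assumptions of mine: bins of width 0.5 with edges at 0, 0.5, 1, ... and right-closed intervals. ggplot2 only uses those edges if boundary = 0 is passed. With binwidth alone it picks its own edges, which need not start at 0. The names cluster, width, breaks, bins, rel_within and rel_total are hypothetical.

```r
# Re-create the question's data (same seed, same draw order).
# rgamma()'s third positional argument is the rate, as in the original code.
set.seed(1)
N_p <- c(25, 25, 50)
y <- c(rgamma(N_p[1], 2, 2), rgamma(N_p[2], 5, 2), rgamma(N_p[3], 15, 5))
cluster <- factor(rep(1:3, times = N_p))

# Assumed bins: width 0.5, edges at 0, 0.5, 1, ..., right-closed ([0, 0.5] first)
width  <- 0.5
breaks <- seq(0, ceiling(max(y) / width) * width, by = width)
bins   <- cut(y, breaks = breaks, right = TRUE, include.lowest = TRUE)

counts     <- table(cluster, bins)           # rows = clusters, columns = bins
rel_within <- prop.table(counts, margin = 1) # count / cluster size: each row sums to 1
rel_total  <- prop.table(counts)             # count / 100: the whole table sums to 1

rel_within[1, 1]  # first bin of cluster 1, relative to its 25 points
rel_total[1, 1]   # the same count, relative to all 100 points
```

Because the gamma draws are all positive, rel_within[1, 1] equals length(which(y[1:25] <= 0.5))/25 and rel_total[1, 1] equals length(which(y[1:25] <= 0.5))/length(y), i.e. the two values computed at the end of the question.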
Now, what I want is for the height of the first red histogram bar to equal length(which(y[1:25]
However, the height I get is about 0.12, which matches neither value. This makes me think I have completely misunderstood ..count.. and sum(..count..).
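A likely explanation, as far as I understand ggplot2: computed variables such as ..count.. are evaluated after the stat has run, on the combined stat output for all groups in the layer. As a result, sum(..count..) is the total over all three clusters (100 points), not the size of each cluster. Separately, with only binwidth = 0.5, the bin edges are not guaranteed to be 0, 0.5, 1, ..., so the first bar need not cover [0, 0.5]. The sketch below addresses both issues. It divides each bar's count by the total count of its own group and passes boundary = 0. after_stat() requires ggplot2 3.3.0 or later. On older versions the same expression is usually written as ..count../tapply(..count.., ..group.., sum)[..group..]. This is one common workaround, not the only one.

```r
library(ggplot2)

# Same data as in the question
set.seed(1)
N_p <- c(25, 25, 50)
y <- c(rgamma(N_p[1], 2, 2), rgamma(N_p[2], 5, 2), rgamma(N_p[3], 15, 5))
Data <- data.frame(y = y, source = factor(rep(1:3, times = N_p)))

# count / tapply(count, group, sum)[group]: each bin's count is divided by the
# total count of its own group. Here group is 1, 2, 3 (one per level of source),
# so indexing the tapply() result by group lines up positionally.
p <- ggplot(Data, aes(x = y, color = source)) +
  geom_histogram(
    aes(y = after_stat(count / tapply(count, group, sum)[group])),
    fill = "white", position = "dodge",
    binwidth = 0.5, boundary = 0  # bin edges at 0, 0.5, 1, ...
  ) +
  theme(legend.position = "top") +
  labs(title = "Samples against Theoretical Dist",
       y = "Relative frequency within cluster", x = "Sample Value")
print(p)
```

With boundary = 0, the first red bar should then correspond to length(which(y[1:25] <= 0.5))/25 from the question. Because stat_bin's density variable is already normalised within each group, after_stat(density * 0.5) (0.5 being the bin width) should give the same heights.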
【Discussion】: