[Posted]: 2017-12-30 10:00:57
[Question]:
I have the following data in a csv file:
Date Model Color Value Samples
6/19/2017 Gold Blue 0.5 500
6/19/2017 Gold Red 0.0 449
6/19/2017 Silver Blue 0.75 1320
6/19/2017 Silver Blue 1.5 103
6/19/2017 Gold Red 0.7 891
6/19/2017 Gold Blue 0.41 18103
6/19/2017 Copper Blue 0.83 564
6/19/2017 Silver Pink 1.17 173
6/19/2017 Platinum Brown 0.43 793
6/19/2017 Platinum Red 0.71 1763
6/19/2017 Gold Orange 1.92 503
I create a data.table using the fread function:
library(dplyr)
library(data.table)
df <- fread("test_data.csv",
            header = TRUE,
            fill = TRUE,
            sep = ",")
Then I subset the data by Model, as follows:
df_subset <- subset(df, df$Model=='Gold' & df$Value > 0)
Then I compute some percentiles grouped by the Color variable, as follows:
df_subset[, .(Samples = sum(Samples),
              '50th' = quantile(AvgValue, probs = c(0.50)),
              '99th' = quantile(AvgValue, probs = c(0.99)),
              '99.9th' = quantile(AvgValue, probs = c(0.999)),
              '99.99th' = quantile(AvgValue, probs = c(0.9999))),
          by = Color]
It gives the following output:
Color Samples 50th 99th 99.9th 99.99th
1: Blue 18603 0.455 0.4991 0.49991 0.499991
2: Red 1340 0.975 1.2445 1.24945 1.249945
3: Orange 503 1.920 1.9200 1.92000 1.920000
I am trying to loop over the list of Model values and output the corresponding percentile values for each Model.
I tried the following (but it failed):
models <- unique(df$Model)
for (model in models){
  df$model[, .(Samples = sum(Samples),
               '50th' = quantile(Value, probs = c(0.50)),
               '99th' = quantile(Value, probs = c(0.99)),
               '99.9th' = quantile(Value, probs = c(0.999)),
               '99.99th' = quantile(Value, probs = c(0.9999))),
           by = Color]
}
The error message is:
Error in .(Samples = sum(Samples), `50th` = quantile(Value, probs = c(0.5)), : could not find function "."
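A note on the likely cause, offered as a sketch rather than a definitive answer: `df$model` looks up a column literally named `model` (the loop variable is not substituted, and the actual column is `Model`), so it evaluates to NULL. Indexing NULL with `[` does not go through data.table, so `.()` is evaluated as an ordinary function call and R cannot find it. Assuming the goal is one percentile table per Model, the block below shows two possible ways to get there. The data is rebuilt inline from the question (the Date column is left out for brevity), and `percentile_summary` is a hypothetical helper name, not part of data.table.

```r
library(data.table)

# Sample data copied from the question so the sketch is self-contained.
df <- data.table(
  Model   = c("Gold", "Gold", "Silver", "Silver", "Gold", "Gold",
              "Copper", "Silver", "Platinum", "Platinum", "Gold"),
  Color   = c("Blue", "Red", "Blue", "Blue", "Red", "Blue",
              "Blue", "Pink", "Brown", "Red", "Orange"),
  Value   = c(0.5, 0.0, 0.75, 1.5, 0.7, 0.41, 0.83, 1.17, 0.43, 0.71, 1.92),
  Samples = c(500, 449, 1320, 103, 891, 18103, 564, 173, 793, 1763, 503)
)

# Hypothetical helper: returns a list, which data.table turns into columns.
percentile_summary <- function(value, samples) {
  list(Samples   = sum(samples),
       `50th`    = quantile(value, probs = 0.50,   names = FALSE),
       `99th`    = quantile(value, probs = 0.99,   names = FALSE),
       `99.9th`  = quantile(value, probs = 0.999,  names = FALSE),
       `99.99th` = quantile(value, probs = 0.9999, names = FALSE))
}

# Option 1: no loop at all, group by Model and Color in one call.
result <- df[Value > 0, percentile_summary(Value, Samples),
             by = .(Model, Color)]

# Option 2: keep the loop, but filter rows with the loop variable
# and store each per-model table in a named list.
models <- unique(df$Model)
per_model <- list()
for (m in models) {
  per_model[[m]] <- df[Model == m & Value > 0,
                       percentile_summary(Value, Samples),
                       by = Color]
}
```

With Option 2, `per_model[["Gold"]]` holds the table for the Gold model. Note that a bare expression inside a `for` loop is not auto-printed, so the tables have to be stored or wrapped in `print()`.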
[Comments]:
- dplyr package: group_by and mutate.
- What is AvgValue?
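Following up on the dplyr suggestion in the comments, here is a minimal sketch of what that might look like. It uses `summarise` instead of `mutate`, which is a substitution on my part: `summarise` collapses each group to one row, which seems to match the desired output. The data is rebuilt inline from the question, and the `Value > 0` filter is assumed to apply to every Model.

```r
library(dplyr)

# Sample data copied from the question (Date column omitted for brevity).
df <- data.frame(
  Model   = c("Gold", "Gold", "Silver", "Silver", "Gold", "Gold",
              "Copper", "Silver", "Platinum", "Platinum", "Gold"),
  Color   = c("Blue", "Red", "Blue", "Blue", "Red", "Blue",
              "Blue", "Pink", "Brown", "Red", "Orange"),
  Value   = c(0.5, 0.0, 0.75, 1.5, 0.7, 0.41, 0.83, 1.17, 0.43, 0.71, 1.92),
  Samples = c(500, 449, 1320, 103, 891, 18103, 564, 173, 793, 1763, 503),
  stringsAsFactors = FALSE
)

# One row per (Model, Color) pair with the total sample count and percentiles.
result <- df %>%
  filter(Value > 0) %>%
  group_by(Model, Color) %>%
  summarise(Samples   = sum(Samples),
            `50th`    = quantile(Value, probs = 0.50,   names = FALSE),
            `99th`    = quantile(Value, probs = 0.99,   names = FALSE),
            `99.9th`  = quantile(Value, probs = 0.999,  names = FALSE),
            `99.99th` = quantile(Value, probs = 0.9999, names = FALSE)) %>%
  ungroup()
```

Filtering `result` by Model afterwards (for example `filter(result, Model == "Gold")`) gives the per-model table without an explicit loop.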
Tags: r for-loop dataframe data.table dplyr