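【Posted】: 2018-05-17 13:44:37
【Problem description】:
I ran into the following problem when applying cut to a data.table. I don't understand why the bins are not computed correctly per group when I use by = prod. If I run cut outside the data.table, as shown at the end, the intervals for product "a" are correct, but inside the data.table they are not: the rows for product "a" get labels from the 500,000 range, which belongs to product "b". Do you know why this happens and how to fix it? Thank you.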
library(data.table)
db<-data.frame(count=c(331948, 334999, 321000, 305000, 324100, 310000, 305000, 325000, 305000, 329999, 315000,531948, 534999, 521000, 505000, 524100, 510000, 505000, 525000, 505000, 529999, 515000), prod=c("a","a","a","a","a","a","a","a","a","a","a","b","b","b","b","b","b","b","b","b","b","b"))
head(db)
count prod
1 331948 a
2 334999 a
3 321000 a
4 305000 a
5 324100 a
6 310000 a
setDT(db)[ , id := cut(count,8,digits=1,dig.lab = 7), by = prod]
count prod id
1: 331948 a (531249.1,535029]
2: 334999 a (531249.1,535029]
3: 321000 a (519999.5,523749.4]
4: 305000 a (504970,508749.9]
5: 324100 a (523749.4,527499.2]
6: 310000 a (508749.9,512499.8]
table(db[db$prod=='a',]$id)
(504970,508749.9] (508749.9,512499.8] (512499.8,516249.6] (516249.6,519999.5] (519999.5,523749.4] (523749.4,527499.2] (527499.2,531249.1] (531249.1,535029]
3 1 1 0 1 2 1 2
table(cut(db[db$prod=='a',]$count,8,digits=1,dig.lab = 7))
(304970,308749.9] (308749.9,312499.8] (312499.8,316249.6] (316249.6,319999.5] (319999.5,323749.4] (323749.4,327499.2] (327499.2,331249.1] (331249.1,335029]
3 1 1 0 1 2 1 2
【Discussion】:
- What is the purpose of the cut function in the code above?
- I want to generate ranges for each product. I need this for my actual dataset.
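
A possible workaround, offered as a sketch rather than a confirmed diagnosis: cut returns a factor, and each group produces its own set of levels. The output in the question suggests that the integer codes from every group are kept while only one group's levels end up on the column, so product "a" shows the labels computed for product "b". Converting each group's result to character before the assignment stores the labels themselves. The behavior of := with per-group factors may differ between data.table versions, so treat the explanation as an assumption. The digits = 1 argument is dropped here because it does not appear to be a named parameter of cut and seems to have no effect.

```r
library(data.table)

# Same data as in the question, built directly as a data.table.
db <- data.table(
  count = c(331948, 334999, 321000, 305000, 324100, 310000, 305000, 325000,
            305000, 329999, 315000, 531948, 534999, 521000, 505000, 524100,
            510000, 505000, 525000, 505000, 529999, 515000),
  prod = rep(c("a", "b"), each = 11)
)

# Convert each group's factor to character before assigning, so the interval
# labels (not the per-group integer codes) are what the column stores.
db[, id := as.character(cut(count, 8, dig.lab = 7)), by = prod]

# Inspect the bins for one product; empty bins do not appear, because the
# column is now character rather than a factor with fixed levels.
db[prod == "a", table(id)]
```

If a factor column is needed afterwards, it can be rebuilt with factor(id), at the cost of mixing the levels of all products into one set.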