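【Posted】:2018-07-03 07:14:08
【Problem description】:
The dataset I am working with is imbalanced, so I am trying to balance it by undersampling, but I get the following error: Error in function (formula, data, method, subset, na.action, N, P=0.5, : The response variable has only one class. How can I resolve this error?
What I have tried: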
library(ROSE)
data_frame <- click.csv
data_frame2 <- buy.csv
colnames(data_frame)
[1] "Session ID" "Timestamp" "Item ID" "Category"
colnames(data_frame2)
[1] "Session ID" "Timestamp" "Item ID" "Price" "Quantity"
> mydata<- merge(x=data_frame, y=data_frame2, by = "SessionID", all.x = TRUE, allow.cartesian=TRUE)# left outer join mydata
> mydata
          Session ID              Timestamp.x Item ID.x Category Timestamp.y Item ID.y Price Quantity
       1:          1 2014-04-07T10:51:09.277Z 214536502        0
       2:          1 2014-04-07T10:54:09.868Z 214536500        0
       3:          1 2014-04-07T10:54:46.998Z 214536506        0
       4:          1 2014-04-07T10:57:00.306Z 214577561        0
       5:   10000001 2014-09-08T10:35:38.841Z 214854230        S
      ---
40596049:    9999997 2014-09-07T18:12:46.466Z 214854159        S
40596050:    9999997 2014-09-07T18:13:04.315Z 214643036        S
40596051:    9999997 2014-09-07T18:14:47.365Z 214854159        S
40596052:    9999998 2014-09-07T20:53:43.120Z 214541597        0
40596053:    9999999 2014-09-04T04:44:46.942Z 214644650        S
mydata$ItemID.y[!is.na(mydata$ItemID.y)] <- 1
mydata$ItemID.y[is.na(mydata$ItemID.y)] <- 0
table(mydata$ItemID.y)
0 1
29698257 10897796
str(mydata)
Classes ‘data.table’ and 'data.frame': 40596053 obs. of 8 variables:
 $ SessionID  : Factor w/ 9249729 levels "1","10000001",..: 1 1 1 1 2 2 2 2 2 3 ...
 $ Timestamp.x: Factor w/ 32937845 levels "2014-04-01T03:00:00.124Z",..: 1406509 1407501 1407712 1408409 29083768 29085345 29085440 29085649 29088238 29247009 ...
 $ ItemID.x   : Factor w/ 52739 levels "1178793047","1178794001",..: 2083 2082 2084 9906 50230 6411 6410 50230 50187 48852 ...
 $ Category   : Factor w/ 339 levels "0","1","10","11",..: 1 1 1 1 339 339 339 339 339 339 ...
 $ Timestamp.y: Factor w/ 1136477 levels "2014-04-01T03:05:31.743Z",..: NA NA NA NA NA NA NA NA NA NA ...
 $ ItemID.y   : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ Price      : Factor w/ 735 levels "0","10052","1015",..: NA NA NA NA NA NA NA NA NA NA ...
 $ Quantity   : Factor w/ 28 levels "0","1","10","11",..: NA NA NA NA NA NA NA NA NA NA ...
 - attr(*, ".internal.selfref")
data_balanced_over <- ovun.sample(ItemID.y ~ ., data = mydata, method = "over",N = 800)
Error in function (formula, data, method, subset, na.action, N, P=0.5, :
The response variable has only one class.
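【Discussion】:
-
Welcome to StackOverflow! Please read how to ask a good question and how to provide a reproducible example. This will make it easier for others to help you.
-
The answer is in your error:
The response variable has only one class. If your response variable has only one class, how can you undersample? You need more than one class. -
Sir, what do you mean by more than one class??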
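One possible reading of the error, offered as a sketch rather than a confirmed diagnosis: table() reports two classes, yet the formula ItemID.y ~ . also uses Timestamp.y, Price and Quantity, which are NA for every session without a purchase. Assuming ovun.sample builds its model frame with the default na.action (na.omit), every class-0 row would be dropped before sampling, leaving a single class. The toy data frame and its column bought below are made up for illustration; only base R is used.

```r
# Toy data mimicking the left join: purchase columns are NA for non-buyers.
# All names and values here are hypothetical.
toy <- data.frame(
  Category = factor(c("0", "0", "S", "S", "0", "S")),
  Price    = c(NA, NA, NA, NA, 1046, 523),
  Quantity = c(NA, NA, NA, NA, 1, 2),
  bought   = factor(c(0, 0, 0, 0, 1, 1))
)

# The raw response has two classes.
raw_counts <- table(toy$bought)

# With every column as a predictor, na.omit drops each row that contains an NA,
# which here removes all non-buyers.
mf_all <- model.frame(bought ~ ., data = toy, na.action = na.omit)
classes_all <- length(unique(mf_all$bought))

# Restricting the formula to columns without NAs keeps both classes.
mf_ok <- model.frame(bought ~ Category, data = toy, na.action = na.omit)
classes_ok <- length(unique(mf_ok$bought))
```

If this is what happens with the real data, restricting the formula to columns that are complete for both classes (or dropping the purchase-only columns first) should let the sampler see both classes.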
-
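You need to give us something to work with; at the very least use
dput(mydata).. There are also other ways to simply undersample; take a look at the caret package -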
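As a minimal sketch of the "simply undersample" idea, the base R helper below keeps every minority-class row and draws an equally sized random sample from the majority class. The function undersample_majority and the toy data are hypothetical and belong to neither ROSE nor caret; the sketch assumes a plain data.frame with a two-class response.

```r
# Hypothetical helper: randomly undersample a two-class response so that
# both classes end up with as many rows as the minority class.
undersample_majority <- function(data, response) {
  y <- data[[response]]
  counts <- table(y)
  stopifnot(length(counts) == 2)   # exactly two classes are required
  n_min <- min(counts)
  # Draw n_min row indices per class (that is every row of the minority class).
  idx <- unlist(lapply(names(counts), function(cl) {
    rows <- which(y == cl)
    rows[sample.int(length(rows), n_min)]
  }))
  data[sort(idx), , drop = FALSE]
}

# Made-up example: 7 rows of class 0 and 3 rows of class 1.
toy <- data.frame(
  Category = factor(rep(c("0", "S"), times = 5)),
  bought   = factor(c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1))
)

set.seed(1)                        # make the random draw reproducible
balanced <- undersample_majority(toy, "bought")
balanced_counts <- table(balanced$bought)
```

The caret package mentioned in the comment provides downSample() for the same purpose; the sketch above only illustrates the underlying idea.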
Also, stop asking the same question with different accounts: stackoverflow.com/questions/51129796/… stackoverflow.com/questions/51126377/…
Tags: r