Result being vector values from ctree classification rather than scalar

【Question title】: Result being vector values from ctree classification rather than scalar
【Posted】: 2014-02-05 23:01:16
【Question】:

After converting my response variable to a factor with as.factor(response), I ran:

tree = ctree(response~., data=trainingset)

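When I plot the tree, the nodes show a vector of values for y, for example y = (0.095, 0.905, 0). I noticed that these 3 values sum to 1.

But the actual response variable only contains the values 0, 1, and 99.

Can anyone help me interpret this vector in the ctree plot? Thanks!

The full code is: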

response = as.factor(data$response) 
newdata = cbind(predictor.matrix, response)

ind = sample(2, nrow(newdata), replace=TRUE, prob=c(0.7, 0.3))
trainData = newdata[ind==1,]
testData = newdata[ind==2,]

tree = ctree(response~., data=trainData)
plot(tree, type="simple")

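【Comments】:

These are the posterior probabilities for each class; i.e., for class 1, an observation has a posterior probability of about 0.9 (90%).
Thanks Gavin, I used the command plot(tree, type="simple")
As for the is.factor() question, the return value is TRUE. :)
For the full code, please see my edited post above. Thanks!
The code that converts response is not ideal. You want response to be in trainingset. Better would be trainingset <- transform(trainingset, response = as.factor(response)).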

Tags: r classification

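【Solution 1】:

These are the posterior probabilities for each of your classes; i.e., for class 1, the posterior probability in that node is about 0.9 (90%), assuming your factor levels are ordered c(0, 1, 99).

In practice, this means that about 90% of the observations in that node belong to class 1, about 10% belong to class 0, and none belong to class 99.

I think what is throwing you off is that your class labels are numeric levels and the plot shows posterior probabilities, which are also numeric. If we look at an example from the party package in which the response is a factor with character levels, you will hopefully get a better understanding of the plot and the output of the tree.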
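To line the plotted vector up with the classes, one option is to attach the factor levels as names. This is a minimal sketch: the response values below are made up for illustration, only the vector y comes from the question, and the entries are assumed to follow the order of the factor levels.

```r
# Made-up response values, for illustration only (not the asker's data)
response <- as.factor(c(0, 1, 1, 99, 0, 1))

# Level order; the entries of the plotted vector are assumed to follow it
levels(response)

# The vector reported in the question's plot, labelled by class
y <- c(0.095, 0.905, 0)
names(y) <- levels(response)
y

# The entries are probabilities, so they sum to 1
sum(y)

# Name of the most probable class in this node
names(which.max(y))
```

Read this way, 0.905 belongs to class "1", 0.095 to class "0", and 0 to class "99".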

From ?ctree

library("party")
irisct <- ctree(Species ~ ., data = iris)
irisct

R> irisct

     Conditional inference tree with 4 terminal nodes

Response:  Species 
Inputs:  Sepal.Length, Sepal.Width, Petal.Length, Petal.Width 
Number of observations:  150 

1) Petal.Length <= 1.9; criterion = 1, statistic = 140.264
  2)*  weights = 50 
1) Petal.Length > 1.9
  3) Petal.Width <= 1.7; criterion = 1, statistic = 67.894
    4) Petal.Length <= 4.8; criterion = 0.999, statistic = 13.865
      5)*  weights = 46 
    4) Petal.Length > 4.8
      6)*  weights = 8 
  3) Petal.Width > 1.7
    7)*  weights = 46

Here, Species is a factor variable with levels

R> with(iris, levels(Species))
[1] "setosa"     "versicolor" "virginica"

Plotting the tree shows the numeric posterior probabilities in the terminal nodes:

plot(irisct, type = "simple")

A more informative plot is:

plot(irisct)

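because it makes clear that each node contains a number of observations from one or more classes. That is how the posterior probabilities are computed.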
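The class proportions in a node can be checked by hand in base R. The sketch below rebuilds terminal node 7 from the splits in the printed tree above (Petal.Length > 1.9, then Petal.Width > 1.7) rather than asking the fitted tree for it. It assumes no case weights were used, and the names node7 and node7_counts are only illustrative.

```r
# Rows that follow the path to node 7 in the printed tree:
# Petal.Length > 1.9, then Petal.Width > 1.7
node7 <- subset(iris, Petal.Length > 1.9 & Petal.Width > 1.7)

# Number of rows in the node; the printed tree reports weights = 46
nrow(node7)

# Class counts in the node
node7_counts <- table(node7$Species)
node7_counts

# Class proportions, i.e. the posterior probabilities for this node
round(prop.table(node7_counts), 5)
```

The node holds 1 versicolor and 45 virginica observations, so the proportions are 1/46 and 45/46.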

Predictions from the tree are given by the predict() method

predict(irisct)

R> predict(irisct)
  [1] setosa     setosa     setosa     setosa     setosa     setosa    
  [7] setosa     setosa     setosa     setosa     setosa     setosa    
 [13] setosa     setosa     setosa     setosa     setosa     setosa
....

You can get the posterior probabilities for each observation with the treeresponse function

R> treeresponse(irisct)[145:150]
[[1]]
[1] 0.00000 0.02174 0.97826

[[2]]
[1] 0.00000 0.02174 0.97826

[[3]]
[1] 0.00000 0.02174 0.97826

[[4]]
[1] 0.00000 0.02174 0.97826

[[5]]
[1] 0.00000 0.02174 0.97826

[[6]]
[1] 0.00000 0.02174 0.97826
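
A list of probability vectors like this is often easier to read as a labelled matrix. The sketch below does not call party. It copies the six vectors from the output above and labels the columns with levels(iris$Species), assuming the entries follow the level order. Taking the most probable class per row should agree with what predict() reports for these rows, though that equivalence is an assumption here rather than something shown above.

```r
# Probability vectors copied from the treeresponse() output above
probs <- rep(list(c(0.00000, 0.02174, 0.97826)), 6)

# Stack the list into a matrix with one row per observation
prob_matrix <- do.call(rbind, probs)

# Label the columns with the class names (assumed to follow the level order)
colnames(prob_matrix) <- levels(iris$Species)
prob_matrix

# Most probable class in each row
colnames(prob_matrix)[apply(prob_matrix, 1, which.max)]
```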

【Comments】: