【Question title】:Why is predict not delivering the expected result?
【Posted】:2018-05-19 10:51:12
【Question】:
data <- data.frame(day_type = c("weekend", "weekend", "weekend","weekend",
                                "weekday", "weekday", "weekday", "weekday"),
                   vehicle = c("car", "car", "car", "car",
                               "bus", "bus", "bus", "bus"))

library(naivebayes)

model <- naive_bayes(vehicle ~ day_type, data = data)

predict(model, data.frame(day_type = "weekend"))
    [1] bus
Levels: bus car

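The expected answer here is car, but I get bus instead. Please help me identify the error.

【Comments】:

  • Is it a factor level mismatch? Try making sure that day_type has the same levels in the training data and in the data you predict on.
  • If it doesn't slow your process down, I suggest building your model with stringsAsFactors = F in your data.frames. That avoids any problems caused by levels, because you will be working with character variables.

Tags: r naivebayes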


【Answer 1】:

This will help you understand the problem:

data <- data.frame(day_type = c("weekend", "weekend", "weekend","weekend",
                                "weekday", "weekday", "weekday", "weekday"),
                   vehicle = c("car", "car", "car", "car",
                               "bus", "bus", "bus", "bus"))

library(naivebayes)

model <- naive_bayes(vehicle ~ day_type, data = data)

dt_test1 = data.frame(day_type = "weekend")
dt_test2 = data.frame(day_type = "weekday")
dt_test3 = data.frame(day_type = c("weekend","weekday"))

predict(model, newdata = dt_test1)

# [1] bus
# Levels: bus car

predict(model, newdata = dt_test2)

# [1] bus
# Levels: bus car

predict(model, newdata = dt_test3)

# [1] car bus
# Levels: bus car

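Test datasets 1 and 2 each contain a single level, so "weekend" and "weekday" are both assigned the value 1. The model only understands the values 1 and 2 (based on the levels in your original dataset data, where weekday is 1 and weekend is 2) and does not look at the labels (weekday/weekend). In test dataset 3, however, both labels are present, so they get the correct values (weekday/weekend -> 1/2).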
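To make the encoding visible, here is a minimal base R sketch that does not use the model at all. It assumes the columns are factors, which was the default behaviour of data.frame() when this was posted (R versions before 4.0); the variable names train, test1, test3 and test4 are only for illustration.

```r
# Each factor is encoded against its own level set,
# and levels are sorted alphabetically by default.
train <- factor(c("weekend", "weekend", "weekday", "weekday"))
levels(train)        # "weekday" "weekend"
as.integer(train)    # 2 2 1 1

# A single value has only one level, so "weekend" becomes code 1,
# which is the code "weekday" had in the training data.
test1 <- factor("weekend")
as.integer(test1)    # 1

# With both labels present, the codes line up with the training data.
test3 <- factor(c("weekend", "weekday"))
as.integer(test3)    # 2 1

# Unrelated labels still get codes 1 and 2.
test4 <- factor(c("January", "February"))
as.integer(test4)    # 2 1
```

The codes for test1, test3 and test4 match the pattern of predictions shown above: code 1 goes with bus and code 2 goes with car.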

As an extreme case, check:

dt_test4 = data.frame(day_type = c("January","February"))

predict(model, newdata = dt_test4)

# [1] car bus
# Levels: bus car

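You still get predictions! Values that the model cannot possibly know about are simply encoded as 1 and 2.

So, as @Aaron suggested, make sure the factor levels match, or use character variables instead of factor variables.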
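One way to make the levels match is to build the new data with the training levels spelled out. This is a sketch only: train_levels and dt_fixed are hypothetical names, and the levels are written by hand here where in practice you could take them from levels(data$day_type).

```r
# Levels as they appear in the training data (alphabetical order).
train_levels <- c("weekday", "weekend")

# Declare the full level set even though only one value is present.
new_day <- factor("weekend", levels = train_levels)
as.integer(new_day)          # 2, the same code "weekend" had in training

# The factor keeps its levels when placed in a data frame.
dt_fixed <- data.frame(day_type = new_day)
levels(dt_fixed$day_type)    # "weekday" "weekend"
```

Passing dt_fixed as newdata to predict() then gives the model the code it saw for "weekend" during training. The exact behaviour of predict() may differ between versions of naivebayes, so check the result against your installed version.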

【Comments】:
