[Question title]: R - arules package: association rules as a data frame
[Posted]: 2018-11-21 13:22:42
[Question description]:

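I generated association rules in R with the arules package. The rules were mined over 6 columns/fields. What I want is a data frame with those 6 columns, filled in from the association rules.

For example: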

  lhs                                     rhs       support  confidence  lift
  {Sex=M, MaritalStatus=Y, Have Job?=Y}   {Loan=Y}  0.7      0.8         1.9
  {Sex=F, MaritalStatus=Y, Have Job?=Y}   {Loan=Y}  0.6      0.7         1.1
  {Sex=M, Have Job?=N}                    {Loan=N}  0.3      0.9         14.0

These rules should end up in a data frame like this:

  Sex  MaritalStatus  Have Job?  Loan  Support  Confidence  Lift
  M    Y              Y          Y     0.7      0.8         1.9
  F    Y              Y          Y     0.6      0.7         1.1
  M    -              N          N     0.3      0.9         14

[Comments]:

    Tags: r dataframe associations rules arules


    [Solution 1]:

    This takes some coding and an understanding of the data structures used in R and arules. Here is some code that (hopefully) does what you want.

    library(arules)
    
    # create some data
    dat <- data.frame(
       Sex = c("M", "F", "M"), 
       Status = c("Y", "Y", "N"), 
       Job = c("Y", "Y", "N"),
       Loan = c("Y", "Y", "N")
       )
    
    trans <- as(dat, "transactions")
    itemInfo(trans)
    
    #     labels variables levels
    # 1    Sex=F       Sex      F
    # 2    Sex=M       Sex      M
    # 3 Status=N    Status      N
    # 4 Status=Y    Status      Y
    # 5    Job=N       Job      N
    # 6    Job=Y       Job      Y
    # 7   Loan=N      Loan      N
    # 8   Loan=Y      Loan      Y
    
    # arulesCBA can mine classification rules (CARs) with items for the 
    # class variable in the RHS.
    library(arulesCBA)
    rules <- mineCARs(Loan ~ ., trans, parameter = list(supp = 1/3, conf = 0))
    inspect(head(rules))
    
    #     lhs           rhs      support   confidence lift count
    # [1] {}         => {Loan=N} 0.3333333 0.3333333  1.0  1    
    # [2] {}         => {Loan=Y} 0.6666667 0.6666667  1.0  2    
    # [3] {Sex=F}    => {Loan=Y} 0.3333333 1.0000000  1.5  1    
    # [4] {Status=N} => {Loan=N} 0.3333333 1.0000000  3.0  1    
    # [5] {Job=N}    => {Loan=N} 0.3333333 1.0000000  3.0  1    
    # [6] {Sex=M}    => {Loan=N} 0.3333333 0.5000000  1.5  1    
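    
    # Alternative sketch (an assumption, not part of the original answer):
    # if arulesCBA is not installed, apriori() from arules can restrict the
    # RHS to the Loan items through its appearance argument. The name
    # rules_apriori is illustrative; rule order may differ from mineCARs().
    rules_apriori <- apriori(trans,
       parameter = list(supp = 1/3, conf = 0, minlen = 1),
       appearance = list(rhs = c("Loan=N", "Loan=Y"), default = "lhs"),
       control = list(verbose = FALSE))
    inspect(head(rules_apriori))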
    
    # rules store information about how the items relate to the original variables
    ii <- itemInfo(rules)
    ii
    
    #     labels variables levels
    # 1    Sex=F       Sex      F
    # 2    Sex=M       Sex      M
    # 3 Status=N    Status      N
    # 4 Status=Y    Status      Y
    # 5    Job=N       Job      N
    # 6    Job=Y       Job      Y
    # 7   Loan=N      Loan      N
    # 8   Loan=Y      Loan      Y
    
    # start with translating the rules into a logical matrix
    m <- as(items(rules), "matrix")
    head(m)
    
    #      Sex=F Sex=M Status=N Status=Y Job=N Job=Y Loan=N Loan=Y
    # [1,] FALSE FALSE    FALSE    FALSE FALSE FALSE   TRUE  FALSE
    # [2,] FALSE FALSE    FALSE    FALSE FALSE FALSE  FALSE   TRUE
    # [3,]  TRUE FALSE    FALSE    FALSE FALSE FALSE  FALSE   TRUE
    # [4,] FALSE FALSE     TRUE    FALSE FALSE FALSE   TRUE  FALSE
    # [5,] FALSE FALSE    FALSE    FALSE  TRUE FALSE   TRUE  FALSE
    # [6,] FALSE  TRUE    FALSE    FALSE FALSE FALSE   TRUE  FALSE
    
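    # do some R tricks to create the data.frame:
    # for each original variable, take its item columns, multiply each column
    # by its position so the TRUE entry becomes a level index (0 means the
    # variable is not in the rule and turns into NA), then label the index
    df <- do.call(cbind, 
       lapply(unique(ii$variables), FUN = function(var) {
       cols <- which(ii$variables == var)
       df <- data.frame(factor(apply(t(m[,cols])*seq_along(cols), MARGIN = 2, max), 
         levels = seq_along(cols), 
         labels = ii$levels[cols]))
       colnames(df) <- var
       df
       }))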
    
    # add quality measures
    df <- cbind(df, quality(rules))
    head(df)
    
    #    Sex Status  Job Loan   support confidence lift count
    # 1 <NA>   <NA> <NA>    N 0.3333333  0.3333333  1.0     1
    # 2 <NA>   <NA> <NA>    Y 0.6666667  0.6666667  1.0     2
    # 3    F   <NA> <NA>    Y 0.3333333  1.0000000  1.5     1
    # 4 <NA>      N <NA>    N 0.3333333  1.0000000  3.0     1
    # 5 <NA>   <NA>    N    N 0.3333333  1.0000000  3.0     1
    # 6    M   <NA> <NA>    N 0.3333333  0.5000000  1.5     1
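
The step that turns the logical item matrix into one factor column per variable can be tried in isolation with base R alone. The sketch below uses small hand-made stand-ins for `itemInfo(rules)` and for the logical matrix, and wraps the conversion in a helper called `items_to_df()`. Both the toy data and the helper name are illustrative assumptions, not part of arules. It picks the TRUE column per variable with `which()` instead of the multiply-and-max trick, which should give the same result as long as each rule holds at most one item per variable.

```r
# Hypothetical stand-in for itemInfo(rules)
ii <- data.frame(
  labels    = c("Sex=F", "Sex=M", "Job=N", "Job=Y"),
  variables = c("Sex", "Sex", "Job", "Job"),
  levels    = c("F", "M", "N", "Y"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for as(items(rules), "matrix"): one row per rule
m <- matrix(
  c(FALSE, FALSE, FALSE, TRUE,
    TRUE,  FALSE, FALSE, FALSE,
    FALSE, TRUE,  TRUE,  FALSE),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, ii$labels)
)

# items_to_df() is an illustrative helper, not an arules function
items_to_df <- function(m, ii) {
  vars <- unique(as.character(ii$variables))
  cols_by_var <- lapply(vars, function(v) {
    cols <- which(ii$variables == v)
    lev  <- as.character(ii$levels[cols])
    sub  <- m[, cols, drop = FALSE]
    # position of the TRUE item in each row, NA when the variable is absent
    idx <- apply(sub, 1, function(r) if (any(r)) which(r)[1] else NA_integer_)
    factor(lev[idx], levels = lev)
  })
  names(cols_by_var) <- vars
  data.frame(cols_by_var, check.names = FALSE)
}

res <- items_to_df(m, ii)
print(res)
```

With real rules, the same helper could be called as `items_to_df(as(items(rules), "matrix"), itemInfo(rules))` and the result combined with `quality(rules)` via `cbind()`, as in the code above.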
    
