【Question title】: Order a list of words in one column in R
【Posted】: 2023-03-11 13:07:01
【Question】:

I have a data frame of apriori output, with rules like these:

rules
{A,B} => {C}
{C,A} => {B}
{A,B} => {D}
{A,D} => {B}
{A,B} => {E}
{E,A} => {B}

So far I have collected the items of each rule into one column (the data.frame is df_basket):

rules           basket
{A,B} => {C}    A,B,C
{C,A} => {B}    C,A,B
{A,B} => {D}    A,B,D
{A,D} => {B}    A,D,B
{A,B} => {E}    A,B,E
{E,A} => {B}    E,A,B

I would like to sort the items in each basket alphabetically, like this:

rules           basket  Group
{A,B} => {C}    A,B,C   A,B,C
{C,A} => {B}    C,A,B   A,B,C
{A,B} => {D}    A,B,D   A,B,D
{A,D} => {B}    A,D,B   A,B,D
{A,B} => {E}    A,B,E   A,B,E
{E,A} => {B}    E,A,B   A,B,E

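I used the code below. It works for small data frames and gets the job done, but for large data frames the for loop is inefficient. Please help me optimize this operation in R: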

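# Sort the comma-separated items of each basket, one row at a time
for (i in seq_len(nrow(df_basket))) {
  # Split the basket string into its individual items
  items <- unlist(strsplit(df_basket$basket[i], ","))
  # Re-join the alphabetically sorted items into the Group column
  df_basket$Group[i] <- paste(sort(items), collapse = ",")
}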

If there is a simpler or more direct way to get the "Group" field of my data frame, please let me know.

【Comments】:

  • Please provide your input data in a reproducible form, e.g. using dput(head(df_basket))
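
For readers who want to follow along, here is a minimal sketch of what a reproducible input could look like. The data frame is rebuilt by hand from the tables in the question, so the column types (plain character, not factor) are an assumption about the asker's real data.

```r
# Hypothetical reconstruction of the asker's data frame from the tables above
df_basket <- data.frame(
  rules = c("{A,B} => {C}", "{C,A} => {B}", "{A,B} => {D}",
            "{A,D} => {B}", "{A,B} => {E}", "{E,A} => {B}"),
  basket = c("A,B,C", "C,A,B", "A,B,D", "A,D,B", "A,B,E", "E,A,B"),
  stringsAsFactors = FALSE  # assumption: the columns are character vectors
)

# dput() prints R code that recreates the object; that text can be
# pasted straight into a question
dput(head(df_basket))
```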

Tags: r apply sapply arules


【Solution 1】:

Try adapting this solution:

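# Split a comma-separated string, sort its items, and join them again
f <- function(x) {
  sorted <- sort(unlist(strsplit(x, ",")))
  paste0(sorted, collapse = ",")
}

# basket is the character vector shown as input data below
cbind(basket, unlist(lapply(basket, f)))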

Input data:

basket<-c("A,B,C","C,A,B","A,B,D","A,D,B","A,B,E","E,A,B")

Output:

     basket         
[1,] "A,B,C" "A,B,C"
[2,] "C,A,B" "A,B,C"
[3,] "A,B,D" "A,B,D"
[4,] "A,D,B" "A,B,D"
[5,] "A,B,E" "A,B,E"
[6,] "E,A,B" "A,B,E"
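
To get the result as the "Group" column the question asks for, rather than as a matrix, the same idea can be written roughly as follows. This is a sketch only: it assumes df_basket$basket is a character column (not a factor), and it swaps in vapply, which behaves like lapply here but checks that each result is a single string.

```r
# Assumed input: the six baskets from the question, as a character column
df_basket <- data.frame(
  basket = c("A,B,C", "C,A,B", "A,B,D", "A,D,B", "A,B,E", "E,A,B"),
  stringsAsFactors = FALSE
)

# strsplit() is vectorised, so the whole column is split in one call;
# fixed = TRUE treats "," as a literal string rather than a regex
items <- strsplit(df_basket$basket, ",", fixed = TRUE)

# Sort each item vector and collapse it back into one string per row
df_basket$Group <- vapply(
  items,
  function(x) paste(sort(x), collapse = ","),
  character(1)
)

print(df_basket)
```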

【Comments】:

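  • Yes! This works. Thank you. I tested it on 10K rows and it ran much faster than the for loop. I am now trying it on 3M rows, which is taking some time.
  • You're welcome. lapply is better than a for loop, but 3M rows will take a few minutes.
  • It took just under 5 minutes, definitely a better option than the for loop. Many thanks.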
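
If many rows share exactly the same basket string, which seems plausible for rule output but is an assumption about the data, one possible way to cut the work on millions of rows is to sort only the distinct strings and map the results back with match(). The helper name sort_items is made up for this sketch, and no timings were measured for it.

```r
# Hypothetical helper: sort the items of one comma-separated string
sort_items <- function(x) {
  paste(sort(strsplit(x, ",", fixed = TRUE)[[1]]), collapse = ",")
}

# Assumed input with repeated baskets
basket <- c("A,B,C", "C,A,B", "A,B,D", "C,A,B", "A,B,C", "E,A,B")

# Do the expensive split/sort/paste once per distinct string
u <- unique(basket)
sorted_u <- vapply(u, sort_items, character(1), USE.NAMES = FALSE)

# Look up each original row's result by its position among the unique values
group <- sorted_u[match(basket, u)]
print(group)
```

How much this helps depends entirely on how many duplicates the column contains; with all-distinct baskets it does the same amount of sorting as before.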
【Solution 2】:

Here is another way that makes more use of the support arules provides:

### create some random data and mine rules
library("arules")
dat <- replicate(10, sample(LETTERS[1:5], size = 3), simplify = FALSE)
trans <- as(dat, "transactions")
rules <- apriori(trans)
inspect(rules)

    lhs      rhs support confidence lift     count
[1] {}    => {A} 0.8     0.8        1.000000 8    
[2] {B}   => {A} 0.6     1.0        1.250000 6    
[3] {C,D} => {E} 0.2     1.0        1.428571 2    
[4] {B,D} => {A} 0.1     1.0        1.250000 1    
[5] {B,C} => {A} 0.2     1.0        1.250000 2    
[6] {B,E} => {A} 0.3     1.0        1.250000 3   

### Get the itemsets that generated each rule and convert the itemsets 
### into a list. I use a list, since in general, rules will not all 
### have the same number of items.
itemsets <- as(items(generatingItemsets(rules)), "list")

### sort the item labels alphabetically. Note that you could already 
### start with the item labels correctly sorted in the transaction set
### (see manual page for itemcoding in arules).
lapply(itemsets, sort)

[[1]]
[1] "A"

[[2]]
[1] "A" "B"

[[3]]
[1] "C" "D" "E"

[[4]]
[1] "A" "B" "D"

[[5]]
[1] "A" "B" "C"

[[6]]
[1] "A" "B" "E"

If all rules have the same number of items, you can put this list into a matrix.
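
As a rough illustration of the matrix case, the sketch below uses a hand-written list in place of the arules output (so it runs without the package) and assumes every itemset has exactly three items. do.call(rbind, ...) is one way to stack the sorted vectors as rows.

```r
# Stand-in for the itemsets list; assumed to have equal-length elements
itemsets <- list(c("C", "D", "E"), c("B", "D", "A"), c("B", "C", "A"))

# Sort each itemset, then bind the sorted vectors together as matrix rows
m <- do.call(rbind, lapply(itemsets, sort))
print(m)
```

If the itemsets differ in length, rbind recycles the shorter vectors with a warning, so the list form is the safer choice in that case.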

If you want them as single strings, you can do this:

sapply(lapply(itemsets, sort), paste0, collapse = ",")
[1] "A"     "A,B"   "C,D,E" "A,B,D" "A,B,C" "A,B,E"
