【Question title】: How can I apply a formula to each row?
【Posted】: 2019-04-19 20:22:25
【Question】:

I have data like this:

df<-structure(list(data = structure(c(8L, 2L, 3L, 2L, 2L, 2L, 2L, 
1L, 7L, 5L, 6L, 5L, 4L), .Label = c("1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0", 
"2, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0", 
"2, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0", 
"2, 2, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0", 
"2, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0", 
"3, 2, 2, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0", 
"M1yrtr", "Mitered"), class = "factor")), row.names = c(NA, -13L), class = "data.frame")

I am trying to compute the following for each row.

For example, the second row is

2, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

and I want to compute this:

n = 5
(-(2/n)*log2(2/n)) + (-(1/n)*log2(1/n)) +(-(1/n)*log2(1/n))+ (-(1/n)*log2(1/n)) 

The third row is

2, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

and for it I would compute this:

(-(2/n)*log2(2/n)) + (-(2/n)*log2(2/n)) + (-(1/n)*log2(1/n))
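Reading the two examples together, the pattern appears to be a sum over the nonzero entries of a row. This is only a sketch of that reading: the symbols $H$ and $x_i$ are introduced here for illustration and do not appear in the original post, and $n = 5$ is taken as given in the question.

```latex
H(x) = -\sum_{i \,:\, x_i > 0} \frac{x_i}{n} \log_2 \frac{x_i}{n}, \qquad n = 5

% Row 2 (2, 1, 1, 1):
H = -\tfrac{2}{5}\log_2\tfrac{2}{5} - 3 \cdot \tfrac{1}{5}\log_2\tfrac{1}{5} \approx 1.921928

% Row 3 (2, 2, 1):
H = -2 \cdot \tfrac{2}{5}\log_2\tfrac{2}{5} - \tfrac{1}{5}\log_2\tfrac{1}{5} \approx 1.521928
```

The zero entries are left out because $0 \cdot \log_2 0$ is undefined in floating-point arithmetic (R returns `NaN` for it).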

So the output would look like this:

dfout<- structure(list(data = structure(c(8L, 2L, 3L, 2L, 2L, 2L, 2L, 
1L, 7L, 5L, 6L, 5L, 4L), .Label = c("1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0", 
"2, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0", 
"2, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0", 
"2, 2, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0", 
"2, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0", 
"3, 2, 2, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0", 
"M1yrtr", "Mitered"), class = "factor"), X = structure(c(8L, 
3L, 2L, 3L, 3L, 3L, 3L, 1L, 7L, 6L, 4L, 6L, 5L), .Label = c("0.2604594", 
"1.03563", "1.168964", "2.020935", "2.077468", "2.204594", "M1yrtr", 
"Mitered"), class = "factor")), class = "data.frame", row.names = c(NA, 
-13L))

【Discussion】:

    Tags: r dataframe apply sapply


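    【Solution 1】:

    In R, all basic operations (addition, subtraction, multiplication, logarithms, ...) are vectorized. This means, for example, that if x is a vector, then log(x) is simply the element-wise log, and 1 / x is simply element-wise division.

    So you can do the following: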
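As a minimal illustration of that point, the sketch below applies the question's formula to the nonzero entries of the second row without any loop. The variable names `p`, `terms`, and `total` are illustrative only.

```r
# Nonzero counts from the second row of the question, with n as given there
x <- c(2, 1, 1, 1)
n <- 5

p <- x / n              # element-wise division: 0.4 0.2 0.2 0.2
terms <- -p * log2(p)   # element-wise product and logarithm, one term per entry
total <- sum(terms)     # 1.921928, the value of the hand-written expression
```

Each arithmetic step returns a vector of the same length as `x`, so only the final `sum()` collapses the result to a single number.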

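    library(stringr)

    # Take row 2 as text; as.character() is needed because the column is a factor
    x <- as.numeric(str_split(as.character(df[2, 1]), ", ", simplify = TRUE))
    n <- 5
    sum(-(x[x > 0] / n) * log2(x[x > 0] / n))
    # [1] 1.921928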
    

    If you want to apply this to every row, you can use the sapply function like this:

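    library(stringr)

    # Rows without a comma (e.g. "Mitered") are returned unchanged
    myfun <- function(x) {
      x <- as.character(x)  # the column is a factor
      if (!grepl(",", x)) return(x)
      n <- 5
      y <- as.numeric(str_split(x, ", ", simplify = TRUE))
      as.character(sum(-(y[y > 0] / n) * log2(y[y > 0] / n)))
    }

    df$newcol <- sapply(df[, 1], myfun)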
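For comparison, here is a base-R variant that avoids the stringr dependency and keeps the new column numeric. It is a sketch under stated assumptions, not the answer's method: `df_demo`, `row_score`, and the `score` column are hypothetical names, the demo data is a shortened copy of a few rows from the question, and label rows such as "Mitered" are mapped to `NA` rather than copied through as text.

```r
# Small stand-in for the question's data (hypothetical, shortened rows)
df_demo <- data.frame(
  data = c("Mitered", "2, 1, 1, 1, 0, 0", "2, 2, 1, 0, 0, 0", "M1yrtr"),
  stringsAsFactors = FALSE
)

# Score one row; n defaults to 5 as in the question
row_score <- function(s, n = 5) {
  s <- as.character(s)                                  # also accepts factor input
  if (!grepl(",", s, fixed = TRUE)) return(NA_real_)    # label rows have no commas
  y <- as.numeric(strsplit(s, ",", fixed = TRUE)[[1]])  # as.numeric ignores the spaces
  y <- y[y > 0]                                         # drop zeros to avoid 0 * log2(0)
  sum(-(y / n) * log2(y / n))
}

# vapply guarantees exactly one numeric value per row
df_demo$score <- vapply(df_demo$data, row_score, numeric(1), USE.NAMES = FALSE)
```

Keeping the column numeric makes later arithmetic or sorting straightforward, at the cost of losing the label text in that column. The character-returning version above preserves the labels instead.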
    

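    【Comments】:

    • I am more interested in doing this programmatically. If I used your approach, it would be the same as what I already did by hand. Is there a way to get output like the one I showed above?
    • I have added an approach for the whole data.frame.
    • Thank you very much for your time, but the results you compute differ from the ones I calculated above.
    • The values in dfout do not match the formula you provided. For example: (-(2/n)*log2(2/n)) + (-(1/n)*log2(1/n)) + (-(1/n)*log2(1/n)) + (-(1/n)*log2(1/n)) = 1.921928, but in dfout you have 1.168964.
    • I think I have taken enough of your time; I will try to sort this out myself. Thanks a bunch. I upvoted and accepted your answer.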