【Question title】: How to apply a custom function to each participant in a data frame
【Posted】: 2014-12-10 16:45:49
【Question description】:

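I have written a function that computes dPrime. The function takes a data frame as its argument. This works fine, but the columns must be named "stimDiff" and "stimSame", because the function uses these specific names to compute dPrime. I would like to apply this function to a data frame containing multiple subjects and compute dPrime for each subject, with the result being a new data frame that holds each subject's dPrime score. The test data frame looks like this: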

stimDiff0 <- c(rep("diff", 20), rep("same", 5))
stimSame0 <- c(rep("diff", 10), rep("same", 15))

stimDiff1 <- c(rep("diff", 10), rep("same", 15))
stimSame1 <- c(rep("diff", 10), rep("same", 15))

stimDiff2 <- c(rep("diff", 19), rep("same", 6))
stimSame2 <- c(rep("diff", 11), rep("same", 14))

stimDiff3 <- c(rep("diff", 21), rep("same", 4))
stimSame3 <- c(rep("diff",  9), rep("same", 16))

stimDiff4 <- c(rep("diff", 18), rep("same", 7))
stimSame4 <- c(rep("diff", 12), rep("same", 13))

stimDiff5 <- c(rep("diff", 22), rep("same", 3))
stimSame5 <- c(rep("diff", 14), rep("same", 11))

stimDiff <- c(stimDiff0, stimDiff1, stimDiff2,
              stimDiff3, stimDiff4, stimDiff5)
stimSame <- c(stimSame0, stimSame1, stimSame2,
              stimSame3, stimSame4, stimSame5)
subject <- rep(0:5, each = 25)

x <- data.frame(subject = subject, stimDiff = stimDiff, stimSame = stimSame)

I am trying to get dPrime for each subject in the data frame with the following code:

tapply(c(x$stimDiff, x$stimSame), x$subject, data = x, FUN = dPrime)

I get the following error:

Error en tapply(list(x$stimDiff, x$stimSame), x$subject, data = x, FUN = dPrime) : 
arguments must have same length
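
For context, the message comes from a length mismatch: concatenating the two response columns doubles the number of values, while the grouping vector keeps one entry per row. A minimal sketch, assuming a stand-in data frame with the same shape as the one above (the response values here are illustrative, not the real data):

```r
# Stand-in data frame: 6 subjects x 25 trials (illustrative values only)
x <- data.frame(
  subject  = rep(0:5, each = 25),
  stimDiff = rep(c("diff", "same"), length.out = 150),
  stimSame = rep(c("same", "diff"), length.out = 150)
)

# Concatenating both columns yields twice as many values as rows
values <- c(x$stimDiff, x$stimSame)
n.values <- length(values)      # 300
n.index  <- length(x$subject)   # 150

# tapply() needs one grouping entry per value, so this call fails
res <- tryCatch(tapply(values, x$subject, length), error = function(e) e)

# Splitting the data frame keeps the rows of each subject together instead
rows.per.subject <- sapply(split(x, x$subject), nrow)
```

Separately, any extra argument given to tapply(), such as `data = x`, is forwarded to FUN, and the one-argument dPrime() shown below would not accept it.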

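I know there are packages that can compute dPrime. I am doing this to learn how to write functions, and I would prefer a solution that uses base R.

Here is the code for the dPrime function: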

dPrime <- function(x) {

# Calculate number of same, diff and total responses
# for the stimuli that were actually different
stimDiffRdiff <- nrow(x[x$stimDiff == 'diff', ])
stimDiffRsame <- nrow(x[x$stimDiff == 'same', ])
stimDiffTotal <- length(x$stimDiff)

# Calculate number of same, diff and total responses
# for the stimuli that were actually the same
stimSameRdiff <- nrow(x[x$stimSame == 'diff', ])
stimSameRsame <- nrow(x[x$stimSame == 'same', ])
stimSameTotal <- length(x$stimSame)

# Hit rate = the number of correct responses 'diff'
# when the stimuli were actually diff, divided by 
# the total number of responses
hitRate <- stimDiffRdiff / stimDiffTotal

# Miss rate = the number of incorrect responses
# 'same' when the stimuli were actually diff
# divided by the total number of responses
missRate <- stimDiffRsame / stimDiffTotal

# False alarm = the number responses 'diff'
# when the stimuli were actually the same
# divided by the total number of responses
falseAlarm <- stimSameRdiff / stimSameTotal

# Correct rejection = the number of responses
# same when the stimuli were actually the same
# divided by the number of total responses
corrReject <- stimSameRsame / stimSameTotal

# Calculate z-critical values for hit rate
# and false alarm rate
zHitRate <- qnorm(hitRate)
zFalseAlarm <- qnorm(falseAlarm)

# Calculate d prime
dPrime <- zHitRate - zFalseAlarm

print(dPrime)
}

【Comments】:

  • Please provide the code for the function.
  • @SvenHohenstein I have edited the question accordingly.

Tags: r function tapply


【Solution 1】:

Building on @jvcasill's original function and the responses from other users:

dPrime <- function (data, subj = 1, stimDiff = 2, stimSame = 3) {
    # dPrime() returns a vector of the length of the number of subjects
    #+ in data[, subj] that contains the sensitivity index "d'" for each.
    # `data`:     data frame
    # `subj`:     index of "subject" column in `data`; default is 1
    # `stimDiff`: index of "stimDiff" column in `data`; default is 2
    # `stimSame`: index of "stimSame" column in `data`; default is 3
    if (is.data.frame(data)) {
        # Divide `data` by subject with split(), as have done others who've
        #+ responded to this question
        data.by.subj   <- split(data, data[, subj])
        # Calculate number of subjects and create vector of same length
        #+ to return
        n.subj         <- length(data.by.subj)
        dPrime.by.subj <- vector(mode = "double", length = n.subj)
        # Loop through "data.by.subj" subject by subject and calculate d'
        for (subj in seq_len(n.subj)) {
            # For clarity, create temporary data set with data of
            #+ current "subj"
            data.tmp      <- data.by.subj[[subj]]
            stimDiffRdiff <- nrow(data.tmp[data.tmp[, stimDiff] == "diff", ])
            stimDiffRsame <- nrow(data.tmp[data.tmp[, stimDiff] == "same", ])
            stimDiffTotal <- length(data.tmp[, stimDiff])
            stimSameRdiff <- nrow(data.tmp[data.tmp[, stimSame] == "diff", ])
            stimSameRsame <- nrow(data.tmp[data.tmp[, stimSame] == "same", ])
            stimSameTotal <- length(data.tmp[, stimSame])
            hitRate       <- stimDiffRdiff / stimDiffTotal
            missRate      <- stimDiffRsame / stimDiffTotal
            falseAlarm    <- stimSameRdiff / stimSameTotal
            # The following appears unused in the original function
            # corrReject  <- stimSameRsame / stimSameTotal
            zHitRate      <- qnorm(hitRate)
            zFalseAlarm   <- qnorm(falseAlarm)
            dPrime        <- zHitRate - zFalseAlarm
            dPrime.by.subj[subj] <- dPrime
        }
        # For clarity, give each d' value in vector to be returned,
        #+ "dPrime.by.subj", name of corresponding subject
        names(dPrime.by.subj) <- names(data.by.subj)
        return(dPrime.by.subj)
    } else stop("'data' is not a data frame")
}

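Note that I am not sure whether the values it returns (for the sample data set provided by @jvcasill) are the same as those obtained with @Splendour's method.

【Comments】:

    【Solution 2】:

    Try data.table (using the length function in place of dPrime):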

    library(data.table)
    xt = data.table(x)
    xt[,list(len=length(c(stimSame,stimDiff))),by=subject]
       subject len
    1:       0  50
    2:       1  50
    3:       2  50
    4:       3  50
    5:       4  50
    6:       5  50
    

    Using base R:

    sapply(split(x, x$subject), dPrime)
    [1] 1.094968
    [1] 0
    [1] 0.8572718
    [1] 1.352917
    [1] 0.6329951
    [1] 1.024018
            0         1         2         3         4         5 
    1.0949683 0.0000000 0.8572718 1.3529167 0.6329951 1.0240176 
    

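    The duplicated output is caused by the "print(dPrime)" statement in the dPrime function. You should replace it with return(dPrime). Better still, since dPrime is also the name of the function, you should change the 'dPrime' assignment, for example: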

    ret = dPrime <- zHitRate - zFalseAlarm
    return(ret)
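
Putting that advice together, here is a minimal sketch of a variant that returns its value instead of printing it. The names `dPrimeValue` and `mk` are hypothetical and used only for illustration; the data frame is rebuilt from the counts given in the question.

```r
# Number of "diff" responses out of 25 trials per subject (from the question)
n.diff.in.diff <- c(20, 10, 19, 21, 18, 22)  # stimuli that were different
n.diff.in.same <- c(10, 10, 11,  9, 12, 14)  # stimuli that were the same

# Hypothetical helper: k "diff" responses followed by 25 - k "same" responses
mk <- function(k) c(rep("diff", k), rep("same", 25 - k))

x <- data.frame(
  subject  = rep(0:5, each = 25),
  stimDiff = unlist(lapply(n.diff.in.diff, mk)),
  stimSame = unlist(lapply(n.diff.in.same, mk))
)

# Hypothetical variant of dPrime(): the last expression is the return value
dPrimeValue <- function(d) {
  hitRate    <- mean(d$stimDiff == "diff")  # proportion of hits
  falseAlarm <- mean(d$stimSame == "diff")  # proportion of false alarms
  qnorm(hitRate) - qnorm(falseAlarm)
}

# One score per subject, as a named numeric vector, with nothing printed twice
scores <- sapply(split(x, x$subject), dPrimeValue)
```

Using `mean()` on a logical comparison gives the same proportions as counting rows and dividing by the total, so the scores should match the values printed above.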
    

    【Comments】:

      【Solution 3】:

      Here is a (somewhat inelegant) solution in base R:

      Split the data frame into a list, with one element per subject:

      by.subject <- split(x, x$subject)
      

      Compute dPrime for each chunk, returning a named numeric vector:

      dPrime.values <- unlist(lapply(by.subject, dPrime), use.names = TRUE)
      

      Construct a new data frame:

      df <- data.frame(dPrime=dPrime.values)
      df$subject <- as.numeric(rownames(df))
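
As a possible alternative to the last step, the data frame can be built in one call so that the subject IDs live only in a column rather than also in the row names. This is a sketch that starts from a named vector holding the values reported in Solution 2; the name `result` is hypothetical.

```r
# Named vector of d' scores, as produced by the lapply()/unlist() step
# (values copied from the output shown in Solution 2)
dPrime.values <- c("0" = 1.0949683, "1" = 0.0000000, "2" = 0.8572718,
                   "3" = 1.3529167, "4" = 0.6329951, "5" = 1.0240176)

# Take the subject IDs from the names and drop the names from the score
# column, which leaves the default row names 1..n
result <- data.frame(
  subject = as.numeric(names(dPrime.values)),
  dPrime  = unname(dPrime.values)
)
```

If the subject IDs were not numeric, `as.numeric()` would give NA, so in that case the names could be kept as character strings.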
      

      【Comments】:
