【Question title】: How to convert a data frame to a contingency table in R?
【Posted】: 2017-11-29 22:53:28
【Question】:

I have a simple question: how do I convert a data frame into contingency tables for Fisher's exact test?

I have a data frame, data, with about 19000 rows:

head(data)

          R_T1   R_T2    NR_T1  NR_T2
GMNN      14      60     70     157
GORASP2    7      67     39     188
TTC34      5      69     41     186
ZXDC       8      66     37     190
ASAH2      9      65     46     181

I want to convert each row into a 2x2 contingency table and run Fisher's exact test on it. For example, for GMNN:

       R   NR
T1    14   70
T2    60  157

fisher.test(GMNN, alternative="two.sided")

Fisher's Exact Test for Count Data

data:  GMNN
p-value = 0.05273
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.2531445 1.0280271
sample estimates:
odds ratio 
0.5243787 

Since I have 19000 rows, I would like the output to look like this:

          R_T1   R_T2    NR_T1  NR_T2    p-value    odds_ratio
GMNN      14      60     70     157      0.05273    0.5243787 
GORASP2    7      67     39     188       0.1367     0.504643
TTC34      5      69     41     186      0.02422    0.3297116
ZXDC       8      66     37     190       0.3474    0.6233377
ASAH2      9      65     46     181       0.1648    0.5458072

I don't know how to do this. Can anyone help? Thanks!

【Comments】:

    Tags: r dataframe contingency


    【Solution 1】:

    You can use matrix to turn each row into a contingency table, then collect the p-value and odds ratio from each test:

    ft.res <- apply(data, 1, function(x){
        t1 <- fisher.test(matrix(x, nrow = 2))
        data.frame(p_value = t1$p.value, odds_ratio = t1$estimate)
    })
    
    cbind(data, do.call(rbind, ft.res))
    #         R_T1 R_T2 NR_T1 NR_T2    p_value odds_ratio
    # GMNN      14   60    70   157 0.05273179  0.5243787
    # GORASP2    7   67    39   188 0.13671487  0.5046430
    # TTC34      5   69    41   186 0.02421765  0.3297116
    # ZXDC       8   66    37   190 0.34744964  0.6233377
    # ASAH2      9   65    46   181 0.16478480  0.5458072
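
`matrix(x, nrow = 2)` reproduces the table from the question because R fills matrices column by column. Here is a minimal sketch for the GMNN row only. The dimnames are added purely for readability (the labels come from the question's table), and the names `x`, `tab` and `ft` are illustrative, not part of the answer above.

```r
# Counts for the GMNN row, in the column order used in the question
x <- c(R_T1 = 14, R_T2 = 60, NR_T1 = 70, NR_T2 = 157)

# matrix() fills column-wise: column 1 gets (14, 60), column 2 gets (70, 157)
tab <- matrix(x, nrow = 2,
              dimnames = list(c("T1", "T2"), c("R", "NR")))
print(tab)

# Two-sided Fisher's exact test on the 2x2 table
ft <- fisher.test(tab, alternative = "two.sided")
ft$p.value   # p-value of the test
ft$estimate  # conditional MLE of the odds ratio
```

If the columns of your data frame are in a different order, reorder them before calling `matrix`, otherwise the cells end up in the wrong positions.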
    

    【Comments】:

      【Solution 2】:

      You can use apply to loop over the rows of the data frame:

      ## Replicating the data
      d  = data.frame(R_T1=c(14,7,5,8,9),R_T2=c(60,67,69,66,65),NR_T1=c(70,39,41,37,46),NR_T2=c(157,188,186,190,181))
      row.names(d) = c("GMNN","GORASP2","TTC34","ZXDC","ASAH2")
      ## Computing the fisher test and getting the values for each row 
      d[,c("p_value","odds_ratio")] = t(apply(d,1,function(x) {f=fisher.test(matrix(x,2,2));c(f$p.value,f$estimate)}))
      
              R_T1 R_T2 NR_T1 NR_T2    p_value odds_ratio
      GMNN      14   60    70   157 0.05273179  0.5243787
      GORASP2    7   67    39   188 0.13671487  0.5046430
      TTC34      5   69    41   186 0.02421765  0.3297116
      ZXDC       8   66    37   190 0.34744964  0.6233377
      ASAH2      9   65    46   181 0.16478480  0.5458072
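
A possible variant of the same idea uses `vapply`, which checks that every row returns exactly two numbers and lets the new columns be assigned by name. This is only a sketch: the names `counts` and `res` are illustrative, and it assumes the four count columns are named as in the question.

```r
# Same example data as in the answer above
d <- data.frame(R_T1  = c(14, 7, 5, 8, 9),
                R_T2  = c(60, 67, 69, 66, 65),
                NR_T1 = c(70, 39, 41, 37, 46),
                NR_T2 = c(157, 188, 186, 190, 181),
                row.names = c("GMNN", "GORASP2", "TTC34", "ZXDC", "ASAH2"))

# Select the count columns explicitly so extra columns cannot leak into the test
counts <- as.matrix(d[, c("R_T1", "R_T2", "NR_T1", "NR_T2")])

# One Fisher test per row; vapply enforces a length-2 numeric result each time
res <- vapply(seq_len(nrow(counts)), function(i) {
  f <- fisher.test(matrix(counts[i, ], nrow = 2))
  c(p_value = f$p.value, odds_ratio = unname(f$estimate))
}, FUN.VALUE = c(p_value = 0, odds_ratio = 0))

# res is a 2 x nrow(d) matrix; pull each statistic out by row name
d$p_value    <- res["p_value", ]
d$odds_ratio <- res["odds_ratio", ]
print(d)
```

Selecting the count columns by name also makes the code safe to re-run after `p_value` and `odds_ratio` have been added to `d`.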
      

      【Comments】:

        【Solution 3】:

        Here is an approach using dplyr's mutate together with rowwise:

        df <- read.table(text="rowname R_T1   R_T2    NR_T1  NR_T2
        GMNN      14      60     70     157
        GORASP2    7      67     39     188
        TTC34      5      69     41     186
        ZXDC       8      66     37     190
        ASAH2      9      65     46     181",
        header=TRUE,stringsAsFactors = FALSE)
        
        library(dplyr)  # provides %>%, rowwise() and mutate()

        df %>%
          rowwise() %>%
          mutate(p.value = fisher.test(matrix(c(R_T1, R_T2, NR_T1, NR_T2), nrow = 2))$p.value,
                 odds_ratio = fisher.test(matrix(c(R_T1, R_T2, NR_T1, NR_T2), nrow = 2))$estimate)
        
          rowname  R_T1  R_T2 NR_T1 NR_T2    p.value odds_ratio
            <chr> <int> <int> <int> <int>      <dbl>      <dbl>
        1    GMNN    14    60    70   157 0.05273179  0.5243787
        2 GORASP2     7    67    39   188 0.13671487  0.5046430
        3   TTC34     5    69    41   186 0.02421765  0.3297116
        4    ZXDC     8    66    37   190 0.34744964  0.6233377
        5   ASAH2     9    65    46   181 0.16478480  0.5458072
        

        【Comments】:
