[Question title]: get the value from columns in two files
[Posted]: 2013-06-13 19:33:36
[Question]:

My original observations look like this:

name Analyte
spring 0.1
winter 0.4

To calculate the p-values, I ran a bootstrap simulation:

name Analyte
spring 0.001
winter 0
spring 0
winter 0.2
spring 0.03
winter 0
spring 0.01
winter 0.02
spring 0.1
winter 0.5
spring 0
winter 0.04
spring 0.2
winter 0
spring 0
winter 0.06
spring 0
winter 0
......

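Now I want to calculate the empirical p-value. In the original data the winter analyte is 0.4. If the winter analyte is >= 0.4 in the bootstrap data (say, once) and the bootstrap was run (say) 100 times, then the empirical p-value for the winter analyte is: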

1/100 = 0.01

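(the number of times the data is the same as or higher than the original, divided by the total number of observations). For the spring analyte the p-value is: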

2/100 = 0.02
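In symbols (notation introduced here only for illustration): with $B$ bootstrap replicates $x^*_1,\dots,x^*_B$ of one analyte and its original value $x_{\mathrm{obs}}$, the empirical p-value described above, and the two example numbers, are:

```latex
p_{\mathrm{emp}} = \frac{\#\{\, b \in \{1,\dots,B\} : x^{*}_{b} \ge x_{\mathrm{obs}} \,\}}{B},
\qquad
p_{\mathrm{winter}} = \frac{1}{100} = 0.01,
\qquad
p_{\mathrm{spring}} = \frac{2}{100} = 0.02
```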

I want to calculate these p-values with awk. My solution for spring is:

awk -v VAR="spring" '($1==VAR && $2>=0.1) {n++} END {print VAR,"p-value=",n/100}'

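This gives spring p-value= 0.02. What I need help with is passing the original file (the names spring and winter with their analyte values, and the number of observations) into awk and assigning the values from it, instead of hard-coding 0.1 and 100.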
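A minimal sketch of the one-liner above, generalized so that the observed value is passed as a second `-v` variable and the denominator is the number of matching bootstrap rows rather than a fixed 100. The function name `pvalue` and its argument order are hypothetical, and the input is assumed to start with one `name Analyte` header line.

```shell
#!/bin/sh
# Hypothetical helper: pvalue NAME OBSERVED [FILE]
# Prints the fraction of bootstrap rows for NAME whose value is >= OBSERVED.
# Reads FILE, or standard input when FILE is omitted (awk treats "-" as stdin).
pvalue() {
    awk -v VAR="$1" -v OBS="$2" '
        FNR == 1 { next }                # skip the "name Analyte" header
        $1 == VAR {
            total++                      # bootstrap replicates for VAR
            if ($2 >= OBS + 0) n++       # same as or higher than observed
        }
        # print nothing if VAR never occurs (avoids division by zero)
        END { if (total > 0) print VAR, "p-value=", n / total }
    ' "${3:--}"
}

# Usage with the bootstrap values shown in the question:
pvalue spring 0.1 <<'EOF'
name Analyte
spring 0.001
winter 0
spring 0
winter 0.2
spring 0.03
winter 0
spring 0.01
winter 0.02
spring 0.1
winter 0.5
spring 0
winter 0.04
spring 0.2
winter 0
spring 0
winter 0.06
spring 0
winter 0
EOF
# prints: spring p-value= 0.222222
```

Here 2 of the 9 spring rows (0.1 and 0.2) are >= 0.1, hence 2/9.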

[Question discussion]:

    Tags: regex perl sed awk grep


    [Solution 1]:

    Explanation and script:

    Run it like this: awk -f script.awk original bootstrap

    # Slurp the original file into an array a
    # Ignore the header
    
    NR==FNR && NR>1 {
    
    # Index of this array will be type
    # Value of that type will be original value
    
        a[$1]=$2
        next
    }
    
    # If in the bootstrap file value
    # of second column is greater than original value
    
    FNR>1 && $2>a[$1] { 
    
    # Increment an array indexed at first column
    # which is nothing but type
    
        b[$1]++
    }
    
    # Increment another array unconditionally, to count
    # the number of times bootstrapping was done per type
    {
        c[$1]++
    }
    
    # for each type in array a
    
    END {
        for (type in a) {
    
    # print the type and calculate the empirical p-value,
    # which is the number of times a higher value of that
    # type was seen divided by the total number of times
    # bootstrapping was done.
    
            print type, b[type]/c[type]
        }
    }
    

    Test:

    $ cat original 
    name Analyte
    spring 0.1
    winter 0.4
    
    $ cat bootstrap 
    name Analyte
    spring 0.001
    winter 0
    spring 0
    winter 0.2
    spring 0.03
    winter 0
    spring 0.01
    winter 0.02
    spring 0.1
    winter 0.5
    spring 0
    winter 0.04
    spring 0.2
    winter 0
    spring 0
    winter 0.06
    spring 0
    winter 0
    
    $ awk -f s.awk original bootstrap 
    spring 0.111111
    winter 0.111111
    

    Analysis:

    Spring Original Value is 0.1
    Winter Original Value is 0.4
    Bootstrapping was done 9 times in this sample file
    Count of values higher than Spring's original value = 1
    Count of values higher than Winter's original value = 1
    So, for both, 1/9 = 0.111111
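The question defines the p-value as the share of bootstrap values that are the same as or higher than the original (`>=`), while this answer counts only strictly higher values (`>`), which is why spring comes out as 1/9 here rather than 2/9. Below is a sketch of the same two-file approach using `>=`, assuming both files have a one-line header; the wrapper function `pvalues` and the temporary-file setup are hypothetical, added only to make the example self-contained.

```shell
#!/bin/sh
# Hypothetical helper: pvalues ORIGINAL BOOTSTRAP
# For each name in ORIGINAL, prints the fraction of BOOTSTRAP rows with that
# name whose value is >= the original value ("same or higher").
pvalues() {
    awk '
        FNR == 1 { next }                       # skip each file header
        NR == FNR { orig[$1] = $2 + 0; next }   # first file: observed values
        $1 in orig {
            total[$1]++                         # replicates per name
            if ($2 >= orig[$1]) hits[$1]++      # same as or higher
        }
        END {
            for (name in orig)                  # order is unspecified
                if (total[name] > 0)
                    print name, "p-value=", hits[name] / total[name]
        }
    ' "$1" "$2"
}

# Usage with the sample files from the answer above, in a temporary directory
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 'name Analyte' 'spring 0.1' 'winter 0.4' > "$tmp/original"
printf '%s\n' 'name Analyte' \
    'spring 0.001' 'winter 0' 'spring 0' 'winter 0.2' 'spring 0.03' \
    'winter 0' 'spring 0.01' 'winter 0.02' 'spring 0.1' 'winter 0.5' \
    'spring 0' 'winter 0.04' 'spring 0.2' 'winter 0' 'spring 0' \
    'winter 0.06' 'spring 0' 'winter 0' > "$tmp/bootstrap"
pvalues "$tmp/original" "$tmp/bootstrap" | sort
# spring p-value= 0.222222
# winter p-value= 0.111111
```

Restricting the counts to `$1 in orig` also keeps header or unknown names out of the results, and `sort` is used only because `for (name in orig)` visits keys in no guaranteed order.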
    

    [Discussion]:

      [Solution 2]:

      This works for me (GNU awk 3.1.6):

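      # Skip the "name Analyte" header line of each file
      FNR == 1 { next }

      # First file (original): store the observed value per type
      FNR == NR {
           a[$1] = $2
           next
      }

      # Second file (bootstrap): count values higher than the original
      $2 > a[$1] {
          b[$1]++
          }

      # Count every bootstrap row per type
      {
          c[$1]++
      }

      # Print each type's empirical p-value
      END {
          for (i in a) print i, "p-value=", b[i]/c[i]
          }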
      

      ...and the output is:

      winter p-value= 0.111111
      spring p-value= 0.111111
      

      [Discussion]:
