【Question title】: Calculate value with two columns based on four conditions in R
【Posted】: 2019-08-03 19:10:05
【Question】:

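I have loaded a large dataset into R (short version below). I want to calculate a value for each combination of Cruiseid, Samplenr, Species and Age (so based on four conditions):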

Cruiseid    Samplenr Species Age Length LK  TNumStat    TNumLK
197502      37        154   0   12,5    2   2,791666667 5,583333
197502      37        154   0   17,5    3   2,166666667 6,5
197502      37        154   2   172,5   34  11,54166667 392,4167
197502      37        154   2   177,5   35  12,0625 422,1875
197502      37        154   2   182,5   36  2,083333333 75
197502      35        154   0   112,5   22  11,85654008 260,8439
197502      35        154   2   197,5   39  2,109704641 82,27848
197502      35        154   2   217,5   43  2,109704641 90,7173
197502      35        154   2   232,5   46  2,109704641 97,04641
197502      36        154   0   12,5    2   4,685314685 9,370629
197502      36        154   2   182,5   36  3,496503497 125,8741
197502      41        154   0   17,5    3   2,260869565 6,782609
197502      41        154   2   202,5   40  4,347826087 173,913
197502      41        154   2   212,5   42  2,173913043 91,30435
197502      41        154   2   242,5   48  2,173913043 104,3478
197503      56        154   0   17,5    3   7,428571429 22,28571
197503      56        154   0   147,5   29  10,30952381 298,9762
197503      56        154   2   172,5   34  13,19047619 448,4762
197503      56        154   2   187,5   37  2,380952381 88,09524
197503      54        154   0   12,5    2   3,35        6,7
197503      54        154   0   157,5   31  12          372
197503      54        154   0   167,5   33  13,25       437,25
197503      54        154   2   172,5   34  13,85       470,9
197503      54        154   2   187,5   37  2,5         92,5
197503      54        154   2   217,5   43  2,5         107,5
197503      53        154   0   12,5    2   2,875536481 5,751073
197503      53        154   0   97,5    19  4,806866953 91,33047
197503      53        154   0   107,5   21  5,622317597 118,0687
197503      53        154   0   142,5   28  8,776824034 245,7511

I want to calculate ((TNumStat$TNumLK/TNumStat$TNumStat)*0.5+0.25)*10 for each combination of Cruiseid, Samplenr, Species and Age.

I have already tried something with a loop construct:

#######################
Cruise <- unique(TNumStat$Cruiseid)
Track <- unique(TNumStat$Samplenr)
#######################
AvrLengthCr <- c()
AvrLengthCr <- rep(NA, length(TNumStat$Species))
#######################
for(j in 1:length(Cruise)){
  t1.ss <- which(TNumStat$Cruiseid ==  Cruise[j])
  ###
  for(i in 1:length(Track)){
    t2.ss <- which(TNumStat$Samplenr[t1.ss] ==  Track[i])
    ###
    AvrLengthCr[t1.ss][t2.ss] <- ((TNumStat$TNumLK[t1.ss][t2.ss]/TNumStat$TNumStat[t1.ss][t2.ss])*0.5+0.25)*10
  }}

But it does not seem to work. I have also been experimenting with dcast:

TNumStat2<-dcast(TNumStat,Cruiseid+Samplenr+Species+Age,formula = (((TNumStat$TNumLK/TNumStat$TNumStat*0.5+0.25)*10) )),na.rm=TRUE)

None of the options I have tried seems to work, and I don't know how to solve this. Can anybody help me?

Thanks

【Comments】:

  • Do you need library(dplyr); df %>% group_by(Cruiseid, Samplenr, Species, Age) %>% mutate(ratio = ((TNumLK/TNumStat)*0.5+0.25)*10)?
  • You may first have to clean your data with TNumStat[c("TNumStat", "TNumLK")] <- lapply(TNumStat[c("TNumStat", "TNumLK")], function(x) as.numeric(gsub(",", ".", x))) to get real decimal points and numeric values.
  • @jay.sf: Why would it not work as it is? Are the numbers incorrect at the moment?

Tags: r loops aggregate conditional-statements dcast


【Solution 1】:

Good morning,

I don't think the question is entirely clear, but you could try something like this (using dplyr):

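library(dplyr)

# Row-wise value first, then group by the four key columns
sample <- sample %>%
  mutate(calculate = ((TNumLK/TNumStat) * 0.5 + 0.25) * 10) %>%
  group_by(Cruiseid, Samplenr, Species, Age)

# One row per group: mean of the row-wise values
summarisedDF <- sample %>%
  summarise(avg.calculate = mean(calculate))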
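
For readers without dplyr, roughly the same two steps can be sketched in base R. The data frame name `toy` is an illustrative assumption; its values are the first five rows of the question's table, with the decimal commas already converted to points.

```r
# Toy data: first five rows of the question's table, decimals already as "."
toy <- data.frame(
  Cruiseid = 197502L,
  Samplenr = 37L,
  Species  = 154L,
  Age      = c(0L, 0L, 2L, 2L, 2L),
  TNumStat = c(2.791666667, 2.166666667, 11.54166667, 12.0625, 2.083333333),
  TNumLK   = c(5.583333, 6.5, 392.4167, 422.1875, 75)
)

# Step 1: row-wise value, same formula as in the answer
toy$calculate <- ((toy$TNumLK / toy$TNumStat) * 0.5 + 0.25) * 10

# Step 2: mean of that value per Cruiseid/Samplenr/Species/Age combination
summarised <- aggregate(calculate ~ Cruiseid + Samplenr + Species + Age,
                        data = toy, FUN = mean)
print(summarised)
```

On these rows the row-wise values come out very close to the Length column of the question (12.5, 17.5, 172.5, ...), so the grouped result is an unweighted mean length per group.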

【Comments】:

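    【Solution 2】:

    What strikes me is that your columns "Length", "TNumStat", "TNumLK" have , instead of . as the decimal mark and are therefore in a character format that cannot easily be coerced to numeric.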

    TNumStat[c("TNumStat", "TNumLK")] <- 
      lapply(TNumStat[c("TNumStat", "TNumLK")], 
             function(x) as.numeric(gsub(",", ".", x)))
    

    This may depend on your system locale, so skip this step if the columns are already numeric for you.
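
As a small illustration of why this cleaning step matters: in the `dput` output further down, the affected columns are factors, and calling `as.numeric` directly on a factor returns the internal level codes rather than the printed numbers. A minimal sketch, where `x` is a made-up example vector and not part of the original data:

```r
# Made-up factor with decimal commas, mimicking the TNumStat column
x <- factor(c("2,5", "13,25", "2,5"))

codes <- as.numeric(x)                   # level codes, NOT the printed values
clean <- as.numeric(gsub(",", ".", x))   # gsub() coerces to character first

print(codes)
print(clean)
```

If the data come from a text file, reading it with `read.table(..., dec = ",")` or `read.csv2()` is another way to get numeric columns at import time.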

    Then you can use by to apply your formula.

    l <- by(TNumStat, TNumStat[c("Cruiseid", "Samplenr", "Species")],
            function(x) cbind(unique(x[1:3]),
                              value=with(x, ((mean(TNumLK)/mean(TNumStat))*0.5+0.25)*10)))
    

    This gives you a list, which you can combine into the result with rbind.

    TNumStat.new <- do.call(rbind, l)
    
    TNumStat.new
    #    Cruiseid Samplenr Species     value
    # 6    197502       35     154 148.46288
    # 10   197502       36     154  85.14956
    # 1    197502       37     154 149.61421
    # 12   197502       41     154 174.24600
    # 26   197503       53     154 106.86347
    # 20   197503       54     154 159.17545
    # 16   197503       56     154 131.26698
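
The `by` call above groups on Cruiseid, Samplenr and Species only. Since the question asks for one value per Cruiseid/Samplenr/Species/Age combination, here is a hedged sketch of one way to add Age as a fourth grouping column, using `split` and `lapply` instead of `by`. The data frame `toy` (seven rows from samples 37 and 36 of the question, decimals already converted) and the helper `avg_len` are illustrative names, not part of the original answer.

```r
toy <- data.frame(
  Cruiseid = 197502L,
  Samplenr = c(37L, 37L, 37L, 37L, 37L, 36L, 36L),
  Species  = 154L,
  Age      = c(0L, 0L, 2L, 2L, 2L, 0L, 2L),
  TNumStat = c(2.791666667, 2.166666667, 11.54166667, 12.0625,
               2.083333333, 4.685314685, 3.496503497),
  TNumLK   = c(5.583333, 6.5, 392.4167, 422.1875, 75, 9.370629, 125.8741)
)

# Hypothetical helper: the answer's formula applied to one group
avg_len <- function(x) ((mean(x$TNumLK) / mean(x$TNumStat)) * 0.5 + 0.25) * 10

keys   <- c("Cruiseid", "Samplenr", "Species", "Age")
groups <- split(toy, toy[keys], drop = TRUE)  # drop = TRUE skips empty combinations

# One row per group: the key columns plus the computed value
res <- do.call(rbind, lapply(groups, function(x)
  cbind(unique(x[keys]), value = avg_len(x))))
rownames(res) <- NULL
print(res)
```

Note that mean(TNumLK)/mean(TNumStat) is a TNumStat-weighted mean of the row-wise ratio TNumLK/TNumStat, so this approach and a plain mean of row-wise values can give different numbers for the same group.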
    

    Data

    TNumStat <- structure(list(Cruiseid = c(197502L, 197502L, 197502L, 197502L, 
    197502L, 197502L, 197502L, 197502L, 197502L, 197502L, 197502L, 
    197502L, 197502L, 197502L, 197502L, 197503L, 197503L, 197503L, 
    197503L, 197503L, 197503L, 197503L, 197503L, 197503L, 197503L, 
    197503L, 197503L, 197503L, 197503L), Samplenr = c(37L, 37L, 37L, 
    37L, 37L, 35L, 35L, 35L, 35L, 36L, 36L, 41L, 41L, 41L, 41L, 56L, 
    56L, 56L, 56L, 54L, 54L, 54L, 54L, 54L, 54L, 53L, 53L, 53L, 53L
    ), Species = c(154L, 154L, 154L, 154L, 154L, 154L, 154L, 154L, 
    154L, 154L, 154L, 154L, 154L, 154L, 154L, 154L, 154L, 154L, 154L, 
    154L, 154L, 154L, 154L, 154L, 154L, 154L, 154L, 154L, 154L), 
        Age = c(0L, 0L, 2L, 2L, 2L, 0L, 2L, 2L, 2L, 0L, 2L, 0L, 2L, 
        2L, 2L, 0L, 0L, 2L, 2L, 0L, 0L, 0L, 2L, 2L, 2L, 0L, 0L, 0L, 
        0L), Length = structure(c(3L, 8L, 9L, 10L, 11L, 2L, 13L, 
        16L, 17L, 3L, 11L, 8L, 14L, 15L, 18L, 8L, 5L, 9L, 12L, 3L, 
        6L, 7L, 9L, 12L, 16L, 3L, 19L, 1L, 4L), .Label = c("107,5", 
        "112,5", "12,5", "142,5", "147,5", "157,5", "167,5", "17,5", 
        "172,5", "177,5", "182,5", "187,5", "197,5", "202,5", "212,5", 
        "217,5", "232,5", "242,5", "97,5"), class = "factor"), LK = c(2L, 
        3L, 34L, 35L, 36L, 22L, 39L, 43L, 46L, 2L, 36L, 3L, 40L, 
        42L, 48L, 3L, 29L, 34L, 37L, 2L, 31L, 33L, 34L, 37L, 43L, 
        2L, 19L, 21L, 28L), TNumStat = structure(c(16L, 11L, 2L, 
        5L, 9L, 3L, 10L, 10L, 10L, 21L, 19L, 13L, 20L, 12L, 12L, 
        24L, 1L, 6L, 14L, 18L, 4L, 7L, 8L, 15L, 15L, 17L, 22L, 23L, 
        25L), .Label = c("10,30952381", "11,54166667", "11,85654008", 
        "12", "12,0625", "13,19047619", "13,25", "13,85", "2,083333333", 
        "2,109704641", "2,166666667", "2,173913043", "2,260869565", 
        "2,380952381", "2,5", "2,791666667", "2,875536481", "3,35", 
        "3,496503497", "4,347826087", "4,685314685", "4,806866953", 
        "5,622317597", "7,428571429", "8,776824034"), class = "factor"), 
        TNumLK = structure(c(16L, 18L, 11L, 12L, 21L, 8L, 22L, 25L, 
        29L, 24L, 4L, 20L, 5L, 26L, 1L, 6L, 9L, 14L, 23L, 19L, 10L, 
        13L, 15L, 28L, 2L, 17L, 27L, 3L, 7L), .Label = c("104,3478", 
        "107,5", "118,0687", "125,8741", "173,913", "22,28571", "245,7511", 
        "260,8439", "298,9762", "372", "392,4167", "422,1875", "437,25", 
        "448,4762", "470,9", "5,583333", "5,751073", "6,5", "6,7", 
        "6,782609", "75", "82,27848", "88,09524", "9,370629", "90,7173", 
        "91,30435", "91,33047", "92,5", "97,04641"), class = "factor")), class = "data.frame", row.names = c(NA, 
    -29L))
    

    【Comments】:

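    • If you don't mind, please consider accepting the answer or upvoting ;)
    • I did, but I can't press the "up" button? :/ ... so I turned the thing green!
    • Ah, that's because you didn't have 15 reputation yet. Since you have it now, it should work. :)
    • Done! Thanks for your help!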