[Question title]: Doing a ranged lookup with multiple variables in a matrix in R
[Posted]: 2018-07-22 23:07:22
[Question description]:

I think I've run into a complex problem (or at least it is for me!).

I have a price table that I need to read in from a CSV file; it looks like this:

# Rows 1-2 of each column are header rows (board type, then column label); the data start in row 3
V1 <- c("","Destination","Spain","Spain","Spain","Portugal","Portugal","Portugal","Italy","Italy","Italy")
V2 <- c("","Min_Duration",rep(c(1,3,6),3))   # lower bound of each duration band
V3 <- c("","Max_Duration",rep(c(2,5,10),3))  # upper bound of each duration band
V4 <- c("Full-board","Level_1",runif(9,100,200))
V5 <- c("Full-board","Level_2",runif(9,201,500))
V6 <- c("Full-board","Level_3",runif(9,501,1000))
V7 <- c("Half-board","Level_1",runif(9,100,200))
V8 <- c("Half-board","Level_2",runif(9,201,500))
V9 <- c("Half-board","Level_3",runif(9,501,1000))
# Note: V9 (Half-board, Level_3) is created but not included in cbind() below
Lookup_matrix <- as.data.frame(cbind(V1,V2,V3,V4,V5,V6,V7,V8))

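The prices in the table above will of course look a bit odd because they are completely random, but we can ignore that.

I also have a table like this: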

Destination <- c("Spain", "Italy", "Portugal")
Duration <- c(2,4,8)
Level <- c(1,3,3)
Board <- c("Half-board","Half-board","Full-board")
Price <- "Empty"
Price_matrix <- as.data.frame(cbind(Destination,Duration,Level,Board,Price))

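My question is: how can I fill the Price column of Price_matrix with the corresponding prices from Lookup_matrix? Note that the Duration value in Price_matrix has to fall within the range given by the Min_Duration and Max_Duration columns of Lookup_matrix.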

In Excel I would use an INDEX/MATCH formula, but I'm stumped in R.
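For readers coming from Excel, a rough base-R counterpart of a multi-criteria INDEX/MATCH is to locate the row with which() (the MATCH part, including the range test on the duration) and then index the price column (the INDEX part). The sketch below assumes the prices have already been reshaped into a hypothetical long table named `lookup`, with one row per destination, duration band, board and level; the helper `lookup_price()` and all numbers are made up for illustration.

```r
# Hypothetical lookup table in "long" form: one row per band/board/level.
lookup <- data.frame(
  Destination  = c("Spain", "Spain", "Portugal"),
  Min_Duration = c(1, 3, 6),
  Max_Duration = c(2, 5, 10),
  Board        = c("Half-board", "Half-board", "Full-board"),
  Level        = c(1, 1, 3),
  Price        = c(150, 180, 700),
  stringsAsFactors = FALSE
)

# which() plays the role of MATCH with several criteria (including the range);
# indexing lookup$Price plays the role of INDEX. Returns NA if nothing matches.
lookup_price <- function(dest, dur, level, board) {
  hit <- which(lookup$Destination == dest &
               lookup$Board == board &
               lookup$Level == level &
               lookup$Min_Duration <= dur & dur <= lookup$Max_Duration)
  if (length(hit) == 0) NA_real_ else lookup$Price[hit[1]]
}

requests <- data.frame(
  Destination = c("Spain", "Italy", "Portugal"),
  Duration    = c(2, 4, 8),
  Level       = c(1, 3, 3),
  Board       = c("Half-board", "Half-board", "Full-board"),
  stringsAsFactors = FALSE
)

# Apply the lookup to every request row, like filling a formula down a column.
requests$Price <- mapply(lookup_price, requests$Destination, requests$Duration,
                         requests$Level, requests$Board, USE.NAMES = FALSE)
print(requests)
```

The Italy row gets NA because the toy table has no matching band; this row-by-row approach is easy to read but slower than a join on large tables.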

Thanks in advance, Dan

[Question discussion]:

    Tags: r lookup


    [Solution 1]:

    Here is a tidyverse option.

    First, note that I renamed your input objects; Price_matrix and Lookup_matrix are both data.frames (not matrices).
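A small illustration of why the columns end up as text: each vector in the question mixes header strings with numbers, so c() already stores the numbers as strings, and cbind() then builds a character matrix. as.data.frame() turns that into a data frame whose columns are all character (or factor, before R 4.0), which is why both answers convert types before comparing durations. The two-column example below is a cut-down version of the question's data.

```r
# c() mixes header text with numbers, so the numbers are stored as strings.
V1 <- c("", "Destination", "Spain", "Spain")
V2 <- c("", "Min_Duration", 1, 3)
m  <- cbind(V1, V2)         # a matrix holds a single type: character here
df <- as.data.frame(m, stringsAsFactors = FALSE)

print(is.matrix(m))         # TRUE
print(typeof(m))            # "character"
print(is.data.frame(df))    # TRUE
print(class(df$V2))         # "character"

# Numeric comparisons therefore need an explicit conversion of the data rows.
min_dur <- as.numeric(df$V2[-(1:2)])
print(min_dur)              # 1 3
```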

    df1 <- Price_matrix
    df2 <- Lookup_matrix
    

    Next, we need to fix the column names of df2 (= Lookup_matrix): its first two rows are really header rows.

    # Fix column names
    colnames(df2) <- gsub("^_", "", apply(df2[1:2, ], 2, paste0, collapse = "_"))
    df2 <- df2[-(1:2), ]
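To see what the renaming step above produces, here it is applied to a hypothetical three-column frame with the same two-header-row layout as Lookup_matrix (the values are made up).

```r
# Toy frame with the same layout as Lookup_matrix: two header rows on top.
df2 <- data.frame(
  V1 = c("", "Destination", "Spain"),
  V2 = c("", "Min_Duration", "1"),
  V4 = c("Full-board", "Level_1", "150.5"),
  stringsAsFactors = FALSE
)

# Paste the two header rows together, column by column ...
raw_names <- apply(df2[1:2, ], 2, paste0, collapse = "_")
print(raw_names)   # named vector: "_Destination" "_Min_Duration" "Full-board_Level_1"

# ... then drop the leading "_" left behind by the empty first header cell,
# and remove the two header rows from the data.
colnames(df2) <- gsub("^_", "", raw_names)
df2 <- df2[-(1:2), ]
print(df2)
```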
    

    我们现在基本上做df1df2的左连接;为了使df2 具有合适的格式,我们将数据从宽向长传播,为每个BoardLevel 提取Price 值,并将条目从Min_Duration 扩展到Max_Duration。然后我们通过DestinationDurationLevelBoard 加入。

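    Note that in your example Lookup_matrix has no Half-board Level_3 column (V9 is created but never passed to cbind()), so there is no matching entry for the Italy / Level 3 / Half-board request, and it gets Price = NA.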

    library(tidyverse)
    left_join(
        df1 %>%
            mutate_if(is.factor, as.character) %>%   # factors -> character (R < 4.0)
            select(-Price),
        df2 %>%
            mutate_if(is.factor, as.character) %>%
            # wide -> long: one row per Destination / duration band / board-level column
            gather(key, Price, -Destination, -Min_Duration, -Max_Duration) %>%
            # "Full-board_Level_1" -> Board = "Full-board", Level = "Level_1"
            separate(key, into = c("Board", "Level"), sep = "_", extra = "merge") %>%
            mutate(Level = sub("Level_", "", Level)) %>%
            # one row per whole-number duration from Min_Duration to Max_Duration
            mutate(Duration = map2(as.numeric(Min_Duration), as.numeric(Max_Duration), seq)) %>%
            unnest(cols = Duration) %>%
            select(-Min_Duration, -Max_Duration) %>%
            mutate(Duration = as.character(Duration)),
        by = c("Destination", "Duration", "Level", "Board"))
    # Example output (prices are random, so your values will differ):
    #  Destination Duration Level      Board            Price
    #1       Spain        2     1 Half-board 119.010942545719
    #2       Italy        4     3 Half-board             <NA>
    #3    Portugal        8     3 Full-board 764.536124917446
    


      [Solution 2]:

      Using data.table:

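      library(data.table)

      nms <- trimws(do.call(paste, transpose(Lookup_matrix[1:2, ])))  # column names from the two header rows

      # Write the data rows (without the two header rows) to a temporary file and read
      # them back, so that fread() detects numeric column classes. This is optional:
      #   Lookup_matrix1 <- setNames(Lookup_matrix[-(1:2), ], nms)
      # is enough to fix the names, but leaves every column as character/factor.
      tmp <- tempfile(fileext = ".txt")
      cat(do.call(paste, c(collapse = "\n", Lookup_matrix[-(1:2), ])), file = tmp)
      Lookup_matrix1 <- fread(tmp, header = FALSE, col.names = nms)

      # The range conditions need a numeric Duration; Level/Board must be character.
      requests <- as.data.table(Price_matrix)[, .(
        Destination = as.character(Destination),
        Duration    = as.numeric(as.character(Duration)),
        Level       = as.character(Level),
        Board       = as.character(Board))]

      # Wide -> long, split each measure name into Board and Level, then a non-equi
      # join: equality on Destination/Board/Level, range condition on Duration.
      melt(Lookup_matrix1, id.vars = 1:3)[,
              c("Board", "Level") := .(sub("[.]", "-", sub("[ .]Level.*", "", variable)),
                                       sub(".*_", "", variable))][
              requests, on = c("Destination", "Board", "Level",
                               "Min_Duration <= Duration", "Max_Duration >= Duration")]

      # Example output (random prices; the 'variable' labels may print differently):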
        Destination Min_Duration Max_Duration           variable    value      Board Level
      1:       Spain            2            2 Half.board.Level_1 105.2304 Half-board     1
      2:       Italy            4            4               <NA>       NA Half-board     3
      3:    Portugal            8            8 Full.board.Level_3 536.5132 Full-board     3
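The on= clause above mixes equality keys with range conditions, which data.table evaluates as a single non-equi join. As a rough illustration of what that join returns (not of how data.table computes it), the base-R sketch below first matches on the equality keys with merge() and then filters on the duration range, keeping unmatched request rows as NA just as X[i] does. Both tables are small hypothetical stand-ins with made-up prices.

```r
lookup <- data.frame(
  Destination  = c("Spain", "Spain", "Portugal"),
  Board        = c("Half-board", "Half-board", "Full-board"),
  Level        = c("1", "1", "3"),
  Min_Duration = c(1, 3, 6),
  Max_Duration = c(2, 5, 10),
  value        = c(110, 130, 540),
  stringsAsFactors = FALSE
)
requests <- data.frame(
  Destination = c("Spain", "Italy", "Portugal"),
  Duration    = c(2, 4, 8),
  Level       = c("1", "3", "3"),
  Board       = c("Half-board", "Half-board", "Full-board"),
  stringsAsFactors = FALSE
)
requests$id <- seq_len(nrow(requests))   # remember the original row order

# 1) Equality part of the join: Destination, Board, Level.
cand <- merge(requests, lookup, by = c("Destination", "Board", "Level"))
# 2) Range part: keep bands with Min_Duration <= Duration <= Max_Duration.
cand <- cand[cand$Min_Duration <= cand$Duration & cand$Duration <= cand$Max_Duration, ]
# 3) Keep every request row (NA when nothing matched); first match wins.
requests$value <- cand$value[match(requests$id, cand$id)]
print(requests)
```

Merging first and filtering afterwards builds every candidate band for each request, which is fine for a price list of this size; data.table's non-equi join avoids that intermediate table.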
      

