【Question title】: Transforming columns of unequal arrays to column of single values in R
【Posted】: 2021-02-04 16:40:47
【Question】:

As a next step following a previous question, suppose there are several columns of arrays with different lengths. For example:

Col_A Col_B Col_C
[0.1,0.5,0.7] [1.54E12, 1.54E12, 1.54E12] [1, 3, 4, 5}

How can I take this format and reshape it into the following, with NA filled in for Col_A and Col_B where appropriate:

Col_A Col_B Col_C
0.1 1.54E12 1
0.5 1.54E12 3
0.7 1.54E12 4
NA NA 5

This code works when all the arrays have the same length, but throws an error when they do not:

library(dplyr)
library(stringr)
library(tidyr)
df  %>% 
   mutate(across(everything(), str_extract_all, "(?<=\\[)[^]]+")) %>% 
   unnest(c(NDVIs, dates)) %>% 
   separate_rows(c(NDVIs, dates), sep=",\\s+", convert = TRUE)

【Comments】:

    Tags: r tidyr


    【Solution 1】:

    We can use cSplit from splitstackshape

    library(splitstackshape)
    library(data.table)
    cSplit(setDT(df)[, lapply(.SD, gsub, pattern = "[][}]", 
        replacement = "")], names(df), sep=",", fixed = FALSE, "long")
    #   Col_A    Col_B Col_C
    #1:   0.1 1.54e+12     1
    #2:   0.5 1.54e+12     3
    #3:   0.7 1.54e+12     4
    #4:    NA       NA     5
    

    Data

    df <- structure(list(Col_A = "[0.1,0.5,0.7]", Col_B = "[1.54E12, 1.54E12, 1.54E12]", 
        Col_C = "[1, 3, 4, 5}"), class = "data.frame", row.names = c(NA, 
    -1L))
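
For comparison, the same padding idea can be sketched in base R with no extra packages. This is only an illustration, not part of the original answer: it assumes a single-row `df` like the one above, and the names `parts`, `n`, and `out` are made up for the example.

```r
# Sample data, same values as in the Data block above
df <- data.frame(
  Col_A = "[0.1,0.5,0.7]",
  Col_B = "[1.54E12, 1.54E12, 1.54E12]",
  Col_C = "[1, 3, 4, 5}",
  stringsAsFactors = FALSE
)

# For each column: strip brackets/braces, split on commas, convert to numeric
# (assumption: df has exactly one row, so [[1]] picks the only element)
parts <- lapply(df, function(x) {
  cleaned <- gsub("[][{}]", "", x)
  as.numeric(strsplit(cleaned, ",\\s*")[[1]])
})

# Pad every vector with NA up to the length of the longest one
n <- max(lengths(parts))
out <- as.data.frame(lapply(parts, `length<-`, n))
out
```

Assigning a longer length to a vector with `length<-` fills the new positions with NA, which is what produces the NA entries for the shorter columns.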
    

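    【Comments】:

      【Solution 2】:

      I have little experience with the tidyverse, so here is my solution using data.table. I have included all the intermediate steps and results to show what is happening...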

      library( data.table )
      #create sample data
      DT <- fread("Col_A  Col_B   Col_C
      [0.1,0.5,0.7]   [1.54E12, 1.54E12, 1.54E12]     [1, 3, 4, 5]")
      #            Col_A                       Col_B        Col_C
      # 1: [0.1,0.5,0.7] [1.54E12, 1.54E12, 1.54E12] [1, 3, 4, 5]
      
      #melt to long format
      ans <- melt( DT, measure.vars = names(DT), variable.factor = FALSE )
      #    variable                       value
      # 1:    Col_A               [0.1,0.5,0.7]
      # 2:    Col_B [1.54E12, 1.54E12, 1.54E12]
      # 3:    Col_C                [1, 3, 4, 5]
      
      #remove [] and split the value column using ',' as separator
      ans[, value := gsub( "\\[|\\]", "", value ) ]
      ans[, paste0( "v", 1:length( tstrsplit(ans$value, "," ) ) ) := 
            lapply( tstrsplit(value, "," ), as.numeric ) ][]
      #    variable                     value       v1       v2       v3 v4
      # 1:    Col_A               0.1,0.5,0.7 1.00e-01 5.00e-01 7.00e-01 NA
      # 2:    Col_B 1.54E12, 1.54E12, 1.54E12 1.54e+12 1.54e+12 1.54e+12 NA
      # 3:    Col_C                1, 3, 4, 5 1.00e+00 3.00e+00 4.00e+00  5
      
      #transpose (without value-columns) to get wide format again
      transpose( ans[, -"value"], make.names = "variable" )
      #    Col_A    Col_B Col_C
      # 1:   0.1 1.54e+12     1
      # 2:   0.5 1.54e+12     3
      # 3:   0.7 1.54e+12     4
      # 4:    NA       NA     5
      

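      【Comments】:

      • This worked for me up to the transpose bit. I thought I was doing something wrong and was about to reformat it with the pivot_longer function, but the second answer worked well. Thank you for your time!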
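
The pivot_longer route mentioned in the comment could look roughly like the sketch below. It is not taken from either answer: it assumes the single-row `df` from Solution 1's data block, and the helper column names `column`, `value`, and `row` are invented for the illustration.

```r
library(dplyr)
library(tidyr)

# Sample data, same values as in Solution 1
df <- data.frame(
  Col_A = "[0.1,0.5,0.7]",
  Col_B = "[1.54E12, 1.54E12, 1.54E12]",
  Col_C = "[1, 3, 4, 5}",
  stringsAsFactors = FALSE
)

out <- df %>%
  # One row per original column, so each array is split independently
  pivot_longer(everything(), names_to = "column", values_to = "value") %>%
  # Drop the brackets/braces before splitting
  mutate(value = gsub("[][{}]", "", value)) %>%
  separate_rows(value, sep = ",\\s*", convert = TRUE) %>%
  # Number the elements within each original column
  group_by(column) %>%
  mutate(row = row_number()) %>%
  ungroup() %>%
  # Back to wide format; positions missing in shorter arrays become NA
  pivot_wider(names_from = column, values_from = value) %>%
  select(-row)
out
```

Splitting in long format sidesteps the unequal-length problem, because each array is split on its own and the shorter ones only receive NA when the data is widened again.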