【Question title】: Transforming data frame using column name
【Posted】: 2017-01-05 15:16:16
【Question】:

This follows on from another question, Extracting from Nested list to data frame

Using the updated answer there, I get the data frame I will start from.

I then use df <- data.frame(start = df3[5,])

so I am left with:

dput(df)
structure(list(start.X1_1 = structure(4L, .Names = "experience.start", .Label = c("", 
" ", "1", "2015"), class = "factor"), start.X2_2 = structure(3L, .Names = "experience.start", .Label = c(" ", 
"1", "2011"), class = "factor"), start.X3_2 = structure(3L, .Names = "experience.start", .Label = c(" ", 
"1", "2007"), class = "factor"), start.X4_2 = structure(NA_integer_, .Names = "experience.start", .Label = c(" ", 
"1"), class = "factor"), start.X5_2 = structure(NA_integer_, .Names = "experience.start", .Label = c(" ", 
"1"), class = "factor"), start.X6_2 = structure(NA_integer_, .Names = "experience.start", .Label = c(" ", 
"1"), class = "factor"), start.X7_2 = structure(NA_integer_, .Names = "experience.start", .Label = c(" ", 
"1"), class = "factor"), start.X8_2 = structure(NA_integer_, .Names = "experience.start", .Label = c(" ", 
"1"), class = "factor"), start.X9_2 = structure(NA_integer_, .Names = "experience.start", .Label = c(" ", 
"1"), class = "factor"), start.X10_3 = structure(3L, .Names = "experience.start", .Label = c(" ", 
"1", "2016", "3000"), class = "factor"), start.X11_3 = structure(3L, .Names = "experience.start", .Label = c(" ", 
"1", "2015", "3000"), class = "factor"), start.X12_3 = structure(4L, .Names = "experience.start", .Label = c("", 
" ", "1", "2015", "2016", "EE"), class = "factor"), start.X13_3 = structure(4L, .Names = "experience.start", .Label = c("", 
" ", "1", "2014", "2015"), class = "factor"), start.X14_3 = structure(3L, .Names = "experience.start", .Label = c(" ", 
"1", "2013", "2014"), class = "factor"), start.X15_3 = structure(3L, .Names = "experience.start", .Label = c(" ", 
"1", "2010", "2011", "Virtusa"), class = "factor")), .Names = c("start.X1_1", 
"start.X2_2", "start.X3_2", "start.X4_2", "start.X5_2", "start.X6_2", 
"start.X7_2", "start.X8_2", "start.X9_2", "start.X10_3", "start.X11_3", 
"start.X12_3", "start.X13_3", "start.X14_3", "start.X15_3"), row.names = "experience.start", class = "data.frame")

Now I would like to get it into this format:

  v1    v2  v3   v4   v5   v6   v7   v8
1 2015
2 2011 2007 null null null null null null
3 2016 2015 2015 2014 2013 2010

I can use the following to find the matching columns:

# return the last n characters of each string in x
sR <- function(x, n) {
    substr(x, nchar(x) - n + 1, nchar(x))
}

 sR(names(df),2)
 [1] "_1" "_2" "_2" "_2" "_2" "_2" "_2" "_2" "_2" "_3" "_3" "_3" "_3" "_3" "_3"

So I think there must be a way to get from here to my desired output.

Or I'm sure someone will show me a better way.
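Building on the suffix idea, one possible base-R route is sketched below, assuming the factor columns have already been converted to character. It groups the columns with `split.default()`, which splits a data frame's columns because a data frame is a list, and pads each group with `NA` to the widest group. It uses `sub('.*_', '', ...)` instead of `sR(..., 2)` because a fixed two-character suffix would misread two-digit suffixes such as `_10`. The small `df` here is a hypothetical stand-in for the `dput` output above, not the real data.

```r
# Hypothetical toy data shaped like the question's df, already converted to character
df <- data.frame(start.X1_1 = "2015", start.X2_2 = "2011", start.X3_2 = "2007",
                 start.X4_2 = NA, start.X10_3 = "2016", start.X11_3 = "2015",
                 stringsAsFactors = FALSE)

# Numeric suffix after the last underscore, e.g. "start.X10_3" -> 3
suffix <- as.numeric(sub(".*_", "", names(df)))

# split.default() splits the columns of a data frame (a list) by suffix;
# unlist() then flattens each group into a plain vector
groups <- lapply(split.default(df, suffix), unlist, use.names = FALSE)

# Pad every group with NA to the widest group (indexing past the end gives NA)
width <- max(lengths(groups))
out <- do.call(rbind, lapply(groups, function(v) v[seq_len(width)]))
colnames(out) <- paste0("v", seq_len(width))
out
```

The padding relies on `v[seq_len(width)]`: indexing a vector beyond its length returns `NA`, so no separate padding helper is needed.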

【Question discussion】:

    Tags: r dataframe transform


    【Solution 1】:

    The main idea is to split your data frame by the suffix after the underscore. This gives you a list of 3 elements, one per suffix (1, 2 and 3 in your case).

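    # stack() only accepts vector columns, so convert the factors to character first
    df[] <- lapply(df, as.character)
    s <- stack(df)    # long format: 'values' plus 'ind' (the original column name)
    # group by the numeric suffix after the underscore, keeping only the values column
    l1 <- lapply(split(s, as.numeric(sub('.*_', '', s$ind))), '[', 1)
    lapply(l1, head, 2)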
    
    #$`1`
    #  values
    #1   2015
    
    #$`2`
    #  values
    #2   2011
    #3   2007
    
    #$`3`
    #   values
    #10   2016
    #11   2015
    

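    Now all we need to do is cbind these three elements together, which is slightly tricky because they have different lengths. Fortunately there are good existing answers on this that we can use (see the disclaimer below).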

    t(do.call(cbindPad, l1))
    
    #       1      2      3      4      5      6      7  8 
    #values "2015" NA     NA     NA     NA     NA     NA NA
    #values "2011" "2007" NA     NA     NA     NA     NA NA
    #values "2016" "2015" "2015" "2014" "2013" "2010" NA NA
    

    Disclaimer

    The function cbindPad is taken from @Joran's answer in this post.
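The linked `cbindPad` is not reproduced here. As a rough stand-in, a padding function could extend each one-column data frame in `l1` with `NA` up to the longest length and then `cbind` the results. The name `cbindPadSketch` is hypothetical and this is not @Joran's code; the `l1` below is a small made-up list shaped like the one produced by `split()` above.

```r
# Hypothetical stand-in for cbindPad: NOT the code from @Joran's answer.
# Each argument is a one-column data frame; pad its column with NA to the
# longest length, then bind the padded vectors as matrix columns.
cbindPadSketch <- function(...) {
  cols <- lapply(list(...), function(d) as.character(d[[1]]))
  n <- max(lengths(cols))
  padded <- lapply(cols, function(v) c(v, rep(NA, n - length(v))))
  do.call(cbind, padded)
}

# Toy list shaped like l1 (one "values" column per suffix group)
l1 <- list(`1` = data.frame(values = "2015"),
           `2` = data.frame(values = c("2011", "2007", NA)),
           `3` = data.frame(values = c("2016", "2015")))

res <- t(do.call(cbindPadSketch, l1))
res
```

Because `do.call()` passes the list elements as named arguments, the rows of `res` are labelled by suffix (`1`, `2`, `3`) rather than `values`.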

    Alternatively, the function rbind.fill from the plyr package can be used after transposing to give a kind of cbind.fill result.

    plyr::rbind.fill(lapply(l1, function(i) as.data.frame(t(i))))
    
    #     1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
    #1 2015 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
    #2 <NA> 2011 2007 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
    #3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> 2016 2015 2015 2014 2013 2010
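The `rbind.fill` result is spread over 15 columns because `rbind.fill` matches columns by name, and `t(i)` takes its column names from the row names that `split()` kept (the original positions 1 to 15). A sketch of one fix, assuming it is acceptable to discard those row names: reset them before transposing, so `t()` produces positional names `V1`, `V2`, .... Groups prepared this way and passed to `plyr::rbind.fill()` should then line up by position; that final call is not run here, only the renaming step on one made-up group.

```r
# One toy group shaped like l1$`3`: split() keeps the original row names
g3 <- data.frame(values = c("2016", "2015"), row.names = c("10", "11"))
before <- names(as.data.frame(t(g3)))   # columns named after old row numbers

# Drop the row names so t() yields positional column names V1, V2, ...
rownames(g3) <- NULL
wide <- as.data.frame(t(g3))
after <- names(wide)

before
after
```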
    

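    【Discussion】:

    • I can't run the first line: Error in stack.data.frame(df) : no vector columns were selected
    • Oh, you have to convert to character first. I'll add that.
    • No problem. It just reminded me of that function. +1 anyway
    • Just type ?`[` in your session.
    • Note that I had to wrap as.numeric around the sub part; otherwise 100 comes after 1 and the whole numbering order is thrown off.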
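The ordering problem in the last comment comes from `split()` converting a character grouping vector into a factor whose levels sort as text. A small illustration with made-up suffixes:

```r
suffix <- c("1", "2", "10", "100")

# Character grouping: factor levels sort as text, so "10" and "100" precede "2"
lexical <- names(split(seq_along(suffix), suffix))

# Numeric grouping: levels sort by value, giving the intended order
numeric_order <- names(split(seq_along(suffix), as.numeric(suffix)))

lexical
numeric_order
```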