【Question title】: Special join of four columns into two new ones in R
【Posted】: 2021-07-10 20:14:07
【Question】:

I am working in R and have run into an interesting problem. I want to convert the following data frame:

DF = data.frame(ID = c(1, 2, 3),
              Person1 = c("Devin Davey", "Rui Butt", "Keon Dotson"),
              Sign = "artist",
              Person2 = c("Eli Greer", "Alvin Simons", "Leona Ford"),
              Sex = c("female", "male", "female"),
              Score = c(10, 20, 30)) 



  ID     Person1   Sign      Person2    Sex Score
1  1 Devin Davey artist    Eli Greer female    10
2  2    Rui Butt artist Alvin Simons   male    20
3  3 Keon Dotson artist   Leona Ford female    30

into the following format:

  ID         Name   Sign Score
1  1  Devin Davey artist    10
2  1    Eli Greer female    10
3  2     Rui Butt artist    20
4  2 Alvin Simons   male    20
5  3  Keon Dotson artist    30
6  3   Leona Ford female    30

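In other words, the four columns need to be combined pairwise into two new columns: Person1 with Sign, and Person2 with Sex.

I came up with the following: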

PART1 <- DF %>% 
            select(ID, Person1, Person2, Score) %>%
            gather(key, Name, -c(ID, Score), na.rm = TRUE) %>%
            select(-key) %>%
            arrange(ID) %>%
            mutate(temp_id = 1:n())

PART2 <- DF %>% 
            select(ID, Sign, Sex) %>%
            gather(key, Sign, -ID, na.rm = TRUE) %>%
            select(-key) %>%
            arrange(ID) %>%
            mutate(temp_id = 1:n())

PART1 %>%
        left_join(PART2, by = c("ID" = "ID", "temp_id" = "temp_id")) %>%
        select(-temp_id) %>%
        relocate(Score, .after = Sign)

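However, this solution does not look very elegant to me, and I think the problem can be solved in a better way.

So I would appreciate your ideas for solving this with the tidyverse.

【Comments】:

    Tags: r dplyr tidyverse tidyr data-manipulation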


    【Solution 1】:

    We can rename 'Sign' and 'Sex' to a common stem, 'Sign', with a numeric suffix that matches the suffix of the corresponding Person column, and then use pivot_longer

    library(dplyr)
    library(tidyr)
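    DF %>% 
       # give both attribute columns a common stem plus a suffix matching Person1/Person2
       rename(Sign1 = Sign, Sign2 = Sex) %>% 
       # split each column name into its stem (.value) and numeric suffix (grp)
       pivot_longer(cols = -c(ID, Score), names_to = c(".value", "grp"), 
            names_sep = "(?<=[a-z])(?=\\d)") %>%
       select(ID, Name = Person, Sign, Score)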
    

    -Output

    # A tibble: 6 x 4
    #     ID Name         Sign   Score
    #  <dbl> <chr>        <chr>  <dbl>
    #1     1 Devin Davey  artist    10
    #2     1 Eli Greer    female    10
    #3     2 Rui Butt     artist    20
    #4     2 Alvin Simons male      20
    #5     3 Keon Dotson  artist    30
    #6     3 Leona Ford   female    30
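
The split between the column stem and its numeric suffix can also be expressed with the names_pattern argument of pivot_longer instead of a lookaround in names_sep. The following is only a sketch of that variant, assuming the same DF as in the question; the result name `long` and the particular regular expression are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Data frame from the question
DF <- data.frame(ID = c(1, 2, 3),
                 Person1 = c("Devin Davey", "Rui Butt", "Keon Dotson"),
                 Sign = "artist",
                 Person2 = c("Eli Greer", "Alvin Simons", "Leona Ford"),
                 Sex = c("female", "male", "female"),
                 Score = c(10, 20, 30),
                 stringsAsFactors = FALSE)

long <- DF %>%
  # common stem plus suffix, so the columns pair up as Person1/Sign1 and Person2/Sign2
  rename(Sign1 = Sign, Sign2 = Sex) %>%
  # first capture group becomes the output column (.value), second the group id
  pivot_longer(cols = -c(ID, Score),
               names_to = c(".value", "grp"),
               names_pattern = "([A-Za-z]+)(\\d)") %>%
  select(ID, Name = Person, Sign, Score)

long
```

Each capture group maps to one entry of names_to, which some readers may find easier to follow than a zero-width split.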
    

    【Comments】:

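      【Solution 2】:

      In base R you can use the function reshape. Because it returns the rows in a different order, we reorder them to match the data shown above exactly, although this step is not strictly necessary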

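      # varying: column indices 2:5 as a matrix, one row per output variable
      # (row 1 = Person1/Person2, row 2 = Sign/Sex)
      DF1 <- reshape(DF, varying = matrix(2:5, 2), direction = "long")
      DF1[order(DF1$ID), c("ID", "Person1", "Sign", "Score")]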
      
          ID      Person1   Sign Score
      1.1  1  Devin Davey artist    10
      1.2  1    Eli Greer female    10
      2.1  2     Rui Butt artist    20
      2.2  2 Alvin Simons   male    20
      3.1  3  Keon Dotson artist    30
      3.2  3   Leona Ford female    30
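
The same reshape call can be written with column names instead of indices, which also lets the name column be called Name directly. This is a sketch under the assumption that DF is the data frame from the question; the names `long` and `grp` are introduced here for illustration only.

```r
# Data frame from the question
DF <- data.frame(ID = c(1, 2, 3),
                 Person1 = c("Devin Davey", "Rui Butt", "Keon Dotson"),
                 Sign = "artist",
                 Person2 = c("Eli Greer", "Alvin Simons", "Leona Ford"),
                 Sex = c("female", "male", "female"),
                 Score = c(10, 20, 30),
                 stringsAsFactors = FALSE)

long <- reshape(DF,
                # each vector lists the wide columns that collapse into one long column
                varying = list(c("Person1", "Person2"), c("Sign", "Sex")),
                v.names = c("Name", "Sign"),   # names of the resulting long columns
                timevar = "grp",               # 1 = first person, 2 = second person
                idvar = "ID",
                direction = "long")

# order by ID, then by person, and keep only the columns of the target table
long <- long[order(long$ID, long$grp), c("ID", "Name", "Sign", "Score")]
rownames(long) <- NULL
long
```

Spelling out varying and v.names makes the pairing of Person1 with Sign and Person2 with Sex visible in the call itself.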
      

      【Comments】:

        【Solution 3】:

        You can select the columns explicitly, renaming them as you go, and stack the two pieces with bind_rows

        library(tidyverse)
        # Person1 pairs with Sign, Person2 pairs with Sex (as in the target table)
        bind_rows(DF %>% select(ID, Name = Person1, Sign, Score),
                  DF %>% select(ID, Name = Person2, Sign = Sex, Score)) %>% 
          arrange(ID)
        #>   ID         Name   Sign Score
        #> 1  1  Devin Davey artist    10
        #> 2  1    Eli Greer female    10
        #> 3  2     Rui Butt artist    20
        #> 4  2 Alvin Simons   male    20
        #> 5  3  Keon Dotson artist    30
        #> 6  3   Leona Ford female    30
        

        Or with full_join:

        library(tidyverse)
        # Person1 pairs with Sign, Person2 pairs with Sex (as in the target table)
        DF %>% select(ID, Name = Person1, Sign, Score) %>% 
          full_join(DF %>% select(ID, Name = Person2, Sign = Sex, Score)) %>% 
          arrange(ID)
        #> Joining, by = c("ID", "Name", "Sign", "Score")
        #>   ID         Name   Sign Score
        #> 1  1  Devin Davey artist    10
        #> 2  1    Eli Greer female    10
        #> 3  2     Rui Butt artist    20
        #> 4  2 Alvin Simons   male    20
        #> 5  3  Keon Dotson artist    30
        #> 6  3   Leona Ford female    30
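
Because it is easy to attach the wrong attribute column to the wrong person when selecting by hand, it may be worth checking the stacked result against the target table from the question. The sketch below does that, pairing Person1 with Sign and Person2 with Sex; the names `expected`, `stacked` and `matches_target` are hypothetical and introduced only for this check.

```r
library(dplyr)

# Data frame from the question
DF <- data.frame(ID = c(1, 2, 3),
                 Person1 = c("Devin Davey", "Rui Butt", "Keon Dotson"),
                 Sign = "artist",
                 Person2 = c("Eli Greer", "Alvin Simons", "Leona Ford"),
                 Sex = c("female", "male", "female"),
                 Score = c(10, 20, 30),
                 stringsAsFactors = FALSE)

# Target table, copied from the question
expected <- data.frame(ID = c(1, 1, 2, 2, 3, 3),
                       Name = c("Devin Davey", "Eli Greer", "Rui Butt",
                                "Alvin Simons", "Keon Dotson", "Leona Ford"),
                       Sign = c("artist", "female", "artist",
                                "male", "artist", "female"),
                       Score = c(10, 10, 20, 20, 30, 30),
                       stringsAsFactors = FALSE)

# Stack the two person/attribute pairs, then sort by ID
stacked <- bind_rows(select(DF, ID, Name = Person1, Sign, Score),
                     select(DF, ID, Name = Person2, Sign = Sex, Score)) %>%
  arrange(ID)

# TRUE when the stacked result has the same values as the target table
matches_target <- isTRUE(all.equal(stacked, expected, check.attributes = FALSE))
matches_target
```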
        

        【Comments】:
