[Question title]: Expand data frame and add a new variable
[Posted]: 2020-07-24 10:19:57
[Question description]:

I have a data frame structured like this:

+----------+------+--------+-------+
| Location | year | group1 | Value |
+----------+------+--------+-------+
| a        | 2020 | 1      | x     |
| a        | 2020 | 2      | y     |
| a        | 2020 | 3      | z     |
| a        | 2021 | 1      | x     |
| a        | 2021 | 2      | y     |
| a        | 2021 | 3      | z     |
| b        | 2020 | 1      | x     |
| b        | 2020 | 2      | y     |
| b        | 2020 | 3      | z     |
+----------+------+--------+-------+

I want to expand the data frame to include 3 rows for each combination of Location, year and group1, and generate a group2 variable that identifies these new rows (1-3). Ideally, the data frame would look like this:

+----------+------+--------+-------+--------+
| Location | year | group1 | Value | group2 |
+----------+------+--------+-------+--------+
| a        | 2020 | 1      | x     | 1      |
| a        | 2020 | 1      | x     | 2      |
| a        | 2020 | 1      | x     | 3      |
| a        | 2020 | 2      | y     | 1      |
| a        | 2020 | 2      | y     | 2      |
| a        | 2020 | 2      | y     | 3      |
| ...      | ...  | ...    | ...   | ...    |
+----------+------+--------+-------+--------+

I was able to expand the data frame to the correct total number of rows with the following code:

df[rep(seq_len(nrow(df)),3), 1:4]

but I don't know how to add the group2 variable shown above.
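In rep() the unnamed second argument is times. It repeats the whole index vector, so the call above stacks three full copies of the data frame on top of each other. The each argument instead repeats every index in place, which keeps the copies of a row together. That is the layout the desired output needs before a 1, 2, 3 counter can be attached. Below is a small illustration on a toy three-row index; idx, by_times, by_each and counter are illustrative names, not part of the question:

```r
# Toy index for a three-row data frame (stands in for seq_len(nrow(df)))
idx <- seq_len(3)

# times = 3: the whole vector is repeated, so copies of a row are not adjacent
by_times <- rep(idx, 3)          # 1 2 3 1 2 3 1 2 3

# each = 3: every element is repeated in place, so copies stay together
by_each <- rep(idx, each = 3)    # 1 1 1 2 2 2 3 3 3

# A counter cycling 1..3 once per original row lines up with by_each
counter <- rep(1:3, times = length(idx))  # 1 2 3 1 2 3 1 2 3

print(data.frame(row = by_each, group2 = counter))
```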

[Question discussion]:

    Tags: r


    [Solution 1]:

    With tidyr you can use expand(). On a grouped data frame it expands each group to all combinations with the values of your 1-to-3 sequence:

    library(tidyverse)
    
    df %>%
      group_by(Location, year, group1, Value) %>%
      expand(group2 = 1:3)
    

    Output

       Location  year group1 Value group2
       <fct>    <dbl>  <int> <fct>  <int>
     1 a         2020      1 x          1
     2 a         2020      1 x          2
     3 a         2020      1 x          3
     4 a         2020      2 y          1
     5 a         2020      2 y          2
     6 a         2020      2 y          3
     ...
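Another tidyr route, not used in this answer, is uncount(). It repeats each row a given number of times, and its .id argument names a new column holding the copy number (1, 2, 3) of each repeated row. The sketch below assumes a tidyr version that provides uncount() and rebuilds the question's data by hand:

```r
library(tidyr)

# Question's data, rebuilt by hand as a plain data.frame
df <- data.frame(
  Location = c("a", "a", "a", "a", "a", "a", "b", "b", "b"),
  year     = c(2020, 2020, 2020, 2021, 2021, 2021, 2020, 2020, 2020),
  group1   = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  Value    = c("x", "y", "z", "x", "y", "z", "x", "y", "z"),
  stringsAsFactors = FALSE
)

# Repeat every row 3 times; .id = "group2" numbers the copies 1, 2, 3
out <- uncount(df, 3, .id = "group2")
head(out)
```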
    

    Your approach looks close; I think you could add group2 like this:

    cbind(df[rep(seq_len(nrow(df)), each = 3), ], group2 = 1:3)
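The cbind() call works because the data frame it builds recycles group2 = 1:3 across all 27 rows. That only lines up because each = 3 keeps the three copies of a row together. Here is a self-contained sketch that spells the counter out instead of relying on recycling, with the question's data rebuilt by hand:

```r
# Question's data, rebuilt by hand as a plain data.frame
df <- data.frame(
  Location = c("a", "a", "a", "a", "a", "a", "b", "b", "b"),
  year     = c(2020, 2020, 2020, 2021, 2021, 2021, 2020, 2020, 2020),
  group1   = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  Value    = c("x", "y", "z", "x", "y", "z", "x", "y", "z"),
  stringsAsFactors = FALSE
)

# each = 3 keeps the three copies of every row next to each other
out <- df[rep(seq_len(nrow(df)), each = 3), ]

# One 1, 2, 3 counter per original row, written out explicitly
out$group2 <- rep(1:3, times = nrow(df))

# Duplicated row indices produce row names like "1.1", "1.2"; reset them
rownames(out) <- NULL

head(out)
```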
    

    [Discussion]:

      [Solution 2]:

      Here is a solution that does what you are looking for:

      library(dplyr)
      
      # 1. Data set
      df <- data.frame(
        location = c("a","a","a","a","a","a","b","b","b"),
        year = c(2020,2020,2020,2021,2021,2021,2020,2020,2020),
        group1 = c(1,2,3,1,2,3,1,2,3),
        value = c("x","y","z","x","y","z","x","y","z"),
        stringsAsFactors = FALSE)
      
      # 2. Your code to expand data frame
      df <- df[rep(seq_len(nrow(df)), 3), 1:4]
      
      # 3. Arrange
      df <- df %>% arrange(location, year, group1, value)
      
      # 4. Add 'group2'
      df <- df %>% 
        group_by(location, year, group1, value) %>% 
        mutate(group2 = cumsum(group1) / group1) %>% 
        arrange(location, year, group1, value, group2)
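The cumsum(group1) / group1 line works because group1 is constant inside each group. The running sum over the three copies is group1, 2 * group1, 3 * group1, and dividing by group1 leaves 1, 2, 3. It therefore assumes group1 is never 0, since 0 / 0 gives NaN. dplyr's row_number() yields the same per-group counter without that assumption. Below is a toy comparison; the toy and zero data frames are illustrative only:

```r
library(dplyr)

# Toy input: two group1 values, each already copied three times
toy <- data.frame(group1 = c(1, 1, 1, 2, 2, 2))

res <- toy %>%
  group_by(group1) %>%
  mutate(
    via_cumsum = cumsum(group1) / group1,  # k * group1 / group1 = k
    via_rownum = row_number()              # same counter, no division
  ) %>%
  ungroup()

# Edge case: group1 == 0 makes the cumsum trick divide 0 by 0
zero <- data.frame(group1 = c(0, 0, 0)) %>%
  mutate(
    via_cumsum = cumsum(group1) / group1,  # NaN NaN NaN
    via_rownum = row_number()              # still 1 2 3
  )

print(res)
print(zero)
```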
      

      Hope this helps.

      [Discussion]:

        [Solution 3]:

        We can use crossing from tidyr:

        library(tidyr)
        library(dplyr)
        crossing(df1, group2 = 1:3)
        # A tibble: 27 x 5
        #   Location  year group1 Value group2
        #   <chr>    <int>  <int> <chr>  <int>
        # 1 a         2020      1 x          1
        # 2 a         2020      1 x          2
        # 3 a         2020      1 x          3
        # 4 a         2020      2 y          1
        # 5 a         2020      2 y          2
        # 6 a         2020      2 y          3
        # 7 a         2020      3 z          1
        # 8 a         2020      3 z          2
        # 9 a         2020      3 z          3
        #10 a         2021      1 x          1
        # … with 17 more rows
        

        Or create a list column and then unnest it:

        df1  %>%
               mutate(group2 = list(1:3)) %>% 
               unnest(c(group2))
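Conceptually, unnest() repeats each row once per element of its list entry and flattens the list into the new column. The same idea can be shown in base R. This illustrates the mechanics only, not tidyr's actual implementation, and it uses a hypothetical two-row data frame called small:

```r
# Hypothetical two-row data frame, for illustration only
small <- data.frame(Location = c("a", "b"), Value = c("x", "y"),
                    stringsAsFactors = FALSE)

# One vector per row, like the list column built by mutate(group2 = list(1:3))
vals <- rep(list(1:3), nrow(small))

# Repeat each row once per element of its list entry ...
out <- small[rep(seq_len(nrow(small)), lengths(vals)), ]

# ... then flatten the list into the new column, in the same order
out$group2 <- unlist(vals)
rownames(out) <- NULL

print(out)
```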
        

        Data

        df1 <- structure(list(Location = c("a", "a", "a", "a", "a", "a", "b", 
        "b", "b"), year = c(2020L, 2020L, 2020L, 2021L, 2021L, 2021L, 
        2020L, 2020L, 2020L), group1 = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 
        2L, 3L), Value = c("x", "y", "z", "x", "y", "z", "x", "y", "z"
        )), class = "data.frame", row.names = c(NA, -9L))
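For comparison, base R's merge() returns a Cartesian product when given by = NULL, which covers the same rows as crossing(). Unlike crossing(), it does not sort the rows, so the result needs an explicit order(). Here is a sketch using the df1 data above, recreated inline so the block runs on its own:

```r
# The df1 data from this answer, recreated as a plain data.frame
df1 <- data.frame(
  Location = c("a", "a", "a", "a", "a", "a", "b", "b", "b"),
  year     = c(2020L, 2020L, 2020L, 2021L, 2021L, 2021L, 2020L, 2020L, 2020L),
  group1   = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L),
  Value    = c("x", "y", "z", "x", "y", "z", "x", "y", "z"),
  stringsAsFactors = FALSE
)

# by = NULL: every row of df1 is paired with every value of group2
out <- merge(df1, data.frame(group2 = 1:3), by = NULL)

# Sort into the layout shown in the question, then reset row names
out <- out[order(out$Location, out$year, out$group1, out$group2), ]
rownames(out) <- NULL

head(out)
```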
        

        [Discussion]:
