【Question title】: tidyr::spread / tidyr::pivot_wider with multiple different values per key
【Posted】: 2019-12-31 18:36:23
【Question description】:

Given this data:

+----+---------+----------+------+------+------+
| id |  type   |   name   | var1 | var2 | var3 |
+----+---------+----------+------+------+------+
| 10 | Country | Norway   |  169 | 14   |  164 |
| 10 | Sport   | Skii     |  169 | 14   |  164 |
| 10 | Format  | Video    |  169 | 14   |  164 |
| 11 | Country | Spain    |  150 | 16   |  178 |
| 11 | Format  | Photo    |  150 | 16   |  178 |
| 11 | Sport   | Bike     |  150 | 16   |  178 |
| 11 | Sport   | Soccer   |  150 | 16   |  178 |
| 11 | Sport   | Basket   |  150 | 16   |  178 |
| 12 | Country | USA      |    0 | 0    |    0 |
| 12 | Format  | Video    |    0 | NA   |    0 |
| 12 | Sport   | Baseball |    0 | 0    |    0 |
+----+---------+----------+------+------+------+

What is the simplest, cleanest way to spread it as follows:

+----+------+------+------+---------+--------+----------+---------+---------+
| id | var1 | var2 | var3 | Country | Format | Sport_1  | Sport_2 | Sport_3 |
+----+------+------+------+---------+--------+----------+---------+---------+
| 10 |  169 |   14 |  164 | Norway  | Video  | Skii     | NA      | NA      |
| 11 |  150 |   16 |  178 | Spain   | Photo  | Bike     | Soccer  | Basket  |
| 12 |    0 |    0 |    0 | USA     | Video  | Baseball | NA      | NA      |
+----+------+------+------+---------+--------+----------+---------+---------+

Also note the NA for id 12.

I have tried using:

data2 <- data %>% pivot_wider(names_from = type, values_from = name)

But it gives me a warning saying that the values in "name" are not uniquely identified, which is true for id 11 (type Sport is repeated 3 times).
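To see which keys trigger that warning, one option is to count rows per id/type pair. The sketch below uses a small hypothetical data frame, `toy`, containing only the id 11 rows from the question; the names `toy` and `dupes` are illustrative and not part of the original post.

```r
library(dplyr)

# Hypothetical subset of the question's data: id 11 has three Sport rows.
toy <- data.frame(
  id   = c(11, 11, 11, 11, 11),
  type = c("Country", "Format", "Sport", "Sport", "Sport"),
  name = c("Spain", "Photo", "Bike", "Soccer", "Basket")
)

# Count rows per (id, type); any n > 1 is a key with more than one value,
# which is what pivot_wider() complains about.
dupes <- toy %>%
  count(id, type) %>%
  filter(n > 1)

dupes
```

Any pair listed in `dupes` needs either a distinguishing index (Sport_1, Sport_2, ...) or an aggregation step before it can be spread into a single cell.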

Also, I expect the NA in id 12 to cause problems as well, because the function will not group these rows together:

| 12 | Country | USA      |    0 | 0    |    0 |
| 12 | Sport   | Baseball |    0 | 0    |    0 |

with this one:

| 12 | Format  | Video    |    0 | NA   |    0 |

because of the NA, even though they share the same id.
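The effect described above can be reproduced on the id 12 rows alone. This is a minimal sketch; the names `id12`, `split_rows` and `one_row` are illustrative, and replacing the NA with 0 assumes the missing var2 really should match the other rows.

```r
library(dplyr)
library(tidyr)

# The three id 12 rows from the question, including the NA in var2.
id12 <- data.frame(
  id   = c(12L, 12L, 12L),
  type = c("Country", "Format", "Sport"),
  name = c("USA", "Video", "Baseball"),
  var1 = c(0L, 0L, 0L),
  var2 = c(0L, NA, 0L),
  var3 = c(0L, 0L, 0L)
)

# Every column other than `type` and `name` identifies an output row,
# so var2 = 0 and var2 = NA land in different rows.
split_rows <- id12 %>%
  pivot_wider(names_from = type, values_from = name)

# Filling the NA first lets all three rows collapse into one.
one_row <- id12 %>%
  mutate(var2 = replace_na(var2, 0L)) %>%
  pivot_wider(names_from = type, values_from = name)

nrow(split_rows)
nrow(one_row)
```

If the NA should not be overwritten, another route would be to restrict the identifying columns with the `id_cols` argument of `pivot_wider()`, at the cost of dropping var1 to var3 from the result.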

Any help is much appreciated. Thank you very much!

【Comments】:

  • Could you show the data using dput?

Tags: r dplyr tidyr


【Solution 1】:

Here is one way, borrowing @akrun's data:

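library(dplyr)  # mutate_at(), vars()
library(tidyr)  # replace_na(), pivot_wider(), unnest_wider()

df1 %>%
  # Fill the missing var2 so all rows of id 12 share the same id columns
  replace_na(list(var2 = 0)) %>%
  # Collect every name per (id, type) into a list-column
  pivot_wider(names_from = "type", values_from = "name", values_fn = list(name = list)) %>%
  # Country and Format hold one value per id, so flatten them to character
  mutate_at(vars(Country, Format), unlist) %>%
  mutate_at("Sport", unclass) %>%
  # Spread the Sport list into Sport_1, Sport_2, ...
  unnest_wider(Sport, names_sep = "_", names_repair = ~sub("...", "", ., fixed = TRUE))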

# New names:
# * `` -> ...1
# New names:
# * `` -> ...1
# * `` -> ...2
# * `` -> ...3
# New names:
# * `` -> ...1
# # A tibble: 3 x 9
#     id  var1  var2  var3 Country Sport_1  Sport_2 Sport_3 Format
#   <int> <int> <dbl> <int> <chr>   <chr>    <chr>   <chr>   <chr> 
# 1    10   169    14   164 Norway  Skii     NA      NA      Video 
# 2    11   150    16   178 Spain   Bike     Soccer  Basket  Photo 
# 3    12     0     0     0 USA     Baseball NA      NA      Video 
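A different route to the same shape, not taken from the answer above, is to number the Sport rows before pivoting so that every key is unique. This is a sketch under the assumption that only Sport can repeat within an id; the helper column `key` and the result name `wide` are illustrative.

```r
library(dplyr)
library(tidyr)

# Same data as in the question.
df1 <- data.frame(
  id   = c(10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L),
  type = c("Country", "Sport", "Format", "Country", "Format",
           "Sport", "Sport", "Sport", "Country", "Format", "Sport"),
  name = c("Norway", "Skii", "Video", "Spain", "Photo", "Bike",
           "Soccer", "Basket", "USA", "Video", "Baseball"),
  var1 = c(169L, 169L, 169L, 150L, 150L, 150L, 150L, 150L, 0L, 0L, 0L),
  var2 = c(14L, 14L, 14L, 16L, 16L, 16L, 16L, 16L, 0L, NA, 0L),
  var3 = c(164L, 164L, 164L, 178L, 178L, 178L, 178L, 178L, 0L, 0L, 0L)
)

wide <- df1 %>%
  # Assumption: the missing var2 for id 12 should be 0 like its other rows.
  mutate(var2 = replace_na(var2, 0L)) %>%
  group_by(id, type) %>%
  # Number only the Sport rows (Sport_1, Sport_2, ...); other types keep their name.
  mutate(key = ifelse(type == "Sport", paste0(type, "_", row_number()), type)) %>%
  ungroup() %>%
  # Drop `type` so it is not treated as an identifying column.
  select(-type) %>%
  pivot_wider(names_from = key, values_from = name)

wide
```

The new columns come out in order of first appearance in the data, so a final `select()` may be needed to match the column order requested in the question.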

【Comments】:

    【Solution 2】:

    We can do this by using filter to pull the 'Sport' rows out of 'type', spreading the two subsets separately, and then joining the results.

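    library(dplyr)    # filter(), group_by(), mutate(), inner_join()
    library(tidyr)    # spread(), replace_na()
    library(stringr)  # str_c()

    # Sport rows: number them within each id (Sport1, Sport2, ...) and spread
    sportdf <- df1 %>%
      filter(type == "Sport") %>%
      group_by(id) %>%
      mutate(type = str_c(type, row_number())) %>%
      spread(type, name)
    # Country/Format rows: fill the NA in var2 so id 12 collapses to one row
    formatCountrydf <- df1 %>%
      filter(type != "Sport") %>%
      mutate(var2 = replace_na(var2, 0)) %>%
      spread(type, name)
    # Natural join on the shared columns (id, var1, var2, var3)
    inner_join(sportdf, formatCountrydf)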
    # A tibble: 3 x 9
    # Groups:   id [3]
    #     id  var1  var2  var3 Sport1   Sport2 Sport3 Country Format
    #  <int> <int> <dbl> <int> <chr>    <chr>  <chr>  <chr>   <chr> 
    #1    10   169    14   164 Skii     <NA>   <NA>   Norway  Video 
    #2    11   150    16   178 Bike     Soccer Basket Spain   Photo 
    #3    12     0     0     0 Baseball <NA>   <NA>   USA     Video 
    

    Data

    df1 <- structure(list(id = c(10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 
    12L, 12L, 12L), type = c("Country", "Sport", "Format", "Country", 
    "Format", "Sport", "Sport", "Sport", "Country", "Format", "Sport"
    ), name = c("Norway", "Skii", "Video", "Spain", "Photo", "Bike", 
    "Soccer", "Basket", "USA", "Video", "Baseball"), var1 = c(169L, 
    169L, 169L, 150L, 150L, 150L, 150L, 150L, 0L, 0L, 0L), var2 = c(14L, 
    14L, 14L, 16L, 16L, 16L, 16L, 16L, 0L, NA, 0L), var3 = c(164L, 
    164L, 164L, 178L, 178L, 178L, 178L, 178L, 0L, 0L, 0L)),
    class = "data.frame", row.names = c(NA, 
    -11L))
    

    【Comments】:
