【Question title】: Using pivot_wider to get true or false [duplicate]
【Posted】: 2020-12-06 18:46:34
【Question】:

I am trying to use pivot_wider to get a binary result for each country for the years 1991 to 1995, as in the table below:

+------+-------+--------+--------+
| year | USA   | Israel | Sweden |
| 1991 | FALSE | TRUE   | TRUE   |
| 1992 | FALSE | FALSE  | TRUE   |
| 1993 | FALSE | TRUE   | TRUE   |
| 1994 | FALSE | FALSE  | TRUE   |
| 1995 | TRUE  | TRUE   | TRUE   |
+------+-------+--------+--------+

Of course, any binary indicator other than TRUE/FALSE would be fine too.

However, my data frame looks like this:

    country = c("Sweden", "Sweden", "Sweden", "Sweden", "Sweden", "Israel", "Israel",
                "Israel", "USA")
    year = c(1991,1992,1993,1994,1995,1991,1993,1995,1995)
    df = as.data.frame(cbind(year,country))
    df


+---------+------+
| country | year |
| Sweden  | 1991 |
| Sweden  | 1992 |
| Sweden  | 1993 |
| Sweden  | 1994 |
| Sweden  | 1995 |
| Israel  | 1991 |
| Israel  | 1993 |
| Israel  | 1995 |
| USA     | 1995 |
+---------+------+

I tried the code below, but the result is not what I am looking for:

    library(dplyr)
    library(tidyr)  # pivot_wider() comes from tidyr
    df2 = df %>%
      group_by(country) %>%
      mutate(row = row_number()) %>%
      pivot_wider(names_from = country, values_from = year) %>%
      select(-row)
    df2

+------+--------+--------+
| USA  | Israel | Sweden |
| 1995 | 1991   | 1991   |
| NA   | 1993   | 1992   |
| NA   | 1995   | 1993   |
| NA   | NA     | 1994   |
| NA   | NA     | 1995   |
+------+--------+--------+

【Comments】:

    Tags: r dplyr data-manipulation


    【Solution 1】:

    You can try this:

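    library(dplyr)
    library(tidyr)
    df %>%
      mutate(val = 1) %>%                                      # mark every observed year/country pair
      pivot_wider(names_from = country, values_from = val) %>%
      mutate(across(-year, ~replace_na(.x, 0))) %>%            # pairs that never occur become 0
      mutate(across(-year, ~ifelse(.x == 1, TRUE, FALSE)))     # convert 1/0 to TRUE/FALSE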
    

    Output:

    # A tibble: 5 x 4
      year  Sweden Israel USA  
      <fct> <lgl>  <lgl>  <lgl>
    1 1991  TRUE   TRUE   FALSE
    2 1992  TRUE   FALSE  FALSE
    3 1993  TRUE   TRUE   FALSE
    4 1994  TRUE   FALSE  FALSE
    5 1995  TRUE   TRUE   TRUE 
    

    【Comments】:

    • A better way to do this with dplyr and tidyr is to add a "TRUE" column before pivoting and use values_fill: df %>% mutate("TRUE" = TRUE) %>% pivot_wider(names_from = country, values_from = "TRUE", values_fill = FALSE)
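
The values_fill approach from the comment above can be sketched as a complete script. This is a minimal sketch, not a definitive implementation: it assumes a tidyr version in which `values_fill` accepts a single value, it rebuilds the question's data with `data.frame()` so that `year` stays numeric, and the names `present` and `wide` are hypothetical, chosen only for illustration.

```r
library(dplyr)
library(tidyr)

# Same data as in the question; data.frame() keeps year numeric
# (the question's cbind() call would coerce it to character)
df <- data.frame(
  year = c(1991, 1992, 1993, 1994, 1995, 1991, 1993, 1995, 1995),
  country = c("Sweden", "Sweden", "Sweden", "Sweden", "Sweden",
              "Israel", "Israel", "Israel", "USA")
)

wide <- df %>%
  mutate(present = TRUE) %>%          # hypothetical helper column: TRUE for every observed pair
  pivot_wider(names_from = country,
              values_from = present,
              values_fill = FALSE)    # year/country pairs that never occur become FALSE
wide
```

Filling during the pivot avoids the separate replace_na() and ifelse() passes, since the missing combinations never show up as NA in the first place.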
    【Solution 2】:

    Here is a data.table solution:

    library( data.table )
    #custom function: determines whether the length of a vector is > 0 (TRUE/FALSE)
    cust_fun <- function(x) length(x) > 0
    #cast to wide, aggregating with the custom function above
    dcast( setDT(df), year ~ country, fun.aggregate = cust_fun )
    
    #    year Israel Sweden   USA
    # 1: 1991   TRUE   TRUE FALSE
    # 2: 1992  FALSE   TRUE FALSE
    # 3: 1993   TRUE   TRUE FALSE
    # 4: 1994  FALSE   TRUE FALSE
    # 5: 1995   TRUE   TRUE  TRUE
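
Since the question says any binary indicator would do, a 0/1 variant of the same cast is also possible. The following is a sketch under stated assumptions: it uses `length` directly as the aggregate, which gives 0/1 only when each year/country pair appears at most once (as in the question's data), and the names `dt` and `wide01` are hypothetical.

```r
library(data.table)

# Same data as in the question, built directly as a data.table
dt <- data.table(
  year = c(1991, 1992, 1993, 1994, 1995, 1991, 1993, 1995, 1995),
  country = c("Sweden", "Sweden", "Sweden", "Sweden", "Sweden",
              "Israel", "Israel", "Israel", "USA")
)

# Count rows per year/country pair; pairs that never occur get a count of 0
wide01 <- dcast(dt, year ~ country, value.var = "country", fun.aggregate = length)
wide01
```

If a pair could occur more than once, the counts would exceed 1, and a logical aggregate such as the custom function above would be the safer choice.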
    

    【Comments】:
