Renaming multiple columns and gathering with dplyr in R: answers

【Question title】: Renaming multiple columns and gathering with dplyr in R
【Posted】: 2018-08-05 20:53:51
【Question】:

I'm trying to find a convenient way to rename multiple columns using the tidyverse. Say I have a tibble:

df <- tibble(a = 1, b = 2, tmp_2000 = 23, tmp_2001 = 22.1, tmp_2002 = 25, pre_2000 = 100, pre_2001 = 103, pre_2002 = 189)

# A tibble: 1 x 8
      a     b tmp_2000 tmp_2001 tmp_2002 pre_2000 pre_2001 pre_2002
  <dbl> <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
1     1     2       23     22.1       25      100      103      189

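tmp and pre stand for temperature and precipitation. I'd like to reorganize this table into tidy form, i.e. one temperature column and one precipitation column, with one row per year holding that year's values.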

Right now, the only option I've found is to do something like this:

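# Drop the precipitation columns for now
df <- df %>%
  select(-starts_with("pre"))

# Strip the "tmp_" prefix so the columns are named "2000", "2001", "2002"
names(df)[3:5] <- substr(names(df)[3:5], 5, 8)

# Gather the year columns into key/value pairs and make year an integer
df <- df %>%
  gather(`2000`:`2002`, key = "year", value = "temp") %>%
  mutate(year = as.integer(year))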

# A tibble: 3 x 4
      a     b  year  temp
  <dbl> <dbl> <int> <dbl>
1     1     2  2000  23  
2     1     2  2001  22.1
3     1     2  2002  25  

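This isn't great, because I'd have to repeat the same steps for precipitation and then join the two tables. I'll be adding more weather variables in the future, so this process will quickly become painful.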

Does anyone know how to do this more efficiently with the tidyverse?

Thanks,

Joe

PS: The only similar posts I've found either recode variables (with mutate_at) or rename columns with names(), as shown above.
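For the renaming part on its own, newer dplyr versions provide rename_with(), which applies a function to the names of a tidyselect selection of columns, so there is no need to index names() by position. A minimal sketch, assuming dplyr 1.0.0 or later; the new prefixes temp_ and precip_ are just illustrative choices:

```r
library(dplyr)
library(tibble)

# Example data from the question (pre_* values taken from the printed tibble)
df <- tibble(a = 1, b = 2,
             tmp_2000 = 23, tmp_2001 = 22.1, tmp_2002 = 25,
             pre_2000 = 100, pre_2001 = 103, pre_2002 = 189)

# Assumes dplyr >= 1.0.0, where rename_with() was introduced.
# rename_with(.data, .fn, .cols) applies .fn to the names selected by .cols,
# so each group of prefixed columns is renamed in a single call.
df_renamed <- df %>%
  rename_with(~ sub("^tmp_", "temp_", .x), starts_with("tmp_")) %>%
  rename_with(~ sub("^pre_", "precip_", .x), starts_with("pre_"))

names(df_renamed)
# c("a", "b", "temp_2000", "temp_2001", "temp_2002",
#   "precip_2000", "precip_2001", "precip_2002")
```

Because the selection is by pattern rather than by position (3:5), the same call keeps working as more year columns are added.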

【Comments】:

reshape(df,3:ncol(df),sep="_",dir="long")
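Onyambu, that doesn't work. I get Warning messages: 1: Setting row names on a tibble is deprecated. 2: Setting row names on a tibble is deprecated. 3: Setting row names on a tibble is deprecated.
The warnings are only because you have a tibble, nothing more. I.e. you can do reshape(data.frame(df),3:ncol(df),idvar = 1:2,sep="_",dir="long") and then set the rownames to NULL.
OK, thanks. That's more concise than the tidyverse version, but less readable.
What do you mean by less readable? Maybe because the reshape function is new to you?? I don't know.. You could perhaps use data.table::melt.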

Tags: r dplyr

【Solution 1】:

You can do this:

library(tidyverse)
df %>%
    gather(measure, value, -a, -b) %>% 
    separate(measure, into = c("type", "year"), sep = "_") %>% 
    mutate(type = case_when(type == "tmp" ~ "temp", type == "pre" ~ "precip")) %>% 
    spread(type, value)
#       a     b year  precip  temp
# 1     1     2 2000     100  23  
# 2     1     2 2001     103  22.1
# 3     1     2 2002     189  25

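We first gather all the data into long format, then separate the year from the measurement type, then rename the measurement types, and finally spread the data back into wide format.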
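In newer tidyr releases, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The special ".value" entry in names_to tells pivot_longer() to use the part of each name before the separator (tmp or pre) as an output column name and the part after it as the year, so the separate and spread steps collapse into one call. A sketch, assuming tidyr 1.0.0 or later; the rename() step mirrors the case_when() recoding in the answer above:

```r
library(dplyr)
library(tidyr)
library(tibble)

# Example data from the question
df <- tibble(a = 1, b = 2,
             tmp_2000 = 23, tmp_2001 = 22.1, tmp_2002 = 25,
             pre_2000 = 100, pre_2001 = 103, pre_2002 = 189)

# Assumes tidyr >= 1.0.0 (pivot_longer() is not available in older versions).
tidy <- df %>%
  pivot_longer(
    cols = -c(a, b),                 # everything except the id columns
    names_to = c(".value", "year"),  # "tmp"/"pre" become columns, "2000" etc. goes to year
    names_sep = "_"
  ) %>%
  mutate(year = as.integer(year)) %>%   # year as integer, as in the question
  rename(temp = tmp, precip = pre)      # new_name = old_name

tidy
```

New weather variables with the same prefix_year naming become extra columns automatically, with no per-variable code.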

【Comments】:

Thanks AndS., it does the job perfectly and is very elegant.

【Solution 2】:

data.frame(df) %>%
  reshape(3:ncol(df), sep = "_", dir = "long") %>%
  `rownames<-`(NULL)
#   a b time  tmp pre id
# 1 1 2 2000 23.0 100  1
# 2 1 2 2001 22.1 103  1
# 3 1 2 2002 25.0 189  1

【Comments】:

【Solution 3】:

df <- tibble(
  a = 1,
  b = 2,
  tmp_2000 = 23,
  tmp_2001 = 22.1,
  tmp_2002 = 25,
  pre_2000=100,
  pre_2001=103,
  pre_2002=189
)


df %>%
  gather(key, value, -a:-b) %>%
  separate(key, c("type", "year")) %>%
  spread(type, value = value)

#> # A tibble: 3 x 5
#>       a     b year    pre   tmp
#>   <dbl> <dbl> <chr> <dbl> <dbl>
#> 1     1     2 2000    100  23  
#> 2     1     2 2001    103  22.1
#> 3     1     2 2002    189  25
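Note that year comes out as <chr> here, while the question converts it to integer. separate() has a convert argument that runs type.convert() (with as.is = TRUE) on the new columns, so year becomes integer and type stays character. A sketch of the same pipeline with that one change; the id columns are excluded with -c(a, b), an equivalent selection that avoids relying on how -a:-b is interpreted:

```r
library(dplyr)
library(tidyr)
library(tibble)

# Example data from the question
df <- tibble(a = 1, b = 2,
             tmp_2000 = 23, tmp_2001 = 22.1, tmp_2002 = 25,
             pre_2000 = 100, pre_2001 = 103, pre_2002 = 189)

res <- df %>%
  gather(key, value, -c(a, b)) %>%
  # The default sep splits on any non-alphanumeric character, here "_".
  # convert = TRUE turns year into an integer; type remains character.
  separate(key, c("type", "year"), convert = TRUE) %>%
  spread(type, value)

class(res$year)  # "integer"
```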


【Comments】: