Reshaping R dataframe so values of one column are now their own column and grouping by other columns

【Question title】: Reshaping R dataframe so values of one column are now their own column and grouping by other columns
【Posted】: 2021-10-03 16:34:13
【Question description】:

Similar questions have been asked, but none quite to this extent. I have a data frame with information like the following:

location    field    sample    date          height    temp   
loc1        fieldA   1_1       202001        1         86     
loc1        fieldA   1_1       202001        10        92     
loc1        fieldA   2_1       202001        1         88
loc1        fieldA   2_1       202001        10        82
loc1        filedA   1_2       202002        1         81
loc1        fieldA   1_2       202002        10        90
loc1        filedA   2_2       202002        1         88
loc1        filedA   2_2       202002        10        82

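Each location has several fields, each field has two measurement positions, and each position is measured at two heights. For example, in location1 fieldA, sample 1_1 refers to the first position and the first sample; it has two heights and was taken on a given date. Then there is location1, fieldA, sample 1_2, which refers to the first position again, but is the second sample, taken on the second date. This continues for locB and more field names, but that is the basic idea.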

Ideally I need the following:

location    field   1_1_temp  1_10_temp  2_1_temp    2_10_temp     date
loc1        fieldA  86        92         88          82            202001
loc1        fieldA  81        90         88          82            202002

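For each location and each field, I need a time series of the data. location1 fieldA will have one time series, location1 fieldB will have one, location2 fieldAA will have one, and so on. Here 1_1_temp is the first position at height 1, 1_10_temp is the first position at height 10, and so on. I'm sure I need dplyr and tidyr but am not sure how to do it. Something like: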

df <- group_by(location) %>%
       group_by(field) %>%
      mutate()

Any help is greatly appreciated. Thanks!

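【Question discussion】:

Please provide reproducible data with enough variety to reproduce the problem (for example, multiple entries for location and field). You can use dput or just define an example tibble inline.
In the field column, shouldn't it be fieldA throughout? I mean, filedA in your example looks like a typo.
@IceCreamToucan I think the new column names (sample_1_1_temp and sample_2_10_temp) can be broken into 4 components, all separated by "_". (1) The static prefix "sample". (2) A dynamic sample index, like the cur_group_id() produced under dplyr::group_by(sample), which is 1 and 2 for samples "1_1" and "2_1" respectively; for samples "1_2" and "2_2" this index would continue through 3 and 4. (3) The height value, 1 and 10 respectively. (4) The static suffix "temp".
It is worth noting that this particular naming convention is inadvisable, because the structure of the result (the number and classification of columns) can vary greatly with even minor changes in the range and ordering of the data within sample or height.
I had to write this up quickly and I apologize for the lack of clarity. I made a quick edit that will hopefully help. Every sample is taken at the same position and height in each location and field.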
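The reproducible example requested in the first comment could look roughly like the sketch below. It defines the eight rows shown in the question as an inline tibble, under the assumption that filedA is a typo for fieldA. It does not invent the additional locations and fields that the comment asks for; those would have to come from the real data.

```r
library(tibble)

# The eight rows from the question, assuming "filedA" was meant to be "fieldA".
# sample is kept as a character column because values like "1_1" are labels, not numbers.
df <- tribble(
  ~location, ~field,   ~sample, ~date,  ~height, ~temp,
  "loc1",    "fieldA", "1_1",   202001, 1,       86,
  "loc1",    "fieldA", "1_1",   202001, 10,      92,
  "loc1",    "fieldA", "2_1",   202001, 1,       88,
  "loc1",    "fieldA", "2_1",   202001, 10,      82,
  "loc1",    "fieldA", "1_2",   202002, 1,       81,
  "loc1",    "fieldA", "1_2",   202002, 10,      90,
  "loc1",    "fieldA", "2_2",   202002, 1,       88,
  "loc1",    "fieldA", "2_2",   202002, 10,      82
)

# dput() prints an R expression that recreates the object,
# which can be pasted directly into a question.
dput(df)
```

Calling dput() on an existing data frame is usually the quicker route when the data already lives in an R session.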

Tags: r dataframe dplyr reshape tidyr

【Solution 1】:

Assuming filedA is a typo, does the following code answer your question?

library(dplyr)
library(tidyr)

df <- read.table(text = 'location    field    sample    date          height    temp   
loc1        fieldA   1_1       202001        1         86     
loc1        fieldA   1_1       202001        10        92     
loc1        fieldA   2_1       202001        1         88
loc1        fieldA   2_1       202001        10        82
loc1        fieldA   1_2       202002        1         81
loc1        fieldA   1_2       202002        10        90
loc1        fieldA   2_2       202002        1         88
loc1        fieldA   2_2       202002        10        82', header = TRUE)

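# Keep only the position part of sample ("1_1" -> "1"),
# then spread the heights into one temp column each
df %>% 
    mutate(sample = sub("(\\d)_\\d", "\\1", sample)) %>% 
    pivot_wider(id_cols = c(location, field, date, sample),
                names_from = height,
                values_from = temp,
                names_prefix = "sample")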
# A tibble: 4 × 6
  location field    date sample sample1 sample10
  <chr>    <chr>   <int> <chr>    <int>    <int>
1 loc1     fieldA 202001 1           86       92
2 loc1     fieldA 202001 2           88       82
3 loc1     fieldA 202002 1           81       90
4 loc1     fieldA 202002 2           88       82

Update after the question was edited:

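# One row per location/field/date, one column per position-height pair;
# date is then parsed from year-month (e.g. 202001) into a Date
df %>% 
    mutate(sample = sub("(\\d)_\\d", "\\1", sample)) %>% 
    pivot_wider(id_cols = c(location, field, date),
                names_from = c(sample, height),
                values_from = temp,
                names_prefix = "sample") %>% 
    mutate(date = lubridate::ym(as.character(date)))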
# A tibble: 2 × 7
  location field  date       sample1_1 sample1_10 sample2_1 sample2_10
  <chr>    <chr>  <date>         <int>      <int>     <int>      <int>
1 loc1     fieldA 2020-01-01        86         92        88         82
2 loc1     fieldA 2020-02-01        81         90        88         82
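If the exact column names from the question (1_1_temp, 1_10_temp, and so on) are wanted, with date as the last column, one possible variation is sketched below. It is not part of the original answer. It assumes tidyr 1.0.0 or later (for the names_glue argument of pivot_wider) and dplyr 1.0.0 or later (for relocate), assumes filedA is a typo, and swaps in a slightly more general regex that strips everything from the last underscore onward. The name wide is only illustrative.

```r
library(dplyr)
library(tidyr)

# Example data from the question, assuming "filedA" was meant to be "fieldA"
df <- read.table(text = "location field sample date height temp
loc1 fieldA 1_1 202001 1  86
loc1 fieldA 1_1 202001 10 92
loc1 fieldA 2_1 202001 1  88
loc1 fieldA 2_1 202001 10 82
loc1 fieldA 1_2 202002 1  81
loc1 fieldA 1_2 202002 10 90
loc1 fieldA 2_2 202002 1  88
loc1 fieldA 2_2 202002 10 82", header = TRUE)

wide <- df %>%
  # Drop the trailing "_<n>" so that "1_1" and "1_2" both become position "1"
  mutate(sample = sub("_\\d+$", "", sample)) %>%
  pivot_wider(
    id_cols = c(location, field, date),
    names_from = c(sample, height),
    values_from = temp,
    # Build names such as "1_10_temp" from the position and the height
    names_glue = "{sample}_{height}_temp"
  ) %>%
  # Move date to the end, matching the layout requested in the question
  relocate(date, .after = last_col())

print(wide)
```

Column names that start with a digit are non-syntactic in R, so they need backticks when referenced later, for example wide$`1_1_temp`. Keeping a letter prefix, as the answer does with "sample", avoids that.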

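【Discussion】:

This seems very close to what I need. I apologize for the confusion; I had to write this up quickly and did not review it as thoroughly as I should have. I made a quick edit that will hopefully provide some clarity. Using pivot_wider is probably what I need.
@user2113499 I updated my answer based on your desired output.
Thanks! This is a great solution. I am only slightly familiar with dplyr and tidyr right now and did not know about the pivot_wider function. Good to know about that one!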