Full join after grouping a data frame in R using dplyr: answers

【Question title】: Full join after grouping a data frame in R using dplyr
【Posted】: 2020-02-08 20:58:22
【Question】:

I have a data frame that looks like this:

+--------+---------------+----+
| period |     label     | n  |
+--------+---------------+----+
|      4 | Engaged       |  2 |
|      4 | Remarkable    |  1 |
|      5 | Engaged       |  1 |
|      5 | Inconsistent  |  2 |
|      5 | Remarkable    |  5 |
|      6 | Engaged       |  1 |
|      6 | Inconsistent  |  1 |
|      6 | Remarkable    |  5 |
|      7 | Engaged       |  2 |
|      7 | Remarkable    |  3 |
|      7 | Transactional |  2 |
+--------+---------------+----+

I need every label option (Inconsistent, Transactional, Engaged, Remarkable) to be present in every period. If a label does not occur in a given period, it should be inserted for that period with n equal to 0.

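I considered pivoting the data frame from long to wide and then filling the missing values with 0, but a label may not appear in any period at all, in which case it would never become a column. I also considered grouping the data frame by period and then doing a full join against all the labels, but the grouping seems to be ignored when data frames are joined.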

In the end I need a data frame that looks like this:

+--------+---------------+----+
| period |     label     | n  |
+--------+---------------+----+
|      4 | Inconsistent  |  0 |
|      4 | Transactional |  0 |
|      4 | Engaged       |  2 |
|      4 | Remarkable    |  1 |
|      5 | Inconsistent  |  2 |
|      5 | Transactional |  0 |
|      5 | Engaged       |  1 |
|      5 | Remarkable    |  5 |
|      6 | Inconsistent  |  1 |
|      6 | Transactional |  0 |
|      6 | Engaged       |  1 |
|      6 | Remarkable    |  5 |
|      7 | Inconsistent  |  0 |
|      7 | Transactional |  2 |
|      7 | Engaged       |  2 |
|      7 | Remarkable    |  3 |
+--------+---------------+----+

Here is the sample data I am working with:

df <- as.data.frame(
  list(
    period = c(4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L,
                  7L, 7L, 7L),
    label = c(
      "Engaged",
      "Remarkable",
      "Engaged",
      "Inconsistent",
      "Remarkable",
      "Engaged",
      "Inconsistent",
      "Remarkable",
      "Engaged",
      "Remarkable",
      "Transactional"
    ),
    n = c(2L, 1L, 1L,
          2L, 5L, 1L, 1L, 5L, 2L, 3L, 2L)
  )
)

options <- as.data.frame(
  list(
    label = c(
      "Inconsistent",
      "Transactional",
      "Engaged",
      "Remarkable"
    ),
    n = c(0L, 0L, 0L, 0L)
  )
)

Tags: r dplyr

【Solution 1】:

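We can group by "period" and then complete() the "label" column using the "label" values from the "options" data set, filling n with 0 in the rows that get added.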

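library(dplyr)
library(tidyr)

# For each period, add a row for every label in options$label that is missing;
# the added rows get n = 0L (an integer, matching the type of n in df)
df %>% 
     group_by(period) %>%
     complete(label = options$label, fill = list(n = 0L)) %>%
     ungroup()  # drop the grouping so later steps see a plain tibble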
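
The join the question was reaching for can also be written without any grouping. The following is a minimal sketch, not part of the original answer: it builds every period/label combination with tidyr::expand_grid(), left-joins the observed counts onto that grid, and replaces the resulting NA values with 0. The data is re-typed from the first table in the question, and the names all_labels, grid and result are made up for this illustration.

```r
library(dplyr)
library(tidyr)

# Observed counts, re-typed from the first table in the question
df <- data.frame(
  period = c(4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L),
  label = c("Engaged", "Remarkable",
            "Engaged", "Inconsistent", "Remarkable",
            "Engaged", "Inconsistent", "Remarkable",
            "Engaged", "Remarkable", "Transactional"),
  n = c(2L, 1L, 1L, 2L, 5L, 1L, 1L, 5L, 2L, 3L, 2L),
  stringsAsFactors = FALSE
)

# Every label that must appear in every period (hypothetical name; a plain
# vector is used here so that base::options is not shadowed)
all_labels <- c("Inconsistent", "Transactional", "Engaged", "Remarkable")

# One row per period/label combination
grid <- expand_grid(period = unique(df$period), label = all_labels)

# Attach the observed counts; combinations that never occurred come back as NA
# and are then set to 0L
result <- grid %>%
  left_join(df, by = c("period", "label")) %>%
  mutate(n = coalesce(n, 0L))

print(result)
```

Because the grid is built from all_labels rather than from the labels seen in df, a label that never occurs in any period still gets a row in every period, which is the case that ruled out the pivot approach in the question.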