【Posted】: 2020-02-08 20:58:22
【Question】:
I have a data frame that looks like this:
+--------+---------------+----+
| period | label         | n  |
+--------+---------------+----+
|      4 | Engaged       |  2 |
|      4 | Remarkable    |  1 |
|      5 | Engaged       |  1 |
|      5 | Inconsistent  |  2 |
|      5 | Remarkable    |  5 |
|      6 | Engaged       |  1 |
|      6 | Inconsistent  |  1 |
|      6 | Remarkable    |  5 |
|      7 | Engaged       |  2 |
|      7 | Remarkable    |  3 |
|      7 | Transactional |  2 |
+--------+---------------+----+
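I need every label option (Inconsistent, Transactional, Engaged, Remarkable) to be present in every period. If a label does not occur in a given period, a row for it should be inserted for that period with n equal to 0.
I considered pivoting the data frame from long to wide and filling the missing values with 0, but some labels may not appear in any period at all, so they would never get a column. I also considered grouping the data frame by period and then doing a full join against all labels, but the grouping seems to be ignored when the data frames are joined.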
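The long-to-wide idea can still work if the full set of labels is declared up front, so that a label with no rows at all still gets a column. A minimal base-R sketch, assuming the four labels listed above are the complete set; `all_labels`, `wide` and `long` are illustrative names. It also assumes the period-7 Transactional count is 2, as in the tables (the `df` vector in the sample data has `1L` there).

```r
# Sample data from the question. Assumption: the period-7 "Transactional"
# count is 2, as in the tables (the question's df vector has 1L there).
df <- data.frame(
  period = c(4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L),
  label  = c("Engaged", "Remarkable", "Engaged", "Inconsistent", "Remarkable",
             "Engaged", "Inconsistent", "Remarkable", "Engaged", "Remarkable",
             "Transactional"),
  n      = c(2L, 1L, 1L, 2L, 5L, 1L, 1L, 5L, 2L, 3L, 2L),
  stringsAsFactors = FALSE
)

# Declaring every possible label as a factor level is what gives a label a
# column even if it never occurs in any period.
all_labels <- c("Inconsistent", "Transactional", "Engaged", "Remarkable")
df$label <- factor(df$label, levels = all_labels)

# Long -> wide: xtabs() sums n for each (period, label) cell; empty cells are 0.
wide <- xtabs(n ~ period + label, data = df)

# Wide -> long again, naming the count column "n" instead of the default "Freq".
long <- as.data.frame(wide, responseName = "n")
long$period <- as.integer(as.character(long$period))

# Order by period, then by the label order in all_labels (the factor level order).
long <- long[order(long$period, long$label), ]
rownames(long) <- NULL
long
```

`xtabs()` keeps unused factor levels by default (`drop.unused.levels = FALSE`), which is why the factor declaration is enough to avoid the "label never seen" problem.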
In the end I need a data frame that looks like this:
+--------+---------------+----+
| period | label         | n  |
+--------+---------------+----+
|      4 | Inconsistent  |  0 |
|      4 | Transactional |  0 |
|      4 | Engaged       |  2 |
|      4 | Remarkable    |  1 |
|      5 | Inconsistent  |  2 |
|      5 | Transactional |  0 |
|      5 | Engaged       |  1 |
|      5 | Remarkable    |  5 |
|      6 | Inconsistent  |  1 |
|      6 | Transactional |  0 |
|      6 | Engaged       |  1 |
|      6 | Remarkable    |  5 |
|      7 | Inconsistent  |  0 |
|      7 | Transactional |  2 |
|      7 | Engaged       |  2 |
|      7 | Remarkable    |  3 |
+--------+---------------+----+
Here is the sample data I'm using:
df <- as.data.frame(
  list(
    period = c(4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L,
               7L, 7L, 7L),
    label = c(
      "Engaged",
      "Remarkable",
      "Engaged",
      "Inconsistent",
      "Remarkable",
      "Engaged",
      "Inconsistent",
      "Remarkable",
      "Engaged",
      "Remarkable",
      "Transactional"
    ),
    n = c(2L, 1L, 1L,
          2L, 5L, 1L, 1L, 5L, 2L, 3L, 1L)
  )
)
options <- as.data.frame(
  list(
    label = c(
      "Inconsistent",
      "Transactional",
      "Engaged",
      "Remarkable"
    ),
    n = c(0L, 0L, 0L, 0L)
  )
)
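The full-join idea works without grouping if the join is done against a complete grid of (period, label) pairs instead of per group. A minimal base-R sketch using the `options` data frame above: `expand.grid()` builds the cross join of observed periods and label options, and `merge(..., all.x = TRUE)` left-joins the observed counts onto it. `grid` and `full` are illustrative names, and, as before, the period-7 Transactional count is assumed to be 2 to match the tables.

```r
# Sample data. Assumption: period-7 "Transactional" count is 2, as in the tables.
df <- data.frame(
  period = c(4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L),
  label  = c("Engaged", "Remarkable", "Engaged", "Inconsistent", "Remarkable",
             "Engaged", "Inconsistent", "Remarkable", "Engaged", "Remarkable",
             "Transactional"),
  n      = c(2L, 1L, 1L, 2L, 5L, 1L, 1L, 5L, 2L, 3L, 2L),
  stringsAsFactors = FALSE
)
options <- data.frame(
  label = c("Inconsistent", "Transactional", "Engaged", "Remarkable"),
  n     = c(0L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# Cross join: every observed period paired with every label option.
grid <- expand.grid(
  period = sort(unique(df$period)),
  label  = options$label,
  stringsAsFactors = FALSE
)

# Left join the observed counts onto the grid; unmatched pairs get NA for n.
full <- merge(grid, df, by = c("period", "label"), all.x = TRUE)
full$n[is.na(full$n)] <- 0L

# Order by period, then by the label order given in `options`.
full <- full[order(full$period, match(full$label, options$label)), ]
rownames(full) <- NULL
full
```

This keeps only periods that occur somewhere in `df`; if an entire period could be missing, `seq(min(df$period), max(df$period))` could be used in place of `sort(unique(df$period))`.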
【Discussion】: