Posted: 2021-08-19 02:06:42
Question:
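I have data that I want to bin and convert to a factor, but I'm having trouble understanding what is happening with the factor variable. I'm trying to order the factor based on a continuous variable.
I've read up on this, but every example I've found has only one instance of each factor level, whereas in my data several factor levels occur more than once.
Here is the sample data: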
df <- structure(list(Group = c("Grp1", "Grp1", "Grp1", "Grp1", "Grp1",
"Grp1", "Grp1", "Grp2", "Grp2", "Grp2", "Grp2", "Grp2"), Ind = c("A",
"B", "C", "D", "E", "F", "G", "A", "B", "C", "D", "E"), Value = c(0.155903329567489,
0.0582906870761889, 0.180600101489814, 0.26357423622443, 0.0637832368895064,
0.213803701918138, 0.0640447068344333, 0.333501508730367, 0.160676738803951,
0.279178514111584, 0.145767023637501, 0.0808762147165962)), row.names = c(NA,
-12L), class = c("tbl_df", "tbl", "data.frame"))
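From this data, I created a factor and checked the order of each element.
library(dplyr)
library(forcats)
library(ggplot2)  # cut_interval() comes from ggplot2, not forcats
df %>%
  group_by(Group) %>%
  mutate(Bin = cut_interval(Value, n = nrow(.))) %>%
  mutate(Order = labels(Bin)) %>%
  ungroup()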
# A tibble: 12 x 5
Group Ind Value Bin Order
<chr> <chr> <dbl> <fct> <chr>
1 Grp1 A 0.156 (0.144,0.161] 1
2 Grp1 B 0.0583 [0.0583,0.0754] 2
3 Grp1 C 0.181 (0.178,0.195] 3
4 Grp1 D 0.264 (0.246,0.264] 4
5 Grp1 E 0.0638 [0.0583,0.0754] 5
6 Grp1 F 0.214 (0.212,0.229] 6
7 Grp1 G 0.0640 [0.0583,0.0754] 7
8 Grp2 A 0.334 (0.312,0.334] 1
9 Grp2 B 0.161 (0.144,0.165] 2
10 Grp2 C 0.279 (0.27,0.291] 3
11 Grp2 D 0.146 (0.144,0.165] 4
12 Grp2 E 0.0809 [0.0809,0.102] 5
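The "Order" column above is built with labels(). As a quick check of what labels() returns for a factor, here is a minimal base-R sketch on a hypothetical toy factor f (not the data above), with as.integer() (each element's level code) and levels() (the level order) shown for comparison. An unnamed factor has no names, so labels() falls back to element positions:

```r
# Hypothetical toy factor whose level order differs from the order of appearance
f <- factor(c("b", "a", "b", "c"), levels = c("a", "b", "c"))
labels(f)      # positions of the elements, as character: "1" "2" "3" "4"
as.integer(f)  # level code of each element: 2 1 2 3
levels(f)      # the level order itself: "a" "b" "c"
```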
I then tried reordering the factor by "Value" after creating it, but the order doesn't seem to change.
df %>%
  group_by(Group) %>%
  mutate(Bin = cut_interval(Value, n = nrow(.)),
         Bin = fct_reorder(Bin, Value)) %>%
  mutate(Order = labels(Bin)) %>%
  ungroup()
# A tibble: 12 x 5
Group Ind Value Bin Order
<chr> <chr> <dbl> <fct> <chr>
1 Grp1 A 0.156 (0.144,0.161] 1
2 Grp1 B 0.0583 [0.0583,0.0754] 2
3 Grp1 C 0.181 (0.178,0.195] 3
4 Grp1 D 0.264 (0.246,0.264] 4
5 Grp1 E 0.0638 [0.0583,0.0754] 5
6 Grp1 F 0.214 (0.212,0.229] 6
7 Grp1 G 0.0640 [0.0583,0.0754] 7
8 Grp2 A 0.334 (0.312,0.334] 1
9 Grp2 B 0.161 (0.144,0.165] 2
10 Grp2 C 0.279 (0.27,0.291] 3
11 Grp2 D 0.146 (0.144,0.165] 4
12 Grp2 E 0.0809 [0.0809,0.102] 5
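For comparison, fct_reorder() changes the order of a factor's levels; it does not reorder rows or change which level each element belongs to. A minimal sketch with hypothetical toy vectors f and x (not the data above), using fct_reorder()'s default summary function (median):

```r
library(forcats)

# Hypothetical toy factor: levels default to alphabetical order
f <- factor(c("hi", "lo", "mid", "lo"))
x <- c(3, 1, 2, 1)
levels(f)          # "hi" "lo" "mid"

# Reorder the levels by the median of x within each level
g <- fct_reorder(f, x)
levels(g)          # "lo" "mid" "hi"

# The elements themselves, and their positions, are unchanged
as.character(g)    # "hi" "lo" "mid" "lo"
```

Because only levels(g) changes, anything computed from element positions (such as labels()) would look the same before and after the call.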
I then sorted the data by "Value" before creating the factor, and got the correct order.
df %>%
  arrange(Group, Value) %>%
  group_by(Group) %>%
  mutate(Bin = cut_interval(Value, n = nrow(.))) %>%
  mutate(Order = labels(Bin)) %>%
  ungroup()
# A tibble: 12 x 5
Group Ind Value Bin Order
<chr> <chr> <dbl> <fct> <chr>
1 Grp1 B 0.0583 [0.0583,0.0754] 1
2 Grp1 E 0.0638 [0.0583,0.0754] 2
3 Grp1 G 0.0640 [0.0583,0.0754] 3
4 Grp1 A 0.156 (0.144,0.161] 4
5 Grp1 C 0.181 (0.178,0.195] 5
6 Grp1 F 0.214 (0.212,0.229] 6
7 Grp1 D 0.264 (0.246,0.264] 7
8 Grp2 E 0.0809 [0.0809,0.102] 1
9 Grp2 D 0.146 (0.144,0.165] 2
10 Grp2 B 0.161 (0.144,0.165] 3
11 Grp2 C 0.279 (0.27,0.291] 4
12 Grp2 A 0.334 (0.312,0.334] 5
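All three pipelines pass n = nrow(.) to cut_interval(). A small sketch with a hypothetical toy tibble d shows what nrow(.) evaluates to inside a grouped mutate() when using the magrittr pipe, next to dplyr's n() (the current group size):

```r
library(dplyr)

# Hypothetical toy data: group "a" has 3 rows, group "b" has 2
d <- tibble(g = c("a", "a", "a", "b", "b"), v = 1:5)

res <- d %>%
  group_by(g) %>%
  mutate(full = nrow(.),  # `.` is the whole piped data frame: 5 rows
         grp  = n()) %>%  # n() is the size of the current group
  ungroup()
res
```

If the same holds for the question's code, each group was cut into nrow(df) = 12 intervals. That agrees with the printed bins: for Grp1, (0.264 − 0.0583) / 12 ≈ 0.0171 and 0.0583 + 0.0171 ≈ 0.0754, matching [0.0583,0.0754].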
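So, first: why doesn't fct_reorder do what I want? Second: why are there 7 values in "Grp1" and 5 in "Grp2"? Since some "Bin" values repeat within each group, shouldn't there be only 5 and 4, respectively?
Comments: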
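To put numbers on the second question, here is a minimal dplyr sketch that counts rows versus distinct bins per group. The bin labels are copied by hand from the sorted output above into a hypothetical tibble bins, so cut_interval() is not re-run:

```r
library(dplyr)

# Bin labels copied from the sorted output above (hypothetical helper tibble)
bins <- tibble(
  Group = rep(c("Grp1", "Grp2"), times = c(7, 5)),
  Bin = c("[0.0583,0.0754]", "[0.0583,0.0754]", "[0.0583,0.0754]",
          "(0.144,0.161]", "(0.178,0.195]", "(0.212,0.229]", "(0.246,0.264]",
          "[0.0809,0.102]", "(0.144,0.165]", "(0.144,0.165]",
          "(0.27,0.291]", "(0.312,0.334]")
)

counts <- bins %>%
  group_by(Group) %>%
  summarise(rows = n(), distinct_bins = n_distinct(Bin))
counts
```

This gives 7 and 5 rows, but only 5 and 4 distinct bins, which are the two pairs of numbers being compared in the question.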