Key ordering vs. ordering of original columns with gather()

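[Question title]: Key ordering vs. ordering of original columns with gather()
[Posted]: 2016-04-13 08:20:06
[Question description]:

Does the key ordering depend on whether I list the columns to gather before or after the columns to exclude?

Here is my data frame: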

library(tidyr)
wide_df <- data.frame(c("a", "b"), c("oh", "ah"), c("bla", "ble"), stringsAsFactors = FALSE)
colnames(wide_df) <- c("first", "second", "third")
wide_df

  first second third
1     a     oh   bla
2     b     ah   ble

First, I gather all columns in a specific order, and my order is respected in the key listing as second, first, even though the columns are actually ordered first, second:

long_01_df <- gather(wide_df, my_key, my_value, second, first, third)
long_01_df

  my_key my_value
1 second       oh
2 second       ah
3  first        a
4  first        b
5  third      bla
6  third      ble

Then I decided to exclude one column from the gathering:

long_02_df <- gather(wide_df, my_key, my_value, second, first, -third)
long_02_df

  third my_key my_value
1   bla second       oh
2   ble second       ah
3   bla  first        a
4   ble  first        b

The keys are again ordered second, first. Then I wrote it like this, believing it would do the same thing:

long_03_df <- gather(wide_df, my_key, my_value, -third, second, first)
long_03_df

Instead, I got the keys ordered according to the actual column order in the original data.frame:

  third my_key my_value
1   bla  first        a
2   ble  first        b
3   bla second       oh
4   ble second       ah

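This behavior does not change even when I call the function with factor_key = TRUE. What am I missing?

[Comments]:

Interesting. It seems that exclusions should come at the tail. The same applies to dplyr::select(iris[, 1:3], -Sepal.Length, Petal.Length, Sepal.Width).

Tags: r dataframe tidyr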

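[Solution 1]:

Summary

The reason for this is that you cannot mix negative and positive indices. (You also should not: it simply makes no sense.) If you do, gather() will ignore some of the indices.

Detailed answer

Standard indexing does not let you mix positive and negative indices either: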

x <- 1:10
x[c(4, -2)]
## Error in x[c(4, -2)] : only 0's may be mixed with negative subscripts

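This makes sense: indexing with 4 tells R to keep only the fourth element. There is no need to tell it explicitly to also drop the second element.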
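
As a small illustration of the point above, the following base R sketch shows each index sign used on its own, and captures the error from mixing them instead of stopping the script. The variable names (keep_fourth, drop_second, mixed) are made up for this example, and the exact wording of the error message can differ between R versions.

```r
x <- 1:10

# Positive index: keep only the fourth element
keep_fourth <- x[4]

# Negative index: keep everything except the second element
drop_second <- x[-2]

# Mixing both signs is an error; tryCatch() returns the message as a string
mixed <- tryCatch(x[c(4, -2)], error = function(e) conditionMessage(e))

print(keep_fourth)
print(drop_second)
print(mixed)
```

Either sign alone fully describes the selection, which is why base R refuses the mixed form.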

According to the documentation of gather(), selecting columns works the same way as in dplyr's select(). So let's play with that. I will use a subset of mtcars:

mtcars <- mtcars[1:2, 1:5]
mtcars
##                mpg cyl disp  hp drat
## Mazda RX4     21.0   6  160 110 3.90
## Mazda RX4 Wag 21.0   6  160 110 3.90

You can use positive and negative indices with select():

library(dplyr)
select(mtcars, mpg, cyl)
##              mpg cyl
## Mazda RX4      21   6
## Mazda RX4 Wag  21   6

select(mtcars, -mpg, -cyl)
##               disp  hp drat
## Mazda RX4      160 110  3.9
## Mazda RX4 Wag  160 110  3.9

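For select() as well, mixing positive and negative indices makes no sense. But instead of throwing an error, select() seems to ignore every index whose sign differs from that of the first one: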

select(mtcars, mpg, -hp, cyl)
##               mpg cyl
## Mazda RX4      21   6
## Mazda RX4 Wag  21   6

select(mtcars, -mpg, hp, -cyl)
##               disp  hp drat
## Mazda RX4      160 110  3.9
## Mazda RX4 Wag  160 110  3.9

As you can see, the results are exactly the same as before.

In your gather() example, you use the following two lines:

long_02_df <- gather(wide_df, my_key, my_value, second, first, -third)
long_03_df <- gather(wide_df, my_key, my_value, -third, second, first)

According to what I have shown above, these lines are equivalent to the following:

long_02_df <- gather(wide_df, my_key, my_value, second, first)
long_03_df <- gather(wide_df, my_key, my_value, -third)

Note that nothing in the second line indicates your preferred key order. It only says that third should be omitted.
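
If the keys should still come out as second, first after excluding third, one option is to reorder the result afterwards. The sketch below is only an illustration and not part of the original answer: it rebuilds the long_03_df output shown in the question by hand, so it runs without tidyr, and key_order and reordered are hypothetical names.

```r
# Hand-built copy of the long_03_df output shown in the question
long_03_df <- data.frame(
  third    = c("bla", "ble", "bla", "ble"),
  my_key   = c("first", "first", "second", "second"),
  my_value = c("a", "b", "oh", "ah"),
  stringsAsFactors = FALSE
)

# Hypothetical preferred key order
key_order <- c("second", "first")

# Turn the key into a factor with the desired level order, then sort by it;
# order() leaves tied rows in their original relative order
long_03_df$my_key <- factor(long_03_df$my_key, levels = key_order)
reordered <- long_03_df[order(long_03_df$my_key), ]
rownames(reordered) <- NULL

print(reordered)
```

Listing the columns positively, as in gather(wide_df, my_key, my_value, second, first), remains the more direct way to state the order up front.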
