【Question title】: Why does the sorted dataframe rearrange when plotting?
【Posted】: 2020-04-26 18:39:05
【Question】:

Initially the data frame df was sorted, treating ID as a string I assume, but with a function that is able to order alphanumeric vectors:

df <- df[gtools::mixedorder(as.character(df$ID)), ]

When I create a bar plot, the x-axis order reverts to 1 10a 10b 11, even though I explicitly changed the order to 1 2 3 4 5.

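【Question comments】:

As characters, 1 comes first, then 10, then 11, because the values are not numeric. If you simply assign a numeric value to each ID, use the reorder() function on the x value in the first aes(). For example, x = reorder(ID, numericID); the bars will then be ordered the way you are looking for.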
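
To make the comment above concrete, here is a minimal base R sketch. The vector `ids` and the helper column `numeric_id` are made up for illustration. It shows that the default factor levels come from sorting the strings, and that `reorder()` replaces them with an order driven by a numeric value.

```r
# Made-up IDs, already in the desired (natural) order
ids <- c("1", "2", "3", "10a", "10b", "11")
numeric_id <- seq_along(ids)   # hypothetical numeric value per ID

# Default factor levels are sort(unique(ids)): string order,
# regardless of the order in which the values appear
default_levels <- levels(factor(ids))
print(default_levels)

# reorder() orders the levels by the numeric value instead
reordered_levels <- levels(reorder(ids, numeric_id))
print(reordered_levels)
```

Plotting functions that turn a character column into a factor use the first kind of ordering, which is why sorting the rows alone does not change the axis.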

Tags: r ggplot2 alphanumeric

【Solution 1】:

library(tidyverse)

df <- data.frame(ID=(c("1", "2", "3", "4", "5", "10a", "10b", "11")), 
                 y=c(seq(100,500,100), 150, 155, 180), stringsAsFactors = FALSE)

A simple fix for simple data

df$numId<-1:nrow(df)

ggplot(df, aes(x=reorder(ID,numId), y = y)) +
  geom_col() +
  labs(x='ID', y='Value')

Result

Create a function that generates the numeric value

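# Convert a single ID such as "10a" into a number (10.01) for sorting
create_id <- function(x) {
  if (!grepl('[a-z]', x, ignore.case = TRUE)) {
    # purely numeric ID
    return(as.numeric(x))
  } else {
    # letter suffix becomes a small offset: a = 0.01, b = 0.02, ...
    letter <- tolower(gsub('[0-9]+', "", x))
    letter_value <- which(letters == letter) / 100
    # ignore.case so an upper-case suffix is stripped as well
    number <- as.numeric(gsub('[a-z]', "", x, ignore.case = TRUE)) + letter_value
    return(number)
  }
}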

# create_id() handles one value at a time, so apply it element by element
df <- df %>%
  mutate(nid = round(vapply(ID, create_id, numeric(1), USE.NAMES = FALSE), 3))

ggplot(df, aes(x=reorder(ID,nid), y = y)) +
  geom_col() +
  labs(x='ID', y='Value')

Result
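
The idea of turning each ID into a number (the digits plus a small offset for the letter) can also be written so that it works on a whole vector at once. The following is only a sketch under the assumption that every ID is a run of digits followed by at most one letter; `make_sort_key` is a hypothetical name, not part of any package.

```r
# Vectorized numeric sort key for IDs such as "10a"
# (assumes digits followed by at most one letter)
make_sort_key <- function(x) {
  x <- as.character(x)
  number <- as.numeric(sub("[A-Za-z]+$", "", x))  # leading digits
  letter <- tolower(sub("^[0-9]+", "", x))         # trailing letter, or ""
  # position in the alphabet (a = 1, b = 2, ...), 0 when there is no letter
  offset <- match(letter, letters, nomatch = 0L) / 100
  number + offset
}

ids <- c("11", "10b", "2", "10a", "1")   # made-up, deliberately unsorted
key <- make_sort_key(ids)
sorted_ids <- ids[order(key)]
print(sorted_ids)
```

A key like this could be passed as the second argument of `reorder()` inside `aes()`, in the same way as the numeric columns used above.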

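Thanks to @user12728748 for their answer and for providing the data frame code. My answer is only meant to address the ggplot2 tag on the question. That answer is equally suitable.

【Comments】:

【Solution 2】: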

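You appear to be working with a factor, or with a character vector that is coerced to a factor (with alphabetically sorted levels) when plotting. So convert it to a factor if it is not one already, and reorder the factor levels rather than sorting the data.frame by ID: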

df <- data.frame(a=factor(as.character(c(1, 2, 3, 10, 11, 20, 21, 22))))
df <- data.frame(ID=factor(c("1", "2", "3", "4", "5", "10a", "10b", "11")), 
                 y=c(seq(100,500,100), 150, 155, 180))
df <- df[order(df$ID), ]
df$ID
#> [1] 1   10a 10b 11  2   3   4   5  
#> Levels: 1 10a 10b 11 2 3 4 5
df <- df[gtools::mixedorder(as.character(df$ID)),]
df$ID
#> [1] 1   2   3   4   5   10a 10b 11 
#> Levels: 1 10a 10b 11 2 3 4 5
barplot(y~ID, data=df)

df$ID <- factor(df$ID, levels=levels(df$ID)[gtools::mixedorder(levels(df$ID))])
barplot(y~ID, data=df)

Created on 2020-04-26 by the reprex package (v0.3.0)

Edited the factor releveling to fix the error it had introduced.
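
If the data frame has already been put in the desired row order, one way to carry that order over to the plot is to build the factor levels from the order of appearance. This is a small base R sketch with made-up data; the hand-written row index only stands in for whatever ordering function is actually used.

```r
# Made-up data in the same shape as the example above
df <- data.frame(ID = c("1", "10a", "10b", "11", "2", "3"),
                 y  = c(100, 150, 155, 180, 200, 300),
                 stringsAsFactors = FALSE)

# Put the rows in the desired order; this index is a stand-in for
# an ordering such as gtools::mixedorder(df$ID)
df <- df[c(1, 5, 6, 2, 3, 4), ]

# Freeze the current row order into the factor levels
df$ID <- factor(df$ID, levels = unique(df$ID))
id_levels <- levels(df$ID)
print(id_levels)
```

Because only the levels are set, each y value stays attached to its own ID.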

【Comments】:

levels(df$ID) <- levels(df$ID)[gtools::mixedorder(levels(df$ID))] This sorted the ID vector, but it completely rearranged the Freq column.
@Ajaff - You are right, apologies for the error. I have fixed the code in my edit.
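
The difference behind this exchange can be seen in a small base R sketch (the vectors are made up for illustration). Assigning to `levels()` renames the existing levels position by position, so observations end up with different labels, whereas `factor(x, levels = ...)` only changes the order of the levels.

```r
# Made-up factor with its levels in string order
ids <- factor(c("1", "10a", "2"), levels = c("1", "10a", "2"))
new_order <- c("1", "2", "10a")   # desired level order

# (a) Assigning to levels() renames levels position by position,
#     so the second observation is relabelled from "10a" to "2"
relabelled <- ids
levels(relabelled) <- new_order
relabelled_chr <- as.character(relabelled)
print(relabelled_chr)

# (b) factor(..., levels = ) reorders the levels only;
#     every observation keeps its original label
releveled <- factor(ids, levels = new_order)
releveled_chr <- as.character(releveled)
print(releveled_chr)
```

With (a), the other columns appear rearranged because the IDs next to them have changed; with (b), the pairing between ID and value is untouched.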