Scale on y-axis is not scaled correctly on geom_col (answers)

【Question title】: Scale on y-axis is not scaled correctly on geom_col
【Posted】: 2021-12-05 18:43:50
【Question description】:

Output:

summary(EIS_V_Sub_Year$MN_Salary_2020)
Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  34183   53656   78712   92298  127715  197588

My code:

EIS_V_Sub_Year <- EIS_V%>%filter(Year==2020)%>%group_by(Education)


EIS_V_Sub_Year%>%
  ggplot(aes(x=Education, y=MN_Salary_2020, fill=Education, na.rm=TRUE))+geom_col()+
  theme(axis.text.x=element_text(angle=45, hjust=1))+ 
  labs(title = "Average Salary of Full-Time Workers", subtitle="Based on Highest Level of Education in 2020", caption=
         "Source: US Census Bureau",
    x= "Education",     
    y= "Mean Salary in Dollars (USD)")+
  scale_y_continuous(breaks = c(25000, 50000, 75000, 100000, 125000, 150000,175000,200000))

Output:

dput(EIS_V_Year)

structure(list(Year = c(2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 
2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 
2020L, 2020L, 2020L), Gender = c("M", "F", "M", "F", "M", "F", 
"M", "F", "M", "F", "M", "F", "M", "F", "M", "F", "M", "F"), 
    Education = structure(c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 6L, 
    6L, 5L, 5L, 7L, 7L, 8L, 8L, 9L, 9L), .Label = c("No Diploma", 
    "High School Diploma", "Some College", "Associate's Degree", 
    "Bachelor's Degree", "Bachelor's Degree or More", "Master's Degree", 
    "Professional Degree", "Doctorate's Degree"), class = c("ordered", 
    "factor")), MN_Salary_CD = c(44208, 34183, 58025, 43522, 
    68493, 50686, 71347, 52200, 120288, 83143, 103724, 74281, 
    132346, 84937, 197588, 138507, 173694, 130191), MN_Salary_2020 = c(44208, 
    34183, 58025, 43522, 68493, 50686, 71347, 52200, 120288, 
    83143, 103724, 74281, 132346, 84937, 197588, 138507, 173694, 
    130191), MD_Salary_CD = c(44208, 34183, 58025, 43522, 68493, 
    50686, 71347, 52200, 120288, 83143, 103724, 74281, 132346, 
    84937, 197588, 138507, 173694, 130191), MD_Salary_2020 = c(37413, 
    29741, 49661, 36256, 56267, 41400, 61100, 45813, 91515, 67371, 
    81339, 61341, 101130, 71584, 150509, 110717, 131268, 97471
    )), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -18L), groups = structure(list(Education = structure(1:9, .Label = c("No Diploma", 
"High School Diploma", "Some College", "Associate's Degree", 
"Bachelor's Degree", "Bachelor's Degree or More", "Master's Degree", 
"Professional Degree", "Doctorate's Degree"), class = c("ordered", 
"factor")), .rows = structure(list(1:2, 3:4, 5:6, 7:8, 11:12, 
    9:10, 13:14, 15:16, 17:18), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -9L), .drop = TRUE))

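The bars extend beyond the maximum value (197,588).

I'm not sure what is causing this. I haven't run into this problem in any other visualization where I set the scale.

【Comments】:

Can you post the output of dput(EIS_V_Sub_Year)?
Yes, I just added it to the post.

Tags: r r-markdown

【Solution 1】:

As I said in the comments, it would be best to post the output of dput(EIS_V_Sub_Year) so that we can see what your real data look like. In the meantime, take a look at this behaviour: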

df <- data.frame(x=letters[1:5],y=1:5)

ggplot(df,aes(x,y))+
  geom_col()

df2 <- rbind(df,df)

ggplot(df2,aes(x,y))+
  geom_col()

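As you can see, in the second case the y scale doubles, because by default multiple bars that occupy the same x position are stacked on top of each other, even though the maximum of y is the same for df and df2: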

max(df$y)
max(df2$y)
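The doubling can be checked without drawing anything. The following is a minimal base-R sketch that rebuilds the same df and df2 and sums y within each x, which is the height a stacked bar reaches. The name stacked is introduced here only for illustration.

```r
# Same toy data as in the answer
df  <- data.frame(x = letters[1:5], y = 1:5)
df2 <- rbind(df, df)

# Largest single value: unchanged by duplicating the rows
max(df2$y)    # 5

# Height of each stacked bar = sum of y within each x position
stacked <- tapply(df2$y, df2$x, sum)
max(stacked)  # 10
```

The largest single value stays at 5, but the tallest stacked bar reaches 10, so the axis has to extend to cover it.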

You can try adding position = "dodge2" to place bars that occupy the same x position side by side instead of stacking them:

ggplot(df2,aes(x,y))+
  geom_col(position="dodge2")
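One way to confirm the difference between the two positions is to inspect the data ggplot2 computes for the layer. This is a sketch, assuming ggplot2 is installed; p_stack and p_dodge are names made up for this example, and layer_data() is used only to read back the computed bar tops.

```r
library(ggplot2)

df  <- data.frame(x = letters[1:5], y = 1:5)
df2 <- rbind(df, df)

# Default position for geom_col() is "stack"
p_stack <- ggplot(df2, aes(x, y)) + geom_col()
# "dodge2" places the bars next to each other instead
p_dodge <- ggplot(df2, aes(x, y)) + geom_col(position = "dodge2")

# ymax is the top edge of each drawn bar
top_stack <- max(layer_data(p_stack)$ymax)
top_dodge <- max(layer_data(p_dodge)$ymax)
c(stacked = top_stack, dodged = top_dodge)
```

With stacking, the top edge should come out at twice the largest y value. With dodging it should equal the largest y value.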

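Edit

You have to dodge geom_col by Gender. You have two options here: 1) use fill = Gender, if losing the viridis colour scale is not a problem (that scale is redundant anyway, since the variable is already mapped to the x axis); 2) use facet_wrap(~Gender), if you want to keep the colour scale.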

# Option 1: fill by Gender and dodge, so the M and F bars sit side by side
# (na.rm is not an aesthetic, so it is dropped from aes())
EIS_V_Sub_Year %>%
  ggplot(aes(
    x = Education,
    y = MN_Salary_2020,
    fill = Gender
  )) +
  geom_col(position = "dodge") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(
    title = "Average Salary of Full-Time Workers",
    subtitle = "Based on Highest Level of Education in 2020",
    caption = "Source: US Census Bureau",
    x = "Education",
    y = "Mean Salary in Dollars (USD)"
  ) +
  scale_y_continuous(breaks = c(25000, 50000, 75000, 100000, 125000, 150000, 175000, 200000))

# Option 2: keep fill = Education and draw one panel per Gender
# (na.rm is not an aesthetic, so it is dropped from aes())
EIS_V_Sub_Year %>%
  ggplot(aes(
    x = Education,
    y = MN_Salary_2020,
    fill = Education
  )) +
  geom_col(position = "dodge") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(
    title = "Average Salary of Full-Time Workers",
    subtitle = "Based on Highest Level of Education in 2020",
    caption = "Source: US Census Bureau",
    x = "Education",
    y = "Mean Salary in Dollars (USD)"
  ) +
  scale_y_continuous(breaks = c(25000, 50000, 75000, 100000, 125000, 150000, 175000, 200000)) +
  facet_wrap(~Gender)
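The same stacking arithmetic explains the original plot. Below is a base-R sketch that uses the MN_Salary_2020 values copied from the dput() output in the question. The data frame name salary and the vector stacked_height are hypothetical names used only here.

```r
# Values copied from the dput() output in the question, in row order
salary <- data.frame(
  Education = rep(c("No Diploma", "High School Diploma", "Some College",
                    "Associate's Degree", "Bachelor's Degree or More",
                    "Bachelor's Degree", "Master's Degree",
                    "Professional Degree", "Doctorate's Degree"), each = 2),
  Gender = rep(c("M", "F"), times = 9),
  MN_Salary_2020 = c(44208, 34183, 58025, 43522, 68493, 50686,
                     71347, 52200, 120288, 83143, 103724, 74281,
                     132346, 84937, 197588, 138507, 173694, 130191)
)

# Largest single value, as reported by summary()
max(salary$MN_Salary_2020)               # 197588

# Stacked bar height = male value + female value per education level
stacked_height <- tapply(salary$MN_Salary_2020, salary$Education, sum)
stacked_height[["Professional Degree"]]  # 336095
```

Each education level has one row for M and one for F, so the default stacked bar for "Professional Degree" reaches 197588 + 138507 = 336095, well above the largest single value of 197,588.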

【Comments】: