【Question Title】: Replicate Excel plot with ggplot2 in R [duplicate]
【Posted】: 2021-09-10 22:44:49
【Question Description】:

This is a bit of a random question. I'm currently plotting some data to get a basic feel for it, and I did this in Excel for a quick look.

I've managed to get some nice data from here. In Excel, I created this plot with a pivot table that sums the male count.

But when I try to recreate it in R with ggplot2, I get stuck.

I think I just can't work out how Excel sums MaleCount, so I can't replicate it in R. This is the plot I get in R, without any summing:

Here's the code used to create it:

ggplot(data = df,
       aes(x = AgeBand,      # Axis (Categories); refer to columns by name, not df$
           y = MaleCount)) + # This should be summed somehow.
  geom_line(aes(colour = factor(HealthBoard))) + # Legend
  ggtitle("I have no idea")
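Excel's pivot table adds up every row that shares the same AgeBand and health board before plotting, whereas geom_line() above draws each raw row, so when several rows share an AgeBand within one board the line jumps vertically between them. A minimal base R sketch of that aggregation step, on hypothetical toy data and assuming OrgCode is the health-board column (the posted data has no HealthBoard column):

```r
# Hypothetical toy data standing in for the question's df (not the real values)
df <- data.frame(
  OrgCode   = c("7A6", "7A6", "7A6", "7A1", "7A1"),
  AgeBand   = c(23, 23, 24, 23, 24),
  MaleCount = c(16, 37, 10, 5, 8)
)

# Equivalent of a pivot table with OrgCode/AgeBand as rows and
# "Sum of MaleCount" as the value: one total per combination
totals <- aggregate(MaleCount ~ OrgCode + AgeBand, data = df, FUN = sum)
totals
```

Note that the formula method of aggregate() drops rows with a missing AgeBand by default (na.action = na.omit), which matters here because the posted sample contains an NA.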

Let me know if you have any other questions.
Thanks in advance, Mark.

Edit: structure added below.

tibble [50 x 11] (S3: tbl_df/tbl/data.frame)
 $ Period            : num [1:50] 202004 202004 202004 202004 202004 ...
 $ PracticeCode      : chr [1:50] "W96016" "W95001" "W93021" "W91054" ...
 $ PostCode          : chr [1:50] "NP8 1AG" "CF44 7DD" "NP16 5XR" "LL12 7TH" ...
 $ OrgCode           : chr [1:50] "7A7" "7A5" "7A6" "7A1" ...
 $ AgeBand           : num [1:50] 8 24 11 14 68 24 4 56 85 17 ...
 $ MaleCount         : num [1:50] 37 94 49 41 28 53 16 20 4 40 ...
 $ FemaleCount       : num [1:50] 41 98 41 31 28 64 20 14 7 50 ...
 $ IndeterminateCount: num [1:50] 0 0 0 0 0 0 0 0 0 0 ...
 $ Count             : num [1:50] 78 192 90 72 56 117 36 34 11 90 ...
 $ Year              : num [1:50] 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
 $ Month             : chr [1:50] "April" "April" "April" "April" ...

dput added below.

structure(list(Period = c(202004, 202004, 202004, 202004, 202004, 
202004, 202004, 202004, 202004, 202004, 202004, 202004, 202004, 
202004, 202004, 202004, 202004, 202004, 202004, 202004, 202004, 
202004, 202004, 202004, 202004, 202004, 202004, 202004, 202004, 
202004, 202004, 202004, 202004, 202004, 202004, 202004, 202004, 
202004, 202004, 202004, 202004, 202004, 202004, 202004, 202004, 
202004, 202004, 202004, 202004, 202004), PracticeCode = c("W95023", 
"W95086", "W91015", "W93045", "W93125", "W97623", "W95073", "W95042", 
"W94017", "W97025", "W95016", "W92048", "W98033", "W94018", "W93116", 
"W93059", "W94035", "W93046", "W92058", "W97016", "W94021", "W98048", 
"W94026", "W97069", "W98012", "W92052", "W93072", "W91044", "W96015", 
"W97060", "W97008", "W94609", "W91038", "W97010", "W92023", "W97067", 
"W93049", "W97028", "W91058", "W97048", "W92023", "W93061", "W91610", 
"W94007", "W95034", "W95024", "W93075", "W95032", "W95087", "W93029"
), PostCode = c("CF48 1BZ", "CF48 3AL", "CH5 3PA", "NP20 6EY", 
"NP18 2JB", "CF5 5LQ", "CF83 3JZ", "CF45 4YB", "LL55 4SU", "CF14 3NB", 
"CF44 6HY", "SA14 8TU", "SA3 5UA", "LL30 3EU", "NP10 8UX", "NP11 6BJ", 
"LL23 7BA", "NP20 4JS", "SA62 6SS", "CF11 9SH", "LL52 0RR", "SA10 6UF", 
"LL65 1RA", "CF3 0SH", "SA4 3ED", "SA15 3BD", "NP25 3PL", "CH7 4RQ", 
"SY16 1EF", "CF24 1AG", "CF23 9PN", "LL54 6NN", "LL22 8LJ", "CF23 8SQ", 
"SA34 0AJ", "CF11 9DG", "NP19 7DQ", "CF14 1LT", "LL13 8RG", "CF24 2HB", 
"SA34 0AJ", "NP10 9DU", "LL12 9LG", "LL36 9HL", "CF33 4LD", "CF37 2DR", 
"NP13 1BQ", "CF46 5HE", "CF44 7AY", "NP44 4TA"), OrgCode = c("7A5", 
"7A5", "7A1", "7A6", "7A6", "7A4", "7A6", "7A5", "7A1", "7A4", 
"7A5", "7A2", "7A3", "7A1", "7A6", "7A6", "7A1", "7A6", "7A2", 
"7A4", "7A1", "7A3", "7A1", "7A4", "7A3", "7A2", "7A6", "7A1", 
"7A7", "7A4", "7A4", "7A1", "7A1", "7A4", "7A2", "7A4", "7A6", 
"7A4", "7A1", "7A4", "7A2", "7A6", "7A1", "7A1", "7A5", "7A5", 
"7A6", "7A5", "7A5", "7A6"), AgeBand = c(87, 31, 44, 53, 23, 
91, 24, 12, 93, 83, 26, 38, 92, 47, NA, 23, 27, 80, 93, 2, 46, 
82, 11, 45, 72, 18, 26, 54, 89, 71, 30, 27, 18, 37, 50, 4, 8, 
51, 59, 8, 4, 64, 92, 13, 88, 85, 78, 56, 45, 44), MaleCount = c(12, 
153, 52, 59, 16, 0, 10, 39, 1, 9, 33, 33, 13, 44, 3, 37, 31, 
15, 0, 17, 18, 8, 39, 24, 143, 84, 24, 23, 6, 30, 129, 21, 61, 
72, 55, 23, 86, 68, 82, 81, 42, 57, 0, 23, 12, 24, 27, 43, 18, 
63), FemaleCount = c(14, 133, 73, 62, 22, 1, 18, 36, 3, 10, 36, 
25, 21, 38, 20, 44, 24, 21, 1, 18, 21, 19, 30, 26, 151, 71, 23, 
17, 27, 20, 132, 17, 65, 70, 55, 28, 73, 73, 69, 80, 28, 74, 
2, 25, 24, 27, 24, 33, 33, 64), IndeterminateCount = c(0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0), Count = c(26, 286, 125, 121, 38, 1, 28, 75, 
4, 19, 69, 58, 34, 82, 23, 81, 55, 36, 1, 35, 39, 27, 69, 50, 
294, 155, 47, 40, 33, 50, 261, 38, 126, 142, 110, 51, 159, 141, 
151, 161, 70, 131, 2, 48, 36, 51, 51, 76, 51, 127), Year = c(2020, 
2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 
2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 
2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 
2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 
2020, 2020, 2020, 2020, 2020), Month = c("April", "April", "April", 
"April", "April", "April", "April", "April", "April", "April", 
"April", "April", "April", "April", "April", "April", "April", 
"April", "April", "April", "April", "April", "April", "April", 
"April", "April", "April", "April", "April", "April", "April", 
"April", "April", "April", "April", "April", "April", "April", 
"April", "April", "April", "April", "April", "April", "April", 
"April", "April", "April", "April", "April")), row.names = c(NA, 
-50L), class = c("tbl_df", "tbl", "data.frame"))

【Question Discussion】:

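  • If you add your data structure (via dput(df)), we can take a look.
  • The data has 39,368 obs. of 11 variables. Would that be too big to post?
  • You can use dput(sample_n(df,200)) to get the structure of a random sample of 200 rows from your data.
  • Error in factor(HealthBoard): object 'HealthBoard' not found. Running your code, there is no such column in your data. How did you get the colours from that factor?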
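In current dplyr, sample_n() still works but is superseded by slice_sample(). A short sketch of sharing a reproducible random sample, using a hypothetical stand-in data frame since the full dataset isn't available here:

```r
library(dplyr)

# Hypothetical stand-in for the full dataset (the real one has 39,368 rows)
set.seed(42)                                 # fixes both the toy data and the sample
df <- data.frame(id = 1:1000, MaleCount = rpois(1000, 30))

small <- slice_sample(df, n = 200)           # modern equivalent of sample_n(df, 200)
dput(small)                                  # paste this output into the question
```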

Tags: r ggplot2


【Solution 1】:

Based on your comments, I renamed OrgCode to HealthBoard.

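library(dplyr)
library(ggplot2)   # needed for ggplot(), geom_line() and ggtitle()

df %>%
  rename(HealthBoard = OrgCode) %>%                             # OrgCode holds the health board codes
  group_by(HealthBoard, AgeBand) %>%
  summarise(MaleCount = sum(MaleCount), .groups = 'drop') %>%   # one total per board and age band, like the pivot table
  ggplot(aes(x = AgeBand, y = MaleCount, color = HealthBoard)) +
  geom_line() +
  ggtitle('You have some idea now.')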
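ggplot2 can also do the summing itself: stat_summary() with fun = sum collapses rows that share an x value within each colour group before drawing. A sketch on hypothetical toy data (the fun argument needs ggplot2 3.3.0 or later), offered as an alternative to the dplyr pipeline rather than a replacement:

```r
library(ggplot2)

# Hypothetical toy data; OrgCode plays the role of HealthBoard
df <- data.frame(
  OrgCode   = c("7A6", "7A6", "7A6", "7A1", "7A1"),
  AgeBand   = c(23, 23, 24, 23, 24),
  MaleCount = c(16, 37, 10, 5, 8)
)

p <- ggplot(df, aes(x = AgeBand, y = MaleCount, colour = OrgCode)) +
  # sums MaleCount for each AgeBand within each colour group, then draws lines
  stat_summary(fun = sum, geom = "line") +
  labs(colour = "HealthBoard")
print(p)
```

Pre-aggregating with dplyr keeps the summed table available for inspection; stat_summary() is more compact when only the plot is needed.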

Output:

【Discussion】:

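  • Hey @Samet, I tried this on my cleaned dataset and it looks much closer. I think you're right about the summarise part! Thanks a lot for your time and help. imgur.com/a/nhhiZL9
  • @M4rk3h11 You're welcome. If you're satisfied with it, please consider accepting and upvoting.
  • I'm still below 15 rep, but I've marked your post as the correct answer for me. :)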