【Question title】:R: Create plot with mean from dataframe
【Posted】:2016-02-26 05:43:38
【Question】:

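I am new to R.

I have a data frame with values in columns 3 to 6 that I want to plot as a dot plot. Columns 3 to 6 each represent a month, and rows 1 to 30 represent the day of the month. The numbers in the data frame are temperatures.

I want a plot with temperature on the y-axis and month on the x-axis. The plot would show a dot for each temperature, plus a line running through it that lets you follow each month's mean temperature.

Some temperatures are identical, though, so I would like to add a very small value to one of them so that you can see many dots at the most common temperatures.

I tried: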

boxplot(dat3[,3:6],dat3=mean, geom="point", shape=18,
        size=3, color="red")

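However, this does not draw a line between the means; it plots the temperatures as bars instead. I only want the dots and a line between the means.

Is this possible?

Thanks, everyone.

【Question comments】:

  • Could you provide a small dataset? stackoverflow.com/help/mcve
  • For quickly plotting data frames, I suggest looking at ggplot2. It includes functions for scatter plots, line plots, and combinations of the two, as well as for adding jitter and computing means.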
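
As a rough sketch of what the comment above describes, the following uses ggplot2's `geom_jitter()` for the overlapping temperatures and `stat_summary()` to compute and connect the monthly means. The data frame `temps` and its randomly generated values are made up for illustration and are not the asker's data; the `fun =` argument assumes ggplot2 3.3.0 or later (older versions used `fun.y =`).

```r
library(ggplot2)

# Hypothetical long-format data: one row per day, with a month label and a temperature
set.seed(1)
temps <- data.frame(
  month = factor(rep(c("Mar", "Apr", "May", "Jun"), each = 30),
                 levels = c("Mar", "Apr", "May", "Jun")),
  temperature = c(round(rnorm(30, 45, 5)), round(rnorm(30, 52, 5)),
                  round(rnorm(30, 60, 5)), round(rnorm(30, 68, 5)))
)

p <- ggplot(temps, aes(x = month, y = temperature)) +
  # small horizontal jitter so identical temperatures do not sit on top of each other
  geom_jitter(width = 0.15, height = 0, alpha = 0.6) +
  # red point at each monthly mean
  stat_summary(fun = mean, geom = "point", colour = "red", size = 3) +
  # line joining the monthly means; group = 1 connects points across the discrete x-axis
  stat_summary(fun = mean, geom = "line", aes(group = 1), colour = "red")

print(p)
```

Jittering only horizontally (`height = 0`) keeps every dot at its true temperature, which is probably preferable to adding a small value to the temperatures themselves.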

Tags: r plot


【Solution 1】:

I made a small (and unrealistic) data frame, but you can substitute your own data.

df <- structure(list(Month = structure(1:4, .Label = c("April", "May", 
"June", "July"), class = "factor"), X1 = c(50, 55, 57, 68), X2 = c(60, 
66, 68.4, 81.6), X3 = c(65, 71.5, 74.1, 88.4), X4 = c(40, 44, 
45.6, 54.4), X5 = c(50, 55, 57, 68), X6 = c(60, 66, 68.4, 81.6
), X7 = c(65, 71.5, 74.1, 88.4), X8 = c(40, 44, 45.6, 54.4), 
    X9 = c(50, 55, 57, 68), X10 = c(60, 66, 68.4, 81.6), X11 = c(65, 
    71.5, 74.1, 88.4), X12 = c(40, 44, 45.6, 54.4), X13 = c(50, 
    55, 57, 68), X14 = c(60, 66, 68.4, 81.6), X15 = c(65, 71.5, 
    74.1, 88.4), X16 = c(40, 44, 45.6, 54.4), X17 = c(50, 55, 
    57, 68), X18 = c(60, 66, 68.4, 81.6), X19 = c(65, 71.5, 74.1, 
    88.4), X20 = c(40, 44, 45.6, 54.4), X21 = c(50, 55, 57, 68
    ), X22 = c(60, 66, 68.4, 81.6), X23 = c(65, 71.5, 74.1, 88.4
    ), X24 = c(40, 44, 45.6, 54.4), X25 = c(50, 55, 57, 68), 
    X26 = c(60, 66, 68.4, 81.6), X27 = c(65, 71.5, 74.1, 88.4
    ), X28 = c(40, 44, 45.6, 54.4), X29 = c(50, 55, 57, 68), 
    X30 = c(50, 55, 57, 68)), .Names = c("Month", "X1", "X2", 
"X3", "X4", "X5", "X6", "X7", "X8", "X9", "X10", "X11", "X12", 
"X13", "X14", "X15", "X16", "X17", "X18", "X19", "X20", "X21", 
"X22", "X23", "X24", "X25", "X26", "X27", "X28", "X29", "X30"
), row.names = c(NA, -4L), class = "data.frame")

After a bit of cleanup there are several ways to plot the data; here is one:

library(dplyr)
library(reshape2)   # provides melt()
library(stringr)    # provides str_replace_all()
df$Month <- factor(df$Month, levels = c("April", "May", "June", "July"))    # changed the order from alphabetical
df.m <- melt(df, id.vars = "Month")                        # melted the data frame into long format
# remove the X before dates; converting to integer keeps the days ordered 1..30 instead of "1", "10", "11", ...
df.m$variable <- as.integer(str_replace_all(string = as.character(df.m$variable), pattern = "X", replacement = ""))

avg.temp <- df.m %>% group_by(Month) %>% summarise(avg = mean(value))       # calculated the monthly mean for plotting

library(ggplot2)
ggplot(df.m, aes(x = factor(variable), y = value)) +
  geom_point() +
  geom_point(data = avg.temp, aes(x = 15, y = avg), size = 7, color = "red") +
  facet_wrap(~Month) +
  theme_bw() +
  labs(x = "Days of the Month", y = "Temperature (F)", title = "Distribution of Temperatures -- Monthly Mean in Red")

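【Comments】:

  • Wow, cool answer, impressive! Would it be possible to make it like a dot plot, with the months on the x-axis and the different temperatures on the y-axis? And with all the days belonging to just one month column, so that I can draw a line between the means.
  • Yes, that is certainly possible. Why don't you take my code and your data and work it out? SO is not a coding service; we try to answer specific coding questions that people submit. By the way, if this answered your question, even without the further refinement you were hoping for, please consider accepting it by clicking the accept mark.
【Solution 2】:

A solution using ggplot2 (for plotting), tidyr (for reshaping the table into a data frame that is easier to work with), and dplyr (for manipulating the data frame):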

df <- structure(list(Jan = c(50L, 60L, 65L, 40L, 50L, 60L, 65L, 40L,
50L, 60L, 65L, 40L, 50L, 60L, 65L, 40L, 50L, 60L, 65L, 40L, 50L,
60L, 65L, 40L, 50L, 60L, 65L, 40L, 50L, 50L), Feb = c(50L, 60L,
65L, 40L, 50L, 60L, 65L, 40L, 50L, 60L, 65L, 40L, 50L, 60L, 65L,
40L, 50L, 60L, 65L, 40L, 50L, 60L, 65L, 40L, 50L, 60L, 65L, 40L,
50L, 50L), Mar = c(50L, 60L, 65L, 40L, 50L, 60L, 65L, 40L, 50L,
60L, 65L, 40L, 50L, 60L, 65L, 40L, 50L, 60L, 65L, 40L, 50L, 60L,
65L, 40L, 50L, 60L, 65L, 40L, 50L, 50L), Apr = c(50L, 60L, 65L,
40L, 50L, 60L, 65L, 40L, 50L, 60L, 65L, 40L, 50L, 60L, 65L, 40L,
50L, 60L, 65L, 40L, 50L, 60L, 65L, 40L, 50L, 60L, 65L, 40L, 50L,
50L), May = c(50L, 60L, 65L, 40L, 50L, 60L, 65L, 40L, 50L, 60L,
65L, 40L, 50L, 60L, 65L, 40L, 50L, 60L, 65L, 40L, 50L, 60L, 65L,
40L, 50L, 60L, 65L, 40L, 50L, 50L), Jun = c(55L, 66L, 71L, 44L,
55L, 66L, 71L, 44L, 55L, 66L, 71L, 44L, 55L, 66L, 71L, 44L, 55L,
66L, 71L, 44L, 55L, 66L, 71L, 44L, 55L, 66L, 71L, 44L, 55L, 55L
), Jul = c(57L, 68L, 74L, 45L, 57L, 68L, 74L, 45L, 57L, 68L,
74L, 45L, 57L, 68L, 74L, 45L, 57L, 68L, 74L, 45L, 57L, 68L, 74L,
45L, 57L, 68L, 74L, 45L, 57L, 57L), Aug = c(68L, 81L, 88L, 54L,
68L, 81L, 88L, 54L, 68L, 81L, 88L, 54L, 68L, 81L, 88L, 54L, 68L,
81L, 88L, 54L, 68L, 81L, 88L, 54L, 68L, 81L, 88L, 54L, 68L, 68L
)), .Names = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul",
"Aug"), class = "data.frame", row.names = c(NA, -30L))

library(ggplot2)
library(tidyr)
library(dplyr)

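df.temps <- df %>% select(Mar:Jun) %>% gather(month, temperature) %>%
  mutate(month = factor(month, levels = c("Mar", "Apr", "May", "Jun")))   # calendar order instead of alphabetical
df.avg <- df.temps %>% group_by(month) %>% summarise(average=mean(temperature))

ggplot() +
  geom_point(data=df.temps, aes(x=temperature, y=month), position=position_jitter(width=1, height=0)) +
  geom_point(data=df.avg, aes(x=average, y=month), color="red", size=3) +
  # geom_path connects the means in month order; geom_line would connect them in order of x (the average)
  geom_path(data=df.avg, aes(x=average, y=month, group=1)) +
  labs(x = "Temperature (in F)", y = "Month")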
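
The follow-up comment under Solution 1 asked for months on the x-axis, temperatures on the y-axis, and a line between the means. A minimal sketch of that layout, following the same reshape-then-summarise approach, might look like the code below. The data frame `wide` and its values are invented for illustration, and `pivot_longer()` is assumed to be available (tidyr 1.0.0 or later) as the newer counterpart of `gather()`.

```r
library(ggplot2)
library(tidyr)
library(dplyr)

# Small made-up wide data frame: one column per month, one row per day
wide <- data.frame(
  Mar = c(50, 60, 65, 40, 50, 60),
  Apr = c(52, 61, 66, 43, 52, 61),
  May = c(55, 66, 71, 44, 55, 66),
  Jun = c(57, 68, 74, 45, 57, 68)
)

# Long format, with months kept in column (calendar) order instead of alphabetical order
long <- wide %>%
  pivot_longer(everything(), names_to = "month", values_to = "temperature") %>%
  mutate(month = factor(month, levels = names(wide)))

# One mean per month
avg <- long %>%
  group_by(month) %>%
  summarise(average = mean(temperature))

p <- ggplot(long, aes(x = month, y = temperature)) +
  # jitter horizontally only, so repeated temperatures become visible
  geom_point(position = position_jitter(width = 0.1, height = 0)) +
  # monthly means as red points, joined by a line across the discrete x-axis
  geom_point(data = avg, aes(y = average), colour = "red", size = 3) +
  geom_line(data = avg, aes(y = average, group = 1), colour = "red")

print(p)
```

Here `geom_line()` is appropriate because the months are on the x-axis, so connecting the means in x order is the same as connecting them in month order.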

【Comments】:
