【Question title】: Visualizing multivariate data using aesthetic mapping in ggplot2
【Posted】: 2019-03-21 09:33:26
【Question】:

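I'm trying to make a geom_point plot in ggplot2 from multivariate data, but I'm having trouble color-coding the data and plotting it in an intuitive way. My data is shared below. I'm interested in effort (x axis) versus hair change (y axis), with the points color-coded by hair-loss type (diffuse, frontal/temporal, and/or vertex). The survey is multivariate by nature: patients could endorse more than one hair-loss type (HairType1, HairType2, and/or HairType3). The data for the first 20 participants follows: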

Figure3Data = structure(list(MonthsMassage = c(0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1), 
MinutesPerDayMassage = c("0-10 minutes daily", "0-10 minutes daily", 
"0-10 minutes daily", "0-10 minutes daily", "0-10 minutes daily", 
"0-10 minutes daily", "0-10 minutes daily", "0-10 minutes daily", 
"0-10 minutes daily", "0-10 minutes daily", 
"11-20 minutes daily", "11-20 minutes daily", "11-20 minutes daily", 
"0-10 minutes daily", "0-10 minutes daily", "0-10 minutes daily", 
"0-10 minutes daily", "0-10 minutes daily", "0-10 minutes daily", 
"0-10 minutes daily"), Minutes = c(5, 5, 5, 5, 5, 5, 5, 5, 5, 
5, 15, 15, 15, 5, 5, 5, 5, 5, 5, 5), hairchange = c(-1, -1, 0, 
-1, 0, -1, -1, 0, 0, -1, 0, -1, -1, 0, 0, -1, 0, -1, 0, -1), 
HairType1 = c("Templefrontal", "Templefrontal", "Templefrontal", 
"Templefrontal", "Templefrontal", "Templefrontal", "Templefrontal", 
"other", "Templefrontal", "Templefrontal", "Templefrontal", 
"Templefrontal", "Templefrontal", "Templefrontal", "Templefrontal", 
"Templefrontal", "Templefrontal", "Templefrontal", "Templefrontal", 
"Templefrontal"), HairType2 = c("other", "other", "other", 
"other", "other", "other", "other", "other", "other", "Vertexthinning", 
"Vertexthinning", "other", "Vertexthinning", "other", "other", 
"Vertexthinning", "other", "Vertexthinning", "Vertexthinning", 
"other"), HairType3 = c("other", "Diffusethinning", "other", 
"Diffusethinning", "other", "other", "Diffusethinning", "Diffusethinning", 
"Diffusethinning", "other", "Diffusethinning", "Diffusethinning", 
"other", "other", "Diffusethinning", "Diffusethinning", "other", 
"Diffusethinning", "Diffusethinning", "Diffusethinning"), 
Effort = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2.5, 2.5, 
2.5, 2.5, 2.5, 2.5, 2.5), EffortGroup = c("<5", "<5", "<5", 
"<5", "<5", "<5", "<5", "<5", "<5", "<5", "<5", "<5", "<5", 
"<5", "<5", "<5", "<5", "<5", "<5", "<5")), row.names = c(NA, 
-20L), class = c("tbl_df", "tbl", "data.frame"))

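Because each patient's endorsed hair-loss types are spread over several columns, I can't separate the data visually with code like this: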

ggplot(data, aes(x=Effort, y=hairchange, color  = hairtype????))+geom_point()

If hair-loss type were somehow stored in a single column, this would be easy to visualize.

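So I'd like to know whether there is a way to organize the data so that the 3 hair-loss types can be visualized and color-coded. I've tried melt from reshape2 without any luck. I'd like to avoid creating a fourth category, "multiple types reported", because that would drop many people from the insights I'm after.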

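Suggestions for alternative ways of plotting this data (density or line plots) would also be appreciated. One idea I had is four separate line plots, one per hair-loss type (i.e. average, diffuse, vertex, temporal), with effort on the x axis and mean perceived hair change on the y axis.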

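【Comments】:

  • This is a fairly broad question that seems to be more about "advice" than a specific programming problem. You're in a better position than anyone here to know what kind of figure best shows your conclusions, so you'll probably get better advice if you can be specific about what you want to achieve. That said, you could try color = interaction(HairType1, HairType2, HairType3) to show the different combinations in one chart, or look at the UpSetR package as a way of displaying univariate data across combinations of features.
  • Sorry for the confusion. Specifically, I want to make a geom_dotplot showing effort (x axis) against hair change (y axis), colored by the three hair-loss types. I just want 3 colors for the 3 different hair-loss types, regardless of whether data points overlap (i.e. if a patient endorses 2 or more hair-loss types, just plot the point twice). I also want to eliminate the "other" hair-loss type from columns 5-7. For the first 20 data points I provided, I want to plot the 19 temporal data points on top of the 6 vertex data points and 12 diffuse data points, each type in a different color. Does that make sense?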
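
The interaction() suggestion from the first comment can be sketched roughly as follows. This is only an illustration: the data frame df and the column combo are names made up here, and the values are the plot-relevant columns copied from the question's 20 rows.

```r
library(ggplot2)

# Plot-relevant columns copied from the question's first 20 rows
# (df is a hypothetical name used only in this sketch).
df <- data.frame(
  Effort     = c(rep(0, 13), rep(2.5, 7)),
  hairchange = c(-1, -1, 0, -1, 0, -1, -1, 0, 0, -1,
                 0, -1, -1, 0, 0, -1, 0, -1, 0, -1),
  HairType1  = replace(rep("Templefrontal", 20), 8, "other"),
  HairType2  = replace(rep("other", 20),
                       c(10, 11, 13, 16, 18, 19), "Vertexthinning"),
  HairType3  = replace(rep("other", 20),
                       c(2, 4, 7, 8, 9, 11, 12, 15, 16, 18, 19, 20),
                       "Diffusethinning")
)

# One factor level per combination that actually occurs;
# drop = TRUE discards combinations that never appear in the data.
df$combo <- interaction(df$HairType1, df$HairType2, df$HairType3, drop = TRUE)
print(levels(df$combo))

# Color by combination; jitter so that identical points stay visible.
p_combo <- ggplot(df, aes(Effort, hairchange, color = combo)) +
  geom_jitter(width = 0.1, height = 0.1)
```

Note that this yields one color per observed combination of types, not one color per hair-loss type, which is why the asker's follow-up comment asks for something different.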

Tags: r ggplot2


【Answer 1】:

I used the following code snippet:

library(ggplot2)
library(data.table)

dt <- data.table(MonthsMassage = c(0, 0, 0, 0, 0, 0, 0, 0,0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1), 
                      MinutesPerDayMassage = c("0-10 minutes daily", "0-10 minutes daily", 
                                               "0-10 minutes daily", "0-10 minutes daily", "0-10 minutes daily", 
                                               "0-10 minutes daily", "0-10 minutes daily", "0-10 minutes daily", 
                                               "0-10 minutes daily", "0-10 minutes daily", 
                                               "11-20 minutes daily", "11-20 minutes daily", "11-20 minutes daily", 
                                               "0-10 minutes daily", "0-10 minutes daily", "0-10 minutes daily", 
                                               "0-10 minutes daily", "0-10 minutes daily", "0-10 minutes daily", 
                                               "0-10 minutes daily"),
                      Minutes = c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 15, 15, 5, 5, 5, 5, 5, 5, 5),
                      hairchange = c(-1, -1, 0, -1, 0, -1, -1, 0, 0, -1, 0, -1, -1, 0, 0, -1, 0, -1, 0, -1), 
                      HairType1 = c("Templefrontal", "Templefrontal", "Templefrontal", 
                                      "Templefrontal", "Templefrontal", "Templefrontal", "Templefrontal", 
                                      "other", "Templefrontal", "Templefrontal", "Templefrontal", 
                                      "Templefrontal", "Templefrontal", "Templefrontal", "Templefrontal", 
                                      "Templefrontal", "Templefrontal", "Templefrontal", "Templefrontal", 
                                      "Templefrontal"),
                      HairType2 = c("other", "other", "other", "other", "other", "other", "other", "other",
                                    "other", "Vertexthinning", "Vertexthinning", "other", "Vertexthinning",
                                    "other", "other", "Vertexthinning", "other", "Vertexthinning", 
                                    "Vertexthinning", "other"),
                      HairType3 = c("other", "Diffusethinning", "other", "Diffusethinning", "other", "other",
                                    "Diffusethinning", "Diffusethinning", "Diffusethinning", "other", 
                                    "Diffusethinning", "Diffusethinning", "other", "other", "Diffusethinning", 
                                    "Diffusethinning", "other", "Diffusethinning", "Diffusethinning", "Diffusethinning"), 
                      Effort = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5), 
                      EffortGroup = c("<5", "<5", "<5", "<5", "<5", "<5", "<5", "<5", "<5", "<5", "<5", "<5", "<5", 
                                      "<5", "<5", "<5", "<5", "<5", "<5", "<5"))

You can create a brand-new column that combines the three hair types by pasting columns 5, 6 and 7 together into a new "CombinedHair" column:

dt[, CombinedHair:=do.call(paste0,.SD), .SDcols=c(5,6,7)]

If you plot this data table as it is, the points overplot, so I suggest geom_jitter():

ggplot(data = dt, aes(x=Effort, y=hairchange, color  = CombinedHair))+geom_jitter(width = 0.1, height = 0.1)

If you want nicer class names, I think you can replace 'other' with an empty string.
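
As a minimal sketch of that clean-up, the example below blanks out 'other' before pasting the columns together. It runs on a three-row toy table (toy and type_cols are hypothetical names, not part of the answer's code) so the resulting labels are easy to inspect.

```r
library(data.table)

# Toy stand-in for the three hair-type columns (hypothetical rows).
toy <- data.table(
  HairType1 = c("Templefrontal", "other", "Templefrontal"),
  HairType2 = c("other", "other", "Vertexthinning"),
  HairType3 = c("Diffusethinning", "Diffusethinning", "other")
)

type_cols <- c("HairType1", "HairType2", "HairType3")

# Replace "other" with an empty string in each hair-type column.
toy[, (type_cols) := lapply(.SD, function(x) replace(x, x == "other", "")),
    .SDcols = type_cols]

# Paste the cleaned columns into one label per row.
toy[, CombinedHair := do.call(paste0, .SD), .SDcols = type_cols]

print(toy$CombinedHair)
```

The labels become shorter, but a patient with two types still gets a combined label such as "TemplefrontalVertexthinning", so the number of color classes stays tied to the combinations present in the data.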

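【Comments】:

  • Thanks! But is it possible to create a new column that merges the three hair types into one combined column without creating new categories (e.g. diffuse x vertex thinning), keeping only the original 3? I just want to plot the three hair types (diffuse, vertex, temporal) in 3 colors, regardless of whether data points overlap (i.e. if a patient reports both diffuse and temporal thinning, two data points are plotted at the same position, one in the diffuse color and one in the temporal color). The code above creates 6 hair-loss types in total. Also, how do I eliminate "other"?
  • @jbearazesh You can eliminate "other" with something like dt[dt == 'other'] <- ''. That selects the cells whose value is 'other' and replaces those values with the empty string ''.

【Answer 2】: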

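Here is an approach that moves the location into its own variable (not shown here, but you could map it to facets, point shape, or another aesthetic if you like), then colors the points by hair type and removes the "other" hair types.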

library(tidyverse)
Figure3Data_long <- Figure3Data %>%
  gather(location, hairtype, HairType1:HairType3) %>%
  filter(hairtype != "other")

ggplot(Figure3Data_long,
       aes(Effort, hairchange, color = hairtype)) +
  # geom_point() +  
  geom_jitter(width = 0.03, height = 0.01)  # illustrative to show overplots 
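
For readers on a recent tidyr, the same reshaping can be sketched with pivot_longer(), which supersedes gather(). The block below is an illustration rather than part of the answer: fig3, fig3_long and type_means are names made up here, and only the plot-relevant columns of the question's 20 rows are rebuilt. It also sketches the per-type line plot of mean hair change that the question mentions as an alternative.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Plot-relevant columns copied from the question's first 20 rows
# (fig3 is a hypothetical name used only in this sketch).
fig3 <- tibble(
  Effort     = c(rep(0, 13), rep(2.5, 7)),
  hairchange = c(-1, -1, 0, -1, 0, -1, -1, 0, 0, -1,
                 0, -1, -1, 0, 0, -1, 0, -1, 0, -1),
  HairType1  = replace(rep("Templefrontal", 20), 8, "other"),
  HairType2  = replace(rep("other", 20),
                       c(10, 11, 13, 16, 18, 19), "Vertexthinning"),
  HairType3  = replace(rep("other", 20),
                       c(2, 4, 7, 8, 9, 11, 12, 15, 16, 18, 19, 20),
                       "Diffusethinning")
)

# One row per patient and endorsed hair-loss type; "other" rows are dropped,
# so a patient with two types contributes two points.
fig3_long <- fig3 %>%
  pivot_longer(HairType1:HairType3,
               names_to = "location", values_to = "hairtype") %>%
  filter(hairtype != "other")

print(count(fig3_long, hairtype))

# Scatter plot colored by the three original types only.
p_points <- ggplot(fig3_long, aes(Effort, hairchange, color = hairtype)) +
  geom_jitter(width = 0.03, height = 0.01)

# Mean perceived hair change per type and effort level, one panel per type.
type_means <- fig3_long %>%
  group_by(hairtype, Effort) %>%
  summarise(mean_change = mean(hairchange), .groups = "drop")

p_means <- ggplot(type_means, aes(Effort, mean_change, color = hairtype)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ hairtype)
```

The count() call is a quick check that the long table matches the tallies the asker gave in the comments for the temporal, vertex and diffuse points.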

