【Question title】: How can I add line labels in ggparcoord in R?
【Posted】: 2017-04-11 14:38:27
【Question】:

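I recently ran into a problem using ggparcoord() in R. I want to add labels to the lines in a parallel coordinates plot, but I can't seem to get it to work.

Here is an MWE:

library(GGally)
library(ggplot2)

A <- rnorm(200, 60, 200)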
B <- rnorm(200, 40, 126)
C <- rnorm(200, 200, 800)
D <- c( rep("C1", 50), rep("C2", 50), rep("C3", 50), rep("C4", 50) )

df <- data.frame(A, B, C, D)

ggparcoord(df, columns = c(1, 2, 3), groupColumn = 4) + 
  geom_line(size = 0.25) + geom_text(label = "x", hjust = -0.5) +
  ggtitle("Var relationships across clusters") + 
  xlab("My dimensions") + ylab("Scaled values") +
  scale_colour_manual(values = c("C1" = "#2166ac", 
                                 "C2" = "#67a9cf", 
                                 "C3" = "#ef8a62",
                                 "C4" = "#b2182b"))

This works and adds an "x" on each of the 3 axes. The problem appears when I supply a character vector of the appropriate length instead of "x". For example:

my_labs <- sample(LETTERS, nrow(df), replace = TRUE)

ggparcoord(df, columns = c(1, 2, 3), groupColumn = 4) + 
  geom_line(size = 0.25) + geom_text(label = rep(my_labs, 3), hjust = -0.5 ) +
  ggtitle("Var relationships across clusters") + 
  xlab("My dimensions") + ylab("Scaled values") +
  scale_colour_manual(values = c("C1" = "#2166ac", 
                                 "C2" = "#67a9cf", 
                                 "C3" = "#ef8a62",
                                 "C4" = "#b2182b"))

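Here I repeat the my_labs vector 3 times to match the length that ggparcoord() should (in theory) need for the 3 axes. Surprisingly, this still fails: Error: Aesthetics must be either length 1 or the same as the data (4): label, hjust. I really don't understand what this means, in particular the "data (4)" part. Thanks for your help!

P.S. In my real data I plan to label only the relevant subset of rows; the rest will be "" in the character vector. So I'm not too worried about overcrowding the plot. Thanks!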

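【Comments】:

  • The error message I get when running your code says the label variable should have length 600, which makes sense because you need 600 labels (200 rows in the dataset times 3 columns). Adding the labels to the dataset should help: df$my_labs <- sample(LETTERS, nrow(df), replace = TRUE). Then put label = my_labs inside aes() inside geom_text().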
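
To see where the 600 comes from, one can look at the data that ggparcoord() passes on to ggplot. This is only a minimal sketch: it assumes the returned ggplot object exposes its reshaped long data as p$data, and set.seed() plus the names p, n_axes and n_long are mine, not from the question.

```r
library(GGally)

set.seed(42)  # assumption: only here so the sketch is reproducible
df <- data.frame(
  A = rnorm(200, 60, 200),
  B = rnorm(200, 40, 126),
  C = rnorm(200, 200, 800),
  D = rep(c("C1", "C2", "C3", "C4"), each = 50)
)

p <- ggparcoord(df, columns = c(1, 2, 3), groupColumn = 4)

# The plot data is in long format: one row per observation per axis,
# so a per-point label vector needs this many entries.
n_axes <- 3
n_long <- nrow(p$data)
n_long == nrow(df) * n_axes
```

Any label vector mapped through aes() has to have that length (or length 1), which is why a 200-element vector is not enough.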

Tags: r ggplot2 ggally


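【Answer 1】:

If you follow the instructions in the comment above, I'm not sure you will get much control over the labels. Another option, though more involved, is to move away from ggparcoord and build the plot with ggplot directly. If you do that, you can label any point you like. The downside is that it is more work, and you have to do the rescaling yourself.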

A <- rnorm(200, 60, 200)
B <- rnorm(200, 40, 126)
C <- rnorm(200, 200, 800)
D <- c( rep("C1", 50), rep("C2", 50), rep("C3", 50), rep("C4", 50) )

df <- data.frame(A, B, C, D)

# Re-scaling the numeric columns, and adding column D to a new data frame
# Use a different type of scaling if needed
dfScaled <- data.frame(scale(df[,1:3]), D)

# Check that we get mean of 0 and sd of 1
colMeans(dfScaled[,1:3])
apply(dfScaled[,1:3], 2, sd)

library(ggplot2)   # needed for the plot at the end
library(reshape2)  # provides melt()
# Turn the data into long format
# Add a "row" variable that will help keep track of what row the data came from
# Use df or dfScaled
df2 <- melt(data.frame(dfScaled, row = 1:nrow(dfScaled)),
            id.vars = c("D", "row"),
            measure.vars = c("A", "B", "C" ),
            variable.name = "OrgCol",
            value.name = "Value"
)

# Reordering may make the original structure easier to see:
# the first 3 rows of odf2 correspond to the first row of the original df
odf2 <- df2[order(df2$row, df2$OrgCol), ]

# Add whatever labels you want, making them all blank here
odf2$my_labs <- ""

# Here only labeling the end (far right point) of the first line
# (first line is from row 1 of the original df)
odf2$my_labs[3] <- "A"

# See the structure
head(odf2)

# Create the plot with lines connected by row, colored by D
# I colored the one labeled point green just to make it stand out
ggplot(odf2, aes(x = OrgCol, y = Value, group = row, color = D)) +
  geom_line() +
  geom_text(aes(label = my_labs), colour = "green") +
  ggtitle("Var relationships across clusters") + 
  xlab("My dimensions") + ylab("Scaled values") +
  scale_colour_manual(values = c("C1" = "#2166ac", 
                                 "C2" = "#67a9cf", 
                                 "C3" = "#ef8a62",
                                 "C4" = "#b2182b"))

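When more than one point needs a label, picking rows by condition may be easier than indexing by position. The following is a sketch under my own assumptions, not part of the original answer: the long data is built with base R instead of reshape2, rows_to_label is a hypothetical selection, and the cluster name is used as the label text.

```r
library(ggplot2)

set.seed(42)  # assumption: only here so the sketch is reproducible
df <- data.frame(
  A = rnorm(200, 60, 200),
  B = rnorm(200, 40, 126),
  C = rnorm(200, 200, 800),
  D = rep(c("C1", "C2", "C3", "C4"), each = 50)
)

# Scale to mean 0 / sd 1, then stack the columns into long format.
# as.vector() on a matrix goes column by column, so the A values come
# first, then B, then C, matching OrgCol below.
scaled <- scale(df[, c("A", "B", "C")])
long <- data.frame(
  row    = rep(seq_len(nrow(df)), times = 3),
  D      = rep(df$D, times = 3),
  OrgCol = factor(rep(c("A", "B", "C"), each = nrow(df)),
                  levels = c("A", "B", "C")),
  Value  = as.vector(scaled)
)

# Label only the right-most point ("C") of a few chosen lines.
rows_to_label <- c(1, 51, 101, 151)  # hypothetical: first row of each cluster
long$my_labs <- ""
pick <- long$OrgCol == "C" & long$row %in% rows_to_label
long$my_labs[pick] <- long$D[pick]

p_manual <- ggplot(long, aes(x = OrgCol, y = Value, group = row, colour = D)) +
  geom_line(alpha = 0.4) +
  geom_text(aes(label = my_labs), colour = "black", hjust = -0.3) +
  xlab("My dimensions") + ylab("Scaled values")
p_manual
```

Because the selection is a logical condition on row and OrgCol, it keeps working if the long data is reordered.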

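    【Answer 2】:

    Thanks to @aosmith's help (much appreciated!), I found the answer to my specific problem. The issue was not whether the labels live inside the data frame or outside it; the key problem was that I had not wrapped the labels in aes() inside geom_text().

    I will keep my labels outside the actual data, because I want to build the length-600 vector by hand. A bit hacky, I know, but it works. The reason is that if I put the 200 labels in the data frame, they get repeated on all 3 ggparcoord() axes, which I don't want. I want them on only one side of the plot (one axis), with the remaining positions up to 600 filled with empty strings (""). So the workaround I found is the following, which does use aes() inside geom_text():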

    # Given same data above:
    
    # Creating a label vector:
    my_labs <- sample(LETTERS, nrow(df), replace = TRUE)
    
    # Add some gaps to avoid overcrowding.
    # Keep only one label in 10, enough to show what the 4 groups are about:
    to_keep <- seq( 1, length( my_labs ), by = 10 )
    to_remove <- setdiff( 1 : length( my_labs ), to_keep )
    my_labs[ to_remove ] <- ""
    
    # Here adding filler to the vector, to create a length of 600:
    my_labs <- c( my_labs, rep( "", 2 * length( my_labs ) ) )
    
    
    ggparcoord(df, columns = c(1, 2, 3), groupColumn = 4) + 
      geom_line(size = 0.25) + geom_text( aes(label = my_labs), hjust = 1.5 ) +
      ggtitle("Var relationships across clusters") + 
      xlab("My dimensions") + ylab("Scaled values") +
      scale_colour_manual(values = c("C1" = "#2166ac", 
                                     "C2" = "#67a9cf", 
                                     "C3" = "#ef8a62",
                                     "C4" = "#b2182b"))
    

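The padding above relies on the first 200 rows of the plot data belonging to the first axis. A variant that does not depend on that position is sketched below. It rests on assumptions of mine: that p$data has a column named variable holding the axis name (the column ggparcoord() maps to x in the versions I know of), and that observations keep their original order within each axis. The names on_first_axis and axis_labs are made up for the illustration.

```r
library(GGally)
library(ggplot2)

set.seed(42)  # assumption: only here so the sketch is reproducible
df <- data.frame(
  A = rnorm(200, 60, 200),
  B = rnorm(200, 40, 126),
  C = rnorm(200, 200, 800),
  D = rep(c("C1", "C2", "C3", "C4"), each = 50)
)

# One label per observation, keeping only one in ten
my_labs <- sample(LETTERS, nrow(df), replace = TRUE)
my_labs[-seq(1, length(my_labs), by = 10)] <- ""

p <- ggparcoord(df, columns = c(1, 2, 3), groupColumn = 4)

# Assumption: `variable` in the plot data names the axis of each row
on_first_axis <- p$data$variable == "A"
stopifnot(sum(on_first_axis) == nrow(df))

# Blank everywhere, then fill in the labels on axis A only
axis_labs <- rep("", nrow(p$data))
axis_labs[on_first_axis] <- my_labs

p + geom_text(aes(label = axis_labs), hjust = 1.5)
```

The stopifnot() call is there to fail early if the assumed column layout does not hold in the installed GGally version.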