【Question title】: Reordering a ggplot2 heatmap based on hierarchical clustering
【Posted】: 2017-08-07 19:35:41
【Question】:

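I am struggling with ggplot2, and although I have found very similar questions, I have not managed to get this working. I want to reorder the rows and columns of a heatmap according to hierarchical clustering.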

Here is my current code:

# import
library("ggplot2")
library("scales")
library("reshape2")

# data loading
data_frame = read.csv(file=input_file, header=TRUE, row.names=1, sep='\t')

# clustering with hclust: dist() compares rows, so cluster the rows directly
# and transpose the data to cluster the columns
dd.row <- as.dendrogram(hclust(dist(data_frame)))
dd.col <- as.dendrogram(hclust(dist(t(data_frame))))

# ordering based on clustering
row.ord <- order.dendrogram(dd.row)
col.ord <- order.dendrogram(dd.col)

# make a new data frame with rows and columns reordered
new_df = as.data.frame(data_frame[row.ord, col.ord])
print(new_df)   # inspecting new_df manually, the order looks right

# get the row names
name = as.factor(row.names(new_df))

# reshape to long format
melte_df = melt(cbind(name, new_df))

# the solution: reorder the factor levels of the name column to follow the row clustering
melte_df$name = factor(melte_df$name, levels = row.names(data_frame)[row.ord])

# ggplot2 dark magic
(p <- ggplot(melte_df, aes(variable, name)) + geom_tile(aes(fill = value),
 colour = "white") + scale_fill_gradient(low = "white",
 high = "steelblue") + theme(text=element_text(size=12),
 axis.text.y=element_text(size=3)))

# save the figure
ggsave(filename = "test.pdf")

# the result is ordered only by column; what have I missed?

Please explain your answer in some detail, as I am new to R.

【Comments】:

    Tags: r ggplot2 heatmap hierarchical-clustering


    【Answer 1】:

    Without an example dataset to reproduce this, I cannot be 100% sure this is the cause, but my guess is that your problem comes from this line:

    name = as.factor(row.names(new_df))
    

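    When you use a factor, ordering is based on the order of that factor's levels. You can reorder the data frame however you like; the order used when plotting will still be the order of the levels.

    Here is an example: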

    data_frame <- data.frame(x = c("apple", "banana", "peach"), y = c(50, 30, 70))
    data_frame
           x  y
    1  apple 50
    2 banana 30
    3  peach 70
    
    data_frame$x <- as.factor(data_frame$x) # Make x column a factor
    
    levels(data_frame$x) # This shows the levels of your factor
    [1] "apple"  "banana" "peach" 
    
    data_frame <- data_frame[order(data_frame$y),] # Order by value of y
    data_frame
           x  y
    2 banana 30
    1  apple 50
    3  peach 70
    
    # Now let's plot it:
    p <- ggplot(data_frame, aes(x)) + geom_bar(aes(weight=y))
    p
    

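    Here is the result:

    See? It is not ordered by the y values as we wanted; it is ordered by the factor's levels. If this is indeed your problem, there is a solution here: R - Order a factor based on value in one or more other columns

    An example applying the dplyr solution: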

    library(dplyr)
    data_frame <- data_frame %>%
           arrange(y) %>%          # sort your dataframe
           mutate(x = factor(x,x)) # reset your factor-column based on that order
    
    data_frame
           x  y
    1 banana 30
    2  apple 50
    3  peach 70
    
    levels(data_frame$x) # Levels of the factor are reordered!
    [1] "banana" "apple"  "peach" 
    
    p <- ggplot(data_frame, aes(x)) + geom_bar(aes(weight=y))
    p
    

    Here is the result now:

    I hope this helps. If not, you may want to provide a sample of your original dataset!
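Since no sample data was posted, here is a minimal sketch of the same idea applied to a heatmap, using a made-up matrix. The names `mat`, `long`, `row_ord` and `col_ord` and the random values are assumptions for illustration only. It reshapes with base R's `as.table` rather than `melt`, and it only draws the plot if ggplot2 is installed.

```r
# Toy matrix standing in for the real data (hypothetical values)
set.seed(1)
mat <- matrix(rnorm(30), nrow = 6,
              dimnames = list(paste0("gene", 1:6), paste0("sample", 1:5)))

# dist() compares rows, so transpose the matrix to cluster the columns
row_ord <- hclust(dist(mat))$order
col_ord <- hclust(dist(t(mat)))$order

# Long format with base R: one row per heatmap cell
long <- as.data.frame(as.table(mat), responseName = "value")
names(long)[1:2] <- c("name", "variable")

# Set the factor levels so that both axes follow the clustering order
long$name     <- factor(long$name,     levels = rownames(mat)[row_ord])
long$variable <- factor(long$variable, levels = colnames(mat)[col_ord])

# Plot only when ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(variable, name)) +
    geom_tile(aes(fill = value), colour = "white") +
    scale_fill_gradient(low = "white", high = "steelblue")
  print(p)
}
```

The row order of `long` is irrelevant here; ggplot2 places the tiles according to the levels of `name` and `variable`.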

    【Comments】:

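    • Your answer was very useful for pinpointing the problem. In the end, though, I found a more convenient way: reordering the factor levels. I will edit my question to add what made it work, but thanks again for your help.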
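Reordering the factor levels directly, as this comment describes, might look like the following sketch in base R, reusing the fruit example from the answer. The variable name `df` is an assumption; the data frame itself is left unsorted and only the levels change.

```r
# Same toy data as in the answer above
df <- data.frame(x = c("apple", "banana", "peach"), y = c(50, 30, 70))

# Set the levels of x in increasing order of y, without sorting the rows
df$x <- factor(df$x, levels = df$x[order(df$y)])

levels(df$x)
# [1] "banana" "apple"  "peach"
```

Because the plotting order comes from the levels, this is enough to change the axis order without dplyr.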