【问题标题】:Using pheatmap to sort data by row annotations?使用 pheatmap 按行注释对数据进行排序?
【发布时间】:2021-01-04 16:54:01
【问题描述】:

我正在尝试创建一个包含测试数据列和单个研究参与者行的热图。参与者可以分为三个不同的组。我想用三个组来注释绘图,然后对每个组内的数据进行聚类以了解它们之间的差异。

我是创建热图的新手,我无法让行注释起作用。一旦我让注释工作,我也不确定如何只在每个组内进行聚类。我认为包“pheatmap.type”可以工作,但不幸的是,它不适用于 R 版本 4.0.2。

我无法发布确切的数据(机密),但我已附上示例文件,我将描述到目前为止我所做的工作并发布代码。我有一个数据框,第一列作为标签,其中包括参与者 ID 和组(使用 row.names=1 执行此操作),然后是 12 列数字数据(无 NA)。然后,我按行名对数据进行排序,并使用 scale 函数对数据进行缩放并生成矩阵。然后,我尝试通过几种不同的方式将组信息添加到数据框中来创建注释行。到目前为止我尝试过的如下:

#dataframe with Group and ID as row names and 12 numerical columns  

df_1_HM <- data.frame(df_1$Group_ID, df_1$Test1, df_1$Test2, df_1$Test3, df_1$Test4, df_1$Test5, df_1$Test6, df_1$Test7, df_1$Test8, df_1$Test9, df_1$Test10, df_1$Test11, df_1$Test12, row.names=1)

#ordering the dataframe so that the groups are in order 
df_1_HM_ordered <- df_1_HM[ order(row.names(df_1_HM)), ]

#Z-scoring (scaling) data 
df_HM_matrix_1 <- scale(df_1_HM)

#creating a color palette 
my_palette <- colorRampPalette(c("white", "grey", "black"))(n = 100)


#Plotting heatmap 
install.packages("gplots")
library(gplots)

#trying to plot the heatmap with annotation_row data 
#The method below does not work for me. The plot will run with no errors but does not actually plot - it ends up becoming a list of 4 with no data.

pheatmap(df_HM_matrix_1,
         scale="none",
         color=my_palette,
         fontsize=14, 
         annotation_row=annotation_row)

annotation_row = data.frame(
  df_Group = factor(rep(c("Group 1", "Group 2", "Group 3"), c(11, 10, 7)))
)

rownames(annotation_row) = paste("df_Group", 1:28, sep = "")

rownames(annotation_row) = rownames(df_HM_matrix_1) # name matching

#I also tried to use a dataframe with just the groups as column 1 to get row annotation 
pheatmap(df_HM_matrix_1,
         scale="none",
         color=my_palette,
         fontsize=14, 
         annotation_row=df_Group)

df_Group <- data.frame(df_1$Group, df_1$ID)

#Also tried using the select function to create a dataframe for the row annotation 
df_Group_1 <- select(df_1, Group) 

#When I use either of the data frame methods above I get the following error: Error in cut.default(a, breaks = 100) : 'x' must be numeric

在这方面的任何帮助都会很棒!

这是示例数据:

structure(list(Group_ID = structure(1:28, .Label = c("Group1_10", 
"Group1_13", "Group1_15", "Group1_2", "Group1_20", "Group1_26", 
"Group1_27", "Group1_3", "Group1_6", "Group1_8", "Group2_1", 
"Group2_12", "Group2_14", "Group2_16", "Group2_21", "Group2_23", 
"Group2_25", "Group2_28", "Group2_7", "Group2_9", "Group3_11", 
"Group3_17", "Group3_18", "Group3_19", "Group3_24", "Group3_4", 
"Group3_5", "Group3_6"), class = "factor"), Test1 = c(1.44, 4.36, 
0.75, 0.59, 1.67, 0.41, 2.42, 0.57, 0.89, 0.45, 0.31, 1.56, 2.13, 
0.86, 0.12, 0.26, 1.47, 2.64, 3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 
1.43, 2.58, 2.49, 2.64), Test2 = c(1.44, 4.36, 0.75, 0.59, 1.67, 
0.41, 2.42, 0.57, 0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 
1.47, 2.64, 3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 
2.64), Test3 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57, 
0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64, 3.92, 
2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 2.64), Test4 = c(1.44, 
4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57, 0.89, 0.45, 0.31, 1.56, 
2.13, 0.86, 0.12, 0.26, 1.47, 2.64, 3.92, 2.19, 0.43, 0.98, 1.93, 
1.49, 1.43, 2.58, 2.49, 0.31), Test5 = c(1.44, 4.36, 0.75, 0.59, 
1.67, 0.41, 2.42, 0.57, 0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 
0.26, 1.47, 2.64, 3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 
2.49, 0.31), Test6 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 
0.57, 0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64, 
3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 0.31), 
    Test7 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57, 
    0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64, 
    3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 1.49
    ), Test8 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57, 
    0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64, 
    3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 1.49
    ), Test9 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57, 
    0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64, 
    3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 1.49
    ), Test10 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57, 
    0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64, 
    3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 3.92
    ), Test11 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57, 
    0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64, 
    3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 3.92
    ), Test12 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57, 
    0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64, 
    3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 3.92
    )), class = "data.frame", row.names = c(NA, -28L))

【问题讨论】:

  • 如果您可以共享一些数据(可能是假数据,不一定是您的机密数据集),我想我可以提供帮助。现在它太抽象了。
  • @Werner 我刚刚上传了一个示例文件。感谢您的帮助!
  • 所以annotation_row 有 28 行,而df_HM_matrix_1 有 27 行。这是故意的还是错误的?
  • @Werner 抱歉 - 这是故意的。我弄乱了一个不同的热图包并删除了 df_HM_matrix_1 中的异常值。

标签: r heatmap pheatmap


【解决方案1】:

要使注释与pheatmap 一起使用,因子必须有序。为此,请将ordered = TRUE 添加到factor()

annotation_row = data.frame(df_Group = factor(rep(c("Group 1", "Group 2", "Group 3"), c(11, 10, 7)), ordered = TRUE))

您也可以使用as.ordered() 来完成同样的事情。

要按注释组对热图行进行排序,只需将参数cluster_rows = F 添加到pheatmap()

pheatmap(df_HM_matrix_1,
         scale="none",
         color=my_palette,
         fontsize=14, 
         annotation_row=annotation_row,
         cluster_rows = F)

这就是它现在的样子:

【讨论】:

    猜你喜欢
    • 1970-01-01
    • 1970-01-01
    • 2018-07-09
    • 1970-01-01
    • 2010-11-20
    • 1970-01-01
    • 2012-07-12
    相关资源
    最近更新 更多