Something weird in pheatmap (a bug?): answers

【Question title】: Something weird in pheatmap (a bug?)
【Posted】: 2018-12-20 15:45:20
【Question】:

Reproducible data:

data(crabs, package = "MASS")
df <- crabs[-(1:3)]
set.seed(12345)
df$GRP <- kmeans(df, 4)$cluster
df.order <- dplyr::arrange(df, GRP)

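Data description:

df has 5 numeric variables. I ran the K-means algorithm on these 5 attributes and generated a new categorical variable, GRP, which has 4 levels. I then sorted the data by GRP and named the result df.order.

What I did with pheatmap: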

## 5 numerical variables for coloring
colormat <- df.order[c("FL", "RW", "CL", "CW", "BD")]

## Specify the annotation variable `GRP` shown on left side of the heatmap
ann_row <- df.order["GRP"]

## gap indices
gapRow <- cumsum(table(ann_row$GRP))

library(pheatmap)
pheatmap(colormat, cluster_rows = FALSE, show_rownames = FALSE,
         annotation_row = ann_row, gaps_row = gapRow)

Error in annotation_colors[[colnames(annotation)[i]]] : subscript out of bounds

Here is what I find strange:

At first I guessed that the annotation_row argument was the cause, so I checked the row names of the two data frames.

all.equal(rownames(colormat), rownames(ann_row))
# [1] TRUE

As you can see, they are equal. Yet after I ran the following code, the heatmap worked.

rownames(colormat) <- rownames(ann_row)
pheatmap(colormat, cluster_rows = FALSE, show_rownames = FALSE,
         annotation_row = ann_row, gaps_row = gapRow)

In theory the statement "rownames(colormat) <- rownames(ann_row)" should have no effect, because the row names of the two objects were already equal. So why does it make pheatmap() work?

Edit: following @steveb's comment, I don't even need ann_row to set the row names. I just ran

rownames(colormat) <- rownames(colormat)

and pheatmap works as well. This still seems counterintuitive.

Final output: [heatmap image]

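【Question discussion】:

I don't know the answer, but your question belongs in the R-reproducible Hall of Fame. Well written.
I suspect that if you run rownames(colormat) <- rownames(colormat), pheatmap will work; you don't even need ann_row to set the row names.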

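Tags: r heatmap pheatmap

【Solution 1】:

In short, colormat has no explicitly set row names before rownames(colormat) <- rownames(colormat), and it does have them afterwards. This answer gets at the heart of the problem, but it does not explore why or how pheatmap runs into it, or why R works this way. In other words, I have not dug into the details of how R handles row names.

The crux is that rownames() returns a default vector of row numbers when none have been set; each element is a number represented as a string, so row 10 gets the row name "10". With attributes(colormat) you can see that $row.names is an integer vector before rownames(colormat) <- rownames(colormat) and a character vector afterwards (it now has row names). It is not clear to me why anything other than NULL or NA is returned when no row names have been set.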

attributes(colormat)
## $names
## [1] "FL" "RW" "CL" "CW" "BD"
## 
## $row.names
##   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38
##  [39]  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76
##  [77]  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114
## [115] 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152
## [153] 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190
## [191] 191 192 193 194 195 196 197 198 199 200
## 
## $class
## [1] "data.frame"

rownames(colormat) <- rownames(colormat)

attributes(colormat)
## $names
## [1] "FL" "RW" "CL" "CW" "BD"
## 
## $row.names
##   [1] "1"   "2"   "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10"  "11"  "12"  "13"  "14"  "15"  "16"  "17"  "18"  "19"  "20"  "21"  "22"  "23"  "24"  "25" 
##  [26] "26"  "27"  "28"  "29"  "30"  "31"  "32"  "33"  "34"  "35"  "36"  "37"  "38"  "39"  "40"  "41"  "42"  "43"  "44"  "45"  "46"  "47"  "48"  "49"  "50" 
##  [51] "51"  "52"  "53"  "54"  "55"  "56"  "57"  "58"  "59"  "60"  "61"  "62"  "63"  "64"  "65"  "66"  "67"  "68"  "69"  "70"  "71"  "72"  "73"  "74"  "75" 
##  [76] "76"  "77"  "78"  "79"  "80"  "81"  "82"  "83"  "84"  "85"  "86"  "87"  "88"  "89"  "90"  "91"  "92"  "93"  "94"  "95"  "96"  "97"  "98"  "99"  "100"
## [101] "101" "102" "103" "104" "105" "106" "107" "108" "109" "110" "111" "112" "113" "114" "115" "116" "117" "118" "119" "120" "121" "122" "123" "124" "125"
## [126] "126" "127" "128" "129" "130" "131" "132" "133" "134" "135" "136" "137" "138" "139" "140" "141" "142" "143" "144" "145" "146" "147" "148" "149" "150"
## [151] "151" "152" "153" "154" "155" "156" "157" "158" "159" "160" "161" "162" "163" "164" "165" "166" "167" "168" "169" "170" "171" "172" "173" "174" "175"
## [176] "176" "177" "178" "179" "180" "181" "182" "183" "184" "185" "186" "187" "188" "189" "190" "191" "192" "193" "194" "195" "196" "197" "198" "199" "200"
## 
## $class
## [1] "data.frame"
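
The difference shown in the two outputs above can also be checked directly. This is a minimal sketch on a toy data frame (the objects `auto` and `explicit` are illustrative and not from the question); it relies on base R's `.row_names_info()`, which returns a negative count for automatic row names and a positive count for assigned ones.

```r
# Toy data frame standing in for colormat (assumption: not the crabs data).
auto <- data.frame(FL = c(8.1, 8.8, 9.2), RW = c(6.7, 7.7, 7.8))

# rownames() reports "1", "2", "3" even though nothing was ever assigned.
rownames(auto)

# A negative count marks the row names as automatic.
.row_names_info(auto)      # -3

explicit <- auto
rownames(explicit) <- rownames(explicit)

# The visible names are identical, so comparing rownames() cannot tell
# the two data frames apart ...
all.equal(rownames(auto), rownames(explicit))   # TRUE

# ... but the names are now stored explicitly as a character vector.
.row_names_info(explicit)  # 3
```

This would explain why the all.equal() check in the question returns TRUE even though the two objects behave differently.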

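The issue is not numeric versus character row names, but whether row names have been set at all. If you did the following:

rownames(colormat) <- seq_len(nrow(colormat))

you would find that this also fixes the problem, because the row names are now explicitly set. Note that for a data frame `rownames<-` coerces its value to character, so attributes(colormat) shows the character vector "1", "2", ... rather than integers.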

If you call tibble::has_rownames(colormat) before rownames(colormat) <- rownames(colormat), you get FALSE. After the assignment, you get TRUE.

tibble::has_rownames(colormat)
## [1] FALSE
rownames(colormat) <- rownames(colormat)
tibble::has_rownames(colormat)
## [1] TRUE

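I am not sure how pheatmap uses colormat internally, but it must be running into trouble because the row names are not set. If you contact the package author (perhaps via GitHub: https://github.com/raivokolde/pheatmap), they may update the code to handle this edge case in a future release.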
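
One plausible mechanism, offered as an assumption rather than something checked against the pheatmap source: if the function converts its input with as.matrix() and then looks up annotation rows by rownames(), automatic row names would be lost at the conversion step. The sketch below only demonstrates the base R behaviour on toy data; `auto`, `explicit`, and `ann` are made-up names.

```r
# Toy stand-ins for colormat (assumption: not the crabs data).
auto <- data.frame(FL = c(8.1, 8.8, 9.2), RW = c(6.7, 7.7, 7.8))
explicit <- auto
rownames(explicit) <- rownames(explicit)

# as.matrix() drops automatic row names by default ...
is.null(rownames(as.matrix(auto)))   # TRUE

# ... and keeps explicitly assigned ones.
rownames(as.matrix(explicit))        # "1" "2" "3"

# Toy stand-in for ann_row. Indexing it by NULL row names selects zero rows,
# which would leave nothing to build annotation colours from.
ann <- data.frame(GRP = c(1L, 1L, 2L))
nrow(ann[rownames(as.matrix(auto)), , drop = FALSE])      # 0
nrow(ann[rownames(as.matrix(explicit)), , drop = FALSE])  # 3
```

If this is indeed what happens, it would be consistent with the "subscript out of bounds" error appearing only when the row names are automatic.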

【Discussion】:

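Your answer is detailed and helpful. I found that with my data dplyr::arrange(df, GRP) drops the row names, whereas df[order(df$GRP), ] does not. But I have to use the former, because in my real case I need to sort by several variables. Thanks for pointing out the row-name issue. If no other answer comes in, I will award it the bounty.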
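
For sorting by several variables, base R's order() also accepts more than one key, so the row-name-preserving form can still be used. A minimal sketch on toy data (the data frame `toy` and its values are invented for illustration):

```r
# Toy data with automatic row names (assumption: not the crabs data).
toy <- data.frame(GRP = c(2L, 1L, 2L, 1L), FL = c(9.2, 8.8, 8.1, 7.2))

# Sort by GRP first, then by FL within each group.
sorted <- toy[order(toy$GRP, toy$FL), ]

# The original row numbers travel with the rows as explicit row names.
rownames(sorted)           # "4" "2" "3" "1"
.row_names_info(sorted)    # 4 (positive: no longer automatic)
```

One caveat: if the data happen to be in sorted order already, the result keeps the sequence 1, 2, ..., n and the row names remain automatic, so assigning them explicitly afterwards is the safer habit.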
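@DarrenTsai Sounds good. How much detail are you hoping for? Are you more interested in how R handles row names or in how pheatmap handles them? Either way, I think pheatmap should be updated to give a better error when the row names it needs are not provided.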
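
Until such a check exists upstream, a guard along these lines could be run before plotting. This is a hypothetical helper (the name `stop_if_automatic_rownames` is invented here and is not part of pheatmap), sketched under the assumption that automatic row names are what triggers the failure.

```r
# Hypothetical helper, not part of pheatmap: fail early with a readable
# message when a data frame has only automatic row names.
stop_if_automatic_rownames <- function(x) {
  if (is.data.frame(x) && .row_names_info(x) < 0L) {
    stop("`x` has only automatic row names; assign them explicitly, ",
         "e.g. rownames(x) <- rownames(x)", call. = FALSE)
  }
  invisible(x)
}

# Usage on toy data (assumption: stands in for colormat).
m <- data.frame(FL = c(8.1, 8.8, 9.2))
msg <- tryCatch(stop_if_automatic_rownames(m),
                error = function(e) conditionMessage(e))

rownames(m) <- rownames(m)
stop_if_automatic_rownames(m)   # passes silently
```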