【Question title】: R: Matching sample label order to hierarchically clustered order
【Posted】: 2017-12-05 02:43:01
【Question】:

I have a data frame called cleaned_mayo that looks like this:

                        Source         Tissue RIN Diagnosis Gender  AgeAtDeath ApoE   FLOWCELL PMI N_unmapped N_multimapping N_noFeature N_ambiguous ENSG00000223972
1924_TCX MayoBrainBank_Dickson TemporalCortex 5.6   Control      F 90_or_above   33 AC5R6PACXX   2    2773880        9656114     8225967     2876479               1
1926_TCX MayoBrainBank_Dickson TemporalCortex 7.8   Control      F          88   33 AC44HKACXX   2    2279283       12410116     9503353     3600252               2
1935_TCX MayoBrainBank_Dickson TemporalCortex 8.6   Control      F          88   33 AC5T2GACXX   3    3120169        8650081     9640468     4603751               0
1925_TCX MayoBrainBank_Dickson TemporalCortex 6.6   Control      F          89   33 BC6178ACXX   4    2046886       10627577     7533671     3361385               1
1963_TCX MayoBrainBank_Dickson TemporalCortex 9.7   Control      M 90_or_above   33 AC5T1WACXX   4    1810116        9611375     5343437     2983079               2
         ENSG00000227232 ENSG00000278267 ENSG00000243485 ENSG00000274890 ENSG00000237613 ENSG00000268020 ENSG00000240361 ENSG00000186092 ENSG00000238009 ENSG00000239945
1924_TCX              80               7               1               0               0               0               0               0               3               0
1926_TCX             113              22               9               0               0               0               0               0               0               0
1935_TCX             181              21               2               0               0               0               0               0               0               0
1925_TCX              75               9               5               0               0               0               0               0               2               0
1963_TCX              73              14               1               0               0               0               0               0               3               0
         ENSG00000233750
1924_TCX              18
1926_TCX               2
1935_TCX               8
1925_TCX              20
1963_TCX              13

I hierarchically cluster the expression columns of this data with the following code:

# Create the dendrogram for visualization
dend_expr<- cleaned_mayo[,14:60738] %>% # Isolate expression data
                  scale %>% # Normalize
                  dist  %>% # Compute distance measure
                  hclust %>% # Cluster hierarchically
                  as.dendrogram %>% # Convert to dendrogram type
                  assign_values_to_leaves_edgePar(value= cleaned_mayo$Diagnosis, edgePar= "col") %>% # Color branches by diagnosis
                  as.ggdend()

I then visualize the dendrogram with:

# Plot dendrogram
ggplot(dend_expr, horiz= T, theme= NULL, labels= F) +
  ggtitle("Mayo Cohort: Hierarchical Clustering of Patients Colored by Diagnosis")

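My problem is that with this assign_values_to_leaves_edgePar branch-coloring approach, the diagnoses are no longer in the same order as the clustered expression data. As a result, the branches are colored according to the original row order of Diagnosis, which is wrong for the samples as they are now arranged in the tree.

How can I match the order of these data frames after clustering, or otherwise label the branches correctly?

Thanks!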
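
The mismatch comes from the fact that, after clustering, the leaves are no longer in row order. A minimal base-R sketch on made-up toy data (the matrix `expr`, the vector `diagnosis` and the sample names are all hypothetical) shows that `order.dendrogram()` returns the original row index of each leaf from left to right, so a vector kept in row order has to be indexed by it before it is assigned to the leaves:

```r
# Hypothetical toy data: 6 samples x 4 genes, plus one diagnosis per sample
set.seed(1)
expr <- matrix(rnorm(24), nrow = 6,
               dimnames = list(paste0("S", 1:6), paste0("G", 1:4)))
diagnosis <- c("AD", "Control", "AD", "Control", "AD", "Control")

# Same steps as the pipeline above, written without pipes
dend <- as.dendrogram(hclust(dist(scale(expr))))

# Original row index of each leaf, from left to right
leaf_order <- order.dendrogram(dend)
print(leaf_order)

# Diagnosis in row order (what was passed in) vs. in leaf order (what the leaves need)
print(diagnosis)
print(diagnosis[leaf_order])
```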

【Question discussion】:

    Tags: r hierarchical-clustering


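    【Answer 1】:

    I found a solution to this myself and am posting it here in case it helps anyone in the future.

    Start by creating the dendrogram, without coloring it yet: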

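    library(magrittr)    # %>% pipe
    library(dendextend)  # assign_values_to_leaves_edgePar(), as.ggdend() and the ggplot() method for ggdend objects
    library(ggplot2)     # ggplot() and ggtitle()

    # Create the dendrogram for visualization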
    dend_expr<- cleaned_mayo[,15:60739] %>% # Isolate expression data
                      scale %>% # Normalize
                      dist  %>% # Compute distance measure
                      hclust %>% # Cluster hierarchically
                      as.dendrogram()
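
    The question selects the expression columns with `[,14:60738]`, while this answer uses `[,15:60739]`; in the printed data above, ENSG00000223972 is the 14th column. A position-independent alternative, sketched here under the assumption that every expression column name starts with `ENSG` (true of all the expression columns shown), is to select the columns by name. The toy data frame reuses a few values from the printed rows and only stands in for `cleaned_mayo`:

```r
    # Toy stand-in for cleaned_mayo: a few metadata columns plus expression columns
    toy <- data.frame(
      Diagnosis       = c("Control", "Control", "Control"),
      RIN             = c(5.6, 7.8, 8.6),
      ENSG00000223972 = c(1, 2, 0),
      ENSG00000227232 = c(80, 113, 181),
      row.names       = c("1924_TCX", "1926_TCX", "1935_TCX")
    )

    # Keep only the columns whose names look like Ensembl gene IDs
    expr_cols <- grepl("^ENSG", names(toy))
    expr <- toy[, expr_cols]
    print(names(expr))
```

    With the real data, `cleaned_mayo[, grepl("^ENSG", names(cleaned_mayo))]` would take the place of the hard-coded column range at the start of the pipe.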
    

    Then reorder the rows of the original data to match the leaf order of the new dendrogram:

    # Arrange labels in order with tree
    tree_labels<- cleaned_mayo[order.dendrogram(dend_expr),]
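
    Because this step relies on the rows of `cleaned_mayo` being in the same order as the rows passed to `dist()`, a quick sanity check is to compare the row names of the reordered table with the dendrogram's leaf labels. A minimal sketch on a hypothetical toy data frame (`toy` is illustrative, not the Mayo data):

```r
    # Hypothetical toy data: a Diagnosis column plus three expression columns
    set.seed(42)
    toy <- data.frame(
      Diagnosis = rep(c("AD", "Control"), 3),
      ENSG_A = rnorm(6), ENSG_B = rnorm(6), ENSG_C = rnorm(6),
      row.names = paste0(1:6, "_TCX")
    )

    dend <- as.dendrogram(hclust(dist(scale(toy[, -1]))))

    # Reorder the full table (metadata included) into leaf order
    tree_labels <- toy[order.dendrogram(dend), ]

    # Row names should now match the leaf labels, left to right
    stopifnot(all(rownames(tree_labels) == labels(dend)))
    print(tree_labels$Diagnosis)
```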
    

    Then use these reordered labels to color the dendrogram's branches:

    # Color branches by diagnosis
    dend_expr<- assign_values_to_leaves_edgePar(dend_expr, value= tree_labels$Diagnosis, edgePar= "col") %>%
                as.ggdend()
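
    `assign_values_to_leaves_edgePar()` sets the `col` edge parameter on each leaf from left to right, and that parameter is eventually drawn as a colour. If `Diagnosis` holds text labels such as `"Control"` rather than colour names, one option is to map each level to an explicit colour first instead of relying on how the plotting code coerces those values. A sketch under that assumption, using a hypothetical palette `diag_palette` and toy data; the dendextend call only runs when the package is installed:

```r
    # Hypothetical palette: one colour per diagnosis level (adjust to your levels)
    diag_palette <- c(AD = "firebrick", Control = "steelblue")

    # Hypothetical toy data: a Diagnosis column plus three expression columns
    set.seed(7)
    toy <- data.frame(
      Diagnosis = rep(c("AD", "Control"), 4),
      g1 = rnorm(8), g2 = rnorm(8), g3 = rnorm(8),
      row.names = paste0("S", 1:8)
    )
    dend <- as.dendrogram(hclust(dist(scale(toy[, -1]))))

    # Diagnosis in leaf order, translated into colour names
    leaf_diag <- as.character(toy$Diagnosis[order.dendrogram(dend)])
    leaf_cols <- unname(diag_palette[leaf_diag])

    # Colour each leaf's edge (requires dendextend)
    if (requireNamespace("dendextend", quietly = TRUE)) {
      dend <- dendextend::assign_values_to_leaves_edgePar(dend, value = leaf_cols,
                                                          edgePar = "col")
    }
```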
    

    Then visualize the result:

    # Plot dendrogram
    ggplot(dend_expr, horiz = TRUE, theme = NULL, labels = FALSE) +
      ggtitle("Mayo Cohort: Hierarchical Clustering of Patients Colored by Diagnosis")
    
