Rtsne: Perplexity is too large

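【Question title】: Rtsne: Perplexity is too large
【Posted】: 2018-06-28 18:50:46
【Question】:

I am trying to run t-SNE on a gene expression matrix with dimensions 7x5000. I have already removed low-variance, low-expression and duplicate values: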

     ENSMUSG00000022037 ENSMUSG00000064351 ENSMUSG00000047517 ENSMUSG00000101111
852_1           18.04494           16.58238          14.760356           14.72078
852_2           18.33979           16.08849          15.846886           14.13721
852_3           17.27803           16.63105          13.483438           14.78686
852_4           18.08123           16.17240          13.854479           13.97815
853_1           15.87570           16.43745          10.016808           14.47457
853_2           14.13963           18.19087           8.654636           16.73305
853_3           17.95099           16.66351          17.109841           14.49093

This is how I run t-SNE:

tsne_out <- Rtsne(mat, dims = 3)

But it gives me the following error:

Error in Rtsne.default(unique(t(highly_variable)), dims = 3) : 
  Perplexity is too large.

Can someone tell me what I am doing wrong?

Thanks!

【Comments】:

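You get this error because perplexity defaults to 30 in this function, while your data has only 7 records. Try tsne_out <- Rtsne(as.matrix(mat), dims = 3, perplexity = 1). It should work.
@samadhi Is it advisable to change the perplexity parameter?
I would try several perplexity values, ideally somewhere between 5 and 50, to find one that works well for t-SNE. Have a look at this article: distill.pub/2016/misread-tsne.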
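To make the comments above concrete, here is a minimal base-R sketch that checks which candidate perplexities pass the constraint quoted in ?Rtsne (3 * perplexity < nrow(X) - 1). The helper name admissible_perplexities and the 200-row comparison are assumptions for illustration; neither comes from the Rtsne package or from the question.

```r
# Hypothetical helper (not part of Rtsne): keep only the candidate perplexities
# that satisfy the documented constraint 3 * perplexity < n_rows - 1.
admissible_perplexities <- function(candidates, n_rows) {
  candidates[3 * candidates < n_rows - 1]
}

candidates <- c(5, 10, 30, 50)

# 7 samples, as in the question: none of the usual 5-50 range is admissible.
admissible_perplexities(candidates, n_rows = 7)

# An assumed larger data set with 200 samples: all four candidates pass.
admissible_perplexities(candidates, n_rows = 200)
```

With only 7 samples the commonly suggested 5 to 50 range cannot be reached at all, which is why the suggested call falls back to perplexity = 1.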

Tags: r pca

【Solution 1】:

I am 2 years late, but after reading @sm925's comment I checked the documentation (?Rtsne) and found:

perplexity    numeric; Perplexity parameter (should not be bigger
              than 3 * perplexity < nrow(X) - 1, see details for
              interpretation)

So basically we can work backwards to compute the highest acceptable perplexity:

my_Rtsne <- Rtsne(X = data.matrix(data),
                  perplexity = floor((nrow(data) - 1) / 3),
                  dims = 2)
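As a worked example of the arithmetic, the sketch below evaluates that formula for the 7-row matrix from the question, next to a more conservative variant. Both function names are hypothetical and not part of Rtsne. The conservative variant is an assumption added here: it reads the documented inequality 3 * perplexity < nrow(X) - 1 as strict, and whether a given Rtsne version enforces it strictly is not confirmed by the documentation excerpt above.

```r
# Hypothetical helpers (not part of Rtsne).

# The formula from the answer: largest integer p with 3 * p <= n_rows - 1.
max_perplexity <- function(n_rows) {
  floor((n_rows - 1) / 3)
}

# Conservative variant (assumption): largest integer p with 3 * p < n_rows - 1,
# i.e. the documented inequality read as strict.
max_perplexity_strict <- function(n_rows) {
  ceiling((n_rows - 1) / 3) - 1
}

max_perplexity(7)         # (7 - 1) / 3 = 2, so 2
max_perplexity_strict(7)  # 3 * 2 = 6 is not < 6, so 1
```

The two only differ when nrow(X) - 1 is an exact multiple of 3, as it is for 7 rows. If the first value still triggers the error, dropping to the strict one is a safe fallback.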

【Comments】: