【Posted】: 2020-11-05 04:02:27
【Question】:
I have a large data frame, df1, that looks like this:
Gene CB_1.1 CB_10.1 CB_10.2 CB_10.3
1 Gene1 10 0 0 0
2 Gene2 871 7 9 2
3 Gene3 490 2 5 8
4 Gene4 17 5 6 1
5 Gene5 75 1 1 1
6 Gene6 308 2 6 2
> dput(head(df1[,1:5]))
structure(list(X = c("Gene1", "Gene2", "Gene3",
"Gene4", "Gene5", "Gene6"), CB_1.1 = c(10L,
871L, 490L, 17L, 75L, 308L), CB_10.1 = c(0L, 7L, 2L, 5L, 1L,
2L), CB_10.2 = c(0L, 9L, 5L, 6L, 1L, 6L), CB_10.3 = c(0L, 2L,
8L, 1L, 1L, 2L)), row.names = c(NA, 6L), class = "data.frame")
I also have a second data frame, df2, that looks like this:
tissue_subcluster Class_2
1 CB_1.1 Neuron
2 CB_10.1 Neuron
3 CB_10.2 Non-Neuron
4 CB_10.3 Non-Neuron
> dput(head(df2[,c(7,9)]))
structure(list(tissue_subcluster = c("CB_1.1", "CB_10.1", "CB_10.2",
"CB_10.3", "CB_11.1", "CB_11.2"), Class_2 = c("Neuron", "Non-Neuron",
"Non-Neuron", "Non-Neuron", "Non-Neuron", "Non-Neuron")), row.names = c("1",
"2", "3", "4", "5", "6"), class = "data.frame")
I want to average the values in df1 across columns, grouped by the Neuron / Non-Neuron factor (Class_2) in df2, so that the result looks like this:
Gene Neuron_mean Non-Neuron_mean
1 Gene1 5 0
2 Gene2 439 5.5
3 Gene3 246 6.5
4 Gene4 11 3.5
5 Gene5 38 1
6 Gene6 155 4
How can I do this? Any help is appreciated!
【Comments】:
- Please run dput(head(df1)) and paste the result into the question, so that there is data to test code against.
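One possible approach, as a minimal base R sketch rather than a definitive answer: look up the class of each df1 column in df2 with match(), then take rowMeans() over the columns of each class. The data below is retyped from the printed tables in the question, so it assumes the first column is named Gene (the dput output shows X) and that CB_10.1 is a Neuron column (the dput output of df2 says Non-Neuron). The names count_cols, col_class, means and result are only illustrative.

```r
# Sample data retyped from the printed tables in the question (assumed layout).
df1 <- data.frame(
  Gene    = paste0("Gene", 1:6),
  CB_1.1  = c(10L, 871L, 490L, 17L, 75L, 308L),
  CB_10.1 = c(0L, 7L, 2L, 5L, 1L, 2L),
  CB_10.2 = c(0L, 9L, 5L, 6L, 1L, 6L),
  CB_10.3 = c(0L, 2L, 8L, 1L, 1L, 2L)
)
df2 <- data.frame(
  tissue_subcluster = c("CB_1.1", "CB_10.1", "CB_10.2", "CB_10.3"),
  Class_2           = c("Neuron", "Neuron", "Non-Neuron", "Non-Neuron")
)

# Every column of df1 except Gene holds counts for one subcluster.
count_cols <- setdiff(names(df1), "Gene")

# Class of each count column, looked up by subcluster name in df2.
col_class <- df2$Class_2[match(count_cols, df2$tissue_subcluster)]

# Ignore df1 columns that have no entry in df2.
keep       <- !is.na(col_class)
count_cols <- count_cols[keep]
col_class  <- col_class[keep]

# For each class, average its columns row by row; bind the results as columns.
means <- do.call(
  cbind,
  lapply(split(count_cols, col_class),
         function(cols) rowMeans(df1[, cols, drop = FALSE]))
)
colnames(means) <- paste0(colnames(means), "_mean")

# check.names = FALSE keeps the hyphen in "Non-Neuron_mean".
result <- data.frame(Gene = df1$Gene, means, check.names = FALSE)
print(result)
```

With this sample data, Gene3 gets a Neuron mean of (490 + 2) / 2 = 246 and a Non-Neuron mean of (5 + 8) / 2 = 6.5. On the full data frame the same code should apply unchanged, as long as the names in df2$tissue_subcluster match the column names of df1 exactly.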