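【Posted】: 2020-10-17 15:49:58
【Question】:
I'm working on a data transformation in R and I'm stuck. I can't filter rows that share the same values, keep the one with the higher "expression value", and then split the data into columns by expression level and aggregate it. Since I know my explanation won't win a Nobel Prize, below are the original data, the desired result, and what I have managed so far.
Original data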
df <- read.table(text =
"Tissue Species Expression
1 dentritic Human moderate
2 liver Human high
3 liver Human moderate
4 liver Human moderate
5 liver Human high
6 liver Monkey high
7 liver Monkey moderate
8 liver Dog high
9 liver Dog high
10 liver Minipig moderate
11 liver Rat low
12 liver Rat cutoff
13 liver Monkey moderate
14 lung Monkey high
15 quadriceps Monkey cutoff" , header = TRUE)
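The result I need: if a combination of Tissue and Species is repeated, keep only its highest Expression value, then list the species per tissue in one column per expression level.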
      Tissue High_Expression    Moderate_Expression Low_Expression cutoff
1  dentritic                    Human
2      liver Human, Monkey, Dog Minipig             Rat
3       lung Monkey
4 quadriceps                                                       Monkey
What I have so far:
library(dplyr)

# Order the levels so that "high" ranks above "moderate", "low" and "cutoff"
df$Expression <- factor(df$Expression, levels = c("cutoff", "low", "moderate", "high"), ordered = TRUE)
df$Species <- as.character(df$Species)
# One column per expression level, holding the species name or an empty string
df <- df %>%
  mutate(High_expressed = ifelse(Expression == "high", Species, "")) %>%
  mutate(moderate_expressed = ifelse(Expression == "moderate", Species, "")) %>%
  mutate(low_expressed = ifelse(Expression == "low", Species, "")) %>%
  mutate(below_cutoff_expressed = ifelse(Expression == "cutoff", Species, "")) %>%
  select(-c("Expression", "Species"))
# Collapse every column per tissue (the empty strings get pasted as well)
df <- aggregate(. ~ Tissue, data = df, paste, collapse = ",")
That gives:
      Tissue High_Expression                   Moderate_Expression                      Low_Expression cutoff
1  dentritic                                   Human
2      liver Human,,,Human,Monkey,,Dog,Dog,,,, ,Human,Human,,,Monkey,,,Minipig,,,Monkey ,,,,,,,,,Rat,, ,,,,,,,,,Rat,
3       lung Monkey
4 quadriceps                                                                                           Monkey
Thanks in advance.
【Comments】:
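One possible approach, offered as a sketch rather than a definitive answer: the stray commas in the attempt come from `ifelse()` leaving an empty string for every non-matching row, which `paste(collapse = ",")` then joins along with the real values, and the duplicate Tissue/Species pairs are never reduced to their highest level. The code below first keeps the top Expression per pair and only then spreads the species into one column per level. It uses base R only, to avoid extra packages. The names `lvls`, `rank`, `top`, `wide` and `result` are made up for illustration and are not from the question.

```r
# Sample data from the question (the leading number on each row becomes the row name)
df <- read.table(text =
"Tissue Species Expression
1 dentritic Human moderate
2 liver Human high
3 liver Human moderate
4 liver Human moderate
5 liver Human high
6 liver Monkey high
7 liver Monkey moderate
8 liver Dog high
9 liver Dog high
10 liver Minipig moderate
11 liver Rat low
12 liver Rat cutoff
13 liver Monkey moderate
14 lung Monkey high
15 quadriceps Monkey cutoff", header = TRUE, stringsAsFactors = FALSE)

# Levels from lowest to highest, so a larger rank means stronger expression
lvls <- c("cutoff", "low", "moderate", "high")
df$rank <- match(df$Expression, lvls)

# Keep species in order of first appearance instead of alphabetical order
df$Species <- factor(df$Species, levels = unique(df$Species))

# Step 1: one row per Tissue/Species pair, holding the highest rank seen
top <- aggregate(rank ~ Tissue + Species, data = df, FUN = max)
top$Expression <- factor(lvls[top$rank], levels = rev(lvls))

# Step 2: Tissue x Expression table, pasting the species that share a cell
wide <- tapply(as.character(top$Species),
               list(top$Tissue, top$Expression),
               paste, collapse = ", ")
wide[is.na(wide)] <- ""  # cells with no species come back as NA

result <- data.frame(Tissue = rownames(wide), wide,
                     row.names = NULL, stringsAsFactors = FALSE)
print(result)
```

The resulting columns are named `high`, `moderate`, `low` and `cutoff`; they can be renamed with `names(result)` if the `High_Expression` style headers are required. With dplyr and tidyr the same two steps would probably map onto a grouped `summarise()` followed by `pivot_wider()`.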