Convert many lines into a specific line

[Question title]: Convert many lines into a specific line
[Posted]: 2013-12-11 19:07:03
[Question]:

I want to convert this data:

    Sample  Genotype  Region
    sample1    A      Region1
    sample1    B      Region1
    sample1    A      Region1
    sample2    A      Region1
    sample2    A      Region1
    sample3    A      Region1
    sample4    B      Region1

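into this format, marking samples that have more than one genotype with "E" and collapsing samples whose genotype is simply repeated into a single row: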

    Sample  Genotype  Region   
    sample1    E      Region1
    sample2    A      Region1
    sample3    A      Region1
    sample4    B      Region1

I have a list with multiple regions (Region1 - Regionx). Is it possible to do this in R? Thanks a lot.

[Comments]:

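Many options: tapply, ddply from the plyr package, or the data.table package.
I want to mark samples with more than one genotype (sample1) as excluded (E) in the "Genotype" column, reduced to a single row, and to merge samples that have two rows with the same genotype (sample2) into a single row.
Is it a data.frame, and are the columns factors, or is it a matrix of strings? Is this analysis done per region? If not, how should the regions be summarized? For the genotype, you can use function(x) ifelse(length(unique(x))==1,x[1],'E')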
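
The helper suggested in the last comment could be applied per sample with tapply, roughly as below. This is only a sketch under assumptions: the data.frame name `mydf` and the helper name `collapse_genotype` are illustrative and not from the original post, the columns are built as character vectors, and `if`/`else` is swapped in for `ifelse` so that a factor column would not come back as integer codes.

```r
# Example data from the question; stringsAsFactors = FALSE keeps columns as character.
mydf <- data.frame(
  Sample   = c("sample1", "sample1", "sample1", "sample2",
               "sample2", "sample3", "sample4"),
  Genotype = c("A", "B", "A", "A", "A", "A", "B"),
  Region   = "Region1",
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the single genotype, or "E" when there are several.
collapse_genotype <- function(x) {
  u <- unique(as.character(x))
  if (length(u) == 1) u else "E"
}

# Apply the helper to the genotypes of each sample.
# The result is a character vector named by sample.
per_sample <- tapply(mydf$Genotype, mydf$Sample, collapse_genotype)
print(per_sample)
```

This groups by Sample only, which is enough for a single region. With several regions, the grouping would need to include Region as well.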

Tags: r plyr reshape tapply

[Solution 1]:

A straightforward approach is to use aggregate. Assuming your data.frame is called "mydf" (and building on Jorg's comment):

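aggregate(Genotype ~ ., mydf, function(x) {
  # as.character() keeps a factor column from coming back as integer codes
  a <- unique(as.character(x))
  # one distinct genotype: keep it; several: mark the group as "E"
  if (length(a) > 1) "E" else a
})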
#    Sample  Region Genotype
# 1 sample1 Region1        E
# 2 sample2 Region1        A
# 3 sample3 Region1        A
# 4 sample4 Region1        B
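
Since the question mentions several regions, here is a small sketch of how the same call might behave on more than one region. The two-region input below is made up for illustration and is not from the original post. The formula `Genotype ~ .` groups by every remaining column, so each Sample/Region combination is summarized separately.

```r
# Hypothetical input: sample1 has mixed genotypes in Region1 but only one in Region2.
mydf <- data.frame(
  Sample   = c("sample1", "sample1", "sample1", "sample2", "sample2", "sample1"),
  Genotype = c("A", "B", "A", "A", "A", "B"),
  Region   = c("Region1", "Region1", "Region1", "Region1", "Region1", "Region2"),
  stringsAsFactors = FALSE
)

# `Genotype ~ .` groups by all other columns, here Sample and Region.
res <- aggregate(Genotype ~ ., mydf, function(x) {
  u <- unique(as.character(x))
  if (length(u) > 1) "E" else u
})
print(res)
```

Here sample1 is marked "E" only in Region1, where its genotypes differ, and keeps its single genotype in Region2.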

[Comments]: