【Question title】: Getting unique rows from pairs of rows that only differ in one value (observation)
【Posted】: 2018-03-02 20:30:41
【Question】:

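I have a data.table containing many pairs of rows that differ only in the value of q.val. For each of these pairs, I want to select the row with the smaller value. In other words, I want to go from DT1 to DT2 (see below). Is there a simple way to do this in the data.table package or in another package?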

DT1
cluster q.val
c1  8.68E-03
c1  1.00E+00
c2  4.53E-05
c2  1.00E+00
c3  2.46E-03
c3  1.00E+00
c4  4.18E-05
c4  1.00E+00
c5  1.00E+00
c5  3.98E-05
c6  1.00E+00
c6  4.71E-06

DT2
cluster q.val
c1  8.68E-03
c2  4.53E-05
c3  2.46E-03
c4  4.18E-05
c5  3.98E-05
c6  4.71E-06

In light of the answers, I have edited my original question to include part of my real table:

cluster pathway q.val
c1  Adrenergic signaling in cardiomyocytes  -3.01E-06
c1  Adrenergic signaling in cardiomyocytes  -1.80E+00
c2  Adrenergic signaling in cardiomyocytes  -5.07E-06
c2  Adrenergic signaling in cardiomyocytes  -1.30E+00
c3  Adrenergic signaling in cardiomyocytes  -1.46E-06
c3  Adrenergic signaling in cardiomyocytes  -2.32E+00
c4  Adrenergic signaling in cardiomyocytes  -1.60E-05
c4  Adrenergic signaling in cardiomyocytes  -1.75E+00
c5  Adrenergic signaling in cardiomyocytes  2.58E+00
c5  Adrenergic signaling in cardiomyocytes  2.53E-06
c6  Adrenergic signaling in cardiomyocytes  3.54E+00
c6  Adrenergic signaling in cardiomyocytes  8.74E-08
c7  Adrenergic signaling in cardiomyocytes  -4.85E-02
c7  Adrenergic signaling in cardiomyocytes  -3.98E-03
c8  Adrenergic signaling in cardiomyocytes  9.73E-01
c8  Adrenergic signaling in cardiomyocytes  3.44E-05
c1  Aldosterone synthesis and secretion -3.01E-06
c1  Aldosterone synthesis and secretion -1.64E+00
c2  Aldosterone synthesis and secretion -5.07E-06
c2  Aldosterone synthesis and secretion -1.49E+00
c3  Aldosterone synthesis and secretion -1.46E-06
c3  Aldosterone synthesis and secretion -1.85E+00
c4  Aldosterone synthesis and secretion -1.60E-05
c4  Aldosterone synthesis and secretion -1.40E+00
c5  Aldosterone synthesis and secretion 2.58E+00
c5  Aldosterone synthesis and secretion 2.53E-06
c6  Aldosterone synthesis and secretion 3.45E+00
c6  Aldosterone synthesis and secretion 8.74E-08
c7  Aldosterone synthesis and secretion -1.28E-02
c7  Aldosterone synthesis and secretion -1.42E-02
c8  Aldosterone synthesis and secretion 4.24E-01
c8  Aldosterone synthesis and secretion 3.44E-05

【Discussion】:

    Tags: r data.table unique


    【Answer 1】:
    DT1[, min(q.val), by = c("cluster", "pathway")]
    

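    The basic data.table syntax lets you use the "by" argument to define the groups over which a function (here "min") is applied. Note that this returns one row per cluster only if each cluster has a single pathway. If a cluster has several distinct pathways, the result contains one row for each cluster/pathway combination, so that cluster appears on multiple rows.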
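As a minimal sketch of the grouped-minimum idea, the snippet below rebuilds DT1 from the question and groups by cluster only, since that toy table has no pathway column. Naming the result column with `.(q.val = ...)` is an addition for readability; without it data.table calls the column V1.

```r
library(data.table)

# Toy data copied from DT1 in the question
DT1 <- data.table(
  cluster = rep(paste0("c", 1:6), each = 2),
  q.val   = c(8.68e-03, 1, 4.53e-05, 1, 2.46e-03, 1,
              4.18e-05, 1, 1, 3.98e-05, 1, 4.71e-06)
)

# One row per cluster, holding the smallest q.val found in that group
DT2 <- DT1[, .(q.val = min(q.val)), by = cluster]
print(DT2)
```

For the real table, the same call with `by = c("cluster", "pathway")` would give one row per cluster/pathway pair.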

    【Comments】:

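    • The OP wants to select rows. If there are other columns, this approach drops them. To keep them, you could use DT1[, .SD[which.min(q.val)], by=cluster]
    • ^^ That's true, although given the example dataset I assumed there were no other columns.
    • Yes, I oversimplified my example. I have now edited my original question to include a subset of my real data.table.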
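The row-selecting variant from the comments can be sketched as follows. The data are a few rows taken from the question's real table; the `n.genes` column is hypothetical and is added only to show that non-grouping columns are kept. Grouping by both cluster and pathway is an assumption about what the real table needs.

```r
library(data.table)

# A few rows from the question's real table; `n.genes` is a hypothetical
# extra column, present only to show that other columns survive.
DT <- data.table(
  cluster = c("c5", "c5", "c6", "c6", "c5", "c5"),
  pathway = c(rep("Adrenergic signaling in cardiomyocytes", 4),
              rep("Aldosterone synthesis and secretion", 2)),
  q.val   = c(2.58e+00, 2.53e-06, 3.54e+00, 8.74e-08, 2.58e+00, 2.53e-06),
  n.genes = c(10L, 12L, 7L, 9L, 4L, 5L)
)

# Keep the whole row with the smallest q.val in each (cluster, pathway) group
res <- DT[, .SD[which.min(q.val)], by = .(cluster, pathway)]
print(res)
```

`which.min` returns the index of the first minimum, so if two rows in a group tie on q.val, only the first of them is kept.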