R data table select entry with largest sum: answers

【Question title】: R data table select entry with largest sum
【Posted】: 2018-05-18 07:42:06
【Question description】:

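I have a data table with 3 columns: Field1, Field2 and Value. For each value of Field2, I want to find the Field1 value with the largest sum of Value (i.e. the table has multiple rows per Field1/Field2 combination).

When I try this: x[,.(Field1 = Field1[which.max(sum(Value))]),.(Field2)] I seem to get the first Field1 row for each Field2, rather than the one corresponding to the largest sum of Value.

As an extension, what if you also want to return the sum of Value and the total row count for each Field2, together with the Field1 value that has the largest sum of Value within that Field2?

Reproducible code follows.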

library(data.table)

#Set random seed
set.seed(2017)

#Create a table with the attributes we need
x = data.table(rbind(data.frame(Field1 = 1:12,Field2 = rep(1:3, each = 4), Value = runif(12)),
               data.frame(Field1 = 1:12,Field2 = rep(1:3, each = 4), Value = runif(12))))

#Let's order by Field2/ Field1 / Value
x = x[order(Field2,Field1,Value)]

#Check
print(x)

# This works, but requires 2 steps which can complicate things when needing 
# to pull other attributes too.
(x[,.(Value = sum(Value)),.(Field2,Field1)][,.SD[which.max(Value)],.(Field2)])

#This instead provides the row corresponding to the largest Value.
(x[,.(Field1 = Field1[which.max(Value)]),.(Field2)])

# This is what I was ideally looking for but it only returns the first row of the attribute 
# regardless of the value of Value, or the corresponding sum.
(x[,.(Field1 = Field1[which.max(sum(Value))]),.(Field2)])

# This works but seems clumsy

(x[, 
.SD[, .(RKCNT=length(.I),TotalValue=sum(Value)), .(Field1)]
[,.(RKCNT = sum(RKCNT), TotalValue = sum(TotalValue), 
Field1 = Field1[which.max(TotalValue)])], 
.(Field2)])
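
The behavior of the third attempt can be reproduced on a tiny table. This is a minimal sketch, assuming a hypothetical `toy` table (not the question's random data) with character Field1 values: inside each Field2 group, sum(Value) is a single number, so which.max() can only return position 1. The `tapply()` variant is just one possible way to get one total per Field1 in a single expression, not something proposed in the thread.

```r
library(data.table)

# Hypothetical toy data, small enough to check by hand
toy <- data.table(
  Field1 = c("a", "a", "b", "b", "c", "d", "d"),
  Field2 = c(1, 1, 1, 1, 2, 2, 2),
  Value  = c(1, 2, 5, 6, 9, 5, 5)
)

# sum() collapses the group to one number, so which.max() sees a length-1 vector
which.max(sum(c(1, 2, 5, 6)))   # 1

# Consequently this picks the first Field1 of every Field2 group
first_row <- toy[, .(Field1 = Field1[which.max(sum(Value))]), by = Field2]

# Summing per Field1 first gives which.max() one total per candidate
per_field1 <- toy[, .(Field1 = {
  totals <- tapply(Value, Field1, sum)   # named totals, one per Field1
  names(totals)[which.max(totals)]
}), by = Field2]
```

Here `first_row` holds "a" and "c" (the first rows), while `per_field1` holds "b" and "d" (totals 11 and 10). Note that `names()` returns character, so with an integer Field1 the result would need converting back.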

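【Comments on the question】:

Assuming there are no ties, you can order by the sum and then use unique: x[, lapply(.SD, sum), by=.(Field2, Field1)][order(Field2, -Value), unique(.SD, by="Field2")]. I guess this has already been asked somewhere.
Not your main question, but FYI: the data.table idiom for sorting is setorder ("fast row reordering of a data.table by reference").

Tags: r data.table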
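
The two comments above can be combined into one sketch: aggregate, sort in place with setorder(), then keep the first row per Field2 with unique(). The `toy` table is hypothetical and chosen so that the totals have no ties, which is the assumption the first comment states.

```r
library(data.table)

# Hypothetical toy data with no tied totals
toy <- data.table(
  Field1 = c("a", "a", "b", "b", "c", "d", "d"),
  Field2 = c(1, 1, 1, 1, 2, 2, 2),
  Value  = c(1, 2, 5, 6, 9, 5, 5)
)

# One row per Field2/Field1 pair, holding the summed Value
totals <- toy[, .(Value = sum(Value)), by = .(Field2, Field1)]

# setorder() reorders by reference: Field2 ascending, Value descending
setorder(totals, Field2, -Value)

# unique() keeps the first row per Field2, which is now the largest total
best <- unique(totals, by = "Field2")
```

Because setorder() modifies `totals` in place, no reassignment is needed. With ties, unique() would silently keep whichever tied row sorts first.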

【Solution 1】:

We can use

x[, .SD[, sum(Value), Field1][which.max(V1)], Field2]

This is concise and therefore easier to read, but it does not bring any performance gain.
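
As a rough check, the expression above can be run on a small hand-verifiable table. The `toy` data and the `summary_dt` extension (row count and group total alongside the top Field1) are assumptions added for illustration. The extension is one possible way to cover the follow-up in the question and is not part of the original answer.

```r
library(data.table)

# Hypothetical toy data, small enough to check by hand
toy <- data.table(
  Field1 = c("a", "a", "b", "b", "c", "d", "d"),
  Field2 = c(1, 1, 1, 1, 2, 2, 2),
  Value  = c(1, 2, 5, 6, 9, 5, 5)
)

# The answer's expression: sum per Field1 within each Field2, keep the largest
best <- toy[, .SD[, sum(Value), Field1][which.max(V1)], Field2]

# Assumed extension: also return row count and total Value per Field2
summary_dt <- toy[, {
  per_f1 <- .SD[, .(Total = sum(Value)), by = Field1]   # totals per Field1
  list(Rows       = .N,
       TotalValue = sum(Value),
       TopField1  = per_f1[which.max(Total), Field1])
}, by = Field2]
```

On this data `best` pairs Field2 1 with "b" (V1 = 11) and Field2 2 with "d" (V1 = 10). `summary_dt` adds Rows 4 and 3 and TotalValue 14 and 19.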

【Discussion】: