【Posted】: 2018-01-18 08:33:25
【Question】:
Given a data frame like the one below:
text <- "
location_id,brand,count,driven_km,efficiency,mileage,age
23040204995,Toyota,8,2761,0.57,333,2.17
23040204995,Honda,23,2307,0.38,117.5,0.45
23040204995,Tesla,16,3578,0.65,127,0.38
23040204996,Toyota,16,3578,0.65,127,0.38
23040204996,Nissan,38,2504,0.37,563.5,0.74
23040204996,Tesla,24,892,0.32,175,0.48
23040204997,Tesla,11,1879.5,0.67,298.5,0.57
23040204998,Honda,24,892,0.32,175,0.48
"
df <- read.table(textConnection(text), sep = ",", header = TRUE)
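For each location_id, I need to compute the difference between every brand's count, driven_km, efficiency, mileage, and age values and the Tesla values at that location. The difference should be computed as Value for i - Value for Tesla, where i = {"Toyota", "Honda", "Nissan", ...}. Some location_ids may have no Tesla row, or may have only a Tesla row; those need to be ignored, because the difference is meaningless for them.
I am looking for an elegant way to do this, preferably with dplyr.
Expected output: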
location_id,brand,count,driven_km,efficiency,mileage,age
23040204995,Toyota,-8,-817,-0.08,206,1.79
23040204995,Honda,7,-1271,-0.27,-9.5,0.07
23040204996,Toyota,-8,2686,0.33,-48,-0.1
23040204996,Nissan,14,1612,0.05,388.5,0.26
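
For reference, here is a minimal sketch of one way the computation described above could be written with dplyr. It is not a definitive answer, and it rests on a few assumptions that the question does not state: dplyr 1.0.0 or later is installed (for across()), each location_id has at most one Tesla row, and the name result is just an illustrative choice. The data is rebuilt with read.csv() so the block runs on its own.

```r
library(dplyr)

# Rebuild the sample data from the question so this block is self-contained.
text <- "location_id,brand,count,driven_km,efficiency,mileage,age
23040204995,Toyota,8,2761,0.57,333,2.17
23040204995,Honda,23,2307,0.38,117.5,0.45
23040204995,Tesla,16,3578,0.65,127,0.38
23040204996,Toyota,16,3578,0.65,127,0.38
23040204996,Nissan,38,2504,0.37,563.5,0.74
23040204996,Tesla,24,892,0.32,175,0.48
23040204997,Tesla,11,1879.5,0.67,298.5,0.57
23040204998,Honda,24,892,0.32,175,0.48"
df <- read.csv(text = text)

result <- df %>%
  group_by(location_id) %>%
  # Keep only locations that have a Tesla row AND at least one other brand.
  filter(any(brand == "Tesla"), any(brand != "Tesla")) %>%
  # Within each location, subtract the Tesla value from every row.
  # Assumption: at most one Tesla row per location_id.
  mutate(across(c(count, driven_km, efficiency, mileage, age),
                ~ .x - .x[brand == "Tesla"])) %>%
  ungroup() %>%
  # The Tesla rows are now all zeros, so drop them.
  filter(brand != "Tesla")

print(result)
```

The grouped filter() removes 23040204997 (Tesla only) and 23040204998 (no Tesla) before any subtraction happens, so the Tesla lookup inside mutate() always finds exactly one value under the stated assumption. Because the columns are floating point, differences such as 0.57 - 0.65 may print with tiny rounding error; wrapping the result in round() is one option if exact display matters.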
【Discussion】: