【Title】: R - ddply(): Using min value of one column to find the corresponding value in different column [duplicate]
【Posted】: 2021-11-15 14:22:15
【Question】:

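I want a summary of the lowest cost per country across all years, together with the airport that lowest cost belongs to. The dataset looks like this (roughly 1000 rows, with multiple airports per country):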

airport  country cost    year
ORD      US      500     2010
SFO      US      800     2010
LHR      UK      250     2010
CDG      FR      300     2010
FRA      GR      200     2010
ORD      US      650     2011
SFO      US      500     2011
LHR      UK      850     2011
CDG      FR      350     2011
FRA      GR      150     2011
ORD      US      250     2012
SFO      US      650     2012
LHR      UK      350     2012
CDG      FR      450     2012
FRA      GR      100     2012

The code below gives me the lowest cost for each country:

ddply(df,c('country'), summarize, LowestCost = min(cost))

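When I try to show the lowest cost per country together with the corresponding airport, the same single airport is listed for every country: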

ddply(df,c('country'), summarize, LowestCost = min(cost), AirportName = df[which.min(df[,3]),1])

The output should look like this:

country  LowestCost  AirportName
US       250         ORD
UK       250         LHR
FR       300         CDG
GR       100         FRA

But instead it looks like this:
country  LowestCost  AirportName
US       250         ORD
UK       250         ORD
FR       300         ORD
GR       100         ORD

Any help is appreciated.

【Comments】:

    Tags: r dplyr plyr


    【Solution 1】:

    We can use slice_min from dplyr to keep the row with the lowest cost in each country:

    library(dplyr)
    df %>%
         select(-year) %>%
         group_by(country) %>%
         slice_min(cost, n = 1) %>%
         ungroup %>%
         rename(LowestCost = cost)
    

    -output

    # A tibble: 4 x 3
      airport country LowestCost
      <chr>   <chr>        <int>
    1 CDG     FR             300
    2 FRA     GR             100
    3 LHR     UK             250
    4 ORD     US             250
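
One detail worth checking on the real data: by default slice_min keeps every row that ties for the minimum, so a country with two equally cheap airports would appear twice. The sketch below uses made-up airport codes and costs (AAA, BBB, CCC are hypothetical, not from the question) and assumes a dplyr version that provides slice_min with its with_ties argument (1.0.0 or later).

```r
library(dplyr)

# Hypothetical data: country XX has two airports tied at the lowest cost.
tied <- data.frame(
  airport = c("AAA", "BBB", "CCC"),
  country = c("XX", "XX", "YY"),
  cost    = c(100L, 100L, 300L),
  stringsAsFactors = FALSE
)

# Default (with_ties = TRUE): both tied rows for XX are returned.
all_ties <- tied %>%
  group_by(country) %>%
  slice_min(cost, n = 1) %>%
  ungroup()

# with_ties = FALSE: exactly one row per country.
one_each <- tied %>%
  group_by(country) %>%
  slice_min(cost, n = 1, with_ties = FALSE) %>%
  ungroup()

print(nrow(all_ties))
print(nrow(one_each))
```

If exactly one airport per country is required, with_ties = FALSE is probably the safer choice; otherwise the default makes ties visible.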
    

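    In the plyr code, which.min was applied to the whole cost column of df (df[,3]) rather than to the cost values of each country's group, so every group received the same airport. Inside ddply with summarise we only need to refer to the columns by name, and they are then evaluated per group: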

    plyr::ddply(df, c("country"), plyr::summarise, 
       LowestCost = min(cost), AirportName = airport[which.min(cost)])
      country LowestCost AirportName
    1      FR        300         CDG
    2      GR        100         FRA
    3      UK        250         LHR
    4      US        250         ORD
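
To make the difference between the global and the per-group lookup concrete, here is a minimal base R sketch (no packages). It uses a subset of the rows from the question; the names by_country, global_pick and lowest are illustrative only, and split/lapply stands in for what ddply does per group rather than reproducing plyr's internals.

```r
# Subset of the question's rows (2010 and 2012, three countries).
df <- data.frame(
  airport = c("ORD", "SFO", "LHR", "FRA", "ORD", "SFO", "LHR", "FRA"),
  country = c("US", "US", "UK", "GR", "US", "US", "UK", "GR"),
  cost    = c(500L, 800L, 250L, 200L, 250L, 650L, 350L, 100L),
  stringsAsFactors = FALSE
)

# Global lookup: indexes the full data frame, so the answer does not
# depend on the group and would be repeated for every country.
global_pick <- df$airport[which.min(df$cost)]

# Per-group lookup: split by country, then take the airport at the
# position of the minimum cost within each piece.
by_country <- split(df, df$country)
lowest <- do.call(rbind, lapply(by_country, function(g) {
  i <- which.min(g$cost)
  data.frame(country = g$country[i],
             LowestCost = g$cost[i],
             AirportName = g$airport[i],
             stringsAsFactors = FALSE)
}))
rownames(lowest) <- NULL

print(global_pick)
print(lowest)
```

The per-group version indexes airport with a position computed from the same group's cost, which is the same idea as airport[which.min(cost)] in the ddply call.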
    

    data

    df <- structure(list(airport = c("ORD", "SFO", "LHR", "CDG", "FRA", 
    "ORD", "SFO", "LHR", "CDG", "FRA", "ORD", "SFO", "LHR", "CDG", 
    "FRA"), country = c("US", "US", "UK", "FR", "GR", "US", "US", 
    "UK", "FR", "GR", "US", "US", "UK", "FR", "GR"), cost = c(500L, 
    800L, 250L, 300L, 200L, 650L, 500L, 850L, 350L, 150L, 250L, 650L, 
    350L, 450L, 100L), year = c(2010L, 2010L, 2010L, 2010L, 2010L, 
    2011L, 2011L, 2011L, 2011L, 2011L, 2012L, 2012L, 2012L, 2012L, 
    2012L)), class = "data.frame", row.names = c(NA, -15L))
    

    【Comments】:
