Creation of additional loop Dplyr: answers

【Question title】: Creation of additional loop Dplyr
【Posted】: 2019-03-12 12:42:57
【Question description】:

I have two files, each containing one year (2015) of data:

    Product Reporter Total_trade
    Apple   Spain        100
    Apple   France       200
    Apple   Italy        300

    Product Reporter Total_trade
    Pear    Spain        400
    Pear    France       500
    Pear    Italy        600

I created a loop that computes a specific ratio for the two products:

    product_index <- c("Apple", "Pear")

    # Empty result frame; the loop fills in one row per product
    ratio_matrix <- data.frame(matrix(vector(), 0, 2,
                                      dimnames = list(c(), c("Product", "ratio"))),
                               stringsAsFactors = FALSE)

    for (l in product_index) {
      infile <- paste("tradetotal_", l, ".csv", sep = "")
      sum_trade <- read.csv(infile)
      sum_trade <- sum_trade[, -1]  # drop the first column

      # (the step that computes `ratio` from sum_trade is not shown in the question)

      k <- which(product_index == l)
      ratio_matrix[k, "Product"] <- l
      ratio_matrix[k, "ratio"] <- ratio[1, 2]
    }

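I now have data for the same products for several different years. How can I add another loop inside the existing one so that the ratios are computed for each year?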

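【Question discussion】:

Did any of the answers you received solve your problem? If so, please consider accepting one as the solution. If not, consider adding more information to your question.

Tags: r loops dplyr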

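【Solution 1】:

This uses the data frames from @Patrick's answer (Solution 2).

1) Row-bind a named list of data frames, using the year as each name. The list can be extended with data for more years if needed.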

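    library(dplyr)
    library(tidyr)  # provides spread(), used in step 2

    # apple_2015, pear_2015, ... are the tibbles from the sample data
    # (e.g. dataList$apple_2015 in @Patrick's answer)
    df <- bind_rows(list("2015" = apple_2015,
                         "2015" = pear_2015,
                         "2016" = apple_2016,
                         "2016" = pear_2016), .id = "year")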

2) Spread the products into columns, then compute the ratio with dplyr

    df %>%
      spread(Product, Total_trade) %>%
      group_by(year, Reporter) %>%
      summarise(Apple_Pear_ratio = Apple / Pear)

    # A tibble: 6 x 3
    # Groups:   year [2]
      year  Reporter Apple_Pear_ratio
      <chr> <chr>               <dbl>
    1 2015  France              0.4
    2 2015  Italy               0.5
    3 2015  Spain               0.25
    4 2016  France              0.5
    5 2016  Italy               0.5
    6 2016  Spain               0.667

Edited to use the spread() function.
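In current tidyr releases spread() is superseded by pivot_wider(), so the same reshaping step can be sketched with it. This is only an illustration, not part of the original answer: it assumes tidyr 1.0.0 or later, and the small `df` below is a stand-in built from the Apple and Pear values in the sample data.

```r
library(dplyr)
library(tidyr)

# Stand-in for the row-bound data (Apple and Pear values for 2015 and 2016)
df <- tibble(
  year        = rep(c("2015", "2016"), each = 6),
  Product     = rep(rep(c("Apple", "Pear"), each = 3), times = 2),
  Reporter    = rep(c("Spain", "France", "Italy"), times = 4),
  Total_trade = c(100, 200, 300, 400, 500, 600,
                  200, 250, 300, 300, 500, 600)
)

# One column per product, then the ratio is a plain column operation
ratios <- df %>%
  pivot_wider(names_from = Product, values_from = Total_trade) %>%
  mutate(Apple_Pear_ratio = Apple / Pear) %>%
  select(year, Reporter, Apple_Pear_ratio)

ratios
```

Once the data is in wide form there is exactly one row per year and reporter, so no grouping is needed to compute the ratio.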

【Discussion】:

spread() can also produce a result similar to your group_by() solution.
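For comparison, a version that relies on group_by() alone, without reshaping, might look like the sketch below. It is a guess at the kind of solution the comment refers to, not the answer's original code. The `df` stand-in repeats the Apple and Pear sample values, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Stand-in for the row-bound data (Apple and Pear values for 2015 and 2016)
df <- tibble(
  year        = rep(c("2015", "2016"), each = 6),
  Product     = rep(rep(c("Apple", "Pear"), each = 3), times = 2),
  Reporter    = rep(c("Spain", "France", "Italy"), times = 4),
  Total_trade = c(100, 200, 300, 400, 500, 600,
                  200, 250, 300, 300, 500, 600)
)

# Within each year/reporter group, pick the two products by name
ratios_grouped <- df %>%
  group_by(year, Reporter) %>%
  summarise(
    Apple_Pear_ratio = Total_trade[Product == "Apple"] /
                       Total_trade[Product == "Pear"],
    .groups = "drop"
  )

ratios_grouped
```

This form assumes every group contains exactly one Apple row and one Pear row; a missing product would yield a zero-length result for that group.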

【Solution 2】:

Here is a possible solution that uses grouping. Let me know if you need a more general approach.

    library(tidyverse)

    # the product types
    product_index <- c('Apple', 'Pear', 'Banana', 'Orange')

    # the reporters
    reporter_index <- c('Spain', 'France', 'Italy')

    ## sample product data ----
    dataList <- list(
      apple_2015 = tibble(
        Product = 'Apple',
        Reporter = reporter_index,
        Total_trade = c(100, 200, 300)
      ),
      pear_2015 = tibble(
        Product = 'Pear',
        Reporter = reporter_index,
        Total_trade = c(400, 500, 600)
      ),
      banana_2015 = tibble(
        Product = 'Banana',
        Reporter = reporter_index,
        Total_trade = c(100, 150, 600)
      ),
      orange_2015 = tibble(
        Product = 'Orange',
        Reporter = reporter_index,
        Total_trade = c(400, 500, 600)
      ),
      apple_2016 = tibble(
        Product = 'Apple',
        Reporter = reporter_index,
        Total_trade = c(200, 250, 300)
      ),
      pear_2016 = tibble(
        Product = 'Pear',
        Reporter = reporter_index,
        Total_trade = c(300, 500, 600)
      ),
      banana_2016 = tibble(
        Product = 'Banana',
        Reporter = reporter_index,
        Total_trade = c(200, 250, 300)
      ),
      orange_2016 = tibble(
        Product = 'Orange',
        Reporter = reporter_index,
        Total_trade = c(300, 500, 600)
      )
    )

    ## calculation ----

    # add the year (parsed from the list names) to each tibble,
    # bind the rows into one large tibble and group by year and reporter
    mergedDF <- lapply(seq_along(dataList), function(i) {
      dataList[[i]] %>%
        mutate(Year = parse_number(names(dataList))[i])
    }) %>%
      bind_rows() %>%
      group_by(Year, Reporter)

    # compute the ratio for every pair of products within each Year/Reporter group
    resultsDF <- (function() {
      tmpList <- mergedDF %>%
        group_split()

      lapply(seq_along(tmpList), function(j) {
        tmpDF <- tibble('Year' = unique(tmpList[[j]]$Year),
                        'Reporter' = unique(tmpList[[j]]$Reporter))

        # one new column per product pair, e.g. "ApplePear" = Apple / Pear;
        # as.list() assigns one value to each new column
        tmpDF[combn(tmpList[[j]]$Product, 2, function(i) paste0(i[1], i[2]))] <-
          as.list(combn(tmpList[[j]]$Total_trade, 2, function(i) i[1] / i[2]))

        return(tmpDF)
      }) %>%
        bind_rows()
    })()

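【Discussion】:

What if I need to compute this for more products? Can't I just create a year_index?
If you have more products, you can pick whichever combinations you want from combn(product_index, 2, function(i) i).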
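To make the combn() suggestion concrete, here is a minimal sketch of enumerating the product pairs and computing one ratio per pair. The `trade` tibble and the column names `numerator`, `denominator` and `ratio` are illustrative assumptions; the values are the 2015 figures for Spain from the sample data.

```r
library(dplyr)

product_index <- c("Apple", "Pear", "Banana", "Orange")

# All unordered pairs of products: a 2 x 6 character matrix, one pair per column
pairs <- combn(product_index, 2)

# 2015 trade values for Spain, taken from the sample data
trade <- tibble(
  Product     = product_index,
  Total_trade = c(100, 400, 100, 400)
)

# One row per pair; look up each product's value by name and divide
pair_ratios <- tibble(
  numerator   = pairs[1, ],
  denominator = pairs[2, ]
) %>%
  mutate(
    ratio = trade$Total_trade[match(numerator, trade$Product)] /
            trade$Total_trade[match(denominator, trade$Product)]
  )

pair_ratios
```

To keep only some combinations, filter `pair_ratios` on `numerator` and `denominator` before or after computing the ratio.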