[Posted]: 2020-06-26 21:18:59
[Question]:
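I receive a CSV report every day. Every report has the same variables (columns) but covers a different date. I want to run some simple analysis by date and save the results. I think a for loop could do the job, but I only know the basics. Ideally, I would only need to run the script once a month to get the results. Any guidance or suggestions are appreciated.
Suppose I have two CSV reports in a folder: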
#File 1 - 20200624.csv
Date Market Salesman Product Quantity Price Cost
6/24/2020 A MF Apple 20 1 0.5
6/24/2020 A RP Apple 15 1 0.5
6/24/2020 A RP Banana 20 2 0.5
6/24/2020 A FR Orange 20 3 0.5
6/24/2020 B MF Apple 20 1 0.5
6/24/2020 B RP Banana 20 2 0.5
#File 2 - 20200625.csv
Date Market Salesman Product Quantity Price Cost
6/25/2020 A MF Apple 10 1 0.6
6/25/2020 A MF Banana 15 1 0.6
6/25/2020 A RP Banana 10 2 0.6
6/25/2020 A FR Orange 15 3 0.6
6/25/2020 B MF Apple 20 1 0.6
6/25/2020 B RP Banana 20 2 0.6
I import all the files into R with the following code:
library(readr)
library(dplyr)
#Import files (the pattern is a regular expression, so match a literal ".csv" suffix)
files <- list.files(path = "~/JuneReports",
                    pattern = "\\.csv$", full.names = TRUE)
#Read each file and stack the rows; .id records the source file name
tbl <- sapply(files, read_csv, simplify = FALSE) %>%
  bind_rows(.id = "id")
#Remove the "id" column
tbl2 <- tbl[, -1]
#Subset the data frame to keep only Market A, as Market B is irrelevant.
tbl3 <- subset(tbl2, Market == "A")
head(tbl3)
# A tibble: 6 x 7
Date Market Salesman Product Quantity Price Cost
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 6/24/2020 A MF Apple 20 1 0.5
2 6/24/2020 A RP Apple 15 1 0.5
3 6/24/2020 A RP Banana 20 2 0.5
4 6/24/2020 A FR Orange 20 3 0.5
5 6/25/2020 A MF Apple 10 1 0.6
6 6/25/2020 A MF Banana 15 1 0.6
Here is the result I would like to get:
Date Market Revenue Total Cost Apples Sold Bananas Sold Oranges Sold
6/24/2020 A 135 37.5 35 20 20
6/25/2020 A 90 30 10 25 15
#Revenue = sumproduct(Quantity, Price)
#Total Cost = sumproduct(Quantity, Cost)
#Apples/Bananas/Oranges Sold = sum of Quantity over the rows where Product == "Apple"/"Banana"/"Orange"
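One possible way to get a table of this shape is a grouped summary with dplyr. The sketch below is only an illustration, not a confirmed answer: it rebuilds the Market A rows from the two sample files by hand so that it runs on its own, and the output column names (TotalCost, ApplesSold, and so on) are made up for the example.

```r
library(dplyr)

# Market A rows typed in from the two sample files above (an assumption made
# so the example is self-contained; in practice this would be tbl3).
tbl3 <- data.frame(
  Date     = c(rep("6/24/2020", 4), rep("6/25/2020", 4)),
  Market   = "A",
  Salesman = c("MF", "RP", "RP", "FR", "MF", "MF", "RP", "FR"),
  Product  = c("Apple", "Apple", "Banana", "Orange",
               "Apple", "Banana", "Banana", "Orange"),
  Quantity = c(20, 15, 20, 20, 10, 15, 10, 15),
  Price    = c(1, 1, 2, 3, 1, 1, 2, 3),
  Cost     = c(0.5, 0.5, 0.5, 0.5, 0.6, 0.6, 0.6, 0.6),
  stringsAsFactors = FALSE
)

# One output row per Date/Market combination.
daily <- tbl3 %>%
  group_by(Date, Market) %>%
  summarise(
    Revenue     = sum(Quantity * Price),              # sumproduct(Quantity, Price)
    TotalCost   = sum(Quantity * Cost),               # sumproduct(Quantity, Cost)
    ApplesSold  = sum(Quantity[Product == "Apple"]),  # units, not row counts
    BananasSold = sum(Quantity[Product == "Banana"]),
    OrangesSold = sum(Quantity[Product == "Orange"])
  ) %>%
  ungroup()

print(daily)
```

Because the grouping is done on Date, no explicit for loop over the files is needed: every date present in the combined data frame gets its own row. The result could then be saved with write.csv(daily, ...) or readr::write_csv(daily, ...), using whatever output path suits the monthly run.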
[Comments]:
- You can use %*%
- @akrun Can you provide more details?
- My solution's output is based on the head data you showed
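The %*% suggestion in the comments was not spelled out in the thread. As a rough sketch of what it may have meant: for two numeric vectors of equal length, %*% returns their inner product as a 1x1 matrix, which is the same quantity as a spreadsheet sumproduct. The vectors below are typed in from the Market A rows of 20200624.csv, and the variable names are chosen only for this example.

```r
# Market A rows for 6/24/2020, copied from the sample file above.
quantity <- c(20, 15, 20, 20)
price    <- c(1, 1, 2, 3)
cost     <- c(0.5, 0.5, 0.5, 0.5)

# %*% on two vectors gives a 1x1 matrix; drop() turns it into a plain number.
revenue    <- drop(quantity %*% price)  # same value as sum(quantity * price)
total_cost <- drop(quantity %*% cost)   # same value as sum(quantity * cost)

print(revenue)     # 135
print(total_cost)  # 37.5
```

For per-date results the same expression would have to be applied within each date group, so in a grouped summary sum(Quantity * Price) is usually the simpler spelling.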