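【Posted】: 2018-08-13 20:04:22
【Question】:
R version 3.4.2
I'm trying to create 3 new variables based on conditions on other variables in the same data frame. I managed to get the result I wanted, but it took many lines of code to produce output that I think other approaches (perhaps using dplyr) could deliver much more easily.
Here is a reproducible example: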
city <- c("London", "London", "Leeds","Leeds", "Leeds", "Nottingham", "Glasgow", "Belfast", "Belfast", "Oxford", "Oxford", "Southampton", "Aberdeen", "Bath", "Bath", "Bath", "Preston", "Preston", "Liverpool", "Derby","Hereford")
transport <- c("cars", "scooters", "cars", "scooters", "bikes", "cars", "scooters", "cars", "bikes", "scooters", "bikes", "bikes", "scooters", "cars", "scooters", "bikes", "scooters", "bikes", "bikes", "cars", "bikes")
number <- c("153", "21", "267", "87", "13", "95", "17", "199", "8", "34", "5", "23", "40", "142", "79", "28", "37", "22", "19", "83", "23")
df <- data.frame(city, transport, number)
I want to know the percentage of each type of transport within each city, as shown below:
> df
          city transport number pct.cars pct.scooters pct.bikes
1       London      cars    153    87.93        12.07      0.00
2       London  scooters     21    87.93        12.07      0.00
3        Leeds      cars    267    72.75        23.71      3.54
4        Leeds  scooters     87    72.75        23.71      3.54
5        Leeds     bikes     13    72.75        23.71      3.54
6   Nottingham      cars     95   100.00         0.00      0.00
7      Glasgow  scooters     17     0.00       100.00      0.00
8      Belfast      cars    199    96.14         0.00      3.86
9      Belfast     bikes      8    96.14         0.00      3.86
10      Oxford  scooters     34     0.00        87.18     12.82
11      Oxford     bikes      5     0.00        87.18     12.82
12 Southampton     bikes     23     0.00         0.00    100.00
13    Aberdeen  scooters     40     0.00       100.00      0.00
14        Bath      cars    142    57.03        31.73     11.24
15        Bath  scooters     79    57.03        31.73     11.24
16        Bath     bikes     28    57.03        31.73     11.24
17     Preston  scooters     37     0.00        62.71     37.29
18     Preston     bikes     22     0.00        62.71     37.29
19   Liverpool     bikes     19     0.00         0.00    100.00
20       Derby      cars     83   100.00         0.00      0.00
21    Hereford     bikes     23     0.00         0.00    100.00
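To make the target concrete, here is how the Leeds rows in the table above can be checked by hand. This is only an illustrative sketch: the counts are copied from the example data, and `leeds` and `leeds_pct` are names made up for this check.

```r
# Leeds counts copied from the example data
leeds <- c(cars = 267, scooters = 87, bikes = 13)

# Share of each transport type within the city, as a percentage
leeds_pct <- round(100 * leeds / sum(leeds), 2)
leeds_pct
# cars 72.75, scooters 23.71, bikes 3.54
```

A transport type that does not occur in a city simply contributes 0 to that city's total, so its percentage is 0.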
The code that produced the data frame above is:
library(dplyr)

# number is created as text (a factor in R 3.4.2), so convert it to numeric
# before doing any arithmetic; the three pct.* columns start at 0
df <- as_tibble(df) %>%
  mutate(number = as.numeric(as.character(number)),
         pct.cars = 0,
         pct.scooters = 0,
         pct.bikes = 0)

for (i in seq_len(nrow(df))) {
  cur_city <- df$city[i]
  # counts for the current city; length 0 if that transport type is absent
  n_cars <- df$number[df$city == cur_city & df$transport == "cars"]
  n_scooters <- df$number[df$city == cur_city & df$transport == "scooters"]
  n_bikes <- df$number[df$city == cur_city & df$transport == "bikes"]
  if (length(n_cars) == 1 && length(n_scooters) < 1 && length(n_bikes) < 1) {
    # case: there are neither scooters nor bikes
    df$pct.cars[i] <- 100
    df$pct.scooters[i] <- 0
    df$pct.bikes[i] <- 0
  } else if (length(n_cars) < 1 && length(n_scooters) == 1 && length(n_bikes) == 1) {
    # case: there are no cars
    df$pct.cars[i] <- 0
    df$pct.scooters[i] <- (n_scooters / (n_scooters + n_bikes)) * 100
    df$pct.bikes[i] <- (n_bikes / (n_scooters + n_bikes)) * 100
  } else if (length(n_cars) == 1 && length(n_scooters) == 1 && length(n_bikes) < 1) {
    # case: there are no bikes
    df$pct.cars[i] <- (n_cars / (n_cars + n_scooters)) * 100
    df$pct.scooters[i] <- (n_scooters / (n_cars + n_scooters)) * 100
    df$pct.bikes[i] <- 0
  } else if (length(n_cars) == 1 && length(n_scooters) < 1 && length(n_bikes) == 1) {
    # case: there are no scooters
    df$pct.cars[i] <- (n_cars / (n_cars + n_bikes)) * 100
    df$pct.scooters[i] <- 0
    df$pct.bikes[i] <- (n_bikes / (n_cars + n_bikes)) * 100
  } else if (length(n_cars) < 1 && length(n_scooters) == 1 && length(n_bikes) < 1) {
    # case: there are neither cars nor bikes
    df$pct.cars[i] <- 0
    df$pct.scooters[i] <- 100
    df$pct.bikes[i] <- 0
  } else if (length(n_cars) < 1 && length(n_scooters) < 1 && length(n_bikes) == 1) {
    # case: there are neither cars nor scooters
    df$pct.cars[i] <- 0
    df$pct.scooters[i] <- 0
    df$pct.bikes[i] <- 100
  } else if (length(n_cars) == 1 && length(n_scooters) == 1 && length(n_bikes) == 1) {
    # case: there are cars, scooters and bikes
    df$pct.cars[i] <- (n_cars / (n_cars + n_scooters + n_bikes)) * 100
    df$pct.scooters[i] <- (n_scooters / (n_cars + n_scooters + n_bikes)) * 100
    df$pct.bikes[i] <- (n_bikes / (n_cars + n_scooters + n_bikes)) * 100
  }
}
If anyone has a simpler solution or suggestion (perhaps using dplyr), it would be much appreciated. Thanks in advance!
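For reference, one way the dplyr route asked about here might look. This is a minimal sketch rather than an answer taken from the thread: it assumes the example data above, that each city/transport pair appears at most once, and it uses `result` as a made-up name for the output. The key idea is that `sum()` over an empty selection is 0, so the seven `if`/`else` cases collapse into one expression per column.

```r
library(dplyr)

# Example data from the question (number is stored as text there)
city <- c("London", "London", "Leeds", "Leeds", "Leeds", "Nottingham", "Glasgow", "Belfast", "Belfast", "Oxford", "Oxford", "Southampton", "Aberdeen", "Bath", "Bath", "Bath", "Preston", "Preston", "Liverpool", "Derby", "Hereford")
transport <- c("cars", "scooters", "cars", "scooters", "bikes", "cars", "scooters", "cars", "bikes", "scooters", "bikes", "bikes", "scooters", "cars", "scooters", "bikes", "scooters", "bikes", "bikes", "cars", "bikes")
number <- c("153", "21", "267", "87", "13", "95", "17", "199", "8", "34", "5", "23", "40", "142", "79", "28", "37", "22", "19", "83", "23")
df <- data.frame(city, transport, number, stringsAsFactors = FALSE)

result <- df %>%
  # convert the text counts to numbers before any arithmetic
  mutate(number = as.numeric(number)) %>%
  group_by(city) %>%
  # within each city: count for one transport type divided by the city total;
  # a missing transport type selects nothing, and sum() of nothing is 0
  mutate(
    pct.cars     = round(100 * sum(number[transport == "cars"])     / sum(number), 2),
    pct.scooters = round(100 * sum(number[transport == "scooters"]) / sum(number), 2),
    pct.bikes    = round(100 * sum(number[transport == "bikes"])    / sum(number), 2)
  ) %>%
  ungroup()

print(result, n = 21)
```

A grouped `mutate()` keeps the original row order and repeats each city's percentages on every row of that city, which matches the layout of the desired output.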
【Comments】:
- A one-liner in base R:
  prop.table(xtabs(as.numeric(as.character(number)) ~ city + transport, data=df), 1)
- Great, it really is that simple! Many thanks, @thelatemail!
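Note that the one-liner returns a city-by-transport table of proportions (0 to 1) rather than percentage columns attached to df. A rough base R sketch of how it could be extended to reproduce the layout in the question, assuming the example data as posted; `pct`, `pct_df`, `idx` and `out` are names introduced here for illustration only:

```r
# Example data from the question
city <- c("London", "London", "Leeds", "Leeds", "Leeds", "Nottingham", "Glasgow", "Belfast", "Belfast", "Oxford", "Oxford", "Southampton", "Aberdeen", "Bath", "Bath", "Bath", "Preston", "Preston", "Liverpool", "Derby", "Hereford")
transport <- c("cars", "scooters", "cars", "scooters", "bikes", "cars", "scooters", "cars", "bikes", "scooters", "bikes", "bikes", "scooters", "cars", "scooters", "bikes", "scooters", "bikes", "bikes", "cars", "bikes")
number <- c("153", "21", "267", "87", "13", "95", "17", "199", "8", "34", "5", "23", "40", "142", "79", "28", "37", "22", "19", "83", "23")
df <- data.frame(city, transport, number)

# Row-wise proportions of the city x transport totals, scaled to percentages.
# as.numeric(as.character()) is needed because number is text (or a factor).
pct <- round(100 * prop.table(
  xtabs(as.numeric(as.character(number)) ~ city + transport, data = df), 1), 2)

# Convert the table to a plain data frame: one row per city,
# one column per transport type
pct_df <- as.data.frame.matrix(pct)

# Look up each row's city in the percentage table and attach the columns
idx <- match(df$city, rownames(pct_df))
out <- data.frame(df,
                  pct.cars     = pct_df$cars[idx],
                  pct.scooters = pct_df$scooters[idx],
                  pct.bikes    = pct_df$bikes[idx])
out
```

Using `match()` instead of `merge()` keeps the rows in their original order.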
Tags: r for-loop if-statement dataframe dplyr