[Posted]: 2021-11-15 16:26:40
[Question]:
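I have a dataset containing all natural disasters that occurred during a given time period. I want to summarize them by year and state. When summarizing, I want to create a variable (= d_disasters) that shows me the unique types of natural disaster; for Texas, for example, I want it to show Hurricane only once.
I am currently using dplyr::group_by and dplyr::summarize to summarize my data by year and state, and dplyr::mutate and purrr::map_int to create new variables holding the total number of natural disasters per year ($n_disasters, using length) and the number of unique natural disasters ($n_distinct, using n_distinct()).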
Starting dataset:
structure(list(year = c(1998, 1998, 1998, 1998, 1998), country = c("US",
"US", "US", "US", "US"), state = c("Texas", "Texas", "California",
"New York", "New York"), deaths = c(12, 5, 9, 10, 18), injured = c(3,
1, 3, 5, 9), disastertype = c("Hurricane", "Hurricane", "Wild fire",
"Flood", "Epidemic")), class = "data.frame", row.names = c(NA,
-5L))
Desired result dataset:
structure(list(year = c(1998, 1998, 1998), state = c("California",
"New York", "Texas"), u_disastertype = c("Wild fire", "Flood, Epidemic",
"Hurricane"), disastertype = c("Wild fire", "Flood, Epidemic",
"Hurricane, Hurricane"), deaths = c(9, 28, 17), injured = c(3,
14, 4), n_distinct = c(1L, 2L, 1L), n_disasters = c(1L, 2L, 2L
)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA,
-3L), groups = structure(list(year = 1998, .rows = structure(list(
1:3), ptype = integer(0), class = c("vctrs_list_of", "vctrs_vctr",
"list"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-1L), .drop = TRUE))
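For reference, the summary described above can be sketched in a single dplyr::summarize call. This is only a minimal sketch, not necessarily the code the question was written with: it assumes dplyr 1.0.0 or later (for the .groups argument), and the object names disasters and summary_tbl are made up for illustration. The unique-types column is named u_disastertype here to match the result dataset.

```r
library(dplyr)

# Starting data, rebuilt from the dput() output above
# (the constant `country` column is left out because the result does not use it).
disasters <- data.frame(
  year = c(1998, 1998, 1998, 1998, 1998),
  state = c("Texas", "Texas", "California", "New York", "New York"),
  deaths = c(12, 5, 9, 10, 18),
  injured = c(3, 1, 3, 5, 9),
  disastertype = c("Hurricane", "Hurricane", "Wild fire", "Flood", "Epidemic")
)

summary_tbl <- disasters %>%
  group_by(year, state) %>%
  summarize(
    # Each disaster type listed once per year/state group
    u_disastertype = paste(unique(disastertype), collapse = ", "),
    # Number of distinct types and total number of rows in the group
    n_distinct = n_distinct(disastertype),
    n_disasters = n(),
    deaths = sum(deaths),
    injured = sum(injured),
    # Overwrite disastertype last, because the expressions above
    # still need the original column
    disastertype = paste(disastertype, collapse = ", "),
    # Keep the result grouped by year, as in the result dataset
    .groups = "drop_last"
  )

print(summary_tbl)
```

The order inside summarize matters: later expressions see columns created earlier in the same call, so the collapsed disastertype string is built only after every expression that needs the original values.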
Edit: edited for clarification.
[Discussion]: