【Question title】: How do I sum up the values in a column, conditional on whether the values of different columns match in R? [closed]
【Posted】: 2020-10-01 15:27:03
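【Question description】:
I have a dataset that contains the number of confirmed coronavirus cases for each county on a given date. I want to sum the daily case counts over all counties in each state, so that I end up with a column showing the number of cases per state per day.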
Tags:
r
dplyr
conditional-statements
【Solution 1】:
Here is one approach with dplyr (version 1.0.0 or later, which is required for across()):
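library(dplyr)

# Live county-level counts from the NYT covid-19-data repository
df <- read.csv('https://raw.githubusercontent.com/nytimes/covid-19-data/master/live/us-counties.csv')

# Sum every remaining column (all but fips and county) within each date/state pair
df %>%
  group_by(date, state) %>%
  summarize(across(c(-fips, -county), sum))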
# A tibble: 55 x 8
# Groups:   date [1]
   date       state                 cases deaths confirmed_cases confirmed_deaths probable_cases probable_deaths
   <fct>      <fct>                 <int>  <int>           <int>            <int>          <int>           <int>
 1 2020-06-11 Alabama               21989    744           21626              739             NA              NA
 2 2020-06-11 Alaska                  642      9             642               NA             NA              NA
 3 2020-06-11 Arizona               29981   1100              NA               NA             NA              NA
 4 2020-06-11 Arkansas              10368    165           10368              165             NA              NA
 5 2020-06-11 California           140123   4869          140123             4869             NA              NA
 6 2020-06-11 Colorado              28484   1573              NA               NA             NA              NA
 7 2020-06-11 Connecticut           44347   4120           42448             3283           1899             837
 8 2020-06-11 Delaware              10056    413              NA               NA             NA              NA
 9 2020-06-11 District of Columbia   9537    499            9537              499             NA              NA
10 2020-06-11 Florida               67363   2800           67363             2800             NA              NA
# … with 45 more rows
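The NA entries in the output above arise because sum() returns NA whenever any county in a group has a missing value. If missing counts should simply be skipped, a minimal sketch along the following lines may help. It assumes dplyr 1.0.0 or later; the data frame `toy` and all of its values are made up for illustration and do not come from the NYT file.

```r
library(dplyr)

# Hypothetical toy data that mimics the county-level layout (values are invented)
toy <- data.frame(
  date   = c("2020-06-11", "2020-06-11", "2020-06-11"),
  state  = c("Alabama", "Alabama", "Alaska"),
  county = c("Autauga", "Baldwin", "Anchorage"),
  fips   = c(1001L, 1003L, 2020L),
  cases  = c(10L, 20L, 5L),
  deaths = c(1L, NA, 0L)
)

# One row per date/state pair; na.rm = TRUE skips missing county values
by_state <- toy %>%
  group_by(date, state) %>%
  summarise(across(c(cases, deaths), ~ sum(.x, na.rm = TRUE)), .groups = "drop")

print(by_state)
```

Whether dropping missing values is appropriate depends on what an NA means in the source data; treating it as zero can understate the state totals.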
【Solution 2】:
An option in base R with aggregate:
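# Exclude county and fips; na.pass keeps rows containing NAs, which the formula interface drops by default
aggregate(. ~ date + state, df[setdiff(names(df), c('county', 'fips'))], sum, na.action = na.pass)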
Data
df <- read.csv('https://raw.githubusercontent.com/nytimes/covid-19-data/master/live/us-counties.csv')
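One point worth checking with the formula interface of aggregate is how it handles missing values: by default it uses na.action = na.omit, so any row with an NA in any column is removed before summing. The sketch below illustrates the difference on a made-up data frame (`toy` and its values are hypothetical, not taken from the NYT file).

```r
# Hypothetical toy data; the second Alabama row has a missing death count
toy <- data.frame(
  date   = c("2020-06-11", "2020-06-11", "2020-06-11"),
  state  = c("Alabama", "Alabama", "Alaska"),
  cases  = c(10L, 20L, 5L),
  deaths = c(1L, NA, 0L)
)

# Default na.action = na.omit: the whole row with the NA is dropped,
# so its 20 cases are silently missing from the Alabama total
dropped <- aggregate(. ~ date + state, toy, sum)

# na.action = na.pass keeps every row; sums that involve an NA become NA
kept <- aggregate(. ~ date + state, toy, sum, na.action = na.pass)

print(dropped)
print(kept)
```

With the default, the Alabama case total covers only the complete row, whereas na.pass keeps the full case total and reports the deaths sum as NA.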