【Question title】: How to sum over subsets of rows in R
【Posted】: 2022-01-20 02:44:14
【Question】:

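I'm working in R with US county-level voting data, courtesy of the good folks at MIT. I want to know the total number of votes each candidate received in each county. For some states, such as Wisconsin, this is easy: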

"state", "county_name", "county_fips", "candidate", "party", "candidatevotes", "totalvotes", "mode"
"WISCONSIN", "WINNEBAGO", "55139", "JO JORGENSEN", "LIBERTARIAN", 1629, 94032, "TOTAL"

For other states, such as Utah, it is still doable, because a TOTAL row exists:

"UTAH", "WEBER", "49057", "DONALD J TRUMP", "REPUBLICAN", 0, 111403, "EARLY"
"UTAH", "WEBER", "49057", "DONALD J TRUMP", "REPUBLICAN", 0, 111403, "ELECTION DAY"
"UTAH", "WEBER", "49057", "DONALD J TRUMP", "REPUBLICAN", 0, 111403, "MAIL"
"UTAH", "WEBER", "49057", "DONALD J TRUMP", "REPUBLICAN", 65949, 111403, "TOTAL"

South Carolina, however, is a problem, because the votes are split by voting mode with no TOTAL row:

"SOUTH CAROLINA", "YORK", "45091", "JOSEPH R BIDEN JR", "DEMOCRAT", 13656, 144050, "ABSENTEE BY MAIL"
"SOUTH CAROLINA", "YORK", "45091", "JOSEPH R BIDEN JR", "DEMOCRAT", 22075, 144050, "ELECTION DAY"
"SOUTH CAROLINA", "YORK", "45091", "JOSEPH R BIDEN JR", "DEMOCRAT", 18, 144050, "FAILSAFE"
"SOUTH CAROLINA", "YORK", "45091", "JOSEPH R BIDEN JR", "DEMOCRAT", 176, 144050, "FAILSAFE PROVISIONAL"
"SOUTH CAROLINA", "YORK", "45091", "JOSEPH R BIDEN JR", "DEMOCRAT", 22950, 144050, "IN-PERSON ABSENTEE"
"SOUTH CAROLINA", "YORK", "45091", "JOSEPH R BIDEN JR", "DEMOCRAT", 133, 144050, "PROVISIONAL"

It seems to me there should be some way to loop over the FIPS codes and party names to produce a total for each county, but I'm stumped.

【Question comments】:

Tags: r loops subset


【Answer 1】:

Does this solve your problem?

library(tidyverse)

df <- read_csv("~/Desktop/countypres_2000-2020.csv")
#> Rows: 72617 Columns: 12
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (8): state, state_po, county_name, county_fips, office, candidate, party...
#> dbl (4): year, candidatevotes, totalvotes, version
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

df %>%
  filter(year == 2020) %>%
  group_by(candidate, county_fips) %>%
  summarise(
    county_name,
    total_votes_per_candidate_per_county = sum(candidatevotes)
    ) %>%
  relocate(candidate, .before = 4) %>%
  distinct() %>%
  arrange(county_fips)
#> `summarise()` has grouped output by 'candidate', 'county_fips'. You can override using the `.groups` argument.
#> # A tibble: 11,902 × 4
#> # Groups:   candidate, county_fips [11,898]
#>    county_fips county_name candidate         total_votes_per_candidate_per_coun…
#>    <chr>       <chr>       <chr>                                           <dbl>
#>  1 01001       AUTAUGA     DONALD J TRUMP                                  19838
#>  2 01001       AUTAUGA     JOSEPH R BIDEN JR                                7503
#>  3 01001       AUTAUGA     OTHER                                             429
#>  4 01003       BALDWIN     DONALD J TRUMP                                  83544
#>  5 01003       BALDWIN     JOSEPH R BIDEN JR                               24578
#>  6 01003       BALDWIN     OTHER                                            1557
#>  7 01005       BARBOUR     DONALD J TRUMP                                   5622
#>  8 01005       BARBOUR     JOSEPH R BIDEN JR                                4816
#>  9 01005       BARBOUR     OTHER                                              80
#> 10 01007       BIBB        DONALD J TRUMP                                   7525
#> # … with 11,892 more rows

Created on 2022-01-20 by the reprex package (v2.0.1)
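
For a quick check on a small scale, here is a minimal sketch that applies the same group-and-sum idea to the sample rows from the question. The `votes` data frame and the `total_votes` column name are made up for illustration. It also assumes that a county never reports a non-zero per-mode breakdown alongside a TOTAL row; if that happened, a plain sum would double count. Grouping by `county_name` as well keeps one row per group, so `distinct()` is not needed here.

```r
library(dplyr)

# Toy data: the sample rows from the question, typed in by hand
# (not the full MIT file; only the columns needed for the sum).
votes <- data.frame(
  county_fips = c("55139", rep("49057", 4), rep("45091", 6)),
  county_name = c("WINNEBAGO", rep("WEBER", 4), rep("YORK", 6)),
  candidate = c("JO JORGENSEN", rep("DONALD J TRUMP", 4),
                rep("JOSEPH R BIDEN JR", 6)),
  candidatevotes = c(1629, 0, 0, 0, 65949,
                     13656, 22075, 18, 176, 22950, 133),
  mode = c("TOTAL", "EARLY", "ELECTION DAY", "MAIL", "TOTAL",
           "ABSENTEE BY MAIL", "ELECTION DAY", "FAILSAFE",
           "FAILSAFE PROVISIONAL", "IN-PERSON ABSENTEE", "PROVISIONAL")
)

# One row per county and candidate: add up the votes across all modes.
# Assumption: TOTAL rows and non-zero mode rows never coexist for a group.
totals <- votes %>%
  group_by(county_fips, county_name, candidate) %>%
  summarise(total_votes = sum(candidatevotes), .groups = "drop") %>%
  arrange(county_fips)

print(totals)
```

On these rows the result has three lines: 59008 for Biden in York (the six South Carolina modes added together), 65949 for Trump in Weber, and 1629 for Jorgensen in Winnebago.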

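【Comments】:

  • That worked. Thanks!
  • You're welcome @eBerm, but please note that comments are not meant for saying "thank you". See What should I do when someone answers my question?, in particular: "Please do not add a comment on your question or on an answer to say 'Thank you'. Comments are meant for requesting clarification, leaving constructive criticism, or adding relevant but minor additional information - not for socializing. If you want to say 'thank you', vote on or accept that person's answer, or simply pay it forward by providing a great answer to someone else's question."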