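【Question title】: R - calculate annual population conditional on survival in every year
【Posted】: 2020-11-11 13:59:20
【Question description】:

I have a data frame with three columns: birth_year, death_year, gender. I need to calculate the total living male and female population for each year in a given range (1950:1980). The data frame looks like this: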

birth_year   death_year   gender
1934         1988         male
1922         1993         female
1890         1966         male
1901         1956         male
1946         2009         female
1909         1976         female
1899         1945         male
1887         1949         male
1902         1984         female

A person is alive in year x if death_year > x & birth_year <= x.

The output I'm looking for is something like this:

year    male    female
1950    3       4
1951    2       3
1952    4       3
1953    4       5
.
.
1980    6       3

Thanks!

【Question discussion】:

    Tags: r conditional-statements


    【Solution 1】:

    Does this work:

    library(tidyr)
    library(purrr)
    library(dplyr)

    df %>%
      # attach the 1950:1980 sequence to every person, then expand to one row per person-year
      mutate(year = map2(1950, 1980, seq)) %>%
      unnest(year) %>%
      # alive if born on or before the year and death_year is strictly later
      mutate(isalive = case_when(year >= birth_year & year < death_year ~ 1, TRUE ~ 0)) %>%
      group_by(year, gender) %>%
      summarise(alive = sum(isalive)) %>%
      pivot_wider(names_from = gender, values_from = alive) %>%
      print(n = 50)
    `summarise()` regrouping output by 'year' (override with `.groups` argument)
    # A tibble: 31 x 3
    # Groups:   year [31]
        year female  male
       <int>  <dbl> <dbl>
     1  1950      4     3
     2  1951      4     3
     3  1952      4     3
     4  1953      4     3
     5  1954      4     3
     6  1955      4     3
     7  1956      4     2
     8  1957      4     2
     9  1958      4     2
    10  1959      4     2
    11  1960      4     2
    12  1961      4     2
    13  1962      4     2
    14  1963      4     2
    15  1964      4     2
    16  1965      4     2
    17  1966      4     1
    18  1967      4     1
    19  1968      4     1
    20  1969      4     1
    21  1970      4     1
    22  1971      4     1
    23  1972      4     1
    24  1973      4     1
    25  1974      4     1
    26  1975      4     1
    27  1976      3     1
    28  1977      3     1
    29  1978      3     1
    30  1979      3     1
    31  1980      3     1
    

    Data used:

    df
    # A tibble: 9 x 3
      birth_year death_year gender
           <dbl>      <dbl> <chr> 
    1       1934       1988 male  
    2       1922       1993 female
    3       1890       1966 male  
    4       1901       1956 male  
    5       1946       2009 female
    6       1909       1976 female
    7       1899       1945 male  
    8       1887       1949 male  
    9       1902       1984 female
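
For readers who want to see what the expand-then-count steps do without the tidyverse, here is a minimal base R sketch. It is not the answerer's code: `long` and `counts` are names made up for illustration, and `merge(..., by = NULL)` stands in for the `map2()`/`unnest()` step by pairing every person with every year.

```r
# Same nine rows as the data shown above.
df <- data.frame(
  birth_year = c(1934, 1922, 1890, 1901, 1946, 1909, 1899, 1887, 1902),
  death_year = c(1988, 1993, 1966, 1956, 2009, 1976, 1945, 1949, 1984),
  gender     = c("male", "female", "male", "male", "female",
                 "female", "male", "male", "female")
)

# Cartesian product: one row per person per year (9 x 31 = 279 rows).
long <- merge(df, data.frame(year = 1950:1980), by = NULL)

# Alive in a year if born on or before it and death_year is strictly later.
long$alive <- long$year >= long$birth_year & long$year < long$death_year

# Sum the logical flags per year and gender; rows are years, columns are genders.
counts <- tapply(long$alive, list(year = long$year, gender = long$gender), sum)
head(counts, 3)
```

The result is a matrix rather than a tibble, so a single cell can be read with, for example, `counts["1976", "female"]`.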
    

    【Discussion】:

    • Note that 1976 has one extra female counted, because a female died that year. That does not satisfy the condition df$death_year > x. So you may want year < death_year.
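
The boundary issue raised in this comment can be checked with a small base R sketch. The vectors below hold the four females from the question's data; `strict` and `inclusive` are illustrative names, not part of either answer.

```r
# Birth and death years of the four females in the question's data.
birth <- c(1922, 1946, 1909, 1902)
death <- c(1993, 2009, 1976, 1984)

years <- 1975:1977

# Strict rule from the question: alive in year x only if death_year > x.
strict <- sapply(years, function(x) sum(birth <= x & death > x))

# Inclusive rule: the death year itself still counts as alive.
inclusive <- sapply(years, function(x) sum(birth <= x & death >= x))

data.frame(year = years, strict = strict, inclusive = inclusive)
```

The two rules differ only in 1976, the year in which one of the females died.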
    【Solution 2】:

    Here is a simple base R solution. Summing a logical vector gives you the count of people alive, because TRUE counts as 1 and FALSE as 0.

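    # Count people alive in each year of `range`: born on or before x and death_year > x
    number_alive <- function(range, df){
      sapply(range, function(x) sum((df$death_year > x) & (df$birth_year <= x)))
    }
    
    # Note the trailing comma: df[cond, ] selects rows, not columns
    output <- data.frame('year' = 1950:1980,
                         'female' = number_alive(1950:1980, df[df$gender == 'female', ]),
                         'male' = number_alive(1950:1980, df[df$gender == 'male', ]))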
    # year female male
    # 1  1950    4      3
    # 2  1951    4      3
    # 3  1952    4      3
    # 4  1953    4      3
    # 5  1954    4      3
    # 6  1955    4      3
    # 7  1956    4      2
    # 8  1957    4      2
    # 9  1958    4      2
    # 10 1959    4      2
    # 11 1960    4      2
    # 12 1961    4      2
    # 13 1962    4      2
    # 14 1963    4      2
    # 15 1964    4      2
    # 16 1965    4      2
    # 17 1966    4      1
    # 18 1967    4      1
    # 19 1968    4      1
    # 20 1969    4      1
    # 21 1970    4      1
    # 22 1971    4      1
    # 23 1972    4      1
    # 24 1973    4      1
    # 25 1974    4      1
    # 26 1975    4      1
    # 27 1976    3      1
    # 28 1977    3      1
    # 29 1978    3      1
    # 30 1979    3      1
    # 31 1980    3      1
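
The logical-sum idea behind `number_alive()` can be seen on a single year. This is a small sketch using the question's nine rows as plain vectors; the variable names are chosen for illustration.

```r
birth_year <- c(1934, 1922, 1890, 1901, 1946, 1909, 1899, 1887, 1902)
death_year <- c(1988, 1993, 1966, 1956, 2009, 1976, 1945, 1949, 1984)

x <- 1950
alive <- (death_year > x) & (birth_year <= x)  # one TRUE/FALSE per person
alive
sum(alive)  # TRUE counts as 1, FALSE as 0
```

For 1950 the sum is 7, which matches the 4 females plus 3 males in the first row of the output.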
    

    【Discussion】:

      【Solution 3】:

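      This approach uses ifelse to decide whether a person is alive (1) or dead (0). Note that it tests year <= death_year, so a person is still counted as alive in their death year, unlike the death_year > x condition in the question.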

      Data:

      df <- "birth_year   death_year   gender
      1934         1988         male
      1922         1993         female
      1890         1966         male
      1901         1956         male
      1946         2009         female
      1909         1976         female
      1899         1945         male
      1887         1949         male
      1902         1984         female"
      
      df <- read.table(text = df, header = TRUE)
      

      Code:

      library(dplyr)
      library(tidyr)
      library(tibble)
      library(purrr)
      
      df %>% 
        mutate(year = map2(1950,1980, seq)) %>% 
        unnest(year) %>% 
        select(year, birth_year, death_year, gender) %>% 
        mutate(
          alive = ifelse(year >= birth_year & year <= death_year, 1, 0)
        ) %>% 
        group_by(year, gender) %>% 
        summarise(
          is_alive = sum(alive)
        ) %>% 
        pivot_wider(
          names_from = gender,
          values_from = is_alive
        ) %>% 
        select(year, male, female)
      

      Output:

      
      #> # A tibble: 31 x 3
      #> # Groups:   year [31]
      #>     year  male female
      #>    <int> <dbl>  <dbl>
      #>  1  1950     3      4
      #>  2  1951     3      4
      #>  3  1952     3      4
      #>  4  1953     3      4
      #>  5  1954     3      4
      #>  6  1955     3      4
      #>  7  1956     3      4
      #>  8  1957     2      4
      #>  9  1958     2      4
      #> 10  1959     2      4
      #> # … with 21 more rows
      

      Created on 2020-11-11 by the reprex package (v0.3.0)

      【Discussion】:
