【Question Title】: Frequency per Week - Dialysis Dataset
【Posted】: 2019-03-03 17:24:36
【Question】:

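I have a dialysis dataset that runs from 1995 to 2014. It has the variables "id", "name", "date", and "modality".

I am interested in the "HD" modality.

The data frame has the following structure:
- It starts in April 1995 and is then listed month by month through December 2014.
- An individual can appear in multiple months (e.g. Name1 may have been on dialysis from April 1995 to March 1997, which is why they are listed many times).
- Each row with a date is one session (I need to work out the frequency of sessions per week for each patient).

I hope the above makes clear what I am trying to do.

Here is an example of the dataset: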

id          name       date         modality    
10101650    name1      03-Apr-95    HD
10101650    name1      05-Apr-95    HD
10101650    name1      07-Apr-95    HD
10101650    name1      10-Apr-95    HD
10101650    name1      12-Apr-95    HD
10101650    name1      14-Apr-95    HD
10101650    name1      17-Apr-95    HD
10101650    name1      19-Apr-95    HD
10101650    name1      21-Apr-95    HD
10101650    name1      22-Apr-95    HD
10101650    name1      24-Apr-95    HD
10101650    name1      26-Apr-95    HD
10101650    name1      28-Apr-95    HD
10206042    name2      03-Apr-95    HD
10206042    name2      05-Apr-95    HD
10206042    name2      07-Apr-95    HD
10206042    name2      10-Apr-95    HD
10206042    name2      12-Apr-95    HD
10206042    name2      14-Apr-95    HD
10206042    name2      17-Apr-95    HD
10206042    name2      19-Apr-95    HD
10206042    name2      21-Apr-95    HD
10206042    name2      24-Apr-95    HD
10206042    name2      26-Apr-95    HD
10206042    name2      28-Apr-95    HD
10101650    name1      01-May-95    HD
10101650    name1      03-May-95    HD
10101650    name1      05-May-95    HD
10101650    name1      08-May-95    HD
10101650    name1      10-May-95    HD
10101650    name1      12-May-95    HD
10101650    name1      15-May-95    HD
10101650    name1      17-May-95    HD
10101650    name1      19-May-95    HD
10101650    name1      22-May-95    HD
10101650    name1      24-May-95    HD
10101650    name1      26-May-95    HD
10101650    name1      29-May-95    HD
10101650    name1      31-May-95    HD
10205987    name3      01-May-95    HD
10205987    name3      03-May-95    HD
10205987    name3      05-May-95    HD
10205987    name3      08-May-95    HD
10205987    name3      10-May-95    HD
10205987    name3      12-May-95    HD
10205987    name3      15-May-95    HD
10205987    name3      17-May-95    HD
10205987    name3      19-May-95    HD
10205987    name3      22-May-95    HD
10205987    name3      24-May-95    HD
10205987    name3      26-May-95    HD
10205987    name3      29-May-95    HD
10205987    name3      31-May-95    HD
10206042    name2      01-May-95    HD
10206042    name2      03-May-95    HD
10206042    name2      05-May-95    HD
10206042    name2      08-May-95    HD
10206042    name2      10-May-95    HD
10206042    name2      12-May-95    HD
10206042    name2      15-May-95    HD
10206042    name2      17-May-95    HD
10206042    name2      19-May-95    HD
10206042    name2      22-May-95    HD
10206042    name2      24-May-95    HD
10206042    name2      26-May-95    HD

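As mentioned, I need the number of sessions per week for each patient. This will be an average, since a patient can be on dialysis for several years.

【Question Discussion】:

    Tags: r dataframe dataset frequency frequency-analysis


    【Solution 1】:

    Here is how to do it with the dplyr and lubridate packages -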

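    library(dplyr)
    library(lubridate)
    
    # Label each session with its week number and year, e.g. "14-1995"
    df$week_year <- paste(week(df$date), year(df$date), sep = "-")
    
    df %>%
      filter(modality == "HD") %>%
      # Count sessions per patient per week
      group_by(id, name, week_year) %>%
      summarise(sessions = n()) %>%
      # Average the weekly counts for each patient
      group_by(id, name) %>%
      summarise(avg_sessions_per_week = mean(sessions))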
    
    # A tibble: 3 x 3
    # Groups:   id [?]
    #         id name  avg_sessions_per_week
    #      <int> <chr>                 <dbl>
    # 1 10101650 name1                  3.00
    # 2 10205987 name3                  2.80
    # 3 10206042 name2                  3.00
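
One caveat about the week labels: `lubridate::week()` counts seven-day blocks starting on 1 January, so the blocks do not necessarily line up with Monday-to-Sunday calendar weeks, and the last block of each year is short. If calendar weeks are wanted, a possible variant (a sketch, not part of the original answer) groups by the Monday that starts each week using `floor_date()`. The `sessions` data frame and its values below are made up for illustration, and the Monday week start is an assumption.

```r
library(dplyr)
library(lubridate)

# Hypothetical sample with the same kind of columns as the question's data
sessions <- data.frame(
  id = c(1L, 1L, 1L, 1L, 1L, 2L, 2L),
  date = as.Date(c("1995-04-03", "1995-04-05", "1995-04-07",
                   "1995-04-10", "1995-04-12",
                   "1995-04-03", "1995-04-10")),
  modality = "HD"
)

weekly_avg <- sessions %>%
  filter(modality == "HD") %>%
  # Map every date to the Monday of its calendar week (assumed week start)
  mutate(week_start = floor_date(date, unit = "week", week_start = 1)) %>%
  # One row per patient per calendar week, with the session count
  count(id, week_start, name = "sessions") %>%
  # Average the weekly counts for each patient
  group_by(id) %>%
  summarise(avg_sessions_per_week = mean(sessions))

print(weekly_avg)
```

Because `week_start` is a real date, weeks from different years can never collide, and no separate year label is needed.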
    

    Data -

    df <- structure(list(id = c(10101650L, 10101650L, 10101650L, 10101650L, 
    10101650L, 10101650L, 10101650L, 10101650L, 10101650L, 10101650L, 
    10101650L, 10101650L, 10101650L, 10206042L, 10206042L, 10206042L, 
    10206042L, 10206042L, 10206042L, 10206042L, 10206042L, 10206042L, 
    10206042L, 10206042L, 10206042L, 10101650L, 10101650L, 10101650L, 
    10101650L, 10101650L, 10101650L, 10101650L, 10101650L, 10101650L, 
    10101650L, 10101650L, 10101650L, 10101650L, 10101650L, 10205987L, 
    10205987L, 10205987L, 10205987L, 10205987L, 10205987L, 10205987L, 
    10205987L, 10205987L, 10205987L, 10205987L, 10205987L, 10205987L, 
    10205987L, 10206042L, 10206042L, 10206042L, 10206042L, 10206042L, 
    10206042L, 10206042L, 10206042L, 10206042L, 10206042L, 10206042L, 
    10206042L), name = c("name1", "name1", "name1", "name1", "name1", 
    "name1", "name1", "name1", "name1", "name1", "name1", "name1", 
    "name1", "name2", "name2", "name2", "name2", "name2", "name2", 
    "name2", "name2", "name2", "name2", "name2", "name2", "name1", 
    "name1", "name1", "name1", "name1", "name1", "name1", "name1", 
    "name1", "name1", "name1", "name1", "name1", "name1", "name3", 
    "name3", "name3", "name3", "name3", "name3", "name3", "name3", 
    "name3", "name3", "name3", "name3", "name3", "name3", "name2", 
    "name2", "name2", "name2", "name2", "name2", "name2", "name2", 
    "name2", "name2", "name2", "name2"), date = structure(c(9223, 
    9225, 9227, 9230, 9232, 9234, 9237, 9239, 9241, 9242, 9244, 9246, 
    9248, 9223, 9225, 9227, 9230, 9232, 9234, 9237, 9239, 9241, 9244, 
    9246, 9248, 9251, 9253, 9255, 9258, 9260, 9262, 9265, 9267, 9269, 
    9272, 9274, 9276, 9279, 9281, 9251, 9253, 9255, 9258, 9260, 9262, 
    9265, 9267, 9269, 9272, 9274, 9276, 9279, 9281, 9251, 9253, 9255, 
    9258, 9260, 9262, 9265, 9267, 9269, 9272, 9274, 9276), class = "Date"), 
        modality = c("HD", "HD", "HD", "HD", "HD", "HD", "HD", "HD", 
        "HD", "HD", "HD", "HD", "HD", "HD", "HD", "HD", "HD", "HD", 
        "HD", "HD", "HD", "HD", "HD", "HD", "HD", "HD", "HD", "HD", 
        "HD", "HD", "HD", "HD", "HD", "HD", "HD", "HD", "HD", "HD", 
        "HD", "HD", "HD", "HD", "HD", "HD", "HD", "HD", "HD", "HD", 
        "HD", "HD", "HD", "HD", "HD", "HD", "HD", "HD", "HD", "HD", 
        "HD", "HD", "HD", "HD", "HD", "HD", "HD")), .Names = c("id", 
    "name", "date", "modality"), row.names = c(NA, -65L), class = "data.frame")
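
Note that the pipeline in this answer averages only over weeks that contain at least one session, so a week in which a patient had no dialysis at all does not pull the average down. If such gaps should count as zero, one possible approach (a sketch under that assumption, not something the question confirms) is to divide the total number of sessions by the number of calendar weeks between the first and last session. The sample data and the Monday week start are hypothetical.

```r
library(dplyr)
library(lubridate)

# Hypothetical patient who skips the week of 10 April 1995 entirely
sessions <- data.frame(
  id = 1L,
  date = as.Date(c("1995-04-03", "1995-04-05", "1995-04-07",
                   "1995-04-17", "1995-04-19", "1995-04-21")),
  modality = "HD"
)

per_patient <- sessions %>%
  filter(modality == "HD") %>%
  # Monday of the calendar week each session falls in (assumed week start)
  mutate(week_start = floor_date(date, unit = "week", week_start = 1)) %>%
  group_by(id) %>%
  summarise(
    n_sessions = n(),
    # Calendar weeks from the first to the last session week, inclusive
    n_weeks = as.numeric(max(week_start) - min(week_start)) / 7 + 1,
    avg_sessions_per_week = n_sessions / n_weeks
  )

print(per_patient)
```

Here the six sessions are spread over three calendar weeks, so the result is 2 sessions per week, whereas averaging only the two weeks that have sessions would give 3.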
    

    【Discussion】:

    • Thanks, this worked for me; I only had to change the class of the date variable from "factor" to "Date".
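
The class change mentioned in the comment above can be sketched in base R as follows. The `raw_dates` vector is hypothetical, and the `%d-%b-%y` format is an assumption based on the sample dates in the question (such as `03-Apr-95`). Since `%b` matches month abbreviations in the current locale, the sketch switches the session's time locale to "C" so that English abbreviations are recognised.

```r
# Note: this changes the time locale for the rest of the R session
Sys.setlocale("LC_TIME", "C")

# Dates as they might look after import, stored as a factor (hypothetical values)
raw_dates <- factor(c("03-Apr-95", "05-Apr-95", "31-May-95"))

# Go through character so the labels are parsed, then apply the format.
# With %y, two-digit years 69-99 map to 19xx and 00-68 map to 20xx,
# which covers the 1995-2014 range described in the question.
parsed <- as.Date(as.character(raw_dates), format = "%d-%b-%y")

print(class(parsed))
print(parsed)
```

If any element comes back as `NA`, the format string or locale probably does not match the way the dates were written.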