【Question title】: Create Frequency Table with Ranges
【Posted】: 2022-05-08 03:33:20
【Question description】:

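I have a small dataset of operator timing data. The responses of Operators 1-6 are timed. I need to create a frequency table that summarizes their response times in 2-second intervals.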

The data looks like this:

Operator 1 24.5
Operator 1 26.3
Operator 1 32.9
Operator 1 33.4
Operator 1 40.5
Operator 1 47.7

The desired output looks like this:

Seconds  Operator 1  Operator 2  Operator 3
0-2      0           2           5
3-4      1           5           3
5-6      5           0           4

【Question comments】:

  • The first interval in your example is a 3-second interval. Is that intentional?

Tags: r


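【Solution 1】:

I simulated some data that looks like yours to show how to do this. You need the tibble, magrittr, and dplyr packages installed for the pipe (%>%) and the functions below to work:

Start here: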

library(tibble)
library(magrittr)
library(dplyr)

# simulate data
ops <- sample(c("Operator 1","Operator 2","Operator 3"),100,replace=TRUE)
tms <- rnorm(100,mean=20,sd=4)
# build the tibble directly; cbind() would coerce both columns to character
df <- tibble(ops = factor(ops), tms = tms)

Then assign each row of df to the bins you define (change the code after breaks to suit the characteristics of your timing data):

> results <- df %>% group_by(ops) %>% 
    mutate(category=cut(tms, breaks=c(-Inf,0,10,20,30,Inf), 
    labels=c("-Inf-0 sec","0-10 sec","10-20 sec","20-30 sec","30-Inf sec")))
> results
# A tibble: 100 x 3
# Groups:   ops [3]
   ops          tms category 
   <fct>      <dbl> <fct>    
 1 Operator 1  16.6 10-20 sec
 2 Operator 2  25.1 20-30 sec
 3 Operator 3  20.4 20-30 sec
 4 Operator 1  19.7 10-20 sec
 5 Operator 3  23.6 20-30 sec
 6 Operator 3  22.6 20-30 sec
 7 Operator 1  14.6 10-20 sec
 8 Operator 3  19.6 10-20 sec
 9 Operator 3  22.3 20-30 sec
10 Operator 2  18.1 10-20 sec
# ... with 90 more rows

You can then view the data in the format you specified, like this:

> table(results$ops,results$category)

             -Inf-0 sec 0-10 sec 10-20 sec 20-30 sec 30-Inf sec
  Operator 1          0        0        24        13          1
  Operator 2          0        0        13        13          0
  Operator 3          0        0        12        24          0

> table(results$category,results$ops)

             Operator 1 Operator 2 Operator 3
  -Inf-0 sec          0          0          0
  0-10 sec            0          0          0
  10-20 sec          23         22         18
  20-30 sec          12         13         12
  30-Inf sec          0          0          0
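
The breaks above are 10 seconds wide. For the 2-second bins the question asks for, the breaks could be generated with seq() rather than typed by hand. The following is a minimal base-R sketch, assuming response times are non-negative. The data frame `timings` is made up for illustration: the Operator 1 values come from the question, the Operator 2 values are hypothetical.

```r
# Hypothetical sample: Operator 1 values are from the question,
# Operator 2 values are invented for illustration.
timings <- data.frame(
  ops = rep(c("Operator 1", "Operator 2"), each = 6),
  tms = c(24.5, 26.3, 32.9, 33.4, 40.5, 47.7,
          25.1, 27.9, 31.0, 33.8, 41.2, 46.0)
)

width <- 2
# Breaks at 0, 2, 4, ... up to the first multiple of `width` at or above the maximum
upper <- ceiling(max(timings$tms) / width) * width
breaks <- seq(0, upper, by = width)

# right = FALSE makes bins left-closed: [0,2), [2,4), ...
# include.lowest = TRUE closes the last bin so a value equal to `upper` is kept
timings$category <- cut(timings$tms, breaks = breaks,
                        right = FALSE, include.lowest = TRUE)

# Rows are bins, columns are operators
freq <- table(timings$category, timings$ops)

# Show only the bins that contain at least one observation
freq[rowSums(freq) > 0, ]
```

Changing `width` is enough to switch to another bin size; the full `freq` table also keeps the empty bins, which is useful when operators should be compared on a common scale.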

【Comments】:

    【Solution 2】:

    Using tidyverse and cutr::smart_cut, and borrowing @mysteRious's data:

    Data

    set.seed(1)
    ops <- sample(c("Operator 1","Operator 2","Operator 3"),100,replace=TRUE)
    tms <- rnorm(100,mean=20,sd=4)
    # build the tibble directly; cbind() would coerce both columns to character
    df <- tibble::tibble(ops = factor(ops), tms = tms)
    

    Solution:

    library(tidyverse)
    # devtools::install_github("moodymudskipper/cutr")
    library(cutr)
    df %>%
      mutate(Seconds = smart_cut(
        tms,list(2,0), "width", labels = ~paste0(.y[1], "-", .y[2]-1), open_end=TRUE)) %>%
      count(ops, Seconds) %>%
      spread(ops, n)
    
    # # A tibble: 9 x 4
    #   Seconds `Operator 1` `Operator 2` `Operator 3`
    #   <ord>          <int>        <int>        <int>
    # 1 12-13              4            2            1
    # 2 14-15              2            1            4
    # 3 16-17              6            7            6
    # 4 18-19              7            7            8
    # 5 20-21              3           10            6
    # 6 22-23              1            5            4
    # 7 24-25              2            3            4
    # 8 26-27              1            2            1
    # 9 28-29              1            1            1
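
If installing cutr from GitHub is not an option, similar "lower-upper" labels can be built in base R by flooring each time to a multiple of the bin width. This is a sketch of a swapped-in technique, not a description of how smart_cut works internally; the `toy` data and the `bin_label` helper are hypothetical names introduced for illustration.

```r
# Hypothetical toy data for illustration only
toy <- data.frame(
  ops = c("Operator 1", "Operator 1", "Operator 2", "Operator 2", "Operator 3"),
  tms = c(12.4, 13.9, 12.0, 15.2, 14.1)
)

# Hypothetical helper: each value belongs to the bin [lo, lo + width),
# labelled "lo-(lo + width - 1)" to match the style of the output above
bin_label <- function(x, width = 2) {
  lo <- floor(x / width) * width
  paste0(lo, "-", lo + width - 1)
}

toy$Seconds <- bin_label(toy$tms)

# Cross-tabulate bins (rows) against operators (columns)
counts <- table(Seconds = toy$Seconds, toy$ops)
counts
```

Because the labels are character strings, table() orders them alphabetically. When bins span different digit counts (for example "8-9" and "10-11"), converting `Seconds` to a factor with levels in numeric order would keep the rows sorted correctly.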
    

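    【Comments】:

      【Solution 3】:

      Here is a solution that uses base R's cut() function to create the intervals and the dcast() function from the reshape2 package to reshape from long to wide format, aggregating (counting) along the way: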

      # create sample dataset
      set.seed(123L)
      n_row <- 100L
      df <- data.frame(
        ops = sample(c("Operator 1", "Operator 2", "Operator 3"), n_row, replace = TRUE),
        tms = rnorm(n_row, mean = 20, sd = 4))
      
      
      # define parameter
      intval <- 2
      # create pretty breaks depending on range of response times
      breaks <- with(df,
                     seq(floor(min(tms) / intval) * intval, max(tms) + intval, intval))
      # reshape from long to wide format and aggregate by interval
      library(reshape2)
      dcast(df, cut(tms, breaks) ~ ops, length, value.var = "tms")
      
         cut(tms, breaks) Operator 1 Operator 2 Operator 3
      1           (10,12]          1          0          1
      2           (12,14]          1          4          1
      3           (14,16]          2          4          3
      4           (16,18]          5          7          3
      5           (18,20]          9          3          9
      6           (20,22]          5          9          7
      7           (22,24]          5          2          4
      8           (24,26]          3          2          3
      9           (26,28]          1          2          1
      10          (28,30]          1          1          1
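
Note that cut() produces right-closed intervals by default, so a time of exactly 12 is counted in (10,12] rather than (12,14], and a value equal to the lowest break becomes NA unless include.lowest = TRUE is set. A small sketch with made-up values that sit exactly on the bin boundaries illustrates the difference:

```r
# Made-up values sitting exactly on bin boundaries
x <- c(10, 12, 12.5, 14)
breaks <- seq(10, 14, by = 2)   # 10, 12, 14

# Default (right = TRUE): bins are (10,12] and (12,14];
# 10 equals the lowest break and becomes NA
default_bins <- cut(x, breaks)

# Left-closed bins [10,12) and [12,14];
# include.lowest = TRUE closes the last bin so 14 is kept
left_closed <- cut(x, breaks, right = FALSE, include.lowest = TRUE)

data.frame(x, default_bins, left_closed)
```

With simulated normal data a value landing exactly on a break is unlikely, but with timings recorded to one decimal place it can happen, so it is worth choosing the closed side deliberately.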
      

      【Comments】:

        【Solution 4】:

        Try installing the "descriptr" package as follows:

        install.packages("descriptr")
        

        Then call

        ds_freq_table(Arg1,Arg2,N_intervals)  
        

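        to get the frequency table, where Arg1 is the name of the data frame, Arg2 is the name of the variable to summarize, and N_intervals is the number of intervals.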
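
For readers who prefer not to install an extra package, the general idea of splitting one numeric variable into a fixed number of equal-width intervals and counting can be sketched in base R. This is an illustration of the concept only, not a reproduction of descriptr's output; `n_intervals` and the column names are assumptions chosen for the example.

```r
# Response times from the question (Operator 1)
tms <- c(24.5, 26.3, 32.9, 33.4, 40.5, 47.7)
n_intervals <- 3   # illustrative choice

# Equal-width break points spanning the observed range
breaks <- seq(min(tms), max(tms), length.out = n_intervals + 1)

# include.lowest = TRUE keeps the minimum value in the first bin
bins <- cut(tms, breaks, include.lowest = TRUE)

# One row per interval with its count and share of all observations
freq <- as.data.frame(table(bins), responseName = "frequency")
freq$percent <- 100 * freq$frequency / sum(freq$frequency)
freq
```

This summarizes a single variable; for a per-operator breakdown, the same `bins` factor could be cross-tabulated against the operator column with table().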

        【Comments】:
