【Question title】: Assign identity code based on factor name [duplicate]
【Posted】: 2019-07-30 22:33:51
【Question】:

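I want to assign an identity to each data point based on its "name" factor; wherever that factor is the same, the rows must get the same identity number or ID label. I have a lot of data, so this can be an arbitrary identity code. It only needs to group everyone with the same name under a single ID, so that I can anonymize the names while still keeping the data points grouped together.

For example, in the dummy data below "Aur" could be A, "Cos" = B, then C, D, ... A1, B1, ... A2, and so on.

I think this would be some kind of group_by(Name, mutate()) function? But I'm not sure.

Here is some dummy data: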

df <- structure(list(`Local Time` = structure(c(1559388960, 
1559389200, 1559394840, 1559397180, 1559397900, 1559398380, 
1559398560, 1559398680, 1559398740, 1559398800, 1559399160, 
1559399280, 1559399400, 1559399580, 1559399640, 1559399820, 
1559399940, 1559400120, 1559400240, 1559400780, 1559400840, 
1559400960, 1559401080, 1559401260, 1559401380, 1559383560, 
1559389200, 1559389440, 1559395080, 1559395320, 1559397180, 
1559397900, 1559398200, 1559398440, 1559398680, 1559398920, 
1559399220, 1559399520, 1559399820, 1559400120, 1559400360, 
1559400660, 1559400960, 1559401200, 1559401500, 1559401740, 
1559402040, 1559402280, 1559402580, 1559402880
), class = c("POSIXct", "POSIXt"), tzone = ""), COG = c(315, 
352.6, 265.6, 214.9, 240.8, 245.5, 240.3, 250.5, 262.4, 269.8, 
281.1, 262.9, 253.1, 247.7, 255.5, 249.4, 263.2, 268.6, 279.6, 
274.3, 254.6, 246.6, 253.7, 242.3, 163.5, 90, 88, 89, 93, 96, 
95, 97, 97, 98, 98, 95, 93, 94, 92, 91, 91, 91, 91, 90, 90, 92, 
89, 89, 89, 88), NAME = c("Aur", "Aur", "Aur", "Aur", "Aur", 
"Aur", "Aur", "Aur", "Aur", "Aur", "Aur", "Aur", "Aur", "Aur", 
"Aur", "Aur", "Aur", "Aur", "Aur", "Aur", "Aur", "Aur", "Aur", 
"Aur", "Aur", "Cos", "Cos", "Cos", "Cos", "Cos", "Cos", "Cos", 
"Cos", "Cos", "Cos", "Cos", "Cos", "Cos", "Cos", "Cos", "Cos", 
"Cos", "Cos", "Cos", "Cos", "Cos", "Cos", "Cos", "Cos", "Cos"
 )), row.names = c(NA, -50L), class = c("tbl_df", "tbl", 
"data.frame"))

【Comments】:

  • Converting the variable to the numeric representation of a factor is enough: as.integer(factor(df$NAME))

Tags: r filter group-by dplyr
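
The comment's suggestion can be sketched in base R as follows. The vector `names_vec` is a hypothetical stand-in for `df$NAME`, and the shuffled variant is an assumption about what "random" could mean here, not something the commenter proposed. Note that `factor()` orders its levels alphabetically by default, so the resulting numbers follow the sorted names.

```r
# Hypothetical stand-in for df$NAME
names_vec <- c("Cos", "Aur", "Cos", "Aur")

# Default factor levels are sorted, so "Aur" becomes 1 and "Cos" becomes 2
ids <- as.integer(factor(names_vec))

# Variant (assumption): shuffle the levels so the IDs do not reveal
# the alphabetical order of the names
shuffled_levels <- sample(unique(names_vec))
ids_shuffled <- as.integer(factor(names_vec, levels = shuffled_levels))
```

In both variants, rows that share a name always receive the same integer; only the assignment of integers to names differs.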


【Answer 1】:

You can use dplyr::group_indices()

library(dplyr)

df <- df %>%
  mutate(id = group_indices(., NAME))
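
In dplyr 1.0.0 and later, passing grouping columns directly to `group_indices()` is deprecated. A minimal sketch of the same idea with `cur_group_id()`, assuming dplyr 1.0.0 or newer is installed; `toy` is a hypothetical stand-in for `df`:

```r
library(dplyr)

# Hypothetical stand-in for df, with only the grouping column
toy <- tibble(NAME = c("Aur", "Aur", "Cos", "Aur", "Cos"))

toy_with_id <- toy %>%
  group_by(NAME) %>%               # one group per distinct name
  mutate(id = cur_group_id()) %>%  # integer index of the current group
  ungroup()                        # drop the grouping so later steps are unaffected
```

Groups are numbered in the sorted order of the grouping key, so "Aur" gets 1 and "Cos" gets 2 here.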

【Comments】:

    【Answer 2】:

    Can the ID be numeric? If so, this should work as well.

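    library(dplyr)

    # One entry per distinct name, in order of first appearance
    unique_name <- unique(df$NAME)

    # Named integer vector: names are the NAME values, values are the IDs
    id_mapping <- seq_along(unique_name) %>%
        setNames(unique_name)

    # Look up each row's ID by its name
    df %>%
        mutate(id = id_mapping[NAME])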
    
    
    # A tibble: 50 x 4
       `Local Time`          COG NAME     id
       <dttm>              <dbl> <chr> <int>
     1 2019-06-01 04:36:00  315  Aur       1
     2 2019-06-01 04:40:00  353. Aur       1
     3 2019-06-01 06:14:00  266. Aur       1
     4 2019-06-01 06:53:00  215. Aur       1
     5 2019-06-01 07:05:00  241. Aur       1
     6 2019-06-01 07:13:00  246. Aur       1
     7 2019-06-01 07:16:00  240. Aur       1
     8 2019-06-01 07:18:00  250. Aur       1
     9 2019-06-01 07:19:00  262. Aur       1
    10 2019-06-01 07:20:00  270. Aur       1
    # ... with 40 more rows
    
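
    The question asked for letter codes (A, B, ... then A1, B1, ... A2, ...). A minimal base R sketch of one way to turn the integer position of each name into such a label; the helper `name_to_code()` and the exact labelling scheme are assumptions based on the example in the question, not part of this answer:

```r
# Hypothetical helper: maps each distinct name to A..Z, then A1..Z1, A2..Z2, ...
name_to_code <- function(names) {
  idx <- match(names, unique(names))          # 1, 2, ... in order of first appearance
  letter <- LETTERS[(idx - 1L) %% 26L + 1L]   # cycle through the 26 letters
  cycle <- (idx - 1L) %/% 26L                 # 0 for the first 26 names, then 1, 2, ...
  ifelse(cycle == 0L, letter, paste0(letter, cycle))
}

codes <- name_to_code(c("Aur", "Aur", "Cos", "Aur"))
```

    It could be applied to the data with something like `df$id <- name_to_code(df$NAME)`. Because the codes follow the order of first appearance, shuffling the distinct names first would be needed if that order itself should stay hidden.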

    【Comments】:

      【Answer 3】:

      An option with data.table is .GRP

      library(data.table)
      # Convert df to a data.table by reference, then add the group counter per NAME
      setDT(df)[, id := .GRP, by = NAME][]
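
      One detail worth knowing is how `.GRP` numbers the groups: with `by`, groups are numbered in order of first appearance rather than alphabetically. A small sketch on a hypothetical table `toy` (not the question's data):

```r
library(data.table)

# Hypothetical table in which "Cos" appears before "Aur"
toy <- data.table(NAME = c("Cos", "Aur", "Cos", "Aur"))

# .GRP is an integer counter over the groups formed by `by`
toy[, id := .GRP, by = NAME]
```

      Here "Cos" receives 1 and "Aur" receives 2, whereas a sorted approach such as `as.integer(factor(...))` would give the opposite assignment.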
      

      【Comments】:
