Add a column to a data frame, numbered from 1 to the number of unique values among existing grouped rows

【Question Title】: add column to dataframes from 1 to unique length of existing grouped rows
【Posted】: 2017-05-25 00:48:47
【Question Description】:

Here is my example df:

df = read.table(text = 'colA 
22
22
22
45
45
11
11
87
90
110
32
32', header = TRUE)

I just need to add a new column based on colA, with values running from 1 to the number of unique values in colA.

Expected output:

   colA   newCol 
    22     1
    22     1
    22     1
    45     2
    45     2
    11     3
    11     3
    87     4
    90     5
    110    6 
    32     7
    32     7

Here is what I tried, without success:

library(dplyr)
new_df = df %>%
  group_by(colA) %>% 
  mutate(newCol = seq(1, length(unique(df$colA)), by = 1))

Thanks

【Comments】:

Are the values of colA clustered as in your example, or could you have a sequence like 22 45 22? Can a value come back later?
They are clustered. Thanks

Tags: r dataframe dplyr col

【Solution 1】:

newcol = c(1, 1+cumsum(diff(df$colA) != 0))
 [1] 1 1 1 2 2 3 3 4 5 6 7 7
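
The one-liner above can be unpacked step by step. The following is a minimal base R sketch, assuming the same data as in the question. The intermediate names `changed` and `run_id` are illustrative only and do not appear in the original answer.

```r
# Rebuild the example data from the question.
df <- data.frame(colA = c(22, 22, 22, 45, 45, 11, 11, 87, 90, 110, 32, 32))

# diff() gives the difference between consecutive values;
# a non-zero difference marks the first row of a new run.
changed <- diff(df$colA) != 0

# cumsum() counts how many new runs have started so far. The first row
# has no predecessor, so it is prepended explicitly as run 1.
run_id <- c(1, 1 + cumsum(changed))

df$newCol <- run_id
print(df$newCol)
```

Because this numbers runs of consecutive equal values, it relies on equal values being adjacent, which the asker confirmed in the comments.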

【Discussion】:

【Solution 2】:

The dplyr package has a function for getting group indices:

df$newcol = group_indices(df,colA)

This returns:

    colA newcol
1    22      2
2    22      2
3    22      2
4    45      4
5    45      4
6    11      1
7    11      1
8    87      5
9    90      6
10  110      7
11   32      3
12   32      3

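Note, however, that the indices follow the sorted order of the colA values rather than their order of appearance.

You can also use factor: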

df$newcol = as.numeric(factor(df$colA,levels=unique(df$colA)))
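
To see why the two numberings differ, here is a small base R sketch (no dplyr required), assuming the same data as in the question. The names `by_sorted` and `by_appearance` are illustrative. A factor with default levels sorts the unique values, which gives the same numbering as the group_indices output shown above, while `match()` against `unique()` follows the order of first appearance.

```r
df <- data.frame(colA = c(22, 22, 22, 45, 45, 11, 11, 87, 90, 110, 32, 32))

# Default factor levels are the sorted unique values (11, 22, 32, ...),
# so the integer codes follow sorted order.
by_sorted <- as.integer(factor(df$colA))

# match() against unique() returns the position of each value's first
# appearance, so the codes follow the order of appearance.
by_appearance <- match(df$colA, unique(df$colA))

print(by_sorted)
print(by_appearance)
```

Passing `levels = unique(df$colA)` to `factor()`, as in the line above, is another way to get the order-of-appearance numbering.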

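【Discussion】:

【Solution 3】:

Another option: you can take advantage of the fact that factors are stored as underlying integer codes. First create a new factor variable whose levels are the column's unique values in order of appearance, then convert it to numeric.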

newCol <- factor(df$colA, 
    levels = unique(df$colA))

df$newCol <- as.numeric(newCol)
df

   colA newCol
1    22      1
2    22      1
3    22      1
4    45      2
5    45      2
6    11      3
7    11      3
8    87      4
9    90      5
10  110      6
11   32      7
12   32      7
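
The factor approach and the diff/cumsum approach from Solution 1 agree here only because equal values are adjacent. The sketch below illustrates the difference using a made-up vector `x` in which 22 comes back later (the 22 45 22 case raised in the question comments). The names `by_value` and `by_run` are illustrative.

```r
# Hypothetical non-clustered input: 22 reappears after 45.
x <- c(22, 45, 22)

# Factor codes identify the value itself, so both 22s share an index.
by_value <- as.numeric(factor(x, levels = unique(x)))

# diff/cumsum identifies runs of consecutive equal values, so the
# second 22 starts a new run and gets a new index.
by_run <- c(1, 1 + cumsum(diff(x) != 0))

print(by_value)
print(by_run)
```

Which behaviour is wanted depends on whether the new column should identify distinct values or distinct consecutive runs. For the clustered data in the question the two coincide.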

【Discussion】: