New column values based on combination of changing input columns: answers

[Question title]: New column values based on combination of changing input columns
[Posted]: 2018-09-19 02:16:08
[Question]:

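I have a dataset covering different countries and their subnational regions. The variable country identifies the country (a, b, c), and each variable region_country_X holds the numeric codes of that country's subregions (it is NA for rows that belong to another country). The following code builds the data frame: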

set.seed(6543)
df <- data.frame(country = sample(c("a", "b", "c"), 1000, replace = TRUE),
         region_country_a = sample(c(0, 1, 2, 3, 4, 5, 6, 7), 1000, replace = TRUE),
         region_country_b = sample(c(0, 1, 2, 3, 4, 5, 6, 7, 8), 1000, replace = TRUE),
         region_country_c = sample(c(0, 1, 2, 3), 1000, replace = TRUE))
df$region_country_a <- ifelse(df$country != "a", NA, df$region_country_a)
df$region_country_b <- ifelse(df$country != "b", NA, df$region_country_b)
df$region_country_c <- ifelse(df$country != "c", NA, df$region_country_c)

The head of the data frame looks like this:

> head(df, 5)
  country region_country_a region_country_b region_country_c
1       c                NA                NA                 1
2       b                NA                 3                NA
3       a                 2                NA                NA
4       c                NA                NA                 1
5       b                NA                 2                NA

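I now want to add a new variable that holds all regions in a single column, but I don't know how best to approach this.

I would like R to do the following:

Add a new column regions.
Go through the columns country and region_country_a, ..._b, ..._c and assign a new value to each combination (counting from 0, starting with country a, region 0, and giving each new country/region combination the next highest number).

The resulting data frame would look like this: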

  country  region_country_a  region_country_b  region_country_c    regions
1       c                NA                NA                 1    18      #counting with a/0 = 0 etc., a7 = 7, b0 = 8 etc. 
2       b                NA                 3                NA    11       
3       a                 2                NA                NA    2        
4       c                NA                NA                 1    18       
5       b                NA                 2                NA    10
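
The numbering rule can be checked by hand: country a uses ids 0 to 7, country b continues with 8 to 16, and country c with 17 to 20. A minimal base R sketch of that arithmetic follows, assuming the number of regions per country is known in advance from the sample() calls above. The names toy, n_regions, offsets and own_region are made up for illustration, and this is not the approach taken in the answers below, which number only the combinations that actually occur in the data.

```r
# Toy rows mirroring head(df, 5) from the question (hypothetical name: toy)
toy <- data.frame(
  country          = c("c", "b", "a", "c", "b"),
  region_country_a = c(NA, NA, 2, NA, NA),
  region_country_b = c(NA, 3, NA, NA, 2),
  region_country_c = c(1, NA, NA, 1, NA)
)

# Assumption: region counts taken from the sample() calls (a: 0..7, b: 0..8, c: 0..3)
n_regions <- c(a = 8, b = 9, c = 4)

# Offset of a country = number of ids already used by the countries before it
offsets <- cumsum(n_regions) - n_regions  # a = 0, b = 8, c = 17

# Pick the region code from the column that belongs to the row's own country
own_region <- ifelse(toy$country == "a", toy$region_country_a,
              ifelse(toy$country == "b", toy$region_country_b,
                     toy$region_country_c))

toy$regions <- unname(offsets[as.character(toy$country)]) + own_region
toy$regions  # 18 11 2 18 10
```

The result matches the regions column in the expected output above, for example c/1 gives 17 + 1 = 18.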

I'm not sure how best to approach this, as I am very new to R. Could someone point me in the right direction?

[Comments]:

Tags: r

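[Solution 1]:

If I understand correctly, you are trying to encode each combination of the four columns as a number. If so, take the unique combinations, derive an id from the row number, and join it back onto your original data frame.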

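library(dplyr)

# One row per distinct combination, sorted by country and then by region
# so that the ids follow the order asked for in the question (NA sorts last)
df_un <- unique(df) %>%
  arrange(country, region_country_a, region_country_b, region_country_c) %>%
  mutate(region = row_number() - 1)  # subtract 1 so that counting starts at 0

# Attach the id to every row of the original data frame
df <- left_join(df, df_un, by = c("country", "region_country_a", "region_country_b", "region_country_c"))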

[Comments]:

[Solution 2]:

You can use dplyr::group_indices if you simply subtract 1 from its result, since its ids start at 1.

library(dplyr)
df %>%
  mutate(id = group_indices(., country, region_country_a, region_country_b, region_country_c)-1) %>%
  head(5)

#   country region_country_a region_country_b region_country_c id
# 1       c                0                0                1 18
# 2       b                0                3                0 11
# 3       a                2                0                0  2
# 4       c                0                0                1 18
# 5       b                0                2                0 10

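[Comments]:

This is much cleaner than my solution!
Hey, this is a very elegant solution - thank you very much! However, I noticed that the column is not permanent; how can I add it to the data frame permanently (I need it later)?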
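
Regarding the last comment: a pipeline only returns a modified copy, so the new column disappears unless the result is assigned back to df. The sketch below shows that assignment on a small toy data frame that mirrors head(df, 5). It assumes dplyr 1.0.0 or later and uses group_by() with cur_group_id() rather than the group_indices(., ...) call from the answer, which is my substitution and not part of the original answer. The column name regions is taken from the question.

```r
library(dplyr)

# Toy rows mirroring head(df, 5) from the question
df <- data.frame(
  country          = c("c", "b", "a", "c", "b"),
  region_country_a = c(NA, NA, 2, NA, NA),
  region_country_b = c(NA, 3, NA, NA, 2),
  region_country_c = c(1, NA, NA, 1, NA)
)

# Assigning the result back to df is what makes the column permanent
df <- df %>%
  group_by(country, region_country_a, region_country_b, region_country_c) %>%
  mutate(regions = cur_group_id() - 1L) %>%  # group ids start at 1, so subtract 1
  ungroup()

"regions" %in% names(df)  # TRUE
```

On these five rows only four combinations occur, so the ids run from 0 to 3 and differ from the numbers in the question; on the full 1000-row data frame every observed combination gets its own id in sorted order.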