Concatenate pairs of variables with the same suffix: answers

【Question title】: Concatenate pairs of variables with same suffix
【Posted】: 2019-05-27 11:23:31
【Question】:

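I have a data frame containing many variables, and I want to concatenate pairs of them into new variables in the same data frame. A simplified version of my data frame df looks like this: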

first.1 second.1 first.2 second.2 
1222 3223 3333 1221 
1111 2212 2232 2113

This is my inefficient approach, written without a for loop:

df$concatenated.1 <- paste0(df$first.1,"-",df$second.1)
df$concatenated.2 <- paste0(df$first.2,"-",df$second.2)

This results in the following data frame df:

first.1 second.1 first.2 second.2 concatenated.1 concatenated.2 
1222 3223 3333 1221 1222-3223 3333-1221 
1111 2212 2232 2113 1111-2212 2232-2113

I have more than 2 pairs of variables to concatenate, so I would like to do this in a for loop:

for (i in 1:2){
??
}

Any ideas on how to achieve this?

【Comments on the question】:

If I understand your question correctly, you can find the right answer here: stackoverflow.com/questions/18115550/…

Tags: r for-loop

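【Solution 1】:

If the names in your real data follow a pattern as clear as the one in this example, Ronak's split / lapply answer is probably the best option. If they do not, you can build the vectors of column names yourself and use Map with paste.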

new.names <- paste0('concatenated.', 1:2)
names.1 <- paste0('first.', 1:2)
names.2 <- paste0('second.', 1:2)

df[new.names] <- Map(paste, df[names.1], df[names.2], sep = '-')

df

#   first.1 second.1 first.2 second.2 concatenated.1 concatenated.2
# 1    1222     3223    3333     1221      1222-3223      3333-1221
# 2    1111     2212    2232     2113      1111-2212      2232-2113
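Since the question asks specifically for a for loop, here is a minimal sketch of the same idea written as one. It assumes the column names follow the first.i / second.i pattern from the question, and it rebuilds the example data so that it can be run on its own.

```r
# Example data from the question
df <- data.frame(first.1 = c(1222, 1111), second.1 = c(3223, 2212),
                 first.2 = c(3333, 2232), second.2 = c(1221, 2113))

# Build each column name from the loop index and paste the pair together
for (i in 1:2) {
  df[[paste0("concatenated.", i)]] <- paste(df[[paste0("first.", i)]],
                                            df[[paste0("second.", i)]],
                                            sep = "-")
}

df
```

Replacing 1:2 with the full range of suffixes extends this to any number of pairs.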


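【Solution 2】:

This becomes much easier if you can find a way to split your columns into groups. For example, with the sample data provided, we can split the columns by the last character of their names (1, 1, 2, 2).

In base R, we use split.default to split the columns by that character, then for each group we paste the values row-wise and add the result as a new column.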

group_names <- substring(names(df), nchar(names(df)))
df[paste0("concatenated.", unique(group_names))] <- 
     lapply(split.default(df,group_names),  function(x)  do.call(paste, c(x, sep = "-")))

df
#  first.1 second.1 first.2 second.2 concatenated.1 concatenated.2
#1    1222     3223    3333     1221      1222-3223      3333-1221
#2    1111     2212    2232     2113      1111-2212      2232-2113
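Taking only the last character stops working once a suffix has more than one digit (for example .10), and split.default returns its groups sorted by suffix rather than in order of appearance. The sketch below is one possible variation: it assumes the suffix is everything after the last dot, and it names the new columns from the split result itself so that names and values stay aligned. The .10 columns are made up for illustration.

```r
# Hypothetical data with a two-digit suffix
df <- data.frame(first.1 = c(1222, 1111), second.1 = c(3223, 2212),
                 first.10 = c(3333, 2232), second.10 = c(1221, 2113))

# Keep everything after the last dot as the group label
suffix <- sub(".*\\.", "", names(df))

# One sub-data-frame per suffix, named by suffix
groups <- split.default(df, suffix)

# Paste each group row-wise; take the new names from the groups themselves
df[paste0("concatenated.", names(groups))] <-
  lapply(groups, function(x) do.call(paste, c(x, sep = "-")))

df
```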


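【Solution 3】:

Here is a tidyverse solution that gets you most of the way there. The differences are that the columns come out in alphabetical order (the "concatenated" columns, then the "first" columns, then the "second" columns) and that the names are joined with an underscore instead of a dot.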

txt <- 'first.1 second.1 first.2 second.2 
1222 3223 3333 1221 
1111 2212 2232 2113'

df <- read.table(text = txt, header = TRUE)

library(tidyverse)

df2 <- df %>% 
  mutate(row.num = row_number()) %>% 
  gather(variable, value, -row.num) %>% 
  separate(variable, into = c('order', 'pair')) %>% 
  spread(order, value) %>% 
  mutate(concatenated = paste0(first, '-', second)) %>% 
  gather(variable, value, -row.num, -pair) %>% 
  unite(name, variable, pair) %>% 
  spread(name, value)

  row.num concatenated_1 concatenated_2 first_1 first_2 second_1 second_2
1       1      1222-3223      3333-1221    1222    3333     3223     1221
2       2      1111-2212      2232-2113    1111    2232     2212     2113
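gather and spread have since been superseded in tidyr by pivot_longer and pivot_wider. As a sketch only, assuming tidyr 1.0.0 or later, the same reshaping could look like this; the ".value" entry in names_to tells pivot_longer to keep first and second as separate columns while the suffix moves into pair.

```r
library(dplyr)
library(tidyr)

df <- read.table(text = "first.1 second.1 first.2 second.2
1222 3223 3333 1221
1111 2212 2232 2113", header = TRUE)

wide <- df %>%
  mutate(row.num = row_number()) %>%
  # Split each name at the dot: the first part stays a column, the second becomes `pair`
  pivot_longer(-row.num, names_to = c(".value", "pair"), names_sep = "\\.") %>%
  mutate(concatenated = paste(first, second, sep = "-")) %>%
  # Spread back out, rebuilding names such as concatenated.1
  pivot_wider(names_from = pair,
              values_from = c(first, second, concatenated),
              names_sep = ".")

wide
```

Using names_sep = "." in pivot_wider keeps the dot-separated naming of the original columns.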


【Solution 4】:

library(tidyverse)

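[Edited: the original solution incorrectly used starts_with]

This solution uses ends_with() to select the appropriate columns, then uses unite to combine them with a - separator: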

df <- tribble(
        ~first.1, ~second.1, ~first.2, ~second.2,
        1222,3223,3333,1221,
        1111,2212,2232,2113)

df1 <- df %>%
  select(ends_with("1")) %>%
  unite(concatenated.1, sep = "-")

df2 <- df %>%
  select(ends_with("2")) %>%
  unite(concatenated.2, sep = "-")

cbind(df, df1, df2)
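With more than two pairs, repeating the select / unite step for every suffix gets tedious. One possible way to generalise it, as a sketch, is to loop over the suffixes and call unite with remove = FALSE so that the original columns are kept. The suffix vector and the names out and new_col are assumptions made for illustration.

```r
library(dplyr)
library(tidyr)

df <- data.frame(first.1 = c(1222, 1111), second.1 = c(3223, 2212),
                 first.2 = c(3333, 2232), second.2 = c(1221, 2113))

out <- df
for (s in c("1", "2")) {
  new_col <- paste0("concatenated.", s)
  # ends_with() matches the literal suffix ".1" or ".2";
  # !! passes the string held in new_col as the new column name
  out <- unite(out, !!new_col, ends_with(paste0(".", s)),
               sep = "-", remove = FALSE)
}

out
```

Matching on the dot plus the suffix, rather than the digit alone, keeps a suffix such as 1 from also picking up columns ending in 11.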

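【Comments】:

@IceCreamToucan You are right. I changed the function to ends_with ... I think that fixes it to match the OP's request

【Solution 5】:

You can use the stri_join function from the stringi package, which is very fast.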

library(data.table)
library(stringi)

df <- fread("first.1 second.1 first.2 second.2 
             1222 3223 3333 1221 
             1111 2212 2232 2113")

cols <- paste0("concatenated_", 1:2)
df[, (cols) := Map(stri_join, .(first.1, first.2), .(second.1, second.2), sep = "-")]
setDF(df)

  first.1 second.1 first.2 second.2 concatenated_1 concatenated_2
1    1222     3223    3333     1221      1222-3223      3333-1221
2    1111     2212    2232     2113      1111-2212      2232-2113
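Listing every column by hand inside .() does not scale to many pairs. A sketch of one alternative builds the column names as character vectors and selects them with data.table's .. prefix. The names ids, first_cols, second_cols, new_cols and pasted are made up for illustration.

```r
library(data.table)
library(stringi)

dt <- data.table(first.1 = c(1222L, 1111L), second.1 = c(3223L, 2212L),
                 first.2 = c(3333L, 2232L), second.2 = c(1221L, 2113L))

ids <- 1:2
first_cols  <- paste0("first.", ids)
second_cols <- paste0("second.", ids)
new_cols    <- paste0("concatenated_", ids)

# Pair up the i-th "first" column with the i-th "second" column
pasted <- Map(stri_join, dt[, ..first_cols], dt[, ..second_cols], sep = "-")

# Add all new columns by reference in one step
dt[, (new_cols) := pasted]

dt
```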
