Combining first and second rows in a text file in R (answers)

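【Question title】: Combining first and second rows in a text file in R
【Posted】: 2018-03-02 06:09:55
【Question】:

I have a messy dataset in which each record is spread over 2 rows. I would like to take every second row and put it at the end of the row before it, creating new columns in the process.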

For example, I would like:

       COL1      COL2
1     name1    score1
2    state1   rating1
3     name2    score2
4    state2   rating2

To become:

      COL1      COL2     COL3      COL4
1    name1    score1   state1   rating1
2    name2    score2   state2   rating2

Is there something simple in the Hadleyverse for this?

【Comments】:

For z = mtcars (or any data frame), run: cbind(z[seq(1, nrow(z), 2), ], z[seq(2, nrow(z), 2), ])
Incredible. Thank you!

Tags: r dplyr tidyr tidyverse readr

【Solution 1】:

I would do this with unite() and separate() from tidyr and lead() from dplyr.

library(dplyr)
library(tidyr)

df <- tribble(
~COL1,      ~COL2,
"name1",    "score1",
"state1",   "rating1",
"name2",    "score2",
"state2",   "rating2"
)


df %>% 
  unite(old_cols, COL1, COL2) %>%
  mutate(new_cols = lead(old_cols)) %>%
  filter(row_number() %% 2 == 1) %>%
  separate(old_cols, into = c("COL1", "COL2")) %>%
  separate(new_cols, into = c("COL3", "COL4"))

#> # A tibble: 2 x 4
#>    COL1   COL2   COL3    COL4
#> * <chr>  <chr>  <chr>   <chr>
#> 1 name1 score1 state1 rating1
#> 2 name2 score2 state2 rating2
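
As a side note, the same lead() idea can be sketched without the unite()/separate() round trip, assuming the column names are known in advance and can be typed out. The object name `res` and the use of a plain data.frame are choices made for this illustration, not part of the answer above.

```r
library(dplyr)

# Same toy data as above, built as a plain data.frame
df <- data.frame(
  COL1 = c("name1", "state1", "name2", "state2"),
  COL2 = c("score1", "rating1", "score2", "rating2"),
  stringsAsFactors = FALSE
)

res <- df %>%
  # lead() shifts a column up by one position, so row i sees the value of row i + 1
  mutate(COL3 = lead(COL1), COL4 = lead(COL2)) %>%
  # keep only the odd rows, which now carry both halves of each record
  filter(row_number() %% 2 == 1)

res
```

This trades generality for readability: every column has to be named by hand, whereas the unite()/separate() version shifts all columns in one go.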

【Discussion】:

【Solution 2】:

Here is a dplyr solution.

library(dplyr)

dt2 <- dt %>%
  mutate(Group = rep(1:2, times = nrow(.)/2)) %>%
  split(.$Group) %>%
  bind_cols() %>%
  select(-starts_with("Group")) %>%
  setNames(paste0("COL", 1:ncol(.)))
dt2
   COL1   COL2   COL3    COL4
1 name1 score1 state1 rating1
2 name2 score2 state2 rating2

Alternatively, we can use the purrr package together with dplyr.

library(dplyr)
library(purrr)

dt2 <- dt %>%
  mutate(Group = rep(1:2, times = nrow(.)/2)) %>%
  split(.$Group) %>%
  map_dfc(. %>% select(-Group)) %>%
  setNames(paste0("COL", 1:ncol(.)))
dt2
   COL1   COL2   COL3    COL4
1 name1 score1 state1 rating1
2 name2 score2 state2 rating2

Data

dt <- read.table(text = "       COL1      COL2
1     name1    score1
                 2    state1   rating1
                 3     name2    score2
                 4    state2   rating2",
                 header = TRUE, stringsAsFactors = FALSE)
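
To see what the Group column and the split() step in the pipelines above produce before the pieces are bound together, here is a minimal base R sketch. The name `parts` is introduced only for this illustration, and the data frame is rebuilt inline so the snippet runs on its own.

```r
# Same data as dt above
dt <- data.frame(
  COL1 = c("name1", "state1", "name2", "state2"),
  COL2 = c("score1", "rating1", "score2", "rating2"),
  stringsAsFactors = FALSE
)

# Label the rows 1, 2, 1, 2, ... so that each record's first row
# gets 1 and its second row gets 2
dt$Group <- rep(1:2, times = nrow(dt) / 2)

# split() returns a named list with one data frame per label
parts <- split(dt, dt$Group)

names(parts)
parts[["1"]]   # the name/score rows
parts[["2"]]   # the state/rating rows
```

Binding these two data frames column-wise is what yields the wide result. The labelling relies on the row count being even.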

【Discussion】:

【Solution 3】:

Using base R, we can subset the rows with a recycled logical vector, collect the two pieces in a list, and then cbind them.

setNames(do.call(cbind, list(df[c(TRUE, FALSE),], 
      df[c(FALSE, TRUE),])), paste0("COL", 1:4))
#   COL1   COL2   COL3    COL4
#1 name1 score1 state1 rating1
#3 name2 score2 state2 rating2
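
The recycling that this answer relies on can be checked in isolation. The sketch below assumes the same four-row df; `odd_rows` and `even_rows` are names made up for the illustration.

```r
df <- data.frame(
  COL1 = c("name1", "state1", "name2", "state2"),
  COL2 = c("score1", "rating1", "score2", "rating2"),
  stringsAsFactors = FALSE
)

# A logical row index shorter than nrow(df) is recycled, so
# c(TRUE, FALSE) acts like c(TRUE, FALSE, TRUE, FALSE) here
odd_rows  <- df[c(TRUE, FALSE), ]   # rows 1 and 3
even_rows <- df[c(FALSE, TRUE), ]   # rows 2 and 4

rownames(odd_rows)
rownames(even_rows)
```

The original row names are kept by the subsetting, which is why the combined output above is labelled 1 and 3 rather than 1 and 2.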

【Discussion】:

【Solution 4】:

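You should split the data frame into two data frames: one containing the odd rows and the other containing the even rows. Note: this assumes the number of rows is even. With an odd number of rows the two halves differ in length, and cbind() does not pad the shorter half with NA: it either recycles it or stops with an error.

Odd rows: df[seq(1, nrow(df), 2), ]

Even rows: df[seq(2, nrow(df), 2), ]

The next step is to cbind them: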

df_new = cbind(df[seq(1, nrow(df), 2), ], df[seq(2, nrow(df), 2), ])

The last step is to rename the columns:

colnames(df_new) = c("COL1", "COL2", "COL3", "COL4")
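
The three steps above can be wrapped into one small function. This is only a sketch: the helper name `combine_row_pairs` and the checks on the row count are assumptions added for the illustration, not part of the original answer.

```r
# Hypothetical helper: pairs row 1 with row 2, row 3 with row 4, and so on
combine_row_pairs <- function(df) {
  # Assumption: fail early rather than let cbind() recycle or error
  # on halves of unequal length
  stopifnot(nrow(df) > 0, nrow(df) %% 2 == 0)

  odd  <- df[seq(1, nrow(df), by = 2), , drop = FALSE]
  even <- df[seq(2, nrow(df), by = 2), , drop = FALSE]

  out <- cbind(odd, even)
  # Name the columns COL1..COLn and reset the row names to 1..n
  colnames(out) <- paste0("COL", seq_len(ncol(out)))
  rownames(out) <- NULL
  out
}

df <- data.frame(
  COL1 = c("name1", "state1", "name2", "state2"),
  COL2 = c("score1", "rating1", "score2", "rating2"),
  stringsAsFactors = FALSE
)

df_new <- combine_row_pairs(df)
df_new
```

Generating the names with paste0() avoids hard-coding four column names, so the same helper works for inputs with more than two columns.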

【Discussion】: