【Question Title】: Combine multiple rows per ID into one row per ID in R
【Posted】: 2021-04-22 15:01:02
【Question】:

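Given the data frame df1 below, I would like to convert it into the data frame df2. The goal is to collapse the multiple rows per ID into a single row per ID. A solution using dplyr, tidyverse, etc. would be ideal!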

df1 <- data.frame (ID  = c("1", "1", "1", "1", "1", "1", "2", "2",
                           "2", "2", "2", "2", "3", "3", "3", "3", "3",
                           "4", "4"),
                   fruit_name = c("Apple", "Banana", "Cherry",
                                  "Orange", "Blueberry", "Peach",
                                  "Apple", "Banana", "Cherry",
                                  "Orange", "Blueberry", "Peach",
                                  "Apple", "Banana", "Cherry",
                                  "Orange", "Blueberry",
                                  "Apple", "Cherry"),
                   count_one = c("2", "2", "2",
                                  "2", "2", "2",
                                  "4", "4", "4",
                                  "4", "4", "4",
                                  "3", "3", "3",
                                  "3", "3",
                                  "5", "5"),
                   count_two = c("1", "NA", "NA",
                                 "NA", "NA", "NA",
                                 "NA", "NA", "4",
                                 "NA", "NA", "NA",
                                 "NA", "NA", "NA",
                                 "NA", "3",
                                 "5", "NA"))

into...

df2 <- data.frame (ID  = c("1", "2", "3", "4"),
                   count_one = c("2", "4", "3", "5"),
                   count_two = c("1", "4", "3", "5"))

Thank you, much appreciated!

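【Comments】:

  • Do you want to aggregate by ID and return the maximum count_one and the maximum count_two? And fruit_name has no bearing on the problem?
  • By the way, what is the logic? First value or last value?
  • So for count_one, just the unique number per ID, and for count_two, just the non-null value per ID
  • Yes, correct, fruit_name has no bearing on the problem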
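
The rule agreed on in these comments (one count_one value per ID, plus the single non-missing count_two value) can be sketched with group_by() and summarise(). This is a minimal sketch rather than code from the thread: df1_small is a shortened, made-up stand-in for df1, fruit_name is left out because it has no effect, and the "NA" strings are assumed to mean missing values.

```r
library(dplyr)

# Shortened stand-in for df1 (hypothetical; same column layout, fewer rows)
df1_small <- data.frame(
  ID        = c("1", "1", "2", "2", "3"),
  count_one = c("2", "2", "4", "4", "3"),
  count_two = c("1", "NA", "NA", "4", "3")
)

df2_small <- df1_small %>%
  mutate(count_two = na_if(count_two, "NA")) %>%     # "NA" strings -> real NA
  group_by(ID) %>%
  summarise(
    count_one = first(count_one),                    # constant within each ID
    count_two = first(count_two[!is.na(count_two)])  # first non-missing value
  )

df2_small
```

Because first() returns NA for an empty vector, an ID with no non-missing count_two would still appear in the result with count_two set to NA.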

Tags: r dplyr rows na


【Solution 1】:

Try:

library(dplyr)
df1 %>% distinct(ID,count_one,count_two) %>% filter(count_two != "NA")

However, your counts really should be of class numeric. You can convert them like this:

df1 <- df1 %>% mutate(count_one = as.numeric(count_one),
               count_two = as.numeric(recode(count_two,"NA"=NA_character_)))
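
As a side note on the conversion above, dplyr's na_if() is another way to turn the "NA" strings into real missing values before calling as.numeric(). A small sketch on a standalone vector (x is a made-up example, not a column from the question):

```r
library(dplyr)

x <- c("1", "NA", "4")               # hypothetical character column

x_num <- as.numeric(na_if(x, "NA"))  # "NA" string becomes NA, the rest become numbers

is.na(x_num)                         # flags the position that was "NA"
```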

Now you can apply different strategies:

Base R:

df2 <- na.omit(df1)
df2$fruit_name <- NULL

tidyr:

library(tidyr)
df2 <- df1 %>% select(-fruit_name) %>% drop_na()
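
One thing to keep in mind with both variants: dropping the NA rows assumes that every ID has exactly one row with a non-missing count_two. A small sketch with made-up data (toy is hypothetical) shows what happens otherwise: an ID whose count_two is always NA disappears from the result.

```r
library(dplyr)
library(tidyr)

toy <- data.frame(
  ID        = c("1", "1", "2", "2"),
  count_one = c(2, 2, 4, 4),
  count_two = c(1, NA, NA, NA)   # ID "2" has no non-missing value
)

dropped <- toy %>% drop_na()

dropped$ID                       # IDs that survive drop_na()
setdiff(toy$ID, dropped$ID)      # IDs that were lost
```

For the data in the question this is not an issue, since each ID has exactly one non-missing count_two.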

【Discussion】:

    【Solution 2】:

    I have been experimenting with the pivot_wider function. I know it isn't really needed here, but it gets the job done.

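    library(dplyr)
    library(tidyr)

    df2 <- df1[, -2] %>%                          # drop fruit_name
      mutate(name = "count_two") %>%              # pivot_wider needs a names_from column
      pivot_wider(id_cols = c(ID, count_one),
                  names_from = name, values_from = count_two,
                  values_fn = function(x) x[!is.na(x)][1])  # first non-NA per ID; needs real NA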
    

    Note that I first recoded the "NA" strings as real NA values so that the missing values can be removed.

    【Discussion】:
