【Question title】: How to merge multiple rows in R with multiple columns in a dataset
【Posted】: 2021-03-17 12:00:05
【Question】:

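I want to merge the rows for each record_id into a single row based on the type column. However, some volunteers in the record_id column have two repeats in the repeat column, and for those I want a second row. Each record_id corresponds to one person, who took the test either once (repeat = 1) or twice, in which case there are two entries in the repeat column.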

This is what my data looks like:

structure(list(record_id = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 
4, 4, 4), type = c(NA, "data_collection", "test", NA, "data_collection", 
"test", NA, "data_collection", "test", "test", NA, "cata_collection", 
"test", "test"), `repeat` = c(NA, 1, 1, NA, 1, 1, NA, 1, 1, 2, 
NA, 1, 1, 2), dt_volunteer_reg = structure(c(1597246320, NA, 
NA, 1599217080, NA, NA, 1596184500, NA, NA, NA, 1598192280, NA, 
NA, NA), class = c("POSIXct", "POSIXt"), tzone = "UTC"), age = c(26, 
NA, NA, 64, NA, NA, 51, NA, NA, NA, 39, NA, NA, NA), gender = c(0, 
NA, NA, 1, NA, NA, 0, NA, NA, NA, 1, NA, NA, NA), case_type = c(NA, 
1, NA, NA, 2, NA, NA, 1, NA, NA, NA, 1, NA, NA), test_dis_dt = structure(c(NA, 
NA, 1597250220, NA, NA, 1600012980, NA, NA, 1596382080, 1601980740, 
NA, NA, 1598284020, 1603118700), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), test_dis_res = c(NA, NA, 1, NA, NA, 1, NA, 
NA, 2, 2, NA, NA, 2, 2), test_dis_in = c(NA, NA, NA, NA, NA, 
0.02, NA, NA, 6.13, 4.75, NA, NA, 7.23, 3.85), test_cont_dt = structure(c(NA, 
NA, 1597250280, NA, NA, 1608636120, NA, NA, NA, 1601980740, NA, 
NA, 1605704940, 1603205340), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
  test_cont_res = c(NA, NA, 2, NA, NA, 1, NA, NA, NA, 2, NA, 
  NA, 2, 2), test_cont_val = c(NA, NA, 123, NA, NA, 0, NA, 
  NA, NA, 40000, NA, NA, 471.6, 306.5)), row.names = c(NA, 
-14L), class = c("tbl_df", "tbl", "data.frame"))

This is what I would like to get:

  structure(list(record_id = c(1, 2, 3, 3, 4, 4), `repeat` = c(1, 
1, 1, 2, 1, 2), dt_volunteer_reg = structure(c(1597246320, 1599217080, 
1596184500, 1596184500, 1598192280, 1598192280), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), age = c(26, 64, 51, 51, 39, 39), gender = c(0, 
1, 0, 0, 1, 1), case_type = c(1, 2, 1, 1, 1, 1), test_dis_dt = structure(c(1597250220, 
1600012980, 1596382080, 1601980740, 1598284020, 1603118700), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), test_dis_res = c(1, 1, 2, 2, 2, 2), 
    test_dis_in = c(NA, 0.02, 6.13, 4.75, 7.23, 3.85), test_cont_dt = structure(c(1597250280, 
    1608636120, NA, 1601980740, 1605704940, 1603205340), class = c("POSIXct", 
    "POSIXt"), tzone = "UTC"), test_cont_res = c(2, 1, NA, 2, 
    2, 2), test_cont_val = c(123, 0, NA, 40000, 471.6, 306.5)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

【Question comments】:

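  • Please provide a reproducible dataset using dput, along with what you have already tried.
  • I have replaced the dataset with the dput output. I tried spread(dat, type, repeat), but it returns the dataset unchanged. Apologies, I'm very new to data wrangling.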

Tags: r dplyr concatenation reshape tidyr


【Answer 1】:

Assuming the first data frame is called input and you are happy to use the tidyverse, you can do this:

library(dplyr)
library(tidyr)
library(purrr)

input %>%
  nest(data = c(-record_id)) %>%
  mutate(
    data = map(data, ~ replace_na(.x, as.list(head(.x, 1)))),  # Fill in volunteer registration details from the first row
    data = map(data, ~ filter(.x, !is.na(`repeat`))),           # Remove the registration row
    data = map(data, ~ replace_na(.x, as.list(head(.x, 1)))),  # Fill in test rows with data collection details
    data = map(data, ~ filter(.x, type == "test"))              # Remove data collection rows
  ) %>%
  unnest(data) %>%
  select(-type)
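The key step in the pipeline above is replace_na(., as.list(head(., 1))): as.list(head(df, 1)) turns the first row into a named list, and replace_na() then fills each column's NAs with that column's first-row value. The sketch below replays the four steps by hand on one volunteer's rows, cut down from record 1 of the question's data (only a few columns kept; the names one_volunteer, step1 and step2 are made up for illustration).

```r
library(tibble)
library(dplyr)
library(tidyr)

# Record 1 from the question, reduced to a few columns
# (one_volunteer is a hypothetical name used only for this illustration)
one_volunteer <- tibble(
  type      = c(NA, "data_collection", "test"),
  `repeat`  = c(NA, 1, 1),
  age       = c(26, NA, NA),
  case_type = c(NA, 1, NA)
)

# Steps 1-2: fill NAs from the first (registration) row, then drop that row
step1 <- replace_na(one_volunteer, as.list(head(one_volunteer, 1)))
step1 <- filter(step1, !is.na(`repeat`))

# Steps 3-4: fill NAs from the new first (data collection) row, keep only tests
step2 <- replace_na(step1, as.list(head(step1, 1)))
step2 <- filter(step2, type == "test")

print(step2)
```

After the first fill, age is copied to every row; after the second, case_type from the data collection row reaches the test row, which is the only row left at the end.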

There are ways to do this more concisely and/or faster, but this version is probably more readable.
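One more concise option, sketched here as an alternative and not as the answer's method, is to group by record_id and use tidyr::fill() to copy the registration and data collection values onto the test rows, then keep only the test rows. It assumes those per-volunteer values are the only ones that need copying, so the columns are listed explicitly. The data below is a cut-down copy of records 1 and 3 from the question.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Records 1 and 3 from the question, reduced to a few columns
input <- tibble(
  record_id    = c(1, 1, 1, 3, 3, 3, 3),
  type         = c(NA, "data_collection", "test", NA, "data_collection", "test", "test"),
  `repeat`     = c(NA, 1, 1, NA, 1, 1, 2),
  age          = c(26, NA, NA, 51, NA, NA, NA),
  gender       = c(0, NA, NA, 0, NA, NA, NA),
  case_type    = c(NA, 1, NA, NA, 1, NA, NA),
  test_dis_res = c(NA, NA, 1, NA, NA, 2, 2),
  test_dis_in  = c(NA, NA, NA, NA, NA, 6.13, 4.75)
)

result <- input %>%
  group_by(record_id) %>%
  # Copy per-volunteer values onto every row of the same record_id.
  # With the full data this would be
  # fill(dt_volunteer_reg, age, gender, case_type, .direction = "downup")
  fill(age, gender, case_type, .direction = "downup") %>%
  ungroup() %>%
  # One row per test, i.e. per value of `repeat`
  filter(type == "test") %>%
  select(-type)

print(result)
```

The test columns are deliberately left out of fill(): filling them would, for example, copy record 3's second test_cont_dt into the first test's missing value.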

【Comments】:

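  • Thanks a million! I'd love to learn more, so please share any other functions or approaches I could study, as long as it's not too much trouble. Thanks again, much appreciated.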