R: Merge two datasets, splitting into multiple columns (answers)

[Question title]: R, Merge two datasets, splitting into multiple columns
[Posted]: 2021-08-03 00:18:32
[Question description]:

I have two datasets:

    PeopleList<-structure(list(MRN = c("53634", "65708", "64320", "40458", "03935", 
"67473", "20281", "52479", "10261", "40945", "40630", "92295", 
"43505", "80719", "39492", "44720", "70691", "21351", "03457", 
"02182"), DOB = c("9/13/1953", "4/5/1948", "4/18/1944", "9/6/1953", 
"1/14/1957", "8/25/1952", "6/4/1967", "7/22/1988", "6/22/1947", 
"5/10/1957", "1/12/1968", "4/3/1979", "8/26/1961", "5/25/1965", 
"8/21/1955", "9/17/1936", "9/13/1965", "3/23/1942", "5/16/1992", 
"3/6/1969"), Gender = c("Female", "Female", "Male", "Female", 
"Female", "Female", "Female", "Female", "Female", "Female", "Female", 
"Female", "Female", "Female", "Female", "Female", "Female", "Female", 
"Female", "Female"), `Smoking Status` = c("Never Smoker", "Former Smoker", 
"Never Smoker", "Never Smoker", "Former Smoker", "Former Smoker", 
"Never Smoker", "Never Smoker", "Former Smoker", "Never Smoker", 
"Never Smoker", "Former Smoker", "Never Smoker", "Former Smoker", 
"Former Smoker", "Former Smoker", "Never Smoker", "Never Smoker", 
"Never Smoker", "Never Smoker")), row.names = c(NA, -20L), class = c("tbl_df", 
"tbl", "data.frame"))

Complications<-structure(list(MRN = c("03412", "25052", "64320", "64320", "64320", 
"47595", "47595", "45175", "45337", "93708", "03348", "12964", 
"12964", "46272", "46272", "46272", "46272", "71331", "57923", 
"57923"), `ENCOUNTER DIAGNOSES` = c("Rupture of implant of right breast, subsequent encounter [T85.43XD]; Rupture of implant of right breast, subsequent encounter [T85.43XD]; Rupture of implant of right breast, subsequent encounter [T85.43XD]", 
"Breast asymmetry [N64.89]; Rupture of implant of left breast, sequela [T85.43XS]; Rupture of implant of left breast, sequela [T85.43XS]; Rupture of implant of left breast, sequela [T85.43XS]", 
"Extrusion of breast implant, subsequent encounter [T85.49XD]; Extrusion of breast implant, subsequent encounter [T85.49XD]; Extrusion of breast implant, subsequent encounter [T85.49XD]", 
"Extrusion of breast implant, subsequent encounter [T85.49XD]; Extrusion of breast implant, subsequent encounter [T85.49XD]; Extrusion of breast implant, subsequent encounter [T85.49XD]", 
"Breast asymmetry [N64.89]", "Fat necrosis (segmental) of breast [N64.1]", 
"Fat necrosis (segmental) of breast [N64.1]", "Hematoma of breast [N64.89]", 
"Acquired breast deformity [N64.89]", "Capsular contracture of breast implant, sequela [T85.44XS]; Capsular contracture of breast implant, sequela [T85.44XS]; Capsular contracture of breast implant, sequela [T85.44XS]", 
"Infected sebaceous cyst [L72.3, L08.9]", "Pain due to any device, implant or graft, subsequent encounter [T85.848D]", 
"Pain due to any device, implant or graft, sequela [T85.848S]", 
"Breast asymmetry [N64.89]", "Breast asymmetry [N64.89]", "Breast asymmetry [N64.89]", 
"Breast asymmetry [N64.89]", "Acquired breast deformity [N64.89]", 
"Hematoma of breast [N64.89]", "Hematoma of breast [N64.89]")), row.names = c(NA, 
-20L), class = c("tbl_df", "tbl", "data.frame"))

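"Complications" is a data frame covering thousands of people, most of whom I don't necessarily care about. "PeopleList" holds the 500 or so people I do care about. What I'd like to do is merge the information from "Complications" into "PeopleList" by MRN, keeping only the MRNs that appear in "PeopleList".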

That part is easy. I can do it with:

    PeopleList <- PeopleList %>% left_join(Complications, by = "MRN")
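To make the behavior of that plain join concrete, here is a minimal sketch on toy data. The names `people` and `complications` and all values are made up for illustration; they are not the asker's data. It shows that `left_join` keeps only the MRNs from the left table but returns one row per match, duplicates included.

```r
library(dplyr)

# Toy stand-ins for PeopleList and Complications (made-up values)
people <- tibble(MRN = c("001", "002", "003"))
complications <- tibble(
  MRN = c("001", "001", "001", "004"),
  `ENCOUNTER DIAGNOSES` = c("Dx A", "Dx A", "Dx B", "Dx C")
)

# MRN "004" is dropped because it is not in `people`;
# MRN "001" is repeated once per matching complication row
joined <- people %>% left_join(complications, by = "MRN")
nrow(joined)
```

The repeated rows for a single MRN are exactly what the rest of the question asks to avoid.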

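The problem is that I only want to merge the non-duplicated "ENCOUNTER DIAGNOSES", and if an MRN has multiple matches, I want them split into multiple columns rather than multiple rows (there shouldn't be more than 5-6 new columns at most). This is what I mean: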
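As a rough illustration of that target shape (a sketch with made-up values and hypothetical column names `Diagnosis_1` and `Diagnosis_2`, not the asker's original example), the desired result has one row per MRN and one column per distinct diagnosis:

```r
library(dplyr)

# Hypothetical target: one row per MRN, distinct diagnoses spread across columns.
# An MRN with no complications keeps its row, with NA in the diagnosis columns.
target <- tribble(
  ~MRN,  ~Diagnosis_1, ~Diagnosis_2,
  "001", "Dx A",       "Dx B",
  "002", NA,           NA
)
```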

[Question comments]:

Tags: r left-join tidyverse

[Solution 1]:

How about this?

library(dplyr)
library(tidyr)

PeopleList %>% left_join(
  Complications %>% # reshape to one row per MRN before joining
    unique() %>% # drop fully duplicated rows
    group_by(MRN) %>%
    mutate(
      rank = row_number(), # running index of diagnoses within each MRN
      rank = paste("Diagnosis", rank, sep = "_") # turn the index into a column name
    ) %>%
    spread(rank, `ENCOUNTER DIAGNOSES`), # long to wide: one Diagnosis_n column per diagnosis
  by = "MRN" # join key
)
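The same idea can also be written with `pivot_wider()`, which tidyr documents as the successor to `spread()`. The sketch below is a variation, not the answerer's code. It runs on made-up toy data, and it adds one assumption: that repeats inside a single cell (such as "X; X; X" in the sample data) should collapse too, which is why it splits on "; " with `separate_rows()` before deduplicating. The names `people`, `complications`, `slot`, `wide`, and `result` are hypothetical.

```r
library(dplyr)
library(tidyr)

# Toy stand-ins for PeopleList and Complications (made-up values)
people <- tibble(MRN = c("001", "002"))
complications <- tibble(
  MRN = c("001", "001", "001", "003"),
  `ENCOUNTER DIAGNOSES` = c("Dx A; Dx A", "Dx B", "Dx B", "Dx C")
)

wide <- complications %>%
  # Assumption: repeats within one cell should also be removed, so split on "; "
  separate_rows(`ENCOUNTER DIAGNOSES`, sep = "; ") %>%
  distinct() %>% # one row per MRN/diagnosis pair
  group_by(MRN) %>%
  mutate(slot = paste0("Diagnosis_", row_number())) %>% # column name per diagnosis
  ungroup() %>%
  pivot_wider(names_from = slot, values_from = `ENCOUNTER DIAGNOSES`)

# Keep only the MRNs of interest; unmatched MRNs get NA in the diagnosis columns
result <- people %>% left_join(wide, by = "MRN")
```

If the within-cell repeats should be kept as they are, dropping the `separate_rows()` line leaves a pipeline that deduplicates whole rows only, like the answer above.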

[Comments]: