【Question title】: Need help reshaping dataset in R
【Posted】: 2020-07-22 05:41:47
【Question】:

I have a dataset that looks like this:

Name   Position number  student?  Married?
Bob    0001             YES       NO
Susan  0002             YES       YES
Mark   0003             NO        NO
Becky  0004             NO        YES
Billy  0005             YES       YES

I need it to look like this:

Bob    0001             YES       NO
Susan  0002             YES       NO
Susan  0002             NO        YES
Mark   0003             NO        NO
Becky  0004             NO        YES
Billy  0005             YES       NO
Billy  0005             NO        YES

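In other words, I have several columns whose values are "YES" or "NO". I need one row per "YES", so each person gets as many rows as they have "YES" values.

How can I do this in R?

【Question comments】:

Tags: r dplyr tidyverse tidyr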


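【Solution 1】:

One approach is to pivot the data to long format, keep only the "YES" rows, create a sequence column within each Name, and pivot back to wide format, filling the empty cells with "NO".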

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(cols = student:Married) %>%
  filter(value == 'YES') %>%
  group_by(Name) %>%
  mutate(row = row_number()) %>%
  pivot_wider(values_fill = "NO") %>%
  select(-row)


#  Name  Positionnumber student Married
#  <chr>          <int> <chr>   <chr>  
#1 Bob                1 YES     NO     
#2 Susan              2 YES     NO     
#3 Susan              2 NO      YES    
#4 Becky              4 NO      YES    
#5 Billy              5 YES     NO     
#6 Billy              5 NO      YES    
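
To see why this yields one row per "YES", it can help to look at the intermediate long-format table before it is widened again. The sketch below rebuilds df from the Data section so that it runs on its own; the name long is only illustrative and is not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Same values as the df in the Data section of this answer
df <- data.frame(
  Name = c("Bob", "Susan", "Mark", "Becky", "Billy"),
  Positionnumber = 1:5,
  student = c("YES", "YES", "NO", "NO", "YES"),
  Married = c("NO", "YES", "NO", "YES", "YES"),
  stringsAsFactors = FALSE
)

# Long format: one row per (person, column) pair, keeping only the "YES" rows
long <- df %>%
  pivot_longer(cols = student:Married) %>%
  filter(value == "YES") %>%
  group_by(Name) %>%
  mutate(row = row_number()) %>%  # counts the "YES" rows within each person
  ungroup()

long
```

Each (Name, row) pair becomes one output row when pivot_wider() spreads name back into columns, and values_fill = "NO" fills the cell that was not "YES". Mark has no "YES" rows, so he never appears in long.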

However, this drops any Name whose values are both "NO" (Mark in this data). The slight variation below, suggested by @Walker Harrison, keeps those rows.

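df %>%
  pivot_longer(cols = student:Married) %>%
  # Sort so that, within each Name, "Married" comes before "student"
  arrange(Name, name) %>%
  group_by(Name) %>%
  # Keep every "YES", plus the "student" row when both values are "NO"
  filter(value == 'YES' | (name == "student" &
         value == 'NO' & lag(value) == 'NO')) %>%
  mutate(row = row_number()) %>%
  pivot_wider(values_fill = "NO") %>%
  select(-row)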


# Name  Positionnumber Married student
#  <chr>          <int> <chr>   <chr>  
#1 Becky              4 YES     NO     
#2 Billy              5 YES     NO     
#3 Billy              5 NO      YES    
#4 Bob                1 NO      YES    
#5 Mark               3 NO      NO     
#6 Susan              2 YES     NO     
#7 Susan              2 NO      YES    
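
The filter above relies on arrange(Name, name) placing the "Married" row before the "student" row, so that lag(value) sees the other column. A sketch of an alternative that avoids that dependency, assuming there are only these two YES/NO columns, is to handle the all-"NO" people separately and bind them back. The names yes_rows, no_rows, and result are illustrative and not from the original answer.

```r
library(dplyr)
library(tidyr)

# Same values as the df in the Data section of this answer
df <- data.frame(
  Name = c("Bob", "Susan", "Mark", "Becky", "Billy"),
  Positionnumber = 1:5,
  student = c("YES", "YES", "NO", "NO", "YES"),
  Married = c("NO", "YES", "NO", "YES", "YES"),
  stringsAsFactors = FALSE
)

# One row per "YES", exactly as in the first pipeline
yes_rows <- df %>%
  pivot_longer(cols = student:Married) %>%
  filter(value == "YES") %>%
  group_by(Name) %>%
  mutate(row = row_number()) %>%
  ungroup() %>%
  pivot_wider(values_fill = "NO") %>%
  select(-row)

# People with no "YES" at all keep their single original row
no_rows <- df %>%
  filter(student == "NO", Married == "NO")

# Combine and restore the original ordering by position number
result <- bind_rows(yes_rows, no_rows) %>%
  arrange(Positionnumber)

result
```

One caveat with this sketch: if no one in the data had "YES" in a given column, that column would be missing from yes_rows and bind_rows() would fill it with NA rather than "NO".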

Data

df <- structure(list(Name = c("Bob", "Susan", "Mark", "Becky", "Billy"
), Positionnumber = 1:5, student = c("YES", "YES", "NO", "NO", 
"YES"), Married = c("NO", "YES", "NO", "YES", "YES")), class = 
"data.frame", row.names = c(NA, -5L))

【Discussion】:

  • I think you lost Mark somewhere; just add another condition to the filter to catch the case where both are "NO". Something like df %>% pivot_longer(cols = student:Married) %>% arrange(Name, name) %>% group_by(Name) %>% filter(value == 'YES' | (name == "student" & value == 'NO' & lag(value) == 'NO')) %>% mutate(row = row_number()) %>% pivot_wider(values_fill = "NO") %>% select(-row)
  • @WalkerHarrison Thanks, I updated the answer with your suggestion.