【Question title】: Crossing two different datasets to mutate and summarise results from one dataset into another
【Posted】: 2020-09-26 00:17:21
【Question】:

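Suppose I have this fake data frame

and also this "external_table":

I want to "sync" the two datasets to add a new column to external_table, through the following process:

  1. Access fake_ds
  2. Apply a function to the items listed in external_table %>% itens_fator
  3. Mutate the result into the external table

My desired output (with fake results)

This is a pseudo-function showing what the script should do: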

fake_ds %>%  #get my ds
  mutate(cronbach_alpha = fake_ds %>% 
           select(external_table, itens_fator) %>% 
           alpha(.)$total[1]) #get variables from external table

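If this function is hard to implement, I am open to other approaches that produce the desired output.

I would like to stay within the tidyverse.

Code to recreate my data: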

library(dplyr)
library(tidyr)

x <- paste0("y",seq(1:96)) #create X
y <- rep(0:5, 96*2) #create values
fake_ds <- data.frame(x,y) #dataframe
fake_ds %>% 
  pivot_wider(names_from = x, values_from=y, values_fn = {mean}) -> fake_ds
fake_ds <- fake_ds %>% slice(rep(1:n(), each = 50)) #replicate
fake_ds <- rbind(fake_ds, seq(1:96)) #add variability
    
#external table
external_table <- structure(list(
  name = c("X5", "X1", "X2", "X0", "X3", "X4"), 
  itens_fator = c("y1,y12,y59,y76,y78,y92,y93,y94,y96", 
                  "y5,y14,y15,y16,y17,y18,y20,y24,y40,y60,y62,y64,y75", 
                  "y10,y19,y32,y34,y36,y37,y47,y56,y58,y72,y80,y85", 
                  "y13,y30,y39,y53,y54,y55,y66,y73,y84,y91", 
                  "y42,y43,y45,y63,y69,y77,y87,y88", 
                  "y44,y49,y50,y68,y82,y89")), 
  row.names = c(NA, -6L), 
  groups = structure(list(name = c("X0", "X1", "X2", "X3", "X4", "X5"), 
                          .rows = structure(list(4L, 2L, 3L, 5L, 6L, 1L), 
                                            ptype = integer(0), 
                                            class = c("vctrs_list_of", 
                                                      "vctrs_vctr", "list"))), 
                     row.names = c(NA, -6L), 
                     class = c("tbl_df", "tbl", "data.frame"), 
                     .drop = TRUE), 
  class = c("grouped_df", "tbl_df", "tbl", "data.frame"))

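【Comments】:

  • What you want to achieve is basically impossible with mutate, because the number of items you have is different from the number of rows, and appending them to the data frame the way you describe makes no sense.
  • The pivot_wider code also fails.
  • So is it not possible to select, in "fake_ds", the variables listed in "external_table"?

Tags: r dplyr tidyr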


【Solution 1】:

As I mentioned in the comments, what you want to achieve is not possible as written. To get Cronbach's alpha, all you need is lapply and strsplit.

Getting the Cronbach's alphas

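# one alpha per row of external_table, returned as a list
lapply(strsplit(external_table$itens_fator, ","), function(x) {
  fake_ds %>%
    select(all_of(x)) %>%   # keep only the items of this factor
    psych::alpha(.) %>%     # namespaced, as in the tidy version
    .$total %>%
    .$raw_alpha
})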
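To see why strsplit is the key step, here is a minimal base R sketch of what it returns. The two item strings and the toy data frame are made-up stand-ins, not the data from the question.

```r
# Two comma-separated item lists, shaped like external_table$itens_fator
# (hypothetical values for illustration)
itens_fator <- c("y1,y12,y59", "y5,y14")

# strsplit() returns a list with one character vector per input string
item_sets <- strsplit(itens_fator, ",")
str(item_sets)

# Each element can be used directly to pick columns from a data frame
toy <- data.frame(y1 = 1:3, y5 = 4:6, y12 = 7:9, y14 = 10:12, y59 = 13:15)
picked <- lapply(item_sets, function(items) names(toy[, items]))
print(picked)
```

Each list element is then a ready-made column selection, which is what select(all_of(x)) receives in the answer.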

The tidy way

# same computation, returned as a double vector
external_table$itens_fator %>%
  strsplit(",") %>%
  purrr::map_dbl(function(x) {
    fake_ds %>%
      select(all_of(x)) %>%   # keep only the items of this factor
      psych::alpha(.) %>%
      .$total %>%
      .$raw_alpha
  })
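If the goal is still a new column on the external table, one possible sketch (not part of the original answer) is to run the same computation inside mutate() on the external table itself, where there is exactly one result per row. Here toy_ds, toy_table and alpha_for are hypothetical stand-ins invented for this example, and the random data is only there to make it runnable. It assumes dplyr, purrr and psych are installed.

```r
library(dplyr)
library(purrr)

set.seed(1)  # toy data only, so the sketch is reproducible
toy_ds <- as.data.frame(matrix(sample(0:5, 50 * 6, replace = TRUE), ncol = 6))
names(toy_ds) <- paste0("y", 1:6)

# Stand-in for external_table: one row per factor, items as a comma string
toy_table <- tibble(
  name = c("X0", "X1"),
  itens_fator = c("y1,y2,y3", "y4,y5,y6")
)

# Hypothetical helper: raw Cronbach's alpha for one set of item names.
# suppressWarnings() hides psych's notes about weakly related random items.
alpha_for <- function(items) {
  res <- suppressWarnings(psych::alpha(select(toy_ds, all_of(items))))
  res$total$raw_alpha
}

result <- toy_table %>%
  ungroup() %>%  # external_table in the question is a grouped_df
  mutate(cronbach_alpha = map_dbl(strsplit(itens_fator, ","), alpha_for))

print(result)
```

With the real objects, fake_ds would replace toy_ds and external_table would replace toy_table. The values on random toy data carry no meaning.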

Data

library(psych)
library(dplyr)
library(tidyr)

x <- paste0("y",seq(1:96)) #create X
y <- rep(0:5, 96*2) #create values
fake_ds <- data.frame(var=x,val=y) #dataframe
fake_ds %>% 
  pivot_wider(names_from = var, values_from=val, values_fn=list(val=mean)) -> fake_ds
fake_ds <- fake_ds %>% slice(rep(1:n(), each = 50)) #replicate
fake_ds <- rbind(fake_ds, seq(1:96)) #add variability
    
#external table
external_table <- structure(list(
  name = c("X5", "X1", "X2", "X0", "X3", "X4"), 
  itens_fator = c("y1,y12,y59,y76,y78,y92,y93,y94,y96", 
                  "y5,y14,y15,y16,y17,y18,y20,y24,y40,y60,y62,y64,y75", 
                  "y10,y19,y32,y34,y36,y37,y47,y56,y58,y72,y80,y85", 
                  "y13,y30,y39,y53,y54,y55,y66,y73,y84,y91", 
                  "y42,y43,y45,y63,y69,y77,y87,y88", 
                  "y44,y49,y50,y68,y82,y89")), 
  row.names = c(NA, -6L), 
  groups = structure(list(name = c("X0", "X1", "X2", "X3", "X4", "X5"), 
                          .rows = structure(list(4L, 2L, 3L, 5L, 6L, 1L), 
                                            ptype = integer(0), 
                                            class = c("vctrs_list_of", 
                                                      "vctrs_vctr", "list"))), 
                     row.names = c(NA, -6L), 
                     class = c("tbl_df", "tbl", "data.frame"), 
                     .drop = TRUE), 
  class = c("grouped_df", "tbl_df", "tbl", "data.frame"))

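【Comments】:

  • I don't know how to thank you. This part, lapply(strsplit(external_table$itens_fator, ",")), just solved my problem, and I can replace more than 50 lines with your solution. Thank you for your patience as well. One small note, just for my own understanding: could you point me to where I can learn how to translate lapply into tidyverse routines?
  • @Luis I think this is as tidy as it gets.
  • My pleasure, @Luis. Check the edit. I used purrr::map_dbl so that the output is coerced to a double vector.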
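
As a small illustration of that last point, the sketch below contrasts lapply with purrr::map_dbl on made-up input. It assumes purrr is installed.

```r
library(purrr)

values <- list(1:3, 4:6)  # made-up input

# lapply() always returns a list, one element per input
as_list <- lapply(values, mean)

# map_dbl() returns a plain double vector and errors if any
# result is not a single number
as_dbl <- map_dbl(values, mean)

str(as_list)
str(as_dbl)
```

purrr::map() is the closest tidyverse counterpart of lapply. The typed variants such as map_dbl and map_chr add the check on the output type.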