【问题标题】:r - Create One-way ANOVAs, summary statistics and plots for multiple pairs of variables using loopr - 使用循环为多对变量创建单向方差分析、汇总统计和绘图
【发布时间】:2021-12-11 06:06:59
【问题描述】:

我是新来的,对编程也很陌生,所以任何帮助都将不胜感激。

我有一个如下所示的数据框 df1:

Picture Emotion Gender Type Trial Attr_scores Fear_scores Appr_scores Avoid_scores
1 happy male human first 11 3 21 21
2 sad male human first 12 6 22 22
3 neutral male human first 13 2 23 23
4 happy male cartoon first 14 3 24 24
5 sad male cartoon first 15 6 25 25
6 neutral male cartoon first 16 2 26 26
7 happy male animal first 17 3 27 27
8 sad male animal first 18 6 28 28
9 neutral male animal first 19 2 29 29
10 happy female human first 20 3 21 30
11 sad female human first 21 6 22 31
12 neutral female human first 22 2 23 32
13 happy female cartoon first 23 3 24 33
14 sad female cartoon first 24 6 25 34
15 neutral female cartoon first 25 2 26 35
16 happy female animal first 26 3 27 36
17 sad female animal first 27 6 28 37
18 neutral female animal first 28 2 29 38

这是生成它的代码:

Picture <- c(1:18)
Emotion <- rep(c('happy','sad','neutral'),times=6)
Gender <- rep(c('male','female'),each=9)
Type <- rep(c('human','cartoon','animal','human','cartoon','animal'),each=3)
Trial <- rep(c('first'),times=18)
Attr_scores <- c(11:28)
Fear_scores <- rep(c(3,6,2),times=6)
Appr_scores <- rep(c(21:29),times=2)
Avoid_scores <- c(21:38)
df1<-data.frame(Picture,Emotion,Gender,Type,Trial,Attr_scores,Fear_scores,Appr_scores,Avoid_scores)

我需要获取几对变量(一个自变量 + 一个因变量,例如 Emotion + Attr_scores、Emotion + Fear_scores、Gender + Attr_scores、Gender + Avoid_scores),并对它们中的每一个:1)运行汇总统计(比较平均值和标准差),2)运行单向方差分析,3)创建散点图。

到目前为止,我已经为第一对变量(Gender + Attr_scores)创建了代码。代码如下:

# Summary Statistics 
library(dplyr)
group_by(df1, Gender) %>%
  summarise(
    N = n(),
    Mean = mean(Attr_scores, na.rm = TRUE),
    Sd = sd(Attr_scores, na.rm = TRUE)
  )
# ANOVA
res.aov <- aov(Attr_scores ~ Gender, data = df1)
summary(res.aov)
#Plot
gender_attr_plot <- ggplot(df1, aes(x=Gender, y=Attr_scores)) + 
  geom_jitter(position=position_jitter(0.2))+ 
  stat_summary(fun.data=mean_sdl, fun.args = list(mult = 1), 
               geom="pointrange", color="red")
ggsave("gender_attr_plot.png", gender_attr_plot, width = 1600, height = 900, units = "px")

我可以为每对额外的变量复制粘贴代码并每次手动更改变量名称,但这似乎是一种非常低效的处理方式。此外,如果我需要对任何额外的变量对运行相同的分析,我将不得不再次复制整个代码来执行此操作。

我想要做的是创建一个带有变量对的表或嵌套列表(如果需要额外的变量对,以后可以轻松更新)并编写一个循环来遍历这些变量对并执行每个动作的所有 3 个动作(汇总统计、ANOVA 和绘图)。

我认为它应该看起来像这样(这与实际的工作代码相去甚远,只是给出一个大概的想法):

variables <- list(
c(Gender, Attr_scores),
c(Gender, Fear_scores), 
c(Type, Appr_scores), 
c(Emotion, Avoid_scores))

for(i in variables){
  library(dplyr)
  group_by(df1, variables,'[[',1) %>%
    summarise(
      N = n(),
      Mean = mean(variables,'[[',2, na.rm = TRUE),
      Sd = sd(variables,'[[',2, na.rm = TRUE)
    )
  res.aov <- aov(variables,'[[',2 ~ variables,'[[',1, data = df1)
  summary(res.aov)
  plot <- ggplot(df1, aes(x=variables,'[[',1, y=variables,'[[',2)) + 
    geom_jitter(position=position_jitter(0.2))+
    stat_summary(fun.data=mean_sdl, fun.args = list(mult = 1),
                 geom="pointrange", color="red")
  ggsave("??????.png", plot, width = 1600, height = 900, units = "px")
}

显然,这不起作用,我一直在互联网上搜索解决方案,但我对 R 的了解还不足以弄清楚如何使它起作用。任何帮助将不胜感激!

【问题讨论】:

    标签: r dataframe loops anova


    【解决方案1】:

    以下是您的任务的可能解决方案: 我稍微修改了您的代码并创建了一个函数my_function,使用此函数您可以获得一对数据集的所需输出。结果以列表形式返回!

    library(dplyr)
    library(ggplot2)
    
    
    my_function <- function(df, x, y) { 
    # Summary
      a <- group_by(df, {{x}}) %>% 
        summarise(
          N = n(),
          Mean = mean({{y}}, na.rm = TRUE),
          Sd = sd({{y}}, na.rm = TRUE)
        )
    # ANOVA
      res.aov <- aov({{y}} ~ {{x}}, data = df)
      b <- summary(res.aov)
    # Plot
    c <- ggplot(df1, aes(x={{x}}, y={{y}})) + 
      geom_jitter(position=position_jitter(0.2))+ 
      stat_summary(fun.data=mean_sdl, fun.args = list(mult = 1), 
                   geom="pointrange", color="red")
      ggsave(paste0(deparse(substitute(x)), "_",
                   deparse(substitute(y)), ".png"), width = 1600, height = 900, units = "px")
      
      output<-list(a,b,c)
      return(output)
      
      }
    
    # cases 1 - 4
    my_function(df1, Gender, Attr_scores)
    my_function(df1, Gender, Avoid_scores)
    my_function(df1, Emotion, Attr_scores)
    my_function(df1, Emotion, Fear_scores)
    

    【讨论】:

    • 非常感谢你,TarJae!我在我的脚本中使用了你的解决方案,一切似乎都很完美。非常感谢您的帮助!
    • 如果有人对此解决方案也感兴趣:我添加了为 x 和 y 绘图轴生成合适名称的方法:+xlab(deparse(substitute(x)))+ ylab(deparse(substitute(y))),在 stat_summary(fun.data=mean_sdl, fun.args = list(mult = 1), geom="pointrange", color="red") 部分之后添加:c &lt;- ggplot(df1, aes(x={{x}}, y={{y}})) + geom_jitter(position=position_jitter(0.2))+ stat_summary(fun.data=mean_sdl, fun.args = list(mult = 1), geom="pointrange", color="red")+ xlab(deparse(substitute(x)))+ ylab(deparse(substitute(y)))
    【解决方案2】:

    这可能有用

    https://r4ds.had.co.nz/iteration.html#the-map-functions https://aosmith.rbind.io/2018/08/20/automating-exploratory-plots/

    
    variables <-
      structure(list(
        x = c("Gender", "Gender", "Type", "Emotion"),
        y = c("Attr_scores", "Fear_scores", "Appr_scores", "Avoid_scores")
      ),
      class = "data.frame",
      row.names = c(NA,-4L))
    
    variables
    #>         x            y
    #> 1  Gender  Attr_scores
    #> 2  Gender  Fear_scores
    #> 3    Type  Appr_scores
    #> 4 Emotion Avoid_scores
    
    library(tidyverse)
    # GROUP
    map2(
      .x = variables$x,
      .y = variables$y,
      .f = ~ group_by(df,!!sym(.x)) %>%
        summarise(
          N = n(),
          Mean = mean(!!sym(.y), na.rm = TRUE),
          Sd = sd(!!sym(.y), na.rm = TRUE)
        )) %>% 
      set_names(nm = str_c(variables$x, variables$y, sep = "#"))
    #> $`Gender#Attr_scores`
    #> # A tibble: 2 x 4
    #>   Gender     N  Mean    Sd
    #>   <chr>  <int> <dbl> <dbl>
    #> 1 female     9    24  2.74
    #> 2 male       9    15  2.74
    #> 
    #> $`Gender#Fear_scores`
    #> # A tibble: 2 x 4
    #>   Gender     N  Mean    Sd
    #>   <chr>  <int> <dbl> <dbl>
    #> 1 female     9  3.67  1.80
    #> 2 male       9  3.67  1.80
    #> 
    #> $`Type#Appr_scores`
    #> # A tibble: 3 x 4
    #>   Type        N  Mean    Sd
    #>   <chr>   <int> <dbl> <dbl>
    #> 1 animal      6    28 0.894
    #> 2 cartoon     6    25 0.894
    #> 3 human       6    22 0.894
    #> 
    #> $`Emotion#Avoid_scores`
    #> # A tibble: 3 x 4
    #>   Emotion     N  Mean    Sd
    #>   <chr>   <int> <dbl> <dbl>
    #> 1 happy       6  28.5  5.61
    #> 2 neutral     6  30.5  5.61
    #> 3 sad         6  29.5  5.61
    
    
    # ANOVA
    map2(
      .x = variables$x,
      .y = variables$y,
      .f = ~ aov(as.formula(str_c(.y, .x, sep = "~")), data = df)
    ) %>%
      set_names(nm = str_c(variables$x, variables$y, sep = "#"))
    #> $`Gender#Attr_scores`
    #> Call:
    #>    aov(formula = as.formula(str_c(.y, .x, sep = "~")), data = df)
    #> 
    #> Terms:
    #>                 Gender Residuals
    #> Sum of Squares   364.5     120.0
    #> Deg. of Freedom      1        16
    #> 
    #> Residual standard error: 2.738613
    #> Estimated effects may be unbalanced
    #> 
    #> $`Gender#Fear_scores`
    #> Call:
    #>    aov(formula = as.formula(str_c(.y, .x, sep = "~")), data = df)
    #> 
    #> Terms:
    #>                 Gender Residuals
    #> Sum of Squares       0        52
    #> Deg. of Freedom      1        16
    #> 
    #> Residual standard error: 1.802776
    #> Estimated effects may be unbalanced
    #> 
    #> $`Type#Appr_scores`
    #> Call:
    #>    aov(formula = as.formula(str_c(.y, .x, sep = "~")), data = df)
    #> 
    #> Terms:
    #>                 Type Residuals
    #> Sum of Squares   108        12
    #> Deg. of Freedom    2        15
    #> 
    #> Residual standard error: 0.8944272
    #> Estimated effects may be unbalanced
    #> 
    #> $`Emotion#Avoid_scores`
    #> Call:
    #>    aov(formula = as.formula(str_c(.y, .x, sep = "~")), data = df)
    #> 
    #> Terms:
    #>                 Emotion Residuals
    #> Sum of Squares     12.0     472.5
    #> Deg. of Freedom       2        15
    #> 
    #> Residual standard error: 5.612486
    #> Estimated effects may be unbalanced
    
    #PLOT
    
    f <- function(x, y) {
      gender_attr_plot <- ggplot(df, aes(x = .data[[x]], y = .data[[y]])) +
        geom_jitter(position = position_jitter(0.2)) +
        stat_summary(
          fun.data = mean_sdl,
          fun.args = list(mult = 1),
          geom = "pointrange",
          color = "red"
        )
    }
    
    all_plots <- map2(.x = variables$x, .y = variables$y, .f = f)
    
    plotnames <- str_c(variables$x, "#", variables$y, ".png") 
    
    walk2(
      .x = plotnames,
      .y = all_plots,
      .f = ~ ggsave(
        filename = .x,
        plot = .y,
        width = 1600,
        height = 900,
        units = "px"
      )
    )
    

    reprex package (v2.0.1) 于 2021 年 10 月 25 日创建

    数据

    Picture <- c(1:18)
    Emotion <- rep(c('happy', 'sad', 'neutral'), times = 6)
    Gender <- rep(c('male', 'female'), each = 9)
    Type <-
      rep(c('human', 'cartoon', 'animal', 'human', 'cartoon', 'animal'),
          each = 3)
    Trial <- rep(c('first'), times = 18)
    Attr_scores <- c(11:28)
    Fear_scores <- rep(c(3, 6, 2), times = 6)
    Appr_scores <- rep(c(21:29), times = 2)
    Avoid_scores <- c(21:38)
    df <-
      data.frame(
        Picture,
        Emotion,
        Gender,
        Type,
        Trial,
        Attr_scores,
        Fear_scores,
        Appr_scores,
        Avoid_scores
      )
    

    【讨论】:

    • 非常感谢,尤里!
    • 这个解决方案很有帮助。我将在我当前的脚本中使用 TarJae 的解决方案,因为它提供了额外的灵活性,但您的解决方案也为我提供了一些非常有用的工具,我将在未来使用它们。我试图将您的评论投票为有用,并收到一条消息,说我还没有足够的声望点来投票,但我很感激你:)
    猜你喜欢
    • 2015-01-04
    • 2015-01-09
    • 2020-11-27
    • 2022-01-22
    • 1970-01-01
    • 2018-04-01
    • 1970-01-01
    • 2023-04-03
    • 2014-02-01
    相关资源
    最近更新 更多