Making a loop in R for counting word frequencies from specific columns in different tables

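【Question title】: Making a loop in R for counting word frequencies from specific columns in different tables
【Posted】: 2021-10-23 18:09:22
【Question】:

I have 15 different tables, each with a "text" column containing long text (a series of answers to a survey question). I want to tidy each table by creating one row per word of "text", stored in a column named "word". Then I want the word frequencies for each table. I wrote this code: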

library(dplyr)
library(tidytext)

# One row per word, with stop words removed
Table1.tidy <- Table1 %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word")

Table1.tidy %>%
  count(word, sort = TRUE)

It works fine, but now I want to avoid repeating this code for every table. Does anyone know how to do that?

【Question comments】:

Tags: r loops dry unnest

【Solution 1】:

(1) Put all the data.frames into a list.

(2) Use purrr's map function to apply your workflow to each element of the list:

library(dplyr)
library(tidytext)  # provides unnest_tokens() and stop_words
library(purrr)

my_list <- list(Table1, Table2, Table3)

my_tidy_list <- my_list %>%
  map(~ .x %>%
        unnest_tokens(word, text) %>%
        anti_join(stop_words, by = "word") %>%
        # The intermediate `Table1.tidy %>%` line from the question is not needed here
        count(word, sort = TRUE))

my_tidy_list[[1]] holds the word counts for Table1, my_tidy_list[[2]] those for Table2, and so on.
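The same workflow can also be wrapped in a small helper function, which keeps the map() call short. This is only a minimal sketch, assuming every table has a character column named text; the helper name count_words and the toy tables are hypothetical stand-ins for the real data:

```r
library(dplyr)
library(tidytext)
library(purrr)

# Hypothetical toy tables standing in for the real survey tables
Table1 <- data.frame(text = c("the cat sat on the mat", "a cat is a cat"),
                     stringsAsFactors = FALSE)
Table2 <- data.frame(text = c("dogs chase cats", "dogs bark"),
                     stringsAsFactors = FALSE)

# Hypothetical helper: tokenize `text`, drop stop words, count the words
count_words <- function(df) {
  df %>%
    unnest_tokens(word, text) %>%
    anti_join(stop_words, by = "word") %>%
    count(word, sort = TRUE)
}

# Apply the helper to every table in the list
my_tidy_list <- map(list(Table1, Table2), count_words)
my_tidy_list[[1]]
```

Keeping the steps in one function means a later change (for example a different stop word list) only has to be made in one place.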

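【Comments】:

Thanks, Martin! You're awesome. I didn't mention that the tables are not all numbered from 1 to 15. They are called Table1, Table1a, Table2, Table2a, Table2b, Table3, and so on. I need to keep these names because they correspond to questions in the dataset for specific series of answers. But I solved that by naming the list elements: names(my_list) <- c("Table1", "Table1a", etc)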
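The naming approach from this comment can be sketched as follows. map() carries the names of a named list over to its result, so each table's counts can be looked up by name. The toy tables here are hypothetical, and stacking the results with bind_rows() is an optional extra step that was not part of the original answer:

```r
library(dplyr)
library(tidytext)
library(purrr)

# Hypothetical toy tables with irregular names, as described in the comment
Table1  <- data.frame(text = "the cat sat on the mat", stringsAsFactors = FALSE)
Table1a <- data.frame(text = "dogs chase cats", stringsAsFactors = FALSE)
Table2  <- data.frame(text = "rain rain sunshine", stringsAsFactors = FALSE)

# A named list: the names survive the map() call
my_list <- list(Table1 = Table1, Table1a = Table1a, Table2 = Table2)

my_tidy_list <- my_list %>%
  map(~ .x %>%
        unnest_tokens(word, text) %>%
        anti_join(stop_words, by = "word") %>%
        count(word, sort = TRUE))

# Look up one table's word counts by name
my_tidy_list$Table1a

# Optional: stack all results into one data frame with a `table` column
all_counts <- bind_rows(my_tidy_list, .id = "table")
```

If the tables already exist as objects in the workspace, base R's mget(c("Table1", "Table1a", "Table2")) would build the same named list without typing each name twice.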