R: Extract list columns based on column names and patterns

【Question title】: R: Extract list columns based on column names and patterns
【Posted】: 2015-09-30 02:18:08
【Question description】:

I have a list of data frames (only sample data shown here):

my_list <- list(structure(list(sample = c(2L, 6L), data1 = c(56L, 78L), 
    data2 = c(59L, 27L), data3 = c(90L, 28L), data1namet = structure(c(1L, 
    1L), .Label = "Sam1", class = "factor"), data2namab = structure(c(1L, 
    1L), .Label = "Test2", class = "factor"), dataame = structure(c(1L, 
    1L), .Label = "Ex3", class = "factor"), ma = c("Jay", "Jay"
    )), .Names = c("sample", "data1", "data2", "data3", "data1namet", 
"data2namab", "dataame", "ma"), row.names = c(NA, -2L), class = "data.frame"), 
    structure(list(sample = c(12L, 13L, 17L), data1 = c(56L, 
    78L, 3L), data2 = c(59L, 27L, 2L), datest = structure(c(1L, 
    1L, 1L), .Label = "Exa9", class = "factor"), dattestr = structure(c(1L, 
    1L, 1L), .Label = "cz1", class = "factor"), add = c(2, 2, 
    2)), .Names = c("sample", "data1", "data2", "datest", "dattestr", 
    "add"), row.names = c(NA, -3L), class = "data.frame"))

my_list
[[1]]
  sample data1 data2 data3 data1namet data2namab dataame  ma
1      2    56    59    90       Sam1      Test2     Ex3 Jay
2      6    78    27    28       Sam1      Test2     Ex3 Jay

[[2]]
  sample data1 data2 datest dattestr add
1     12    56    59   Exa9      cz1   2
2     13    78    27   Exa9      cz1   2
3     17     3     2   Exa9      cz1   2

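I have two questions. First, I want to extract columns from this list based on a pattern in the column names, e.g. all columns whose names contain the word "data". I could not find a solution using grep.

Second, I know how to extract a column by its index number (see the example below), but how can I make this selection directly by column name rather than by column number?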

out <- lapply(my_list, `[`, 1) # extract "sample" column

【Question discussion】:

Tags: r list indexing extract

【Solution 1】:

Try

lapply(my_list, function(df) df[, grep("data", names(df), fixed = TRUE)] )
# [[1]]
#   data1 data2 data3 data1namet data2namab dataame
# 1    56    59    90       Sam1      Test2     Ex3
# 2    78    27    28       Sam1      Test2     Ex3
#
# [[2]]
#   data1 data2
# 1    56    59
# 2    78    27
# 3     3     2

lapply(my_list, "[", "sample")
# [[1]]
#   sample
# 1      2
# 2      6
#
# [[2]]
#   sample
# 1     12
# 2     13
# 3     17
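
One caveat with the `df[, grep(...)]` form: when exactly one column name matches, R drops the result to a plain vector by default. A minimal sketch of a variant that always returns a data frame is below. The helper `select_cols` and the data frame `one_match` are made up for illustration and are not part of the original answer.

```r
# Hypothetical helper (not from the original answer): keep the columns whose
# names contain `pattern`, always returning a data frame.
select_cols <- function(df, pattern) {
  df[, grep(pattern, names(df), fixed = TRUE), drop = FALSE]
}

# Made-up data frame in which only one column name contains "data".
one_match <- data.frame(sample = 1:2, data1 = c(56L, 78L), other = c("a", "b"))

res_default <- one_match[, grep("data", names(one_match), fixed = TRUE)]
res_kept    <- select_cols(one_match, "data")

is.data.frame(res_default)  # FALSE: collapsed to an integer vector
is.data.frame(res_kept)     # TRUE: still a one-column data frame
```

Used on the list from the question, this would be `lapply(my_list, select_cols, pattern = "data")`.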

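【Discussion】:

Is there a reason you use fixed = TRUE here?
No particular reason. I am just used to it when searching for a fixed string, in this case "data". Performance! ;-)
Thanks, this is very helpful. Is there a way to make multiple selections, e.g. select all columns containing "sample" as well as all columns whose names contain "data"?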
One way is to use a regular expression with OR (the pipe character): grep("data|sample", names(df))
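
Note that the pipe only acts as OR when the pattern is treated as a regular expression, so `fixed = TRUE` from the answer has to be left out here. A small sketch of the difference, using a made-up data frame `df`:

```r
# Made-up data frame for illustration only.
df <- data.frame(sample = 1:2, data1 = 3:4, data2 = 5:6, add = 7:8)

# "|" means OR only when the pattern is a regular expression (the default).
regex_hits <- grep("data|sample", names(df), value = TRUE)

# With fixed = TRUE the whole string "data|sample" is matched literally.
fixed_hits <- grep("data|sample", names(df), fixed = TRUE, value = TRUE)

regex_hits  # "sample" "data1" "data2"
fixed_hits  # character(0)

# Applied to each data frame in a list; drop = FALSE keeps a data frame
# even if only one column matches.
out <- lapply(list(df), function(d) d[, grep("data|sample", names(d)), drop = FALSE])
```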
When I run this code, I get an "undefined dimensions" error. Please help me.
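
The comment above does not show the failing input, so the actual cause is unknown. One plausible cause, stated here purely as an assumption, is that some element of the list is not a data frame (for example a plain vector), in which case two-index subsetting such as `x[, i]` fails. A defensive sketch under that assumption, with a made-up list `mixed`:

```r
# Made-up list: one data frame and one plain vector (the assumed culprit).
mixed <- list(data.frame(sample = 1:2, data1 = 3:4), c(a = 1, b = 2))

# Two-index subsetting on the vector fails; capture the error message
# instead of stopping the script.
msg <- tryCatch(mixed[[2]][, 1], error = function(e) conditionMessage(e))
msg

# Keep only the data frames before selecting columns by name pattern.
dfs <- Filter(is.data.frame, mixed)
out <- lapply(dfs, function(d) d[, grep("data", names(d)), drop = FALSE])
```

Checking `sapply(my_list, class)` first is a quick way to see whether every element really is a data frame.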