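【Question title】: read tables and assign to a string in a loop in R
【Posted】: 2017-06-18 04:01:21
【Question】:

I'm sure there is a simple answer to this, but I can't seem to find the right code. I have a list of files and a list of strings, and I would like to assign the contents of these files to data frames named by those strings. I then want to perform other operations on each data frame within the same loop. I also need to keep each data frame for downstream work. Here is my code: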

samples <- c('fc14','g14','fc18','g18','fc21','g21')
fc_samples <- grep("fc", samples, value=TRUE)
fc_files <- c('fc14_g14_full_annot_uniq.txt','fc18_g18_full_annot_uniq.txt','fc21_g21_full_annot_uniq.txt')


# make dataframes
for (file in fc_files)
{   fc_n <- 1
    g_n <- 1
    print(file);

    # THE BIT THAT DOESN'T WORK
    assign(paste("data", fc_samples[fc_n], sep='_'), read.table(file,sep = "\t", header=T));

    # HERE I EXPECT THE TOP OF MY DF TO BE PRINTED BUT IT ISN'T
    head(data_fc14);

    # I TRY THIS INSTEAD
    do.call("<-",list(paste("data", fc_samples[fc_n], sep='_'), read.table(file,sep = "\t", header=T)))

    # I TRY TO PRINT THE DF AGAIN BUT STILL NO LUCK
    head(paste("data", fc_samples[fc_n], sep='_'))

    # FIRST DOWNSTREAM THING I WOULD LIKE TO DO,
    # WON'T WORK UNTIL I SOLVE THE DF ASSIGNMENT ISSUE
    names(paste("data", fc_samples[fc_n], sep='_'))[names(paste("data", fc_samples[fc_n], sep='_'))==c('SAMPLE_fc','CHROM_fc','START_fc','REF_fc','ALT_fc','REGION_fc','DP_fc','FREQ_fc','GENE_fc','AFFECTS_fc','dbSNP_fc',
    #                           'NOVEL_fc')] <- c('SAMPLE','CHROM','START','REF','ALT','REGION','DP','FREQ','GENE','AFFECTS','dbSNP','NOVEL')

    # ITERATE TO THE NEXT FILE
    fc_n <- fc_n+1
}

I tried the solutions here and here, but they didn't help. If anyone has an elegant solution to this, that would be great! Thanks in advance!

【Comments】:

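  • Calling head inside a loop does not print to the console. You have to print it explicitly.
  • Is it just not printing, or is the object not being created at all?
  • Thanks for the replies. I wrapped the head statement in a print statement and it now prints what I expect, but I still can't refer to the object later in the loop. I get this error: Error in names(paste("data", fc_samples[fc_n], sep = "_"))[names(paste("data", : target of assignment expands to non-language object
  • Basically I need a way to refer to the object created as part of the loop without calling it directly as 'data_fc14'.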
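  • fc_n <- 1 should be outside the loop. Your assign() seems to work for me. I think what you are looking for is get("data_fc14"). It won't work on the left-hand side of a names(get("data_fc14")) <- ... expression, though. You have to copy the object, modify its names, then assign it back.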
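
The two points raised in the comments (explicit print() inside a loop, and the copy, modify, assign-back pattern around get()) can be sketched in isolation. This is a minimal illustration, not the asker's actual data: the small data frame and the obj_name and tmp variables are assumptions made up for the example.

```r
# Hypothetical stand-in for one table that read.table() would return
data_fc14 <- data.frame(SAMPLE_fc = c("a", "b"), DP_fc = c(10, 20))
obj_name <- paste("data", "fc14", sep = "_")  # the string "data_fc14"

for (i in 1) {
  # Inside a loop, results are not auto-printed; call print() explicitly
  print(head(get(obj_name)))

  # names(get(obj_name)) <- ... is an error, so copy, modify, assign back
  tmp <- get(obj_name)
  names(tmp) <- sub("_fc$", "", names(tmp))
  assign(obj_name, tmp)
}

names(data_fc14)
# [1] "SAMPLE" "DP"
```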

Tags: r string loops dataframe


【Solution 1】:

Fixing your code:

samples <- c('fc14','g14','fc18','g18','fc21','g21')
fc_samples <- grep("fc", samples, value=TRUE)

# Make dummy example files
fc_files <- file.path("example-data", c(
  'fc14_g14_full_annot_uniq.txt','fc18_g18_full_annot_uniq.txt',
  'fc21_g21_full_annot_uniq.txt'))
set.seed(123) ; dummy_df <- 
  setNames(
    as.data.frame(replicate(12, rnorm(7))),
    c('SAMPLE_fc','CHROM_fc','START_fc','REF_fc','ALT_fc','REGION_fc',
      'DP_fc','FREQ_fc','GENE_fc','AFFECTS_fc','dbSNP_fc','NOVEL_fc')
  )
if (!dir.exists("./example-data")) dir.create("example-data")
invisible({
  lapply(fc_files, write.table, x = dummy_df, sep = "\t")
})

# "fc_n <- 1" should be outside the loop:
fc_n <- 1
for (file in fc_files) {
  g_n <- 1
  assign(paste("data", fc_samples[fc_n], sep='_'), 
         read.table(file,sep = "\t", header=T))
  # Copy data to be able to change its names
  f <- get(paste("data", fc_samples[fc_n], sep='_'))
  names(f)[names(f) == c('SAMPLE_fc','CHROM_fc','START_fc',
                         'REF_fc','ALT_fc','REGION_fc',
                         'DP_fc','FREQ_fc','GENE_fc','AFFECTS_fc',
                         'dbSNP_fc','NOVEL_fc')] <- 
    c('SAMPLE','CHROM','START','REF','ALT','REGION','DP','FREQ',
      'GENE','AFFECTS','dbSNP','NOVEL')
  # Assign it back, now that names have been changed
  assign(paste("data", fc_samples[fc_n], sep='_'), f)
  fc_n <- fc_n+1
}

A "more elegant" way:
Using assign() is not best practice; use a list instead.
That said, I use it occasionally myself, and there are sometimes good reasons to.

# For the '%>%' pipe
library(magrittr)

data <-
  samples %>% 
  grep(pattern = "fc", value = TRUE) %>% 
  setNames(nm = .) %>% 
  lapply(grep, x = fc_files, value = TRUE) %>% 
  lapply(read.table, sep = "\t", header = TRUE) %>% 
  lapply(function(f) setNames(f, sub("_fc", "", names(f))))

identical(data_fc14, data$fc14)
# [1] TRUE
identical(data_fc18, data$fc18)
# [1] TRUE
identical(data_fc21, data$fc21)
# [1] TRUE

# Clean up
print(unlink("example-data", recursive = TRUE))
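
For readers who find the piped version dense, the same list-based idea can be written step by step in base R. This is only a sketch under stated assumptions: the two sample names, the temporary directory, the file names, and the two-column dummy_df are invented for illustration, and names such as data_list are hypothetical.

```r
# Hypothetical sample names and temporary files, created only for this sketch
fc_samples <- c("fc14", "fc18")
tmp_dir <- tempfile("example-data-")
dir.create(tmp_dir)
fc_files <- file.path(tmp_dir, paste0(fc_samples, "_full_annot_uniq.txt"))
dummy_df <- data.frame(SAMPLE_fc = c("s1", "s2"), DP_fc = c(10L, 20L))
for (f in fc_files) {
  write.table(dummy_df, f, sep = "\t", row.names = FALSE, quote = FALSE)
}

# Step 1: name the vector of paths so the list elements inherit sample names
names(fc_files) <- fc_samples

# Step 2: read every file into one named list of data frames
data_list <- lapply(fc_files, read.table, sep = "\t", header = TRUE)

# Step 3: strip the "_fc" suffix from the column names of each data frame
data_list <- lapply(data_list, function(df) {
  names(df) <- sub("_fc$", "", names(df))
  df
})

names(data_list)
# [1] "fc14" "fc18"
names(data_list$fc14)
# [1] "SAMPLE" "DP"

# Clean up the temporary files
unlink(tmp_dir, recursive = TRUE)
```

Each data frame stays available for downstream work as data_list$fc14 or data_list[["fc14"]], so no object names need to be built with paste() and looked up with get().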

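【Comments】:

  • This looks like a really nice solution with the shortest code, so I've accepted this answer, although it may take me a while to work out what all the parts are doing. Thanks.
【Solution 2】: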
samples <- c('fc14','g14','fc18','g18','fc21','g21')
fc_samples <- grep("fc", samples, value=TRUE)
fc_files <- c('fc14_g14_full_annot_uniq.txt','fc18_g18_full_annot_uniq.txt','fc21_g21_full_annot_uniq.txt')
g_files <- c('g14_full_annot_uniq.txt','g18_full_annot_uniq.txt','g21_full_annot_uniq.txt')

# make dataframes
df_names <- c("data_fc14","data_fc18","data_fc21")
fc_n <- 1
for (file in fc_files)
{   

    assign(df_names[fc_n], read.table(file,sep = "\t", header=T)); #WORKS
    #do.call("<-",list(paste("data", fc_samples[fc_n], sep='_'), read.table(file,sep = "\t", header=T))); #ALSO WORKS

    print(head(df_names[fc_n])) 
    print(head(eval(as.symbol(df_names[fc_n]))))

    df <- eval(as.symbol(df_names[fc_n]))

    names(df)[names(df) == c('SAMPLE_fc','CHROM_fc','START_fc','REF_fc','ALT_fc','REGION_fc','DP_fc','FREQ_fc','GENE_fc','AFFECTS_fc','dbSNP_fc',
                                'NOVEL_fc')] <- c('SAMPLE','CHROM','START','REF','ALT','REGION','DP','FREQ','GENE','AFFECTS','dbSNP','NOVEL')

    assign(df_names[fc_n], df)
    print(head(eval(as.symbol(df_names[fc_n]))))
    print(file);
    fc_n <- fc_n+1
}

Thanks for all the help. I ended up solving this using the suggestion from "apom", since it is the most intuitive for newer R users.
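
One caveat about the rename step in the loop above: names(df) == c(...) compares the two vectors position by position, so it only renames correctly when the columns arrive in exactly the listed order. A possible order-independent alternative, sketched here with a made-up three-column data frame and a hypothetical rename_map lookup vector, is:

```r
# Hypothetical data frame whose columns arrive in a different order
df <- data.frame(DP_fc = c(10, 20), SAMPLE_fc = c("a", "b"), OTHER = c(1, 2))

# Named lookup vector: names are the old column names, values the new ones
rename_map <- c(SAMPLE_fc = "SAMPLE", DP_fc = "DP")

# Rename only the columns that appear in the map; leave the rest untouched
hit <- names(df) %in% names(rename_map)
names(df)[hit] <- unname(rename_map[names(df)[hit]])

names(df)
# [1] "DP"     "SAMPLE" "OTHER"
```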

【Comments】:
