【Posted】: 2015-09-30 13:54:18
【Question】:
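I have 24 data files (bsls). Each file contains a fixed number of rows but a variable number of columns (sites). I have a clean list of the 23 sites, but I can't do an exact match because the column name for each site contains additional information.
I have read these files into R with the following code: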
library(gdata) # provides read.xls
#list files from dir and read, skipping rows until 'Q Num'
temp <- list.files() # e.g. info-stuff-nameofbsl-otherStuff.csv
# read.xls and strip bsl name from file and assign as object name
for(i in temp){
  assign(unlist(strsplit(i, split = '-', fixed = TRUE))[3],
         read.xls(i, pattern = "Q Num"))
}
#create list of dataframes (24 bsls)
bsls <- Filter(function(x) is(x, "data.frame"), mget(ls()))
#clean list of site names
sites <- c("NewYork","London","Sydney","Paris","Manchester","Angers","Venice","Bangkok","Glasgow","Boston","Perth","Canberra","Lyons","Washington","Milan","Cardiff","Dublin","Frankfurt","Ottawa","Toronto","El.Salvador","Taltal","Caldera")
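One thing to note about the reading step: `mget(ls())` picks up every data frame in the workspace, not only the 24 bsls. A minimal sketch of an alternative, reading straight into a named list, is below. The helper names `bsl_name_from_file` and `read_bsls` are hypothetical, and the `read.csv` default is an assumption; for .xls input a wrapper around `read.xls(..., pattern = "Q Num")` could be passed as `reader` instead.

```r
# Hypothetical helper: pull the bsl name out of a file name such as
# "info-stuff-nameofbsl-otherStuff.csv" (third dash-separated field).
bsl_name_from_file <- function(path) {
  parts <- strsplit(basename(path), split = "-", fixed = TRUE)
  vapply(parts, function(p) p[3], character(1))
}

# Hypothetical helper: read every file into one named list instead of
# creating one global object per file with assign().
# `reader` is an assumption; swap in another reading function as needed.
read_bsls <- function(files, reader = read.csv, ...) {
  out <- lapply(files, reader, ...)
  names(out) <- bsl_name_from_file(files)
  out
}
```

With this, `bsls <- read_bsls(list.files())` would give a list whose names are the bsl names, so no filtering of the workspace is needed.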
Example of the first 3 rows of one of the 24 bsls datasets
e.g. BSL1
QNum, QuestionText, % unrelatedCol, NewYork_Other_info, London_some_other_info, Venice_other_diff_info,
q17a, question?, 74%, 69%, 81%, 76%,
q17b, Another question?, 72%, 73%, 77%, 74%,
The result I need is one .csv file for each of the 23 sites, containing all of the columns for that site found across the 24 data files (bsls).
My current attempt...
for(site in sites){ #for each site
  assign(site, data.frame()) #create empty data frame to add vectors to
  for(bsl in dfs){ #for each dataset
    if (grepl(site, colnames(bsl))){ #substring match
      next #go back to for loop
    }
    assign(site$bsl, bsl[,grepl("site", colnames(bsl))]) #assign column to dataframe
  }
}
The desired output would look like this...
e.g. London.csv
QNum, QuestionText, BSLname1_Other_info, BSLname2_some_other_info, BSL5other_diff_info,
q17a, question?, 74%, 69%, 81%, 76%,
q17b, Another question?, 72%, 73%, 77%, 74%,
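There will be 23 files, one per site, each containing the site-related columns from the 24 input bsl files.
Edit - it's worth mentioning that the bsls are not actually called bsl1, bsl2, ... etc., but are in fact unique strings, e.g. unit, section, team, ... etc.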
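To make the target concrete, here is a minimal sketch of the per-site collection on toy data. It assumes that `bsls` is a named list of data frames (names being the bsl names), that rows are aligned across files, and that every site column starts with the site name. The function `collect_site`, its `id_cols` argument, and the toy values are all made up for illustration. Prefix matching with `startsWith` is used rather than a regex because names like `El.Salvador` contain a regex metacharacter.

```r
# Toy stand-ins for the real data (assumed shapes, made-up values).
bsls <- list(
  unit = data.frame(QNum = c("q17a", "q17b"),
                    QuestionText = c("question?", "Another question?"),
                    NewYork_Other_info = c("69%", "73%"),
                    London_some_other_info = c("81%", "77%"),
                    stringsAsFactors = FALSE),
  team = data.frame(QNum = c("q17a", "q17b"),
                    QuestionText = c("question?", "Another question?"),
                    London_diff_info = c("70%", "71%"),
                    stringsAsFactors = FALSE)
)
sites <- c("NewYork", "London")

# Hypothetical helper: gather every column for one site across all bsls.
collect_site <- function(site, bsls, id_cols = c("QNum", "QuestionText")) {
  # Start from the id columns of the first data frame (rows assumed aligned).
  out <- bsls[[1]][, id_cols, drop = FALSE]
  for (bsl_name in names(bsls)) {
    df <- bsls[[bsl_name]]
    hit <- startsWith(colnames(df), site)  # literal prefix match on the site name
    if (!any(hit)) next                    # this bsl has no column for the site
    cols <- df[, hit, drop = FALSE]
    # Swap the site prefix for the bsl name so the source file stays visible.
    colnames(cols) <- paste0(bsl_name, substring(colnames(cols), nchar(site) + 1))
    out <- cbind(out, cols)
  }
  out
}

# One data frame per site, in a list named by site.
by_site <- lapply(setNames(sites, sites), collect_site, bsls = bsls)

# Write one CSV per site (uncomment to write into the working directory).
# for (site in names(by_site)) {
#   write.csv(by_site[[site]], paste0(site, ".csv"), row.names = FALSE)
# }
```

Keeping the results in a list (`by_site`) instead of calling `assign()` per site avoids the `site$bsl` indexing problem in the attempt above, since `site` there is a character string rather than a data frame.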
【Comments】: