[Posted]: 2018-08-16 00:44:20
[Question]:
I have a dataset with 49 rows and 109 features, where the data is formatted so that every entry contains both a mean and an sd value. Here is a sample:
> head(score_data[,1:4])
# A tibble: 6 x 4
  Variable                                                     Overall       `18 to 29`    `30 to 39`
  <chr>                                                        <chr>         <chr>         <chr>
1 ts.tsmart_partisan_score (mean (sd))                         94.01 (9.73)  92.56 (10.82) 94.14 (9.55)
2 ts.tsmart_presidential_general_turnout_score (mean (sd))     66.23 (24.38) 51.56 (20.02) 58.44 (24.36)
3 ts.tsmart_midterm_general_turnout_score (mean (sd))          50.29 (29.05) 31.09 (18.81) 34.82 (22.15)
4 ts.tsmart_offyear_general_turnout_score (mean (sd))          20.71 (15.08) 25.38 (17.36) 18.84 (14.35)
5 ts.tsmart_presidential_primary_turnout_score (mean (sd))     48.34 (28.12) 38.26 (22.26) 36.19 (22.72)
6 ts.tsmart_non_presidential_primary_turnout_score (mean (sd)) 40.21 (29.00) 27.03 (20.14) 23.52 (19.32)
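I want to extract the mean from every one of the 109 columns in the dataset. Since the features are character strings, I know I can use the separate command to split a column in two at a fixed position (ideally the index of the first parenthesis), like this: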
data <- data %>% separate(PrecinctName, into = c("Precinct", "PrecinctCode"), sep = 5)
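However, I want to do this for every feature in the dataset, and applying the approach above column by column is time-consuming and painful. Does anyone have a more elegant solution? I don't particularly care about keeping the sd data, so the method does not need to retain it.
As requested, here is the dput output: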
> dput( head(score_data[,1:4]))
structure(list(Variable = c("ts.tsmart_partisan_score (mean (sd))",
"ts.tsmart_presidential_general_turnout_score (mean (sd))", "ts.tsmart_midterm_general_turnout_score (mean (sd))",
"ts.tsmart_offyear_general_turnout_score (mean (sd))", "ts.tsmart_presidential_primary_turnout_score (mean (sd))",
"ts.tsmart_non_presidential_primary_turnout_score (mean (sd))"
), Overall = c("94.01 (9.73)", "66.23 (24.38)", "50.29 (29.05)",
"20.71 (15.08)", "48.34 (28.12)", "40.21 (29.00)"), `18 to 29` = c("92.56 (10.82)",
"51.56 (20.02)", "31.09 (18.81)", "25.38 (17.36)", "38.26 (22.26)",
"27.03 (20.14)"), `30 to 39` = c("94.14 (9.55)", "58.44 (24.36)",
"34.82 (22.15)", "18.84 (14.35)", "36.19 (22.72)", "23.52 (19.32)"
)), .Names = c("Variable", "Overall", "18 to 29", "30 to 39"), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
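To make the target output concrete, here is a minimal base R sketch of the transformation being asked for. It is an illustration under stated assumptions rather than a recommended solution: `extract_mean` and `means_only` are hypothetical names, and the small data frame is a stand-in built from the first two rows of the dput output above.

```r
# Hypothetical stand-in for score_data (first two rows of the dput output above)
score_data <- data.frame(
  Variable = c("ts.tsmart_partisan_score (mean (sd))",
               "ts.tsmart_presidential_general_turnout_score (mean (sd))"),
  Overall = c("94.01 (9.73)", "66.23 (24.38)"),
  `18 to 29` = c("92.56 (10.82)", "51.56 (20.02)"),
  `30 to 39` = c("94.14 (9.55)", "58.44 (24.36)"),
  check.names = FALSE,       # keep column names such as `18 to 29` as-is
  stringsAsFactors = FALSE
)

# Remove the first "(" and everything after it, plus any spaces before it,
# then convert the remaining text (the mean) to numeric.
extract_mean <- function(x) as.numeric(sub(" *\\(.*$", "", x))

# Apply to every column except the first (Variable), which stays character.
means_only <- score_data
means_only[-1] <- lapply(score_data[-1], extract_mean)
```

With the real data, the same `lapply` call would cover all score columns at once; the sd values are discarded, which matches the requirement stated above.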
[Comments]:
-
Is your goal to separate them, or to remove the parts in parentheses?
-
@Onyambu I mean to remove the parts in parentheses, along with the leading space before the opening parenthesis.
-
Could you post dput( head(score_data[,1:4]))?
-
I just edited the post to include it.
Tags: r dplyr substring character tidyr