【Question title】:combining columns in R
【Posted】:2013-04-28 19:36:49
【Question】:

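I have a list of columns that I want to combine into a vector. Each column element is either a name or the string "0". I want to put the elements that hold names into a character vector called df$keywords. I have pasted a sample data frame below. I would like it to become:

df$keywords[1,] would be an empty vector

df$keywords[2,] would be (ACT Science, study skills, MCAT)

Any help would be greatly appreciated.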

    structure(list(V31 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L), .Label = "0", class = "factor"), V32 = structure(c(1L, 
    2L, 4L, 5L, 7L, 8L, 6L, 5L, 3L, 3L), .Label = c("0", "ACT Science", 
    "English", "Microsoft PowerPoint", "physics", "proofreading", 
    "reading", "writing"), class = "factor"), V33 = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "0", class = "factor"), 
    V34 = structure(c(1L, 7L, 5L, 5L, 8L, 2L, 6L, 5L, 3L, 4L), .Label = c("0", 
    "geography", "Italian", "literature", "prealgebra", "SAT reading", 
    "study skills", "trigonometry"), class = "factor"), V35 = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "0", class = "factor"), 
    V36 = structure(c(1L, 3L, 4L, 4L, 7L, 2L, 6L, 4L, 5L, 5L), .Label = c("0", 
    "English", "MCAT", "precalculus", "proofreading", "SAT writing", 
    "writing"), class = "factor"), V37 = structure(c(1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "0", class = "factor"), 
    V38 = structure(c(1L, 1L, 5L, 5L, 2L, 1L, 4L, 5L, 3L, 6L), .Label = c("0", 
    "English", "GED", "physical science", "reading", "spelling"
    ), class = "factor")), .Names = c("V31", "V32", "V33", "V34", 
    "V35", "V36", "V37", "V38"), class = "data.frame", row.names = c(NA, 
    -10L))

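【Comments on the question】:

  • Hi, sorry, but this doesn't make much sense. What you describe as the output doesn't match the sample data. Also, do you want to combine them into a vector or a data.frame? Please consider revising your question for clarity.

Tags: r dataframe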


【Solution 1】:

Assuming your data is assigned to x, the following does what I think you are after:

apply(x, 1, function(r) {tmp <- unique(r); tmp[tmp != 0]})

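apply works over each row of the data frame, takes the unique elements of that row, and drops the "0" entries. The result is a list of vectors of differing lengths, one per row, each holding that row's unique non-zero elements.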
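To make the row-wise behaviour concrete, here is a minimal sketch on a toy data frame. The data and the names `keywords`, `m`, and `keywords_list` are made up for illustration and are not from the original post. One caveat worth knowing: apply returns a list only when the per-row results differ in length. If every row yields the same number of elements, it simplifies the result to a matrix, so the lapply variant at the end is shown as one way to always get a list.

```r
# Toy data (hypothetical), mimicking the question's layout: "0" marks an empty slot.
x <- data.frame(
  V1 = c("0", "ACT Science", "physics"),
  V2 = c("0", "study skills", "physics"),
  V3 = c("0", "MCAT", "0"),
  stringsAsFactors = TRUE
)

# apply() first converts the data frame to a character matrix,
# then calls the function once per row (MARGIN = 1).
keywords <- apply(x, 1, function(r) {
  tmp <- unique(r)      # drop repeated entries within the row
  tmp[tmp != "0"]       # drop the "0" placeholders
})

# Shape-stable variant: lapply() always returns a list, even when
# every row happens to yield the same number of keywords.
m <- as.matrix(x)
keywords_list <- lapply(seq_len(nrow(m)), function(i) {
  r <- m[i, ]
  unique(r[r != "0"])
})

print(keywords)
```

Here the first row yields an empty character vector, the second yields three keywords, and the third yields only "physics" because the duplicate is removed.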

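【Comments】:

  • Minor typo: apply(x should be apply(tmp, I think
  • A more concise way to write it is apply(df, 1, function(r) unique(r[r != 0]))
  • @SlowLearner Not a typo!! x is the object the apply function runs over. To work on the OP's data it should be apply(df...), but I think sean was giving the general case.
  • @SimonO101 Ah yes, when I looked at it I had actually named my data tmp rather than x, so of course that worked, but I got it wrong. Sorry for muddying the waters, and thanks for the clarification.
  • Thanks! Works great!
【Solution 2】:

In my first post I did not understand the desired output correctly. A slightly different approach is to use the %in% operator across rows, like so: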

df$keywords <- apply(df,1, function(x) c( x[! x %in% "0"]))
df$keywords
#                                                                                                            keywords
#1                                                                                                                    
#2                                                    ACT Science, study skills, MCAT, ACT Science, study skills, MCAT
#3      Microsoft PowerPoint, prealgebra, precalculus, reading, Microsoft PowerPoint, prealgebra, precalculus, reading
#4                                physics, prealgebra, precalculus, reading, physics, prealgebra, precalculus, reading
#5                                    reading, trigonometry, writing, English, reading, trigonometry, writing, English
#6                                                            writing, geography, English, writing, geography, English
#7  proofreading, SAT reading, SAT writing, physical science, proofreading, SAT reading, SAT writing, physical science
#8                                physics, prealgebra, precalculus, reading, physics, prealgebra, precalculus, reading
#9                                            English, Italian, proofreading, GED, English, Italian, proofreading, GED
#10                           English, literature, proofreading, spelling, English, literature, proofreading, spelling

If you want the unique set of skills for each row, just add a call to unique, like so:

df$keywords <- apply(df,1, function(x) c( unique(x[ ! x %in% "0" ] ) ) )
df["keywords"]
#                                                  keywords
#1                                                          
#2                           ACT Science, study skills, MCAT
#3    Microsoft PowerPoint, prealgebra, precalculus, reading
#4                 physics, prealgebra, precalculus, reading
#5                   reading, trigonometry, writing, English
#6                               writing, geography, English
#7  proofreading, SAT reading, SAT writing, physical science
#8                 physics, prealgebra, precalculus, reading
#9                       English, Italian, proofreading, GED
#10              English, literature, proofreading, spelling
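As a hedged sketch of a variation on this approach, the code below restricts the row scan to a fixed set of source columns, so that re-running it does not also pick up a keywords column created earlier, and stores the result both as a list column and as a flat comma-separated string. The toy data and the names `value_cols`, `kw`, and `keywords_string` are assumptions for illustration only.

```r
# Toy data (hypothetical); "0" marks an empty slot, as in the question.
df <- data.frame(
  V32 = c("0", "ACT Science", "English"),
  V34 = c("0", "study skills", "English"),
  V36 = c("0", "MCAT", "proofreading"),
  stringsAsFactors = FALSE
)

# Assumed helper: the columns that hold the raw values. Scanning only these
# keeps derived columns (such as keywords) out of the result on a re-run.
value_cols <- c("V32", "V34", "V36")
m <- as.matrix(df[value_cols])

# One character vector per row: non-"0" entries, duplicates removed.
kw <- lapply(seq_len(nrow(m)), function(i) {
  r <- m[i, ]
  unique(r[!r %in% "0"])
})

df$keywords <- kw                                          # list column
df$keywords_string <- vapply(kw, toString, character(1))   # comma-separated text

print(df$keywords_string)
```

The list column keeps each row's keywords as a real character vector, which matches what the question asks for. The string column is easier to print or export.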
