Adding a variable column with factor levels to a dataframe: answer

[Question title]: adding variable column with factor levels to a dataframe
[Posted]: 2015-09-04 09:54:58
[Question]:

I have an experiment with two nested factors, for example gender (1, 2) and condition (1, 2), like this:

    factor A factor B
    male     cond.1
    male     cond.2
    female   cond.1
    female   cond.2

Unfortunately, the program I use to export the dependent-variable values combines the factor levels in the column headers, e.g.

    male_cond.1, male_cond.2, female_cond.1, female_cond.2
    456        , 5654       , 566          , 456
       ...           ...          ...            ...

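This is inconvenient, because when I melt the data frame into the long format that ANOVA needs, I can no longer separate the data by the levels of each factor. It looks like this: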

    1st column,    2nd column (DV)
    male_cond.1,   456
    male_cond.2,   5654
    female_cond.1, 566
    female_cond.2, 456

So how can I insert two new columns, as long as the data frame, that repeat my factor values? The two columns should look like this:

    1st column (gender), 2nd column (condition),  
    male,                cond.1               
    male,                cond.2          
    female,              cond.1         
    female,              cond.2            
      ...                 ...

My own data frame has four factors: electrode (63) x soa (2) x stimulustype (3) x itemtype (2). This is what my original data frame looks like:

    File Fp1.PD_ShortSOA_FAM Fp1.PD_LongSOA_FAM Fp1.PD_ShortSOA_SEMplus_REAL Fp1.PD_ShortSOA_SEMplus_FICT
    sub0001            0,446222          2,524,804            0,272959                    1,281,349
    sub0002           1,032,688          2,671,048           1,033,278                    1,217,817

And this is what it looks like after transposing:

    row.names                            V1         V2
    File                            sub0001    sub0002
    Fp1.PD_ShortSOA_FAM            0,446222  1,032,688
    Fp1.PD_LongSOA_FAM            2,524,804  2,671,048
    Fp1.PD_ShortSOA_SEMplus_REAL   0,272959  1,033,278
    Fp1.PD_ShortSOA_SEMplus_FICT  1,281,349  1,217,817
    Fp1.PD_ShortSOA_SEMminus_REAL  0,142739  1,405,100
    Fp1.PD_ShortSOA_SEMminus_FICT 1,515,577 -1,990,458

I would like my factor columns to look like this:

    electrode, SOA, stimulustype, itemtype
    Fp1.    ShortSOA  FAM            
    Fp1.    LongSOA   FAM           
    Fp1.    ShortSOA  SEMplus       REAL   
     ...       ...     ...           ...

I tried using strsplit as described in this post, but without success.

[Question discussion]:

Tags: r string dataframe

[Solution 1]:

Melting gets you almost all the way there; you just need to parse the variable column into separate columns.

    library(reshape2)

    # Melt to long format, then split the header names stored in "variable"
    d <- transform(melt(yourdf, id = NULL),
                   gender = gsub("_.*$", "", variable),        # text before the first "_"
                   condition = gsub("^[^_]*_", "", variable))  # text after the first "_"

    d
    #        variable value   gender condition
    # 1   male_cond.1   456     male    cond.1
    # 2   male_cond.2  5654     male    cond.2
    # 3 female_cond.1   566   female    cond.1
    # 4 female_cond.2   456   female    cond.2

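This uses regular-expression substitution: factor A (gender) is obtained by deleting the first _ and everything after it, and factor B (condition) by deleting everything up to and including the first _.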
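To see what each pattern removes, here is a small base-R check on a character vector of the headers from the question (no data frame or reshape2 needed; the vector hdr is only an illustration). Both patterns can match at most once per string, so sub() would give the same result as gsub() here.

```r
# Illustrative headers in the "factorA_factorB" pattern from the question
hdr <- c("male_cond.1", "male_cond.2", "female_cond.1", "female_cond.2")

# "_.*$" matches the first "_" and everything after it; deleting it leaves factor A
gender <- gsub("_.*$", "", hdr)

# "^[^_]*_" matches everything up to and including the first "_"; deleting it leaves factor B
condition <- gsub("^[^_]*_", "", hdr)

gender     # "male"   "male"   "female" "female"
condition  # "cond.1" "cond.2" "cond.1" "cond.2"
```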

If you want the columns in a specific order (the factors first, then the DV), do:

    d <- transform(melt(yourdf, id = NULL),
                   gender = gsub("_.*$", "", variable),
                   condition = gsub("^[^_]*_", "", variable),
                   DV = value)

    # Drop the original "variable" and "value" columns, keeping gender, condition, DV
    d <- d[, -c(1, 2)]

    d
    #   gender condition   DV
    # 1   male    cond.1  456
    # 2   male    cond.2 5654
    # 3 female    cond.1  566
    # 4 female    cond.2  456
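The same idea extends to the four-factor headers from the question. Below is a minimal sketch using strsplit, assuming every header has the form electrode_SOA_stimulustype, optionally followed by _itemtype (FAM headers have no itemtype). The vector hdr, the NA fill for a missing itemtype, and stripping ".PD" from the electrode label are illustrative assumptions, not part of the answer above. In practice, hdr would be as.character(d$variable) from the melted data.

```r
# Illustrative headers copied from the question's four-factor data
hdr <- c("Fp1.PD_ShortSOA_FAM", "Fp1.PD_LongSOA_FAM",
         "Fp1.PD_ShortSOA_SEMplus_REAL", "Fp1.PD_ShortSOA_SEMplus_FICT")

# Split each header on "_": FAM headers give 3 pieces, SEM headers give 4
parts <- strsplit(hdr, "_", fixed = TRUE)

# Pad every piece vector to length 4; indexing past the end yields NA
parts <- lapply(parts, function(p) p[1:4])

# One row per header, one column per factor
factors <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(factors) <- c("electrode", "SOA", "stimulustype", "itemtype")

# Assumption: the electrode label is the text before the first "." (drops ".PD")
factors$electrode <- sub("\\..*$", "", factors$electrode)

factors
```

Because the rows of factors follow the order of hdr, they can be attached to the melted data with cbind().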

[Discussion]:

Thanks for your answer, but unfortunately it doesn't work when I adapt it to my own data and variables.
@stevezissou: Could you then add sample data to your post that better represents the data you actually have?
@stevezissou: Great, thanks. Are the variables character or numeric? (You can check with str(yourdata).)
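The first variable column (i.e. what the edit calls "row.names") does not seem to be picked up by any function: for example, p1t[1,1] returns "sub0001" rather than "File", and likewise str starts with column V1 rather than row.names. Do you know why?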
You can see what I have here; for now I have been building my factor columns in Excel: stackoverflow.com/questions/30940492/…
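The row-name behaviour asked about in the comments can be reproduced with a small base-R sketch. It assumes a wide data frame with a character File column and character values like those in the question (only two measurement columns are reused here, and the name wide is hypothetical). t() returns a matrix whose row names are the old column names; as.data.frame() keeps them as a row-name attribute, not as a column, so p1t[1, 1] indexes the first data cell and str() starts at V1. Copying the row names into an ordinary column makes them available to the splitting code in the answer. Note also that the File row is now data, so every column is character.

```r
# Hypothetical wide data frame shaped like the question's (values copied as text)
wide <- data.frame(File = c("sub0001", "sub0002"),
                   Fp1.PD_ShortSOA_FAM = c("0,446222", "1,032,688"),
                   Fp1.PD_LongSOA_FAM = c("2,524,804", "2,671,048"),
                   stringsAsFactors = FALSE)

# t() gives a character matrix; the old column names become its row names
p1t <- as.data.frame(t(wide), stringsAsFactors = FALSE)
rownames(p1t)   # "File" "Fp1.PD_ShortSOA_FAM" "Fp1.PD_LongSOA_FAM"
p1t[1, 1]       # "sub0001": row names are not an indexable column

# Move the row names into a real column and reset the row names
p1t <- data.frame(variable = rownames(p1t), p1t,
                  row.names = NULL, stringsAsFactors = FALSE)
str(p1t)        # now starts with the "variable" column
```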