【Question title】: Why does cSplit return TRUE instead of the character?
【Posted】: 2019-05-07 17:06:41
【Question】:

I have this (simplified) dataset:

x <- read.table(text  = '  id                                                 seq
1  1 AACCAAGCCCTTGCTCAAATCGAAAAAAAGTTGAGCAAACCGAGTTTTGAG
2  2 AAGTTGAGCAAACCGAGTTTTGAGACTTGGATGAAGTCAACCAAAGCCCAC')

It looks like this:

  id                                                 seq
1  1 AACCAAGCCCTTGCTCAAATCGAAAAAAAGTTGAGCAAACCGAGTTTTGAG
2  2 AAGTTGAGCAAACCGAGTTTTGAGACTTGGATGAAGTCAACCAAAGCCCAC

When I then run cSplit on it with cSplit(x, 'seq', direction = 'wide', stripWhite = FALSE, sep = ''), it returns TRUE for positions 20 and 32 instead of the character itself:

   id seq_01 seq_02 seq_03 seq_04 seq_05 seq_06 seq_07 seq_08 seq_09 seq_10 seq_11 seq_12 seq_13 seq_14 seq_15 seq_16 seq_17 seq_18
1:  1      A      A      C      C      A      A      G      C      C      C      T      T      G      C      T      C      A      A
2:  2      A      A      G      T      T      G      A      G      C      A      A      A      C      C      G      A      G      T
   seq_19 seq_20 seq_21 seq_22 seq_23 seq_24 seq_25 seq_26 seq_27 seq_28 seq_29 seq_30 seq_31 seq_32 seq_33 seq_34 seq_35 seq_36
1:      A   TRUE      C      G      A      A      A      A      A      A      A      G      T   TRUE      G      A      G      C
2:      T   TRUE      T      G      A      G      A      C      T      T      G      G      A   TRUE      G      A      A      G
   seq_37 seq_38 seq_39 seq_40 seq_41 seq_42 seq_43 seq_44 seq_45 seq_46 seq_47 seq_48 seq_49 seq_50 seq_51
1:      A      A      A      C      C      G      A      G      T      T      T      T      G      A      G
2:      T      C      A      A      C      C      A      A      A      G      C      C      C      A      C

(If I change direction = 'wide' to direction = 'long' and spread the result myself with tidyr::spread, it looks fine.)

【Question comments】:

    Tags: r split


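    【Solution 1】:

    The problem is the type.convert argument, which is TRUE by default. If a column contains only "T" or only "F" values, they are interpreted as TRUE/FALSE rather than as the strings "T" or "F", and the column is converted to logical type. Positions 20 and 32 hold "T" in both rows, so they come back as TRUE. Passing type.convert = FALSE keeps every column as character: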

    library(splitstackshape)
    cSplit(x, 'seq', direction = 'wide', stripWhite = FALSE,
         sep = '', type.convert = FALSE)
    # id seq_01 seq_02 seq_03 seq_04 seq_05 seq_06 seq_07 seq_08 seq_09 seq_10 seq_11 seq_12 seq_13 seq_14 seq_15
    #1:  1      A      A      C      C      A      A      G      C      C      C      T      T      G      C      T
    #2:  2      A      A      G      T      T      G      A      G      C      A      A      A      C      C      G
    #   seq_16 seq_17 seq_18 seq_19 seq_20 seq_21 seq_22 seq_23 seq_24 seq_25 seq_26 seq_27 seq_28 seq_29 seq_30
    #1:      C      A      A      A      T      C      G      A      A      A      A      A      A      A      G
    #2:      A      G      T      T      T      T      G      A      G      A      C      T      T      G      G
    #   seq_31 seq_32 seq_33 seq_34 seq_35 seq_36 seq_37 seq_38 seq_39 seq_40 seq_41 seq_42 seq_43 seq_44 seq_45
    #1:      T      T      G      A      G      C      A      A      A      C      C      G      A      G      T
    #2:      A      T      G      A      A      G      T      C      A      A      C      C      A      A      A
    #   seq_46 seq_47 seq_48 seq_49 seq_50 seq_51
    #1:      T      T      T      G      A      G
    #2:      G      C      C      C      A      C
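
The conversion described above can be reproduced without splitstackshape. The sketch below assumes that cSplit's type.convert option behaves like base R's utils::type.convert applied to each new column, which the argument name suggests but which is not confirmed here. The variable names are illustrative only.

```r
# Minimal sketch (assumption: cSplit's type.convert option behaves like
# utils::type.convert applied column by column).

# A column where every value is "T", like position 20 in the question.
col_only_T <- c("T", "T")

# A column that mixes "T" with another base.
col_mixed <- c("A", "T")

# as.is = TRUE keeps non-convertible strings as character instead of factor.
converted_T     <- utils::type.convert(col_only_T, as.is = TRUE)
converted_mixed <- utils::type.convert(col_mixed, as.is = TRUE)

print(class(converted_T))      # the all-"T" column is read as logical
print(class(converted_mixed))  # the mixed column stays character
```

Only columns made up entirely of "T" and/or "F" are affected, which is why just two of the 51 positions changed in the question's output.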
    

    【Comments】:
