【Question title】: Removing duplicates from a column in R
【Posted】: 2017-07-25 09:25:03
【Question description】:

I have a column containing a list of IDs of varying lengths, some of which carry version numbers.

rownames(x)

"ENSP00000424360.1-D4"
"ENSP00000424360.2-D4"
"ENSP00000424360.3-D4"
"ENSP00000437781-D59"
"XP_010974537.1"
"XP_010974538.1"
"XP_010974538.2"

I want to change these to:

"ENSP00000424360"
"ENSP00000424360.1"
"ENSP00000424360.2"
"ENSP00000437781"
"XP_010974537"
"XP_010974538"
"XP_010974538.1"

I can convert the ENSxx and XPxx IDs separately with:

make.unique(substr(rownames(x),1,15))

make.unique(substr(rownames(dds),1,12)) 

How can I change the code to get the desired result?

【Discussion】:

    Tags: r unique substr


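    【Solution 1】:

    We strip the unwanted substrings with sub and then apply make.unique. The inner sub removes everything from the first dot onward; the outer sub removes any remaining hyphen suffix (for IDs that have no dot).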

    make.unique(sub("-.*$", "", sub("\\..*", "", rownames(x))))
    #[1] "ENSP00000424360"   "ENSP00000424360.1" "ENSP00000424360.2"
    #[4] "ENSP00000437781"   "XP_010974537"      "XP_010974538"      "XP_010974538.1"   
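
To make the nested call easier to follow, here is a minimal sketch that splits it into separate steps. The variable names (`ids`, `no_version`, `base_ids`, `result`) are illustrative and not part of the original answer; the IDs are the ones from the question.

```r
# Row names from the question, stored in a plain character vector (hypothetical name `ids`)
ids <- c("ENSP00000424360.1-D4", "ENSP00000424360.2-D4", "ENSP00000424360.3-D4",
         "ENSP00000437781-D59", "XP_010974537.1", "XP_010974538.1", "XP_010974538.2")

# Step 1: drop everything from the first dot onward (version number and anything after it)
no_version <- sub("\\..*", "", ids)

# Step 2: drop everything from the first hyphen onward; this only matters for
# IDs without a dot, such as "ENSP00000437781-D59"
base_ids <- sub("-.*$", "", no_version)

# Step 3: make.unique() leaves the first occurrence as is and appends .1, .2, ...
# to later duplicates
result <- make.unique(base_ids)
print(result)
```

Because make.unique numbers duplicates in order of appearance, the suffixes in the result are fresh counters and are not related to the original version numbers.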
    

    Data

    x <- structure(list(v1 = 1:7), .Names = "v1", row.names = c("ENSP00000424360.1-D4", 
     "ENSP00000424360.2-D4", "ENSP00000424360.3-D4", "ENSP00000437781-D59", 
     "XP_010974537.1", "XP_010974538.1", "XP_010974538.2"), class = "data.frame")
    
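
If the goal is to replace the row names of the data frame itself, the cleaned vector can be assigned back. The sketch below is one possible variant and not part of the original answer: it assumes a single character-class pattern `[.-].*$` (cut at the first dot or hyphen) is acceptable in place of the two nested sub calls, and it checks that both give the same result on this data. The names `clean` and `nested` are hypothetical.

```r
# Same example data as above, built with data.frame() for brevity
x <- data.frame(v1 = 1:7,
                row.names = c("ENSP00000424360.1-D4", "ENSP00000424360.2-D4",
                              "ENSP00000424360.3-D4", "ENSP00000437781-D59",
                              "XP_010974537.1", "XP_010974538.1", "XP_010974538.2"))

# Variant: remove everything from the first "." or "-" in a single pass
clean <- make.unique(sub("[.-].*$", "", rownames(x)))

# The nested form from the answer, kept for comparison
nested <- make.unique(sub("-.*$", "", sub("\\..*", "", rownames(x))))
stopifnot(identical(clean, nested))

# Row names of a data frame must be unique, which make.unique() guarantees here
rownames(x) <- clean
print(rownames(x))
```

The single-pattern form would behave differently from the nested form only if an ID contained a hyphen before its first dot; none of the IDs in the question do.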

    【Comments】:

    • Sorry, there was a mistake in my question (the output part). I have edited it now.