Rename row names: answers

【Question title】: Rename rownames
【Posted】: 2020-02-17 15:25:09
【Question】:

I want to rename the row names by removing the part they all have in common.

          a b  c
CDA_Part  1 4  4
CDZ_Part  3 4  4
CDX_Part  1 4  4

Desired result:

     a b  c
CDA  1 4  4
CDZ  3 4  4
CDX  1 4  4

【Comments on the question】:

Are you also asking how to identify the common part with R code?

Tags: r rowname

【Solution 1】:

1. Create a minimal reproducible example:

df <- data.frame(a = 1:3, b = 4:6)
rownames(df) <- c("CDA_Part", "CDZ_Part", "CDX_Part")

df

         a b
CDA_Part 1 4
CDZ_Part 2 5
CDX_Part 3 6

2. Suggested solution using base R's gsub:

rownames(df) <- gsub("_Part", "", rownames(df), fixed=TRUE)

df

    a b
CDA 1 4
CDZ 2 5
CDX 3 6

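Explanation:

gsub finds and replaces parts of strings; by default the pattern is treated as a regular expression. Its first three arguments are:

pattern: the pattern to replace, here "_Part"
replacement: the string to use as the replacement, here the empty string ""
x: the strings in which to replace, here the row names

Additional argument (not among the first three):

fixed: whether pattern is matched as a plain literal string instead of a regular expression; here TRUE, so "_Part" is matched literally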
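The difference that fixed makes only shows up when the pattern contains a regex metacharacter. The sketch below uses made-up row names (not from the question) with a "." separator to illustrate this; it is only an illustration of the argument, not part of the original answer.

```r
# Illustrative names, invented for this sketch
x <- c("CDA.Part", "CDAxPart")

# Default: the pattern is a regular expression, so "." matches any character
regex_result <- gsub(".Part", "", x)
regex_result
# both elements become "CDA"

# fixed = TRUE: the pattern is a literal string, so only a real "." matches
fixed_result <- gsub(".Part", "", x, fixed = TRUE)
fixed_result
# "CDA" and the unchanged "CDAxPart"
```

With "_Part" as the pattern both modes give the same result, because "_" has no special meaning in a regular expression.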

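【Comments】:

Use fixed = TRUE for more speed and exact string matching.
Thanks for your comment, I am editing my answer.
I don't know whether it is really needed here, but your answer does not find the common part of the row names; it has to be typed in.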

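【Solution 2】:

You could try this approach, which uses Reduce and intersect to determine the common part of the names. Note that I assume your data set has the structure shown below, where an underscore separates the two words. The solution works for both word_commonpart and commonpart_word, as in the example below.

Logic: use strsplit to split the row names at the underscore (without consuming the underscore, hence the zero-width lookahead), then use Reduce to find the intersection of the resulting pieces across all row names. Next, paste what was found into a regular expression whose alternatives are separated by pipes, and use gsub to replace the matches with nothing.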
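The intermediate values make the logic easier to follow. The sketch below walks through the same steps on a small character vector; the names rn, pieces, common, pairs and pattern are chosen here for illustration and do not appear in the original answer.

```r
rn <- c("CDA_Part", "CDZ_Part", "Part_ABC")

# Split at the position before each underscore; because the lookahead is
# zero-width, R's strsplit keeps "_" as a piece of its own
pieces <- strsplit(rn, "(?=_)", perl = TRUE)
pieces[[1]]
# "CDA" "_" "Part"

# Pieces shared by every name
common <- Reduce(intersect, pieces)
common
# "_" "Part"

# All ordered pairs of two different common pieces, joined into one regex
pairs <- expand.grid(common, common, stringsAsFactors = FALSE)
pairs <- pairs[pairs$Var1 != pairs$Var2, ]
pattern <- paste0(pairs$Var1, pairs$Var2, collapse = "|")
pattern
# "Part_|_Part"

stripped <- gsub(pattern, "", rn)
stripped
# "CDA" "CDZ" "ABC"
```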

Input:

df <- structure(list(a = 1:4, b = 4:7), class = "data.frame", row.names = c("CDA_Part", 
"CDZ_Part", "CDX_Part", "Part_ABC"))

Solution:

## 1. determine the common parts (the lookahead keeps "_" as its own piece)
red <- Reduce(intersect, strsplit(rownames(df), "(?=_)", perl = TRUE))
## 2. get all combinations of the underscore and the remaining common parts
e <- expand.grid(red, red)
## 3. keep only combinations of two different pieces and paste each pair together with do.call
## 4. use paste0 to build a regex with the alternatives separated by a pipe
pattern <- paste0(do.call(paste0, e[e$Var1 != e$Var2, ]), collapse = "|")
## 5. replace the common parts with nothing
rownames(df) <- gsub(pattern, "", rownames(df))

Output:

> df
#        a b
#    CDA 1 4
#    CDZ 2 5
#    CDX 3 6
#    ABC 4 7
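A variation on the same idea is to split on the literal separator and drop the shared pieces with setdiff, which avoids building a regex. This is a different technique from the one in the answer above, shown only as a sketch; strip_common_part is a hypothetical helper name, and the code assumes that every name consists of pieces joined by a single known separator.

```r
# Hypothetical helper, not from the original answer
strip_common_part <- function(x, sep = "_") {
  # Split each name on the literal separator
  pieces <- strsplit(x, sep, fixed = TRUE)
  # Pieces that occur in every name
  common <- Reduce(intersect, pieces)
  # Keep the pieces that are not shared and join them back together
  vapply(pieces,
         function(p) paste(setdiff(p, common), collapse = sep),
         character(1))
}

new_names <- strip_common_part(c("CDA_Part", "CDZ_Part", "CDX_Part", "Part_ABC"))
new_names
# "CDA" "CDZ" "CDX" "ABC"
```

Row names of a data frame must be unique, so it is worth checking anyDuplicated(new_names) before assigning the result with rownames(df) <- new_names.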

【Comments】: