R: Removing attached numbers

【Question title】: R Removing attached numbers
【Posted】: 2018-10-03 18:42:16
【Question】:

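I am cleaning up some data, and there are footnote numbers throughout the cells that I want to remove. Some of the row names also contain numbers that belong there, so I can't just extract the words.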

data <- data.frame(Characteristic =  c('Race3 and Origin', 'Sex','Age 18 to
45', 'Age 55 and older'), Number =  c(40, 50, 60, 1), Margin4 = c(12, 22, 5,
1))

data$Characteristic <- as.character(data$Characteristic)

I have tried a number of patterns, most recently:

df$Characteristic <- str_extract_all(df$Characteristic, "([:alpha:]* 
[:space:]?\\d{2,})|([:alpha:]*)|[:space:]")

But this leaves me with a list of <chr [2]> elements.

Running str_extract (without the _all) returns only the first word.

What am I missing?

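【Comments on the question】:

If you want to remove something, why are you using str_extract_all? Your pattern does not help explain the problem you are having.
What is the expected output?
I was using str_extract/_all to keep only the words, the spaces and the unattached numbers (dropping the numbers attached to words while keeping the words themselves in the resulting column).
Try removing the digits that are glued to letters: data$Characteristic <- str_replace_all(data$Characteristic, "(?<=\\p{L})\\d+", "")
@ajbentley I posted base R and stringr solutions.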

Tags: r regex stringr

【Solution 1】:

To remove all digits glued to a letter (at the end of a word), you can use

data$Characteristic <- gsub("(?<=\\p{L})\\d+\\b", "", data$Characteristic, perl=TRUE)

or

library(stringr)
data$Characteristic <- str_replace_all(data$Characteristic, "(?<=\\p{L})\\d+\\b", "")

The pattern matches

(?<=\\p{L}) - a position that is immediately preceded by any letter
\\d+ - 1 or more digits
\\b - a word boundary.
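
As a quick check of the pattern described above, here is a minimal base R sketch. The vector `x` reuses the labels from the question (with "Age 18 to 45" written on one line), and the string "H2O" is a made-up case added only to show what `\\b` contributes. The variable names and that extra string are assumptions for illustration and do not come from the original answer.

```r
# Labels from the question, with "Age 18 to 45" kept on a single line.
x <- c("Race3 and Origin", "Sex", "Age 18 to 45", "Age 55 and older")

# Digits that follow a letter and end at a word boundary are removed;
# standalone numbers such as "18" or "55" are preceded by a space, so they stay.
cleaned <- gsub("(?<=\\p{L})\\d+\\b", "", x, perl = TRUE)
print(cleaned)

# Hypothetical extra case: with \\b, a digit in the middle of a word is left alone ...
with_boundary <- gsub("(?<=\\p{L})\\d+\\b", "", "H2O", perl = TRUE)
# ... whereas without \\b, any run of digits after a letter is stripped.
without_boundary <- gsub("(?<=\\p{L})\\d+", "", "H2O", perl = TRUE)
print(c(with_boundary, without_boundary))
```

Whether to keep `\\b` therefore depends on the data: it limits the removal to digits at the end of a word, which fits footnote markers such as "Race3".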

See the regex demo.

【Discussion】:

【Solution 2】:

Is this what you want?

sub("([a-zA-Z]*)[0-9]*(\\s*\\s)","\\1\\2"  , data$C)

[1] "Race and Origin"  "Sex"              "Age 18 to\n45"    "Age 55 and older"
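
One thing worth keeping in mind with this approach: `sub` replaces only the first match in each string, and this pattern needs whitespace after the digits before it matches at all. The sketch below illustrates both points with two made-up inputs (they are not part of the question's data), so treat it as an illustration under those assumptions rather than a statement about the asker's real data.

```r
# Hypothetical inputs: two footnote digits in one string, and a footnote at the very end.
x <- c("Race3 and Origin4 group", "Margin4")

# Same pattern and replacement as in the call above.
res <- sub("([a-zA-Z]*)[0-9]*(\\s*\\s)", "\\1\\2", x)
print(res)
# Only the first footnote ("3") is dropped. "Origin4" is a second match that sub() skips,
# and "Margin4" has no trailing whitespace, so the pattern never matches it.
```

If a label can carry more than one footnote, or a footnote at the very end of the string, a lookbehind pattern used with `gsub` and `perl = TRUE` may be the more robust choice.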

【Discussion】: