sub 可以用正确的正则表达式做到这一点
Text = c("Positron emission tomography using flutemetamol (18F)
with computed tomography of brain (procedure)",
"Urinary tract infection prophylaxis (procedure)",
"Xanthoma of eyelid (disorder)",
"Ventricular tachyarrhythmia (disorder)",
"Abnormal urine odor (finding)",
"Coloboma of iris (disorder)",
"Macroencephaly (disorder)",
"Right main coronary artery thrombosis (disorder)")
sub(".*\\((.*)\\).*", "\\1", Text)
[1] "procedure" "procedure" "disorder" "disorder" "finding" "disorder"
[7] "disorder" "disorder"
附录:正则表达式的详细解释
该问题要求在字符串中查找 final 组括号的内容。这个表达式有点混乱,因为它包含了两种不同的括号用法,一种是表示正在处理的字符串中的括号,另一种是设置一个“捕获组”,即我们指定表达式应该返回什么部分的方式.表达式由五个基本单元组成:
1. Initial .* - matches everything up to the final open parenthesis.
Note that this is relying on "greedy matching"
2. \\( ... \\) - matches the final set of parentheses.
Because ( by itself means something else, we need to "escape" the
parentheses by preceding them with \. That is we want the regular
expression to say \( ... \). However, the way R interprets strings,
if we just typed \( and \), R would interpret the \ as escaping the (
and so interpret this as just ( ... ). So we escape the backslash.
R will interpret \\( ... \\) as \( ... \) meaning the literal
characters ( & ).
3. ( ... ) Inside the pair in part 2
This is making use of the special meaning of parentheses. When we
enclose an expression in parentheses, whatever value is inside them
will be stored in a variable for later use. That variable is called
\1, which is what was used in the substitution pattern. Again, is
we just wrote \1, R would interpret it as if we were trying to escape
the 1. Writing \\1 is interpreted as the character \ followed by 1,
i.e. \1.
4. Central .* Inside the pair in part 3
This is what we are looking for, all characters inside the parentheses.
5. Final .*
This is in the expression to match any characters that may follow the
final set of parentheses.
子函数将使用它来用替换模式\1替换匹配的模式(在这种情况下,字符串中的所有字符),即包含第一个(仅在我们的情况下)捕获中的任何内容的变量的内容group - 最后括号内的东西。