Splitting text into character and numeric: answers

【Question title】: Splitting text into character and numeric
【Posted】: 2018-03-01 22:21:09
【Question】:

Can anyone help me split this string:

string <- "Rolling in the deep    $15.25"

I am trying to get two outputs from it:

1) Rolling in the Deep  # character
2) 15.25                # numeric value

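I know how to do this in Excel, but I'm a bit lost in R.

【Comments】:

Do you want to extract the numeric value?
No, I want both: the character part and the value, separately.
Extracting the data is easy in this particular case, but for the general case we need more data. Please provide more if you see fit. For example, can we expect a numeric value as the last entry in the string? Is it always in dollars?
I have updated my string variable - sorry I missed that the first time.

Tags: r stringr readr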

【Solution 1】:

strsplit does the job. The solution looks like this:

string <- "Rolling in the deep    $15.25"

strsplit(string, "\\s+\\$")
# Pattern breakdown:
#   \\s+  matches one or more whitespace characters
#   \\$   matches a literal $ (escaped with \\ because an unescaped $
#         anchors the end of the string)
# Result
# [[1]]
# [1] "Rolling in the deep" "15.25"

strsplit(string, "\\s+\\$")[[1]][1]
#[1] "Rolling in the deep"

strsplit(string, "\\s+\\$")[[1]][2]
#[1] "15.25"
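
The question asks for the second piece as a numeric value, whereas strsplit returns character strings. A minimal sketch of the conversion, assuming the price is always the last field and consists only of digits and a decimal point (the names `parts`, `title`, and `price` are illustrative and not part of the original answer):

```r
string <- "Rolling in the deep    $15.25"

# Split once and reuse the result instead of calling strsplit() repeatedly
parts <- strsplit(string, "\\s+\\$")[[1]]

title <- parts[1]              # stays a character string
price <- as.numeric(parts[2])  # converted from "15.25" to a number

class(title)
class(price)
```

If the price part could contain thousands separators or other symbols, as.numeric would return NA, so this only covers inputs shaped like the one in the question.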

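【Discussion】:

Perhaps you could explain the regular expression used here.
@RomanLuštrik Thank you very much. I had planned to edit it following your suggestion, but it sat for several hours.

【Solution 2】:

As long as the right-hand side always starts with a dollar sign, you can split on it, but you need to "escape" the dollar sign. Try this: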

# you will need stringr, which you could load alone but the tidyverse is amazing
library(tidyverse)
string <- "Rolling in the deep    $15.25"
str_split_fixed(string, "\\$", n = 2)
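
Splitting on "\\$" alone leaves the run of spaces before the dollar sign attached to the first piece, and both pieces are still character strings. A possible follow-up, assuming stringr is installed and the price part contains only a plain decimal number (the names `parts`, `title`, and `price` are illustrative):

```r
library(stringr)

string <- "Rolling in the deep    $15.25"

# str_split_fixed() returns a character matrix with n columns
parts <- str_split_fixed(string, "\\$", n = 2)

title <- str_trim(parts[, 1])    # drop the trailing whitespace
price <- as.numeric(parts[, 2])  # "15.25" becomes a number
```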

【Discussion】:

【Solution 3】:

Here is a way to extract the information using only regular expressions:

x <- c("Rolling in the deep    $15.25",
       "Apetite for destruction    $20.00",
       "Piece of mind    $19")

rgx <- "^(.*)\\s{2,}(\\$.*)$"
data.frame(album = trimws(gsub(rgx, "\\1", x)),
           price = trimws(gsub(rgx, "\\2", x))
           )

                    album  price
1     Rolling in the deep $15.25
2 Apetite for destruction $20.00
3           Piece of mind    $19
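
In the output above the price column is still character and keeps the dollar sign. A variant sketch that moves the \\$ outside the second capture group so the price can be converted with as.numeric; it assumes every entry has at least two spaces before the dollar sign, as in the sample data (the name `albums` is illustrative):

```r
x <- c("Rolling in the deep    $15.25",
       "Apetite for destruction    $20.00",
       "Piece of mind    $19")

# Group 1: the title (plus leftover spaces); group 2: digits after the $
rgx <- "^(.*)\\s{2,}\\$(.*)$"

albums <- data.frame(
  album = trimws(sub(rgx, "\\1", x)),
  price = as.numeric(sub(rgx, "\\2", x)),
  stringsAsFactors = FALSE
)

str(albums)
```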

【Discussion】: