【Question title】: How to split a character column into two columns by removing the brackets in R?
【Posted】: 2017-02-25 02:15:00
【Question】:

I have council data on social care expenditure for each geographical area (council), as shown below:

Council                     Expenditure
Cumbria (102)               100
South Tyneside (109)        200
Bexley (718)                150
Nottingham (512)            178

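As you can see, the Council column of the data frame gives each council name together with its code in brackets, i.e. (102), (109), etc.

I would like to split the council names and their codes into two separate columns and remove the brackets around the council codes, so that the data looks more like this: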

Council          Council Code                 Expenditure
Cumbria          102                          100
South Tyneside   109                          200
Bexley           718                          150
Nottingham       512                          178

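I have looked at other similar posts on Stack Overflow and tried string manipulation functions such as strsplit(), gsub(), etc., but to no avail. I am having particular difficulty with the brackets.

Could you suggest how I can do this in R?

【Comments】:

    Tags: r string dataframe split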


    【Solution 1】:

    Here is one way to do it using grouping in a regular expression:

    Data:

    Council <- read.table(
      text = "Council,Expenditure
    Cumbria (102),100
    South Tyneside (109),200
    Bexley (718),150
    Nottingham (512),78",
      header = TRUE,
      sep = ",",
      stringsAsFactors = FALSE
    )
    

    Code:

    Council <- transform(Council,
           # Get the Council_Code column: keep only the digits (capture group 2)
           Council_Code = as.numeric(gsub("([^\\d]+)(\\d+)(\\))", "\\2",
                                          Council,
                                          perl = TRUE)),
           # Clean up the Council column: keep only the name (capture group 1)
           Council = trimws(gsub("([a-zA-Z\\s]+)([\\d\\(\\)]+)", "\\1",
                                 Council,
                                 perl = TRUE))
    )
    

    Output:

     Council        Expenditure Council_Code
     Cumbria        100         102         
     South Tyneside 200         109         
     Bexley         150         718         
     Nottingham      78         512 
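
To make the role of the capture groups more concrete, here is a minimal base R sketch that uses a single pattern with two groups instead of the two patterns above. The vector `council` and the variable names are made up for illustration, and the pattern assumes every value has the form "name (digits)".

```r
# Hypothetical sample vector mirroring the Council column above.
council <- c("Cumbria (102)", "South Tyneside (109)",
             "Bexley (718)", "Nottingham (512)")

# One pattern, two capture groups:
#   group 1 = the name (everything before " (")
#   group 2 = the digits inside the brackets
pattern <- "^(.*) \\((\\d+)\\)$"

# "\\1" keeps group 1, "\\2" keeps group 2.
council_name <- sub(pattern, "\\1", council, perl = TRUE)
council_code <- as.numeric(sub(pattern, "\\2", council, perl = TRUE))

result <- data.frame(Council = council_name,
                     Council_Code = council_code,
                     stringsAsFactors = FALSE)
print(result)
```

Values that do not match the pattern are returned unchanged by `sub()`, so `as.numeric()` would produce `NA` with a warning for them; that may be a useful signal that some rows need separate handling.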
    

    I hope this helps.

    【Discussion】:

    • Thanks, Abdou. This was really helpful, and the code works!
    【Solution 2】:

    Using gsub:

    res <- setNames(data.frame(trimws(gsub("[[:digit:]\\()]","",df$Council))
                        , df$Expenditure, gsub("[^[:digit:]]","",df$Council)),
                    c("Council","Expenditure","Council Code"))
    
    #         Council Expenditure Council Code
    #1        Cumbria         100          102
    #2 South Tyneside         200          109
    #3         Bexley         150          718
    #4     Nottingham          78          512
    
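    • [[:digit:]\\()]: matches the digits and brackets; deleting them leaves only the name
    • [^[:digit:]]: matches everything that is not a digit; deleting it leaves only the code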
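
As a quick illustration of the two patterns, the sketch below applies each one to a small made-up vector `x` (standing in for `df$Council`) so the intermediate results can be inspected separately.

```r
# Hypothetical sample vector standing in for df$Council.
x <- c("Cumbria (102)", "South Tyneside (109)")

# Deleting digits and brackets leaves the name plus a trailing space,
# which is why trimws() is applied afterwards.
names_only <- trimws(gsub("[[:digit:]\\()]", "", x))

# Deleting every non-digit character leaves only the code.
# Note that the result is still a character vector, not a number.
codes_only <- gsub("[^[:digit:]]", "", x)

print(names_only)
print(codes_only)
```

If the codes are needed as numbers, wrapping the second call in `as.numeric()` is one option. Bear in mind that this approach would also strip digits from the name itself, should a council name ever contain one.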

    【Discussion】:

    • Thank you very much. This was really helpful.
    【Solution 3】:

    A tidyr option is extract:

    library(tidyr)
    extract(df1, Council, into = c("Council", "CouncilCode"), "([^(]+)\\s+\\(([0-9]+).")
    #         Council CouncilCode Expenditure
    #1        Cumbria         102         100
    #2 South Tyneside         109         200
    #3         Bexley         718         150
    #4     Nottingham         512          78
    

    【Discussion】:

    • Thanks! The code is very concise and clear. tidyr is a very useful package. I need to practise and use tidyr more often for data cleaning.
    【Solution 4】:
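    library(reshape2)
    # Drop the closing bracket, then split at " (" so that multi-word names
    # such as "South Tyneside" stay in one piece
    colsplit(string = gsub(pattern = "\\)", replacement = "", x = Council$Council),
             pattern = " \\(", names = c("Council", "Council_code"))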
    

    Result:

        Council Council_code
    1. Cumbria          102
    2. South Tyneside   109
    3. Bexley           718
    4. Nottingham       512
    

    【Discussion】:
