【Question title】: Split a string but keep certain substrings together
【Posted】: 2020-05-28 16:13:28
【Question】:

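I want to split a character column in a data frame on certain separator characters (namely spaces, commas, and semicolons). However, I want to exclude certain phrases from the split (in my example, "my test" should stay in one piece).

I managed to get the ordinary string split working, but I don't know how to exclude certain phrases.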

library(tidyverse)

test <- data.frame(string = c("this is a,test;but I want to exclude my test",
                              "this is another;of my tests",
                              "this is my 3rd test"),
                   stringsAsFactors = FALSE)

test %>%
  mutate(new_string = str_split(string, pattern = " |,|;")) %>%
  unnest_wider(new_string)

This gives:

# A tibble: 3 x 12
  string                                       ...1  ...2  ...3    ...4  ...5  ...6  ...7  ...8  ...9    ...10 ...11
  <chr>                                        <chr> <chr> <chr>   <chr> <chr> <chr> <chr> <chr> <chr>   <chr> <chr>
1 this is a,test;but I want to exclude my test this  is    a       test  but   I     want  to    exclude my    test 
2 this is another;of my tests                  this  is    another of    my    tests NA    NA    NA      NA    NA   
3 this is my 3rd test                          this  is    my      3rd   test  NA    NA    NA    NA      NA    NA

However, my desired output is the following (with "my test" not split):

# A tibble: 3 x 11
  string                                       ...1  ...2  ...3    ...4  ...5      ...6  ...7  ...8  ...9    ...10
  <chr>                                        <chr> <chr> <chr>   <chr> <chr>     <chr> <chr> <chr> <chr>   <chr>
1 this is a,test;but I want to exclude my test this  is    a       test  but       I     want  to    exclude my test 
2 this is another;of my tests                  this  is    another of    my tests  NA    NA    NA    NA      NA   
3 this is my 3rd test                          this  is    my      3rd   test      NA    NA    NA    NA      NA

Any ideas? (Side question: any idea how to name the columns in unnest_wider?)

【Comments】:

    Tags: r dplyr tidyr strsplit


    【Solution 1】:

    A simple workaround is to replace the space in the phrase with _ before splitting and change it back afterwards:

    test %>%
      mutate(string = gsub("my test", "my_test", string),
        new_string = str_split(string, pattern = "[ ,;]")) %>%
      unnest_wider(new_string) %>%
      mutate_all(~ gsub("my_test", "my test", .x))
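
The placeholder approach above relies on "my_test" never occurring in the original data. As an alternative sketch (not part of the original answer), the tokens can be extracted instead of splitting on delimiters, listing the protected phrase first in the pattern. The vector `strings` and the name `token_pattern` are illustrative assumptions; the pattern also keeps trailing characters such as the "s" in "my tests" attached, mirroring what the gsub workaround does.

```r
library(stringr)

strings <- c("this is a,test;but I want to exclude my test",
             "this is another;of my tests",
             "this is my 3rd test")

# Alternation is tried left to right at each position, so the protected
# phrase (plus any trailing non-delimiter characters) is matched first;
# everything else falls back to a run of non-delimiter characters.
token_pattern <- "my test[^ ,;]*|[^ ,;]+"

# Returns a list with one character vector of tokens per input string.
tokens <- str_extract_all(strings, token_pattern)
tokens
```

Unlike str_split, this drops the empty strings that two consecutive delimiters would otherwise produce, and it only protects the phrase when it starts at a token boundary. The resulting list can be stored in a list column and passed to unnest_wider in the same way as the split result.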
    

    To give the columns more meaningful names, you can use the additional options available in pivot_wider.
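
For the side question about column names, another route is the names_sep argument of unnest_wider itself. The sketch below uses a small made-up data frame and a hypothetical column name `word`; the exact generated names depend on the installed tidyr version (in recent versions they are expected to be the column name followed by the element position), so treat it as an illustration rather than a guaranteed result.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Small illustrative data frame (not the data from the question).
test <- data.frame(string = c("this is a,test", "one;two three"),
                   stringsAsFactors = FALSE)

named <- test %>%
  mutate(word = str_split(string, pattern = "[ ,;]")) %>%
  # names_sep combines the list-column name with a generated element name,
  # so the new columns all share the "word_" prefix.
  unnest_wider(word, names_sep = "_")

names(named)
```

Recent tidyr versions may also require names_sep when the unnested elements are unnamed, in which case supplying it is necessary for the unnest_wider calls shown earlier as well.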

    【Comments】:
