【Question title】: Splitting character with different string lengths while keeping the positional meaning
【Posted】: 2021-10-20 04:24:58
【Question】:

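I have a character column that encodes locations. A location can take the form "city, state, country", "state, country", or just "country". I want to split it into three columns, but when there are fewer than three elements, each value should still end up in the correct state or country column.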

Here is my data and what I tried:

library(tibble)
library(tidyr)

tib <- tribble(~obs, ~location,
               1, "Miami, Florida, United States",
               2, "Astrakhan Oblast, Russia",
               3, "Mozambique")

separate(tib, location, c("city", "state", "country"), ", ")

Result:

# A tibble: 3 × 4
    obs city             state   country      
  <dbl> <chr>            <chr>   <chr>        
1     1 Miami            Florida United States
2     2 Astrakhan Oblast Russia  NA           
3     3 Mozambique       NA      NA       

In a sense, I want to run the separate function in reverse order, so that the result looks like this:

# A tibble: 3 × 4
    obs city  state            country      
  <dbl> <chr> <chr>            <chr>        
1     1 Miami Florida          United States
2     2 NA    Astrakhan Oblast Russia
3     3 NA    NA               Mozambique
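One way to read "separate in reverse order" is: split each string, then pad the pieces with NA on the left instead of the right, so the last piece always lands in country. A minimal base-R sketch of that idea (not from the original thread), assuming no location has more than three comma-separated parts:

```r
library(tibble)

tib <- tribble(~obs, ~location,
               1, "Miami, Florida, United States",
               2, "Astrakhan Oblast, Russia",
               3, "Mozambique")

# Split each location on ", "; strsplit() returns a list of character vectors
pieces <- strsplit(tib$location, ", ", fixed = TRUE)

# Left-pad every vector with NA up to length 3, so the last piece is always
# the country and the second-to-last is the state (assumes <= 3 parts)
padded <- lapply(pieces, function(p) c(rep(NA_character_, 3 - length(p)), p))

# Stack the padded vectors into a 3-column character matrix
mat <- do.call(rbind, padded)

res <- tibble(obs = tib$obs,
              city = mat[, 1], state = mat[, 2], country = mat[, 3])
res
```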

Update:

This is a workable option, but I'd like something simpler:

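library(dplyr)
library(stringr)
tib %>% mutate(country = str_extract(location, "[^, ][^,]*$"),          # last part, without the leading space
               state   = str_extract(location, "(?<=, )[^,]+(?=, )"),   # middle part, only when there are three parts
               city    = str_extract(location, "^[^,]+(?=, [^,]+, )"))  # first part, only when there are three parts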
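If a single pattern is preferred over three separate str_extract() calls, here is a sketch (an illustration, not from the original thread) using stringr::str_match() with one capture group per column. The city group is only taken when two more ", "-separated parts follow it; str_match() returns NA for optional groups that did not participate in the match:

```r
library(dplyr)
library(stringr)
library(tibble)

tib <- tribble(~obs, ~location,
               1, "Miami, Florida, United States",
               2, "Astrakhan Oblast, Russia",
               3, "Mozambique")

# Group 1 (city): optional, and only if ", <state>, " still follows (lookahead)
# Group 2 (state): optional
# Group 3 (country): always the last part
pattern <- "^(?:([^,]+), (?=[^,]+, ))?(?:([^,]+), )?([^,]+)$"

# Column 1 of the result is the full match; columns 2-4 are the groups
parts <- str_match(tib$location, pattern)

res <- tib %>%
  mutate(city = parts[, 2], state = parts[, 3], country = parts[, 4])
res
```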

【Question comments】:

    Tags: r dplyr


    【Solution 1】:

    For your specific example, you can set the fill argument of separate() so that missing values are filled from the left instead of from the right.

    tidyr::separate(tib, location, c("city", "state", "country"), ", ", fill = "left")
    # A tibble: 3 x 4
        obs city  state            country      
      <dbl> <chr> <chr>            <chr>        
    1     1 Miami Florida          United States
    2     2 NA    Astrakhan Oblast Russia       
    3     3 NA    NA               Mozambique   
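    In tidyr 1.3.0 and later, separate() is superseded by separate_wider_delim(). Assuming that version is installed, its too_few = "align_end" option should give the same right-aligned result. A sketch, not part of the original answer:

```r
library(tibble)
library(tidyr)  # separate_wider_delim() requires tidyr >= 1.3.0

tib <- tribble(~obs, ~location,
               1, "Miami, Florida, United States",
               2, "Astrakhan Oblast, Russia",
               3, "Mozambique")

# too_few = "align_end" pads missing pieces on the left with NA,
# so the last piece always lands in "country"
res <- separate_wider_delim(tib, location, delim = ", ",
                            names = c("city", "state", "country"),
                            too_few = "align_end")
res
```

    One difference from separate(): rows with more than three parts raise an error by default (too_many = "error") instead of having the extra pieces dropped with a warning.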
    

    【Comments】:

      【Solution 2】:

      Here is another approach using separate_rows(). It was put together with help from @akrun's answer to "Using complete to fill groups with NA to have same length as the maximum group"; my first attempt used complete().

      library(dplyr)
      library(tidyr)
        tib %>%    
          separate_rows("location", sep = ", ") %>% 
          group_by(obs) %>% 
          mutate(new = rev(c("country", "state", "city")[row_number()])) %>% 
          ungroup %>% 
          pivot_wider(names_from = new, values_from = location)
      

      Output:

          obs city  state            country      
        <dbl> <chr> <chr>            <chr>        
      1     1 Miami Florida          United States
      2     2 NA    Astrakhan Oblast Russia       
      3     3 NA    NA               Mozambique   
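      A small variation (not from the answer itself) replaces the rev(...[row_number()]) step with tail(), which takes the last n() labels of each group directly. Like the original, it assumes at most three parts per location:

```r
library(dplyr)
library(tibble)
library(tidyr)

tib <- tribble(~obs, ~location,
               1, "Miami, Florida, United States",
               2, "Astrakhan Oblast, Russia",
               3, "Mozambique")

res <- tib %>%
  separate_rows(location, sep = ", ") %>%   # one row per piece
  group_by(obs) %>%
  # tail() keeps the last n() labels: 2 pieces -> "state", "country";
  # 1 piece -> "country"
  mutate(new = tail(c("city", "state", "country"), n())) %>%
  ungroup() %>%
  pivot_wider(names_from = new, values_from = location)
res
```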
      

      【Comments】:
