【Question title】: Splitting values in different columns in R
【Posted】: 2020-10-04 20:42:01
【Question】:

One column in my dataset contains values like this:

utm_source=google&utm_medium=cpc&utm_campaign=1234567&utm_term=brand%20&utm_content=Brand&gclid=ERtyuiipotf_YTj

How can I split it in R into separate columns, one per key, each holding the corresponding value?

utm_source utm_medium  utm_campaign utm_term   utm_content
  google      cpc          1234567   brand%20     Brand

dput(column) gives the following output:

structure(list("null", "gclid=ertyyhglkdl-kjkY", 
    "utm_source=google&utm_medium=cpc&utm_campaign=1234556&utm_term=brand%20shirts&utm_content=Brand&gclid=jhajsgjdgd_ajs", 
    "utm_source=google&utm_medium=cpc&utm_campaign=1674814043&utm_term=brand%20shirts&utm_content=Brand&gclid=KvgMsEAAYASAAEgLq6vD_BwE", 
    "null", "null", "null", "null", "null", "null", "null", "null", 
    "null", "null", "utm_source=fb&utm_medium=ctw&utm_campaign=Shirt_rem&utm_content=CasciaShirt"), class = c("extracted", 
"list"))

【Comments】:

  • Check out separate() from tidyr.
  • Could you give an example, @AmadouKone? Suppose my dataset is Mydata and the column is col1; I want to split it into separate columns such as utm_source, utm_campaign, and so on.

Tags: r split strsplit


【Answer 1】:

I'm not sure whether this is the expected output, but the following base R option may do what you want:

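# Parse each element into a one-row data frame, then merge them all,
# keeping every column (all = TRUE).
Reduce(
  function(...) merge(..., all = TRUE),
  lapply(
    column,
    function(x) {
      u <- unlist(strsplit(x, "&"))  # the "key=value" pairs
      # values become the cells, keys become the column names
      setNames(data.frame(as.list(gsub(".*=", "", u))), gsub("=.*", "", u))
    }
  )
)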

which gives

  utm_source utm_medium utm_campaign utm_content null                    gclid
1         fb        ctw    Shirt_rem CasciaShirt <NA>                     <NA>
2     google        cpc      1234556       Brand <NA>           jhajsgjdgd_ajs
3     google        cpc   1674814043       Brand <NA> KvgMsEAAYASAAEgLq6vD_BwE
4       <NA>       <NA>         <NA>        <NA> null         ertyyhglkdl-kjkY
        utm_term
1           <NA>
2 brand%20shirts
3 brand%20shirts
4           <NA>
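
To make the per-element step easier to follow, here is a minimal sketch of parsing a single query string in base R. The helper name parse_query and the sample string q are made up for illustration; the sketch splits each pair at the first "=" only, and uses utils::URLdecode to turn %20 back into a space, which the code above does not do.

```r
# parse_query is a hypothetical helper, not part of the answer above.
parse_query <- function(x) {
  # Split "k1=v1&k2=v2" into c("k1=v1", "k2=v2")
  pairs <- strsplit(x, "&", fixed = TRUE)[[1]]
  # Split each pair at the first "=" only
  keys   <- sub("=.*$", "", pairs)
  values <- sub("^[^=]*=", "", pairs)
  setNames(values, keys)
}

q <- "utm_source=google&utm_medium=cpc&utm_campaign=1234556&utm_term=brand%20shirts"
parsed <- parse_query(q)

parsed[["utm_medium"]]                   # "cpc"
utils::URLdecode(parsed[["utm_term"]])   # "brand shirts"
```

The result is a named character vector, which can be turned into a one-row data frame with as.data.frame(as.list(parsed)).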

Update

If you want to keep every row, including the "null" ones, you can try the code below:

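Reduce(
  function(x, y) {
    # If either side is an all-NA placeholder, just stack the rows
    if (all(is.na(x)) || all(is.na(y))) {
      return(rbind(x, y))
    }
    # Otherwise join on the shared columns, keeping all rows and columns
    dplyr::full_join(x, y)
  },
  lapply(
    column,
    function(x) {
      # "null" entries become an NA placeholder row
      if (x == "null") {
        return(NA)
      }
      u <- unlist(strsplit(x, "&"))  # the "key=value" pairs
      setNames(data.frame(as.list(gsub(".*=", "", u))), gsub("=.*", "", u))
    }
  )
)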

which gives

                      gclid utm_source utm_medium utm_campaign       utm_term
1                      <NA>       <NA>       <NA>         <NA>           <NA>
2          ertyyhglkdl-kjkY       <NA>       <NA>         <NA>           <NA>
3            jhajsgjdgd_ajs     google        cpc      1234556 brand%20shirts
4  KvgMsEAAYASAAEgLq6vD_BwE     google        cpc   1674814043 brand%20shirts
5                      <NA>       <NA>       <NA>         <NA>           <NA>
6                      <NA>       <NA>       <NA>         <NA>           <NA>
7                      <NA>       <NA>       <NA>         <NA>           <NA>
8                      <NA>       <NA>       <NA>         <NA>           <NA>
9                      <NA>       <NA>       <NA>         <NA>           <NA>
10                     <NA>       <NA>       <NA>         <NA>           <NA>
11                     <NA>       <NA>       <NA>         <NA>           <NA>
12                     <NA>       <NA>       <NA>         <NA>           <NA>
13                     <NA>       <NA>       <NA>         <NA>           <NA>
14                     <NA>       <NA>       <NA>         <NA>           <NA>
15                     <NA>         fb        ctw    Shirt_rem           <NA>
   utm_content
1         <NA>
2         <NA>
3        Brand
4        Brand
5         <NA>
6         <NA>
7         <NA>
8         <NA>
9         <NA>
10        <NA>
11        <NA>
12        <NA>
13        <NA>
14        <NA>
15 CasciaShirt
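
For comparison, the same one-row-per-input result can be sketched without any joins: parse every element into a named vector, collect the union of keys, and index each vector by that key set so that missing keys come back as NA. This is only a sketch under assumptions: the three-element column below is a shortened stand-in for the real data, and parse_or_empty is a hypothetical helper name.

```r
# Shortened stand-in for the real column (assumption, for illustration only)
column <- list("null", "gclid=abc", "utm_source=fb&utm_medium=ctw")

# Hypothetical helper: named character vector, empty for "null"/NA input
parse_or_empty <- function(x) {
  if (is.na(x) || x == "null") return(character(0))
  pairs <- strsplit(x, "&", fixed = TRUE)[[1]]
  setNames(sub("^[^=]*=", "", pairs), sub("=.*$", "", pairs))
}

parsed <- lapply(column, parse_or_empty)
keys   <- unique(unlist(lapply(parsed, names)))

# Indexing by a name that is absent yields NA, so every row gets all keys
rows   <- lapply(parsed, function(p) unname(p[keys]))
result <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(result) <- keys

result
```

Because nothing is merged, the row order always matches the input order, which makes it straightforward to cbind the result back onto the original data frame.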

【Comments】:

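  • This works, @ThomasIsCoding, but it only gives me the rows that have values. How can I get the complete result? I want the null entries kept in the data as well.

【Answer 2】: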

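Using the OP's updated example, which is a list, we loop over the list and, if the element is not "null", create a tibble, split the column at & into rows (separate_rows), split that column into two columns (separate), and then build a one-row tibble with as_tibble_row from the named vector returned by deframe.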

library(dplyr)
library(tidyr)
library(tibble)
library(purrr)
map_dfr(lst1, ~ if(.x != "null") tibble(col1 = .x) %>% 
             separate_rows(col1, sep="&") %>% 
             separate(col1, into = c('col1', 'col2'), sep="\\=") %>%
             deframe %>% 
             as_tibble_row())

Output

# A tibble: 4 x 6
#  gclid                    utm_source utm_medium utm_campaign utm_term       utm_content
#  <chr>                    <chr>      <chr>      <chr>        <chr>          <chr>      
#1 ertyyhglkdl-kjkY         <NA>       <NA>       <NA>         <NA>           <NA>       
#2 jhajsgjdgd_ajs           google     cpc        1234556      brand%20shirts Brand      
#3 KvgMsEAAYASAAEgLq6vD_BwE google     cpc        1674814043   brand%20shirts Brand      
#4 <NA>                     fb         ctw        Shirt_rem    <NA>           CasciaShirt

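Alternatively, instead of doing this in a loop, we can convert the list to a single column of a data.frame, do the splitting once, and reshape to wide format: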

# uses purrr, tibble, dplyr and tidyr, loaded above
keep(lst1, ~ .x != "null") %>%
     flatten_chr %>% 
     tibble(col1 = .) %>%
     mutate(rn = row_number()) %>% 
     separate_rows(col1, sep='&') %>% 
     separate(col1, into = c('col1', 'col2'), sep="\\=") %>%
     pivot_wider(names_from = col1, values_from = col2) %>% 
     select(-rn)
# A tibble: 4 x 6
#  gclid                    utm_source utm_medium utm_campaign utm_term       utm_content
#  <chr>                    <chr>      <chr>      <chr>        <chr>          <chr>      
#1 ertyyhglkdl-kjkY         <NA>       <NA>       <NA>         <NA>           <NA>       
#2 jhajsgjdgd_ajs           google     cpc        1234556      brand%20shirts Brand      
#3 KvgMsEAAYASAAEgLq6vD_BwE google     cpc        1674814043   brand%20shirts Brand      
#4 <NA>                     fb         ctw        Shirt_rem    <NA>           CasciaShirt
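
If the tidyverse packages are not available, the same long-then-wide idea can be sketched in base R. Note that this swaps pivot_wider for stats::reshape, so it is not the code above; the strings vector is a shortened, made-up sample, and the "value." prefix that reshape adds to the column names is stripped at the end.

```r
# Shortened sample input (assumption, for illustration only)
strings <- c("gclid=abc", "utm_source=fb&utm_medium=ctw")

# Long format: one row per key=value pair, tagged with its source row
pairs <- strsplit(strings, "&", fixed = TRUE)
long <- data.frame(
  id    = rep(seq_along(pairs), lengths(pairs)),
  key   = sub("=.*$", "", unlist(pairs)),
  value = sub("^[^=]*=", "", unlist(pairs)),
  stringsAsFactors = FALSE
)

# Wide format: one row per id, one column per key, NA where a key is absent
wide <- reshape(long, idvar = "id", timevar = "key", direction = "wide")
names(wide) <- sub("^value\\.", "", names(wide))

wide
```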

Data

lst1 <- structure(list("null", "gclid=ertyyhglkdl-kjkY", "utm_source=google&utm_medium=cpc&utm_campaign=1234556&utm_term=brand%20shirts&utm_content=Brand&gclid=jhajsgjdgd_ajs", 
    "utm_source=google&utm_medium=cpc&utm_campaign=1674814043&utm_term=brand%20shirts&utm_content=Brand&gclid=KvgMsEAAYASAAEgLq6vD_BwE", 
    "null", "null", "null", "null", "null", "null", "null", "null", 
    "null", "null", "utm_source=fb&utm_medium=ctw&utm_campaign=Shirt_rem&utm_content=CasciaShirt"), class = c("extracted", 
"list"))

【Comments】:

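  • Sorry, I'm getting this error: could not find function "separate_rows".
  • Yes, I loaded it. Now I get this error: Error in UseMethod("separate_rows_"): no applicable method for 'separate_rows_' applied to an object of class "list". The class of the column I'm working with is "extracted" "list".
  • I've added it to the question itself.
  • So is there any solution, @akrun?
  • What I have is a list. Before this I used Mydata$query=ex_between(Mydata$"Ref URL", "query_string=", "|"), and the column in question is the query produced by that code.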
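
The error in the comments above probably arises because separate_rows expects a data frame as its first argument, whereas the "extracted" "list" column was passed on its own. One possible workaround, sketched here under the assumption that each element holds a single match, is to flatten the list to a plain character vector first and store that back in the data frame. The query object below is a made-up stand-in for what ex_between returned.

```r
# Hypothetical stand-in for the list-like column from the comment above
query <- structure(
  list("null", "gclid=abc", "utm_source=fb&utm_medium=ctw"),
  class = c("extracted", "list")
)

# Keep the first match of each element as a plain string
query_chr <- vapply(query, function(x) as.character(x)[1], character(1))

# Treat the literal string "null" as a missing value
query_chr[query_chr == "null"] <- NA_character_

class(query_chr)   # "character"
```

With the column stored as character (for example Mydata$query <- query_chr), the data frame itself, rather than the bare column, can then be passed to separate_rows.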