【Question title】: How to separate strings of a column and store them in a vector [duplicate]
【Posted】: 2022-01-20 16:40:19
【Question description】:

I have a data frame like data (see the example below). I want to create a vector containing all the comma-separated strings of the IIIF column, as in out.

data=data.frame(IIIT=c("a", "b", "c", "d", "e", "f", "g"), IIIF=c("aze,hyt,fre", NA, "ade", "ijh, deg","oij,erf", "eft,kij", "efg,kijj,lerod,kjhyg"))

data
  IIIT                 IIIF
1    a          aze,hyt,fre
2    b                 <NA>
3    c                  ade
4    d             ijh, deg
5    e              oij,erf
6    f              eft,kij
7    g efg,kijj,lerod,kjhyg

out
 [1] "aze"   "hyt"   "fre"   NA      "ade"   "ijh"   "deg"   "oij"   "erf"   "eft"   "kij"   "efg"   "kijj"  "lerod" "kjhyg"

How can I do this?

【Discussion】:

    Tags: r dataframe


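    【Solution 1】:

    Base R has strsplit(), which returns a list with one element per entry of the original vector; each element is a character vector of the individual words found in that entry. You can then combine the results with unlist():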

    > unlist(strsplit(data$IIIF, split = ","))
     [1] "aze"   "hyt"   "fre"   NA      "ade"   "ijh"   " deg"  "oij"   "erf"  
    [10] "eft"   "kij"   "efg"   "kijj"  "lerod" "kjhyg"
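
Note that the result above contains " deg" with a leading space, whereas the desired out has "deg". A minimal sketch of two ways to remove it, assuming the only stray characters are whitespace after the commas (the names out_regex and out_trim are illustrative, not from the original answer):

```r
# Same data as in the question.
data <- data.frame(
  IIIT = c("a", "b", "c", "d", "e", "f", "g"),
  IIIF = c("aze,hyt,fre", NA, "ade", "ijh, deg", "oij,erf", "eft,kij",
           "efg,kijj,lerod,kjhyg"),
  stringsAsFactors = FALSE
)

# Option A: treat a comma plus any whitespace after it as the delimiter.
out_regex <- unlist(strsplit(data$IIIF, split = ",\\s*"))

# Option B: split on the bare comma, then trim whitespace from each piece.
out_trim <- trimws(unlist(strsplit(data$IIIF, split = ",")))

print(out_regex)
```

Both variants keep the NA entry, since strsplit() returns NA for an NA input and trimws() leaves NA untouched.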
    

    【Discussion】:

      【Solution 2】:

      We can try scan(), as shown below:

      > scan(text = data$IIIF, sep = ",", what = "character")
      Read 15 items
       [1] "aze"   "hyt"   "fre"   NA      "ade"   "ijh"   " deg"  "oij"   "erf"
      [10] "eft"   "kij"   "efg"   "kijj"  "lerod" "kjhyg"
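
The scan() output also keeps the leading space in " deg". As a hedged variation on the call above, scan() has a strip.white argument that strips leading and trailing whitespace from character fields when sep is given. The vector name iiif and the use of quiet = TRUE (to suppress the "Read 15 items" message) are additions for this sketch:

```r
# The IIIF column from the question, as a plain character vector.
iiif <- c("aze,hyt,fre", NA, "ade", "ijh, deg", "oij,erf", "eft,kij",
          "efg,kijj,lerod,kjhyg")

# strip.white = TRUE trims whitespace around each comma-separated field.
out_scan <- scan(text = iiif, sep = ",", what = "character",
                 strip.white = TRUE, quiet = TRUE)

print(out_scan)
```

One caveat: the NA comes back as a missing value because scan() reads the text "NA" and matches it against its default na.strings, so a field that is literally the string "NA" would also become missing.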
      

      【Discussion】:

        【Solution 3】:

        A tidyverse solution:

        library(tidyverse)
        
        data=data.frame(IIIT=c("a", "b", "c", "d", "e", "f", "g"), IIIF=c("aze,hyt,fre", NA, "ade", "ijh, deg","oij,erf", "eft,kij", "efg,kijj,lerod,kjhyg"))
        
        data %>% 
          separate_rows(IIIF, sep=",") %>% 
          select(IIIF) %>% unlist %>% set_names(NULL)
        
        #>  [1] "aze"   "hyt"   "fre"   NA      "ade"   "ijh"   " deg"  "oij"   "erf"  
        #> [10] "eft"   "kij"   "efg"   "kijj"  "lerod" "kjhyg"
        

        Edit

        Following @Adam's comment below, for which I am grateful, the solution above can be simplified:

        library(tidyverse)
        
        data %>% 
          separate_rows(IIIF, sep=",") %>% 
          pull(IIIF)
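
As with the base R answers, sep = "," leaves the leading space in " deg". A sketch of one possible adjustment, assuming the tidyr and dplyr packages are installed: since sep in separate_rows() is treated as a regular expression, the whitespace after a comma can be made part of the delimiter. The name out_tidy is illustrative.

```r
library(tidyr)
library(dplyr)

data <- data.frame(
  IIIT = c("a", "b", "c", "d", "e", "f", "g"),
  IIIF = c("aze,hyt,fre", NA, "ade", "ijh, deg", "oij,erf", "eft,kij",
           "efg,kijj,lerod,kjhyg")
)

# A comma followed by optional whitespace is consumed as the separator,
# so no piece starts with a space; the NA row is kept as NA.
out_tidy <- data %>%
  separate_rows(IIIF, sep = ",\\s*") %>%
  pull(IIIF)

print(out_tidy)
```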
        

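        【Discussion】:

        • It would be more efficient to simply use pull() instead of select() and end the pipe there. There is no need for unlist() or set_names().
        • Thanks, @Adam, for letting me know! I have just edited my solution to incorporate your comment.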