【Question title】: Separating arrow-separated values in a data frame into an unequal number of columns using R?
【Posted】: 2014-10-15 06:20:56
【Question】:

I have a data frame containing the following sample values.

[1] "entry.cei"                                                                               
[2] "entry.lifecycle->hist.open.personal demand chequing account->exit.lifecycle->entry.cei"  
[3] "entry.lifecycle->hist.open.personal demand savings account->exit.lifecycle->entry.cei"   
[4] "entry.transaction->txn.no source available->exit.transaction->entry.cei"                 
[5] "entry.branch->exit.branch->entry.transaction->txn.in-branch->exit.transaction->entry.cei"

I need to split them on "->" and put the pieces into separate columns, such as V1, V2, and so on. For example:

           V1                             V2               V3             V4           V5     V6    V7
1   entry.cei   
2   entry.lifecycle hist.open.personal demand chequing account  exit.lifecycle  entry.cei   
3   entry.lifecycle hist.open.personal demand savings account   exit.lifecycle  entry.cei   

How can I achieve this in R? I tried using rbind together with strsplit(), but I think it needs every row to have the same number of columns.

【Comments】:

  • See my updated answer. read.csv is easier.

Tags: r csv strsplit


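【Solution 1】:

The easiest way is to use gsub() to replace "->" with a comma and then call read.csv(). If the data itself contains commas, replace "->" with a single ">" instead and read the result with sep = ">".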

read.csv(text = gsub("->", ",", x, fixed = TRUE), header = FALSE)
#                  V1                                         V2                V3            V4               V5        V6
# 1         entry.cei                                                                                                      
# 2   entry.lifecycle hist.open.personal demand chequing account    exit.lifecycle     entry.cei                           
# 3   entry.lifecycle  hist.open.personal demand savings account    exit.lifecycle     entry.cei                           
# 4 entry.transaction                    txn.no source available  exit.transaction     entry.cei                           
# 5      entry.branch                                exit.branch entry.transaction txn.in-branch exit.transaction entry.cei

Or:

read.table(text = gsub("->", ",", x, fixed = TRUE), sep = ",", fill = TRUE)
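Two caveats apply to the read.table() route. The separator must be a single character, which is why the arrow is replaced first. And with fill = TRUE alone, read.table() infers the number of columns from the first five lines of input, so a wider row further down would be wrapped onto extra rows. The sketch below is one possible way to guard against both. The helper name split_arrows, its sep argument, and the input y are made up for illustration and are not part of the original answer. It assumes a bare ">" never occurs in the data.

```r
# Hypothetical helper, not part of the original answer.
split_arrows <- function(x, sep = ">") {
  # read.table() only accepts a single-character separator, so swap "->" for sep.
  txt <- gsub("->", sep, x, fixed = TRUE)

  # Count the fields on every line; with fill = TRUE alone, read.table()
  # sizes the result from the first five lines only.
  con <- textConnection(txt)
  on.exit(close(con))
  n <- max(count.fields(con, sep = sep, quote = "", comment.char = ""))

  # Supplying col.names fixes the column count at the true maximum.
  read.table(text = txt, sep = sep, fill = TRUE, header = FALSE,
             quote = "", comment.char = "",
             col.names = paste0("V", seq_len(n)),
             stringsAsFactors = FALSE)
}

# Made-up input: the widest row comes after line five, and one field has a comma.
y <- c(rep("entry.cei", 5), "a->b, c->d", "e->f->g->h")
res <- split_arrows(y)
res
```

Short rows are padded with empty strings here, as in the read.csv() output above, rather than with NA.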

You can still use rbind with strsplit, as long as you first make all of the list elements the same length. The `length<-` replacement function can help with that.

s <- strsplit(x, "->", fixed = TRUE)
data.frame(do.call(rbind, lapply(s, `length<-`, max(sapply(s, length)))))
#                  X1                                         X2                X3            X4               X5        X6
# 1         entry.cei                                       <NA>              <NA>          <NA>             <NA>      <NA>
# 2   entry.lifecycle hist.open.personal demand chequing account    exit.lifecycle     entry.cei             <NA>      <NA>
# 3   entry.lifecycle  hist.open.personal demand savings account    exit.lifecycle     entry.cei             <NA>      <NA>
# 4 entry.transaction                    txn.no source available  exit.transaction     entry.cei             <NA>      <NA>
# 5      entry.branch                                exit.branch entry.transaction txn.in-branch exit.transaction entry.cei
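To see the padding step in isolation, here is a minimal sketch of the same idea written out with an explicit function instead of passing `length<-` to lapply(). The name paths is made up for this sketch and holds just two of the question's strings. It also uses as.data.frame(), which names the columns of an unnamed matrix V1, V2, ... as the question asked, whereas data.frame() gives X1, X2, ...

```r
# Two of the question's strings, under a made-up name for this sketch.
paths <- c("entry.cei",
           "entry.transaction->txn.no source available->exit.transaction->entry.cei")

s <- strsplit(paths, "->", fixed = TRUE)  # list of character vectors
n <- max(lengths(s))                      # width of the longest row

# Assigning a larger length pads a vector with NA at the end.
padded <- lapply(s, function(parts) {
  length(parts) <- n
  parts
})

# as.data.frame() on an unnamed matrix names the columns V1, V2, ...
df <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
df
```

Unlike the read.csv() route, the missing cells come out as NA rather than empty strings, which may be easier to test for with is.na().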

Throughout this answer, x is the original character vector:

x <- c("entry.cei", 
 "entry.lifecycle->hist.open.personal demand chequing account->exit.lifecycle->entry.cei", 
 "entry.lifecycle->hist.open.personal demand savings account->exit.lifecycle->entry.cei", 
 "entry.transaction->txn.no source available->exit.transaction->entry.cei", 
 "entry.branch->exit.branch->entry.transaction->txn.in-branch->exit.transaction->entry.cei")
