[Question title]: Split an uneven character string in R with space
[Posted]: 2013-01-11 19:49:45
[Question]:

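I have read many posts about splitting strings in R. However, I am running into an error that I think comes from the way the variable was read into R: in some cases there are spaces after the date because the ID is shorter. I am trying to split the character variable "VESSELID" into 2 new variables, "vesselID" and "DATE". Below is a subset of my dataset.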

> dput(df)
structure(list(SETID = c(24153L, 24187L, 24215L, 31990L, 31990L, 
31995L, 31995L, 31995L, 31996L, 31996L, 31996L, 31997L, 31997L, 
32002L, 32002L, 32002L, 32002L, 32003L, 32003L, 32003L), VESSELID = c("6830 2002/08/13  ", 
"6830 2002/08/12  ", "6830 2002/08/15  ", "105372 2002/08/23", 
"105372 2002/08/23", "104234 2002/07/20", "104234 2002/07/20", 
"104234 2002/07/20", "104234 2002/07/21", "104234 2002/07/21", 
"104234 2002/07/21", "104234 2002/07/22", "104234 2002/07/22", 
"5744 2002/08/14  ", "5744 2002/08/14  ", "5744 2002/08/14  ", 
"5744 2002/08/14  ", "5744 2002/08/13  ", "5744 2002/08/13  ", 
"5744 2002/08/13  ")), .Names = c("SETID", "VESSELID"), row.names = c(1L, 
2L, 3L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 
21L, 22L, 23L, 24L, 25L, 26L), class = "data.frame")
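A quick way to test the trailing-space diagnosis is to count characters. The sketch below uses a few values hand-copied from the dput() output above (the vector name x is just for illustration). Every value comes out at 17 characters, which suggests VESSELID was read from a fixed-width field in which the shorter IDs are padded with blanks at the end. Base R's trimws() (available since R 3.2.0) removes that padding.

```r
# A few VESSELID values copied from the question's dput() output
x <- c("6830 2002/08/13  ", "105372 2002/08/23", "5744 2002/08/14  ")

# All values have the same width: short IDs are padded with trailing blanks
nchar(x)
# [1] 17 17 17

# Strip leading/trailing whitespace; only the single inner space remains
trimws(x)
```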

I tried the following:

library(reshape2)
test <- data.frame(df, colsplit(df$VESSELID, split= " ",names=c("vesselID","DATE")))

However, I get the following error message:

Error in colsplit(log21$VESSELID, split = " ", names = c("vesselID", "DATE")) : 
      unused argument(s) (split = " ")

The split command does not seem to work properly, and I don't know how to fix my strings.

[Question discussion]:

  • Did you check ?colsplit?
  • @geektrader You're right... I took this answer from another post but missed that detail (arrg!). Thanks!!
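The mix-up is easy to make because base R's strsplit() names its delimiter argument split, while reshape2::colsplit() names it pattern. A small sketch for checking argument names from the console instead of the help page; the reshape2 part is wrapped in requireNamespace() so it only runs if that package is installed.

```r
# Base R: the delimiter argument of strsplit() is named `split`
names(formals(strsplit))

# reshape2 (only if installed): colsplit() takes `pattern` instead
if (requireNamespace("reshape2", quietly = TRUE)) {
  print(args(reshape2::colsplit))
}
```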

Tags: string r split dataframe


[Solution 1]:

I would actually just use read.table on that column, as shown below. Assuming your dataset is called "mydata":

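# Drop the original VESSELID column and append the two parsed columns
mydata.new <- cbind(mydata[-2], 
                    read.table(text = as.character(mydata$VESSELID), 
                               strip.white=TRUE, header = FALSE))
# read.table names the new columns V1 and V2; rename them
names(mydata.new)[2:3] <- c("VesselID", "Date")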
mydata.new
#    SETID VesselID       Date
# 1  24153     6830 2002/08/13
# 2  24187     6830 2002/08/12
# 3  24215     6830 2002/08/15
# 10 31990   105372 2002/08/23
# 11 31990   105372 2002/08/23
# 12 31995   104234 2002/07/20
# 13 31995   104234 2002/07/20
# 14 31995   104234 2002/07/20
# 15 31996   104234 2002/07/21
# 16 31996   104234 2002/07/21
# 17 31996   104234 2002/07/21
# 18 31997   104234 2002/07/22
# 19 31997   104234 2002/07/22
# 20 32002     5744 2002/08/14
# 21 32002     5744 2002/08/14
# 22 32002     5744 2002/08/14
# 23 32002     5744 2002/08/14
# 24 32003     5744 2002/08/13
# 25 32003     5744 2002/08/13
# 26 32003     5744 2002/08/13
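A self-contained variant of the same read.table() idea, sketched on three rows hand-copied from the question's data (df is an assumed name). It names and types the new columns in the call through col.names and colClasses instead of renaming afterwards, and converts the dates with as.Date(). strip.white is left out because, according to ?read.table, it only has an effect when sep is given; with the default sep = "" any run of whitespace is the separator, so the trailing blanks are ignored anyway.

```r
# Three rows hand-copied from the question's dput() output
df <- data.frame(
  SETID = c(24153L, 31990L, 32002L),
  VESSELID = c("6830 2002/08/13  ", "105372 2002/08/23", "5744 2002/08/14  "),
  stringsAsFactors = FALSE
)

# Each string is read as one line; with the default sep = "" runs of
# whitespace separate fields, so the trailing blanks add no column.
# as.character() guards against VESSELID having been read as a factor.
parsed <- read.table(
  text = as.character(df$VESSELID),
  col.names = c("VesselID", "Date"),
  colClasses = c("integer", "character")
)

# Turn the yyyy/mm/dd strings into Date objects
parsed$Date <- as.Date(parsed$Date, format = "%Y/%m/%d")

# Keep every column except VESSELID and append the parsed ones
out <- cbind(df[names(df) != "VESSELID"], parsed)
str(out)
```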

[Discussion]:

    [Solution 2]:

    The argument is not called split but pattern:

    test <- data.frame(df, colsplit(df$VESSELID, pattern = " ",names=c("vesselID","DATE")))
    

    gives:

       SETID          VESSELID vesselID         DATE
    1  24153 6830 2002/08/13       6830 2002/08/13  
    2  24187 6830 2002/08/12       6830 2002/08/12  
    3  24215 6830 2002/08/15       6830 2002/08/15  
    10 31990 105372 2002/08/23   105372   2002/08/23
    11 31990 105372 2002/08/23   105372   2002/08/23
    12 31995 104234 2002/07/20   104234   2002/07/20
    13 31995 104234 2002/07/20   104234   2002/07/20
    14 31995 104234 2002/07/20   104234   2002/07/20
    15 31996 104234 2002/07/21   104234   2002/07/21
    16 31996 104234 2002/07/21   104234   2002/07/21
    17 31996 104234 2002/07/21   104234   2002/07/21
    18 31997 104234 2002/07/22   104234   2002/07/22
    19 31997 104234 2002/07/22   104234   2002/07/22
    20 32002 5744 2002/08/14       5744 2002/08/14  
    21 32002 5744 2002/08/14       5744 2002/08/14  
    22 32002 5744 2002/08/14       5744 2002/08/14  
    23 32002 5744 2002/08/14       5744 2002/08/14  
    24 32003 5744 2002/08/13       5744 2002/08/13  
    25 32003 5744 2002/08/13       5744 2002/08/13  
    26 32003 5744 2002/08/13       5744 2002/08/13  
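In the output above, colsplit() leaves the padding in DATE (e.g. "2002/08/13  "). A base-R sketch that avoids both the trailing blanks and the reshape2 dependency: trim first with trimws(), then take the text before and after the single remaining space with sub(). This swaps colsplit() for regular-expression substitution; the data frame df is a hand-copied subset, and the column names vesselID and DATE follow the question.

```r
df <- data.frame(
  SETID = c(24153L, 31990L, 32002L),
  VESSELID = c("6830 2002/08/13  ", "105372 2002/08/23", "5744 2002/08/14  "),
  stringsAsFactors = FALSE
)

# Remove the trailing padding so exactly one space separates ID and date
v <- trimws(df$VESSELID)

# Text before the space is the ID, text after it is the date
df$vesselID <- as.integer(sub(" .*$", "", v))
df$DATE <- sub("^[^ ]+ ", "", v)

df
```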
    

    [Discussion]:

    • Thanks. Monday morning. I should have caught this obvious mistake!

    [Solution 3]:

    Try:

    do.call("rbind", strsplit(df$VESSELID, " "))
    

    It should return something like:

          [,1]     [,2]         [,3]    
     [1,] "6830"   "2002/08/13" ""      
     [2,] "6830"   "2002/08/12" ""      
     [3,] "6830"   "2002/08/15" ""      
     [4,] "105372" "2002/08/23" "105372"
     [5,] "105372" "2002/08/23" "105372"
     [6,] "104234" "2002/07/20" "104234"
     [7,] "104234" "2002/07/20" "104234"
     [8,] "104234" "2002/07/20" "104234"
     [9,] "104234" "2002/07/21" "104234"
    [10,] "104234" "2002/07/21" "104234"
    [11,] "104234" "2002/07/21" "104234"
    [12,] "104234" "2002/07/22" "104234"
    [13,] "104234" "2002/07/22" "104234"
    [14,] "5744"   "2002/08/14" ""      
    [15,] "5744"   "2002/08/14" ""      
    [16,] "5744"   "2002/08/14" ""      
    [17,] "5744"   "2002/08/14" ""      
    [18,] "5744"   "2002/08/13" ""      
    [19,] "5744"   "2002/08/13" ""      
    [20,] "5744"   "2002/08/13" ""      
    

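    Take what you need from there. (The third column is only an artifact of the trailing spaces: it is empty for the padded short IDs and holds a recycled ID for the others.)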
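The third column above comes from the padding: strsplit() turns the trailing blanks of the short IDs into an extra empty string, so those rows have three pieces while the others have two, and rbind() recycles the shorter rows (R should also emit a warning about this). Trimming first makes every row split into exactly two pieces. A minimal sketch on a few values from the question (x, m and parts are illustrative names), turning the matrix into a typed data frame:

```r
x <- c("6830 2002/08/13  ", "105372 2002/08/23", "5744 2002/08/14  ")

# Without trimming, the padded strings split into 3 pieces, the other into 2
lengths(strsplit(x, " "))
# [1] 3 2 3

# Trim first, then split: every element now yields exactly 2 pieces
m <- do.call(rbind, strsplit(trimws(x), " "))

# Name the columns and convert them to integer IDs and Date objects
parts <- data.frame(
  vesselID = as.integer(m[, 1]),
  DATE = as.Date(m[, 2], format = "%Y/%m/%d")
)
parts
```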

    [Discussion]:
