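【Question title】: R string manipulation
【Posted】: 2017-08-16 20:30:53
【Question】:

I have a csv file whose first column contains a long string. How can I truncate that string so that only "NineOneOne" or "NineOneTwo" is kept and the rest is dropped?

The header and the first 3 rows look like this: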

HEADERLINE,Time,Name,Owner,Dummy1,Dummy2,Number
NineOneOne [911; OUHOST2 - sumo.6973; - sumo.6973; sumo.6973; Limi69sumo.6973; - sumo.6973; sumo.6973; sumo.6973; NJ sumo.6973; sumo.6973; sumo.6973],07/19/2017 06:04:25,DR,A,0.000000,0.000000,1472.233
NineOneOne [911; OUHOST2 - sumo.6973; - sumo.6973; sumo.6973; Limi69sumo.6973; - sumo.6973; sumo.6973; sumo.6973; NJ sumo.6973; sumo.6973; sumo.6973],07/19/2017 06:14:25,SO,A,0.000000,0.000000,1550.388
NineOneTwo [912; OUHOST2 - sumo.6973; - sumo.6973; sumo.6973; Limi69sumo.6973; - sumo.6973; sumo.6973; sumo.6973; NJ sumo.6973; sumo.6973; sumo.6973],07/19/2017 06:19:25,LM,A,0.000000,0.000000,1439.232

Script:

dat <- read.csv(csvfile, header = TRUE)
abc <- filter( dat, Number > 1000 )
hinum <- select( abc,Time,Number,HEADERLINE)
print (hinum)

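Thanks.

【Question comments】:

  • filter and select are not base R functions. Please include in your post the names of any packages you are using.
  • Could you rephrase your question? Sorry, it is not clear.
  • You only want to keep the first word of the first column, which should be "NineOneOne" or "NineOneTwo"? Try substr(x[[1]], 1, 10), or sub(' .+$', '', x[[1]]).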
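
The two suggestions in the last comment only agree when the first word is exactly 10 characters long. A minimal sketch on a made-up vector (`x` and its values are hypothetical stand-ins for the first column, not the poster's data):

```r
# Hypothetical sample values standing in for the first column.
x <- c("NineOneOne [911; OUHOST2 - sumo.6973]",
       "NineOneTwo [912; OUHOST2 - sumo.6973]",
       "Ten [10; OUHOST2 - sumo.6973]")

# Fixed-width truncation: keeps characters 1 to 10, whatever they are.
fixed_width <- substr(x, 1, 10)

# Pattern-based truncation: removes everything from the first space onward.
first_word <- sub(" .+$", "", x)

print(fixed_width)
print(first_word)
```

For the third value, substr() keeps part of the bracketed text while sub() returns just "Ten", so the pattern-based form is the safer choice if the word length can vary.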

Tags: r string truncate


【Solution 1】:

Use dplyr with mutate:

library(dplyr)  # provides %>% and mutate()

read.table(text = "HEADERLINE,Time,Name,Owner,Dummy1,Dummy2,Number
    NineOneOne [911; OUHOST2 - sumo.6973; - sumo.6973; sumo.6973; Limi69sumo.6973; - sumo.6973; sumo.6973; sumo.6973; NJ sumo.6973; sumo.6973; sumo.6973],07/19/2017 06:04:25,DR,A,0.000000,0.000000,1472.233
    NineOneOne [911; OUHOST2 - sumo.6973; - sumo.6973; sumo.6973; Limi69sumo.6973; - sumo.6973; sumo.6973; sumo.6973; NJ sumo.6973; sumo.6973; sumo.6973],07/19/2017 06:14:25,SO,A,0.000000,0.000000,1550.388
    NineOneTwo [912; OUHOST2 - sumo.6973; - sumo.6973; sumo.6973; Limi69sumo.6973; - sumo.6973; sumo.6973; sumo.6973; NJ sumo.6973; sumo.6973; sumo.6973],07/19/2017 06:19:25,LM,A,0.000000,0.000000,1439.232", 
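    # strip.white = TRUE drops the leading spaces of the indented rows, so substr() starts at the word
    sep = ",", header = TRUE, strip.white = TRUE) %>%
dplyr::mutate(HEADERLINE = substr(HEADERLINE, 1, 10))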

Alternatively, if the length of the word can vary, split each HEADERLINE on spaces and take the first word:

read.table(text = "HEADERLINE,Time,Name,Owner,Dummy1,Dummy2,Number
    NineOneOne [911; OUHOST2 - sumo.6973; - sumo.6973; sumo.6973; Limi69sumo.6973; - sumo.6973; sumo.6973; sumo.6973; NJ sumo.6973; sumo.6973; sumo.6973],07/19/2017 06:04:25,DR,A,0.000000,0.000000,1472.233
    NineOneOne [911; OUHOST2 - sumo.6973; - sumo.6973; sumo.6973; Limi69sumo.6973; - sumo.6973; sumo.6973; sumo.6973; NJ sumo.6973; sumo.6973; sumo.6973],07/19/2017 06:14:25,SO,A,0.000000,0.000000,1550.388
    NineOneTwo [912; OUHOST2 - sumo.6973; - sumo.6973; sumo.6973; Limi69sumo.6973; - sumo.6973; sumo.6973; sumo.6973; NJ sumo.6973; sumo.6973; sumo.6973],07/19/2017 06:19:25,LM,A,0.000000,0.000000,1439.232", 
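    # strip.white = TRUE drops the leading spaces, otherwise the first piece of each split would be ""
    sep = ",", header = TRUE, strip.white = TRUE) %>%
dplyr::mutate(HEADERLINE = as.character(HEADERLINE),
              HEADERLINE = sapply(strsplit(HEADERLINE, " "), function(x) x[1]))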

Output:

  HEADERLINE                Time Name Owner Dummy1 Dummy2   Number
1 NineOneOne 07/19/2017 06:04:25   DR     A      0      0 1472.233
2 NineOneOne 07/19/2017 06:14:25   SO     A      0      0 1550.388
3 NineOneTwo 07/19/2017 06:19:25   LM     A      0      0 1439.232

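【Comments】:

  • As an alternative to substr(), split on whitespace and select the first element: strsplit(HEADERLINE, " ")[[1]][1]
  • Funny, I was editing my answer just as you wrote that, because I had the same idea :) Nice!
  • Or you could also use sub(), as in the comment above, which is probably better than sapply().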
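
One caveat on the first comment: `strsplit(HEADERLINE, " ")[[1]][1]` looks only at the first list element, so on a whole column it returns the first row's word only, and inside mutate() that single value would be recycled to every row. A small sketch of vectorized alternatives, using a hypothetical vector `h` rather than the poster's data:

```r
# Hypothetical two-row stand-in for the HEADERLINE column.
h <- c("NineOneOne [911; a]", "NineOneTwo [912; b]")

# strsplit() returns a list with one character vector per input element.
parts <- strsplit(h, " ", fixed = TRUE)

# Indexing with [[1]][1] reaches only the first row's first word.
only_first_row <- parts[[1]][1]

# Taking the first piece of every list element covers all rows.
via_vapply <- vapply(parts, function(p) p[1], character(1))

# sub() is vectorized already and needs no list handling.
via_sub <- sub(" .*$", "", h)

print(only_first_row)
print(via_vapply)
print(via_sub)
```

vapply() is used here instead of sapply() only because it fixes the return type; either works for this input.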

【Solution 2】:

I'm not a plyr user, so here is a base R approach:

abc <- read.csv(text = "HEADERLINE,Time,Name,Owner,Dummy1,Dummy2,Number
    NineOneOne [911; OUHOST2 - sumo.6973; - sumo.6973; sumo.6973; Limi69sumo.6973; - sumo.6973; sumo.6973; sumo.6973; NJ sumo.6973; sumo.6973; sumo.6973],07/19/2017 06:04:25,DR,A,0.000000,0.000000,1472.233
    NineOneOne [911; OUHOST2 - sumo.6973; - sumo.6973; sumo.6973; Limi69sumo.6973; - sumo.6973; sumo.6973; sumo.6973; NJ sumo.6973; sumo.6973; sumo.6973],07/19/2017 06:14:25,SO,A,0.000000,0.000000,1550.388
    NineOneTwo [912; OUHOST2 - sumo.6973; - sumo.6973; sumo.6973; Limi69sumo.6973; - sumo.6973; sumo.6973; sumo.6973; NJ sumo.6973; sumo.6973; sumo.6973],07/19/2017 06:19:25,LM,A,0.000000,0.000000,1439.232", 
    header=TRUE)

abc$HEADERLINE <- sapply(strsplit(trimws(abc$HEADERLINE), " "), '[', 1)
str(abc)
# 'data.frame': 3 obs. of  7 variables:
#  $ HEADERLINE: chr  "NineOneOne" "NineOneOne" "NineOneTwo"
#  $ Time      : Factor w/ 3 levels "07/19/2017 06:04:25",..: 1 2 3
#  $ Name      : Factor w/ 3 levels "DR","LM","SO": 1 3 2
#  $ Owner     : Factor w/ 1 level "A": 1 1 1
#  $ Dummy1    : num  0 0 0
#  $ Dummy2    : num  0 0 0
#  $ Number    : num  1472 1550 1439
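
For completeness, the base approach can be combined with the filtering and column selection from the question's script. This is only a sketch under assumptions: the rows are shortened, the second `Number` value is made up so that the filter has something to drop, and `subset()` is swapped in for the dplyr `filter`/`select` calls.

```r
# Shortened, hypothetical rows in the same layout as the question's file.
csv_text <- "HEADERLINE,Time,Name,Owner,Dummy1,Dummy2,Number
NineOneOne [911; OUHOST2 - sumo.6973],07/19/2017 06:04:25,DR,A,0.000000,0.000000,1472.233
NineOneTwo [912; OUHOST2 - sumo.6973],07/19/2017 06:19:25,LM,A,0.000000,0.000000,939.232"

# stringsAsFactors = FALSE keeps text columns as character in older R versions too.
dat <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# Keep only the first word of the first column.
dat$HEADERLINE <- sub(" .*$", "", trimws(dat$HEADERLINE))

# Base R counterpart of filter(Number > 1000) followed by select(Time, Number, HEADERLINE).
hinum <- subset(dat, Number > 1000, select = c(Time, Number, HEADERLINE))
print(hinum)
```

With a real file, `read.csv(csvfile, ...)` would replace the `text =` argument, as in the question's script.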

【Comments】: